#tools-deps
2019-08-28
onetom18:08:53

I'm wondering how I can test running clj for the 1st time in a deps.edn project without rm -r ~/.m2. I remember in the past I could just do something like env M2_HOME=$(realpath .)/m2 boot repl or something like that, but now this is the only way I found to affect that path:

mvn -Dmaven.repo.local=$(realpath .)/m2 help:evaluate -Dexpression=settings.localRepository
but if I try something similar with clj, it still seems to use the default location (since it's not downloading anything):
rm -r .cpcache; time clj -J-Dmaven.repo.local=$(realpath .)/m2 -e 1

onetom18:08:31

And, btw, I'm still seeing abysmal performance when accessing Maven Central from Hong Kong. After rm -r ~/.m2, it took 9 minutes to download 25MB of jars (just clojure, netty, aleph, compojure). While my connection is shit and hardly ever goes over 2MByte/s (mostly it's below 1MB/s), 9 minutes is unreasonable. It seems to me that the main reason is the number of requests, which are also made serially (and might not even use keep-alive?), so I don't think it's only S3 that is slow.

onetom18:08:45

I've also checked the related DNS chain:

.	1735	IN	CNAME	.
.	21599	IN	CNAME	.
.	29	IN	A	151.101.196.215
and that IP points to San Jose.

onetom18:08:54

I had similar issues with the nixos binary caches. When they switched from AWS CloudFront to Fastly (which is significantly cheaper, plus they actually sponsor them), the performance from Hong Kong dropped drastically.

Alex Miller (Clojure team)18:08:01

you can set :mvn/local-repo in your deps.edn

Alex Miller (Clojure team)18:08:31

to use a different (presumably empty) local repo dir

onetom18:08:33

so not from the command line or environment?

Alex Miller (Clojure team)18:08:48

you can clj -Sdeps '{:mvn/local-repo "foo"}'

onetom18:08:09

but im not talking about specifying dependencies

onetom18:08:19

im talking about specifying the maven cache directory

Alex Miller (Clojure team)18:08:32

the local repo is the maven cache directory

Alex Miller (Clojure team)18:08:55

usually it's ~/.m2/repository

onetom18:08:04

indeed, it works
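
A minimal sketch of the scratch-repo approach from the messages above, for measuring a cold download without touching ~/.m2. The helper names `local_repo_override` and `fresh_resolve` are hypothetical; only `-Sdeps`, `:mvn/local-repo` and `-Sforce` (which forces a classpath recompute) come from clj itself, and the sketch assumes clj is on the PATH and the current directory has a deps.edn.

```shell
# Build the -Sdeps override that points clj at a given local repo directory.
local_repo_override() {
  printf '{:mvn/local-repo "%s"}' "$1"
}

# Hypothetical helper: resolve dependencies into a scratch repo and report
# how much was downloaded. Defaults to a new temporary directory.
fresh_resolve() {
  scratch="${1:-$(mktemp -d)/m2}"
  clj -Sforce -Sdeps "$(local_repo_override "$scratch")" -e 1
  du -hs "$scratch"
}
```

Calling `fresh_resolve` with no argument downloads into a new temporary directory; wrapping the call in `time` gives the cold-start duration.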

Alex Miller (Clojure team)18:08:29

are you using latest clj?

onetom18:08:50

where is this documented?

Alex Miller (Clojure team)18:08:59

clj -Sverbose will tell you

Alex Miller (Clojure team)18:08:36

we are making use of maven session caches as of that version, so would be interested in whether that makes a difference for you

onetom18:08:36

version = 1.10.1.466

Alex Miller (Clojure team)18:08:58

it should mostly cut down on repeated download of metadata files though, which are pretty tiny

onetom18:08:07

it would make sense to use a more common option for this, like -v; i keep forgetting it 😕

onetom18:08:46

thanks, i will give it a go

Alex Miller (Clojure team)18:08:59

jars are serially downloaded

Alex Miller (Clojure team)18:08:21

there are ways to change that, but, it's somewhat involved

onetom18:08:34

is there an off the shelf solution for setting up a maven repo manager specifically for clojure usage? the https://maven.apache.org/repository-management.html page mentions a lot of options. a few years ago i tried a few of them but i was struggling to make them work. it should be practically a caching http proxy... i was really hoping to find some small, turn-key solution, which i would run in the cloud, so i can share it between the office and home

Alex Miller (Clojure team)18:08:49

there are several turn-key products for this, not sure if any qualify as "small"

Alex Miller (Clojure team)18:08:13

I've used Nexus a lot

Alex Miller (Clojure team)18:08:52

but there is also JFrog and Artifactory

Alex Miller (Clojure team)18:08:09

sorry, those are the same

Alex Miller (Clojure team)18:08:15

Artifactory is by JFrog

hiredman18:08:17

you can just use a caching http proxy too, you just don't get the nice management interfaces, and the ability to merge sources, etc

onetom18:08:09

latest nixpkgs contains 1.10.1.469 on its master branch. downloading it now:

uhu:multiboxx onetom$ nix-shell -I nixpkgs=~/nixpkgs/ -p clojure
these paths will be fetched (307.02 MiB download, 990.58 MiB unpacked):
  /nix/store/0d8mpzq2dah05xqd6i1c9g02blvsvcnj-bash-interactive-4.4-p23-man
looks like it will take about 10 minutes. i will try a clean download of the same 25MB dependencies afterwards

Alex Miller (Clojure team)18:08:11

there are some benefits to using a maven-aware product I think

Alex Miller (Clojure team)18:08:20

I guess one question I have is why you care what the perf is if you do it just once and then cache it forever?

onetom18:08:29

@hiredman and what would you recommend as a caching http proxy? would it work with https? so for example datomic can be cached too?

onetom18:08:57

@alexmiller these things never happen only once. i have multiple personal computers, multiple office computers, more and more servers and hoping to add more colleagues too. these caches should be primed on all those filesystems and having a bad 1st experience doesn't help...

hiredman18:08:58

I think it could be done, but it is a trade off: if you really want a lean caching solution, a caching proxy would be that, but you will have to invest time getting it working

onetom18:08:20

also on CI systems it's good to be able to build stuff afresh from time to time at least

Alex Miller (Clojure team)18:08:26

I'm not trying to be flip about it, obviously perf matters, just trying to probe a little closer

Alex Miller (Clojure team)18:08:00

the structure of maven is designed so that you don't need to refresh - these are immutable, uniquely versioned artifacts

Alex Miller (Clojure team)18:08:16

CI is one use case where we see this come up

onetom18:08:20

more specifically, i have 2 clojure projects im working on at the moment and i just wanted to know how big their dependencies are

hiredman18:08:21

at my last job we had a nexus setup that aggregated a number of different repos, and our builds were set up to only check our nexus, and that worked pretty well

Alex Miller (Clojure team)18:08:43

but most CI allow you to keep a maven cache alive

onetom18:08:22

i could have built an uber jar, but my knowledge is outdated on how to do that with t.d. im just getting back to using clojure after a ~2yr break (when i was working with ethereum and js...)

Alex Miller (Clojure team)18:08:57

there might also be better maven central mirrors you could use from HK, not sure

Alex Miller (Clojure team)19:08:22

clj (latest) supports Maven mirrors

onetom19:08:38

i looked into the mirrors but most of them don't work. there is a UK one which is not offline, but to do a performance test i wanted to know how i can relocate the m2 cache, since it's not fun to delete my ~/.m2 directory 🙂

Alex Miller (Clojure team)19:08:47

there's one on google storage I know

Alex Miller (Clojure team)19:08:29

<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>Google Maven Central</name>
      <url>https://maven-central.storage.googleapis.com</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
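
A sketch of one way to install the mirror settings above, assuming they belong in Maven's user-level settings file, ~/.m2/settings.xml, and that clj picks up mirrors from there. The function name `write_mirror_settings` is hypothetical; it refuses to overwrite an existing file rather than merging into it.

```shell
# Hypothetical helper: write the mirror settings to the given path,
# unless a file already exists there.
write_mirror_settings() {
  target="$1"
  if [ -e "$target" ]; then
    echo "refusing to overwrite $target" >&2
    return 1
  fi
  mkdir -p "$(dirname "$target")"
  cat > "$target" <<'EOF'
<settings>
  <mirrors>
    <mirror>
      <id>google-maven-central</id>
      <name>Google Maven Central</name>
      <url>https://maven-central.storage.googleapis.com</url>
      <mirrorOf>central</mirrorOf>
    </mirror>
  </mirrors>
</settings>
EOF
}
```

Usage would be `write_mirror_settings "$HOME/.m2/settings.xml"`; if that file already exists, the `<mirror>` element has to be merged into it by hand.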

onetom19:08:56

never seen that being mentioned anywhere

Alex Miller (Clojure team)19:08:58

looks like they have an asia-pacific one there too

onetom19:08:24

i was googling for hong kong maven central proxy and came across these:
• http://repo.maven.apache.org/maven2/.meta/repository-metadata.xml
• https://maven.apache.org/guides/mini/guide-mirror-settings.html
• and one more which was also listing ibiblio.{net,org}, which are down

onetom19:08:15

thanks a lot, @alexmiller ! if these work, it will really help with years of frustration 🙂

Alex Miller (Clojure team)19:08:05

please report back, would be very interested to hear if those are any better

onetom19:08:14

hmm... it took one minute to download 18MB, but only the 1st dependency was printed

[nix-shell:/Volumes/Data/lab/multiboxx]$ clj -Sverbose
version      = 1.10.1.469
install_dir  = /nix/store/0k1f3llrx4aggbgx7rhh70mrardqvrgz-clojure-1.10.1.469-prefix
config_dir   = /Users/onetom/.clojure
config_paths = /nix/store/0k1f3llrx4aggbgx7rhh70mrardqvrgz-clojure-1.10.1.469-prefix/deps.edn /Users/onetom/.clojure/deps.edn deps.edn
cache_dir    = .cpcache
cp_file      = .cpcache/1871601414.cp

Refreshing classpath

[nix-shell:/Volumes/Data/lab/multiboxx]$ time clj -e 1
Downloading: riddley/riddley/0.1.12/riddley-0.1.12.pom from 
1

real	1m18.496s
user	0m13.404s
sys	0m0.742s

[nix-shell:/Volumes/Data/lab/multiboxx]$ ls m2/
cheshire/        clj-tuple/       commons-io/      me/              riddley/         tigris/
clj-http/        com/             commons-logging/ org/             seancorfield/
clj-soup/        commons-codec/   etaoin/          potemkin/        slingshot/

onetom19:08:56

ah, nvm, i forgot to rm -r m2 .cpcache beforehand

onetom19:08:53

2nd run is printing all the downloaded deps:

[nix-shell:/Volumes/Data/lab/multiboxx]$ rm -r m2 .cpcache/; time clj -e 1; du -hsc m2
Downloading: org/clojure/clojure/1.10.1/clojure-1.10.1.pom from 
...
Downloading: org/apache/httpcomponents/httpmime/4.5.2/httpmime-4.5.2.jar from 
Downloading: com/fasterxml/jackson/dataformat/jackson-dataformat-smile/2.7.5/jackson-dataformat-smile-2.7.5.jar from 
Downloading: org/clojure/data.codec/0.1.0/data.codec-0.1.0.jar from 
1

real	1m30.756s
user	0m16.289s
sys	0m1.102s
18M	m2
18M	total
this is magnitudes better! i guess i should make a counter test with the old version just to confirm im not just experiencing some change in network conditions

Alex Miller (Clojure team)19:08:59

using -Sforce will let you force a classpath recompute (don't need to nuke your .cpcache then)

Alex Miller (Clojure team)19:08:31

is that difference with just the new clj version or the new clj version + mirror?

onetom19:08:08

only the clj version is different

onetom19:08:30

this is my deps.edn:

{:paths   ["src" "rsc"]
 :deps    {org.clojure/clojure                {:mvn/version "1.10.1"}
           org.clojure/data.csv               {:mvn/version "0.1.4"}
           clj-soup/clojure-soup              {:mvn/version "0.1.3"}
           me.raynes/fs                       {:mvn/version "1.4.6"}

           org.clojure/java.jdbc              {:mvn/version "0.7.9"}
           org.xerial/sqlite-jdbc             {:mvn/version "3.28.0"}
           com.microsoft.sqlserver/mssql-jdbc {:mvn/version "7.2.2.jre8"}
           seancorfield/next.jdbc             {:mvn/version "1.0.5"}
           etaoin                             {:mvn/version "0.3.5"}}
 :aliases {:test {:extra-paths ["test"]}}
 :mvn/local-repo "m2"
 :mvn/repos {
             "central" {:url ""}
             ;"uk"      {:url ""}
             "clojars" {:url ""}
             }}

onetom19:08:20

next thing after the old version finishes is to try the google mirror

onetom19:08:04

which i guess i can simply do by replacing the :url of the "central" entry with the google mirror's url, right?

onetom19:08:44

but currently im seeing 7-30kbyte/s download rates with the old version, so it will take awhile...

onetom19:08:41

nope, it took only ~3mins:

uhu:multiboxx onetom$ clojure -Sverbose
version      = 1.10.1.466
install_dir  = /nix/store/m2v1xd7cybi544jyl77w3mzjw1xklw41-clojure-1.10.1.466-prefix
config_dir   = /Users/onetom/.clojure
config_paths = /nix/store/m2v1xd7cybi544jyl77w3mzjw1xklw41-clojure-1.10.1.466-prefix/deps.edn /Users/onetom/.clojure/deps.edn deps.edn
cache_dir    = .cpcache
cp_file      = .cpcache/1871601414.cp

Refreshing classpath
^Cuhu:multiboxx onetom$ rm -r m2 .cpcache/; time clj -e 1; du -hsc m2
rm: cannot remove '.cpcache/': No such file or directory
...
Downloading: org/clojure/data.codec/0.1.0/data.codec-0.1.0.jar from 
1

real	3m21.764s
user	0m18.844s
sys	0m1.207s
18M	m2
18M	total

onetom19:08:52

still, it's quite a significant difference, so thanks a lot for telling me about this enhancement and for implementing these improvements!

onetom19:08:56

clj 1.10.1.466 mirror: https://maven-central-asia.storage-download.googleapis.com/repos/central/data/

uhu:multiboxx onetom$ rm -r m2; time clj -Sforce -e 1; du -hsc m2
...
real	1m40.781s
user	0m21.119s
sys	0m1.290s
18M	m2
18M	total

onetom19:08:51

2/3 of the dependencies come from maven central and the rest is clojars:

uhu:multiboxx onetom$ rg -c maven-central-asia 466-deps.txt
71
uhu:multiboxx onetom$ rg -c  466-deps.txt
22
uhu:multiboxx onetom$ wc -l 466-deps.txt
92 466-deps.txt
(it almost adds up :)

onetom19:08:28

clj 1.10.1.469 mirror: https://maven-central-asia.storage-download.googleapis.com/repos/central/data/

uhu:multiboxx onetom$ rm -r m2; time clj -Sforce -e 1; du -hsc m2
...
real	0m44.863s
user	0m13.960s
sys	0m1.059s
18M	m2
18M	total

onetom19:08:55

that consistently seems to be a more than 2x difference!
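
The "more than 2x" figure can be checked against the `real` times reported above. This is a small sketch; the helper names `to_seconds` and `speedup` are made up for illustration, and the inputs are the timings copied from the runs in this log.

```shell
# Convert a `time`-style value such as 1m40.781s into seconds.
to_seconds() {
  printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f", $1 * 60 + $2 }'
}

# Ratio of the first (slower) run to the second (faster) run.
speedup() {
  awk -v a="$(to_seconds "$1")" -v b="$(to_seconds "$2")" \
    'BEGIN { printf "%.2f", a / b }'
}

# 1.10.1.466 vs 1.10.1.469, default repos
echo "no mirror:   $(speedup 3m21.764s 1m30.756s)x"   # prints 2.22x
# 1.10.1.466 vs 1.10.1.469, google asia mirror
echo "with mirror: $(speedup 1m40.781s 0m44.863s)x"   # prints 2.25x
```

Both ratios are single runs, so they say nothing about variance or CDN cache warm-up.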

Alex Miller (Clojure team)19:08:15

pretty big combined change! :)

onetom19:08:18

although it's possible that im warming up some CDN caches...

onetom19:08:04

do you happen to know any asia-pacific clojars mirror by any chance? 🙂

onetom19:08:24

i will try this mirror on my other project too

Alex Miller (Clojure team)19:08:35

I don't believe there is anything region specific. they do have a cdn mirror though

Alex Miller (Clojure team)19:08:54

I have no idea of difference in perf though

onetom19:08:45

this google mirror understands HTTP2:

uhu:reap onetom$ curl -vso /dev/null --http2  2>&1 | rg -i http2
* Using HTTP2, server supports multi-use
is the HTTP client in clj talking HTTP2?

Alex Miller (Clojure team)19:08:22

that's several layers below the code I'm in, so don't know

Alex Miller (Clojure team)19:08:30

it would not surprise me if the answer was no

Alex Miller (Clojure team)19:08:34

the maven resolver uses org.apache.httpcomponents/httpclient and the version it's pinned on does not look like it supports http 2 (the newest series does though)

Alex Miller (Clojure team)19:08:02

so my strong guess would be: no, but it potentially could

onetom19:08:02

in my other project; 24MB of deps: clj 1.10.1.469 mirror: https://maven-central-asia.storage-download.googleapis.com/repos/central/data/

[nix-shell:~/lab/reap]$ rm -r m2; time clj -Sforce -e 1; du -hsc m2
...
real	2m53.788s
user	0m24.007s
sys	0m1.858s
24M	m2
24M	total

onetom20:08:30

clj 1.10.1.466 mirror: https://maven-central-asia.storage-download.googleapis.com/repos/central/data/

[nix-shell:~/lab/reap]$ rm -r m2; time clj -Sforce -e 1; du -hsc m2
...
real	3m33.490s
user	0m26.609s
sys	0m1.819s
24M	m2
24M	total
that's less of a difference, but still quite noticeable (~20%). anyway, enough testing for tonight; it's 4am in HK, so i shall sleep. thanks again for your attention and advice!