
@tcrawley food for thought (in screenshot, conversation from #tools-deps)


@tcrawley I think for the purpose of libs like these, it would be super awesome if clojars had some kind of index of jars + the list of files in each jar, as EDN, or transit, which refreshed every so often (daily, weekly, monthly)


I think that would be great! I'm focused on adding group validation currently, but we could tackle this afterward. Do you have code already that will generate the index for a single jar?


@tcrawley Yeah, this code is in We could work on this together if you want. The part I do not control is the "ops" side, but I can write the "script" that produces the index from a dir of jars


A script that processes a sparse maven repo dir would do the trick. "sparse" meaning it is in the correct shape (`group-name/artifact-name/0.1.0/artifact-name-0.1.0.jar`), but has no pom files. The repo is in s3, but we sync down all of the jar files nightly in order to generate the maven-style indexes for tooling, and could generate this index as part of that process.
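A minimal sketch of how a path in that sparse layout could be turned back into coordinates. It is written in Python purely for illustration and is not the code discussed in this thread. The function name `parse_jar_path` is hypothetical, and joining multi-segment group directories with dots is an assumption based on the standard maven layout.

```python
from pathlib import PurePosixPath


def parse_jar_path(rel_path):
    """Split 'group/artifact/version/artifact-version.jar' into coordinates.

    Returns None when the path does not have that shape (for example a
    pom file, or a jar whose name does not match its directories).
    """
    parts = PurePosixPath(rel_path).parts
    if len(parts) < 4:
        return None
    # Everything before artifact/version/filename belongs to the group.
    *group_parts, artifact, version, filename = parts
    if filename != f"{artifact}-{version}.jar":
        return None
    return {
        # Assumption: nested group directories map to a dotted group id.
        "group-id": ".".join(group_parts),
        "artifact": artifact,
        "mvn/version": version,
    }
```

Jars with a classifier suffix (such as a sources jar) are rejected by the filename check, which seems reasonable for an index of namespaces but is a design choice rather than a requirement.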


We could then upload these ns indexes to s3 alongside the feeds/jar lists:


Sounds excellent


@tcrawley Right now I have some code which walks over a dir with .jar files and produces one huge map:

 [{:mvn/version "0.2.5",
   :file "accountant/core.cljs",
   :group-id "venantius",
   :artifact "accountant"}],
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 [{:mvn/version "0.4.0",
   :file "adzerk/boot_cljs_repl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs-repl"}],
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/impl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/js_deps.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/middleware.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
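For readers who want to see roughly how entries of this shape could be produced, here is a hedged sketch in Python (the tool in this thread is written in Clojure, and this is not its code). The names `index_jar` and `index_repo` and the list of source extensions are assumptions. The directory layout is assumed to be `group/artifact/version/artifact-version.jar`.

```python
import os
import zipfile

# Assumption: only Clojure source files are of interest for the index.
SOURCE_EXTENSIONS = (".clj", ".cljs", ".cljc")


def index_jar(jar_path, group_id, artifact, version):
    """Return one entry per Clojure source file found inside the jar."""
    with zipfile.ZipFile(jar_path) as jar:
        return [
            {"mvn/version": version, "file": name,
             "group-id": group_id, "artifact": artifact}
            for name in sorted(jar.namelist())
            if name.endswith(SOURCE_EXTENSIONS)
        ]


def index_repo(repo_dir):
    """Walk repo_dir and index every jar found in the expected layout."""
    entries = []
    for dirpath, _dirnames, filenames in os.walk(repo_dir):
        for filename in filenames:
            if not filename.endswith(".jar"):
                continue
            full_path = os.path.join(dirpath, filename)
            parts = os.path.relpath(full_path, repo_dir).split(os.sep)
            if len(parts) < 4:
                continue  # not in group/artifact/version/file shape
            group_id = ".".join(parts[:-3])
            artifact, version = parts[-3], parts[-2]
            entries.extend(index_jar(full_path, group_id, artifact, version))
    # os.walk order is not guaranteed, so sort for a stable result.
    return sorted(entries, key=lambda e: (e["file"], e["mvn/version"]))
```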


Perhaps it would be better to partition this into multiple files


For my local .m2 dir the file is 130822 lines long


@tcrawley I have this code here: It prints to stdout. You can run it with `clojure -M -m deps-infer.clojars > /tmp/index.edn`


Parsing this file as EDN takes 200ms on my machine, which is still quite OK


But for the entire clojars it might get a little bit bloated


You can change the location of the dir it scans for .jar files with `--repo`


Thanks! I'll see if I can find some time today to kick this off on the server to see how long it takes and how large of a file it produces.


I produced both an .edn and .transit file and zipped both, here's how it looks on my machine:

$ ls -la /tmp/index*
-rw-r--r--  1 borkdude  wheel  4363922 Feb 24 16:07 /tmp/index.edn
-rw-r--r--  1 borkdude  wheel   214482 Feb 24 17:00 /tmp/
-rw-r--r--  1 borkdude  wheel  3594066 Feb 24 16:59 /tmp/index.transit.json
-rw-r--r--  1 borkdude  wheel   393184 Feb 24 17:01 /tmp/
Funnily enough, the zipped edn looks better than the zipped transit.


I tried to run it on the repo cached on the server last night, but realized my recollection of how we build the maven index was wrong - we pull down the poms, not the jars for indexing :( However, I think we could:
• pull down the jars once and index those, then store the index in s3
• index new jars as they are deployed, then merge with the existing index
This should work since existing releases are immutable. We could also store the index as many timestamped files - that would allow clients to cache the index, pulling down new files and merging them. I suspect the full index file will be pretty large.
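The merge and the client-side caching described in this message could look roughly like the following Python sketch. It is an illustration under stated assumptions, not an agreed design: the function names are hypothetical, and so is the file naming scheme (`index-YYYY-MM-DD.edn`, chosen because ISO dates sort lexicographically).

```python
def entry_key(entry):
    """Identity of an index entry: one file in one released jar."""
    return (entry["group-id"], entry["artifact"],
            entry["mvn/version"], entry["file"])


def merge_indexes(existing, new_entries):
    """Append entries that are not already present.

    Because releases are immutable, an existing entry never changes,
    so merging is a plain union and nothing needs to be updated.
    """
    seen = {entry_key(e) for e in existing}
    merged = list(existing)
    for entry in new_entries:
        key = entry_key(entry)
        if key not in seen:
            seen.add(key)
            merged.append(entry)
    return merged


def files_to_fetch(available, last_synced):
    """Return the timestamped index files newer than the last one synced.

    Assumes file names sort chronologically (hypothetical naming scheme).
    """
    return sorted(name for name in available if name > last_synced)
```

A client would keep the name of the newest file it has merged, call `files_to_fetch` against the current listing, and merge only those files into its cached index.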


yeah, those are good ideas


I like the second idea


then we can just pull only the latest files


Good deal. We should probably open an issue at and continue this discussion there


I think it might be better to have one file per namespace actually, since the number of namespaces to check is usually small and downloading the entire index would be wasteful in that case. Just one http request per namespace would be ideal.


If you agree, I can change the code to produce those files
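As a rough sketch of the one-file-per-namespace idea, index entries could be grouped by the namespace their file path implies. This is Python for illustration only, not the thread's code. The function names are hypothetical. The path-to-namespace mapping follows the usual Clojure convention, where directory separators become dots and underscores become hyphens.

```python
import posixpath
from collections import defaultdict


def file_to_namespace(path):
    """Map 'adzerk/boot_cljs/impl.clj' to 'adzerk.boot-cljs.impl'."""
    stem, _extension = posixpath.splitext(path)
    return stem.replace("/", ".").replace("_", "-")


def partition_by_namespace(entries):
    """Group index entries by namespace, one group per output file."""
    by_namespace = defaultdict(list)
    for entry in entries:
        by_namespace[file_to_namespace(entry["file"])].append(entry)
    return dict(by_namespace)
```

Each group could then be written to its own small file named after the namespace, so a client that needs three namespaces makes three requests instead of downloading the whole index.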