#clojars
2021-02-24
borkdude11:02:59

@tcrawley food for thought (in screenshot, conversation from #tools-deps) https://github.com/borkdude/deps-infer

borkdude13:02:58

@tcrawley I think for the purpose of libs like these, it would be super awesome if clojars had some kind of index of jars + the list of files in each jar, as EDN or transit, that is refreshed every so often (daily, weekly, monthly)

tcrawley14:02:21

I think that would be great! I'm focused on adding group validation currently, but we could tackle this afterward. Do you have code already that will generate the index for a single jar?

borkdude14:02:44

@tcrawley Yeah, this code is in https://github.com/borkdude/deps-infer We could work on this together if you want. The part I do not control is the "ops" side, but I can write the "script" that produces the index from a dir of jars

tcrawley14:02:04

A script to process a sparse maven repo dir would do the trick. "sparse" meaning it is in the correct shape (`group-name/artifact-name/0.1.0/artifact-name-0.1.0.jar`), but has no pom files. The repo is in s3, but we sync down all of the jar files nightly in order to generate the maven-style indexes for tooling, and could generate this index as part of that process.
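
To make the "sparse" layout concrete, here is a minimal sketch of turning a repo-relative jar path into coordinates. The function name `jar-path->coords` is hypothetical, and the sketch assumes the usual maven convention that any directories before the artifact directory form a dot-separated group name.

```clojure
(require '[clojure.string :as str])

(defn jar-path->coords
  "Hypothetical helper: splits a repo-relative jar path such as
  group-name/artifact-name/0.1.0/artifact-name-0.1.0.jar into coordinates.
  Returns nil when the path does not have the expected shape."
  [path]
  (let [parts (str/split path #"/")
        n     (count parts)]
    (when (and (>= n 4) (str/ends-with? (peek parts) ".jar"))
      ;; Everything before the last three segments is the group
      ;; (assumption: nested group dirs map to a dotted group id).
      {:group-id    (str/join "." (subvec parts 0 (- n 3)))
       :artifact    (nth parts (- n 3))
       :mvn/version (nth parts (- n 2))})))

;; Usage:
(jar-path->coords "venantius/accountant/0.2.5/accountant-0.2.5.jar")
```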

tcrawley14:02:41

We could then upload these ns indexes to s3 alongside the feeds/jar lists: https://github.com/clojars/clojars-web/wiki/Data#list-of-jars-and-versions-in-leiningen-syntax

borkdude14:02:42

Sounds excellent

borkdude15:02:32

@tcrawley Right now I have some code which walks over a dir with .jar files and produces one huge map:

{accountant.core
 [{:mvn/version "0.2.5",
   :file "accountant/core.cljs",
   :group-id "venantius",
   :artifact "accountant"}],
 adzerk.boot-cljs
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs-repl
 [{:mvn/version "0.4.0",
   :file "adzerk/boot_cljs_repl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs-repl"}],
 adzerk.boot-cljs.impl
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/impl.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs.js-deps
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/js_deps.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
 adzerk.boot-cljs.middleware
 [{:mvn/version "2.1.5",
   :file "adzerk/boot_cljs/middleware.clj",
   :group-id "adzerk",
   :artifact "boot-cljs"}],
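
A rough sketch of how a map of this shape could be built from the file listing of a jar. This is not the deps-infer implementation: the names `file->ns-symbol`, `jar-file-names` and `index-jar` are made up for illustration, and the namespace is derived from the file path alone (an assumption; the real code may read the `ns` form instead).

```clojure
(require '[clojure.string :as str])
(import '(java.util.jar JarFile JarEntry))

(defn file->ns-symbol
  "Assumption: the namespace is inferred from the path of a .clj, .cljs or
  .cljc file, e.g. adzerk/boot_cljs.clj becomes adzerk.boot-cljs."
  [file]
  (when-let [[_ base] (re-matches #"(.+)\.clj[cs]?" file)]
    (symbol (-> base (str/replace "/" ".") (str/replace "_" "-")))))

(defn jar-file-names
  "Lists the entry names of a jar, realized eagerly before the jar closes."
  [jar-path]
  (with-open [jar (JarFile. (str jar-path))]
    (into []
          (map (fn [^JarEntry e] (.getName e)))
          (enumeration-seq (.entries jar)))))

(defn index-jar
  "Builds {ns-symbol [coords+file ...]} for the files of a single jar."
  [coords file-names]
  (reduce (fn [idx file]
            (if-let [ns-sym (file->ns-symbol file)]
              (update idx ns-sym (fnil conj []) (assoc coords :file file))
              idx))
          {}
          file-names))

;; Usage with a hand-written file listing instead of a real jar:
(index-jar {:mvn/version "2.1.5" :group-id "adzerk" :artifact "boot-cljs"}
           ["adzerk/boot_cljs.clj" "META-INF/MANIFEST.MF"])
```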

borkdude15:02:04

Perhaps it would be better to partition this into multiple files

borkdude15:02:16

For my local .m2 dir the file is 130822 lines long

borkdude15:02:13

@tcrawley I have this code here: https://github.com/borkdude/deps-infer/blob/main/src/deps_infer/clojars.clj It prints to stdout. You can run it with clojure -M -m deps-infer.clojars > /tmp/index.edn

borkdude15:02:10

This file takes 200ms to parse to EDN on my machine which is still quite ok

borkdude15:02:19

But for the entire clojars it might get a little bit bloated

borkdude15:02:54

You can change the location of the dir it scans for .jar files with --repo

tcrawley15:02:08

Thanks! I'll see if I can find some time today to kick this off on the server to see how long it takes and how large of a file it produces.

borkdude16:02:46

I produced both an .edn and .transit file and zipped both, here's how it looks on my machine:

$ ls -la /tmp/index*
-rw-r--r--  1 borkdude  wheel  4363922 Feb 24 16:07 /tmp/index.edn
-rw-r--r--  1 borkdude  wheel   214482 Feb 24 17:00 /tmp/index.edn.zip
-rw-r--r--  1 borkdude  wheel  3594066 Feb 24 16:59 /tmp/index.transit.json
-rw-r--r--  1 borkdude  wheel   393184 Feb 24 17:01 /tmp/index.transit.zip

Funnily enough, the zipped edn comes out smaller than the zipped transit.

tcrawley13:02:08

I tried to run it on the repo cached on the server last night, but realized my recollection of how we build the maven index was wrong - we pull down the poms, not the jars for indexing :( However, I think we could:
• pull down the jars once and index those, then store the index in s3
• index new jars as they are deployed, then merge with the existing index
This should work since existing releases are immutable. We could also store the index as many timestamped files - that would allow clients to be able to cache the index, pulling down new files and merging them. I suspect the full index file will be pretty large.
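
The merge step described here could look roughly like the sketch below. It assumes the index has the `{namespace [entries ...]}` shape shown earlier in the conversation; `merge-indexes` is a hypothetical name, not existing Clojars or deps-infer code.

```clojure
(defn merge-indexes
  "Merges a newer index into an existing one. Entries for the same
  namespace are concatenated, and exact duplicates are dropped so that
  re-applying the same file is harmless."
  [existing newer]
  (merge-with (fn [old new] (vec (distinct (concat old new))))
              existing
              newer))

;; A client holding several timestamped index files (oldest first)
;; could fold them into one map:
(defn merge-all [indexes]
  (reduce merge-indexes {} indexes))
```

Because releases are immutable, entries only ever get added, which is what makes a plain append-and-dedupe merge sufficient in this sketch.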

borkdude13:02:03

yeah, those are good ideas

borkdude13:02:07

I like the second idea

borkdude13:02:19

then we can just pull only the latest files

tcrawley13:02:21

Good deal. We should probably open an issue at https://github.com/clojars/clojars-web/issues/new/choose and continue this discussion there

borkdude09:02:09

I think it might be better to have one file per namespace actually, since the number of namespaces to check is usually small and downloading the entire index would be wasteful in that case. Just one http request per namespace would be ideal.

borkdude09:02:25

If you agree, I can change the code to produce those files
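
One way the per-namespace layout could work, as a sketch only: each namespace gets its own EDN file named after the namespace, so a client needs a single lookup per namespace. The `<namespace>.edn` naming scheme and both function names are assumptions made for illustration, and the lookup reads from a local directory rather than over http.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn write-ns-files!
  "Writes one file per namespace, <out-dir>/<namespace>.edn, containing
  the vector of entries for that namespace."
  [index out-dir]
  (doseq [[ns-sym entries] index]
    (let [f (io/file out-dir (str ns-sym ".edn"))]
      (io/make-parents f)
      (spit f (pr-str entries)))))

(defn read-ns-file
  "Looks up a single namespace. Returns nil when no file exists for it."
  [out-dir ns-sym]
  (let [f (io/file out-dir (str ns-sym ".edn"))]
    (when (.exists f)
      (edn/read-string (slurp f)))))
```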