clojars

2025-06-21T09:07:03.861129Z

i'd like to estimate the total disk space used by to-be-fetched .jar files from clojars before performing actual fetches. i was hoping that file size info for existing .jar files might already be in a single file i could fetch, but didn't succeed in finding one. the main place i looked was the info here: https://github.com/clojars/clojars-web/wiki/Data (and the details of a fetched feed.clj). so instead i'm thinking of using http head requests to collect this info via content-length headers (hopefully reusing connections). does this seem like a reasonable thing to do? the number of jars involved in the past has been somewhat over 20,000 (typically trying to fetch only the latest release jars) [1]. for some background, this is for testing tree-sitter-clojure against existing clojure code. [1] there may be a record in this channel of similar previous activity some years back 😅
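
For illustration, the HEAD-request idea described above might look roughly like the following Python sketch. It is only a sketch under assumptions: the host `repo.clojars.org`, the helper names (`parse_content_length`, `head_sizes`, `summarize`), and the expectation that each HEAD response carries a usable `Content-Length` header are not confirmed anywhere in this thread. The jar paths are supplied by the caller, for example derived from a fetched feed.clj.

```python
import http.client
from typing import Dict, Iterable, Optional, Tuple

# Assumption: jars are served from this host; adjust if that is wrong.
HOST = "repo.clojars.org"


def parse_content_length(value: Optional[str]) -> Optional[int]:
    """Return the header value as a non-negative int, or None if missing or invalid."""
    try:
        n = int(value)
    except (TypeError, ValueError):
        return None
    return n if n >= 0 else None


def head_sizes(paths: Iterable[str], host: str = HOST) -> Dict[str, Optional[int]]:
    """Send one HEAD request per path, reusing a single HTTPS connection where possible."""
    sizes: Dict[str, Optional[int]] = {}
    conn = http.client.HTTPSConnection(host, timeout=30)
    try:
        for path in paths:
            try:
                conn.request("HEAD", path)
                resp = conn.getresponse()
                resp.read()  # complete the (bodyless) response so the connection can be reused
                if resp.status == 200:
                    sizes[path] = parse_content_length(resp.getheader("Content-Length"))
                else:
                    sizes[path] = None  # e.g. 404 or a redirect: size unknown
            except (http.client.HTTPException, OSError):
                sizes[path] = None
                conn.close()  # http.client reconnects automatically on the next request
    finally:
        conn.close()
    return sizes


def summarize(sizes: Dict[str, Optional[int]]) -> Tuple[int, int]:
    """Return (total bytes over known sizes, number of paths with unknown size)."""
    known = [s for s in sizes.values() if s is not None]
    return sum(known), len(sizes) - len(known)
```

Calling `summarize(head_sizes(paths))` with a list of jar paths (hypothetical example: `"/some-group/some-artifact/1.0.0/some-artifact-1.0.0.jar"`) would give an estimated total plus a count of jars whose size could not be determined, without downloading any jar bodies.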

2025-06-21T12:39:59.438419Z

If it helps, the entire repo is ~154GB on S3, so you wouldn't need more space than that.

2025-06-21T12:49:29.886519Z

thanks for the response! i know from past experience that extracting relevant files from around 20,000 jars results in a little under 20GB. i don't remember how much the jars themselves took up (should record that if i fetch again...). i haven't been keen on keeping the files around in my own setup, and was hoping to make it clearer to anyone who might help maintain or take over how much space they might need before they start fetching, and to make the total amount fetched configurable. do you think the number of http head requests i mentioned above (say 20,000) would be a problem (the intention would be to use as few connections as possible)? (as for the actual fetching of jar files, i think last time it ended up taking almost half a day at worst, so i was thinking of arranging for slow / spaced out retrieval.)
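
One way the "configurable total amount" idea could be sketched in Python, assuming the per-jar sizes have already been collected into a mapping from path to byte count (with `None` for unknown sizes). The function name `select_within_budget` and the greedy, in-order selection strategy are illustrative choices, not something described in this thread.

```python
from typing import Dict, List, Optional, Tuple


def select_within_budget(
    sizes: Dict[str, Optional[int]], max_bytes: int
) -> Tuple[List[str], int]:
    """Pick jars in the given order while keeping the running total within max_bytes."""
    chosen: List[str] = []
    total = 0
    for path, size in sizes.items():
        if size is None:
            continue  # unknown size: skip rather than guess
        if total + size > max_bytes:
            continue  # this jar would exceed the budget; a smaller later one may still fit
        chosen.append(path)
        total += size
    return chosen, total
```

The returned total can be shown to whoever runs the fetch, so they know the expected download size before any jar is actually retrieved.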

2025-06-22T16:38:04.537209Z

We currently serve ~26 million requests a day, so 20k shouldn't be an issue. It all hits a Fastly CDN, which handles the load quite well. I think it's also fine to try to pull down the jars as fast as you can.

2025-06-23T01:31:45.979139Z

thank you!