#clojars
2023-06-04
minikomi 05:06:48

Hi gang. I'm mucking around with the Clojars stats bucket (https://clojars-stats-production.s3.us-east-2.amazonaws.com) & clerk, learning more about the library ecosystem. I've found a few "downloads" files which contain entries with strange group IDs:
• There are multiple files where several group-ids are prefixed by ?prefix=

❯ rg "\"?prefix=" . -l
downloads-20230208.edn
downloads-20200917.edn
downloads-20230227.edn
downloads-20230530.edn
downloads-20230526.edn
downloads-20230118.edn
downloads-20230512.edn
downloads-20220101.edn
downloads-20221108.edn
downloads-20230206.edn
downloads-20230601.edn 
• There are multiple files where several group-ids are prefixed by ?marker=
❯ rg "\"?marker=" . -l
downloads-20200208.edn
downloads-20200209.edn
downloads-20200211.edn 
• There's one group-id / artifact-id in downloads-20141229.edn which is ??/??. I think this is just junk, so I ignored it.
I've collected them all in a gist (https://gist.github.com/minikomi/9e7a54fe049dde9949766a913fa118bd) if someone wants to take a look. I'm going ahead and just dropping the prefix-strings for my work; just thought someone here would be interested.
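
"Dropping the prefix-strings" could mean either stripping the ?prefix= / ?marker= text and keeping the counts, or discarding those entries entirely. A minimal Python sketch of both options follows. It assumes the EDN files have already been parsed and flattened into (group-id, artifact-id, count) tuples; that row shape, the function names, and the sample rows are all hypothetical and not taken from the Clojars data.

```python
import re
from collections import Counter

# The two kinds of residue reported above; treating exactly these as junk
# is an assumption.
LISTING_PREFIX = re.compile(r"^\?(?:prefix|marker)=")

def is_suspect(group_id):
    """True if the group ID starts with ?prefix= or ?marker=."""
    return LISTING_PREFIX.match(group_id) is not None

def strip_prefix(group_id):
    """Remove a leading ?prefix= or ?marker= and keep the rest."""
    return LISTING_PREFIX.sub("", group_id, count=1)

def totals(rows, drop_suspect=False):
    """Sum counts per (group-id, artifact-id).

    rows: iterable of (group_id, artifact_id, count) tuples (hypothetical shape).
    drop_suspect: if True, discard suspect rows instead of merging them.
    """
    out = Counter()
    for group_id, artifact_id, count in rows:
        if is_suspect(group_id):
            if drop_suspect:
                continue
            group_id = strip_prefix(group_id)
        out[(group_id, artifact_id)] += count
    return out

# Made-up sample rows, for illustration only.
rows = [
    ("org.clojure", "clojure", 10),
    ("?prefix=org.clojure", "clojure", 2),
    ("?marker=foo", "bar", 1),
]
merged = totals(rows)                       # suspect rows folded into clean IDs
filtered = totals(rows, drop_suspect=True)  # suspect rows discarded
```

Which option fits depends on whether the prefixed entries represent real downloads; if they do not, discarding them is probably the safer choice.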

tcrawley 21:06:09

Ah! This is likely because we now serve HTML index files through Fastly, and the download stats logic that parses the log files from Fastly hasn't been updated to exclude them. (That would be for the ?prefix= ones at least; I'm not sure where the ?marker= ones are coming from.)
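
The kind of exclusion described here could look roughly like the Python sketch below. It is not the Clojars stats code: the thread shows neither the Fastly log format nor the parser, so the function name, the idea that each log entry exposes a request URL, and the sample paths are all assumptions. Both `prefix` and `marker` are also query parameters of S3 ListObjects requests, though nothing in the thread confirms that as the source of the ?marker= entries.

```python
from urllib.parse import urlsplit, parse_qs

# Query parameters that mark a directory-listing request rather than an
# artifact download (assumed from the two prefixes seen in the stats files).
LISTING_PARAMS = {"prefix", "marker"}

def is_index_request(url):
    """True if the request URL carries a listing-style query parameter."""
    query = parse_qs(urlsplit(url).query, keep_blank_values=True)
    return any(param in query for param in LISTING_PARAMS)

# Hypothetical request paths, for illustration only.
requests = [
    "/org/clojure/clojure/1.11.1/clojure-1.11.1.jar",
    "/?prefix=org/clojure/",
    "/?marker=",
]
downloads = [url for url in requests if not is_index_request(url)]
```

Filtering on the query string before extracting the group ID would keep listing requests out of the download counts altogether, instead of cleaning them up afterwards.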