This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
@ambrosebs @bjagg @atdixon I did some more digging but I still haven't figured out the root cause. A summary of what I've found so far, based on log files from 2020-02-20 and 2020-02-21:
• The 400s are being returned by the Fastly CDN
• It is only on a few GET requests for `maven-metadata.xml` files (both existing, which should return a 200, and non-existing, which should return a 404). No other paths are affected.
• Only six client IPs were affected out of 8,122 that made `GET /.../maven-metadata.xml` requests
• Only 266 requests were affected out of 107,255 `maven-metadata.xml` requests
• Those 266 requests were for 54 different paths
• There is no common client, OS version, or Java version based on the user-agent strings
• There is no common cache host
• I have not been able to recreate it
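For anyone who wants to repeat this kind of log analysis, here is a rough sketch of the counting above. The log format (whitespace-separated client IP, method, path, status) is an assumption for illustration, not the actual Fastly log schema:

```python
from collections import Counter

def summarize_400s(log_lines):
    """Count distinct client IPs and paths among 400 responses for
    maven-metadata.xml GETs. Assumes hypothetical whitespace-separated
    fields: client_ip, method, path, status."""
    ips, path_counts = set(), Counter()
    total, affected = 0, 0
    for line in log_lines:
        ip, method, path, status = line.split()
        if method == "GET" and path.endswith("maven-metadata.xml"):
            total += 1
            if status == "400":
                affected += 1
                ips.add(ip)
                path_counts[path] += 1
    return {"total": total, "affected": affected,
            "client_ips": len(ips), "distinct_paths": len(path_counts)}

# Tiny made-up sample to show the shape of the output:
sample = [
    "1.2.3.4 GET /foo/maven-metadata.xml 200",
    "1.2.3.4 GET /foo/maven-metadata.xml 400",
    "5.6.7.8 GET /bar/maven-metadata.xml 400",
]
print(summarize_400s(sample))
```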
Actions taken to remediate:
• There was custom configuration in Fastly to override the cache TTL for `maven-metadata.xml` files. This is the only configuration that would treat those files differently. It was obviated by https://github.com/clojars/clojars-web/commit/6c081e809a4c2e0c4ddf5359c2fc858d1cd5dc2b and removed ~2020-02-21 01:00 UTC
• This didn't resolve the issue immediately, but it's possible the 400s were cached
• ~2020-02-22 16:30 UTC all 54 paths that had 400s in the logs were purged from Fastly's cache
• It is unknown if this will help with the issue
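Fastly supports purging a single URL by sending an HTTP `PURGE` request for it, authenticated with a `Fastly-Key` header. A minimal sketch of purging the affected paths that way (the token and the example path are placeholders, not real values):

```python
import urllib.request

FASTLY_API_TOKEN = "XXXX"  # placeholder; use a real token from your Fastly account

def build_purge_request(url: str) -> urllib.request.Request:
    """Build a single-URL purge using Fastly's HTTP PURGE method,
    authenticated via the Fastly-Key header."""
    req = urllib.request.Request(url, method="PURGE")
    req.add_header("Fastly-Key", FASTLY_API_TOKEN)
    return req

# One purge per affected path (this path is illustrative only):
for path in ["/group/artifact/maven-metadata.xml"]:
    req = build_purge_request("https://repo.clojars.org" + path)
    # urllib.request.urlopen(req)  # left commented: sending needs a valid token
    print(req.get_method(), req.full_url)
```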
I've captured this in https://github.com/clojars/clojars-web/issues/744 as well.
Since I just purged all of those paths from Fastly, can y'all test again when you get a chance? I really appreciate your help and your patience.
I'll also analyze the logs again this evening to see how things look (the logs are aggregated every evening from 100s of individual files, so doing analysis before that aggregation occurs is painful)
I also added some more details to that GH issue about the workarounds that y'all have been successful with so far.
Thanks for testing @ambrosebs! I just added more logging to Fastly to a service where I can see the logs more easily - would you mind trying again when you have a chance? I'll then see if there are any clues in the extended logs
@tcrawley Deploying snapshot worked! Had a checksum error earlier, but that did not occur this last time. Thanks!!
@tcrawley gets a little further! https://github.com/clojars/clojars-web/issues/744#issuecomment-589990610
Thanks @ambrosebs @bjagg! With the additional logging, I also see 400s for `.md5` files from your requests. Unfortunately, the logs don't provide any additional insights. I have however enabled request logging on the S3 bucket itself. That log is supposed to include an error reason if the 400 is originating from S3. If y'all could try yet again I would be most appreciative, and I'll check on those logs a bit later.
@ambrosebs @bjagg ok, with S3 logging on, I can see the 400 requests being rejected due to `InvalidArgument`. The S3 docs for `InvalidArgument` aren't very helpful, but I did find a reference to it being returned when the `Authorization` header is set but doesn't contain valid AWS credentials. So I added logging for that header, and I do see it set on the 400 error requests since I enabled that logging. I then added a rule in Fastly to strip that header from the request before passing it to S3. Would y'all be willing to try again?
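The actual fix lives in Fastly's config, but the header-stripping step can be sketched generically. This is an illustration of the idea (the header values are made up), not the real Fastly rule:

```python
def strip_client_auth(headers: dict) -> dict:
    """Drop the client's Authorization header before forwarding a request
    to the S3 origin: S3 responds 400 InvalidArgument when it sees an
    Authorization header that doesn't hold valid AWS credentials."""
    return {k: v for k, v in headers.items() if k.lower() != "authorization"}

# Hypothetical incoming request headers from a Maven client:
incoming = {
    "Host": "repo.clojars.org",
    "User-Agent": "Apache-Maven/3.6.3",
    "Authorization": "Basic Zm9vOmJhcg==",  # stale client creds that confuse S3
}
print(strip_client_auth(incoming))
```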