Looks like http://cljdoc.org is currently having some uptime issues. I'll take a peek soon.
Hmmm... logs suggest maybe something is crawling cljdoc...
Ya, I dunno yet. Seems to be handling the load ok now. Still seeing crawl activity in the logs. Could have been the load, or maybe requests to some URL(s) were problematic.
Hmmm... unrelated to the outage... but I just happened to notice that cljdoc returns 404 for HEAD requests? That seems a bit odd.
Perhaps AI scrapers? We've been inundated by them atm.
Perhaps something like this in front of the website might help? https://anubis.techaro.lol/
We're thinking of similar protections
Thanks @dharrigan! I looked up a few IPs, and they seemed to be Alibaba. We are happy to have bots like Google's search crawler indexing http://cljdoc.org. I do have an out-of-memory heap dump to take a peek at; curious to see what it will tell me.
Oh yes, we had to block entire swathes of Alibaba IPs
I mean whole /8s and so on
Alibaba IPs are really notorious for being originators of AI scrapers. They don't respect, honour or obey robots.txt. They just don't care.
I had to block lots of Singaporean IPs as well today
As you can tell, I regularly face this and I'm growing tired of it.
I'm not the only one.
Interesting! Thanks for sharing!
I was busy today with obligations-of-life stuff, but I had a peek at the heap dump, and it just seems like a lot of requests were active at once. The http://cljdoc.org server has been running fine for several hours, so I'll leave it for now.