cljdoc

lread 2025-04-02T14:07:07.738219Z

Looks like http://cljdoc.org is currently having some uptime issues. I'll take a peek soon.

👁️ 1
✅ 1
lread 2025-04-02T14:20:16.636079Z

Hmmm.... logs suggest maybe something is crawling cljdoc...

lread 2025-04-02T14:45:55.986679Z

Ya, I dunno yet. Seems to be handling the load ok now. Still seeing crawl activity in the logs. Could have been the load, or maybe some of the URL(s) being hit were problematic.

lread 2025-04-02T15:12:46.394899Z

Hmmm... unrelated to outage... but just happened to notice that cljdoc returns 404 for HEAD requests? That seems maybe a bit odd.
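One way to reproduce an observation like this is to send the same URL a HEAD and a GET and compare the status codes. This is only a minimal sketch using Python's standard library; `status_for` and `head_matches_get` are hypothetical helper names, and nothing here is taken from cljdoc's own code.

```python
import urllib.error
import urllib.request


def status_for(method: str, url: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives for `method` on `url`.

    urllib follows redirects, so this is the status of the final response.
    """
    request = urllib.request.Request(url, method=method)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses; the code is still what we want.
        return err.code


def head_matches_get(url: str) -> bool:
    """True when HEAD and GET report the same status for `url`."""
    return status_for("HEAD", url) == status_for("GET", url)
```

Calling `status_for("HEAD", url)` and `status_for("GET", url)` against a page shows whether the two methods disagree, which is what a 404 on HEAD alongside a working GET would look like.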

dharrigan 2025-04-02T16:38:37.342989Z

Perhaps AI scrapers? We've been inundated by them atm.

dharrigan 2025-04-02T16:39:50.799079Z

Perhaps something like this at the front of the website may help? https://anubis.techaro.lol/

dharrigan 2025-04-02T16:39:59.133639Z

We're thinking of similar protections

lread 2025-04-02T18:20:24.966709Z

Thanks @dharrigan! I looked up a few IPs, seemed to be Alibaba. We are happy with bots like Google, for search, indexing http://cljdoc.org. I do have an out of memory heap dump to take a peek at, curious to see what it will tell me.

dharrigan 2025-04-02T19:44:43.580289Z

Oh yes, we had to block entire swathes of Alibaba IPs

dharrigan 2025-04-02T19:44:51.844709Z

I mean whole /8's and so on
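For a sense of what blocking by CIDR range involves, here is a small sketch using Python's `ipaddress` module. The blocklist is an assumption for illustration: it uses documentation-only ranges (RFC 5737), not any real provider's ranges, and `is_blocked` is a hypothetical helper rather than anything either site actually runs.

```python
import ipaddress

# Hypothetical blocklist: documentation-only ranges, NOT real scraper ranges.
BLOCKED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("203.0.113.0/24", "198.51.100.0/24")
]


def is_blocked(client_ip: str, networks=BLOCKED_NETWORKS) -> bool:
    """Return True when `client_ip` falls inside any blocked network."""
    address = ipaddress.ip_address(client_ip)
    return any(address in network for network in networks)


# A single IPv4 /8 covers 2**24 addresses, which is why blocking
# "whole /8's" is such a blunt instrument.
ADDRESSES_IN_A_SLASH_8 = ipaddress.ip_network("10.0.0.0/8").num_addresses
```

In practice this kind of check usually lives in a firewall or reverse proxy rather than application code; the sketch only shows the membership test itself.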

dharrigan 2025-04-02T19:46:43.825459Z

Alibaba IPs are really notorious for being originators of AI scrapers. They don't respect, honour or obey robots.txt. They just don't care.
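For context, robots.txt is purely advisory: a well-behaved crawler checks it before fetching, and nothing forces a scraper to do the same. The sketch below shows that check with Python's `urllib.robotparser`. The rules, the `ExampleScraper` user agent, and the `allowed` helper are all made up for illustration and are not cljdoc's actual robots.txt.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: shut out one named scraper, allow everyone else.
ROBOTS_TXT = """\
User-agent: ExampleScraper
Disallow: /

User-agent: *
Allow: /
"""


def allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True when the rules permit `user_agent` to fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

A crawler that skips this step sees no difference at all, which is why server-side measures such as IP blocks or a challenge proxy end up being the fallback.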

dharrigan 2025-04-02T19:48:37.332289Z

I had to block lots of Singaporean IPs as well today

dharrigan 2025-04-02T19:48:51.798929Z

As you can tell, I regularly face this and I'm growing tired of it.

dharrigan 2025-04-02T19:50:47.036389Z

dharrigan 2025-04-02T19:50:53.625839Z

dharrigan 2025-04-02T19:50:57.795099Z

I'm not the only one.

lread 2025-04-02T19:58:41.841099Z

Interesting! Thanks for sharing!

👍 1
lread 2025-04-02T21:52:11.632519Z

I was busy today with obligations-of-life-stuff, but had a peek at the heap dump, and it just seems like a lot of requests were active at once. The http://cljdoc.org server has been running fine for several hours, so I'll leave it for now.