cljdoc 2019-05-25 | Slack Archive

is http://cljdoc.org down for everyone or just me ?

😢 4

@carkh @seancorfield thanks for the ping, looking into it, seems DigitalOcean had to restart the instance and it didn't start up correctly again

martinklepsch07:05:55

Aaaaand... we're back! 🙂

👍 8

❤️ 4

borkdude08:05:49

post-mortem analysis for the curious? 🙂

carkh08:05:18

most likely clj-kondo docs brought the whole thing down =D

😅 4

borkdude08:05:12

luckily I had my local Docker instance running 🙂

carkh08:05:15

lucky you =)

martinklepsch08:05:20

@borkdude DigitalOcean migrated the Droplet to another physical machine, and on that machine one of the important services (Consul) failed to start up. I'm not entirely sure why; the logs seem to indicate that the port was already in use, which makes me think that maybe something went wrong during the migration ¯\_(ツ)_/¯

borkdude08:05:50

aaah. you use Docker to run cljdoc there?

martinklepsch08:05:12

I do (via http://nomadproject.io)

borkdude08:05:38

I see. Glad it’s fixed 🙂

martinklepsch08:05:56

same 🙂 took 5 minutes which is great

martinklepsch08:05:23

And I got alerted but was sleeping of course 😄

dominicm09:05:48

Time to get funds together to hire a 24h support team

martinklepsch09:05:56

😄

dominicm09:05:20

Alternatively, set up PagerDuty to call you in the middle of the night 😈

martinklepsch09:05:43

yeah... I think I have better things to do in the middle of the night 😛

martinklepsch09:05:03

I'd be happy to onboard more maintainers/admins in different timezones though

borkdude11:05:29

can’t help there I’m afraid (NL)

mfikes12:05:18

I'm also a little curious as to how often this occurs with DigitalOcean vs. Linode. I've been using Linode for production since 2013, and usually the pattern is that they will give you plenty of heads-up if a migration needs to occur because of a potential hardware issue. (Thus giving you a chance to do the migration ahead of time yourself.) Truth be told, I think there may have been a handful (4 or 5) of times where they needed to migrate things "on the spot" because things were dire with hardware. In those cases, even though there was an outage, it resolved itself because the software servers came up. I suppose I'm lucky in that it is just a Clojure / nginx / mysql setup, and they always come back up by themselves.

martinklepsch12:05:14

I don't have enough experience to really comment on that. I think the fact that things didn't boot up again is probably my fault. cljdoc's ops infrastructure is a little more involved to enable continuous deployment without downtime

👍 4
