#component
2016-06-15
mangr3n18:06:18

Are there any discussions/blog-posts that talk about making a component fault tolerant? What happens if my database goes down, or isn’t up, and I want to defer startup until after the connection is available and working?

mangr3n18:06:33

or halt a component and restart when the connection is available again

seancorfield18:06:46

How would you handle dependencies in general on a component restart?

seancorfield19:06:24

If you restart your db component, what about components that depend on it? Should they all be restarted too?

mangr3n19:06:13

I might have my answer

seancorfield19:06:15

I think an individual app can decide what to do in situations like these but I don't think Component itself can.

mangr3n19:06:48

Hmm… I need to rethink

mangr3n19:06:10

I shouldn’t be failing to start just because the real DB isn’t available.

mangr3n19:06:40

I should gracefully report an error, and keep functioning…

seancorfield19:06:33

For our Component-based apps, if the db is not available at startup, we'd want to fail hard.

seancorfield19:06:20

If the db goes offline while we're running, that's a different issue but some of our processes should just die at that point.

mangr3n19:06:07

I’m using docker and docker-compose… and the db starts slowly

mangr3n19:06:49

I think I can solve it there, but I was trying to come up with a solution because if I grow this and break the services across multiple cloud instances, then network route drops might play a role and I’d want the “components” which face external services to tolerate interruption

seancorfield19:06:17

So your app would need to maintain mutable state in the components to indicate whether they are "ready".

mangr3n19:06:31

When not ready, respond to requests with a 50x, poll in the background for availability, and then bring myself back online.
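
The "ready" flag plus background polling described above could look roughly like the following. This is a minimal, language-neutral sketch in Python rather than Clojure, and it is not part of Component; `ExternalServiceComponent`, `check`, and `handle` are hypothetical names chosen for illustration.

```python
import threading


class ExternalServiceComponent:
    """Tracks whether an external dependency is reachable (hypothetical name)."""

    def __init__(self, check, poll_interval=1.0):
        self._check = check                # assumed: callable returning True when the service is up
        self._poll_interval = poll_interval
        self._ready = threading.Event()    # the mutable "ready" state
        self._stop = threading.Event()
        self._thread = None

    def start(self):
        # Start polling in the background; the component itself never fails to start.
        self._thread = threading.Thread(target=self._poll, daemon=True)
        self._thread.start()
        return self

    def stop(self):
        self._stop.set()
        if self._thread is not None:
            self._thread.join()
        self._ready.clear()

    def _poll(self):
        while not self._stop.is_set():
            try:
                ok = bool(self._check())
            except Exception:
                ok = False                 # treat a failing check as "not available"
            if ok:
                self._ready.set()
            else:
                self._ready.clear()
            self._stop.wait(self._poll_interval)

    def handle(self, request):
        """Return (status, body); answer 503 while the dependency is down."""
        if not self._ready.is_set():
            return 503, "service unavailable"
        return 200, f"handled {request}"
```

The component comes back online by itself as soon as a poll succeeds, so nothing outside it has to restart it.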

mangr3n19:06:05

I’m wondering if I should have a coordinator service external that can bring stuff up and down. Like a service in docker, that sends docker commands to stop and start stuff...

mangr3n19:06:44

Hmmm… I could add a ring handler to report status and then post to the same url to do a restart…

donaldball19:06:12

I have two pieces that solve this problem for me

donaldball19:06:01

I have a boot fn which combines building and starting a system and, if it gets errors thrown starting the database too many times, will reconfigure and start the system with the database disabled
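
A boot function of the kind described here might be sketched as follows. The details are assumptions, not donaldball's actual code: `build_system`, the `database_enabled` config key, and the `DatabaseUnavailable` exception are hypothetical, and the sketch is in Python rather than Clojure.

```python
class DatabaseUnavailable(Exception):
    """Hypothetical error raised when the database component cannot start."""


def boot(build_system, config, max_attempts=3):
    """Build and start a system; after too many database failures, start without it."""
    for _ in range(max_attempts):
        system = build_system(config)
        try:
            return system.start()
        except DatabaseUnavailable:
            continue                       # rebuild from scratch and try again
    # Too many failures: reconfigure with the database disabled and start anyway.
    degraded = dict(config, database_enabled=False)
    return build_system(degraded).start()
```

A real version would probably pause between attempts and stop any partially started components before retrying.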

donaldball19:06:28

I also have a manager wrapping my system that watches for config changes and restarts the system during a safe period

mangr3n19:06:01

I like that. I’ve been thinking about building an application infrastructure that watches a datastore (git repo + config files) for changes, and triggers behaviors off of it. While being able to poll services (watch statistics) and respond to certain behaviors (if I have a defined response) or just report on that state through some notification (email/text/slack)

mangr3n19:06:18

thx donald and sean

seancorfield19:06:49

Yeah, if you have an app that can run in both "connected" and "disconnected" mode, that pretty much has to be baked into your logic from the get-go. I don’t think there’s much support a generic library can provide you with there (other than provide structure around your "state").

donaldball19:06:50

If my system were under any real load, I’d like to build a piece that triggers config changes on exception rate and/or timeouts

mangr3n19:06:58

Internal/External message queues might handle some of this also in a more tolerant way

seancorfield19:06:10

☝️

mangr3n19:06:21

really been thinking about that

seancorfield19:06:23

We’re in the process of switching architectures to that approach.

roberto19:06:47

Doesn’t Netflix have a database driver that helps with that?

mangr3n19:06:22

in the end I want to be able to break apart a message request and compose it from the results of many different services, that’s going to require some type of queue pipelining functionality.

donaldball19:06:18

I have my web handlers describe the services on which they depend and respond automatically with 503s when they aren’t available
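
One way to express "handlers describe the services they depend on" is a wrapper that checks availability before calling the handler. This is a sketch under assumptions: the `requires` decorator, the `Service` class, and its `is_available` method are invented for illustration, and in a Ring app the same idea would be a middleware function.

```python
import functools


def requires(*service_names):
    """Declare the services a handler depends on; answer 503 if any is down."""
    def decorator(handler):
        @functools.wraps(handler)
        def wrapped(system, request):
            missing = [n for n in service_names if not system[n].is_available()]
            if missing:
                return {"status": 503, "body": "unavailable: " + ", ".join(missing)}
            return handler(system, request)
        wrapped.required_services = service_names   # keep the declaration inspectable
        return wrapped
    return decorator


class Service:
    """Hypothetical stand-in for a component that knows whether it is usable."""

    def __init__(self, available):
        self.available = available

    def is_available(self):
        return self.available


@requires("db")
def list_users(system, request):
    return {"status": 200, "body": "users"}
```

Because the dependency list is data attached to the handler, a status endpoint could also report which routes are currently degraded.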

mangr3n19:06:24

Might be able to use a core.async channel to say “go” to the next component when its dependencies are ready...

roberto19:06:31

ah, so you have a channel that is passed in as a dependency, and the components are pushed to that channel when you want to “scale” or “replace” ?

mangr3n20:06:42

I was thinking that if I’m dependent, my start doesn’t get called until the previous component publishes “ready” to a channel it returns… then I’m free to go ahead

mangr3n20:06:16

It’s an async callback, like a promise done Clojure’s way

mangr3n20:06:43

I do a lot of JavaScript that way, where I return a deferred instead of a result, and you trigger when I come back. My templates render that way a lot.

mangr3n20:06:29

My component’s start function doesn’t get called until all deps have declared that they are “ready”

donaldball20:06:08

That’s the way component’s systems work now, except without the asynchrony

mangr3n20:06:22

My only problem is that when the database comes up, I verify/run the database table migrations (ragtime) and execute my own internal data-migration functions before the component says it’s ready to be used. That requires the connection to be active, which requires blocking

mangr3n20:06:39

I’m using wait-for-it.sh to block until the database server is running in my docker-compose config file, but that’s not as robust as doing something in my server code to be more tolerant of dropped connections.
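
The same kind of wait can be done inside the server process. The sketch below only checks that a TCP port accepts connections, which is roughly what a port-waiting script does; it says nothing about whether migrations have run. `wait_for_port` and its parameters are hypothetical names, and the example is in Python rather than Clojure.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Block until host:port accepts a TCP connection; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Calling something like this before building the system covers slow startup, but dropped connections at runtime still need handling inside the components.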

mangr3n20:06:00

I just tested and deployed that while we’re chatting and it seems to be fine

seancorfield20:06:24

Sounds like you need something in your -main function that handles that prior to creating & starting the application’s Component?

mangr3n20:06:49

Maybe, I might have to duplicate the config code and block until the database connection is present.

mangr3n20:06:29

Maybe I do something else on my own with channels in my start function for my component instead...

mangr3n20:06:38

I think I could do what I described myself, apart from the system, through the system map: put a channel behind a key, and defer my web service from coming up until after the database connection is established. That’s the solution for the existing architecture. Thanks for helping me think this through guys.
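
The "channel behind a key" idea can be sketched with a one-shot signal stored in the system map. Here a `threading.Event` stands in for a core.async channel, the system map is a plain dict, and `Database` and `WebService` are hypothetical classes; this illustrates the gating only and is not how Component itself starts systems.

```python
import threading


class Database:
    def __init__(self):
        self.ready = threading.Event()     # stands in for the "ready" channel
        self.migrated = False

    def start(self):
        # Connect and migrate in the background, then signal readiness.
        threading.Thread(target=self._connect_and_migrate, daemon=True).start()
        return self

    def _connect_and_migrate(self):
        self.migrated = True               # placeholder for connecting and running migrations
        self.ready.set()


class WebService:
    def __init__(self, system):
        self.system = system               # system map: component key -> component
        self.serving = False

    def start(self, timeout=None):
        # Block until the database publishes "ready", then come up.
        if not self.system["db"].ready.wait(timeout):
            raise TimeoutError("database never became ready")
        self.serving = True
        return self
```

The start order is unchanged; only the web service's own start blocks, so the rest of the system is unaffected by the wait.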

roberto20:06:09

let us know how it works out. I’m very interested in this.

roberto20:06:42

I’m having a similar challenge with components that hold tokens to third party services that expire after a certain time, and need to get refreshed.

donaldball20:06:22

Heh, I just solved a similar problem myself. My case is an internal keyring whose keys need to be rotated periodically.

donaldball20:06:51

Currently I have a worker go loop that starts when the component starts, stops when the component stops, and the keyring state is maintained in an atom

donaldball20:06:26

go loops and component lifecycle fns are a nice fit
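
A worker loop tied to the component lifecycle, as described above, might look like this. It is a sketch with assumptions: a thread stands in for the go loop, a lock-guarded field stands in for the atom, and `Keyring`, `make_key`, and `current_key` are hypothetical names. The same shape would fit the expiring-token case mentioned earlier.

```python
import threading


class Keyring:
    def __init__(self, make_key, interval=3600.0):
        self._make_key = make_key          # assumed: callable producing a fresh key
        self._interval = interval
        self._lock = threading.Lock()
        self._current = None               # plays the role of the atom
        self._stop = threading.Event()
        self._worker = None

    def start(self):
        self._rotate()                     # have a key before anyone asks for one
        self._stop.clear()
        self._worker = threading.Thread(target=self._loop, daemon=True)
        self._worker.start()
        return self

    def stop(self):
        self._stop.set()
        if self._worker is not None:
            self._worker.join()
            self._worker = None
        return self

    def current_key(self):
        with self._lock:
            return self._current

    def _rotate(self):
        key = self._make_key()
        with self._lock:
            self._current = key

    def _loop(self):
        # wait() returns True once stop() is called, which ends the loop.
        while not self._stop.wait(self._interval):
            self._rotate()
```

Starting the worker in `start` and joining it in `stop` means the loop never outlives the component that owns it.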

mangr3n21:06:34

That’s great to hear

mangr3n21:06:06

especially since async channels are essentially queues so they fit with other models I want to use, like websockets and Message Queues.