Jacob O'Bryant05:09:13

I've decided to go with in-memory queues for an app I'm working on, and I've got it set up so feature maps can have a :workers key. E.g. :workers [{:id :sync-rss :handler #'sync-rss}] would put a PriorityBlockingQueue in the system map which you can add jobs (arbitrary maps) to (e.g. from an HTTP handler, scheduled task, or tx listener), and then sync-rss will handle jobs in a thread pool. The default thread pool size is 1. There's also a :batch-size option (default 1) which will pass multiple jobs to the handler at a time if more than one is available (handy if the handler fn has some constant-time overhead), and a few other conveniences, like callbacks if you want to get the result of a job after submitting it. I generally like to use new features in my own apps for a bit before adding them to Biff, but sooner or later this'll get merged in.
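To make the idea concrete, here's a minimal sketch of how such a worker could be wired up with plain java.util.concurrent. This is not Biff's actual API: the `start-worker` fn, the `:priority` key, and the return shape are assumptions for illustration.

```clojure
(import '(java.util.concurrent Executors PriorityBlockingQueue))

(defn start-worker
  "Starts a queue plus a fixed thread pool that consumes it.
   Jobs are arbitrary maps, ordered by an optional :priority key
   (lower runs first; default 10)."
  [{:keys [handler n-threads] :or {n-threads 1}}]
  (let [queue (PriorityBlockingQueue.
               11 (comparator (fn [a b]
                                (< (:priority a 10) (:priority b 10)))))
        pool  (Executors/newFixedThreadPool n-threads)]
    (dotimes [_ n-threads]
      (.execute pool
                (fn []
                  (loop []
                    (let [job (.take queue)] ; blocks until a job arrives
                      (try (handler job)
                           (catch Exception e
                             (println "job failed:" (ex-message e))))
                      (recur))))))
    {:queue queue :pool pool}))

;; Submit a job from anywhere -- an http handler, scheduled task, or tx listener:
(def worker (start-worker {:handler (fn [job] (println "syncing" (:url job)))}))
(.put (:queue worker) {:priority 1 :url "https://example.com/feed.xml"})
```

Note that a plain map isn't Comparable, so a PriorityBlockingQueue of maps needs an explicit comparator as above; shutdown handling is omitted for brevity.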

👏 1
💯 1

Sounds great, would love to see it :) This would still be really nice to have backed by XTDB, to persist across restarts and scale with more server instances

👀 1
Jacob O'Bryant17:09:26

I'll probably post a gist/how-to article soon (tomorrow?) for anyone who wants to try it out earlier. Agreed on backing with XTDB, at least for restarts--however I'm not sure it'd work very well for scaling. E.g. even Datomic has this problem:
> Datomic does not have anything like `for update skip locked`. Thus consuming a queue should be limited to a single JVM process. This library will take queue jobs by compare-and-swapping a lock+state, process the item and then compare-and-swapping the lock+new-state. It does so eagerly, thus if you have multiple JVM consumers you will most likely get many locking conflicts. It should work, but it's far from optimal.
I wonder if it might work alright to use transaction functions to lock jobs. I.e. when a consumer is ready for a new job and is notified that one is available, it submits a tx function that looks for an available job and locks it (which may or may not be the one it was notified about). I'd be concerned about slowing down indexing though. Given all that, I'm kind of wondering if the best strategy is to just stick with simple in-memory queues, and then when you need retries and/or multiple worker machines per queue, throw in e.g. a Redis instance and use an existing job queue library. (I suppose retries might still be useful at smaller scale.)
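The tx-function approach could look roughly like this in XTDB 1.x style, where a transaction function is itself a document whose body is quoted data. The `:claim-job` id and the `:job/locked-by` attribute are made up for illustration; this is a sketch of the idea, not tested against a node.

```clojure
;; A transaction-function document. The body runs at index time, so the
;; check-and-claim is atomic: two consumers can't both take the same job.
(def claim-job-fn
  {:xt/id :claim-job
   :xt/fn '(fn [ctx job-id worker-id]
             (let [db  (xtdb.api/db ctx)
                   job (xtdb.api/entity db job-id)]
               (if (and job (nil? (:job/locked-by job)))
                 [[:xtdb.api/put (assoc job :job/locked-by worker-id)]]
                 ;; returning false aborts the tx: someone else got there first
                 false)))})

;; Install once with (xt/submit-tx node [[::xt/put claim-job-fn]]); a consumer
;; then claims via (xt/submit-tx node [[::xt/fn :claim-job job-id "worker-1"]]).
```

The indexing concern mentioned above is real: every claim attempt is a transaction the single-writer index has to process, whether or not it succeeds.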

Jacob O'Bryant17:09:56

In any case, what I'll probably do is write an article that gives some pointers on how you could use XT to add persistence on top of the in-memory queues and have them be consumed by separate worker machines (up to one machine per queue), and if that turns out to be useful, then maybe build it into Biff.


That sounds great. This would already be very useful with in-memory queues 🙏 In most cases I would need to persist the queue though (more so than scaling consumers). If I get any ideas I'll share

Jacob O'Bryant21:09:17

sounds good--in any case it shouldn't be too complicated if you just need to handle application restarts. Basically:
1. instead of adding jobs to the queues directly, create a job document in XT
2. create a tx listener that looks for new job documents and adds them to the appropriate queue
3. add a wrapper to your handler fns that marks jobs as completed (or as failed if there was an exception)
4. on startup, query the db for any existing uncompleted jobs and add them to the queue
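The steps above can be sketched in isolation like this. To keep it self-contained, `put!` stands in for whatever submits an ::xt/put to your node, and the schema keys (:job/queue, :job/status) are invented for illustration:

```clojure
;; 1. create a job document instead of enqueueing directly
(defn submit-job! [put! queue-id payload]
  (put! (merge payload {:xt/id (random-uuid)
                        :job/queue queue-id
                        :job/status :pending})))

;; 2. (tx listener) for each newly indexed doc with :job/status :pending,
;;    push it onto the matching in-memory queue:
;;    (.put (get queues (:job/queue doc)) doc)

;; 3. wrap handler fns to record the outcome
(defn wrap-job-status [put! handler]
  (fn [job]
    (try (handler job)
         (put! (assoc job :job/status :done))
         (catch Exception _
           (put! (assoc job :job/status :failed))))))

;; 4. on startup, query for docs where :job/status is :pending and
;;    re-add them to the appropriate queues.
```

Passing `put!` in as a function also makes the wrapper easy to test without a running node.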

Jacob O'Bryant21:09:55

that would easily let you move all the queues to a separate worker machine too, and you can scale out to a degree by putting each queue on a separate worker (each worker would have a config option that says which queues it handles). I guess all that is straightforward enough that there's no reason not to just include it soon after I get in-memory queues merged. And then I'd leave retry logic + job locking as an exercise for the reader, basically :)
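That per-worker config option might look something like this (the :worker/queues key is hypothetical, not an actual Biff setting):

```clojure
;; Each worker machine lists the queues it owns, so every queue is
;; consumed by at most one machine and no job locking is needed.
(def worker-config
  {:worker/queues #{:sync-rss :send-email}})

(defn handles-queue?
  "True if this worker instance should consume the given queue."
  [config queue-id]
  (contains? (:worker/queues config) queue-id))
```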

Jacob O'Bryant02:10:15

Queues are in! (Also: upgraded XT to 1.22.0, and added a helper fn so that new dependencies in deps.edn can be loaded whenever you save a file, without needing to restart the REPL.) I'll put this in #announcements on Monday. If anyone happens to try this out over the weekend, let me know if you run into any snags. Same goes if you try it out after the weekend too.

Mario Trost16:09:14

Oh shoot, I thought it was tomorrow

Mario Trost16:09:42

Have fun with fly, I like it a lot 😄

🙂 1
Jacob O'Bryant16:09:01

haha no worries, there'll be a recording at least!

Jacob O'Bryant17:09:11

I'll have that recording published soon--I might need your help because we couldn't actually get fly to work ha ha

Mario Trost10:09:27

Looking forward to watching it, and very happy to help if I can. Buuut: I haven't used Fly machines before, just deployed multiple apps. I also implemented some homemade automated per-branch deployments with Fly using GitHub Actions, that was cool!

Jacob O'Bryant18:09:40

No worries--I figured out why it wasn't working! (cc @U054219BT) Also the video is up now, not that it's necessarily worth watching: