#core-async
2018-11-19
leontalbot16:11:12

Hey! Context: a first user requests an expensive task from an API, say the generation of 1000 images. The API fires a pipeline-blocking job that uses all the cores. Question: if a second user comes in and requests the same job, can we split threads equally between the two requests so that both users get their processes run at the same time?

noisesmith16:11:57

setting aside fairness concerns if one user makes multiple requests concurrently, you can have a single pipeline-blocking call that serves all incoming requests (plus some wrapping of the data to ensure the right results go to the right place, of course)

leontalbot17:11:12

@noisesmith Thanks for answering. Can we dynamically say: alright, instead of putting user 2’s jobs at the end of the queue, let’s “interleave” incoming jobs with other users’?

noisesmith17:11:21

interesting - intuitively it seems like you could do that with a custom buffer implementation

noisesmith17:11:33

maybe some variant of a priority queue?

leontalbot17:11:19

so, if user 1 got 100/1000 images processed, then user 2 requests 1000 images to be generated, it goes like this: user 1’s 101st image, user 2’s 1st image, user 1’s 102nd image, etc.

noisesmith17:11:01

perhaps an atom holding a vector with one chan per user id, and a driver function that iterates over the vector in a loop, taking one item from each chan?

noisesmith17:11:30

(with logic to skip to the next one if no job is immediately available; you could use poll! for that)

noisesmith17:11:46

just a thought
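A minimal sketch of that idea (names like `user-chans`, `enqueue-job!`, and `next-job!` are made up for illustration): an atom holds one buffered channel per user id, and a driver scans the channels with `poll!`, which never blocks, so users with no pending job are simply skipped.

```clojure
(require '[clojure.core.async :as a :refer [chan poll! >!!]])

;; one buffered channel of pending jobs per user id
(def user-chans (atom {}))

(defn enqueue-job!
  "Add a job to the given user's queue, creating the queue on first use."
  [user-id job]
  (let [chans (swap! user-chans update user-id #(or % (chan 1024)))]
    (>!! (chans user-id) job)))

(defn next-job!
  "Scan every user's channel and return the first immediately available
  job, or nil if all queues are empty. poll! never parks, so empty
  queues are skipped."
  []
  (some poll! (vals @user-chans)))
```

A worker loop would call `next-job!` repeatedly; to make the rotation fair across calls you would also remember where the previous scan stopped, which this sketch omits.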

leontalbot17:11:05

interesting…

leontalbot17:11:21

I wonder if there is an out-of-the-box solution for this.

noisesmith17:11:46

not that I know of

noisesmith17:11:38

I worked on a heavily data-intensive application that relied on similar constraints

noisesmith17:11:04

but our case was easier because each client had an auth token that was exercised by their work, and it had timeouts built in

noisesmith17:11:23

so we were effectively forwarding upstream limits to have the same behavioral consequences

hiredman17:11:04

you can do things with pipeline-async, where the actual jobs run on a shared limited executor: the pipeline-asyncs limit the number of jobs running per kind of task, and the shared executor limits the number of jobs running across all tasks
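A sketch of that arrangement (the pool size, channel names, and `run-on-shared` helper are assumptions, not anything from core.async itself): each kind of task gets its own `pipeline-async` with its own parallelism limit, while the real work is submitted to one shared fixed-size thread pool that caps total concurrency.

```clojure
(require '[clojure.core.async :as a :refer [chan pipeline-async put! close! <!!]])
(import '(java.util.concurrent Executors ExecutorService))

;; one shared pool caps total concurrent work across every task kind
(def ^ExecutorService shared-pool (Executors/newFixedThreadPool 4))

(defn run-on-shared
  "Build an async fn for pipeline-async: submit the real work to the
  shared executor, deliver the result on out, then close out (as
  pipeline-async requires)."
  [work-fn]
  (fn [job out]
    (.execute shared-pool
              (fn []
                (put! out (work-fn job))
                (close! out)))))

;; a per-kind pipeline: its own limit of 8 jobs in flight for this kind,
;; while shared-pool limits the total across all kinds
(def images-in  (chan 100))
(def images-out (chan 100))
(pipeline-async 8 images-out
                (run-on-shared (fn [job] (assoc job :done true)))
                images-in)
```

The async fn returns immediately after handing the job to the executor, so the `pipeline-async` limit counts jobs in flight rather than blocking any go threads.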

hiredman17:11:37

you can also give jobs coming in on a certain channel higher priority: instead of putting jobs directly on the input channel for a pipeline, put them on the input of a copying process that copies from multiple channels (using alts with the priority argument) to the input of the pipeline
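That copying process could look like this (channel names are illustrative): a go-loop uses `alts!` with `:priority true`, so when both sources have a job ready it always takes from the high-priority channel first instead of choosing at random.

```clojure
(require '[clojure.core.async :as a :refer [chan go-loop alts! >! >!! <!!]])

(def high-prio   (chan 10))
(def low-prio    (chan 10))
(def pipeline-in (chan 10))

;; copy from both sources into the pipeline input; with :priority true,
;; alts! tries high-prio before low-prio rather than picking randomly.
;; For simplicity this loop stops as soon as either source closes.
(go-loop []
  (let [[job _] (alts! [high-prio low-prio] :priority true)]
    (when (some? job)
      (>! pipeline-in job)
      (recur))))
```

Note that `:priority true` only decides ties when several channels are ready at the same instant; it does not reorder jobs already buffered in `pipeline-in`.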