#core-async
2017-10-05
gowder 13:10:29

Wise concurrency peoples! Does anyone want to help me ponder a memory/io/cpu balancing trick? I want to parallelize a fairly banal operation: taking a folder full of images, rescaling each one, and spitting them back out. I'm trying to think how best to do that, given that I want to handle the case where there are more images to transform than can fit into memory all at once.

My first thought was to break the operation up, follow standard core.async practice, and spin up threads for the I/O parts, dump all the images on a channel, and then apply the transformation in a go block via a pipeline/transducer. But I'm not sure how that plays with the memory issue.

Alternatively, maybe I should treat it as an IO-bound operation? If I keep the number of images held in memory at once no higher than the number of threads available, it's pretty likely (unless the images turn out to be pathologically huge, which is an edge case I don't care about) that I'll never have to worry about memory. Then I could do the whole thing in blocking threads, with one pipeline transformation from a channel of file paths to another channel of file paths, and have the transducer on the inside handle reading, scaling, and writing. Since all the actual image data would only touch local variables inside the pipeline's transducer, it should all get garbage collected the moment each image is done. Essentially I'd impose a hard limit on how much concurrency I could squeeze out, but avoid memory problems.

Or is there a standard strategy in these situations? Thanks!
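A minimal sketch of that second plan, using `pipeline-blocking` so the blocking I/O gets real threads rather than go-block parking. The `javax.imageio` scaling code and the names `rescale-file!` and `rescale-dir` are illustrative assumptions, not anything from the thread:

```clojure
(ns image-rescale.sketch
  (:require [clojure.core.async :as a]
            [clojure.java.io :as io])
  (:import (javax.imageio ImageIO)
           (java.awt.image BufferedImage)))

(defn- rescale-file!
  "Read one image, scale it, write it to out-dir, and return the output
   path. Pixel data only ever lives in locals here, so each image
   becomes garbage as soon as its worker finishes with it."
  [^java.io.File in out-dir scale]
  (let [^BufferedImage img (ImageIO/read in) ; assumes every file is a readable image
        w   (max 1 (int (* scale (.getWidth img))))
        h   (max 1 (int (* scale (.getHeight img))))
        dst (BufferedImage. w h BufferedImage/TYPE_INT_RGB)
        g   (.createGraphics dst)
        out (io/file out-dir (.getName in))]
    (try
      (.drawImage g img 0 0 w h nil)
      (finally (.dispose g)))
    (ImageIO/write dst "png" out)
    (.getPath out)))

(defn rescale-dir
  "Rescale every image in in-dir into out-dir with at most n images in
   flight (and therefore decoded in memory) at a time. Returns a channel
   that yields output paths and closes when everything is done."
  [in-dir out-dir n scale]
  (let [files (->> (.listFiles (io/file in-dir))
                   (filter #(.isFile ^java.io.File %)))
        in    (a/to-chan files)   ; channel of File objects, closed when exhausted
        out   (a/chan n)]
    ;; pipeline-blocking runs the transducer on n dedicated threads,
    ;; which suits the blocking read/write inside rescale-file!.
    (a/pipeline-blocking n out (map #(rescale-file! % out-dir scale)) in)
    out))

(comment
  ;; Drain the result channel from a plain thread with <!!.
  (let [results (rescale-dir "in-images" "out-images" 4 0.5)]
    (loop []
      (when-let [path (a/<!! results)]
        (println "wrote" path)
        (recur)))))
```

The memory cap falls out of back-pressure: only the n worker threads ever hold decoded images, and the bounded output buffer keeps the file-path channel from being consumed faster than results are written, so image count in memory never exceeds n regardless of folder size.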

jsa-aerial 16:10:00

might want to have a look at #onyx for that

gowder 17:10:22

Oooh, I hadn't thought about going for heavy tools like that, but why not? Thanks!