
I read this article about using core.async and transducers to read and process a CSV file. The author closes on a slightly disappointed note regarding the performance. I wonder if there is something that could be done to optimize their code. Also, I read elsewhere that it's better to use blocking puts for IO.


on a quick skim it looks like they are doing IO inside a go block, which is a very bad idea


go isn't a mechanism for faster throughput, a dedicated thread pool is much better at that, it's a mechanism for async coordination, which this task hardly needs
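(A minimal sketch of what that separation might look like: the file IO runs on a real thread via `async/thread` with a blocking put, never inside a `go` block. The function name `lines-onto-chan` is hypothetical, not from the article.)

```clojure
(require '[clojure.core.async :as a]
         '[clojure.java.io :as io])

(defn lines-onto-chan
  "Read lines from file f on a dedicated thread, blocking-putting each
  onto ch; close ch when the file is exhausted."
  [f ch]
  (a/thread
    (with-open [rdr (io/reader f)]
      (doseq [line (line-seq rdr)]
        ;; blocking put is fine here: we own this thread, no go-block to park
        (a/>!! ch line)))
    (a/close! ch)))
```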

Alex Miller (Clojure team)17:07:01

everything about this is imo a weird approach to force something into using every core.async construct

βž• 9
Alex Miller (Clojure team)17:07:59

it is probably much simpler and faster to just write a tight sequential loop
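(A hypothetical version of that tight sequential loop, assuming the job is something like summing a numeric column: no channels at all, just `reduce` over `line-seq`. `parse-long` assumes Clojure 1.11+; the column layout here is made up for illustration.)

```clojure
(require '[clojure.java.io :as io]
         '[clojure.string :as str])

(defn process-csv
  "Sum the numeric second column of CSV file f, one pass, no channels."
  [f]
  (with-open [rdr (io/reader f)]
    (reduce (fn [acc line]
              (let [cols (str/split line #",")]
                (+ acc (parse-long (second cols)))))
            0
            (line-seq rdr))))
```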

Alex Miller (Clojure team)17:07:15

if you truly want to parallelize it, you probably want to memory-map it or use RandomAccessFile, break it into n chunks, then do that same tight loop. the first part of that is somewhat complicated interop (and needs to take into account finding "line" breaks)
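(A rough sketch of the chunking part, not a canonical implementation: compute n evenly spaced byte offsets with RandomAccessFile, then push each interior offset forward to the byte after the next newline so every chunk starts on a line boundary. Each chunk can then be scanned with its own tight loop on its own thread. `chunk-offsets` is a made-up name.)

```clojure
(require '[clojure.java.io :as io])
(import 'java.io.RandomAccessFile)

(defn chunk-offsets
  "Return a vector of n+1 offsets into file f; interior offsets are
  aligned to the byte just after a newline."
  [f n]
  (with-open [raf (RandomAccessFile. (io/file f) "r")]
    (let [len   (.length raf)
          align (fn [off]
                  (.seek raf off)
                  ;; scan forward to the next newline (or EOF)
                  (loop []
                    (let [b (.read raf)]
                      (if (or (neg? b) (= b (int \newline)))
                        (.getFilePointer raf)
                        (recur)))))]
      ;; mapv forces the scan while raf is still open
      (vec (concat [0]
                   (mapv align (map #(quot (* % len) n) (range 1 n)))
                   [len])))))
```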


ghadi spoke a bit about something like this in Slack a while ago: using a custom pipeline iteration and a file-walker pump to saturate cores. I made a gist out of it but would love to see a proper blog post about it

πŸ’― 3
Alex Miller (Clojure team)17:07:56

or you could just write like 2 lines of awk

βž• 6
Alex Miller (Clojure team)17:07:47

well the ghadi stuff above is eventually probably coming to clojure and core.async and there will be some bloggy things when we get to that


cool. really enjoy everything he shares


(that's not the iteration stuff)

Alex Miller (Clojure team)17:07:13

or maybe I'm conflating


the gist has comments below that explain wiring it together; they are super helpful in getting an idiomatic core.async pipeline up and running doing tons of work safely


the stuff excerpted above takes a filesystem walker, and pipelines over the stream of files, shelling out to a process for each file


producer <-> consumer, where the consumer is pipelined


with short core operations, it's important to batch
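(One hypothetical way to batch, assuming the work per item is tiny: group the input into vectors before the pipeline so the channel handoff cost is paid once per batch instead of once per row, then re-flatten on the way out. The name `batched-pipeline` and the batch size 1024 are both arbitrary choices here, not from the gist.)

```clojure
(require '[clojure.core.async :as a])

(defn batched-pipeline
  "Like a/pipeline, but groups in's items into batches of 1024, applies
  transducer xf to each batch on n parallel workers, and emits the
  flattened results on out."
  [n out xf in]
  (let [batches (a/chan 8 (partition-all 1024))]
    (a/pipe in batches)
    ;; each pipeline input is a whole batch; cat re-flattens the results
    (a/pipeline n out (comp (map #(into [] xf %)) cat) batches)))
```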


I think the article linked above probably misses that


seems like it unconditionally fans out even with short ops


as alex says, it's a bit kitchen-sinky


As an aside, I can delete or make that gist private if you don’t like me copying and preserving you like that


no it's fine πŸ™‚

πŸ‘ 3

if I put it out there, I put it out there πŸ™‚


@alexmiller may be worth considering making CompletableFuture interop with channels better


@hiredman has a gist about it, and L8-23 above are a manual adaptation of CF -> channel
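(A minimal sketch of what such an adaptation might look like, not the gist's code: complete a promise-chan from the future's callback. The error-handling policy here is an assumption; a Throwable is simply put on the channel as a value, and a nil completion just closes the channel.)

```clojure
(require '[clojure.core.async :as a])
(import 'java.util.concurrent.CompletableFuture)

(defn cf->chan
  "Return a promise-chan that receives the result of CompletableFuture cf,
  or the Throwable if it completed exceptionally."
  [^CompletableFuture cf]
  (let [ch (a/promise-chan)]
    (.whenComplete cf
      (reify java.util.function.BiConsumer
        (accept [_ v t]
          ;; channels can't carry nil, so a nil result just closes ch
          (when-some [x (if t t v)]
            (a/put! ch x))
          (a/close! ch))))
    ch))
```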


the gist is likely incorrect, because the handler lock is mostly a no-op; outside of alts, handlers depend on the channel lock


not sure why it is usually a no-op (performance?), but it makes things like extending the protocols to existing types annoying