#core-async
2020-07-09
niveauverleih16:07:07

I read this article about using core.async and transducers to read and process a CSV file: https://www.javacodegeeks.com/2017/12/gettin-schwifty-clojures-core-async.html The author closes on a slightly disappointed note regarding the performance. I wonder if there is something that could be done to optimize their code. Also, I read elsewhere that it's better to use blocking put for IO.

noisesmith17:07:17

on a quick skim it looks like they are doing IO inside a go block, which is a very bad idea

noisesmith17:07:55

go isn't a mechanism for faster throughput, a dedicated thread pool is much better at that, it's a mechanism for async coordination, which this task hardly needs
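A minimal sketch of keeping the file IO off the go-block pool, assuming a line-oriented input file. `lines-chan` is a hypothetical helper name, not something from the linked article. It reads on a dedicated thread via `a/thread` and uses the blocking put `>!!`, which is the usual choice when the producer does blocking IO.

```clojure
(require '[clojure.core.async :as a]
         '[clojure.java.io :as io])

(defn lines-chan
  "Reads the file at `path` on a dedicated thread and puts each line onto a
  buffered channel. The channel is closed when the file is exhausted or when
  a consumer closes it early. (Hypothetical helper for illustration.)"
  [path buf-size]
  (let [out (a/chan buf-size)]
    ;; a/thread runs its body on a real thread, so blocking reads and >!!
    ;; do not tie up the small pool that go blocks share.
    (a/thread
      (try
        (with-open [rdr (io/reader path)]
          (loop [lines (line-seq rdr)]
            (when-let [[line & more] (seq lines)]
              ;; >!! returns false once `out` has been closed; stop reading then.
              (when (a/>!! out line)
                (recur more)))))
        (finally
          (a/close! out))))
    out))
```

A consumer could then drain it with something like `(a/<!! (a/into [] (lines-chan "data.csv" 1024)))`, where the file name and buffer size are placeholders.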

Alex Miller (Clojure team)17:07:01

everything about this is imo a weird approach to force something into using every core.async construct

➕ 9
Alex Miller (Clojure team)17:07:59

it is probably much simpler and faster to just write a tight sequential loop

Alex Miller (Clojure team)17:07:15

if you truly want to parallelize it, you probably want to memory map it or use RandomAccessFile, break it into n chunks, then do that same tight loop. the first part of that is somewhat complicated interop (and needs to take into account finding "line" breaks)
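A rough sketch of the chunk-splitting step described above, assuming a newline-delimited file in an ASCII-compatible encoding such as UTF-8 and no newlines embedded inside quoted CSV fields. `chunk-offsets` is a hypothetical name. It only computes byte ranges that end on line breaks; the per-chunk tight loop is left out.

```clojure
(import '(java.io RandomAccessFile))

(defn chunk-offsets
  "Splits the file at `path` into at most `n` contiguous [start end) byte
  ranges, where every range ends just after a newline byte (or at EOF).
  (Hypothetical helper for illustration.)"
  [path n]
  (with-open [raf (RandomAccessFile. ^String path "r")]
    (let [size (.length raf)
          step (max 1 (quot size n))]
      (loop [start 0, ranges []]
        (if (>= start size)
          ranges
          (let [guess (min size (+ start step))
                end   (do (.seek raf guess)
                          ;; Scan forward from the guessed boundary to the byte
                          ;; after the next \n (byte 10), or to EOF.
                          (loop []
                            (let [b (.read raf)]
                              (cond
                                (neg? b) size
                                (= b 10) (.getFilePointer raf)
                                :else    (recur)))))]
            (recur end (conj ranges [start end]))))))))
```

Each range could then be handed to its own thread, which seeks (or maps) to `start` and runs the sequential loop until `end`. The ranges are not guaranteed to be equal in size, since each boundary is pushed forward to the next line break.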

dpsutton17:07:42

ghadi spoke a bit about something like this in slack a while ago. using a custom pipeline iteration and a file walker pump to saturate cores. i made a gist out of it but would love to see a proper blog post about it

💯 3
Alex Miller (Clojure team)17:07:56

or you could just write like 2 lines of awk

➕ 6
Alex Miller (Clojure team)17:07:47

well the ghadi stuff above is eventually probably coming to clojure and core.async and there will be some bloggy things when we get to that

dpsutton17:07:10

cool. really enjoy everything he shares

ghadi17:07:10

(that's not the iteration stuff)

Alex Miller (Clojure team)17:07:13

or maybe I'm conflating

dpsutton17:07:17

the gist has comments below that explain wiring it together that are super helpful in getting an idiomatic core async pipeline up and running doing tons of work safely

ghadi17:07:19

the stuff excerpted above takes a filesystem walker, and pipelines over the stream of files, shelling out to a process for each file

ghadi17:07:49

producer <> consumer, where the consumer is pipelined

ghadi17:07:38

with short core operations, it's important to batch

ghadi17:07:50

I think the article linked above probably misses that

ghadi17:07:21

seems like it unconditionally fans out even with short ops
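One way to picture the batching point, as a sketch only: group items with a `partition-all` transducer before the parallel stage, so each pipeline task handles a whole batch instead of a single short operation. `process-batched` and its parameters are hypothetical names, and the buffer sizes are arbitrary.

```clojure
(require '[clojure.core.async :as a])

(defn process-batched
  "Takes items from channel `in`, groups them into batches of `batch-size`,
  applies `f` to every item of a batch inside one pipeline task, and returns
  a channel of result vectors in input order.
  (Hypothetical helper for illustration.)"
  [in batch-size parallelism f]
  (let [;; partition-all emits the final, possibly smaller, batch when the
        ;; channel is closed.
        batches (a/chan 16 (partition-all batch-size))
        out     (a/chan 16)]
    (a/pipe in batches)
    ;; One unit of parallel work is a whole batch, which amortizes the
    ;; per-item channel and scheduling overhead for cheap `f`.
    (a/pipeline parallelism out (map #(mapv f %)) batches)
    out))
```

With a cheap `f` such as parsing one CSV row, a larger `batch-size` trades latency for less coordination overhead per item. The right value depends on the workload and would need measuring.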

ghadi17:07:04

as alex says, it's a bit kitchen-sinky

dpsutton17:07:19

As an aside, I can delete or make that gist private if you don't like me copying and preserving you like that

ghadi17:07:26

no it's fine 🙂

👍 3
ghadi17:07:37

if I put it out there, I put it out there 🙂

ghadi17:07:02

@alexmiller may be worth considering making CompletableFuture interop with channels better

ghadi17:07:00

@hiredman has a gist about it, and L8-23 above are a manual adaptation of CF -> channel
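For readers without the gists at hand, a manual CompletableFuture to channel adaptation might look roughly like the sketch below. This is not the code from either gist. `cf->chan` is a hypothetical name, and it takes the simple route of a completion callback plus `put!` rather than extending any core.async protocols.

```clojure
(require '[clojure.core.async :as a])
(import '(java.util.concurrent CompletableFuture)
        '(java.util.function BiConsumer))

(defn cf->chan
  "Returns a promise-chan that receives the future's value, or the Throwable
  if it completed exceptionally. A nil result closes the channel instead,
  since nil cannot be put on a channel. (Hypothetical helper for illustration.)"
  [^CompletableFuture cf]
  (let [ch (a/promise-chan)]
    (.whenComplete cf
                   (reify BiConsumer
                     (accept [_ v ex]
                       (cond
                         ex       (a/put! ch ex)
                         (nil? v) (a/close! ch)
                         :else    (a/put! ch v)))))
    ch))
```

A caller inside a go block could take from the returned channel with `<!` and check whether the value is a Throwable. Delivering errors as values is a design choice of this sketch, not an established convention of the library.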

hiredman17:07:41

the gist is likely incorrect, because the handler lock is mostly a noop; outside of alts, handlers depend on the channel lock

hiredman17:07:59

not sure why it is usually a noop, performance? but it makes things like extending the protocols to existing types annoying