This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-04-09
Hi, I want to create a web crawler application using core.async, but it's not clear to me how exactly I should use core.async. I mean, what should my design look like?
here is my current version, which is not working: http://dpaste.com/03BQJ5S
@lxsameer for starters, on line 41 it looks like you map a series of >! to a channel, before anybody has a chance to start reading from it
you seem to address this with a buffer, but a more elegant solution is to have the out channel be an argument - then the reader can park before that function even runs
also, does fetch> do io? if so it could easily starve your go blocks
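A minimal sketch of the "pass the out channel as an argument" suggestion, assuming nothing about the pasted code: `emit-links` and the sample links are hypothetical names made up for illustration. The reader is started first and parks on an unbuffered channel, so the producer needs no buffer to make progress.

```clojure
(require '[clojure.core.async :as a])

(defn emit-links
  "Hypothetical producer: puts each link onto `out`, then closes it.
  The caller creates and owns `out`."
  [links out]
  (a/go-loop [remaining (seq links)]
    (if remaining
      (do (a/>! out (first remaining))
          (recur (next remaining)))
      (a/close! out))))

;; Unbuffered channel created by the caller.
(def out (a/chan))

;; The reader starts first and parks on <! until a value arrives.
(def collected
  (a/go-loop [acc []]
    (let [v (a/<! out)]
      (if (some? v)
        (recur (conj acc v))
        acc))))

;; Only now does the producer run; each >! meets a waiting reader.
(emit-links ["/a" "/b" "/c"] out)

(a/<!! collected) ;; => ["/a" "/b" "/c"]
```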
@noisesmith good points, thanks. Anything else ?
those are the only things that really stand out - you could easily make the function using fetch> just use blocking takes on the channel without hurting performance (and probably improving it by not stealing a thread from core.async)
also, your flow doesn't seem to actually go concurrent anywhere - everything is just input to output?
@noisesmith So the basic idea is to read from channels in go blocks before writing to them, right ?
@noisesmith and how should I make them concurrent ?
well - that's a design question - I just don't really know how core.async is helping at all in that code, since it looks like a one way flow of data without concurrency
which means it would perform better without core.async
but maybe it needs concurrency somewhere, I don't know your design well enough to answer that
aha. Let me describe my understanding of core.async, and correct me if I'm wrong (which I am)
Using <! inside a go block will read from a channel, and if there is no value to read, it will park and give control to other go blocks
yes, but in this code, due to the linear pipeline, that saves you exactly 1 thread
and all the channel juggling is more expensive than putting all the work in one (occasionally blocked) thread would be
I can't understand this one, because I thought using core.async means having a thread pool with CPU_CORES + 2 threads in it
right, and you are using CPU_CORES+2 threads to do 1 thread worth of work
plus useless context switching / state management
if I'm reading the code correctly at least...
OK - I need to back off on that because the pipeline nature means that after enough steps, all those blocks can be running
so it does go parallel in that sense
I've read a lot about core.async but still have a fuzzy understanding of it. Do you know any video or article that shows how to use core.async in action, rather than simple examples ?
tbaldridge's talk at clojure/west was good
@noisesmith thanks man
@lxsameer what I've found useful is if I think I want to use core.async, make a diagram on paper of what the communication flow should be, and see what should serialize, what needs to be parallel, (maybe even launching N go blocks depending on how much I need to fan out) and whether there's a simpler pattern that would do the same thing faster without core.async
also separating out i/o so that it happens in thread blocks instead of go blocks (if you don't do that, everything suffers)
like in your case, I imagine every step is very fast except the io, which I would put into thread calls, and also have N loops running all reading from the same input and writing to the same output
hope it helps
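A rough sketch of the "io in thread calls, N loops on the same input and output" idea, under stated assumptions: `slow-fetch` is a hypothetical stand-in for a blocking HTTP request, and `start-workers`, `url-ch`, and `result-ch` are made-up names. Each worker is a real thread using blocking takes and puts, so the go-block thread pool is never tied up by io.

```clojure
(require '[clojure.core.async :as a])

(defn slow-fetch
  "Hypothetical stand-in for a blocking HTTP request."
  [url]
  (Thread/sleep 50)
  {:url url :status 200})

(defn start-workers
  "Starts n real threads that all take from `in` and put onto `out`.
  Closes `out` once `in` is closed and every worker has finished."
  [n in out]
  (let [workers (vec (repeatedly n
                                 (fn []
                                   (a/thread
                                     (loop []
                                       (when-some [url (a/<!! in)]
                                         (a/>!! out (slow-fetch url))
                                         (recur)))))))
        ;; Worker channels carry no values; the merged channel closes
        ;; only after all of them have closed.
        done (a/merge workers)]
    (a/go
      (a/<! done)
      (a/close! out))))

(def url-ch (a/chan 10))
(def result-ch (a/chan 10))

(start-workers 4 url-ch result-ch)

(doseq [u ["u1" "u2" "u3"]]
  (a/>!! url-ch u))
(a/close! url-ch)

;; Collect everything; arrival order depends on which worker finishes first.
(def results (a/<!! (a/into [] result-ch)))
```

The fan-out width (4 here) is an arbitrary choice for the sketch.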
you aren't using core.async/map though?
you could - but then you would also need to read off of each channel that the thread calls produce
for something like that, pipeline-blocking is a better match
it's made for parallelizing blocking operations https://clojure.github.io/core.async/#clojure.core.async/pipeline-blocking
@noisesmith can you give me an example of it ? just the syntax
(pipeline-blocking 12 out (map foo) in)
where (map foo) could be any transducer instead, and 12 is the parallelism (max thread usage) and out and in are your channels
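A slightly fuller sketch of that call, with assumptions labelled: `fetch-page` is a hypothetical blocking function standing in for `foo`, and the channel names and the parallelism of 4 are made up for illustration.

```clojure
(require '[clojure.core.async :as a])

(defn fetch-page
  "Hypothetical blocking fetch; stands in for `foo` above."
  [url]
  (Thread/sleep 50)
  (str "<html>" url "</html>"))

(def urls-ch (a/chan 10))
(def pages-ch (a/chan 10))

;; Up to 4 threads apply the transducer to values taken from urls-ch.
;; pages-ch is closed automatically once urls-ch is closed and drained.
(a/pipeline-blocking 4 pages-ch (map fetch-page) urls-ch)

(doseq [u ["a" "b" "c"]]
  (a/>!! urls-ch u))
(a/close! urls-ch)

;; pipeline-blocking emits results in input order.
(def pages (a/<!! (a/into [] pages-ch)))
;; => ["<html>a</html>" "<html>b</html>" "<html>c</html>"]
```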
@noisesmith cool. but I don't know the optimal number of threads for the host platform. Is there any facility to find out about it ?
(pipeline has the same syntax, but pipeline-blocking is the one you would want here)
I don't know - it depends on what the threads are doing and how your OS handles threads to some degree...
if they are doing IO the count could be pretty high, if doing CPU you want to keep it closer to the CPU count of the machine
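One way to get a starting number is to ask the JVM for the CPU count and scale from there. This is only a sketch: `suggested-parallelism` is a hypothetical helper, and the multiplier for io work is an arbitrary assumption to be tuned by measurement, not a recommended value.

```clojure
;; Number of processors the JVM reports as available.
(def cpu-count (.availableProcessors (Runtime/getRuntime)))

(defn suggested-parallelism
  "Hypothetical helper: a starting point for the parallelism argument.
  CPU-bound work stays at the core count; io-bound work gets an
  arbitrary multiple of it (assumption, tune by measuring)."
  [kind]
  (case kind
    :cpu cpu-count
    :io  (* 8 cpu-count)))

(suggested-parallelism :cpu) ; same as cpu-count
(suggested-parallelism :io)  ; 8 times cpu-count
```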