This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-07-04
Hello all 👋
So I asked this question a long time ago, but it got lost in a wave of posts, so I'm asking again because I'm still curious.
Reference: https://github.com/pawandubey/crawljer/blob/master/src/crawljer/core.clj#L107
When the order of the two function calls in that function is reversed, my code (which is expected to start crawling the URLs provided) doesn’t work; instead it immediately returns an instance of `ManyToManyChannel`.
However, in this order, it does work. This has been most surprising to me. It seems that adding URLs to the channel before starting the `go-loop` doesn’t work as expected. Instead, I have to start the `go-loop` and then start adding URLs to the channel.
What is the insight that I am missing here which would help me understand why this is the case?
if something starts a go loop, you can expect it to immediately return a ManyToManyChannel
that doesn't have anything to do with whether it works or not
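A minimal sketch of that behavior, assuming only `clojure.core.async` on the classpath (`count-up` is a hypothetical example function): the `go-loop` hands back its channel right away, and the loop's final value arrives on that channel later. Ordinary code that needs to wait for the loop can block on that channel with `<!!`.

```clojure
(ns example.go-loop-return
  (:require [clojure.core.async :as async]))

(defn count-up
  "Hypothetical example: starts a go-loop that counts to n.
  Returns the go-loop's channel immediately, before the loop has finished."
  [n]
  (async/go-loop [i 0]
    (if (< i n)
      (recur (inc i))
      i)))  ; the value of the last expression is put on the returned channel

;; The call itself returns at once with a channel (a ManyToManyChannel).
(def result-chan (count-up 5))

;; Blocking take from ordinary (non-go) code: waits until the loop is done.
(def result (async/<!! result-chan))
```

If a program's `-main` ends with a call like `(count-up 5)`, `-main` returns as soon as the channel exists; since core.async runs go blocks on daemon threads, the JVM may exit before the loop does much work. Blocking on the returned channel is one way to keep the caller waiting.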
I see. But that doesn’t explain why it works when I just switch the order.
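The thread never settles this question, so the following shows only one general core.async behavior that can make call order matter, not a diagnosis of crawljer: a put onto an unbuffered channel cannot complete until a taker is there to receive it, while a buffered channel accepts values (up to its size) with no consumer running. A sketch using `offer!`, which attempts a put without waiting (the URL is a placeholder):

```clojure
(ns example.buffering
  (:require [clojure.core.async :as async]))

(def unbuffered (async/chan))     ; no buffer: every put needs a waiting taker
(def buffered   (async/chan 10))  ; room for 10 values before puts must wait

;; offer! completes the put only if it can happen immediately; otherwise nil.
(def unbuffered-result (async/offer! unbuffered "http://example.com"))  ; nil: no consumer yet
(def buffered-result   (async/offer! buffered "http://example.com"))    ; true: value is buffered

;; A blocking put (>!!) on `unbuffered` would block the calling thread here
;; until something, e.g. a go-loop doing (<! unbuffered), takes the value.
```

So if the code that seeds the URLs used a blocking put on an unbuffered channel, it could only make progress once the consuming `go-loop` had started, and starting the consumer first would avoid that.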
I don't fully comprehend the code yet, but another issue here is that you are doing blocking IO inside go blocks, which can easily starve core.async's fixed-size thread pool
Do you mean the `write-doc` function? I added it later. When I first encountered the problem, it was just a no-op. You can ignore its effects and the question is still valid.
no, I mean the call to `http-get`, which is called inside `read-url`, which is called inside a go block in `read-urls`
that's a blocking read, and if it were parallelized it could block up the entirety of core.async
I don't think it's the core issue here, but it is a problem
So you suggest not putting blocking I/O inside `go-loop`s? I thought that was the best way of improving I/O performance, whether disk or network.
no, go blocks are not made for IO at all
core.async is for coordinating things that are asynchronous. go blocks are made for simplifying coordination code, not for running tasks, and core.async can lock up and fail if you do blocking tasks inside go blocks
Interesting. What is the best way of parallelizing I/O in Clojure then?
there's `async/thread`, which is meant for blocking or CPU-intensive code
there are also futures and thread pools if you don't need the sophisticated coordination core.async provides, e.g. if you are just doing a map/reduce type job
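For the map/reduce style of job, a sketch with plain futures and no core.async. `fetch-all` and `slow-length` are hypothetical; `slow-length` stands in for any blocking call such as an HTTP GET.

```clojure
(ns example.futures)

(defn fetch-all
  "Runs (fetch url) for every url on its own future and waits for all results.
  Results come back in the same order as `urls`."
  [fetch urls]
  (let [pending (mapv #(future (fetch %)) urls)]  ; mapv is eager, so all futures start now
    (mapv deref pending)))                         ; block until each one finishes

(defn slow-length
  "Hypothetical stand-in for blocking IO: sleeps, then returns the string length."
  [url]
  (Thread/sleep 50)
  (count url))

(def lengths (fetch-all slow-length ["http://a.example" "http://bb.example"]))
```

Futures run on an unbounded thread pool, so for a very large number of URLs a fixed-size `java.util.concurrent` executor is the safer choice.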
My main intention was to maintain two channels, one for reading in the web pages and another for storing the URLs left to visit, both acting as producers and consumers for each other. core.async and `go-loop` seemed perfect for this use case, so I ran with it.
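A minimal sketch of that two-channel shape with the blocking fetch moved onto `async/thread`. Everything here is an assumption for illustration, not the crawljer code: `fake-web` and `fetch-links` are hypothetical stand-ins for real HTTP, and the `pending` counter is just one way to know when the crawl is finished.

```clojure
(ns example.two-channel-crawl
  (:require [clojure.core.async :as async]))

;; Hypothetical link graph standing in for real web pages.
(def fake-web {"a" ["b" "c"], "b" ["a" "d"], "c" [], "d" ["c" "a"]})

(defn fetch-links
  "Hypothetical blocking fetch: returns the links found on `url`."
  [url]
  (Thread/sleep 10)  ; pretend this is network IO
  (get fake-web url []))

(defn crawl
  "Crawls from `start`; returns a channel that yields the set of visited URLs."
  [start]
  (let [urls-chan  (async/chan 100)   ; URLs still to visit
        pages-chan (async/chan 100)]  ; [url links] for every fetched page
    ;; Fetcher: consumes URLs, produces pages. Blocking IO runs on async/thread.
    (async/go-loop []
      (when-some [url (async/<! urls-chan)]
        (let [links (async/<! (async/thread (fetch-links url)))]
          (async/>! pages-chan [url links])
          (recur))))
    ;; Seed the crawl. The channel is buffered, so this put does not wait for a taker.
    (async/>!! urls-chan start)
    ;; Coordinator: consumes pages, produces new URLs, stops when nothing is pending.
    (async/go-loop [visited #{start}
                    pending 1]
      (if (zero? pending)
        (do (async/close! urls-chan)  ; lets the fetcher loop exit
            visited)
        (let [[_url links] (async/<! pages-chan)
              new-urls     (vec (distinct (remove visited links)))]
          (loop [us new-urls]
            (when (seq us)
              (async/>! urls-chan (first us))
              (recur (rest us))))
          (recur (into visited new-urls)
                 (+ (dec pending) (count new-urls))))))))

(def visited (async/<!! (crawl "a")))
```

Both channels are buffered here; with large link fan-out, a real crawler would need to consider what happens when both buffers fill, since each loop would then be waiting on the other.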
sure, `async/thread` returns a channel that a go block can park on
`(<! (async/thread ...))` returns the return value of the thread
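A small sketch of that pattern, assuming a hypothetical blocking function `slow-square`: the go block parks, so it does not tie up a go-pool thread, while `async/thread` runs the blocking work on its own thread, and the value returned by the thread body comes out of `<!`.

```clojure
(ns example.park-on-thread
  (:require [clojure.core.async :as async]))

(defn slow-square
  "Hypothetical blocking work (think HTTP request or disk read)."
  [x]
  (Thread/sleep 100)
  (* x x))

(defn square-async
  "Returns a channel that will receive (slow-square x) plus one."
  [x]
  (async/go
    ;; The blocking call runs on a thread from async/thread's own pool;
    ;; this go block merely parks until that thread's channel delivers the value.
    (let [sq (async/<! (async/thread (slow-square x)))]
      (inc sq))))

(def result (async/<!! (square-async 7)))
```

One caveat: if the thread body returns `nil`, its channel is simply closed and `<!` also yields `nil`, so a `nil` result and "no result" look the same to the go block.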
Nice. Thanks!
But my original question remains unanswered 😅
so
    (defn read-urls
      []
      (async/go-loop [current-url (async/<! urls-chan)]
        (when (read-url current-url)
          (recur (async/<! urls-chan)))))

would become

    (defn read-urls
      []
      (async/go-loop [current-url (async/<! urls-chan)]
        ;; read-url does blocking HTTP IO, so run it on async/thread
        ;; and park on the channel that async/thread returns
        (when (async/<! (async/thread (read-url current-url)))
          (recur (async/<! urls-chan)))))
also, tbaldridge's talk at Clojure/West had some very good advice about this kind of code: https://www.youtube.com/watch?v=096pIlA3GDo&list=PLAMHgQX0SWwrMLtv3V9z8NF3QIrm-6NLi
specifically about how to simplify it, separate the implementation details from the logic properly, etc.
Will give it a watch. Thanks for all the help 😄
I'll take a few more looks at that code - but my hunch is that reorganizing it in the way tbaldridge recommends will either eliminate the issue or make it much simpler to fix... I think his talk will make it more clear why
one more small thing: you can simplify the code around line 75 by replacing `->>` with `->`
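In general, `->` threads the value into the first argument position and `->>` into the last. When every step is a single-argument call the two expand to the same code; they differ only once a step has other arguments. A quick illustration, not taken from the crawljer code:

```clojure
(ns example.threading
  (:require [clojure.string :as str]))

;; Single-argument steps: -> and ->> produce the same nested calls.
(def a (-> " Hello " str/trim str/upper-case))
(def b (->> " Hello " str/trim str/upper-case))

;; With extra arguments the position matters:
(def first-arg (-> 10 (- 3)))   ; expands to (- 10 3)
(def last-arg  (->> 10 (- 3)))  ; expands to (- 3 10)
```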
You are right! Will refactor