
Hello all 👋 So I asked this question a long time ago, but it got lost in a wave of posts, so asking again because I am still curious. Reference: When the order of the two function calls in that function is reversed, my code (which is expected to start crawling the URLs provided) didn’t work - instead it immediately returned an instance of ManyToManyChannel. However, in this order, it does work. This has been most surprising to me. It seems adding urls to the channel before starting the go-loop doesn’t work as expected. Instead, I have to start the go-loop and then start adding URLs to the channel. What is the insight that I am missing here which would help me understand why this is the case?


if something starts a go loop, you can expect it to immediately return a ManyToManyChannel


that doesn't have anything to do with whether it works or not
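
To illustrate the point about the return value, here is a minimal sketch, assuming core.async is on the classpath and aliased as `async`. The name `result-chan` and the counting loop are made up for the example and are not from the crawler code.

```clojure
(require '[clojure.core.async :as async])

;; go-loop returns right away with a channel; the body runs asynchronously.
(def result-chan
  (async/go-loop [n 0]
    (if (< n 3)
      (recur (inc n))
      :done)))

;; The return value is a channel whether or not the loop has done any work yet.
(println (type result-chan))

;; The value of the loop's last expression is delivered on that channel.
(println (async/<!! result-chan))
```

So seeing a ManyToManyChannel come back only says that a go block was started, not whether the work inside it succeeded.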


I see. But that doesn’t explain why it works just on switching the order.


I don't fully comprehend the code yet, but another issue here is that you are doing blocking IO inside go blocks, which can easily starve core.async


Do you mean the write-doc function? I added it later. When I first encountered the problem, it was just a no-op. You can ignore its effects and the question is still valid.


no, I mean the call to http-get which is called inside read-url which is called inside a go block in read-urls


that's a blocking read and if it were parallelized it could block up the entirety of core.async


I don't think it's the core issue here, but it is a problem


So you suggest not putting blocking I/O inside go-loops? I thought that was the best way of improving I/O performance? Whether disk or network.


no, go blocks are not made for IO at all


core.async is for coordinating things that are asynchronous, go blocks are made for simplifying coordination code, not for running tasks, and core.async can lock up and fail if you do blocking tasks inside go


Interesting. What is the best way of parallelizing I/O in clojure then?


there's async/thread that is meant for blocking or CPU intensive code


there's also futures and ThreadPools if you don't need the sophisticated coordination core.async provides - if you are just doing a map/reduce type job
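
For the plain map/reduce case, a rough sketch with futures could look like the following. `slow-fetch` is a hypothetical stand-in for a blocking call such as an HTTP GET, and the URLs are placeholders.

```clojure
;; Hypothetical stand-in for a blocking call such as an HTTP GET.
(defn slow-fetch [url]
  (Thread/sleep 100)
  {:url url :length (count url)})

(def urls ["http://example.com/a" "http://example.com/b" "http://example.com/c"])

;; mapv is eager, so every future starts immediately and the fetches overlap.
(def pending (mapv #(future (slow-fetch %)) urls))

;; deref blocks the calling thread until each future has finished.
(def results (mapv deref pending))

;; Reduce step: combine the individual results.
(println (reduce + (map :length results)))
```

In a script, calling `(shutdown-agents)` at the end lets the JVM exit promptly once the futures are done.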


My main intention was to maintain two channels - one for reading in the web-pages and another for storing the urls left to visit - both acting as producers and consumers for each other. core.async and go-loop seemed perfect for this use case so I ran with it.


sure - async/thread returns a channel that a go block can park on


(<! (async/thread ...)) - returns the return value of the thread
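
A small self-contained sketch of that pattern, assuming core.async is aliased as `async`. `blocking-fetch` is a hypothetical placeholder for the real blocking call.

```clojure
(require '[clojure.core.async :as async])

;; Hypothetical stand-in for blocking IO.
(defn blocking-fetch [url]
  (Thread/sleep 100)
  (str "body of " url))

(def out
  (async/go
    ;; async/thread runs its body on a separate thread and returns a channel.
    ;; <! parks the go block (without tying up a go-pool thread) until the
    ;; thread's return value arrives on that channel.
    (let [body (async/<! (async/thread (blocking-fetch "http://example.com")))]
      (count body))))

;; Outside a go block, use the blocking take to read the result.
(println (async/<!! out))
```

One thing to keep in mind: if the body of `async/thread` returns nil, the channel is closed without a value, so `<!` yields nil.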


Nice. Thanks!


But my original question remains unanswered 😅



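(defn read-urls [urls-chan] ; parameter vector assumed; the pasted snippet omitted it
  (async/go-loop [current-url (async/<! urls-chan)]
    (when (read-url current-url)
      (recur (async/<! urls-chan)))))

would become

(defn read-urls [urls-chan] ; parameter vector assumed, as above
  (async/go-loop [current-url (async/<! urls-chan)]
    ;; read-url does blocking IO, so run it on its own thread and park on the result
    (when (async/<! (async/thread (read-url current-url)))
      (recur (async/<! urls-chan)))))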


also tbaldridge's talk at clojure/west had some very good advice about this kind of code


specifically about how to simplify it, separate the implementation details from the logic properly, etc.


Will give it a watch. Thanks for all the help 😄


I'll take a few more looks at that code - but my hunch is that reorganizing it in the way tbaldridge recommends will either eliminate the issue or make it much simpler to fix... I think his talk will make it more clear why


one more small thing - you can simplify the code around line 75 by replacing ->> with ->


You are right! Will refactor