#clojurescript
2021-12-15
ns21:12:04

I have a question about core.async and performance. I need to read about 1000 or more values from Firebase, using the .once "value" method, which is wrapped inside a promise; once the promise is resolved on the other end it is collected with "<p!" from the core.async library (I'm trying to emulate async/await in JavaScript because I would rather not mix my logic with callbacks; it would be a mess). The results I get are fine but the performance is terrible: each call takes around 200ms and the whole thing takes many minutes to complete, so I'm thinking I'm probably doing something wrong. Should I be using regular core.async channels instead of "<p!" (my understanding is that it turns into a channel anyway), or avoid channels and go blocks and go with regular promise.then or callbacks if I want it to be more performant? Any help is appreciated because I'm on a deadline and seem to be stuck here.

quoll21:12:05

Are you executing them all in their own go blocks?

quoll21:12:15

I have no idea of performance, but if you’re doing lots of requests together, then it reminded me of David’s post from 8 years ago: http://swannodette.github.io/2013/08/02/100000-processes/

quoll21:12:48

That was a follow-on post from one where he described using go blocks for parallel requests: http://swannodette.github.io/2013/07/12/communicating-sequential-processes/

ns22:12:08

Hi, sort of. Initially I read all those values from Firebase in a single go block and take each one with "<p!", but then some of them get passed into another function which has its own go block as well.

quoll23:12:56

You're reading in a single request with a big response?

ns23:12:26

no, no, each item is a call to firebase

quoll00:12:34

If that's the case then I would execute them all together in go blocks, like David showed in those examples. It should reduce your latency
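[Editor's note: a minimal sketch of the parallel pattern from those posts. fetch-id here is a hypothetical stand-in for any function that returns a channel, e.g. one wrapping the Firebase promise.]

```clojure
(require '[cljs.core.async :refer [chan go <! >!]])

;; Launch one go block per id so the requests run concurrently,
;; then gather all the results from a shared channel.
;; `fetch-id` is a hypothetical function returning a channel.
(defn fetch-all [fetch-id ids]
  (let [results (chan)]
    (doseq [id ids]
      (go (>! results [id (<! (fetch-id id))])))
    ;; returns a channel that yields a map of id -> result
    (go (loop [n (count ids) acc {}]
          (if (zero? n)
            acc
            (recur (dec n) (conj acc (<! results))))))))
```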

ns08:12:13

I rewrote the code using channels but it didn't help much performance-wise, so I ended up taking out the go blocks and channels and going with just promises. The code ended up being uglier and required me to write data using transactions to avoid overwriting data in Firebase, but it helped with the performance, which was mission critical. I really enjoyed using and learning about core.async, but the extra overhead affecting performance was too much for this task; I'll be using it for other things for sure. Perhaps there are some tricks to optimize it and make it faster, but I'm still pretty new to ClojureScript in general, so I have yet to learn them. Thanks for your help!

raspasov08:12:20

It's impossible to give much useful feedback in the abstract without some code samples and specifics. core.async should not have significant overhead, especially when compiled under :advanced, I believe.

ns01:12:41

So you are saying that if it is compiled under :advanced the performance should be faster? I'm using shadow-cljs and I don't see any settings for that in my shadow-cljs.edn file; perhaps it happens behind the scenes when running the production build. I'll see if I can post some code samples, because I would rather do this with core.async: the code becomes much easier to reason about.

ns06:12:27

Ok, here it is. I removed some implementation details but it should be enough to give you an idea; sorry about the long post. I have a list of a couple of thousand ids which need to be shortened and stored in Firebase. The shortened version is like a human-friendly version of the same id, but it must be unique.

(def ids ["someListContainingLong80CharacterStringsContaining2000Total"
          ....
          ....])
Already-shortened versions are written to Firebase in 2 collections, long2short and short2long. We query Firebase for each id in the list to see which ones are already stored. If an id is not stored, we send it to a function for abbreviation (shorten) which looks for the shortest available candidate and writes it to Firebase; if the short id is already taken, it increases the length of the short id and tries again. This is the first version I tried, with the "<p!" macro from core.async.interop, which takes the value from a promise resolved in the function that calls Firebase. It worked great but it was too slow for processing thousands of ids (literally minutes of wait time):
(:require
 [cljs.core.async.interop :refer-macros [<p!]])
(go
  (doseq [id (distinct ids)]
    (let [short (<p! (fb/get-long2short id))]
      (when (nil? short)
        (<! (shorten id))))))
(defn shorten [id]
  (go
    (loop [abbr-len 3]
      (let [candidate (subs id 0 abbr-len)
            candidate-id (<p! (fb/get-short2long candidate))]
        (cond
          (empty? candidate-id)
          (do
            (fb/set-short2long candidate id)
            (recur abbr-len))
          (= candidate-id id)
          candidate
          :else
          (recur (inc abbr-len)))))))
(defn get-long2short [id]
  (new js/Promise (fn [resolve _]
                    (.. firebase database (ref (str "long2short/" id))
                        (once "value"
                              (fn success [snapshot]
                                (let [data (-> snapshot .val (js->clj :keywordize-keys true))]
                                  (resolve data))))))))
(defn get-short2long [id]
      ;; same as get-long2short just querying "short2long/" collection
  )
I also tried a couple of versions using channels instead of promises and "<p!" (which, from what I understand, converts a promise to a channel anyway), but it didn't help the performance much. It went something like this:
(defn shorten [id]
  (go
    (loop [abbr-len 3]
      (let [candidate (subs id 0 abbr-len)
            c (chan)
            candidate-id (<! (fb/get-short2long candidate c))]
  ...
  )
and then firebase function success would just put! data on channel:
(put! c data)
I also tried declaring a single channel to be used for all of this (instead of creating a new one for every shorten call), but it was similar performance-wise. Finally I took out the core.async stuff and went with a promise.then-only solution, which gave me the performance I needed, but the code ended up being much uglier. Ideally I would like to get this to perform fast with core.async, either "<p!" or channels, because the code would be a lot easier to reason about. Any help is appreciated!

quoll12:12:59

I’m on a phone and it's hard to look at right now, but 2 things stand out:
• You need to do a round trip for each individual id. There's no way that can be made fast. You either need to figure it out locally, or in batches.
• shorten has the potential to make multiple requests per id. I would add a tap to accumulate some stats and see how often re-attempts occur. If hardly ever, then fine. But especially if you run in parallel, I’d expect that you may see a lot.
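[Editor's note: a sketch of that tap idea, using Clojure's built-in tap>/add-tap; the event map shape here is hypothetical.]

```clojure
;; Record the deepest candidate length tried per id, so you can see
;; how often `shorten` has to re-attempt with a longer prefix.
(defonce retry-stats (atom {}))

(add-tap
 (fn [{:keys [id abbr-len]}]
   (when (and id abbr-len)
     (swap! retry-stats update id (fnil max 0) abbr-len))))

;; Inside shorten's loop, before each attempt:
;; (tap> {:id id :abbr-len abbr-len})
;; Any id whose recorded value exceeds 3 needed at least one retry.
```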

ns15:12:00

Is there anything core.async-related that could be slowing things down? When I run this exact same logic with promises alone it finishes in around 10 seconds, while the version with core.async (whether using "<p!" in combination with promises, or putting values on channels without promises) takes over 200 seconds.

ns09:12:13

Believe it or not, the whole performance issue came down to putting the "go" block inside the "doseq" instead of outside. So this:

(go
  (doseq [id (distinct ids)]
    (let [short (<p! (fb/get-long2short id))]
      (when (nil? short)
        (<! (shorten id))))))
became this:
(doseq [id (distinct ids)]
  (go
    (let [short (<p! (fb/get-long2short id))]
      (when (nil? short)
        (<! (shorten id))))))
and the run time went down from 8 minutes to 20 seconds or something like that. I just wish I knew the reason for that; I guess I still don't understand how to use go blocks very well. So yes, the performance of the core.async version is just as good as the promise-only version. Thank you everyone for your help!

quoll23:12:00

The first one is a single go block. Each call to get-long2short waits for a response before proceeding to the next one. The second starts an independent go block for each id. They all make their requests independently at close to the same time, and then collect the responses as they come in. It's the difference between serial and parallel.
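[Editor's note: a toy illustration of that difference, with timeouts standing in for the network latency.]

```clojure
(require '[cljs.core.async :refer [go <! timeout]])

;; Serial: a single go block, so each take waits for the previous
;; one to finish. Total time is roughly 10 x 200ms = 2 seconds.
(go
  (doseq [_ (range 10)]
    (<! (timeout 200))))

;; Parallel: ten independent go blocks, all waiting at the same
;; time. Total time is roughly one latency, ~200ms.
(doseq [_ (range 10)]
  (go (<! (timeout 200))))
```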

leif21:12:25

Hey, I'm looking at cljs-ajax (https://github.com/JulianBirch/cljs-ajax), and I'm wondering what the best way to log responses is. When I call (println request), for the :body section I get something like: #object[org.eclipse.jetty.server.HttpInput 0x186d26d3 org.eclipse.jetty.server.HttpInput@186d26d3].

emccue03:12:47

You can (update response :body slurp) to get it as a string
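[Editor's note: a sketch of that in a Ring context. Slurping consumes the InputStream, so this hypothetical helper replaces the body afterwards in case anything downstream still needs to read it.]

```clojure
;; The :body of a Ring request is an InputStream (here Jetty's
;; HttpInput); slurp reads it into a string so it prints legibly.
(defn log-body [request]
  (let [body-str (slurp (:body request))]
    (println body-str)
    ;; restore a fresh, readable body for downstream handlers
    (assoc request :body (java.io.ByteArrayInputStream.
                          (.getBytes body-str "UTF-8")))))
```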

leif21:12:43

And I'd rather print out the request as, say, a string (or byte array).

leif21:12:52

err...the body of the request.

leif21:12:11

Since it's kind of hard to see what HttpInput means after the fact...

leif21:12:13

Err...this might be the wrong channel...sorry about that. 🙂