beginners 2022-10-29 | Slack Archive

Should I and can I express this cond in terms of a protocol?

(fn [elm]
     (cond
       (associative? elm)
       (str/join (combine-texts (:text elm)))
       (seqable? elm)
       (str/join (combine-texts elm))
       :else elm))

Benjamin09:10:02

(require '[clojure.string :as str]
         '[clojure.walk :as walk])

(defn combine-texts [coll]
  (walk/prewalk
   (fn [elm]
     (cond
       (map? elm)
       (str/join (combine-texts (:text elm)))
       (seqable? elm)
       (str/join (map combine-texts elm))
       :else elm))
   coll))

(combine-texts [{:text "fo"}])
"fo"
(combine-texts [{:text "fo"}
                {:text "fa"}])
"fofa"
(combine-texts [{:text "fo"}
                {:text {:text ["hurr" "durr"]}}])
"fohurrdurr"

There were a few errors in what I initially posted. This is a functionally correct version of what I want to achieve.
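
For comparison, here is a sketch of what a protocol version of the cond could look like. The protocol TextNode and its method text-of are hypothetical names, not from the thread. Each branch of the cond becomes an implementation for one type: maps delegate to their :text value, sequential collections (vectors, lists, seqs) join the text of their elements, strings return themselves, nil becomes "" and anything else falls back to str.

```clojure
(require '[clojure.string :as str])

;; Hypothetical protocol: one implementation per kind of node, replacing the cond.
(defprotocol TextNode
  (text-of [node] "Returns the concatenated text contained in node."))

(extend-protocol TextNode
  ;; {:text ...}: delegate to whatever is stored under :text
  clojure.lang.IPersistentMap
  (text-of [m] (text-of (:text m)))

  ;; vectors, lists and seqs: join the text of every element
  clojure.lang.Sequential
  (text-of [xs] (str/join (map text-of xs)))

  String
  (text-of [s] s)

  nil
  (text-of [_] "")

  ;; fallback, like the :else branch, but always returning a string
  Object
  (text-of [x] (str x)))
```

With this, (text-of [{:text "fo"} {:text {:text ["hurr" "durr"]}}]) returns "fohurrdurr". What a protocol mainly buys is open extension: other code can extend TextNode to new types without editing this definition. For a small, closed set of cases the cond is arguably just as clear.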

pithyless11:10:12

Why a protocol? What would you try to gain from that?

pithyless11:10:13

There may be more hidden context to your question, but if you're only interested in the strings, perhaps consider postwalk?

(require '[clojure.walk :as walk])

(defn combine-texts [data]
  (let [^StringBuilder sb (new StringBuilder "")]
    (walk/postwalk
      (fn [x] (when (string? x) (.append sb x)))
      data)
    (str sb)))

(combine-texts
  [{:text "fo"}
   {:text {:text ["hurr" "durr"]}}])
; => "fohurrdurr"

pithyless11:10:42

(you may consider just using regular lists/apply/conj or a transient collection, if you don't want to rely on the StringBuilder)

Benjamin12:10:13

I see, interesting.

tschady13:10:47

I’d try tree-seq to get the collection.
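
A sketch of the tree-seq idea, assuming (as in the postwalk version above) that every string anywhere in the data should be collected. tree-seq walks the nested data depth-first, coll? decides which nodes have children, and the strings are filtered out of the resulting lazy seq and concatenated. Map entries are walked too, so string keys would also be collected.

```clojure
(defn combine-texts
  "Concatenates every string found anywhere in data, in depth-first order."
  [data]
  (->> (tree-seq coll? seq data)  ; coll? = has children, seq = get children
       (filter string?)
       (apply str)))
```

For example, (combine-texts [{:text "fo"} {:text {:text ["hurr" "durr"]}}]) returns "fohurrdurr", without any StringBuilder or side effects.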

Matthew Twomey15:10:16

Question about a good pattern: I have a function that calls an API and gets back an array of maps. I then doseq through those results and print “stuff” about certain results in those maps. This all works fine. I have many of these functions, calling different APIs, and the API calls are slow-ish. So I thought: let me run them all with futures (with the deref/doall/map/future pattern). This also “works”; however, since they run in parallel, the output from the functions is all intermixed (which I now understand). So how do I fix this? My first thought is that instead of printing inside a doseq loop, I could “build up a string” in each function using an atom and only print it at the end. Thoughts on this? Is there a much better pattern I should consider?

pfeodrippe15:10:08

An alternative to print is to use tap>, with this you can control what to do with each element (e.g. collecting to an atom as you mentioned). And you also are able to leverage tools like https://github.com/djblue/portal, which can help a lot with debugging.
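
A sketch of the tap> approach, assuming each API function is changed to build its whole report as one string and tap> it once (fake-report and run-and-collect are hypothetical names). Functions registered with add-tap are called on a single background thread, one value at a time, so one report cannot be interleaved with another inside the tap function. tap> itself does not block: it returns false and drops the value if its queue is full, and delivery is asynchronous, which is why the sketch waits on a latch.

```clojure
(import '(java.util.concurrent CountDownLatch TimeUnit))

;; Hypothetical stand-in for one API function: it builds its whole report
;; as a single string and tap>s it once instead of printing piece by piece.
(defn fake-report [n]
  (str "header " n "\nrow " n "\nsummary " n))

(defn run-and-collect
  "Runs one fake API call per id in parallel and returns the reports they
  tapped. The tap fn is called on Clojure's single tap thread, one value at
  a time, so each report arrives whole."
  [ids]
  (let [reports (atom [])
        latch   (CountDownLatch. (count ids))
        collect (fn [report]
                  (swap! reports conj report)
                  (.countDown latch))]
    (add-tap collect)
    (try
      (run! deref (mapv #(future (tap> (fake-report %))) ids))
      ;; tap> is asynchronous: wait (at most 5 s) until every report arrived
      (.await latch 5 TimeUnit/SECONDS)
      @reports
      (finally
        (remove-tap collect)))))
```

Instead of collecting, the tap function could just as well print each report, or you can point tap> at a tool like Portal.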

Matthew Twomey15:10:21

Ok - thanks @U5R6XUARE I will check out tap>!

Ben Sless16:10:40

Alternatively, send the logging to an agent, which works similarly
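
A sketch of the agent approach (printer, print-report! and fake-report are hypothetical names). An agent runs its actions one at a time, in the order they were sent, so if each API function sends its whole report as one action, the lines of that report stay together. send-off is used rather than send because the action does blocking I/O.

```clojure
;; One agent owns the console. Agent actions run one at a time, in the order
;; they were sent, so all lines printed by one action stay together.
;; The agent's state is simply the number of reports printed so far.
(def printer (agent 0))

(defn print-report!
  "Queues one whole report (a seq of lines) for printing and returns at once.
  send-off rather than send, because the action does blocking I/O."
  [lines]
  (send-off printer
            (fn [printed]
              (doseq [line lines]
                (println line))
              (inc printed))))

;; Hypothetical stand-in for the output of one API function.
(defn fake-report [n]
  [(str "header " n) (str "row " n) (str "summary " n)])
```

Usage: (run! deref (mapv #(future (print-report! (fake-report %))) [1 2 3])), then (await printer) if everything must be printed before moving on. Output printed elsewhere with a plain println is not serialized by the agent and can still interleave.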

Matthew Twomey16:10:04

Oooh that’s a new one to me (agents). Ok, will read up - ty.

pithyless16:10:56

Consider how important order is for your use-case. You've obviously hit an issue where printing to console is intermingled. But is it enough if every "print" is on a separate line? Or do you need all the prints from a single completed API to show up together, before a different API (irrespective of whether you start them async together or not)? Or do you need the API results to show up in a specific order (but don't mind if the other APIs are running in the background - just won't appear in the output out-of-order)?

pithyless16:10:17

^ If any of those are true, you may consider that the API call should do its work asynchronously (e.g. by returning a promise or swapping some atom), but the work should not have a visible impact on the world (i.e. the API functions return transformed data, and you can then control how and when to print it irrespective of when it completes).

Matthew Twomey17:10:32

In my use case - I am unconcerned with the order that each API finishes (and produces output). However, I do want the (multi-line) output of each function to be “together” in the output. For now, I did actually do as you’re hinting - for each function I gather the output into an atom, and only print / expose the output at the end. This seems to work ok.

pithyless17:10:22

Are you sure you need an atom? Can't the function map over the data and return a collection without swapping on an atom?

Matthew Twomey17:10:47

I could do that - but I think that then I would need to wait for all the functions to finish before I can see any output. I’m wanting to see the output of each function as it finishes.
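
Returning data does not necessarily mean waiting for every function before seeing any output. A sketch, assuming each API function is rewritten as a zero-argument function that returns its report as a vector of lines (fake-api-report and print-as-completed! are hypothetical names): java.util.concurrent.ExecutorCompletionService hands back results in the order they complete, and only the calling thread prints, so no atom or lock is needed.

```clojure
(import '(java.util.concurrent Executors ExecutorCompletionService))

;; Hypothetical stand-in for one slow API function: it returns its whole
;; report as data (a vector of lines) and prints nothing itself.
(defn fake-api-report [n]
  (Thread/sleep (long (* 10 n)))
  [(str "== report " n " ==") (str "row " n) (str "summary " n)])

(defn print-as-completed!
  "Runs the zero-argument report-fns in parallel and prints each report whole,
  on the calling thread, in the order the reports finish."
  [report-fns]
  (let [pool (Executors/newFixedThreadPool (int (max 1 (count report-fns))))
        ecs  (ExecutorCompletionService. pool)]
    (try
      (doseq [f report-fns]
        (.submit ecs ^Callable f))
      ;; .take blocks until the next report is done, whichever one that is
      (dotimes [_ (count report-fns)]
        (doseq [line (.get (.take ecs))]
          (println line)))
      (finally
        (.shutdown pool)))))
```

Usage: (print-as-completed! [#(fake-api-report 1) #(fake-api-report 2)]). Each report appears as soon as it finishes, and the report functions themselves stay free of side effects.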

pithyless17:10:52

So, I'm not quite sure how the atom is helping you achieve this, but I may be misinterpreting your code. Nevertheless, I want to throw out one more hack I've used before: if it's just a question of making sure the print output is not intermingled, you can wrap the print calls in a locking block. Just make sure you don't put more than the necessary stuff inside the locking block. :)
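
A sketch of the locking hack (print-lock, print-report! and fake-report are hypothetical names). All report printing goes through one shared lock object; building the report (the slow API call) happens outside the lock and only the printing happens inside it. A dedicated lock object is used here rather than *out*, because *out* is dynamic and a thread with a different binding would lock a different object.

```clojure
;; One shared lock object for every thread that prints reports.
(defonce print-lock (Object.))

(defn print-report!
  "Prints all lines of one report while holding print-lock, so lines from
  different reports cannot interleave. Only the printing is inside the lock."
  [lines]
  (locking print-lock
    (doseq [line lines]
      (println line))))

;; Hypothetical stand-in for one API function: the slow work (building the
;; report) happens before, and outside, the lock.
(defn fake-report [n]
  [(str "header " n) (str "row " n) (str "summary " n)])
```

Usage: (run! deref (mapv #(future (print-report! (fake-report %))) [1 2 3])). Only prints that go through print-report! are serialized; a stray println elsewhere can still land between reports.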

Matthew Twomey17:10:30

Oooh, ok - I will look into that, I haven’t heard of it (being pretty new here). Thank you very much!

Matthew Twomey17:10:44

As an aside on what I’m doing: each function in addition to parsing API results, outputs other things for human consumption - like a header row, description, summary results, …etc. So it’s not just the doseq through the API results, it’s this other stuff for humans.

Matthew Twomey17:10:02

So by using an atom, I can build all this output into a single string, then print it with a single print at the end - so that it’s not interrupted.

Matthew Twomey17:10:10

But perhaps locking would be simpler to use in this case; I will compare.

pithyless17:10:58

How are you ensuring that 2 "completed" atoms don't start printing at the same time?

Matthew Twomey17:10:19

It seems that a single print is “uninterruptible” - so that doesn’t “appear” to be an issue here.

Matthew Twomey17:10:16

(since the entire output of a given function, is in that atom and being printed with a single print statement)

Matthew Twomey17:10:13

(each function has its own atom in a let)

pithyless18:10:27

print is not "uninterruptible"

user=> (do (doall (repeatedly 3 #(future (print (interpose " " (range 10)))))) nil)
nil(((
0 0     10 1     2    2   1   3   3user=>    2
 4   3       4  5 4  5      6  5  6    7 6       78     7 9 )8     89   9))

But passing a single string argument to print does appear to be synchronized (I'm not sure you can depend on this with every JVM implementation):

user=> (do (doall (repeatedly 3 #(future (print (apply str (interpose " " (range 10))))))) nil)
nil
0 1 2 3 4 5 6 7 8 90 1 2 3 4 5 6 7 8 90 1 2 3 4 5 6 7 8 9

Matthew Twomey18:10:05

Ooooh, interesting - good to know, thanks!

Matthew Twomey18:10:47

I think I’ll definitely look at the locking in this case, just to be extra safe.

Matthew Twomey19:10:21

For anyone that searches this - this thread was also very illuminating to me: https://stackoverflow.com/questions/18662301/make-clojures-println-thread-safe-in-the-same-way-as-in-java

walterl20:10:20

Wouldn't it be simpler to enqueue your output strings on a separate core.async channel or queue, and have one dedicated consumer of that queue that does the actual printing? That way you separate processing the API output from printing it, without having to coordinate the output yourself.
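
A sketch of the single-consumer idea, using java.util.concurrent.LinkedBlockingQueue rather than core.async (start-printer!, run-all!, fake-report and the ::stop sentinel are hypothetical). It also addresses the interleaving concern: as long as each function puts its whole report on the queue as one item, the consumer prints reports strictly one after another.

```clojure
(import '(java.util.concurrent LinkedBlockingQueue))

;; Hypothetical stand-in for one API function: it returns its WHOLE report
;; as one string, so it can be put on the queue as a single item.
(defn fake-report [n]
  (str "header " n "\nrow " n "\nsummary " n "\n"))

(defn start-printer!
  "Starts the single consumer: prints reports taken from queue, in the order
  they were enqueued, until it takes ::stop. Returns a future that completes
  when the consumer stops."
  [^LinkedBlockingQueue queue]
  (future
    (loop []
      (let [report (.take queue)]
        (when-not (= ::stop report)
          (print report)
          (flush)
          (recur))))))

(defn run-all!
  "Hypothetical driver: one producer per id puts its whole report on the
  queue; then the printer is stopped and awaited."
  [ids]
  (let [queue   (LinkedBlockingQueue.)
        printer (start-printer! queue)]
    (run! deref (mapv #(future (.put queue (fake-report %))) ids))
    (.put queue ::stop)
    @printer))
```

The producers never touch the console, so the printing policy (order, formatting, destination) lives in one place.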

Matthew Twomey20:10:22

I don’t think this solves the issue of the “prints” from one function being interrupted by the “prints” from another function.

walterl20:10:42

With a queue there will only be a single printing function, printing items from the queue. Your parallel processing functions just feed their output to the queue rather than printing it themselves.

Matthew Twomey20:10:49

Yes - but the issue is that a single function will feed several things and in-between those things, another function may also feed into the queue - causing the queue to have intermingled content.

Matthew Twomey20:10:20

I’m trying to have all the output from a single function show up “together”.

walterl20:10:46

Is the output of a single function too large to collect and add to the output queue together?

Matthew Twomey20:10:57

No - it’s not. This is why I was simply collecting it into an atom.

Matthew Twomey20:10:30

Sending it to a queue, item by item, means that other functions may also send to the queue in-between those.

Matthew Twomey20:10:59

It does appear that simply wrapping my prints in (locking *out* ...) solves the issue in a simple way.

didibus23:10:35

I think you might just be firing things strangely? Try this:

(->>
  [(future (fn1)) (future (fn2))]
  (mapcat deref)
  (run! println))

didibus00:10:03

Another issue is that print and println with more than one argument are not synchronized, so the printing of each argument can be interleaved with other print calls. So if you do something like (apply println [...]) in parallel, you'll get interleaved printing. The same thing will happen if you (map println [...]) in parallel; there it's not println that isn't synchronized, it's your map. You can use locking to synchronize them, or an agent, etc. But in your case you can just try what I showed: simple and easy. It does mean printing only starts once the first function in the vector returns, so it may start printing later than it could, but I think the simplicity is worth it; you're still calling the APIs in parallel.

didibus00:10:05

Or if you really cared, just use str

(->> [fn1 fn2]
     (mapv #(future (println (apply str (%)))))
     (run! deref))

And you can use str/join instead if you want a separator between the pieces.
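
One caveat when relying on a single println: println writes its argument and the trailing newline as separate writes (it calls prn, which prints and then calls newline), so two parallel println calls can still produce both reports on one line followed by two newlines. A hedged variant, assuming each function returns its report as a seq of lines (print-whole! is a hypothetical name): join everything, newline included, into one string and hand that to print. Per the experiment earlier in the thread this appears to come out as one uninterrupted write, but that is observed behavior, not a documented guarantee.

```clojure
(require '[clojure.string :as str])

(defn print-whole!
  "Prints all lines of one report via a single call to print: the lines are
  joined and the trailing newline is added before printing, so no separate
  write is needed for the newline."
  [lines]
  (print (str (str/join "\n" lines) "\n"))
  (flush))
```

Usage, with fn1 and fn2 from the message above returning seqs of lines: (->> [fn1 fn2] (mapv #(future (print-whole! (%)))) (run! deref)).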

Matthew Twomey07:10:36

Thanks @U0K064KQV, I will give this a try!

pithyless07:11:16

Hey @U0250GGJGAE - had déjà vu reading this thread :) https://clojurians.slack.com/archives/CLX41ASCS/p1667492157486109?thread_ts=1667490033.012539&cid=CLX41ASCS

Matthew Twomey21:11:58

Ha yes! @U05476190
