2021-12-11
Channels
- # adventofcode (42)
- # asami (13)
- # babashka (40)
- # beginners (25)
- # calva (39)
- # cider (18)
- # circleci (6)
- # cljs-dev (3)
- # clojure (39)
- # clojure-europe (16)
- # clojure-norway (9)
- # clojure-uk (2)
- # clojurescript (42)
- # datalevin (4)
- # datomic (23)
- # fulcro (33)
- # jobs (1)
- # malli (26)
- # minecraft (1)
- # off-topic (88)
- # pedestal (3)
- # polylith (8)
- # re-frame (6)
- # remote-jobs (2)
- # shadow-cljs (20)
- # tools-deps (12)
- # xtdb (5)
I found this snippet gives the wrong error message in babashka:
(do
  (defn rf [a b] (conj a b))
  (reduce rf []))
;; Wrong number of args (0) passed to: clojure.core/reduce
whereas in Clojure it's:
Wrong number of args (0) passed to: user/rf
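(Side note, assuming standard Clojure reduce semantics: with no init value and an empty collection, reduce calls the reducing function with zero arguments to produce the result, which is why the arity error should name the reducing function rather than reduce itself. A minimal sketch:)
(defn rf [a b] (conj a b))
(rf)               ;; Wrong number of args (0) passed to: user/rf
(reduce rf [] [])  ;; supplying an init value avoids the zero-arg call => []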
I'm running a command with very long output (~5mb of text in total) and I need to process each line. I'm using a similar approach as described here: https://github.com/babashka/process#processing-output but it's still a bit slow. Any general advice for processing large outputs?
@alex.sheluchin do you need to process the output as you go or can you do it first by writing a file or in-memory string and then process the file?
This approach might help with not holding everything in memory at once https://blog.michielborkent.nl/transducing-text.html
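(Roughly the idea, as a sketch based on the streaming approach from the babashka/process README linked above rather than :out :string: read the process's stdout through a reader and fold over its lines without building one big string. The git log command and the line-length sum are stand-ins:)
(require '[babashka.process :as bp]
         '[clojure.java.io :as io])

;; Stream stdout line by line instead of accumulating the whole output:
(with-open [rdr (io/reader (:out (bp/process ["git" "log"] {:err :inherit})))]
  (transduce (map count) + 0 (line-seq rdr)))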
@borkdude I've tried to output to string, split lines, and then do some processing, but it's still pretty slow. Like, with 10K lines of output:
(time
 (doall
  (doseq [part (->> (bp/process cmd
                                {:dir dir
                                 :err :inherit
                                 :out :string
                                 :shutdown bp/destroy})
                    deref
                    :out
                    string/split-lines)])))
"Elapsed time: 11573.721905 msecs"
I'll read your transducing article next.
$ bb -e '(-> (babashka.process/process ["cat" "/tmp/lines.txt"] {:out (io/file "/tmp/out.txt")}) deref)'
(prn "processing")
@(bp/process cmd
{:dir dir
:err :inherit
:out (io/file out-file)
:shutdown bp/destroy})
; (prn "sleeping")
; (Thread/sleep 2000)
(prn "opening")
(with-open [rdr ( out-file)]
(def all-data (doall (->> (line-seq rdr)
(partition 2)
(map #(string/join " " %))))))))
The process call and its deref here take about 10s with 10K lines.
Unrelated question: I'd love to know how you generated this flamegraph. What can I read to learn more about it?
@U051S5XR3 https://github.com/clojure-goes-fast/clj-async-profiler http://clojure-goes-fast.com/blog/clj-async-profiler-tips/
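(For the flamegraph itself, a minimal sketch of typical clj-async-profiler usage on the JVM side; the exact API and output location depend on the version, see the links above:)
(require '[clj-async-profiler.core :as prof])

;; Needs the JVM started with -Djdk.attach.allowAttachSelf.
;; Profiles the body and writes an interactive flamegraph,
;; by default under /tmp/clj-async-profiler/results/.
(prof/profile
  (reduce + (map inc (range 1000000))))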
Looks like a lot of it is just io, I guess as expected. Does this seem like there are options to reduce processing time or am I just running into my machine's limits here?
heh, it's just git log. You are right, it's slow itself:
real    0m12.475s
user    0m10.921s
sys     0m1.028s
No, that is right, it's the command itself. With the --no-pager flag it takes just as long.
you could try direct Java interop like this:
(require '[clojure.java.io :as io])
(-> (doto (java.lang.ProcessBuilder. ["git" "log" "--no-pager"])
      (.redirectOutput (java.lang.ProcessBuilder$Redirect/to (io/file "/tmp/output.txt"))))
    (.start)
    (.waitFor))
Perhaps this is faster. If it is, then I can improve something in bb.process for outputting to files, so let me know in that case.
I think since it's git log itself that's slow here, this won't make much of a difference, but I'm happy to try it.
@alex.sheluchin with dir:
(require '[clojure.java.io :as io])
(-> (doto (java.lang.ProcessBuilder. ["cat" "/tmp/lines.txt"])
      (.redirectOutput (java.lang.ProcessBuilder$Redirect/to (io/file "/tmp/output.txt")))
      (.directory (io/file "subdir")))
    (.start)
    (.waitFor))
@borkdude "Elapsed time: 11457.470212 msecs" Not much of a change there I'm afraid.