2021-12-11
Channels
- # adventofcode (42)
- # asami (13)
- # babashka (40)
- # beginners (25)
- # calva (39)
- # cider (18)
- # circleci (6)
- # cljs-dev (3)
- # clojure (39)
- # clojure-europe (16)
- # clojure-norway (9)
- # clojure-uk (2)
- # clojurescript (42)
- # datalevin (4)
- # datomic (23)
- # fulcro (33)
- # jobs (1)
- # malli (26)
- # minecraft (1)
- # off-topic (88)
- # pedestal (3)
- # polylith (8)
- # re-frame (6)
- # remote-jobs (2)
- # shadow-cljs (20)
- # tools-deps (12)
- # xtdb (5)
I found this snippet gives the wrong error message in babashka:
(do
  (defn rf [a b] (conj a b))
  (reduce rf []))
;; Wrong number of args (0) passed to: clojure.core/reduce
whereas in Clojure it's:
Wrong number of args (0) passed to: user/rf
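(Side note, assuming standard Clojure reduce semantics: with no init value and an empty collection, reduce calls the reducing function with zero arguments to produce the result, which is why the arity error should name the reducing function rather than reduce itself. A minimal sketch:)
(defn rf [a b] (conj a b))
(rf)               ;; Wrong number of args (0) passed to: user/rf
(reduce rf [] [])  ;; supplying an init value avoids the zero-arg call => []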
I'm running a command with very long output (~5mb of text in total) and I need to process each line. I'm using a similar approach as described here: https://github.com/babashka/process#processing-output but it's still a bit slow. Any general advice for processing large outputs?
@alex.sheluchin do you need to process the output as you go or can you do it first by writing a file or in-memory string and then process the file?
This approach might help with not holding everything in memory at once https://blog.michielborkent.nl/transducing-text.html
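(Roughly the idea, as a sketch based on the streaming approach from the babashka/process README linked above rather than :out :string: read the process's stdout through a reader and fold over its lines without building one big string. The git log command and the line-length sum are stand-ins:)
(require '[babashka.process :as bp]
         '[clojure.java.io :as io])

;; Stream stdout line by line instead of accumulating the whole output:
(with-open [rdr (io/reader (:out (bp/process ["git" "log"] {:err :inherit})))]
  (transduce (map count) + 0 (line-seq rdr)))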
@borkdude I've tried to output to string, split lines, and then do some processing, but it's still pretty slow. Like, with 10K lines of output:
(time
 (doall
  (doseq [part (->> (bp/process cmd
                                {:dir dir
                                 :err :inherit
                                 :out :string
                                 :shutdown bp/destroy})
                    deref
                    :out
                    string/split-lines)])))
"Elapsed time: 11573.721905 msecs"
I'll read your transducing article next.
$ bb -e '(-> (babashka.process/process ["cat" "/tmp/lines.txt"] {:out (io/file "/tmp/out.txt")}) deref)'
(prn "processing")
@(bp/process cmd
{:dir dir
:err :inherit
:out (io/file out-file)
:shutdown bp/destroy})
; (prn "sleeping")
; (Thread/sleep 2000)
(prn "opening")
(with-open [rdr ( out-file)]
(def all-data (doall (->> (line-seq rdr)
(partition 2)
(map #(string/join " " %))))))))
The process call and its deref here take about 10s with 10K lines.
Unrelated question: I'd love to know how you generated this flamegraph. What can I read to learn more about it?
@U051S5XR3 https://github.com/clojure-goes-fast/clj-async-profiler http://clojure-goes-fast.com/blog/clj-async-profiler-tips/
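(For the flamegraph itself, a minimal sketch of typical clj-async-profiler usage on the JVM side; the exact API and output location depend on the version, see the links above:)
(require '[clj-async-profiler.core :as prof])

;; Needs the JVM started with -Djdk.attach.allowAttachSelf.
;; Profiles the body and writes an interactive flamegraph,
;; by default under /tmp/clj-async-profiler/results/.
(prof/profile
  (reduce + (map inc (range 1000000))))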
Looks like a lot of it is just io, I guess as expected. Does this seem like there are options to reduce processing time or am I just running into my machine's limits here?
heh, it's just git log. You are right, it's slow itself:
real    0m12.475s
user    0m10.921s
sys     0m1.028s
No, that is right, it's the command itself. With the --no-pager flag it takes just as long.
you could try direct Java interop like this:
(require '[clojure.java.io :as io])
(-> (doto (java.lang.ProcessBuilder. ["git" "log" "--no-pager"])
      (.redirectOutput (java.lang.ProcessBuilder$Redirect/to (io/file "/tmp/output.txt"))))
    (.start)
    (.waitFor))
Perhaps this is faster. If it is, then I can improve something in bb.process for outputting to files, so let me know in that case.
I think since it's git log itself that's slow here, this won't make much of a difference, but I'm happy to try it.
@alex.sheluchin with dir:
(require '[clojure.java.io :as io])
(-> (doto (java.lang.ProcessBuilder. ["cat" "/tmp/lines.txt"])
      (.redirectOutput (java.lang.ProcessBuilder$Redirect/to (io/file "/tmp/output.txt")))
      (.directory (io/file "subdir")))
    (.start)
    (.waitFor))
@borkdude "Elapsed time: 11457.470212 msecs" Not much of a change there I'm afraid.