This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2018-01-07
Channels
- # adventofcode (4)
- # aleph (1)
- # architecture (9)
- # beginners (67)
- # boot (7)
- # boot-dev (12)
- # cider (3)
- # clojure (166)
- # clojure-austin (3)
- # clojure-estonia (1)
- # clojure-greece (2)
- # clojure-russia (5)
- # clojure-spec (1)
- # clojure-uk (4)
- # clojurescript (19)
- # cursive (1)
- # data-science (5)
- # datascript (4)
- # datomic (3)
- # docs (10)
- # emacs (24)
- # events (4)
- # fulcro (16)
- # graphql (8)
- # hoplon (2)
- # jobs-discuss (1)
- # leiningen (5)
- # off-topic (2)
- # planck (30)
- # re-frame (20)
- # reagent (36)
- # ring (3)
- # shadow-cljs (5)
- # spacemacs (1)
- # specter (2)
I’m trying to use Planck for processing a huge file. I thought it might be cool to make it stream-based and do ./blah.cljs < file > output
instead of reading the whole file, processing it, and then writing.
I’m trying to decipher the
docs, but I don’t understand how to get a stdin stream and a stdout stream
@nooga If your processing is textual and line-based, planck.core/read-line might be useful
An interesting thing that may easily occur when processing an absolutely huge file this way is head-holding.
My immediate thought on that issue is to try to build something that reduces over (iterate (fn [_] (planck.core/read-line)) nil)
I’ve got ~300MB of stanzas like:
AA=12345678
BBA=12345678 CCC=12345678
and I basically need to make it so that they end up as 12345678 12345678 12345678
in separate lines
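A minimal sketch of that transformation in Planck (the regex is mine, not from the thread, and assumes every token of interest looks like KEY=digits):

```clojure
(require '[planck.core :refer [*in* line-seq]])

;; For each stdin line, capture the digits after each `=`
;; and print every captured number on its own line.
(doseq [line (line-seq *in*)
        [_ n] (re-seq #"\w+=(\d+)" line)]
  (println n))
```

Run as planck blah.cljs < file > output. Note that this is the simple lazy-seq version; as discussed below, it may hold the head of the sequence in memory.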
Also, to really go the transducer route, you'd need the reducible iterate that is in ClojureScript head, which isn't yet in the shipping Planck. (It is easily built, though, via script/pilot in the Planck source tree.)
@nooga The reason I mention holding the head is that if blah.cljs looked like
(require '[planck.core :refer [line-seq *in*]])
(run! println (partition 2 (line-seq *in*)))
Then it would print the pairs of lines, and arguably be a clean streaming solution. But it will still hold all lines in memory, if that's a concern.
I’m writing an OpenRISC emulator in Java to have Linux running inside the JVM, and my main method of debugging is comparing CPU state logs from my emulator and OpenRISC qemu
300 MB should easily fit in RAM. The transducer approach is fun to mess around with though.
I agree. The only reason ClojureScript doesn't clear locals is because there hasn't been much demand for it. Maybe if self-hosted ClojureScript becomes popular, that could create some demand. In the meantime, I've been exploring the "reducible" route, if that makes sense. In other words, you could transduce on the sequence produced by iterate without consuming RAM. The only dirty thing about that approach for this problem is that you'd need to write to stdout as a side effect of the reduction 😞
@nooga I'm checking to see if this doesn't consume RAM:
(require '[planck.core :refer [read-line]])
(transduce (comp (drop 1)
                 (take-while some?)
                 (partition-all 2))
           (completing (fn [_ x] (println x)))
           nil
           (iterate (fn [_] (read-line)) nil))
Cool. FWIW, Planck also has -s, -f, and -O simple as ways to try to make things run faster.
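Putting those flags together with the streaming invocation from earlier in the thread (an illustrative command line, assuming planck is on your PATH):

```sh
# -s / -f enable compiler optimizations; -O simple applies simple
# Closure-level optimizations, per the flags mentioned above.
planck -s -f -O simple blah.cljs < file > output
```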
Well, FWIW, the transducer approach using iterate (with ClojureScript master) doesn't consume RAM
On Planck master, line-seq is directly reducible. This allows reducing over gigantic files without consuming RAM, avoiding ClojureScript head-holding.
This example is over a 1 GB file.
cljs.user=> (require '[planck.core :refer [line-seq]]
#_=> '[planck.io :as io]
#_=> '[clojure.string :as string])
nil
cljs.user=> (reduce
#_=> (fn [c line]
#_=> (cond-> c
#_=> (string/starts-with? line "a") inc))
#_=> 0
#_=> (line-seq (io/reader "big.txt")))
134217728