This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-07-19
hi, I have a big CSV. Is it possible to lazily read the lines in reverse order, from tail to head?
You can reverse the file with tac file.csv > rev_file.csv and then read rev_file.csv normally. Definitely an option if the file isn't too large.
surely someone has made a "from end buffered line reader" on top of mmap
it's inefficient (requires consuming the whole thing and working backward) but doable
actually, mmap might be the one way to do this that doesn't use heap space inefficiently (if I'm remembering the API correctly)
you can use the memory mapped API to do this without putting the whole contents in heap, it even lets you skip to the end and work toward the front, without consuming what's in between https://howtodoinjava.com/java7/nio/memory-mapped-files-mappedbytebuffer/
I'm curious why that would be useful. @vachichng
I assume you have something accumulated in time order and you want to process from new to old
next question is if anybody hooked that up to read lines in backward order yet
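A rough sketch of what hooking the memory mapped API up to backward line reading might look like. The class name ReverseLines is hypothetical, and the sketch assumes a UTF-8 file with "\n" line endings that is small enough for a single mapping (under about 2 GiB); it is not a tested library, just an illustration of scanning backward for newline bytes.

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.Iterator;
import java.util.NoSuchElementException;

/** Hypothetical helper: iterates the lines of a memory-mapped file from last to first. */
final class ReverseLines implements Iterator<String> {
    private final MappedByteBuffer buf;
    private int end; // exclusive end offset of the next line to return; -1 when exhausted

    ReverseLines(Path file) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            // Assumption: the file fits in one mapping (at most Integer.MAX_VALUE bytes).
            // The mapping stays valid after the channel is closed.
            buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, ch.size());
        }
        end = buf.limit();
        if (end == 0) {
            end = -1; // empty file: no lines at all
        } else if (buf.get(end - 1) == '\n') {
            end--; // ignore a single trailing newline
        }
    }

    @Override
    public boolean hasNext() {
        return end >= 0;
    }

    @Override
    public String next() {
        if (end < 0) throw new NoSuchElementException();
        // Walk backward until the byte before 'start' is a newline or we hit the file start.
        int start = end;
        while (start > 0 && buf.get(start - 1) != '\n') start--;
        byte[] bytes = new byte[end - start];
        buf.position(start);
        buf.get(bytes);
        end = start - 1; // step over the newline; becomes -1 at the start of the file
        return new String(bytes, StandardCharsets.UTF_8);
    }
}
```

Only the pages that are actually touched get paged in by the OS, so iteration starts at the end of the file without reading the beginning first. Scanning for the byte 0x0A is safe in UTF-8 because multi-byte sequences never contain it.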
clever use of tac via ProcessBuilder, with its output going to an input stream instead of a file, might allow something similar via the OS(?)
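A minimal sketch of the tac plus ProcessBuilder idea, streaming the reversed lines straight from the child process instead of writing rev_file.csv. It assumes a tac binary is on the PATH (GNU coreutils; not available by default on every OS), and the class and method names here are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.function.Consumer;

/** Hypothetical helper: streams a file's lines in reverse order by piping it through tac. */
final class TacReader {
    static void forEachLineReversed(Path file, Consumer<String> action)
            throws IOException, InterruptedException {
        // Assumption: "tac" resolves on the PATH of the JVM process.
        Process p = new ProcessBuilder("tac", file.toString())
                .redirectError(ProcessBuilder.Redirect.INHERIT) // let tac's errors show up on stderr
                .start();
        // Read tac's stdout line by line; nothing is written to an intermediate file.
        try (BufferedReader r = new BufferedReader(
                new InputStreamReader(p.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = r.readLine()) != null) {
                action.accept(line);
            }
        }
        int exit = p.waitFor();
        if (exit != 0) {
            throw new IOException("tac exited with status " + exit);
        }
    }
}
```

Whether this is lazy in practice depends on tac itself: for a regular file it can seek from the end, but for piped input it has to buffer everything first.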
@drewverlee exactly as Alex said: I have a big CSV of timestamped data that I have to process in windows by timeframe. The original source has it ordered from recent to old, but the window function needs old to recent.
maybe ReversedLinesFileReader would work, http://commons.apache.org/proper/commons-io/apidocs/org/apache/commons/io/input/ReversedLinesFileReader.html
> Reads lines in a file reversely (similar to a BufferedReader, but starting at the last line). Useful for e.g. searching in log files.
nice, I'd hoped someone made that
But if you have to process the whole thing then the output will be the same, no? The window closing logic should handle any order of timestamped events.
many reducing processes aren't associative, ordering can matter a lot
Yes, but I think, for instance, Onyx's semantics express the window closing trigger this way.
@drewverlee no, ordering matters because the window function needs to know the oldest timestamp to create sequence
@drewverlee the window function looks like this: every element between an initial time and an hour after it
starting a thread so we don't lock up beginners. Is the CSV ordered by timestamp?
> no, ordering matters because the window function needs to know the oldest timestamp to create sequence
What does "create a sequence" mean?
is there a side effect as part of that processing?
that those need to be ordered?
Gotcha.
1. It's weird to have ordered the data this way if that's not how it's used. It means readers always pay a performance penalty, and that is the main issue.
2. Out-of-order data processing is unavoidable and has to be accounted for.
3. Streaming frameworks with windowing semantics can read out-of-order data. Say we had data from 3 to 5, so it's basically ordered in your CSV like 5:01, 4:01, 3:01. It would read 5:01 first, then 4:01, and the window trigger would close on 5-6 because we saw 4:01 and release that data to Kafka (or whatever). The thing reading Kafka can and should also account for out-of-order timestamps (because this is unavoidable). E.g. let's say you do reverse the CSV and send 3-4 first: what if that network call fails (and has to retry) but 4-5 succeeds, so Kafka gets 4-5, 2-3, 5-6 regardless.
If that's more or less all well understood then the solutions suggested by others about reverse reading a CSV are likely good ones. I don't have any particular insight there 🙂
@drewverlee thanks, I'm not familiar with Kafka, I'll have a look at that. I was doing the transformations with transducers to feed the Kafka topic, because it turns out that the CSV is not ordered properly, so a naive reverse of the CSV is still not ordered as I need it.
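Since the CSV turns out not to be strictly ordered, one option is to assign each row to its window by timestamp rather than by position in the file. Below is a small sketch of that idea for the "initial time and an hour after" windows described above. The Row shape, the epoch-seconds representation, and all names are assumptions for illustration, and it holds all rows in memory, so it only suits data that fits in the heap.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch: groups rows into one-hour windows anchored at the oldest timestamp. */
final class HourWindows {
    /** Assumed row shape: a timestamp in epoch seconds plus the raw CSV payload. */
    static final class Row {
        final long epochSeconds;
        final String payload;

        Row(long epochSeconds, String payload) {
            this.epochSeconds = epochSeconds;
            this.payload = payload;
        }
    }

    static final long WINDOW_SECONDS = 3600;

    /** Returns window start (epoch seconds) -> rows, with windows sorted from old to recent. */
    static TreeMap<Long, List<Row>> group(List<Row> rows) {
        TreeMap<Long, List<Row>> windows = new TreeMap<>();
        if (rows.isEmpty()) return windows;

        // First pass: find the oldest timestamp, which anchors the first window.
        long oldest = Long.MAX_VALUE;
        for (Row r : rows) oldest = Math.min(oldest, r.epochSeconds);

        // Second pass: bucket every row by how many whole hours it lies after the oldest one.
        for (Row r : rows) {
            long index = (r.epochSeconds - oldest) / WINDOW_SECONDS;
            long start = oldest + index * WINDOW_SECONDS;
            windows.computeIfAbsent(start, k -> new ArrayList<>()).add(r);
        }
        return windows;
    }
}
```

Because the TreeMap is keyed by window start, iterating it yields windows from old to recent regardless of the order the rows arrived in. Rows inside a window keep their arrival order, so they would still need a sort if the order within a window matters.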
hello again everyone! i have a question: is there a preferred structure in a repo for having both a client (cljs) and api (clj)?
i currently used lein new shadow-cljs <app-name> +reagent but curious if there is another way to generate both the client template and a plain lein project? maybe i should just lein new my-app and then lein new shadow-cljs inside of that app?
luminus template has really good defaults and the overall structure(clj/cljs/cljc). lein new luminus new-app +cljs +shadow-cljs
@U0113AVHL2W -- thank you so much! ill take a peek
haha sorry mate. I recommended giving Luminus a try and it looks like you experienced exactly the same issue I was fighting in the recent Luminus template 😄 Infinite build completed/compiling. Referenced issue: https://github.com/luminus-framework/luminus/issues/270 You may need to upgrade your shadow-cljs to 2.10.16 or use a roughly month-old Luminus template as a workaround.
@U0113AVHL2W thank you a ton! i got it resolved when you responded to me in #shadow-cljs. i appreciate your support and guidance on this!