hi, I have a big csv, is it possible to lazily read the lines in reverse order, from tail to head?


You can reverse the file with tac file.csv > rev_file.csv and then read rev_file.csv normally. Definitely an option if the file isn't too large.


surely someone has made a "from end buffered line reader" on top of mmap


it's inefficient (requires consuming the whole thing and working backward) but doable


actually, mmap might be the one way to do this that doesn't use heap space inefficiently (if I'm remembering the API correctly)


you can use the memory mapped API to do this without putting the whole contents in heap, it even lets you skip to the end and work toward the front, without consuming what's in between
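A minimal sketch of that idea, assuming the JVM's `FileChannel.map` is the memory-mapped API meant here. The class and method names (`ReverseLines`, `forEachLineReversed`) are made up for illustration, and the code assumes `\n` line endings, UTF-8 text, and a file under 2 GiB (the limit of a single `MappedByteBuffer`):

```java
import java.io.IOException;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.function.Consumer;

// Hypothetical helper, not an existing library class.
final class ReverseLines {
    // Calls `sink` once per line, last line first. Only the bytes of the
    // current line are copied onto the heap; the rest stays in the mapping.
    static void forEachLineReversed(Path file, Consumer<String> sink) throws IOException {
        try (FileChannel ch = FileChannel.open(file, StandardOpenOption.READ)) {
            long size = ch.size();
            if (size == 0) return;
            if (size > Integer.MAX_VALUE) {
                throw new IOException("file too large for a single mapping");
            }
            MappedByteBuffer buf = ch.map(FileChannel.MapMode.READ_ONLY, 0, size);
            int end = (int) size;                   // exclusive end of the current line
            if (buf.get(end - 1) == '\n') end--;    // ignore one trailing newline
            for (int i = end - 1; i >= -1; i--) {
                // i == -1 marks the start of the file, i.e. the first line.
                if (i == -1 || buf.get(i) == '\n') {
                    int start = i + 1;
                    int len = end - start;
                    if (len > 0 && buf.get(end - 1) == '\r') len--;  // strip CR of CRLF
                    byte[] line = new byte[len];
                    buf.position(start);
                    buf.get(line, 0, len);
                    sink.accept(new String(line, StandardCharsets.UTF_8));
                    end = i;
                }
            }
        }
    }
}
```

Calling `ReverseLines.forEachLineReversed(path, line -> ...)` hands lines to the callback one at a time, so the caller decides how much to keep. Larger files would need several mappings stitched together, which this sketch does not attempt.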


I'm curious why that would be useful. @vachichng


I assume you have something accumulated in time order and you want to process from new to old


next question is if anybody hooked that up to read lines in backward order yet


clever usage of tac and ProcessBuilder with output to an iostream instead of file might allow similar via the OS(?)
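A rough sketch of the tac-plus-ProcessBuilder idea, assuming a `tac` binary is on the PATH (it ships with GNU coreutils, so it may be missing on other systems). `TacReader` and `forEachLineReversed` are hypothetical names:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Path;
import java.util.function.Consumer;

// Hypothetical helper: lets the OS reverse the file and streams the result.
final class TacReader {
    static void forEachLineReversed(Path file, Consumer<String> sink)
            throws IOException, InterruptedException {
        // Assumption: `tac` exists on this machine.
        Process proc = new ProcessBuilder("tac", file.toString())
                .redirectError(ProcessBuilder.Redirect.INHERIT)
                .start();
        // Read tac's stdout line by line instead of writing a reversed file.
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(proc.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                sink.accept(line);
            }
        }
        int exit = proc.waitFor();
        if (exit != 0) {
            throw new IOException("tac exited with status " + exit);
        }
    }
}
```

This avoids the intermediate rev_file.csv, at the cost of depending on an external program.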


@drewverlee exactly as Alex said, I have a big csv of timestamped data that I have to process windowed by timeframe. The original source has it ordered from recent to old, but the window function needs old to recent


> Reads lines in a file reversely (similar to a BufferedReader, but starting at the last line). Useful for e.g. searching in log files.


nice, I'd hoped someone made that


But if you have to process the whole thing then the output will be the same, no? The window closing logic should handle any order of timestamped events.


many reducing processes aren't associative, ordering can matter a lot


Yes, but I think, for instance, Onyx's semantics express the window closing trigger this way.


@drewverlee no, ordering matters because the window function needs to know the oldest timestamp to create the sequence


@drewverlee the window function looks like: every element between an initial time and an hour after it


starting a thread so we don't lock up beginners. Is the csv ordered by timestamp?


> no, ordering matters because the window function needs to know the oldest timestamp to create sequence
What does "create a sequence" mean?


a sequence of grouped data created by the window function


yeah, it is ordered by new to old


but, I have to process it from old to new


is there a side effect as part of that processing?


that those need to be ordered?


yes, I have to send to a kafka topic to do further processing


Gotcha.
1. It's weird to have ordered the data this way if that's not how it's used. It means readers always pay a performance penalty, and that is the main issue.
2. Out-of-order data processing is unavoidable and has to be accounted for.
3. Streaming frameworks with windowing semantics can read out-of-order data. Say we had data from 3 to 5, so it's basically ordered in your csv like 5:01, 4:01, 3:01. It would read 5:01 first, then 4:01, and the window trigger would close the 5-6 window because we saw 4:01, and release that data to Kafka (or whatever).
The thing reading Kafka can and should also account for out-of-order timestamps (because this is unavoidable). E.g. let's say you do reverse the csv and send 3-4 first: what if that network call fails (and has to retry) but 4-5 succeeds? Then Kafka gets 4-5, 3-4, 5-6 regardless.
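To illustrate the third point, here is a small sketch of order-independent window assignment. It is plain Java, not Onyx's or Kafka's API. The `Event` shape and the `HourlyWindows.assign` name are invented for the example, and it assumes windows aligned to the clock hour (5-6, 4-5, 3-4 as in the example above) rather than windows starting at the oldest timestamp:

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical sketch: bucket events into one-hour windows in any input order.
final class HourlyWindows {
    // Assumed event shape: a timestamp plus an opaque payload.
    static final class Event {
        final Instant time;
        final String payload;
        Event(Instant time, String payload) {
            this.time = time;
            this.payload = payload;
        }
    }

    // Returns windows keyed by their start time. TreeMap keeps the keys sorted,
    // so iterating the result goes from the oldest window to the newest even
    // when the input arrived newest-first or shuffled.
    static TreeMap<Instant, List<Event>> assign(Iterable<Event> events) {
        TreeMap<Instant, List<Event>> windows = new TreeMap<>();
        for (Event e : events) {
            Instant windowStart = e.time.truncatedTo(ChronoUnit.HOURS);
            windows.computeIfAbsent(windowStart, k -> new ArrayList<>()).add(e);
        }
        return windows;
    }
}
```

This version holds every event in memory until the input is exhausted. A streaming framework would instead use a trigger to decide when a window can be released, which this sketch leaves out.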


If that's more or less all well understood, then the solutions suggested by others about reverse reading a csv are likely good ones. I don't have any particular insight there 🙂


@drewverlee thanks, I'm not familiar with kafka, will have a look at that. I was doing the transformations with transducers to feed the kafka topic, because it turns out that the csv is not ordered properly, so a naive reverse of the csv is still not ordered as I need it.


hello again everyone! i have a question: is there a preferred structure in a repo for having both a client (cljs) and api (clj)?


i currently used lein new shadow-cljs <app-name> +reagent but i'm curious if there is another way to generate both the client template and a plain lein project? maybe i should just lein new my-app and then lein new shadow-cljs inside of that app?


the luminus template has really good defaults and a good overall structure (clj/cljs/cljc): lein new luminus new-app +cljs +shadow-cljs


@ -- thank you so much! ill take a peek


haha sorry mate. I recommended you give Luminus a try, and it looks like you hit exactly the same issue I was fighting in the recent Luminus template 😄 (the "Infinite build completed/compiling" issue). You may need to upgrade your shadow-cljs to 2.10.16, or use a roughly month-old Luminus template as a workaround


@ thank you a ton! i got it resolved when you responded to me in #shadow-cljs. i appreciate your support and guidance on this!