This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-11-12
Good morning
@ordnungswidrig cool. Glad it worked. What stack did you use in the end?
http://Tech.ml and vega
Anyone here got a good suggestion for something like nippy but that writes out records in a file rather than just a big take it or leave it data structure?
file per record?
I've done something before with baldr and record separators, but that felt a bit janky
file per record would overwhelm the OS file handles I think. There are about 2-10 million records
I like the speed of nippy, and the compression is pretty good too, but I lose a lot of compression by needing to split things up and I lose a lot of file efficiency by having each file be a single vector of records that gets read in
looks like transit, based on fressian, might be the sweet spot? Looks like you can read and write individual objects from a stream. https://cognitect.github.io/transit-clj/#cognitect.transit/read
and there is a reducible friendly wrapper already https://gitlab.com/pjstadig/reducibles
probably not the performance you are looking for, but this is the main reason for ednl https://github.com/lambdaisland/edn-lines
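For reference, the one-record-per-line idea can be sketched with only `clojure.edn` and `clojure.java.io`. This is not the edn-lines API, just a minimal illustration of the approach; the names `write-records!` and `read-records` are hypothetical, and it will not match nippy for speed or compression.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.java.io :as io])

(defn write-records!
  "Appends each record to the file at `path` as one EDN form per line.
  Hypothetical helper, not part of any library."
  [path records]
  (with-open [^java.io.Writer w (io/writer path :append true)]
    (doseq [r records]
      (.write w (pr-str r))   ; pr-str escapes newlines inside strings
      (.write w "\n"))))

(defn read-records
  "Returns a reducible over the records in `path`. The file is opened only
  while a reduce is running and is closed even if the reduce stops early."
  [path]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (with-open [^java.io.BufferedReader rdr (io/reader path)]
        (loop [acc init]
          (if (reduced? acc)
            @acc                                   ; honour early termination
            (if-let [line (.readLine rdr)]
              (recur (f acc (edn/read-string line)))
              acc)))))))
```

Something like `(into [] (take 10) (read-records "records.edn"))` should then read only the first ten lines rather than the whole file.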
as this is often the eduction channel, I've been looking at @ben.hammond's blog post here: https://juxt.pro/blog/ontheflycollections-with-reducible and thinking that you don't need a reducible for the directory of files, just a reducible for each file type. You can then have a vector of eductions of those reducibles, which would give you all your short-circuiting / reduced? functionality if you did something like
(eduction ;; changed from sequence thanks to Ben Hammond's advice
  cat
  [(eduction mappify-record (reducible-type-1 file-1))
   (eduction mappify-record (reducible-type-1 file-2))])
you can use sequence or eduction here depending on whether you want to keep the results in memory or recalculate them each time (from what I understand)
An eduction of a reducible might not implement ISeq, at which point things start breaking
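A small sketch of both points, assuming a stand-in source that implements only `IReduceInit`. Here `reducible-range` is a hypothetical substitute for a per-file reducible, `(map inc)` stands in for `mappify-record`, and the `opened` atom just counts how many sources were actually reduced:

```clojure
(defn reducible-range
  "Hypothetical stand-in for a per-file reducible: yields 0..n-1 and
  supports reduce only, with no seq or iterator. Bumps `opened` on each reduce."
  [n opened]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      (swap! opened inc)
      (loop [acc init, i 0]
        (cond
          (reduced? acc) @acc
          (< i n)        (recur (f acc i) (inc i))
          :else          acc)))))

(def opened (atom 0))

(def all-records
  (eduction cat
            [(eduction (map inc) (reducible-range 3 opened))
             (eduction (map inc) (reducible-range 3 opened))]))

;; Reducing works and short-circuits: only the first source gets opened.
(def first-two (into [] (take 2) all-records))

;; Asking for a seq directly on an eduction over a reduce-only source is
;; expected to throw, because there is nothing to build an iterator from.
(def seq-failed?
  (try
    (seq (eduction (map inc) (reducible-range 3 (atom 0))))
    false
    (catch Exception _ true)))
```

So as long as everything downstream goes through `reduce`, `transduce`, or `into`, this shape should be fine; the breakage shows up with functions like `first` or `seq` that expect a seqable source.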