This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-01-09
Channels
- # beginners (38)
- # boot (160)
- # cider (143)
- # cljs-dev (62)
- # cljsjs (2)
- # cljsrn (3)
- # clojure (278)
- # clojure-austin (8)
- # clojure-brasil (5)
- # clojure-greece (2)
- # clojure-italy (11)
- # clojure-russia (188)
- # clojure-sg (2)
- # clojure-spec (118)
- # clojure-uk (103)
- # clojurescript (87)
- # core-async (8)
- # cryogen (2)
- # cursive (12)
- # datomic (119)
- # emacs (13)
- # hoplon (4)
- # immutant (12)
- # off-topic (12)
- # om (54)
- # om-next (5)
- # onyx (1)
- # pedestal (2)
- # portland-or (2)
- # re-frame (58)
- # reagent (18)
- # ring-swagger (18)
- # rum (4)
- # spacemacs (4)
- # specter (3)
- # untangled (65)
- # yada (25)
Bore da (good morning)
@paulspencerwilliams is your talk going to be recorded?
BTW for those of you on Linux if you ever want to do a screen recording, a talk or even pair with someone and you want to show your keypresses on screen, this is quite cool https://github.com/wavexx/screenkey
Looks like I'll be walking from KX to Camden this morning!
@agile_geek not sure, but it's only an intro. Might try and run it at Brum FP in slightly more detail and should be able to get that one recorded...
@paulspencerwilliams would be good if you can
Morning all 🙂
Here's a Clojure question for a change. Is it just me who finds file I/O in Clojure awkward? If I want to read a large file lazily I end up having to either:
1. use `with-open` somewhere at the top of a stack of calls that deal with each line of the file, and `line-seq` in a function to process each line, or
2. write a custom function that opens a reader and recursively calls a closed-over helper fn (with a `lazy-seq` wrapping the body) that manually calls `.readLine`, consing the results onto its recursive call. E.g.
(defn read-messages [filename]
  (let [reader (io/reader filename)]   ; requires [clojure.java.io :as io]
    (letfn [(lines []
              (lazy-seq
                (try
                  (if-let [line (.readLine reader)]
                    (cons line (lines))
                    (.close reader))
                  (catch Exception e
                    (.close reader)    ; close on error, then rethrow
                    (throw e)))))]
      (lines))))
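For comparison, option 1 might look like this rough sketch (`process-file` and `process-line` are made-up names; `process-line` is whatever per-line fn the caller supplies):

```clojure
(require '[clojure.java.io :as io])

;; Option 1: with-open + line-seq at the top of the stack.
;; doall forces the whole mapped seq before the reader closes,
;; so the *results* must fit in memory (the file itself need not).
(defn process-file [filename process-line]
  (with-open [rdr (io/reader filename)]
    (doall (map process-line (line-seq rdr)))))
```

e.g. `(process-file "messages.txt" clojure.string/upper-case)`.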
The first option feels ugly as I have to wrap `with-open` around a stack of fns in some top-level function, when the top-level function doesn't need to know about the file. The second option is much cleaner but I have to write it every time. I would have thought this pattern was so common there would be a std abstraction/fn to do it? Or am I completely missing something obvious here?
@jonpither see point 1
I end up with the `with-open` several levels higher up, OR I have to pass the bulk of the app as a higher-order fn to a `doseq` within the fn that uses `line-seq`. That also works, but I struggle a bit with thinking that way, although maybe it's a better approach as it means the fn that reads the lines calls the fn stack that processes them (inversion of control) - but when I start nesting writes inside the fn that processes each line it can get ugly.
I don't think I'm expressing this issue well in here. I will prepare a gist or something tonight with examples of approaches I can think of and post it here to see what people think.
I vaguely remember the one time I had to deal with bigger files it wasn't straight forward.
I guess with small files you don't actually notice it when you do something wrong (eg not close file handle)
I've done it a number of times, but with many months in between, and it always takes me ages to figure out how to process them lazily in a sensible way. I seem to lose hours on this every time. But again, I am a bit slow sometimes.
@thomas with small files you can just process them eagerly and have the whole file in memory
@agile_geek never had a particular problem with the with-open idiom. I see the point about being forced to have it at the top of the computation, but that can always be hidden inside the first layer that you consider the "public interface".
If you don't mind using a library, the approach in https://github.com/thebusby/iota is quite smart.
@reborg hmm, I think it's maybe just a hang-up from my procedural programming days! The more I think about it, the more I favour having a `with-open` and a `line-seq` in a `doseq` at the top of the computation, and passing the body of the `doseq` the computation to perform on each line.
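In code, that arrangement might look like this sketch (`each-line!` is a hypothetical name):

```clojure
(require '[clojure.java.io :as io])

;; with-open + line-seq + doseq at the top of the computation;
;; the (side-effecting) per-line computation f is passed in.
(defn each-line! [filename f]
  (with-open [rdr (io/reader filename)]
    (doseq [line (line-seq rdr)]
      (f line))))
```

e.g. `(each-line! "big.log" println)` - the file is closed when the `doseq` finishes, and only one line is held at a time.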
It's just that I go through this thought process every time I need to process a file lazily, and it never seems to come naturally to me.
@agile_geek perhaps https://github.com/ring-clojure/ring/blob/1.5.0/ring-core/src/ring/util/io.clj#L11?
@agile_geek: I've done quite a lot of I/O on large files… including a lot of reifying files as lazy seqs… I've come to the conclusion though that the best way is to avoid mixing laziness & I/O altogether… Best option in my mind is to use a transducer if you can.
@rickmoynihan not sure I understand. How can you use a transducer to avoid I/O and laziness?
agile_geek: I/O is one of the main things transducers/reducers were intended to support… it's just not that widely known.
@rickmoynihan any links to examples?
agile_geek: yeah hold on...
can't remember the name of it
I've used core.async pipelines with transducers, with a `with-open` at the top of the stack of fns, but I think I was just reading each line in a `go` loop from the file and putting to a channel
from memory
you don't need to use channels - you can use the underlying java io methods with them
I'm still struggling to understand.
sorry still trying to find an example
OK - no hurry š
sorry I can't find a better open-source example, but you can take a look at this code where I was trying to benchmark whether it was worth writing a transducer-based csv parser.
Code was never meant to be used, but basically you just hook into `CollReduce` on the IOStream and write the rest as xforms, e.g. here's a really crude CSV parser:
https://github.com/RickMoynihan/transducer-csv/blob/master/src/transduce_csv/bench.clj#L36
basically you need something like this bit: https://github.com/RickMoynihan/transducer-csv/blob/master/src/transduce_csv/core.clj#L10
though I think ideally you'd move the `.readLine` out into a transducer function too
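The general idea (a rough sketch of the technique, not the linked code verbatim) is to make the source itself reducible by extending `CollReduce`, opening the reader when the reduction starts and closing it when it finishes or short-circuits:

```clojure
(require '[clojure.core.protocols :as p]
         '[clojure.java.io :as io])

;; A line-oriented reducible source: the reader's lifetime is exactly
;; the reduction's, so no with-open is needed at the call site.
(defn reducible-lines [source]
  (reify p/CollReduce
    (coll-reduce [this rf]
      (p/coll-reduce this rf (rf)))
    (coll-reduce [_ rf init]
      (with-open [^java.io.BufferedReader rdr (io/reader source)]
        (loop [acc init]
          (if-let [line (.readLine rdr)]
            (let [acc' (rf acc line)]
              (if (reduced? acc')
                @acc'
                (recur acc')))
            acc))))))
```

e.g. `(transduce (map count) + 0 (reducible-lines "big.csv"))` sums line lengths one line at a time, and `(into [] (take 2) (reducible-lines "big.csv"))` closes the reader as soon as two lines have been taken.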
there is another repo somewhere that does something similar… racks brain
@agile_geek: let me know if it still doesn't make sense
@agile_geek: Ahhh here's the other repo… IIRC I discovered this after I'd done the above, but it's basically the same idea:
https://github.com/pjstadig/reducible-stream
I think it would benefit from being wrapped up at a lower level of abstraction though… more like
Cheers
the big problem with laziness + I/O is that the life cycle of the laziness is different from the resource life cycle… i.e. how do you know when the consumer is done with the sequence? You don't, hence you have to wrap it in a `with-open` and consume it eagerly somewhere, which means somewhere you have to treat it like it's eager.
Transducers solve this by actually being eager, but separating the computational elements out (so it still feels lazy)… e.g. a transducer can know that it's the final `(take 10 ,,,)` and close the stream after the items have been taken; and as a user you don't need to bother wrapping it in `with-open` & `doall` anymore
interesting rickmoynihan, although I don't think reducers/xducers solve all problems
there's no such thing as a silver bullet 🙂
But they do solve the resource issue with I/O - whilst letting you map/filter/etc over files
if you don't want/need to reduce but just process a lazy-seq, you still have that problem of closing the I/O resource somewhere. If you reduce, you have the possibility to hook up the `.close` at the end of the reduction, which I find quite a fine idea
was looking at this from reborg above https://clojurians.slack.com/archives/clojure-uk/p1483959035002035
was wondering what the hashmap equivalent would be (other than a database (key value or otherwise))
reborg: yeah, but the problem with lazy-seqs and closing the resource is why I'm saying use a (trans|re)ducer
.... I think if you want a lazy-seq coupled to I/O you're doing it wrong. lazy-seq + I/O has historically been used a lot, but Rich has been saying it's bad since basically Clojure 1.0
@otfrom: thanks - I'd forgotten about iota.
the general theme I'm seeing is passing the data-processing function into the I/O code, rather than passing the I/O reference into some data-processing code
glenjamin: +1
I think the biggest issue with this approach is how to use it with things like ring… e.g. to my knowledge you can't really return a transducer to ring
I'm wondering if konserve might be my huckleberry https://github.com/replikativ/konserve
rickmoynihan not sure there is a wrong/right, I suppose it depends on your requirements. Personally I'd say use lazy-seqs with I/O if your app is fetching/processing/spitting out and you can't afford to bring the whole thing into memory
i think the key thing is that the lazy-seq should stay within a single function, and not get passed around
glenjamin: agreed - but then it's not really like a normal lazy seq
@glenjamin I agree about the "passing the data processing function into the IO code" but I need a way to remember this! I think I feel a blog post coming on. Usually helps me solidify my thinking
reborg: You can totally do that… but I think the problem is that clojure.core doesn't have a transducible I/O library… Having done a huge amount of lazy-seq I/O, I'm trying to slowly move away from it - because the costs of laziness are really high for large datasets and controlling the resource cycle is a pain. You can use transducer/reducer-based I/O and still not load everything into memory.
As soon as you reduce (xform or not) you are not loading everything into memory by design 🙂 It's only that not all problems can be solved with a reduce as the last form
reborg: not true…
You can e.g. reduce into a file
or an outputstream
just need to implement CollReduce
and friends
it basically entirely depends on the reducing function… e.g. here I transduce over a 1GB CSV file to count the lines… I don't hold more than a line of the file in memory at any one point in time… In fact it's actually much easier on memory/GC than the equivalent lazy-seq code: https://github.com/RickMoynihan/transducer-csv/blob/master/src/transduce_csv/bench.clj#L36
the reducing function could equally just write it to an outputstream (to do a file copy)
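A sketch of that "reduce into a file" idea (hypothetical names): the accumulator is the writer itself, and the reducing function's job is the side effect of writing each line.

```clojure
(require '[clojure.java.io :as io])

;; Copy a file line by line via transduce: the writer threads through
;; the reduction as the accumulator, so any xform can filter/transform
;; lines on the way past without holding the file in memory.
(defn copy-lines [in-file out-file xform]
  (with-open [rdr (io/reader in-file)
              wtr (io/writer out-file)]
    (transduce xform
               (fn
                 ([^java.io.Writer w] w)            ; completion arity
                 ([^java.io.Writer w ^String line]
                  (.write w line)
                  (.write w "\n")
                  w))
               wtr
               (line-seq rdr))))
```

e.g. `(copy-lines "in.txt" "out.txt" (filter seq))` copies only the non-blank lines.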
What I mean is that `(reduce + (range 1e9))` is not going to load all 1e9 items into memory at any given time. The head is consumed and garbage collected - and yes, it depends on the reducing function.
reborg: completely agree… and like I said, you can use lazy-seqs for I/O - it's something I've done a huge amount of… however, maintaining a code base with a lot of I/O done in that style makes me yearn for something better… e.g. hacks where you add `(.close rdr)` to the end of a lazy seq are far from ideal.
@agile_geek upgrade from laziness and go async - manifold streams have `on-closed` and `on-drained` callbacks which can be used for the resource close - I would certainly like to use a stream-based file I/O lib, though I haven't cared enough to implement one yet. Some stream info - https://github.com/ztellman/manifold/blob/master/docs/stream.md
mccraigmccraig: it's definitely an option, and a lot of people seem to be moving that way… I do wonder if core.async chans could compete with the performance of a transducer-based I/O solution though… also, transducers can work with core.async chans too, so they're not necessarily mutually exclusive solutions… e.g. you could probably wire a `CollReduce` reader through a transducer into a core.async channel pretty seamlessly
bigger issues are, as you say, having to be aware of the differences between blocking/non-blocking I/O - I've been assuming blocking I/O so far, as nio is not supported by the libraries I need
and whilst nio is trendy - I really don't typically need to handle 100k concurrent connections on a single server
yes, without nio you are stuffed for async models - manifold will cheat for you and use a threadpool to manage blocking actions, but there's not a lot of point to using that feature if all your i/o ops are blocking
yeah - we're pretty tied to a suite of parsers built on blocking I/O - async doesn't make much sense for us
I really do wish the clojure I/O ecosystem was a little less fractured
rickmoynihan mccraigmccraig who has the best I/O approach in your opinion or do we need to come up with one?
otfrom: I think there are necessarily two basic approaches to I/O on the JVM: async & blocking, so clojure needs at a minimum two approaches also. I think in an ideal world clojure would provide an io mechanism that extended `CollReduce` etc. to readers out of the box, and backing I/O and effects with lazy-seqs would be frowned upon or treated as deprecated/legacy. So yeah, I think there's work needed to unify this stuff - and that is probably "a new way".
i/o covers a tonne of stuff @otfrom ... i've been happy with my network i/o options recently (via aleph), but i would have liked an async file i/o lib
converting async to blocking is easy though, so i think there only needs to be one async approach !
(with some blocking lip-gloss)
there are some problems though… I had thought that because you can call `sequence` on a transducer to return a lazy seq, you could potentially interop with lazy-seq stuff whilst having a transducer-backed solution through this mechanism; but I'm not sure - it has some problems, as `Seqable` etc. aren't protocols - so it might be doable, but YMMV on that front.
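For plain in-memory collections `sequence` does give an incrementally realised seq from a transducer (the resource-lifecycle question above only bites once I/O is behind it), e.g.:

```clojure
;; sequence applies a transducer to a coll, realising items incrementally
(def xs (sequence (comp (map inc) (filter even?)) (range 10)))
;; => (2 4 6 8 10)
```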
async -> blocking can be done as mccraigmccraig says; but if the abstraction has a cost (core.async chans seem to), I'm not sure you want to require everything to be expressed as an async thing… I think transducers might provide a way out though… as you can pick an async/blocking source/destination depending on your needs and, providing you mix in the appropriate xforms, write the bulk of the transducer with no knowledge of either.
also there have been murmurings from cognitect that transducers will eventually be made to support the parallel cases of reducers too
you could e.g. do something like `(def xf (comp (->AsyncFile "/blah/file.csv") async-split-lines record->hash-map))`
or `(def xf (comp (->BlockingFile "/blah/file.csv") split-lines record->hash-map))`
But basically the user should decide.
I've not thought too much about the async case though
ok lunch time!
well, i'm all for abstractions which let the user decide
I ended up with a function that took a reader and a transducer, and applied it: https://github.com/glenjamin/hand-parser/blob/master/src/hand/parser.clj#L5
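That shape - take a source and a transducer, and run the reduction inside the resource scope - could be sketched like this (hypothetical name, not the linked code):

```clojure
(require '[clojure.java.io :as io])

;; Eagerly apply a transducer to the lines of a source; the reader's
;; lifetime exactly matches the reduction's, so nothing lazy escapes.
(defn readuce [source xform rf init]
  (with-open [rdr (io/reader source)]
    (transduce xform rf init (line-seq rdr))))
```

e.g. `(readuce "data.txt" (map count) + 0)` returns the total character count of the lines.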
lol readuction
:thumbsup:
mccraigmccraig: totally agree that i/o covers a tonne of things though - so might not be possible to have one approach to rule them all
Anyone 'attending' Clojure Remote?
I thought the tickets were really expensive, tbh. Though there might be a good reason.