@otfrom Nice, are you eagerly reading the entire input file into memory with vec intentionally? I guess if it's a small amount of processing on small files, you'll be IO bound, so eager reading makes sense.
chrisjd actually, I should remove that vec given that I'm pushing each line onto the channel
I'm thinking about changing it to pipeline-blocking to work over a seq of files with onto-chan, but I worry a bit about the resource handling without with-open (closing on exceptions), and having a (map #(close-that-stream %)) feels weird
I know that pipeline-blocking can take an exception handler that I can probably use to handle some of the things that can go wrong
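Something like this sketch is the shape I'm after (untested; process-line is a placeholder for whatever per-line work you need) - the with-open is scoped inside the transform so readers get closed even on exceptions, and the ex-handler picks up anything that throws:
```clojure
(require '[clojure.core.async :as a]
         '[clojure.java.io :as io])

(defn process-files [paths]
  (let [in  (a/chan 8)
        out (a/chan 8)
        ;; open each file inside the transform so with-open closes it
        ;; even if something throws; realise the result eagerly before
        ;; the reader goes out of scope
        xf  (map (fn [path]
                   (with-open [rdr (io/reader path)]
                     (into [] (map process-line) (line-seq rdr)))))]
    (a/onto-chan in paths)
    ;; ex-handler is called if the xf throws; returning nil means
    ;; nothing is put on the output channel for that input
    (a/pipeline-blocking 4 out xf in true
                         (fn [ex] (println "file failed:" (.getMessage ex)) nil))
    (a/<!! (a/into [] out))))
```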
chrisjd updated now thx to you https://github.com/otfrom/otfrom.copper/blob/932e387223df90dcbc0f53129bce354938f1413f/src/otfrom/copper.clj#L8-L23
going to try something with my old data friend: NHSE GP Prescriptions http://digital.nhs.uk/searchcatalogue?productid=20844&q=title%3a%22presentation+level+data%22&sort=Relevance&size=10&page=1#top
Didn’t realise the NHS published so much data like that. It must be rewarding to work on that sort of data — things that can have a real-world benefit to people.
@otfrom: some of that dataset is my older friend. Some of that data is still loaded and extracted using COBOL programs I wrote in 1989-91
The shocking thing about that data is that in the early '90s I was producing reports via COBOL, printed on a pair of car-sized laser printers, that were sent to every practice in the UK. Themes changed every quarter and included prescribing of statins and generic vs proprietary drug prescribing. The cost savings and habit changes identified then are still being highlighted now.
I have no evidence, but I suspect increased admin, pressure from partially educated patients (the internet in its negative aspect) and targets focussed on simple one-dimensional metrics (like patient waiting lists) mean GPs don't have time to make a convincing case for alternatives, or to explain to patients, so they take the easy route to keep up throughput.
TIL this month: spent a little time learning about spec. Re-found keep-indexed and map-indexed (had forgotten all about them)
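For anyone else who'd forgotten them:
```clojure
(map-indexed vector [:a :b :c])
;; => ([0 :a] [1 :b] [2 :c])

(keep-indexed (fn [i x] (when (odd? i) x)) [:a :b :c :d])
;; => (:b :d)
```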
Looking like the SoW I submitted to a client a few weeks back is not going to bear fruit until October or possibly November. Oh well, that's how it goes.
agile_geek the only thing we saw that changed GP prescribing behaviour was a Primary Care Trust (PCT) bullying the GPs to do the right thing according to the NICE guidelines and omg did they complain about that a lot
I've been back as a consultant since (2012 and 2014) and it's like The Land That Time Forgot
In terms of data analysis it was pretty cutting edge in late eighties/early nineties
Is it just me that finds core.async really hard to reason about? I struggle with how to 'unpack' values from channels within go blocks when not using the blocking take <!!
(let [c (chan)] (go (>! c "hello")) (go (let [res (<! c)] (println res))) (close! c)) ;; randomly prints nil and hello? why?
I am guessing cos println relies on side effects and is evaluated at some point after the take has parked?
but if so what's best way to grab a value from core.async and write it somewhere without using a blocking take?
In that instance, isn’t it just a race between >! and close!? If you allow >! to win over close! then it works fine.
@agile_geek: i think you have a race between close! and >! ... if close! happens before >! nothing will get put on the channel... it shouldn't matter if close! is called before <! though - close! doesn't prevent takes, just puts
(let [c (chan)] (go (>! c "hello") (close! c)) (go (let [res (<! c)] (println res))))
should be ok, though i don't have an editor to hand, so parens may be all wrong
i generally prefer to use promises rather than core.async for any async stuff which isn't about a stream of values though... core.async doesn't help you out with any error handling, and if you are thinking of a promise-chan you might as well go all the way and use a promise with built-in error handling
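e.g. with a manifold deferred, roughly (untested - fetch-thing, transform and handle-error are made up):
```clojure
(require '[manifold.deferred :as d])

;; a single async value with errors threaded through the chain,
;; rather than a channel of values
(-> (d/future (fetch-thing))
    (d/chain (fn [v] (transform v)))
    (d/catch Exception (fn [e] (handle-error e))))
```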
mccraigmccraig what I'm really after is the parallelism in pipeline. I often find myself with a seq of files (containing seqs of lines), smaller than anything I'd use spark for, that I want to do some mapping and then reducing over, and I'm trying to find good ways of doing that from a performance and clarity pov
@otfrom: if you use manifold-streams to represent those seqs, and manifold's map/reduce operations, then that should efficiently soak up all your CPU resources while providing you with the coordination operations you need
also manifold streams convert straightforwardly to/from core.async channels and have error handling too
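roughly like this (untested):
```clojure
(require '[manifold.stream :as s])

;; seq -> stream, map/reduce over it, deref the resulting deferred
@(->> (s/->source (range 1000))
      (s/map inc)
      (s/reduce + 0))
;; => 500500

;; and the conversions:
;; (s/->source ch)   - treat a core.async channel as a stream source
;; (s/connect st ch) - feed a stream into a core.async channel
```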
@otfrom: https://github.com/ztellman/manifold/blob/master/docs/stream.md is a good intro, though there are no more detailed docs afaik - browsing the fn names in the source is instructive
manifold's conversion capabilities are quite convincing (to me anyway) - i'm using it for everything async on the backend (which is just everything) and it makes it easy to pick up a core.async lib and plug it in, or convert from lazy-seqs etc
actually, that's a white lie - we interact with /tmp synchronously... everything else is async
(handling the resources is all a bit of a pain - I'm looking at this https://github.com/pjstadig/reducible-stream/ )
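(the general idea, as I understand it - a value that opens the resource only when it's reduced and always closes it; hand-rolled sketch, not necessarily that lib's actual api:)
```clojure
(require '[clojure.java.io :as io])

(defn lines-reducible [path]
  (reify clojure.lang.IReduceInit
    (reduce [_ f init]
      ;; the reader is opened only when something reduces this value,
      ;; and with-open closes it whatever happens, exceptions included
      (with-open [rdr (io/reader path)]
        (reduce f init (line-seq rdr))))))

;; usage, e.g. count all the characters:
;; (transduce (map count) + 0 (lines-reducible "data.csv"))
```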
was thinking I'd do something monadic w/the transducers (but my thinking on this is all pretty early)
mccraigmccraig do you find you get enough help w/o the haskell compiler or using something like core.spec when doing monadic stuff?
it was very much like starting off with lazy-seqs - there are some difficult-to-grok errors at first, which are difficult to relate to their cause, but you quickly learn that there are only a very few types of errors like that and get used to how to diagnose and trace them
in particular, failing to wrap return values from a monadic function and not establishing the context (the lack of static types means you often have to give cats some help in identifying the monadic type)
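e.g. roughly (untested - parse-long* is just an illustrative helper):
```clojure
(require '[cats.core :as m]
         '[cats.monad.either :as either])

;; every step has to wrap its result in right/left - mlet picks up
;; the either context from the first bound value
(defn parse-long* [s]
  (try (either/right (Long/parseLong s))
       (catch NumberFormatException e
         (either/left (.getMessage e)))))

(m/mlet [a (parse-long* "1")
         b (parse-long* "2")]
  (m/return (+ a b)))
;; => Right 3 (or a Left with the error message on failure)
```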
I'm trying to figure out what my general approach to "annoying size" data is (stuff too small for spark/hadoop but too big to do w/o thinking about it)
hmm... actually, i have a tonne of monadic promise-based code (e.g. https://github.com/employeerepublic/er-cassandra/blob/master/src/er_cassandra/model/select.clj ), but you really want stuff for processing streams... and i don't have any of that
i did just get alia's manifold-stream based queries working for cassandra, for pretty much the reason you outlined (too little for spark, too big for a single query) ... so if you have stuff in cassandra that will help you get it out https://github.com/mpenet/alia/blob/master/modules/alia-manifold/src/qbits/alia/manifold.clj#L62
that's handy. this data size in cassie is one of the things we do a lot of in kixi.hecuba and probably more in kixi.workspaces and kixi.datastore (see various on http://github.com/mastodonc)
there's a core.async equivalent in alia too - https://github.com/mpenet/alia/blob/master/modules/alia-async/src/qbits/alia/async.clj#L63