This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-07-24
Channels
- # beginners (10)
- # boot (14)
- # cider (80)
- # clara (1)
- # cljs-dev (19)
- # cljsrn (7)
- # clojure (284)
- # clojure-france (4)
- # clojure-italy (57)
- # clojure-poland (8)
- # clojure-russia (10)
- # clojure-spec (65)
- # clojure-uk (155)
- # clojurescript (156)
- # code-reviews (6)
- # copenhagen-clojurians (16)
- # cursive (10)
- # datomic (10)
- # emacs (13)
- # euroclojure (1)
- # graphql (4)
- # jobs (2)
- # lein-figwheel (3)
- # luminus (4)
- # off-topic (2)
- # onyx (42)
- # parinfer (23)
- # pedestal (1)
- # protorepl (8)
- # re-frame (34)
- # reagent (17)
- # ring-swagger (5)
- # timbre (24)
- # vim (72)
- # yada (1)
Bore da (Welsh: good morning)
I'm pretty sure I know what the answer will be, but I'll ask anyway. How was EuroClojure, @thomas?
Best of luck, @agile_geek 🙂
I have my son at home for the next 7 weeks and I work from home, so I have my own sense of dread 🙂 He's already starting to bounce off the walls
good luck @agile_geek
and #euroclojure was fantastic as always... albeit a very hot first day (aircon didn't work), @yogidevbear
Hopefully I will be able to come next year 🙂
good morning
Good morning @elise_huard thank you again for your great talk! really informative!
thank you @thomas, appreciate it! I really enjoyed the other talks as well, there was definitely a theme
@agile_geek I'd suggest hugging all the juxtaposers on the first day and all the bankers on day 2
I'm not hugging anyone in a bank
there was indeed @elise_huard - there were some very good ones. and I also really liked all the sketching that was going on.
@agile_geek bank hugs have been some of my best hugs. Even in Swiss banks in the wharf
If I were to admit that I've never got my head around core.async (never had cause / a use for it), and that I was wondering if anyone had a particular recommendation vis-à-vis a primer / intro tutorial, as I think that I have a use for it now, what would people say..?
I've ended up here several times: https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
I suspect there are a few people on this channel who could answer specific questions
the main thing to remember when playing at the REPL is that if you don't create channels with non-zero sized buffers, or don't perform operations in separate threads or go-blocks, then you can end up blocked quite easily
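A minimal REPL sketch of that blocking behaviour (assuming core.async is on the classpath; the channel names are just for illustration):

```clojure
(require '[clojure.core.async :as a])

;; A blocking put onto an unbuffered channel waits until a consumer
;; takes, so evaluating this at the REPL would hang:
;;   (a/>!! (a/chan) :x)

;; Give the channel a buffer and the put completes immediately:
(def c (a/chan 1))   ; buffer of size 1
(a/>!! c :hello)     ; => true, does not block
(a/<!! c)            ; => :hello

;; Alternatively, do the put inside a go block - the go block parks,
;; the REPL thread does not:
(a/go (a/>! (a/chan) :x))   ; returns a channel straight away
```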
…and in production be sure to catch exceptions, or at least be logging them. It can be too easy to lose errors (and the processes that threw them)
@peterwestmacott - Thanks for the hints 🙂
there was actually a EuroClojure talk (that doesn't seem to be up yet) about a library built on top of core.async, because they felt it was too low level
(I am lazy, and have spent a career building apps on top of the work of smarter folk 🙂 )
(One day, just once, I would like to have the time + inspiration to break the rule, in fact…)
another approach is to use manifold - I've not spent nearly as much time with it as I have with core.async, but so far I've found it pretty plain sailing.
its use of deferred values makes it harder to block the REPL - but obviously the buffering still has to happen somewhere!
i like manifold for general async stuff more than core.async
if you make a note of the nrepl port when you start a repl, you can always connect a second repl to the same process to recover from accidental blocking
morning!
I think I mostly want to use core.async as a way of chaining together bits of data processing. Though I do see how manifold would be good for that too
Yeah, I want to put a bunch of calls to an external API on a channel and have them "do their thing" in their own time in the background. This _seems_ to be a good use for core.async… If Manifold is as good / better, then that would be interesting too, but I am really unfamiliar…
@otfrom if your processing is of streams of data then i've found core.async is fine (as are manifold `Stream`s) ... if your ops are better suited to promises then i prefer manifold `Deferred` ... and if you are wanting streams of promises then manifold is a more complete solution (because the manifold `stream` and `deferred` work easily together)
@maleghast API calls are generally more promise-like than stream-like (unless your result is an SSE stream or websocket etc)
there are `promise-chan`s in core.async too though
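For reference, a `promise-chan` delivers its single value to every taker - a small sketch (assuming core.async 0.2.395 or later, where `promise-chan` was added):

```clojure
(require '[clojure.core.async :as a])

(def p (a/promise-chan))
(a/>!! p 42)    ; deliver the single value
(a/<!! p)       ; => 42
(a/<!! p)       ; => 42 - every subsequent take sees the same value
```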
@mccraigmccraig - This is VERY true… So Manifold might be a better option, in that case?
I've generally thought of manifold as being something I would use when I want to collect a number of remote resources together in some kind of `let`
manifold has two distinct core abstractions - the `deferred`, which is a promise with callbacks and additional machinery, and the `stream`, which is a, umm, stream of values, and supports buffering, backpressure etc
@mccraigmccraig - OK, so here's the problem… First call to the API I will get a total back as part of the result, and as there is no paging functionality "built in" on the API in question I will need to take the total and figure out how many more calls I need to make to get the "rest" of the data. I was going to do this by throwing the calls onto a channel using core.async… I am sensing that their promise-y nature would, you feel, be better suited to Manifold `Deferred`..?
a stream or a channel works quite well for that sort of query @maleghast
in my data access lib we actually use a promise of a stream for that sort of access
(which is an improvement over just a plain stream because it allows easy mixing with other calls which return promises of a value)
I was intending to stack up API operations in a channel as a queue so that a) I don't block execution and b) I don't have to use an _actual_ queue (SQS / RabbitMQ etc)
I am starting to think I may not have understood the implications of what I want to do… 🙂
i'm not sure - are you just trying to get a stream of records from a paginated api ?
@mccraigmccraig - That's the point, the API is not paginated, I need to figure out how many pages there are and stack up the calls.
@glenjamin - main thread
but you can pass an offset or something ?
@glenjamin - What I am looking for (and increasingly thinking I am misunderstanding) is a way to stack work up asynchronously in the background so that my call(s) to the external API don't lock up the whole app / program for minutes at a time.
@mccraigmccraig - yes, but there is no "next" call - the offset and page-size are arbitrary params on every call; there is no way to ask for "page 2 of my last query to the API"
a channel of results in core.async is perfectly reasonable @maleghast , as is a manifold stream of deferred responses
so: 1. Make call for first page 2. Process first page (hopefully async) 3. Use total to work out how many more ops I need to make 4. Fill up channel with calls 5. Consume channel "elsewhere". That is my thesis - does that make sense..?
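Those five steps can be sketched with core.async roughly like this. `fetch-page` is a hypothetical stand-in for the real blocking API call, and the `:total` / `:results` keys are assumptions:

```clojure
(require '[clojure.core.async :as a])

;; Hypothetical stand-in for the real API call: returns
;; {:total n :results [...]} for a given offset and page size.
(defn fetch-page [offset page-size]
  (let [total 25]
    {:total total :results (range offset (min total (+ offset page-size)))}))

(defn fetch-all-pages [page-size]
  (let [{:keys [total results]} (fetch-page 0 page-size)  ; 1+2. first call
        offsets    (range page-size total page-size)      ; 3. remaining offsets
        ;; 4. each remaining call runs on its own thread; a/thread
        ;;    returns a channel that receives the call's result
        page-chans (map #(a/thread (:results (fetch-page % page-size)))
                        offsets)]
    ;; 5. merge everything into one channel of pages, consumed "elsewhere";
    ;;    (a/go results) is a channel delivering the first page
    (a/merge (cons (a/go results) page-chans))))

;; e.g. block until all pages have arrived:
;; (a/<!! (a/into [] (fetch-all-pages 10)))
```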
core.async itself uses a threadpool, so you might not need to funnel them all down one channel
@glenjamin the app that "runs", which in turn is a hybrid API / webapp
@glenjamin - I would like to do as much concurrently as possible, but it's not a deal-breaker; serial is fine as long as the work can be "kicked off" and left going.
order of results being processed is not important, so concurrency (particularly if it makes the whole thing faster) would be great.
I need to make one call to find out how many results match the request; the vendor / curator of the API in question is not prepared to produce a simplified response to figure out the size of result sets, so I am stuck with that.
I am assuming that I need to "def" the channel(s) and then have a form in an evaluated namespace that is "waiting" for the channel(s) to have something on them..?
a common idiom is to return channels/streams/promises as the result of a query fn @maleghast
many options @maleghast - put the consuming code inside a core.async `go` block, or create a new channel with a transducer and pipe your first channel to that, or have your api fn take the channel you want responses put on, and pass in a channel with a transducer
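The transducer-plus-pipe option might look something like this (a sketch; the page shape with a `:results` key is an assumption):

```clojure
(require '[clojure.core.async :as a])

(def pages   (a/chan))
;; the (mapcat :results) transducer unpacks each page that arrives
;; into individual records on the consuming channel
(def records (a/chan 1 (mapcat :results)))
(a/pipe pages records)

(a/>!! pages {:results [:r1 :r2]})
(a/<!! records)   ; => :r1
(a/<!! records)   ; => :r2
```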
similarly in manifold, chain a step onto a deferred https://github.com/ztellman/manifold/blob/master/docs/deferred.md#composing-with-deferreds or map a fn over a stream https://github.com/ztellman/manifold/blob/master/docs/stream.md#stream-operators
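A minimal `chain` sketch on the manifold side (assuming the manifold library is on the classpath):

```clojure
(require '[manifold.deferred :as d])

;; d/future runs its body on another thread and returns a deferred;
;; d/chain applies each fn in turn once the value is available
(def result (d/chain (d/future (+ 1 2))
                     inc
                     #(* 2 %)))

@result   ; => 8  (deref blocks until the deferred is realised)
```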
@mccraigmccraig - Yeah, that's what I was meaning ^^ when I said I would need some code, somewhere in an evaluated namespace, that was effectively "listening" for there to be "things" on the channel - just a form at the end of my namespace containing a go block that was consuming a named channel onto which my API function would place "things"…
most web or UI frameworks will already have an event loop to do that, I'd have expected?
@glenjamin - I don't think that Edge does… @dominicm ?
@maleghast are you wanting to execute the upstream query in response to an API request and return the result to the API client ?
@maleghast why? that's where all the glory is
@otfrom "grumbling ETL coder" is a tautology isn't it ?
@mccraigmccraig - order of ops: 1. Make first query to API 2. Process result, including calculation of how many more ops required 3. Load up a channel with the other calls
@mccraigmccraig yeah, like grumpy devops person
@otfrom - I am happiest when I am "doing the plumbing" and nothing I develop will be consumed by a human, only by machines
@maleghast if it makes you happy, then you clearly aren't dealing with enough data generated by hoo-mans
but it's my skillset (according to other people) and so I am doing as much plumbing as I can and as little UI as I can get away with
@otfrom - well, yeah, I grant you that, but happier than web-sitey interfacey yubnub
@mccraigmccraig - It's an app that will periodically (every hour / day, not sure yet) make calls to an API, stash the returned data in a database and an ElasticSearch cluster, and then do it all again the next time.
@maleghast you might want to add [4] concatenate results from each of the page queries into a single record stream
but what will consume the eventual record stream ?
This makes the API into a smaller, custom dataset that can be interrogated via Kibana
@mccraigmccraig a Human, at first, and later on an app that I am not developing, which will get the data back out from ES programmatically and do "stuff" to it that does not concern me 🙂
@mccraigmccraig - I am not saying I don't want to add "[4] concatenate results from each of the page queries into a single record stream", but I can't think of why I would do that, and that is probably me being ignorant of the benefits etc. Please could you explain to me why I would add this step - I really am asking, not being a prick, I promise 🙂
haha, well do you want to expose your downstream consumers to an additional level of structure (pages) which is an implementation feature of the upstream API ?
I want to take each of the 100 / 1000 / 10000 results and store them as individual documents in ES and as JSONB fields in Postgres
The API I am "harvesting" has a 90 day sliding window, so over time the queries I make will have different results. I don't want to keep track of the last article I harvested, nor do I want to have to "find" it in the results to then get all the newer ones. It's easier to just "eat" the whole response every time and rely on ES refusing to re-import a document with an existing id (into the same index) and on Postgres's ability to enforce a "unique" index on the id field.
but I can't "get" all of the results in one query - the API limits "pages" to 1000 results, so I need to be able to stack up calls and execute them in an async, non-blocking manner.
yep, so you can concatenate the pages into a single record-stream, and process each of the records individually
OK, I like the sound of this in principle, and I _think_ I am sort of doing that already with the synchronous, manual approach, as I get 100 articles back and then I do a `doseq` over the vector of maps to do INSERT queries into postgres and PUT calls to ES
by "record stream" i mean a conceptual sequence of individual records... could be on a core.async chan or a manifold stream
OK… I guess that I could, I just don't know what the benefit of doing that is…
the benefit is just simplicity - a sequence of individual records is a simpler thing than a sequence of pages of individual records... but there are tradeoffs - sometimes you want to deal with pages of records
Oh I see! Right, yeah, I was just going to consume the channel of returned promises with the `doseq` I already have, so concatenating them together into one HUGE vector first seemed like a redundant step.
i.e. one of the queries I am going to do returns (currently) a little over 13,000 records - I was expecting to grab the results of 14 promises off the channel and `doseq` each one until the channel was empty
I suppose I could consume them off the channel into one big vector, or indeed another channel, and then have another consumer running what currently runs inside the `doseq` on each map / JSON blob that comes out of the channel… Is that what you mean?
so: channel of promises → consumer turns vector of maps into another channel of individual maps → consumer2 puts maps into DB and ES off second channel ??
i meant another channel @maleghast, yes
possibly even: consumer2 puts maps into DB and onto another channel → consumer3 puts maps on third channel into ES
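That two-stage fan-out could be sketched like so. The store fns are hypothetical stand-ins for the Postgres and ES writes (here they just record into atoms so the sketch is observable):

```clojure
(require '[clojure.core.async :as a])

;; hypothetical stand-ins for the real side effects
(def db-writes (atom []))
(def es-writes (atom []))
(defn store-in-db! [rec] (swap! db-writes conj rec))  ; would INSERT into postgres
(defn store-in-es! [rec] (swap! es-writes conj rec))  ; would PUT into elasticsearch

(defn start-consumers [records]
  (let [to-es (a/chan 10)]
    ;; consumer2: store each record in the DB, then pass it downstream
    (a/go-loop []
      (if-let [rec (a/<! records)]
        (do (a/<! (a/thread (store-in-db! rec)))  ; blocking work on its own thread
            (a/>! to-es rec)
            (recur))
        (a/close! to-es)))
    ;; consumer3: store each record in ES; the returned channel
    ;; closes when everything has been consumed
    (a/go-loop []
      (when-let [rec (a/<! to-es)]
        (a/<! (a/thread (store-in-es! rec)))
        (recur)))))
```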
This may indeed, now I grok your meaning, be an even better idea, yes 🙂 Thanks 🙂
(as an aside @maleghast, doing any long-running or blocking processing in a vanilla core.async `go` block isn't a good idea - there is a fixed-size core.async threadpool which you can exhaust, causing blocking - so you can use https://clojure.github.io/core.async/index.html#clojure.core.async/thread )
@mccraigmccraig - This is also good to know, as I would have assumed that the `go` block macro was "managing" the thread pool… Thanks 🙂
@mccraigmccraig - So do I use `thread` inside a `go` block, or instead of a `go` block..?
(a/go
  (let [v (a/<! (a/thread (do-blocking-stuff)))]
    (do-non-blocking-stuff v)))
something like that
That's what I thought, but I wasn't sure enough - thanks for the clarification 🙂
also, beware long-running processes in core.async that expand items with e.g. `mapcat` operations. You can break back pressure that way. (i.e. pages on a channel being expanded into multiple events)
ooo i haven't come across that problem @peterwestmacott ... what happens ?
@peterwestmacott - OK, will do.
requires a very specific use case to be a problem, but it's caught a few people out: https://stackoverflow.com/questions/37953401/where-is-the-memory-leak-when-mapcat-breaks-backpressure-in-core-async
you're not likely to hit it unless you are using a lot of xforms on your channels, and then it's easily worked around, but it can work fine in test and then blow up in prod with more data / longer-running processing
Mmmmmm Are you getting that Angry Orchard stuff that comes from America..? That stuff is YUMMY
i was thinking of something a little harder 😬 https://www.ciderbrandy.co.uk/shop.html
that still counts as cider, right ?
Oh the joy of being behind a corporate firewall, unable to see anything... how has everyone been today? Anything good happened?
we had the AGM of the async clojure appreciation society @agile_geek 🙂
@mccraigmccraig I recommend this also: http://www.newforestcider.co.uk/ 🙂
I've heard the sound of @jonpither approaching a project causes `(:require [clojure.core.async :refer :all])` to appear in every file.