This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
- # beginners (10)
- # boot (14)
- # cider (80)
- # clara (1)
- # cljs-dev (19)
- # cljsrn (7)
- # clojure (284)
- # clojure-france (4)
- # clojure-italy (57)
- # clojure-poland (8)
- # clojure-russia (10)
- # clojure-spec (65)
- # clojure-uk (155)
- # clojurescript (156)
- # code-reviews (6)
- # copenhagen-clojurians (16)
- # cursive (10)
- # datomic (10)
- # emacs (13)
- # euroclojure (1)
- # graphql (4)
- # jobs (2)
- # lein-figwheel (3)
- # luminus (4)
- # off-topic (2)
- # onyx (42)
- # parinfer (23)
- # pedestal (1)
- # protorepl (8)
- # re-frame (34)
- # reagent (17)
- # ring-swagger (5)
- # timbre (24)
- # vim (72)
- # yada (1)
I'm pretty sure I know what the answer will be, but I'll ask anyway. How was EuroClojure, @thomas?
I have my son at home for the next 7 weeks and I work from home so having my own sense of dread 😉 He's already starting to bounce off the walls
and #euroclojure was fantastic as always... albeit a very hot first day (aircon didn't work), @yogidevbear
thank you @thomas, appreciate it! I really enjoyed the other talks as well, there was definitely a theme
@agile_geek I'd suggest hugging all the juxtaposers on the first day and all the bankers on day 2
there was indeed @elise_huard there were some very good ones. and I also really liked all the sketching that was going on.
@agile_geek bank hugs have been some of my best hugs. Even in Swiss banks in the wharf
If I were to admit that I’ve never got my head around core.async (never had cause / a use for it), and that I was wondering if anyone had a particular recommendation vis à vis a primer / intro tutorial, as I think that I have a use for it now, what would people say..?
I’ve ended up here several times: https://github.com/clojure/core.async/blob/master/examples/walkthrough.clj
I suspect there are a few people on this channel who could answer specific questions
the main thing to remember when playing at the REPL is that if you don’t create channels with non-zero sized buffers, or don’t perform operations in separate threads or go-blocks, then you can end up blocked quite easily
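To make the point above concrete, here's a minimal REPL sketch of the difference between unbuffered and buffered channels (nothing here is from the walkthrough; it's just an illustration):

```clojure
(require '[clojure.core.async :as a])

;; An unbuffered channel: a blocking put parks until someone takes.
;; Evaluating this line alone at the REPL would hang it:
;; (a/>!! (a/chan) :x)

;; A channel with a non-zero buffer accepts puts up to its size:
(def c (a/chan 10))   ; buffer of 10
(a/>!! c :hello)      ; returns true immediately
(a/<!! c)             ; => :hello
```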
…and in production be sure to catch exceptions, or at least be logging them. It can be too easy to lose errors (and the processes that threw them)
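A sketch of what catching exceptions inside a consumer loop looks like - `process!` here is a hypothetical stand-in for whatever work you're doing, and `println` stands in for real logging:

```clojure
(require '[clojure.core.async :as a])

;; hypothetical stand-in for real work
(defn process! [v]
  (when (= v :bad) (throw (ex-info "boom" {:v v}))))

(def in (a/chan 10))

(a/go-loop []
  (when-some [v (a/<! in)]
    (try
      (process! v)
      (catch Exception e
        ;; without this try/catch an exception kills the loop silently
        (println "failed processing" v (.getMessage e))))
    (recur)))
```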
there was actually a EuroClojure talk (that doesn’t seem to be up yet) about a library that built on top of core.async because they felt it was too low level
(I am lazy, and have spent a career building apps on top of the work of smarter folk 😉 )
(One day, just once, I would like to have the time + inspiration to break the rule, in fact…)
another approach is to use manifold - I've not spent nearly as much time with it as I have with core.async, but so far I've found it pretty plain sailing.
its use of deferred values makes it harder to block the REPL - but obviously the buffering still has to happen somewhere!
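For a feel of the deferred-values style mentioned above, a small sketch (assumes manifold is on the classpath):

```clojure
(require '[manifold.deferred :as d])

;; d/future runs its body on another thread and returns a deferred;
;; d/chain attaches callbacks, each receiving the previous step's value
(def result
  (d/chain (d/future (+ 1 2))
           inc
           #(* 10 %)))

;; deref blocks, but nothing in the chain itself ever did
@result ; => 40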
if you make a note of the nrepl port when you start a repl, you can always connect a second repl to the same process to recover from accidental blocking
I think I mostly want to use core.async as a way of chaining together bits of data processing. Tho I do see how manifold would be good for that too
Yeah, I want to put a bunch of calls to an external API on a channel and have them “do their thing” in their own time in the background. This _seems_ to be a good use for core.async… If Manifold is as good / better, then that would be interesting too, but I am really unfamiliar…
@otfrom if your processing is of streams of data then i've found core.async is fine (as are manifold Streams) ... if your ops are better suited to promises then i prefer manifold Deferreds ... and if you are wanting streams of promises then manifold is a more complete solution (because manifold streams and deferreds work easily together)
@maleghast API calls are generally more promise-like than stream-like (unless your result is an SSE stream or websocket etc)
@mccraigmccraig - This is VERY true… So Manifold might be a better option, in that case?
I've generally thought of manifold as being something I would use when I want to collect a number of remote resources together in some kind of let
manifold has two distinct core abstractions - the deferred, which is a promise with callbacks and additional machinery, and the stream, which is, umm, a stream of values, and supports buffering, backpressure etc
@mccraigmccraig - OK, so here’s the problem… First call to the API I will get a total back as part of the result and as there is no paging functionality “built in” on the API in question I will need to take the total and figure out how many more calls I need to make to get the “rest” of the data. I was going to do this by throwing the calls onto a channel using core.async… I am sensing that their promise-y nature would, you feel, be better suited to Manifold Deferred..?
in my data access lib we actually use a promise of a stream for that sort of access
(which is an improvement over just a plain stream because it allows easy mixing with other calls which return promises of a value)
I was intending to stack up API operations in a channel as a queue so that a) I don’t block execution and b) so that I don’t have to use an _actual_ queue (SQS / rabbitMQ etc)
I am starting to think I may not have understood the implications of what I want to do… 😞
i'm not sure - are you just trying to get a stream of records from a paginated api ?
@mccraigmccraig - That’s the point, the API is not paginated, I need to figure out how many pages there are and stack up the calls.
@glenjamin - What I am looking for (and increasingly thinking I am misunderstanding) is a way to stack work up asynchronously in the background so that my call(s) to the external API don’t lock up the whole app / program for minutes at a time.
@mccraigmccraig - yes, but there is no “next” call - the offset and page-size are arbitrary params every call, there is no way to ask for “page 2 of my last query to the API”
a channel of results in core.async is perfectly reasonable @maleghast , as is a manifold stream of deferred responses
so:
1. Make call for first page
2. Process first page (hopefully async)
3. Use total to work out how many more ops I need to make
4. Fill up channel with calls
5. Consume channel “elsewhere”.
is my thesis - does that make sense..?
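That plan might look something like this in core.async - `fetch-page` is hypothetical, standing in for the real API call and assumed to return `{:total <int> :records [...]}` for a given offset:

```clojure
(require '[clojure.core.async :as a])

;; hypothetical stand-in for the real API call
(declare fetch-page)

(def page-size 1000)
(def results (a/chan 20))

(defn harvest! []
  ;; a/thread, not a/go, because the API calls block
  (a/thread
    (let [{:keys [total records]} (fetch-page 0 page-size)]
      (a/>!! results records)                             ; page 1
      (doseq [offset (range page-size total page-size)]   ; remaining pages
        (a/>!! results (:records (fetch-page offset page-size))))
      (a/close! results))))

;; consume "elsewhere"
(a/go-loop []
  (when-some [page (a/<! results)]
    ;; do something with each page of records
    (recur)))
```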
core.async itself uses a threadpool, so you might not need to funnel them all down one channel
@glenjamin - I would like to do as much concurrently as possible, but it’s not a deal-breaker, serial is fine as long as the work can be “kicked off” and left going.
order of results being processed is not important, so concurrency (particularly if it makes the whole thing faster) would be great.
I need to make one call to find out how many results match the request; the vendor / curator of the API in question is not prepared to produce a simplified response to figure out the size of result sets, so I am stuck with that.
I am assuming that I need to “def” the channel(s) and then have a form that is in an evaluated namespace that is “waiting” for the channel(s) to have something on them..?
a common idiom is to return channels/streams/promises as the result of a query fn @maleghast
many options @maleghast - put the consuming code inside a core.async go block, or create a new channel with a transducer and pipe your first channel to that, or have your api fn take the channel you want responses put on, and pass in a channel with a transducer
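The "channel with a transducer" option could be sketched like this - the channel names and the `:records` key are illustrative, not from any real API:

```clojure
(require '[clojure.core.async :as a])

(def pages   (a/chan 10))                     ; pages of records go in here
(def records (a/chan 10 (mapcat :records)))   ; transducer unrolls each page

;; pipe one channel into the other
(a/pipe pages records)

;; putting a page on `pages`...
(a/>!! pages {:records [{:id 1} {:id 2}]})
;; ...yields individual records on `records`
(a/<!! records) ; => {:id 1}
```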
similarly in manifold, chain a step onto a deferred https://github.com/ztellman/manifold/blob/master/docs/deferred.md#composing-with-deferreds or map a fn over a stream https://github.com/ztellman/manifold/blob/master/docs/stream.md#stream-operators
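Minimal sketches of those two manifold operations from the linked docs (assuming manifold is on the classpath; the values are made up):

```clojure
(require '[manifold.deferred :as d]
         '[manifold.stream :as s])

;; chain a step (or several) onto a deferred
(def n
  (-> (d/future {:records [1 2 3]})
      (d/chain :records count)))   ; deferred that yields 3

;; map a fn over a stream
(def in  (s/stream 10))
(def out (s/map inc in))
(s/put! in 41)
@(s/take! out) ; => 42
```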
@mccraigmccraig - Yeah, that’s what I was meaning ^^ when I said I would need some code, somewhere in an evaluated namespace, that was effectively “listening” for there to be “things” on the channel, just a form at the end of my namespace containing a go block that was consuming a named channel onto which my API function would place “things” …
most web or UI frameworks will already have an event loop to do that i’d have expected?
@maleghast are you wanting to execute the upstream query in response to an API request and return the result to the API client ?
@mccraigmccraig - order of ops: 1. Make first query to API 2. Process result, including calculation of how many more ops required 3. Load up a channel with the other calls
@otfrom - I am happiest when I am “doing the plumbing” and nothing I develop will be consumed by a human, only by machines
@maleghast if it makes you happy, then you clearly aren't dealing with enough data generated by hoo-mans
but it’s my skillset (according to other people) and so I am doing as much plumbing as I can and as little UI as I can get away with
@otfrom - well, yeah, I grant you that, but happier than web-sitey interfacey yubnub
@mccraigmccraig - It’s an app that will periodically (every hour / day not sure yet) make calls to an API, stash the returned data in a database and an ElasticSearch cluster, and then do it all again the next time.
@maleghast you might want to add a step: concatenate results from each of the page queries into a single record stream
This makes the API into a smaller, custom dataset that can be interrogated via Kibana
@mccraigmccraig a Human, at first, and later on an app that I am not developing that will get the data back out from ES programmatically and do “stuff” to it that does not concern me 🙂
@mccraigmccraig - I am not saying I don’t want to add “concatenate results from each of the page queries into a single record stream”, but I can’t think of why I would do that, and that is probably me being ignorant of the benefits etc. Please could you explain to me why I would add this step - I really am asking, not being a prick, I promise 🙂
haha, well do you want to expose your downstream consumers to an additional level of structure (pages) which is an implementation feature of the upstream API ?
I want to take each of the 100 / 1000 / 10000 results and store them as individual documents in ES and as JSONB fields in Postgres
The API I am “harvesting” has a 90 day sliding window, so over time the queries I make will have different results. I don’t want to keep track of the last article I harvested, nor do I want to have to “find” it in the results to then get all the newer ones. It’s easier to just “eat” the whole response every time and rely on ES refusing to re-import a document with an existing id (into the same index) and on postgres’s ability to enforce a “unique” index on the id field.
but I can’t “get” all of the results in one query, the API limits “pages” to 1000 results, so I need to be able to stack up calls and execute them in an async, non-blocking manner.
yep, so you can concatenate the pages into a single record-stream, and process each of the records individually
OK, I like the sound of this in principle, and I _think_ I am sort of doing that already with the synchronous, manual approach, as I get 100 articles back and then I do a doseq over the vector of maps to do INSERT queries into postgres and PUT calls to ES
by "record stream" i mean a conceptual sequence of individual records... could be on a core.async
chan or a manifold
the benefit is just simplicity - a sequence of individual records is a simpler thing than a sequence of pages of individual records... but there are tradeoffs - sometimes you want to deal with pages of records
Oh I see! Right, yeah, I was just going to consume the channel of returned promises with the doseq I already have, so concatenating them together into one HUGE vector first seemed like a redundant step.
i.e. one of the queries I am going to do returns (currently) a little over 13,000 records - I was expecting to grab the results of 14 promises off the channel and “doseq” each one until the channel was empty
I suppose I could consume them off the channel into one big vector, or indeed another channel and then have another consumer running what currently runs inside the doseq on each map / JSON blob that comes out of the channel… Is that what you mean?
so: channel of promises → consumer turns vector of maps into another channel of individual maps → consumer2 puts maps into DB and ES off second channel ??
possibly even: consumer2 puts maps into DB and onto another channel → consumer3 puts maps on third channel into ES
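That staged layout could be sketched like so - `insert-db!` and `put-es!` are hypothetical stand-ins for the real postgres and ES calls:

```clojure
(require '[clojure.core.async :as a])

(def pages (a/chan 10))                      ; vectors of maps from the API
(def maps  (a/chan 100 (mapcat identity)))   ; individual maps
(def to-es (a/chan 100))

;; consumer: unroll pages into individual maps
(a/pipe pages maps)

;; hypothetical stand-ins for the real DB and ES calls
(declare insert-db! put-es!)

;; consumer2: write each map to the DB (blocking, so use the
;; blocking pipeline), then pass it on for ES
(a/pipeline-blocking 4 to-es (map (fn [m] (insert-db! m) m)) maps)

;; consumer3: index into ES off the third channel
(a/go-loop []
  (when-some [m (a/<! to-es)]
    (a/<! (a/thread (put-es! m)))   ; blocking call on its own thread
    (recur)))
```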
(as an aside @maleghast, doing any long-running or blocking processing in a vanilla core.async go block isn't a good idea - there is a fixed-size core.async threadpool which you can exhaust, causing blocking - so you can use https://clojure.github.io/core.async/index.html#clojure.core.async/thread )
@mccraigmccraig - This is also good to know, as I would have assumed that the go block macro was “managing” the thread pool… Thanks 🙂
@mccraigmccraig - So if I use thread inside a go block, or instead of a go block..?
(a/go (let [v (a/<! (a/thread (do-blocking-stuff)))] (do-non-blocking-stuff v)))
also, beware long-running processes in core.async that expand items with eg. mapcat operations. You can break backpressure that way (ie. pages on a channel being expanded into multiple events)
requires a very specific use case to be a problem, but it’s caught a few people out: https://stackoverflow.com/questions/37953401/where-is-the-memory-leak-when-mapcat-breaks-backpressure-in-core-async
you’re not likely to hit it unless you are using a lot of xforms on your channels, and then it’s easily worked around, but it can work fine in test, and then blow up in prod with more data / longer-running processing
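A small sketch of the gotcha described above: an expanding transducer delivers its whole expansion regardless of buffer size, so memory is bounded by the largest expansion, not by the buffer (the numbers are made up):

```clojure
(require '[clojure.core.async :as a])

;; buffer of 1, but a single input expands to 10,000 items -
;; the expansion bypasses the buffer limit, so one big page
;; can flood the channel even though backpressure "looks" fine
(def c (a/chan 1 (mapcat (fn [n] (range n)))))

(a/>!! c 10000)   ; accepted: all 10,000 expanded items are now pending
```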
Mmmmmm Are you getting that Angry Orchard stuff that comes from America..? That stuff is YUMMY
i was thinking of something a little harder 😬 https://www.ciderbrandy.co.uk/shop.html
Oh the joy of being behind a corporate firewall, unable to see anything... How has everyone been today? Anything good happened?
I've heard the sound of @jonpither approaching a project causes (:require [clojure.core.async :refer :all]) to appear in every file.