
morning 😼 ... got back from EuroClojure rather late last night.


how have things been here?


I'm pretty sure I know what the answer will be, but I'll ask anyway 🙂 How was EuroClojure, @thomas?


Best of luck, @agile_geek 👍


I have my son at home for the next 7 weeks and I work from home so having my own sense of dread 😉 He's already starting to bounce off the walls


and #euroclojure was fantastic as always... albeit a very hot first day (aircon didn't work), @yogidevbear


lots of old friends and even more new ones.


Hopefully I will be able to come next year 😄


Good morning @elise_huard thank you again for your great talk! really informative!


thank you @thomas, appreciate it! I really enjoyed the other talks as well, there was definitely a theme


@agile_geek I'd suggest hugging all the juxtaposers on the first day and all the bankers on day 2


I'm not hugging anyone in a bank


there was indeed @elise_huard there were some very good ones. and I also really liked all the sketching that was going on.


@agile_geek bank hugs have been some of my best hugs. Even in Swiss banks in the wharf


@thomas involves more suits usually


Everyone hugs in one go overnight


I am missing the Clojure Community’s propensity to hug (in person, I mean)


Morning / Afternoon, everyone 🙂


If I were to admit that I’ve never got my head around core.async (never had cause / a use for it), and that I was wondering if anyone had a particular recommendation vis à vis a primer / intro tutorial, as I think that I have a use for it now, what would people say..?


I'd quite like that too. Esp one that deals with core.async + transducers


tho if I carry on as I am, I might be writing one

Rachel Westmacott 12:07:07

I suspect there are a few people on this channel who could answer specific questions

Rachel Westmacott 12:07:42

the main thing to remember when playing at the REPL is that if you don’t create channels with non-zero sized buffers, or don’t perform operations in separate threads or go-blocks, then you can end up blocked quite easily
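A minimal REPL sketch of the blocking behaviour Peter describes, assuming only core.async (`c` is just an illustrative channel name):

```clojure
(require '[clojure.core.async :as a])

;; An unbuffered channel: a blocking put (>!!) parks the calling thread
;; until a taker arrives, so this would hang your REPL:
;; (a/>!! (a/chan) :hello)

;; A channel with a non-zero buffer accepts the put immediately:
(def c (a/chan 1))
(a/>!! c :hello)   ;; => true
(a/<!! c)          ;; => :hello

;; Or keep the REPL thread free by putting from a go block:
(let [c (a/chan)]
  (a/go (a/>! c :hi))
  (a/<!! c))       ;; => :hi
```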

Rachel Westmacott 12:07:39

…and in production be sure to catch exceptions, or at least be logging them. It can be too easy to lose errors (and the processes that threw them)
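A hedged sketch of that advice — a go-loop consumer that survives and reports failures (`process!` is a hypothetical handler, and a real logger would replace `println`):

```clojure
(require '[clojure.core.async :as a])

(defn start-consumer
  "Drains ch, processing each message. Without the try/catch an exception
  would silently kill the loop and the channel would just stop draining."
  [ch process!]
  (a/go-loop []
    (when-some [msg (a/<! ch)]
      (try
        (process! msg)
        (catch Exception e
          (println "failed to process" msg ":" (.getMessage e))))
      (recur))))
```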


@peterwestmacott - Thanks for the hints 🙂


@otfrom - Let me know if you write that tutorial, I might have need of it 😉

Rachel Westmacott 12:07:22

there was actually a EuroClojure talk (that doesn’t seem to be up yet) about a library that built on top of core.async because they felt it was too low level


is this the clojure otp one?


Oooh, I like the sound of that!


(I am lazy, and have spent a career building apps on top of the work of smarter folk 😉 )


(One day, just once, I would like to have the time + inspiration to break the rule, in fact…)

Rachel Westmacott 13:07:31

another approach is to use manifold. I’ve not spent nearly as much time with it as I have with core.async, but so far I’ve found it pretty plain sailing.

Rachel Westmacott 13:07:14

its use of deferred values makes it harder to block the REPL - but obviously the buffering still has to happen somewhere!


i like manifold for general async stuff more than core.async


if you make a note of the nrepl port when you start a repl, you can always connect a second repl to the same process to recover from accidental blocking


I think I mostly want to use core.async as a way of chaining together bits of data processing. Tho I do see how manifold would be good for that too


Yeah, I want to put a bunch of calls to an external API on a channel and have them “do their thing” in their own time in the background. This _seems_ to be a good use for core.async… If Manifold is as good / better, then that would be interesting too, but I am really unfamiliar…


(whereas I know what core.async does)


@otfrom if your processing is of streams of data then i've found core.async is fine (as are manifold Streams) ... if your ops are better suited to promises then i prefer manifold Deferred ... and if you are wanting streams of promises then manifold is a more complete solution (because the manifold stream and deferred work easily together)


@maleghast API calls are generally more promise-like than stream-like (unless your result is an SSE stream or websocket etc)


I might look again at manifold streams then.


there are promise-chans in core.async too though
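For reference, a quick sketch of promise-chan semantics — every take sees the single delivered value, much like deref on a promise:

```clojure
(require '[clojure.core.async :as a])

(def p (a/promise-chan))
(a/put! p 42)
(a/<!! p)   ;; => 42
(a/<!! p)   ;; => 42 - subsequent takes see the same value
```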


@mccraigmccraig - This is VERY true… So Manifold might be a better option, in that case?


I've generally thought of manifold as being something I would use when I want to collect a number of remote resources together in some kind of let


which isn't very stream like


manifold has two distinct core abstractions - the deferred, which is a promise with callbacks and additional machinery, and the stream which is a umm stream of values, and supports buffering, backpressure etc
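A small sketch of those two abstractions, assuming the `manifold.deferred` and `manifold.stream` namespaces from the manifold library:

```clojure
(require '[manifold.deferred :as d]
         '[manifold.stream :as s])

;; a deferred: a promise with callbacks attached
(def result (d/deferred))
(d/on-realized result
               #(println "success:" %)
               #(println "error:" %))
(d/success! result :ok)        ; fires the success callback

;; a stream: buffered, with backpressure
(def st (s/stream 8))          ; buffer of 8 values
(s/put! st :a)                 ; returns a deferred, realised when accepted
@(s/take! st)                  ;; => :a
```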


@mccraigmccraig - OK, so here’s the problem… First call to the API I will get a total back as part of the result and as there is no paging functionality “built in” on the API in question I will need to take the total and figure out how many more calls I need to make to get the “rest” of the data. I was going to do this by throwing the calls onto a channel using core.async… I am sensing that their promise-y nature would, you feel, be better suited to Manifold Deferred..?


a stream or a channel works quite well for that sort of query @maleghast


in my data access lib we actually use a promise of a stream for that sort of access


I think that you may have gone past my understanding barrier…


(which is an improvement over just a plain stream because it allows easy mixing with other calls which return promises of a value)


I was intending to stack up API operations in a channel as a queue so that a) I don’t block execution and b) so that I don’t have to use an _actual_ queue (SQS / RabbitMQ etc)


I am starting to think I may not have understood the implications of what I want to do… 😞


i'm not sure - are you just trying to get a stream of records from a paginated api ?


what are you trying to not block the execution of?


@mccraigmccraig - That’s the point, the API is not paginated, I need to figure out how many pages there are and stack up the calls.


but you can pass an offset or something ?


@glenjamin - What I am looking for (and increasingly thinking I am misunderstanding) is a way to stack work up asynchronously in the background so that my call(s) to the external API don’t lock up the whole app / program for minutes at a time.


@mccraigmccraig - yes, but there is no “next” call - the offset and page-size are arbitrary params every call, there is no way to ask for “page 2 of my last query to the API”


a channel of results in core.async is perfectly reasonable @maleghast , as is a manifold stream of deferred responses


so:
1. Make call for first page
2. Process first page (hopefully async)
3. Use total to work out how many more ops I need to make
4. Fill up channel with calls
5. Consume channel “elsewhere”
is my thesis - does that make sense..?
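That plan could be sketched roughly like this, assuming core.async and a hypothetical blocking `fetch-page` fn of `[offset page-size]` that returns `{:total n :records [...]}`:

```clojure
(require '[clojure.core.async :as a])

(defn fetch-all
  "Returns a channel of pages (vectors of records), closed when done.
  The caller is never blocked - the blocking calls run on a/thread's pool."
  [fetch-page page-size]
  (let [out (a/chan 4)]
    (a/go
      (let [{:keys [total records]} (a/<! (a/thread (fetch-page 0 page-size)))]
        (a/>! out records)                                   ; page 1
        (doseq [offset (range page-size total page-size)]    ; remaining pages
          (a/>! out (:records (a/<! (a/thread (fetch-page offset page-size))))))
        (a/close! out)))
    out))

;; consumed "elsewhere", e.g.:
;; (a/go-loop []
;;   (when-some [page (a/<! pages-ch)]
;;     (doseq [rec page] (store! rec))   ; store! is hypothetical
;;     (recur)))
```

This version is serial (one API call in flight at a time), which matches the “serial is fine as long as it’s kicked off in the background” requirement; fanning out the `a/thread` calls would add concurrency at the cost of ordering.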


what is the main thread in this context?


core.async itself uses a threadpool, so you might not need to funnel them all down one channel


@glenjamin the app that “runs” which in turn is a hybrid API / webapp


unless you wanted to limit the number of concurrent api calls


@glenjamin - I would like to do as much concurrently as possible, but it’s not a deal-breaker, serial is fine as long as the work can be “kicked off” and left going.


(i.e. in the “background”)


order of results being processed is not important, so concurrency (particularly if it makes the whole thing faster) would be great.


I need to make one call to find out how many results match the request; the vendor / curator of the API in question is not prepared to produce a simplified response to figure out the size of result sets, so I am stuck with that.


I am assuming that I need to “def” the channel(s) and then have a form that is in an evaluated namespace that is “waiting” for the channel(s) to have something on them..?


a common idiom is to return channels/streams/promises as the result of a query fn @maleghast


OK, but how would I consume them without tying up the main thread?


(I am showing my n00b here, a LOT it feels like)


many options @maleghast - put the consuming code inside a core.async go block @maleghast , or create a new channel with a transducer and pipe your first channel to that, or have your api fn take the channel you want responses put on, and pass in a channel with a transducer
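The second option — a new channel with a transducer, piped from the first — might look like this (`:body` is an illustrative key, not from the conversation):

```clojure
(require '[clojure.core.async :as a])

(def responses (a/chan 8))                 ; raw API responses arrive here
(def parsed    (a/chan 8 (map :body)))     ; transducer runs on each value

(a/pipe responses parsed)                  ; moves values, closes downstream

(a/go-loop []                              ; consume without tying up a thread
  (when-some [v (a/<! parsed)]
    (println "got" v)
    (recur)))
```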


@mccraigmccraig - Yeah, that’s what I was meaning ^^ when I said I would need some code, somewhere in an evaluated namespace, that was effectively “listening” for there to be “things” on the channel, just a form at the end of my namespace containing a go block that was consuming a named channel onto which my API function would place “things” …


most web or UI frameworks will already have an event loop to do that i’d have expected?


or maybe i’m too used to working on web apps


@glenjamin - I don’t think that Edge does… @dominicm ?


honestly, you people with your web programming.


@maleghast are you wanting to execute the upstream query in response to an API request and return the result to the API client ?


@otfrom - I know what you mean, I would be done with the web if I could be…


@maleghast why? that's where all the glory is


@otfrom "grumbling ETL coder" is a tautology isn't it ?


@mccraigmccraig - order of ops:
1. Make first query to API
2. Process result, including calculation of how many more ops required
3. Load up a channel with the other calls


@mccraigmccraig yeah, like grumpy devops person


@otfrom - If I wanted glory I’d be an app developer!


@otfrom - I am happiest when I am “doing the plumbing” and nothing I develop will be consumed by a human, only by machines


making stuff for humans to look at is fraught with pain and suffering


@maleghast if it makes you happy, then you clearly aren't dealing with enough data generated by hoo-mans


but it’s my skillset (according to other people) and so I am doing as much plumbing as I can and as little UI as I can get away with


@otfrom - well, yeah, I grant you that, but happier than web-sitey interfacey yubnub


@mccraigmccraig - It’s an app that will periodically (every hour / day not sure yet) make calls to an API, stash the returned data in a database and an ElasticSearch cluster, and then do it all again the next time.


@maleghast you might want to add [4] concatenate results from each of the page queries into a single record stream


but what will consume the eventual record stream ?


This makes the API into a smaller, custom dataset that can be interrogated via Kibana


@mccraigmccraig a Human, at first, and later on an app that I am not developing that will get the data back out from ES programmatically and do “stuff” to it that does not concern me 🙂


@mccraigmccraig - I am not saying I don’t want to add “[4] concatenate results from each of the page queries into a single record stream”, but I can’t think of why I would do that, and that is probably me being ignorant of the benefits etc. Please could you explain to me why I would add this step - I really am asking, not being a prick, I promise 🙂


haha, well do you want to expose your downstream consumers to an additional level of structure (pages) which is an implementation feature of the upstream API ?


I want to take each of the 100 / 1000 / 10000 results and store them as individual documents in ES and as JSONB fields in Postgres


The API I am “harvesting” has a 90 day sliding window, so over time the queries I make will have different results. I don’t want to keep track of the last article I harvested, nor do I want to have to “find” it in the results to then get all the newer ones. It’s easier to just “eat” the whole response every time and rely on ES refusing to re-import a document with an existing id (into the same index) and on Postgres’s ability to enforce a “unique” index on the id field.


but I can’t “get” all of the results in one query, the API limits “pages” to 1000 results, so I need to be able to stack up calls and execute them in an async, non-blocking manner.


yep, so you can concatenate the pages into a single record-stream, and process each of the records individually
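One hedged way to do that concatenation with core.async is a channel whose transducer is `cat`, which flattens each page into individual records as it passes through:

```clojure
(require '[clojure.core.async :as a])

(def pages   (a/chan 4))        ; carries vectors of records
(def records (a/chan 16 cat))   ; cat expands each page into its elements

(a/pipe pages records)

(a/>!! pages [{:id 1} {:id 2}])
(a/<!! records)   ;; => {:id 1}
(a/<!! records)   ;; => {:id 2}
```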


OK, I like the sound of this in principle, and I _think_ I am sort of doing that already with the synchronous, manual approach, as I get 100 articles back and then I do a doseq over the vector of maps to do INSERT queries into postgres and PUT calls to ES


What do you mean by a “record-stream”?


(I think I am misunderstanding terms)


by "record stream" i mean a conceptual sequence of individual records... could be on a core.async chan or a manifold stream


OK… I guess that I could, I just don’t know what the benefit of doing that is…


(and I do want to know, I am feeling ignorant and helpless, not belligerent)


the benefit is just simplicity - a sequence of individual records is a simpler thing than a sequence of pages of individual records... but there are tradeoffs - sometimes you want to deal with pages of records


Oh I see! Right, yeah, I was just going to consume the channel of returned promises with the doseq I already have, so concatenating them together into one HUGE vector first seemed like a redundant step.


i.e. one of the queries I am going to do returns (currently) a little over 13,000 records - I was expecting to grab the results of 14 promises off the channel and “doseq” each one until the channel was empty


I suppose I could consume them off the channel into one big vector, or indeed another channel and then have another consumer running what currently runs inside the doseq on each map / JSON blob that comes out of the channel… Is that what you mean?


so:
channel of promises
consumer turns vector of maps into another channel of individual maps
consumer2 puts maps into DB and ES off second channel
??


i meant another channel @maleghast, yes


possibly even:
consumer2 puts maps into DB and onto another channel
consumer3 puts maps on third channel into ES


This may indeed, now I grok your meaning, be an even better idea, yes 🙂 Thanks 🙂


(as an aside @maleghast , doing any long-running or blocking processing in a vanilla core.async go block isn't a good idea - there is a fixed-size core.async threadpool which you can exhaust, causing blocking - so you can use `a/thread` instead)


@mccraigmccraig - This is also good to know, as I would have assumed that the go block macro was “managing” the thread pool… Thanks 🙂


@mccraigmccraig - So if I use thread inside a go block, or instead of a go block..?


(Sorry, the Clojure docs are a little bit opaque to me at times)


  (let [v (a/<! (a/thread (do-blocking-stuff)))]
    (do-non-blocking-stuff v))


something like that


That’s what I thought, but I wasn’t sure enough - thanks for the clarification 🙂
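Putting the snippet above into a self-contained sketch (the slow call is simulated with `Thread/sleep`):

```clojure
(require '[clojure.core.async :as a])

(defn do-blocking-stuff []
  (Thread/sleep 1000)   ; stand-in for a slow HTTP call
  :response)

;; a/thread runs the body on its own (cached, growable) thread pool and
;; returns a channel yielding the result; the go block parks rather than
;; blocks while waiting, so the small go threadpool is never tied up.
(a/go
  (let [v (a/<! (a/thread (do-blocking-stuff)))]
    (println "got" v)))
```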

Rachel Westmacott 14:07:55

also, beware long-running processes in core.async that expand items with eg. mapcat operations. You can break back pressure that way. (ie. pages on a channel being expanded into multiple events)


ooo i haven't come across that problem @peterwestmacott ... what happens ?

Rachel Westmacott 14:07:13

you’re not likely to hit it unless you are using a lot of xforms on your channels, and then it’s easily worked around, but it can work fine in test, and then blow up in prod with more data / longer running processing
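As I understand the behaviour being warned about here, an expanding transducer can push a buffer well past its nominal size, which skews the backpressure accounting, e.g.:

```clojure
(require '[clojure.core.async :as a])

;; buffer of 1, but an expanding transducer: a single put of a 1000-element
;; page is accepted as one put and expands into 1000 buffered items, so the
;; buffer overshoots its nominal size
(def c (a/chan 1 cat))
(a/>!! c (range 1000))   ; accepted immediately as one put
```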


raises eyebrow This is definitely worth knowing, thanks 🙂


Mmmmmm Are you getting that Angry Orchard stuff that comes from America..? That stuff is YUMMY


i was thinking of something a little harder 😬


that still counts as cider, right ?


(now I want some too)


Oh the joy of being behind a corporate firewall, unable to see how everyone has been today. Anything good happened?


we had the agm of the async clojure appreciation society @agile_geek 🙂


Love a bit of core async personally.


I've heard the sound of @jonpither approaching a project causes (:require [clojure.core.async :refer :all]) to appear in every file.


:shudder: :refer :all is the devil outside of a test namespace