#clojure-uk
2017-07-24
thomas07:07:05

morning 😼 ... got back from EuroClojure rather late last night.

thomas07:07:16

how have things been here?

yogidevbear07:07:19

I'm pretty sure I know what the answer will be, but I'll ask anyway simple_smile How was EuroClojure, @thomas?

yogidevbear07:07:36

Best of luck, @agile_geek 👍

yogidevbear07:07:39

I have my son at home for the next 7 weeks and I work from home so having my own sense of dread 😉 He's already starting to bounce off the walls

thomas08:07:15

and #euroclojure was fantastic as always... albeit a very hot first day (aircon didn't work), @yogidevbear

thomas08:07:29

lots of old friends and even more new ones.

yogidevbear08:07:30

Hopefully I will be able to come next year 😄

thomas08:07:30

Good morning @elise_huard thank you again for your great talk! really informative!

elise_huard08:07:14

thank you @thomas, appreciate it! I really enjoyed the other talks as well, there was definitely a theme

otfrom09:07:40

@agile_geek I'd suggest hugging all the juxtaposers on the first day and all the bankers on day 2

agile_geek09:07:20

I'm not hugging anyone in a bank

thomas09:07:57

there was indeed @elise_huard there were some very good ones. and I also really liked all the sketching that was going on.

otfrom09:07:52

@agile_geek bank hugs have been some of my best hugs. Even in Swiss banks in the wharf

otfrom11:07:02

@thomas involves more suits usually

glenjamin11:07:11

Everyone hugs in one go overnight

maleghast12:07:18

I am missing the Clojure Community’s propensity to hug (in person, I mean)

maleghast12:07:29

Morning / Afternoon, everyone 🙂

maleghast12:07:44

If I were to admit that I’ve never got my head around core.async (never had cause / a use for it), and that I was wondering if anyone had a particular recommendation vis à vis a primer / intro tutorial, as I think that I have a use for it now, what would people say..?

otfrom12:07:05

I'd quite like that too. Esp one that deals with core.async + transducers

otfrom12:07:18

tho if I carry on as I am, I might be writing one

Rachel Westmacott12:07:07

I suspect there are a few people on this channel who could answer specific questions

Rachel Westmacott12:07:42

the main thing to remember when playing at the REPL is that if you don’t create channels with non-zero sized buffers, or don’t perform operations in separate threads or go-blocks, then you can end up blocked quite easily
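
A minimal sketch of the blocking pitfall described above (`a` aliases `clojure.core.async`; the commented-out line is the one that would hang an unwary REPL):

```clojure
(require '[clojure.core.async :as a])

;; an unbuffered channel: a blocking put waits until someone takes
;; (a/>!! (a/chan) :x) ; <- this would block the REPL forever

;; with a non-zero buffer, puts succeed up to the buffer size
(def c (a/chan 1))
(a/>!! c :x) ; returns true immediately
(a/<!! c)    ; => :x
```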

Rachel Westmacott12:07:39

…and in production be sure to catch exceptions, or at least be logging them. It can be too easy to lose errors (and the processes that threw them)

maleghast12:07:31

@peterwestmacott - Thanks for the hints 🙂

maleghast12:07:56

@otfrom - Let me know if you write that tutorial, I might have need of it 😉

Rachel Westmacott12:07:22

there was actually a EuroClojure talk (that doesn’t seem to be up yet) about a library that built on top of core.async because they felt it was too low level

otfrom12:07:45

is this the clojure otp one?

maleghast12:07:03

Oooh, I like the sound of that!

maleghast12:07:23

(I am lazy, and have spent a career building apps on top of the work of smarter folk 😉 )

maleghast12:07:54

(One day, just once, I would like to have the time + inspiration to break the rule, in fact…)

Rachel Westmacott13:07:31

another approach is to use manifold, I’ve not spent nearly as much time with it as I have with core.async, but so far I’ve found it pretty plain sailing.

Rachel Westmacott13:07:14

its use of deferred values makes it harder to block the REPL - but obviously the buffering still has to happen somewhere!

mccraigmccraig13:07:38

i like manifold for general async stuff more than core.async

glenjamin13:07:03

if you make a note of the nrepl port when you start a repl, you can always connect a second repl to the same process to recover from accidental blocking

otfrom13:07:55

I think I mostly want to use core.async as a way of chaining together bits of data processing. Tho I do see how manifold would be good for that too

maleghast13:07:33

Yeah, I want to put a bunch of calls to an external API on a channel and have them “do their thing” in their own time in the background. This _seems_ to be a good use for core.async… If Manifold is as good / better, then that would be interesting too, but I am really unfamiliar…

maleghast13:07:45

(whereas I know what core.async does)

mccraigmccraig13:07:58

@otfrom if your processing is of streams of data then i've found core.async is fine (as are manifold Streams) ... if your ops are better suited to promises then i prefer manifold Deferred ... and if you are wanting streams of promises then manifold is a more complete solution (because the manifold stream and deferred work easily together)
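
A small sketch of the two manifold abstractions being contrasted here (assumes manifold is on the classpath; values are illustrative):

```clojure
(require '[manifold.deferred :as d]
         '[manifold.stream :as s])

;; promise-like: a deferred, chaining callbacks over a single eventual value
(def result (d/chain (d/success-deferred 1) inc inc))
@result ; => 3

;; stream-like: a buffered stream of values, with backpressure
(def st (s/stream 4))
(s/put! st :a)
@(s/take! st) ; => :a
```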

mccraigmccraig13:07:41

@maleghast API calls are generally more promise-like than stream-like (unless your result is an SSE stream or websocket etc)

otfrom13:07:59

I might look again at manifold streams then.

mccraigmccraig13:07:11

there are promise-chans in core.async too though
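
A quick illustration of the promise-chan behaviour mentioned here:

```clojure
(require '[clojure.core.async :as a])

;; a promise-chan delivers a single value, then repeats it to every taker
(def pc (a/promise-chan))
(a/put! pc 42)
(a/<!! pc) ; => 42
(a/<!! pc) ; => 42 (subsequent takes see the same value)
```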

maleghast13:07:18

@mccraigmccraig - This is VERY true… So Manifold might be a better option, in that case?

otfrom13:07:23

I've generally thought of manifold as being something I would use when I want to collect a number of remote resources together in some kind of let

otfrom13:07:28

which isn't very stream like

mccraigmccraig13:07:04

manifold has two distinct core abstractions - the deferred, which is a promise with callbacks and additional machinery, and the stream which is a umm stream of values, and supports buffering, backpressure etc

maleghast13:07:27

@mccraigmccraig - OK, so here’s the problem… First call to the API I will get a total back as part of the result and as there is no paging functionality “built in” on the API in question I will need to take the total and figure out how many more calls I need to make to get the “rest” of the data. I was going to do this by throwing the calls onto a channel using core.async… I am sensing that their promise-y nature would, you feel, be better suited to Manifold Deferred..?

mccraigmccraig13:07:18

a stream or a channel works quite well for that sort of query @maleghast

mccraigmccraig13:07:10

in my data access lib we actually use a promise of a stream for that sort of access

maleghast13:07:44

I think that you may have gone past my understanding barrier…

mccraigmccraig13:07:03

(which is an improvement over just a plain stream because it allows easy mixing with other calls which return promises of a value)

maleghast13:07:46

I was intending to stack up API operations in a channel as a queue so that a) I don’t block execution and b) so that I don’t have to use an _actual_ queue (SQS / RabbitMQ etc)

maleghast13:07:18

I am starting to think I may not have understood the implications of what I want to do… 😞

mccraigmccraig13:07:56

i'm not sure - are you just trying to get a stream of records from a paginated api ?

glenjamin13:07:09

what are you trying to not block the execution of?

maleghast13:07:29

@mccraigmccraig - That’s the point, the API is not paginated, I need to figure out how many pages there are and stack up the calls.

mccraigmccraig13:07:13

but you can pass an offset or something ?

maleghast13:07:13

@glenjamin - What I am looking for (and increasingly thinking I am misunderstanding) is a way to stack work up asynchronously in the background so that my call(s) to the external API don’t lock up the whole app / program for minutes at a time.

maleghast13:07:47

@mccraigmccraig - yes, but there is no “next” call - the offset and page-size are arbitrary params every call, there is no way to ask for “page 2 of my last query to the API”

mccraigmccraig13:07:00

a channel of results in core.async is perfectly reasonable @maleghast , as is a manifold stream of deferred responses

maleghast13:07:45

so: 1. Make call for first page, 2. Process first page (hopefully async), 3. Use total to work out how many more ops I need to make, 4. Fill up channel with calls, 5. Consume channel “elsewhere”. That’s my thesis - does that make sense..?
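
A hedged core.async sketch of that five-step plan - `fetch-page`, `page-size`, and the 25-record total are hypothetical stand-ins for the real API:

```clojure
(require '[clojure.core.async :as a])

;; fetch-page pretends to query an API holding 25 records in total
(defn fetch-page [offset limit]
  {:total 25 :results (range offset (min 25 (+ offset limit)))})

(def page-size 10)

(defn harvest []
  (let [out        (a/chan 16)
        first-page (fetch-page 0 page-size)  ; 1. make the call for the first page
        total      (:total first-page)]      ; 3. use total to work out remaining ops
    (a/thread                                ; 4. fill the channel in the background
      (doseq [r (:results first-page)] (a/>!! out r))
      (doseq [offset (range page-size total page-size)
              r      (:results (fetch-page offset page-size))]
        (a/>!! out r))
      (a/close! out))
    out))

;; 5. consume the channel "elsewhere"
(a/<!! (a/into [] (harvest))) ; => a vector of all 25 records, in order
```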

glenjamin13:07:19

what is the main thread in this context?

glenjamin13:07:51

core.async itself uses a threadpool, so you might not need to funnel them all down one channel

maleghast13:07:54

@glenjamin the app that “runs” which in turn is a hybrid API / webapp

glenjamin13:07:05

unless you wanted to limit the number of concurrent api calls

maleghast13:07:41

@glenjamin - I would like to do as much concurrently as possible, but it’s not a deal-breaker, serial is fine as long as the work can be “kicked off” and left going.

maleghast13:07:49

(i.e. in the “background”)

maleghast13:07:29

order of results being processed is not important, so concurrency (particularly if it makes the whole thing faster) would be great.

maleghast13:07:35

I need to make one call to find out how many results match the request; the vendor / curator of the API in question is not prepared to produce a simplified response to figure out the size of result sets, so I am stuck with that.

maleghast13:07:41

I am assuming that I need to “def” the channel(s) and then have a form that is in an evaluated namespace that is “waiting” for the channel(s) to have something on them..?

mccraigmccraig13:07:06

a common idiom is to return channels/streams/promises as the result of a query fn @maleghast

maleghast13:07:08

OK, but how would I consume them without tying up the main thread?

maleghast13:07:23

(I am showing my n00b here, a LOT it feels like)

mccraigmccraig13:07:47

many options @maleghast - put the consuming code inside a core.async go block @maleghast , or create a new channel with a transducer and pipe your first channel to that, or have your api fn take the channel you want responses put on, and pass in a channel with a transducer
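
For example, the pipe-plus-transducer option might look like this (channel contents are illustrative):

```clojure
(require '[clojure.core.async :as a])

;; pipe the response channel into a new channel carrying a transducer
(def responses (a/to-chan [1 2 3]))  ; stand-in for a channel of API responses
(def processed (a/chan 8 (map inc))) ; the transducer runs as values flow through
(a/pipe responses processed)
(a/<!! (a/into [] processed)) ; => [2 3 4]
```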

maleghast13:07:17

@mccraigmccraig - Yeah, that’s what I was meaning ^^ when I said I would need some code, somewhere in an evaluated namespace, that was effectively “listening” for there to be “things” on the channel, just a form at the end of my namespace containing a go block that was consuming a named channel onto which my API function would place “things” …

glenjamin13:07:55

most web or UI frameworks will already have an event loop to do that i’d have expected?

glenjamin13:07:23

or maybe i’m too used to working on web apps

maleghast13:07:01

@glenjamin - I don’t think that Edge does… @dominicm ?

otfrom14:07:55

honestly, you people with your web programming.

mccraigmccraig14:07:46

@maleghast are you wanting to execute the upstream query in response to an API request and return the result to the API client ?

maleghast14:07:06

@otfrom - I know what you mean, I would be done with the web if I could be…

otfrom14:07:26

@maleghast why? that's where all the glory is

mccraigmccraig14:07:06

@otfrom "grumbling ETL coder" is a tautology isn't it ?

maleghast14:07:19

@mccraigmccraig - order of ops: 1. Make first query to API 2. Process result, including calculation of how many more ops required 3. Load up a channel with the other calls

otfrom14:07:26

@mccraigmccraig yeah, like grumpy devops person

maleghast14:07:40

@otfrom - If I wanted glory I’d be an app developer!

maleghast14:07:14

@otfrom - I am happiest when I am “doing the plumbing” and nothing I develop will be consumed by a human, only by machines

maleghast14:07:29

making stuff for humans to look at is fraught with pain and suffering

otfrom14:07:52

@maleghast if it makes you happy, then you clearly aren't dealing with enough data generated by hoo-mans

maleghast14:07:56

but it’s my skillset (according to other people) and so I am doing as much plumbing as I can and as little UI as I can get away with

maleghast14:07:38

@otfrom - well, yeah, I grant you that, but happier than web-sitey interfacey yubnub

maleghast14:07:04

@mccraigmccraig - It’s an app that will periodically (every hour / day not sure yet) make calls to an API, stash the returned data in a database and an ElasticSearch cluster, and then do it all again the next time.

mccraigmccraig14:07:36

@maleghast you might want to add [4] concatenate results from each of the page queries into a single record stream

mccraigmccraig14:07:48

but what will consume the eventual record stream ?

maleghast14:07:51

This makes the API into a smaller, custom dataset that can be interrogated via Kibana

maleghast14:07:38

@mccraigmccraig a Human, at first, and later on an app that I am not developing that will get the data back out from ES programmatically and do “stuff” to it that does not concern me 🙂

maleghast14:07:09

@mccraigmccraig - I am not saying I don’t want to add “[4] concatenate results from each of the page queries into a single record stream”, but I can’t think of why I would do that, and that is probably me being ignorant of the benefits etc. Please could you explain to me why I would add this step - I really am asking, not being a prick, I promise 🙂

mccraigmccraig14:07:29

haha, well do you want to expose your downstream consumers to an additional level of structure (pages) which is an implementation feature of the upstream API ?

maleghast14:07:05

I want to take each of the 100 / 1000 / 10000 results and store them as individual documents in ES and as JSONB fields in Postgres

maleghast14:07:42

The API I am “harvesting” has a 90 day sliding window, so over time the queries I make will have different results. I don’t want to keep track of the last article I harvested, nor do I want to have to “find” it in the results to then get all the newer ones. It’s easier to just “eat” the whole response every time and rely on ES refusing to re-import a document with an existing id (into the same index) and on Postgres’s ability to enforce a “unique” index on the id field.

maleghast14:07:39

but I can’t “get” all of the results in one query, the API limits “pages” to 1000 results, so I need to be able to stack up calls and execute them in an async, non-blocking manner.

mccraigmccraig14:07:47

yep, so you can concatenate the pages into a single record-stream, and process each of the records individually
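
Concatenating pages into a single record stream can be done with the `cat` transducer on a channel (page contents are illustrative):

```clojure
(require '[clojure.core.async :as a])

;; the cat transducer flattens pages of records into one record channel
(def pages   (a/to-chan [[{:id 1} {:id 2}] [{:id 3}]]))
(def records (a/chan 16 cat))
(a/pipe pages records)
(a/<!! (a/into [] records)) ; => [{:id 1} {:id 2} {:id 3}]
```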

maleghast14:07:28

OK, I like the sound of this in principle, and I _think_ I am sort of doing that already with the synchronous, manual approach, as I get 100 articles back and then I do a doseq over the vector of maps to do INSERT queries into postgres and PUT calls to ES

maleghast14:07:37

What do you mean by a “record-stream”?

maleghast14:07:46

(I think I am misunderstanding terms)

mccraigmccraig14:07:41

by "record stream" i mean a conceptual sequence of individual records... could be on a core.async chan or a manifold stream

maleghast14:07:31

OK… I guess that I could, I just don’t know what the benefit of doing that is…

maleghast14:07:57

(and I do want to know, I am feeling ignorant and helpless, not belligerent)

mccraigmccraig14:07:15

the benefit is just simplicity - a sequence of individual records is a simpler thing than a sequence of pages of individual records... but there are tradeoffs - sometimes you want to deal with pages of records

maleghast14:07:49

Oh I see! Right, yeah, I was just going to consume the channel of returned promises with the doseq I already have, so concatenating them together into one HUGE vector first seemed like a redundant step.

maleghast14:07:51

i.e. one of the queries I am going to do returns (currently) a little over 13,000 records - I was expecting to grab the results of 14 promises off the channel and “doseq” each one until the channel was empty

maleghast14:07:57

I suppose I could consume them off the channel into one big vector, or indeed another channel and then have another consumer running what currently runs inside the doseq on each map / JSON blob that comes out of the channel… Is that what you mean?

maleghast14:07:48

so: channel of promises → consumer turns vector of maps into another channel of individual maps → consumer2 puts maps into DB and ES off second channel ??

mccraigmccraig14:07:07

i meant another channel @maleghast, yes

maleghast14:07:35

possibly even: consumer2 puts maps into DB and onto another channel → consumer3 puts maps on third channel into ES

maleghast14:07:04

This may indeed, now I grok your meaning, be an even better idea, yes 🙂 Thanks 🙂

mccraigmccraig14:07:17

(as an aside @maleghast , doing any long-running or blocking processing in a vanilla core.async go block isn't a good idea - there is a fixed-size core.async threadpool which you can exhaust, causing blocking - so you can use https://clojure.github.io/core.async/index.html#clojure.core.async/thread )

maleghast14:07:15

@mccraigmccraig - This is also good to know, as I would have assumed that the go block macro was “managing” the thread pool… Thanks 🙂

maleghast14:07:49

@mccraigmccraig - So if I use thread inside a go block, or instead of a go block..?

maleghast14:07:08

(Sorry, the Clojure docs are a little bit opaque to me at times)

mccraigmccraig14:07:37

(a/go
  (let [v (a/<! (a/thread (do-blocking-stuff)))]
    (do-non-blocking-stuff v)))

mccraigmccraig14:07:43

something like that

maleghast14:07:08

That’s what I thought, but I wasn’t sure enough - thanks for the clarification 🙂

Rachel Westmacott14:07:55

also, beware long-running processes in core.async that expand items with eg. mapcat operations. You can break back pressure that way. (ie. pages on a channel being expanded into multiple events)

mccraigmccraig14:07:34

ooo i haven't come across that problem @peterwestmacott ... what happens ?

Rachel Westmacott14:07:13

you’re not likely to hit it unless you are using a lot of xforms on your channels, and then it’s easily worked around, but it can work fine in test, and then blow up in prod with more data/longer running processing

maleghast14:07:01

*raises eyebrow* This is definitely worth knowing, thanks 🙂

maleghast15:07:38

Mmmmmm Are you getting that Angry Orchard stuff that comes from America..? That stuff is YUMMY

mccraigmccraig15:07:32

i was thinking of something a little harder 😬 https://www.ciderbrandy.co.uk/shop.html

mccraigmccraig15:07:43

that still counts as cider, right ?

maleghast15:07:29

(now I want some too)

agile_geek16:07:40

Oh the joy of being behind a corporate firewall unable to see anything... how has everyone been today? Anything good happened?

mccraigmccraig17:07:35

we had the agm of the async clojure appreciation society @agile_geek 🙂

jonpither17:07:21

Love a bit of core async personally.

dominicm20:07:56

I've heard the sound of @jonpither approaching a project causes (:require [clojure.core.async :refer :all]) to appear in every file.

otfrom22:07:01

:shudder: :refer :all is the devil outside of a test namespace