#clojure-uk
2017-07-24
thomas07:07:05

morning 😼 ... got back from EuroClojure rather late last night.

thomas07:07:16

how have things been here?

yogidevbear07:07:19

I'm pretty sure I know what the answer will be, but I'll ask anyway 🙂 How was EuroClojure, @thomas?

yogidevbear07:07:36

Best of luck, @agile_geek 👍

yogidevbear07:07:39

I have my son at home for the next 7 weeks and I work from home so having my own sense of dread 😉 He's already starting to bounce off the walls

thomas08:07:15

and #euroclojure was fantastic as always... albeit a very hot first day (aircon didn't work), @yogidevbear

thomas08:07:29

lots of old friends and even more new ones.

yogidevbear08:07:30

Hopefully I will be able to come next year 😄

thomas08:07:30

Good morning @elise_huard thank you again for your great talk! really informative!

elise_huard08:07:14

thank you @thomas, appreciate it! I really enjoyed the other talks as well, there was definitely a theme

otfrom09:07:40

@agile_geek I'd suggest hugging all the juxtaposers on the first day and all the bankers on day 2

agile_geek09:07:20

I'm not hugging anyone in a bank

thomas09:07:57

there was indeed @elise_huard there were some very good ones. and I also really liked all the sketching that was going on.

otfrom09:07:52

@agile_geek bank hugs have been some of my best hugs. Even in Swiss banks in the wharf

otfrom11:07:02

@thomas involves more suits usually

glenjamin11:07:11

Everyone hugs in one go overnight

maleghast12:07:18

I am missing the Clojure Community's propensity to hug (in person, I mean)

maleghast12:07:29

Morning / Afternoon, everyone 🙂

maleghast12:07:44

If I were to admit that I've never got my head around core.async (never had cause / a use for it), and that I was wondering if anyone had a particular recommendation vis à vis a primer / intro tutorial, as I think that I have a use for it now, what would people say..?

otfrom12:07:05

I'd quite like that too. Esp one that deals with core.async + transducers

otfrom12:07:18

tho if I carry on as I am, I might be writing one

Rachel Westmacott12:07:07

I suspect there are a few people on this channel who could answer specific questions

Rachel Westmacott12:07:42

the main thing to remember when playing at the REPL is that if you don't create channels with non-zero sized buffers, or don't perform operations in separate threads or go-blocks, then you can end up blocked quite easily
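A minimal REPL sketch of the pitfall being described here (assumes core.async on the classpath; the buffer sizes and values are arbitrary):

```clojure
(require '[clojure.core.async :as a])

;; An unbuffered channel is a rendezvous: a put blocks until a taker
;; arrives. Doing the put on the REPL thread with no consumer would
;; hang, so run it on a separate thread instead:
(def unbuffered (a/chan))
(a/thread (a/>!! unbuffered :hi))  ; put parked on its own thread
(a/<!! unbuffered)                 ; => :hi

;; A channel with a non-zero buffer accepts puts immediately:
(def buffered (a/chan 10))
(a/>!! buffered :ok)               ; returns at once, buffer has room
(a/<!! buffered)                   ; => :ok
```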

Rachel Westmacott12:07:39

…and in production be sure to catch exceptions, or at least log them. It can be too easy to lose errors (and the processes that threw them)
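One hedged way to follow that advice: catch per-message, inside the loop. `handle!` is a hypothetical handler and `println` stands in for a real logger:

```clojure
(require '[clojure.core.async :as a])

;; Consume a channel, catching per-message exceptions so one bad
;; message neither vanishes silently nor kills the consuming loop.
(defn start-consumer [in handle!]
  (a/go-loop []
    (when-some [msg (a/<! in)]
      (try
        (handle! msg)
        (catch Exception e
          (println "failed processing" msg ":" (.getMessage e))))
      (recur))))
```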

maleghast12:07:31

@peterwestmacott - Thanks for the hints 🙂

maleghast12:07:56

@otfrom - Let me know if you write that tutorial, I might have need of it 😉

Rachel Westmacott12:07:22

there was actually a EuroClojure talk (that doesn't seem to be up yet) about a library that built on top of core.async because they felt it was too low level

otfrom12:07:45

is this the clojure otp one?

maleghast12:07:03

Oooh, I like the sound of that!

maleghast12:07:23

(I am lazy, and have spent a career building apps on top of the work of smarter folk 😉 )

maleghast12:07:54

(One day, just once, I would like to have the time + inspiration to break the rule, in fact…)

Rachel Westmacott13:07:31

another approach is to use manifold. I've not spent nearly as much time with it as I have with core.async, but so far I've found it pretty plain sailing.

Rachel Westmacott13:07:14

its use of deferred values makes it harder to block the REPL - but obviously the buffering still has to happen somewhere!

mccraigmccraig13:07:38

i like manifold for general async stuff more than core.async

glenjamin13:07:03

if you make a note of the nrepl port when you start a repl, you can always connect a second repl to the same process to recover from accidental blocking

otfrom13:07:55

I think I mostly want to use core.async as a way of chaining together bits of data processing. Tho I do see how manifold would be good for that too

maleghast13:07:33

Yeah, I want to put a bunch of calls to an external API on a channel and have them "do their thing" in their own time in the background. This *seems* to be a good use for core.async… If Manifold is as good / better, then that would be interesting too, but I am really unfamiliar…

maleghast13:07:45

(whereas I know what core.async does)

mccraigmccraig13:07:58

@otfrom if your processing is of streams of data then i've found core.async is fine (as are manifold Streams) ... if your ops are better suited to promises then i prefer manifold Deferred ... and if you are wanting streams of promises then manifold is a more complete solution (because the manifold stream and deferred work easily together)
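A tiny sketch of the two manifold abstractions being contrasted here (assumes the `manifold` dependency on the classpath; the values are arbitrary):

```clojure
(require '[manifold.deferred :as d]
         '[manifold.stream :as s])

;; Deferred: a promise with chaining - suits single async results
(def answer (d/chain (d/success-deferred 20) inc #(* 2 %)))
@answer ; => 42

;; Stream: a sequence of values with buffering and backpressure
(def pages (s/stream 10))   ; buffer of 10
(s/put! pages {:page 1})
@(s/take! pages) ; => {:page 1}
```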

mccraigmccraig13:07:41

@maleghast API calls are generally more promise-like than stream-like (unless your result is an SSE stream or websocket etc)

otfrom13:07:59

I might look again at manifold streams then.

mccraigmccraig13:07:11

there are promise-chans in core.async too though
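For reference, a hedged REPL sketch of the core.async promise-chan just mentioned:

```clojure
(require '[clojure.core.async :as a])

;; A promise-chan delivers at most one value; once delivered, every
;; take (including repeated takes) sees that same value.
(def result (a/promise-chan))
(a/put! result 42)
(a/<!! result) ; => 42
(a/<!! result) ; => 42  (subsequent takes keep returning it)
```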

maleghast13:07:18

@mccraigmccraig - This is VERY true… So Manifold might be a better option, in that case?

otfrom13:07:23

I've generally thought of manifold as being something I would use when I want to collect a number of remote resources together in some kind of let

otfrom13:07:28

which isn't very stream like

mccraigmccraig13:07:04

manifold has two distinct core abstractions - the deferred, which is a promise with callbacks and additional machinery, and the stream which is a umm stream of values, and supports buffering, backpressure etc

maleghast13:07:27

@mccraigmccraig - OK, so here's the problem… First call to the API I will get a total back as part of the result and as there is no paging functionality "built in" on the API in question I will need to take the total and figure out how many more calls I need to make to get the "rest" of the data. I was going to do this by throwing the calls onto a channel using core.async… I am sensing that their promise-y nature would, you feel, be better suited to Manifold Deferred..?

mccraigmccraig13:07:18

a stream or a channel works quite well for that sort of query @maleghast

mccraigmccraig13:07:10

in my data access lib we actually use a promise of a stream for that sort of access

maleghast13:07:44

I think that you may have gone past my understanding barrier…

mccraigmccraig13:07:03

(which is an improvement over just a plain stream because it allows easy mixing with other calls which return promises of a value)

maleghast13:07:46

I was intending to stack up API operations in a channel as a queue so that a) I don't block execution and b) so that I don't have to use an *actual* queue (SQS / rabbitMQ etc)

maleghast13:07:18

I am starting to think I may not have understood the implications of what I want to do… 😞

mccraigmccraig13:07:56

i'm not sure - are you just trying to get a stream of records from a paginated api ?

glenjamin13:07:09

what are you trying to not block the execution of?

maleghast13:07:29

@mccraigmccraig - That's the point, the API is not paginated, I need to figure out how many pages there are and stack up the calls.

mccraigmccraig13:07:13

but you can pass an offset or something ?

maleghast13:07:13

@glenjamin - What I am looking for (and increasingly thinking I am misunderstanding) is a way to stack work up asynchronously in the background so that my call(s) to the external API don't lock up the whole app / program for minutes at a time.

maleghast13:07:47

@mccraigmccraig - yes, but there is no "next" call - the offset and page-size are arbitrary params every call, there is no way to ask for "page 2 of my last query to the API"

mccraigmccraig13:07:00

a channel of results in core.async is perfectly reasonable @maleghast , as is a manifold stream of deferred responses

maleghast13:07:45

so:
1. Make call for first page
2. Process first page (hopefully async)
3. Use total to work out how many more ops I need to make
4. Fill up channel with calls
5. Consume channel "elsewhere".
is my thesis - does that make sense..?
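That plan might sketch out like this (everything here is hypothetical: `fetch-page` stands in for the real blocking API call, and the total / page size / buffer are made up):

```clojure
(require '[clojure.core.async :as a])

(def page-size 1000)

(defn fetch-page [offset]
  ;; hypothetical blocking HTTP call; the real one would return
  ;; the page of articles plus the overall total
  {:total 3500 :offset offset :articles []})

(defn harvest []
  (let [first-page (fetch-page 0)                       ; 1. first call
        total      (:total first-page)                  ; 2/3. work out the rest
        offsets    (range page-size total page-size)
        out        (a/chan 16)]
    (a/put! out first-page)
    (doseq [off offsets]                                ; 4. fill the channel,
      (a/thread (a/>!! out (fetch-page off))))          ;    each call off-thread
    out))                                               ; 5. consumed elsewhere
```

In real code you would also want to close `out` once every page has arrived, so consumers know when to stop.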

glenjamin13:07:19

what is the main thread in this context?

glenjamin13:07:51

core.async itself uses a threadpool, so you might not need to funnel them all down one channel

maleghast13:07:54

@glenjamin the app that "runs", which in turn is a hybrid API / webapp

glenjamin13:07:05

unless you wanted to limit the number of concurrent api calls

maleghast13:07:41

@glenjamin - I would like to do as much concurrently as possible, but it's not a deal-breaker, serial is fine as long as the work can be "kicked off" and left going.

maleghast13:07:49

(i.e. in the "background")

maleghast13:07:29

order of results being processed is not important, so concurrency (particularly if it makes the whole thing faster) would be great.

maleghast13:07:35

I need to make one call to find out how many results match the request; the vendor / curator of the API in question is not prepared to produce a simplified response to figure out the size of result sets, so I am stuck with that.

maleghast13:07:41

I am assuming that I need to "def" the channel(s) and then have a form that is in an evaluated namespace that is "waiting" for the channel(s) to have something on them..?

mccraigmccraig13:07:06

a common idiom is to return channels/streams/promises as the result of a query fn @maleghast

maleghast13:07:08

OK, but how would I consume them without tying up the main thread?

maleghast13:07:23

(I am showing my n00b here, a LOT it feels like)

mccraigmccraig13:07:47

many options @maleghast - put the consuming code inside a core.async go block, or create a new channel with a transducer and pipe your first channel to that, or have your api fn take the channel you want responses put on, and pass in a channel with a transducer
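Two of those options, sketched (the channel names are illustrative, and `query!` is a hypothetical API fn that takes the caller's channel):

```clojure
(require '[clojure.core.async :as a])

;; Option: consume inside a go block
(def in (a/chan 8))
(a/go-loop []
  (when-some [v (a/<! in)]
    (println "got" v)
    (recur)))

;; Option: have the API fn take the channel responses should land on.
;; It runs the blocking call off-thread and puts the response there.
(defn query! [call-fn out]
  (a/thread (a/>!! out (call-fn))))
```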

maleghast13:07:17

@mccraigmccraig - Yeah, that's what I was meaning ^^ when I said I would need some code, somewhere in an evaluated namespace, that was effectively "listening" for there to be "things" on the channel, just a form at the end of my namespace containing a go block that was consuming a named channel onto which my API function would place "things" …

glenjamin13:07:55

most web or UI frameworks will already have an event loop to do that, i'd have expected?

glenjamin13:07:23

or maybe i'm too used to working on web apps

maleghast13:07:01

@glenjamin - I don't think that Edge does… @dominicm ?

otfrom14:07:55

honestly, you people with your web programming.

mccraigmccraig14:07:46

@maleghast are you wanting to execute the upstream query in response to an API request and return the result to the API client ?

maleghast14:07:06

@otfrom - I know what you mean, I would be done with the web if I could be…

otfrom14:07:26

@maleghast why? that's where all the glory is

mccraigmccraig14:07:06

@otfrom "grumbling ETL coder" is a tautology isn't it ?

maleghast14:07:19

@mccraigmccraig - order of ops:
1. Make first query to API
2. Process result, including calculation of how many more ops required
3. Load up a channel with the other calls

otfrom14:07:26

@mccraigmccraig yeah, like grumpy devops person

maleghast14:07:40

@otfrom - If I wanted glory I'd be an app developer!

maleghast14:07:14

@otfrom - I am happiest when I am "doing the plumbing" and nothing I develop will be consumed by a human, only by machines

maleghast14:07:29

making stuff for humans to look at is fraught with pain and suffering

otfrom14:07:52

@maleghast if it makes you happy, then you clearly aren't dealing with enough data generated by hoo-mans

maleghast14:07:56

but it's my skillset (according to other people) and so I am doing as much plumbing as I can and as little UI as I can get away with

maleghast14:07:38

@otfrom - well, yeah, I grant you that, but happier than web-sitey interfacey yubnub

maleghast14:07:04

@mccraigmccraig - It's an app that will periodically (every hour / day, not sure yet) make calls to an API, stash the returned data in a database and an ElasticSearch cluster, and then do it all again the next time.

mccraigmccraig14:07:36

@maleghast you might want to add [4] concatenate results from each of the page queries into a single record stream

mccraigmccraig14:07:48

but what will consume the eventual record stream ?

maleghast14:07:51

This makes the API into a smaller, custom dataset that can be interrogated via Kibana

maleghast14:07:38

@mccraigmccraig a Human, at first, and later on an app that I am not developing that will get the data back out from ES programmatically and do "stuff" to it that does not concern me 🙂

maleghast14:07:09

@mccraigmccraig - I am not saying I don't want to add "[4] concatenate results from each of the page queries into a single record stream", but I can't think of why I would do that, and that is probably me being ignorant of the benefits etc. Please could you explain to me why I would add this step - I really am asking, not being a prick, I promise 🙂

mccraigmccraig14:07:29

haha, well do you want to expose your downstream consumers to an additional level of structure (pages) which is an implementation feature of the upstream API ?

maleghast14:07:05

I want to take each of the 100 / 1000 / 10000 results and store them as individual documents in ES and as JSONB fields in Postgres

maleghast14:07:42

The API I am "harvesting" has a 90 day sliding window, so over time the queries I make will have different results. I don't want to keep track of the last article I harvested, nor do I want to have to "find" it in the results to then get all the newer ones. It's easier to just "eat" the whole response every time and rely on ES refusing to re-import a document with an existing id (into the same index) and on postgres's ability to enforce a "unique" index on the id field.

maleghast14:07:39

but I can't "get" all of the results in one query, the API limits "pages" to 1000 results, so I need to be able to stack up calls and execute them in an async, non-blocking manner.

mccraigmccraig14:07:47

yep, so you can concatenate the pages into a single record-stream, and process each of the records individually
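Concatenating pages into a record stream is one `mapcat` transducer on a channel (the `:articles` key is an assumption about the page shape):

```clojure
(require '[clojure.core.async :as a])

(def pages   (a/chan 4))                       ; one map per API page
(def records (a/chan 4 (mapcat :articles)))    ; individual records
(a/pipe pages records)                         ; pages flow in, records out

(a/>!! pages {:articles [{:id 1} {:id 2}]})
(a/<!! records) ; => {:id 1}
(a/<!! records) ; => {:id 2}
```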

maleghast14:07:28

OK, I like the sound of this in principle, and I *think* I am sort of doing that already with the synchronous, manual approach, as I get 100 articles back and then I do a doseq over the vector of maps to do INSERT queries into postgres and PUT calls to ES

maleghast14:07:37

What do you mean by a "record-stream"?

maleghast14:07:46

(I think I am misunderstanding terms)

mccraigmccraig14:07:41

by "record stream" i mean a conceptual sequence of individual records... could be on a core.async chan or a manifold stream

maleghast14:07:31

OK… I guess that I could, I just don't know what the benefit of doing that is…

maleghast14:07:57

(and I do want to know, I am feeling ignorant and helpless, not belligerent)

mccraigmccraig14:07:15

the benefit is just simplicity - a sequence of individual records is a simpler thing than a sequence of pages of individual records... but there are tradeoffs - sometimes you want to deal with pages of records

maleghast14:07:49

Oh I see! Right, yeah, I was just going to consume the channel of returned promises with the doseq I already have, so concatenating them together into one HUGE vector first seemed like a redundant step.

maleghast14:07:51

i.e. one of the queries I am going to do returns (currently) a little over 13,000 records - I was expecting to grab the results of 14 promises off the channel and "doseq" each one until the channel was empty

maleghast14:07:57

I suppose I could consume them off the channel into one big vector, or indeed another channel, and then have another consumer running what currently runs inside the doseq on each map / JSON blob that comes out of the channel… Is that what you mean?

maleghast14:07:48

so:
channel of promises
consumer turns vector of maps into another channel of individual maps
consumer2 puts maps into DB and ES off second channel
??

mccraigmccraig14:07:07

i meant another channel @maleghast, yes

maleghast14:07:35

possibly even:
consumer2 puts maps into DB and onto another channel
consumer3 puts maps on third channel into ES

maleghast14:07:04

This may indeed, now I grok your meaning, be an even better idea, yes 🙂 Thanks 🙂

mccraigmccraig14:07:17

(as an aside @maleghast , doing any long-running or blocking processing in a vanilla core.async go block isn't a good idea - there is a fixed-size core.async threadpool which you can exhaust, causing blocking - so you can use https://clojure.github.io/core.async/index.html#clojure.core.async/thread )

maleghast14:07:15

@mccraigmccraig - This is also good to know, as I would have assumed that the go block macro was "managing" the thread pool… Thanks 🙂

maleghast14:07:49

@mccraigmccraig - So if I use thread inside a go block, or instead of a go block..?

maleghast14:07:08

(Sorry, the Clojure docs are a little bit opaque to me at times)

mccraigmccraig14:07:37

(a/go
  (let [v (a/<! (a/thread (do-blocking-stuff)))]
    (do-non-blocking-stuff v)))

mccraigmccraig14:07:43

something like that

maleghast14:07:08

That's what I thought, but I wasn't sure enough - thanks for the clarification 🙂

Rachel Westmacott14:07:55

also, beware long-running processes in core.async that expand items with e.g. mapcat operations. You can break back pressure that way. (i.e. pages on a channel being expanded into multiple events)

mccraigmccraig14:07:34

ooo i haven't come across that problem @peterwestmacott ... what happens ?

Rachel Westmacott14:07:13

you're not likely to hit it unless you are using a lot of xforms on your channels, and then it's easily worked around, but it can work fine in test and then blow up in prod with more data / longer running processing
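A small illustration of how an expanding transducer can defeat a buffer limit (the numbers and the `:articles` key are arbitrary):

```clojure
(require '[clojure.core.async :as a])

;; Buffer size 1, but a mapcat step expands each input into many
;; outputs. The whole expansion lands in the buffer at once, so a
;; single put can leave far more than 1 item queued - the buffer no
;; longer bounds how much work is in flight downstream.
(def records (a/chan 1 (mapcat :articles)))
(a/>!! records {:articles (range 1000)})  ; one put, 1000 buffered items
(a/<!! records) ; => 0
```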

maleghast14:07:01

*raises eyebrow* This is definitely worth knowing, thanks 🙂

maleghast15:07:38

Mmmmmm Are you getting that Angry Orchard stuff that comes from America..? That stuff is YUMMY

mccraigmccraig15:07:32

i was thinking of something a little harder 😬 https://www.ciderbrandy.co.uk/shop.html

mccraigmccraig15:07:43

that still counts as cider, right ?

maleghast15:07:21

Sure! 😉

maleghast15:07:29

(now I want some too)

agile_geek16:07:40

Oh the joy of being behind a corporate firewall, unable to see anything... how has everyone been today? Anything good happened?

mccraigmccraig17:07:35

we had the agm of the async clojure appreciation society @agile_geek 🙂

jonpither17:07:21

Love a bit of core async personally.

dominicm20:07:56

I've heard the sound of @jonpither approaching a project causes (:require [clojure.core.async :refer :all]) to appear in every file.

otfrom22:07:01

:shudder: :refer :all is the devil outside of a test namespace