2019-12-14
Interesting stuff around core.async etc. Something I'm still trying to work out. Say, for example, you had a function that downloaded files from S3, that could be run in parallel (i.e., one call fetches files A and another fetches files B), would you launch each function in a thread? (I think I read that go blocks are not for I/O or blocking operations??)
pipeline / pipeline-blocking / pipeline-async
actually you probably don’t care about the ordering
yeah, I wouldn't use go blocks for blocking things (as you'd exhaust the pool). go blocks and things built on them (reduce/into/etc) are for things that want to use all the CPU
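A minimal sketch of the pipeline-blocking suggestion, assuming core.async 1.2 or later is on the classpath (for `to-chan!`). `fetch-object` is a hypothetical stand-in for a blocking S3 download, not part of any library. The work runs on dedicated threads rather than the go-block pool. Note that the pipeline functions deliver results in input order, even when ordering doesn't matter.

```clojure
(require '[clojure.core.async :as a])

(defn fetch-object
  "Hypothetical stand-in for a blocking S3 download."
  [k]
  (Thread/sleep 50) ; simulate blocking network I/O
  {:key k :size (count k)})

(defn fetch-all
  "Runs fetch-object over ks on up to `parallelism` dedicated threads,
  so the go-block pool is never blocked. Returns a vector of results."
  [parallelism ks]
  (let [in  (a/to-chan! ks)          ; channel that closes once ks is drained
        out (a/chan parallelism)]
    (a/pipeline-blocking parallelism out (map fetch-object) in)
    (a/<!! (a/into [] out))))        ; out is closed when in is exhausted

;; Example call: (fetch-all 4 ["a.csv" "b.csv" "c.csv"])
```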
claypoole (mentioned above) might be a way to solve some of that as at least it would be a different thread pool
(that is an ill informed guess btw, I'd want to read up more on how I'd do that first)
What I have can definitely be parallelised, for fetching files from S3 - same function can download from different buckets
tho if you have too many at the same time then you might have trouble if they are all trying to do work and the switching overhead gets you
but if you are downloading a lot from S3 you might create too many threads at once and exhaust memory (possibly by each thread holding a lot of data from s3 at the same time)
I wouldn't be launching a thread per key at the lowest level, i.e. if the buckets are organised as year=2019/month=12/day=15/hour=13, I would launch a thread at the day level, then that function would pull back all the files contained in the hour bucket.
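A rough sketch of the "one task per day" idea, using a fixed-size thread pool from java.util.concurrent so the number of simultaneous downloads stays bounded. `day-prefixes` and `fetch-day` are hypothetical helpers; a real `fetch-day` would list and download everything under the prefix, whereas this one only fabricates the hour-level names.

```clojure
(import '(java.util.concurrent Executors ExecutorService Future))

(defn day-prefixes
  "Builds one prefix per day, e.g. year=2019/month=12/day=15/."
  [year month days]
  (for [d days]
    (str "year=" year "/month=" month "/day=" d "/")))

(defn fetch-day
  "Hypothetical stand-in for a blocking fetch of every hour under a day."
  [prefix]
  (Thread/sleep 20) ; simulate blocking I/O
  (mapv #(str prefix "hour=" %) (range 24)))

(defn fetch-days
  "Runs fetch-day for each prefix on a pool of `pool-size` threads.
  Returns a map of prefix -> fetched items."
  [pool-size prefixes]
  (let [^ExecutorService pool (Executors/newFixedThreadPool pool-size)]
    (try
      (let [tasks   (mapv (fn [p] (fn [] (fetch-day p))) prefixes)
            ;; invokeAll blocks until every task is done and keeps task order
            futures (.invokeAll pool ^java.util.Collection tasks)]
        (into {} (map (fn [p ^Future f] [p (.get f)]) prefixes futures)))
      (finally
        (.shutdown pool)))))

;; Example call: (fetch-days 4 (day-prefixes 2019 12 (range 1 32)))
```

The pool size, not the number of days, caps how many threads (and how much downloaded data) are live at once.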
I'll have a play, and do some benchmarking around memory/cpu and see what I can discover 🙂
ideally you want to do async i/o (rather than threaded i/o)... then you don't much care about how many threads you are using
e.g. aleph client does async requests and returns a promise of the result... it won't block and when a response (or error) is received the promise will be resolved (or rejected)
you generally don't need to care much about which thread the response will be processed on
until you do, but when you do manifold lets you control threadpools
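To make the promise style concrete, here is a small sketch using manifold deferreds (assuming manifold is on the classpath). `fake-request` is a hypothetical stand-in for an async client call; it only simulates a response arriving later and is not aleph's API.

```clojure
(require '[manifold.deferred :as d]
         '[clojure.string :as str])

(defn fake-request
  "Hypothetical async call: returns a deferred immediately and resolves
  (or rejects) it later from another thread."
  [response]
  (let [result (d/deferred)]
    (future
      (Thread/sleep 50) ; simulate network latency
      (if (instance? Throwable response)
        (d/error! result response)
        (d/success! result response)))
    result))

(defn body-of
  "Chains processing onto the deferred without blocking the caller.
  Returns a deferred of the upper-cased body, or of an error string."
  [response]
  (-> (fake-request response)
      (d/chain :body str/upper-case)
      (d/catch (fn [e] (str "failed: " (ex-message e))))))

;; Dereferencing blocks, so do it only at the edge of the program:
;; @(body-of {:status 200 :body "ok"})
```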
yes @dharrigan, aleph and manifold will do it
there are some core.async libs too, tho i haven't looked at core.async for ages
I think aleph is too low level, I'm using the cognitect aws library (wonderful!) to connect and retrieve objects
ah, you are talking about s3 specifically... the newer aws Java libs do async properly (callback based, rather than cheaty futures), so wrapping those with manifold is def an option
but it's a bit of a rabbit hole
unless it's core to what you are doing, or you are just playing for learning, i would use an existing clj s3 client if you just want to get stuff done
ah, cool - does the cognitect aws client do async properly now?
yes, I'm using the aws-api and it's working fantastically - but sequentially - just playing around to see if I can make it faster by doing things which can be done in parallel - like downloading from multiple buckets (the order in which the files are received/processed is not important)
@mccraigmccraig I'll soon find out - I'll have a play 🙂
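For the aws-api case, one possible shape is to start every request through `cognitect.aws.client.api.async/invoke`, which returns a core.async channel, and only then wait for the responses. This is a sketch under assumptions: `get-objects` is a hypothetical helper, the bucket and key names in the usage comment are made up, and nothing here limits how many requests are in flight at once.

```clojure
(require '[clojure.core.async :as a]
         '[cognitect.aws.client.api.async :as aws.async])

(defn get-objects
  "Starts one GetObject per request map without waiting in between,
  then takes one response from each channel, in request order."
  [s3 requests]
  (let [chans (mapv #(aws.async/invoke s3 {:op :GetObject :request %})
                    requests)]
    ;; Take exactly one value per channel rather than merging them,
    ;; so this works whether or not the channels are ever closed.
    (mapv a/<!! chans)))

;; Usage (hypothetical bucket and key names):
;; (require '[cognitect.aws.client.api :as aws])
;; (get-objects (aws/client {:api :s3})
;;              [{:Bucket "bucket-a" :Key "some/key"}
;;               {:Bucket "bucket-b" :Key "other/key"}])
```

Failures come back as anomaly maps on the channel rather than as thrown exceptions, so each response still needs checking.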
one of the things i like about async stream-of-promises stuff is it makes reasoning about concurrency very explicit... operations are values, concurrency is a buffer size
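A sketch of what "operations are values, concurrency is a buffer size" can look like with manifold streams (assuming manifold is on the classpath). `fetch-async` is a hypothetical stand-in for an async download. The buffer size roughly bounds how many deferreds are pending at a time; the exact number in flight may be slightly higher because of values in transit between stages.

```clojure
(require '[manifold.deferred :as d]
         '[manifold.stream :as s])

(defn fetch-async
  "Hypothetical async fetch: returns a deferred resolved on another thread."
  [k]
  (d/future
    (Thread/sleep 50) ; simulate I/O latency
    {:key k}))

(defn fetch-stream
  "Fetches ks with roughly `in-flight` operations pending at a time.
  Returns a vector of results in input order."
  [in-flight ks]
  (->> (s/->source ks)
       (s/map fetch-async)    ; a stream of deferreds: operations as values
       (s/buffer in-flight)   ; how many started operations may queue up
       (s/realize-each)       ; a stream of realized results
       (s/stream->seq)        ; blocks until the stream is drained
       (vec)))

;; Example call: (fetch-stream 4 ["a" "b" "c" "d" "e"])
```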
what's the cognitect aws api using as its http client?
looks like the http client is pluggable https://github.com/cognitect-labs/aws-api/blob/master/src/cognitect/aws/http.clj tho i haven't found the default impl yet
ah, it's this https://github.com/cognitect-labs/aws-api/blob/master/src/cognitect/aws/http/cognitect.clj
cognitect.http-client
but i can't find the source for it
Yeah weird that it’s not in a repo; I have the jar in my local .m2 repo though, and took a look…
It looks like it’s apache licensed.
And it looks like it’s async:
- It’s built on jetty’s client with the non-blocking interface: https://www.eclipse.org/jetty/documentation/current/http-client-api.html#http-client-async
- It optionally takes a core.async channel, and always returns one which contains the response and headers or an error.
The client looks quite good actually, not sure why it’s not in a repo somewhere; might be worth adding an issue to the aws lib to ask them to publish it too.
Though I’m guessing this is deliberate, that they don’t want it to be widely used outside of the aws lib; as they probably use it internally and want to evolve it slowly.
i.e. it’s opensource but they’re kinda simulating private by not publishing a repo for it.