#clojure
2020-05-11
Daniel Aamer14:05:58

Hey Clojurians!! Nice to meet you all! Is anyone in Europe here? I have a remote long-term project with a leading client where I'm looking for an experienced Clojure/ClojureScript developer with Java microservices BE and AWS skills too. If anyone is interested in a highly paid remote role, please get in touch with me ASAP 🙂 Have a nice day, hope everyone is staying safe. Many thanks, Daniel

dominicm14:05:31

You should move this message to #jobs

dominicm14:05:03

Oh, I see you have. You should remove this message, as it's not the appropriate channel.

dominicm16:05:46

Same for your #clojurescript message too

seancorfield17:05:01

Both messages removed (as Admin).

plins19:05:04

any recommendations on a mature HTTP client that supports core.async by default? it's server-side, so it doesn't need to be compatible with CLJS

ghadi19:05:12

@plins it's kind of a gap

ghadi19:05:45

requests with cognitect.http-client return a channel, but cognitect.http-client is designed for a very specific use-case

ghadi19:05:11

I had a wrapper of `java.net.http` that returns you a channel...

ghadi19:05:22

Separately, I've been working on adapting the client within Cognitect Labs' aws-api to support broader use-cases.

ghadi19:05:21

you can always take one of the existing clients and adapt it to return you a channel
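A minimal sketch of that adaptation, using the JDK's built-in `java.net.http` client mentioned earlier (assumes Java 11+ and core.async on the classpath; `get->ch` is an illustrative name, and error handling is elided — a failed request leaves the channel open here):

```clojure
;; Sketch only: adapt the JDK HTTP client to return a core.async channel.
(require '[clojure.core.async :as async])
(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse HttpResponse$BodyHandlers)
        '(java.util.function Function))

(defn get->ch
  "Send an async GET for url; returns a channel that yields one
  {:status .. :body ..} map and then closes."
  [^HttpClient client ^String url]
  (let [req (.build (HttpRequest/newBuilder (URI/create url)))
        ch  (async/chan 1)]
    (-> (.sendAsync client req (HttpResponse$BodyHandlers/ofString))
        (.thenApply (reify Function
                      (apply [_ resp]
                        (async/put! ch {:status (.statusCode ^HttpResponse resp)
                                        :body   (.body ^HttpResponse resp)})
                        (async/close! ch)))))
    ch))

;; usage: (async/<!! (get->ch (HttpClient/newHttpClient) "https://example.com/"))
```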

ghadi19:05:40

but IMHO existing clients have a lot of dependencies

lilactown20:05:55

what’s the use case for an http client that returns a channel?

lilactown20:05:07

sorry if that’s a #stupid-question

noisesmith20:05:24

when you think http requests are cheaper than threads?

bfabry20:05:24

most of the time you want IO to happen in the background

bfabry20:05:45

well, that's a bad generalisation. but it's often helpful

bfabry20:05:27

if I request data from 8 different web services and the requests don't depend on each other it's nice if something goes off and gets all that concurrently

noisesmith20:05:56

@lilactown real world use case: to get a result for a client, I need data from 5 apis in parallel - none of the requests wait on results from the others
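That fan-out/fan-in shape maps directly onto core.async's `merge`/`into`; a sketch, where `fetch-ch` is a stand-in for any function returning a channel with one result (such as a non-blocking HTTP call):

```clojure
(require '[clojure.core.async :as async])

;; stub: a real version would make a non-blocking request instead
(defn fetch-ch [api]
  (async/thread {:api api :data (str "payload from " (name api))}))

(defn fetch-all
  "Hit every API concurrently; block until all results are in.
  Result order is not preserved."
  [apis]
  (async/<!! (async/into [] (async/merge (mapv fetch-ch apis)))))

;; (fetch-all [:users :orders :billing :search :audit]) => 5 result maps
```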

bfabry20:05:30

there's a paradigm where all IO is async by default, and you have to be explicit about what things you want to wait for. it's pretty popular

noisesmith20:05:23

sometimes it's actually a good idea and not just js stockholm syndrome :D

bfabry20:05:43

haha. yeah vert.x is not quite the same thing as node.js 😆

phronmophobic20:05:23

right, but shouldn’t how the http call gets handled (asynchronous, synchronous, channels, which thread, etc) be left to the consumer of the http library rather than bundled with the http library itself?

noisesmith20:05:43

there's a decent argument that syncing on something async has fewer gotchas than going async on something sync

noisesmith20:05:08

we do have good concurrency tools (futures, core.async, thread pool executors) but there are gotchas

lilactown20:05:00

yeah, I’m just pattern matching on the rule of thumb that “io inside of a go channel is bad”

noisesmith20:05:29

channels don't go inside go, go goes inside channels

lilactown20:05:40

sounds like having async IO that returns a channel representation of that async process is what we’re talking about, not actually running the request inside of a go?

noisesmith20:05:54

it's easy enough to return a channel and deliver to it without using a go thread
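A minimal sketch of that: run work on a plain thread and deliver the result to a channel by hand, with no go block and no core.async thread pool involved (`async-call` is an illustrative name; assumes f returns a non-nil value, since channels reject nil):

```clojure
(require '[clojure.core.async :as async])

(defn async-call
  "Run (f) on a fresh thread; return a channel that receives the result
  and then closes."
  [f]
  (let [ch (async/chan 1)]
    (.start (Thread. ^Runnable (fn []
                                 (async/put! ch (f))
                                 (async/close! ch))))
    ch))

;; (async/<!! (async-call #(+ 1 2))) ;=> 3
```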

noisesmith20:05:21

I was recently looking at guile fibers, it's an interesting compare/contrast (instead of go recompiling to a state machine, they can use continuations; just like us they need to worry about blocking calls eating up their thread pool)

hiredman20:05:20

https://wingolog.org/archives/2017/06/29/a-new-concurrent-ml is a very good blog post covering some of that work in guile

💯 4
lilactown20:05:53

fibers/continuations at the language level seem like such a good idea for the 80% case, we just need some behemoth to throw a bunch of money at making it good enough at the 20% where it kinda sucks rn

noisesmith20:05:25

well, they have continuations at the language level, fibers are userspace just like our go blocks are

👍 4
noisesmith20:05:09

they fixed multiple bugs in their compiler (and more to come) due to problems exposed by using fibers as intended, which is cool

noisesmith20:05:44

one thing i like - instead of our go which puts your code into a singleton async scheduler, fibers have run-fibers which creates a new scheduler to run the body in - so for example two libraries can each use their own fiber scheduler if they are independently async and don't need to coordinate

dominicm20:05:25

I'm not sure how practical that is. The reality is that you really want a single thread pool in your application. You can't reserve cores*2 threads per scheduler, on up to infinity. In theory I'm guessing that core.async could do that by not using a singleton?

noisesmith20:05:18

probably - I'm not sure how much (if any) of the current code relies on there being a single instance of the thread pool - eg. what would happen if one channel was used in two go blocks inside different schedulers? since the continuation of a go block is expressed as a channel callback, I'm not sure how that should work

noisesmith20:05:54

in fact, the least disruptive thing would be channels attached to schedulers, not go blocks, but that's very counterintuitive

lilactown20:05:31

yes that makes sense to me

ghadi20:05:35

go/thread isn't the dilemma here

ghadi20:05:59

you can avoid using either of them and submit normal functions to an ExecutorService that yield things on channels when done

ghadi20:05:14

(let [ch (async/chan 1)]
  (.submit executor
    (fn []
      (async/put! ch (http/request ....))))
  ch)

ghadi20:05:18

is the general pattern

lilactown20:05:25

aren't you just re-creating go ?

lilactown20:05:42

but inside your own executor

hiredman20:05:12

I suspect grouping fibers like guile does is more about the ability to wait for them all to finish than it is about scheduling

noisesmith20:05:28

and to cancel them as a group even?

hiredman20:05:41

maybe, dunno

noisesmith20:05:49

since their vm allows canceling from the outside

ghadi20:05:05

@lilactown I don't understand your question

hiredman20:05:19

I think guile is single thread, so having multiple "schedulers" on top of a single thread would be really odd

noisesmith20:05:47

fibers use multiple real OS threads, and they've had a thread API for a while

noisesmith21:05:03

(I mean a real one, not limited by a GIL on gc)

ghadi20:05:34

having a process return a channel allows you to use all the coordination facilities in core.async

ghadi20:05:40

which are very very useful

hiredman20:05:56

but other than being a big fan of http://wingolog.org I don't know much

ghadi21:05:42

here is an example of shelling out to ffmpeg and returning a channel:

ghadi21:05:00

returns you a clojure map on the channel

ghadi21:05:13

util/future->ch is a couple lines:
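(The gist itself wasn't captured in this log; a plausible couple-of-lines version, assuming core.async, with nil results and cancellation left unhandled:)

```clojure
(require '[clojure.core.async :as async])
(import '(java.util.concurrent CompletableFuture)
        '(java.util.function BiConsumer))

(defn future->ch
  "Adapt a CompletableFuture to a channel: the channel receives the
  value (or the Throwable, as a value) and then closes."
  [^CompletableFuture cf]
  (let [ch (async/chan 1)]
    (.whenComplete cf
      (reify BiConsumer
        (accept [_ v ex]
          (async/put! ch (or v ex))
          (async/close! ch))))
    ch))

;; (async/<!! (future->ch (CompletableFuture/completedFuture 42))) ;=> 42
```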

lilactown21:05:23

I believe you; I'm very #beginners in my understanding of Java threads / Clojure core.async.

lilactown21:05:21

I was thinking of sort of the general core.async case where you want to throw some computation to be scheduled to be run, and it might await other channels

ghadi21:05:39

you can't do blocking IO calls within async/go blocks; you must communicate with channels

dpsutton21:05:50

i get future->ch but what is future-map?

ghadi21:05:18

future-map is basically fmap with the CompletableFuture

lilactown21:05:47

I thought we were talking about the general case - I put some code in a go block and it gets scheduled to run on some thread pool or what have you

bfabry21:05:04

isn't it more you can do blocking IO there, but you should not, because that thread pool has been sized specifically for doing compute?

lilactown21:05:13

and when you said that go/thread isn't relevant, and you can create your own function/macro that runs the code on the executor

lilactown21:05:48

it sounded like you were saying the solution was to essentially create your own go construct that runs on the executor of your choice

lilactown21:05:02

which is fine, but seems pretty onerous

ghadi21:05:23

that is not what I'm saying

ghadi21:05:40

use whatever thread pools you want, but I coordinate processes using channels

ghadi21:05:58

I wrote a process that crawls a filesystem, looking for all video files, and launches ffmpeg using the above code to extract metadata from all the videos. the process runs max 20 ffmpeg calls in parallel

ghadi21:05:08

it lights up the cores:

ghadi21:05:09

the clojure process doesn't even register on the map

ghadi21:05:29

one thread walks the filesystem and pumps all videos encountered onto a channel

ghadi21:05:52

another process in the middle takes video files and shells out to ffmpeg, subject to the concurrency limit

ghadi21:05:08

and it places its result maps on a third channel

ghadi21:05:27

that channel contains datomic transaction data

ghadi21:05:50

so I can answer questions like "What files have h.264 streams that are more than 720p?", etc.

👀 4
ghadi21:05:22

wherever I said "ffmpeg" I could replace that with "http request"

ghadi21:05:51

HTTP is a little bit complex though. If you're sending JSON or receiving JSON, in which thread pool does the JSON get written/read?

ghadi21:05:04

the HTTP client's pool? caller's thread?

ghadi21:05:16

a lot of libraries muddy the waters

lilactown21:05:23

I'm still a little confused, since it didn't sound like we were talking about I/O before

lilactown21:05:08

it does make sense paired with the previous part of the convo when I was confused about a question about http requests and core.async

lilactown21:05:42

I thought you were replying to > in fact, the least disruptive thing would be channels attached to schedulers, not go blocks, but that's very counterintuitive

lilactown21:05:47

which it actually sounds like is what happens in practice, but channels returned by go blocks are inherently tied to whatever thread pool core.async comes with by default, which it sounds like is different than guile's fibers that allow you to pick the scheduler at runtime

ghadi21:05:22

that's right, go blocks run within the core.async managed pool

noisesmith21:05:23

go translates channel ops into code that attaches callbacks to channels

hiredman21:05:47

the jvm has had a default threadpool for a while now too (https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/ForkJoinPool.html#commonPool--) which core.async and clojure don't use

noisesmith21:05:56

is it go or the channel that decides where the execution happens?

hiredman21:05:24

the channel

phronmophobic21:05:32

I thought you could do all of the channel operations on whichever thread you want with the blocking (`!!` suffix) operations

noisesmith21:05:48

those compile to the non-blocking calls internally

hiredman21:05:14

the channels directly add thunks invoking callbacks to core.async's threadpool, but an alternative implementation would be the channels directly invoking the callbacks and relying on them to do work on whatever threadpool they want

phronmophobic21:05:14

my understanding was that if you want to use a default threadpool, then use go blocks. if you want more control over scheduling and threads, use the blocking operations (`!!` suffix)

hiredman21:05:56

I imagine it was just luck of the draw that channels ended up putting stuff on dispatch instead of the other way around

hiredman21:05:07

but there would even be some benefits to having the callbacks control the thread pooling/queueing

hiredman21:05:00

right now the way real blocking operations are implemented is a callback is attached to the channel which fulfills a promise when run, and the operation blocks on that promise

hiredman21:05:44

that callback still gets run on the core.async pool, so even if you are only ever doing real blocking stuff you are running lots of small little tasks on the core.async pool

hiredman21:05:21

if the callback got to decide if it ran on an executor or not it could skip the executor for just fulfilling the promise

ghadi21:05:04

@smith.adriane https://groups.google.com/d/msg/clojure/yUIl2tl4HoM/awXms4n4AAAJ
> In general, you should never directly or indirectly use blocking IO operations in go block threads. The go block threads are a fixed pool of (by default 8) threads. If you block enough of these threads, you lock up the pool, potentially in unrecoverable ways.
> This release contains a new Java system property (intended primarily for development use) that will throw if core.async blocking operations (anything ending in "!!") are used in a go block. The exception will be thrown in a go block thread which by default will bubble up to the ThreadGroup's uncaught exception handler and get printed to stderr. You can also set Thread.setDefaultUncaughtExceptionHandler() if you want to do something else. Note that this only catches one set of blocking calls; other blocking IO is equally as problematic and will not be caught with this flag.
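For reference, that dev-time check is enabled via a JVM system property; the property name below is taken from the core.async changelog around that release and should be verified against the release notes for your version:

```shell
# Dev-time check: throw when a blocking !! op runs inside a go block.
# Property name per the core.async changelog (verify for your version).
clj -J-Dclojure.core.async.go-checking=true -m my.app
```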

ghadi21:05:57

(That's not the latest release anymore, since @alexmiller released more good stuff this morning)

phronmophobic21:05:10

:thumbsup:. I wasn't recommending using blocking operations inside of go blocks. Is it still reasonable to use your own threads with blocking operations if you want more control over scheduling and threads?

ghadi21:05:49

ok cool, just clarifying. And, BYO threads/executors when desired

👍 4
ghadi21:05:22

In the system I mentioned above:
• the filesystem pump uses async/thread
• the ffmpeg stuff is a CompletableFuture that dumps onto a channel
• the concurrency limiter in the middle is a single go block
• the main thread reads the results of video metadata extraction

phronmophobic21:05:36

outside of cljs, I typically avoid using go blocks except for temporary solutions

dpsutton22:05:39

@ghadi do you mind sharing what the concurrency limiter looks like? I'm trying to think of what that looks like with completable futures

ghadi22:05:06

sure. the concurrency limiter is a variant of core.async/pipeline

ghadi22:05:13

it knows nothing about completable futures

ghadi22:05:42

its input, like clojure.core.async/pipeline-async, is a function that returns a channel

ghadi22:05:01

docstring:

ghadi22:05:05

"Runs asynchronous function 'af' on each input from channel 'in',
   producing results to channel 'out'. af is presumed to return a channel

   Input order is *not* preserved.

   Runs af with maximum 'max' concurrency. max can be an integer
   or a function returning integer (allowing dynamic concurrency
   control)

   close?, default true, controls whether the output channel is closed
   upon completion"

ghadi22:05:32

args [max af in out]

ghadi22:05:58

which uses a CompletableFuture internally, because that's what java offers you when shelling out

ghadi22:05:06

(it's a Java 9+ thing)

ghadi22:05:01

why not use clojure.core.async/pipeline-async? 1) I don't care about preserving input order 2) there is a tiny bug (maybe?) in pipeline-async's concurrency limit, where you might get N+2 tasks in flight
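For contrast, a sketch of the stock `clojure.core.async/pipeline-async` (which does preserve input order); note the documented contract that the async function must be non-blocking and must close its result channel when done — `square-af` and `run-squares` are illustrative names:

```clojure
(require '[clojure.core.async :as async])

(defn square-af
  "af for pipeline-async: kick off work, put the result, close the channel."
  [v result-ch]
  (async/go
    (async/>! result-ch (* v v))
    (async/close! result-ch)))

(defn run-squares []
  (let [in  (async/to-chan (range 5))
        out (async/chan 5)]
    ;; at most 3 invocations of square-af in flight at once
    (async/pipeline-async 3 out square-af in)
    (async/<!! (async/into [] out))))

;; (run-squares) ;=> [0 1 4 9 16]
```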

dpsutton22:05:37

i've never looked at the source of pipeline. that's quite nice

ghadi22:05:19

the key here is not caring about anything about af besides it takes an input and returns you a channel with eventual result

ghadi22:05:45

that way a specific af is free to do whatever particular resource mgmt it needs

ghadi22:05:00

hope that helps!

dpsutton22:05:24

always love to glean from your examples. thank you!

hiredman22:05:57

you can directly use a CompletableFuture as a ReadPort (the read half of a channel)
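A rough illustration of that idea. `ReadPort` lives in the internal `clojure.core.async.impl.protocols` namespace, so this is a sketch against an unsupported API, and it glosses over the handler locking that core.async's own channels perform around `commit`:

```clojure
(require '[clojure.core.async :as async]
         '[clojure.core.async.impl.protocols :as impl])
(import '(java.util.concurrent CompletableFuture)
        '(java.util.function BiConsumer))

(extend-type CompletableFuture
  impl/ReadPort
  (take! [cf handler]
    (if (.isDone cf)
      ;; value already there: return it synchronously as a derefable
      (when (impl/commit handler)
        (delay (.join cf)))
      ;; not done yet: register a callback that feeds the taker later
      (do (.whenComplete cf
            (reify BiConsumer
              (accept [_ v _ex]
                (when-let [f (and (impl/active? handler) (impl/commit handler))]
                  (f v)))))
          nil))))

;; after this, (async/<!! (CompletableFuture/completedFuture 7)) ;=> 7
```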

ghadi22:05:53

that is a very neat gist. I had to do some blocking IO on the process stdout/stderr right as soon as it was completed, so I stayed in future land before converting to channel