#clojure
2021-07-18
Jacob Rosenzweig00:07:27

In next.jdbc, is there a way to handle nested jsonb values elegantly? Using rs/as-unqualified-lower-maps will only convert the result set rows to maps, but the PGobject values stay:

{:oauth_id "google-someoauthzssz",
 :user_id "spaghettimaster98",
 :card
 #object[org.postgresql.util.PGobject 0x63332fc "{\"type\": \"ContactCard\", \"links\": [{\"url\": \"\", \"type\": \"Link.Instagram\", \"title\": \"jessicawaltz\"}, {\"url\": \"\", \"type\": \"Link.Twitter\", \"title\": \"jessicawaltz\"}, {\"url\": \"\", \"type\": \"Link.LinkedIn\", \"title\": \"cunningham74\"}, {\"url\": \"tel:+15555555555\", \"type\": \"Link.Phone\", \"title\": \"Cell\"}, {\"url\": \"\", \"type\": \"Link.Email\", \"title\": \"\"}], \"style\": {\"colors\": [\"#3F2B96\", \"#A8C0FF\"], \"gradient\": \"Vertical\"}, \"header\": {\"name\": \"Jessica Walker\", \"company\": \"Roku TV\", \"portrait\": \"jessica.jpg\", \"description\": \"...\", \"jobPosition\": \"Chief Financial Officer\", \"companyLocation\": \"Seattle, WA\"}, \"$schema\": \"\", \"version\": \"0.1\"}"],
 :inserted_at #inst "2021-03-01T07:24:59.000000000-00:00",
 :updated_at #inst "2021-03-01T07:24:59.000000000-00:00"}

seancorfield01:07:12

Also, #sql exists where folks will be more likely to answer your next.jdbc/PostgreSQL/JSON column questions.

👍 3
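For reference, the usual fix is to teach next.jdbc to unwrap PGobject columns by extending its ReadableColumn protocol; a sketch, assuming Cheshire for JSON parsing:

(require '[next.jdbc.result-set :as rs]
         '[cheshire.core :as json])
(import 'org.postgresql.util.PGobject)

(defn- <-pgobject
  "Parse json/jsonb PGobject values into Clojure data; pass others through."
  [^PGobject v]
  (if (#{"json" "jsonb"} (.getType v))
    (json/parse-string (.getValue v) true)
    v))

(extend-protocol rs/ReadableColumn
  PGobject
  (read-column-by-label [v _] (<-pgobject v))
  (read-column-by-index [v _ _] (<-pgobject v)))

With this loaded, the :card value above comes back as a Clojure map instead of a PGobject.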
Ben Sless07:07:00

How can I make sure (and do I need to?) that the head of a lazy sequence passed to a function isn't being held on to? It roughly does something like:

(declare g h foo) ; placeholders: g, h, and foo stand in for the real fns

(defn f
  [xs]
  (g (first xs))
  (h (rest xs)))

(defn h
  [xs]
  (doseq [x xs] (foo x)))

quoll13:07:03

It’s not usually an issue. In your first example, the g function gets the value from the head of the seq, but not the seq itself, so it’s OK. The h function uses doseq and the doc for that function explicitly says: > Does not retain the head of the sequence.

Ben Sless13:07:23

But will f still retain the head of the sequence until h returns?

ghadi14:07:06

no, f clears the reference to xs before the call to h

👍 3
ghadi14:07:36

in general args and let bindings are cleared before their last usage

ghadi14:07:33

in f, the compiler will push xs onto the stack, clear xs, then call rest, then call h
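A quick way to see the clearing in action (a hypothetical heap test; g and foo replaced with no-ops):

(defn h [xs] (doseq [x xs] x))

(defn f [xs]
  (first xs)       ; stand-in for (g (first xs))
  (h (rest xs)))

;; ~10 GB of byte arrays in total: if f retained the head of xs across
;; the call to h, this would throw OutOfMemoryError instead of finishing.
(f (repeatedly 1000000 #(byte-array 10240)))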

tlonist13:07:39

What do the A and I prefixes refer to in AFn, APersistentVector and IFn, IPersistentCollection, which serve as base types for Clojure's data structures?

quoll13:07:58

Abstract class, and Interface.

🙌 3
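You can see both in a vector's ancestry at the REPL (output truncated to the relevant entries):

(ancestors (class []))
;; => #{clojure.lang.APersistentVector     ; Abstract base class
;;      clojure.lang.IPersistentVector     ; Interface
;;      clojure.lang.IPersistentCollection ; Interface
;;      ...}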
tlonist13:07:43

gosh, that was very simple. Thanks!

deleted17:07:23

how do you configure the thread pools clojure uses for futures and go blocks and similar?

potetm17:07:06

What about them are you wanting to configure?

potetm17:07:18

(I’ve previously wanted to configure them, but upon reflection, decided it wasn’t necessary.)

potetm17:07:08

Right, goblocks are limited by design, futures are unlimited by design.

potetm17:07:08

ah, gotcha

potetm17:07:21

Still, you want that number to be fairly low.

potetm17:07:35

Yeah there’s an agent threadpool

potetm17:07:28

Yes. That’s why I’m bringing it up.

potetm17:07:12

The point of goblocks is that they’re lighter-weight than threads. If you devolve into spinning up a thread-per-goblock, you’ve tossed all the gains.

potetm17:07:55

Not necessarily. The nature of goblocks is that they should be parked a lot of the time.

potetm17:07:00

That’s why they scale.

potetm17:07:32

If that’s not true, you probably want to toss some CPU intensive bit or I/O intensive bit in a thread

potetm17:07:15

But of course, there are limits to that theory. I would assume that most of the time you don't want your goblock pool to exceed the number of cores on the machine.

potetm17:07:11

There are def reasons to up the threadpool. I’m saying that setting it to 100 on a 16 core machine is almost certainly a mistake.

potetm17:07:20

I’m not saying this because I’m a purist. (I’m not btw.)

potetm17:07:43

If that’s the situation you’re in, you are literally better off using thread everywhere instead of go

potetm17:07:55

Less complexity, same efficiencies.

potetm17:07:52

Well. That makes more sense to me 🙂

potetm17:07:48

Meaning: “I have a code base that used go blocks poorly. How can I fix it now?” => Okay yeah, up the threadpool 😄

potetm17:07:57

same feel 😄

potetm17:07:25

yeah deffo

alpox17:07:59

What I've done before is push CPU-intensive work into a thread (from within a go-block) and park the go block until the thread is done. I'm not sure if that is commonly done but it felt to me like it's the way to go, as go blocks should be parked most of the time and hard work done outside of the go block's threadpool

☝️ 4
potetm18:07:24

@U6JS7B99S That’s exactly the pattern.
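A minimal sketch of that pattern (crunch is a hypothetical CPU-heavy function):

(require '[clojure.core.async :as a])

(defn crunch [n] (reduce + (range n))) ; stand-in for real work

(a/go
  ;; a/thread runs the body on its own real thread and returns a channel;
  ;; <! parks this go block (rather than blocking a pool thread) until
  ;; the result is ready
  (let [result (a/<! (a/thread (crunch 100000000)))]
    (println "done:" result)))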

potetm18:07:27

core.async expects you, the dev, to be vigilant about cpu/io and push it into thread blocks

potetm18:07:52

(But to Cora’s point, if you have a big ol’ codebase where that didn’t happen, either you wander the codebase fixing it, or you do something drastic.)

alpox18:07:23

I guess that's one of the reasons why https://github.com/clj-commons/manifold is a thing 😅 (the mashup)

emccue18:07:47

@U8QBZBHGD set-agent-executor-threadpool!

emccue18:07:53

or something similarly named

emccue18:07:32

which has a future you can supply an executor service to at the call site

emccue18:07:41

(by default future uses one of the agent threadpools)
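For reference, the actual clojure.core functions are set-agent-send-executor! and set-agent-send-off-executor!; a sketch of using them (pool sizes here are arbitrary):

(import '(java.util.concurrent Executors))

;; pool used by send (CPU-bound agent actions); defaults to ncpu + 2 threads
(set-agent-send-executor! (Executors/newFixedThreadPool 8))

;; pool used by send-off *and* by future; defaults to an unbounded cached pool
(set-agent-send-off-executor! (Executors/newCachedThreadPool))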

emccue18:07:27

biased opinion of the young, but I think using futures and blocking operations is a better investment than go blocks atm

emccue18:07:24

at least if you are keeping up to date with JVMs

rutledgepaulv02:07:14

Going to share a post that I consider a "must read" if you're doing core.async work. In my opinion it's usually a sign that you're misusing go block threads if you need to increase the number above 8. Any blocking work that is being throttled by only having 8 go block threads is really work that should live on separate dedicated threads in the first place. https://eli.thegreenplace.net/2017/clojure-concurrency-and-blocking-with-coreasync/

☝️ 4
rutledgepaulv02:07:21

agree, looking forward to loom 🙂

didibus03:07:57

The threads behind GO blocks are meant for parallel computation, so it always sets them to the number of cores-ish that your computer has. At least I think, I might have assumed that

rutledgepaulv04:07:23

I think that's correct. There's no advantage to spreading CPU-bound work onto more threads(ish) than the number of CPUs, so that's one scenario where it makes sense to change core.async's dispatch pool size. It's interesting to me, however, that it defaults to exactly 8 and not to ncpu + 2 like the other pools in clojure.core for CPU-bound work. Anyway, my primary point was that sometimes people end up doing blocking work on those threads and then look to that setting as an escape hatch for a misconceived async system.

Ben Sless05:07:53

The way I came to view go blocks is as a "thread for async work", which in core.async's context is taking and putting on channels. They just let me write code which looks synchronous so I won't have to lose my mind in callback hell. My rule of thumb, and my advice to any new Clojure devs I happen to mentor at work, is to avoid (read: minimize) CPU in go blocks

Ben Sless05:07:29

Instead, use them only "mechanically" to transfer data between channels
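A sketch of such a purely "mechanical" go block; in and out are hypothetical channels:

(require '[clojure.core.async :as a])

(def in  (a/chan 8))
(def out (a/chan 8))

(a/go-loop []
  (when-some [v (a/<! in)] ; park until a value arrives (nil means in closed)
    (a/>! out v)           ; park until out can accept it
    (recur)))              ; no CPU work in between, so it is parked almost always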

didibus05:07:43

To me, they seem specifically designed for concurrent computation. They use cooperative multitasking, so it's up to you to decide when to yield a large computation. I can see that if you need some responsiveness, you don't want people not realizing their computation can choke other processes, but to actually make things faster and parallel they work great too

didibus05:07:43

I guess you could argue that for any large computation the overhead of a real thread won't matter, so maybe what you say is good practice, since you avoid ever stalling

Ben Sless06:07:44

This is slightly tangential, but I highly recommend you read this overview: https://webtide.com/do-looms-claims-stack-up-part-1/ https://webtide.com/do-looms-claims-stack-up-part-2/ Virtual threads aren't free, either. The abstraction is less "leaky" than go blocks, but there's always a price. Go blocks are not for cooperative multitasking; they are for Communicating Sequential Processes. The way they're designed (with a global thread pool), you want to use them for the communicating property. They have no runtime that can tell them "hey, suspend this task and go do something else" like virtual threads or goroutines. They can only "release" themselves when they park. If you don't let them park, you choke the thread pool.

Ben Sless06:07:12

Yes, it's leaky

potetm13:07:15

Another way to view it: CPU-bound tasks are no different from I/O-bound tasks—they both dominate a thread. The core.async threadpool is primarily meant for processes to wait for communications. (i.e. What Ben said.)

rutledgepaulv13:07:52

i think the gist with cpu bound work is that "it doesn't really matter what thread it's on, it's still going to impact everything about the same" whereas i/o bound DOES matter what thread it's on because it can "get in the way" of other work that could still be done while the machine is waiting for the i/o bound stuff to complete. Moving cpu bound work off of the go-block dispatch threads won't really change anything because the dispatch threads will still be competing with the other cpu bound worker threads for cpu time. But I don't see harm in separating them and keeping the original purpose of the dispatch threads clear.

👍 2
Ben Sless13:07:37

It does matter because they are managed by the operating system which can schedule different threads

Ben Sless13:07:09

so even if your CPU-bound threads are heating up the CPU, at some point you can let in your go pool, shuffle data around between channels, then go back to doing CPU

Ben Sless13:07:10

So it makes lots of sense separating them, because only one concurrency abstraction can give you the synchronous facade over essentially async operations (blocking on channels is actually callbacks when using >!)

rutledgepaulv13:07:53

yeah that's fair. probably that is most important in a system that uses core.async to orchestrate a mix of cpu and blocking tasks? i guess i'm imagining if you had a core.async system whose sole purpose was to do computation you wouldn't benefit a lot from separating them

Ben Sless13:07:40

Still would, what do you think pipeline does? 🙂

Ben Sless13:07:36

Let dedicated threads worry about crunching data, go blocks worry about shuffling data around. It leaks, but best case scenario is when the go blocks pool isn't busy doing things you don't want it to

rutledgepaulv13:07:10

thanks, i didn't know :compute pipelines spawned dedicated threads

potetm13:07:40

I dunno what you mean “still.”

potetm13:07:48

That’s literally the whole point.

potetm13:07:24

In case it’s not clear, node does all of this with a single unconfigurable thread.

potetm13:07:35

Using a small, dedicated threadpool for small bits here and there scales like whoa. However, if you mess it up, you hose the whole system for everyone.

Ben Sless13:07:14

We haven't reached bedrock yet 😄

Ben Sless13:07:09

You got suspended by your operating system last night and now you're back!

😂 2
didibus17:07:18

Hum, I think you've convinced me. I didn't really realize that core.async doesn't actually have fibers, has nothing similar to Go's Gosched to force yielding a process, and also doesn't yield on loops, system calls, IO, etc. That does make it quite different from Go and Erlang in that way. Since it only yields on >! and <!, I can see that it really does act simply as a callback-rewriting scheme. So GO is for async, THREAD is for blocking or compute

potetm17:07:02

iiuc it can yield on any top-level form, but that’s still extremely limited, yeah

didibus17:07:03

And in ClojureScript you just don't run heavy compute 🤣 Though actually, I checked and it looks like some people use generators to force yielding of long computations. I think you could do the same in ClojureScript and just use a chan as a way to yield throughout your computation, like so:

;; assumes (require '[clojure.core.async :as a])
(do
  (let [yield (a/chan)]
    (a/go-loop []
      (when (a/>! yield 1)
        (recur)))
    (a/go-loop [i 10]
      (a/<! yield)
      (print i)
      (if (pos? i)
        (recur (dec i))
        (a/close! yield))))
  (let [yield (a/chan)]
    (a/go-loop []
      (when (a/>! yield 1)
        (recur)))
    (a/go-loop [i -10]
      (a/<! yield)
      (print i)
      (if (neg? i)
        (recur (inc i))
        (a/close! yield)))))

alpox18:07:06

Maybe in web workers you may

didibus18:07:36

With that pattern you get proper cooperative multitasking, and can choose when to yield inside your compute

Ben Sless02:07:23

Not to toot my own horn, but doesn't it reinforce what I said about go blocks?

Ben Sless03:07:05

Messy means I get woken up at night because the server blew up

Ben Sless03:07:06

So you are correct in that it depends. But that behooves us to find what it depends on

Ben Sless08:07:35

I came to exactly the opposite conclusion from our discussion 😂 The go block pool is too big!

😅 2
Ben Sless08:07:43

(theoretically)

didibus18:07:31

I thought the +2 came from experimentation; the OS won't always perfectly allocate every thread to a core, so having a little extra probably helps get a turn on the CPU

didibus19:07:38

So, maybe we're missing another macro, like go-async. Does pipeline share the same compute pool, or does each call to it make a new pool?

Ben Sless19:07:55

Which does what?

didibus19:07:26

It would do what go currently does. So then go could be for compute, go-async to check on channels for values or put values on them.

didibus19:07:08

Where go has num-of-cores threads, and go-async has like 4 fixed threads (or the 8 we currently have)

didibus19:07:37

And thread would be used for blocking IO as it is now. I believe it already uses an unbounded cached pool?

Ben Sless19:07:16

Use pipeline for compute

Ben Sless19:07:51

and don't assume go-async won't be abused

Ben Sless19:07:54

go already is async

Ben Sless19:07:36

spawn thread in an unbounded manner for blocking IO, pipeline for compute, go for async

didibus19:07:42

Ya, I guess, but pipeline for compute has the issue that your compute must be well contained. What if it's spread across a lot of places but adds up?

didibus19:07:32

I think it also all depends on whether you want to optimize for throughput or latency, no? If you want to optimize for throughput, it seems better to do compute inside GO blocks and only move blocking IO to threads. That way you don't get the overhead of context switching all the time to check on async IO or new requests, etc. If you want to optimize for latency, then move your compute to pipeline or pipeline-async, and keep GO only for coordinating, with blocking IO still done by thread or pipeline-blocking

Ben Sless20:07:30

How will go blocks give you better throughput? Did you even check this assertion?

didibus20:07:10

I didn't try it, but I feel, logically, it would prioritize compute while making sure all cores are doing work. Unlike threads, context switching between them is cheaper (or so I assume). And because they don't yield in the middle of their compute, there's less overhead. But this is all me trying to reason about performance, which I know is hard. I'm thinking like: put all requests on an input channel. Take only as many requests as you have cores from it. Process them in a GO block; when blocking IO is needed as part of the processing, send the blocking IO to a thread, and have the go block park on the result. Make sure you configure the GO pool to the number of cores you have.

didibus20:07:15

So process only as many things as you have cores at a time, except if you need to block, park that, and start another request in the meantime.

didibus20:07:04

Now you can selectively treat some requests differently: if they're edge cases that require some super long compute, and you'd rather give others a chance, you can send those to a thread as well, or use a pipeline.

Ben Sless04:07:39

Let's work through this logically. We are considering the options go blocks vs. threads for compute. Your claim is that go blocks will be better because they will achieve higher throughput. Why? Cheaper context switches.
Go blocks represent logical processes multiplexed over a real thread pool, so you can have TWO kinds of context switches: logical and OS-level. Logical switches only happen when you park; if you occupy the thread with CPU, they will not happen. OS-level switches happen just like for regular threads.
So, if you're doing compute, you're not benefitting from go blocks being "virtual" threads, and by blocking a finite pool you're keeping code that was written correctly from benefitting as well. That leaves the question: is compute faster in a go block or a thread? Even considered in isolation, computation is faster in a thread, because go blocks rewrite your code into a state machine and threads don't. They add overhead all on their own.

didibus05:07:06

> OS level switches happen just like for regular threads
This is where, logically, I would assume it would happen less often: since there won't be as many "other threads" needing to be scheduled, I'd assume each current thread would be allowed to run longer before switching

didibus05:07:10

But, I'm thinking this is where logic probably fails us 😛, with all my assumptions. Would need to try it and benchmark I guess

Ben Sless06:07:17

No, logic is sufficient. Go blocks yield only when blocking on channels. You have plenty of OS and process threads floating about anyway. Context switching adds overhead when you have thousands of threads, not 16 vs 24

Ben Sless07:07:39

The reason OS level switches in go blocks happen just like in threads is because they are running in real threads, in the end. They're just "suspended" execution which is picked up by a real thread in a thread pool. If that thread gets suspended, same situation

alpox09:07:59

Maybe what it comes down to is a tradeoff between not hurting other parts of the app by starving message-passing and maximal performance for compute

Ben Sless10:07:53

@U6JS7B99S your analysis is correct in my opinion, which is why I concluded: • if the app has lots of compute, no need to put it on the go pool • if the app has lots of message passing, doing compute on the go pool will starve it Therefore, leave the go pool to message passing

didibus15:07:44

It won't starve it, it will delay it until one of the go blocks parks. But I'm talking about a throughput-optimized case. So yes, it will possibly take longer for a new request to be started, but it will be faster for each request to complete once they do. I don't see what you mean that it is logical because go blocks yield only on a channel? I know that, but that's actually why I think they could be more optimal, because they will only yield when they are truly waiting on something else. (Though I get your point that the underlying thread might be yielded by the OS, so whether this happens less often or not is what I can't reason about)

didibus15:07:32

I'm curious, how do you set things up then? Do you have one thread queue incoming requests and then a pipeline over that? Or do you instead allocate N threads, with an arbitrarily tuned N, for some N requests? And then do you create more threads for each blocking IO, and only use GO for callbacks on that blocking IO? How do you coordinate the result back to the request with that? Do you block the request thread using <!!?

Ben Sless16:07:23

Let's take a very simple example, read from Kafka, deserialize JSON, serialize back, write back to Kafka. How would you handle it?

didibus16:07:54

I'd have one input channel buffered to num of cores. I'd have one thread read from Kafka and >!! on the channel. I'd have num-of-cores go blocks that <! from the input channel, deserialize the JSON, transform it however we want, serialize it back to JSON, and >! on an output channel with a large n based on how much I could buffer in memory before running out. I'd have another thread <!! on the output chan and write back to Kafka. Possibly I'd add a few reader threads from Kafka or writer threads to Kafka in case I see that my GOs are waiting on them a lot. And I'd set the thread pool of GO to num of cores as well
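A rough sketch of that wiring; read-msg, write-msg, and the ser/de functions are trivial stand-ins for the real Kafka client and JSON codec:

(require '[clojure.core.async :as a])

(defn read-msg    [] "{\"n\": 1}") ; stand-in: poll one message from Kafka
(defn write-msg   [_])             ; stand-in: produce one message to Kafka
(defn deserialize [s] s)
(defn transform   [m] m)
(defn serialize   [m] m)

(def n-cores (.availableProcessors (Runtime/getRuntime)))
(def in      (a/chan n-cores))
(def out     (a/chan 4096)) ; sized by how much you can buffer in memory

;; one real thread feeding the input channel from Kafka
(a/thread
  (loop []
    (when-some [msg (read-msg)]
      (a/>!! in msg)
      (recur))))

;; num-of-cores go blocks doing the deserialize/transform/serialize work
(dotimes [_ n-cores]
  (a/go-loop []
    (when-some [msg (a/<! in)]
      (a/>! out (-> msg deserialize transform serialize))
      (recur))))

;; one real thread draining the output channel back to Kafka
(a/thread
  (loop []
    (when-some [msg (a/<!! out)]
      (write-msg msg)
      (recur))))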

Ben Sless16:07:43

Okay, what's the benefit of using the go pool in that case and not "real" threads?

didibus17:07:30

Well, in this case not much. But if you added a DB query in the middle (deserialize JSON, query db, serialize back), now your threads would block on IO. While they are blocked you'd want to process other messages to go faster. So using GO, you spawn a thread to do the DB query and the go block then <! on the thread's result chan. That will yield the GO thread underneath so you can reuse it to start processing another msg from Kafka's input chan

Ben Sless17:07:02

But then if you do CPU on the go pool you might find you can't "release" the queries fast enough, no?

Ben Sless17:07:24

I find that the more things "happen" in the program the more I want to move data between channels. If I want to move data efficiently between channels, my only option is go blocks. So I want those threads running around moving data between the threads which will block and do the actual work

didibus17:07:59

(assume 8 cores) Well, I guess it depends what "fast enough" is. Like, 8 GOs are processing 8 msgs at first; one of them now waits for IO, so it yields and a new GO picks up another message. Now 8 GOs are processing and 1 is waiting. Maybe the query is done now, and ya, the result won't be processed immediately; it'll be processed only the next time one of the 8 GOs either >! on the output chan, <! on the input, or <! on another thread's blocking IO

didibus17:07:12

So the query result will wait until one of the active GOs is done "computing" whatever it was currently processing. But this isn't "wasted" work, so you're not sitting idle. It's just that the particular msg whose query result it is will take a bit longer to complete.

Ben Sless17:07:22

You can simulate it pretty easily - take malli, use it to generate interesting data, do the ser/de regularly, instead of a query just Thread/sleep

Ben Sless17:07:40

you won't even have to go out of process

didibus17:07:27

I might try to mess with that tonight, now I'm curious. So how would you have done it? So I could compare?

Ben Sless17:07:00

Try regular threads too, and pipeline

didibus17:07:49

Regular threads I can do it. With pipeline I'm actually a bit confused how to do it. Like where would you plug the blocking IO in the middle?

Ben Sless17:07:52

pipeline-blocking 🙂
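A sketch of that simulation using pipeline-blocking, with Thread/sleep standing in for the query:

(require '[clojure.core.async :as a])

(def in  (a/chan 64))
(def out (a/chan 64))

;; 8 dedicated workers run the (blocking) transducer between in and out;
;; the go pool is never occupied by the sleep
(a/pipeline-blocking 8 out
                     (map (fn [msg]
                            (Thread/sleep 50) ; simulated blocking query
                            msg))
                     in)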

didibus17:07:02

And by regular threads, you mean just swap the GO blocks for an a/thread, correct? Cause I can also go Executor and ThreadPool, but that has a whole set of options of its own and it's not really using core.async at all

Ben Sless17:07:38

yes, async/thread

didibus17:07:15

Ya ok, oh ya duh 😅 pipeline-blocking

Ben Sless17:07:12

although you'll notice pipeline-blocking and pipeline have the same implementation under the hood (for now)

didibus17:07:27

Ya, so what's weird is that the way pipeline works is almost the design I described: it queues jobs in a channel, then it processes N at a time, either on the GO thread pool or on another N threads, and puts the results on a result chan.

Ben Sless17:07:41

True, so where's the difference?

didibus17:07:23

And I guess where things can get weird with what I described is if that's not all you are doing in your service. Like, if you process Kafka msgs, but somewhere else you also process incoming Tomcat requests, and somewhere else you process events from a GUI, etc. If you did that, the management of the GO blocks could get messy, so I can see having pipeline :compute spawn threads just to avoid that, so that if you use multiple pipelines at the same time they don't get weird with each other.

Ben Sless17:07:22

Wouldn't want to cross streams threads

didibus17:07:23

But... I think there is one difference, it seems pipeline cannot go faster when blocking. Maybe I'm not fully getting the implementation, but it seems like it will never go beyond N concurrency. What I'm thinking, you spawn another GO once one of them parks on a blocking IO. That's how you get to go faster

Ben Sless17:07:57

I'm curious to see which one of us is right 🙂

didibus19:07:13

Clearly there were learnings from the core team in moving compute off to threads, as it used to run in go blocks as well; see: https://github.com/clojure/core.async/commit/3429e3e1f1d49403bf9608b36dbd6715ffe4dd4f So my guess is you might be right. Maybe not in terms of absolute performance (that's still TBD), but in terms of not shooting yourself in the foot, which could ruin your performance much more easily, it's probably still a best practice for core.async

👍 2
Ben Sless19:07:35

> as it used to run in go blocks as well
I know, I was subtly nudging you to find it previously, did you notice?

Ben Sless19:07:15

Sneaky 😄

😝 2
didibus20:07:47

Lol, well I already knew about the change haha, but I always attributed it to people using GO blocks without knowing what they're doing and ending up doing blocking IO even in a "compute" task

didibus20:07:56

Which I guess makes sense. Like, probably spawning threads doesn't degrade performance at all, or much at all, and then if you mistakenly do some blocking IO, or have some infinite thread in there, you don't hurt your latencies in doing so. But at the same time, it begs the question: why use GO at all? You can just use a/thread all the time with >!! and <!!. You said GO is the only efficient way to communicate, can you speak more to that? Is (go (<!)) faster than just <!!?

Ben Sless20:07:54

When you use go blocks to shuffle data between channels, they can be busy only moving stuff around while threads just do work. That way you can multiplex more work onto the pool

Ben Sless20:07:07

That's the part where you want to schedule with the runtime and not the OS

Ben Sless20:07:22

Your threads will keep working as long as they keep getting data

Ben Sless20:07:27

so keep the data moving

Ben Sless20:07:53

go blocks have more chances to get more work done before the OS tells the thread it's nap time

Ben Sless20:07:11

not sure regarding the speed, you can check that, too

didibus20:07:06

I'm not sure I follow the whole multiplexing? Most advice I saw says to actually avoid GO for that, and to use put!, take! and poll! instead as they have less overhead.

narendraj920:07:55

If you would like to set it dynamically instead of passing the system property to the JVM, you can try (System/setProperty "clojure.core.async.pool-size" "<num>") before any of the core.async functions are called. This works because delays protect evaluation of this system property's value until the first use of a core.async facility.
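For example:

;; must run before anything realizes the core.async dispatch pool
(System/setProperty "clojure.core.async.pool-size" "16")

(require '[clojure.core.async :as a])
(a/go :ok) ; the pool is created on first use, sized from the property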

didibus03:07:42

Hum, I guess I was wrong. That's weird: since both pmap and agents use the same processors + 2, I'd have thought core.async would do the same.

Ben Sless03:07:00

Look at the git history. It used to

didibus03:07:23

Can you link me to it?

didibus03:07:39

Interesting, this is the commit that ended up making it default to 8: https://github.com/clojure/core.async/commit/a690c4f3b7bf9ae9e7bdc899c030955d5933042d#diff-df2b18760355fb977cc2720a5b3fece009ba26aec07a04e1b09537a1bb32fd90 It's a rare instance of a contribution from someone who didn't seem to be involved with core or core.async getting in. I don't know, I feel like it would be good to change it back to number of processors + 2. At first it looked like they were hoping to make it big or growing, so that if people blocked in a GO it wouldn't choke; then it looks like they decided people just shouldn't do that. But a default of 8 is kind of strange. Or maybe at least make it the max of (number of processors + 2) or 8

narendraj920:07:39

Using a magic number in this case does seem like an arbitrary choice. There should have been a comment about it if there was some empirical evidence for that choice.

💯 2