#clojure
2023-03-21
stopa00:03:00

Hey team, if I have a system that is primarily io-bound, what would be the most idiomatic way to pmap? I could use claypoole, but I am not really sure how to choose the thread pool size. My current thinking is that for io-bound stuff core-async would be the way to go. My reasoning: • I could "open" thousands of io-parked threads, whereas with claypoole I could have at most thread-pool-size open connections. Is this thinking reasonable?

hiredman01:03:22

No, this is, I dunno, "not even wrong"

hiredman01:03:20

core.async doesn't provide any integration with any non-blocking io libraries

hiredman01:03:23

If you want to spin up unlimited threads (and you don't want this) doing so via core.async or something else is the same

stopa01:03:26

Wow. So to make sure I understand: • If core-async has a thread-pool size of 8 • If I have 1000 http requests • If I used this https://gist.github.com/seancorfield/e7fa885e35df232dbb3758735088ae6e function, where I initiate 1000 go blocks, each with a blocking http-request call Then would only 8 requests be open at a time?

hiredman01:03:36

core.async specifically says not to do io in go blocks

stopa01:03:49

...oi. good to know. Thank you @U0NCTKEV8

stopa01:03:56

So, how would you do it?

hiredman01:03:06

So you will likely deadlock the core.async threadpool

👍 2
stopa01:03:43

• I have 1000 http requests. • I want to initiate all 1000 requests at once • Then wait for the results

hiredman01:03:35

1000 is nothing, you'll be fine with using real threads

hiredman01:03:47

Do not use pmap

stopa01:03:57

Let's say 100K

hiredman01:03:06

(pmap from claypoole may be fine)

👍 2
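
hiredman's "real threads" suggestion can be sketched with plain clojure futures, which run on an unbounded cached pool. This is an illustrative sketch only; `slow-square` is a stand-in for the actual http call.

```clojure
;; Sketch: start all the blocking calls first on real threads, then wait.
(defn fetch-all [f inputs]
  (->> inputs
       (mapv (fn [x] (future (f x)))) ; launch everything before deref'ing
       (mapv deref)))                 ; then block for the results

;; stand-in for a blocking http request
(defn slow-square [x]
  (Thread/sleep 10) ; simulated io
  (* x x))

;; (fetch-all slow-square (range 1000)) would run all 1000 "requests" at once
```
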
stopa01:03:25

One thing I could do: 1. use core-async, and wrap something like http-kit into something that receives from a channel 2. use the new virtual threads from project loom

hiredman01:03:33

Like if the place you are starting from is pmap there are a lot of assumptions baked into that function (in order, keeps results, etc) you'll want to tease apart which bits you actually want

hiredman01:03:54

Very often people don't care about order at all

stopa01:03:23

Indeed -- upmap from claypoole.

stopa01:03:52

But I guess I am a bit confused, by the following thought: If I were using something like Node.js and made let's say 100K promises of http requests. What is happening there? Afaik, it makes as many requests as it can. From light searching I think there's a maxSockets property, but it is set to Infinity --- But in the clojure case, all the default options we have, limit the number of concurrent requests we can have at a time.

hiredman01:03:41

There are limits in the node case, it may not expose them, and they may be implicit

hiredman01:03:17

(you are limited by the amount of ram you have to hold callbacks)

stopa01:03:06

Yes -- I am sure there's some other system-level block too, though this is an interesting question. Will look into it.

hiredman01:03:48

There is nothing stopping you from doing something similar with http kit, or whatever

hiredman01:03:05

The limit on core.async is a limit on the number of threads that run callbacks

hiredman01:03:31

Which on node is just 1

stopa01:03:08

Gotcha! Okay, so then my understanding is: core-async would be the way to go, as long as we can make the actual request non-blocking. [^] Does that jibe with you? [^] ofc I could just go w/ callbacks as well, but core-async would solve the callback-hell

hiredman01:03:49

I would likely start from the newish http client that has come with the jvm since java 11. I happen to be partial to core.async, so there is a good chance I would use that, but you could just use the built-in CompletableFuture stuff as well

👍 2
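
A rough sketch of what hiredman suggests, using the JDK 11+ java.net.http client from Clojure. The URL handling is illustrative and error handling is omitted.

```clojure
(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers))

;; sendAsync returns a CompletableFuture, so N in-flight requests
;; don't need N blocked threads.
(defn async-get [^HttpClient client ^String url]
  (.sendAsync client
              (.build (HttpRequest/newBuilder (URI/create url)))
              (HttpResponse$BodyHandlers/ofString)))

;; usage sketch: fire them all, then join
;; (let [client (HttpClient/newHttpClient)
;;       futs   (mapv #(async-get client %) urls)]
;;   (mapv #(.body (.join %)) futs))
```
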
pppaul01:03:56

I haven't heard of Claypoole before, but I use manifold for most of my async stuff. it may be good to look into that. however your problem involves http stuff... so far I haven't heard that you need to reply to the http stuff. do you need to hold onto those 100k connections while you do your work?

❤️ 2
hiredman01:03:58

manifold is another async library for clojure, that started life as a library using netty from clojure, so the io integration is basically infinitely better than core.async (which has none)

👍 2
hiredman01:03:38

But I have some hairs to split with manifold (I think the way it does nondeterministic choice is ok for io systems, but not good for other things); again, depending on what you are doing you might not care

pppaul01:03:49

I think that may be aleph that you are talking about. I don't think manifold has http stuff. manifold has a lot of overlap with core.async, but its code is a bit easier to read, and probably less documented. I find it pretty small and intuitive, though.

pppaul01:03:32

I think that whatever concurrency tool you adopt, the hard part will mostly be surrounding it. how far can you get without Claypoole, and what is something closer to the real problem? do you need open connections, do you need to save partial calculations, are you aggregating these requests?

stopa01:03:50

Great questions!

hiredman01:03:59

(aleph was written first, then the author, with that experience, wrote manifold, and it is pretty clear to me it starts from a place of wanting to interop with netty's channels and pipelines)

pppaul01:03:31

oh, i didn't realize that

stopa01:03:01

For the problem that prompted this, it's actually requests to the database, so instead of http it's next-jdbc. I do something like:

(let [ids (query1 ...)
      items (my-pmap query2 ids)] 
  ;; some recursion
  )
;; I am aware this is an N+1 query.
Thinking it through with you all, I realize a few things. There are two layers: 1. thread-pool-size -- the parallelism of my-pmap 2. connection-pool-size -- the parallelism of next-jdbc If I understood correctly, thread-pool-size in this case must be <= connection-pool-size; anything more won't increase performance. Similarly, if I created a sort of async api on top of next-jdbc, it still wouldn't do us much good, because all the callbacks would be bound by connection-pool-size. Since there is a real limit to connection-pool-size, I should just use something like claypoole, with thread-pool-size = connection-pool-size. Let me know if you guys think differently!
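
The sizing conclusion above (thread-pool-size = connection-pool-size) can be sketched without claypoole, using a plain fixed pool from java.util.concurrent. The size here is hypothetical; in practice it would match your jdbc connection pool.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(def connection-pool-size 4) ; hypothetical; match your jdbc pool

(defn bounded-pmap
  "Run f over coll with at most connection-pool-size calls in flight."
  [f coll]
  (let [^ExecutorService pool (Executors/newFixedThreadPool connection-pool-size)
        futs (mapv (fn [x]
                     (.submit pool ^java.util.concurrent.Callable (fn [] (f x))))
                   coll)
        res  (mapv (fn [fut] (.get ^java.util.concurrent.Future fut)) futs)]
    (.shutdown pool)
    res))
```
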

hiredman01:03:46

There are no async/non-blocking jdbc drivers

👍 2
stopa01:03:21

Yes. that makes sense. I could create some non-blocking api on top, but it wouldn't do any good.

pppaul01:03:34

what is the size of the data you are dealing with?

pppaul01:03:41

can you make a less precise query, but get your data in fewer requests?

stopa01:03:57

Eventually these queries will batch together. You can think of them as a kind of GraphQL engine, but in my case I haven't yet implemented GraphQL's DataLoader abstraction

pppaul01:03:27

pathom or some other tools may do that for you. may be good to look into. how much control do you have over the db? could you make a table that would make this easy? does your data fit in memory? if you could use any sql statements, are there some that would work in your case that may not be available with the tool you are using?

stopa01:03:14

pathom looks cool! I remember seeing it briefly, but look more in depth, perhaps there's some shared code we can use. Thanks @U0LAJQLQ1

pppaul02:03:20

pathom seems like it wants to be a graphql thing that is more general, I haven't used it yet, but I've used things that are like it and they have been useful. if you use something like it, you'll still have to figure out the query to fetch all the data at once

👍 2
sirwobin08:03:45

By the by, if you ever actually need 100k threads you might want to check out https://ales.rocks/notes-on-virtual-threads-and-clojure. Since you're interacting with a jdbc database, though, 100k concurrent queries would be a very bad thing: you would hose your server without improving performance.

stopa00:03:03

And on that topic, I see that core-async's default thread-pool size is 8 (https://github.com/clojure/core.async/commit/a690c4f3b7bf9ae9e7bdc899c030955d5933042d). It used to be 2 * availableProcessors + 42. Was there a reason it changed?

👀 4
jrychter09:03:15

Hmm. I'm running my production servers with -Dclojure.core.async.pool-size=128 . I had problems with unexplained hangs, this made the problem go away (and yes, I do understand that this is a bug and should be fixed, but I have no resources to go deep into all the libraries).

👍 2
ruseel02:03:39

“pool size number” is a controversial issue. My guess is that it doesn’t matter much as long as it is large enough, whether the pool in question is core.async’s or the hikari connection pool. I formed this opinion after watching [this video from Oracle](https://www.youtube.com/watch?v=_C77sBcAtSQ), which I found through [HikariCP wiki - About Pool Sizing](https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing). So when I first saw 2 * availableProcessors + 42 a few years ago, I thought it was great “performance art”. And a few months ago, when I found the default size is 8, I was disappointed that the art was gone. Thanks for the great blog posts, stopa. I am your fan! 🙂

stopa13:03:01

Brightened my day @U28U9Q8CU, thank you : ) Great video too!

agorgl09:03:25

Hello there! Is there a more idiomatic way to write this?:

(if condition
  (-> value
      (transform1)
      (transform2)
      (transform3))
  value)
(a form that does stuff to a value if a condition is true, or returns the original value if not)

Ben Sless09:03:26

(-> value (cond-> condition (-> f1 f2 f3)))

👍 4
agorgl09:03:46

Nice one, thank you!

Ben Sless09:03:06

cool thing about threading macros, is they compose
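
A small illustration of the composition Ben describes; the transforms here are placeholders for transform1/transform2/transform3.

```clojure
;; cond-> threads value through the expr only when the test is truthy,
;; and the inner -> lets one test guard a whole chain of transforms.
(defn maybe-transform [value condition]
  (cond-> value
    condition (-> inc (* 2)))) ; placeholder transforms

(maybe-transform 3 true)  ;; => 8  (inc, then * 2)
(maybe-transform 3 false) ;; => 3  (unchanged)
```
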

vemv10:03:02

I tend to write it like this

(let [c? (condition)]
  (cond-> value
    c? transform1
    c? transform2
    c? transform3))
Where c? can be any short symbol. That way I get something concise but that won't surprise others too much. If you appreciate https://stuartsierra.com/2018/07/06/threading-with-style, such thinking will come organically :)

agorgl10:03:07

nice article, checking it right now!

🙌 2
p-himik10:03:08

I would extract (-> value tr1 tr2 tr3) as its own function (say, f) and then write it as (cond-> value c? f).

👀 4
tommyB14:03:10

I don’t know, but the if form looks pretty clear to me.

Stefan Roex10:03:08

Hi! I’ve hit a stupid roadblock when working with a Java library. A method requires an int argument. It’s a primitive int, so a java.lang.Integer isn’t allowed, and I can’t find a way to convert or construct a primitive int from Clojure. I've run out of ideas to try…

Stefan Roex10:03:32

Tried that:

(class (.intValue 1))
  ;;    #<Class@5acf9800 java.lang.Integer>

genmeblog10:03:12

(int 1) should work

💯 2
iperdomo10:03:14

user=> (class 10)
java.lang.Long
user=> (class (int 10))
java.lang.Integer
user=> (.intValue (int 10))
10

Stefan Roex10:03:34

can you check the class on that? It still returns java.lang.Integer here.

Stefan Roex10:03:00

I’m thinking maybe it’s some JVM setting. Although I mostly use default settings. And a clean jvm also returns the same

genmeblog10:03:10

It returns the boxed class because of boxing, but it should be compiled down a primitive path.

2
Stefan Roex10:03:54

Hmm, I’m still not sure. When I wrap the integer with int, for example (int 1), class still indicates that it’s a java.lang.Integer, not an int. I’m using the sshj library, and when I call a method which requires the int arg, it throws: 1. Unhandled java.lang.IllegalArgumentException No matching field found: setTimeout for class java.lang.Integer

genmeblog10:03:42

try (unchecked-int 1)

Stefan Roex10:03:24

been working on this for more than an hour, but seems like the args are just wrong

Stefan Roex10:03:42

just a java.lang.Long even works

genmeblog10:03:46

(Math/abs (unchecked-int -1))
translates directly to:
Math.abs((int)(-1L));

genmeblog10:03:37

yes, it should work because of boxing; translation between a numeric class and a primitive is handled by the JVM

👏 2
Stefan Roex10:03:40

😅 It’s solved by passing an extra argument. The exception pointed us to the primitive int, but the problem was the signature as a whole

🙂 2
Stefan Roex10:03:45

ah, good to know 🙂

Stefan Roex10:03:58

Thanks for helping!! 🙂

iperdomo11:03:01

I was surprised and did a quick test locally, you don't need to cast ....

Ben Sless11:03:31

The class function will always return a boxed type because its argument is an Object. It's sufficient to call int on the argument

2
Alex Miller (Clojure team)12:03:14

In Clojure 1.11+, just use clojure.core/abs

vlad_poh17:03:26

@U0GTGHX1D is your screenshot from emacs? what theme and setup are you using?

iperdomo17:03:52

@U06GMV0B0 yeap, emacs, AFAIK the default :thinking_face:

iperdomo17:03:29

(when (display-graphic-p)
  (load-theme 'leuven t))

iperdomo17:03:46

regarding the setup, I use a customized (my own package list) version of this - https://github.com/larstvei/dot-emacs

👍 2
stopa14:03:58

Hey team, I have a noob issue using thread-pools: Context: • I have a db with a connection-pool-size of 50 • I have a recursive function, which takes a tree, and resolves the queries • I thought, let me use a threadpool to parallelize the queries I am sending to the db • The size of the thread-pool is also 50 (as this is mostly io-bound) Problem:

(defn- query-one
  [{:keys [pool] :as ctx} form]
  (let [[where-view eids] (resolve-eids ctx form)
        [obj-nodes child-form-nodes]
        (cp/pvalues pool
                    (cp/upmap pool (partial obj-node ctx) eids)
                    (cp/upmap pool (partial query-one ctx) (child-forms form eids)))]
    (-> (make-node where-view)
        (add-children obj-nodes)
        (add-children child-form-nodes))))
You can imagine this solves a graphql-like query, in the form of "get me posts for this user, with comments and profile photos" The problem: I could end up in a deadlock. This is because, as I recurse into children, I may end up taking up all the threads. --- I may be thinking about this incorrectly. Would you write it differently? Maybe I need to rethink where I am doing the pmap.

lispyclouds14:03:20

I have 2 questions: • any reasons to limit the thread pool size to 50? • can you use jdk 19+?

stopa14:03:46

Re: limit to 50 ^ My thinking was, the parallelism here is constrained by the number of jdbc connections I can have at one time. Since there are 50 connections, having more than 50 threads wouldn't help with performance ^ I could use loom, but thought it would be simpler to avoid it if I could

reefersleep15:03:50

It sounds to me like you need some sort of way to visit child nodes first, and keep track of how many queries you’ve got going on at once. Which is much more manual than what you’re doing now, I think.

lispyclouds15:03:58

so if you have more threads than the db conn pool, the pending ones will get queued. yes, not much increase in perf, but that way you don't have to think about the threads either

👍 2
lispyclouds15:03:24

and on the contrary using loom should simplify this more

lispyclouds15:03:10

with loom for instance, id walk the tree and gather all the calls into a vec and fire them all and not have a thread pool at all

stopa15:03:44

(ah, I realize jdk19+ may not be an option either; I am on beanstalk which I think is on 17) I can't quite visit child-nodes first, because the child-nodes depend on the parent. Something I just found: https://stackoverflow.com/questions/35022403/recursion-aware-threadpool-in-java Will look deeper!

2
lispyclouds15:03:56

> I can't quite visit child-nodes first, because the child-nodes depend on the parent. the way im thinking of this is i walk the tree level order to get a DAG like effect and get the sequence of calls. each call has an id which the nodes can look up for the value.

lispyclouds15:03:48

essentially avoid making the calls when walking because of the top down dependency. Harder to parallelise

lispyclouds15:03:56

nodes at the same level can be parallelised is what im thinking?

stopa15:03:55

Yes, I think so. Here's a quick visualization for what's happening in query-one (see attached) This is the query "get me users with their bookshelves" At the top level. We'd have 1. get me user ids Then for each user id, we can parallelize 'get me user info'

(cp/upmap pool (partial obj-node ctx) eids)
And we can parallelize the recursion: 2. get me bookshelves for each user
(cp/upmap (partial query-one ctx) (child-forms forms eids)))

stopa15:03:24

https://gist.github.com/divs1210/f1f050bbdaf045b1f43bbb346267e69c https://puredanger.github.io/tech.puredanger.com/2011/01/04/forkjoin-clojure/ It does look like fork/join pools handle recursion. I am not 100% sure how -- need to run but will keep digging. Thanks for riffing with me team!

stopa18:03:00

Update: I don't think a Fork/Join pool will do the trick, mainly because, if a thread is blocking on io, I don't think it can be reused. The only options I see at the moment would be virtual threads, or to rewrite this in a way that doesn't rely on recursion. But this is a bit surprising; I feel like I must be missing something.
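
For what it's worth, ForkJoinPool does have a hook for blocking workers: ForkJoinPool/managedBlock lets the pool compensate with a spare thread while a worker waits. A minimal sketch of wrapping an arbitrary blocking call, offered as an aside rather than a full solution:

```clojure
(import '(java.util.concurrent ForkJoinPool ForkJoinPool$ManagedBlocker))

;; Wrap a blocking call f so a fork/join worker can signal the pool;
;; the pool may activate a spare thread while this one waits.
(defn managed-blocking [f]
  (let [ret (promise)]
    (ForkJoinPool/managedBlock
     (reify ForkJoinPool$ManagedBlocker
       (block [_]
         (deliver ret (f)) ; run the blocking call, then report done
         true)
       (isReleasable [_] (realized? ret))))
    @ret))
```
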

stopa12:03:06

For posterity: https://stackoverflow.com/questions/75805022/thread-pool-for-recursive-calls-to-jdbc-in-clojure This helped. Used a threadpool for jdbc calls, and core-async for the recursion.

👍 2
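
The shape of that fix can be sketched without core.async as well: keep the bounded pool for the blocking jdbc calls only, and let the recursion run on clojure futures (unbounded), so no task on the bounded pool ever waits on another pool task. Names and sizes below are hypothetical, with Thread/sleep standing in for a jdbc query.

```clojure
(import '(java.util.concurrent Executors ExecutorService))

(def ^ExecutorService jdbc-pool (Executors/newFixedThreadPool 4)) ; = conn pool size

(defn jdbc-call
  "Run the blocking call f on the bounded pool. Only leaf work runs here,
   so pool tasks never block waiting on other pool tasks: no deadlock."
  [f]
  (.get (.submit jdbc-pool ^java.util.concurrent.Callable f)))

(defn query-tree [depth]
  (let [row      (jdbc-call #(do (Thread/sleep 5) depth)) ; simulated query
        children (when (pos? depth)
                   ;; recursion on plain futures, off the bounded pool
                   (mapv deref (mapv (fn [_] (future (query-tree (dec depth))))
                                     (range 2))))]
    {:row row :children children}))
```
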
Grimgog18:03:39

Short question about outdated deps on windows: I use "clojure -T:search/outdated" in my terminal (aka powershell) but it crashes with the following error message (seems to be some damn windows path mismatches - anyone an idea how to fix this?): {:clojure.main/message "Execution error (IllegalArgumentException) at antq.util.file/normalize-path (file.clj:8).\r\nInvalid match arg: \r\n", :clojure.main/triage {:clojure.error/class java.lang.IllegalArgumentException, :clojure.error/line 8, :clojure.error/cause "Invalid match arg: ", :clojure.error/symbol antq.util.file/normalize-path, ....

p-himik18:03:26

Which version of antq?

Grimgog18:03:21

clojure 1.11.1 - so it should be [antq "2.2.1017"]

p-himik18:03:32

Uhm, why? Your tools' versions are unrelated to CLJ version. You install tools yourself, and you gotta keep them up-to-date yourself as well.

Grimgog18:03:59

ah sure I checked my deps.edn and it said: com.github.liquidz/antq {:mvn/version "1.6.1"}
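
For reference, an alias along these lines pins the tool version in deps.edn, so it has to be bumped by hand. This sketch uses a -M alias (the -T tool install path is set up differently); the version shown is the one mentioned above, so check Clojars for the current one.

```clojure
;; deps.edn sketch; invoke with: clojure -M:outdated
{:aliases
 {:outdated {:deps {com.github.liquidz/antq {:mvn/version "2.2.1017"}}
             :main-opts ["-m" "antq.core"]}}}
```
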

Grimgog18:03:39

Alright, thank you very much! You helped me saving some damn headaches 😉

👍 2