2023-03-21
Hey team, if I have a system that is primarily io-bound, what would be the most idiomatic way to pmap? I could use claypoole, but I am not really sure how to choose the thread pool size. My current thinking is that for io-bound stuff core-async would be the way to go. My reasoning: • I could "open" thousands of io-parked threads, vs in the claypoole case I could have at most thread-pool-size open connections. Is this thinking reasonable?
If you want to spin up unlimited threads (and you don't want this) doing so via core.async or something else is the same
Wow. So to make sure I understand:
• If core-async has a thread-pool size of 8
• If I have 1000 http requests
• If I used this https://gist.github.com/seancorfield/e7fa885e35df232dbb3758735088ae6e function, where I initiate 1000 go blocks, each with a blocking http-request call
Then what would happen is that only 8 requests would be open at a time?
...oi. good to know. Thank you @U0NCTKEV8
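A minimal sketch of the pitfall under discussion (the Thread/sleep stands in for a blocking http call; this is an illustration, not the code from the gist):

(require '[clojure.core.async :as a])

;; blocking inside `go` ties up one of the (by default 8) dispatch threads,
;; so at most 8 of these 1000 bodies make progress at any moment:
(dotimes [_ 1000]
  (a/go (Thread/sleep 1000) :response))

;; `a/thread` runs its body on a separate (cached, unbounded) thread and
;; returns a channel, so a go block can park on it instead of blocking:
(a/go (a/<! (a/thread (Thread/sleep 1000) :response)))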
• I have 1000 http requests.
• I want to initiate all 1000 requests at once
• Then wait for the results
One thing I could do:
1. use core-async, and wrap something like http-kit into something that receives from a channel (sketch below)
2. use the new virtual threads from project loom
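A rough sketch of option 1, assuming http-kit's async client is on the classpath (urls is an illustrative seq of the 1000 URLs):

(require '[clojure.core.async :as a]
         '[org.httpkit.client :as http])

(defn get-chan
  "Fires the request immediately (non-blocking) and returns a channel
   that will receive the response map."
  [url]
  (let [out (a/chan 1)]
    (http/get url {} (fn [resp] (a/put! out resp) (a/close! out)))
    out))

;; all requests are in flight at once; then collect the results
(let [chs (mapv get-chan urls)]
  (a/<!! (a/into [] (a/merge chs))))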
Like, if the place you are starting from is pmap, there are a lot of assumptions baked into that function (in order, keeps results, etc.); you'll want to tease apart which bits you actually want
But I guess I am a bit confused, by the following thought:
If I were using something like Node.js and made let's say 100K promises of http requests.
What is happening there?
Afaik, it makes as many requests as it can. From light searching I think there's a maxSockets
property, but it is set to Infinity
---
But in the Clojure case, all the default options we have limit the number of concurrent requests we can have at a time.
Yes -- I am sure there's some other system-level block too, though this is an interesting question. Will look into it.
I do remember reading this: https://engineering.zalando.com/posts/2019/04/how-to-set-an-ideal-thread-pool-size.html
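For what it's worth, the usual sizing rule of thumb (from Java Concurrency in Practice; whether the linked article uses exactly this formula is an assumption) can be written down directly:

;; threads = cores * target-utilization * (1 + wait-time / compute-time)
(defn pool-size [cores utilization wait-ms compute-ms]
  (long (* cores utilization (+ 1 (/ wait-ms compute-ms)))))

(pool-size 8 1.0 90 10) ;=> 80, i.e. heavily io-bound work wants a much bigger pool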
There is nothing stopping you from doing something similar with http kit, or whatever
Gotcha! Okay, so then my understanding is: core-async would be the way to go, as long as we can make the actual request non-blocking. Does that jibe with you? (ofc I could just go w/ callbacks as well, but core-async would solve the callback-hell)
I would likely start from the newish http client that has come with the jvm since java 11. I happen to be partial to core.async, so there is a good chance I would use that, but you could just use the built-in CompletableFuture stuff as well
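A sketch of that suggestion using the JDK 11+ client from Clojure, no extra deps (urls is illustrative):

(import '(java.net URI)
        '(java.net.http HttpClient HttpRequest HttpResponse$BodyHandlers))

(def client (HttpClient/newHttpClient))

(defn send-async
  "Fires the request and returns a CompletableFuture of the response."
  [url]
  (.sendAsync client
              (.build (HttpRequest/newBuilder (URI/create url)))
              (HttpResponse$BodyHandlers/ofString)))

;; fire everything first, then block once per future to collect the bodies
(->> urls
     (mapv send-async)
     (mapv (fn [cf] (.body (.get cf)))))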
I haven't heard of Claypoole before, but I use manifold for most of my async stuff. It may be good to look into that. However, your problem involves http stuff... so far I haven't heard that you need to reply to the http stuff. Do you need to hold onto those 100k connections while you do your work?
manifold is another async library for clojure, that started life as a library using netty from clojure, so the io integration is basically infinitely better than core.async (which has none)
But I have some hairs to split with manifold (I think the way it does nondeterministic choice is ok for io systems, but not good for other things), but, again depending on what you are doing you might not care
I think that may be aleph that you are talking about. I don't think manifold has http stuff. manifold has a lot of overlap with core.async, but its code is a bit easier to read, and probably less documented. I find it pretty small and intuitive, though.
I think that whatever concurrency tool you adopt, the hard part will mostly be surrounding it. How far can you get without Claypoole, and what is something closer to the real problem? Do you need open connections, do you need to save partial calculations, are you aggregating these requests?
(aleph was written first, then the author, with that experience, wrote manifold, and it is pretty clear to me it starts from a place of wanting to interop with netty's channels and pipelines)
For the problem that prompted this, it's actually requests to the database, so instead of http it's next-jdbc. I do something like:
(let [ids   (query1 ...)
      items (my-pmap query2 ids)]
  ;; some recursion
  )
;; I am aware this is an N+1 query.
Thinking it through with you all, I realize a few things.
There's two layers:
1. thread-pool-size -- the parallelism of my-pmap
2. connection-pool-size -- the parallelism with next-jdbc
If I understood correctly, thread-pool-size in this case must be <= connection-pool-size. Anything more won't increase the performance. Similarly, if I created a sort of async api to next-jdbc, it still wouldn't do us much good, because all the callbacks would be bound by connection-pool-size.
Since there is a real limit to connection-pool-size, I should just use something like claypoole, with thread-pool-size = connection-pool-size.
Let me know if you guys think differently!
Yes, that makes sense. I could create some non-blocking api on top, but it wouldn't do any good.
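A small sketch of that conclusion with claypoole (run-query and ids are illustrative stand-ins for the actual next-jdbc call and its inputs):

(require '[com.climate.claypoole :as cp])

(def connection-pool-size 50)

;; thread-pool-size = connection-pool-size, per the reasoning above
(cp/with-shutdown! [pool (cp/threadpool connection-pool-size)]
  (doall (cp/pmap pool run-query ids)))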
Eventually these queries will batch together. You can think of them as a kind of GraphQL engine, but in my case I haven't yet implemented GraphQL's DataLoader abstraction
pathom or some other tools may do that for you. may be good to look into. how much control do you have over the db? could you make a table that would make this easy? does your data fit in memory? if you could use any sql statements, are there some that would work in your case that may not be available with the tool you are using?
pathom looks cool! I remember seeing it briefly, but will look more in depth; perhaps there's some shared code we can use. Thanks @U0LAJQLQ1
pathom seems like it wants to be a graphql thing that is more general, I haven't used it yet, but I've used things that are like it and they have been useful. if you use something like it, you'll still have to figure out the query to fetch all the data at once
By the by, if you do ever actually need 100k threads you might want to check out https://ales.rocks/notes-on-virtual-threads-and-clojure. Since you're interacting with a jdbc database, though, that would be a very bad thing: you would hose your server without improving performance.
And on that topic, I see that in core-async the default https://github.com/clojure/core.async/commit/a690c4f3b7bf9ae9e7bdc899c030955d5933042d is 8. It used to be 2 * availableProcessors + 42. Was there a reason that it changed?
Hmm. I'm running my production servers with -Dclojure.core.async.pool-size=128. I had problems with unexplained hangs; this made the problem go away (and yes, I do understand that this is a bug and should be fixed, but I have no resources to go deep into all the libraries).
The “pool size number” is a controversial issue. My guess is it doesn’t matter much as long as it is large enough. It doesn’t matter which pool the size is for, core.async or the hikari connection pool.
I got this opinion after seeing [this video from Oracle](https://www.youtube.com/watch?v=_C77sBcAtSQ).
I was exposed to this video through [HikariCP wiki - About Pool Sizing](https://github.com/brettwooldridge/HikariCP/wiki/About-Pool-Sizing).
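For reference, the rule of thumb that the HikariCP "About Pool Sizing" page quotes (originally from the PostgreSQL project):

;; connections = (core_count * 2) + effective_spindle_count
(defn hikari-pool-size [core-count effective-spindle-count]
  (+ (* 2 core-count) effective-spindle-count))

(hikari-pool-size 4 1) ;=> 9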
So when I first saw 2 * availableProcessors + 42 a few years ago, I thought it was a great piece of “performance art”. And a few months ago, when I found the default size is now 8, I was disappointed that the art was gone.
Thanks for the great blog posts, stopa. I am your fan! 🙂
Brightened my day @U28U9Q8CU, thank you : ) Great video too!
Hello there! Is there a more idiomatic way to write this?:
(if condition
  (-> value
      (transform1)
      (transform2)
      (transform3))
  value)
(a form that does stuff to a value if a condition is true, or returns the original value if not)
I tend to write it like this:
(let [c? (condition)]
  (cond-> value
    c? transform1
    c? transform2
    c? transform3))
Where c? can be any short symbol. That way I get something concise but that won't surprise others too much. If you appreciate https://stuartsierra.com/2018/07/06/threading-with-style, such thinking will come organically :)
I would extract (-> value tr1 tr2 tr3) as its own function (say, f) and then write it as (cond-> value c? f).
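Spelled out, that extraction looks roughly like this (apply-transforms is an illustrative name; transform1/2/3 as in the question):

(defn apply-transforms [value]
  (-> value transform1 transform2 transform3))

(cond-> value
  condition apply-transforms)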
Hi! I’ve hit a stupid roadblock when working with a Java library. A function requires an int argument. It’s a primitive int, so java.lang.Integer isn’t allowed. And I can’t find a way to convert or construct a primitive int from Clojure. I’ve run out of ideas to try…
try https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Integer.html#intValue()
Tried that:
(class (.intValue 1))
;; #<Class@5acf9800 java.lang.Integer>
user=> (class 10)
java.lang.Long
user=> (class (int 10))
java.lang.Integer
user=> (.intValue (int 10))
10
can you check the class on that? It still returns java.lang.Integer here.
I’m thinking maybe it’s some JVM setting. Although I mostly use default settings. And a clean jvm also returns the same
It returns class because of boxing, but it should be compiled to a primitive path.
Hmm, I’m still not sure. When I wrap the integer with int, for example (int 1), class still indicates that it’s a java.lang.Integer, not an int. I’m using the sshj library and when I call a function which requires the int arg, it throws:
1. Unhandled java.lang.IllegalArgumentException
No matching field found: setTimeout for class java.lang.Integer
oooh, fuu*
haha 🙂
been working on this for more than an hour, but seems like the args are just wroing
wrong*
just a java.lang.Long even works
yes, it should work because of boxing; translation between a numeric class and a primitive is handled by the JVM
😅 It’s solved by passing an extra argument. The exception pointed us to the primitive int, but the problem was the signature as a whole
ah, good to know 🙂
Thanks for helping!! 🙂
The class function will always return a boxed type because its argument is an Object. It's sufficient to call int on the argument
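A tiny illustration of that point (some-client and its setTimeout(int) method are hypothetical):

(class (int 10))  ;=> java.lang.Integer, because `class` takes an Object,
                  ;   so the value gets boxed just for the inspection
(.setTimeout some-client (int 5000))  ; here (int ...) lets the compiler/reflector
                                      ; select the primitive int overload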
In Clojure 1.11+, just use clojure.core/abs
@U0GTGHX1D is your screenshot from emacs? what theme and setup are you using?
@U06GMV0B0 yeap, emacs, AFAIK the default :thinking_face:
regarding the setup, I use a customized (my own package list) version of this - https://github.com/larstvei/dot-emacs
Hey team, I have a noob issue using thread-pools:
Context:
• I have a db with a connection-pool-size of 50
• I have a recursive function, which takes a tree, and resolves the queries
• I thought, let me use a threadpool to parallelize the queries I am sending to the db
• The size of the thread-pool is also 50 (as this is mostly io-bound)
Problem:
(defn- query-one
  [{:keys [pool] :as ctx} form]
  (let [[where-view eids] (resolve-eids ctx form)
        [obj-nodes child-form-nodes]
        (cp/pvalues pool
          ;; resolve this level's objects and recurse into child forms in parallel
          (cp/upmap pool (partial obj-node ctx) eids)
          (cp/upmap pool (partial query-one ctx) (child-forms form eids)))]
    (-> (make-node where-view)
        (add-children obj-nodes)
        (add-children child-form-nodes))))
You can imagine this solves a graphql-like query, in the form of "get me posts for this user, with comments and profile photos"
The problem: I could end up in a deadlock. This is because, as I recurse into children, I may end up taking up all the threads.
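A toy illustration of how the deadlock happens (pool of 2 and fan-out of 2 just for brevity):

(require '[com.climate.claypoole :as cp])

(def pool (cp/threadpool 2))

(defn recurse [depth]
  (if (zero? depth)
    :leaf
    ;; each parent blocks here waiting on children, while itself occupying
    ;; a pool thread, so once every thread holds a parent nothing can run
    (doall (cp/pmap pool (fn [_] (recurse (dec depth))) (range 2)))))

;; (recurse 3) ; can hang: both pool threads hold parent tasks waiting on children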
---
I may be thinking about this incorrectly. Would you write it differently? Maybe I need to rethink where I am doing the pmap.
I have 2 questions:
• any reasons to limit the thread pool size to 50?
• can you use jdk 19+?
Re: limit to 50: my thinking was that the parallelism here is constrained by the number of jdbc connections I can have at one time. Since there are 50 connections, having more than 50 threads wouldn't help with performance.
Re: jdk 19+: I could use loom, but thought it would be simpler to avoid it if I could
It sounds to me like you need some sort of way to visit child nodes first, and keep track of how many queries you’ve got going on at once. Which is much more manual than what you’re doing now, I think.
so if you have more threads than the db conn pool, the pending ones will get queued. yes not much increase in perf. but that makes you not think of the threads too
and on the contrary using loom should simplify this more
with loom for instance, I'd walk the tree and gather all the calls into a vec and fire them all, and not have a thread pool at all
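A sketch of that approach (JDK 21, or 19/20 with preview features enabled; run-query and calls are illustrative):

(import '(java.util.concurrent Executors Future Callable))

(with-open [exec (Executors/newVirtualThreadPerTaskExecutor)]
  (let [futs (mapv (fn [call] (.submit exec ^Callable #(run-query call))) calls)]
    ;; one (cheap) virtual thread per call, no fixed pool to size or exhaust
    (mapv (fn [^Future fut] (.get fut)) futs)))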
(ah, I realize jdk19+ may not be an option either; I am on beanstalk which I think is on 17) I can't quite visit child-nodes first, because the child-nodes depend on the parent. Something I just found: https://stackoverflow.com/questions/35022403/recursion-aware-threadpool-in-java Will look deeper!
> I can't quite visit child-nodes first, because the child-nodes depend on the parent.
The way I'm thinking of this is: I walk the tree level-order to get a DAG-like effect and get the sequence of calls. Each call has an id which the nodes can look up for the value.
essentially avoid making the calls when walking because of the top down dependency. Harder to parallelise
nodes at the same level can be parallelised, is what I'm thinking?
Yes, I think so. Here's a quick visualization for what's happening in query-one (see attached). This is the query "get me users with their bookshelves".
At the top level, we'd have:
1. get me user ids
Then for each user id, we can parallelize "get me user info":
(cp/upmap pool (partial obj-node ctx) eids)
And we can parallelize the recursion:
2. get me bookshelves for each user
(cp/upmap pool (partial query-one ctx) (child-forms form eids))
https://gist.github.com/divs1210/f1f050bbdaf045b1f43bbb346267e69c https://puredanger.github.io/tech.puredanger.com/2011/01/04/forkjoin-clojure/ It does look like fork/join pools handle recursion. I am not 100% sure how -- need to run but will keep digging. Thanks for riffing with me team!
Update: I don't think a Fork/Join pool will do the trick. Mainly because, if the thread is blocking on io, I don't think it can be reused. The only option I see at the moment would be virtual-threads, or to rewrite this in a way that doesn't rely on recursion. But this is a bit surprising; I feel like I must be missing something.
For posterity: https://stackoverflow.com/questions/75805022/thread-pool-for-recursive-calls-to-jdbc-in-clojure This helped. Used a threadpool for jdbc calls, and core-async for the recursion.
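For anyone curious, a rough sketch of that pattern (not the exact code from the SO answer; resolve-eids and child-forms are borrowed from the snippet above, and query results are assumed non-nil):

(require '[clojure.core.async :as a])
(import '(java.util.concurrent Executors ExecutorService))

(defonce ^ExecutorService jdbc-pool (Executors/newFixedThreadPool 50))

(defn run-on-pool
  "Runs the blocking jdbc call on the dedicated pool; returns a channel
   that will receive its result."
  [f]
  (let [out (a/chan 1)]
    (.execute jdbc-pool (fn [] (a/>!! out (f)) (a/close! out)))
    out))

;; the recursion lives in go blocks, which only park on channels, so the
;; small go dispatch pool is never tied up by jdbc calls
(defn query-tree [ctx form]
  (a/go
    (let [eids      (a/<! (run-on-pool #(resolve-eids ctx form)))
          child-chs (mapv #(query-tree ctx %) (child-forms form eids))]
      {:eids     eids
       :children (loop [chs child-chs, acc []]
                   (if (seq chs)
                     (recur (rest chs) (conj acc (a/<! (first chs))))
                     acc))})))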
Short question about outdated on Windows:
I use "clojure -T:search/outdated" in my terminal (aka powershell) but it crashes with the following error message:
(seems to be some damn Windows path mismatches - anyone have an idea how to fix this?)
{:clojure.main/message "Execution error (IllegalArgumentException) at antq.util.file/normalize-path (file.clj:8).\r\nInvalid match arg: \r\n",
 :clojure.main/triage
 {:clojure.error/class java.lang.IllegalArgumentException,
  :clojure.error/line 8,
  :clojure.error/cause "Invalid match arg: ",
  :clojure.error/symbol antq.util.file/normalize-path,
  ....
Uhm, why? Your tools' versions are unrelated to CLJ version. You install tools yourself, and you gotta keep them up-to-date yourself as well.
ah sure I checked my deps.edn and it said: com.github.liquidz/antq {:mvn/version "1.6.1"}
Yeah, you gotta update: https://github.com/liquidz/antq/issues/172#issuecomment-1183700642