#clojure
2023-03-16
pesterhazy12:03:28

Pseudocode:

(->> jobs (pmap http/download) vec)
pmap works great for HTTP I/O, but it's limited: it doesn't let me customize the number of concurrent jobs or what to do in case of an exception. Is there a more customizable option?

pesterhazy12:03:47

Bonus points if it works in babashka

opqdonut12:03:24

not sure about babashka support

opqdonut12:03:22

looks like somebody just suggested using an Executor previously: https://clojurians-log.clojureverse.org/babashka/2022-11-25/1669366747.630009

lispyclouds12:03:27

id recommend against the use of pmap for side effects

👍 2
p-himik12:03:09

Yeah, pretty much nobody recommends pmap for anything, except maybe one-off REPL evaluations. :)

jjttjj12:03:13

Promesa supports babashka and has useful stuff for this https://funcool.github.io/promesa/latest/executors.html

lispyclouds12:03:52

raw executors are pretty much meant for this and have the virtual thread sweetness too in bb 😄

pesterhazy12:03:40

Good point about java.util.concurrent.Executor. Is there a nice Clojure example for lazy copy-n-pasters like me?

opqdonut12:03:58

the source of claypoole 😛

pesterhazy12:03:51

> id recommend against the use of pmap for side effects

Any particular reason other than "`map` should be free of side-effects"?

lispyclouds12:03:13

yeah pmap has lazy chunking and would have weird effects

lispyclouds12:03:49

laziness+side effects = generally bad idea
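The laziness pitfall is easy to demonstrate: building the pmap seq runs nothing, and the side effects only fire once something forces the seq (which is what the `vec` in the original snippet does):

```clojure
;; pmap is lazy: constructing the result seq performs no work.
(def calls (atom 0))

(def result (pmap (fn [x] (swap! calls inc) x) (range 100)))
;; At this point @calls is still 0 -- nothing has been downloaded.

;; Forcing the whole seq (what vec does) runs every task:
(def forced (vec result))
@calls ;; => 100
```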

opqdonut12:03:50

as long as the function executions are independent (e.g. fetching different urls) it's fine tho

opqdonut12:03:01

and you force the whole result, like vec does

pesterhazy12:03:26

yeah, i think it's fine for simple cases as well (but clearly not a perfect match)

lispyclouds12:03:46

the classic clojure quote: it's easy, not simple

opqdonut12:03:21

pmap does a lot of fancy stuff to be able to process a list incrementally, but if you want all of the results at the same time you can just use Executor.invokeAll like that example above

👍 2
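The `Executor.invokeAll` approach can be sketched like this (`download` and `urls` are stand-ins for the real HTTP call and job list; this is plain JDK interop, so it should also work in bb):

```clojure
(import '[java.util.concurrent Executors])

(defn download [url] (str "fetched " url)) ; stand-in for http/download

(def urls ["u1" "u2" "u3"])

(def results
  (let [pool (Executors/newFixedThreadPool 10)] ; at most 10 jobs in flight
    (try
      ;; Clojure fns already implement Callable, so a vector of thunks can
      ;; go straight into invokeAll. It blocks until every task finishes
      ;; and returns the Futures in input order.
      (mapv (fn [f] (.get f))
            (.invokeAll pool (mapv (fn [u] (fn [] (download u))) urls)))
      (finally (.shutdown pool)))))
;; results holds the responses in the same order as urls
```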
opqdonut12:03:31

or promesa/all

👍 2
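A hedged sketch of the promesa version (assumes promesa is on the classpath and that `px/fixed-executor` / `px/submit!` take these shapes -- the option names are from memory, check the docs; and note borkdude points out below that promesa itself doesn't run in bb):

```clojure
(require '[promesa.core :as p]
         '[promesa.exec :as px])

;; hypothetical stand-ins for the real HTTP call and job list
(defn download [url] url)
(def urls ["u1" "u2" "u3"])

;; fixed pool of 10 threads => at most 10 downloads at a time
(def pool (px/fixed-executor :parallelism 10))

;; submit every job, then wait on all of them; results keep input order
(def results
  @(p/all (mapv (fn [url] (px/submit! pool #(download url))) urls)))
```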
pesterhazy12:03:40

Can I say "download 10 at a time" with promesa?

opqdonut12:03:30

> • px/fixed-executor: creates a thread pool that reuses a fixed number of threads operating off a shared unbounded queue.

💡 2
lispyclouds12:03:29

similarly the vanilla executor has a fixed threads version of it with the virtual threads factory

pesterhazy12:03:36

Awesome thanks for the input

borkdude13:03:08

@U06F82LES you can use babashka.http-client with async requests

borkdude13:03:32

Processing async request results can be done like this (I want to make this easier using callbacks or so):

(-> (http/request (assoc request :async true))
    (.thenApply
     (reify Function
       (apply [_ resp]
         (do-something-with resp))))
    (.exceptionally
     (reify Function
       (apply [_ e]
         (throw e)))))

borkdude13:03:57

You can give it a fixed thread executor with:

(http/client (assoc http/default-client-opts :executor (java.util.concurrent.Executors/newFixedThreadPool 10)))

borkdude13:03:03

All of this works in bb

pesterhazy18:03:55

That's pretty cool @U04V15CAJ

emilaasa17:03:41

I really enjoy using claypoole, would recommend. There's a host of functions that are helpful: https://cljdoc.org/d/org.clj-commons/claypoole/1.2.2/api/com.climate.claypoole.lazy#pdoseq

👍 2
pesterhazy11:03:33

Would you say that Claypoole's pmap can safely be used for I/O, like HTTP downloads?

lispyclouds11:03:34

id say use the virtual promesa ones or the raw virtual threaded executor. if you're on java 19+, not using virtual threads for IO is quite the waste of resources IMO

borkdude11:03:02

promesa doesn't run in bb, but executor is supported

borkdude11:03:40

but is there a way to limit the threads in a virtual executor? this seems important since pesterhazy asked for "10 at a time"

borkdude11:03:55

and he asked for a bb compatible solution

lispyclouds11:03:06

yeah you create a fixed thread pool with the virtual thread factory

lispyclouds11:03:22

can whip up an example

lispyclouds11:03:04

hold my parens

😄 2
lispyclouds11:03:42

(def vfactory (.. (Thread/ofVirtual)
                  (name "blazing-vthread-" 0)
                  (factory)))

(def executor (java.util.concurrent.Executors/newFixedThreadPool 10 vfactory))

(http/client (assoc http/default-client-opts :executor executor))
needs bb 1.3.176+

lispyclouds11:03:41

i just love the design the loom people did to be forever backwards compatible 😄

2
pesterhazy11:03:16

What are the benefits and tradeoffs of virtual threads? Never used them

lispyclouds11:03:58

it's similar to goroutines or erlang processes. the runtime detects when io happens and offloads the task from the carrier thread

borkdude11:03:11

@U06F82LES for 10 threads it's ok to use OS threads, but virtual threads multiplex on real threads, so you can have a near-unbounded amount of them, similar to the go macro in clojure, but more efficient since it's on the JVM level
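A quick sketch of why "near-unbounded" is fine with virtual threads: a per-task virtual-thread executor happily takes tens of thousands of tasks, since they all multiplex onto a handful of carrier threads (requires Java 21+):

```clojure
(import '[java.util.concurrent Executors Callable])

(def doubled
  ;; ExecutorService is AutoCloseable since Java 19, so with-open works:
  ;; close waits for submitted tasks, like an implicit shutdown + await.
  (with-open [exec (Executors/newVirtualThreadPerTaskExecutor)]
    (->> (range 10000)
         ;; one virtual thread per task -- cheap, unlike 10k OS threads
         (mapv (fn [i] (.submit exec ^Callable (fn [] (* 2 i)))))
         (mapv (fn [f] (.get f))))))

(take 3 doubled) ;; => (0 2 4)
```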

lispyclouds11:03:13

much more efficient waiting on IO

borkdude11:03:07

I rewrote the clojure.core.async go macro to take advantage of virtual threads in the next release of bb

😮 2
lispyclouds11:03:38

as part of project loom, most of the lower level java SDK was rewritten to notify the jvm of IO calls, and the jvm manages the vthread->osthread mapping

💡 2
lispyclouds11:03:02

biggest change to the jdk since java

borkdude11:03:27

and all of this without writing a line of "async/await" crap :)

lispyclouds11:03:56

tradeoffs are:
• you dont think of pools anymore, but that could mean the tiny db you were talking to could die under the unbounded load 😉
• vthreads are not meant for CPU bound tasks. its for concurrency, not parallelism. use the real threads for that

pesterhazy11:03:18

Thanks, super interesting

lispyclouds11:03:11

another nice solution to the https://journal.stuffwithstuff.com/2015/02/01/what-color-is-your-function/ apart from Go, Erlang/Elixir, Lua, Zig

borkdude16:03:12

1.3.176 released which supports the above virtual thread stuff and also has async callbacks now when making async requests

🧡 3
m.q.warnock17:03:43

Choice of concurrency-primitives question: I'm building a "massively-single-player" game; ie, each player has their own independent game-state, but it's managed server-side. In addition to events coming in from the client, timers may fire, and 'authors' can do things with their own clients which result in events the player's game 'loop' needs to handle.

I currently have a function (`step!`) which is passed an event, grabs the player's current state, constructs a next-state and sequence of 'actions' (side-effecting), then tries to execute the actions, and if they all succeed, calls reset! on the atom containing the player's current-state. There is an obvious race-condition in this, if that function is called in parallel.

Using goroutines or agents are obvious options, but I worry about the scalability; should I? Does a mutex over that function make more sense? Or maybe an event-queue that doesn't rely on core.async or agents (but then, how do I pull events, especially timers, without a busy-loop)? Maybe I'm overthinking it, since as long as it's abstracted enough I can try different things later, if I have the nice problem of having to scale; on the other hand, I'm curious what the consensus is on best-practice for this sort of thing.

👀 2
phronmophobic18:03:33

What kind of side-effecting actions are you considering? What happens if they partially fail/succeed?

phronmophobic18:03:19

In theory, updating the current player's state is also a side-effecting action (and could be treated similarly to the other side-effects).

m.q.warnock18:03:40

the actions are currently all messages to the client, to change its state. If any of them fail, it's probably because the client dropped offline, and the state before that will at least be consistent when they reconnect

phronmophobic18:03:13

Scaling up usually comes from being able to add more servers. For that, the harder problem is usually moving "player's states" between servers so that you can scale up/down.

phronmophobic18:03:01

> If any of them fail, it's probably because the client dropped offline, and the state before that will at least be consistent when they reconnect

There's actually no way to know what state the client was in when they disconnected.

m.q.warnock18:03:29

this is a hobby project, and requires ML inference on a gpu; I'm serving it from home, and would like a single very beefy server to handle everything for the foreseeable future.

m.q.warnock18:03:44

yes, but when they reconnect, everything is reestablished from the last-known-consistent state

phronmophobic18:03:39

Usually, you keep a queue of events and have a way for the client to ask for "everything since message id X" so there's no benefit to waiting for an acknowledgement before committing state updates on the server.

m.q.warnock18:03:46

yes; I could do that, and probably will longer-term, but the client is very 'thin', and another benefit of the way I'm doing all-or-nothing state-updates is that bugs in my rapidly changing codebase don't leave state inconsistent, whether client or server-side. all of this is orthogonal to my question

phronmophobic18:03:13

are the messages sending deltas, the full state, or a mix?

phronmophobic18:03:41

The answer to your original question really depends on what the side effects and what happens if they fail. Since the side effects are messages to the client, I would suggest updating the server state right away and have a separate process that syncs the server state to the client (via deltas, full state, or both)

phronmophobic18:03:25

Unless there's a good reason not to, I would use swap! rather than reset! to make sure your updates are consistent and avoid race conditions.
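The reset! race is concrete: a read-then-reset! sequence can interleave with another updater and lose writes, while swap! is an atomic read-modify-write. A small demo (futures stand in for concurrent event handlers):

```clojure
(def counter (atom 0))

;; reset!-style: deref, compute, write. Another updater can sneak in
;; between the @counter and the reset!, so increments get lost.
(run! deref (doall (repeatedly 100 #(future (reset! counter (inc @counter))))))
(def lossy @counter) ; often < 100 under contention

;; swap!-style: atomic CAS-with-retry, no lost updates.
(reset! counter 0)
(run! deref (doall (repeatedly 100 #(future (swap! counter inc)))))
@counter ;; => 100
```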

m.q.warnock18:03:08

I guess I didn't explain myself well. I'll try again some other time.

hiredman18:03:18

our billing system at work is sort of the same kind of thing, there is a billing state machine, and we run the state machine for each user

hiredman18:03:21

the state for each user is essentially an atom (really a custom IAtom type backed by a database row), and we advance through the state machine processing different events using compare-and-set!

hiredman19:03:41

we build up a set of side effects to run if a given compare-and-set! succeeds, then do the compare-and-set!, and if it succeeds we execute the side effects, and if not different things happening depending (sometimes there is a retry loop, etc)
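That pattern can be sketched roughly like this (`compute-step` and `execute-action!` are hypothetical stand-ins for the pure step function and the side-effect runner; the retry policy here is a bare loop, whereas the real system varies it per case):

```clojure
;; Toy pure step: returns the next state plus the effects to run.
(defn compute-step [state event]
  {:next-state (update state :events (fnil conj []) event)
   :actions    [[:send-to-client event]]})

(defn execute-action! [action]
  ;; stand-in for messaging the client
  nil)

(defn step! [state-atom event]
  (loop []
    (let [current @state-atom
          {:keys [next-state actions]} (compute-step current event)]
      (if (compare-and-set! state-atom current next-state)
        (run! execute-action! actions) ; effects only after winning the CAS
        (recur)))))                    ; lost the race: retry on fresh state

(def player-state (atom {}))
(step! player-state :jump)
@player-state ;; => {:events [:jump]}
```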

hiredman19:03:59

Using reset! bypasses all the useful concurrency properties of using an atom

2
Nate19:03:40

atoms with reset! are still valuable for single updater many reader scenarios

👀 2
m.q.warnock03:03:32

Thanks, Nate. That is, indeed, the scenario in question, except that I'm not currently funneling all events through one 'updater', in a strict fashion. That isn't a problem yet, because it's close enough in practice. Using swap, without changing anything else, would be worse than pointless. I didn't think isolating pure functions from side-effecting ones would be at all controversial here. My pure code is about 100x as complex as the impure: calling into pytorch, etc. I could work on making all the side-effects strictly idempotent (they mostly are, again, in practice), or go the route of reliably replicating state between client and server, but that seems premature to me, and I'd have to explain a lot about what my client currently does (not much), and what all my plans are, to convince anyone of that. I was really just wondering what people were doing within the scope of what I described, these days. I've gotten the impression that core.async and agents have fallen out of 'fashion'.

genmeblog22:03:53

Are clojure.data.int-map keys sorted? I.e. does the keys function on an int-map return a sorted sequence? Looks like yes, but I'm not sure...

dpsutton23:03:03

i don’t see it in the documentation but it seems so

(require '[clojure.data.int-map :as i])

(let [numbers (range 1 1e6)
      shuffled (shuffle numbers)
      int-map (into (i/int-map) (map (fn [x] [x x]) shuffled))]
  (= (keys int-map) numbers))

p-himik08:03:47

It was a deliberate change, albeit not documented: https://clojure.atlassian.net/browse/DIMAP-4

genmeblog09:03:05

Oh, good to know, so it wasn't the case from the very beginning. I think it could also implement Sorted interface like PersistentTreeMap does.

p-himik09:03:05

Oh, could you add it as a comment to the Ask?

genmeblog09:03:09

Added already! 🙂

👍 2