This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-12-21
In Joy of Clojure 2nd ed. (pp. 253-255) they give the following example of making array mutations safe:
(defn make-safe-array [t sz]
  (let [a (make-array t sz)]
    (reify SafeArray
      (count [_] (clj/count a))
      (seq [_] (clj/seq a))
      ;; is locking really necessary for aget? what could happen?
      (aget [_ i] (locking a
                    (clj/aget a i)))
      (aset [this i f] (locking a
                         (clj/aset a i (f (aget this i))))))))
(full sample here: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L280-L282)
I'm wondering why they lock aget at all? Isn't it enough to lock aset? Why should I block readers while there's a write in progress?
Likely because java.lang.reflect.Array/get doesn't say anything about it being thread-safe.
Hmm, that might be it. But what would that mean? Like observing a half-set value? What would that even be?
Maybe because of this: https://docs.oracle.com/javase/tutorial/essential/concurrency/memconsist.html
Exactly, you need something that orders both reads and writes with respect to each other, otherwise the JVM can do things like read the array index once and cache it in a register, and just say your writes all happened after the read
@U06BE1L6T locking emits a memory fencing instruction that prevents operation reordering and makes sure your CPU caches are synced. One thread might update a value in its L1 cache (which is per-core) while another thread on another core reads a stale copy of the same value in its own L1 cache. Typically a memory fence causes the changes to get pushed to the L3 cache, which isn't per-core. Writing a volatile does the same, so generally for scalar values (int, long, writing a reference), volatile is sufficient
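To make the volatile point concrete, here's a minimal sketch (mine, not from the book) of Clojure's volatile! box, which is backed by a Java volatile field:

(;; volatile! gives visibility: a write by one thread is guaranteed to be
 ;; seen by later reads on other threads (no stale per-core cached copy).
 ;; It does NOT give atomicity - use an atom for read-modify-write races.
 def last-seen (volatile! nil))

(vreset! last-seen 42)  ; writer thread publishes a value

@last-seen
;; => 42  - reader threads observe the published value

So a single-writer "latest value" slot is a good fit for a volatile; the book's array needs locking because aset is a read-modify-write.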
Ah right, so the read lock is there only to provide a fresh value - otherwise it could get cached;
I think it's unlikely to happen here (I increment the array values in 100 concurrent threads, then read them all afterwards), maybe because the cache coherence protocol will actually fetch the proper value when it's modified by the aset operation (even when there's no lock in aget)
I definitely couldn't find any consistency issue when removing the aget lock and testing it (https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/books/joy_of_clojure/ch10_mutation_and_concurrency.clj#L286-L290)
Such bugs are notoriously difficult to test for. Sometimes you may catch them with such tests, but there is no guarantee you will
Yeah, based on my understanding of the JMM and memory consistency properties (https://docs.oracle.com/en/java/javase/15/docs/api/java.base/java/util/concurrent/package-summary.html#MemoryVisibility) they do the right thing in the book; in particular:
> Actions prior to "releasing" synchronizer methods such as Lock.unlock, Semaphore.release, and CountDownLatch.countDown happen-before actions subsequent to a successful "acquiring" method such as Lock.lock, Semaphore.acquire, Condition.await, and CountDownLatch.await on the same synchronizer object in another thread.
Here are some good resources dealing with more details regarding the notion of volatile et al meaning "flush to main memory" (which was the impression I got from reading some Java book a decade ago, but found much later is likely false when reading about the MESI cache coherence protocol):
• https://stackoverflow.com/questions/1850270/memory-effects-of-synchronization-in-java
• https://mechanical-sympathy.blogspot.com/2013/02/cpu-cache-flushing-fallacy.html
• https://stackoverflow.com/questions/42746793/does-a-memory-barrier-ensure-that-the-cache-coherence-has-been-completed/42750844#42750844
@U06BE1L6T locks or not, there's a race condition here because the sequence can be constructed before the threads are done mutating. Look at the result of (-> (make-safe-array Integer/TYPE 8) (doto pummel) seq)
Oh yeah, you're right. I think they basically rely on the reader waiting until the threads are done (which is quick for a human experimenting in the REPL 🙂 ).
... in which case, I think, the read lock basically doesn't matter at all, but would be the right thing to do for an operation happening immediately after a previous aset, right?
there's many possible reasons why you could see the latest value without explicit synchronization, but in general physical time is not something you should rely on
What’s the default for the clojure.compiler.direct-linking and elide-meta JVM options when doing a lein jar or lein uberjar?
by default those aren't used at all afaik
so no direct linking, no elide-meta
@roklenarcic a build tool should not change these options unless the user asks for it
Anyone here using vim with conjure in a monorepo? My issue is that I typically open files in multiple projects and it becomes tedious to launch the repl for every file. Is there a way to configure vim to find the projects root path and launch an nrepl-server in that dir?
I do but I don't start the REPL from nvim, I start a bunch of REPLs using a kinda custom docker-compose wrapper, then I set up Conjure to connect to the right REPL depending on what dir I :cd into.
Conjure allows you to work on multiple projects at a time by setting the state key via :ConjureClientState [state-key]
At work, I set up a "cwd changed" autocmd that sets my ConjureClientState to the cwd path. So every time I :cd I get a fresh Conjure state with its own nREPL connection and config.
You could set up something similar + use something like https://github.com/clojure-vim/vim-jack-in if you really want to start your REPL from within nvim.
I still recommend setting up your REPLs outside of nvim with your own script though; ensure you write your .nrepl-port files into each sub-repo directory, then :cd into each module as you work on them and Conjure will auto connect.
Then you can set up the autocmd to set the state as you hop around to have multiple concurrent connections.
augroup conjure_set_state_key_on_dir_changed
autocmd!
autocmd DirChanged * execute "ConjureClientState " . getcwd()
augroup END
I have a script that goes through my docker processes and maps the nREPL ports into .nrepl-port files in the correct directories of the monorepo, making :cd-ing into directories synonymous with connecting to them.
You can also discuss conjure over at https://conjure.fun/discord if you so wish 🙂
I guess I can simply use a script to launch repls for all projects.. I guess it will eat some memory. Anyway, I joined #conjure so I'll ask future questions there.
spec generators rely on the Clojure property testing library test.check. However, this dependency is dynamically loaded and you can use the parts of spec other than gen, exercise, and testing without declaring test.check as a runtime dependency.
The above is from the spec guide where it speaks of loading the test.check lib. What does it mean to dynamically load a lib? How does that work?
:test-deps {:extra-paths ["test"]
            :extra-deps {org.clojure/test.check {:mvn/version "1.0.0"}
                         peridot/peridot {:mvn/version "0.5.2"}}}
:run-tests {:extra-deps {com.cognitect/test-runner
                         {:git/url ""
                          :sha "209b64504cb3bd3b99ecfec7937b358a879f55c1"}}
            :main-opts ["-m" "cognitect.test-runner"
                        "-d" "test"]}
an example of adding test.check
if you do generator stuff, it will load the test.check.generators namespace. if you don't, then it won't.
so you can safely include test.check at test/repl time but exclude it at production time
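To make "dynamically loaded" concrete, here is a small sketch using clojure.edn as a stand-in for test.check - the namespace is only loaded the first time the code path actually runs, so it doesn't have to be on the classpath unless you hit that path:

;; requiring-resolve (Clojure 1.10+) requires the namespace and resolves
;; the var in one step, at call time rather than at compile time - this is
;; essentially how spec defers its dependency on test.check's generators.
(defn parse-edn [s]
  ((requiring-resolve 'clojure.edn/read-string) s))

(parse-edn "{:a 1}")
;; => {:a 1}

If parse-edn is never called, clojure.edn is never loaded, which is why the missing dependency only errors when you actually use gen/exercise/testing.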
(map (fn [k v]
       (println " K " k)
       (println " v " v)
       (if-not (re-matches #"^[a-z]+\*$" (->str v))
         nil
         (->str v)))
     {:id "john"})
(fn [k v] …) is for 2 arguments. If you want to have key and value you need (fn [[k v]] …).
(defn foo [x1 x2 x3] ...)
is the fn with 3 arguments
(defn foo [x1 [k v] x3] ...)
is the function with 3 arguments, but the second one is destructured into [k v]
Thanks @U0WL6FA77
There was a website with challenging data-transformation tasks that you can try to solve online. Afterwards you can compare your solutions to the best solutions made by other people. This is a really good place to start.
Maybe someone else remembers the URL of the website where you can do online task challenges and compare your solution to other people's?
(map (fn [[k v]]
       (println "===1===k " k)
       (println "===1===v " v)
       (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)))
       (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false))
     {:id "john"})
{:id "John"} is a map, but you want to use the map function on a collection like [{:id "John"} {:id "Popeye"}]
(map println {:id "john" :foo "bar"})
[:id john]
[:foo bar]
=> (nil nil)
(map println [{:id "john"} {:foo "bar"}])
{:id john}
{:foo bar}
=> (nil nil)
I don’t understand the question.
The logic of map is: take each element from the collection and run the function on that element. The results are returned as a list.
yes, I got the functionality of map. In my logic I want to take a key-value pair, which will be a single map entry, do pattern matching on it, and return true or false
if you want to operate on a single map, then you don't need to use map as a function at all
((fn [[k v]] (println "===1===k " k) (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) true false){:id "john"}))
((fn [m]
(println m))
{:foo "bar" :x "y"})
{:foo bar, :x y}
=> nil
(map (fn [m]
(println m))
{:foo "bar" :x "y"})
[:foo bar]
[:x y]
=> (nil nil)
((fn [m]
(println (:foo m)))
{:foo "bar" :x "y"})
bar
=> nil
if you want to check :id (which is :foo here)
In the end you wouldn't write an anonymous function and call it right away like that
(let [f (fn [{:keys [foo] :as m}]
(println foo))]
(f {:foo "bar" :x "y"}))
this can be easier to understand
(fn [v] (println "===1===v " v) (println "matches " (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v))) (if (re-matches #"^[a-zA-Z0-9]+\.\*$" (->str v)) false true)(map val (:id "john")))
BTW if you want to get all the values from a map, use vals, so (vals {:foo "bar" :x 1})
really hard to talk about how things should be done while we are doing things to learn
Hello Team I am passing a map to anonymous function and wanted to validate the function and tried with below code ,but it is not working, how can I pass {:id "john"} to anonymous function ?
from https://clojure.org/reference/protocols#_extend_via_metadata: > As of Clojure 1.10, protocols can optionally elect to be extended via per-value metadata:
(defprotocol Component
:extend-via-metadata true
(start [component]))
Is there a resource that talks about how to decide if a protocol should opt in to extension via metadata?
Here's a fun little example of why Functional is better than OOP 😛
data = None
if data and "domain" in data:
domain = data.get("domain").get("name", "foo")
else:
domain = "bar"
print(domain)
Notice in this code, you need the condition to be if data and "domain" in data: - we have to check that data is not None because otherwise the type None won't support the in operator and you will see: TypeError: argument of type 'NoneType' is not iterable
If you didn't use methods, and instead used a functional approach where in was a function, this would not be a problem, because you could easily implement a None check inside that function.
This is also a good example of why nil isn't as bad in Clojure as it is in non null-safe OOP languages like Python or Java
cljs.user=> (key nil)
ERROR - No protocol method IMapEntry.-key defined for type null:
you have to check nil and types in Clojure too 🙂
Yes, sometimes, but now it's just a design choice, not a limitation of the paradigm. key is just a function implemented with:
(defn key
"Returns the key of the map entry."
[map-entry]
(-key map-entry))
If it wanted, it could handle nil in any way.
I wouldn't say that's a fair comparison. You typically wouldn't want to accept data as either None or a dict; I think it would be appropriate to only expect a dict. Additionally, idiomatic python follows "it's easier to ask for forgiveness than permission". I would expect to just see:
data.get("domain", {}).get("name", "bar")
To complete the example :-):
(data or {}).get("domain", {}).get("name", "bar")
That being said, these days I end up with a get-in function in python code.
the above is a nice addition. I still prefer clojure to python by quite a bit, but python isn't so bad
My point being, what if you wanted a .get that can handle None or any other type, maybe a vector, etc.
In OO, all types would need to agree to share a .get interface, and provide an implementation for it
But also, in this particular case, ya I do find Python's handling of None on .get less than ideal. I think Clojure's handling is much nicer, specifically because I think the above is a common source of bugs.
I think we understood and agreed with your point, but we didn't think that the comparison was fair. In practice (at least on python codebases I worked on) that python code would look like:
get-in(data, ('domain', 'name'), 'bar')
or
get-in(data, 'domain.name', 'bar')
which doesn't compare that unfavourably to
(get-in data ["domain", "name"], "bar")
as your initial example.
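For reference, Clojure's built-in get-in already covers the None/nil cases from the Python example above - a missing key at any step, or a nil map, falls through to the default:

;; get-in walks nested maps; the third argument is returned whenever any
;; step of the path is missing - including when the whole input is nil.
(get-in {"domain" {"name" "example"}} ["domain" "name"] "bar")
;; => "example"
(get-in {"domain" {}} ["domain" "name"] "bar")
;; => "bar"
(get-in nil ["domain" "name"] "bar")
;; => "bar"

This works because get (and therefore get-in) is a function that treats nil as an empty map, rather than a method dispatched on the receiver's type.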
It's possible, no one on our team is really a pro at Python, more like learned at university or picked it up here and there. This code is in a script file part of our infra, so it also doesn't get the same level of code review scrutiny and all.
I can't seem to find get-in though? Is that from a popular library?
If so, I think it demonstrates my point pretty well, and I'd be curious to look at the implementation. My guess is get-in is a function that people create for this very problem. Instead of adding a method to Dictionaries and None, people found the need to change get from a method to a function - that would be a good example of what I'm talking about.
In Python, you could argue that you want a null error to be thrown, maybe you prefer the fail fast, and if you didn't explicitly handle null, maybe you consider a null appearing a bug that you'd want to know about. So that can be a design choice, what do you do with data being None? And while I like that Clojure has get handle nil by default, I don't want to say that throwing a null error if get encounters a null is necessarily worse or bad.
But, in OOP, you actually can't do anything about it if you did want to handle this case the way Clojure does. That's because of how methods work versus functions. If the type is wrong, the methods won't exist. All you can do is add the method to more and more types, but even then, there's always a chance a type shows up that doesn't have the method, and you get an error again. That's one of the Functional advantages in my opinion. Which you could also do in Python, since it has Functions, you could make get a function and do this.
I would say the biggest difference for me is I can focus on moving from room A to B, instead of on the object door - which is not what I'm interested in, because I want to move to B - but this is a very abstract description :)
I'm working on an app where I'm making several api calls concurrently to fetch data. The number is variable but let's say it's 50 on average. I'm currently using pmap to transform the urls into the response in parallel, but I was wondering if it could be faster since pmap is limited to 2 + num_cpus and the time is mostly spent in I/O wait. Any tips?
When I had an app that heavily used APIs, the pattern that worked best was to have separate resource pooling per API service. This is because there's usually a per API limit (either imposed by the API, or their own resources being able to serve you)
that pooling could be a thread pool (eg. claypoole which lets you use futures with custom pools) or a queue per service, with a different number of workers dedicated to each queue
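A minimal sketch of that bounded-pool-per-service idea using plain java.util.concurrent (claypoole wraps this more ergonomically; fetch-fn here is a stand-in for your real API call):

(import '[java.util.concurrent Executors])

(defn with-bounded-pool
  "Run (f item) for each item on a fixed pool of n threads,
  returning the results in input order."
  [n f items]
  (let [pool (Executors/newFixedThreadPool n)]
    (try
      ;; submit everything up front, then block on each Future in order -
      ;; at most n calls are in flight against this service at once
      (->> items
           (mapv #(.submit pool ^Callable (fn [] (f %))))
           (mapv #(.get %)))
      (finally
        (.shutdown pool)))))

(with-bounded-pool 4 #(* 2 %) [1 2 3])
;; => [2 4 6]

You'd keep one such pool per API service, sized to that service's rate limit.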
if you aren't hitting the limits of the APIs, you can just use future for each call, and skip pmap - which is rarely the right answer
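The "just use future" version is about this small (fetch is a stand-in for your HTTP call); futures run on an unbounded cached pool, so 50 IO-bound calls genuinely overlap instead of being capped at 2 + cpus:

(defn fetch-all [fetch urls]
  (->> urls
       (mapv #(future (fetch %)))  ; mapv is eager: all calls start now
       (mapv deref)))              ; then block until each one finishes

(fetch-all #(str "got " %) ["a" "b"])
;; => ["got a" "got b"]

Note mapv rather than map - laziness here would serialize the calls.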
if you need to do any coordination (eg. combining results from multiple calls before calling another endpoint) look into core.async
(but make sure all the IO is inside core.async/thread calls)
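A sketch of that core.async shape - assuming org.clojure/core.async is on the classpath; a/thread runs its body on a real thread (safe for blocking IO) and returns a channel carrying the result:

(require '[clojure.core.async :as a])

;; fetch two endpoints concurrently, then combine the results -
;; the kind of coordination that gets awkward with bare futures.
(defn fetch-and-combine [fetch url-a url-b combine]
  (let [ch-a (a/thread (fetch url-a))   ; IO on a dedicated thread,
        ch-b (a/thread (fetch url-b))]  ; never inside a go block
    (combine (a/<!! ch-a) (a/<!! ch-b))))

(fetch-and-combine identity "a" "b" str)
;; => "ab"

Both fetches start before either <!! blocks, so total latency is max of the two calls rather than their sum.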
Also note that pmap will very likely run more than 2+cpus tasks at the same time due to chunking: https://github.com/jumarko/clojure-experiments/blob/master/src/clojure_experiments/experiments.clj#L556-L576
@U06BE1L6T I don't think you're correct here. The parallelization level is restricted by the thread pool it uses, chunking won't change that.
the parallelization is controlled by the lag between the launch of new futures and the deref, it uses future which is an expanding unlimited pool
chunking changes the behavior of (map #(future (f %)) coll)
which is what actually creates the threads
so the answer is weird and complicated (another reason I don't like pmap) - chunking causes futures to be launched a chunk at a time, if the input is chunked; otherwise the number of futures in flight is controlled by the lag between future generation and future realization (which is done via the blocking deref)
(defn pmap
  "Like map, except f is applied in parallel. Semi-lazy in that the
  parallel computation stays ahead of the consumption, but doesn't
  realize the entire result unless required. Only useful for
  computationally intensive functions where the time of f dominates
  the coordination overhead."
  {:added "1.0"
   :static true}
  ([f coll]
   (let [n (+ 2 (.. Runtime getRuntime availableProcessors))
         rets (map #(future (f %)) coll)
         step (fn step [[x & xs :as vs] fs]
                (lazy-seq
                 (if-let [s (seq fs)]
                   (cons (deref x) (step xs (rest s)))
                   (map deref vs))))]
     (step rets (drop n rets))))
  ([f coll & colls]
   (let [step (fn step [cs]
                (lazy-seq
                 (let [ss (map seq cs)]
                   (when (every? identity ss)
                     (cons (map first ss) (step (map rest ss)))))))]
     (pmap #(apply f %) (step (cons coll colls))))))
the (drop n rets) creates the lag between the creation of new futures and the blocking deref that waits on them
breaking a common piece of advice to not mix lazy calculation with procedural side effects
;; changes to this atom will be reported via println
(def snitch (atom 0))

(add-watch snitch :logging
           (fn [_ _ old-value new-value]
             (print (str "total goes from " old-value " to " new-value "\n"))))

(defn exercise
  [coll]
  (doall
   (pmap (fn [x]
           (swap! snitch inc)
           (print (str "processing: " x "\n"))
           (swap! snitch dec)
           @snitch)
         coll)))
user=> (exercise (range 10))
total goes from 3 to 4
total goes from 4 to 5
total goes from 2 to 3
total goes from 1 to 2
total goes from 0 to 1
processing: 0
processing: 4
processing: 2
processing: 3
processing: 1
total goes from 5 to 4
total goes from 4 to 3
total goes from 1 to 0
total goes from 2 to 1
total goes from 3 to 2
total goes from 0 to 1
total goes from 1 to 2
processing: 6
processing: 7
total goes from 2 to 3
total goes from 3 to 4
total goes from 5 to 4
total goes from 4 to 5
processing: 8
total goes from 4 to 3
processing: 9
processing: 5
total goes from 3 to 2
total goes from 2 to 1
total goes from 1 to 0
(0 0 0 0 0 0 3 2 0 0)
max parallelism here is 5 - I'm going to try a version where I capture the max and exercise it more aggressively
@U0K064KQV I am not good enough with lazy-seqs to read the pmap code and know whether it unchunks, so I'm working empirically
yeah, here's my version of exercise that captures the max parallelism:
(defn exercise
  [coll]
  (let [biggest (atom 0)]
    (dorun
     (pmap (fn [x]
             (swap! snitch inc)
             (swap! biggest max @snitch)
             (print (str "processing: " x "\n"))
             (swap! snitch dec)
             @snitch)
           coll))
    @biggest))
(exercise (range 1000))
prints a lot more than I'm going to paste here, and returns 19
lmk if that's flawed, but to my eye that will accurately tell you the max futures spawned concurrently by pmap
(nb range is chunked, which is why I'm using it here)
Hum. Ya, looking at the code, its kind of hard to get a full picture. I think the branch of if-let that uses cons will unchunk, but the other branch would not. And the drop n will also trigger the first chunk.
all the retries on that poor little atom make the output with bigger inputs absurd
or maybe that's caused by the printing contention...
Might be better to use a sempahore? I think a lock instead of atom's retry maybe would make this more clear?
(the reason all the prints call str is that otherwise the parts of the prints overlap in the output)
Oh, no I don't think that's what I meant. Whatever the thing that is a locking counter is called
Then again, hum... What if you changed the impl of pmap so that inside the future it incremented and decremented the counter before and after running f ?
that would be the same behavior, with more work to achieve it
I rewrote to an agent (doesn't retry), the prints are now in intelligible order, the answer is still high (33, 37, 38, 39, 36 ...)
max value in theory is 42 (32 chunk size + 8 processors + 2)
(when you overlap the next chunk)
Oh boy, that's one confusing little function haha. It does seem like it was written pre-chunking, so I guess chunking just wasn't taken into account. Hmm, I wonder if that explains why I see poor performance improvements from it in practice - with chunking, the thread overhead is way too high for parallelization
it launches chunk-size futures, but iterates by nproc+2 delay between reader of input and reader of future values, if your input is big enough to have multiple chunks you can have more than chunk size in flight
that could be - I consider it more like "an example of what you could do to parallelize a specific problem" that happened to make it into the codebase, and it doesn't match most people's problems
reducers are more general, but I haven't used them in anger and haven't seen much usage of them in the wild
Ya, I think having to require their namespace and the fact that only fold is still useful now that we have transducers makes them kind of DOA
Well, maybe this chunking behavior is actually a blessing in disguise? Now it means using this re-chunk function:
(defn re-chunk [n xs]
  (lazy-seq
   (when-let [s (seq (take n xs))]
     (let [cb (chunk-buffer n)]
       (doseq [x s] (chunk-append cb x))
       (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
Taken from clojuredocs, you can actually control the concurrency level of pmap 😛
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 1 (range 1000))))
Will give you ~2+cores
(dorun (pmap (fn [_] (Thread/sleep 100)) (re-chunk 100 (range 1000))))
Will give you ~100