seancorfield 00:12:17 We've historically used s/conformer at work and even published a library of "conforming specs" that are useful for validating (and lightly transforming) form data -- from strings to numbers, booleans, etc. But we're looking at switching to spec-coerce, which derives coercions from specs so that you can write your specs for just the target data you want, run your string input through spec-coerce, and then pass the result into Spec itself, keeping the two concerns separated.


I strongly suggest using coax instead @seancorfield. It's more complete/battle tested. It's also way more performant


(it started as a fork of spec-coerce)


Ah, good to know @mpenet -- I'll look into it.


Oh, the exoscale library? I have looked at it before so thanks for reminding me!


I’ve got a problem: I have a protocol in a library dependency. I have a defrecord that extends the protocol. When I run (satisfies? Protocol record) it says false, which it really shouldn’t. But when I reload the namespace then suddenly it does extend the protocol.


> when I reload the namespace
The namespace that defines the protocol? Or the one that defines the record?


the one that defines the record


I know that reloading the namespace that defined the protocol breaks existing implementations, but in this case I start at this state
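For anyone following along, here is a minimal sketch of that failure mode. It assumes a plain REPL session and simulates a "reload" by re-evaluating the defprotocol form; Greeter, Person, and nickname are hypothetical names, not from the library in question.

```clojure
;; Hypothetical protocol and record, for illustration only.
(defprotocol Greeter
  (greet [this]))

(defrecord Person [nickname]
  Greeter
  (greet [this] (str "hi " nickname)))

(def p (->Person "Ada"))

;; The instance satisfies the protocol it was compiled against.
(def before-reload (satisfies? Greeter p))

;; Simulate reloading the protocol namespace: this generates a fresh
;; interface, and the existing Person class does not implement it.
(defprotocol Greeter
  (greet [this]))

(def after-reload (satisfies? Greeter p))

;; Re-evaluating the record definition compiles it against the new
;; interface, so new instances satisfy the protocol again.
(defrecord Person [nickname]
  Greeter
  (greet [this] (str "hi " nickname)))

(def after-redefine (satisfies? Greeter (->Person "Ada")))
```

Here before-reload and after-redefine are true while after-reload is false, which matches the "works again once I reload the record namespace" symptom.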


I start the REPL and ask satisfies? it will return false, even though the protocol implementation is in defrecord definition


that is before doing any reloads whatsoever


Something in your workflow is probably implicitly reloading the protocol namespace. I can't really say anything else without an MRE.


why would it be implicitly reloaded? The require directives don’t load already loaded namespaces, right?


You said you were using some library that defines that protocol. Do you know for sure that that library never reloads anything? Also, did you try to reproduce it using just clj as your REPL, with nothing else?


I can try that now


I swapped a couple of branches and did a clean and now the error is gone…


weirdest thing


Pulling at straws here @U66G3SGP5, but a dissoc on a record field returns a map. Any chance you did a dissoc on your record?

Clojure 1.10.1
user=> (defprotocol MyProtocol (my-fn [this a]))
MyProtocol
user=> (defrecord MyRecord [my-field] MyProtocol (my-fn [this a] a))
user.MyRecord
user=> (def r (->MyRecord "field-value"))
#'user/r
user=> (satisfies? MyProtocol r)
true
user=> (my-fn r 42)
42
user=> (def r2 (dissoc r :my-field))
#'user/r2
user=> (satisfies? MyProtocol r2)
false
user=> (my-fn r2 42)
Execution error (IllegalArgumentException) at user/eval145$fn$G (REPL:1).
No implementation of method: :my-fn of protocol: #'user/MyProtocol found for class: clojure.lang.PersistentArrayMap
user=> (type r2)
clojure.lang.PersistentArrayMap
user=> (type r)
user.MyRecord


no, but that’s a good trick, never thought of that…

jumar 13:12:08 says Clojure refs implement snapshot isolation. Is it stated anywhere what kinds of concurrency issues this isolation level can cause (specifically in Clojure code)? Has anyone experienced such issues in practice?

Alex Miller (Clojure team) 14:12:01

The concurrency issue you are most likely to see with refs is write skew (because read-only refs are not part of the default ref set that can cause a transaction to retry). But that’s easily worked around when it’s an issue by using ensure instead of deref to add the ref to the ref set even on read.


Is there anything written down about the decision to use an explicit “ensure” rather than track all deref? For my curiosity

Alex Miller (Clojure team) 16:12:39

adding read-only refs to the ref set means your transactions have a greater chance of failure and retry. but it's not always necessary. so the current setup gives you the option and a way to choose your semantics. if they were always included, you would have no way to weaken that constraint when needed.

Alex Miller (Clojure team) 16:12:47

like say you had two refs - one for an account balance and one for transaction fee amount. you have a transaction that updates the balance and assesses the fee (which only needs to be read). if the fee changes infrequently and the exact moment when a change starts being applied is not important to the business, it's fine to just deref the fee ref. but if it's really important that the fee change takes effect immediately, you could ensure that ref
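A rough sketch of the balance/fee example above, assuming two refs with made-up starting values; the names balance, fee, charge-fee-deref!, and charge-fee-ensure! are illustrative only.

```clojure
;; Hypothetical refs for the account balance and the transaction fee.
(def balance (ref 100))
(def fee (ref 2))

(defn charge-fee-deref!
  "Reads the fee with a plain deref. The fee ref is not added to the
  transaction's ref set, so a concurrent change to fee does not force
  this transaction to retry."
  []
  (dosync
    (alter balance - @fee)))

(defn charge-fee-ensure!
  "Reads the fee with ensure. The fee ref is protected from modification
  by other transactions until this one finishes, so the balance update
  is always based on the current fee."
  []
  (dosync
    (alter balance - (ensure fee))))
```

Both functions compute the same result when nothing else is writing to fee; the difference only shows up under concurrent updates, where the ensure version trades a higher chance of contention for the stricter guarantee.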


What kind of maps are struct maps good for reducing the size of? My (limited) experiments so far show array maps to be consistently smaller.


I thought those were deprecated


I think struct-maps were basically a first-pass experiment that led to defrecords


I would not really expect them to be better than anything at anything


Unfortunately records don't support namespaced keys or any other kind of key for that matter.


If you are looking into implementation level details of why certain data structures use the amount of memory that they do, and want something that can draw pictures of JVM objects and references between them for you, you might enjoy tinkering with the cljol library.


I have not used it to investigate struct maps before, and haven't had an occasion to delve into struct map implementation. array maps are good for memory utilization, for sure, but they do have O(n) lookup time, so something to keep in mind if you ever want to make a big one (that and as soon as you take a large array map and create an updated version of it with operations like assoc, etc., you will typically get back a hash map)
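A small sketch of that "fragility", assuming current Clojure behavior where a literal map stays an array map up to 8 entries. The exact threshold is an implementation detail and the var names are illustrative.

```clojure
;; Build an 8-entry map one assoc at a time, starting from the empty map.
(def small (reduce (fn [m i] (assoc m i i)) {} (range 8)))

;; One more key pushes it past the array-map threshold.
(def grown (assoc small 8 8))

;; array-map builds an array map of any size (20 entries here)...
(def big-array-map (apply array-map (range 40)))

;; ...but adding a new key to it hands back a hash map.
(def updated (assoc big-array-map :x 1))

(def map-classes
  {:small   (class small)
   :grown   (class grown)
   :big     (class big-array-map)
   :updated (class updated)})
```

Inspecting map-classes at the REPL shows which of these are clojure.lang.PersistentArrayMap and which have become clojure.lang.PersistentHashMap.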


defrecords do support other kinds of keys and namespaced keys, they just don't get turned into object fields


@hiredman but doesn't that negate optimizations?


@hiredman In fact, it adds an extra 8 bytes of overhead! :p


user=> (defrecord Foo [])
user.Foo
user=> (->Foo)
#user.Foo{}
user=> (assoc (->Foo) ::a 1)
#user.Foo{:user/a 1}


not supporting as well as you would like is not the same thing as not supporting at all


@hiredman Sure. But there's no size optimization to be had by using one that way.


user=> (mm/measure (assoc (->X) ::a 1))
"264 B"
user=> (mm/measure (assoc {} ::a 1))
"232 B"


I would be surprised if you found a built-in Clojure data structure for maps that is lower memory than array-map, and also supported qualified keywords as keys. But I haven't done the measurements you are doing -- just giving a guess from the knowledge I do have. array-maps are O(n) lookup time, as I mentioned above, and 'fragile', as mentioned above. Note that keywords are interned, i.e. only stored in memory once, so the mm/measure results you are showing probably contain all of the objects for the keyword once, but if you did a similar measurement for 1000 objects that use the same keyword repeatedly, the keyword memory is only counted once overall (as it should be, since it is only stored once in memory)


Is there a java-y solution for spinning up multiple workers running the same callable over and over or just an infinite length task? I'm imagining a threadpool where you define the size and have a method to interrupt the whole pool.


there's probably an Executor that makes this easy, they can own pools


interruption on the jvm is tricky, period, unless you use one of a specific set of predefined interruptable methods, or are OK with checking a sentinel value and shutting down manually at execution boundaries


I thought interruption was okay, you check isInterrupted and catch the exception, either happens, you quit?
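For reference, a minimal sketch of what is being described here: a fixed pool where every thread runs the same task in a loop until interrupted. It uses java.util.concurrent directly; start-workers and stop-workers are hypothetical helper names, and the task is assumed to either return quickly or block in an interruptible call such as Thread/sleep.

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

(defn start-workers
  "Starts n threads that each call task repeatedly until interrupted.
  Returns the pool so the caller can stop it later."
  [n task]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n))
        worker (fn []
                 (try
                   ;; Check the interrupt flag at each iteration boundary.
                   (while (not (.isInterrupted (Thread/currentThread)))
                     (task))
                   ;; Blocking calls like Thread/sleep throw instead of
                   ;; setting the flag, so treat that as a stop signal too.
                   (catch InterruptedException _ nil)))]
    (dotimes [_ n]
      (.execute pool worker))
    pool))

(defn stop-workers
  "Interrupts every worker and waits up to timeout-ms for them to exit.
  Returns true if all workers terminated in time."
  [^ExecutorService pool timeout-ms]
  (.shutdownNow pool)
  (.awaitTermination pool (long timeout-ms) TimeUnit/MILLISECONDS))
```

This only stops promptly if the task cooperates. A task that swallows InterruptedException or spins in non-interruptible code will keep its thread alive, which is the trickiness mentioned above.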


I've missed the executor if it exists :(


I don't think that is good


Like, in general, you want structured concurrency, tree shaped task graphs, forkjoin, etc


What you are asking for is extremely unstructured


It doesn't even have the structure of iteration where previous results feed back in, just the same callable over and over


It basically demands side effects as the only way to have results


The goto and labels of concurrency


@hiredman isn't this a common pattern for core async where you might have multiple go-loops?


That is in no way equivalent to running the same callable over and over


How would you model concurrency or workers reading from a queue and then writing state out somewhere, e.g. Database?


Not an in memory queue that is.


It depends on the queue implementation, but usually it is better to have a single thread (sometimes for limiting work in progress, sometimes for doing blocking up, just lots of reasons this usually ends up better) pulling items from the queue and then running a handler or whatever per item


Basically the same pattern as writing a socket server


Single threaded generator pushing into a thread pool, you mean?


You have a loop accepting connections and hand connections off to workers


And the workers are not invoking the same callable over and over
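A rough sketch of that shape, assuming an in-memory BlockingQueue stands in for whatever queue is actually being read. run-dispatcher, handler, and the :done sentinel are hypothetical names for illustration.

```clojure
(import '(java.util.concurrent BlockingQueue Executors ExecutorService
                               LinkedBlockingQueue TimeUnit))

(defn run-dispatcher
  "Pulls items off queue on the calling thread and hands each one to a
  worker pool, like an accept loop handing off connections. Stops when
  it takes the :done sentinel, then waits for in-flight handlers.
  Returns true if all handlers finished within the timeout."
  [^BlockingQueue queue handler n-workers]
  (let [^ExecutorService pool (Executors/newFixedThreadPool (int n-workers))]
    (loop []
      (let [item (.take queue)]
        (when-not (= :done item)
          ;; Each submitted job handles exactly one item and then ends.
          (.execute pool (fn [] (handler item)))
          (recur))))
    (.shutdown pool)
    (.awaitTermination pool 10 TimeUnit/SECONDS)))

;; Example usage with a hypothetical handler that records what it saw.
(def seen (atom #{}))
(def dispatch-finished?
  (run-dispatcher (LinkedBlockingQueue. [1 2 3 :done])
                  (fn [item] (swap! seen conj item))
                  2))
```

The single reader is the only place that touches the queue, so limiting work in progress comes down to choosing the pool size (and, if needed, a bounded executor queue).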


Right, yeah. Makes sense. So you only need one go-loop. Although I guess core async doesn't provide much in the way of rate limiting push to consumers like a thread pool would.


I'm not using core async, so just observing the parallels.


The workers might be core async loops


I use core.async a lot


I haven't used it in a couple of years. But I've seen the pattern of starting multiple go loops to be consumers as a sort of pool of workers, which then had complex cancel channels managed across all of them with pub sub and such. Difficult stuff.


@hiredman if I had plenty of network cards and cores, would you still advise against multiple queue readers?


It really depends, my point is just none of those cases map to "invoking the same callable over and over"


Actually the closest thing it maps to is the lowest level behavior of an executor


E.g. each thread an executor is managing is conceptually running the same code over and over in a loop: pull a runnable from the executors queue and run it


so like, writing an executor on top of an executor


Yeah. Exactly. Although that's still a single producer really.


the "gotos and labels" of concurrency.


code compiles to gotos and labels, but we write function calls, concurrency happens on threadpool threads running a loop, but you try to write higher level stuff


Could you have a memory mapped backed map?


That would offload the memory to disk


It'd be cool actually if there was one that implemented all the Clojure Map interfaces


there are disk-backed implementations of java.util.Map, the tricky thing about clojure maps is they are immutable, so you never have a single map on disk, you have a forest, and then you need to manage that


That's neat. Wouldn't you be able to just MMAP the backing trie ?


Or you mean you'd need some sort of GC for it?


you would need to manage it some way, which might look like a gc


but at this point you are kind of halfway to a database with mvcc like postgresql


halfway is overly generous, but it presents a lot of the same issues as mvcc


I also wonder, what about a hybrid, where the trie is kept in memory, but the leafs are MMAPed?


what you want is a block cache


which of course, the os already has one, but you might want more


I think datomic caches both "blocks" of raw storage and deserialized objects


I've never particularly had this use case, but I can imagine someone say who'd want to load up like a large amount of data in some map to do some report on it or whatever, and if it doesn't fit, but somehow they need it all or something of that sort. But then again maybe there's just a way to get Java to put its whole HEAP on disk


just use derby


I have done this, basically reinventing swap by spilling data into derby when it is too large for processing in memory, it is ok, this was a batch system so the performance was likely terrible, but no one was waiting for the results in realtime


Ya, but there's something nice about a change that wouldn't require any code change. You know, like say you started and it would fit in memory, and suddenly you try to process an even larger file. Instead of like rewriting things to adapt to using derby or some other thing.


just start off using the in memory derby storage 🙂


Fair fair, still think it would be a cool little project though, even if I don't need it lol


Cool, I'll give them a look


datomic is sort of that, and the way it manages the forest of trees is by exposing it as history


Is it possible to declare custom metadata on defn directly?

user=> (defn foo ^{:custom "Custom metadata!"} [] 'foo!)
#'user/foo
user=> (:custom (meta #'foo))
nil
i.e. for this to return "Custom metadata!" instead


the place to put it is on the name


what do you mean by that?


a type hint is the only thing that can go on the arg vector


(and the type hint can go on either the name or the arg vector)


not really, you can put any metadata on the arg vector, but it's not reflected on the var, it's reflected on the arglists key

user=> (defn foo ^:bar [])
#'user/foo
user=> (-> #'foo meta :arglists first meta)
{:bar true}


but as @hiredman says, if you want metadata on the var, hint the var name


^ means put the following metadata on the next thing


so ^{:custom "whatever"} [] means put that metadata map on that vector


in this case the vector you are attaching the metadata to is the arglist vector for the function


oh that makes sense


(defn ^:foo bar [] ,,,)


right before that you have the name you are defing, any metadata you attach to that symbol will be copied to the defined var
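Putting the thread's advice together, a short sketch of the placements discussed. The function names foo, bar, and baz and the metadata values are illustrative; the attr-map form used for bar is the optional map that defn accepts after the name.

```clojure
;; Metadata on the name symbol is copied to the var.
(defn ^{:custom "Custom metadata!"} foo [] 'foo!)

;; defn also accepts an attribute map after the name, which is merged
;; into the var's metadata.
(defn bar {:custom "Custom metadata!"} [] 'bar!)

;; Metadata on the arg vector stays on the arglist, not on the var.
(defn baz ^{:custom "on the arglist"} [] 'baz!)

(def custom-meta
  {:foo         (:custom (meta #'foo))
   :bar         (:custom (meta #'bar))
   :baz-var     (:custom (meta #'baz))
   :baz-arglist (-> #'baz meta :arglists first meta :custom)})
```

In custom-meta, :foo and :bar hold the string, :baz-var is nil, and :baz-arglist holds the value that was attached to the vector.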


Thanks, I thought I'd tried that but probably just did it wrong 🙂


working now!