clojure 2020-12-11 | Slack Archive

seancorfield00:12:17

@ryan.is.gray We've historically used s/conformer at work and even published a library of "conforming specs" that are useful for validating (and lightly transforming) form data -- from strings to numbers, booleans, etc. But we're looking at switching to https://github.com/wilkerlucio/spec-coerce which derives coercions from specs so that you can write your specs for just the target data you want and then run your string input through spec-coerce and then into Spec itself, keeping the two concerns separated.
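
For illustration, a minimal sketch of the `s/conformer` approach described above, using only `clojure.spec.alpha`. The helper `str->long` and the spec name `::age` are hypothetical and are not taken from the published library; the sketch shows how coercion and validation end up mixed in one spec, which is the coupling that a separate coercion library avoids.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical helper: turn a string into a long, or signal a spec failure.
(defn str->long [x]
  (cond
    (integer? x) x
    (string? x)  (try (Long/parseLong x)
                      (catch NumberFormatException _ ::s/invalid))
    :else        ::s/invalid))

;; The conformer transforms the input; pos-int? then validates the result.
(s/def ::age (s/and (s/conformer str->long) pos-int?))

(s/conform ::age "42")   ;; => 42
(s/conform ::age "abc")  ;; => :clojure.spec.alpha/invalid
```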

mpenet04:12:05

I strongly suggest using coax instead @seancorfield. It's more complete/battle tested. It's also way more performant

mpenet04:12:49

(it started as a fork of spec-coerce)

seancorfield04:12:40

Ah, good to know @mpenet -- I'll look into it.

seancorfield04:12:14

Oh, the exoscale library? I have looked at it before so thanks for reminding me!

rdgd05:12:28

thanks @seancorfield 👀

roklenarcic10:12:54

I’ve got a problem: I have a protocol in a library dependency. I have a defrecord that extends the protocol. When I run (satisfies? Protocol record) it says false, which it really shouldn’t. But when I reload the namespace then suddenly it does extend the protocol.

p-himik11:12:37

> when I reload the namespace
The namespace that defines the protocol? Or the one that defines the record?

roklenarcic11:12:48

the one that defines the record

roklenarcic11:12:32

I know that reloading the namespace that defined the protocol breaks existing implementations, but in this case I start at this state

roklenarcic11:12:03

When I start the REPL and call satisfies?, it returns false, even though the protocol implementation is in the defrecord definition

roklenarcic11:12:10

that is before doing any reloads whatsoever

p-himik11:12:43

Something in your workflow is probably implicitly reloading the protocol namespace. I can't really say anything else without an MRE.

roklenarcic11:12:39

why would it be implicitly reloaded? The require directives don’t load already loaded namespaces, right?

p-himik11:12:31

You said you were using some library that defines that protocol. Do you know for sure that that library never reloads anything? Also, did you try to reproduce it using just clj as your REPL, with nothing else?

roklenarcic11:12:41

I can try that now

roklenarcic12:12:20

I swapped a couple of branches and did a clean and now the error is gone…

roklenarcic12:12:29

weirdest thing

lread15:12:27

Pulling at straws here @U66G3SGP5, but a dissoc on a record field returns a map. Any chance you did a dissoc on your record?

Clojure 1.10.1
user=> (defprotocol MyProtocol (my-fn [this a]))
MyProtocol
user=> (defrecord MyRecord [my-field] MyProtocol (my-fn [this a] a))
user.MyRecord
user=> (def r (->MyRecord "field-value"))
#'user/r
user=> (satisfies? MyProtocol r)
true
user=> (my-fn r 42)
42
user=> (def r2 (dissoc r :my-field))
#'user/r2
user=> (satisfies? MyProtocol r2)
false
user=> (my-fn r2 42)
Execution error (IllegalArgumentException) at user/eval145$fn$G (REPL:1).
No implementation of method: :my-fn of protocol: #'user/MyProtocol found for class: clojure.lang.PersistentArrayMap
user=> (type r2)
clojure.lang.PersistentArrayMap
user=> (type r)
user.MyRecord

roklenarcic15:12:05

no, but that’s a good trick, never thought of that…

jumar13:12:08

https://clojure.org/reference/refs says Clojure refs implement snapshot isolation. Is it stated somewhere what kind of concurrency issues this isolation level (specifically in Clojure code) can cause? Has anyone experienced such issues in practice?

Alex Miller (Clojure team)14:12:01

The concurrency issue you are most likely to see with refs is write skew (because read-only refs are not part of the default ref set that can cause a transaction to retry). But that’s easily worked around when it’s an issue by using ensure instead of deref to add the ref to the ref set even on read.

👍 3

lilactown16:12:42

Is there anything written down about the decision to use an explicit “ensure” rather than track all deref? For my curiosity

Alex Miller (Clojure team)16:12:39

adding read-only refs to the ref set means your transactions have a greater chance of failure and retry. but it's not always necessary. so the current setup gives you the option and a way to choose your semantics. if they were always included, you would have no way to weaken that constraint when needed.

Alex Miller (Clojure team)16:12:47

like say you had two refs - one for an account balance and one for transaction fee amount. you have a transaction that updates the balance and assesses the fee (which only needs to be read). if the fee changes infrequently and the exact moment when a change starts being applied is not important to the business, it's fine to just deref the fee ref. but if it's really important that the fee change takes effect immediately, you could ensure that ref
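
A minimal sketch of the balance/fee example, assuming two refs named `balance` and `fee` and hypothetical function names `charge-loose!` and `charge-strict!`. The only difference between the two is whether the fee is read with `deref` or with `ensure`.

```clojure
(def balance (ref 100))
(def fee     (ref 2))

;; Reads the fee with a plain deref: a concurrent change to fee does not
;; force this transaction to retry.
(defn charge-loose! [amount]
  (dosync
    (alter balance - amount @fee)))

;; Reads the fee with ensure: fee is protected from modification by other
;; transactions until this one completes.
(defn charge-strict! [amount]
  (dosync
    (alter balance - amount (ensure fee))))

(charge-loose! 10)   ;; => 88
(charge-strict! 10)  ;; => 76
```

With no concurrent writers both versions behave identically; the difference only shows up when another transaction changes `fee` while a charge is in flight.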

dominicm17:12:05

What kind of maps are struct maps good for reducing the size of? My (limited) experiments so far show arraymaps to be consistently smaller.

noisesmith17:12:10

I thought those were deprecated

dominicm17:12:31

@noisesmith Nope. Just better served by records, https://clojure.org/reference/data_structures#StructMaps

hiredman17:12:37

I think struct-maps were basically a first pass experiment that led to defrecords

hiredman17:12:55

I would not really expect them to be better than anything at anything

dominicm17:12:19

Unfortunately records don't support namespaced keys or any other kind of key for that matter.

andy.fingerhut17:12:57

If you are looking into implementation level details of why certain data structures use the amount of memory that they do, and want something that can draw pictures of JVM objects and references between them for you, you might enjoy tinkering with the cljol library: https://github.com/jafingerhut/cljol

andy.fingerhut17:12:44

I have not used it to investigate struct maps before, and haven't had an occasion to delve into struct map implementation. array maps are good for memory utilization, for sure, but they do have O(n) lookup time, so something to keep in mind if you ever want to make a big one (that and as soon as you take a large array map and create an updated version of it with operations like assoc, etc., you will typically get back a hash map)
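
A small sketch of the "fragile" behaviour mentioned above: a large array map built explicitly with `array-map` stays an array map, but adding a new key to it returns a hash map. The var names `big` and `grown` are only for illustration.

```clojure
;; 20 entries, built explicitly as an array map.
(def big (apply array-map (range 40)))
(instance? clojure.lang.PersistentArrayMap big)    ;; => true

;; Adding a new key to an array map this large switches representation.
(def grown (assoc big :extra 1))
(instance? clojure.lang.PersistentHashMap grown)   ;; => true
```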

hiredman17:12:51

defrecords do support other kinds of keys and namespaced keys, they just don't get turned into object fields

dominicm17:12:04

@hiredman but doesn't that negate optimizations?

dominicm17:12:36

@hiredman In fact, it adds an extra 8 bytes of overhead! :p

hiredman17:12:59

user=> (defrecord Foo [])
user.Foo
user=> (->Foo)
#user.Foo{}
user=> (assoc (->Foo) ::a 1)
#user.Foo{:user/a 1}
user=>

hiredman17:12:23

not supporting as well as you would like is not the same thing as not supporting at all

dominicm17:12:54

@hiredman Sure. But there's no size optimization to be had by using one that way.

dominicm17:12:35

user=> (mm/measure (assoc (->X) ::a 1))
"264 B"
user=> (mm/measure (assoc {} ::a 1))
"232 B"

andy.fingerhut17:12:25

I would be surprised if you found a built-in Clojure data structure for maps that is lower memory than array-map, and also supported qualified keywords as keys. But I haven't done the measurements you are doing -- just giving a guess from the knowledge I do have. array-maps are O(n) lookup time, as I mentioned above, and 'fragile', as mentioned above. Note that keywords are interned, i.e. only stored in memory once, so the mm/measure results you are showing probably contain all of the objects for the keyword once, but if you did a similar measurement for 1000 objects that use the same keyword repeatedly, the keyword memory is only counted once overall (as it should be, since it is only stored once in memory)
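
Keyword interning can be checked directly at the REPL. A short sketch (the var name `ms` is illustrative): the same keyword obtained in different ways is one object, so many maps sharing a key share that key's memory.

```clojure
;; A literal keyword and one built at runtime are the same object.
(identical? :user/a (keyword "user" "a"))   ;; => true

;; 1000 maps that all use :user/a as a key hold the same keyword instance.
(def ms (mapv (fn [i] {:user/a i}) (range 1000)))
(every? #(identical? :user/a (key (first %))) ms)   ;; => true
```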

dominicm20:12:34

Is there a java-y solution for spinning up multiple workers running the same callable over and over or just an infinite length task? I'm imagining a threadpool where you define the size and have a method to interrupt the whole pool.

noisesmith20:12:03

there's probably an Executor that makes this easy, they can own pools

noisesmith20:12:19

interruption on the jvm is tricky, period, unless you use one of a specific set of predefined interruptible methods, or are OK with checking a sentinel value and shutting down manually at execution boundaries

dominicm20:12:15

I thought interruption was okay, you check isInterrupted and catch the exception, either happens, you quit?
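
A sketch of the pattern being asked about, assuming a fixed-size `java.util.concurrent` pool. `start-workers` and `stop-workers` are hypothetical helper names, not an existing API. Each worker checks the interrupt flag between iterations and also exits if a blocking call throws `InterruptedException`; a task that neither blocks interruptibly nor returns will not stop, which is the tricky part noted above.

```clojure
(import '(java.util.concurrent Executors ExecutorService TimeUnit))

(defn start-workers
  "Starts n threads that each call (task) repeatedly until interrupted."
  [n task]
  (let [pool (Executors/newFixedThreadPool n)]
    (dotimes [_ n]
      (.execute pool
                (fn []
                  (try
                    ;; Check the interrupt flag between iterations...
                    (while (not (.isInterrupted (Thread/currentThread)))
                      (task))
                    ;; ...and also exit if a blocking call inside task throws.
                    (catch InterruptedException _ nil)))))
    pool))

(defn stop-workers
  "Interrupts every worker; returns true if all exited within one second."
  [^ExecutorService pool]
  (.shutdownNow pool)
  (.awaitTermination pool 1 TimeUnit/SECONDS))

(def ticks (atom 0))
(def pool (start-workers 2 #(do (swap! ticks inc) (Thread/sleep 10))))
(Thread/sleep 100)
(stop-workers pool)
```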

dominicm20:12:38

I've missed the executor if it exists :(

hiredman20:12:24

I don't think that is good

hiredman20:12:46

Like, in general, you want structured concurrency, tree shaped task graphs, forkjoin, etc

hiredman20:12:13

What you are asking for is extremely unstructured

hiredman20:12:37

It doesn't even have the structure of iteration where previous results feed back in, just the same callable over and over

hiredman20:12:05

It basically demands side effects as the only way to have results

hiredman20:12:42

The goto and labels of concurrency

dominicm20:12:10

@hiredman isn't this a common pattern for core async where you might have multiple go-loops?

hiredman20:12:38

That is in no way equivalent to running the same callable over and over

dominicm20:12:49

How would you model concurrency or workers reading from a queue and then writing state out somewhere, e.g. Database?

dominicm20:12:09

Not an in memory queue that is.

hiredman20:12:51

It depends on the queue implementation, but usually it is better to have a single thread (sometimes for limiting work in progress, sometimes for doing blocking up, just lots of reasons this usually ends up better) pulling items from the queue and then running a handler or whatever per item

hiredman20:12:06

Basically the same pattern as writing a socket server

dominicm20:12:23

Single threaded generator pushing into a thread pool, you mean?

hiredman20:12:34

You have a loop accepting connections and hand connections off to workers

hiredman20:12:36

Yes

hiredman20:12:05

And the workers are not invoking the same callable over and over
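
A sketch of the single-reader shape described above, with an in-memory `LinkedBlockingQueue` standing in for whatever external queue is actually used. `run-dispatcher` and the `:queue/done` sentinel are hypothetical names. One loop pulls items and hands each to a pool, so every worker invocation is a distinct per-item task.

```clojure
(import '(java.util.concurrent Executors LinkedBlockingQueue TimeUnit))

(defn run-dispatcher
  "Takes items off queue on the calling thread and hands each one to a pool
  of n-workers threads. Stops at the :queue/done sentinel, then waits up to
  five seconds for in-flight handlers to finish."
  [^LinkedBlockingQueue queue n-workers handler]
  (let [pool (Executors/newFixedThreadPool n-workers)]
    (loop []
      (let [item (.take queue)]
        (when-not (= :queue/done item)
          ;; One task per item; the pool size bounds work in progress.
          (.execute pool (fn [] (handler item)))
          (recur))))
    (.shutdown pool)
    (.awaitTermination pool 5 TimeUnit/SECONDS)))

(def results (atom []))
(def queue (LinkedBlockingQueue.))
(doseq [item [1 2 3 :queue/done]]
  (.put queue item))

(run-dispatcher queue 2 #(swap! results conj (* 2 %)))
(sort @results)   ;; => (2 4 6)
```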

dominicm20:12:33

Right, yeah. Makes sense. So you only need one go-loop. Although I guess core async doesn't provide much in the way of rate limiting push to consumers like a thread pool would.

dominicm20:12:54

I'm not using core async, so just observing the parallels.

hiredman20:12:59

The workers might be core async loops

hiredman20:12:07

I use core.async a lot

dominicm20:12:32

I haven't used it in a couple of years. But I've seen the pattern of starting multiple go loops to be consumers as a sort of pool of workers which then had complex cancel channels managed across all of them with pub sub and such. Difficult stuff.

dominicm20:12:33

@hiredman if I had plenty of network cards and cores, would you still advise against multiple queue readers?

hiredman20:12:55

It really depends, my point is just none of those cases map to "invoking the same callable over and over"

hiredman20:12:15

Actually the closest thing it maps to is the lowest level behavior of an executor

hiredman20:12:28

E.g. each thread an executor is managing is conceptually running the same code over and over in a loop: pull a runnable from the executors queue and run it

hiredman21:12:52

so like, writing an executor on top of an executor

dominicm21:12:59

Yeah. Exactly. Although that's still a single producer really.

hiredman21:12:13

the "gotos and labels" of concurrency.

hiredman21:12:34

code compiles to gotos and labels, but we write function calls, concurrency happens on threadpool threads running a loop, but you try to write higher level stuff

didibus21:12:50

Could you have a memory mapped backed map?

didibus21:12:03

That would offload the memory to disk

didibus21:12:39

It'd be cool actually if there was one that implemented all the Clojure Map interfaces

hiredman21:12:07

there are disk-backed implementations of java.util.Map, the tricky thing about clojure maps is they are immutable, so you never have a single map on disk, you have a forest, and then you need to manage that

didibus21:12:59

That's neat. Wouldn't you be able to just MMAP the backing trie ?

didibus21:12:26

Or you mean you'd need some sort of GC for it?

hiredman21:12:21

it depends

hiredman21:12:36

you would need to manage it some way, which might look like a gc

hiredman21:12:46

but at this point you are kind of halfway to a database with mvcc like postgresql

hiredman21:12:53

halfway is overly generous, but it presents a lot of the same issues as mvcc

didibus21:12:02

I also wonder, what about a hybrid, where the trie is kept in memory, but the leaves are MMAPed?

hiredman21:12:23

what you want is a block cache

hiredman21:12:47

which of course, the os already has one, but you might want more

hiredman21:12:18

I think datomic caches both "blocks" of raw storage and deserialized objects

didibus21:12:04

I've never particularly had this use case, but I can imagine someone say who'd want to load up like a large amount of data in some map to do some report on it or whatever, and if it doesn't fit, but somehow they need it all or something of that sort. But then again maybe there's just a way to get Java to put its whole HEAP on disk

hiredman21:12:11

just use derby

hiredman21:12:32

I have done this, basically reinventing swap by spilling data into derby when it is too large for processing in memory, it is ok, this was a batch system so the performance was likely terrible, but no one was waiting for the results in realtime

didibus21:12:41

Ya, but there's something nice about a change that wouldn't require any code change. You know, like say you started and it would fit in memory, and suddenly you try to process an even larger file. Instead of like rewriting things to adapt to using derby or some other thing.

hiredman21:12:07

just start off using the in memory derby storage 🙂

didibus21:12:35

Fair fair, still think it would be a cool little project though, even if I don't need it lol

hiredman22:12:33

https://github.com/Factual/durable-queue and https://github.com/Factual/riffle are things vaguely in this area

didibus22:12:32

Cool, I'll give them a look

hiredman21:12:10

datomic is sort of that, and the way it manages the forest of trees is by exposing it as history

hiredman21:12:03

http://www.mapdb.org/

coby22:12:58

Is it possible to declare custom metadata on defn directly?

user=> (defn foo ^{:custom "Custom metadata!"} [] 'foo!)
#'user/foo
user=> (:custom (meta #'foo))
nil

i.e. for this to return "Custom metadata!" instead

hiredman22:12:47

the place to do it is on the name

coby22:12:39

what do you mean by that?

hiredman22:12:15

a type hint is the only thing that can go on the arg vector

hiredman22:12:06

(and the type hint can go on either the name or the arg vector)

bronsa22:12:01

not really, you can put any metadata on the arg vector, but it's not reflected on the var, it's reflected on the arglists key

user=> (defn foo ^:bar [])
#'user/foo
user=> (-> #'foo meta :arglists first meta)
{:bar true}

bronsa22:12:47

but as @hiredman says, if you want metadata on the var, hint the var name

hiredman22:12:48

^ means put the following metadata on the next thing

hiredman22:12:18

so ^{:custom "whatever"} [] means put that metadata map on that vector

hiredman22:12:35

in this case the vector you are attaching the metadata to is the arglist vector for the function

coby22:12:41

oh that makes sense

lilactown22:12:50

(defn ^:foo bar [] ,,,)

hiredman22:12:10

right before that you have the name you are defing, any metadata you attach to that symbol will be copied to the defined var
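
Putting the pieces of this thread together, a short sketch of where metadata ends up depending on where it is written. The names `foo`, `bar` and `baz` are illustrative; the attr-map form used for `bar` is the other placement that `defn` accepts.

```clojure
;; Metadata on the name symbol is copied to the var.
(defn ^{:custom "on the name"} foo [] 'foo!)
(:custom (meta #'foo))                          ;; => "on the name"

;; defn also accepts an attr-map after the name, with the same effect.
(defn bar {:custom "attr-map"} [] 'bar!)
(:custom (meta #'bar))                          ;; => "attr-map"

;; Metadata on the arg vector stays on the vector inside :arglists.
(defn baz ^{:custom "on the arg vector"} [] 'baz!)
(:custom (meta #'baz))                          ;; => nil
(-> #'baz meta :arglists first meta :custom)    ;; => "on the arg vector"
```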

coby22:12:07

Thanks, I thought I'd tried that but probably just did it wrong 🙂

coby22:12:07

working now!
