#clojure
2021-01-09
andy.fingerhut00:01:22

One minor nit I saw recently was that I think defstruct allows qualified keywords as keys, but defrecord does not.

seancorfield00:01:45

defstruct allows all sorts of things as keys (in addition to qualified keywords):

dev=> (defstruct Bar :a 'b "d" [:e] :f/g ::h)
#'dev/Bar
dev=> (struct Bar 1 2 3 4 5 6)
{:a 1, b 2, "d" 3, [:e] 4, :f/g 5, :dev/h 6}
dev=> (get *1 [:e])
4
dev=> (::h *2)
6
dev=> 

🤯 6
seancorfield00:01:36

dev=> (defstruct Obj (Object.))
#'dev/Obj
dev=> (= (struct Obj 1) (struct Obj 1))
true
🙂

andy.fingerhut03:01:33

So looks like defstruct is about as general as regular Clojure maps there?

didibus05:01:16

What's a defstruct under the hood? Something I've thought about recently is that it'd be nice to have an array-backed struct, where keys are static. Basically they'd be a bit like class fields, but much more lightweight than creating a class.

didibus05:01:37

Like (defstatic Point3D :x :y :z). And then you'd do (get-field point :x) or something, but get-field would be a macro which will rewrite this to an (aget point 0)

didibus05:01:50

Something of that sort

didibus05:01:55

Oh, maybe structs are kind of what I'm thinking of, actually way better than what I was thinking. You need to use accessor, though, to get the kind of direct access I'm talking about, it seems.

andy.fingerhut05:01:24

The Java implementation isn't too hard to follow. It is a persistent map of the "base" keys to small integer indexes in the range [0, n-1], stored once for each named defstruct, and then for each instance of such a persistent struct there is an array indexed by [0, n-1] of associated values. There is an optional persistent map to store other keys that might be assoc'd on later that are outside of those named when you create a named defstruct

andy.fingerhut05:01:38

Yeah, looks like you don't need to use the accessor function to retrieve elements from a struct, but if you do not, then performance is pretty much like a persistent array map or persistent hash map, depending on the number of keys.

didibus05:01:48

Ya, so it's quite close to what I was saying. The struct fields are stored in an Object array. There's a map from the field name to the index where its value is. And there is another map for dynamically "added" fields. Accessor will create a function that has already looked up the field in the map and closed over the index, so when you use the accessor afterwards, it's just getting you the value directly from the array by index. At least that's my 30 second glance.
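
The layout described above can be modelled in a few lines of Clojure. This is only a rough sketch under that description, not the actual PersistentStructMap code: keyslots, make-point, struct-get, and make-accessor are hypothetical names chosen for illustration.

```clojure
;; Hypothetical model of the struct layout discussed above; not the real implementation.

;; One shared map from base key to array index, stored once per struct definition.
(def keyslots {:x 0 :y 1 :z 2})

;; Each instance holds an Object array of values plus a map for keys assoc'd later.
(defn make-point [x y z]
  {:vals (object-array [x y z]) :ext {}})

;; Plain lookup: a map lookup to find the index, then an array read.
(defn struct-get [s k]
  (if-let [i (keyslots k)]
    (aget ^objects (:vals s) (int i))
    (get (:ext s) k)))

;; Accessor: resolve the index once and close over it, so later calls skip the map lookup.
;; (This sketch assumes k is one of the base keys.)
(defn make-accessor [k]
  (let [i (int (keyslots k))]
    (fn [s] (aget ^objects (:vals s) i))))

(def p (make-point 1 2 3))
(struct-get p :y)        ; index found via keyslots, then read from the array
((make-accessor :z) p)   ; index already known, array read only
```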

andy.fingerhut05:01:08

That all agrees with my 4-minute glance 🙂

didibus05:01:29

I'm actually not sure why regular gets of its fields seem to be faster than for regular maps hum...

didibus05:01:04

Like getting the index from a small map of field -> index and then looking it up in an array is faster than just a normal map lookup?

andy.fingerhut05:01:34

What test are you running that shows regular gets of a defstruct's keys are faster than for regular maps?

didibus05:01:48

(def ks [:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p])
(def Foo (apply create-struct Foo ks))
(def s (struct Foo 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6))
(time (let [s s] (dotimes [i 1000000000] (:a s))))
"Elapsed time: 17160.9286 msecs"
(def z (zipmap ks (range)))
(time (let [z z] (dotimes [i 1000000000] (:a z))))
"Elapsed time: 24000.9011 msecs"

andy.fingerhut05:01:00

I would guess that is because the regular map there is a hash map, but the defstruct is like an array-map for keys in the 'base' set, and your test is always accessing the first one in the array.

andy.fingerhut05:01:24

Try always accessing key :p in the struct map, and I would guess you will notice a difference in performance.

didibus05:01:47

I was thinking array-map vs hash-map probably, but ya, forgot that for an array map, if I'm using the first key it'd be way faster. Let me try

andy.fingerhut05:01:48

wait, no cancel that explanation.

andy.fingerhut05:01:06

The defstruct should have a hash map for the keys, too, I think.

didibus05:01:29

wouldn't 12 keys though result in a hash-map?

didibus05:01:26

Still faster with struct

(time (let [s s] (dotimes [i 1000000000] (:p s))))
"Elapsed time: 22069.8155 msecs"
(time (let [z z] (dotimes [i 1000000000] (:p z))))
"Elapsed time: 30195.8514 msecs"

andy.fingerhut05:01:51

Trying to dig now to see if it actually uses a hash-map for that many keys, or an array-map. Not there yet.

didibus05:01:21

I think that's the line: return new Def(keys, RT.map(v)); so it seems to delegate to the RT map constructor, which I think would create an array-map if length under 8 and a hash-map if greater

didibus05:01:32

Oh, HASHTABLE_THRESHOLD = 16; I thought it was 8

andy.fingerhut05:01:08

but the v passed to RT.map has 2 elements per key, one for the key, one for the associated value

didibus06:01:29

Ah yes, that's why it is actually 8

didibus06:01:42

The threshold is on the length of the list of key/value pairs. So set at 16, it means a map of 8 entries

didibus06:01:07

So pretty sure in this example, both are using HashMap
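
The 8-entry cutoff can be checked at the REPL. A small sketch, assuming the same ks vector as in the benchmark above; small and large are names introduced here for illustration.

```clojure
(def ks [:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p])

;; 8 entries = 16 array slots, which is still within HASHTABLE_THRESHOLD.
(def small (zipmap (take 8 ks) (range)))

;; A 9th entry pushes the map past the threshold.
(def large (zipmap (take 9 ks) (range)))

(type small) ; clojure.lang.PersistentArrayMap
(type large) ; clojure.lang.PersistentHashMap
```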

andy.fingerhut06:01:17

Yep, PersistentHashMap class for the value of the keyslots field for the defstruct keys you are using, so I don't currently have a guess what the performance difference might be caused by.

andy.fingerhut06:01:58

(verified via reflective access to the value of the field keyslots of an actual JVM object created by defstruct, rather than reading the code)

👍 3
andy.fingerhut06:01:38

There is a pretty significant difference in run times from one key to another in your results, too, so I'm not getting very excited over the performance differences yet.

andy.fingerhut06:01:50

FYI the hash maps are different between those two cases, because your expression (apply create-struct Foo ks) is adding the var Foo itself as one of the fields of the struct. You probably want to change that to (apply create-struct ks)
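
With that change applied, the setup from the earlier benchmark would look roughly like this (timings omitted, since they depend on the machine and JVM):

```clojure
(def ks [:a :b :c :d :e :f :g :h :i :j :k :l :m :n :o :p])

;; create-struct takes only the keys; there is no name argument.
(def Foo (apply create-struct ks))

(def s (struct Foo 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6))
(def z (zipmap ks (range)))

;; Both maps now have exactly the same 16 keys.
(count s)                          ; 16
(= (set (keys s)) (set (keys z)))  ; true
```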

didibus06:01:55

Doesn't seem to make a difference :man-shrugging:

didibus06:01:38

Maybe some weird JVM optimization

didibus06:01:13

Anyways, the difference is pretty small (if you don't run it like a million times)

didibus06:01:41

Using accessor is really where you see a big speedup

seancorfield06:01:25

Presumably because the accessor does the hash map lookup for you, and then using it is just a simple array access?

🎯 3
didibus07:01:32

Ya and since each instance of the same struct will have the fields in the same order in the array, you can use the same accessor for all of them. So it's not just like a cache over a key lookup.
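
A minimal sketch of that point, using clojure.core's defstruct, struct, struct-map, and accessor. The point3d struct and the get-z name are made up for the example.

```clojure
(defstruct point3d :x :y :z)

;; The accessor resolves :z to its slot once, for the struct definition as a whole.
(def get-z (accessor point3d :z))

(def p1 (struct point3d 1 2 3))
(def p2 (struct-map point3d :z 30 :x 10)) ; :y is left as nil

;; The same accessor works on every instance of point3d.
(get-z p1) ; 3
(get-z p2) ; 30

;; Ordinary keyword lookup still works; it just goes through the key -> index map.
(:z p1)    ; 3
```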

nha12:01:06

I shared a form a while ago to probe the interest of Clojurians in message queue technologies. As promised, here are the raw results: https://account606590.typeform.com/report/sirkUS13/xTQgUFApGNu34PSV TL;DR: people are familiar with Kafka and prefer hosted SaaS products that are easy to get started with. I think I can work in that direction 🙂

👍 3
didibus20:01:19

I welcome more Clojure based SaaS

❤️ 3
clj 3
Kevin16:01:47

Is there an easy way to prettyprint maps without commas?

jjttjj16:01:21

You might not want to use another library, but https://github.com/kkinnear/zprint lets you do that, along with many other formatting options

Kevin16:01:12

I'll try this, thanks

Kevin16:01:05

Works great, thanks!

Kevin16:01:24

Just have to add :comma? false and it works

emccue17:01:03

@jjttjj you can always use regular Java cache libraries like Caffeine

jjttjj17:01:18

I might be missing something but that still has a map-like interface, ie you still need a key/value for everything, right? I'm looking for a vector-like interface where I just conj stuff to the end

jjttjj17:01:54

I could be missing something in their docs, I might just not know the right terminology to look for

benny17:01:48

is there a “standard” on when to use a map as one parameter vs a sequence of parameters? for example…

(defn foo [{:keys [foo-a foo-b]}] ...)
(foo {:foo-a "bar" ...
; vs 
(defn foo [foo-a foo-b] ...)
(foo "bar" ...

andy.fingerhut17:01:21

Not a standard, but certainly if you expect the list of values to grow in the future, a map is more easily expandable than a list of parameters.

jjttjj17:01:31

Yeah, I often agonize over these choices. I think for 2 arguments, with no further information, I would default to positional args, but of course there are a lot of factors. With more than 3 arguments it gets harder.

👍 3
andy.fingerhut17:01:42

A map lets you name each one, rather than remember their position in a list of args, and remembering position in a list of args over about 4 or so, or even manually checking such calls while looking at the definition of the function, can be pretty taxing.

👍 3
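
To make the trade-off concrete, here is a small sketch of the same hypothetical function written both ways. connect-positional, connect-map, and the host/port/timeout-ms/retries parameters are invented for illustration.

```clojure
;; Positional: the caller has to remember the order of four arguments.
(defn connect-positional [host port timeout-ms retries]
  {:host host :port port :timeout-ms timeout-ms :retries retries})

;; Map: every value is named at the call site, and new keys can be added later
;; without changing existing calls.
(defn connect-map [{:keys [host port timeout-ms retries]}]
  {:host host :port port :timeout-ms timeout-ms :retries retries})

(connect-positional "localhost" 5432 3000 2)

(connect-map {:host "localhost"
              :port 5432
              :timeout-ms 3000
              :retries 2})
```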
benny18:01:45

sounds like i’m not alone 😉 thanks @jjttjj @andy.fingerhut

andy.fingerhut18:01:51

One thing that some people worry about on the disadvantages of using a map there, is that if you later want to change the function so that a new key/value pair is required in order for the function call to work, there is no automatic editor/IDE support that I know of that will automate the process of finding all calls, other than the obvious one of "search for the function name throughout your code to find all calls". That approach works for the separate arguments approach, too, of course, but there are also lint tools like clj-kondo that could help you find wrong-arity calls.

didibus20:01:55

I've proposed a solution to this on http://ask.clojure.org. If Spec were to conform function specs at compile time the same way it does for macros, then you could get a compile-time error for this when using inline named args (fn [& {:keys []}])

benny18:01:04

interesting, good point! everything a map would make refactoring challenging for sure

jjttjj18:01:43

One thing that kind of trips me up is sometimes I know I'm going to want a map for args, because I have a lot of args and/or want to have the option to add more later easily. But then I'm not quite sure how to break things up. Do I just use one big map for all args? Do I separate a leading map for "component" type args, use a separate, trailing "options" map, possibly with some positional args thrown in between (things that are definitely intrinsic to the nature of the function and probably won't change)? I've wondered if the Clojure philosophy of "maps should be open and use qualified keys and combined liberally" from the spec-related talks is an argument for just one big map of arguments for any functions that are going to use any map args as a convenience measure

jjttjj18:01:48

I guess, more specifically, components and options are two totally separate use cases for map arguments, do you mix them together or keep them separate?

robertfw19:01:23

I've used the "one map for components, one map for options" approach in a few places and quite liked it. The app structure was such that the components were being injected via a map already, and could easily be passed into dependent sub functions following the same pattern, and when it came time for testing, I could use generators to create my options and then do my own injection of the component map with various component mocks.

borkdude18:01:51

Usually I make required args fixed and options go last in a map

borkdude18:01:31

Not a hard rule
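
One way that convention might look in practice is sketched below. fetch-page and its option keys are hypothetical, and the default values are arbitrary.

```clojure
;; Required argument is positional; optional settings go in a trailing map.
(defn fetch-page
  ([url] (fetch-page url {}))
  ([url {:keys [timeout-ms retries]
         :or   {timeout-ms 5000 retries 0}}]
   ;; A real function would perform the request; this sketch just echoes its inputs.
   {:url url :timeout-ms timeout-ms :retries retries}))

(fetch-page "https://example.com")
(fetch-page "https://example.com" {:retries 3})
```

The single-arity version keeps simple calls short, while new options can be added to the map later without touching existing callers.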

emccue18:01:04

Any philosophy talks go out the window when talking about something low level like caching

emccue18:01:22

@jjttjj does this look good?

jjttjj19:01:08

I think that's just a ring/sliding buffer with an integer capacity, whereas I need just a vector where inserts have a TTL. I'll have to dig deeper into the Guava stuff though; it might be in there somewhere

benny19:01:32

how about the other way around now…when destructuring, when should the destructuring happen in the signature vs a let:

(defn foo [{bar         :bar
            {{a :a} :b} :c}])
vs
(defn foo [mamap]
  (let [{bar         :bar
         {{a :a} :b} :c} mamap]))

dpsutton19:01:55

In the signature allows tooling to display the shape more easily

💡 3
dpsutton19:01:21

And removes a layer of nesting for the let

👍 3
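
The tooling point can be seen in the :arglists metadata that defn records, which is what doc and many editors display. A small sketch; foo-sig, foo-let, and m are names made up here, and both functions return [bar a] so they can be compared.

```clojure
;; Destructuring in the signature.
(defn foo-sig [{bar         :bar
                {{a :a} :b} :c}]
  [bar a])

;; Destructuring in a let.
(defn foo-let [mamap]
  (let [{bar         :bar
         {{a :a} :b} :c} mamap]
    [bar a]))

;; The signature version exposes the expected shape; the let version only shows a name.
(:arglists (meta #'foo-sig)) ; the full destructuring form
(:arglists (meta #'foo-let)) ; ([mamap])

;; Both behave the same at runtime.
(def m {:bar 1 :c {:b {:a 2}}})
(= (foo-sig m) (foo-let m))
```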
crinklywrappr22:01:33

Anyone have an example for doing an embedded repl? I keep trying variations of this but it never works. (clojure.core.server/start-server {:name "repl" :port 5563 :accept clojure.core.server/repl})

crinklywrappr22:01:08

I see this error on the server during connect: java.lang.ClassCastException: clojure.core.server$repl cannot be cast to clojure.lang.Named

noisesmith22:01:35

it wants a symbol, not the function itself

noisesmith22:01:39

it resolves the symbol

crinklywrappr22:01:27

yup, that works

noisesmith22:01:18

user=> (doc clojure.core.server/start-server)
-------------------------
clojure.core.server/start-server
([opts])
  Start a socket server given the specified opts:
    :address Host or address, string, defaults to loopback address
    :port Port, integer, required
    :name Name, required
    :accept Namespaced symbol of the accept function to invoke, required
    :args Vector of args to pass to accept function
    :bind-err Bind *err* to socket out stream?, defaults to true
    :server-daemon Is server thread a daemon?, defaults to true
    :client-daemon Are client threads daemons?, defaults to true
   Returns server socket.
nil
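
Putting that together, the working form of the earlier call quotes the accept function as a namespaced symbol. A minimal sketch, using the same name and port as the original attempt; srv is a name introduced here.

```clojure
(require '[clojure.core.server :as server])

;; :accept is a namespaced symbol, which the server resolves itself.
(def srv
  (server/start-server {:name   "repl"
                        :port   5563
                        :accept 'clojure.core.server/repl}))

;; While the server is running, a client can connect to localhost:5563
;; with any plain socket client.

;; Stop the server by name when done.
(server/stop-server "repl")
```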

crinklywrappr22:01:51

granted, i should have looked closer at the docs. I was thrown off because the code example just provides a function reference: https://archive.clojure.org/design-wiki/display/design/Socket%2BServer%2BREPL.html

Alex Miller (Clojure team)00:01:18

This is a very old archived design page. Official docs at https://clojure.org/reference/repl_and_main

seancorfield23:01:11

Most people start a socket REPL using the JVM property so you don't need any code inside your process @doubleagent

crinklywrappr17:01:17

I remember adding that to :jvm-opts in my Leiningen project and it produced an error. Maybe I did it wrong, though. :man-shrugging:

seancorfield23:01:35

That's what that page is showing, BTW: -Dclojure.server.NAME="{:address \"127.0.0.1\" :port 5555 :accept clojure.core.server/repl}" That's an argument to java itself when starting up the process. You can pass it to the Clojure CLI via the -J option.

👍 3