#clojure
2020-07-16
zeitue04:07:44

How would I go about turning a vector of maps [{:a 3 :c 6} {:a 10 :c 8}] into something like this: {:a [3 10] :c [6 8]}? I am trying to get the average of each key and I think this is the best way to go about doing it.

salam04:07:42

(apply merge-with vector coll)
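For reference, on the two sample maps from the question that one-liner gives exactly the shape asked for:

(apply merge-with vector [{:a 3 :c 6} {:a 10 :c 8}])
;; => {:a [3 10], :c [6 8]}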

noisesmith04:07:12

this fails if any key is only present in one item

noisesmith04:07:59

(it doesn't fail per se, but it produces data where a value could be either a vector of items or a single item)
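For example, with a key present in only one map, and with more than two maps:

(apply merge-with vector [{:a 3 :c 6} {:a 10}])
;; => {:a [3 10], :c 6}    ; :c stays a bare value

(apply merge-with vector [{:a 1} {:a 2} {:a 3}])
;; => {:a [[1 2] 3]}       ; vectors start nesting with 3+ maps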

salam04:07:52

yeah, very interesting behavior of merge-with

noisesmith04:07:47

a rule of thumb I try to follow is never use merge-with to change data representation (because it only hits conflicted keys) - which is a shame because your one liner is a lot nicer than my transduce

noisesmith04:07:13

maybe there's a similar one-liner that's escaping me at this hour, which does the data transform correctly

zeitue04:07:34

I actually spent a lot of time trying to use merge and group-by to do this

noisesmith04:07:19

it looks so close to group-by, but my experience is that it's easier to start from a reduce with update/conj than it is to start with group-by then fix it

noisesmith04:07:37

or at least the latter leads to clearer code

noisesmith04:07:06

then transduce lets me apply the cat transform which simplifies the reduce

๐Ÿ‘ 3
zeitue04:07:41

yeah I agree your code looks much cleaner and smaller than the mess I wrote

noisesmith04:07:16

alternative, messier version that uses transients and exploits the transducing function completion arity

(transduce cat
           (fn
             ([m]
              (into {} (map (fn [[k v]] [k (persistent! v)]))
                    m))
             ([m [k v]]
              (update m k (fnil conj! (transient [])) v)))
           {}
           [{:a 3 :c 6} {:a 10 :c 8}])

noisesmith04:07:42

I wouldn't use this version unless each key contained a relatively large number of items

๐Ÿ‘ 3
noisesmith04:07:03

and the map transducer should probably be lifted into a def with a sensible name

noisesmith04:07:25

persist-keys or whatever

noisesmith04:07:38

but the non-transient version is probably better for the simplicity

zeitue04:07:04

OK, I'll def that function out

seancorfield04:07:01

How about this:

(let [base (zipmap (into #{} (mapcat keys coll)) (repeat []))]
  (apply merge-with conj base coll))

seancorfield04:07:46

I hadn't thought about the downside to merge-with until @noisesmith mentioned it -- and that got me thinking about what would create a basis for using it without that downside.

noisesmith04:07:55

yeah, I wonder what it should be called - lift-with if we were an ML language :D

noisesmith05:07:46

@U04V70XH6 so in the case of two maps, wouldn't that need to be updated or called on both? because currently that only augments one map

noisesmith05:07:37

oh, never mind, coll is a sequence of hash-maps here, now I get it

seancorfield05:07:44

It builds a basis map with all the keys and a standard "zero" value for each of them, then does merge-with on all the original data.
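For the sample coll, the basis map and the final merge look roughly like this (key order may vary):

(zipmap (into #{} (mapcat keys [{:a 3 :c 6} {:a 10 :c 8}])) (repeat []))
;; => {:a [], :c []}

(apply merge-with conj {:a [] :c []} [{:a 3 :c 6} {:a 10 :c 8}])
;; => {:a [3 10], :c [6 8]}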

noisesmith05:07:15

right, took me a moment, I shouldn't be trying to code at this hour (I'm a morning programmer...)

noisesmith04:07:34

I think this is a good use case for transduce

(transduce cat
           (completing (fn [m [k v]]
                         (update m k (fnil conj []) v)))
           {}
           [{:a 3 :c 6} {:a 10 :c 8}])

{:a [3 10], :c [6 8]}
• edited for formatting

๐Ÿ‘ 3
โค๏ธ 3
zeitue04:07:37

that's pretty cool, did not know about transduce

pithyless10:07:12

@zeitue I find myself reaching often for net.cgrand/xforms to work with transducers:

;; (require '[net.cgrand.xforms :as x])
(into {}
      (comp cat
            (x/by-key first (comp (map second)
                                  (x/into []))))
      [{:a 3 :c 6} {:a 10 :c 8}])
;; => {:a [3 10], :c [6 8]}

What's really nice is that you can do the averages, etc. directly in the transducer, e.g.:

(into {}
      (comp cat
            (x/by-key first (comp (map second)
                                  (x/transjuxt
                                   {:all (x/into [])
                                    :sum (x/reduce +)
                                    :avg x/avg}))))
      [{:a 3 :c 6} {:a 10 :c 8}])
;; => {:a {:all [3 10], :sum 13, :avg 6.5},
;;     :c {:all [6 8], :sum 14, :avg 7.0}}

😲 3
zeitue12:07:37

that's pretty great, I can see this as a huge time saver, thanks

datran15:07:51

I want to implement a protocol that works on all maps - do I have to individually extend-type on each type? clojure.lang.PersistentHashMap, clojure.lang.PersistentArrayMap, etc.

datran15:07:59

Or is there a way to get them all in one swoop?

noisesmith15:07:28

I'd find the set of interfaces that suffices to have all the operations you need - often java.util.Map and clojure.lang.IPersistentMap are enough. Don't extend to types, extend over the Interfaces that all the types implement.

datran15:07:02

How do you find what interfaces are implemented by each type? Is there a way to get that from the repl?

noisesmith15:07:15

user=> (supers (class {}))
#{java.lang.Runnable java.lang.Iterable clojure.lang.IFn clojure.lang.APersistentMap clojure.lang.IPersistentCollection java.util.Map java.io.Serializable clojure.lang.IMeta clojure.lang.IMapIterable clojure.lang.Seqable clojure.lang.IPersistentMap clojure.lang.Counted clojure.lang.Associative clojure.lang.IKVReduce java.util.concurrent.Callable clojure.lang.ILookup java.lang.Object clojure.lang.IEditableCollection clojure.lang.IHashEq clojure.lang.AFn clojure.lang.MapEquivalence clojure.lang.IObj}
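If you only want the interfaces out of that set (supers also includes concrete and abstract classes such as clojure.lang.APersistentMap), you can filter:

(filter #(.isInterface ^Class %) (supers (class {})))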

noisesmith15:07:08

but it's easier to start with "what set of operations do I need to implement my feature?" then find the minimal set of interfaces that provide that set of operations when combined

noisesmith15:07:52

the reason the interfaces are so granular is that you can implement the precise set that you need for your feature, and then know that all the data types that are relevant are now usable

noisesmith15:07:19

if it was just one big interface with all the methods it would actually be harder

datran15:07:28

Is there a way to see which methods an interface/protocol declares?

noisesmith15:07:59

easiest is to find the doc (or worst case the source code - all an interface is, as a java file, is a set of method signatures without implementations)

noisesmith15:07:17

the javadoc function in the repl will find a doc and open it in your web-browser

noisesmith15:07:26

just pass it the thing you want method info about

noisesmith15:07:36

eg

user=> (javadoc java.util.Map)
true
goes and opens https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/util/Map.html for me in my browser

noisesmith15:07:49

I guess you could even map javadoc on all the supers if you want to open 20 tabs

datran15:07:29

ok, this is making more sense now, thanks

noisesmith15:07:55

so you can safely extend your new protocol to all of Map if you limit yourself to the methods that interface defines (tricky with Clojure since our maps don't support the put method, which mutates)

noisesmith15:07:29

but maybe by extending both IPersistentMap and Map, you get all the data types you needed
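A minimal sketch of that, assuming a made-up protocol with one function; note that a Clojure map matches both extensions, and when several extended interfaces match, which implementation is used is unspecified, so keep them consistent:

(defprotocol MapLike
  (entry-count [this]))

(extend-protocol MapLike
  clojure.lang.IPersistentMap
  (entry-count [m] (count m))
  java.util.Map
  (entry-count [m] (.size m)))

(entry-count {:a 1 :b 2})                  ;; => 2
(entry-count (java.util.HashMap. {"a" 1})) ;; => 1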

MatthewLisp15:07:21

Hello Clojurists

👋 3
MatthewLisp15:07:54

Is there an easy and safe way to perform a deep contains? for nested keys?

MatthewLisp15:07:43

so i don't need to write get-in's on lots of predicates

noisesmith15:07:49

(ins)user=> (some (fn [x] (and (map? x) (contains? x :k))) (tree-seq coll? seq {:a 0 :b {:k 1}}))
true
(cmd)user=> (some (fn [x] (and (map? x) (contains? x :k))) (tree-seq coll? seq {:a 0 :b {:l 1}}))
nil

noisesmith15:07:03

tree-seq is a great way to abstract deep searches, and do them lazily

โœ”๏ธ 3
noisesmith15:07:45

I guess contains? is flexible enough we don't need the usual type check here...

noisesmith15:07:21

user=> (some #(contains? % :k) (tree-seq coll? seq {:a 0 :b {:k 1}}))
Execution error (IllegalArgumentException) at user/eval269$fn (REPL:1).
contains? not supported on type: clojure.lang.Keyword
nope, need the type check

noisesmith15:07:40

if you need the value too, replace contains? with find to get the k/v pair instead of a boolean
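e.g.

(some (fn [x] (and (map? x) (find x :k))) (tree-seq coll? seq {:a 0 :b {:k 1}}))
;; => [:k 1]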

MatthewLisp15:07:23

much better than having to write a lot of get-in for my predicates

noisesmith15:07:51

really IMHO there should be a def like (def ->tree (partial tree-seq coll? seq)) in the language, but it's easy to make ourselves

MatthewLisp15:07:24

unfortunately i haven't set time aside to sit down and understand tree-seq

MatthewLisp15:07:44

but i'll do it because i see i'm in need

noisesmith15:07:44

it's a lazy-seq generator over a tree walk

noisesmith15:07:38

where the first args tell you how to walk - coll? is the most common way in real code to know you have a node, and seq is the most common way to get the children, in the clojure code I've seen
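e.g. walking the small map from above visits every node, depth-first:

(tree-seq coll? seq {:a 0 :b {:k 1}})
;; => ({:a 0, :b {:k 1}} [:a 0] :a 0 [:b {:k 1}] :b {:k 1} [:k 1] :k 1)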

MatthewLisp15:07:21

i don't really understand why we don't have in clojure.core functions like deep dissoc, guess it's because in reality we can have duplicate key names across different paths

noisesmith15:07:11

there's a jira ticket requesting dissoc-in where the discussion explains why it isn't implemented

MatthewLisp15:07:38

oh, i'll look into it

noisesmith15:07:56

that said, there's helper libraries that do have it

noisesmith15:07:13

but it sounds like what you really need is a recursive scrub (remove key on all levels from all maps)?

noisesmith15:07:23

I'd use clojure.walk/postwalk for that

noisesmith15:07:01

something like (walk/postwalk (fn [x] (if (map? x) (dissoc x :password :sig :secret) x)) coll)

noisesmith15:07:52

postwalk is to post-order tree rewriting what tree-seq is to simple visiting of a tree
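A runnable version of that scrub on a made-up nested map:

(require '[clojure.walk :as walk])

(walk/postwalk (fn [x] (if (map? x) (dissoc x :password) x))
               {:user {:name "jo" :password "hunter2"}})
;; => {:user {:name "jo"}}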

MatthewLisp15:07:26

nah, it's just a set of predicates to determine the state of the application (for example, if i don't have the key :data/user it means that the user is not registered); some information is deeply nested and i'm lazy enough to not want to write get-in's over and over for different predicates

noisesmith15:07:58

ahh, you could still use tree-seq for that then (or fire your architect and get a proper data model / API :D)

MatthewLisp15:07:19

well, here's the sad news

MatthewLisp15:07:24

i'm the architect

😄 3
noisesmith15:07:46

it's never too late to do some designing in your hammock though

๐Ÿ‘ 3
MatthewLisp15:07:47

and that's the best i could do with the little exp i have

MatthewLisp15:07:15

i saw that talk from Rich about The Value of Values

MatthewLisp15:07:31

and went crazy defining a huge state map on every system request

MatthewLisp15:07:09

on that state map, the lack of information, or the presence of information, means some particular state (like lacking :data/user means the user is not registered)

MatthewLisp15:07:25

that's much better than before, because i didn't even have a state map

MatthewLisp15:07:03

honestly that's not much different from checking nil values on keys anyway

MatthewLisp16:07:43

@noisesmith i know that's somewhat vague, but do you have recommendations of any resources for learning how to architect better data models and how to correctly use them?

MatthewLisp16:07:15

or recommendations from anyone else are welcome too

noisesmith16:07:51

there's a lot of art to it, the kind of design you want for frames of DSP data is very different from the design you want for ledger entries in a banking system or relationships between users of a social service

noisesmith16:07:51

there's good books out there but they are usually domain specific - one resource would be good for SQL tables, another good for mainstream java object hierarchies, etc.

noisesmith16:07:11

for clojure, generally, I think Zach Tellman's Elements of Clojure is great

noisesmith16:07:42

also consider having a flattened map with namespaced keys, and separate clojure.spec definitions for various aspects (eg. a spec that must match for a logged in user, a spec that must match for a user accessing the admin, a spec that must match for a request for some resource...) that would all exist on the same level of one map

๐Ÿ‘ 9
MatthewLisp16:07:18

I'm currently using Clara-rules in place of the spec definitions for each aspect

MatthewLisp16:07:41

guess if i spec'ed these states i could avoid using Clara in this project

JAtkins17:07:08

Before I go and duplicate some work already done, is there a project that tries to describe a data structure? e.g. for development I often have to wrangle some weird nested data (usually very large) into some more useful form. It would be nice to convert something like this

{:a ["a" "b" "c"]
 :b {:c '(\a \b \c)
     :k {:R :l}}
 :n [{:a :b}
     {:c :d}
     {:e :f}]}
to this
{:a [string?]
 :b {:c '(char?)
     :k {:R keyword?}}
 :n [{keyword? keyword?}]}
so that I can see easier what I'm working with

flowthing17:07:38

I suppose this is not exactly what you're looking for, but it might help nonetheless: https://github.com/stathissideris/spec-provider

flowthing17:07:13

Feeding the result provided by spec-provider into https://github.com/jebberjeb/specviz might also be helpful.

JAtkins17:07:43

Both those projects are super useful, thanks for pointing me to them 🙂

pithyless17:07:26

Similar to spec-provider, malli also has its own malli.provider/provide:

(def sample-data
  [{:a ["a" "b" "c"]
    :b {:c '(\a \b \c)
        :k {:R :l}}
    :n [{:a :b}
        {:c :d}
        {:e :f}]}])

(malli/provide sample-data)
;; => [:map
;;     [:a [:vector string?]]
;;     [:b [:map [:c [:list char?]] [:k [:map [:R keyword?]]]]]
;;     [:n
;;      [:vector
;;       [:map
;;        [:a {:optional true} keyword?]
;;        [:c {:optional true} keyword?]
;;        [:e {:optional true} keyword?]]]]]
(Note the use of a vector of sample-data, because the api is intended to return better specs if you give it more samples.)

๐Ÿ‘ 6
flowthing18:07:51

Good call - I think Malli can also visualize its specs? Haven't tried it, though.

pithyless18:07:46

Yeah, the export to DOT notation is a rather new feature; have not played with it yet. But I find malli has been my go-to data speccing library recently (and only use clojure.spec for things like macro syntax or interop with existing library specs).

noisesmith17:07:59

very literally that looks like plumatic/schema but most people have moved from that to clojure.spec.alpha

noisesmith17:07:16

in terms of auto-generating that transform - I'm sure someone has done it

lilactown17:07:23

meander sounds similar to what you're saying

lilactown17:07:55

in terms of describing the transformation

JAtkins17:07:56

That's the goal actually. I'm using meander and being able to see the large scale structure would be super useful

lilactown17:07:21

ah I see sorry

JAtkins17:07:42

No problem, should have been more clear

lilactown17:07:10

schema and spec are typically what's used for authoring something like that

lilactown17:07:22

it would be neat to try the other way - generate a spec or schema based on examples

JAtkins17:07:55

I'm kind of thinking of what it would be like to have a web ui for this. Since multiple different descriptions can apply, maybe have a cycle button for each node to see the different applicable descriptions... I may still need to try this out.

lilactown17:07:30

I don't know off the top of my head anything like that yet

flowthing17:07:12

I posted it in the thread above, but there's https://github.com/stathissideris/spec-provider.

โœ”๏ธ 3
phronmophobic18:07:57

some caveats of the spec-provider library are: 1. works best with namespaced keywords 2. if you're not using namespaced keywords, it works best when reused keys refer to the same data type

Drew Verlee18:07:58

is there a way for edn/read to be given more information about where it failed to read? E.g. i have an error that there is no reader function, but as far as i can tell i'm definitely passing edn/read a reader with that tag.

Drew Verlee18:07:55

I narrowed it down: apparently the discard tag #_ followed by something that contains another custom tag, e.g. #drew/thing, results in the reader not having the reader I passed to it available.

edn file
{ #_ {:foo #drew/thing "..."}}

reading:

(edn/read {:readers {'drew/thing ...}} ...)

ghadi18:07:09

that's not what that signifies @drewverlee it just means the thing you're discarding must itself be readable

ghadi18:07:20

so you can't discard a tagged value you don't normally handle

Drew Verlee18:07:43

> A reader should not call user-supplied tag handlers during the processing of the element to be discarded.

Drew Verlee18:07:05

I'm interpreting that as don't call #_ on a user-supplied tag handler.

ghadi18:07:20

#_ isn't something to call

ghadi18:07:26

it's not a tag

ghadi18:07:32

though it looks like one

ghadi18:07:40

user=> (def input "{ #_ {:foo #drew/thing \"...\"}}")
#'user/input
user=> (require '[clojure.edn :as edn])
nil
user=> (edn/read-string {:readers {'drew/thing (fn [v])}} input)
{}

Drew Verlee18:07:42

now i have more questions. if it's not a tag what is it?

Alex Miller (Clojure team)18:07:09

# is the generic dispatch character - will dispatch to a specific reader

๐Ÿ‘ 3
Drew Verlee18:07:06

the docs say "its the discard sequence"

ghadi18:07:13

it's syntax

ghadi18:07:21

not a tag

seancorfield18:07:55

It says "Read the next form with user-supplied tag handlers turned off, then throw that form away"

ghadi18:07:53

that's how I interpret the spec too, but the implementation still calls the tag handlers, even when discarding + @U064X3EF3

Alex Miller (Clojure team)18:07:53

I would interpret what's there as - the thing after #_ will be read, but will not invoke the tag handler

Alex Miller (Clojure team)18:07:06

if it does, that seems like a bug per the spec

seancorfield18:07:10

user=> (edn/read-string {:readers {'drew/thing (fn [v] (println "drew/thing called for" v))}} input)
drew/thing called for ...
{}
user=> 
Ah, yes, it does seem to call the user-supplied reader.

Alex Miller (Clojure team)18:07:04

user=> (edn/read-string "[#_#foo 1]")
Execution error at user/eval3 (REPL:1).
No reader function for tag foo

Alex Miller (Clojure team)18:07:18

per that text, I would expect that to return []

seancorfield18:07:51

Me too. So... bug in read-string...

Alex Miller (Clojure team)18:07:43

yeah, that's how I'd read it

seancorfield18:07:37

Or am I also misinterpreting it?

icats18:07:22

Hello! Was wondering if anyone had guidelines / philosophies on Clojure naming conventions (namespaces, functions, and especially symbols in let bindings, etc). Do you tend to lean more towards self-documenting code (eg spelled out descriptive names instead of less-than-meaningful abbreviations or acronyms), or, keeping things super short (ex: shortening first-name to nf and last-name to ln, etc) and perhaps supplementing with function comments?

hiredman18:07:34

my def'ed names tend towards comically long, and locals tend towards cryptically short

๐Ÿ‘ 6
๐Ÿ˜† 3
icats18:07:15

the cryptically short locals is something I've been trying to get used to 🙂. do you do this for readability? Have you run into any issues related to this (eg understanding code later, etc)

hiredman18:07:18

it is because coming up with good names is hard

hiredman18:07:25

so like, if I have x and then do something to it that adds more information, the easiest name for the result is x' (although I do try to avoid being that terse)

icats18:07:00

yeah - I have run into the idiomatic naming conventions like https://github.com/bbatsov/clojure-style-guide#idiomatic-names; which makes sense to me; there are readability gains that you can get from just having shorter lines.

hiredman18:07:58

in general, the more generic the code the less descriptive the names

icats19:07:53

Yeah, and certainly that benefit too.

salam19:07:11

> Names are meaningful and specific, and their length is proportional to their scope. A loop variable used only once in a two-statement loop may be called "i", but a global variable that may be used anywhere in the program will have a long name that accurately describes its usage. -- D. Boundy
One of the rules that I've been following when naming things.

๐Ÿ‘ 3
seancorfield18:07:46

Zach Tellman's "Elements of Clojure" has some great guidance on naming.

๐Ÿ‘ 18
icats20:07:37

Conveniently, the sample of the book contains at least some of the naming sections: https://leanpub.com/elementsofclojure/read_sample

markbastian21:07:54

Is there a more efficient way to pretty-print output to a file than (spit file (with-out-str (pp/pprint data)))? That works fine for small data, but is very slow for anything substantial. pr-str doesn't seem to have any "prettification" options.

dpsutton21:07:41

i think fipp is supposed to be faster.

๐Ÿ‘ 6
noisesmith21:07:48

also, if you provide a stream as the second arg to pprint, it can output as it goes; with-out-str constructs a single string then outputs that

๐Ÿ‘ 3
noisesmith21:07:11

I'd expect the stream arg to perform better for larger inputs

noisesmith21:07:08

not much difference actually

(cmd)user=> (time (pprint (range 1000) (io/writer (io/file "/tmp/a.edn"))))
"Elapsed time: 63.58207 msecs"
nil
(cmd)user=> (time (spit "/tmp/b.edn" (with-out-str (pprint (range 1000)))))
"Elapsed time: 72.99806 msecs"
nil

bbloom21:07:29

hi, author of fipp here 🙂

bbloom21:07:02

@noisesmith for anything < a kilobyte or so, indirecting through a string won't matter much b/c that's going to be about the size of the intermediate buffers in java or your kernel or whatever

๐Ÿ‘ 3
bbloom21:07:30

but you're right, printing to a file writer directly instead of a string will be a huge win as sizes grow

markbastian22:07:51

Just tried fipp and it smoked it! Thanks!

๐Ÿ˜ 3
dpsutton22:07:23

got rough numbers you can share?

dpsutton22:07:33

size and time for clojure.pprint vs fipp?

ghadi22:07:03

way faster ^ rough numbers

ghadi22:07:01

I wish I had a tiny bit more control on the EDN side of things with fipp. I was looking at printing byte arrays as tagged hex or base64 strings

ghadi22:07:58

emacs was way snappier with fipp

markbastian22:07:59

The raw data dump (spit file data) was around 20MB. pp/pprint died, so I got nothing for you there. Serializing using this:

(with-open [o (->> (io/file filename)
                   io/output-stream
                   GZIPOutputStream.
                   io/writer)]
  (fipp/pprint data {:writer o}))
took about 30s ("Elapsed time: 30274.847977 msecs"). And worked. Final file is 1.4MB. Uncompressed is ~34MB.

bbloom22:07:15

@ghadi the edn printer is ~100 lines of code - copy/paste it and tweak to your heart's content 🙂

ghadi22:07:31

yeah seems tight, just didn't get around to it

bbloom22:07:36

tho you should be able to simply implement IEdn

bbloom22:07:22

@ghadi obviously, use a faster hex encoder than this, but here you go:

bbloom22:07:25

user=> (extend (class (byte-array 0)) fipp.ednize/IEdn {:-edn (fn [bs] (tagged-literal 'bytes  (apply str (map #(format "%02x" (int %)) bs))))})
nil

user=> (fipp.edn/pprint (byte-array 10))
#bytes "00000000000000000000"
nil
