#clojure-uk

2017-11-16
thomas08:11:54

moin moin morning

thomas09:11:01

any datomic experts around here? I get an exception on a transact and I have no clue how to debug it

thomas09:11:10

or should I ask #datomic?

conan10:11:41

you'll tend to have to wait for the afternoon for help from over the pond

thomas10:11:24

found the cause of my datomic issue; see #datomic for more details

thomas10:11:39

(and it wasn't a datomic issue anyway)

thomas10:11:45

same feeling here @maleghast I wish I also knew more about Datomic.. I kinda know what it does conceptually... but no idea how to actually use it and what it all means.

maleghast11:11:10

@thomas - I am in the same boat. I know about some cool things it can do, but rather expect that I need to have a baptism of fire (i.e. do something with it, make lots of mistakes, learn the hard way), or work with / alongside someone generous of spirit who has already got the knowledge (however they got it) and learn from them.

maleghast11:11:45

There do not appear to be any good books, and online tutorials I have found suffer dreadfully from the “toy app” problem.

thomas11:11:57

that sounds like the plan indeed!!

maleghast11:11:24

Just need my CEO to get the damn funding in… 😉

thomas11:11:37

I never got round to sitting down and just doing it... (like so many things...)

maleghast11:11:37

’tis the way the cookie crumbles, sometimes…

maleghast11:11:07

I have two side-projects that I am not doing anything with properly and one of them would use Datomic, so maybe, one day…

maleghast11:11:58

(or if our funding comes in and we have operating money and stuff, there are clear applications for it in what I am doing now)

maleghast12:11:43

[Idiomatic / Clojuric Question] - I have a vector of maps, and I want to know how many have a certain value for a certain key. I know I can do this with a reduce, but is there a better / snappier way of doing this? (I am trying to get better at being idiomatic / thinking like a Clojurist)

otfrom13:11:00

what's the preferred way nowadays to integrate test.check with clojure.test?

rickmoynihan13:11:10

I’m guessing you’ve seen: https://clojure.github.io/test.check/clojure.test.check.clojure-test.html#var-defspec In my experience the above works quite well. Note, though, that there’s a difference between this and clojure.spec; figuring out how to include generative tests from spec with clojure.test is another issue altogether.

rickmoynihan13:11:52

one other tip: it can be worth adding metadata to the tests so you can run generative tests separately from the rest of the suite.
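
Putting both tips together, a minimal sketch of what that might look like — the property, the names, the assumed test.check version, and the :generative keyword are all made up for illustration:

```clojure
;; Assumes test.check on the classpath, e.g. [org.clojure/test.check "1.1.1"]
(require '[clojure.test.check.clojure-test :refer [defspec]]
         '[clojure.test.check.generators :as gen]
         '[clojure.test.check.properties :as prop])

;; defspec registers the property as an ordinary clojure.test deftest,
;; so the normal test runner picks it up alongside unit tests.
;; ^:generative is a made-up keyword: tag generative tests with whatever
;; metadata you like, then use a test selector to run (or skip) them
;; separately from the rest of the suite.
(defspec ^:generative reverse-twice-is-identity
  100 ; number of generated cases per run
  (prop/for-all [v (gen/vector gen/small-integer)]
    (= v (reverse (reverse v)))))
```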

rickmoynihan17:11:29

well not sure it’s what you were looking for

otfrom08:11:46

it is worth a try

dominicm13:11:23

chicken blood and a red moon

reborg13:11:10

@maleghast how big is the vector?

dominicm13:11:48

My brain goes to filter and count tbh @maleghast

dominicm13:11:16

(count (filter #(= :certainly (:key %)) xs))

maleghast13:11:18

@dominicm - I like the look of that - seems less heavy-handed than doing a reduce with an incrementing accumulator.

dominicm13:11:34

I use reduce very rarely

maleghast13:11:52

@reborg - Right now, in development it’s small, but it could reasonably go into the hundreds, perhaps thousand+ maps.

maleghast13:11:24

@dominicm - I am SURE I use it too much, i.e. when there are better / simpler solutions that I don’t realise I have at my disposal. 🙂

reborg13:11:02

the count filter above is perfectly fine then (imho)

dominicm13:11:42

@reborg is there something more efficient for larger collections? I'm curious

reborg13:11:23

🙂 you can go all the way up to enterprise edition

dominicm13:11:21

@reborg Only idea I've come up with is core.reducers for parallelizing the counting. No idea if that would produce a noticeable improvement (or if it would be a detriment)

reborg13:11:06

right, I'm curious to measure it, coming shortly

maleghast13:11:15

@dominicm - your help above, exactly what I needed - thanks again. This is all in the pursuit of thinking more Clojuric thoughts, so thanks for that too 🙂

reborg13:11:38

a few options for larger stuff http://sprunge.us/dSTN

reborg13:11:09

going from normal to totally overkill :trollface:

maleghast13:11:42

That’s pretty new to me, but the outputs do tell an interesting tale of optimisation…

reborg14:11:53

I guess the highlight is that the fold:

(r/fold +
    ((comp
       (map :samplevalue)
       (filter #(= 75584 %))
       (map (constantly 1))) +)
    data)

reborg14:11:22

is reasonably easy to follow and ~x3 faster

dominicm14:11:47

I wonder if there's much impact by the number of matches you get. I'd be curious to know how a filter that matched >50% impacts the performances.

reborg14:11:26

yeah, my example is pretty sparse, 5 overlapping elements only

rickmoynihan14:11:45

out of curiosity @reborg what are the timings for building up the vector from the sequences, i.e. the `into` step? Reducers are fine when you have the data in a tree already, but I’ve found seqs can be so pervasive that in practice it takes a lot of care with `mapv`/`filterv` etc. to know things are in the right shape for a fold

reborg14:11:31

yes, the "problem statement" above was "I have a vector" so I was happy to acknowledge the happy path already 🙂

rickmoynihan14:11:21

yup appreciated… just curious how bad it is when it’s not already in that shape

bronsa14:11:40

it doesn't make any sense to time how long building the vector takes

bronsa14:11:09

that time is eventually gonna have to be spent, whether eagerly or lazily on demand

rickmoynihan14:11:33

sorry I don’t know the usecase… but generally isn’t it an extra cost you may have to pay to put the seq into a tree for the fold to be parallelisable?

bronsa14:11:55

not necessarily, you could e.g. implement CollFoldable for chunked seqs

rickmoynihan14:11:06

e.g. if the API to a database gives you a lazyseq, I’d have to pay some presumably linear cost for into in order to parallelise processing each result.

bronsa14:11:25

but usually if you're using reducers/transducers you shouldn't operate on lazy seqs, but on reducibles

rickmoynihan14:11:40

> not necessarily, you could e.g. implement CollFoldable for chunked seqs Interesting idea…

reborg14:11:06

with databases, files, etc, there are approaches where the fold can load the chunk directly from the source, without the need for you to load the entire thing in memory

reborg14:11:23

with DBs the primary key is a natural choice to chunk a big table up
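
A rough sketch of that chunk-by-primary-key idea. Everything here is made up for illustration: the in-memory `table` stands in for a real DB, `fetch-chunk` stands in for a ranged query like `SELECT * FROM t WHERE id >= ? AND id < ?`, and `pmap` stands in for a real fold's work-splitting — the point is only that each chunk is loaded and counted independently, so the whole table never needs to be in memory at once:

```clojure
;; Pretend table: 10,000 rows keyed by :id (stand-in for a DB table).
(def table
  (mapv (fn [i] {:id i :samplevalue (mod i 97)}) (range 10000)))

(defn fetch-chunk
  "Stand-in for a ranged primary-key query: rows with lo <= id < hi."
  [lo hi]
  (filterv #(and (<= lo (:id %)) (< (:id %) hi)) table))

(defn count-matching
  "Count rows whose :samplevalue equals v, one key-range chunk at a
  time, processing chunks in parallel and summing the partial counts."
  [v chunk-size max-id]
  (->> (range 0 max-id chunk-size)
       (pmap (fn [lo]
               (count (filter #(= v (:samplevalue %))
                              (fetch-chunk lo (+ lo chunk-size))))))
       (reduce +)))

(count-matching 0 1000 10000) ;=> 104 (rows where (mod id 97) is 0)
```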

rickmoynihan14:11:12

@reborg: Yes agreed. I’ve been looking to do such things for e.g. CSV parsing, etc… i.e. splitting a file into roughly equal chunks, seeking to row ends, and batching it up with a reducible API. But to my knowledge such things aren’t widespread yet… Do you know of any examples?

rickmoynihan14:11:58

ahh yes I’d seen iota… but foldable-seq is new to me… I have a side project that I keep meaning to pick up again around doing (trans|re)ducible I/O… but maybe I don’t need to…

bronsa14:11:15

stuff like foldseq is usually a bad idea

bronsa14:11:28

it can easily cause the heap to explode

rickmoynihan14:11:55

@bronsa: doesn’t it depend what you’re reducing into? e.g. if I were counting lines in a file you should be ok, no?

rickmoynihan14:11:44

or is it because fold is presuming a tree-like reduction?

bronsa14:11:07

lazy seqs are impossible to fold in parallel w/o holding onto the original seq, period

rickmoynihan14:11:19

@bronsa: ahh yes lazy-seq’s totally

bronsa14:11:21

hence why i suggested chunked seqs not lazy seqs
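
For reference, the distinction bronsa is drawing is easy to poke at in the REPL — these are plain clojure.core calls, nothing hypothetical:

```clojure
;; Vectors and ranges produce chunked seqs, which carry their elements
;; in chunks (of up to 32) that a fold could in principle hand out
;; for parallel processing.
(chunked-seq? (seq (range 100))) ;=> true
(chunked-seq? (seq [1 2 3]))     ;=> true

;; A hand-rolled lazy seq yields one element at a time and can only be
;; consumed linearly while holding onto its head — hence unfoldable.
(chunked-seq? (lazy-seq (cons 1 nil))) ;=> false
```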

rickmoynihan14:11:31

ok… understood… sorry had missed that detail

rickmoynihan14:11:20

yes, in that case I agree… the (trans|re)ducible I/O stuff I was trying to do wasn’t backed by seqs. Basically, underneath, it would just keep feeding you the mutable InputStream/Reader and let higher-level things parse them into collections etc.

rickmoynihan14:11:31

but interesting idea about ChunkedSeqs… will need to do some digging

conan14:11:39

here's a question:

(update-in #{1 2 3} [1] inc)
;; ClassCastException clojure.lang.PersistentHashSet cannot be cast to clojure.lang.Associative  clojure.lang.RT.assoc (RT.java:820)

Why is a set not an associative data type that maps its values to themselves? For example:

(#{1 2 3} 2)
=> 2

bronsa14:11:05

what would (assoc #{} 1 2) return

bronsa14:11:15

associative has to do with assoc not get

dominicm14:11:00

I suppose, in a world where update-in would work, it would return #{2}

dominicm14:11:17

and then (update-in #{2} [2] inc) would return #{3}

bronsa14:11:22

that doesn't make any sense

dominicm14:11:36

I agree, I don't really think it makes much sense either.

dominicm14:11:53

I think the idea is that, essentially, updating a value renames the key?

bronsa15:11:17

you need to think about the assoc behaviour not update-in

bronsa15:11:27

update-in follows from assoc, not the other way round

bronsa15:11:41

and there's just no sensical behaviour for assoc over sets

conan15:11:53

isn't there? surely associng a key into a set adds the value to that set?

conan15:11:22

so (assoc #{1 2} 3) => #{1 2 3}

bronsa15:11:56

that's not how assoc works

bronsa15:11:00

that's conj

conan15:11:50

it seems to me that is how i would expect it to work

conan15:11:08

it just happens that conj and assoc are the same thing for sets

conan15:11:24

(i mean i know the language doesn't currently work like that, but i don't understand why not)

bronsa15:11:36

assoc takes k,v not just k

yogidevbear15:11:55

It would be for hashmaps or vectors yes?

conan15:11:38

maps map arbitrary keys to arbitrary values, vectors map natural number keys to arbitrary values, and sets map arbitrary keys to themselves

bronsa15:11:36

look, to make assoc work on sets there are two alternatives:
- either accept (assoc my-set k v), but that only makes sense when k == v; when k <> v the behaviour would make no sense
- or accept (assoc my-set k), changing the signature of assoc, but then (assoc my-map k) would make no sense
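
To make the contrast in this thread concrete, a few plain REPL lines (nothing hypothetical, just clojure.core):

```clojure
;; conj is the set operation conan is describing:
(conj #{1 2} 3) ;=> #{1 2 3}

;; assoc requires an Associative collection and a k,v pair; sets are
;; not Associative, so this throws ClassCastException:
;; (assoc #{1 2} 3 3)

;; Sets do act as functions looking up their own elements:
(#{1 2 3} 2) ;=> 2
(#{1 2 3} 4) ;=> nil
```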

bronsa15:11:49

it just doesn't make sense either way you look at it

conan15:11:57

yes i'm not so worried about the syntactical implications

conan15:11:19

we don't need assoc to work for sets because we have conj

conan15:11:31

but conceptually i don't see why it shouldn't

conan15:11:46

and hence why update-in wouldn't work

bronsa15:11:51

then what? a set is trivially a specialization of a map, so if that's your argument then ok. but it's not an argument for making assoc work on sets

bronsa15:11:54

or update-in

yogidevbear15:11:30

assoc can't work on sets as it has to take a kv pair

bronsa15:11:37

because assoc (update-in is just an abstraction over assoc) promises to work with any k,v pair and sets restrict the domain to k,k

conan15:11:55

i'm not really suggesting anything, maybe it's just that clojure.lang.Associative seems like a slightly bad name

bronsa15:11:26

just read KVAssociative instead ;)

bronsa15:11:41

one legitimate question would be why doesn't IPersistentSet implement ILookup

conan15:11:51

but it seems the answer to my question is "because assoc works that way and clojure.lang.Associative is for assoc"

conan15:11:21

an interesting clojure curio for a thursday afternoon

conan15:11:27

back to work now i suppose, thanks!

bronsa15:11:39

the contract for assoc is defined in Associative