#clojure
2017-08-24
bfabry00:08:45

we often use pipeline-blocking as an easy way to parallelise a whole heap of external requests, say running bigquery load jobs in parallel or whatever. the default threadpool size of 8 is pretty low for that kind of task. is setting it to something like 100 a reasonable approach? or are we using the wrong tool for the job etc

hiredman00:08:45

I would be inclined to try and use pipeline-async where it runs tasks on an executor I control vs fiddling with the defaults on the global executor for core.async, but I dunno that it would matter much
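
A minimal sketch of that suggestion, assuming clojure.core.async is on the classpath; `work-fn`, `in-ch`, `out-ch`, and the pool size are hypothetical:

```clojure
(require '[clojure.core.async :as a])
(import '(java.util.concurrent Executors ExecutorService))

;; a pool we control, sized for IO-bound work rather than CPU cores
(def ^ExecutorService exec (Executors/newFixedThreadPool 100))

(defn run-on-exec
  "Adapter for pipeline-async: runs work-fn on our own executor,
   puts the result on `out` and closes it, as the af contract requires."
  [work-fn v out]
  (.execute exec
            (fn []
              (a/put! out (work-fn v))
              (a/close! out))))

;; allow up to 100 in-flight tasks between in-ch and out-ch
(defn parallel-requests [work-fn in-ch out-ch]
  (a/pipeline-async 100 out-ch (partial run-on-exec work-fn) in-ch))
```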

yonatanel09:08:18

@bfabry I like manifold over core.async because among other things it gives you that control over the thread pool. https://github.com/ztellman/manifold

pseud09:08:31

Hey - I'm unfortunate enough to sit on a rather spotty Wifi connection and so some queries I issue against a cassandra cluster using alia time out on occasion and it's really preventing me from doing some more large-scale analysis work. I'm wondering if anyone here have some working examples / snippets and/or have tried to instruct alia to more aggressively retry fetches. I've looked at http://mpenet.github.io/alia/qbits.alia.policy.retry.html#var-on-read-timeout and while I get that these options need to be provided to alia/cluster (shown in https://github.com/mpenet/alia/blob/master/docs/guide.md) , I haven't really figured out how to define a sensible retry-policy. Anyone used this ? Anyone managed to get a reliable retry mechanism going ?

mpenet12:08:01

@pseud You can pass either the keywords that are mentioned in the codox for alia/cluster's retry-policy, or a RetryPolicy instance (from the java-driver directly); in that case you need to refer to the docs of the driver

mpenet12:08:59

(on vacation, on the beach, super slow 3g cnx, difficult to provide more help from here for now sorry)

pseud13:08:17

@mpenet Hey I'm just happy you feel like helping even when you're on vacation 🙂 Thanks

Geoffrey Gaillard13:08:44

Hi everybody! I'd like to clean up my ns declarations, so I tried eastwood and slamhound on my project, but both of them seem to struggle with namespaced keyword aliases (they either fail or report false positives). My project contains a lot of clojure.spec declarations, which means namespaced keywords everywhere. Is there a better alternative?

souenzzo14:08:51

Is there a good explanation of the differences between / uses of binding, with-bindings, with-redefs-fn, and with-redefs?

bronsa14:08:05

binding is a macro over with-bindings, with-redefs is a macro over with-redefs-fn

bronsa14:08:39

the macro version (in both cases) takes a vector of symbol/value pairs, the function version takes a map of vars to values

bronsa14:08:07

you'd use the function version if you need to bind/redef a var dynamically (e.g. you only have the var object at runtime)
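
A small sketch of the two flavours, using a made-up dynamic var `*level*`:

```clojure
(def ^:dynamic *level* :info)

;; macro version: a vector of symbols and values
(binding [*level* :debug]
  *level*)
;; => :debug

;; map-taking version: var objects to values, handy when you only
;; have the var object at runtime
(with-bindings {#'*level* :debug}
  *level*)
;; => :debug
```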

bronsa14:08:02

as for the difference between binding and redeffing: binding shadows the root value in a thread-local context, meaning that all other threads will see the original value and just the current thread will see the new one, while executing code in the same dynamic scope as the binding

bronsa14:08:33

redeffing changes the root binding of a var while executing the body; all threads will see that change, so it's really only used for testing, not production
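
A sketch of the root-swapping behaviour; `lookup` is a made-up var:

```clojure
(defn lookup [] :real)

;; with-redefs swaps the ROOT value for the duration of the body,
;; so every thread sees :fake while the body runs
(with-redefs [lookup (fn [] :fake)]
  (lookup))
;; => :fake

(lookup)
;; => :real, the root value is restored afterwards
```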

owen14:08:25

I've never hit this situation before, curious if anyone has. Trying to test a java file from a clojure repl, is there anyway to compile the java file and see the changes in clojure without restarting the repl?

owen14:08:59

hard for me to google, as all the results I've seen have been for how clojure gets compiled

owen14:08:46

yeah looks possible, thanks

rinaldi19:08:46

Anyone with experience in Clojure ↔️ Apache Spark integration? I am divided between Sparkling and Powderkeg. Would be very helpful to hear from other people's experience.

mccraigmccraig19:08:19

@rinaldi i think mastodon-c and @otfrom might be able to help you there

mccraigmccraig19:08:48

they hang out in #clojure-uk @rinaldi

yedi19:08:34

is there a built in fn that takes a list of maps and creates a map of those maps where the key is a certain unique key from each map

yedi19:08:32

so if i have a list of maps that each have uuid keys, i want to make a map of those maps keyed by uuid

yedi19:08:48

should be simple to write my own fn, but seems like something there might be a built-in for

yedi19:08:55

group-by is close, but the mapped value is a vector of the map instead of just the map

yedi19:08:04

which makes sense since theres no uniqueness guarantee

noisesmith19:08:56

right, it’s easy to fix that in post-processing - especially if you know they will be distinct

yedi19:08:08

ill just go with

(defn group-by-unique-key [key coll]
  (into {} (map #(vector (get % key) %) coll)))

noisesmith19:08:33

you can replace that function with (juxt key identity) if the keys are keywords, (juxt #(% key) identity) if not
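
Putting that together (the name `index-by` is made up):

```clojure
(defn index-by
  "Like group-by, but assumes the value of k is unique per element,
   so the map value is the element itself rather than a vector."
  [k coll]
  (into {} (map (juxt k identity)) coll))

(index-by :id [{:id 1 :name "a"} {:id 2 :name "b"}])
;; => {1 {:id 1 :name "a"}, 2 {:id 2 :name "b"}}
```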

noisesmith19:08:55

also style wise it’s good not to shadow clojure.core functions with locals

Alex Miller (Clojure team)19:08:47

meh, shadow away I say (as long as the scope is clear)

noisesmith19:08:45

I find it slows my reading- needing to double check when I recognize names from clojure.core, but yeah it’s not a correctness thing it’s just an opinionated style suggestion

hmaurer19:08:51

Is there a function that does a map + filter on a map?

hmaurer19:08:30

take key-value pairs, filter some out and apply a function to the key and/or values

tbaldridge19:08:48

it's not that bad to write one yourself:

(defn filter-map [pred mp]
  (into {}
        (filter pred)
        mp))

tbaldridge19:08:09

(filter-map (fn [[k v]] (pos? v)) {...})

noisesmith19:08:13

yeah, this is where transducers are great

hmaurer19:08:14

@tbaldridge fair point, I was just wondering if it was in core

dpsutton19:08:43

i try not to ever shadow, because if i rename a parameter i may not get all instances, and the code won't break at compile time, only at runtime

noisesmith19:08:46

transducers are the thing that core provides - since you might want some arbitrary series of filters and mappings in between

tbaldridge19:08:06

it's a fairly rare function, most of the time you need to filter vectors or perhaps sets. Or just let the extra values flow through.

hmaurer20:08:04

oh I just discovered rename-keys; I didn’t explain my problem but it actually turns out to be an even easier solution

hmaurer20:08:22

I should really take the time to read the list of functions available in core, set, etc

gdeer8120:08:19

who among us hasn't implemented half of clojure.core before realizing there was a function for that?

hmaurer20:08:54

Question: is there a shortcut for an anonymous function which returns a value without calling a function? e.g. (map #(identity [1 %]) [1 2 3])

hmaurer20:08:14

not that adding “identity” is a problem; I am just wondering…

hmaurer20:08:26

I would also use the syntax (fn [x] [1 x])

sundarj20:08:59

you can use do or -> too

hmaurer20:08:28

@sundarj ah good point, thanks! is one more idiomatic than the other in this scenario?

sundarj20:08:03

not sure, i think it's down to preference. personally, i would say the order of clarity goes: identity, do, ->
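
For comparison, all of these produce the same result:

```clojure
(map #(identity [1 %]) [1 2 3]) ;; => ([1 1] [1 2] [1 3])
(map #(do [1 %]) [1 2 3])       ;; same
(map #(-> [1 %]) [1 2 3])       ;; same
(map (fn [x] [1 x]) [1 2 3])    ;; plain fn, arguably the clearest
```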

hagmonk20:08:37

(map (partial vector 1) [1 2 3])

hmaurer20:08:12

Actually, my use-case is to build a map from a list of keys, by applying a function to every key. Is there a core function for that?

tbaldridge20:08:17

the one I use the most is (fn [[x]] x)

tbaldridge20:08:25

or (fn [[_ x]] x)

hmaurer20:08:32

e.g. go from [:a :b :c] to {:a (f :a) :b (f :b) :c (f :c)}

tbaldridge20:08:14

(zipmap ks (map f ks))
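
For example, with `name` standing in for `f`:

```clojure
(let [ks [:a :b :c]]
  (zipmap ks (map name ks)))
;; => {:a "a", :b "b", :c "c"}
```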

hagmonk20:08:42

yesterday, I found an elegant way to filter a vector of maps, such that I was guaranteed each map contained at minimum a certain set of keys … anyone want to suggest a solution? 🙂 assuming the keys are keywords.

tbaldridge20:08:21

That's the cleanest way. There are faster, less simple approaches though.

(persistent!
    (reduce
       (fn [acc k]
         (assoc! acc k (f k)))
       (transient {})
       ks))
is probably one of the fastest

hmaurer20:08:11

what does transient do?

noisesmith20:08:56

shouldn’t that be equivalent to

(into {}
     (map (juxt identity f))
     ks)

noisesmith20:08:11

or is using persistent and transient directly helping there?

noisesmith20:08:45

oh right assoc! is faster, I benchmarked this before but forgot
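
To the transient question above: transient returns a mutable "editable" version of a persistent collection that supports in-place operations like assoc!, and persistent! freezes it back into an ordinary immutable map. A tiny sketch:

```clojure
;; transient/persistent! bracket a batch of in-place updates
(let [t (transient {:a 1})]
  (persistent! (assoc! t :b 2)))
;; => {:a 1, :b 2}
```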

hmaurer20:08:31

@hagmonk my shot:

(require '[clojure.set :refer [subset?]])

(defn my-filter
  [ms min-set-of-keys]
  (filter #(subset? min-set-of-keys (set (keys %))) ms))

hagmonk20:08:46

[{:a 1 :b 2 :c 3} {:a 1 :b 2} {:a 1 :c 3}]

roklenarcic20:08:48

if anyone is using Cursive, I think there's some menu where I can add macros that are defn-like so that cursive knows how to format

roklenarcic20:08:53

where would that be

hagmonk20:08:20

@hmaurer that's where I initially went, clojure.set ... but then remembered keywords are functions ...

hagmonk20:08:48

Assuming that sample data is in v

hagmonk20:08:00

(filter (every-pred :a :b) v)

hagmonk20:08:25

@roklenarcic you can hover over the form and a little lightbulb icon appears, which lets you add more to that set. Aside from that just search for "clojure" in the settings pane and it should be quick to find

hmaurer20:08:42

=> (filter (every-pred :a :b) [{:a 33 :b false}])
()

hmaurer20:08:16

same issue if the key is nil

hagmonk20:08:40

yeah it relies on truthy key values 🙂 So it's not a strict solution to the problem

hmaurer20:08:50

neat though!

hagmonk20:08:24

there's probably some gymnastics you could do with juxt or composing another function for every-pred that solved the strict case

gdeer8120:08:57

@hmaurer this almost looks like a contains? function where you can pass in more than one key

gdeer8120:08:23

but it also kind of looks like something you could do with clojure.spec

gdeer8120:08:08

(defn multi-contains? [m & args] (every? true? (map #(contains? m %) args))) which lets you do (filter #(multi-contains? % :a :b) [{:a 1 :b 2 :c 3} {:a 1 :b 2} {:a 1 :c 3}])

noisesmith20:08:48

@gdeer81 what about (every? #(contains? m %) args)

gdeer8120:08:31

yes that also works 😊
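
Putting it together, a strict version that keeps maps with false/nil values, unlike the every-pred approach:

```clojure
(defn multi-contains? [m & ks]
  (every? #(contains? m %) ks))

;; the map with :b false is kept, since contains? only checks the key
(filter #(multi-contains? % :a :b)
        [{:a 33 :b false} {:a 1 :c 3}])
;; => ({:a 33 :b false})
```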

noisesmith20:08:11

@matthewdaniel map is lazy, it does nothing if you don’t consume its result

noisesmith20:08:18

you can replace it with run!

MegaMatt20:08:32

(run! log-subject) ?

noisesmith20:08:46

that’s missing the subjects arg, but yeah
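
The original snippet isn't shown in this log; `log-subject` and the vector below are stand-ins for the code under discussion:

```clojure
(defn log-subject [s] (println "subject:" s))

;; lazy: outside the REPL nothing forces this seq, so nothing prints
(map log-subject ["a" "b"])

;; eager: applies log-subject to each element for side effects, returns nil
(run! log-subject ["a" "b"])
```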

MegaMatt20:08:15

oh geez. I should have asked earlier

gdeer8120:08:29

well since he's using the ->> it will get added in

noisesmith20:08:31

it’s one of the most common early clojure problems

noisesmith20:08:05

oh, in the second case sure, I was talking about line 7

noisesmith20:08:22

the function on line 11 should be fine as long as its return value is consumed

MegaMatt20:08:43

yup, tested it with the 11 version and its working

gdeer8121:08:14

there are some guides on clojure's laziness but to a new clojure developer that means nothing so it won't click that you shouldn't use map with side effecting functions. @stuartsierra wrote the best blog post about this https://stuartsierra.com/2015/08/25/clojure-donts-lazy-effects

gdeer8121:08:49

who among us doesn't have a story about trying to process a giant excel spreadsheet and put every row into a database just to realize there are no rows in the database because you made a rookie mistake?

gdeer8121:08:02

so you make the code realize the intermediate results from slurping the excel data but your process to put all that data into the database is also lazy

misha21:08:40

@noisesmith this is what I end up with. I did not do a topo sort, since (I think) I will need to be able to add states to compiled machine one at a time, w/o recalculating all of it.

misha21:08:04

would you be so kind as to tell me if it looks "acceptable"? opieop

misha21:08:29

(there was a bit more readable version with assoc-ins and update-ins, but it was like ~20% slower)

noisesmith21:08:03

I’d do optimization for readability and optimization for performance as separate steps, and when in doubt do readability first

noisesmith21:08:19

it’s easier to make readable and correct code fast than it is to make fast and correct code readable

misha21:08:50

same input, but with update-in: Execution time mean : 25.338626 µs

misha21:08:10

yeah, I made it readable (at least for me right now :) ), and then tried to make it a bit faster with not too much added ugliness

misha21:08:08

on the other hand, not sure if that 1µs is worth it when the function is used just once, for a single state

misha21:08:54

actually, with full readability on it is: Execution time mean : 31.226946 µs with things like:

m* (-> m
       (assoc-in [:regions region-id] region*)
       (update-in [:states state-id :regions] sconj region-id)
       (update-in [:states state-id :kids] sinto region*))]

noisesmith21:08:59

I’d actually keep that version, until you determine that function is a meaningful bottleneck of your app performance

noisesmith21:08:19

there’s no point in wasting time optimizing things that will barely nudge your resource usage in the big picture

misha21:08:35

that is true

noisesmith21:08:37

(and that includes not wasting the extra reading time)

misha22:08:26

thank you

nathanmarz22:08:25

@misha update-in and assoc-in are very slow, you can use specter's transform and setval for 6x and 3x performance improvement respectively

Benchmark: set value in nested map (2500000 iterations)

Avg(ms)		vs best		Code
433.29 		 1.00 		 (setval [:a :b :c] 1 data)
1445.9 		 3.34 		 (assoc-in data [:a :b :c] 1)

********************************

Benchmark: update value in nested map (500000 iterations)

Avg(ms)		vs best		Code
89.021 		 1.00 		 (manual-transform data inc)
107.42 		 1.21 		 (transform [:a :b :c] inc data)
613.26 		 6.89 		 (update-in data [:a :b :c] inc)

misha22:08:38

@nathanmarz does it make sense to use specter for just 2-3 levels deep maps/vecs?

nathanmarz22:08:07

yea, I use it for stuff like that all the time
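
A sketch of the two functions mentioned above, assuming the com.rpl/specter dependency is on the classpath:

```clojure
(require '[com.rpl.specter :as s])

;; setval: like assoc-in
(s/setval [:a :b :c] 1 {:a {:b {:c 0}}})
;; => {:a {:b {:c 1}}}

;; transform: like update-in
(s/transform [:a :b :c] inc {:a {:b {:c 1}}})
;; => {:a {:b {:c 2}}}
```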

misha22:08:46

I will look into specter, thank you; however me optimizing µs above is clearly just a form of procrastination at this point opieop

pesterhazy22:08:01

@nathanmarz any idea why these functions are so slow?

pesterhazy22:08:13

also, is this true for clojurescript as well?

nathanmarz22:08:56

@pesterhazy the overhead of doing a reduction over the sequence slows it down a bunch

nathanmarz22:08:40

not sure how much of a difference that accounts for compared to whatever else they're doing

nathanmarz22:08:49

have to look at source

nathanmarz22:08:53

specter caches a nested function at the callsite roughly equivalent to nested update calls, so most of overhead is stripped out

pesterhazy22:08:00

Thanks for the explanation