#clojure-europe
2024-02-06
thomas07:02:34

morning... very 💨 here again.

schmalz08:02:27

Morning all.

otfrom08:02:04

morning. Wind has died down here, but the temp is going to drop again

thomas08:02:03

temperature is nice here... but you just don't notice it because of the wind.

iperdomo08:02:19

Morning! back from FOSDEM '24 ... recording available on Cyber Resilience Act (CRA) and the Product Liability Directive (PLD) ... - https://fosdem.org/2024/schedule/event/fosdem-2024-3683-the-regulators-are-coming-one-year-on/

maleghast11:02:11

madainn mhath :flag-scotland:

Ed11:02:26

Morning

Ben Hammond11:02:22

I'm wrestling with a deps.edn tree that is determined to download artifacts from Maven Central, which is not available via the cantankerous corporate firewall. Does clj have a way to generate a consolidated deps.edn file? Or a --trace equivalent so I can see where it is picking up the URL from? I can see a list of deps.edn files in clj -Sverbose but none of them reference Maven Central...

Ben Hammond11:02:18

-Strace does not seem to survive the "Failed to read artifact descriptor" error, so it does not provide any useful debug info
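
For what it's worth, tools.deps does let a deps.edn override the built-in "central" and "clojars" repositories via :mvn/repos, which is one way to redirect downloads to an internal mirror behind a firewall (the mirror URLs below are placeholders):

```clojure
;; e.g. in ~/.clojure/deps.edn — "central" and "clojars" are the
;; built-in repo names; the mirror URLs are hypothetical
{:mvn/repos
 {"central" {:url "https://nexus.example.corp/repository/maven-central/"}
  "clojars" {:url "https://nexus.example.corp/repository/clojars/"}}}
```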

imre12:02:19

#C6QH853H8 might be able to help?

otfrom13:02:21

I might have asked before, but do you prefer: (into #{} (filter my-pred) my-list) or (->> my-list (filter my-pred) distinct) ?

1️⃣ 1
imre13:02:28

first one for sure

imre13:02:38

it uses a transducer

imre13:02:53

second one creates a lazy sequence that is then thrown away

imre13:02:23

well, it creates one that's thrown away and then another one

imre13:02:03

if order is to be kept you can still use (into [] (comp (filter my-pred) (distinct)) my-list) and if you want the end result to be lazy then (sequence (comp (filter my-pred) (distinct)) my-list)
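
A quick sketch of the forms discussed above, with hypothetical data (my-list and my-pred are stand-ins):

```clojure
(def my-list [1 2 2 3 3 3 4])
(def my-pred odd?)

;; eager, order-preserving, single pass, no intermediate lazy seqs:
(into [] (comp (filter my-pred) (distinct)) my-list)
;; => [1 3]

;; same transducer, but producing a lazy seq:
(sequence (comp (filter my-pred) (distinct)) my-list)
;; => (1 3)

;; the original set version, when order doesn't matter:
(into #{} (filter my-pred) my-list)
;; => #{1 3}
```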

otfrom13:02:24

in this instance it is the distinctness I'm after mostly. And afterwards I'm usually using it in a seq compatible fashion, so either works.

otfrom13:02:09

my pref is the transducer (either into, xforms/into, sequence, eduction, transduce, whatever)

otfrom13:02:40

If I have a seq with map/filter/etc that I then do a reduce on the end of then I go for transduce
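
A small illustration of that preference, with made-up numbers (sum of squares of the evens below 10):

```clojure
;; seq version: each step builds an intermediate lazy seq
(->> (range 10)
     (filter even?)
     (map #(* % %))
     (reduce + 0))
;; => 120

;; transduce fuses the same steps into a single pass
(transduce (comp (filter even?)
                 (map #(* % %)))
           + 0 (range 10))
;; => 120
```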

otfrom13:02:31

things get a bit more complicated where the data is big enough I want to go multithreaded when I start to reach for tech.ml.dataset.reductions/group-by-column-agg or ham-fisted.reduce/preduce and friends

imre13:02:01

Yeah my main point is that the filter in the middle is wasteful if it's the lazy arity

otfrom13:02:10

tho there is a lot in ham-fisted I still need to get my head around

otfrom13:02:31

yeah ->> is one to be a bit careful about

Ben Sless14:02:22

Another way to go multi threaded with transducers is core.async pipelines. Same semantics as into
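
A minimal sketch of what Ben Sless describes, assuming core.async 1.2+ (for onto-chan!); the channel sizes and parallelism are illustrative:

```clojure
(require '[clojure.core.async :as a])

(let [in  (a/chan 16)
      out (a/chan 16)
      xf  (comp (filter odd?) (map inc))]
  ;; 4 worker threads apply the transducer in parallel;
  ;; output order matches input order, same semantics as (into [] xf ...)
  (a/pipeline 4 out xf in)
  (a/onto-chan! in (range 10))
  (a/<!! (a/into [] out)))
;; => [2 4 6 8 10]
```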

otfrom14:02:19

I've done that too. I often found it pretty slow, but I'm thinking that was a PEBCAK problem rather than anything else

otfrom14:02:48

what I do like about core.async was being able to create a tree of processing. A lot of the stuff I do has a common beginning (and/or common middle steps), and being able to do those things once felt nice, even if it didn't give me the wall-clock performance I'm after (most of my constraints are around me sitting in front of a computer waiting for a bit of analysis or projection or modelling to complete)

otfrom14:02:36

I do like being able to put a (comp (map foo) (filter bar)) on a channel though
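
Putting a composed transducer directly on a channel looks like this (a sketch; foo and bar are stand-ins, here inc and even?):

```clojure
(require '[clojure.core.async :as a])

;; the channel's buffer runs every value through the xform as it's put on
(let [ch (a/chan 8 (comp (map inc) (filter even?)))]
  (a/onto-chan! ch (range 6))
  (a/<!! (a/into [] ch)))
;; => [2 4 6]
```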

otfrom14:02:25

if someone could point me to how I can really make core.async do batch processing in a fast way that takes advantage of all the cores in front of me I'd be very grateful

Ben Sless14:02:39

The main bottleneck is that per-element processing isn't very efficient for CPU-bound tasks. You can partition the input, transduce each batch in parallel, then combine the results: essentially a DIY map-reduce. Play with the batch size and parallelism parameters and you'll probably get significant speedups
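
A hedged sketch of that DIY map-reduce; batched-transduce is a hypothetical helper, and pmap is just one easy way to get parallelism:

```clojure
(defn batched-transduce
  "Partition coll, transduce each batch in parallel, combine the results.
  f must work both as the per-batch reducing fn and as the combiner."
  [xform f init batch-size coll]
  (->> (partition-all batch-size coll)
       (pmap #(transduce xform f init %)) ; one transduce per batch, in parallel
       (reduce f init)))                  ; combine the per-batch results

;; e.g. sum of squares of the odd numbers below 10000, in batches of 1000
(batched-transduce (comp (filter odd?) (map #(* % %))) + 0 1000 (range 10000))
```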

otfrom14:02:53

yeah, I end up playing with batch size a lot (luckily my data fits well with that as I usually have 100 or 1000 simulations so I can make a reasonable number of batches w/o thinking too hard about it)

Ben Sless15:02:40

Tried the reducers library?
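
For comparison, the reducers version of the same shape (numbers illustrative); r/fold does the partition/reduce/combine over a vector via fork/join:

```clojure
(require '[clojure.core.reducers :as r])

;; with a single fn, r/fold uses it as both reducing and combining fn,
;; and (+) supplies the identity for each partition
(r/fold + (r/map #(* % %) (r/filter even? (vec (range 100)))))
;; => 161700
```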

Ed15:02:54

For me, I prefer transducers over seqs because they are explicit about the return type (and I'm including laziness in the type). When I started learning Clojure, I tripped over lazy seqs quite a bit, including returning them from dynamically scoped things like db transactions. Also, you see people mixing side effects and laziness all the time. transduce gives you a clear separation between side effects (in the rf) and functional transformation (the xform). The better GC profile and flexible parallelism are, I think, indications that it's a flexible abstraction. I kinda wish they weren't considered an advanced topic. I think lazy seqs have more subtle rules that make them harder to get right when mixed with things like side effects.

👍 1
otfrom16:02:26

@UK0810AQ2 I have used reducers before. I do still go for them sometimes, but they're near the bottom of my list.

otfrom16:02:57

@U0P0TMEFJ I find using transducers to be really straightforward (most of the time). Tho I do have to admit I still see the rf in the transduce as functional (tho maybe I'm missing something here).

Ed19:02:21

oh ... my bad ... I didn't mean that rfs should be side-effecty, just that they could, and they provide a distinct separation. Transducers are a transformation of a reducing function, and a reduction has a source and a sink. Putting those together in a transducing context means that you can write a little mechanical bit that takes data from a Kafka stream and puts data into a db (or whatever), passing in a transducer. It's a great way to separate out the functional transformation (and test it using into) from the clunky side-effecty stuff ... it just seems like a really powerful abstraction to me ...
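
A small sketch of that separation; the names are made up, and an atom stands in for the db:

```clojure
(require '[clojure.string :as str])

;; the pure, testable transformation
(def xf (comp (map str/upper-case)
              (filter #(> (count %) 2))))

;; test it with into
(into [] xf ["ab" "abc" "abcd"])
;; => ["ABC" "ABCD"]

;; reuse the same xform with a side-effecting rf (the "clunky bit")
(def db (atom []))
(transduce xf
           (fn ([acc] acc)                       ; completion arity
               ([acc x] (swap! db conj x) acc))  ; step: write to the "db"
           nil
           ["ab" "abc" "abcd"])
@db
;; => ["ABC" "ABCD"]
```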

💯 1
otfrom13:02:42

I suppose one will return a seq and the other a set

yes 1
☝️ 1