#clojure
2023-01-06
Célio00:01:38

Hello folks, I'm playing with a depth-first search algorithm in Clojure, and while trying to optimize it with transient collections I was surprised to see that the version with transients is slower than the version with persistents. I assume there's something wrong with my code but I can't seem to figure it out. Does anyone know what's going on? I put the code in this GitHub gist: https://gist.github.com/ccidral/4816a84eaa0c8d758e8c3cd93790e9b6 With a large dataset, i.e. a graph with 1M vertices and ~7M edges, the version with transients is about 1 second slower on average on a machine with an 11th-gen Intel i5 CPU and 16 GB RAM. I was able to make some other optimizations in both versions which reduced the difference in runtime to about 0.5 seconds, but the transients version was still slower. For simplicity/readability the code in the gist above is the non-optimized code.
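
For readers without the gist at hand, here is a minimal sketch of the two variants under discussion. It is not the gist's code: the function names, the adjacency-map graph shape, and the small sample graph are assumptions made for illustration. It only shows the general shape of a stack-based DFS with persistent collections versus transients.

```clojure
;; Sketch only: a graph is assumed to be a map of vertex -> vector of neighbors.

(defn dfs-persistent
  "Stack-based DFS using a persistent vector (stack) and a persistent set."
  [graph start]
  (loop [stack [start]
         visited #{}]
    (if (empty? stack)
      visited
      (let [v     (peek stack)
            stack (pop stack)]
        (if (contains? visited v)
          (recur stack visited)
          (recur (into stack (get graph v)) (conj visited v)))))))

(defn dfs-transient
  "Same traversal with transients. peek is not defined for transient
  vectors, so the top of the stack is read with count + nth."
  [graph start]
  (loop [stack   (transient [start])
         visited (transient #{})]
    (if (zero? (count stack))
      (persistent! visited)
      (let [v     (nth stack (dec (count stack)))
            stack (pop! stack)]
        (if (contains? visited v)
          (recur stack visited)
          ;; always continue with the value returned by conj!/pop!
          (recur (reduce conj! stack (get graph v))
                 (conj! visited v)))))))

;; Hypothetical sample graph with a cycle
(def sample-graph {1 [2 3], 2 [4], 3 [4], 4 [1]})
```

Both functions should return the same set of reachable vertices. Which one is faster depends on the graph and the JVM, which is what the rest of the thread explores.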

phronmophobic01:01:08

I would usually reach for https://github.com/clojure-goes-fast/clj-async-profiler to investigate. The only thing I see that looks suspicious is the implementation of peek!. The docstring for peek is "for a vector, same as, but much more efficient than, last". I'm not sure your implementation of peek! has the same characteristics.

👍 2
hiredman01:01:28

Your into! Is really inefficient (nope, it isn't, it's fine)

phronmophobic01:01:41

You'll probably get more info by profiling, but I might also try using this implementation for peek!, (although it might not make any difference)

(defn peek! [tv] (nth tv (dec (count tv))))

phronmophobic01:01:45

I would also try individually making next and visited transients to try to isolate which change might be slowing it down.

💯 2
hiredman01:01:07

it might just be that in this benchmark the JVM can bump-allocate and memcpy fast enough

hiredman01:01:59

Like, for most of those operations (get a vector without the last, getting the last, set membership, etc) there is basically no advantage to using transients

hiredman01:01:47

The only thing that might be an advantage is building the visited set (into is already using transients internally, so the non-transient version is likely benefiting from that)
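
A small sketch of the point about into: for collections that support transients, into already builds the result through a transient, so a hand-written reduce over conj! does essentially the same work. The into! helper below is an assumed shape based on the description in this thread, not a copy of the gist.

```clojure
;; Assumed shape of the hand-rolled helper discussed in the thread.
(defn into!
  "Adds every element of xs to the transient collection tcoll."
  [tcoll xs]
  (reduce conj! tcoll xs))

;; Built-in: into uses a transient internally for editable collections.
(def via-into (into #{} (range 1000)))

;; Hand-rolled: explicit transient, frozen with persistent! at the end.
(def via-transient (persistent! (into! (transient #{}) (range 1000))))
```

The two results are equal, which is consistent with the observation that the persistent version already benefits from transients wherever it calls into.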

hiredman01:01:11

And the set is a hash set, so the allocation of internal nodes of the trie will be kind of random (if I recall), so there might not be much of an allocation difference between the transient set and the persistent set

hiredman01:01:13

Nah, there would still be the path copying for the persistent version

👍 2
Célio13:01:28

Thank you for the replies I appreciate that. @U7RJTCH6J Isolating them individually is a good idea, I'll try that.

quoll15:01:50

I wouldn’t know without profiling, but on casual inspection I’m in agreement with previous comments: pop! and into! seem likely suspects.
• I don’t know how well reduce over conj! (via the into!) works in comparison to a transduce (which is what into gives you). That would be interesting to look at.
• There’s definitely a couple of extra function calls in the stack when doing peek!, though this is typically not a big expense. Using peek dispatches directly into APersistentVector.peek(), which gives you: if(count() > 0) return nth(count() - 1); Clojure does the dec via a function call, and get wraps the nth, so it’s an extra call too. But it’s probably too small to show up.
• pop! really stands out. In the general case, transient pop is fast, since it just changes the count. But if it occurs when the transient tail is empty, then it actually has to allocate a new tail array (https://github.com/clojure/clojure/blob/527b330045ef35b47a968d80ed3dc4999cfa2623/src/jvm/clojure/lang/PersistentVector.java#L854) with a copy of existing data, while the immutable pop does need a new root object, but only ever updates the count.
But the JVM can do interesting tricks, and intuition can be wrong. So you’ll never know without profiling.
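
As a small illustration of the stack operations being compared: clojure.core has pop! for transient vectors but no peek!, and peek itself is not supported on transients, so a helper has to be written with count and nth. The guard for the empty case below is an addition that mirrors the Java snippet quoted above; the names stack, top and remaining are just for the example.

```clojure
(defn peek!
  "Returns the last element of a transient vector, or nil when it is empty."
  [tv]
  (let [n (count tv)]
    (when (pos? n)
      (nth tv (dec n)))))

;; Build a small transient stack, read the top, then pop it.
(def stack (-> (transient []) (conj! :a) (conj! :b) (conj! :c)))
(def top (peek! stack))
(def remaining (persistent! (pop! stack)))
```

Here top is :c and remaining is [:a :b]. Whether the extra count/dec/nth calls matter in practice is exactly the kind of question the profiler is meant to answer.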

2
👀 2
Célio02:01:57

@U051N6TTC I replaced reduce over conj! with transduce but there was virtually no difference in runtime. Same for get vs nth. I also isolated the next and visited collections, switching them from persistents to transients and vice versa in the following scenarios:
• both next and visited using persistents: ~4.2 secs
• both using transients: ~5.6 secs
• next using persistents + visited using transients: ~4.9 secs
• next using transients + visited using persistents: ~5.4 secs
So what you said about pop! standing out sounds right, since next uses it along with peek! and conj!, while visited uses only conj!.

Célio02:01:33

I got the results above using the same dataset as before. What's interesting though is that, while I was evaluating the performance with randomly generated graphs, I found that the graph search was significantly faster with transients in scenarios where the graph had vertices with fewer neighbors on average than the dataset used in the tests above. The first graph has vertices with about 15 neighbors on average, while the randomly generated graph had an average of about 3 neighbors.

quoll02:01:42

One thing I should have said when commenting on allocating a new tail array… that’s immediately followed by System.arraycopy from the previous to the new array. Only 32 elements, but it adds up

Célio02:01:01

I guess the lesser number of neighbors correlates with calls to pop! .

quoll02:01:55

Well, everything has pop! called on it, but the frequency of crossing over the array boundaries can vary. Popping 15 will be about 5 times more likely to cross a boundary than 3

quoll02:01:30

And it’s crossing the boundary that causes a new tail array allocation followed by an array copy

👍 2
Célio02:01:00

As a side note, just for curiosity's sake I rewrote into!, replacing reduce with a loop, first using first + rest and then using peek + pop, and it was slower than the reduce version.
• first + rest: ~0.5 seconds slower
• peek + pop: ~1.5 seconds slower
But I digress 🙂

Célio23:01:42

Interesting, turns out pop! takes only a very small bite from the overall runtime. Map lookup comes first, followed by peek! which is really just count + nth. By the way thanks @U7RJTCH6J for the profiler suggestion, it's a great tool!

👍 2
phronmophobic04:01:34

It’s a great tool. I’m sure someone would have suggested it if I hadn’t. They also have a great blog, http://clojure-goes-fast.com/blog/

💯 2
didibus01:01:14

Can you type hint to primitive the inputs of an anonymous function?

(fn [^long a ^long b]
  (+ a b))

hiredman01:01:05

The tricky thing is the compiler might not generate invokePrim calls for anonymous/higher order functions

didibus02:01:59

That might explain why I also had to type hint the return where I was using it.

didibus01:01:39

Or what am I doing wrong here for the type hints?

(fn ^double [^long resource ^long bot ^long time]
            (+ resource
              (* bot time)
                (/ (* time (inc time)) 2)))

didibus01:01:53

Hum, this works:

(fn ^double [^double resource ^double bot ^double time]
                    (+ resource
                       (* bot time)
                       (/ (* time (inc time)) 2)))
Can I not type hint the same to long ?

didibus01:01:39

Ok, I think I got it:

(fn ^double [^long resource ^long bot ^long time]
                    (+ resource
                       (* bot time)
                       (/ (* time (inc time)) 2.0)))
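
One visible difference between the variants above is the divisor literal. The sketch below does not try to reproduce the original error (which is not shown in the thread); it only illustrates that dividing a long by the long literal 2 can yield a Ratio, while dividing by 2.0 yields a double, which lines up with the ^double return hint. The sample values are arbitrary.

```clojure
;; Long / long may produce a Ratio rather than a floating-point number.
(def long-division (/ 3 2))

;; Long / double produces a double.
(def double-division (/ 3 2.0))

;; The last variant from the thread, bound to a name for experimentation.
(def resource-after
  (fn ^double [^long resource ^long bot ^long time]
    (+ resource
       (* bot time)
       (/ (* time (inc time)) 2.0))))

;; Arbitrary sample call: 1 + (2 * 3) + (3 * 4) / 2.0
(def sample-result (resource-after 1 2 3))
```

With these inputs sample-result is 13.0, and long-division is the ratio 3/2 rather than 1.5.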

cddr15:01:38

A clojure team I’m in would like to invoke data-science models defined in Python. I know there’s a few ways of invoking python from a jvm program. Any opinions on what might be best fit these days? For background, our app is a data pipeline. At a high-level, we’ll be reading a stream of events, identifying batches that represent a higher-level event, and once these batches are identified, will try to apply them to a data-science model that infers some additional information we’d like to include in our output.

Hendrik15:01:53

It depends on your data and requirements. The simplest option would be to serialize the data and run Python in a subprocess. If this is not feasible, then you can have a look at: https://github.com/clj-python/libpython-clj With libpython-clj you can call Python code directly from clj in the same thread. It works very well. However, I had headaches setting it up on an M1 Mac. If either Java or Python is native ARM and the other is x86 executed via Rosetta, it crashes. Both have to have the same architecture.
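
A minimal sketch of the subprocess option, assuming a python3 executable on the PATH and a batch that is just a sequence of numbers. The inline Python snippet, the line-per-value serialization, and the names batch->stdin and score-batch are all made up for illustration; a real model would live in its own script and likely exchange JSON or a similar format.

```clojure
(require '[clojure.java.shell :refer [sh]]
         '[clojure.string :as str])

;; Hypothetical stand-in for a model: sums the numbers it reads from stdin.
(def python-snippet
  "import sys\nprint(sum(float(line) for line in sys.stdin))")

(defn batch->stdin
  "Serializes a batch of numbers as one value per line."
  [batch]
  (str/join "\n" batch))

(defn score-batch
  "Runs the Python snippet in a subprocess and parses its single-number reply.
  Assumes python3 is installed and on the PATH."
  [batch]
  (let [{:keys [exit out err]} (sh "python3" "-c" python-snippet
                                   :in (batch->stdin batch))]
    (if (zero? exit)
      (Double/parseDouble (str/trim out))
      (throw (ex-info "Python process failed" {:exit exit :err err})))))
```

Calling (score-batch [1 2 3]) starts one Python process per batch, so this trades per-call startup cost for very loose coupling. In a standalone script, clojure.java.shell leaves agent threads running, so (shutdown-agents) is usually called before exit.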

Carsten Behring16:01:04

Modeling your data pipeline as a polyglot pipeline using DVC (http://dvc.org) is another option

practicalli-johnny18:01:01

Is there value in invoking the models directly? Consider adding an API around the data science models if results can be returned in a few seconds. Or use a message queue (Kafka, etc.) and send messages to the models, with the message consumer invoking the relevant models; this sounds more appropriate considering you have a data pipeline, or if there are multiple models that could be run. Is there any value in coupling the Clojure and Python systems? This coupling could add more complexity, so it would need to be balanced against the benefit.

cddr18:01:25

Thanks everyone. Food for thought.

Hendrik18:01:34

@U05254DQM Regarding the coupling: libpython-clj uses the C API of the Python interpreter directly. So you get 2 benefits: 1. Performance. There is almost no overhead in calling Python functions. Data can be shared without copying

Hendrik18:01:18

2. State. It runs in the same process, so you do not lose your state and get something similar to a Jupyter notebook but in clj

Ben Lieberman16:01:35

If I have a Clojure type that implements a Java interface, and that Java interface's methods return other Java interfaces, is reify the correct/optimal choice or am I misunderstanding how this is used?
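
One common pattern for this situation, sketched with JDK interfaces rather than the ones in the question: java.lang.Iterable has a method that returns another interface (java.util.Iterator), and a reify can simply return a nested reify for it. The countdown function is a made-up example, not a recommendation over deftype or defrecord for every case.

```clojure
(defn countdown
  "Returns an Iterable that yields n, n-1, ..., 1."
  [n]
  (reify Iterable
    (iterator [_]
      ;; each call hands out a fresh Iterator with its own mutable cursor
      (let [remaining (atom n)]
        (reify java.util.Iterator
          (hasNext [_] (pos? @remaining))
          (next [_]
            (let [current @remaining]
              (when-not (pos? current)
                (throw (java.util.NoSuchElementException.)))
              (swap! remaining dec)
              current)))))))

;; Clojure's seq works on any Iterable
(def counted (seq (countdown 3)))
```

Here counted is the sequence (3 2 1). Only the abstract methods need to be implemented; default methods on the interfaces are inherited.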

Drew Verlee17:01:40

Does anyone have a simple example project that uses reloaded.repl/reset? i'm getting some very odd behavior e.g sometimes my components are being updated (i just change a print statement) and sometimes they aren't. I'm not sure where the issue is and seeing a simple example might give an idea.

Drew Verlee17:01:01

One thing i'm questioning is if i should be calling go right after set-init!. In our app it looks like:

(reloaded.repl/set-init! #((resolve 'centriq-web.dev/new-dev-system)))
  (reloaded.repl/go)

Drew Verlee17:01:29

that's basically our setup, and then i would expect to just call reloaded.repl/reset and have my components re-load with the new state.

Drew Verlee17:01:57

i'm also questioning what that resolve is doing for us.

Drew Verlee17:01:16

I think it just could be the

(reloaded.repl/set-init! centriq-web.dev/new-dev-system)

lukasz17:01:02

That looks like it should do it, but I ran into issues with reloaded.repl around refreshing the code and having the system running, so I built my own repl helpers - and store the running system in an atom, manage stop/start based on its state: https://github.com/lukaszkorecki/rumble/blob/master/src/rumble/repl.clj#L104 (there's more to it as it dynamically finds where the system is).

👀 2
Drew Verlee18:01:18

thanks lukasz, that looks like a bit more than i would hope i would have to chew on. for some context into my problem, context that i'm not sure is helpful. The steps i took were to change the 1 in more or less this code:

(compojure/context
    "/graphql"
    _
    (-> (compojure/routes
         (compojure/POST "/2022-06-06" _ "1"))))
to a 2 and then i ran reloaded.repl/reset which outputted :resumed (which i take to mean it worked as expected). And i still got the "1" as the output from hitting that endpoint. Any ideas on what i should look into are welcome

lukasz18:01:44

Yeah, that's exactly what I was running into :-) some namespace changes would work, and some wouldn't - so if you strip down my code to basics it's more or less (if you ignore the convention that I use to store running system in <service name>.user/SYS - my team works across ~15 services so we had to standardize this):
in the start function
• tell it where the atom is that you'll be storing the running system in
• refresh the code
• start the system and store it in the atom
in your restart fn
• stop the system
• run start
It Should Just Work ™️
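
The steps above can be sketched roughly as follows. This is not the code from the linked rumble repository: new-system and stop-system are hypothetical stand-ins for real component start/stop logic, and the code-refresh step is only marked with a comment because it needs a library such as tools.namespace.

```clojure
;; The running system lives in an atom so it survives namespace reloads.
(defonce SYS (atom nil))

;; Hypothetical stand-ins for real component lifecycle functions.
(defn new-system [] {:status :running})
(defn stop-system [sys] (assoc sys :status :stopped))

(defn start! []
  ;; a real helper would refresh changed namespaces here before starting
  (reset! SYS (new-system)))

(defn stop! []
  (when-let [sys @SYS]
    (stop-system sys)
    (reset! SYS nil)))

(defn restart! []
  (stop!)
  (start!))
```

The important property is the order: the old system is stopped before any code is refreshed, and a brand-new system value is built afterwards.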

Drew Verlee18:01:04

hmm. i'll think about that. I'm trying to understand how things are put together with what i have before i start changing it. Interestingly, the namespace does reload. It's just the defrecord that doesn't seem to change. e.g. if i change 1 to a 2 i don't see it print the 2 here:

(defrecord foo [] component/Lifecycle (start [this] (println "1") ...))

Drew Verlee18:01:30

i bet i'm not setting something up correctly with the components. Like i need to tell reloaded.repl about them?

lukasz19:01:12

Are you stopping the system, refreshing and then starting the system again?

Drew Verlee21:01:33

To my knowledge yes. I see that the namespace was reloaded, e.g if i change a print statement in the ns it picks up that change but NOT the change to the defrecord that contains the life cycle.

phill22:01:07

That's why you have to stop and reinitialize your system of components. Reloading the namespace does not alter the behavior of your existing instances of the old defrecord class. You must remake those records from the new defrecord definition. Likewise any functions held in closure by your running system. To some degree you can work around this (allowing reloading to have a useful effect) by making the protocol methods do very little other than delegate to an ordinary (defn'd) function, because those do get replaced by reload.
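
A minimal sketch of the delegation workaround described above, using a locally defined Lifecycle protocol instead of the Component library so that it is self-contained. The names Foo, start-impl and running-foo are made up for the example.

```clojure
;; Local stand-in for component/Lifecycle, to keep the sketch dependency-free.
(defprotocol Lifecycle
  (start [this]))

;; Ordinary function holding the real behavior.
(defn start-impl [_component] "1")

;; The record method does nothing except delegate to the function above.
(defrecord Foo []
  Lifecycle
  (start [this] (start-impl this)))

;; An instance created before the "reload".
(def running-foo (->Foo))
(def before-reload (start running-foo))

;; Simulate a namespace reload that only changes the plain function.
(defn start-impl [_component] "2")
(def after-reload (start running-foo))
```

The existing instance picks up the new behavior because the method body looks up start-impl through its var on every call (assuming direct linking is not enabled). Had the string been written inline in the defrecord, running-foo would keep returning "1" until it was recreated.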

phill22:01:14

In any case, stopping and starting your system should be quick, at least compared to the amount of time you could spend troubleshooting non-problems that were artifacts of incomplete activation of newly-loaded definitions.

Drew Verlee23:01:23

@U0HG4EHMH my code looks like this:

(reloaded.repl/set-init! #((resolve 'centriq-web.dev/new-dev-system)))
  (reloaded.repl/go)
To stop and then start the system i call this:
(reloaded.repl/reset)
which comes from a library which is described as > A Clojure library that implements the user functions of Stuart Sierra's reloaded workflow (http://thinkrelevance.com/blog/2013/06/04/clojure-workflow-reloaded). looking at the code
(defn reset []
  (suspend)
  (refresh :after 'reloaded.repl/resume))
(defn suspend []
  (alter-var-root #'system #(if % (suspendable/suspend %)))
  :suspended)
(defn resume []
  (if-let [init initializer]
    (do (alter-var-root #'system #(suspendable/resume (init) %)) :resumed)
    (throw (init-error))))
Based on reading the code, and from the docs, and from tutorials, this all looks correct to me.

Drew Verlee23:01:46

though i would have to dig a bit into what resume is doing, as it's not clear to me how alter-var-root is working. I feel like alter-var-root isn't exactly a celebrated function in the clojure community. I mean, i don't recall it coming up much in any of the books i read as the predominant way to manage state, followed with lots of little helpful examples.
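
For reference, a small standalone sketch of what alter-var-root does: it replaces the root value of a var with the result of applying a function to the current value. The system var and the start-system!/stop-system! helpers below are illustrative only and are much simpler than the suspend/resume functions quoted above.

```clojure
;; A plain var holding the current system (nil when nothing is running).
(def system nil)

;; Hypothetical initializer standing in for a real system constructor.
(defn init [] {:db :connected})

(defn start-system! []
  ;; ignore the old value and install a freshly built system
  (alter-var-root #'system (fn [_] (init)))
  :started)

(defn stop-system! []
  ;; transform the current value, leaving nil untouched
  (alter-var-root #'system (fn [sys] (when sys (assoc sys :db :closed))))
  :stopped)
```

After (start-system!) the var system holds {:db :connected}, and after (stop-system!) it holds {:db :closed}. The #' reader macro passes the var itself rather than its current value, which is why it shows up next to alter-var-root.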

Drew Verlee23:01:53

usually its all about STM and atoms.

Drew Verlee23:01:54

Is everyone just rolling their own versions of reloaded workflows using alter-var-root and #'s? because if so we need more educational material on it.

lukasz17:01:38

Not necessarily, but gotchas around reloading and its impact on records and protocols are definitely not mentioned enough

lukasz17:01:03

I rolled my own because I had my own needs (and wanted to understand what's going on), but fundamentally it's not that different from reloaded.repl

Drew Verlee17:01:52

I wonder if some kind of macro and a visualization tool might be used to give more visual feedback as to what's going on. Maybe even an emacs plugin that shows which state was reloaded and which is out of sync with the files (assuming that makes sense). Ideally it would just always work though, right?

lukasz17:01:57

I mean, it does work, I use my repl helpers every day - but it does require getting used to how things work, as in what's safe to refresh while the system is running (functions, vars) vs what's not (records, protocols) - it definitely is confusing. In my early days I would just restart the whole JVM process (which took ages) and I was wondering how everyone else worked, because startup times were insufferable

Ben Lieberman17:01:29

I think the following definition should give me access to a .send method, no?

(deftype MessageContext []
  JMSContext
  (createProducer [_]
    (reify JMSProducer
      (^JMSProducer send [_ ^Destination destination ^String body]
        (println "sending msg")))))
This compiles but when I try to invoke it I get no matching method send found taking two args

hiredman18:01:57

access how or where? when does it complain? compilation, runtime? are you sure you are calling .send on the object created by reify?

Ben Lieberman18:01:47

(def context (->MessageContext))

(def producer (.createProducer context))

(comment (.send producer [] "foobar"))
Runtime error (`IllegalArgumentException`). My best guess is that it doesn't like me using a PersistentVector as a Destination but idk

hiredman18:01:41

yeah, because the type of producer is not known at the call site it is a reflective call

hiredman18:01:54

reflection is looking for a method where the types match

2
hiredman18:01:04

no matching method

Alex Miller (Clojure team)22:01:42

https://clojurians.slack.com/archives/C0C4WV96U/p1673042571865769 - see for info on tickets, call for presentations, and sponsorship!

👍 9
❤️ 4
clojure-spin 13
Drew Verlee23:01:58

quick poll. What are you using to manage stopping and starting your stateful (e.g. postgres) things/components?
• https://github.com/weavejester/reloaded.repl
• something else
• you rolled your own thing using xyz clojure core functions like alter-var-root?
• atoms
• Integrant

kwladyka23:01:20

integrant, or if the above fails, kill and start the REPL

👍 4
kwladyka23:01:12

(defonce state nil)

(defn- start-system []
  (alter-var-root #'state
                  (fn [current-state]
                    (ig/init (ig/prep config)))))

(defn stop-system []
  (when state
    (alter-var-root #'state ig/halt!)))

(defn restart-system []
  (stop-system)
  (start-system))

👍 2
kwladyka23:01:01

[integrant.core :as ig]

pppaul23:01:17

integrant, mount

Benjamin C23:01:06

Have not personally tried yet, but I'm planning to try out https://github.com/nivekuil/nexus.

Drew Verlee23:01:11

I'm pretty wary of pathom stuff myself (i see it as useful for gluing together lots of dbs) but that looks interesting nonetheless. Thanks.

kwladyka00:01:20

nexus looks interesting

kwladyka00:01:39

it brings things to the next level

kwladyka00:01:01

I am curious if there are issues with current version comparing to integrant

kwladyka00:01:07

some corner cases etc.

kwladyka00:01:17

more memory consumption because of pathom / 1 sec. longer cold start or other not obvious things - just examples not suggestions

Benjamin C00:01:39

@U797MAJ8M would be better able to tell you, but last I heard it was still in the somewhat experimental stage.

Benjamin C00:01:33

@U0DJ4T5U1 I'm curious what about pathom makes you wary about it? Or maybe just the overhead of having another dependency + having to grok it when things go wrong?

Cora (she/her)01:01:51

"mostly" because I use mount only for the reloading bits but don't use it for global state

👍 2
practicalli-johnny01:01:36

Integrant is what I use when there are multiple system components, coupled with aero for config across environments. Integrant REPL supports my REPL workflow, called from a dev/user namespace

👍 2
markbastian02:01:46

I use integrant.

nivekuil06:01:54

hi, I would say nexus is more unpolished than experimental. It should generally function just fine. It probably does use more memory than integrant (a few kbs tops?) but the cold start perf should actually be faster because pathom gives us parallel initialization for free

nivekuil06:01:28

that said pathom is not trivial to learn. I think of it as a programming paradigm unto itself, turning clojure from a functional imperative language to a functional declarative language

nivekuil06:01:20

the upshot of that is if you already know/want to learn pathom it's one less thing you have to learn -- one general declarative logic engine instead of a specialized one just for dependency injection 🙂 it's also conceivable to do some fancy things with this, like initializing over network from a centralized config store

p-himik07:01:13

Integrant. Throwing looks at juxt/clip. @U797MAJ8M The example in nexus is so small and limited that I struggle to see a few things:
1. What happens when the map that I need to pass to nx/init grows to hundreds of entries? Do I then have a giant map with all the keys at one level, merged together?
2. What if I need 2 different instances of the my.ns/call-api component, initialized with different parameters?
3. The README states: "The inversion of control is good to have here, because it means we can easily stub it out for tests and such". But it doesn't actually show how you can stub things out. Especially while having the item above in mind
4. It seems to rely on a very specific form of the arglist. What if my component depends on 5 components from other namespaces? What if it doesn't know the namespace that one of its components comes from and just knows that it must be a function with side effects and no arguments?
5. This might be answered automatically if I understand answers for items 1-4, but how is it different from passing a giant map around while calling all the functions directly from each other?

👍 2
dharrigan08:01:35

donut system

👀 2
mdiin08:01:23

Component, looking to try out nexus at some point

Ben Sless10:01:40

Component, possibly with tools namespace Or just nothing

nivekuil10:01:32

@U2FRKM4TW yes the example is simple, I wrote it in a few minutes in response to an issue. feel free to request a more complex use case and I will get to it sometime. Those are all good questions, 2. in particular was asked by someone else. In general I think all 5 are answered by learning Pathom -- the README does assume that you know Pathom, I think there's no way around learning Pathom given how thin a layer nexus is atop it, and I don't think it's worth learning Pathom just for nexus (although I think Pathom is worth learning in general as clojure's killer app). But quickly:
1. yes
2. I haven't had this use case myself but it seems like an instance of a pretty typical use case for pathom, calling one resolver and getting two different outputs from it. Either a join or using params should work there.
3. you would just pass the output of the stub directly as a key in the map
4. not sure what you mean exactly, I think you just have to understand that nx/def is 99% just pco/defresolver and how pathom works
5. this is the difference between imperative (telling the computer the steps to do) and declarative (telling the computer the result you want). The benefit is especially visible when you have a very deep graph of dependencies

p-himik11:01:33

I see, thank you. I'm postponing delving into Pathom till it stops getting new major versions. :) The memories of Django and Python still haunt me.

jpmonettas14:01:42

I use components, mount and sometimes a custom thing, together with tools namespace for refreshing. Also with some Emacs menus so I can move the system into different states with hotkeys as I demoed here: https://youtu.be/2nH59edD5Uo?t=959 which I think makes a big difference in dev flow once you get used to it

upvote 1
igrishaev12:01:22

Usually, component/integrant is enough. What's much more important is how you start/stop the system in tests, and how you override some parts of it.

Thierry21:01:15

I use mount.lite, mostly because it was already used in the project I maintain for work. Recently updated it to the newest version. https://github.com/aroemers/mount-lite Besides this I use ragtime for migrations and hugsql for prepared statements.