#beginners
2021-12-08
popeye06:12:37

I saw this code in reagent doc

(defn timer-component []
  (let [seconds-elapsed (reagent/atom 0)]   
    (fn []     
      (js/setTimeout #(swap! seconds-elapsed inc) 1000)
      [:div "Seconds Elapsed: " @seconds-elapsed])))

popeye06:12:56

can we also write this as ?

(defn timer-component []
  (let [seconds-elapsed (reagent/atom 0)]   
    (do
    (js/setTimeout #(swap! seconds-elapsed inc) 1000)
      [:div "Seconds Elapsed: " @seconds-elapsed])))

phronmophobic07:12:19

the first version returns a function, but the second returns a vector. This makes a difference in how reagent handles it. There's an explanation that follows:
> The previous example also uses another feature of Reagent: a component function can return another function, that is used to do the actual rendering. This function is called with the same arguments as the first one.
> This allows you to perform some setup of newly created components without resorting to React’s lifecycle events.

popeye07:12:25

Thanks for the response @U7RJTCH6J. But how is it useful if it returns a function?

phronmophobic07:12:38

reagent will check to see if you returned a function or not. if you do return a function, reagent will assume that your component just does the setup when called and that it returns the actual render function
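
The check described above can be modelled in plain Clojure. This is only a toy sketch, not Reagent's actual implementation: `make-renderer`, `form-1`, `form-2`, and `setups` are hypothetical names, and real Reagent does considerably more (reactivity, React integration).

```clojure
;; Toy model of "did the component return a function?". NOT Reagent's real code.
(defn make-renderer
  "Calls `component` once. If the result is a function, that function becomes
  the render fn (so the outer body acted as one-time setup). Otherwise the
  component itself is used as the render fn."
  [component & args]
  (let [result (apply component args)]
    (if (fn? result)
      #(apply result args)
      #(apply component args))))

(def setups (atom 0))

(defn form-2 [label]
  (swap! setups inc)            ; setup: runs once, when the component is created
  (fn [label] [:div label]))    ; render: runs on every re-render

(defn form-1 [label]
  [:div label])                 ; no setup step, the body is the render

(def render-2 (make-renderer form-2 "hi"))
(def render-1 (make-renderer form-1 "hi"))

(render-2)
(render-2)
;; @setups is still 1 here: re-rendering did not repeat the setup
```

In the second snippet from the question, the `let` would sit inside the render path, so a fresh atom would be created on each render instead of once.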

popeye07:12:52

oh so it is reagent specific ? I just started to learn reagent

phronmophobic07:12:40

yea, it's a reagent specific thing

popeye07:12:01

oh ok! thanks @U7RJTCH6J

👍 1
popeye07:12:28

do I need to learn react for it ?

phronmophobic07:12:44

I'm not sure it's completely necessary, but it would definitely help.

Benjamin08:12:51

Where can i read on what rules are and how to code with rules?

Benjamin08:12:09

From what I have gathered it is using maps instead of conditionals

cddr10:12:14

Is this the kind of thing you're thinking of? http://www.clara-rules.org/docs/approach/

Benjamin10:12:10

gonna check thanks

Romit Gandhi10:12:58

Hello all, I want to generate XML dynamically from data (like JSON). Does anyone know how to do it? Thanks. Eg: Data: {:user [{:name "Romit"} {:name "Demo"}]} should be converted to <root> <user> <name> "Romit" </name> </user> <user> <name> "Demo" </name> </user> </root> and likewise. Thanks.

ryan11:12:42

Maybe a library like https://github.com/clojure/data.xml will help. Could use hiccup format with sexp-as-element
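
As a rough sketch of the hiccup route mentioned above: the helper below (`->hiccup`, a hypothetical name) only builds the hiccup-style vector from the nested data using core Clojure. Handing the result to `sexp-as-element` is left as a comment, since that needs data.xml on the classpath.

```clojure
(defn ->hiccup
  "Returns a seq of hiccup-style elements for value `v` under `tag`.
  Maps become one element with child elements, sequential values repeat
  the tag once per item, anything else becomes a text node."
  [tag v]
  (cond
    (map? v)        [(into [tag] (mapcat (fn [[k child]] (->hiccup k child)) v))]
    (sequential? v) (mapcat #(->hiccup tag %) v)
    :else           [[tag (str v)]]))

(def data {:user [{:name "Romit"} {:name "Demo"}]})

(def root (first (->hiccup :root data)))
;; root => [:root [:user [:name "Romit"]] [:user [:name "Demo"]]]

;; Assumption: with clojure.data.xml required as `xml`, this vector could then
;; be passed along, e.g. (xml/emit-str (xml/sexp-as-element root))
```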

maverick11:12:51

How can I convert Java Map<String,String> to clojure map ?

Jelle Licht11:12:22

(into {} your-map)

Jelle Licht11:12:03

But in most cases, there’s no need to do this, as many clojure functions already work on Map
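
A small sketch of both points, using a `java.util.HashMap` built by hand as a stand-in for whatever Java API returns the map:

```clojure
;; Stand-in for a Map<String,String> coming from Java code.
(def java-map
  (doto (java.util.HashMap.)
    (.put "a" "1")
    (.put "b" "2")))

;; Copy into a persistent Clojure map.
(def clj-map (into {} java-map))

;; Many core functions already work on the Java map directly.
(def direct-lookup (get java-map "a"))
(def direct-keys   (set (keys java-map)))
```

Converting is mostly worthwhile when you need an immutable value or want to use functions such as `assoc` and `update`, which expect a Clojure map.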

maverick11:12:54

Okay thanks

sheluchin15:12:53

I have a very large seq like this:

((([4 4] [4 4] [4 4]))                             
 (([4 4] [4 4] [4 4]))                             
 (([0 2])                                          
  ([0 1])                                          
  ([0 2])                                          
  ([0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1])
  ([0 1])))                                        
I need to compute two numbers: a) the sum of all first integers in each entry b) the sum of all second integers in each entry Is there an efficient approach for doing this? I think at the moment it's overloading my memory and slowing down as it gets further ahead. I guess I need to partition-all of it to operate on smaller chunks and then reduce the totals?

pithyless15:12:31

(filter vector? (tree-seq list? identity data)) will give you a seq of tuples; then you can use reduce , combine with transducers, etc. to get the sums you want
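
A runnable sketch of that suggestion on the sample data from the question. One deliberate change, which is an assumption about the real data: it uses `seq?` instead of `list?` as the branch test, because `list?` returns false for lazy sequences, and data produced by `map` or `for` would then not be expanded.

```clojure
(def data
  '((([4 4] [4 4] [4 4]))
    (([4 4] [4 4] [4 4]))
    (([0 2])
     ([0 1])
     ([0 2])
     ([0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1])
     ([0 1]))))

;; Walk the nesting and keep only the leaf tuples.
(def tuples (filter vector? (tree-seq seq? identity data)))

;; Accumulate both sums in a single pass.
(def totals
  (reduce (fn [[a b] [x y]] [(+ a x) (+ b y)])
          [0 0]
          tuples))
;; totals => [24 38]
```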

sheluchin19:12:04

tree-seq is kind of hard to understand..

sheluchin19:12:11

Like here:

(tree-seq list? identity '((1 2 (3)) (4) 5))
I'm expecting 5 to be excluded from the result because list? tests false for it. But it returns:
(((1 2 (3)) (4) 5) (1 2 (3)) 1 2 (3) 3 (4) 4 5)

sheluchin19:12:21

(tree-seq branch? children root)
> Will only be called on nodes for which branch? returns true. So shouldn't it NOT call identity (`children`) on 5?

pithyless20:12:44

@UPWHQK562 it didn't, but it did call it on the list that included 5; and on that list identity returns all the elements of the list - including 5

pithyless20:12:06

user=> (tree-seq list? #(do (prn %) %) '(((1 2 (3)) (4) 5) (1 2 (3)) 1 2 (3) 3 (4) 4 5))

((((1 2 (3)) (4) 5) (1 2 (3)) 1 2 (3) 3 (4) 4 5)
((1 2 (3)) (4) 5)
(((1 2 (3)) (4) 5) (1 2 (3)) 1 2 (3) 3 (4) 4 5) (1 2 (3))
((1 2 (3)) (4) 5) (1 2 (3)) 1 (3)
2 (3) (4)
3 (4) 4 (1 2 (3))
5 (1 2 (3)) 1 (3)
2 (3) 3 1 (3)
2 (3) 3 (4)
3 (4) 4 4 5)

pithyless20:12:33

^ each of those lines is what is being called with identity

pithyless20:12:38

the trick is that all the other elements in the nested lists will still be returned by tree-seq - but only those that are branch? will also be expanded via children

pithyless20:12:12

> I'm expecting 5 to be excluded from the result because `list?` tests false for it. But it returns:
Don't think of tree-seq as filter or remove; instead, it's like seq, but instead of just returning all the elements in the list, it will also recursively expand (kind of like a recursive mapcat) anything that looks like a nested branch. So tree-seq will return the same number of items as seq (or perhaps more), and then you need to filter as usual in a second step.
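
To see this without interleaved REPL output, one option is to record the `children` calls in an atom and force the walk with `doall`. The names `calls`, `nodes`, and `leaves` are just for illustration.

```clojure
(def calls (atom []))

(def nodes
  (doall
   (tree-seq list?
             (fn [node]
               (swap! calls conj node)   ; remember what `children` was called on
               node)
             '((1 2 (3)) (4) 5))))

;; nodes  => (((1 2 (3)) (4) 5) (1 2 (3)) 1 2 (3) 3 (4) 4 5)
;; @calls => only the four lists; 5 is never passed to `children`

;; tree-seq returns branches and leaves alike, so filter in a second step.
(def leaves (remove list? nodes))
;; leaves => (1 2 3 4 5)
```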

Sam Ritchie04:12:17

(transduce (filter vector?)
           (fn
             ([acc] acc)
             ([[l-acc r-acc] [l r]]
              [(+ l-acc l) (+ r-acc r)]))
           [0 0]
           (tree-seq (complement vector?) identity input))

Sam Ritchie04:12:03

@UPWHQK562 this is how you would use transduce to filter out just the vectors (if that is indeed how leaves are defined), and how you can accumulate both sums in parallel

sheluchin14:12:18

Thanks @U017QJZ9M7W and @U05476190. I'm going to have to take some time to study this stuff. Still, no matter what I try, I can't seem to get this to work efficiently. It starts to get quite slow due to low memory after a while. Here's the snippet and the results:

(doseq [part (->> (extract-data)                                                               
                  (map transform1)                                                             
                  (map (fn [{:keys [x data]}]                                                  
                         ; returning this instead produces a result very quickly               
                         ; so I'm convinced I've isolated the latency to this code             
                         ; {:x x :y 0 :z 0}                                                    
                         (let [[y z] (transduce (filter vector?)                               
                                                (fn                                            
                                                  ([acc] acc)                                  
                                                  ([[l-acc r-acc] [l r]]                       
                                                   [(+ l-acc l) (+ r-acc r)]))                 
                                                [0 0]                                          
                                                (tree-seq (complement vector?) identity data))]
                           {:x x :y y :z z})))                                                 
                  ; tried all kinds of partition sizes, not much difference                    
                  (partition-all 50))]                                                         
  (prn (count part))))                                                                         
;; returning static map
13:56:38.459Z - 1000
13:56:38.637Z - 1000
13:56:38.826Z - 1000
13:56:38.993Z - 1000
13:56:39.171Z - 1000
13:56:39.354Z - 1000
13:56:39.508Z - 1000
13:56:39.646Z - 1000
13:56:39.789Z - 1000
13:56:39.949Z - 1000
13:56:40.125Z - 1000
13:56:40.242Z - 1000
13:56:40.351Z - 979
"Elapsed time: 2081.846321 msecs"

;; transducing totals
13:57:27.456Z - 1000
13:57:30.291Z - 1000
13:57:33.112Z - 1000
13:57:35.567Z - 1000
13:57:39.327Z - 1000
;; major slowdown here
13:58:06.428Z - 1000
13:58:30.104Z - 1000
13:58:33.682Z - 1000
13:58:43.273Z - 1000
13:58:49.193Z - 1000
13:58:50.857Z - 1000
13:58:51.640Z - 1000
13:58:52.349Z - 979
"Elapsed time: 86573.092394 msecs"
Any suggestions for how I might fix this? I was under the impression that by using lazy sequences, partitioning them into small parts, and then realizing them with doseq, I could make this efficient even with a small amount of available memory, but I can't seem to get it to work well.

Sam Ritchie14:12:45

@UPWHQK562 can you put up a gist with some slow example data?

sheluchin17:12:35

@U017QJZ9M7W I'm trying but when I dump the data to an edn file and run my minimal example code against it, it's fast lol I guess I'm missing something here. Something about extract-data or transform1 must be adding load and leading to latency.

sheluchin17:12:14

Going to try profiling it with https://github.com/ptaoussanis/tufte and seeing if that gives some insights.

sheluchin19:12:27

I'm not having much luck figuring this out. When I export the data to a file and process its lines, it processes quickly. When I process the data in place (without exporting), the transduce-totals function @U017QJZ9M7W provided above eats up the majority of the processing time and it's quite slow overall. Here's some profile data with each snippet:

(profile {}
  (doseq [part (->> (extract-data)
                    (map #(p :details (transform1 %)))
                    (map #(p :totals (transduce-totals %))))]
   (log/info)
   (prn (count part))))

; pId           nCalls        Min      50% ≤      90% ≤      95% ≤      99% ≤        Max       Mean   MAD      Clock  Total
;
; :totals       10,265     2.93μs   462.61μs     3.11ms     6.19ms    45.93ms     6.60s      5.95ms ±163%     1.02m     91%
; :details      10,265    58.57μs   208.15μs   390.52μs   551.75μs     2.73ms    64.50ms   338.06μs  ±69%     3.47s      5%
;
; Accounted                                                                                                   1.08m     97%
; Clock                                                                                                       1.11m    100%


(defn slow-sample []
  (with-open [rdr ( exported-edn)]
    (doseq [part (->> (line-seq rdr)
                      (map read-string)
                      (map transduce-totals))]
      (prn (count part)))))

(time (slow-sample))
; => "Elapsed time: 1340.89125 msecs"

Sam Ritchie19:12:15

It is hard to help without seeing those other functions. Can you put together a small project that can reproduce this?

Sam Ritchie19:12:53

Also is there any structure to this input beyond “nested sequences that eventually bottom out in vectors?”

sheluchin20:12:43

These are operations on a git repo to extract its log data using jgit. The input at the transduce-totals step of the ->> is just a seq of maps. Does what happens in the previous steps matter in such a scenario? I thought the steps would be isolated from each other.

sheluchin20:12:18

They are all maps exactly like this:

{:sha "cd946b229bd2316cfe8c336badb0392b38c81015", :changes (([4 4] [4 4] [4 4]))}
{:sha "c4cd9d58808ce00916a495bf03b5706c07b8a148", :changes (([0 2]) ([0 1]) ([0 2]) ([0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1]) ([0 1]) ([0 1] [0 1] [0 1] [0 1] [0 1]) ([0 1]) ([0 4] [0 3] [0 2] [0 2] [1 1] [0 6] [0 2]) ([0 2] [0 13]) ([0 1]) ([19 0] [1 2] [1 1] [1 1] [21 13]) ([0 7]) ([0 9]) ([0 9]) ([0 1] [0 1] [0 27]))}
{:sha "3742992e0204ed8a1b2b559cf5f34afb1805b8e3", :changes (([0 1] [1 1] [0 2] [0 1] [0 1] [1 1] [1 2] [1 3] [0 1] [0 1]) ([1 1] [1 1] [1 1] [1 1] [1 1] [1 1]) ([2 3]))}
{:sha "bf9445c365f663d484b4fce480cd77456e56d0b1", :changes (([2 1] [2 1]) ([0 3]))}
{:sha "ef18886ade283a0c51ca5568cc06a0ae78574609", :changes (([1 1]) ([1 1] [1 1] [1 1] [1 1]) ([1 1]) ([1 1]) ([0 4]) ([5 5] [3 3]) ([1 1] [1 1] [1 1] [1 1] [1 1]) ([1 1] [1 3] [2 1] [0 10] [13 0] [1 1]) ([0 170]) ([0 46]) ([0 246]) ([1 1] [1 1] [6 2] [1 1] [1 1] [1 1] [1 1] [1 1] [1 1] [1 1] [1 1] [1 1] [1 1]) ([1 1] [1 1]) ([0 1] [0 1] [2 39] [1 1] [8 0]) ([1 0] [1 0] [1 0] [1 0] [1 0] [16 0] [90 0] [70 0]))}

sheluchin20:12:43

I can work on a minimal example project. At this point it's more about understanding why I'm getting this behaviour and how to avoid it rather than fixing this exact implementation. I can think of other ways to get the data - there are workarounds.

Sam Ritchie20:12:41

Something is hanging on to the head of the sequence as you materialize it from jgit, in a way that is not happening when you pull from the file

Sam Ritchie20:12:53

When I get home shortly I will show a better way of processing this

🙏 1
Sam Ritchie21:12:22

@UPWHQK562 okay, here we go. and what is the final result you want from that?

Sam Ritchie21:12:34

totals for each map, along with one of the keys extracted?

Sam Ritchie21:12:27

if you want the global totals, try this:

(defn global-totals
  "the first transducer pulls out the `:changes` entry for each map and
  concatenates them all together. The second one, `cat`, concatenates all the
  subsequences together. This will feed only the vectors into the reducing
  function."
  [data]
  (let [xform (comp (mapcat :changes) cat)
        f     (completing
               (fn [acc item]
                 (mapv + acc item)))]
    (transduce xform f [0 0] data)))

sicmutils.env> (time (global-totals (take 100000 (cycle inputs))))
"Elapsed time: 2386.474 msecs"
[6440000 14540000]

Sam Ritchie21:12:37

this can do 100,000 maps in about 2.4 seconds

sheluchin21:12:15

The changes key is a set of tuples. I need the sum of the first positions within each tuple (per map) and the sum of the second positions, likewise per map. The sha key just gets extracted and is used to identify what commit each sum pair belongs to.

sheluchin21:12:50

So result should be a set of maps, each with a sha and changes reduced into deletions/insertions. The computation was producing the correct value already, it's just too slow.

Sam Ritchie21:12:01

awesome, let me post something to try that does it per map

Sam Ritchie21:12:15

that should be absolutely no problem so the slowdown has gotta be somebody holding onto the full sequence of maps

Sam Ritchie21:12:48

that function above is a good one to stare at if you have not used the transducer idea yet

Sam Ritchie21:12:00

(while elevator music plays and I type)

Sam Ritchie21:12:02

(defn global-totals [data]
  (transduce (comp (mapcat :changes) cat)
             (completing
              (partial mapv +))
             [0 0]
             data))

Sam Ritchie21:12:13

alternate way of writing it, with no let to give things names, and a partial use for fun
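
To see what each stage of that `xform` feeds to the reducing function, the stages can be run on their own with `into`. The `sample` data here is made up for illustration.

```clojure
(def sample
  [{:sha "a" :changes '(([1 2] [3 4]) ([5 6]))}
   {:sha "b" :changes '(([7 8]))}])

;; Stage 1: (mapcat :changes) yields the inner groups of every map.
(def after-mapcat (into [] (mapcat :changes) sample))
;; => [([1 2] [3 4]) ([5 6]) ([7 8])]

;; Stage 2: adding `cat` flattens each group, leaving only the tuples.
(def after-cat (into [] (comp (mapcat :changes) cat) sample))
;; => [[1 2] [3 4] [5 6] [7 8]]

;; The reducing step then sees one tuple at a time.
(def totals
  (transduce (comp (mapcat :changes) cat)
             (completing (partial mapv +))
             [0 0]
             sample))
;; => [16 20]
```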

sheluchin21:12:23

Thanks very much for helping here. I might be a little slow to respond, just afk right now... But I'll definitely dig right into this.

Sam Ritchie21:12:15

(defn change-sum
  "Collapses a sequence of changes into a pair of sums; the first entry is the sum
  of all first entries in the leaves of each changeset, the second is the sum of
  all second entries."
  [xs]
  (let [f (completing (partial mapv +))]
    (transduce cat f [0 0] xs)))

(defn sum-changes [m]
  (update m :changes change-sum))

Sam Ritchie21:12:25

sicmutils.env> (map sum-changes inputs)
({:sha "cd946b229bd2316cfe8c336badb0392b38c81015", :changes [12 12]} {:sha "c4cd9d58808ce00916a495bf03b5706c07b8a148", :changes [44 127]} {:sha "3742992e0204ed8a1b2b559cf5f34afb1805b8e3", :changes [12 23]} {:sha "bf9445c365f663d484b4fce480cd77456e56d0b1", :changes [4 5]} {:sha "ef18886ade283a0c51ca5568cc06a0ae78574609", :changes [250 560]})

Sam Ritchie21:12:58

@UPWHQK562 change-sum does what you want for the entry under :changes, and then sum-changes uses that to make a function that processes each map individually

Sam Ritchie21:12:18

so now if you do (map sum-changes inputs), you will get a lazy sequence of transformed maps, with that entry updated

Sam Ritchie21:12:47

slightly slower, roughly 2.3 seconds to do 100k maps

sicmutils.env> (time (nth (map sum-changes (cycle inputs))  100000))
"Elapsed time: 2319.499959 msecs"
{:sha "cd946b229bd2316cfe8c336badb0392b38c81015", :changes [12 12]}

Sam Ritchie21:12:27

but even at 500k entries if I hold on to the head by binding it like this:

(def input-hold (cycle inputs))

sheluchin21:12:51

Do you think using this code instead of the previous function will resolve something holding the whole seq in memory?

Sam Ritchie21:12:18

well, I am less convinced now that my guess was right, since these maps are not big…

Sam Ritchie21:12:45

but yeah I don’t reach for tree-seq much, so it could be that collapsing all of the maps into one and then using tree-seq materializes a huge amount of stuff?

Sam Ritchie21:12:00

but I would be very surprised if this approach is slow for you (though I’m prepared to be surprised 🙂)

sheluchin21:12:51

I'll try and post results. Probably tomorrow. It could be RevCommit objects being held in the JVM.

sheluchin21:12:48

Thanks again @U017QJZ9M7W. This community is amazing!

❤️ 1
sheluchin14:12:59

@U017QJZ9M7W it turns out that just one or two of the changesets in the repo I'm exploring contain a large number of changes - in the hundreds of thousands. I tried it using your code above and it still chokes up and performs pretty slowly, which makes sense, because change-sum will hold all that data when transducing it.

pId              nCalls        Min      50% ≤      90% ≤      95% ≤      99% ≤        Max       Mean   MAD      Clock  Total

:sum-changes     10,265     1.87μs   366.70μs     2.29ms     4.60ms    36.34ms     5.78s      5.26ms ±167%    54.03s     93%
:details         10,265    82.01μs   166.60μs   324.75μs   463.01μs   982.93μs    83.94ms   239.72μs  ±59%     2.46s      4%

Accounted                                                                                                     56.49s     98%
Clock                                                                                                         57.86s    100%
It is an edge case, but I think I will come across it once in a while.

Sam Ritchie14:12:12

@UPWHQK562 it should not hold anything in memory while transducing - a transducer will happily walk and realize a lazy sequence, transducing as it goes. probably the code producing the changesets is realizing it all in memory, vs realizing it in a lazy way (or with a java iterator for example)

Sam Ritchie14:12:51

for example:

(def example-changeset
  '(([0 2])
    ([0 1])
    ([0 2])
    ([0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1] [0 1])
    ([0 1])
    ([0 1] [0 1] [0 1] [0 1] [0 1])
    ([0 1])
    ([0 4] [0 3] [0 2] [0 2] [1 1] [0 6] [0 2])
    ([0 2] [0 13])
    ([0 1])
    ([19 0] [1 2] [1 1] [1 1] [21 13])
    ([0 7])
    ([0 9])
    ([0 9])
    ([0 1] [0 1] [0 27])))

sheluchin14:12:17

But then wouldn't that show as latency in those steps of the ->> instead of in the sum-changes step? I'm confused why the time is being consumed in this particular portion of the ->> if it's really happening in another step.

Sam Ritchie14:12:20

15 items; you can use cycle to get an infinite lazy stream of those elements, repeating, so we can check the function’s performance on big stuff

Sam Ritchie14:12:45

2.5 seconds for 1M items:

sicmutils.env> (time (change-sum
                 (take 1000000
                       (cycle example-changeset))))
"Elapsed time: 2578.9755 msecs"
[2933305 8466638]

Sam Ritchie14:12:12

@UPWHQK562 I think the key to debugging is to try and isolate a single record that is going very slow

sheluchin14:12:27

I have that record.

Sam Ritchie14:12:45

what does its :changes field look like?

Sam Ritchie14:12:47

can you gist?

Sam Ritchie14:12:27

I agree that 5.78s is very strange

Sam Ritchie14:12:35

unless it is 2M items!

sheluchin14:12:43

Yep, gisting it. 5s is strange, but the total time there is about a minute usually!

Sam Ritchie14:12:17

it is faster btw if we skip the mapv thing and just directly add the items

Sam Ritchie14:12:22

(defn change-sum
  "Collapses a sequence of changes into a pair of sums; the first entry is the sum
  of all first entries in the leaves of each changeset, the second is the sum of
  all second entries."
  [xs]
  (letfn [(f
            ([] [0 0])
            ([acc] acc)
            ([[l-acc r-acc] [l r]]
             [(+ l-acc l) (+ r-acc r)]))]
    (transduce cat f xs)))

Sam Ritchie14:12:27

a little more than 2x faster

Sam Ritchie14:12:59

sicmutils.env> (time (change-sum
                 (take 1000000
                       (cycle example-changeset))))
"Elapsed time: 1327.460958 msecs"
[2933305 8466638]

Sam Ritchie14:12:44

I also took away “completing”, since all it does is add that single-arity version that just returns the result; and I added a 0-arity that provides the starting value for the transduce
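
A small sketch of what those arities do, with made-up names (`add-pair`, `add-pair*`, `pairs`). `identity` is used as a no-op transducer so only the reducing function varies.

```clojure
(def pairs [[1 2] [3 4] [5 6]])

;; Two-arity step function only.
(defn add-pair [[l-acc r-acc] [l r]]
  [(+ l-acc l) (+ r-acc r)])

;; transduce calls the reducing fn with one argument at the end, so a bare
;; two-arity fn fails with an arity error.
(def bare-fails?
  (try
    (transduce identity add-pair [0 0] pairs)
    false
    (catch clojure.lang.ArityException _ true)))

;; `completing` adds the missing one-argument arity.
(def with-completing
  (transduce identity (completing add-pair) [0 0] pairs))

;; Writing all three arities by hand: zero args supplies the initial value,
;; one arg finishes, two args is the step.
(defn add-pair*
  ([] [0 0])
  ([acc] acc)
  ([[l-acc r-acc] [l r]] [(+ l-acc l) (+ r-acc r)]))

(def with-arities (transduce identity add-pair* pairs))
;; both results => [9 12]
```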

Sam Ritchie14:12:37

haha that record is a monster @UPWHQK562, my browser tab is choking!

Sam Ritchie14:12:48

on my machine (and you had found something like this before) summing the values in that record takes 20ms

Sam Ritchie14:12:10

when you isolate it do you see that too?

sheluchin14:12:13

lol I know 🙂 Transduce is something I've avoided up until now because it seemed like complexity that I didn't need as a beginner, but now that I'm running into performance issues of this kind, I guess it's time to watch Rich's talk on transducers and do some studying. The concept is not clear.

Sam Ritchie14:12:24

here is how to think about it -

Sam Ritchie14:12:11

in this case these are identical:

(defn change-sum
  "Collapses a sequence of changes into a pair of sums; the first entry is the sum
  of all first entries in the leaves of each changeset, the second is the sum of
  all second entries."
  [xs]
  (letfn [(f
            ([] [0 0])
            ([acc] acc)
            ([[l-acc r-acc] [l r]]
             [(+ l-acc l) (+ r-acc r)]))]
    (reduce f (mapcat identity xs))))

(defn change-sum
  "Collapses a sequence of changes into a pair of sums; the first entry is the sum
  of all first entries in the leaves of each changeset, the second is the sum of
  all second entries."
  [xs]
  (letfn [(f
            ([] [0 0])
            ([acc] acc)
            ([[l-acc r-acc] [l r]]
             [(+ l-acc l) (+ r-acc r)]))]
    (transduce cat f xs)))

Sam Ritchie14:12:18

don’t worry about how it works, just think about it as a combo of some mapping / filtering / mapcatting transformation step and a reduce at the same time. if you do it the first way, (mapcat identity xs) is going to make a new sequence; but then that sequence is immediately eaten up by the reduce and collapsed into the final counts

Sam Ritchie14:12:26

so transduce is reduce with an extra slot for a “transform”
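
A minimal side-by-side of that idea on made-up input. Note that `comp` on transducers applies the steps left to right, which is the reverse of how the nested sequence calls read.

```clojure
(def xs (range 10))

;; Sequence version: filter and map each build an intermediate lazy seq,
;; which reduce then consumes.
(def via-reduce
  (reduce + 0 (map inc (filter odd? xs))))

;; Transducer version: the same filtering and mapping go in the extra slot,
;; and no intermediate sequences are created.
(def via-transduce
  (transduce (comp (filter odd?) (map inc)) + 0 xs))
;; both => 30
```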

sheluchin14:12:42

I gotta run out for a short while unfortunately. Duty calls 🙂 will be back shortly to dig back into this.

Sam Ritchie14:12:45

@UPWHQK562 in this case, it is cat because we have a sequence of sequences of vectors, and we want to concatenate them all

Sam Ritchie14:12:11

@UPWHQK562 I think your issue is somewhere in how this data is getting produced; maybe it is streaming from jgit, and there is some rate limiting thing going on?

Sam Ritchie14:12:21

so that you block waiting for that :changes entry to appear

Sam Ritchie14:12:00

you could test this by timing (def all-data (doall (get-the-data))) , where doall will force the whole sequence. then separately time the reduction

sheluchin16:12:16

(time
  (porc/with-repo repo-path
   (let [df (#'querying/diff-formatter-for-changes repo)
         old-tree-iter (EmptyTreeIterator.)
         reader (.newObjectReader (.getRepository repo))]
      (time (def all-data (doall (->> (extract-repo-commits repo)
                                      (map #(detailed-changed-files repo % df old-tree-iter reader))))))
      (time (doall (map transduce-totals all-data))))))
"Elapsed time: 1890.655849 msecs"
"Elapsed time: 47884.185641 msecs"
"Elapsed time: 49778.123504 msecs"

sheluchin16:12:21

@U017QJZ9M7W still looks like the slow part is the reduction... for some reason I totally don't understand.

sheluchin16:12:23

With the newest change-sum and isolating it to the single record:

(time
      (porc/with-repo repo-path
       (let [df (#'querying/diff-formatter-for-changes repo)
             old-tree-iter (EmptyTreeIterator.)
             reader (.newObjectReader (.getRepository repo))]
          (->> (porc/git-log repo :since "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3^"
                                  :until "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3"
                                  :rev-filter (. RevFilter NO_MERGES))
               (map #(detailed-changed-files repo % df old-tree-iter reader))
               (map sum-changes-2)
               doall)))))
({:sha "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3", :changes [65755 188558]})
"Elapsed time: 8077.543741 msecs"
({:sha "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3", :changes [65755 188558]})
"Elapsed time: 5888.550605 msecs"
({:sha "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3", :changes [65755 188558]})
"Elapsed time: 6133.369632 msecs"
({:sha "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3", :changes [65755 188558]})
"Elapsed time: 6327.125068 msecs"
({:sha "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3", :changes [65755 188558]})
"Elapsed time: 6329.911996 msecs"

Sam Ritchie17:12:51

Sorry, did this isolate the fetch, then once the fetch is done with doall, THEN do the sum?

sheluchin17:12:24

https://clojurians.slack.com/archives/C053AK3F9/p1639153516124200?thread_ts=1638975773.014100&cid=C053AK3F9 that one did it that way but using transduce-totals. Let me fix up the most recent snippet that isolates it.

sheluchin17:12:37

(time
  (porc/with-repo repo-path
   (let [df (#'querying/diff-formatter-for-changes repo)
         old-tree-iter (EmptyTreeIterator.)
         reader (.newObjectReader (.getRepository repo))]
     (time (def all-data (doall (->> (porc/git-log repo :since "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3^"
                                                        :until "439fd78cd426b7ba2b1a1cba0c712b018a5c27c3"
                                                        :rev-filter (. RevFilter NO_MERGES))
                                     (map #(detailed-changed-files repo % df old-tree-iter reader))))))
     (time (doall (map sum-changes-2 all-data))))))
"Elapsed time: 7.821379 msecs"
"Elapsed time: 4990.340047 msecs"
"Elapsed time: 5000.335191 msecs"
Looks like it's quick as can be to sum-changes when it's isolated to a single record.

sheluchin17:12:40

Do you think the detailed-changed-files is holding onto all the data, filling memory, and then making sum-changes-2 look slow due to lack of available memory when they aren't done as separate steps?

Sam Ritchie17:12:47

yeah I bet detailed-changed-files is doing something lazy, and when you actually go to access each record it forces it to go hit the network or something

Sam Ritchie17:12:14

@UPWHQK562 it is very very suspicious to me that querying all of those records etc, populating all the lists, vectors maps etc would take 7ms

Sam Ritchie17:12:56

the doall was an attempt to force side effects, but in this case I think you need to do the equivalent of mapping doall across the sequence and forcing that
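
A sketch of why the outer `doall` is not enough. `make-record` is a made-up stand-in for a function that returns a map with a lazy `:changes` seq, and the `realized` counter records how many inner elements have actually been computed.

```clojure
(def realized (atom 0))

(defn make-record [n]
  {:id n
   :changes (for [i (range 3)]
              (do (swap! realized inc)   ; count each inner element computed
                  [i i]))})

;; Forces the outer sequence of maps only; the inner seqs stay unrealized.
(def records (doall (map make-record (range 2))))
(def after-outer @realized)
;; after-outer => 0

;; Mapping doall across the nested entry forces the inner work as well.
(def forced (doall (map #(update % :changes doall) records)))
(def after-inner @realized)
;; after-inner => 6
```

With this shape of data, timing the "fetch" step measures almost nothing, and the real cost shows up in whichever later step first walks `:changes`.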

Sam Ritchie17:12:13

does that make sense?

sheluchin17:12:36

It might take 7ms because in that last snippet I'm limiting the output to a single revision by using the same since/until.

sheluchin17:12:34

detailed-changed-files has a pair of nested for-loops to go through each entry in the diff list. No network calls or anything like that, all local.

Sam Ritchie17:12:32

How about this- can you do the aggregation call twice and time each one?

Sam Ritchie17:12:38

See if it is faster the second time

pithyless17:12:48

@UPWHQK562 I suspect all-data is keeping some nested lazy data. Normally I'd say force a realization, eg. spit it all to a file, then slurp it back - but if this is not feasible due to Java objects, then evaluating over the same collection twice as @U017QJZ9M7W suggests should do the trick.

sheluchin18:12:06

@U05476190 yeah, I've actually gone through that above. The spit/slurp thing makes it so reading the data is very quick. It's an available workaround, but doesn't really advance my understanding of how to do this properly/better next time around 🙂

Sam Ritchie18:12:58

The way forward usually is to try and isolate the piece that you force, and figure out which tiniest function call is slow

sheluchin18:12:24

> How about this- can you do the aggregation call twice and time each one? So get all the data and then pass it to the transducer twice?

Sam Ritchie18:12:05

(time (doall (map sum-changes-2 all-data)))

Sam Ritchie18:12:14

Add this line a second time after the first

pithyless18:12:14

I would profile the function - so we know where the CPU (and/or I/O) is actually hanging. https://github.com/clojure-goes-fast/clj-async-profiler is your friend

pithyless18:12:27

If you need some help getting the profiler working this should help you get started: http://clojure-goes-fast.com/blog/profiling-tool-async-profiler/ and/or just ask :] I think a flamechart would clear up a lot of questions.

pithyless18:12:08

I'd check the :cpu flamegraph first, but if it's a question of laziness - the :allocation flamegraph may also be insightful.

sheluchin18:12:17

@U017QJZ9M7W

"get data"
"Elapsed time: 14.498089 msecs"
"first sum-changes"
"Elapsed time: 14630.261689 msecs"
"second sum-change"
"Elapsed time: 8.897242 msecs"
"Elapsed time: 14658.340088 msecs"

Sam Ritchie18:12:14

Woohoo, so some lazy thing is getting forced by the traversal to get the sums

Sam Ritchie18:12:23

And the second time everything is already in memory
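
The same effect can be reproduced in miniature: a lazy seq caches its elements once realized, so a second traversal does no new work. The `work` counter and `xs` are made up for illustration.

```clojure
(def work (atom 0))

;; Each element bumps the counter when it is first computed.
(def xs (map (fn [x] (swap! work inc) (* x x)) (range 5)))

(def first-sum  (reduce + xs))
(def work-after-first @work)
;; work-after-first => 5

(def second-sum (reduce + xs))
(def work-after-second @work)
;; work-after-second => 5, the cached values were reused
```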

sheluchin18:12:30

@U05476190 I'll try those. I've been using tufte for profiling but it doesn't offer memory profiling, just timing. I'll definitely try your link - good tool to get familiar with.

Sam Ritchie18:12:39

You're getting close!

sheluchin18:12:43

I've decided if I don't figure it out today - with your generous help or without 🙂 - I'm going to "just get it done" in the hackiest way imaginable until I get better with Clojure and revisit it.

pithyless18:12:12

tufte timing is OK, but what you really want - irrespective of CPU or allocation - is a sampling profiler that will give you more granular details about which part of the code is slow.

pithyless18:12:56

I'm not even sure if it's "slow" because of holding on to some lazy head, or if this is not just a case of some I/O issues.

pithyless18:12:52

^ eg. if it's allocation issues, you'd see lots of objects being created and then GC'ed; if it's I/O... then you have a different problem

sheluchin18:12:51

Alright I'm gonna set that up now.

sheluchin18:12:16

@U017QJZ9M7W if you have some ideas for something I should try while working on that, I can probably multitask.

Sam Ritchie18:12:20

if it were me at the REPL this would be my next few tasks other than profiling:

Sam Ritchie18:12:15

• go look at a single record and see the type - is it a clojure lazy seq, or some java iterable thing that is getting forced into a seq later by the transduce call?
• either way, read down into the code you’re calling to figure out how :changes is getting populated, and why it is so slow. is it disk access? is it maybe doing a separate jgit call for every single change entry, or something silly like that?
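
For the first check, `type` and `realized?` are enough at the REPL. The record below is a made-up example whose `:changes` comes from a `for`, which is an assumption about how the real data is built.

```clojure
(def rec {:sha "abc" :changes (for [i (range 3)] [i i])})

(def changes-type (type (:changes rec)))
;; => clojure.lang.LazySeq

(def before (realized? (:changes rec)))
;; => false, nothing has been computed yet

(doall (:changes rec))

(def after (realized? (:changes rec)))
;; => true
```

If `type` instead reports a Java collection or iterator class, `realized?` does not apply and the cost is more likely inside the Java API being walked.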

Sam Ritchie18:12:20

for the next power debugging session, it will be way easier if you can make a reproducible example that we could point at a local git repo or something

Sam Ritchie18:12:26

what is porc? what library?

sheluchin18:12:42

I'm guessing it comes down to this silly thing I wrote:

(defn- entries->change-map
  [entries df]
  (let [changes (for [entry entries
                      :let [fh (.toFileHeader df entry)
                            el (.toEditList fh)]]
                  (for [edit el
                        :let [deletions (.getLengthA edit)
                              insertions (.getLengthB edit)]]
                    [deletions insertions]))]
    changes))

Sam Ritchie18:12:52

where something like ObjectReader is loading way more than you need

sheluchin18:12:44

When I first embarked on this effort, I thought "I should just spawn processes, call git directly, and parse its output so I don't have to deal with the git->java->clj abstraction layers." Then I second guessed myself and thought "it would be best to get familiar with the tooling in the ecosystem instead." Kinda thinking I made the wrong call lol

Sam Ritchie18:12:57

once you get to clojure life is great!! but yeah I feel you for sure

pithyless19:12:34

@UPWHQK562 perhaps a little off-topic, but if you plan on merging those deletions and insertions later anyways, perhaps you can do it eagerly instead. One idea:

(defn- entries->change-map
  [entries df]
  (letfn [(calc-edit [m edit]
            ;; -> already threads m in as the first argument of update
            (-> m
              (update :deletions + (.getLengthA edit))
              (update :insertions + (.getLengthB edit))))
          (calc-entries [m entry]
            (let [header (.toFileHeader df entry)
                  edits  (.toEditList header)]
              (reduce calc-edit m edits)))]
    (reduce calc-entries
            {:deletions 0
             :insertions 0}
            entries)))

👍 1
pithyless19:12:16

I changed the tuple [deletions insertions] to a map, but it's not strictly necessary. This is irrespective of what @U017QJZ9M7W mentioned about using the most appropriate jgit API (which I'm unfamiliar with).
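
The eager aggregation above can be tried at a REPL without jgit. A minimal sketch, assuming each entry has already been turned into a vector of [deletions insertions] pairs; sample-entries and sum-changes are made-up names and the numbers are invented:

```clojure
;; Hypothetical stand-in data: one vector of [deletions insertions]
;; pairs per file entry, in place of the jgit edit lists.
(def sample-entries
  [[[1 2] [0 3]]
   [[4 0]]
   []])

(defn sum-changes
  "Eagerly folds every [deletions insertions] pair into one totals map."
  [entries]
  (letfn [(calc-edit [m [deletions insertions]]
            (-> m
                (update :deletions + deletions)
                (update :insertions + insertions)))
          (calc-entry [m edits]
            (reduce calc-edit m edits))]
    (reduce calc-entry {:deletions 0 :insertions 0} entries)))

(sum-changes sample-entries)
;; => {:deletions 5, :insertions 5}
```

Only the running totals map is alive at any point, so nothing grows with the number of entries.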

sheluchin19:12:57

It's not totally clear to me when I should switch between lazy/eager eval. I thought that it was a good idea to stay lazy as long as possible until I actually needed the realization.

pithyless19:12:25

rule of thumb (with lots of caveats): lazy is good if you don't need all the results now (or perhaps ever) and if you can "summarize" or forget things you've seen; lazy is terrible if you keep holding on to lots of things you've seen and your structure just grows more and more as you progress further in the calculation.

pithyless19:12:10

if you know you need to go through everything to get a result, and the result is some kind of aggregation/summary - usually eager is going to perform better (and more predictably)
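
A small sketch of the two cases in that rule of thumb, using only clojure.core; first-squares and total are illustrative names:

```clojure
;; Lazy pays off when only part of the result is needed:
;; the source is infinite, but only three squares are ever computed.
(def first-squares
  (doall (take 3 (map #(* % %) (range)))))
;; => (0 1 4)

;; When every element is needed and the result is a summary,
;; an eager reduce keeps just the running total, not the elements.
(def total
  (reduce + 0 (map #(* % %) (range 1000))))
```

The trouble case from the thread is the third one: binding the whole lazy result to a name and walking it later, which keeps every realized element reachable.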

pithyless19:12:05

transducers (as mentioned earlier) try to get the best of both worlds: you write things as composable independent pieces (which is one of the reasons why lazy may have previously been used) and you still get the performance of running it eager (and with extra optimizations that the compiler can make, because it has better control of the runtime assumptions).

sheluchin19:12:24

I think my case fits in the good category then. I don't need the result at all except to summarize.

Sam Ritchie19:12:48

Lazy is great when you can stream through and aggregate as you go;

pithyless19:12:21

yeah, but notice that your entries->change-map is actually keeping a lot of state in memory and not aggregating aggressively enough

👍 1
Sam Ritchie19:12:27

Like, in theory this is a great lazy computation (the sequence with the maps) because each item can get processed individually

sheluchin19:12:23

profiling: ran into https://github.com/clojure-goes-fast/clj-async-profiler/issues/8 and I don't see an immediate fix for that

sheluchin19:12:41

Let me try an eager entries->change-map

pithyless19:12:46

> ran into https://github.com/clojure-goes-fast/clj-async-profiler/issues/8 and I don't see an immediate fix for that
Hmm, that's weird; if you want to debug that further: (1) which version of the JDK are you running? (2) lein or deps-tools? (3) have you added the correct JVM options? https://github.com/clojure-goes-fast/clj-async-profiler#jvm-options

sheluchin19:12:28

Ah, missed 3. Thanks, looks like it works now.

sheluchin19:12:38

Doesn't look like much of a change with the eager implementation.

sheluchin19:12:47

> "Elapsed time: 48718.480005 msecs"

sheluchin19:12:18

And I have the flamegraph...

sheluchin19:12:34

I'm trying to understand how to interpret the CPU graph. Looks like I need to install something else for the allocation profiling to work.

pithyless19:12:00

@UPWHQK562 can you upload the cpu svg somewhere?

pithyless19:12:21

for allocation, you just pass in the option:

(prof/profile
 {:event :alloc}
 ,,,)

sheluchin19:12:47

Yeah, will do, one sec..

sheluchin19:12:56

Did that.. it gave me:

Execution error (ExceptionInfo) at clj-async-profiler.core/start (core.clj:277).
No AllocTracer symbols found. Are JDK debug symbols installed?

pithyless19:12:34

I guess you can run

(prof/list-event-types)
to see which ones are supported on your JDK

sheluchin19:12:53

Yes, it does list allocation as a supported type :face_with_raised_eyebrow:

sheluchin20:12:10

Basic events:
  cpu
  alloc
  lock
  wall
  itimer

sheluchin20:12:26

Let me try installing openjdk-8-dbg.

pithyless20:12:44

> I'm trying to understand how to interpret the CPU graph
You can click on any event to "zoom in". If you hover over an event, it shows on the bottom what percentage of the total is spent there

pithyless20:12:01

^ notice how much of the time (percentage of width of row vs total width) is spent in DiffFormatter.open and RawText.load

pithyless20:12:13

and you can further see a large portion of the DiffFormatter.open time is spent in PackFile.decompress

pithyless20:12:26

so, no matter how much you improve the speed of calculating the diffs, most of the time is actually spent reading from Disk and decompressing the data

Sam Ritchie20:12:35

So in this case either this is how long it takes - OR if you are doing multiple passes then you may want to be careful about getting it into and keeping the data in memory so you can stay fast

☝️ 1
pithyless20:12:38

I suggest you check if there is a different jgit API available that won't make you spend all this time loading this data into memory and decompressing it; I'm assuming there must be a better and more efficient way of just getting the git stats you're interested in

pithyless20:12:23

in case it's not clear, the way to read a flame graph is bottom to top -> each entry width shows the total time spent in this function, and the row above it shows what that function called (and how long each of those functions spent as a percentage of the parent function). I'm not sure that's a good description... :P
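
The width rule can be made concrete with a few lines of code. A sketch, assuming the profiler output is reduced to a map of call stack to sample count; the frame names are borrowed from this thread, but the counts are invented to match the rough 30%/70% split, and frame-widths is a made-up helper:

```clojure
;; Hypothetical samples: each key is a call stack (outermost frame first),
;; each value is how many times the sampler saw exactly that stack.
(def samples
  {["createFormatResult" "DiffFormatter.open" "PackFile.decompress"] 60
   ["createFormatResult" "DiffFormatter.open"]                        10
   ["createFormatResult" "DiffFormatter.diff"]                        30})

(defn frame-widths
  "Returns a map of stack prefix -> percentage of all samples.
  A prefix's share corresponds to the width of that frame in the graph."
  [samples]
  (let [total  (reduce + (vals samples))
        counts (reduce-kv
                 (fn [acc stack n]
                   ;; credit the sample count to every prefix of the stack
                   (reduce (fn [a depth]
                             (update a (subvec stack 0 depth) (fnil + 0) n))
                           acc
                           (range 1 (inc (count stack)))))
                 {}
                 samples)]
    (into {}
          (map (fn [[prefix n]] [prefix (double (/ (* 100 n) total))]))
          counts)))

(get (frame-widths samples) ["createFormatResult" "DiffFormatter.open"])
;; => 70.0
```

A frame that is almost as wide as its parent does little work itself; the interesting frames are the ones where the width splits or drops.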

pithyless20:12:55

Also pro tip: all the way on the top left and right of the SVG are small links "Reset zoom" and "Search" to help with interactive exploration. I only mention this, because they're small, gray, and easy to miss.

sheluchin20:12:00

How do you know to zero in on those particular items in the flamegraph? Do you just scan up until the total percentage starts to get narrow? I understand that familiarity with the code certainly helps, and I did identify the DF as a line of interest, but from there I don't know how you pick out decompress as the next item of interest... it looks about the same as the others, except it also starts to narrow.

pithyless20:12:14

precisely, you kind of gawk at it and notice that entries->change-map-2 is interesting (in my codebase); then look up and most of those callers are the same width (which means they don't spend much time doing anything - all their time is spent in their "child")

pithyless20:12:37

createFormatResult starts looking interesting, as it has 2 prominent children

pithyless20:12:58

the DiffFormatter.diff (roughly 30%) and DiffFormatter.open (roughly 70%)

pithyless20:12:26

the DiffFormatter.open calls a bunch of stuff (that take basically no time) - but decompress sounds like the important one (before we're in java.util.zip.Inflater.inflate)

sheluchin20:12:29

I get it. Then all the items above decompress are clearly compression related, so the buck stops there.

👍 1
pithyless20:12:28

then, all you can do is make some hypotheses that would explain these different behaviors; consider which ones you should test; and consider which ones you can reasonably fix

pithyless20:12:17

e.g. you probably won't speed up java.util.zip.Inflater.inflate - but if it's a function you control, perhaps you can - or if it's the compression, perhaps you can avoid calling it as often.

pithyless20:12:20

rinse and repeat

sheluchin20:12:10

@U017QJZ9M7W and @U05476190, you guys are awesome. Thank you so much for helping me reason through this process and arrive at this bitter end lol 🙂 I've certainly learned a great deal in this thread.

2
pithyless20:12:43

@U017QJZ9M7W made an important distinction: sampling profilers are a ghost image of the system; they just pause every X ticks and check what the CPU was working on. But they don't really tell you if "this function is really slow" or "this function is getting called a lot more than it needs to". That, you need to figure out separately.
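
One cheap way to check the "called a lot more than it needs to" possibility is to count calls. A minimal sketch; counting, calls and load-entry are hypothetical names, and load-entry merely stands in for whatever function looks hot in the profile:

```clojure
(defn counting
  "Wraps f so that each call increments the counter atom before delegating."
  [counter f]
  (fn [& args]
    (swap! counter inc)
    (apply f args)))

(def calls (atom 0))

;; Stand-in for the expensive function under suspicion.
(def load-entry (counting calls (fn [x] (* x x))))

(run! load-entry (range 10))
@calls
;; => 10
```

If the count is far higher than the number of inputs, the fix is probably fewer calls rather than a faster function.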

pithyless20:12:03

There is a similar flamegraph you can build for allocations (assuming you get your JVM sorted)

sheluchin20:12:53

Looks like this gitblit thing might be somewhat of a successor to jgit and has the diffing functionality in it https://github.com/gitblit/gitblit/blob/master/src/main/java/com/gitblit/utils/JGitUtils.java#L1047

Sam Ritchie20:12:57

Thanks for being a good patient @UPWHQK562 :) I have also learned a lot, and I am sure I will avoid similar bugs in the future when my ears start to tingle during jgit interactions, no question about it

Sam Ritchie20:12:46

Hopefully this is a positive experience! Sitting over top of a few layers of code can induce anxiety (“what the heck is going on, my code is simple and fast!!!”) in the best of us

sheluchin20:12:41

I gotta admit those transducer functions you shared are still causing a bit of anxiety 🙂 I just gotta sit down and pull it apart when I'm not trying to get stuff done. "reducer with a transform"...

Sam Ritchie20:12:06

yes! if you want to use those, try always writing them FIRST as “transform then reduce”, then see if you can change them into transducers
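
A sketch of that two-step habit on toy data (the names are illustrative; this is not the code from earlier in the thread):

```clojure
(def xs (range 10))

;; Step 1: "transform, then reduce" with the ordinary lazy sequence functions.
(def lazy-total
  (reduce + 0 (map inc (filter even? xs))))

;; Step 2: the same steps as a transducer. comp applies them left to right
;; here (filter first, then map), and no intermediate lazy seqs are built.
(def xform (comp (filter even?) (map inc)))

(def eager-total
  (transduce xform + 0 xs))

[lazy-total eager-total]
;; => [25 25]
```

Keeping the step 1 version around makes a handy test oracle for the transducer version.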

Sam Ritchie20:12:06

Another note @UPWHQK562, where @U05476190 may disagree but let me toss it out there - if you find that, in this example, it is NOT beneficial to go push your reduction down into your entries->change-map , then don’t do it, keep that aggregation of changes function separate

Sam Ritchie20:12:19

then you will be able to reuse it no problem if you swap out git libraries later etc

Sam Ritchie20:12:43

unless it really matters for speed, write your functions in terms of data structures you care about, and separately do the xform ->change-map, just like you did

Sam Ritchie20:12:09

and then later, say you have to fuse them to get speed - well, keep the old, simple, easy to understand separate ones there in your test code

Sam Ritchie20:12:22

(now we are into software stuff you probably already know, just wanted to reinforce in this context!!)

pithyless20:12:41

Kudos to @UPWHQK562 for not giving up and coming out the other side (hopefully with more knowledge and confidence) and @U017QJZ9M7W for the awesome support (technical, but I would say even more importantly, emotional). Next time I'm dealing with a hairy problem I'd love to have you two in my corner. :)

sheluchin20:12:42

Noted @U017QJZ9M7W. I try to keep it small and simple where I can.

👍 1
pithyless20:12:59

(PS. I have absolutely no need to rescue that code; it was just a suggestion when I thought maybe the nested lazy-seqs were eating up memory. "You are not your code" and you should most definitely get rid of any abstractions that don't make it easier to reason about the system)

sheluchin20:12:00

Yeah @U05476190, been kinda a slow week in terms of crossing items off my list but a good week in terms of covering new ground and putting more tools in my arsenal. Totally off-topic, but I think I watched one of your talks on Fulcro a few months back!

pithyless20:12:43

Small world :]

sheluchin20:12:28

Yep, good stuff.

sheluchin20:12:38

I need a break from this machine. Have a good weekend guys!

❤️ 2
true.neutral19:12:46

Hello! Say I don't know a thing about JS or frontend. What would be the easiest way to build a very basic page displaying a couple of tables with the data fed from a Clojure backend say via a websocket or something? Maybe any libraries/frameworks/resources you could point me to? The more Clojure and the less JS it is, the better.

Arthur19:12:31

I’d probably use shadow-cljs, reagent (or helix) and sente. Shadow compiles your code, reagent is a react wrapper for clojurescript (same for helix) and sente is a websocket library for Clojure(script). If you don’t feel like playing around with React you can probably go with plain goog.dom, but that will probably look very akin to plain js

true.neutral19:12:57

Thanks. I'm alright with playing with a bit, just need to do something very barebones, doesn't have to be pretty or anything, just give me an interface to conveniently and interactively view endless maps of maps of maps I have on the backend. I don't want to spend too much time on it since data is far more interesting than how I display it.

true.neutral19:12:05

Thank you for the pointers!

hiredman19:12:09

I just wrote something like this for screen scraping package tracking data, cljs and clojure in a single file, no fancy shadle this or that or whatever

hiredman19:12:45

lemme make sure I don't have any passwords or anything in the file and I will gist it

true.neutral19:12:45

Oh neat. That'd be great, thanks

hiredman19:12:54

https://gist.github.com/hiredman/c6868603eb9bf3620f2b89acfaef623e#file-packages-cljc-L1105-L1125 is where the clojure file compiles itself as a clojurescript file to serve to clients

hiredman20:12:25

😳 there are some terrible bits in there, like the macros trying to provide a unified logging api between clojure and clojurescript

true.neutral20:12:10

No worries, I think I'll be able to get the basic idea from it 🙂 Thank you a ton for sharing!

hiredman20:12:18

yeah, I've never seen anyone do a project like that, but I have no interest in adding more tooling layers (like shadow-cljs) for something so simple

Michaël Salihi20:12:26

You can also use htmx + clojure to present your data table. Here is an example with Babashka https://github.com/prestancedesign/babashka-htmx-todoapp

pithyless20:12:09

> just need to do something very barebones, doesn't have to be pretty or anything, just give me an interface to conveniently and interactively view endless maps of maps of maps I have on the backend.
@U5USC6WNL - do you need to build a webpage? Perhaps it may be enough to just use (or perhaps extend with a custom viewer) something like https://github.com/djblue/portal or https://vlaaad.github.io/reveal/ Or, if you need to publish these results online - perhaps something like https://github.com/nextjournal/clerk will be sufficient as both a data explorer and static public website?

Daniel Craig20:12:46

I'm doing something like this with re-com, re-frame, and reagent

mathpunk21:12:23

I'm trying to (spit filename data) and when I open the file it's a string representation of a LazySeq. data evaluates to (loop <some stuff>) so I tried setting data equal to (doall (loop...)) instead

mathpunk21:12:36

Same result! How do I force evaluation correctly?

dpsutton21:12:20

it's not evaluation that is your problem. If you spit data you just get a string representation of that data. (str (filter even? [1 2 3]))

dpsutton21:12:51

the toString on a lazy sequence (regardless of whether it has been realized or not) is just "clojure.lang.LazySeq@21" or similar

mathpunk21:12:25

hm. I guess what's confusing me tho is evaluating the loop form in my repl gives results

mathpunk21:12:39

and i want to inspect them in a file

dpsutton21:12:07

the repl printer will print out the values in a lazy sequence

mathpunk21:12:01

how can i put the realized sequence to a file, then?

dpsutton21:12:44

(spit "spat" (pr-str (filter even? (range 4))))

dpsutton21:12:17

(defn spit
  "Opposite of slurp.  Opens f with writer, writes content, then
  closes f. Options passed to clojure.java.io/writer."
  {:added "1.2"}
  [f content & options]
  (with-open [^java.io.Writer w (apply jio/writer f options)]
    (.write w (str content))))
spit is a very simple thing. it just calls str on the argument. (str (filter even? (range 4))) will show what you end up with. If you need more control you can mimic what spit does here and control the writer
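
Both options can be sketched in a few lines. This assumes JVM Clojure and writes to a temp file; data and tmp are illustrative names:

```clojure
(require '[clojure.java.io :as jio])

(def data (filter even? (range 4)))

(def tmp (java.io.File/createTempFile "spat" ".edn"))

;; Option 1: pr-str walks the sequence, so the file gets readable data.
(spit tmp (pr-str data))
(slurp tmp)
;; => "(0 2)"

;; Option 2: control the writer yourself, here one element per line.
(with-open [^java.io.Writer w (jio/writer tmp)]
  (doseq [x data]
    (.write w (prn-str x))))
```

After option 2 the file holds each element on its own line, which is easier to diff or stream back in than one long form.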

mathpunk21:12:37

thanks! that's good advice, i have feared the reader/writer

dpsutton21:12:17

you’ve been using them all along 🙂

bad_ash21:12:36

hi, i'm programming a reagent app and i have a question about this piece of code:

(let [atom-val (r/atom "")]
    [:> rn/View {:style {:align-items :center}}
     [:> rn/TextInput {:on-change-text #(reset! atom-val %)}]
     [:> rn/Button {:title @atom-val}]])
the title property of the Button isn't being updated by the TextInput callback. am i wrong in assuming that this should work? or is there a bug in the code? thanks in advance.

dpsutton21:12:52

there’s a bug in this. each time it renders, it creates a new r/atom whose contents is the empty string. When you call reset! to some new value, it re-renders, and creates a new r/atom whose contents are the empty string …

facepalm 1
dpsutton21:12:24

(let [atom-val ...] (fn [] [:> rn/View ...]))
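
Why the wrapped version behaves differently can be seen without Reagent at all. A sketch in plain Clojure, using clojure.core/atom instead of r/atom and maps instead of hiccup; render-form-1 and make-form-2 are made-up names, and the only assumption about Reagent is the one quoted earlier in the channel, that it calls the outer function once and the returned function for each render:

```clojure
;; The atom is created inside the render function,
;; so every render starts again from "".
(defn render-form-1 []
  (let [atom-val (atom "")]
    {:title     @atom-val
     :on-change #(reset! atom-val %)}))

;; The outer function runs once and returns the render function,
;; which closes over a single atom shared by all renders.
(defn make-form-2 []
  (let [atom-val (atom "")]
    (fn []
      {:title     @atom-val
       :on-change #(reset! atom-val %)})))

((:on-change (render-form-1)) "hi")
(:title (render-form-1))
;; => ""

(def render (make-form-2))
((:on-change (render)) "hi")
(:title (render))
;; => "hi"
```

In the first case the reset! lands on an atom that the next render never sees; in the second every render reads the atom that was written to.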

bad_ash21:12:56

ah i had a feeling it was some dumb mistake. thank you

dpsutton21:12:38

that is a super common mistake. you’ll probably do it again in the future and add some printlns that you won’t believe possible. and then you’ll remember after 7 minutes of screaming “this cannot be”

🙂 1
bad_ash21:12:31

hah i don't doubt i'll keep making mistakes

bad_ash21:12:49

by the way, why does the example you posted work?

bad_ash21:12:44

thanks

👍 1