data-science

Joe R. Smith 2024-12-19T17:11:33.779799Z

Hello! Could someone help me understand why shifting a column and then doing array math with the new column takes almost 3x longer than doing the same shift and then doing the math on the other columns? Is there something special about the shifted column that would make an operation (subtraction, in this case) so much slower? This is Tablecloth 7.029.2.

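;; assumes (require '[tablecloth.api :as tc])
(def rand-ds (tc/dataset {:a (repeatedly 1e6 #(* 1000 (rand)))
                          :b (repeatedly 1e6 #(* 1000 (rand)))}))

;; shift :b into :b_s, then subtract the two unshifted columns
(time
  (-> rand-ds
      (tc/shift :b_s :b 1)
      (tc/- :a-b [:a :b])))
;; "Elapsed time: 54.576875 msecs"

;; same shift, but subtract the shifted column :b_s
(time
  (-> rand-ds
      (tc/shift :b_s :b 1)
      (tc/- :a-b_s [:a :b_s])))
;; "Elapsed time: 142.5745 msecs"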

chrisn 2025-01-06T22:28:11.817729Z

Hey Joe - did you figure out the reason for this, and are you still interested? My guess is it's due to indexing indirection, but if you really want some compound set of equations to be fast we could take a closer look. We tend to use both VisualVM and clj-async-profiler to piece together how to make these things faster.

Joe R. Smith 2025-01-07T20:56:37.433839Z

Thanks for the reply, Chris. I'm not too worried about it yet. It just surprised me when I discovered it and reproduced it in a simple example. My use case might require dropping down a level of abstraction anyway, so I'll see if the same anomaly presents itself there.