I try to investigate an issue, where Clerk sometimes decides to re-evaluate a certain form.
I run clerk/show in a loop and measure time, and indeed I see sometimes far longer duration of form evaluations.
This is confirmed by the debug output coming from here:
https://github.com/nextjournal/clerk/blob/b0e13859af5280e4d75ba9cb324cbf8010e555cd/src/nextjournal/clerk.clj#L145
I have somehow the impression that the hashing of the forms is not always "stable". I see for example that the same form gets different hashes in different calls of "show!" without any code change in between.
"EVAL !!!" :hash "5dr7SHMXW1tdQYL1UbqD6Xdg3bd3jj" :form (def pipe-fn (ml/pipeline (mm/select-columns (concat cat-features numeric-features [:SalePrice])) (mm/replace-missing cat-features :value :NA) (mm/replace-missing numeric-features :midpoint) (fn [ctx] (assoc ctx : train-data)) (mm/transform-one-hot cat-features :full) (mm/set-inference-target :SalePrice) #:metamorph{:id :model} (mm/model {:model-type :smile.regression/random-forest, :trees 1000})))
.....
"EVAL !!!" :hash "5dtaiHaQeang6r83DzaWmbUmTtExFJ" :form (def pipe-fn (ml/pipeline (mm/select-columns (concat cat-features numeric-features [:SalePrice])) (mm/replace-missing cat-features :value :NA) (mm/replace-missing numeric-features :midpoint) (fn [ctx] (assoc ctx : train-data)) (mm/transform-one-hot cat-features :full) (mm/set-inference-target :SalePrice) #:metamorph{:id :model} (mm/model {:model-type :smile.regression/random-forest, :trees 1000}))) So running this:
(def times
(doall
(repeatedly 10 (fn [] (clerk/time-ms (clerk/show! "src/kaggle.clj"))))))produces: ({:result nil, :time-ms 22870.775138} {:result nil, :time-ms 20775.000902} {:result nil, :time-ms 1289.373927} {:result nil, :time-ms 1285.455593} {:result nil, :time-ms 21002.457645} {:result nil, :time-ms 20808.522161} {:result nil, :time-ms 20450.540389} {:result nil, :time-ms 1207.209592} {:result nil, :time-ms 20773.024323} {:result nil, :time-ms 1282.546341})
4 times fast (from cache) and 6 times slow (Clerk re-evaluated the slow form)
We are getting somewhere...
(repeatedly
10
(fn []
(->
(clerk/parse-file "src/kaggle.clj")
(h/build-graph)
h/hash
(get 'kaggle/train))))Returns different results.....
;; => ("5dt6Bi1CeH3guTNdnCKCid5TB1hgwA" ;; "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg" ;; "5dt6Bi1CeH3guTNdnCKCid5TB1hgwA" ;; "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg" ;; "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg" ;; "5ds3nXtgTQ4aUw3xt4KiM41tQbJUsE" ;; "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg" ;; "5dt6Bi1CeH3guTNdnCKCid5TB1hgwA" ;; "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg" ;; "5ds3nXtgTQ4aUw3xt4KiM41tQbJUsE")
thanks for digging into this!
Very inteesting.... I came this far: (-> (clerk/parse-file "src/kaggle.clj") (h/build-graph) :graph :dependencies (get 'kaggle/r-object)) ;; => #{clojure.core/ex-info ;; clojure.string/split ;; kaggle/base-url ;; clojure.core/first ;; clojure.lang.RT/nth ;; opencpu-clj.ocpu/object ;; clojure.lang.Numbers/gt}
The 'kaggle/r-object changes it's hash, eventhough all deps are a constant string "kaggle/base-url" and the other symbols are from jars
So the hash should be constant, should it ?
yes
One thing ...
For ope symbol no location was found ...
an issue with a minimal repro would be great
which one?
(h/find-location 'opencpu-clj.ocpu/object) -> nil
this should not be
Is it this "-" in the ns ?
"_" vs "-"
here we go !
(h/ns->jar (namespace 'opencpu_clj.ocpu/object)) -> working
(h/ns->jar (namespace 'opencpu-clj.ocpu/object)) > nil
This fix is needed in my view:
(defn ns->path [ns]
(str/replace (str/replace (str ns) "-" "_") "." fs/file-separator))So, the jar get found for opencpu-clj namespace.
now all deps above have a location.
But the hash of kaggle/r-object is nevertheless sometimes different.
-> a hash of one of the deps need to be different "sometimes" as code text is constant!
nice catch!
adding a test now
https://github.com/nextjournal/clerk/commit/51b945f539bc86623ab67ee6eace1d378d098db7
where’s the r-object coming from? don’t see it in https://github.com/behrica/kaggleHP/blob/main/src/kaggle.clj
I will try to commit later:
(defn r-object [library function params]
(let [resp (ocpu/object base-url :library library :R function params)]
(when (> (:status resp) 201) (throw (ex-info "error" resp)))
(-> resp
:result
first
(str/split #"/")
(nth 3))))
It only depends on one more object:
(def base-url "") I tried everything, but don't understand it. All inputs to the hashing are constant, and the hashing itself seems as well deterministic
I did not find , why it fails, but at least which line it makes fail.
(str/split #"/")So replacing the above definition with:
(def re #"/")
(defn r-object [library function params]
(let [resp (ocpu/object base-url :library library :R function params)]
(when (> (:status resp) 201) (throw (ex-info "error" resp)))
(-> resp
:result
first
(str/split re)
(nth 3))))makes it work.
This small example reproduces it.
There is something strange about the re
(ns kaggle
(:require [nextjournal.clerk :as clerk]
[nextjournal.clerk.viewer :as v]))
(def re #"/")
(defn use-re--working-always [s]
(println "very slow !! - cache working always")
(Thread/sleep 5000)
(clojure.string/split s re))
(defn use-re--failing-sometimes [s]
(println "very slow !! -- cache failing sometimes ")
(Thread/sleep 5000)
(clojure.string/split s #"/"))
(def splitted (use-re--working-always "hello/clerk"))
(comment
(def times
(do
(clerk/clear-cache!)
(doall
(repeatedly 100 (fn [] (clerk/time-ms (clerk/show! "src/buggy.clj")))))))
:ok)
;; using `use-re--failing-sometimes` does not cache 3 times
(->> times (map :time-ms) (filter #(> % 5000)))
;; => (5118.088344
;; 5112.789448
;; 5111.539959
;; using `use-re--alawys-working` does always cache
(->> times (map :time-ms) (filter #(> % 5000)))
;; => []In here the "frequency" of "Not caching" is very low, 3 out of 1000 In my real code is higher, evey 5th run, I would say.
regular expressions don’t have equality in clojure
digging into how this could affect caching…
this seems to reproduce it pretty reliably:
(require '[nextjournal.clerk.hashing :as h])
(into []
(comp (map (comp set vals h/hash h/build-graph h/parse-clojure-string))
(distinct))
(repeat 100 "(fn [x] (clojure.string/split x #\"/\"))"))thanks a lot for figuring out the the regexp can cause issues! 🙌
(frequencies (into []
(comp (map (comp vals h/hash h/build-graph h/parse-clojure-string)))
(repeat 1000 "(fn [x] (clojure.string/split x #\"/\"))")))
;;=> {("5dtVT9HVuLAa4BCJBWwz9KQ2b9u37n" "5drKysDWADNZJWwWfgWYs6TETrz7zx")
946,
("5dtVT9HVuLAa4BCJBWwz9KQ2b9u37n" "5dt92a8LBSfEScpUk3X7Y1L3vBDKJm")
54}a heisenbug 🙃
Indeed interesting.
But I though that Clerk uses the "text representation" via prn-str for hashing.
for hashing yes but as nodes in the dep graph it uses either var names or for top level expression the read form
https://clojurians.slack.com/archives/C03S1KBA2/p1645124691463129
can you try https://github.com/nextjournal/clerk/compare/regex-hash-fix?expand=1 ?
Looks good to me 👍
Clerk evaluated 'src/kaggle.clj' in 373.011182ms.
Clerk evaluated 'src/kaggle.clj' in 408.822775ms.
Clerk evaluated 'src/kaggle.clj' in 454.97627ms.
Clerk evaluated 'src/kaggle.clj' in 529.977895ms.
Clerk evaluated 'src/kaggle.clj' in 506.647304ms.
Clerk evaluated 'src/kaggle.clj' in 437.870516ms.
Clerk evaluated 'src/kaggle.clj' in 466.663666ms.
Clerk evaluated 'src/kaggle.clj' in 431.667927ms.
Clerk evaluated 'src/kaggle.clj' in 488.64744ms.
Clerk evaluated 'src/kaggle.clj' in 449.524183ms.great, thanks
I am trying to make a viewer for a MD dataset and started withg something like:
(clerk/set-viewers! [{:pred tc/dataset?
:render-fn (quote v/table-viewer)
:transform-fn (fn [x]
(->
(cons (tds/column-names x)
(into [] (tds/rowvecs x)))))}])
The transform-fn transforms the dataset to something the table viewer of Clerk understands, but it does not work.@carsten.behring something like this should work
(clerk/set-viewers! [{:pred tc/dataset?
:transform-fn #(hash-map :nextjournal/value
(clerk/table {:head (tds/column-names x)
:rows (tds/rowvecs x)}))}])@carsten.behring on latest main, this can be simplified to
{:pred tc/dataset?
:transform-fn #(clerk/table {:head (tds/column-names %)
:rows (tds/rowvecs %)})}ok, I will try it.👍 Would be good to have at some point docu somewhere on the viewer API, it is for sure the most complex public surface of Clerk.
yep for sure, will focus on that once it has settled and we feel it's stable
this is what I used to test it
^{::clerk/viewer {:transform-fn #(hash-map :nextjournal/value
(clerk/table {:head (first %) :rows (rest %)}))}}
[[:a :b] [:c :d]](I only tested it on a simpler example)
Yes, great !
Do you think there is any way, to get this working with Clerk ? https://datatables.net/