#nextjournal
2022-02-20
Carsten Behring12:02:54

I wrote a little function which "removes a symbol" from Clerk's cache (both in-memory and on disk). It could be added as a one-arity version of clear-cache!, and it can be a useful alternative to timestamping a symbol whose value comes from IO (disk, database, web service). If you find it useful, I can open a PR.

mkvlr19:02:01

thanks, I’ll take a closer look at this next week

mkvlr16:02:10

this is the refactor regarding the caching I’ve been wanting to do

mkvlr16:02:18

would be great if you can take this for a spin

mkvlr16:02:28

and let me know if there’s any issues I missed

Carsten Behring23:02:48

I noticed an issue with the caching when there are 2 notebook files in the same folder. If these files use the same variable names, the caching can get confused, as the hashes of 2 code blocks in the 2 files might be the same.

Carsten Behring23:02:13

An easy fix could be to include the namespace in the hash calculation (or the file name)

mkvlr07:02:41

do you have a repro? Hashes should intentionally be the same if both forms are the same and have the same dependencies.

Carsten Behring09:02:50

I have a case of two notebooks where a vega image appears in the "wrong" output. And indeed I copy/pasted the code from one notebook file to the other, so they have identical text (referring to the same variable name), but the variable gets different values (in normal, non-Clerk evaluation).

(clerk/vl

 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
The same snippet is present in both notebook files. But in one, pps-scores refers to kaggle/pps-score, and in the other to predict-heart-attack/pps-score. A hash based on the text alone is identical for the code block in both files.

Carsten Behring09:02:09

Changing the hashing to include the ns fixed it for my concrete case (I think ...). This results in a kind of "partition" of the hash space when there are several notebook files: the shared disk-based cache then holds 2 different hashes and results, even if the text of the code block is identical. Not sure how this affects the in-memory cache, though.

(defn hash-codeblock [->hash {:keys [hash form deps]}]
  (let [hashed-deps (into #{} (map ->hash) deps)]
    (sha1-base58 (pr-str *ns* (conj hashed-deps (if form form hash))))))

Carsten Behring10:02:48

I tried again, to run both notebooks via:

rm -rf .clerk/
clojure -X:nextjournal/clerk

Carsten Behring10:02:59

with and without the "ns" inside hash-codeblock; having it included fixed the issue.

Carsten Behring11:02:55

This screenshot of both notebooks shows the "wrong" plot (it's identical to the one in the other notebook). But it should be based on completely different variables. The "text" of the code above the plot is identical...

mkvlr11:02:04

kaggle/pps-score and predict-heart-attack/pps-score should get different hashes if they’re different, and lead to a different hash of the forms depending on them

Carsten Behring11:02:24

should... look at this:

Carsten Behring11:02:43

The data of the plot and the plot do not match at all.

Carsten Behring11:02:11

The plot is "from the other notebook" ...

Carsten Behring11:02:09

So it goes wrong in the code block which produces the vega lite svg. The data before is ok.

mkvlr11:02:41

sorry, pretty hard to follow like this

mkvlr11:02:17

like I said there’s known issues with the in memory cache that I plan to look at

Carsten Behring11:02:43

The code is here: https://github.com/behrica/ds-notebooks/blob/main/notebooks. Running clojure -X:nextjournal/clerk will produce the 2 html files, and kaggle.html has a wrong plot (the axes are already wrong, and it is a "copy" of the plot from predict_heart_attack.html).

mkvlr11:02:33

I want to fix the issues I know about first and then check if this is still a problem

Carsten Behring12:02:04

ok, seems like a good idea.

Carsten Behring12:02:24

let me know, and I can re-check.

👍 1
Carsten Behring12:02:54

I think it is a dependency analysis problem (not a hashing problem as such). For the code block:

(clerk/vl

 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
the analysis does not detect the dependency on the var pps-scores:
;; => [(clerk/vl
;;      {:$schema "",
;;       :config {:axis {:grid true, :tickBand "extent"}},
;;       :width 600,
;;       :height 600,
;;       :data {:values (vec pps-scores)},
;;       :encoding
;;       {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;;       :layer
;;       [{:encoding
;;         {:color
;;          {:field "pps",
;;           :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;;           :title "PPS",
;;           :type "quantitative"}},
;;         :mark "rect"}
;;        {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]})
;;     {:form
;;      (clerk/vl
;;       {:$schema "",
;;        :config {:axis {:grid true, :tickBand "extent"}},
;;        :width 600,
;;        :height 600,
;;        :data {:values (vec pps-scores)},
;;        :encoding
;;        {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;;        :layer
;;        [{:encoding
;;          {:color
;;           {:field "pps",
;;            :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;;            :title "PPS",
;;            :type "quantitative"}},
;;          :mark "rect"}
;;         {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]}),
;;      :ns-effect? false,
;;      :deps #{nextjournal.clerk/vl},
;;      :file "notebooks/kaggle.clj"}]
"deps" only contains nextjournal.clerk/vl. Therefore the hashing does not include a hash for pps-scores, so the hash of the full code block is the same in both files (and the cached result gets re-used across files). This explains why adding "ns" to the hashing fixes it (by coincidence).
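
To make the reasoning above concrete, here is a minimal sketch of hashing a form together with the hashes of its deps. It is not Clerk's code: hash-form, sha1-hex, the salt argument and the example hash values are hypothetical stand-ins (Clerk's own function shown earlier is hash-codeblock with sha1-base58), and the plot form is shortened.

```clojure
(ns hash-sketch
  (:import (java.security MessageDigest)))

(defn sha1-hex
  "Hex-encoded SHA-1 of the string s (a stand-in for sha1-base58)."
  [s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes ^String s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn hash-form
  "Hashes a form together with the hashes of its deps.
  ->hash maps a dep symbol to its hash; salt is any extra value
  mixed into the hash, such as a namespace name."
  ([->hash block] (hash-form ->hash block nil))
  ([->hash {:keys [form deps]} salt]
   (let [hashed-deps (into (sorted-set) (map ->hash) deps)]
     (sha1-hex (pr-str [salt form hashed-deps])))))

;; The same (shortened) plot form, as pasted into both notebooks.
(def plot-form '(clerk/vl {:data {:values (vec pps-scores)}}))

;; Hypothetical dep hashes: pps-scores holds different data per notebook.
(def shared        {'nextjournal.clerk/vl (sha1-hex "clerk/vl")})
(def kaggle-hashes (assoc shared 'pps-scores (sha1-hex "kaggle scores")))
(def heart-hashes  (assoc shared 'pps-scores (sha1-hex "heart attack scores")))

(def detected {:form plot-form :deps '#{nextjournal.clerk/vl pps-scores}})
(def missed   {:form plot-form :deps '#{nextjournal.clerk/vl}})

;; Dep detected: the two notebooks get different hashes.
(= (hash-form kaggle-hashes detected) (hash-form heart-hashes detected))
;; => false

;; Dep missed: identical text gives identical hashes, so results are shared.
(= (hash-form kaggle-hashes missed) (hash-form heart-hashes missed))
;; => true

;; Salting with the namespace hides the collision without fixing the analysis.
(= (hash-form kaggle-hashes missed "kaggle")
   (hash-form heart-hashes missed "predict-heart-attack"))
;; => false
```

The last comparison is why including the ns looked like a fix: it separates the two files, but a changed pps-scores inside one file would still not change the hash of the plot block.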

mkvlr13:02:36

I have a fix

Carsten Behring13:02:19

ok, very good. I just found a minimal repro of the issue, but maybe it is not needed any more:

(-> "(clerk/vl {:data pps-scores})"
    read-string
    h/analyze)
But maybe it can confirm your fix as well.

👍 1
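
The thread does not say what the fix inside Clerk's analyzer was. As a rough illustration of what the minimal repro is checking, the sketch below collects every symbol in a form, including those nested in map literals. symbols-in, notebook-deps and the notebook-vars set are hypothetical names, and unlike a real analyzer this does not resolve symbols against a namespace or account for local bindings.

```clojure
(ns deps-sketch)

(defn symbols-in
  "Collects every symbol in form, descending into lists, vectors,
  maps (both keys and values) and sets."
  [form]
  (into #{} (filter symbol?) (tree-seq coll? seq form)))

(defn notebook-deps
  "Keeps only the symbols that name vars defined in the notebook.
  notebook-vars is a set of symbols (an assumption of this sketch)."
  [notebook-vars form]
  (into #{} (filter notebook-vars) (symbols-in form)))

;; The form from the minimal repro, read from a string.
(def form (read-string "(clerk/vl {:data pps-scores})"))

(symbols-in form)
;; => #{clerk/vl pps-scores} (element order may vary)

(notebook-deps '#{pps-scores} form)
;; => #{pps-scores}
```

With pps-scores reported as a dep, its hash becomes part of the hash of the plot block, and the two notebooks no longer share a cached result.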
mkvlr13:02:17

please do confirm 🙃

Carsten Behring14:02:56

Yes, just tried it. It fixes the issue.👍

💯 1