nextjournal

2022-02-20T12:08:54.398329Z

I wrote a little function which "removes a symbol" from Clerk's cache (both in-memory and on disk). It could become a one-arity version of clear-cache! and can be useful as an alternative to timestamping a symbol whose value comes from IO (disk, database, web service). If you find it useful, I can open a PR.

mkvlr 2022-02-21T16:58:10.013359Z

this is the refactor regarding the caching I’ve been wanting to do

mkvlr 2022-02-21T16:58:18.011439Z

would be great if you can take this for a spin

mkvlr 2022-02-21T16:58:28.760589Z

and let me know if there are any issues I missed

mkvlr 2022-02-20T19:01:01.708769Z

thanks, I’ll take a closer look at this next week

2022-02-20T23:06:48.593039Z

I noticed an issue with the caching when there are two notebook files in the same folder. If the files use the same variable names, the cache can get confused, because the hashes of two code blocks in the two files might be identical.

2022-02-21T09:28:50.954949Z

I have a case of two notebooks where a Vega image appears in the "wrong" output. I did indeed copy/paste the code from one notebook file to the other, so the text is identical (it refers to the same variable name), but the variable gets different values (in normal, non-Clerk evaluation).

(clerk/vl
 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
The same snippet is present in both notebook files. In one, pps-scores refers to kaggle/pps-score, and in the other to predict-heart-attack/pps-score. But a hash based only on the text is identical for this block in both files.

2022-02-21T09:31:55.328149Z

It is here: https://github.com/behrica/ds-notebooks

2022-02-21T09:46:09.624939Z

Changing the hashing to include the ns fixed it for my concrete case (I think ...). This results in a kind of "partition" of the hash space when there are several notebook files, so the shared disk-based cache ends up with two different hashes and results, even if the text of the code block is identical. Not sure how this affects the in-memory cache, though.

(defn hash-codeblock [->hash {:keys [hash form deps]}]
  (let [hashed-deps (into #{} (map ->hash) deps)]
    (sha1-base58 (pr-str *ns* (conj hashed-deps (if form form hash))))))
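
A minimal sketch of that "partition" effect, assuming plain SHA-1 hex in place of Clerk's sha1-base58 and passing the namespace name as a symbol. `sha1-hex` and `hash-form-in-ns` are hypothetical helpers for illustration, not part of Clerk:

```clojure
(import '(java.security MessageDigest))

;; Hypothetical stand-in for Clerk's sha1-base58: SHA-1 rendered as hex.
(defn sha1-hex [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1")
                        (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" (bit-and % 0xff)) digest))))

;; Hypothetical helper: hash a form together with a namespace name.
;; A symbol is used rather than the namespace object so the hash input
;; does not depend on how a namespace object prints.
(defn hash-form-in-ns [ns-sym form]
  (sha1-hex (pr-str ns-sym form)))

;; The same form text, as it would appear in both notebooks.
(def form '(clerk/vl {:data {:values (vec pps-scores)}}))

(def kaggle-hash (hash-form-in-ns 'kaggle form))
(def heart-hash (hash-form-in-ns 'predict-heart-attack form))

;; Identical text, different namespaces: the hashes no longer collide.
(not= kaggle-hash heart-hash)
```

The trade-off is that identical forms with identical dependencies in different namespaces would no longer share a cache entry.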

2022-02-21T10:04:48.745449Z

I tried again, running both notebooks via:

rm -rf .clerk/
clojure -X:nextjournal/clerk

2022-02-21T10:05:59.644819Z

with and without the "ns" inside hash-codeblock, and including it fixed the issue.

2022-02-21T11:44:55.214859Z

This screenshot of both notebooks shows the "wrong" plot (it's identical to the one from the other notebook). But it should have completely different variables. The "text" of the code above the plot is identical...

mkvlr 2022-02-21T11:50:04.052319Z

`kaggle/pps-score` and `predict-heart-attack/pps-score` should get different hashes if they’re different, and lead to different hashes for the forms depending on them

2022-02-21T11:50:24.208859Z

should... look at this:

2022-02-21T11:50:43.016639Z

The data of the plot and the plot do not match at all.

2022-02-21T11:51:11.464349Z

The plot is "from the other notebook" ...

2022-02-21T11:55:09.846789Z

So it goes wrong in the code block which produces the Vega-Lite SVG. The data before it is ok.

mkvlr 2022-02-21T11:55:41.356449Z

sorry, pretty hard to follow like this

mkvlr 2022-02-21T11:56:17.326849Z

like I said, there are known issues with the in-memory cache that I plan to look at

2022-02-21T11:58:43.756259Z

The code is here: https://github.com/behrica/ds-notebooks/blob/main/notebooks
Running clojure -X:nextjournal/clerk will produce the two HTML files, and kaggle.html has the wrong plot (the axes are already wrong, and it is a "copy" of the plot from predict_heart_attack.html).

mkvlr 2022-02-21T11:59:33.023709Z

I want to fix the issues I know about first and then check if this is still a problem

2022-02-21T12:00:04.619909Z

ok, seems like a good idea.

2022-02-21T12:00:24.443979Z

let me know, and I can re-check.

👍 1
2022-02-21T12:36:54.968909Z

I think it is a dependency analysis problem (not a hashing problem as such). For the code block:

(clerk/vl
 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
the analysis does not detect the dependency on the var `pps-scores`:
;; => [(clerk/vl
;;      {:$schema "",
;;       :config {:axis {:grid true, :tickBand "extent"}},
;;       :width 600,
;;       :height 600,
;;       :data {:values (vec pps-scores)},
;;       :encoding
;;       {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;;       :layer
;;       [{:encoding
;;         {:color
;;          {:field "pps",
;;           :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;;           :title "PPS",
;;           :type "quantitative"}},
;;         :mark "rect"}
;;        {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]})
;;     {:form
;;      (clerk/vl
;;       {:$schema "",
;;        :config {:axis {:grid true, :tickBand "extent"}},
;;        :width 600,
;;        :height 600,
;;        :data {:values (vec pps-scores)},
;;        :encoding
;;        {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;;        :layer
;;        [{:encoding
;;          {:color
;;           {:field "pps",
;;            :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;;            :title "PPS",
;;            :type "quantitative"}},
;;          :mark "rect"}
;;         {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]}),
;;      :ns-effect? false,
;;      :deps #{nextjournal.clerk/vl},
;;      :file "notebooks/kaggle.clj"}]
"deps" only contains nextjournal.clerk/vl Therefore the hashing does not contain a hash for pps-scores which results in the hash of the full text block of both files being the same. (-> re-use across files) This explains why adding "ns" into the hashing fixes it (by coincidence)

2022-02-21T13:14:42.511199Z

Let's continue here: https://github.com/nextjournal/clerk/issues/94

mkvlr 2022-02-21T13:24:36.928699Z

I have a fix

2022-02-21T13:31:19.404629Z

ok, very good. I just put together a minimal repro, though it may no longer be needed:

(->
 "(clerk/vl
 {
  
  :data pps-scores
 
  })"
 
 read-string
 h/analyze)
But maybe it can confirm your fix as well.

👍 1
mkvlr 2022-02-21T13:43:17.869699Z

please do confirm 🙃

2022-02-21T14:38:56.537309Z

Yes, just tried it. It fixes the issue. 👍

💯 1
2022-02-20T23:08:13.736419Z

An easy fix could be to include the namespace in the hash calculation (or the file name)

mkvlr 2022-02-21T07:31:41.558339Z

do you have a repro? Hashes are intentionally the same if both forms are the same and have the same dependencies.