I wrote a little function which "removes a symbol" from Clerk's cache (both in-memory and on disk).
It could be exposed as a one-arity version of clear-cache!
It can be useful as an alternative to timestamping a symbol whose value comes from IO (disk, database, web service).
If you find it useful, I can open a PR.
https://github.com/nextjournal/clerk/compare/better-in-memory-cache?expand=1
this is the refactor regarding the caching I’ve been wanting to do
would be great if you can take this for a spin
and let me know if there’s any issues I missed
thanks, I’ll take a closer look at this next week
I noticed an issue with the caching when we have 2 notebook files in the same folder. If these files use the same variable names, the caching can get confused, as the hashes of 2 code blocks in the 2 files might be the same.
I have a case of two notebooks where a vega image appears in the "wrong" output. And indeed I copy/pasted the code from one notebook file to the other, so they have identical text (referring to the same variable name), but the variable gets different values (in normal, non-Clerk evaluation).
(clerk/vl
 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
The same snippet is present in both notebook files.
But in one pps-scores refers to kaggle/pps-score and in the other to predict-heart-attack/pps-score
But a hash based on the text is identical for the code block in both files.
It is here: https://github.com/behrica/ds-notebooks
Changing the hashing to include the ns has fixed it for my concrete case (I think ...). This results in a kind of "partition" of the space of hashes when there are several notebook files: the common disk-based cache then has 2 different hashes and results, even if the text of the code block is identical. Not sure how this affects the in-memory cache, though.
(defn hash-codeblock [->hash {:keys [hash form deps]}]
  (let [hashed-deps (into #{} (map ->hash) deps)]
    (sha1-base58 (pr-str *ns* (conj hashed-deps (if form form hash))))))
I tried again, running both notebooks via:
rm -rf .clerk/
clojure -X:nextjournal/clerk
with and without the "ns" inside hash-codeblock, and having it included fixed the issue.
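To illustrate why adding the namespace helps, here is a minimal sketch of a cache keyed on form text. The names `lookup!`, `text-key` and `ns-key` are made up for illustration and are not Clerk's API; the cache is just an atom holding a map, and the keywords stand in for rendered plots.

```clojure
;; Hypothetical sketch, not Clerk's implementation.

(defn text-key
  "Cache key built from the form text only (namespace ignored)."
  [_ns-sym form]
  (pr-str form))

(defn ns-key
  "Cache key built from the namespace name plus the form text."
  [ns-sym form]
  (pr-str [ns-sym form]))

(defn lookup!
  "Returns the cached value for the key, computing and storing it on a miss."
  [cache key-fn ns-sym form compute]
  (let [k (key-fn ns-sym form)]
    (if (contains? @cache k)
      (get @cache k)
      (let [v (compute)]
        (swap! cache assoc k v)
        v))))

;; The same form text is pasted into both notebooks.
(def form '(clerk/vl {:data {:values (vec pps-scores)}}))

;; Keyed on text only: the second notebook gets the first notebook's plot.
(def text-cache (atom {}))
(def first-by-text
  (lookup! text-cache text-key 'predict-heart-attack form (fn [] :heart-attack-plot)))
(def second-by-text
  (lookup! text-cache text-key 'kaggle form (fn [] :kaggle-plot)))

;; Keyed on namespace + text: each notebook gets its own entry.
(def ns-cache (atom {}))
(def first-by-ns
  (lookup! ns-cache ns-key 'predict-heart-attack form (fn [] :heart-attack-plot)))
(def second-by-ns
  (lookup! ns-cache ns-key 'kaggle form (fn [] :kaggle-plot)))
```

With the text-only key, `second-by-text` is the plot computed for the other notebook, which matches what the kaggle output shows. With the namespace in the key the two entries no longer collide, but identical forms with identical dependencies in different namespaces also stop sharing a cache entry.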
This screenshot of both notebooks shows the "wrong" plot (it's identical to the one from the other notebook). But it should be based on completely different variables. The "text" of the code above the plot is identical...
`kaggle/pps-score` and `predict-heart-attack/pps-score` should get different hashes if they’re different, and that should lead to different hashes for the forms depending on them
should... look at this:
The data of the plot and the plot do not match at all.
The plot is "from the other notebook" ...
So it goes wrong in the code block which produces the vega lite svg. The data before is ok.
sorry, pretty hard to follow like this
like I said there’s known issues with the in memory cache that I plan to look at
The code is here:
https://github.com/behrica/ds-notebooks/blob/main/notebooks
running clojure -X:nextjournal/clerk will produce the 2 html files, and the kaggle.html has a wrong plot.
(wrong axes already, and it is a "copy" of the plot from predict_heart_attack.html)
I want to fix the issues I know about first and then check if this is still a problem
ok, seems like a good idea.
let me know, and I can re-check.
I think it is a dependency analysis problem (not a hashing problem as such). For the code block:
(clerk/vl
 {:$schema ""
  :config {:axis {:grid true :tickBand "extent"}}
  :width 600
  :height 600
  :data {:values (vec pps-scores)}
  :encoding {:x {:field "x" :type "ordinal"}
             :y {:field "y" :type "ordinal"}}
  :layer [{:encoding {:color {:field "pps"
                              :legend {:orient "top"
                                       :direction "horizontal"
                                       :gradientLength 120}
                              :title "PPS"
                              :type "quantitative"}}
           :mark "rect"}
          {:encoding {:text {:field "pps" :type "quantitative"}}
           :mark "text"}]})
It does not detect the dependency on the var `pps-scores`:
;; => [(clerk/vl
;; {:$schema "",
;; :config {:axis {:grid true, :tickBand "extent"}},
;; :width 600,
;; :height 600,
;; :data {:values (vec pps-scores)},
;; :encoding
;; {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;; :layer
;; [{:encoding
;; {:color
;; {:field "pps",
;; :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;; :title "PPS",
;; :type "quantitative"}},
;; :mark "rect"}
;; {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]})
;; {:form
;; (clerk/vl
;; {:$schema "",
;; :config {:axis {:grid true, :tickBand "extent"}},
;; :width 600,
;; :height 600,
;; :data {:values (vec pps-scores)},
;; :encoding
;; {:x {:field "x", :type "ordinal"}, :y {:field "y", :type "ordinal"}},
;; :layer
;; [{:encoding
;; {:color
;; {:field "pps",
;; :legend {:orient "top", :direction "horizontal", :gradientLength 120},
;; :title "PPS",
;; :type "quantitative"}},
;; :mark "rect"}
;; {:encoding {:text {:field "pps", :type "quantitative"}}, :mark "text"}]}),
;; :ns-effect? false,
;; :deps #{nextjournal.clerk/vl},
;; :file "notebooks/kaggle.clj"}]
"deps" only contains nextjournal.clerk/vl
Therefore the hash does not include a hash for pps-scores,
which results in the hash of the full code block being the same in both files (-> re-use across files).
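The effect of the missing dependency can be sketched as follows. This is a simplified stand-in, not Clerk's code: `sha1-hex` and `hash-form` are hypothetical helpers, the per-var hash strings are made up, and plain hex SHA-1 is used in place of `sha1-base58`.

```clojure
;; Hypothetical sketch, not Clerk's implementation.
(import 'java.security.MessageDigest)

(defn sha1-hex
  "Hex-encoded SHA-1 digest of the string s."
  [^String s]
  (let [digest (.digest (MessageDigest/getInstance "SHA-1") (.getBytes s "UTF-8"))]
    (apply str (map #(format "%02x" %) digest))))

(defn hash-form
  "Hashes a form together with the (sorted) hashes of its dependencies."
  [->hash {:keys [form deps]}]
  (sha1-hex (pr-str [(sort (map ->hash deps)) form])))

;; Identical form text in both notebooks.
(def form '(clerk/vl {:data {:values (vec pps-scores)}}))

;; Made-up hashes for the vars known in each notebook.
(def kaggle-hashes
  {'nextjournal.clerk/vl "vl-hash"
   'kaggle/pps-scores    "kaggle-data-hash"})
(def heart-hashes
  {'nextjournal.clerk/vl            "vl-hash"
   'predict-heart-attack/pps-scores "heart-data-hash"})

;; Analysis misses pps-scores: only clerk/vl is in deps, so both hashes agree.
(def incomplete-kaggle
  (hash-form kaggle-hashes {:form form :deps #{'nextjournal.clerk/vl}}))
(def incomplete-heart
  (hash-form heart-hashes {:form form :deps #{'nextjournal.clerk/vl}}))

;; Analysis finds pps-scores: the differing data hashes separate the two forms.
(def complete-kaggle
  (hash-form kaggle-hashes {:form form :deps #{'nextjournal.clerk/vl 'kaggle/pps-scores}}))
(def complete-heart
  (hash-form heart-hashes {:form form :deps #{'nextjournal.clerk/vl 'predict-heart-attack/pps-scores}}))
```

In this sketch `incomplete-kaggle` equals `incomplete-heart`, while `complete-kaggle` and `complete-heart` differ, so a correct dependency set keeps the two notebooks apart without putting the namespace into the hash.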
This explains why adding "ns" into the hashing fixes it (by coincidence).
Let's continue here: https://github.com/nextjournal/clerk/issues/94
I have a fix
ok, very good. I just found a "minimal issue", but maybe not needed:
(-> "(clerk/vl
      {:data pps-scores})"
    read-string
    h/analyze)
But maybe it can confirm your fix as well.
https://github.com/nextjournal/clerk/commit/549f9956870c69ef0951ca82d55a8e5ec2e49ed4
please do confirm 🙃
Yes, just tried it. It fixes the issue. 👍
An easy fix could be to include the namespace (or the file name) in the hash calculation
do you have a repro? Hashes should intentionally be the same if both forms are the same and have the same dependencies.