This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-02-14
Can it be that Clojure "fn"s are never cached by Clerk? So in code like:
(defn my-fn ....)
(def a-result (a-function-using-my-fn my-fn ))
a-result would be re-evaluated every time, even though the body of my-fn is unchanged? The concrete code is this:
(def pipe-fn
(ml/pipeline
(mm/replace-missing [:BsmtCond :PoolQC] :value :NA)
(mm/select-columns [:OverallQual :GarageCars :BsmtCond
:GrLivArea :1stFlrSF :2ndFlrSF :TotalBsmtSF :GarageArea :Neighborhood :YearBuilt
:SalePrice])
(fn [ctx]
(assoc ctx : (load-hp-data "train.csv.gz")))
(mm/transform-one-hot [:OverallQual :GarageCars :Neighborhood :BsmtCond :PoolQC] :full)
(mm/set-inference-target :SalePrice)
{:metamorph/id :model}
(mm/model {:model-type :smile.regression/gradient-tree-boost
:max-depth 50
:max-nodes 10
:node-size 8
:trees 2000})))
(def result
(ml/evaluate-pipelines [pipe-fn] splits ml/rmse :loss))
pipe-fn is a function and it gets passed to evaluate-pipelines.
I see that evaluate-pipelines is re-run on nextjournal.clerk/show! even without any change in the file.
After enabling the exceptions, I indeed get this. The type of pipe-fn is "un-freezable", so Clerk does not cache it:
:freeze-error #error {
:cause "Unfreezable type: class scicloj.metamorph.core$pipeline$local_pipeline__41340"
:data {:type scicloj.metamorph.core$pipeline$local_pipeline__41340, :as-str "#function[scicloj.metamorph.core/pipeline/local-pipeline--41340]"}
:via
[{:type clojure.lang.ExceptionInfo
:message "Unfreezable type: class scicloj.metamorph.core$pipeline$local_pipeline__41340"
:data {:type scicloj.metamorph.core$pipeline$local_pipeline__41340, :as-str "#function[scicloj.metamorph.core/pipeline/local-pipeline--41340]"}
:at [taoensso.nippy$throw_unfreezable invokeStatic "nippy.clj" 1003]}]
Ok, I looked into it in more detail and understood the reason for the issue. Clerk fails to cache any result which contains a "fn", so the expression gets evaluated repeatedly. Simplest showcase:
(defn my-fn [] nil)
(def b {:fn my-fn
:y (do (println "slow") (Thread/sleep 10000) :a)})
b does not get cached, so it is evaluated on every call to show!, even without a code change.
can you run https://github.com/nextjournal/clerk/blob/72fae00d67d42122b76f9a22eeaae683d571eec4/src/nextjournal/clerk.clj#L20-L21 and then re-evaluate your notebook and paste the output of running https://github.com/nextjournal/clerk/blob/72fae00d67d42122b76f9a22eeaae683d571eec4/src/nextjournal/clerk.clj#L22 here?
it seems we can also improve this situation by falling back to an in-memory cache when the nippy cache (which is also persistent across JVM restarts) fails. This should make the caching work as long as you’re looking at the same notebook even if nippy cannot freeze & thaw it.
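A minimal sketch of that fallback idea, not Clerk's actual implementation: cached-eval, persist! and contains-fn? are hypothetical names, and persist! only mimics a nippy-backed store by throwing for values that contain functions. The value always stays in an in-memory map, and a failure of the persistent store is logged instead of silently forcing a re-compute.

```clojure
(ns cache-sketch)

(defn contains-fn?
  "True if x is, or contains, a function. Hypothetical stand-in for a
  serializer's 'unfreezable' check."
  [x]
  (boolean (some fn? (tree-seq coll? seq x))))

(defn persist!
  "Hypothetical stand-in for a persistent (e.g. nippy-backed) store.
  Throws for values containing functions, mimicking 'Unfreezable type'."
  [!store k v]
  (if (contains-fn? v)
    (throw (ex-info "Unfreezable type" {:key k}))
    (swap! !store assoc k v)))

(defn cached-eval
  "Returns the cached value for k, calling thunk f only on a cache miss.
  The value is always kept in memory; the persistent store is attempted
  as well, and a failure there is logged rather than triggering a re-compute."
  [!mem !store k f]
  (if-let [[_ v] (or (find @!mem k) (find @!store k))]
    v
    (let [v (f)]
      (swap! !mem assoc k v)
      (try
        (persist! !store k v)
        (catch clojure.lang.ExceptionInfo e
          (println "persistent cache failed, keeping in memory only:"
                   (ex-message e))))
      v)))
```

With two atoms as caches, calling (cached-eval !mem !store :b slow-thunk) twice runs slow-thunk only once, even when its result contains a fn. The in-memory entry is lost on JVM restart, which is the trade-off mentioned above.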
yes. I was thinking about that. It is true that for me the use case of persistent caching was not super important, as none of the other notebooks has it.
for sure "only nippy" seems too restrictive to me.
I had the issue with "fns" and nippy before. functions cannot be serialized, as far as i remember.
they claim to have solved it ... https://tech.redplanetlabs.com/2020/01/06/serializing-and-deserializing-clojure-fns-with-nippy/
yep, I’m aware of that. So far not caching functions hasn’t been a problem since I’ve not yet run into code where the result is a function but evaluation of it takes a long time. Does that apply to your code above?
though that you’re seeing the unfreezable error makes me think it’s a different problem and can be solved by tweaking the allow list
Yes. The "result" above contains functions. It is the result of a ML model training, so takes long.
I see this as a very frequent situation. It is idiomatic Clojure to pass maps around which contain fns, as fns are "first class".
> though that you’re seeing the unfreezable error makes me think it’s a different problem and can be solved by tweaking the allow list
why different ? Fns are "unfreezable", no ?
I uncommented the "error printing" in clerk to see them.
They are indeed hidden in current Clerk code.
I just noticed that the training was done repeatedly (from the logging of the training process itself and the slowness of notebook evaluation). It all worked, just slowly due to repeated execution (as the cache was not working for the fns inside my result).
but you can see Clerk does not reevaluate these:
(ns test)
(defn my-fn [x]
(prn :my-fn x)
(:hello x))
(defn a-function-using-my-fn [f x]
(prn :a-function-using-my-fn)
(f x))
(def a-result
(do
(prn :a-result)
(a-function-using-my-fn my-fn {:hello :world})))
Indeed, in your code it works. The issue is if a var contains a fn.
(defn my-fn [] nil)
(def b {:fn my-fn
:y (do (println "slow") (Thread/sleep 10000) :a)})
In this case b will be re-evaluated every time we do clerk/show!
So my initial comment was wrong.
This one fixes it for me: https://github.com/redplanetlabs/nippy-serializable-fns
So it does indeed allow serializing a fn with nippy. I can make a PR to add it.
The "drawback" of freezing fns is the need for "identical code" at freeze and un-freeze time. But "cleaning the cache" can guarantee this in Clerk. Maybe it just needs to be documented.
I did this PR https://github.com/nextjournal/clerk/pull/81 but might need more testing. Inside a single JVM run it solves the freezing for fns it seems. It solved my issue, at least.
> All JVM instances that could end up deserializing a fn instance are required to be launched from the same precompiled jar. from https://tech.redplanetlabs.com/2020/01/06/serializing-and-deserializing-clojure-fns-with-nippy/
Not sure if this is a big problem in Clerk. It requires that "freeze" and "un-freeze" are called by the "same code" (at least the same code where the serialized function comes from). In the typical usage of the Clerk cache (I freeze "now" and un-freeze 2 minutes later) this is given. The Clerk cache is not typically used for long-term storage (with a high chance of code changes in between), is it? And "clean cache" will fix it in any case.
But indeed "my issue" could be solved by an in-memory cache as well. (so not using nippy at all)
We seem to have three options to "get a result" for a form:
• re-compute
• in-memory cache without nippy
• current persistent nippy based cache
All have pros and cons...
interesting, was not on my radar so far,
"Try nippy, else re-compute", as it works currently, rules Clerk out for analyses with long-running operations. It seems to me that nippy has quite a few more gaps in type coverage; "general serialisation of all types" is a hard problem. But maybe we can start by logging, so we see when "try nippy" fails and a re-compute is triggered.
I don't want to be forced to restrict my data, only to make Clerk caching work (or accept re-compute)
I will test it, I have a good test case.
Maybe it can even be something that is later configurable per form:
• cache in-memory
• persistent cache
• no cache
with a per-namespace default
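As a rough sketch of what such a per-form setting could look like (purely hypothetical: the :cache/strategy metadata key and the cache-strategy function are made up for illustration and are not part of Clerk), the strategy could be read from metadata on the form, with the namespace default used when nothing is set:

```clojure
(ns strategy-sketch)

(defn cache-strategy
  "Picks the caching strategy for a form from its metadata, falling back
  to the namespace-wide default. :cache/strategy is a hypothetical key."
  [ns-default form]
  (get (meta form) :cache/strategy ns-default))

;; No metadata on the form: the namespace default applies.
(cache-strategy :persistent '(def a 1))

;; Per-form override, e.g. for a result that contains fns.
(cache-strategy :persistent
                (with-meta '(def result (train))
                  {:cache/strategy :in-memory}))
```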
#82 fixes my problem (at least while in the same JVM process). So both #80 and #82 fix my issue with the fns, in 2 different ways (while in the same JVM process). I cannot see a speed difference (in-memory cache vs nippy on-disk cache); both seem immediate. #80 should make the cache work even across JVM runs, but I cannot test this due to an issue with freeze / unfreeze of tech datasets.
@U7CAHM72M excellent, thanks! I’ll go with #82 for now, as that fixes an obvious bug. Feel free to play with the nippy fns approach in userspace, see https://github.com/nextjournal/clerk/pull/81#issuecomment-1040408571.
@U5H74UNSF Maybe there is another bug lurking... It seems to me that the in-memory cache only works "once": on the evaluation after next, the form is evaluated again.
Not true... I need to test more. Seems to work. Is there a way I can make my own install of Clerk? The "bb release:jar" is not working.
No:
clojure -T:build jar
Cloning: git@github.com:nextjournal/cas
Downloading: org/slf4j/slf4j-nop/maven-metadata.xml from central
Error building classpath. Unable to clone /home/carsten/.gitlibs/_repos/ssh/github.com/nextjournal/cas
ERROR: Repository not found.
fatal: Could not read from remote repository.
Please make sure you have the correct access rights
and the repository exists.
does not exist
yes, good idea 👍