This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-02-15
Channels
- # announcements (8)
- # architecture (9)
- # autochrome-github (1)
- # babashka (48)
- # beginners (55)
- # calva (36)
- # cider (16)
- # clj-commons (1)
- # clj-kondo (38)
- # cljs-dev (44)
- # cljsrn (1)
- # clojure (164)
- # clojure-europe (35)
- # clojure-nl (2)
- # clojure-norway (10)
- # clojure-uk (23)
- # clojurescript (50)
- # conjure (24)
- # core-async (1)
- # cryogen (2)
- # cursive (38)
- # datalevin (11)
- # datascript (2)
- # datomic (13)
- # duct (1)
- # emacs (16)
- # events (12)
- # exercism (3)
- # figwheel-main (7)
- # fulcro (26)
- # honeysql (5)
- # integrant (1)
- # jobs (3)
- # kaocha (6)
- # lsp (72)
- # malli (22)
- # nextjournal (35)
- # nrepl (1)
- # off-topic (34)
- # pathom (5)
- # polylith (8)
- # portal (40)
- # re-frame (14)
- # reagent (42)
- # reitit (1)
- # releases (1)
- # remote-jobs (1)
- # reveal (9)
- # sci (2)
- # shadow-cljs (13)
- # sql (3)
- # tools-deps (33)
- # vim (25)
I tried to build Clerk myself, but it cannot resolve:
io.github.nextjournal/cas {:git/url "[email protected]:nextjournal/cas"
:git/sha "5e8079b720e347b9466db9c2282ce79a125a011c"}
"io.github.nextjournal/cas" does not exist publicly, it seems.
I still have an issue with freezing / unfreezing of TMD datasets. After a JVM restart, I get an exception; cleaning the cache and re-evaluating makes it go away.
Unhandled java.lang.ClassCastException
class [D cannot be cast to class [Ljava.lang.Object; ([D and
[Ljava.lang.Object; are in module java.base of loader 'bootstrap')
array_buffer.clj: 333 tech.v3.datatype.array-buffer/array-buffer/reify
BufferIter.java: 60 tech.v3.datatype.BufferIter/next
protocols.clj: 49 clojure.core.protocols/iter-reduce
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6886 clojure.core/transduce
core.clj: 6901 clojure.core/into
core.clj: 6889 clojure.core/into
viewer.cljc: 422 nextjournal.clerk.viewer$describe/invokeStatic
viewer.cljc: 366 nextjournal.clerk.viewer$describe/invoke
viewer.cljc: 424 nextjournal.clerk.viewer$describe$fn__19341/invoke
core.clj: 7300 clojure.core/map-indexed/fn/fn
core.clj: 2881 clojure.core/take/fn/fn
core.clj: 2929 clojure.core/drop/fn/fn
protocols.clj: 49 clojure.core.protocols/iter-reduce
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 75 clojure.core.protocols/fn
protocols.clj: 13 clojure.core.protocols/fn/G
core.clj: 6886 clojure.core/transduce
core.clj: 6901 clojure.core/into
core.clj: 6889 clojure.core/into
viewer.cljc: 422 nextjournal.clerk.viewer$describe/invokeStatic
viewer.cljc: 366 nextjournal.clerk.viewer$describe/invoke
viewer.cljc: 424 nextjournal.clerk.viewer$describe$fn__19341/invoke
core.clj: 7300 clojure.core/map-indexed/fn/fn
core.clj: 2881 clojure.core/take/fn/fn
core.clj: 2929 clojure.core/drop/fn/fn
ArraySeq.java: 116 clojure.lang.ArraySeq/reduce
core.clj: 6885 clojure.core/transduce
core.clj: 6901 clojure.core/into
core.clj: 6889 clojure.core/into
viewer.cljc: 422 nextjournal.clerk.viewer$describe/invokeStatic
viewer.cljc: 366 nextjournal.clerk.viewer$describe/invoke
viewer.cljc: 372 nextjournal.clerk.viewer$describe/invokeStatic
viewer.cljc: 366 nextjournal.clerk.viewer$describe/invoke
view.clj: 110 nextjournal.clerk.view/->result
view.clj: 109 nextjournal.clerk.view/->result
view.clj: 164 nextjournal.clerk.view/describe-block
view.clj: 151 nextjournal.clerk.view/describe-block
core.clj: 2635 clojure.core/partial/fn
core.clj: 2746 clojure.core/map/fn/fn
PersistentVector.java: 343 clojure.lang.PersistentVector/reduce
core.clj: 6885 clojure.core/transduce
core.clj: 6901 clojure.core/into
core.clj: 6889 clojure.core/into
view.clj: 171 nextjournal.clerk.view/doc->viewer/fn
core.clj: 6185 clojure.core/update
core.clj: 6177 clojure.core/update
view.clj: 171 nextjournal.clerk.view/doc->viewer
view.clj: 167 nextjournal.clerk.view/doc->viewer
view.clj: 168 nextjournal.clerk.view/doc->viewer
view.clj: 167 nextjournal.clerk.view/doc->viewer
webserver.clj: 80 nextjournal.clerk.webserver/update-doc!
webserver.clj: 78 nextjournal.clerk.webserver/update-doc!
clerk.clj: 221 nextjournal.clerk/show!
clerk.clj: 208 nextjournal.clerk/show!
REPL: 1 kaggle/eval43131
REPL: 1 kaggle/eval43131
Compiler.java: 7181 clojure.lang.Compiler/eval
Compiler.java: 7136 clojure.lang.Compiler/eval
core.clj: 3202 clojure.core/eval
core.clj: 3198 clojure.core/eval
btw you can opt out of the clerk cache by setting the clerk.disable_cache
system prop to a value that isn't false
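The property mentioned above can be set at runtime or at JVM startup — a minimal sketch (the property name comes from the message above; exactly when Clerk reads it is an assumption):

```clojure
;; Opt out of Clerk's cache by setting the clerk.disable_cache system
;; property to anything other than "false" (assumption: set it before
;; evaluating notebook code so Clerk sees it when it checks the cache):
(System/setProperty "clerk.disable_cache" "true")
```

Alternatively, pass it at JVM startup, e.g. `clj -J-Dclerk.disable_cache=true`.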
Thanks for the tip. My current notebook contains "training of a model", which is slow, so using Clerk at all only makes sense if caching is enabled. For this concrete issue, "cleaning the cache once" is good enough as a workaround. Nevertheless, I think we need an in-memory cache, maybe even as the default. It seems to me that the nippy freeze / unfreeze has many issues and should not be the default.
there is an in-memory cache, I just need to fix an issue where it’s not used for things that aren’t nippy freezable which I’m doing right now
@mkvlr an idea occurs to me about this issue, which I have also faced: could metadata be used to allow Clerk users to annotate values with their own caching functions (analogous to custom viewers)? certainly ordinary Clojure data is easy to persist using Nippy, but if Clerk's caching mechanism were extensible to other disk-backed formats (e.g. CSV/Arrow/etc) then you could potentially cache to a file format that makes sense for bigger things like TMD datasets (or images, etc).
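A purely hypothetical sketch of what such a metadata annotation might look like — none of these keys exist in Clerk; `::clerk/freeze-fn` / `::clerk/thaw-fn` and `expensive-training-step` are invented names for illustration only:

```clojure
;; Hypothetical API sketch — these keys are NOT part of Clerk.
;; The idea: let a value carry its own (de)serialization functions via
;; metadata, analogous to how custom viewers are attached.
^{::clerk/freeze-fn (fn [ds path] (tc/write! ds path)) ; e.g. cache dataset as CSV/Arrow
  ::clerk/thaw-fn   (fn [path] (tc/dataset path))}
(expensive-training-step)
```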
@afoltzm yes! Something @jackrusher has also mentioned and definitely on the roadmap.
that's a strong statement :upside_down_face:
Yes, bad wording from my side.
I wanted to say that I still think it is technically impossible to build a serialization system which guarantees to faithfully serialize / deserialize all (even unknown) potential JVM classes out of the box.
So relying fully on it in Clerk and not having a "workaround" (= in-memory cache) seems dangerous to me.
Nippy is a great library !!!
On my issue with TMD:
class [D cannot be cast to class [Ljava.lang.Object; ([D and
[Ljava.lang.Object; are in module java.base of loader 'bootstrap')
Does this mean that "something" (Nippy?) converts object arrays into double arrays, or the other way around? Or are we still seeing a "rendering" issue in Clerk? It goes away when cleaning caches (but comes back on the next JVM restart)
doing:
1. (clerk/show! "src/kaggle.clj")
2. JVM restart
3. (clerk/show! "src/kaggle.clj")
should trigger it
can highly recommend trying with a hotkey for clerk/show!
https://github.com/nextjournal/clerk#editor-workflow
Regarding "configurable caching": I often have my TMD datasets nested inside a map. This seems to make specific caching via annotations pretty hard. Viewers are simpler, as we control well what to view, while data structures are often just big maps. Another way to say this: the caching should IMHO be completely invisible, even if this means that persistent caching (across restarts) is not possible. Wondering about the viewers is quite some work already; wondering about "how to store objects" should be avoided (especially as I don't yet have a use case for the persistence).
No, I went with a hotkey in emacs. Very good, indeed! "C-c c" does all of it :+1:
> The caching should be IMHO completely invisible even if this means that persistent caching (across restarts) is not possible.
that’s how it works now, or am I misunderstanding?
this seems to be a more minimal repro
(ns kaggle-min
(:require [nextjournal.clerk :as clerk]
[tablecloth.api :as tc]))
(defn load-hp-data [file]
(println "load a file : " file)
(-> (tc/dataset file {:key-fn keyword})
(tc/convert-types (zipmap [:BedroomAbvGr
:BsmtFullBath
:BsmtHalfBath
:Fireplaces
:FullBath
:GarageCars
:HalfBath
:KitchenAbvGr
:OverallCond
:OverallQual
:MoSold
:TotRmsAbvGrd
:MSSubClass]
(repeat :string)))))
(def df (load-hp-data "train.csv.gz"))
(defn ->table [df]
(clerk/table {:head (tc/column-names df)
:rows (tc/rows df :as-seqs)}))
;; # The data
^{::clerk/width :full}
(->table df)
In my personal experience as a data science practitioner, I'd say if you're working with models and datasets that are expensive to retrain/recompute, you're probably going to need to think explicitly about how serializing to disk fits into your workflow at some point. I don't see why Clerk asking its users to be explicit about custom caching is really that different from a lot of work that's already a big part of the data science lifecycle. I think Clerk is a very useful library, but expecting it to work well with heavyweight computation like that without user-supplied configuration may be asking too much of it. My personal preference is for flexible configuration via programming that may require more upfront effort, over configuration via settings that is more brittle.
That is for sure true. There is a moment where exploration becomes engineering, and then Clerk is the wrong tool. But this line can be dynamic, and I think Clerk should support the use case where a single form's computation takes maybe 1 minute. And I only want to wait that minute if really needed (after changes to relevant code). I think (and it seems so) that Clerk can deliver that via caching.
The issue above is clearly related to caching. When I disable nippy cache, it goes away.
@mkvlr I found the root cause of the exception
Unhandled java.lang.ClassCastException
class [D cannot be cast to class [Ljava.lang.Object; ([D and
[Ljava.lang.Object; are in module java.base of loader 'bootstrap')
it happens when Clerk calls this on a dataset:
(#{:nextjournal/missing} df)
Inside the loop of all predicates here: https://github.com/nextjournal/clerk/blob/120b1223cfe6a37a12ce6e13bdbe11fabc6161dc/src/nextjournal/clerk/viewer.cljc#L302
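For context on why that call touches the dataset's contents: a Clojure set invoked as a function performs a membership lookup on its argument, which needs the argument's hash/equality — for a dataset that presumably walks its columns, which is where the corrupted array buffer blows up:

```clojure
;; A set used as a function does a lookup: it returns the element if
;; present, nil otherwise. Either way, the argument gets hashed.
(#{:nextjournal/missing} :nextjournal/missing) ;; => :nextjournal/missing
(#{:nextjournal/missing} {:a 1})               ;; => nil (but hashes the map)
```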
The same exception happens doing this:
(clerk/eval-string "df")
So does this mean that the nippy caching somehow changes the dataset object and makes it "inconsistent", or something like that?
It is not a problem of Clerk. https://github.com/techascent/tech.ml.dataset/issues/287