#nextjournal
2022-02-15
Carsten Behring10:02:37

I tried to build Clerk myself, but it cannot resolve:

io.github.nextjournal/cas {:git/url "git@github.com:nextjournal/cas"
                                                     :git/sha "5e8079b720e347b9466db9c2282ce79a125a011c"}
"io.github.nextjournal/cas" does not exist publicly, it seems

Carsten Behring11:02:48

I still have an issue with freezing / un-freezing of TMD datasets. After a JVM restart, I get the exception below; clearing the cache and re-evaluating makes it go away.

Unhandled java.lang.ClassCastException
   class [D cannot be cast to class [Ljava.lang.Object; ([D and
   [Ljava.lang.Object; are in module java.base of loader 'bootstrap')

          array_buffer.clj:  333  tech.v3.datatype.array-buffer/array-buffer/reify
           BufferIter.java:   60  tech.v3.datatype.BufferIter/next
             protocols.clj:   49  clojure.core.protocols/iter-reduce
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   13  clojure.core.protocols/fn/G
                  core.clj: 6886  clojure.core/transduce
                  core.clj: 6901  clojure.core/into
                  core.clj: 6889  clojure.core/into
               viewer.cljc:  422  nextjournal.clerk.viewer$describe/invokeStatic
               viewer.cljc:  366  nextjournal.clerk.viewer$describe/invoke
               viewer.cljc:  424  nextjournal.clerk.viewer$describe$fn__19341/invoke
                  core.clj: 7300  clojure.core/map-indexed/fn/fn
                  core.clj: 2881  clojure.core/take/fn/fn
                  core.clj: 2929  clojure.core/drop/fn/fn
             protocols.clj:   49  clojure.core.protocols/iter-reduce
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   75  clojure.core.protocols/fn
             protocols.clj:   13  clojure.core.protocols/fn/G
                  core.clj: 6886  clojure.core/transduce
                  core.clj: 6901  clojure.core/into
                  core.clj: 6889  clojure.core/into
               viewer.cljc:  422  nextjournal.clerk.viewer$describe/invokeStatic
               viewer.cljc:  366  nextjournal.clerk.viewer$describe/invoke
               viewer.cljc:  424  nextjournal.clerk.viewer$describe$fn__19341/invoke
                  core.clj: 7300  clojure.core/map-indexed/fn/fn
                  core.clj: 2881  clojure.core/take/fn/fn
                  core.clj: 2929  clojure.core/drop/fn/fn
             ArraySeq.java:  116  clojure.lang.ArraySeq/reduce
                  core.clj: 6885  clojure.core/transduce
                  core.clj: 6901  clojure.core/into
                  core.clj: 6889  clojure.core/into
               viewer.cljc:  422  nextjournal.clerk.viewer$describe/invokeStatic
               viewer.cljc:  366  nextjournal.clerk.viewer$describe/invoke
               viewer.cljc:  372  nextjournal.clerk.viewer$describe/invokeStatic
               viewer.cljc:  366  nextjournal.clerk.viewer$describe/invoke
                  view.clj:  110  nextjournal.clerk.view/->result
                  view.clj:  109  nextjournal.clerk.view/->result
                  view.clj:  164  nextjournal.clerk.view/describe-block
                  view.clj:  151  nextjournal.clerk.view/describe-block
                  core.clj: 2635  clojure.core/partial/fn
                  core.clj: 2746  clojure.core/map/fn/fn
     PersistentVector.java:  343  clojure.lang.PersistentVector/reduce
                  core.clj: 6885  clojure.core/transduce
                  core.clj: 6901  clojure.core/into
                  core.clj: 6889  clojure.core/into
                  view.clj:  171  nextjournal.clerk.view/doc->viewer/fn
                  core.clj: 6185  clojure.core/update
                  core.clj: 6177  clojure.core/update
                  view.clj:  171  nextjournal.clerk.view/doc->viewer
                  view.clj:  167  nextjournal.clerk.view/doc->viewer
                  view.clj:  168  nextjournal.clerk.view/doc->viewer
                  view.clj:  167  nextjournal.clerk.view/doc->viewer
             webserver.clj:   80  nextjournal.clerk.webserver/update-doc!
             webserver.clj:   78  nextjournal.clerk.webserver/update-doc!
                 clerk.clj:  221  nextjournal.clerk/show!
                 clerk.clj:  208  nextjournal.clerk/show!
                      REPL:    1  kaggle/eval43131
                      REPL:    1  kaggle/eval43131
             Compiler.java: 7181  clojure.lang.Compiler/eval
             Compiler.java: 7136  clojure.lang.Compiler/eval
                  core.clj: 3202  clojure.core/eval
                  core.clj: 3198  clojure.core/eval

mkvlr11:02:24

btw you can opt out of the clerk cache by setting the clerk.disable_cache system prop to a value that isn't false
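
A minimal sketch of that opt-out, assuming the property is read once when Clerk's namespaces load (so it has to be set before Clerk is required, or passed as `-Dclerk.disable_cache=true` on the JVM command line). `cache-disabled?` is a hypothetical helper that only mirrors the rule stated above; it is not Clerk's own code.

```clojure
;; Set the property from the REPL. Assumption: Clerk reads it at load time,
;; so do this before requiring nextjournal.clerk.
(System/setProperty "clerk.disable_cache" "true")

(defn cache-disabled?
  "Hypothetical helper: true when the property is set to anything but \"false\"."
  []
  (let [v (System/getProperty "clerk.disable_cache")]
    (boolean (and v (not= "false" v)))))

(cache-disabled?)
```

With the property unset, or set to the string "false", the helper returns false and caching stays on.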

Carsten Behring11:02:12

Thanks for the tip. My current notebook contains the training of a model, which is slow, so using Clerk only makes sense if caching is enabled. For this concrete issue, cleaning the cache once is a good enough workaround. Nevertheless, I think we need an in-memory cache, maybe even as the default. It seems to me that Nippy freeze / unfreeze has many issues and should not be the default.

mkvlr12:02:20

that’s a strong statement 🙃

mkvlr12:02:36

it does work incredibly well with regular Clojure data

mkvlr12:02:23

there is an in-memory cache, I just need to fix an issue where it’s not used for things that aren’t nippy freezable which I’m doing right now
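
The fallback described here can be sketched roughly as follows. This is not Clerk's implementation: plain Java serialization stands in for Nippy, and `freeze-bytes`, `in-memory-cache` and `cache!` are hypothetical names. It only covers values whose freeze fails outright, not values that freeze but come back broken after a thaw.

```clojure
(import '(java.io ByteArrayOutputStream ObjectOutputStream))

;; Stand-in for a real freezer such as Nippy: plain Java serialization.
(defn freeze-bytes [value]
  (let [bos (ByteArrayOutputStream.)]
    (with-open [oos (ObjectOutputStream. bos)]
      (.writeObject oos value))
    (.toByteArray bos)))

;; Values that could not be frozen are kept here until the JVM exits.
(defonce in-memory-cache (atom {}))

(defn cache!
  "Tries to freeze `value` into bytes that could be written to disk;
  falls back to the in-memory cache when freezing throws."
  [k value]
  (try
    {:stored-in :disk :bytes (freeze-bytes value)}
    (catch Exception _
      (swap! in-memory-cache assoc k value)
      {:stored-in :memory})))

(:stored-in (cache! :plain-data [1 2 3]))   ; vectors are Serializable
(:stored-in (cache! :opaque (Object.)))     ; a bare Object is not
```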

respatialized15:02:20

@mkvlr an idea occurs to me about this issue, which I have also faced: could metadata be used to allow Clerk users to annotate values with their own caching functions (analogous to custom viewers)? certainly ordinary Clojure data is easy to persist using Nippy, but if Clerk's caching mechanism were extensible to other disk-backed formats (e.g. CSV/Arrow/etc) then you could potentially cache to a file format that makes sense for bigger things like TMD datasets (or images, etc).
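
A rough sketch of what such an annotation could look like. Everything here is hypothetical: the `::cache-fns` metadata key, `edn-file-fns` and `freeze!` are made up for illustration and are not part of Clerk, and EDN text stands in for a format like CSV or Arrow.

```clojure
(require '[clojure.edn :as edn])

;; Hypothetical pair of caching functions; EDN text stands in for CSV/Arrow.
(def edn-file-fns
  {:freeze (fn [value path] (spit path (pr-str value)))
   :thaw   (fn [path] (edn/read-string (slurp path)))})

(defn freeze!
  "Uses the :freeze fn found in the value's metadata, if any.
  Returns true when a custom function handled the value, false otherwise."
  [value path]
  (if-let [freeze (:freeze (::cache-fns (meta value)))]
    (do (freeze value path) true)
    false))

;; Annotate a value with its own caching functions.
(def rows (with-meta [[1 "a"] [2 "b"]] {::cache-fns edn-file-fns}))
(def path (str (java.io.File/createTempFile "cache-sketch" ".edn")))

(freeze! rows path)
((:thaw edn-file-fns) path)
```

One open design question the sketch leaves out: on thaw the value does not exist yet, so the functions would have to be looked up from the code form or var rather than from the value itself.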

mkvlr15:02:00

@afoltzm yes! Something @jackrusher has also mentioned and definitely on the roadmap.

🎯 1
Carsten Behring15:02:22

> that's a strong statement 🙃
Yes, bad wording on my side. I wanted to say that I still think it is technically impossible to build a serialisation system that is guaranteed to faithfully serialize / deserialize all (even unknown) potential JVM classes out of the box. So relying fully on it in Clerk without a workaround (= in-memory cache) seems dangerous. Nippy is a great library!

Carsten Behring15:02:32

On my issue with TMD:

class [D cannot be cast to class [Ljava.lang.Object; ([D and
   [Ljava.lang.Object; are in module java.base of loader 'bootstrap')

Carsten Behring15:02:47

Does this mean that something (Nippy?) converts object arrays into double arrays, or the other way around? Or are we still seeing a rendering issue in Clerk? It goes away when cleaning the caches (but comes back on the next JVM restart).
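
For reference, the two class names in the exception are the JVM's names for `double[]` and `Object[]`. The small sketch below (plain Clojure, no Clerk or TMD involved; `cast-fails?` is a hypothetical helper) shows the names and the failing cast.

```clojure
(def primitive-arr (double-array [1.0 2.0]))
(def boxed-arr (object-array [1.0 2.0]))

(.getName (class primitive-arr))   ; "[D"
(.getName (class boxed-arr))       ; "[Ljava.lang.Object;"

(defn cast-fails?
  "True when `x` cannot be cast to Object[]."
  [x]
  (try
    (cast (class (object-array 0)) x)
    false
    (catch ClassCastException _ true)))

(cast-fails? primitive-arr)   ; true: a double[] is not an Object[]
(cast-fails? boxed-arr)       ; false
```

So the exception means some code expected an `Object[]` and was handed a primitive `double[]`; it does not by itself say which side changed.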

mkvlr15:02:09

do you have a small repro of the above error?

Carsten Behring15:02:58

doing: 1. (clerk/show! "src/kaggle.clj") 2. JVM restart 3. (clerk/show! "src/kaggle.clj")

Carsten Behring15:02:04

should trigger it

mkvlr15:02:19

thanks, I’ll try…

mkvlr15:02:51

btw, you’re still mostly working with the file watcher, right?

mkvlr15:02:29

can highly recommend trying with a hotkey for clerk/show! https://github.com/nextjournal/clerk#editor-workflow

Carsten Behring15:02:57

Regarding "configurable caching": I often have my TMD datasets nested inside a map. This seems to make specific caching via annotations pretty hard. Viewers are simpler, as we control well what to view, while data structures are often just big maps. Another way to say this: the caching should IMHO be completely invisible, even if this means that persistent caching (across restarts) is not possible. Thinking about the viewers is quite some work already; having to think about how to store objects should be avoided (especially as I do not yet have a use case for the persistence).

Carsten Behring15:02:44

No, I switched to a hotkey in Emacs. Very good indeed! "C-c c" does all of it 👍

mkvlr15:02:00

> The caching should be IMHO completely invisible even if this means that persistent caching (across restarts) is not possible.
that’s how it works now or am I misunderstanding?

mkvlr15:02:31

(well besides the bug you’re encountering 😹)

👍 1
mkvlr15:02:02

do you know which cell it is that causes the failure?

mkvlr16:02:07

seems to be this one: (def test-data (load-hp-data "test.csv.gz"))

mkvlr16:02:59

this seems to be a more minimal repro

(ns kaggle-min
  (:require [nextjournal.clerk :as clerk]
            [tablecloth.api :as tc]))



(defn load-hp-data [file]
  (println "load a file : " file)
  (-> (tc/dataset file {:key-fn keyword})

      (tc/convert-types (zipmap [:BedroomAbvGr
                                 :BsmtFullBath
                                 :BsmtHalfBath
                                 :Fireplaces
                                 :FullBath
                                 :GarageCars
                                 :HalfBath
                                 :KitchenAbvGr
                                 :OverallCond
                                 :OverallQual
                                 :MoSold
                                 :TotRmsAbvGrd
                                 :MSSubClass]
                                (repeat :string)))))

(def df (load-hp-data "train.csv.gz"))

(defn ->table [df]
  (clerk/table {:head (tc/column-names df)
                :rows (tc/rows df :as-seqs)}))


;;  # The data
^{::clerk/width :full}
(->table df)

respatialized16:02:25

In my personal experience as a data science practitioner, I'd say if you're working with models and datasets that are expensive to retrain/recompute, you probably are going to need to think explicitly about how serializing to disk fits into a workflow at some point. I don't see why Clerk asking its users to be explicit about custom caching is really that different than a lot of work that's already a big part of the data science lifecycle. I think Clerk is a very useful library, but expecting it to work well with heavyweight computation like that without user-supplied configuration may be asking too much of it. My personal preference is for flexible configuration via programming that may require more upfront effort than configuration via settings that is more brittle.

Carsten Behring23:02:51

That is for sure true. There is a moment where exploration becomes engineering, and then Clerk is the wrong tool. But this line can be dynamic, and I think Clerk should support the use case where the computation of a single form may take a minute. I only want to wait that minute when really needed (after changes to the relevant code). And I think (and it seems so) that Clerk can deliver that via caching.

👍 1
Carsten Behring19:02:24

The issue above is clearly related to caching. When I disable nippy cache, it goes away.

Carsten Behring12:02:22

@mkvlr I found the root cause of the exception

Unhandled java.lang.ClassCastException
   class [D cannot be cast to class [Ljava.lang.Object; ([D and
   [Ljava.lang.Object; are in module java.base of loader 'bootstrap')

Carsten Behring12:02:58

It happens when Clerk calls this on a dataset:

(#{:nextjournal/missing} df)
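
One plausible reading (an assumption, not confirmed in this thread): invoking a set as a function looks up its argument, and that lookup hashes the argument. For a dataset, hashing would mean walking its columns. The sketch below only demonstrates the hashing step, using a hypothetical `probe` object in place of the dataset.

```clojure
;; Records whether the set lookup asked the argument for its hash.
(def hashed? (atom false))

(def probe
  (reify Object
    (hashCode [_]
      (reset! hashed? true)
      42)))

(#{:nextjournal/missing} probe)   ; nil, probe is not in the set
@hashed?                          ; true, the lookup hashed probe
```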

Carsten Behring12:02:20

The same exception happens when doing this:

(clerk/eval-string "df")

Carsten Behring12:02:23

So does this mean that the Nippy caching somehow changes the dataset object and makes it "inconsistent", or something like that?
