2022-02-17
I'm trying to investigate an issue where Clerk sometimes decides to re-evaluate a certain form.
I run clerk/show! in a loop and measure the time, and indeed I sometimes see far longer durations for form evaluation.
This is confirmed by the debug output coming from here:
https://github.com/nextjournal/clerk/blob/b0e13859af5280e4d75ba9cb324cbf8010e555cd/src/nextjournal/clerk.clj#L145
I somehow have the impression that the hashing of the forms is not always "stable". For example, I see the same form getting different hashes in different calls of show! without any code change in between.
"EVAL !!!" :hash "5dr7SHMXW1tdQYL1UbqD6Xdg3bd3jj" :form (def pipe-fn (ml/pipeline (mm/select-columns (concat cat-features numeric-features [:SalePrice])) (mm/replace-missing cat-features :value :NA) (mm/replace-missing numeric-features :midpoint) (fn [ctx] (assoc ctx : train-data)) (mm/transform-one-hot cat-features :full) (mm/set-inference-target :SalePrice) #:metamorph{:id :model} (mm/model {:model-type :smile.regression/random-forest, :trees 1000})))
.....
"EVAL !!!" :hash "5dtaiHaQeang6r83DzaWmbUmTtExFJ" :form (def pipe-fn (ml/pipeline (mm/select-columns (concat cat-features numeric-features [:SalePrice])) (mm/replace-missing cat-features :value :NA) (mm/replace-missing numeric-features :midpoint) (fn [ctx] (assoc ctx : train-data)) (mm/transform-one-hot cat-features :full) (mm/set-inference-target :SalePrice) #:metamorph{:id :model} (mm/model {:model-type :smile.regression/random-forest, :trees 1000})))
So running this:
(def times
  (doall
   (repeatedly 10 (fn [] (clerk/time-ms (clerk/show! "src/kaggle.clj"))))))
produces:
({:result nil, :time-ms 22870.775138}
 {:result nil, :time-ms 20775.000902}
 {:result nil, :time-ms 1289.373927}
 {:result nil, :time-ms 1285.455593}
 {:result nil, :time-ms 21002.457645}
 {:result nil, :time-ms 20808.522161}
 {:result nil, :time-ms 20450.540389}
 {:result nil, :time-ms 1207.209592}
 {:result nil, :time-ms 20773.024323}
 {:result nil, :time-ms 1282.546341})
4 times fast (from cache) and 6 times slow (Clerk re-evaluated the slow form)
We are getting somewhere...
(repeatedly
 10
 (fn []
   (->
    (clerk/parse-file "src/kaggle.clj")
    (h/build-graph)
    h/hash
    (get 'kaggle/train))))
Returns different results:
;; => ("5dt6Bi1CeH3guTNdnCKCid5TB1hgwA"
;;     "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg"
;;     "5dt6Bi1CeH3guTNdnCKCid5TB1hgwA"
;;     "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg"
;;     "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg"
;;     "5ds3nXtgTQ4aUw3xt4KiM41tQbJUsE"
;;     "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg"
;;     "5dt6Bi1CeH3guTNdnCKCid5TB1hgwA"
;;     "5duH7vdgoEpu9E36jc5fxzWJhMJ7Xg"
;;     "5ds3nXtgTQ4aUw3xt4KiM41tQbJUsE")
Very interesting.... I came this far:
(-> (clerk/parse-file "src/kaggle.clj")
    (h/build-graph)
    :graph
    :dependencies
    (get 'kaggle/r-object))
;; => #{clojure.core/ex-info
;;      clojure.string/split
;;      kaggle/base-url
;;      clojure.core/first
;;      clojure.lang.RT/nth
;;      opencpu-clj.ocpu/object
;;      clojure.lang.Numbers/gt}
'kaggle/r-object changes its hash, even though its only deps are a constant string, kaggle/base-url, and symbols that come from jars.
So the hash should be constant, shouldn't it?
One thing ...
For one symbol no location was found ...
(h/find-location 'opencpu-clj.ocpu/object) -> nil
this should not be
Is it this "-" in the ns ?
"_" vs "-"
here we go !
(h/ns->jar (namespace 'opencpu_clj.ocpu/object)) -> working
(h/ns->jar (namespace 'opencpu-clj.ocpu/object)) -> nil
This fix is needed in my view:
(defn ns->path [ns]
  (str/replace (str/replace (str ns) "-" "_") "." fs/file-separator))
So the jar now gets found for the opencpu-clj namespace.
Now all deps above have a location.
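The renaming above follows the usual Clojure convention for mapping a namespace name to a file or jar entry: dashes become underscores and dots become path separators. A minimal sketch of that mapping, using a hypothetical helper name ns->resource-path (this is not Clerk's actual code) and clojure.core/namespace-munge for the dash replacement:

```clojure
(require '[clojure.string :as str])

(defn ns->resource-path
  "Hypothetical helper: turns a namespace symbol into the relative path
  under which its source would be looked up on the classpath."
  [ns-sym]
  (-> (str ns-sym)
      namespace-munge          ; replaces every "-" with "_"
      (str/replace "." "/")))  ; one directory per namespace segment

(ns->resource-path 'opencpu-clj.ocpu)
;; => "opencpu_clj/ocpu"
```

The sketch uses "/" because jar entries are addressed with forward slashes on every platform; whether that or the platform file separator is the right choice for Clerk's lookup is an assumption here.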
But the hash of kaggle/r-object is nevertheless sometimes different.
-> The hash of one of the deps must be different "sometimes", as the code text is constant!
Where's the r-object coming from? I don't see it in https://github.com/behrica/kaggleHP/blob/main/src/kaggle.clj
I will try to commit later:
(defn r-object [library function params]
  (let [resp (ocpu/object base-url :library library :R function params)]
    (when (> (:status resp) 201) (throw (ex-info "error" resp)))
    (-> resp
        :result
        first
        (str/split #"/")
        (nth 3))))
It only depends on one more object:
(def base-url "")
I tried everything, but I don't understand it. All inputs to the hashing are constant, and the hashing itself seems to be deterministic as well.
I did not find out why it fails, but at least I found which line makes it fail:
(str/split #"/")
So replacing the above definition with:
(def re #"/")
(defn r-object [library function params]
  (let [resp (ocpu/object base-url :library library :R function params)]
    (when (> (:status resp) 201) (throw (ex-info "error" resp)))
    (-> resp
        :result
        first
        (str/split re)
        (nth 3))))
makes it work.
This small example reproduces it.
There is something strange about the re
(ns kaggle
  (:require [nextjournal.clerk :as clerk]
            [nextjournal.clerk.viewer :as v]))
(def re #"/")
(defn use-re--working-always [s]
  (println "very slow !! - cache working always")
  (Thread/sleep 5000)
  (clojure.string/split s re))
(defn use-re--failing-sometimes [s]
  (println "very slow !! -- cache failing sometimes ")
  (Thread/sleep 5000)
  (clojure.string/split s #"/"))
(def splitted (use-re--working-always "hello/clerk"))
(comment
  (def times
    (do
      (clerk/clear-cache!)
      (doall
       (repeatedly 100 (fn [] (clerk/time-ms (clerk/show! "src/buggy.clj")))))))
  :ok)
;; using `use-re--failing-sometimes`, the cache is missed 3 times
(->> times (map :time-ms) (filter #(> % 5000)))
;; => (5118.088344
;;     5112.789448
;;     5111.539959)
;; using `use-re--working-always`, the cache is always hit
(->> times (map :time-ms) (filter #(> % 5000)))
;; => ()
Here the "frequency" of not caching is very low, 3 out of 1000. In my real code it is higher, every 5th run, I would say.
(require '[nextjournal.clerk.hashing :as h])
(into []
      (comp (map (comp set vals h/hash h/build-graph h/parse-clojure-string))
            (distinct))
      (repeat 100 "(fn [x] (clojure.string/split x #\"/\"))"))
(frequencies (into []
                   (comp (map (comp vals h/hash h/build-graph h/parse-clojure-string)))
                   (repeat 1000 "(fn [x] (clojure.string/split x #\"/\"))")))
;;=> {("5dtVT9HVuLAa4BCJBWwz9KQ2b9u37n" "5drKysDWADNZJWwWfgWYs6TETrz7zx") 946,
;;    ("5dtVT9HVuLAa4BCJBWwz9KQ2b9u37n" "5dt92a8LBSfEScpUk3X7Y1L3vBDKJm") 54}
Indeed interesting.
But I thought that Clerk uses the "text representation" via prn-str for hashing.
For hashing, yes, but as nodes in the dep graph it uses either var names or, for top-level expressions, the read form.
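One language detail that may matter when read forms serve as graph nodes: a Clojure regex literal is read as a java.util.regex.Pattern, and Pattern compares by identity, not by value. Two reads of the same source text therefore print identically but are not =. The sketch below only demonstrates that behaviour in plain Clojure; that it is what destabilises the hash inside Clerk is an assumption, not something confirmed in this thread.

```clojure
;; The same source text as in the reproduction above.
(def src "(fn [x] (clojure.string/split x #\"/\"))")

;; Each read creates a fresh Pattern object for the regex literal.
(def form-a (read-string src))
(def form-b (read-string src))

;; The printed representation is identical ...
(= (pr-str form-a) (pr-str form-b))
;; => true

;; ... but the forms are not equal, because Pattern does not
;; implement value-based equals.
(= form-a form-b)
;; => false

;; As map keys or set members they are therefore two distinct entries.
(count (set [form-a form-b]))
;; => 2
```

This would also be consistent with the workaround of moving the regex into its own var: the function body then contains only the symbol re, which compares by value.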
Looks good to me 👍
Clerk evaluated 'src/kaggle.clj' in 373.011182ms.
Clerk evaluated 'src/kaggle.clj' in 408.822775ms.
Clerk evaluated 'src/kaggle.clj' in 454.97627ms.
Clerk evaluated 'src/kaggle.clj' in 529.977895ms.
Clerk evaluated 'src/kaggle.clj' in 506.647304ms.
Clerk evaluated 'src/kaggle.clj' in 437.870516ms.
Clerk evaluated 'src/kaggle.clj' in 466.663666ms.
Clerk evaluated 'src/kaggle.clj' in 431.667927ms.
Clerk evaluated 'src/kaggle.clj' in 488.64744ms.
Clerk evaluated 'src/kaggle.clj' in 449.524183ms.
I am trying to make a viewer for a MD dataset and started with something like:
(clerk/set-viewers! [{:pred tc/dataset?
                      :render-fn (quote v/table-viewer)
                      :transform-fn (fn [x]
                                      (cons (tds/column-names x)
                                            (into [] (tds/rowvecs x))))}])
The transform-fn transforms the dataset into something the table viewer of Clerk understands, but it does not work.
@carsten.behring something like this should work
(clerk/set-viewers! [{:pred tc/dataset?
                      :transform-fn #(hash-map :nextjournal/value
                                               (clerk/table {:head (tds/column-names %)
                                                             :rows (tds/rowvecs %)}))}])
this is what I used to test it
^{::clerk/viewer {:transform-fn #(hash-map :nextjournal/value
                                           (clerk/table {:head (first %) :rows (rest %)}))}}
[[:a :b] [:c :d]]
@carsten.behring on latest main, this can be simplified to
{:pred tc/dataset?
:transform-fn #(clerk/table {:head (tds/column-names %)
:rows (tds/rowvecs %)})}
OK, I will try it. 👍 It would be good to have documentation on the viewer API somewhere at some point; it is for sure the most complex public surface of Clerk.
Yes, great !
Do you think there is any way to get this working with Clerk? https://datatables.net/