Hi Everyone,
I'm running into a weird issue with Clerk that I can't make sense of. I'd guess it's a relatively common use case, so maybe I can get some pointers here. This is Python interop via libpython-clj2: I'm using the Python datasets library to grab some data from the HuggingFace data hub and caching it via defonce. That works fine initially (see screenshot).
But as soon as a re-evaluation happens (like a file save while watching), it blows up; it seems to be trying to parse some license text as Clojure for some reason.
Unhandled clojure.lang.ExceptionInfo
Invalid symbol: .
The actual change is completely irrelevant (like adding whitespace in a comment, or whatever).
After this, when I comment out the line using dataset, the notebook recovers and displays again, but uncommenting leads to the same error again.
Clearing the cache, halting, and restarting via clear-cache!, halt!, and serve! doesn't help, but a full JVM restart resets the situation, meaning the first evaluation via Clerk is successful again.
Note that everything is working as it should in the actual REPL, it's just Clerk that gets confused.
Tried with Clerk 0.14.919 and 0.15.957. This is Clojure 1.11.1 on Java 19.0.1.
Has anyone run into this before?
The code for reference:
(ns dlai.clerk-error-minimal
  (:require [libpython-clj2.require :refer [require-python]]))
;; require python stuff
(require-python 'datasets)
;; load dataset from web
(defonce dataset (datasets/load_dataset "knkarthick/dialogsum"))
Your code defines the symbol datasets twice. The require-python does so and the def does so.
In general, Clerk does not always handle re-definition of symbols well. Try to use a different name for the def.
There may be slight differences between libpython-clj versions regarding the precise symbol created on
(require-python 'datasets)
As far as I know, both do nearly the same thing, namely creating a symbol called dlai.clerk-error-minimal/datasets
Which you can use to call the functions of the module.
So you only need one of the two. Probably the second alone works, as it would overwrite the first definition in any case (at least outside Clerk).
Thanks for looking into this, @carsten.behring. The require-python defines dataset*s* (plural), while the def defines dataset (singular), so there should be no collision / double def.
My own investigation seems to show that, depending on the way I do Python interop, some Python source files end up being processed by parse-file (https://github.com/nextjournal/clerk/blob/main/src/nextjournal/clerk/parser.cljc#L397). I haven't had more time to look into it, but I would definitely like to understand this more.
I see both in plural in your code
Note that using a different way of doing Python interop resolves the issue:
(ns dlai.clerk-error-minimal
  (:require [libpython-clj2.require :refer [require-python]]
            [libpython-clj2.python :as py]
            [dlai.core :as core]))
;; require python stuff
(require-python 'datasets)
(def datasets (py/import-module "datasets"))
(py/py.. datasets (load_dataset "knkarthick/dialogsum"))
Could you file an issue, ideally with a repro?
Sure thing
Show off: a notebook for using Python interop, exercising a large language model in Clerk: http://indolamine.io/assets/clerk/dlai-lab1.html
This is my first sizeable Clerk notebook, and I would appreciate any feedback (sorry, the source is not available yet, just this static build).