Fork me on GitHub
#babashka
<
2023-07-22
>
timo13:07:49

As I am writing a Datahike pod I tried out how the error handling works and I now got a problem that I don't know how to solve. Based on bad input to a function it throws an exception but this exception seems not to be handled well in babashka or is it something I need to fix? https://app.circleci.com/pipelines/github/TimoKramer/datahike/458/workflows/0bb82d13-49ce-4f24-90bb-053dc6721dca/jobs/2251 https://github.com/TimoKramer/datahike/blob/feat/babashka-pod/bb/resources/native-image-tests/run-bb-pod-tests.clj#L63 running locally with bb ni-cli && ./bb/resources/native-image-tests/run-bb-pod-tests.clj

borkdude13:07:00

You're supposed to handle exceptions yourself in the pod loop and send an exception response

borkdude13:07:29

I think it's documented in the README of the pods project. Let me know it isn't. There's also tests for it

timo13:07:25

I think I got it as advertised, basically copied it over from other pods: https://github.com/TimoKramer/datahike/blob/feat/babashka-pod/src/datahike/pod.clj#L239-267

borkdude13:07:29

yeah. is there a question left at this point?

timo13:07:24

Just thought this is a problem in bb but if you're sure it's not I guess you're right:

#error {
 :cause "clojure.lang.PersistentVector cannot be cast to byte[]"
 :via
 [{:type java.lang.ClassCastException
   :message "clojure.lang.PersistentVector cannot be cast to byte[]"
   :at [babashka.pods.impl$bytes__GT_string invokeStatic "impl.clj" 28]}]
 :trace
 [[babashka.pods.impl$bytes__GT_string invokeStatic "impl.clj" 28]
  [babashka.pods.impl$processor invokeStatic "impl.clj" 190]
  [babashka.pods.sci$load_pod$fn__27430 invoke "sci.clj" 122]
  [sci.impl.vars$binding_conveyor_fn$fn__424 invoke "vars.cljc" 133]
  [clojure.core$binding_conveyor_fn$fn__5823 invoke "core.clj" 2047]
  [clojure.lang.AFn call "AFn.java" 18]
  [java.util.concurrent.FutureTask run "FutureTask.java" 264]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1128]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 628]
  [java.lang.Thread run "Thread.java" 829]
  [com.oracle.svm.core.thread.PlatformThreads threadStartRoutine "PlatformThreads.java" 705]
  [com.oracle.svm.core.posix.thread.PosixPlatformThreads pthreadStartRoutine "PosixPlatformThreads.java" 202]]}

borkdude13:07:13

doesn't ring a bell. what I would do is debug with the JVM pods library. perhaps you can do the same

timo13:07:41

alright, I'll try this

borkdude13:07:25

if you aren't aware, I'd also recommend running your pod in a JVM, so you don't have to graal compile it every time you make a change, this iterates much faster

timo13:07:48

right, that makes sense

borkdude13:07:09

I can take a look later tonight, if you haven't found the issue

timo15:07:31

hm, I tried a few things, but couldn't find the problem. It think it's the input to the bytes->string -fn... 🤷

timo15:07:51

would be great if you took a look

borkdude15:07:13

have you tried debugging what goes into the bytes->string function?

borkdude15:07:30

apparently there is a "value" in the response, else it couldn't have ended up there

borkdude15:07:51

just try printing what goes into that function and report back. check out the pods library using :local/root and make a println change

👍 2
timo15:07:24

that's what I am doing 🙂

borkdude15:07:53

cool. what does it print?

timo15:07:42

so the function receives an empty persistenvector presumably from the ex-message part of the processor

borkdude15:07:06

why do you think it hits that path? don't the locations you pasted above show that it's this codepath? https://github.com/babashka/pods/blob/8b717eb001811bc5da5d15d1163565de00b4ffa4/src/babashka/pods/impl.clj#L190-L193

borkdude15:07:19

that is even before it processes ex-message, etc

timo15:07:34

that's what I thought but I couldn't find the problem there since it has the some->> macro in there

timo15:07:51

but of course you're right

borkdude18:07:49

BABASHKA_POD=true "clojure" "-M" "-m" "datahike.cli"
Execution error (ClassNotFoundException) at java.net.URLClassLoader/findClass (URLClassLoader.java:445).
datahike.java.IEntity

borkdude18:07:29

then I did:

bb jcompile

borkdude18:07:37

new error:

$ BABASHKA_POD=true "clojure" "-M" "-m" "datahike.cli"
Execution error (FileNotFoundException) at datahike.pod/eval37638$loading (pod.clj:1).
Could not locate bencode/core__init.class, bencode/core.clj or bencode/core.cljc on classpath.

Full report at:
/var/folders/j9/xmjlcym958b1fr0npsp9msvh0000gn/T/clojure-14533572372815876965.edn

borkdude18:07:19

I'm willing to take another look if you can set up stuff in a way that I can debug with the pods library and have clear instructions how to start

borkdude18:07:50

One thing I changed in run-bb-pod-tests was:

(pods/load-pod ["clojure" "-M" "-m" "datahike.cli"])

👍 2
timo19:07:50

I created an alias so it's

BABASHKA_POD=true "clojure" "-M:pod" "-m" "datahike.cli"
now

timo19:07:27

thanks for trying

borkdude10:07:56

$ ./bb/resources/native-image-tests/run-bb-pod-tests.clj

Testing user

Ran 1 tests containing 20 assertions.
0 failures, 0 errors.

borkdude13:07:05

so it works when running the pod in a JVM. Could be a reflection issue perhaps? It might be worth fixing these:

$ BABASHKA_POD=true "clojure" "-M:pod" "-m" "datahike.cli"
:yolo
Reflection warning, superv/async.cljc:834:40 - reference to field getTime on java.lang.Object can't be resolved.
Reflection warning, datahike/array.cljc:45:13 - call to static method compare on java.util.Arrays can't be resolved (argument types: unknown, unknown).
Reflection warning, datahike/db/transaction.cljc:231:32 - reference to field e can't be resolved.
Reflection warning, datahike/query_stats.cljc:6:11 - call to java.math.MathContext ctor can't be resolved.

borkdude17:07:13

borkdude@m1 /tmp/datahike (feat/babashka-pod?) $ ./bb/resources/native-image-tests/run-bb-pod-tests.clj
----- Error --------------------------------------------------------------------
Type:     java.lang.NullPointerException
Location: /private/tmp/datahike/./bb/resources/native-image-tests/run-bb-pod-tests.clj:7:1

----- Context ------------------------------------------------------------------
 3: (require '[babashka.pods :as pods]
 4:          '[clojure.test :refer [run-tests deftest testing is]])
 5: (import '[java.util Date])
 6:
 7: (pods/load-pod ["clojure" "-M:pod" "-m" "datahike.cli"])
    ^---
 8:
 9: (require '[datahike.pod :as d])
10:
11: (def config {:keep-history? true,
12:              :search-cache-size 10000,

----- Stack trace --------------------------------------------------------------
babashka.pods.impl/bytes->string      - <built-in>
babashka.pods.impl/describe->metadata - <built-in>
babashka.pods.impl/load-pod           - <built-in>
babashka.pods.sci/load-pod/fn--28088  - <built-in>
babashka.pods.sci/load-pod            - <built-in>
clojure.core/apply                    - <built-in>
babashka.impl.pods/load-pod           - <built-in>
user                                  - /private/tmp/datahike/./bb/resources/native-image-tests/run-bb-pod-tests.clj:7:1

borkdude@m1 /tmp/datahike (feat/babashka-pod?) $ Reflection warning, superv/async.cljc:834:40 - reference to field getTime on java.lang.Object can't be resolved.
Reflection warning, datahike/array.cljc:45:13 - call to static method compare on java.util.Arrays can't be resolved (argument types: unknown, unknown).
Reflection warning, datahike/db/transaction.cljc:231:32 - reference to field e can't be resolved.
Reflection warning, datahike/query_stats.cljc:6:11 - call to java.math.MathContext ctor can't be resolved.
:writing {"format" "transit+json", "namespaces" [{"name" "datahike.pod", "vars" [{"name" "delete-database"} {"name" "pull"} {"name" "entity"} {"name" "metrics"} {"name" "datoms"} {"name" "q"} {"name" "schema"} {"name" "pull-many"} {"name" "since"} {"name" "create-database"} {"name" "db"} {"name" "connect"} {"name" "history"} {"name" "database-exists?"} {"name" "transact"} {"name" "as-of"} {"name" "release-db"} {"name" "db-with"} {"name" "with-db", "code" "(defmacro with-db [bindings & body]\n                (cond\n                  (= (count bindings) 0) `(do ~@body)\n                  (symbol? (bindings 0)) `(let ~(subvec bindings 0 2)\n                                            (try\n                                              (with-db ~(subvec bindings 2) ~@body)\n                                              (finally\n                                                (release-db ~(bindings 0)))))\n                  :else (throw (IllegalArgumentException.\n                                 \"with-db only allows Symbols in bindings\"))))"}]}], "id" "42aaf0eb-4ba1-43b4-b459-0929172b81e5", "ops" {"shutdown" {}}}

2
borkdude09:07:33

It seems I'm getting an NPE during the "describe" payload

borkdude09:07:44

You don't get this locally?

timo09:07:01

no, I'm sorry

timo09:07:28

you're just running the tests?

timo09:07:52

what jvm version are you using?

borkdude09:07:09

ah wait, now it works, I had some reflection warning stuff in place in user.clj

borkdude09:07:21

I recommend fixing those as graal doesn't work under reflection warnings in clojure

timo09:07:52

Yes, we have to do that

timo09:07:30

yeah sorry, disable the debug in datahike.pod-ns. The NPE is what I expect in the test. What I am getting back with java-17-graalvm is

timo@Zockolette> ./bb/resources/native-image-tests/run-bb-pod-tests.clj                                                                                                                                                                                                                      ~/projects/datahike

Testing user

ERROR in (pod-workflow) (/home/timo/projects/datahike/./bb/resources/native-image-tests/run-bb-pod-tests.clj:22)
transact with bad arg
expected: (= [:tempids :db-before :db-after :tx-meta :tx-data] (keys (d/transact "foo" [{:name "Alice", :age 20} {:name "Bob", :age 30} {:name "Charlie", :age 40} {:age 15}])))
  actual: clojure.lang.ExceptionInfo: Cannot invoke "java.util.concurrent.Future.get()" because "fut" is null
{:argument-type "", :type "class java.lang.NullPointerException"}
 at babashka.pods.impl$processor.invokeStatic (impl.clj:219)
    babashka.pods.sci$load_pod$fn__28022.invoke (sci.clj:122)
    sci.impl.vars$binding_conveyor_fn$fn__440.invoke (vars.cljc:133)
    clojure.core$binding_conveyor_fn$fn__5823.invoke (core.clj:2047)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:317)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    java.lang.Thread.run (Thread.java:1589)
    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine (PlatformThreads.java:775)
    com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine (PosixPlatformThreads.java:203)

Ran 1 tests containing 21 assertions.
0 failures, 1 errors.

borkdude09:07:24

can you maybe make a separate branch if this is easier? and then I'll check that one out and run

timo09:07:45

debugging is now disabled

borkdude09:07:53

ok, this is what I'm seeing:

ERROR in (pod-workflow) (/private/tmp/datahike/./bb/resources/native-image-tests/run-bb-pod-tests.clj:22)
transact with bad arg
expected: (= [:tempids :db-before :db-after :tx-meta :tx-data] (keys (d/transact "foo" [{:name "Alice", :age 20} {:name "Bob", :age 30} {:name "Charlie", :age 40} {:age 15}])))
  actual: clojure.lang.ExceptionInfo: Cannot invoke "java.util.concurrent.Future.get()" because "fut" is null
{:argument-type "", :type "class java.lang.NullPointerException"}
 at babashka.pods.impl$processor.invokeStatic (impl.clj:219)
    babashka.pods.sci$load_pod$fn__28124.invoke (sci.clj:122)
    sci.impl.vars$binding_conveyor_fn$fn__440.invoke (vars.cljc:133)
    clojure.core$binding_conveyor_fn$fn__5823.invoke (core.clj:2047)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:317)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    java.lang.Thread.runWith (Thread.java:1636)
    java.lang.Thread.run (Thread.java:1623)
    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine (PlatformThreads.java:807)
    com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine (PosixPlatformThreads.java:210)

Ran 1 tests containing 21 assertions.
0 failures, 1 errors.

💯 2
timo09:07:21

I would expect to see the NPE from the debug output from your gist

timo09:07:57

and in the catch-block from my pod-impl I am not sending a value back as you are not doing either in your samples. so I don't get the exception message since there is a some->> and it should not get into the bytes->string -fn

borkdude09:07:50

this is not what I'm seeing here either. could there be laziness at play or so?

timo09:07:05

I wouldn't expect laziness in transact but I can check with @UB95JRKM3

borkdude09:07:50

This doesn't ring a bell, I'd have to dig into your implementation I guess

timo09:07:58

ok, I guess it's not worth it then.

timo09:07:29

thanks a lot for taking the time though

borkdude09:07:53

not worth what?

timo09:07:08

you investigating my implementation 🙂

timo09:07:22

when it's a problem with my impl

borkdude09:07:43

well, I don't know off hand, would need more time

borkdude09:07:57

which I don't have in the coming days probably :)

timo09:07:31

alright, I'll be happy if you do

borkdude09:07:37

you could try to throw a manual exception and see if that is handled correctly

borkdude09:07:44

instead of relying on the datahike stuff

borkdude09:07:01

to exclude any external factors

borkdude09:07:15

When you feed "foo" into this function:

(defn transact [conn arg-map]
  (let [c (get @conns conn)
        {:keys [db-before db-after tx-meta tx-data tempids]} (d/transact c arg-map)]
    (-> {:tempids tempids}
        (assoc :db-before (select-keys db-before [:max-tx :max-eid]))
        (assoc :db-after (select-keys db-after [:max-tx :max-eid]))
        (assoc :tx-meta tx-meta)
        (assoc :tx-data (map seq tx-data)))))

borkdude09:07:33

then c is null. Could it be that the Cannot invoke "java.util.concurrent.Future.get()" because "fut" is null is coming from that?

timo09:07:37

Yes, that should be the error.

timo09:07:56

I can try some exceptions manually

borkdude09:07:02

so what is then the problem?

borkdude09:07:58

if you transact with a bad argument, why don't you expect an exception?

(testing "transact with bad arg"
      (is (= [:tempids :db-before :db-after :tx-meta :tx-data]
             (keys (d/transact
                     "foo"
                     [{:name  "Alice", :age   20}
                      {:name  "Bob", :age   30}
                      {:name  "Charlie", :age   40}
                      {:age 15}])))))

timo09:07:29

I do expect an exception but I don't understand why I am getting the stacktrace from bb/sci

borkdude09:07:57

the test seems to suggest you just get some transaction data back, rather than an exception?

borkdude10:07:41

I don't get a stacktrace from bb/sci, I just get a failing test, like expected, I would say

timo10:07:34

hmm, so this is what you want to see when a test is failing?

at babashka.pods.impl$processor.invokeStatic (impl.clj:219)
    babashka.pods.sci$load_pod$fn__28022.invoke (sci.clj:122)
    sci.impl.vars$binding_conveyor_fn$fn__440.invoke (vars.cljc:133)
    clojure.core$binding_conveyor_fn$fn__5823.invoke (core.clj:2047)
    clojure.lang.AFn.call (AFn.java:18)
    java.util.concurrent.FutureTask.run (FutureTask.java:317)
    java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1144)
    java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:642)
    java.lang.Thread.run (Thread.java:1589)
    com.oracle.svm.core.thread.PlatformThreads.threadStartRoutine (PlatformThreads.java:775)
    com.oracle.svm.core.posix.thread.PosixPlatformThreads.pthreadStartRoutine 

timo10:07:52

ok, then it's my bad

borkdude10:07:12

yes, this is the exception thrown from the pod processor as this is is the exception you wanted to raise by sending exception data

borkdude10:07:34

so you write a test with a thrown? or thrown-with-msg?

timo10:07:51

that's what I would have done yes

borkdude10:07:21

I guess it's sorted then :)

timo10:07:35

oh man, sorry for keeping you busy

borkdude10:07:48

the internal pod failure we saw before isn't something I can reproduce here based on the borkdude branch

borkdude10:07:55

no worries!

borkdude10:07:05

if you have more questions, feel free to bug me. the better the repro, the faster I can provide you with more info

❤️ 2