I'm working on improving the Java interop in SCI using type hints. Java interop is implemented using reflection in SCI, based off of clojure.lang.Reflector (with some minor changes). So far type hints weren't really used in reflection (clojure.lang.Reflector also does not use them, it just chooses the right method based on the arguments provided). But sometimes this is necessary to disambiguate. An example:
(def fut (let [^java.lang.Runnable f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
(.get fut)
This will return nil in Clojure. Why? Because f is type hinted as Runnable and java.util.concurrent.ExecutorService/.submit has an overload on Callable and Runnable . The return value of .get on the returned future will be affected by this. In the case of Runnable it is nil but in the case of Callable it will be the return value of the function.
When trying to make the https://github.com/potetm/fusebox library work with bb, some tests failed (similar to the above example, actual: nil, expected: 3) , because the wrong method was chosen. The order of methods found by reflection apparently isn't deterministic in graalvm native-image so this gave flaky failing tests in CI.
I've implemented a fix in SCI + bb in a branch. Right now, like Clojure, a program will fail if a type hint refers to a class that can't be found. So (let [^SomeClass x ...]) will fail if SomeClass isn't part of the bb binary. This is a breaking change (the type hint was mostly ignored so far in SCI), so I'm not sure if I should proceed down this path. The behavior is closer to JVM Clojure though. CLJS doesn't care about "invalid" type hints either.
The upside of failing on "invalid tags" is that it's easier to detect if bb will actually make use of the type hint. The downside is that existing programs that used to be able to run with bb will no longer run and I should add a class or two in the next release.
What do you think?🧵
What is the current behavior? I’m guessing it errors if you try to actually use a non-existent class, but as long as you avoid interop it works fine?
The current behavior is that the class isn't resolved since it was never used for reflection
I'm exclusively talking about classes mentioned in type hints now
not about (FooBar/foo), this always crashed (at analysis time)
K so if the program has an incorrect type hint, but the invoked method is correct, it works fine today but would break under this change?
correct
I needed to add these classes to bb:
java.time.chrono.ChronoLocalDate
java.time.temporal.TemporalUnit
java.time.chrono.ChronoLocalDateTime
java.time.chrono.ChronoZonedDateTime
java.time.chrono.Chronology])
to make CI still pass with the currently tested librariesbecause they are used in type hints
Perhaps I could just print a warning instead
but that would perhaps also be annoying for stuff that used to "just work"
Oh I see. It’s not just incorrect programs.
no, the reflection worked alright for pretty much everything, except the counter-example I came across in fusebox :)
all these years
I’m wondering if you couldn’t just drop the tag if you don’t know the class?
yes you can , but often you don't control the libraries you use, unless you fork them
and it would also be annoying if you still want the type hints for clj JVM usage
clj-kondo will warn though
but just for typos, it doesn't know if a class is part of bb
I mean can sci be made to work like that? Drop the tag if the class isn’t available?
That way it falls back to current reflection behavior.
Not sure what I think about it, but it might be a way to maintain backward compatibility.
if you mean by "drop the tag" is "do nothing with it, just ignore", then yes, it could be made so, that's what my question is about basically, if I should preserve that behavior
exactly, the question is about backward compatibility
backward compatibility sounds good right? :)
but you do get silent failures like in the fusebox test suite
so it would be nice to actually get some feedback on it somehow
I'm guessing the downside to just including the entire Java standard library is binary size?
hehe yes, definitely
I'm curious what the size diff is. The road you're on definitely ends with you including the whole thing yeh? Where's the line I wonder?
I don't think bb will have the entire java stdlib... but either way, I don't want to decide on that right now
Well, if it were me, and I was using a library where I had to avoid certain paths, I would probably want to know that. My vote is for a warning.
I think so too. bb already has a --debug flag so I can make the warnings part of that flow
this way you won't be annoyed for stuff that just works
Might be helpful if you posted a pared down version like, "bb can now detect when type-hint classes are not on the classpath. To date, bb would just ignore this and execute what it could. However, assuming the type-hint is correct, it means there there are paths that would error if they were invoked. I know of a few libraries where this was happening. What behavior would you rather see in this case? 1. Fail (breaking change) 2. Warn (maybe annoying?) 3. Silently ignore (current behavior)" Up to you. It just took me a while to figure out exactly what the deal was, so I thought I'd take a shot at a more poll-like question for you.
My harder to read text also filters out people blindly voting for one option without knowing why this change was actually necessary :)
but I agree I could have written it more terse
blindly voting for 2 (just kidding)... I like the idea of warning in order to not break existing code, but to sort of "best effort" mimic clj's behavior. I do somewhat worry that there's an outside chance it spawns more "can this class be added to bb so that this one edge case in this one script doesn't show a warning?", and then that class ends up pulling in two other classes, and it adds like 3MB to the executable size for what might be an esoteric situation that's not actually breaking anything. But that might just be my bias of wanting bb to stay small and focused on CLI-ish stuff.
yeah exactly. I could also go with 3 for now and wait until it actually becomes annoying to not have these warnings and then build them
it's not like it has to be done all in the next release I guess
I think there's a gap in my understanding somewhere... the change to sci would be noticing/honoring type hints in reflector, and the decision/dilemma is what to do when it can't honor those hints because the class isn't available. But I think I understand from the initial message that clj's reflector doesn't need to do this... does that mean that the JVM's overload selection is more precise/accurate than what ends up in a native image? I guess what I'm missing is: if clj isn't flaky on overloads, but it also doesn't use type hints in the reflector, why? Is it a more dynamic decision in the JVM than is available in native-image? Don't feel like you need to answer that if it's too much; I might just be having Friday-brain or something.
clj is just as flaky when reflecting, it just takes the first applicable method, then keeps searching for "better" (more specific) methods and returns the one it found doing that. But since Runnable and Callable are not related to each other, it just takes whatever overload fits fn (which implements both of these interfaces). The thing that is more flaky in native-image is that the list of methods isn't deterministic. sometimes the Callable method is before the Runnable or vice versa.
Oh wait, I guess clj isn't as flaky as I thought (it doesn't fall back on runtime reflection here)
user=> (let [fut (let [f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))] (.get fut))
Syntax error (IllegalArgumentException) compiling . at (REPL:1:30).
More than one matching method found: submitIt does accept this one though:
user=> (let [fut (let [f (identity (fn [] 3))] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))] (.get fut))
Reflection warning, NO_SOURCE_PATH:1:41 - call to method submit on java.util.concurrent.ExecutorService can't be resolved (argument types: unknown).
Reflection warning, NO_SOURCE_PATH:1:108 - reference to field get can't be resolved.
3in this case, the return value nil could also have occurred
At least this fails now:
user=> (def fut (let [^String f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
No matching method submit found taking 1 args for class java.util.concurrent.ThreadPoolExecutor
(I type hinted the function as a string, which makes this fail)
(previously this would work in bb, since type hints weren't used)
Clojure does accept this though:
user=> (def fut (let [^String f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
#'user/fut
I guess it's stubborn because it knows the return type of fnso in this case it ignores the faulty type hint
ah well
huh, it even accepts this:
user=> (def fut (let [^String f (identity (fn [] 3))] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
#'user/futlol... I was just gonna say "that seems to work in clj"
why does it work though
user=> (defn f [^String f] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))
Reflection warning, NO_SOURCE_PATH:1:21 - call to method submit on java.util.concurrent.ExecutorService can't be resolved (argument types: java.lang.String).
#'user/fso in this case it doesn't crash, only warn
this is getting complicated ;)
alright, I think I'll go with the minimal fix that makes the interop choose the right method and nothing else for now, the rest will come some other time
if you're curious about the SCI PR: https://github.com/babashka/sci/pull/960/files
for a moment I thought: o no, despite my fixes the problem persists...
https://app.circleci.com/pipelines/github/babashka/sci/3640/workflows/0fde473d-f1c3-4dfe-9f52-e3cca507ea35/jobs/14972?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary&invite=true#step-104-18391_48
but it turned out I needed to write Callable fully qualified since SCI doesn't resolve it to java.util.concurrent.Callable without a proper configuration
but this was actually a case where the non-determinism also showed in a normal JVM
O no..
FAIL in (cached-thread-pool) (interop_test.clj:247)
expected: (= 3 (bb nil "(import '(java.util.concurrent Executors ExecutorService))\n (let [fut (.submit ^ExecutorService (Executors/newCachedThreadPool)\n ^Callable (fn [] 3))]\n (.get fut))"))
actual: (not (= 3 nil))
:-S :https://github.com/babashka/babashka/actions/runs/14667591492/job/41166154642
also failing here: https://app.circleci.com/pipelines/github/babashka/babashka/7476/workflows/3184dd5d-5776-4166-937a-e2a7fa4d26e9/jobs/40933 facepalm
I'm debugging again
got it, there's something about not preserving a tag that's put on a function.
(prn :arg-meta arg-meta :arg (str arg) :type (type arg))
phew, luckily it's not some random thing, but the SCI test suite didn't catch it, this is just another edge case I didn't test in SCIso, luckily I have something to do tomorrow to finish this up ;)
IME, nothing uncovers bugs/missing tests/mistaken assumptions like trying to add a feature 🙂
expected: (= [:something :dangerous] (bw/bulwark spec (Thread/sleep 100) [:something :dangerous]))
actual: (not (= [:something :dangerous] nil))
I'm still left with this failing test, even though I have tests for the Runnable/Callable interop in place, so that can no longer be it... ... I think. This fails in CI sometimes (mostly on macos and windows), can't reproduce locally yet.bingo, I can reproduce it:
$ BABASHKA_TEST_ENV=native script/run_lib_tests com.potetm.fusebox.bulwark-test
Testing com.potetm.fusebox.bulwark-test
FAIL in (bulwark-test) (/Users/borkdude/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulwark_test.clj:12)
it works
expected: (= [:something :dangerous] (bw/bulwark spec (Thread/sleep 100) [:something :dangerous]))
actual: (not (= [:something :dangerous] nil))
Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
{:test 1, :pass 1, :fail 1, :error 0}can confirm that using this binary, the interop works as it should
possible another edge case
another edge case for sure. When change the locals in the interop expression form to:
^Callable f (bound-fn* f)
fut (.submit ^ExecutorService @timeout-threadpool
f)
instead of:
fut (.submit ^ExecutorService @timeout-threadpool
^Callable (bound-fn* f))
then it starts to work again. Phew... it's reproducible.lol, ok, this works without bound-fn* which I had in my test:
(let [fut (.submit ^ExecutorService (Executors/newCachedThreadPool)
^Callable (bound-fn* (fn [] 3)))]
(prn (.get fut)))fixed: https://github.com/babashka/sci/commit/1c750bdeacaa966d56038f13d7362f42c47db4f4?w=1
merged to bb master now 😅
hmm, these failed in one CI step:
FAIL in (bulkhead-test) (/Users/runner/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulkhead_test.clj:10)
it works
expected: (= ExceptionInfo (type ex))
actual: (not (= clojure.lang.ExceptionInfo clojure.lang.Keyword))
FAIL in (bulkhead-test) (/Users/runner/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulkhead_test.clj:10)
it works
expected: (= "fusebox timeout" (ex-message ex))
actual: (not (= "fusebox timeout" nil))
Might be flakiness? Not sure about these.I'm pretty sure it's not the interop problem anymore since these are now all tested and the tests that would be affected by that do pass
alright, I assume this is legit flakiness, since I downloaded a bb binary from a build where the test was failing with these messages and when I run it locally, all tests pass
CI is all green now. Aaaaah.
Ok yeah. I’ve never seen that one before flaky, but if the futures aren’t submitted fast enough that would cause what you see here.
*be flaky before
this could be due to SCI not being as fast as compiled Clojure
can't reproduce locally on m1 air here, but it tends to only happen in Rosetta2 Mac CI and Windows Github CI
I disabled those tests now
This is the list of disabled + enabled tests that I don't have problems with so far:
[#_com.potetm.fusebox.bulkhead-test
com.potetm.fusebox.bulwark-test
com.potetm.fusebox.circuit-breaker-test
com.potetm.fusebox.fallback-test
com.potetm.fusebox.memoize-test
#_com.potetm.fusebox.rate-limit-test
com.potetm.fusebox.registry-test
com.potetm.fusebox.retry-test
#_com.potetm.fusebox.timeout-test]yep those are the ones that are heavily timing dependent lol
can I get this version via brew? once I can run this locally, I can maybe adjust timings to be more reliable.
just a minute, I've just pulled the trigger on a release
I'll notify you
of course I needed to tweak some release script just before release which is why I'm now missing half of my release assets. hold on.
I made this conditional since I'm running some script twice to be able to download the failed bb binary for testing from the artifacts:
if [ "$BABASHKA_RELEASE" = "true" ]; then
./bb --config .build/bb.edn --deps-root . release-artifact "/tmp/release/$archive"
fi
but apparently
- run:
name: Release
command: .circleci/script/release
- persist_to_workspace:
root: /tmp
paths:
- release
- run:
name: Run tests
command: |-
script/test
script/run_lib_tests
- run:
name: Release + publish
command: BABASHKA_PUBLISH=true .circleci/script/release
the last call to the release script, doesn't do what I want.apparently circleci has a weird way of dealing with env vars or so https://circleci.com/docs/set-environment-variable/