babashka

borkdude 2025-04-25T10:06:54.634109Z

I'm working on improving the Java interop in SCI using type hints. Java interop is implemented using reflection in SCI, based off of clojure.lang.Reflector (with some minor changes). So far type hints weren't really used in reflection (clojure.lang.Reflector also does not use them, it just chooses the right method based on the arguments provided). But sometimes this is necessary to disambiguate. An example:

(def fut (let [^java.lang.Runnable f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
(.get fut)
This will return nil in Clojure. Why? Because f is type hinted as Runnable and java.util.concurrent.ExecutorService/.submit has an overload on Callable and Runnable . The return value of .get on the returned future will be affected by this. In the case of Runnable it is nil but in the case of Callable it will be the return value of the function. When trying to make the https://github.com/potetm/fusebox library work with bb, some tests failed (similar to the above example, actual: nil, expected: 3) , because the wrong method was chosen. The order of methods found by reflection apparently isn't deterministic in graalvm native-image so this gave flaky failing tests in CI. I've implemented a fix in SCI + bb in a branch. Right now, like Clojure, a program will fail if a type hint refers to a class that can't be found. So (let [^SomeClass x ...]) will fail if SomeClass isn't part of the bb binary. This is a breaking change (the type hint was mostly ignored so far in SCI), so I'm not sure if I should proceed down this path. The behavior is closer to JVM Clojure though. CLJS doesn't care about "invalid" type hints either. The upside of failing on "invalid tags" is that it's easier to detect if bb will actually make use of the type hint. The downside is that existing programs that used to be able to run with bb will no longer run and I should add a class or two in the next release. What do you think?

borkdude 2025-04-25T10:07:06.851679Z

🧵

2025-04-25T10:39:01.729459Z

What is the current behavior? I’m guessing it errors if you try to actually use a non-existent class, but as long as you avoid interop it works fine?

borkdude 2025-04-25T10:39:29.557639Z

The current behavior is that the class isn't resolved since it was never used for reflection

borkdude 2025-04-25T10:40:09.704869Z

I'm exclusively talking about classes mentioned in type hints now

borkdude 2025-04-25T10:40:23.282899Z

not about (FooBar/foo), this always crashed (at analysis time)

2025-04-25T10:42:07.535219Z

K so if the program has an incorrect type hint, but the invoked method is correct, it works fine today but would break under this change?

borkdude 2025-04-25T10:42:20.309009Z

correct

borkdude 2025-04-25T10:43:09.626069Z

I needed to add these classes to bb:

java.time.chrono.ChronoLocalDate
                java.time.temporal.TemporalUnit
                java.time.chrono.ChronoLocalDateTime
                java.time.chrono.ChronoZonedDateTime
                java.time.chrono.Chronology])
to make CI still pass with the currently tested libraries

borkdude 2025-04-25T10:43:23.224369Z

because they are used in type hints

borkdude 2025-04-25T10:43:59.985469Z

Perhaps I could just print a warning instead

borkdude 2025-04-25T10:44:30.834899Z

but that would perhaps also be annoying for stuff that used to "just work"

2025-04-25T10:45:35.871059Z

Oh I see. It’s not just incorrect programs.

borkdude 2025-04-25T10:46:02.726509Z

no, the reflection worked alright for pretty much everything, except the counter-example I came across in fusebox :)

borkdude 2025-04-25T10:46:06.214619Z

all these years

2025-04-25T10:46:21.356829Z

I’m wondering if you couldn’t just drop the tag if you don’t know the class?

borkdude 2025-04-25T10:46:36.103349Z

yes you can , but often you don't control the libraries you use, unless you fork them

borkdude 2025-04-25T10:46:52.901909Z

and it would also be annoying if you still want the type hints for clj JVM usage

borkdude 2025-04-25T10:47:15.389089Z

clj-kondo will warn though

borkdude 2025-04-25T10:47:30.622259Z

but just for typos, it doesn't know if a class is part of bb

2025-04-25T10:47:58.081409Z

I mean can sci be made to work like that? Drop the tag if the class isn’t available?

2025-04-25T10:48:22.834729Z

That way it falls back to current reflection behavior.

2025-04-25T10:48:45.161659Z

Not sure what I think about it, but it might be a way to maintain backward compatibility.

borkdude 2025-04-25T10:48:47.100929Z

if you mean by "drop the tag" is "do nothing with it, just ignore", then yes, it could be made so, that's what my question is about basically, if I should preserve that behavior

borkdude 2025-04-25T10:49:00.069289Z

exactly, the question is about backward compatibility

borkdude 2025-04-25T10:52:50.418599Z

backward compatibility sounds good right? :)

borkdude 2025-04-25T10:53:07.851789Z

but you do get silent failures like in the fusebox test suite

borkdude 2025-04-25T10:53:14.923129Z

so it would be nice to actually get some feedback on it somehow

2025-04-25T11:05:16.285759Z

I'm guessing the downside to just including the entire Java standard library is binary size?

borkdude 2025-04-25T11:05:31.637009Z

hehe yes, definitely

2025-04-25T11:06:18.132889Z

I'm curious what the size diff is. The road you're on definitely ends with you including the whole thing yeh? Where's the line I wonder?

borkdude 2025-04-25T11:07:00.008719Z

I don't think bb will have the entire java stdlib... but either way, I don't want to decide on that right now

2025-04-25T11:10:45.546079Z

Well, if it were me, and I was using a library where I had to avoid certain paths, I would probably want to know that. My vote is for a warning.

borkdude 2025-04-25T11:12:01.833089Z

I think so too. bb already has a --debug flag so I can make the warnings part of that flow

borkdude 2025-04-25T11:12:25.349869Z

this way you won't be annoyed for stuff that just works

2025-04-25T12:03:25.654689Z

Might be helpful if you posted a pared down version like, "bb can now detect when type-hint classes are not on the classpath. To date, bb would just ignore this and execute what it could. However, assuming the type-hint is correct, it means there there are paths that would error if they were invoked. I know of a few libraries where this was happening. What behavior would you rather see in this case? 1. Fail (breaking change) 2. Warn (maybe annoying?) 3. Silently ignore (current behavior)" Up to you. It just took me a while to figure out exactly what the deal was, so I thought I'd take a shot at a more poll-like question for you.

borkdude 2025-04-25T12:12:48.042449Z

My harder to read text also filters out people blindly voting for one option without knowing why this change was actually necessary :)

1
borkdude 2025-04-25T12:13:20.180279Z

but I agree I could have written it more terse

Bob B 2025-04-25T12:48:11.243259Z

blindly voting for 2 (just kidding)... I like the idea of warning in order to not break existing code, but to sort of "best effort" mimic clj's behavior. I do somewhat worry that there's an outside chance it spawns more "can this class be added to bb so that this one edge case in this one script doesn't show a warning?", and then that class ends up pulling in two other classes, and it adds like 3MB to the executable size for what might be an esoteric situation that's not actually breaking anything. But that might just be my bias of wanting bb to stay small and focused on CLI-ish stuff.

borkdude 2025-04-25T12:49:10.219519Z

yeah exactly. I could also go with 3 for now and wait until it actually becomes annoying to not have these warnings and then build them

borkdude 2025-04-25T12:49:43.627259Z

it's not like it has to be done all in the next release I guess

Bob B 2025-04-25T13:01:54.263999Z

I think there's a gap in my understanding somewhere... the change to sci would be noticing/honoring type hints in reflector, and the decision/dilemma is what to do when it can't honor those hints because the class isn't available. But I think I understand from the initial message that clj's reflector doesn't need to do this... does that mean that the JVM's overload selection is more precise/accurate than what ends up in a native image? I guess what I'm missing is: if clj isn't flaky on overloads, but it also doesn't use type hints in the reflector, why? Is it a more dynamic decision in the JVM than is available in native-image? Don't feel like you need to answer that if it's too much; I might just be having Friday-brain or something.

borkdude 2025-04-25T13:04:52.660069Z

clj is just as flaky when reflecting, it just takes the first applicable method, then keeps searching for "better" (more specific) methods and returns the one it found doing that. But since Runnable and Callable are not related to each other, it just takes whatever overload fits fn (which implements both of these interfaces). The thing that is more flaky in native-image is that the list of methods isn't deterministic. sometimes the Callable method is before the Runnable or vice versa.

borkdude 2025-04-25T13:06:42.277719Z

Oh wait, I guess clj isn't as flaky as I thought (it doesn't fall back on runtime reflection here)

user=> (let [fut (let [f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))] (.get fut))
Syntax error (IllegalArgumentException) compiling . at (REPL:1:30).
More than one matching method found: submit

borkdude 2025-04-25T13:07:33.425569Z

It does accept this one though:

user=> (let [fut (let [f (identity (fn [] 3))] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))] (.get fut))
Reflection warning, NO_SOURCE_PATH:1:41 - call to method submit on java.util.concurrent.ExecutorService can't be resolved (argument types: unknown).
Reflection warning, NO_SOURCE_PATH:1:108 - reference to field get can't be resolved.
3

borkdude 2025-04-25T13:07:57.402019Z

in this case, the return value nil could also have occurred

borkdude 2025-04-25T13:36:25.974839Z

At least this fails now:

user=> (def fut (let [^String f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
No matching method submit found taking 1 args for class java.util.concurrent.ThreadPoolExecutor

borkdude 2025-04-25T13:36:44.368029Z

(I type hinted the function as a string, which makes this fail)

borkdude 2025-04-25T13:37:25.083479Z

(previously this would work in bb, since type hints weren't used)

borkdude 2025-04-25T13:38:28.010429Z

Clojure does accept this though:

user=> (def fut (let [^String f (fn [] 3)] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
#'user/fut
I guess it's stubborn because it knows the return type of fn

borkdude 2025-04-25T13:38:46.261119Z

so in this case it ignores the faulty type hint

borkdude 2025-04-25T13:38:47.458209Z

ah well

borkdude 2025-04-25T13:39:12.297779Z

huh, it even accepts this:

user=> (def fut (let [^String f (identity (fn [] 3))] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f)))
#'user/fut

Bob B 2025-04-25T13:39:13.532389Z

lol... I was just gonna say "that seems to work in clj"

borkdude 2025-04-25T13:39:52.042949Z

why does it work though

borkdude 2025-04-25T13:40:42.738009Z

user=> (defn f [^String f] (.submit (java.util.concurrent.Executors/newCachedThreadPool) f))
Reflection warning, NO_SOURCE_PATH:1:21 - call to method submit on java.util.concurrent.ExecutorService can't be resolved (argument types: java.lang.String).
#'user/f

borkdude 2025-04-25T13:40:58.061629Z

so in this case it doesn't crash, only warn

borkdude 2025-04-25T13:41:06.373819Z

this is getting complicated ;)

borkdude 2025-04-25T13:43:56.489999Z

alright, I think I'll go with the minimal fix that makes the interop choose the right method and nothing else for now, the rest will come some other time

👍 2
borkdude 2025-04-25T14:35:59.169829Z

if you're curious about the SCI PR: https://github.com/babashka/sci/pull/960/files

borkdude 2025-04-25T14:55:49.834909Z

for a moment I thought: o no, despite my fixes the problem persists... https://app.circleci.com/pipelines/github/babashka/sci/3640/workflows/0fde473d-f1c3-4dfe-9f52-e3cca507ea35/jobs/14972?utm_campaign=vcs-integration-link&utm_medium=referral&utm_source=github-checks-link&utm_content=summary&invite=true#step-104-18391_48 but it turned out I needed to write Callable fully qualified since SCI doesn't resolve it to java.util.concurrent.Callable without a proper configuration but this was actually a case where the non-determinism also showed in a normal JVM

borkdude 2025-04-25T15:12:45.652819Z

O no..

FAIL in (cached-thread-pool) (interop_test.clj:247)
expected: (= 3 (bb nil "(import '(java.util.concurrent Executors ExecutorService))\n                    (let [fut (.submit ^ExecutorService (Executors/newCachedThreadPool)\n                       ^Callable (fn [] 3))]\n   (.get fut))"))
  actual: (not (= 3 nil))
:-S :

borkdude 2025-04-25T15:19:56.924279Z

I'm debugging again

borkdude 2025-04-25T15:23:30.406009Z

got it, there's something about not preserving a tag that's put on a function.

(prn :arg-meta arg-meta :arg (str arg) :type (type arg))
phew, luckily it's not some random thing, but the SCI test suite didn't catch it, this is just another edge case I didn't test in SCI

borkdude 2025-04-25T15:23:42.694199Z

so, luckily I have something to do tomorrow to finish this up ;)

Bob B 2025-04-25T15:30:41.554239Z

IME, nothing uncovers bugs/missing tests/mistaken assumptions like trying to add a feature 🙂

borkdude 2025-04-26T09:01:27.938979Z

expected: (= [:something :dangerous] (bw/bulwark spec (Thread/sleep 100) [:something :dangerous]))
  actual: (not (= [:something :dangerous] nil))
I'm still left with this failing test, even though I have tests for the Runnable/Callable interop in place, so that can no longer be it... ... I think. This fails in CI sometimes (mostly on macos and windows), can't reproduce locally yet.

borkdude 2025-04-26T09:04:30.022089Z

bingo, I can reproduce it:

$ BABASHKA_TEST_ENV=native script/run_lib_tests com.potetm.fusebox.bulwark-test

Testing com.potetm.fusebox.bulwark-test

FAIL in (bulwark-test) (/Users/borkdude/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulwark_test.clj:12)
it works
expected: (= [:something :dangerous] (bw/bulwark spec (Thread/sleep 100) [:something :dangerous]))
  actual: (not (= [:something :dangerous] nil))

Ran 1 tests containing 2 assertions.
1 failures, 0 errors.
{:test 1, :pass 1, :fail 1, :error 0}

borkdude 2025-04-26T09:06:08.660679Z

can confirm that using this binary, the interop works as it should

borkdude 2025-04-26T09:11:54.188179Z

possible another edge case

borkdude 2025-04-26T09:13:43.528669Z

another edge case for sure. When change the locals in the interop expression form to:

^Callable f (bound-fn* f)
          fut (.submit ^ExecutorService @timeout-threadpool
                       f)
instead of:
fut (.submit ^ExecutorService @timeout-threadpool
                       ^Callable (bound-fn* f))
then it starts to work again. Phew... it's reproducible.

borkdude 2025-04-26T09:14:29.117889Z

lol, ok, this works without bound-fn* which I had in my test:

(let [fut (.submit ^ExecutorService (Executors/newCachedThreadPool)
                   ^Callable (bound-fn* (fn [] 3)))]
  (prn (.get fut)))

borkdude 2025-04-26T10:17:08.781669Z

merged to bb master now 😅

borkdude 2025-04-26T10:31:48.578059Z

hmm, these failed in one CI step:

FAIL in (bulkhead-test) (/Users/runner/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulkhead_test.clj:10)
it works
expected: (= ExceptionInfo (type ex))
  actual: (not (= clojure.lang.ExceptionInfo clojure.lang.Keyword))

FAIL in (bulkhead-test) (/Users/runner/.gitlibs/libs/com.potetm/fusebox/2f42391868c82c193628bec8922f8735ae3cac66/test/com/potetm/fusebox/bulkhead_test.clj:10)
it works
expected: (= "fusebox timeout" (ex-message ex))
  actual: (not (= "fusebox timeout" nil))
Might be flakiness? Not sure about these.

borkdude 2025-04-26T10:35:43.637389Z

I'm pretty sure it's not the interop problem anymore since these are now all tested and the tests that would be affected by that do pass

borkdude 2025-04-26T11:01:35.362619Z

alright, I assume this is legit flakiness, since I downloaded a bb binary from a build where the test was failing with these messages and when I run it locally, all tests pass

borkdude 2025-04-26T11:40:24.290109Z

CI is all green now. Aaaaah.

2025-04-26T12:04:40.618129Z

Ok yeah. I’ve never seen that one before flaky, but if the futures aren’t submitted fast enough that would cause what you see here.

2025-04-26T12:05:00.450659Z

*be flaky before

borkdude 2025-04-26T12:05:18.907949Z

this could be due to SCI not being as fast as compiled Clojure

borkdude 2025-04-26T12:05:38.654499Z

can't reproduce locally on m1 air here, but it tends to only happen in Rosetta2 Mac CI and Windows Github CI

borkdude 2025-04-26T12:05:52.331769Z

I disabled those tests now

borkdude 2025-04-26T12:06:15.336569Z

This is the list of disabled + enabled tests that I don't have problems with so far:

[#_com.potetm.fusebox.bulkhead-test
                                       com.potetm.fusebox.bulwark-test
                                       com.potetm.fusebox.circuit-breaker-test
                                       com.potetm.fusebox.fallback-test
                                       com.potetm.fusebox.memoize-test
                                       #_com.potetm.fusebox.rate-limit-test
                                       com.potetm.fusebox.registry-test
                                       com.potetm.fusebox.retry-test
                                       #_com.potetm.fusebox.timeout-test]

2025-04-26T13:24:40.364149Z

yep those are the ones that are heavily timing dependent lol

2025-04-26T14:25:45.405719Z

can I get this version via brew? once I can run this locally, I can maybe adjust timings to be more reliable.

borkdude 2025-04-26T14:26:14.849139Z

just a minute, I've just pulled the trigger on a release

👍 1
borkdude 2025-04-26T14:26:25.887229Z

I'll notify you

borkdude 2025-04-26T14:52:47.867899Z

of course I needed to tweak some release script just before release which is why I'm now missing half of my release assets. hold on.

1
borkdude 2025-04-26T14:56:47.998769Z

I made this conditional since I'm running some script twice to be able to download the failed bb binary for testing from the artifacts:

if [ "$BABASHKA_RELEASE" = "true" ]; then
    ./bb --config .build/bb.edn --deps-root . release-artifact "/tmp/release/$archive"
fi
but apparently
- run:
        name: Release
        command: .circleci/script/release
    - persist_to_workspace:
        root: /tmp
        paths:
        - release
    - run:
        name: Run tests
        command: |-
          script/test
          script/run_lib_tests
    - run:
        name: Release + publish
        command: BABASHKA_PUBLISH=true .circleci/script/release
the last call to the release script, doesn't do what I want.

borkdude 2025-04-26T14:58:47.457689Z

apparently circleci has a weird way of dealing with env vars or so https://circleci.com/docs/set-environment-variable/