sci

borkdude 2026-06-16T13:29:05.093549Z

Here is @whilo’s :interrupt-fn PR with my changes on top, reflecting how I think it should work. Docs:
https://github.com/whilo/sci/blob/49a58c22afab3bd2392db8f782f1c41382d7b7bb/doc/interrupt.md
The caveats in the docs still make me wonder whether we should do this at all... Performance-wise there are no regressions, so that's not a reason to leave it out. @whilo is already using this feature in a project he announced recently: https://clojurians.slack.com/archives/C06MAR553/p1781382004787979
cc @mkvlr @jackrusher @whilo @bhauman @jeroenvandijk @smith.adriane

👍 1
2026-06-16T13:41:43.346869Z

I guess this interrupt-fn approach handles function recursion, which is nice. The .pow example in the docs can be countered with what I had in mind in https://github.com/babashka/sci/pull/1027 (I just need to finish it 🙃)

borkdude 2026-06-16T13:43:19.095749Z

yeah sure, you can lock down interop, that's fine, people just need to be aware of the caveats

2026-06-16T13:44:37.109929Z

Should be ok I think. There are enough warning signs

mkvlr 2026-06-16T14:15:33.500399Z

I like it and the limitations sound reasonable. I wasn't expecting this feature to work around the halting problem. It seems quite polished to already provide interruptible core functions that can be brought in easily.

bhauman 2026-06-16T14:17:48.871139Z

This seems great 👍

borkdude 2026-06-16T14:23:23.809159Z

k merged

🎉 2
2026-06-16T14:33:18.427099Z

Regarding performance, there should be a tiny bit of overhead because of the (when (some? interrupt-fn#) ...) check, right? In a simple loop with many iterations you don't see this? https://github.com/babashka/sci/compare/master...whilo:sci:49a58c22afab3bd2392db8f782f1c41382d7b7bb#diff-7b599c1eeb2f916482bdf63e8ce63c9a342e165b8e930bc722bccad3d994d914R40

borkdude 2026-06-16T14:34:14.624839Z

(some? ...) compiles to a null check, it's not noticeable. just test it out locally

2026-06-16T14:34:40.901799Z

I believe you. Ok that's great 👍 just wanted to understand it

borkdude 2026-06-16T14:35:24.100129Z

this is why I asked whilo to change it to (some? ...) instead of (if interrupt-fn ...), since the latter goes through the equiv function, which is an extra frame

👌 1
borkdude 2026-06-16T14:36:43.299889Z

eh sorry, not equiv but the truth_ function

borkdude 2026-06-16T14:39:49.948469Z

at least, that's the case in CLJS... but on the JVM some? is an actual function call, thanks for mentioning that

👍 1
borkdude 2026-06-16T14:39:53.897189Z

I'll get rid of those

borkdude 2026-06-16T14:40:36.720149Z

I didn't see any perf regressions though, but just to make sure
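For context on the some? exchange: on the JVM, clojure.core/nil? carries :inline metadata and compiles to a direct reference comparison with nil, while some? is an ordinary function call. In CLJS both compile to plain null comparisons, whereas an untyped (if x ...) test typically goes through the truth_ helper. A minimal sketch of a guard macro built on nil?, assuming the interrupt fn takes no arguments; with-interrupt-check is a hypothetical name and this is not necessarily the code SCI ended up with:

```clojure
(defmacro with-interrupt-check
  "Hypothetical guard: calls interrupt-fn with no arguments when it is
  non-nil, then evaluates body. Uses nil? rather than some? because on the
  JVM nil? is inlined to a reference comparison with nil."
  [interrupt-fn & body]
  `(let [f# ~interrupt-fn]
     (when-not (nil? f#)
       (f#))
     ~@body))
```

When no interrupt-fn is configured, the only remaining cost is that nil comparison, which is the case whilo says the JIT can optimize away.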

2026-06-16T14:41:01.140859Z

It would be cool to create some example sandboxes with a subset of clojure.core that we now assume is safe against malicious code (so at least no interop yet)

mkvlr 2026-06-16T14:44:25.727909Z

do you think providing a jvm interrupt fn along those lines could be sensible or would you rather folks always write their own?

(defn throw-when-interrupted []
  (when (java.lang.Thread/.isInterrupted (Thread/currentThread))
    (throw (java.lang.InterruptedException.))))

borkdude 2026-06-16T14:45:15.789139Z

I think we can bring it into the example and that should probably be sufficient?

borkdude 2026-06-16T14:45:39.725799Z

like: check if the counter is higher than some limit OR Thread/isInterrupted OR time has passed
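A sketch of such a combined interrupt fn, assuming :interrupt-fn is invoked with no arguments. make-interrupt-fn and its :max-calls / :timeout-ms options are hypothetical names, not part of SCI:

```clojure
(defn make-interrupt-fn
  "Hypothetical helper: returns a zero-arg fn that throws when a call budget
  is exhausted, the current thread has been interrupted, or a deadline passes."
  [{:keys [max-calls timeout-ms]}]
  (let [calls (atom 0)
        ;; nanoTime is monotonic, so wall-clock adjustments can't skew the deadline
        deadline (+ (System/nanoTime) (* timeout-ms 1000000))]
    (fn []
      (when (> (swap! calls inc) max-calls)
        (throw (ex-info "Call budget exceeded" {:reason :budget})))
      ;; isInterrupted does not clear the flag, unlike Thread/interrupted
      (when (.isInterrupted (Thread/currentThread))
        (throw (InterruptedException.)))
      (when (pos? (- (System/nanoTime) deadline))
        (throw (ex-info "Time limit exceeded" {:reason :timeout}))))))
```

Assuming the option is passed alongside the other SCI options, it would be wired in as {:interrupt-fn (make-interrupt-fn {:max-calls 10000000 :timeout-ms 1000})}. Create a fresh one per evaluation, since the counter and the deadline start when make-interrupt-fn is called.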

borkdude 2026-06-16T14:46:12.593229Z

I'll make that change

👍 2
2026-06-16T15:53:44.731619Z

Regarding one of the examples mentioned in https://github.com/babashka/sci/issues/1038#issuecomment-4266924033 that would bypass interrupt-fn: this one doesn't trigger anything?

(re-matches #"(a+)+$" "aaaa…aaab")
;=> nil
I was curious to look at a workaround as mentioned in https://www.ocpsoft.org/regex/how-to-interrupt-a-long-running-infinite-java-regular-expression/, but even that one doesn't create an infinite loop. Are these maybe old examples? Is there another example?

borkdude 2026-06-16T16:12:49.268779Z

that was a bad example

borkdude 2026-06-16T16:12:57.423469Z

refer to the docs for a good example

borkdude 2026-06-16T16:13:02.793249Z

on master

👍 1
2026-06-16T16:24:01.425219Z

FYI, I couldn't find a regexp example in the docs, but this one takes a long time on my machine (6 seconds in clj):

(time
  (re-matches #"^(.*a){20}$"
              (str (apply str (repeat 28 \a)) "!")))

👍 1
whilo 2026-06-16T17:49:08.695079Z

I picked the nil check because the JIT should be able to prove that it can be removed when the function is nil in general, which is also what I observed in tests. I think being careful about merging features like this makes sense, because the question is how systematic and compositional the approach can be. I think it should be ok, but it is still somewhat unclear how to turn sci into a fully robust sandbox. Maybe LLMs don't need that, but for adding maintainable features this is a very valid thing to consider. If the interpreter were written in CPS style this would be almost trivial, but CPS itself introduces overhead, and sci integrates more intimately with standard Clojure than a CPS-transformed interpreter probably could.

borkdude 2026-06-16T18:03:19.825539Z

yeah, it just works differently in CLJS than in JVM Clojure, but even with some? on the JVM I didn't actually see a perf regression in real usage, only in a microbenchmark that zoomed in on that specific check

2026-06-16T17:08:17.642149Z

A safer version of re-matches in Sci https://gist.github.com/jeroenvandijk/d0cbc94552025a189a1ae9fc8916223a

borkdude 2026-06-16T17:10:51.233689Z

clever!

2026-06-16T17:11:29.566449Z

ChatGPT generated this example btw, so I can't give credit to a human, but I'm sure it was made up by someone. The sequence wrapper looks similar to this Java version: https://www.ocpsoft.org/regex/how-to-interrupt-a-long-running-infinite-java-regular-expression/
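The wrapper technique works because java.util.regex reads its input through CharSequence.charAt, so a wrapper can run a check on every character access and abort a backtracking match by throwing. A minimal sketch of the idea, not the code from the gist; InterruptibleCharSequence and interruptible-re-matches are hypothetical names:

```clojure
(deftype InterruptibleCharSequence [^CharSequence s check-fn]
  CharSequence
  (charAt [_ i]
    ;; every character the regex engine reads goes through here
    (check-fn)
    (.charAt s i))
  (length [_] (.length s))
  (subSequence [_ start end]
    ;; keep sub-sequences interruptible too
    (InterruptibleCharSequence. (.subSequence s start end) check-fn))
  Object
  ;; Matcher.group builds result strings via subSequence + toString
  (toString [_] (.toString s)))

(defn interruptible-re-matches
  "Like re-matches, but calls check-fn on every charAt; check-fn aborts
  the match by throwing."
  [re s check-fn]
  (re-matches re (->InterruptibleCharSequence s check-fn)))
```

With a check-fn that throws after a budget, a catastrophic pattern such as ^(.*a){20}$ from above should be aborted instead of running for seconds. The price is an extra function call on every charAt.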

2026-06-16T17:13:07.415659Z

I'm surprised how fast the safe Sci version is: only 4 times slower than when I run it directly in Clojure. This is probably still without JIT though

borkdude 2026-06-16T17:13:34.400349Z

you should maybe compare safe SCI vs unsafe SCI too?

2026-06-16T17:13:51.446979Z

true that's a better comparison, let me try

borkdude 2026-06-16T17:13:58.549209Z

PR welcome btw, I think it's a reasonable approach and at least having a test for this is good (test is kind of more valuable than the exact impl)

2026-06-16T17:17:22.029759Z

Yeah, I guess a regression test could check that the interrupted version doesn't take longer than a few milliseconds. Or maybe I can do something with counters as well

borkdude 2026-06-16T17:18:05.673249Z

yeah probably something side-effecty with atoms is good

borkdude 2026-06-16T17:18:47.811219Z

I should probably make a note in the README that using the core overrides makes your code slower. E.g. count will be very slow on non-counted? seqs since it doesn't use chunking

borkdude 2026-06-16T17:20:22.375429Z

> Note that the core overrides can introduce performance regressions in your code compared to the standard SCI clojure core functions.

👍 1
2026-06-16T17:22:51.427529Z

> you should maybe compare safe SCI vs unsafe SCI too?
27 seconds vs 9 seconds. In this case it is a direct call of re-matches, so there's not much Sci overhead except for the deftype sequence in the safe version

borkdude 2026-06-16T17:23:16.566999Z

right, so about 3x slower due to the implementation

borkdude 2026-06-16T17:23:39.423219Z

PR welcome

2026-06-16T17:23:53.887309Z

will do tomorrow i think

borkdude 2026-06-16T17:23:59.484119Z

sure

borkdude 2026-06-16T17:43:52.791039Z

I realize all kinds of tweaks can be made. e.g. only do the check every 1024 chars or so

borkdude 2026-06-16T17:44:02.373049Z

same for count on chunked seqs etc
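A sketch of the chunking idea for count, assuming one check per chunk is an acceptable granularity: counted collections are answered directly, chunked seqs are charged one check per chunk (typically 32 elements), and other seqs one check per element. interruptible-count is hypothetical and not SCI's actual core override:

```clojure
(defn interruptible-count
  "Counts coll, calling check-fn once per chunk for chunked seqs and once
  per element otherwise. Counted collections skip the checks entirely."
  [check-fn coll]
  (if (counted? coll)
    (count coll)
    (loop [s (seq coll) n 0]
      (cond
        (nil? s) n
        (chunked-seq? s) (do (check-fn)
                             (recur (chunk-next s) (+ n (count (chunk-first s)))))
        :else (do (check-fn)
                  (recur (next s) (inc n)))))))
```

An infinite seq such as (iterate inc 0) never finishes counting, but a throwing check-fn still stops it.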

borkdude 2026-06-16T17:45:05.813269Z

but for now, I guess we'll just go with the simple option

2026-06-16T17:45:08.039259Z

How do you know you are at the 1024th char?

borkdude 2026-06-16T17:45:21.735409Z

I mean every 1024th charAt call

2026-06-16T17:45:58.276759Z

But this would be for optimization?

borkdude 2026-06-16T17:46:02.299469Z

yeah
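The every-1024th-call optimization can be expressed as a wrapper around any check fn; the trade-off is that an interrupt is noticed up to n - 1 calls late. A sketch assuming n is a power of two (so a bit mask replaces modulo) and a single evaluating thread (the volatile counter is not safe for concurrent callers). every-nth-call is a hypothetical name:

```clojure
(defn every-nth-call
  "Wraps check-fn so it only runs on every nth call."
  [n check-fn]
  {:pre [(pos? n) (zero? (bit-and n (dec n)))]} ; n must be a power of two
  (let [mask (dec n)
        counter (volatile! 0)]
    (fn []
      (when (zero? (bit-and (vswap! counter inc) mask))
        (check-fn)))))
```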

borkdude 2026-06-16T17:46:47.994539Z

I guess :interrupt-fn could also get an argument indicating where it's called from, e.g. a normal fn or a regex context, so the user can decide how much it counts towards the budget

2026-06-16T17:47:23.209589Z

yeah makes you want to make some things more expensive than others

2026-06-16T17:47:47.307819Z

Before, I was thinking of having special counters for recursion. In general I would probably limit recursion

borkdude 2026-06-16T17:47:46.995759Z

I'm going to mark this feature experimental in the docs ;)

borkdude 2026-06-16T17:47:51.963169Z

so we can still make some changes

2026-06-16T17:48:08.887349Z

makes sense, it is pretty fresh, and experimental 🙂

borkdude 2026-06-16T18:02:06.520089Z

I guess people can also insert their own core overrides if they aren't happy with the granularity

borkdude 2026-06-16T18:02:12.132349Z

enough flexibility there

2026-06-16T18:02:30.329459Z

Yeah I agree

2026-06-16T18:08:36.617769Z

I guess the only thing that's more fixed is the interrupt-fn signature itself. If you want to add a label for the location, that changes the signature and would affect all overrides as well. I guess you can build in some flexibility by having multiple arities, with :unknown as the default or something. Maybe you can check what the arity is of the given interrupt-fn and adapt for what is given
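A sketch of the location-label idea, assuming a hypothetical one-argument interrupt fn that SCI would call with a keyword such as :fn-call or :regex. As discussed in this thread, the merged :interrupt-fn takes no such argument; the keywords and costs here are invented. The zero arity falls back to :unknown, and passing a keyword rather than a map avoids a per-call allocation:

```clojure
(defn make-weighted-interrupt-fn
  "Hypothetical: returns an interrupt fn that charges a per-location cost
  (default 1) against a shared budget and throws once it is overdrawn."
  [budget costs]
  (let [remaining (atom budget)]
    (fn interrupt
      ([] (interrupt :unknown))
      ([location]
       (when (neg? (swap! remaining - (get costs location 1)))
         (throw (ex-info "Interrupt budget exhausted"
                         {:location location})))))))
```

Because the user's fn provides both arities, nothing has to inspect its arity at runtime.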

borkdude 2026-06-16T18:09:33.410419Z

> Maybe you can check what the arity is of the given interrupt-fn and adapt for what is given
that's way too expensive on the hot path and also unreliable

2026-06-16T18:10:15.686889Z

yeah, I was thinking of doing it when setting up the context, but I'm not sure how reliable that is. Probably too hacky

borkdude 2026-06-16T18:10:24.082429Z

I also thought: maybe I could pass a map. but allocating a map is way too expensive too on this hot path

2026-06-16T18:12:37.847369Z

maybe the interrupt-fn needs to accept one argument for future cases of budgeting. I'm guessing (interrupt nil) is as fast as (interrupt)

borkdude 2026-06-16T18:12:42.179549Z

the regex stuff has already landed on master now

2026-06-16T18:12:51.919969Z

you are too fast

borkdude 2026-06-16T18:13:33.824419Z

yes, I thought of this too, we will just break when we need to

2026-06-16T18:13:34.822229Z

Looks good

2026-06-16T18:14:56.926169Z

btw ChatGPT found a case where subSequence and toString were called 1000 times as well, but you have to feed it a big string and regexp:

(do
  (defn many-whole-input-captures [n]
    (re-pattern
     (str "^"
          (apply str (repeat n "(?=(.*))"))
          ".*$")))
  (time (count
         (re-matches
          (many-whole-input-captures 1000)
          (apply str (repeat 100000 \a))))))

2026-06-16T18:15:51.123919Z

so I'm not sure that's too relevant, but it probably doesn't hurt to add the (ifn) call to the other methods too

borkdude 2026-06-16T18:16:10.326839Z

ok repros with PRs welcome tomorrow :)

2026-06-16T18:16:20.163189Z

haha yes

2026-06-16T18:33:04.140229Z

An idea for a future SCI feature: assert the safety of the sandbox, e.g. (assert-sandbox ctx). In CLJS this would fail if a regular expression function is allowed, but not in CLJ, where the regular expression functions can be controlled. This assert-sandbox function can evolve over time: when we learn about new exploits, a later version of Sci may throw on an old configuration. This is what we want, I think
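A rough sketch of what such a check could look like, operating on a plain config map rather than a real SCI context. The {:platform ... :exposed ...} shape, the assert-sandbox function, and the incomplete table of uninterruptible symbols are all invented for illustration; SCI has no such function today:

```clojure
(def ^:private uninterruptible-by-platform
  ;; Hypothetical, incomplete table: per the discussion, CLJS regex functions
  ;; run on the host engine and can't be stopped by an interrupt-fn.
  {:cljs #{'re-matches 're-find 're-seq}
   :clj  #{}})

(defn assert-sandbox
  "Throws ex-info listing exposed symbols known to escape interruption on
  the given platform; returns config unchanged otherwise."
  [{:keys [platform exposed] :as config}]
  (let [unsafe (get uninterruptible-by-platform platform #{})
        bad (into (sorted-set) (filter unsafe) exposed)]
    (when (seq bad)
      (throw (ex-info "Sandbox exposes uninterruptible functions"
                      {:platform platform :symbols bad})))
    config))
```

Growing the table in later versions is exactly what makes an old configuration start failing once a new exploit is known.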