Here is @whilo’s :interrupt-fn PR with my changes on top with how I think it should work.
Docs: https://github.com/whilo/sci/blob/49a58c22afab3bd2392db8f782f1c41382d7b7bb/doc/interrupt.md
The caveats in the docs still make me wonder: should we do this or not...
Performance-wise there are no regressions, so that's not a reason to not include it. @whilo is already using this feature in a project he announced recently:
https://clojurians.slack.com/archives/C06MAR553/p1781382004787979
cc @mkvlr @jackrusher @whilo @bhauman @jeroenvandijk @smith.adriane
I guess this interrupt-fn approach handles function recursion which is nice
The example in the docs .pow can be countered with what I had in mind here https://github.com/babashka/sci/pull/1027 I just need to finish it 🙃
yeah sure, you can lock down interop, that's fine, people just need to be aware of the caveats
Should be ok I think. There are enough warning signs
I like it and the limitations sound reasonable. I wasn't expecting this feature to work around the halting problem. Seems quite polished to already provide interruptible core functions that can be brought in easily.
This seems great 👍
k merged
Regarding performance, there should be a tiny bit of overhead because of the (when (some? interrupt-fn#) ...) check, right? In a simple loop with many iterations you don't see this? https://github.com/babashka/sci/compare/master...whilo:sci:49a58c22afab3bd2392db8f782f1c41382d7b7bb#diff-7b599c1eeb2f916482bdf63e8ce63c9a342e165b8e930bc722bccad3d994d914R40
(some? ...) compiles to a null check, it's not noticeable. just test it out locally
I believe you. Ok that's great 👍 just wanted to understand it
this is why I asked while to change it to (some? ..) instead of (if interrupt-fn) since the latter goes through the equiv function which is an extra frame
eh sorry not equiv, but the truth function
at least, that's the case in CLJS... but in the JVM it's an actual function call, thanks for mentioning that
I'll get rid of those
I didn't see any perf regressions though, but just to make sure
Would be cool now to create some example sandboxes with a subset of clojure core that we assume now are safe for malicious code (so at least no interop yet)
do you think providing a jvm interrupt fn along those lines could be sensible or would you rather folks always write their own?
(defn throw-when-interrupted []
(when (java.lang.Thread/.isInterrupted (Thread/currentThread))
(throw (java.lang.InterruptedException.))))I think we can bring it into the example and that should probably be sufficient?
like: check if counter is higher than OR Thread/isInterrupted OR time has passed
I'll make that change
Regarding one of the examples that would bypass interrupt-fn mentioned in https://github.com/babashka/sci/issues/1038#issuecomment-4266924033 This is one doesn't trigger anything?
(re-matches #"(a+)+$" "aaaa…aaab")
;=> nil
I was curious to look at a workaround as mentioned https://www.ocpsoft.org/regex/how-to-interrupt-a-long-running-infinite-java-regular-expression/, but even that one doesn't create an infinite loop
Are these old examples maybe? Is there another example?that was a bad example
refer to the docs for a good example
on master
FYI, couldn't find a regexp example in the docs, but this one takes a long time on my machine (6 seconds in clj):
(time
(re-matches #"^(.*a){20}$"
(str (apply str (repeat 28 \a)) "!")))
I picked the nil check because the JIT should prove that it can be removed when the function is nil in general. Which is also what I observed in tests. I think being careful about merging features like this does make some sense, because the question is how systematic and compositional the approach can be. I think it should be ok, but it is still somewhat unclear how to turn sci fully into a robust sandbox. I think LLMs maybe don't need that, but to add maintainable features this is a very valid thing to consider. If the interpreter was written in CPS style this was almost trivial to do, but this itself introduces overhead and sci integrates more intimately with standard Clojure than would be possible in a CPS-transformed interpreter maybe.
yeah, it just works differently in CLJS than in JVM Clojure, but even with some? on the JVM I didn't actually see a perf regression in real usage, only in a microbenchmark that zoomed in on that specific check
https://clojurians.slack.com/archives/C06MAR553/p1781623160599079
A safer version of re-matches in Sci https://gist.github.com/jeroenvandijk/d0cbc94552025a189a1ae9fc8916223a
Follow up on https://clojurians.slack.com/archives/C015LCR9MHD/p1781627041425219?thread_ts=1781616545.093549&cid=C015LCR9MHD
clever!
ChatGpt generated this example btw so can't give credits to a human, but i'm sure it was made up by someone The sequence wrapper looks similar to this Java version https://www.ocpsoft.org/regex/how-to-interrupt-a-long-running-infinite-java-regular-expression/
I'm surpised how fast the safe Sci version is only 4 times slower than when I run it directly in clojure. This is probably still without JIT though
you should maybe compare safe SCI vs unsafe SCI too?
true that's a better comparison, let me try
PR welcome btw, I think it's a reasonable approach and at least having a test for this is good (test is kind of more valuable than the exact impl)
Yeah I guess a regression test would be to test that the interrupted version doesn't take longer than few milliseconds. Or maybe i can do something with counters as well
yeah probably something side-effecty with atoms is good
I should probably make a note in the README that using the core overrides makes your code slower. E.g. count will be very slow on non-counted? seqs since it doesn't use chunking
> Note that the core overrides can introduce performance regressions in your code compared to the standard SCI clojure core functions.
> you should maybe compare safe SCI vs unsafe SCI too? 27 seconds vs 9 seconds In this case it is a direct call of re-matches so not much Sci overhead except for the deftype sequence in the safe version
right, so about 3x times slower due to the implementation
PR welcome
will do tomorrow i think
sure
I realize all kinds of tweaks can be made. e.g. only do the check every 1024 chars or so
same for count on chunked seqs etc
but for now, I guess we'll just go with the simple option
How do you know you are at the 1024th char?
I mean every 1024th charAt call
But this would be for optimization?
yeah
I guess :interrupt-fn could also get an argument of whereabouts its called, e.g. normal fn, or in regex context etc so the user can decide how much it counts towards the budget
yeah makes you want to make some things more expensive than others
Before I was thinking of having special counters for recursion. Like in general i would probably limit recursion
I'm going to mark this feature experimental in the docs ;)
so we can still make some changes
makes sense, it is pretty fresh, and experimental 🙂
I guess people can also insert their own core overrides if they aren't happy with the granularity
enough flexibility there
Yeah I agree
I guess the only more fixed thing itself would be the interrupt-fn signature. If you want to add a label for location that changes the signature and would affect all overrides as well. I guess you can built in some flexibility by having multiple arities and have :`unknown` as default or something. Maybe you can check what the arity is of the given interrupt-fn and adapt for what is given
> Maybe you can check what the arity is of the given interrupt-fn and adapt for what is given that's way too expensive on the hot path and also unreliable
yeah i was thinking at the time of setting up the context, but not sure how reliable this is. Probably too hacky
I also thought: maybe I could pass a map. but allocating a map is way too expensive too on this hot path
maybe the interrupt-fn needs to accept one argument for future cases of budgetting. I'm guessing (interrupt nil) is as fast as (interrupt)
the regex stuff has already landed on master now
you are too fast
yes, I thought of this too, we will just break when we need to
Looks good
btw chatgpt found a case where subsequence and tostring were called 1000 times as well. But you have to feed it a big string and regexp:
(do
(defn many-whole-input-captures [n]
(re-pattern
(str "^"
(apply str (repeat n "(?=(.*))"))
".*$")))
(time (count
(re-matches
(many-whole-input-captures 1000)
(apply str (repeat 100000 \a)))))
)
so not sure if that is too relevant then, but probably doesn't hurt to add (ifn) to the other methods
ok repros with PRs welcome tomorrow :)
haha yes
An idea for a future sci feature could be to assert the safety of the sandbox. E.g. (assert-sandbox ctx). E.g. in cljs this would fail if a regular expression function is allowed, but not in clj where the regular expression functions can be controlled.
This assert-sandbox function can evolve over time. When we learn about new exploits it can be that a later version of Sci throws on an old configuration. This is what we want I think