#clojure
2022-04-11
Nom Nom Mousse07:04:39

Under what circumstances can't I cancel a future?

user=> (clojure.repl/doc future-cancel)
-------------------------
clojure.core/future-cancel
([f])
  Cancels the future, if possible.

gerritjvv08:04:54

looking at the clojure source https://github.com/clojure/clojure/blob/clojure-1.10.1/src/clj/clojure/core.clj#L7000

[^java.util.concurrent.Future f] (.cancel f true))
this just calls cancel on the Future class.

gerritjvv08:04:19

Returns: false if the task could not be cancelled, typically because it has already completed normally; true otherwise

gerritjvv08:04:34

also

This attempt will fail if the task has already completed, has already been cancelled, or could not be cancelled for some other reason

gerritjvv08:04:09

the "for some other reason" is, imo, there because the JVM delegates to the OS, so there might be other, unknown reasons why a future could not be canceled.

🙏 1
delaguardo09:04:43

for example, if there is a tight loop running in the future, it might not be canceled: future-cancel returns true below, but the loop keeps running.

Clojure 1.11.1
user=> (def s (atom 0))
#'user/s
user=> (def f (future (while true (swap! s inc))))
#'user/f
user=> @s
344653116
user=> (future-cancel f)
true
user=> @s
701648902
user=> @s
769185846
user=> @s
834034676
user=> @s
894427378
user=> @s
953997173
user=> @s
1003335986
user=> @s
1054499559
user=> @s
1101502513

🙏 1
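
A minimal sketch of the cooperative alternative to the transcript above: since future-cancel passes true to Future.cancel, the worker thread is interrupted, and a loop that checks its own interrupt flag can stop. The names counter, worker and the snapshot vars are hypothetical, and the sleep durations are arbitrary.

```clojure
(def counter (atom 0))

;; The loop polls the thread's interrupt flag instead of looping on `true`.
(def worker
  (future
    (while (not (.isInterrupted (Thread/currentThread)))
      (swap! counter inc))))

(Thread/sleep 100)                       ; let the loop run for a moment
(def cancelled? (future-cancel worker))  ; interrupts the worker thread

(Thread/sleep 100)                       ; give the loop time to notice
(def snapshot-1 @counter)
(Thread/sleep 100)
(def snapshot-2 @counter)                ; unchanged once the loop has stopped
```

Code that blocks in an interruptible call (for example Thread/sleep) gets an InterruptedException instead, so it needs no explicit flag check.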
Nom Nom Mousse09:04:48

I was unable to cancel a future with a started ProcessBuilder

gerritjvv09:04:45

cool, if you send some code I can offer more advice 😛. ProcessBuilder will create an OS process, which is abstracted by the Process class: https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/lang/Process.html This class has two methods, destroy() and destroyForcibly(). If this is important for you, I would suggest handling the Process object directly. My guess is you're using the process inside a Future; canceling the Future would not kill/cancel the Process.

🙏 1
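
A rough sketch of handling the Process object directly, as suggested above. It assumes a Unix-like system where a `sleep` executable is on the PATH; proc, waiter and the result vars are hypothetical names.

```clojure
(import '[java.util.concurrent TimeUnit])

;; Start an OS process and keep a handle on it.
(def ^Process proc
  (.start (ProcessBuilder. (into-array String ["sleep" "30"]))))

;; A future that merely waits on the process.
(def waiter (future (.waitFor proc)))

;; Cancelling the future only affects the waiting thread, not the OS process.
(future-cancel waiter)
(def alive-after-cancel? (.isAlive proc))

;; Stop the process itself, then wait (bounded) for it to exit.
(.destroy proc)
(def exited? (.waitFor proc 5 TimeUnit/SECONDS))
```

If destroy() is not enough for a given program, destroyForcibly() is the stronger variant mentioned in the message above.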
Adam Kalisz14:04:36

It seems like re-matches in CLJ is sometimes 10x slower than the same thing in CLJS. E.g. for: (def regex-matcher #"[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]-[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]-[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]-[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]-[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]") (re-matches regex-matcher "399d0134-9629-4bc6-8f8e-437de87eaaea")

p-himik14:04:14

Indeed, interesting. Same if you replace repeating patterns with a single copy followed by {n} and use interop instead of CLJ[S] functions, so it must be a difference between JS and Java.

👍 1
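
For reference, a sketch of the variant p-himik describes: the repeated character classes collapsed with {n}, matched once through re-matches and once through plain java.util.regex interop. The var names are hypothetical; the sample string is the one from the question.

```clojure
;; Same UUID shape as above, written with {n} quantifiers.
(def uuid-re
  #"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}")

(def sample "399d0134-9629-4bc6-8f8e-437de87eaaea")

;; Clojure wrapper: returns the matched string, or nil when there is no match.
(def via-clojure (re-matches uuid-re sample))

;; Direct interop: returns a boolean.
(def via-interop
  (.matches (.matcher ^java.util.regex.Pattern uuid-re sample)))
```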
isak15:04:06

Time to dust off Nashorn

p-himik15:04:03

Stumbled upon this: https://swtch.com/~rsc/regexp/regexp1.html Haven't read it yet but seems relevant.

👍 1
littleli15:04:30

@isak There is an official node implementation for graalvm. Nashorn is dead.

Adam Kalisz15:04:34

In JavaScript the multiplicative brace makes a 2x speed difference in this case. In Clojure, there is no difference, but both are much slower. Seems like there is a lot of potential for improvement somewhere.

Adam Kalisz15:04:56

The regex engine difference is a classic. There was a Cloudflare outage related to a bad regex fairly recently (2 years ago?).

Adam Kalisz15:04:41

Seems like Java doesn't use a suitable algorithm for regex? http://www.amygdalum.net/en/efficient-regular-expressions-java.html

ghadi15:04:31

I'd be suspicious of claims with no benchmarks

andy.fingerhut16:04:09

Perhaps it is already well known to everyone here, but Clojure/JVM uses JVM regex matching libraries, and ClojureScript uses whatever JavaScript regex matching libraries it uses, and those implementations are different. Clojure/ClojureScript make no attempts to hide those differences from a developer.

Adam Kalisz16:04:34

Sure, I was just surprised that the performance difference on such a simple test is so strikingly large. Isn't it a useful heads-up for people to know about an order-of-magnitude difference?

ghadi16:04:32

we don't know how you tested

ghadi16:04:00

benchmarking is really hard

p-himik08:04:17

What would you consider a proper benchmark for this case - same code across different platforms?

Adam Kalisz16:05:54

@U050ECB92 Just do a match of a UUID. You can see it performs rather poorly.
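
A very rough timing sketch for the JVM side of this discussion, not a proper benchmark: it warms up first so the JIT has a chance to compile the code, then averages over many calls. mean-nanos, the iteration count and the var names are hypothetical; a dedicated benchmarking library such as criterium would give more trustworthy numbers.

```clojure
(def uuid-re
  #"[0-9A-Fa-f]{8}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{4}-[0-9A-Fa-f]{12}")

(def sample "399d0134-9629-4bc6-8f8e-437de87eaaea")

(defn mean-nanos
  "Calls f n times as warm-up, then n more times while timing.
  Returns the mean wall-clock nanoseconds per call."
  [f n]
  (dotimes [_ n] (f))                       ; warm-up, results discarded
  (let [start (System/nanoTime)]
    (dotimes [_ n] (f))
    (/ (double (- (System/nanoTime) start)) n)))

(def nanos-per-match
  (mean-nanos #(re-matches uuid-re sample) 100000))
```

Numbers from a loop like this still vary with JVM version, flags and machine load, so they are only a starting point for a comparison with the JavaScript side.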

Danilo Oliveira15:04:00

I want to write a system that allows users to submit small programs remotely. But harmful programs should not be allowed to execute. They also should not be Turing complete. Is it too dangerous to accept chunks of Clojure code as a string, parse it as data, use some sort of allow-list of special forms and functions, and then run it as Clojure code?

👍 1
delaguardo15:04:03

filtering input code based on some threatening level calculated by some heuristics will be a rat race. Instead you could prepare something like a sandbox environment where you can diminish potential harm. For example you can use https://github.com/babashka/SCI with preconfigured "allowed to use" namespaces and vars. Another option is to try GraalVM where you can run Java, Python, Ruby, JavaScript and other less mainstream programming languages in a sandbox.

gerritjvv16:04:30

second that. There are so many ways even with allow lists to circumvent security measures. Best is to run it in an isolated environment. I've seen GraalVM used as a sandbox where IO/threading etc is forbidden as a configuration option.

Danilo Oliveira18:04:26

I see. This is what I was afraid of: maybe there will always be a way to circumvent it, and it will lead to some situation like log4shell. Maybe I can write my own language/instruction set on top of EDN. Being non-Turing-complete is also important for what I want to achieve.
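
A toy sketch of the "own language on top of EDN" idea, under stated assumptions: programs are read with clojure.edn (which never evaluates code), only numbers and calls to an explicit allow-list of arithmetic operations are accepted, and there is no looping or recursion construct, so every program terminates. allowed-ops, evaluate and run-program are hypothetical names, and this is nowhere near a hardened sandbox.

```clojure
(require '[clojure.edn :as edn])

;; The only operations a submitted program may call.
(def allowed-ops
  {'+ + '- - '* * 'max max 'min min})

(defn evaluate
  "Interprets a tiny expression language: numbers and (op arg ...) lists."
  [form]
  (cond
    (number? form) form

    (seq? form)
    (let [[op & args] form
          f (get allowed-ops op)]
      (when-not f
        (throw (ex-info "Operation not allowed" {:op op})))
      (apply f (map evaluate args)))

    :else
    (throw (ex-info "Unsupported form" {:form form}))))

(defn run-program
  "Reads a program from a string as EDN data and interprets it."
  [s]
  (evaluate (edn/read-string s)))

(def answer (run-program "(+ 1 (* 2 3))"))

(def rejected
  (try
    (run-program "(slurp \"/etc/passwd\")")
    (catch clojure.lang.ExceptionInfo e (ex-data e))))
```

Because the interpreter never hands anything to eval, a symbol outside the table has no meaning at all; resource limits (input size, nesting depth) would still have to be added separately.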

Danilo Oliveira07:04:57

This SCI is what I had in mind, thanks!!!!! @delaguardo

restenb16:04:02

someone please remind me, is there some fixed hard limit on number of entries in an unbuffered (chan)?

hiredman16:04:46

An unbuffered channel contains nothing, it is just a synchronization point

💯 1
hiredman16:04:51

Containing nothing is kind of hand wavy, a channel is basically three places "things" are, a queue of writers, a buffer of values, and a queue of readers

hiredman16:04:42

An unbuffered channel mostly has nothing in the buffer of values ever (transducers on channel can mess with this)

hiredman16:04:32

The queues of readers and writers on channels are limited, there is a hard coded limit of 1024

hiredman16:04:08

(these queues growing without bounds is usually a bug, bad flow control)

hiredman16:04:04

A put! is queued as a writer even if you don't pass a callback, which is why most uses of put! are bad (broken flow control that queues writers without bound)

restenb16:04:14

i also use put! everywhere. now my head hurts. 😅

hiredman16:04:31

Yeah, don't use put!

restenb16:04:59

what to use then? >!! instead? iirc >!! just does buffer checking and then calls put! anyway

hiredman16:04:20

If you must use put! then you should use the callback arity and pass the continuation of whatever you are doing as a callback

hiredman16:04:03

>!! calls put! in a way that blocks the current thread until the writer is matched with a reader

hiredman16:04:31

Proper flow control, no unbounded growth

restenb16:04:00

and the callback will only be called if the put! succeeds I take it?

hiredman16:04:52

Bad: (loop [] (put! ch (get-work)) (recur))
Good: ((fn f [_] (put! ch (get-work) f)) nil)

restenb16:04:53

these are mostly polling queues so they tend to be emptied out by some external call every N seconds, or be of a fixed size that is drained over time

hiredman16:04:55

I missed the channel in those example calls to put! (fixed)
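
A small sketch of the callback arity discussed above, assuming org.clojure/core.async is on the classpath. The helper feed! is a hypothetical name: it issues the next put! only from the callback of the previous one, so at most one writer is ever queued on the channel.

```clojure
(require '[clojure.core.async :as a])

(defn feed!
  "Puts items onto ch one at a time. The next put! is issued from the
  callback of the previous one; closes ch when items run out."
  [ch items]
  (if-let [[x & more] (seq items)]
    (a/put! ch x
            (fn [accepted?]
              ;; accepted? is false when ch was closed; stop feeding then.
              (when accepted?
                (feed! ch more))))
    (a/close! ch)))

(def ch (a/chan))          ; unbuffered: every put waits for a taker
(feed! ch (range 5))

;; Drain the channel into a vector and block until it has been closed.
(def received (a/<!! (a/into [] ch)))
```

Contrast this with calling put! in a plain loop, where every call would queue another pending writer until the limit of 1024 mentioned above is reached.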

Joshua Suskalo20:04:58

I've started to see thread-last pipelines as basically a code smell, telling me that something should be a transducer. I'm curious how common of an opinion that is these days, and if others think there's still a good common reason to prefer a thread-last pipeline (besides maybe readability in performance-insensitive contexts).
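
For readers comparing the two styles, a minimal sketch of the same computation written both ways; xs and the var names are hypothetical.

```clojure
(def xs (range 10))

;; Thread-last pipeline: each step produces an intermediate lazy sequence.
(def via-thread-last
  (->> xs
       (filter even?)
       (map #(* % %))
       (reduce +)))

;; Transducer version: the steps are composed once and applied in a single pass.
(def via-transducer
  (transduce (comp (filter even?)
                   (map #(* % %)))
             +
             xs))
```

Note that comp applies transducers in the order written, which is why the pipeline translates top to bottom.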

Cora (she/her)20:04:58

Clojure's threading macros (the -> and ->> thrushes) are great for navigating into data and transforming sequences. injest's path thread macros +> and +>> are just like -> and ->> but with expanded path navigating abilities similar to get-in.

Transducers are great for performing sequence transformations efficiently. x>> combines the efficiency of transducers with the better ergonomics of +>>. Thread performance can be further extended by automatically parallelizing work with =>>.

Joshua Suskalo20:04:25

that sounds interesting. Basically a macro that automatically performs a translation for some clojure code in a thread last into a transducer equivalent?

Cora (she/her)20:04:36

yep, that's what it does

Joshua Suskalo21:04:04

oh, so I see this #(do []) and I'd like to raise #(-> []) which also feels pretty good
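
A quick sketch of why the wrapper is needed at all: in the #(...) shorthand the body is a call, so a bare collection literal ends up being invoked as a function. The var names are hypothetical.

```clojure
;; Both wrappers return the empty vector.
(def via-do    (#(do [])))
(def via-arrow (#(-> [])))

;; #([]) reads as (fn [] ([])), which calls the vector with no arguments.
(def bare-literal
  (try
    (#([]))
    (catch clojure.lang.ArityException _ :arity-exception)))
```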

Joshua Suskalo21:04:35

This is a really cool library, thanks for sharing!

Cora (she/her)21:04:45

no prob! 💜

Alex Miller (Clojure team)21:04:40

Disagree with the original post - sequences are totally fine if the size is small or transformations are few or especially, if you don't actually need all the results (in which case transducer is probably slower)
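
One way to read the "don't actually need all the results" point, as a sketch: a lazy sequence only realizes what is consumed (here, at most the first chunk of the range), while an eager into with a transducer processes the whole input. The counters and var names are hypothetical.

```clojure
(def lazy-calls  (atom 0))
(def eager-calls (atom 0))

;; Lazy: only the first chunk of the range is ever mapped.
(def lazy-first
  (first (map (fn [x] (swap! lazy-calls inc) (* x x))
              (range 1000))))

;; Eager: into runs the transducer over all 1000 elements before first is called.
(def eager-first
  (first (into []
               (map (fn [x] (swap! eager-calls inc) (* x x)))
               (range 1000))))
```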

Alex Miller (Clojure team)21:04:46

And you may find that timing comparisons won't hold up in future versions

Joshua Suskalo22:04:37

That's good to know, thanks Alex.

didibus23:04:14

I find that transducers are more complicated to use, and have more edge cases. So my personal default is to use sequences unless I specifically need the performance.

didibus00:04:53

I think what can be a code smell is a mixture of ->> and ->, because that implies you've lost the laziness, since the -> will force realize the ->>, and maybe that's then best to just switch to transducers for it all... but again, even there I think it can sometimes be more complex to switch, so I don't know, I'd probably still use sequences all the time unless I need specific performance

didibus00:04:51

Or I need explicit control on the realization, like with side-effects

Ben Sless03:04:45

Just add an inline meta to all sequence functions and get operators fusion at compile time 😏

Jeffrey Bay21:04:47

hi all - trying to further optimize some pretty heavily optimized clojure code, and thought of trying to build a java set and then "wrap" it in a clojure set - is that something that can be done easily like you can do with vec?

hiredman21:04:39

Clojure sets are java sets

hiredman21:04:20

In the sense that they are implemented in java and in the sense that they implement java.util.Set
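
A short sketch of both directions, under the assumption that copying the elements is acceptable: clojure.core/set builds a new persistent set from any Java collection (it copies rather than wraps), and a Clojure set can already be passed wherever a read-only java.util.Set is expected. The var names are hypothetical.

```clojure
(import '[java.util HashSet])

;; A mutable Java set built with interop.
(def java-set
  (doto (HashSet.)
    (.add 1)
    (.add 2)
    (.add 3)))

;; Copy its elements into a persistent Clojure set.
(def clj-set (set java-set))

;; A Clojure set is itself a java.util.Set (read-only).
(def is-java-set? (instance? java.util.Set clj-set))
(def contains-two? (.contains ^java.util.Set clj-set 2))
```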

hiredman21:04:01

https://github.com/clojure/data.int-map#sets is an example of a more specialized clojure set implemented in a mix of clojure and java

joshjones22:04:53

I have a protocol P defined in namespace A, a type T (record) defined in namespace B which implements a different protocol there, and a namespace C where T is extended to implement P. The code in namespace C only defines the implementation of P; it isn't loaded from anywhere, and when I don't force it to load in the REPL I get the error that the type doesn't implement the protocol. When I load it in the REPL, it works as expected. Any suggestions on how to structure things so that C is properly loaded?

isak22:04:07

Not sure if it makes sense based on the names, but can you require A, B, and C from another namespace? For example, the entrypoint to your app, or some sub-section of it?

noisesmith23:04:00

you can either always reload C and redefine all instances of P if A is changed, or use a setup like stuartsierra/component or weavejester/integrant that automates that reloading based on the dependency graph

joshjones23:04:21

hmm.. maybe. What's interesting here is that it's already being loaded via mount. It implements another protocol also. However, previously the type extended the protocol in the same namespace, but I've moved the implementation out of the namespace where the protocol is defined.

noisesmith23:04:33

also, maybe a nitpick, but I find the concept of "code that isn't loaded from anywhere" strange. if it's not loaded it doesn't get run. so in that sense it might help to do as @isak suggests and make a namespace that does need A, B, and C

noisesmith23:04:06

OK - if you are using mount and seeing this problem, you are using mount wrong (or it could be a bug in mount, I'm not a fan of that particular lib and avoid it)

noisesmith23:04:10

mount should know about the relationship between the definition of P and the usage of that P by C

noisesmith23:04:22

and it should ensure the reloading is done correctly for you

joshjones23:04:18

Mount is handed an instance of the type.. but how would it know to load other namespaces where that type implements some other protocol?

joshjones23:04:57

I think I will create a new namespace here and see where that path leads. Thanks for the suggestions!

noisesmith23:04:09

as I understand it, the correct solution with mount is to use defstate so that mount knows about the dependencies between the definitions, so that the reloading is done coherently https://github.com/tolitius/mount#the-importance-of-being-reloadable

didibus00:04:17

I'm confused, where do you expect to use the type T with functions from the protocol P ? Wherever you expect to do that, that namespace needs to require A, B and C

➕ 1
didibus00:04:53

If the protocol function is foo and bar for example, and you plan to use foo and bar in D to modify a record of type T, then D must require B for the record, and it must require A for the protocol, and C for the extension of T to P

didibus00:04:20

Or C could require A and B, and then D requires C
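
The layout described above can be sketched as follows. The namespaces demo.a through demo.d are hypothetical stand-ins for A, B, C and D; in a real project each ns form would live in its own file, and they are shown in one block here only so the sketch can be evaluated top to bottom.

```clojure
;; A: the protocol.
(ns demo.a)
(defprotocol P
  (foo [this]))

;; B: the record type.
(ns demo.b)
(defrecord T [x])

;; C: extends T to P. Nothing happens unless this namespace is loaded.
(ns demo.c
  (:require [demo.a :as a]
            [demo.b])
  (:import [demo.b T]))

(extend-type T
  a/P
  (foo [this] (str "T with x=" (:x this))))

;; D: the consumer. It requires C purely for its side effect.
(ns demo.d
  (:require [demo.a :as a]
            [demo.b :as b]
            [demo.c]))

(def result (a/foo (b/->T 42)))
```

Dropping demo.c from D's :require list reproduces the original symptom whenever nothing else has loaded C first.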