missionary

Eric Dvorsak 2024-04-16T09:14:45.018919Z

what is the context of the success continuation function and an error continuation function, are their limitations documented somewhere? As I mentionned yesterday I have noticed cases where some async(?) code in these functions did not seem to run (timbre/log not logging anything but println working fine)

Eric Dvorsak 2024-04-17T07:01:06.638579Z

No they are both lost if I don't add a try catch around the log statement

Eric Dvorsak 2024-04-17T07:02:26.862189Z

Also I have to run a test I'm starting to wonder if httpkit is causing the interruption as it uses loom threads

leonoel 2024-04-17T08:04:46.433399Z

ok that's definitely not right, would be great if you could provide a full repro

Eric Dvorsak 2024-04-17T08:20:48.005139Z

@leonoel interestingly without using http-kit newVirtualThreadPerTaskExecutor I don't get that exception anymore. It still take more than 30 sec (sometime minutes) between the print I have right before throwing and the print before the log in the catch statement (so theoritically nothing inbetween) I will try to make a much more minimal repro

Eric Dvorsak 2024-04-17T08:23:07.372589Z

but yeah without virtual thread timbre logs work fine within missionary

Eric Dvorsak 2024-04-17T09:08:18.380019Z

@leonoel I refined the issue https://github.com/leonoel/missionary/issues/107 writing a repro really helped narrowing down the issue, the repro now only needs missionary

Eric Dvorsak 2024-04-17T09:24:30.248719Z

So essentially a blocking call is as much of an issue in the task than it was in the success and error continuation functions, but where I got confused is that it is seems to only really cause an actual bug when these conditions are met: • throwing an exception • from the m/blk executor • in a m/reduce

leonoel 2024-04-17T12:42:28.319039Z

I do not think these snippets show any exception being swallowed

Eric Dvorsak 2024-04-17T12:47:30.470099Z

what do you mean? in the last one in particular, you can see everything prints, but if you remove the try-catch around the deref/future it stops printing after "Printing in future"

leonoel 2024-04-17T12:52:19.751879Z

That is the correct behavior. deref throws InterruptedException, it is not caught therefore it propagates to the sp task, then you get it in the failure callback

leonoel 2024-04-17T12:53:00.762689Z

Are you challenging that deref throws ?

Eric Dvorsak 2024-04-17T12:53:29.583709Z

I guess I'm confused that it only throws when all the above conditions are met

leonoel 2024-04-17T12:54:12.351549Z

I do not consider it a bug the fact that user code can be run on a thread in interrupted state. While missionary generally discourages blocking outside of m/via, it may sometimes be acceptable if the blocking is quick, in this case the blocking call should ignore thread interruption.

Eric Dvorsak 2024-04-17T12:54:52.601799Z

that is the case here, it's just printing

leonoel 2024-04-17T12:55:15.287499Z

log statements arguably fall in this category and I find questionable the decision to make timbre logs interruptible

Eric Dvorsak 2024-04-17T12:55:15.850369Z

the error is indeed in the callback but then it will throw again if eg one logs errors with timbre because it will again use deref

Eric Dvorsak 2024-04-17T12:56:12.226949Z

I agree with you about that, I don't think it's explicitely interruptible, it happens to use a promise/future/deref

Eric Dvorsak 2024-04-17T12:59:18.995439Z

how come the interruptible state is only there when throwing in a blocking task while reducing over a flow? eg this works

((m/sp
     (try
       (m/? (m/reduce
             (constantly nil)
             (m/ap (let [s (m/?> (m/seed [1 2 3]))]
                     (m/? (m/sp (throw (ex-info "BOOM" {}))))))))
       (catch Exception e
         (println "Catching the exception")
         @(future (println "Printing in future"))
         (println "This would never print without catching the exception from deref")))
     @(future (println "Printing in future outside catch"))
     (println "And this prints because the deref above doesn't blow up"))
   (constantly nil) println)

leonoel 2024-04-17T13:00:33.278839Z

that's an implementation detail of m/via

Eric Dvorsak 2024-04-17T13:01:59.024229Z

how can one safely ignore thread interruption in exception handling code?

leonoel 2024-04-17T13:05:37.860379Z

(defn uninterrupted [f]
  (let [i (Thread/interrupted)]
    (try (f)
         (finally
           (when i (.interrupt (Thread/currentThread)))))))

leonoel 2024-04-17T13:06:38.369109Z

((m/sp
   (try
     (m/? (m/reduce
            (constantly nil)
            (m/ap (let [s (m/?> (m/seed [1 2 3]))]
                    (m/? (m/via m/blk (throw (ex-info "BOOM" {}))))))))
     (catch Exception e
       (println "Catching the exception")
       (uninterrupted #(deref (future (println "Printing in future"))))
       (println "This would never print without catching the exception from deref")))
   @(future (println "Printing in future outside catch"))
   (println "And this prints because the deref above doesn't blow up"))
 prn prn)
this will print past the catch

Eric Dvorsak 2024-04-17T13:07:30.800479Z

Thank you very much for helping me debugging through that!

Eric Dvorsak 2024-04-17T13:08:08.571729Z

Now I'm aware of how it work and won't trip on it anymore, but I'm worried that newcomers might be quite confused by the difference in behavior

Eric Dvorsak 2024-04-17T13:09:43.406479Z

technically it is documented in via doc "Cancellation interrupts the evaluating thread."

leonoel 2024-04-17T13:09:44.087019Z

np, thank you for this feedback I will think about possible ways to mitigate this or at least document it better

leonoel 2024-04-17T13:11:02.474089Z

yes, but what is not obvious is user code can be called from an interrupted thread even in a non-cancelled process

leonoel 2024-04-17T13:12:13.591059Z

let me know if you find any quirks with virtual threads, I've not experimented much with project loom yet

Eric Dvorsak 2024-04-17T13:15:45.763239Z

trying to write a small issue describing the problem as temporary documentation, still not sure I understand why this behavior of via only causes the issue when the via is in a flow that is reduced, if its just in a sp it works fine

((m/sp
   (try
     (m/? (m/via m/blk (throw (ex-info "BOOM" {}))))
     (catch Exception e
       (println "Catching the exception")
       @(future (println "PRINTING"))
       (println "This prints because the above doesn't blow up")))
   @(future (println "PRINTING"))
   (println "And this prints because the deref above doesn't blow up"))
 (constantly nil) println)

Eric Dvorsak 2024-04-17T13:17:44.199939Z

oh I think I get it it's because there's only 1 blocking task and it throws so it's not cancelled

leonoel 2024-04-17T13:23:11.127939Z

I don't know the answer but it's clearly implementation specific

Eric Dvorsak 2024-04-17T13:23:22.049859Z

Right I can reproduce like this as well:

((m/sp
   (try
     (println "Running")
     (m/? (m/join (m/via m/blk (throw (ex-info "BOOM" {})))
                  (m/via m/blk (throw (ex-info "BOOM2" {})))
                  (m/sleep 10000)))
     (catch Exception e
       (println "Catching the exception")
       @(future (println "WONT PRINT 1"))
       (println "WONT PRINT 2")))
   @(future (println "WONT PRINT 3"))
   (println "WONT PRINT 4"))
 (constantly nil) println)

leonoel 2024-04-17T13:25:01.700819Z

yes, you just need a task that is terminated by cancelled m/via

Eric Dvorsak 2024-04-17T13:26:23.555979Z

but what does interrupting the evaluating thread bring? what do other missionary construct do, just throw missionary.Cancelled exception?

leonoel 2024-04-17T13:29:03.096839Z

that is the point of process supervision, the pending threads must shut down cleanly to run finally blocks and release resources

leonoel 2024-04-17T13:30:11.125059Z

Cancelled is just a special value to indicate the process could not run to completion, like InterruptedException

Eric Dvorsak 2024-04-17T13:33:15.596649Z

oh I see the part about finally blocks is mentionned here https://cljdoc.org/d/missionary/missionary/b.36/doc/readme/tutorials/hello-task?q=finally#parallel-composition I guess last time I read through it I wasn't ready to really understand

Eric Dvorsak 2024-04-17T13:41:57.313309Z

wouldn't it make sense that compel also inhibits the interruption and not just cancellation?

leonoel 2024-04-17T13:46:41.473289Z

not really, all operators rely on the idea that it doesn't matter which thread runs user code (except m/via, but you can still m/compel a m/via to inhibit interruption)

leonoel 2024-04-17T13:49:43.213889Z

what could make sense is to ensure m/via always clears interruption flag before running callbacks, but I have to check the implications first

Eric Dvorsak 2024-04-17T13:53:33.895369Z

I think the biggest issue is that this can cause the unclean shut down of supervised processes, since the interrupted exception will cause any (accidently) uninterruptible code in a finally block to throw

Eric Dvorsak 2024-04-17T14:02:46.942099Z

as in:

((m/sp
   (try
     (println "Running")
     (m/? (m/join []
                  (m/sp (try
                          (m/sleep 1000)
                          (finally
                            (println "finally task")
                            @(future (println "preparing cleanup task"))
                            (println "cleanup task DONE"))))
                  (m/via m/blk (throw (ex-info "BOOM" {})))))

     (finally
       (println "finally supervisor")
       @(future (println "preparing cleanup supervisor"))
       (println "cleanup supervisor DONE"))))
 println println)

leonoel 2024-04-17T19:48:47.593359Z

@yenda1 I released b.37. The continuation of m/via now runs after clearing the interruption flag, which means interruptible user code should not be interrupted anymore.

Eric Dvorsak 2024-04-17T21:48:52.783629Z

Thanks it worked!

leonoel 2024-04-16T09:21:02.015919Z

the limitations are • don't block the thread • don't throw exceptions • return quickly

Eric Dvorsak 2024-04-16T09:23:34.625319Z

So in the task specification the sentence "it must return a canceller, must not throw and must not block the calling thread" also apply to the success continuation function and error continuation functions?

leonoel 2024-04-16T09:23:48.022229Z

yes

Eric Dvorsak 2024-04-16T09:23:48.799009Z

or is it specific to missionary implementation of the spec

leonoel 2024-04-16T09:25:04.067779Z

but it should not matter if timbre/log is wrapped in an m/via

leonoel 2024-04-16T09:25:58.878579Z

it is not implementation specific

Eric Dvorsak 2024-04-16T09:26:58.911649Z

ok, it's an interesting gotcha because one doesn't necessarily think that a log statement might be doing crazy async stuff in the background

Eric Dvorsak 2024-04-16T09:28:09.528589Z

that got me worried the past few days of experimenting I've add quite a few runs where I thought missionary was not properly terminating the task because it wasn't logging from the success continuation

Eric Dvorsak 2024-04-16T09:28:22.621839Z

and I didn't suspect a log statement to fail

Eric Dvorsak 2024-04-16T09:28:38.594239Z

So I assume because timbre must be blocking the thread or something, it falls into undefined behavior

Eric Dvorsak 2024-04-16T09:29:32.312699Z

I mean it does as we saw the other day from that error https://clojurians.slack.com/archives/CL85MBPEF/p1712591473744209

leonoel 2024-04-16T09:36:17.492589Z

yes I would not have expected that either

leonoel 2024-04-16T09:39:05.720019Z

missionary operators could be more resilient to callback abuse, e.g. redirecting exceptions to stderr

Eric Dvorsak 2024-04-16T09:46:22.728189Z

so these callbacks are not regular clojure functions, they run in some coroutine context that might terminate them early

leonoel 2024-04-16T09:48:11.951529Z

yes

leonoel 2024-04-16T09:48:42.316399Z

callbacks are machinery, user is not supposed to manipulate them

Eric Dvorsak 2024-04-16T09:58:04.034589Z

so a recommended pattern would actually be to just do at the top of the supervision tree? (task (constantly nil) (constantly nil)) and if you want to do something on success the top-level task would be:

(m/sp (let [res (m/? (actual-task))] (m/? (on-success res)))
and a try/catch for on-error?

leonoel 2024-04-16T09:59:55.779699Z

yes, that's the recommended pattern

Eric Dvorsak 2024-04-16T10:12:56.921279Z

Thanks! Is that somewhere in the doc/wiki that I missed?

leonoel 2024-04-16T10:38:22.239129Z

I don't think it's anywhere

Eric Dvorsak 2024-04-16T10:40:11.536739Z

I can write it down in an issue so you can edit/copy it into the docs. Seems like an important detail, isn't it basically how one would use missionary in prod?

leonoel 2024-04-16T10:42:43.983539Z

yes I agree it's important, you can file an issue

leonoel 2024-04-16T10:54:15.549469Z

thanks

Eric Dvorsak 2024-04-16T10:55:25.537179Z

@leonoel here it is https://github.com/leonoel/missionary/issues/105 I think there is also other ways for the error maybe something with m/compel Also would unsupected async like timbre log have unexpected behavior as well if they are run outside of m/via? eg:

(defn task []
  (m/sp 
     (let [res (m/? (actual-task))] 
       (log/info "SUCCESS!"))

Eric Dvorsak 2024-04-16T10:56:08.342269Z

I'll refine it through the day.

Eric Dvorsak 2024-04-16T20:13:27.006529Z

So yes, even inside the task it's not safe to use timbre logging (at least directly). I was logging the error:

(defn task []
  (m/sp 
    (try
      (m/? (actual-task))
      (catch Exception e
         (log/error "OOPS"))
It's actually the same root cause that was preventing the logs from appearing in the success and cancel continuation functions, log can throw a java.lang.InterruptedException exception from a deref in:
[taoensso.encore$eval3772$get_hostname__3773 invokePrim encore.cljc 5621]
When that happens the exception is silenced. It's interesting that I finally found it by wrapping the log statement in the catch block with a try catch block

leonoel 2024-04-17T06:30:32.221639Z

Just to be clear, when you say the exception is silenced you mean due to log/error throwing another exception the original e is lost ?