Hello. Sometimes my app, after running for a while, goes into WSOD and won't recover (restarting the Ring server doesn't help). In the same process, after the WSOD, I've noticed that m/sleep no longer terminates, e.g. (m/? (m/sleep 500)) never returns. I'm still struggling to pin down the issue in my codebase. Can anyone help me see what might cause this?
Are you doing blocking calls? For example, (e/server (slow-query)). If so, changing them to (e/server (e/Offload #(slow-query))) might help 😄
electric v3, right?
It sounds like a deadlock to me, please tell us more about what your app does and if you are using e/Offload
also are you using any sleeps or clocks, and how - is it m/sleep, Thread/sleep, or e/System-time-ms?
Yes, most of the above 🤣. It's v3. It's a retailer inventory management system, about 7k LOC. I don't use Thread/sleep or e/System-time-ms, though I use m/sleep a lot: for doing something after a delay, e.g. closing a modal with (case (e/Task (m/sleep ...)) (CloseModal)), and for polling, for which I have a subroutine I've just rewritten after seeing this: https://github.com/hyperfiddle/electric/blob/45f7881df46f86e91a1730e893a05bed6cf4e728/src/contrib/missionary_contrib.cljc#L43
(defn poll-f-1 [ms f]
  (m/ap (m/amb (f)
          (loop []
            (m/? (m/sleep ms))
            (m/amb (f) (recur))))))

(defn poll-f-2 [ms f]
  (m/ap (loop [r (f)]
          (m/amb r
            (recur (do (m/? (m/sleep ms))
                       (f)))))))
Also, I've been siting control flow and e/for on the client, since leaving them on the server could blow up the page. Not sure how true this is, but re-siting does help.
I think it's a deadlock. The good news is that the next Electric release (including the relevant parts of missionary) is lock-free, making deadlocks impossible. The bad news is that we're a couple of months away from delivering the release
as for a workaround - how are these fns being called from electric? What is the ms param as well?
Right on! Glad to hear about the new release. I used the poll function like this:
(e/server (e/input
            (poll-f 1000 ; every second yields new formatted time
              (fn [] (tick/format "HH:mm:ss dd/MM"
                       (tick/zoned-date-time))))))
But I also got rid of it in my attempt to fix the WSOD. Now I use them only on the client: I moved m/sleep + e/Task and control flow to the client side, and made sure to e/Offload blocking calls. I haven't seen the WSOD since.
Here I use the poll function on the client to query info from the Android host of the webview. I didn't get rid of these client-side calls.
(e/client
  (e/reduction {} nil
    (poll-f 1000
      (fn [] (.post js/Android "bluetooth" "connected_device")))))
Just to confirm - e/Offload of blocking calls eliminated the problem?
what is an example of this? I am surprised
Well, I'm not sure what eliminated the problem, since I couldn't reliably reproduce it in the first place; I only saw it in prod after a while. Actually, sorry about the confusion: I didn't change anything about e/Offload when the problem went away. I only meant that I went back to check that I had offloaded all blocking calls, JDBC queries for example. What I did change in the last few commits before it seemed to be fixed was avoiding m/sleep, including the poll function, on the server side.
For example, here I wait 4 seconds on client instead of on server, before mutating an atom on server.
- (do (e/server (case (e/Task (m/sleep 4000 item)) (reset! !curr-tag nil)))
+ (do (case (e/Task
+             (m/sleep 4000
+               (e/server (:products/id item))))
+       (e/server (reset! !curr-tag nil) :OK))
yeah, m/sleep has been a major source of deadlocks historically at the missionary level so I am not surprised. You may have more success using (e/Offload #(do (Thread/sleep 1000) x)) instead of m/sleep
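A minimal sketch of that workaround, applied to the 4-second delayed reset from the diff above. It assumes Electric v3 with hyperfiddle.electric3 aliased as e; the namespace name, ResetTagLater, and its parameters are hypothetical, and it has not been verified to avoid the deadlock.

```clojure
(ns example.reset-tag ; hypothetical namespace
  (:require [hyperfiddle.electric3 :as e]))

;; Assumptions: !curr-tag is a server-side atom and item is a product map,
;; both already available on the server.
(e/defn ResetTagLater [!curr-tag item]
  (e/server
    ;; The blocking Thread/sleep is wrapped in e/Offload, as suggested above,
    ;; instead of going through m/sleep. `case` with only a default branch
    ;; waits for the offloaded value before evaluating the reset!.
    (case (e/Offload #(do (Thread/sleep 4000) (:products/id item)))
      (reset! !curr-tag nil))))
```

This keeps the whole delay-then-mutate step on the server, so it only makes sense if the server-side m/sleep was the actual trigger.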