#graphql
2022-09-14
hlship 16:09:10

Perhaps the default executor should be less bounded? I suspect that if you are doing testing that executes multiple queries in parallel, you can hit the thread pool's max-threads limit pretty quickly. However:

(defn ^{:added "1.2"} default-executor
  "Creates and returns a default executor used if one is not explicitly provided.
  Returns an unbounded ThreadPoolExecutor, with a maximum of 10 threads, and a 1 second
  keep-alive."
  []
  (let [queue (LinkedBlockingQueue.)
        *thread-id (atom 0)
        ^ThreadFactory factory (reify ThreadFactory
                                 (^Thread newThread [_ ^Runnable runnable]
                                   (Thread. runnable
                                            (format "GraphQL Executor #%d" (swap! *thread-id inc)))))]
    (ThreadPoolExecutor. (int 0) (int 10)
                         1 TimeUnit/SECONDS
                         queue
                         factory)))
This is the default: an unbounded queue, with a max of 10 concurrent threads. So I'd like to know more about what failed for you; under lots of parallel pressure this may get slow, but I'd be very concerned if it started locking up.

thumbnail 17:09:25

A coworker reported that the test locked up while doing a schema/execute call inside a resolver, with only one thread in use. So the 10-thread limit doesn't seem to be the problem.

thumbnail 17:09:59

I don't recall more information, but I'll ask them to check out this thread and provide any missing info.

hiredman 17:09:58

Anything that blocks on the result of something else in the queue wouldn't lock up with an unbounded pool, but has a high chance of locking up with a bounded one.
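This blocking-on-a-queued-task pattern can be shown with plain `java.util.concurrent` (a hypothetical demo, not lacinia code). The outer task submits an inner task to the same single-worker executor and then blocks on its result; the inner task is queued behind the outer one and can never start. A timeout is used here to surface the deadlock instead of hanging:

```java
import java.util.concurrent.*;

// Hypothetical demo: a pooled task that blocks on the result of another task
// queued on the same executor. With a single worker, the inner task sits in
// the queue behind the outer task, so the outer get() can only time out.
public class QueuedDependencyDemo {
    public static String run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            Future<String> outer = pool.submit(() -> {
                // The inner task lands in the queue behind us; the only
                // worker thread (us) is occupied, so it never runs.
                Future<String> inner = pool.submit(() -> "inner done");
                try {
                    return inner.get(300, TimeUnit.MILLISECONDS);
                } catch (TimeoutException e) {
                    return "deadlocked";
                }
            });
            return outer.get(2, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());  // prints "deadlocked"
    }
}
```

With an unbounded pool (e.g. a cached thread pool), the inner task would get its own thread and the outer `get()` would succeed, which is exactly the distinction made above.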

hiredman 17:09:21

which might be some way to share information between resolvers using promises

hiredman 17:09:49

We have something like that at work, but it uses core.async promise-chans, and we have an adapter between core.async and ResolverResult, so there's no real blocking. But if you did something like that using a regular clojure.core/promise and really blocked, you could hit exactly this kind of deadlock.
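The non-blocking alternative being described can be sketched in JDK terms (a hypothetical illustration, not the adapter mentioned above): instead of calling a blocking `get()` inside a pooled task, attach a continuation, so the worker thread is released and can later run the dependent step when it reaches the head of the queue. Even a single-worker pool stays deadlock-free:

```java
import java.util.concurrent.*;

// Hypothetical sketch of callback-style composition: the dependency between
// two pieces of work is expressed with a continuation rather than a blocking
// get(), so one worker thread can run both steps in sequence.
public class CallbackStyleDemo {
    public static String run() throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(1);
        try {
            CompletableFuture<String> result =
                CompletableFuture.supplyAsync(() -> "inner done", pool)
                                 // Continuation runs on the same pool once the
                                 // inner value is available; nothing blocks.
                                 .thenApplyAsync(v -> v + ", combined", pool);
            return result.get(2, TimeUnit.SECONDS);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run());  // prints "inner done, combined"
    }
}
```

This mirrors the promise-chan/ResolverResult approach: the only blocking call (`get`) happens outside the pool, in the test driver.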

hlship 17:09:44

So, that pool is primarily used for callbacks when a ResolverResultPromise delivers a value. It's also used for the top-level query execution.

hlship 17:09:29

A thread dump would be helpful!

hiredman 17:09:16

I think the analysis in the initial issue is suspect. It calls out the deliver in the catch as potentially raising an exception that would escape when the callback runs, but I think, according to https://github.com/walmartlabs/lacinia/blob/8b2ca6488e47a024c2cbef151fdcd6a25c30636b/src/com/walmartlabs/lacinia/resolve.clj#L187-L191, the callback won't be executed "there" (a little tricky to tell, it is conditional), but will be queued on the executor again to run, which is already using execute.
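The pattern under discussion (a callback queued on an executor rather than run inline) can be sketched in a few lines. This is a deliberately simplified, hypothetical `PromiseSketch`, not lacinia's actual `ResolverResultPromise` (it ignores races between registering the callback and delivering the value); the point is only that an exception thrown by the callback cannot escape the `deliver` call itself:

```java
import java.util.concurrent.Executor;
import java.util.function.Consumer;

// Hypothetical, simplified promise: deliver() never invokes the callback
// inline; it queues it on the executor, so a throwing callback fails on the
// executor's thread, not inside the caller of deliver().
public class PromiseSketch<T> {
    private final Executor executor;
    private volatile Consumer<T> callback;

    public PromiseSketch(Executor executor) {
        this.executor = executor;
    }

    /** Registers the callback to run when a value is delivered. */
    public void onDeliver(Consumer<T> cb) {
        this.callback = cb;
    }

    /** Queues the callback on the executor; returns without running it. */
    public void deliver(T value) {
        Consumer<T> cb = callback;
        if (cb != null) {
            executor.execute(() -> cb.accept(value));
        }
    }
}
```

Under this shape, an exception raised in a `catch` block's `deliver` path would surface on an executor thread, not escape "there", consistent with the reading above.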

hiredman 17:09:09

I guess that assumes the callback executor is bound to something

hiredman 17:09:34

(which, if you are already binding callback executor, the new code appears to clobber that binding)

hlship 21:09:27

Good point. The default should be the current binding of the var, if non-nil, or a new instance if nil.

Dominik Engelhardt 08:09:20

Coworker of @UHJH8MG6S here. We have a bit of a weird test setup, I guess. One of our resolvers does an HTTP request to query a separate GraphQL endpoint instead of a database. In our integration test we don't want to hit the actual external service, so we extend our lacinia schema to include said endpoint plus a mock resolver.

Now, with the new default executor, we saw the HTTP GraphQL request time out in the first resolver. Debugging this, only a single thread is created from the default executor's thread pool. It seems that with the LinkedBlockingQueue (capacity Int/MAX_VALUE) used, tasks are always queued onto the first thread instead of new threads being started. So the task of resolving the second resolver ends up in the same queue, but behind the first one, which is still waiting on the second one. A nice deadlock.

Using Executors/newCachedThreadPool instead of the default executor works, because it uses a zero-capacity SynchronousQueue. In that case, new threads are started (or idle ones reused), because offering to the queue never succeeds while all threads are busy. So I'm not sure if our setup is something that should be supported or not, but I do think the default executor should use more than a single thread, rather making the max number of threads unbounded than making the queue unbounded.
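The queue behavior described here is standard `ThreadPoolExecutor` semantics and can be demonstrated with plain JDK code (no lacinia involved). With core size 0 and an unbounded `LinkedBlockingQueue`, every submission is accepted by the queue, so the pool only ever starts one worker; with a `SynchronousQueue`, the offer fails while all workers are busy, so the pool grows toward its maximum:

```java
import java.util.concurrent.*;

// Demo of ThreadPoolExecutor's queuing strategy: new threads beyond the core
// size are only created when the work queue rejects the offer. An unbounded
// queue never rejects, so the pool is stuck at one worker.
public class QueueChoiceDemo {
    static int busyPoolSize(BlockingQueue<Runnable> queue) throws Exception {
        ThreadPoolExecutor pool =
            new ThreadPoolExecutor(0, 10, 1, TimeUnit.SECONDS, queue);
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 5; i++) {
            pool.execute(() -> {
                try { release.await(); } catch (InterruptedException e) { }
            });
        }
        Thread.sleep(200);               // let started workers settle
        int size = pool.getPoolSize();   // threads actually created
        release.countDown();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(busyPoolSize(new LinkedBlockingQueue<>()));  // 1
        System.out.println(busyPoolSize(new SynchronousQueue<>()));     // 5
    }
}
```

So swapping the queue, not raising the max, is what changes the growth behavior, matching the `newCachedThreadPool` observation above.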

Dominik Engelhardt 08:09:59

Also, instead of setting the minimum number of threads to zero, creating threads as daemon threads in the thread factory would probably also prevent the process from hanging at the end of a test run.
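The daemon-thread suggestion amounts to one extra line in the `ThreadFactory`. Here is a hypothetical Java sketch of such a factory (mirroring the naming scheme of the Clojure factory quoted earlier, but not lacinia's actual code):

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical factory: marks pool threads as daemons so lingering workers
// cannot keep the JVM alive after the main work (e.g. a test run) completes.
public class DaemonFactory implements ThreadFactory {
    private final AtomicInteger counter = new AtomicInteger();

    @Override
    public Thread newThread(Runnable runnable) {
        Thread t = new Thread(runnable,
                "GraphQL Executor #" + counter.incrementAndGet());
        t.setDaemon(true);  // JVM may exit even while this thread is alive
        return t;
    }
}
```

The trade-off, raised in the next message, is that daemon status changes process-exit behavior for every user of the pool, not just tests.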

hlship 16:09:08

I don't want to use daemon threads, as that can cause test runs to hang after tests complete. But I'll look into the above; it seems I chose the wrong queue. I'll add some tests around the default thread pool.

hiredman 16:09:44

Nesting query execution is going to get weird because of the use of dynamic variables. If in your test you moved the execution of the subquery to another thread (ideally using a non-binding-conveying means, though I think it would work even with conveyance), that would likely unblock things.
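A Java analog of this point (a hypothetical demo, with a `ThreadLocal` standing in for a Clojure dynamic var): a value bound on the calling thread is not visible from a plain `new Thread`, so subquery work moved to such a thread runs free of the caller's per-thread context, which is the "non-binding-conveying" escape hatch described above:

```java
import java.util.concurrent.atomic.AtomicReference;

// Demo: a ThreadLocal (standing in for a dynamic var) does not convey to a
// manually created thread, so work run there sees no inherited context.
public class ContextConveyanceDemo {
    static final ThreadLocal<String> CONTEXT = new ThreadLocal<>();

    public static String valueOnFreshThread() throws InterruptedException {
        CONTEXT.set("outer-query");  // "binding" on the calling thread
        AtomicReference<String> seen = new AtomicReference<>("unset");
        Thread t = new Thread(() -> seen.set(String.valueOf(CONTEXT.get())));
        t.start();
        t.join();
        return seen.get();  // "null": the binding did not convey
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(valueOnFreshThread());
    }
}
```

In Clojure terms, `future` and `bound-fn` convey bindings while a raw `Thread.` does not, which is why the choice of mechanism matters for the nested execution.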

hlship 16:09:35

My goal with the default executor is that things should just work with it, but that an executor set up and tuned for the workload should work better.

❤️ 1
thumbnail 16:09:02

@U0NCTKEV8 that's definitely something we can try!