code-reviews

2024-08-06T22:47:03.817469Z

Wrote up a 'poor man's dataloader'. Would love your thoughts! Context A user can make N concurrent database requests:

SELECT * FROM users WHERE id = 1 
SELECT * FROM users WHERE id = 2
(these are analogous to get-one) But we want to instead batch it together
SELECT * FROM users WHERE id IN (1, 2)
(this is analogous to get-batched) Here's how I achieve it:
(def loader (atom {:requests [] :schedule-delay nil}))

(defn get-batched [inputs]
  (println "batching!" inputs)
  (mapv (fn [id] (keyword (str "r-" id)))
        inputs))

(defn run-batch! []
  (let [[old _] (swap-vals! loader
                            (fn [_]
                              {:requests []
                               :schedule-delay nil}))
        {:keys [requests]} old
        inputs (mapv :input requests)
        results (get-batched inputs)
        result-promises (mapv :result-promise requests)]
    (doseq [[p r] (map vector result-promises results)]
      (deliver p r))))

(defn schedule [request]
  (let [{:keys [schedule-delay]}
        (swap! loader (fn [{:keys [requests schedule-delay]}]
                        {:requests (conj requests request)
                         :schedule-delay (or schedule-delay
                                             (delay
                                               (ua/vfuture
                                                (Thread/sleep 5)
                                                (run-batch!))))}))]
    @schedule-delay))

(defn get-one [input]
  (let [p (promise)
        m {:input input
           :result-promise p}]
    (schedule m)
    @p))

(comment
  (pmap (fn [i] (get-one i)) (range 10)))

phronmophobic 2024-08-06T22:54:00.637679Z

Here's a few thoughts from skimming: • It's not clear to me what the goal is. A description of the goal and what each function does would be helpful. • run-batch! doesn't accept any arguments. I think it's ok for helper functions like this to use globals, but I would implement it so that there's some arity that accepts all the inputs and a zero arity that passes in all the globals. • I suspect you would be better off using some sort of executor.

❤️ 1
2024-08-06T23:18:16.998269Z

Thanks for looking @smith.adriane! I updated the description to explain the problem & the goal a bit more. I also noted that I am running the future inside a virtual thread executor.

phronmophobic 2024-08-06T23:20:36.301619Z

why not just use a SingleThreadExecutor? Why is there a 5ms delay being added?

phronmophobic 2024-08-06T23:31:20.340969Z

There's also some coupling between the task data and task runner. run-batch! expects an :input key, which it internally calls ids . Afterwards, the ids are transformed into vs? All of that is confusing to me. Is there any reason not to use a consistent name?

phronmophobic 2024-08-06T23:33:25.583939Z

It also seems like results are being held inside the loader atom forever.

2024-08-06T23:34:20.820439Z

> why not just use a SingleThreadExecutor? Why is there a 5ms delay being added? My goal is, if someone does:

(pmap (fn [i] (get-one i)) (range 10))
I would want get-batched to be called with [1 2 3 4 5 6 7 8 9] I am not sure how this would work with a single threaded executor. Would love to learn more!

phronmophobic 2024-08-06T23:36:42.592959Z

I haven't actually tested this code, but it would look something like:

(def promises
  (mapv (fn [input]
          (.submit my-executor (fn []
                                 (do-work input))))
        inputs))

(def first-result @(first promises))

phronmophobic 2024-08-06T23:37:10.474689Z

if my-executor is a SingleThreadExecutor, then jobs would be executed one at a time, in order, on a single thread.

phronmophobic 2024-08-06T23:37:44.503709Z

(.submit my-executor a-fn) returns a Future which can be dereferenced.

2024-08-06T23:39:09.528409Z

(I went ahead and updated the code to use more consistent variable names)

2024-08-06T23:39:27.030809Z

Is do-work in that example, the function get-one?

phronmophobic 2024-08-06T23:40:56.687449Z

get-one schedules something where as do-work actually accomplishes a single task

phronmophobic 2024-08-06T23:41:18.942259Z

you could run (do-work my-input) directly and it wouldn't use the executor

2024-08-06T23:43:16.441949Z

In the code above, what I would like to do is, to 'batch' all the tasks into one function call. This is for something like, a user makes N concurrent database requests: SELECT * FROM users WHERE id = 1 SELECT * FROM users WHERE id = 2 (these are analogous to`get-one`) But we instead run

SELECT * FROM users WHERE id IN (1, 2)
(this is analogous to get-batched) I may be misunderstanding what you mean by do-work and input here.

phronmophobic 2024-08-06T23:45:38.772809Z

Ah. I did not realize that was the goal.

phronmophobic 2024-08-06T23:46:24.574119Z

There are multiple ways to do this, but I would use core.async since I'm familiar with it and it could be expressed naturally.

🫡 1
2024-08-06T23:49:25.104309Z

Updated the post to try to state the goal better 🫡. If you end up outlining a solution with core async would love to see it! I was thinking about core.async too, but wasn't sure how to express it more naturally than this.

phronmophobic 2024-08-07T00:04:52.103879Z

The updated problem statement is great! 👍

❤️ 1