Fork me on GitHub
#clojure
<
2023-01-11
>
Drew Verlee17:01:33

The awesome http-kit server seems to give us a request body of Type BytesInputStream and we (our app code) has to call the .bytes function on it (as discussed in this https://github.com/http-kit/http-kit/issues/144) before passing it to https://github.com/ring-clojure/ring-json. I'm curious what people think of this situation, i personally found it a bit of a headache to find where things were breaking down because wrap-json-body didn't throw, it just returned nil. That seems (though i'm probably wrong here) to be cause ring-json uses json/parse-stream which doesn't throw when given the body/BytesInputStream instead it just returns a nil. Drilling into it a bit more it seems to be cheshire parse-stream that does the actual parsing. Here the body of https://github.com/ring-clojure/ring-json/blob/9f64fb4ea136da59c75dd8dd2a89c6487b41713a/src/ring/middleware/json.clj#L24 with debug code, that hopefully solidifies whats going on.

(:body request)
  ;; => #<org.httpkit.BytesInputStream@29d788e4 BytesInputStream[len=100]>

  (def bigdecimals? true)
  (def keywords? true)
  (def key-fn nil)
  (if-let [^InputStream body (:body request)]
    (let [^String encoding (or (character-encoding request)
                               "UTF-8")
          body-reader (java.io.InputStreamReader. body encoding)]

      (def body body)
      (def body-reader (java.io.InputStreamReader. body "UTF-8"))
      (binding [parse/*use-bigdecimals?* bigdecimals?]
        (try
          [true (json/parse-stream body-reader (or key-fn keywords?))]
          (catch com.fasterxml.jackson.core.JsonParseException ex
            [false nil])))))
  ;; => [true nil]

  body-reader
  ;; => #<java.io.InputStreamReader@138f44ba>

  (json/parse-stream body-reader true)
  ;; => nil  
From there i'm personally a bit ignorant of the narrative around these data structures and what kind of translation is ideal. Though i'm guessing the fastest solution would take into our specific context, a "good enough"/transparent solution would probably use clojure as an intermediate format. To that end, which of these two (hander/http-kit or reciever/wrap-json-body), if any, should be responsible? Or maybe it's just better to write a middleware that calls .bytes on the body and then call wrap-json-body. That's more or less what were doing now, but i'm excited to chat about this because i feel like this is an area that i rarely get to dive into. 🙂

hiredman18:01:06

the inputstream is a mutable object, what is important is it's internal state which I suggested examining last time you asked about this, but isn't shown in the above at all

phronmophobic18:01:53

Why is a call to .bytes required?

hiredman18:01:07

because .bytes returns the buffer contents regardless of the internal state of the inputstream

👍 2
phronmophobic18:01:31

oh, so it does work, but not if the contents have already been consumed?

hiredman18:01:58

I believe so, and that is what I suggested last time they asked about this

👀 2
👍 2
phronmophobic18:01:43

As a suggestion, I would start with listing all the middleware that care about the response's body and what they require/provide.

borkdude18:01:05

or just get rid of most of the middleware and just parse the json yourself - I'm hesitant about adding this complexity to #C04J0N76E2G

hiredman18:01:53

in fact you don't really need to examine the internal state of the inputstream, because the only way it would behave as observed (with parse-stream and bytes) is if the inputstream was completely consumed and returning -1 (indicating eof) from calls to read

Drew Verlee18:01:54

Thanks for the advice, I'll circle back and examine the internal state to see what i can learn.

hiredman18:01:25

so it is 100% the case that something else is reading that entire inputstream before the json decoding middleware sees it

hiredman18:01:48

if I had the guess you are logging the contents of the body

👀 2
👍 2
igrishaev08:01:06

That's completely fine to have a raw InputStream body as it gives you more freedom to for development. Often, you need to calculate some hash sum before parsing the payload, so it's completely fine. Below, here is a draft version of a middleware you can use:

(defn request-json-middleware
  [handler [& [cheshire-opt]]]
  
  (fn -request-json [request]
    (let [json?
          (check-if-content-type-matches "application/json")]
      (if json?
        (let [encoding
              (or (get-encoding-from-header ...)
                  "UTF-8")
              
              data
              (-> request
                  :body
                  (io/input-stream encoding)
                  (cheshire/parse-stream cheshire-opt))

              request'
              (assoc request :data data)]
          
          (handler request'))
        
        (handler request)))))

igrishaev08:01:03

You only need to extend the check-if-content-type-matches and get-encoding-from-header fragments

igrishaev08:01:07

Also, that's a good habit to not substitute the body field with its parsed counterpart but store it separately, e.g. in the :data field or similar. In rare cases, you still need access to the origin InputStream. Although it has been already read, it's possible to .reset() it, read the bytes and calculate the hashsum again.