#ring
2023-09-13
Anders Corlin19:09:44

I have successfully got multipart working, at least as long as the submitter uses content type multipart/form-data. Now I have another client that sends data as multipart/mixed. I can see that ring.middleware.multipart-params/multipart-params-request only parses the body if the content type is multipart/form-data, and I can't find any Clojure example for other multipart types such as mixed. If anybody has a clue how to parse multipart/mixed, that would be wonderful!

weavejester22:09:22

Can you open an issue for this on the Ring repository? It sounds like the sort of thing we'd need to patch.

weavejester23:09:30

Looking into this further, multipart/mixed data doesn't necessarily have anything that keys each part, and as far as I can tell the Commons FileUpload library that Ring uses for its multipart parsing only supports form data (unless we use some of the lower-level classes instead). If you want multipart/mixed, you'll probably need to grab a Java multipart library and write some middleware to parse it yourself.

Anders Corlin23:09:34

Thanks @weavejester! I can hardly find any information about multipart/mixed, but it seems I can convince the sender to encode the data as multipart/form-data instead, so we can use the standard libraries on the backend.

Anders Corlin13:09:48

Had a look and found that the header contains a boundary that separates the parts:

Content-Type: multipart/mixed; boundary=3d3edb89-ba76-4ed5-b199-eb3f6f278646
Hence I wrote this middleware to parse it:

Anders Corlin13:09:10

(ns xxxx.multipart-mixed
  (:require [cheshire.core :as cheshire]
            [cuerdas.core :as str])
  (:import (java.io InputStream)
           (java.util Arrays)
           (org.apache.commons.io IOUtils)))

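(defn array-find
  "Search a byte array for a pattern. Returns the offset of the first match
  at or after start-offset (default 0), or nil if there is none."
  [^bytes data ^bytes pattern & [start-offset]]
  (let [pattern-length (alength pattern)
        ;; last offset at which the pattern can still fit
        length (- (alength data) pattern-length)]
    ;; (>= length 0) rather than (pos? length), so data that is exactly as
    ;; long as the pattern is still checked
    (when (>= length 0)
      (loop [i (or start-offset 0)]
        (when (<= i length)
          (if (Arrays/equals pattern (Arrays/copyOfRange data i (+ i pattern-length)))
            i
            (recur (inc i))))))))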

(defn array-split
  "Split a byte array on the pattern (a byte array or a string).
  Empty segments are skipped, and anything after the last occurrence
  of the pattern is dropped."
  [^bytes data pattern]
  (let [^bytes pattern (if (string? pattern)
                         (.getBytes ^String pattern "UTF-8")
                         pattern)
        pattern-length (alength pattern)]
    (loop [start-offset 0
           acc []]
      (if-let [offset (array-find data pattern start-offset)]
        (recur (+ offset pattern-length)
               (if (= start-offset offset)
                 acc
                 (conj acc (Arrays/copyOfRange data start-offset offset))))
        acc))))

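(defn parse-header
  "Parse strings of the form \"key<separator>value\" into a map with
  lower-cased keyword keys. Strings that don't split into exactly two
  parts are ignored."
  [s separator]
  (into
    {}
    (keep
      #(let [parts (str/split % separator)]
         (when (= 2 (count parts))
           [(keyword (str/lower (str/trim (first parts))))
            (str/trim (second parts) "\n\t\f\r \"")]))
      s)))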

(defn get-content-type-info
  [content-type]
  (let [parts (str/split content-type ";")]
    (assoc
      (parse-header parts "=")
      :content-type (first parts))))

(defn wrap-multipart-mixed
  "Middleware that parses multipart/mixed bodies. JSON parts are merged into
  [:parameters :body]; JPEG parts are stored there under the name from their
  Content-Disposition header. All other requests pass through unchanged."
  [handler]
  (fn [{{content-type "content-type"} :headers
        :as                           request}]
    (let [content-type-info (get-content-type-info content-type)]
      (if (= (:content-type content-type-info) "multipart/mixed")
        (let [boundary (str "--" (:boundary content-type-info))
              body (IOUtils/toByteArray ^InputStream (:body request))
              multiparts (array-split body boundary)]
          (handler
            (reduce
              (fn [request multipart]
                ;; NOTE: each part is split on CRLF and its last row is taken as
                ;; the payload, so a payload that itself contains CRLF is cut short.
                (let [multipart-rows (array-split multipart "\r\n")
                      multipart-info (parse-header (map #(String. ^bytes %) multipart-rows) ":")
                      content-type-info (get-content-type-info (:content-type multipart-info))
                      content-type (:content-type content-type-info)]
                  (case content-type
                    "application/json" (update-in
                                         request
                                         [:parameters :body]
                                         merge
                                         (cheshire/parse-string (String. ^bytes (last multipart-rows)) true))
                    "image/jpeg" (let [content-disposition-info (get-content-type-info (:content-disposition multipart-info))
                                       k (keyword (:name content-disposition-info))]
                                   (assoc-in
                                     request
                                     [:parameters :body k]
                                     {:content-type content-type
                                      :filename (:filename content-disposition-info)
                                      :bytes (last multipart-rows)}))
                    request)))
              request
              multiparts)))
        ;; Any other content type: pass the request through unchanged
        ;; (without this branch the middleware would return nil).
        (handler request)))))
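
One caveat with taking `(last multipart-rows)` as the payload: splitting the whole part on CRLF also splits the payload, so binary data that happens to contain the bytes 0x0D 0x0A would be cut short. Below is a minimal sketch of an alternative, assuming each part consists of header lines, one blank line, and then the body, followed by the CRLF that precedes the next boundary. The namespace and the helpers `index-of`, `ends-with-crlf?` and `split-part` are hypothetical names for illustration, not part of the middleware above or of any library.

```clojure
(ns example.split-part
  (:require [clojure.string :as str])
  (:import (java.util Arrays)))

(defn index-of
  "Offset of the first occurrence of pattern in data, or nil if absent."
  [^bytes data ^bytes pattern]
  (let [n (alength pattern)
        max-offset (- (alength data) n)]
    (loop [i 0]
      (when (<= i max-offset)
        (if (Arrays/equals pattern (Arrays/copyOfRange data i (+ i n)))
          i
          (recur (inc i)))))))

(defn ends-with-crlf?
  "True if the last two bytes of data are CR (13) and LF (10)."
  [^bytes data]
  (let [n (alength data)]
    (and (>= n 2)
         (= 13 (aget data (- n 2)))
         (= 10 (aget data (- n 1))))))

(defn split-part
  "Split one multipart section into header lines and raw body bytes.
  The split happens at the first blank line only, so CRLF bytes inside
  the body are left alone. Returns nil if there is no blank line."
  [^bytes part]
  (when-let [offset (index-of part (.getBytes "\r\n\r\n" "US-ASCII"))]
    (let [body-start (+ (long offset) 4)
          ;; the CRLF right before the next boundary belongs to the delimiter
          body-end (long (if (ends-with-crlf? part)
                           (max body-start (- (alength part) 2))
                           (alength part)))]
      {:headers (->> (str/split (String. part 0 (int offset) "US-ASCII") #"\r\n")
                     (remove str/blank?)
                     vec)
       :body (Arrays/copyOfRange part body-start body-end)})))
```

With this shape, the header lines could still go through something like `parse-header`, while `:body` would be used in place of the last row.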

Anders Corlin13:09:59

I have tested it with this data in the body:

--3d3edb89-ba76-4ed5-b199-eb3f6f278646
Content-Type: application/json; charset=utf-8
Content-Length: 27

{"name":"Andreas"}
--3d3edb89-ba76-4ed5-b199-eb3f6f278646
Content-Disposition: form-data; name="image"; filename="image.jpg"
Content-Type: image/jpeg
Content-Length: 1814

<binary data for the image>
--3d3edb89-ba76-4ed5-b199-eb3f6f278646--

Anders Corlin13:09:45

I don't think it's generic enough to be in any library yet, but anybody searching for multipart/mixed is welcome to take inspiration from the code.

Anders Corlin15:09:04

Now I'm confused. I happened to see the raw input of a multipart/form-data request that Ring already parses so nicely, and found the content very similar, with the same boundary separators. And the class documentation in org.apache.commons.fileupload, which ring.middleware.multipart-params uses for its parsing, says:

> This class handles multiple files per single HTML widget, sent using multipart/mixed encoding type, as specified by RFC 1867. Use parseRequest(RequestContext) to acquire a list of FileItems associated with a given HTML widget.

So maybe most of my code isn't necessary at all? A bit later on I'm going to see what happens if I just change the content type from multipart/mixed to multipart/form-data.

Anders Corlin21:09:22

Yes! If I bypass the check in ring.middleware.multipart-params/multipart-form? by putting another middleware in front of it that changes the content type from multipart/mixed to multipart/form-data, Ring parses out the image as well. Feels like I've been going around in circles. @weavejester, what is your understanding of the differences between the two formats? And as far as I can read in the header of commons.FileUpload, it supports multipart/mixed; where did you read that it doesn't?
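
The content-type rewrite described in this message could look roughly like the sketch below. It is only an illustration under assumptions: the namespace and the name `wrap-mixed-as-form-data` are hypothetical, and the request is assumed to be a Ring request map with lower-cased header names. Parameters such as the boundary are kept as they are.

```clojure
(ns example.mixed-as-form-data
  (:require [clojure.string :as str]))

(defn wrap-mixed-as-form-data
  "Rewrite a multipart/mixed Content-Type header to multipart/form-data,
  keeping parameters such as the boundary, so that middleware running
  after this one sees the request as form data."
  [handler]
  (fn [request]
    (let [content-type (get-in request [:headers "content-type"])]
      (if (and content-type
               (str/starts-with? (str/lower-case content-type) "multipart/mixed"))
        (handler (assoc-in request [:headers "content-type"]
                           ;; swap the media type, keep everything after it
                           (str "multipart/form-data"
                                (subs content-type (count "multipart/mixed")))))
        (handler request)))))
```

For the rewrite to take effect it has to see the request first, so it would be applied outside Ring's multipart middleware, for example `(-> handler wrap-multipart-params wrap-mixed-as-form-data)`.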

weavejester13:09:45

My understanding is that the sole difference is that multipart/form-data must set a name attribute in the Content-Disposition header, while multipart/mixed has no such restriction. This means that form data can be loaded into a map of parameters, while mixed data cannot, as it doesn't have a guaranteed key for each multipart section. Mixed data is more generally used for things like email attachments.

The FileItem interface in FileUpload requires a name attribute to be constructed. However, now that I look at the actual parsing code, I see that this name attribute may actually be null. So I believe I was wrong before, and that it's possible to use FileUpload to parse multipart/mixed directly, without dropping down to the lower-level API of the library. You'll get an iterator of FileItem instances where getFieldName may return null. Of course, this isn't compatible with Ring, because it expects each multipart section to be named so that it has keys for the map it produces.
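
To make the keying problem concrete, here is a small sketch of what happens when parts are turned into a parameter map. It assumes each part has already been parsed into a map with a `:field-name` key that may be nil; that shape, the namespace and the function `parts->params` are hypothetical and are not Ring's or FileUpload's API.

```clojure
(ns example.parts-to-params)

(defn parts->params
  "Put named parts into a map keyed by :field-name and collect parts
  without a name separately, since they have no key to be stored under."
  [parts]
  (reduce (fn [acc {:keys [field-name] :as part}]
            (if field-name
              (assoc-in acc [:params field-name] part)
              (update acc :unnamed conj part)))
          {:params {} :unnamed []}
          parts))
```

With multipart/form-data every part is expected to end up in `:params`, while multipart/mixed may leave some or all parts in `:unnamed`.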