reitit

Galaux 2024-09-25T11:59:11.121749Z

Hi y'all! Has anyone got a tip on how to handle a "Byte Order Mark" (an invisible UTF-8 encoded character added at the head of the JSON, which makes the JSON decoder fail) in Reitit?

opqdonut 2024-09-25T13:18:04.750229Z

two options come to mind:
1. seeing if Jackson has an option to ignore the byte order mark. Reitit uses jsonista, which uses Jackson
2. inserting a middleware that slurps the :body InputStream and removes the byte order mark

Galaux 2024-09-25T13:21:04.036849Z

Thanks for your input 🙂 Yes, that was my idea too. Jackson doesn't seem to mention anything about BOM. Also, I'd like it to be automatic: some files may not have the BOM. So that would be something in Reitit or Muuntaja. As for the second option, I'm a little suspicious about it, as I don't expect to be the first one to encounter such a problem.

opqdonut 2024-09-25T13:21:42.938339Z

I've only bumped into the BOM stuff with file uploads (especially excel), not JSON APIs

opqdonut 2024-09-25T13:22:05.952989Z

the middleware could just be a no-op if there is no BOM, for sure

Galaux 2024-09-25T13:22:44.387199Z

Yes, my JSON file is a CSV turned into JSON so … I suspect the process that turned the CSV into JSON also set the BOM in the JSON.

Galaux 2024-09-25T13:23:00.661269Z

> the middleware could just be a no-op if there is no BOM, for sure
Right …

juhoteperi 2024-09-25T13:30:02.390699Z

The JSON RFC says JSON documents must not have a BOM, but of course that doesn't help if you still need to read those: https://stackoverflow.com/a/38036753

juhoteperi 2024-09-25T13:32:46.381399Z

If Jackson had an option for this, I'd expect it to be mentioned here: https://github.com/FasterXML/jackson-core/wiki/JsonParser-Features (doesn't look like there is one)

Galaux 2024-09-25T13:33:19.894479Z

> The JSON RFC says JSON documents must not have a BOM, but of course that doesn't help if you still need to read those
Oh, that's interesting … it gives me ammunition to go and tell the person who sends me this JSON that they have to fix their files …

juhoteperi 2024-09-25T13:34:24.524959Z

Yeah, I didn't check the RFCs directly, but if the SO quotes are correct it could be a good argument

juhoteperi 2024-09-25T13:41:42.754009Z

If you need to remove the BOM from an InputStream, it should be possible to do so without slurping the whole stream into a byte array (or similar) and creating a new stream. With BufferedInputStream, you can set a mark at the beginning of the stream, read 3 bytes, and check whether they are the UTF-8 BOM (EF BB BF). If they are, give the rest of the stream to the next mw; if not, reset the stream to the mark and then pass it to the next mw.

Galaux 2024-09-25T13:45:51.144329Z

There is a BOMInputStream for this

juhoteperi 2024-09-25T13:46:05.240659Z

Oh right, in apache-commons-io

juhoteperi 2024-09-25T13:47:10.234319Z

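;; needs (:import (java.io BufferedInputStream)) in the ns form
(let [is (BufferedInputStream. (:body req))
      _ (.mark is 3) ;; allow max 3 bytes to be read while allowing the return to the mark
      first-bytes (byte-array 3)
      _ (.readNBytes is first-bytes 0 3) ;; Java 9+; a plain .read may return fewer than 3 bytes
      ;; JVM bytes are signed (-128..127), so mask to 0..255 before comparing
      has-bom (= [0xEF 0xBB 0xBF] (mapv #(bit-and % 0xFF) first-bytes))]
  (when-not has-bom (.reset is)) ;; no BOM: rewind so the first bytes are not lost
  (handler (assoc req :body is)))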

juhoteperi 2024-09-25T13:48:13.274579Z

Just for fun 😄
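
For reference, the snippet above could be wrapped up as a complete Ring-style middleware along these lines. This is only a sketch: the namespace, `wrap-strip-utf8-bom`, and `body-after-mw` are made-up names, it assumes Java 9+ for `readNBytes`, and it assumes the body reaches the middleware as a raw `InputStream` (i.e. before any JSON decoding).

```clojure
(ns example.bom ; hypothetical namespace
  (:import (java.io BufferedInputStream ByteArrayInputStream InputStream)))

(defn wrap-strip-utf8-bom
  "Hypothetical middleware: drops a leading UTF-8 BOM (EF BB BF) from an
  InputStream :body and passes every other request through untouched."
  [handler]
  (fn [req]
    (let [body (:body req)]
      (if-not (instance? InputStream body)
        (handler req) ; nil or already-decoded body: nothing to strip
        (let [is  (BufferedInputStream. ^InputStream body)
              buf (byte-array 3)]
          (.mark is 3)
          ;; readNBytes keeps reading until 3 bytes or end of stream
          (.readNBytes is buf 0 3)
          ;; JVM bytes are signed, so mask to 0..255 before comparing
          (when-not (= [0xEF 0xBB 0xBF] (mapv #(bit-and % 0xFF) buf))
            (.reset is)) ; no BOM: rewind to the mark
          (handler (assoc req :body is)))))))

(defn body-after-mw
  "Test helper: runs the UTF-8 bytes of s through the middleware and
  returns what the wrapped handler reads from :body."
  [^String s]
  (let [handler (wrap-strip-utf8-bom
                  (fn [req] (slurp (:body req) :encoding "UTF-8")))]
    (handler {:body (ByteArrayInputStream. (.getBytes s "UTF-8"))})))
```

Bodies without a BOM, and bodies shorter than 3 bytes, go through unchanged, which covers the "no-op if there is no BOM" case.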

Galaux 2024-09-25T13:48:25.819779Z

So I would have to prevent Reitit from parsing the JSON and do it manually, right?

juhoteperi 2024-09-25T13:48:49.994789Z

No, you could add this to a Reitit middleware, just making sure it is before Jsonista middleware

juhoteperi 2024-09-25T13:49:26.974799Z

Or Muuntaja middleware

Galaux 2024-09-25T13:49:28.599869Z

Ok yeah !

juhoteperi 2024-09-25T13:49:37.667979Z

like before that line

juhoteperi 2024-09-25T13:50:28.316269Z

or after...

juhoteperi 2024-09-25T13:50:56.515979Z

It is hard to remember which way the middlewares need to be defined so the middleware handler is executed BEFORE another mw

juhoteperi 2024-09-25T13:51:46.898159Z

(mw-1 (mw-2 (mw-3 ...))): if you were just wrapping the handler fns, the BOM middleware would need to be outside the Muuntaja middleware, i.e. applied after it when wrapping, so that it sees the request first

Galaux 2024-09-25T13:52:24.485249Z

👍🏻 Cool! Thanks! I'm going to explore adding a middleware then …

juhoteperi 2024-09-25T13:52:51.665449Z

If your mw sees the req body as an InputStream, then the order is correct; if it is clj data, it is running after Muuntaja
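
A dependency-free way to convince yourself of the ordering rule is sketched below. Everything here is made up for illustration: `wrap-fake-decode` only stands in for a decoding middleware such as Muuntaja's (it is not its real code), and `wrap-record-body-kind` plays the role of the BOM middleware by recording what kind of body it sees.

```clojure
(ns example.order) ; hypothetical namespace

(defn wrap-fake-decode
  "Stand-in for a body-decoding middleware (an assumption, not Muuntaja
  itself): replaces :body with Clojure data."
  [handler]
  (fn [req] (handler (assoc req :body {:decoded true}))))

(defn wrap-record-body-kind
  "Records in the atom `seen` whether this middleware saw a raw or an
  already-decoded :body, then calls the wrapped handler."
  [handler seen]
  (fn [req]
    (reset! seen (if (map? (:body req)) :clj-data :raw))
    (handler req)))

(defn body-kind-seen
  "Builds an app with build-app, sends one request through it, and
  returns what wrap-record-body-kind observed."
  [build-app]
  (let [seen (atom nil)
        app  (build-app seen)]
    (app {:body "raw stream placeholder"})
    @seen))

;; Recorder wraps the decoder: it is outermost and sees the request first.
(def seen-when-outermost
  (body-kind-seen
    (fn [seen] (wrap-record-body-kind (wrap-fake-decode identity) seen))))

;; Decoder wraps the recorder: the recorder only sees decoded data.
(def seen-when-innermost
  (body-kind-seen
    (fn [seen] (wrap-fake-decode (wrap-record-body-kind identity seen)))))
```

With plain function wrapping, the outermost middleware handles the request first. How that maps onto a Reitit `:middleware` vector is best confirmed with the body-type check described above rather than taken from this sketch.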

Galaux 2024-09-25T13:53:17.515239Z

Thanks for the suggestion @joel.kaasinen @juhoteperi