Hi y'all! Has anyone got a tip on how to handle a "Byte Order Mark" (BOM) in Reitit? It's an invisible UTF-8 encoded character added at the head of the JSON, and it makes the JSON decoder fail.
Two options come to mind:
1. See whether Jackson has an option to ignore the byte order mark (Reitit uses jsonista, which uses Jackson).
2. Insert a middleware that slurps the :body InputStream and removes the byte order mark.
Thanks for your input 🙂 Yes, that was my idea too. Jackson doesn't seem to mention anything about the BOM. Also, I'd like it to be automatic: some files may not have the BOM. So that would be something in Reitit or Muuntaja. As for the second option, I'm a little suspicious of it, as I don't expect to be the first one to encounter this problem.
I've only bumped into the BOM stuff with file uploads (especially Excel), not JSON APIs
the middleware could just be a no-op if there is no BOM, for sure
Yes, my JSON file is a CSV converted to JSON, so I suspect the process that did the conversion also added the BOM to the JSON.
> the middleware could just be a no-op if there is no BOM, for sure

Right …
The JSON RFC says JSON documents must not have a BOM, but of course that doesn't help if you still need to read those: https://stackoverflow.com/a/38036753
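For reference, RFC 8259 §8.1 says implementations MUST NOT add a BOM to networked JSON text, but also that parsers MAY ignore one rather than treating it as an error. If you'd rather ignore it at the character level than at the byte level, here is a minimal sketch, assuming the body is UTF-8 and that whatever parses it next accepts a `java.io.Reader`. `bom-skipping-reader` is a hypothetical helper, not part of Reitit, Muuntaja or jsonista.

```clojure
(import '(java.io InputStream InputStreamReader PushbackReader Reader)
        '(java.nio.charset StandardCharsets))

(defn bom-skipping-reader
  "Hypothetical helper: decodes `in` as UTF-8 and drops a leading U+FEFF
  (the byte order mark) if there is one; otherwise the text is untouched."
  ^Reader [^InputStream in]
  ;; Java's UTF-8 decoder keeps EF BB BF as the char U+FEFF, so we check for it
  (let [r (PushbackReader. (InputStreamReader. in StandardCharsets/UTF_8) 1)
        c (.read r)]
    ;; -1 means an empty stream; any first char other than U+FEFF is pushed back
    (when (and (not= c -1) (not= c 0xFEFF))
      (.unread r (int c)))
    r))
```

Working on decoded characters means the check doesn't care how the BOM was split across reads, at the cost of committing to UTF-8 before the JSON parser sees the data.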
If Jackson had an option for this I would think it was mentioned here: https://github.com/FasterXML/jackson-core/wiki/JsonParser-Features (doesn't look like there is)
> JSON RFC says JSON documents must not have BOM but of course that doesn't help if you still need to read those

Oh, that's interesting … it gives me ammunition to go and tell the person who sends me this JSON that they have to fix their files …
Yeah, I didn't check the RFCs directly, but if the SO quotes are correct it could be a good argument
If you need to remove the BOM from an InputStream, it should be possible to do it without slurping the whole stream into a byte array and creating a new stream. With a BufferedInputStream, you can set a mark at the beginning of the stream and read 3 bytes (the UTF-8 BOM is the three bytes EF BB BF). If they are the UTF-8 BOM, give the rest of the stream to the next mw; if not, reset the stream to the mark and then call the next mw.
There is a BOMInputStream for this
Oh right, in apache-commons-io
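If you'd rather not pull in commons-io, the same byte-level idea can be sketched with the JDK's `java.io.PushbackInputStream` (a swapped-in technique, not how `BOMInputStream` is implemented). `utf8-bom` and `strip-utf8-bom` are hypothetical names; the sketch assumes only the UTF-8 BOM matters.

```clojure
(import '(java.io InputStream PushbackInputStream))

(def utf8-bom
  "The UTF-8 encoding of U+FEFF (EF BB BF) as signed Java bytes."
  [(unchecked-byte 0xEF) (unchecked-byte 0xBB) (unchecked-byte 0xBF)])

(defn strip-utf8-bom
  "Hypothetical helper: returns a stream positioned just after a leading
  UTF-8 BOM if there is one, otherwise at the original start."
  ^InputStream [^InputStream in]
  (let [pin (PushbackInputStream. in 3)
        buf (byte-array 3)
        ;; read up to 3 bytes, looping because a single read may return fewer
        n   (loop [off 0]
              (if (= off 3)
                off
                (let [r (.read pin buf off (- 3 off))]
                  (if (neg? r) off (recur (+ off r))))))]
    (when-not (and (= n 3) (= utf8-bom (vec buf)))
      ;; not a BOM: push back whatever was read so the caller sees it again
      (when (pos? n) (.unread pin buf 0 n)))
    pin))
```

Unlike mark/reset, pushback doesn't depend on a buffer's read limit; the stream can only ever be pushed back by the 3 bytes it was created with.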
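```clojure
(import 'java.io.BufferedInputStream)
(defn wrap-strip-bom [handler] ;; hypothetical name for this middleware
  (fn [req]
    (if-let [body (:body req)]
      (let [is (BufferedInputStream. body)
            _ (.mark is 3) ;; allow max 3 bytes to be read while allowing the return to the mark
            first-bytes (byte-array 3)
            n (.read is first-bytes 0 3)
            ;; Java bytes are signed, so EF BB BF compare as -17 -69 -65
            has-bom (and (= n 3) (= [-17 -69 -65] (vec first-bytes)))]
        (when-not has-bom (.reset is)) ;; no BOM: rewind to the mark
        (handler (assoc req :body is)))
      (handler req))))
```

Just for fun 😄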
So I would have to stop Reitit from parsing the JSON and do it manually, right?
No, you could add this as a Reitit middleware, just make sure it runs before the Jsonista middleware
Or Muuntaja middleware
Ok, yeah!
https://github.com/metosin/reitit/blob/master/examples/ring-example/src/example/server.clj#L22
like before that line
or after...
It is hard to remember which order middleware needs to be defined in so that one middleware's handler is executed BEFORE another mw
(mw-1 (mw-2 (mw-3 ...))): if you were just wrapping the handler fns, the middleware would need to be outside/after the Muuntaja middleware
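A quick way to convince yourself of the ordering with plain functions: each hypothetical `tagging-mw` records its tag on the way in, so the handler returns the order in which the middleware saw the request. In a `->` thread the middleware applied last ends up outermost and runs first, which is why a BOM stripper has to come "after" Muuntaja when wrapping by hand.

```clojure
(defn tagging-mw
  "Hypothetical middleware for illustration: conj `tag` onto the request's
  :seen vector, then call the wrapped handler."
  [handler tag]
  (fn [req] (handler (update req :seen (fnil conj []) tag))))

(defn echo-handler
  "Returns the order in which the middleware touched the request."
  [req]
  (:seen req))

;; Explicit nesting: (mw-1 (mw-2 (mw-3 handler))); mw-1 is outermost.
(def app
  (tagging-mw (tagging-mw (tagging-mw echo-handler :mw-3) :mw-2) :mw-1))

;; Same chain with ->: the LAST form in the thread is the outermost wrapper.
(def app-threaded
  (-> echo-handler
      (tagging-mw :mw-3)
      (tagging-mw :mw-2)
      (tagging-mw :mw-1)))
```

Both `(app {})` and `(app-threaded {})` should return `[:mw-1 :mw-2 :mw-3]`: the outermost wrapper sees the request first.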
👍🏻 Cool! Thanks! I'm going to explore adding a middleware then …
If your mw sees the req body as an InputStream, the order is correct; if it sees Clojure data, it is running after Muuntaja
Thanks for the suggestion @joel.kaasinen @juhoteperi