This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-08-25
What’s the best way to parse a JSON string into ClojureScript data?
I could use js/JSON.parseString like this:
(js->clj (js/JSON.parseString json))
But what about performance?
That line of code is definitely the shortest possible solution: it doesn't require any imports or dependencies. Although it's js/JSON.parse, not js/JSON.parseString.
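A minimal sketch of that corrected two-step approach (the :keywordize-keys option is a common addition, not something from the message above):

```clojure
;; Native parse, then a generic conversion to CLJS data.
(defn parse-json [s]
  (js->clj (js/JSON.parse s) :keywordize-keys true))

(parse-json "{\"a\": [1, 2, 3]}")
;; => {:a [1 2 3]}
```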
If you absolutely must squeeze out every cycle, you can strip away from js->clj the unnecessary checks for things that JSON data cannot contain. Also, js/JSON.parse accepts a second argument, reviver, which transforms the returned values. Doubt it'll be faster to create CLJS data structures this way, but I'd definitely experiment with it.
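A hedged sketch of that reviver experiment; json->clj is a hypothetical helper name, and whether this actually beats js->clj on the finished result is exactly the open question:

```clojure
;; reviver is called bottom-up, and its return value replaces the original,
;; so by the time a parent object is visited its entries are already CLJS data.
(defn json->clj [s]
  (js/JSON.parse
   s
   (fn [_key v]
     (cond
       (array? v)  (vec v)
       (object? v) (into {}
                         (map (fn [k] [(keyword k) (unchecked-get v k)]))
                         (js-keys v))
       :else v))))
```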
What about using a JSON parser that skips the conversion from JSON to JS? Would it make sense to port Cheshire to ClojureScript? Would it be faster?
Why not?
Actually, as an alternative to turning JS into CLJS data structures, you might use the reviver argument to wrap each value in a cljs-bean. But again - no clue whether it'll be worth it. Might also depend on the particular data set or its usage.
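For illustration, a sketch using the cljs-bean library directly on the parse result (wiring it through reviver is left as the experiment the message describes); ->clj wraps rather than eagerly copies:

```clojure
(ns example.bean
  (:require [cljs-bean.core :refer [->clj]]))

;; ->clj wraps the JS object in a thin bean instead of copying it,
;; so values are converted lazily as they are accessed.
(def data (->clj (js/JSON.parse "{\"user\": {\"name\": \"Ada\"}}")))

(get-in data [:user :name])
;; => "Ada"
```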
> Why not?
Because there are few things in JS more optimized than JSON parsing. It's done at a very low level.
And JS object instantiation is also optimized better than probably any other part of JS, because objects are everywhere.
But still, that’s a 2-step process: 1. JSON -> JS, 2. JS -> CLJ. Why couldn’t a single-step JSON -> CLJ be faster?
What makes JSON -> CLJ slow?
you're competing against the browser's native impl (C++, Rust) with JS only. Intuitively I don't think you could reach that performance, but I might be wrong (maybe with WebAssembly? I have no idea how fast that is)
I understand that JSON->CLJ is slower than JSON->JS. But I don’t understand why JSON->CLJ is slower than JSON->JS + JS->CLJ.
I am probably missing something very basic
then again, I'm not knowledgeable on the subject, it's just intuition. But you only have access to a subset of the tools here; the browser has access to more
@U2FRKM4TW What do you say?
@U0L91U7A8 These are bogus timings, just for an explanation.
Suppose, JSON->JS for a particular data structure takes 1 second and JS->CLJS of the result takes 2 seconds.
In this case, JS->CLJS will easily take 10 seconds. Because going over a string, character by character, tokenizing it in the process, making checks, accumulating buffers - in CLJS it will all be crawling when compared to a native V8 implementation.
js/JSON.parse is not written in JS. It's a function built into the engine, similar to calling a native method of some Java class. And countless human-hours have been spent optimizing just that single function.
You meant JSON->CLJS will easily take 10 seconds. Right?
It starts to make sense now.
But I don’t get why JS->CLJS is slower than JSON->JS. There is no character parsing and tokenization involved in JS->CLJS!
That particular point is of no importance. It might be slower, it might not be - depends on the data.
I agree. The important fact is that dealing with strings at the JavaScript level is significantly slower than dealing with them at the browser-native level. Thank you for this important clarification @U2FRKM4TW!
Just tested 5 NPM packages for JSON parsing.
3 of them were comparable to JSON.parse - because they were using it. :)
The other 2 were 7 times slower. But they're also streaming parsers, so the comparison is hardly fair.
What do you mean by streaming parsers?
Parsers that stream their values as opposed to returning the whole data once it's ready.
Like a lazy seq?
Are streaming parsers expected to be faster or slower than regular parsers?
do you have a rough idea of how JSON.parse compares with a Java JSON parser?
Based on what you wrote earlier, the Java parser, not being native, is expected to be much slower. Right?
For your specific case, nothing would beat JSON.parse - apart from maybe also using its reviver argument. As I said, I haven't tested it.
> do you have a rough idea of how JSON.parse compares with a Java JSON parser?
No clue. But I'd bet on JSON.parse.
Do you have to convert to CLJS structures at all? If performance is key, just parse the JSON with the native parser and then use the JS structure directly with (oget parsed-json ...)
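A sketch of that no-conversion approach, assuming oget comes from the cljs-oops library:

```clojure
(ns example.direct
  (:require [oops.core :refer [oget]]))

;; No conversion at all: parse natively and read fields in place.
(def parsed (js/JSON.parse "{\"user\": {\"name\": \"Ada\"}}"))

(oget parsed "user" "name")
;; => "Ada"
```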
P.S. @U0L91U7A8 if you consider porting Cheshire, what you’re actually saying is that you’re porting https://github.com/FasterXML/jackson-core, because that’s what Cheshire uses to parse. Have a look at https://github.com/FasterXML/jackson-core/tree/2.14/src/main/java/com/fasterxml/jackson/core before committing yourself here! I know that a lot of it is Javadoc, but I’m still a bit shocked at how much code there is
@U2FRKM4TW explained quite eloquently to me why it’s a bad idea to port Cheshire.
Oh, I only saw the description around using the built-in parser (which would always be better). I didn’t see any comment about Cheshire in particular, which is why I added that
Nothing wrong with Cheshire. It's just that the native JS JSON parser is too strong
@U2FRKM4TW are you a language developer or something? I always see you answering deeply technical stuff with a fair bit of confidence. I’m in awe!
Just a note that when I benchmarked this a few years ago, (js->clj (js/JSON.parse json)) was slow - significantly slower than transit's read in CLJS (especially with :keywordize-keys)
This was counterintuitive to me, because JSON.parse is probably as optimized as it gets.
My conclusion was that js->clj is a bottleneck, but I didn't investigate further.
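For reference, transit's read in CLJS looks like this (via the cognitect.transit library; the payload is a hypothetical transit-encoded map):

```clojure
(ns example.transit
  (:require [cognitect.transit :as transit]))

;; A transit reader over the JSON encoding; read returns CLJS data directly,
;; with no intermediate js->clj pass.
(def reader (transit/reader :json))

(transit/read reader "[\"^ \",\"~:a\",1]")
;; => {:a 1}
```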
I've always found it puzzling that Javascript (the origin of JSON) doesn't provide a streaming JSON parser (like Java's SAX for XML). Having that would allow you to avoid creating an intermediate JS object if your goal is a CLJS data structure (especially JSON files >100 kB)
Of course if you can get away with just using (reading?) the JSON data structure directly, that's going to be the fastest
When I have a choice I store data as transit+json
> My conclusion was that js->clj is a bottleneck, but I didn't investigate further.
Same, and that's probably due to the multiple satisfies? and implements? checks in there that are completely unnecessary when dealing with JSON data.
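A sketch of what such a stripped-down conversion might look like; json-js->clj is a hypothetical helper that assumes pure JSON input (nil, booleans, numbers, strings, arrays, plain objects) and so skips the per-value protocol checks:

```clojure
;; Unlike js->clj, this never calls satisfies?/implements? on each value:
;; JSON input can only be an array, a plain object, or a primitive.
(defn json-js->clj [x]
  (cond
    (array? x)  (into [] (map json-js->clj) x)
    (object? x) (persistent!
                 (reduce (fn [m k]
                           (assoc! m (keyword k)
                                   (json-js->clj (unchecked-get x k))))
                         (transient {})
                         (js-keys x)))
    :else x))
```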
> Having that would allow you to avoid creating an intermediate JS object
If I understood the documentation correctly, that's precisely what the reviver argument of JSON.parse could be used for.
@U2FRKM4TW interesting, didn't know about this argument (what a strange name, reviver)
> due to multiple satisfies? and implements?
I think that's a pretty good guess
There might be room for a JSON parser that's 2x faster for large files than JSON.parse + js->clj