Yehonathan Sharvit 10:08:24

What’s the best way to parse a JSON string into ClojureScript data? I could use js/JSON.parseString like this:

(js->clj (js/JSON.parseString json))


"Best" in what way?


That line of code is definitely the shortest possible solution: it doesn't require any imports or any dependencies. Although it's js/JSON.parse, not js/JSON.parseString. If you absolutely must squeeze out every cycle, you can strip away the parts of js->clj that handle things JSON data cannot contain. Also, js/JSON.parse accepts a second argument, reviver, which lets you transform each parsed value before it is returned. I doubt it'll be faster to create CLJS data structures this way, but I'd definitely experiment with it.
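A minimal sketch of the reviver experiment, assuming keywordized keys are wanted. JSON.parse calls the reviver once per value, innermost values first, so by the time an object or array reaches it, its members have already been converted. Note that JSON.parse still builds the intermediate JS objects before the reviver sees them. The namespace and function names (`example.reviver`, `revive`, `parse-json->clj`) are hypothetical, and whether this beats `js->clj` is untested here.

```clojurescript
(ns example.reviver
  (:require [goog.object :as gobj]))

(defn- revive
  "Reviver for js/JSON.parse. Called once per value, innermost first,
  so nested objects and arrays arrive with members already converted."
  [_key value]
  (cond
    ;; arrays: elements are already revived, just copy into a vector
    (array? value) (vec value)
    ;; plain objects: build a keywordized persistent map
    (object? value) (persistent!
                     (reduce (fn [m k] (assoc! m (keyword k) (gobj/get value k)))
                             (transient {})
                             (js-keys value)))
    ;; strings, numbers, booleans and nil pass through unchanged
    :else value))

(defn parse-json->clj
  "Parses a JSON string into CLJS data using the reviver above."
  [s]
  (js/JSON.parse s revive))
```

Usage: `(parse-json->clj "{\"a\":[1,2]}")` should give `{:a [1 2]}`.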

Yehonathan Sharvit 10:08:18

What about using a JSON parser that skips the conversion from JSON to JS? Would it make sense to port Cheshire to ClojureScript? Would it be faster?


Actually, as an alternative to turning JS into CLJS data structures, you might use the reviver argument to wrap each value in a cljs-bean. But again, no clue whether it'll be worth it. It might also depend on the particular data set or how it's used.

> Why not?

Because there are few things in JS more optimized than JSON parsing. It's done at a very low level.


And JS object instantiation is also optimized better than probably any other part of JS, because objects are everywhere.

Yehonathan Sharvit 10:08:58

But still, that’s a 2-step process:
1. JSON -> JS
2. JS -> CLJ
Why couldn’t a single JSON -> CLJ step be faster?


[1] is fast, [2] is medium. A single-step JSON->CLJ written in CLJS would be incredibly slow.

Yehonathan Sharvit 10:08:07

What makes JSON->CLJ slow?


You're competing against the browser's native implementation (C++, Rust) with JS only. Intuitively I don't think you could reach that performance, but I might be wrong (maybe with WebAssembly? I have no idea how fast that is).

Yehonathan Sharvit 11:08:19

I understand that JSON->CLJ is slower than JSON->JS. But I don’t understand why JSON->CLJ would be slower than JSON->JS + JS->CLJ.

Yehonathan Sharvit 11:08:33

I am probably missing something very basic


You'd have to compare against a JSON->JS parser written in JS to see the upper limit.


Then again, I'm not knowledgeable on the subject; it's just intuition. But you only have access to a subset of the tools here, while the browser has access to more.


@U0L91U7A8 These are bogus timings, just for the sake of explanation. Suppose JSON->JS for a particular data structure takes 1 second and JS->CLJS of the result takes 2 seconds. In this case, JS->CLJS will easily take 10 seconds. Going over a string character by character, tokenizing it in the process, making checks, accumulating buffers: in CLJS all of that will be crawling compared to a native V8 implementation. js/JSON.parse is not written in JS. It's a function built into the engine, similar to calling a native method of some Java class. And countless human-hours have been spent optimizing just that single function.

Yehonathan Sharvit 11:08:20

You meant JSON->CLJS will easily take 10 seconds, right?


Yes, sorry.

Yehonathan Sharvit 11:08:15

It starts to make sense now.

Yehonathan Sharvit 11:08:35

But I don’t get why JS->CLJS is slower than JSON->JS. There is no character parsing or tokenization involved in JS->CLJS!


Now, this part could probably be improved. Have you run benchmarks?
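A rough benchmarking sketch for the two steps discussed here: it times step 1 (JSON->JS) alone and steps 1+2 (JSON->JS->CLJS) on the same input. The names `time-ms` and `compare-steps` are hypothetical. Treat the numbers with care: JIT warm-up, the shape of the data, and the engine all change the result, so run it on your real payloads and repeat it a few times. cljs.core's `simple-benchmark` macro is a ready-made alternative that prints its timings instead of returning them.

```clojurescript
(ns example.bench)

(defn- time-ms
  "Calls f n times and returns the elapsed wall-clock milliseconds."
  [f n]
  (let [start (system-time)]
    (dotimes [_ n] (f))
    (- (system-time) start)))

(defn compare-steps
  "Times JSON->JS alone and JSON->JS->CLJS on the same JSON string."
  [json n]
  {:json->js      (time-ms #(js/JSON.parse json) n)
   :json->js->clj (time-ms #(js->clj (js/JSON.parse json) :keywordize-keys true) n)})
```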


That particular point is of no importance. It might be slower, it might not be - depends on the data.

Yehonathan Sharvit 11:08:09

I agree. The important fact is that dealing with strings at the JavaScript level is significantly slower than dealing with them at the browser's native level. Thank you for this important clarification @U2FRKM4TW!


I just tested 5 npm packages for JSON parsing. 3 of them were comparable to JSON.parse, because they were using it. :) The other 2 were 7 times slower, but they're also streaming parsers, so the comparison is hardly fair.

Yehonathan Sharvit 11:08:14

What do you mean by streaming parsers?


Parsers that emit values as they are parsed, as opposed to returning the whole data structure once it's ready.


In this case, the parsers call a function when a value is ready.
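To make the callback style concrete, here is a simplified sketch: text arrives in chunks and `on-value` is called as soon as each complete value is available. It assumes newline-delimited JSON (one value per line), so JSON.parse still does the real parsing of each record. A true streaming parser tokenizes incrementally and can emit values from inside a single large document; this sketch does not. The names `example.ndjson` and `ndjson-parser` are hypothetical.

```clojurescript
(ns example.ndjson
  (:require [clojure.string :as str]))

(defn ndjson-parser
  "Returns a function that accepts string chunks. on-value is called once
  per complete line. Call the returned function with nil to flush."
  [on-value]
  (let [buffer (atom "")]
    (fn [chunk]
      (if (nil? chunk)
        ;; end of input: parse whatever is left in the buffer
        (let [leftover @buffer]
          (reset! buffer "")
          (when-not (str/blank? leftover)
            (on-value (js->clj (js/JSON.parse leftover)))))
        ;; limit -1 keeps a trailing empty string after a final newline
        (let [lines (str/split (str @buffer chunk) #"\n" -1)]
          ;; the last piece may be an incomplete line, so keep it buffered
          (reset! buffer (peek lines))
          (doseq [line (pop lines)
                  :when (not (str/blank? line))]
            (on-value (js->clj (js/JSON.parse line)))))))))
```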

Yehonathan Sharvit 11:08:07

Are streaming parsers expected to be faster or slower than regular parsers?


The answer is, as usual, "it depends". :)

Yehonathan Sharvit 11:08:58

Do you have a rough idea how JSON.parse compares with a Java JSON parser?

Yehonathan Sharvit 11:08:26

Based on what you wrote earlier, the Java parser, being non-native, is expected to be much slower, right?


For your specific case, nothing would beat JSON.parse, apart from maybe also using its reviver argument. As I said, I haven't tested it.

> Do you have a rough idea how JSON.parse compares with a Java JSON parser?

No clue. But I'd bet on JSON.parse.


Do you have to convert to CLJ structures? If performance is key, just parse the JSON with the native parser and then use the JS structure directly with (oget parsed-json ...)
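`oget` comes from the cljs-oops library. If you'd rather not add a dependency, `goog.object` from the Closure Library, which ships with every ClojureScript build, does the same kind of string-keyed lookup on the raw JS result. String keys also survive `:advanced` compilation, whereas `(.-name obj)` on JSON data can break when the Closure Compiler renames properties. The namespace and the `user-name`/`tag-count` helpers are hypothetical.

```clojurescript
(ns example.direct-access
  (:require [goog.object :as gobj]))

(defn user-name
  "Reads parsed.user.name without converting anything to CLJS.
  Returns nil (JS undefined) if any key along the path is missing."
  [parsed]
  (gobj/getValueByKeys parsed "user" "name"))

(defn tag-count
  "Counts the elements of parsed.tags, or 0 if it is not an array."
  [parsed]
  (let [tags (gobj/get parsed "tags")]
    (if (array? tags) (alength tags) 0)))
```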


If you must have CLJS data, cljs-bean.core/->clj is one of the quickest options, I believe.
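A small sketch of both cljs-bean styles, assuming cljs-bean is already a project dependency. `->clj` eagerly converts the parsed JS value into persistent CLJS data, while `bean` wraps the object and reads its properties on access instead of copying them up front, which may pay off when only a few keys are used. Both keywordize keys by default. The helper names are hypothetical, and relative speed depends on your data.

```clojurescript
(ns example.bean
  (:require [cljs-bean.core :refer [bean ->clj]]))

(defn parse-eager
  "JSON string -> fully converted CLJS data."
  [s]
  (->clj (js/JSON.parse s)))

(defn parse-lazy
  "JSON string -> map-like wrapper over the raw JS object."
  [s]
  (bean (js/JSON.parse s)))
```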


P.S. @U0L91U7A8 if you consider porting Cheshire, what you’re actually saying is that you’re porting the Jackson JSON parser, because that’s what Cheshire uses to parse with. Have a look at the Jackson source before committing yourself here! I know that a lot of it is Javadoc, but I’m still a bit shocked at how much code there is.

Yehonathan Sharvit 16:08:54

@U2FRKM4TW explained quite eloquently to me why it’s a bad idea to port Cheshire.


Oh, I only saw the discussion about using the built-in parser (which would always be better). I didn’t see any comment about Cheshire in particular, which is why I added that.

Yehonathan Sharvit 17:08:25

Nothing wrong with Cheshire. It's just that the native JS JSON parser is too strong.


@U2FRKM4TW are you a language developer or something? I always see you answering deeply technical stuff with a fair bit of confidence. I’m in awe!


@U0AQ3HP9U Am a jack of all trades. :)


Just a note that when I benchmarked this a few years ago, (js->clj (js/JSON.parse json)) was slow, significantly slower than transit's read in CLJS (especially with :keywordize-keys?). This was counterintuitive to me, because JSON.parse is probably as optimized as it gets. My conclusion was that js->clj is a bottleneck, but I didn't investigate further.


I've always found it puzzling that JavaScript (the origin of JSON) doesn't provide a streaming JSON parser (like Java's SAX API for XML). Having that would allow you to avoid creating an intermediate JS object if your goal is a CLJS data structure (especially for JSON files over 100 kB).


Of course, if you can get away with just using (reading?) the JSON data structure directly, that's going to be the fastest.


When I have a choice I store data as transit+json
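For reference, a minimal transit+json round trip, assuming transit-cljs is a dependency. The output is still JSON text, but transit encodes keywords, sets and other CLJS values, and the transit-cljs reader builds CLJS collections itself, so no separate `js->clj` pass is needed. The `write-transit`/`read-transit` names are hypothetical.

```clojurescript
(ns example.transit
  (:require [cognitect.transit :as t]))

;; Writers and readers are reusable, so create them once.
(def transit-writer (t/writer :json))
(def transit-reader (t/reader :json))

(defn write-transit
  "CLJS data -> transit+json string."
  [data]
  (t/write transit-writer data))

(defn read-transit
  "transit+json string -> CLJS data."
  [s]
  (t/read transit-reader s))
```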


> My conclusion was that js->clj is a bottleneck, but didn't investigate further.

Same, and that's probably due to the multiple satisfies? and implements? checks in there that are completely unnecessary when dealing with JSON data.

> Having that would allow you to avoid creating an intermediate JS object

If I understood the documentation correctly, that's precisely what the reviver argument of JSON.parse could be used for.
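A sketch of the stripped-down `js->clj` idea: a converter restricted to what JSON.parse can return (plain objects, arrays, strings, numbers, booleans and null). Because JSON data can never contain CLJS values, it skips the protocol and collection checks the general `js->clj` performs and just walks arrays and objects. The name `json->clj` is hypothetical, keys are keywordized by assumption, and whether it actually beats `js->clj` on your data is something to measure.

```clojurescript
(ns example.json-walk
  (:require [goog.object :as gobj]))

(defn json->clj
  "Converts the output of js/JSON.parse into CLJS data, keywordizing keys."
  [x]
  (cond
    ;; arrays -> vectors, converting each element recursively
    (array? x)
    (let [n (alength x)]
      (loop [i 0 acc (transient [])]
        (if (< i n)
          (recur (inc i) (conj! acc (json->clj (aget x i))))
          (persistent! acc))))

    ;; plain objects -> maps with keyword keys
    (object? x)
    (persistent!
     (reduce (fn [m k] (assoc! m (keyword k) (json->clj (gobj/get x k))))
             (transient {})
             (js-keys x)))

    ;; strings, numbers, booleans and nil need no conversion
    :else x))
```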


@U2FRKM4TW Interesting, I didn't know about this argument (what a strange name, reviver).


Well, I've mentioned it before in this thread. ;)


> due to multiple satisfies? and implements?

I think that's a pretty good guess. There might be room for a JSON parser that's 2x faster than JSON.parse + js->clj for large files.