#clojurescript
2022-08-25
Yehonathan Sharvit10:08:24

What’s the best way to parse a JSON string into ClojureScript data? I could use js/JSON.parseString like this:

(js->clj (js/JSON.parseString json))

p-himik10:08:44

"Best" in what way?

p-himik10:08:06

That line of code is definitely the shortest possible solution, doesn't require any imports, doesn't require any dependencies. Although it's js/JSON.parse. If you absolutely must squeeze out every cycle, you can strip away unnecessary things from js->clj, things that JSON data cannot have. Also, js/JSON.parse accepts a second argument, reviver, which transforms the returned value. Doubt it'll be faster to create CLJS data structures this way, but I'd definitely experiment with it.
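
A minimal sketch of what that reviver experiment could look like (the parse-json->cljs name is just for illustration; not benchmarked, and error handling is omitted):

(ns example.parse
  (:require [goog.object :as gobj]))

(defn parse-json->cljs [s]
  (js/JSON.parse
    s
    (fn [_k v]
      (cond
        ;; child values have already been through the reviver, so they are
        ;; CLJS data by the time the containing array/object gets here
        (array? v)  (vec v)
        (object? v) (persistent!
                      (reduce (fn [m k] (assoc! m (keyword k) (gobj/get v k)))
                              (transient {})
                              (js-keys v)))
        :else v))))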

Yehonathan Sharvit10:08:18

What about using a JSON parser that skips the conversion from JSON to JS? Would it make sense to port Cheshire to ClojureScript? Would it be faster?

p-himik10:08:55

Actually, as an alternative to turning JS into CLJS data structures, you might use the reviver argument to wrap each value in a cljs-bean. But again - no clue whether it'll be worth it. Might also depend on a particular data set or its usage.
> Why not?
Because there are few things in JS more optimized than JSON parsing. It's done at a very low level.
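
A rough sketch of that reviver + cljs-bean idea (assumes the cljs-bean dependency; purely illustrative and not benchmarked):

(ns example.reviver-bean
  (:require [cljs-bean.core :refer [bean]]))

(defn parse-with-beans [s]
  (js/JSON.parse
    s
    (fn [_k v]
      (if (object? v)
        (bean v)  ;; lazy, map-like wrapper around the JS object, no eager copying
        v))))     ;; arrays and primitives are left as-is in this sketch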

p-himik10:08:29

And JS object instantiation is also optimized better than probably any other part of JS, because objects are everywhere.

Yehonathan Sharvit10:08:58

But still, that’s a 2-step process: 1. JSON -> JS, 2. JS -> CLJ. Why couldn’t a single JSON->CLJ step be faster?

p-himik10:08:24

[1] is fast, [2] is medium. A single-step JSON->CLJ parser would be incredibly slow.

Yehonathan Sharvit10:08:07

What makes JSON->CLJ so slow?

rolt10:08:49

You're competing against the browser's native implementation (C++, Rust) with JS only. Intuitively I don't think you could reach that performance, but I might be wrong (maybe with WebAssembly? I have no idea how fast that is).

Yehonathan Sharvit11:08:19

I understand that JSON->CLJ is slower than JSON->JS. But I don’t understand why JSON->CLJ is slower than JSON->JS + JS->CLJ.

Yehonathan Sharvit11:08:33

I am probably missing something very basic

rolt11:08:59

You'd have to compare it to a JSON->JS parser written in JS to see the upper limit.

rolt11:08:16

Then again, I'm not knowledgeable on the subject, it's just intuition. But you only have access to a subset of the tools here; the browser has access to more.

p-himik11:08:57

@U0L91U7A8 These are bogus timings, just for an explanation. Suppose JSON->JS for a particular data structure takes 1 second and JS->CLJS of the result takes 2 seconds. In this case, JS->CLJS will easily take 10 seconds. Because going over a string, character by character, tokenizing it in the process, making checks, accumulating buffers - in CLJS it will all be crawling when compared to a native V8 implementation. js/JSON.parse is not written in JS. It's a function built into the engine. It's similar to calling a native method of some Java class. And countless human-hours have been spent optimizing just that single function.

Yehonathan Sharvit11:08:20

You meant JSON->CLJS will easily take 10 seconds. Right?

p-himik11:08:39

Yes, sorry.

Yehonathan Sharvit11:08:15

It starts to make sense now.

Yehonathan Sharvit11:08:35

But I don’t get why JS->CLJS is slower than JSON->JS? There is no character parsing and tokenization involved in JS->CLJS!

rolt11:08:12

Now this part could probably be improved. Have you run benchmarks?

p-himik11:08:24

That particular point is of no importance. It might be slower, it might not be - depends on the data.

Yehonathan Sharvit11:08:09

I agree. The important fact is that dealing with strings at the JavaScript level is significantly slower than dealing with them at the browser's native level. Thank you for this important clarification @U2FRKM4TW!

👍 1
p-himik11:08:49

Just tested 5 NPM packages for JSON parsing. 3 of them were comparable to JSON.parse - because they were using it. :) The other 2 were 7 times slower. But they're also streaming parsers - the comparison is hardly fair.

Yehonathan Sharvit11:08:14

What do you mean by streaming parsers?

p-himik11:08:16

Parsers that stream their values as opposed to returning the whole data once it's ready.

p-himik11:08:30

In this case, the parsers call a function when a value is ready.

Yehonathan Sharvit11:08:07

Are streaming parsers expected to be faster or slower than regular parsers?

p-himik11:08:06

The answer is, as usual, "it depends". :)

Yehonathan Sharvit11:08:58

Do you have a rough idea of how JSON.parse compares with a Java JSON parser?

Yehonathan Sharvit11:08:26

Based on what you wrote earlier, the Java parser, being non-native, is expected to be much slower. Right?

p-himik11:08:40

For your specific case, nothing would beat JSON.parse - apart from maybe also using its reviver argument. As I said, I haven't tested it.
> Do you have a rough idea of how JSON.parse compares with a Java JSON parser?
No clue. But I'd bet on JSON.parse.

sirwobin11:08:51

Do you have to convert to clj structures? If performance is key, just parse the JSON with the native parser then use the JS structure with (oget parsed-json ...)
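
A small sketch of that approach (assuming oget comes from the binaryage/oops library; the sample data is made up):

(ns example.direct
  (:require [oops.core :refer [oget]]))

(def parsed (js/JSON.parse "{\"user\": {\"name\": \"Ada\"}}"))

;; Read straight from the JS structure - no CLJS conversion pass at all.
(oget parsed "user" "name")  ;; => "Ada"

goog.object/getValueByKeys would work as well if you'd rather avoid the extra dependency.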

sirwobin11:08:12

If you must have CLJS data, then cljs-bean.core/->clj is one of the quickest options, I believe.
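
A minimal sketch of that (assumes the cljs-bean dependency; sample data is made up):

(ns example.bean
  (:require [cljs-bean.core :refer [->clj]]))

(def data (->clj (js/JSON.parse "{\"user\": {\"name\": \"Ada\"}}")))

(get-in data [:user :name])  ;; => "Ada"; conversion happens lazily, on access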

quoll16:08:46

P.S. @U0L91U7A8 if you consider porting Cheshire, what you’re actually saying is that you’re porting https://github.com/FasterXML/jackson-core, because that’s what Cheshire uses to parse with. Have a look at https://github.com/FasterXML/jackson-core/tree/2.14/src/main/java/com/fasterxml/jackson/core before committing yourself here! I know that a lot of it is Javadoc, but I’m still a bit shocked at how much code there is.

Yehonathan Sharvit16:08:54

@U2FRKM4TW explained quite eloquently to me why it’s a bad idea to port Cheshire.

quoll16:08:52

Oh, I only saw the description around using the built-in parser (which would always be better). I didn’t see any comment about Cheshire in particular, which is why I added that

Yehonathan Sharvit17:08:25

Nothing wrong with Cheshire. It's just that the native JS JSON parser is too strong

👍 1
reefersleep07:08:00

@U2FRKM4TW are you a language developer or something? I always see you answering deeply technical stuff with a fair bit of confidence. I’m in awe!

p-himik09:08:23

@U0AQ3HP9U Am a jack of all trades. :)

🛠️ 3
pesterhazy11:08:38

Just a note that when I benchmarked this a few years ago, (js->clj (js/JSON.parse json)) was slow - significantly slower than transit's read in CLJS (especially with :keywordize-keys). This was counterintuitive to me, because JSON.parse is probably as optimized as it gets. My conclusion was that js->clj is a bottleneck, but didn't investigate further.
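
A rough sketch of how such a comparison could be run (assumes the com.cognitect/transit-cljs dependency; the data shape is made up and the numbers will vary with it):

(ns example.bench
  (:require [cognitect.transit :as t]))

(def data
  {:items (vec (for [i (range 1000)]
                 {:id i :name (str "item-" i)}))})

(def json-str    (js/JSON.stringify (clj->js data)))
(def transit-str (t/write (t/writer :json) data))
(def transit-rdr (t/reader :json))

;; simple-benchmark prints the elapsed time for each expression
(simple-benchmark [] (js->clj (js/JSON.parse json-str) :keywordize-keys true) 100)
(simple-benchmark [] (t/read transit-rdr transit-str) 100)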

pesterhazy11:08:18

I've always found it puzzling that Javascript (the origin of JSON) doesn't provide a streaming JSON parser (like Java's SAX for XML). Having that would allow you to avoid creating an intermediate JS object if your goal is a CLJS data structure (especially JSON files >100 kB)

pesterhazy11:08:26

Of course if you can get away with just using (reading?) the JSON data structure directly, that's going to be the fastest

pesterhazy11:08:43

When I have a choice I store data as transit+json

p-himik11:08:09

> My conclusion was that js->clj is a bottleneck, but didn't investigate further.
Same, and that's probably due to the multiple satisfies? and implements? calls there that are completely unnecessary when dealing with JSON data.
> Having that would allow you to avoid creating an intermediate JS object
If I understood the documentation correctly, that's precisely what the reviver argument of JSON.parse could be used for.

pesterhazy11:08:11

@U2FRKM4TW interesting, didn't know about this argument (what a strange name, reviver)

p-himik11:08:28

Well, I've mentioned it before in this thread. ;)

pesterhazy11:08:31

> due to multiple satisfies? and implements?
I think that's a pretty good guess. There might be room for a JSON parser that's 2x faster for large files than JSON.parse + js->clj.