#announcements
2021-03-19
slipset07:03:55

I’m very honoured to announce the release of https://github.com/clojure/data.json/releases/tag/data.json-2.0.0 . Thanks to @alexmiller for inspiration and guidance during this work! This release introduces significant speed improvements in both reading and writing json, while still being a pure clojure lib with no external dependencies. Using the benchmark data from jsonista we see the following improvement:

Reading:
• 10b from 1.4 µs to 609 ns (cheshire 995 ns)
• 100b from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
• 1k from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
• 10k from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
• 100k from 2.8 ms to 1.5 ms (cheshire 918.2 µs)

Writing:
• 10b from 2.3 µs to 590 ns (cheshire 1.0 µs)
• 100b from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
• 1k from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
• 10k from 508 µs to 161 µs (cheshire 105.3 µs)
• 100k from 4.4 ms to 1.5 ms (cheshire 1.17 ms)

👏 245
😍 35
borkdude08:03:47

@U04V5VAUN are you sure this is correct? > 10k from 508 µs to 161 µs (cheshire 105.3 ms)

borkdude08:03:05

161 µs vs 105.3 /ms/ ?

ikitommi08:03:26

Great work! Will rerun the benchmarks on jsonista repo with the new version

👏 27
🙏 4
borkdude08:03:11

FWIW there are also some new patches in cheshire master (with new Jackson) so it would be good to run against cheshire master and not the currently released version

slipset08:03:14

jsonista is still faster though 🙂

borkdude08:03:24

e.g. writing became 15% faster

ikitommi08:03:01

running with the latest now.

nilern08:03:51

Most people will be using the latest jar though

jmayaalv09:03:59

this is great! 😍

rickmoynihan09:03:26

@U04V5VAUN fantastic! Care to elaborate on the tricks you used to speed it up?

slipset10:03:16

1. Remove the dynamic vars and pass them explicitly as an options map.
2. For reading, split reading strings into two paths: the quick one (without any escapes) you do by passing an array slice to (String.), the slow one (with escapes and unicode and stuff) you still do with a StringBuilder.
3. For writing, don’t use format to construct unicode escapes.
The main trick though was to use the stuff in http://clojure-goes-fast.com i.e. profile, observe the results, form a hypothesis, create a fix 🙂
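
A minimal, hypothetical sketch of point 2 (a made-up function, not the actual data.json source): scan for the closing quote; if no backslash is seen, build the result straight from a slice of the buffer, otherwise fall back to a StringBuilder that interprets the escapes.

(defn read-json-string
  "Reads a JSON string from the char array buf, starting just after the
  opening quote. Fast path: no escapes, so a buffer slice goes straight to
  the String constructor. Slow path: a StringBuilder interprets escapes."
  ^String [^chars buf ^long start]
  (loop [i start]
    (case (aget buf i)
      \" (String. buf start (- i start))            ; fast path: plain slice
      \\ (let [sb (StringBuilder.)]                 ; slow path: escapes present
           (.append sb buf start (- i start))       ; keep what was scanned so far
           (loop [j i]
             (let [c (aget buf j)]
               (cond
                 (= c \") (.toString sb)
                 (= c \\) (do (.append sb (let [e (aget buf (inc j))]
                                            (case e
                                              \n \newline, \t \tab, \r \return
                                              e)))  ; \uXXXX handling omitted here
                              (recur (+ j 2)))
                 :else    (do (.append sb c) (recur (inc j)))))))
      (recur (inc i)))))

;; (read-json-string (.toCharArray "hello\" ...") 0)   ;=> "hello"
;; (read-json-string (.toCharArray "a\\tb\" ...") 0)   ;=> "a\tb"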

❤️ 6
rickmoynihan10:03:07

Yeah I’d taken a quick peek, and was mainly interested in hearing about 2 and 3. I entirely agree with the other advice too though! :thumbsup: :thumbsup: I’ve noticed in the past that unicode processing is often the slow bit in parsing large amounts of data. Also the performance difference between InputStream and Reader is staggering… mainly, I believe, because Reader does that unicode stuff and expands all characters into 16 bits. So I was curious how you were alleviating that. I’ve never tried parsing json, so know next to nothing about it; but I was trying to understand how you knew whether you needed to use unicode or not. I’m guessing you only need to handle unicode for strings inside the json, not the structure itself. Is that correct?!

roklenarcic10:03:01

I was looking at the commit that replaced dynamic vars with an options map. Couldn’t you have saved even more time if internal functions like read-object received key-fn and value-fn as arguments instead of the whole options map, avoiding the map lookups?
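
Purely to illustrate the question (hypothetical helpers, not data.json internals), the two shapes being compared look roughly like this:

(defn- read-object-via-map [pairs opts]
  (let [key-fn   (:key-fn opts)                   ; map lookups on every call
        value-fn (:value-fn opts)]
    (into {} (map (fn [[k v]] [(key-fn k) (value-fn k v)])) pairs)))

(defn- read-object-via-args [pairs key-fn value-fn]
  ;; identical work; the functions arrive as plain positional arguments
  (into {} (map (fn [[k v]] [(key-fn k) (value-fn k v)])) pairs))

;; (read-object-via-map [["a" 1]] {:key-fn keyword :value-fn (fn [_ v] v)}) ;=> {:a 1}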

nilern10:03:42

data.json takes a Reader, I think @U04V5VAUN just meant Unicode escapes inside strings

borkdude10:03:16

avoiding apply and merge could possibly also help

☝️ 4
roklenarcic10:03:19

Another observation: couldn’t you capture the values of the dynamic vars in a map at the start of the public functions like write-str? Then you don’t get hit with the dynamic-var cost, because you don’t access them repeatedly.
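
A minimal sketch of that suggestion with made-up names (not the actual data.json code): read the dynamic var once at the public entry point, then hand a plain value to the internal code so the per-entry work never touches a Var.

(def ^:dynamic *key-fn* name)

(defn- render-entry [key-fn [k v]]
  ;; key-fn arrives here as an ordinary argument, no Var deref per entry
  (str \" (key-fn k) "\":" v))

(defn write-str [m]
  (let [key-fn *key-fn*]                          ; single dynamic-var read
    (str "{"
         (apply str (interpose "," (map #(render-entry key-fn %) m)))
         "}")))

;; (write-str {:a 1 :b 2})                                       ;=> "{\"a\":1,\"b\":2}"
;; (binding [*key-fn* #(str "x_" (name %))] (write-str {:a 1}))  ;=> "{\"x_a\":1}"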

rickmoynihan10:03:03

@U4MB6UKDL: Yes I know. I was alluding to that too. I mention InputStream/Reader as something observed in my own work, and in support of the general point that handling unicode is slow.

ikitommi10:03:05

the old:

jsonista.jmh/encode-data-json  :encode  :throughput  5         406998.934   ops/s  152242.102    {:size "10b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         146750.626   ops/s  13532.113     {:size "100b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         28543.913    ops/s  5982.429      {:size "1k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         1994.604     ops/s  193.798       {:size "10k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         229.534      ops/s  3.574         {:size "100k"}

ikitommi10:03:14

the new:

jsonista.jmh/encode-data-json  :encode  :throughput  5         1534830.890  ops/s  155359.246    {:size "10b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         341613.782   ops/s  26261.051     {:size "100b"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         69673.326    ops/s  1647.625      {:size "1k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         5658.247     ops/s  999.701       {:size "10k"}
jsonista.jmh/encode-data-json  :encode  :throughput  5         581.924      ops/s  39.758        {:size "100k"}

ikitommi10:03:36

=> 2.5x throughput improvement 🚀

ikitommi10:03:50

jsonista:

jsonista.jmh/encode-jsonista   :encode  :throughput  5         6718559.441  ops/s  564494.417    {:size "10b"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         2021530.135  ops/s  227934.280    {:size "100b"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         358639.582   ops/s  33561.700     {:size "1k"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         32536.978    ops/s  8135.004      {:size "10k"}
jsonista.jmh/encode-jsonista   :encode  :throughput  5         2687.242     ops/s  185.516       {:size "100k"}

ikitommi10:03:15

still much faster, but it’s 99% java.

nilern10:03:37

Jackson (and simdjson) can do their own UTF-8 decoding while parsing from a byte stream. All the structural JSON characters are ASCII so yes Unicode is only really relevant inside strings.
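
A tiny hypothetical illustration of that point: every structural character is a single ASCII byte, and every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte-level scanner can locate structure without decoding anything; only string contents need real UTF-8 decoding.

(def ^:private structural-bytes
  ;; { } [ ] : , " as ASCII byte values
  #{123 125 91 93 58 44 34})

(defn token-positions
  "Indices of structural bytes in UTF-8 encoded JSON. Bytes inside multi-byte
  UTF-8 sequences are all >= 0x80 and so can never collide with these."
  [^bytes bs]
  (into []
        (keep-indexed (fn [i b]
                        (when (structural-bytes (bit-and b 0xff)) i)))
        bs))

;; (token-positions (.getBytes "{\"α\":1}" "UTF-8")) ;=> [0 1 4 5 7]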

👍 4
slipset10:03:53

@U4MB6UKDL my initial patch had value-fn and key-fn passed as separate params, but that doesn’t really scale well (if you imagine passing more opts in the future). Also, the penalty from apply and array-map only shows on the smaller payloads, so it was probably worth the tradeoff.

nilern10:03:58

(I think you meant @U66G3SGP5)

slipset10:03:09

I most certainly did. Sorry.

rickmoynihan10:03:36

has some slack weirdness happened in this thread?! Some comments appear to have disappeared and replies now appear out of context e.g. my comment above was in response to something @U4MB6UKDL said which has also vanished.

nilern10:03:30

Nothing has vanished AFAICT. Try refreshing your browser?

rickmoynihan10:03:43

:thumbsup: I’d done that, but doing it a second time seems to have fixed it.

ikitommi12:03:17

new jmh-benchmarks on jsonista repo: https://github.com/metosin/jsonista#performance

👍 15
Ben Sless13:03:47

very cool! Looks like some more % can be shaved off by using identical? and == instead of = where possible. Especially using identical?, as the documentation says "If value-fn returns itself" - can you assume it's the same object?
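
To make that concrete, a hypothetical sketch of the sentinel check (not the actual data.json source), using identical? so the "value-fn returned itself" test is a reference comparison rather than generic equality:

(defn- transform-entry [key-fn value-fn k v]
  (let [k' (key-fn k)
        v' (value-fn k' v)]
    (when-not (identical? v' value-fn)    ; value-fn returned itself => drop entry
      [k' v'])))

;; Example: a value-fn that drops nil values by returning itself.
(defn- drop-nils [_ v] (if (nil? v) drop-nils v))

;; (keep #(apply transform-entry keyword drop-nils %) {"a" 1 "b" nil}) ;=> ([:a 1])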

Alex Miller (Clojure team)14:03:39

an excellent way to work on problems is to write them down in a trackable place like https://ask.clojure.org or jira (if you have access)

👍 4
Alex Miller (Clojure team)14:03:25

Changelog for 2.0.0 fyi:
• Perf https://clojure.atlassian.net/browse/DJSON-35: Replace PrintWriter with more generic Appendable, reduce wrapping
• Perf https://clojure.atlassian.net/browse/DJSON-34: More efficient writing for common path
• Perf https://clojure.atlassian.net/browse/DJSON-32: Use option map instead of dynamic variables (affects read+write)
• Perf https://clojure.atlassian.net/browse/DJSON-33: Improve speed of reading JSON strings
• Fix https://clojure.atlassian.net/browse/DJSON-30: Fix bad test

Ben Sless14:03:27

you're right. I still need to open a jira on update-in's performance, too

slipset12:03:14

Unfortunately, there is a bug in the above release wrt strings longer than 64 chars, so do not use version 2.0.0; rather, wait for 2.0.1 🥵

❤️ 69
otfrom12:03:25

ouch... solidarity and hugops

borkdude12:03:01

The pure Clojure JSON ecosystem now rests on your shoulders slipset... take care!

🙏 10
littleli12:03:57

stay strong! It's a great effort. I personally would use a pure library over Java interop whenever possible.

☝️ 4
borkdude12:03:13

Btw, I wondered if there is some JSON standard compliance test suite that these kinds of libs should be run against

👀 4
borkdude12:03:21

independent of their implementation

borkdude12:03:57

excellent, I will post an issue at the cheshire side as well about this

Noah Bogart13:03:33

as linked in that repo, i would highly recommend reading this blog post about json parsing ambiguities: http://seriot.ch/parsing_json.php

😲 7
Alex Miller (Clojure team)14:03:12

if anyone is interested in working on things like this, please join the club! would be happy to have help on this

Alex Miller (Clojure team)14:03:18

data.json 2.0.1 is now available
• Fix https://clojure.atlassian.net/browse/DJSON-37: Fix off-by-one error reading long strings, regression in 2.0.0

❤️ 57
👍 12
borkdude14:03:59

@U04V5VAUN Congrats on the fix 😅 Does this affect the benchmarks?

chrisn18:03:28

I tested dtype-next's ffi generation with graal native and avclj. After a bit of work (a couple days) I can now generate a graal native executable that encodes video 🙂. https://github.com/cnuernber/avclj

🤯 54
👍 27