2021-03-19
I’m very honoured to announce the release of https://github.com/clojure/data.json/releases/tag/data.json-2.0.0 . Thanks to @alexmiller for inspiration and guidance during this work! This release introduces significant speed improvements in both reading and writing JSON, while still being a pure Clojure lib with no external dependencies. Using the benchmark data from jsonista we see the following improvements:
Reading:
• 10b: from 1.4 µs to 609 ns (cheshire 995 ns)
• 100b: from 4.6 µs to 2.4 µs (cheshire 1.9 µs)
• 1k: from 26.2 µs to 13.3 µs (cheshire 10.2 µs)
• 10k: from 292.6 µs to 157.3 µs (cheshire 93.1 µs)
• 100k: from 2.8 ms to 1.5 ms (cheshire 918.2 µs)
Writing:
• 10b: from 2.3 µs to 590 ns (cheshire 1.0 µs)
• 100b: from 7.3 µs to 2.7 µs (cheshire 2.5 µs)
• 1k: from 41.3 µs to 14.3 µs (cheshire 9.4 µs)
• 10k: from 508 µs to 161 µs (cheshire 105.3 µs)
• 100k: from 4.4 ms to 1.5 ms (cheshire 1.17 ms)
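For anyone who hasn’t used it, a minimal sketch of the public API (read-str/write-str and their keyword options are long-standing, not new in 2.0.0):

(require '[clojure.data.json :as json])

;; Clojure data -> JSON string
(json/write-str {:name "data.json" :version "2.0.0"})
;; => "{\"name\":\"data.json\",\"version\":\"2.0.0\"}"

;; JSON string -> Clojure data, keywordizing keys
(json/read-str "{\"a\": 1, \"b\": [2, 3]}" :key-fn keyword)
;; => {:a 1, :b [2 3]}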
@U04V5VAUN are you sure this is correct? > 10k from 508 µs to 161 µs (cheshire 105.3 ms)
FWIW there are also some new patches in cheshire master (with new Jackson) so it would be good to run against cheshire master and not the currently released version
@U04V5VAUN fantastic! Care to elaborate on the tricks you used to speed it up?
1. remove the dynamic vars and pass them explicitly as an options map
2. for reading, split reading strings into two paths: the quick one (without any escapes) you do by passing an array slice to (String.), the slow one (with escapes and unicode and stuff) you still do with StringBuilder
3. for writing, don’t use format to construct unicode escapes (see the sketches below)
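Rough sketches of tricks 2 and 3 (hypothetical helper names and a deliberately tiny escape set, not the actual data.json internals):

(defn read-string-fast
  "Trick 2, fast path: no escapes in [start, end), so the String is
  built straight from the char-array slice, no StringBuilder."
  ^String [^chars buf start end]
  (String. buf (int start) (int (- end start))))

(defn read-string-slow
  "Trick 2, slow path: walk char by char, expanding escapes into a
  StringBuilder (only \\n and \\t shown here)."
  ^String [^chars buf start end]
  (let [sb (StringBuilder.)]
    (loop [i (long start)]
      (if (< i end)
        (let [c (aget buf i)]
          (if (= c \\)
            (let [nxt (aget buf (inc i))]
              (.append sb (case nxt \n \newline \t \tab nxt))
              (recur (+ i 2)))
            (do (.append sb c)
                (recur (inc i)))))
        (.toString sb)))))

(def ^:private ^String hex-chars "0123456789abcdef")

(defn append-unicode-escape
  "Trick 3: build a \\uXXXX escape with a hex lookup table instead of
  clojure.core/format."
  [^Appendable out c]
  (let [cp (int c)]
    (.append out "\\u")
    (.append out (.charAt hex-chars (bit-and (bit-shift-right cp 12) 0xf)))
    (.append out (.charAt hex-chars (bit-and (bit-shift-right cp 8) 0xf)))
    (.append out (.charAt hex-chars (bit-and (bit-shift-right cp 4) 0xf)))
    (.append out (.charAt hex-chars (bit-and cp 0xf)))))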
The main trick though was to use the stuff in http://clojure-goes-fast.com
i.e. profile, observe the results, form a hypothesis, create a fix 🙂
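In code, that loop looks something like this with clj-async-profiler (one of the clojure-goes-fast tools); sample-json here is a stand-in for whatever payload you’re tuning:

(require '[clojure.data.json :as json]
         '[clj-async-profiler.core :as prof])

(def sample-json (slurp "bench/100k.json")) ; stand-in payload

;; prof/profile runs the body under async-profiler and writes a
;; flamegraph under /tmp/clj-async-profiler/results by default.
(prof/profile
  (dotimes [_ 10000]
    (json/read-str sample-json)))

;; Read the flamegraph, form a hypothesis, apply a fix, re-run, compare.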
Yeah I’d taken a quick peek, and was mainly interested in hearing about 2 and 3. I entirely agree with the other advice too though! :thumbsup: :thumbsup:
I’ve noticed in the past that unicode processing is often the slow bit in parsing large amounts of data. Also the performance difference between InputStream and Reader is staggering… mainly I believe because Reader does that unicode stuff, and expands all characters into 16 bits. So was curious how you were alleviating that.
I’ve never tried parsing json, so know next to nothing about it; but I was trying to understand how you knew whether you needed to use unicode or not. I’m guessing you only need to handle unicode for strings inside the json, not the structure itself. Is that correct?
I was looking at the commit that replaced dynamic vars with an options map. Couldn’t you have saved even more time if internal functions like read-object received key-fn and value-fn as arguments, instead of taking the whole options map and performing a map get?
data.json takes a Reader, I think @U04V5VAUN just meant Unicode escapes inside strings
Another observation: couldn’t you capture the values of the dynamic vars in a map at the start of the public functions like write-str? Then you don’t get hit with the dynamic var cost, because you don’t access them repeatedly.
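A sketch of that idea, with illustrative var and function names (not the real data.json internals):

(def ^:dynamic *key-fn* name)
(def ^:dynamic *value-fn* (fn [_ v] v))

(defn write-str* [x]
  ;; deref each dynamic var exactly once at the public entry point...
  (let [key-fn *key-fn*
        value-fn *value-fn*]
    ;; ...so the hot loop below touches plain locals, not the vars
    (into {} (map (fn [[k v]] [(key-fn k) (value-fn k v)])) x)))

;; (write-str* {:a 1, :b 2}) => {"a" 1, "b" 2}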
@U4MB6UKDL: Yes I know. I was alluding to that too. I mention InputStream/Reader as something observed in my own work, and in support of the general point that handling unicode is slow.
the old:
jsonista.jmh/encode-data-json :encode :throughput 5 406998.934 ops/s 152242.102 {:size "10b"}
jsonista.jmh/encode-data-json :encode :throughput 5 146750.626 ops/s 13532.113 {:size "100b"}
jsonista.jmh/encode-data-json :encode :throughput 5 28543.913 ops/s 5982.429 {:size "1k"}
jsonista.jmh/encode-data-json :encode :throughput 5 1994.604 ops/s 193.798 {:size "10k"}
jsonista.jmh/encode-data-json :encode :throughput 5 229.534 ops/s 3.574 {:size "100k"}
the new:
jsonista.jmh/encode-data-json :encode :throughput 5 1534830.890 ops/s 155359.246 {:size "10b"}
jsonista.jmh/encode-data-json :encode :throughput 5 341613.782 ops/s 26261.051 {:size "100b"}
jsonista.jmh/encode-data-json :encode :throughput 5 69673.326 ops/s 1647.625 {:size "1k"}
jsonista.jmh/encode-data-json :encode :throughput 5 5658.247 ops/s 999.701 {:size "10k"}
jsonista.jmh/encode-data-json :encode :throughput 5 581.924 ops/s 39.758 {:size "100k"}
jsonista:
jsonista.jmh/encode-jsonista :encode :throughput 5 6718559.441 ops/s 564494.417 {:size "10b"}
jsonista.jmh/encode-jsonista :encode :throughput 5 2021530.135 ops/s 227934.280 {:size "100b"}
jsonista.jmh/encode-jsonista :encode :throughput 5 358639.582 ops/s 33561.700 {:size "1k"}
jsonista.jmh/encode-jsonista :encode :throughput 5 32536.978 ops/s 8135.004 {:size "10k"}
jsonista.jmh/encode-jsonista :encode :throughput 5 2687.242 ops/s 185.516 {:size "100k"}
Jackson (and simdjson) can do their own UTF-8 decoding while parsing from a byte stream. All the structural JSON characters are ASCII so yes Unicode is only really relevant inside strings.
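To make the ASCII point concrete (a toy check, not library code):

;; Every JSON structural character is below 0x80, so its UTF-8 encoding
;; is a single byte; bytes of multi-byte characters have the high bit
;; set and can never collide with them.
(def structural-bytes (into #{} (map int) [\{ \} \[ \] \: \,]))

(defn structural-byte? [b]
  (contains? structural-bytes (bit-and b 0xff)))

;; (structural-byte? (byte 123)) => true   ; {
;; (structural-byte? (byte -61)) => false  ; a UTF-8 lead byte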
@U4MB6UKDL my initial patch had value-fn and key-fn passed as separate params, but that doesn’t really scale well (if you imagine passing more opts in the future). Also, the penalty from apply and array-map only shows on the smaller payloads, so it was probably worth the tradeoff.
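For illustration, the shape being discussed (read-object here is hypothetical, just showing the pattern): the whole opts map travels through the internals, and destructuring pulls key-fn/value-fn out once per object rather than doing a map get per key:

(defn read-object
  [pairs {:keys [key-fn value-fn] :as opts}]
  ;; opts would be passed on whole to nested read-* calls
  (persistent!
    (reduce (fn [m [k v]]
              (assoc! m (key-fn k) (value-fn k v)))
            (transient {})
            pairs)))

;; (read-object [["a" 1] ["b" 2]] {:key-fn keyword :value-fn (fn [_ v] v)})
;; => {:a 1, :b 2}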
(I think you meant @U66G3SGP5)
Has some Slack weirdness happened in this thread?! Some comments appear to have disappeared, and replies now appear out of context; e.g. my comment above was in response to something @U4MB6UKDL said which has also vanished.
:thumbsup: I’d done that, but doing it a second time seems to have fixed it.
new jmh-benchmarks on jsonista repo: https://github.com/metosin/jsonista#performance
very cool!
Looks like some more % can be shaved off by using identical? and == instead of = where possible. Especially using identical?, as the documentation says "If value-fn returns itself" - can you assume it's the same object?
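A sketch of that suggestion, assuming "returns itself" in the docstring really does mean the very same object (which makes identical? sufficient):

(defn apply-value-fn
  [m k v value-fn]
  (let [out (value-fn k v)]
    (if (identical? out value-fn) ; sentinel check, no deep = needed
      m                           ; value-fn returned itself: omit pair
      (assoc m k out))))

;; A value-fn that drops nil values:
;; (apply-value-fn {} "a" nil (fn self [_ v] (if (nil? v) self v))) => {}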
an excellent way to work on problems is to write them down in a trackable place like https://ask.clojure.org or jira (if you have access)
Changelog for 2.0.0 fyi:
• Perf https://clojure.atlassian.net/browse/DJSON-35: Replace PrintWriter with more generic Appendable, reduce wrapping
• Perf https://clojure.atlassian.net/browse/DJSON-34: More efficient writing for common path
• Perf https://clojure.atlassian.net/browse/DJSON-32: Use option map instead of dynamic variables (affects read+write)
• Perf https://clojure.atlassian.net/browse/DJSON-33: Improve speed of reading JSON strings
• Fix https://clojure.atlassian.net/browse/DJSON-30: Fix bad test
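On the DJSON-35 item, the gist (a simplified sketch, not the actual patch): target java.lang.Appendable, which StringBuilder, CharBuffer and all Writers already implement, rather than wrapping everything in a PrintWriter:

(defn write-json-string
  [^Appendable out ^String s]
  (.append out \")
  (.append out s) ; the real code escapes while copying
  (.append out \"))

;; works on a plain StringBuilder, no PrintWriter wrapping:
;; (str (doto (StringBuilder.) (write-json-string "hi"))) => "\"hi\""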
Unfortunately, there is a bug in the above release with strings longer than 64 chars, so do not use version 2.0.0; rather, wait for 2.0.1 🥵
The pure Clojure JSON ecosystem now rests on your shoulders slipset... take care!
Stay strong! It's a great effort. I personally would use a pure library over Java interop whenever possible.
Btw, I wondered if there is some JSON standard compliance test suite that these kinds of libs should be run against
As linked in that repo, I would highly recommend reading this blog post about JSON parsing ambiguities: http://seriot.ch/parsing_json.php
if anyone is interested in working on things like this, please join the club! would be happy to have help on this
data.json 2.0.1 is now available
• Fix https://clojure.atlassian.net/browse/DJSON-37: Fix off-by-one error reading long strings, regression in 2.0.0
@U04V5VAUN Congrats on the fix 😅 Does this affect the benchmarks?
I tested dtype-next's ffi generation with graal native and avclj. After a bit of work (a couple of days) I can now generate a graal native executable that encodes video 🙂. https://github.com/cnuernber/avclj