2023-05-10
We’re parsing large documents. We’ve found we get better performance (memory and CPU) if we split the documents up into chunks, and then parse each chunk separately. This keeps the grammar smaller and aligns with https://github.com/Engelberg/instaparse/blob/master/docs/Performance.md#performance-tips.
However, we also need to keep track of line and column numbers in the original document, which doesn't work well with the chunked approach. Say we have two chunks, lines 1-10 and lines 11-20. When we use insta/add-line-and-column-info-to-metadata on the second chunk, the line metadata starts at line 1, not line 11.
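To make the mismatch concrete, here is a minimal sketch (the grammar and chunk text are illustrative, not from the real project):

```clojure
(require '[instaparse.core :as insta])

;; Toy grammar; the real grammar is much larger.
(def parse (insta/parser "S = #'.*'"))

;; Chunk 2 begins at line 11 of the original document, but parsed in
;; isolation its metadata starts over at line 1.
(def chunk2 "line eleven of the original document")

(-> (insta/add-line-and-column-info-to-metadata chunk2 (parse chunk2))
    meta
    :instaparse.gll/start-line)
;; => 1, even though this text sits on line 11 of the full document
```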
At the moment we have a collection of helpers that walk the metadata after it's generated and offset it. But I was wondering if anyone has a better approach.
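Those helpers look roughly like this sketch, which assumes instaparse's qualified metadata keys and uses clojure.walk (offset-line-info is a hypothetical name, not the actual helper):

```clojure
(require '[clojure.walk :as walk])

(defn offset-line-info
  "Walks an instaparse tree that already carries line/column metadata
   and shifts every node's start/end line by line-offset."
  [tree line-offset]
  (walk/postwalk
    (fn [node]
      ;; Strings have no metadata and are left untouched; tagged nodes
      ;; (vectors) carry the instaparse.gll line keys after annotation.
      (if-let [m (meta node)]
        (vary-meta node
          #(cond-> %
             (:instaparse.gll/start-line %)
             (update :instaparse.gll/start-line + line-offset)

             (:instaparse.gll/end-line %)
             (update :instaparse.gll/end-line + line-offset)))
        node))
    tree))
```

With the two-chunk example above, (offset-line-info annotated-chunk2 10) would shift the second chunk's metadata so its first line reads as 11.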
I’ve submitted a PR, https://github.com/Engelberg/instaparse/pull/226, which pushes the complexity into instaparse itself. But if anyone has other tricks, I’d love to hear them.
clj-antlr is much faster than Instaparse if performance remains an issue. Not quite as nice an API though.
Thanks, I’ll check that out!