Anyone well versed in the most recent quantum entanglement experiments?
I'm writing a lexer and parser for Clojure in a low-level lang as a learning exercise, with the goal of writing a fast code-formatter.
Is there a place that exhaustively documents the syntax of Clojure? (more so than the reader reference)
With some experimentation with the reader, I learned several non-obvious cases, like:
1. Spaces are accepted before the symbol name in tagged literals, e.g. # uuid "1843e31d-6cee-4c3a-8eaa-7cbfd6e8cfbf".
2. Any number of slashes in symbols is accepted, although the reference states there must be at most one. 'a/b/c reads just fine, treating b/c as the name of the symbol.
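Both cases can be checked from a REPL with read-string. A minimal sketch, assuming a stock Clojure 1.12 reader; `id` and `sym` are illustrative names, and the results noted in the comments are the ones reported in this thread, not documented guarantees:

```clojure
;; Case 1: whitespace between # and the tag symbol of a tagged literal.
(def id (read-string "# uuid \"1843e31d-6cee-4c3a-8eaa-7cbfd6e8cfbf\""))
(uuid? id) ; reported above to read as a UUID

;; Case 2: more than one slash in a symbol.
(def sym (read-string "a/b/c"))
[(namespace sym) (name sym)] ; reported above: the name part is "b/c"
```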
One thing that's still not clear to me, for example, is the exhaustive set of characters accepted in character literals.
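As a starting point, the forms listed in the reader reference (single characters, the named characters, \uNNNN and \oNNN) can be probed with read-string. This is only a sketch of the documented forms, not the exhaustive set being asked about; `char-literal-samples` is an illustrative name:

```clojure
;; Each string holds the text of one character literal
;; (the backslash is doubled because it sits inside a string literal).
(def char-literal-samples
  ["\\newline" "\\space" "\\tab" "\\formfeed" "\\backspace" "\\return"
   "\\u0041"   ; four hex digits, here the letter A
   "\\o101"    ; octal, here also the letter A
   "\\a"])     ; a plain single character

;; Read every sample and collect the resulting characters.
(mapv read-string char-literal-samples)
;; => [\newline \space \tab \formfeed \backspace \return \A \A \a]
```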
You may find clojure.tools.reader easier to read than lispreader.java. There are some ANTLR grammars out there too, but I think they may pre-date namespaced maps and maybe even reader conditionals. You may also want to check out clojure-standard-style.
Thanks!
I'm a little worried about inheriting any non-standard behavior from things like clojure-standard-style, Babashka and Joker.
clojure.tools.reader should be close enough to the actual reader for my purposes.
True, lispreader.java is truth; if you find a discrepancy in tools.reader that would be worth an issue
On first look, it seems to be implemented in almost the same way as lispreader.java, by having a set of "macro-chars" split the byte-stream into tokens, and then categorizing and/or further splitting them into the actual tokens.
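The token-splitting step described above can be sketched roughly like this. It is a simplified illustration, not the code from either reader: `read-token` and `whitespace?` are hypothetical helpers, and the set of terminating characters is an assumption modeled on the macro table in lispreader.java, where # ' and % are macro characters but do not end a token.

```clojure
;; Assumed set of macro characters that end a token.
(def terminating-macro-chars
  #{\" \; \@ \^ \` \~ \( \) \[ \] \{ \} \\})

(defn whitespace?
  "Commas count as whitespace for the Clojure reader."
  [ch]
  (or (Character/isWhitespace (char ch)) (= ch \,)))

(defn read-token
  "Hypothetical helper: returns the token starting at index `start` of `s`,
  ending before the next whitespace or terminating macro character."
  [^String s start]
  (loop [end (inc start)]
    (if (and (< end (count s))
             (let [ch (.charAt s end)]
               (not (or (whitespace? ch) (terminating-macro-chars ch)))))
      (recur (inc end))
      (subs s start end))))

(read-token "foo'bar(baz)" 0) ; => "foo'bar"
(read-token "a/b/c, d" 0)     ; => "a/b/c"
```

Categorizing the token (number, symbol, keyword and so on) would then be a separate pass over the returned string.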
lispreader.java is implementation right now. It can change and reserves the right to change. you're right that what this does is what clojure does, but it might be a moving target.
what language are you writing it in? (just curious and i promise i have no interest in a language debate, just interested in the choice)
> It can change and reserves the right to change
True. In some cases, the reader-reference docs don't have an opinion, like the first of the examples above. Whereas the second example is something the reader currently accepts but the reference explicitly disallows. In such cases, I can either choose to be strict according to the reference, or accept anything the actual reader accepts, which makes more sense for a formatter so that I can format any code that does actually run today.
> what language
Zig. I've been wanting to learn a low-level language seriously for a few years now, and am finally jumping into it. Currently writing the lexer as a straightforward loop-with-a-switch state-machine, and a hand-written recursive descent parser. Once I get the correctness right, I want to try and go as fast as possible, mostly as an excuse to learn profiling and things like SIMD.
sounds like a really cool project. I'd love to see periodic updates as you progress!
There are a lot of symbols/keywords/etc that are readable but are not "valid" according to the documentation.
You can check https://github.com/sogaiu/tree-sitter-clojure as an additional reference. Unless you're intent on rolling your own parsers, you can also use the tree-sitter grammar directly from Zig.
Just found this cursed REPL interaction when trying to figure out the set of characters accepted by the reader after a valid character literal:
~ > clojure
Clojure 1.12.0
user=> (read-string "\u006A'foo")
j'foo
user=> (type (read-string "\u006A'foo"))
clojure.lang.Symbol
user=> (name (read-string "\u006A'foo"))
"j'foo"
Edit: Weirdly enough, entering that string directly in the REPL causes an error, while read-string happily accepts it.
user=> 'j'foo
j'foo
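One possible explanation for the difference, offered as a sketch and not as a statement about intended reader behavior: inside a Clojure string literal, \u006A is a string escape for j, so read-string in the session above is handed the text j'foo and never sees a character literal. Escaping the backslash passes the literal text through to the reader. `read-fails?` is a hypothetical helper, and the exact exception type is not assumed here:

```clojure
;; The string literal is unescaped before read-string ever runs.
(= "\u006A'foo" "j'foo") ; => true

;; With the backslash escaped, the reader sees a real character literal.
(read-string "\\u006A") ; => \j

;; Hypothetical helper: true when read-string throws on the given text.
(defn read-fails? [text]
  (try
    (read-string text)
    false
    (catch Exception _ true)))

;; The same text as typed directly at the REPL; expected to fail,
;; matching the error mentioned in the edit above.
(read-fails? "\\u006A'foo")
```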