off-topic

Vincent 2025-10-24T09:30:19.331869Z

Anyone well versed in the most recent quantum entanglement experiments?

jaihindhreddy 2025-10-24T12:20:20.137499Z

I'm writing a lexer and parser for Clojure in a low-level lang as a learning exercise, with the goal of writing a fast code-formatter. Is there a place that exhaustively documents the syntax of Clojure (more so than the reader reference)? With some experimentation with the reader, I learned several unobvious cases, like:

1. Spaces are accepted before the symbol name in tagged literals: # uuid "1843e31d-6cee-4c3a-8eaa-7cbfd6e8cfbf".
2. Any number of slashes in symbols is accepted, although the reference states there must be at most one. 'a/b/c reads just fine, treating b/c as the name of the symbol.

One thing that's not as clear to me, for example, is the exhaustive set of characters accepted in character literals.
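
For the second case, a formatter that wants to mirror the observed behavior could split a symbol token at the first slash only. The following is a minimal Python sketch under that assumption, based on the 'a/b/c observation above rather than on any documented rule; `split_symbol` is a hypothetical helper, and it does no validation of tokens the reader might reject.

```python
from typing import Optional, Tuple


def split_symbol(token: str) -> Tuple[Optional[str], str]:
    """Split a symbol token into (namespace, name) at the FIRST slash."""
    # Assumption: a lone "/" is a plain symbol with no namespace.
    if token == "/":
        return None, token
    ns, sep, name = token.partition("/")
    if not sep:
        # No slash at all: the whole token is the name.
        return None, token
    # Everything after the first slash, including later slashes, is the name.
    return ns, name
```

With this rule, `split_symbol("a/b/c")` yields the namespace `a` and the name `b/c`, matching what the message describes.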

pat 2025-10-24T12:56:23.609739Z

You may find clojure.tools.reader easier to read than lispreader.java. There are some ANTLR grammars out there too, but I think they may pre-date namespaced maps and maybe even reader conditionals. You may also want to check out clojure-standard-style.

jaihindhreddy 2025-10-24T12:58:40.880489Z

Thanks! I'm a little worried about inheriting any non-standard behavior from things like clojure-standard-style, Babashka and Joker. clojure.tools.reader should be close enough to the actual reader for my purposes.

pat 2025-10-24T13:00:41.113129Z

True, lispreader.java is truth; if you find a discrepancy in tools.reader, that would be worth an issue.

💯 1
jaihindhreddy 2025-10-24T13:01:38.123899Z

On first look, it seems to be implemented in almost the same way as lispreader.java, by having a set of "macro-chars" split the byte-stream into tokens, and then categorizing and/or further splitting them into the actual tokens.

👍 1
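
The token-splitting step described above can be sketched roughly as follows. This is a minimal Python sketch, not the actual reader: the set of terminating characters is an assumption based on a reading of lispreader.java (where #, ' and % appear to be macro characters that do not end a token), `read_token` is a hypothetical helper, and Python's `str.isspace` only approximates Java's whitespace check.

```python
from typing import Tuple

# Assumption: macro characters that end a token. The characters #, ' and %
# are deliberately absent, since they do not appear to terminate a token.
TERMINATING = set('";@^`~()[]{}\\')


def is_whitespace(ch: str) -> bool:
    """Clojure treats commas as whitespace."""
    return ch.isspace() or ch == ","


def read_token(src: str, start: int) -> Tuple[str, int]:
    """Return (token, end_index) for the token beginning at `start`."""
    end = start
    while (
        end < len(src)
        and not is_whitespace(src[end])
        and src[end] not in TERMINATING
    ):
        end += 1
    return src[start:end], end
```

Under these assumptions a quote in the middle of a token does not split it, so `read_token("j'foo bar", 0)` returns the single token `j'foo`.
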
dpsutton 2025-10-24T13:19:34.771779Z

lispreader.java is the implementation right now. It can change and reserves the right to change. You're right that what it does is what Clojure does, but it might be a moving target.

💯 1
dpsutton 2025-10-24T13:20:39.677759Z

What language are you writing it in? (Just curious, and I promise I have no interest in a language debate, just interested in the choice.)

jaihindhreddy 2025-10-24T13:24:56.856589Z

> It can change and reserves the right to change

True. In some cases, the reader-reference docs don't have an opinion, like the first of the examples above. The second example, by contrast, is something the reader currently accepts but the reference explicitly disallows. In such cases, I can either choose to be strict according to the reference, or accept anything the actual reader accepts, which makes more sense for a formatter so that I can format any code that does actually run today.

> what language

Zig. I've been wanting to learn a low-level language seriously for a few years now, and am finally jumping into it. Currently writing the lexer as a straightforward loop-with-a-switch state machine, and a hand-written recursive descent parser. Once I get the correctness right, I want to try and go as fast as possible, mostly as an excuse to learn profiling and things like SIMD.
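
To illustrate the loop-with-a-switch shape mentioned above, here is a minimal sketch in Python rather than Zig. It is not the author's lexer: the states, token kinds, and the `lex` function are all hypothetical, and character literals, dispatch forms, and quoting are deliberately left out.

```python
from enum import Enum, auto
from typing import List, Tuple


class State(Enum):
    DEFAULT = auto()
    ATOM = auto()
    STRING = auto()
    COMMENT = auto()


DELIMS = set("()[]{}")


def lex(src: str) -> List[Tuple[str, str]]:
    """Tokenize `src` into (kind, text) pairs using one loop and one switch."""
    tokens: List[Tuple[str, str]] = []
    state, start, i = State.DEFAULT, 0, 0
    while True:
        ch = src[i] if i < len(src) else ""  # "" marks end of input
        if state is State.DEFAULT:
            start = i
            if ch == "":
                break
            if ch in DELIMS:
                tokens.append(("delim", ch))
            elif ch == '"':
                state = State.STRING
            elif ch == ";":
                state = State.COMMENT
            elif not (ch.isspace() or ch == ","):
                state = State.ATOM
            i += 1
        elif state is State.ATOM:
            if ch == "" or ch.isspace() or ch == "," or ch in DELIMS or ch in ('"', ";"):
                tokens.append(("atom", src[start:i]))
                state = State.DEFAULT  # re-examine ch without consuming it
            else:
                i += 1
        elif state is State.STRING:
            if ch == "":
                raise ValueError("unterminated string")
            elif ch == "\\":
                i += 2  # skip the backslash and the escaped character
            elif ch == '"':
                tokens.append(("string", src[start:i + 1]))
                state = State.DEFAULT
                i += 1
            else:
                i += 1
        else:  # State.COMMENT
            if ch == "" or ch == "\n":
                tokens.append(("comment", src[start:i]))
                state = State.DEFAULT  # the newline is handled as whitespace
            else:
                i += 1
    return tokens
```

Keeping comments as tokens, rather than discarding them, is the kind of choice a formatter would likely need, since it has to print them back out.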

dpsutton 2025-10-24T13:25:35.243909Z

Sounds like a really cool project. I'd love to see periodic updates as you progress!

❀️ 1
seancorfield 2025-10-24T15:43:44.035539Z

There are a lot of symbols/keywords/etc that are readable but are not "valid" according to the documentation.

✅ 1
phronmophobic 2025-10-24T17:20:43.921829Z

You can check https://github.com/sogaiu/tree-sitter-clojure as an additional reference. Unless you're intent on rolling your own parsers, you can also use the tree-sitter grammar directly from Zig.

1
jaihindhreddy 2025-10-24T14:31:20.855469Z

Just found this cursed REPL interaction when trying to figure out the set of characters accepted by the reader after a valid character literal 🙂:

✓ ~ > clojure
Clojure 1.12.0
user=> (read-string "\u006A'foo")
j'foo
user=> (type (read-string "\u006A'foo"))
clojure.lang.Symbol
user=> (name (read-string "\u006A'foo"))
"j'foo"
Edit: Weirdly enough, entering that string directly in the REPL causes an error, while read-string happily accepts it.

2025-10-24T16:34:05.047079Z

user=> 'j'foo
j'foo
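
A likely explanation for the difference, offered as a hedged note: inside a Clojure string literal, \u006A is a Unicode escape that the string syntax resolves to j, so read-string receives the text j'foo, the same symbol as in the last REPL line. Typed directly at the REPL, the backslash instead starts a character literal. Python string literals resolve \uXXXX escapes in the same way, so the sketch below uses them as a stand-in; `first_form_kind` is a hypothetical helper that only inspects the first character.

```python
# Escape resolved by the string literal itself: the text starts with "j".
escaped = "\u006A'foo"

# Doubled backslash: the backslash survives into the text.
literal = "\\u006A'foo"


def first_form_kind(text: str) -> str:
    """Guess what kind of form a reader would begin, from the first character."""
    if text.startswith("\\"):
        return "character literal"
    return "symbol or other token"


print(repr(escaped), first_form_kind(escaped))
print(repr(literal), first_form_kind(literal))
```

To hand the reader an actual character literal through read-string, the backslash would presumably need to be doubled in the string, as in the second variable.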