#tree-sitter
2023-01-27
mauricio.szabo04:01:11

Folks, do queries already work with tree-sitter Clojure?

mauricio.szabo04:01:33

I was trying to use it and I'm getting syntax errors on queries...

Noah Bogart13:01:30

They should, but ts clojure has no built-in semantic markings, only data. The maintainer feels that, given the extensible nature of clojure, trying to model locals, function definitions, etc. is fruitless

lispers-anonymous15:01:09

Noah is correct though, the grammar only describes the raw syntax of clojure. Aka, this thing is a list, this thing is a symbol, etc. There are no nodes for things like functions or namespace declarations.

mauricio.szabo17:01:33

I'm still not sure then how syntax highlighting is supposed to work, honestly. I thought we need to define queries and these queries will return what needs to be highlighted... did I misunderstand that about tree-sitter?

Noah Bogart19:01:05

Ehh, sort of? Tree sitter has the parser, which reads in the file (a string) and creates nodes: (def foo ::cool) is turned into something like (list_node [(sym_node (sym_name)) (kwd_node (kwd_name))])

Noah Bogart19:01:00

that's the AST that the queries then run over. the syntax highlighting maps kwd_node to "purple", or (list_node [(sym_node (sym_name)) ...]) (where sym_name is def) to green, etc.
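A small Python sketch of the two steps described above, using a toy reader instead of the real tree-sitter library. The node names (list_node, sym_node, kwd_node) follow the message above, and the colour rules are the two mentioned there; everything else (function names, the token regex) is an assumption made for illustration.

```python
import re

# Toy tokenizer for a tiny subset of Clojure: parens plus bare atoms.
TOKEN = re.compile(r"[()]|[^\s()]+")

def parse(src):
    """Turn source text into nested (type, value) nodes: syntax only, no meaning."""
    tokens = TOKEN.findall(src)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            children = []
            while tokens[pos] != ")":
                children.append(read())
            pos += 1  # consume the closing paren
            return ("list_node", children)
        if tok.startswith(":"):
            return ("kwd_node", tok)
        return ("sym_node", tok)

    return read()

def highlight(node):
    """Yield (text, colour) pairs; this is the 'query' step that runs over the tree."""
    kind, value = node
    if kind == "list_node":
        for i, child in enumerate(value):
            if i == 0 and child == ("sym_node", "def"):
                yield ("def", "green")  # head symbol of a list named def
            else:
                yield from highlight(child)
    elif kind == "kwd_node":
        yield (value, "purple")  # every keyword, wherever it appears

tree = parse("(def foo ::cool)")
print(tree)
print(list(highlight(tree)))
```

Note that foo gets no colour here: the tree only says it is a symbol, and no rule in this sketch assigns it a meaning.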

borkdude19:01:01

maybe a tree-sitter plugin can be made or enhanced with clj-kondo since it does have this semantic information (which is also used by lsp clients for highlighting, it's called semantic tokens)

lispers-anonymous14:01:23

For editors like Emacs that is probably overkill. We can infer enough semantic information 99% of the time to get good syntax highlighting that works better than the existing regex-based highlighting. For example, we know it is mostly safe to say something like:

(list_lit (sym_lit) @def_kw
          (sym_lit) @def_name)
To match definition forms, when @def_kw matches a regex like "^def.*". The actual function name matching is more complicated, but it's not too crazy.
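To make the matching rule concrete, here is a rough Python sketch of the same check over hand-built trees. It does not use tree-sitter itself; the tuple layout, the node names, and the match_def helper are placeholders chosen for illustration, and only the "^def.*" regex comes from the message above.

```python
import re

DEF_RE = re.compile(r"^def.*")

def match_def(node):
    """Return the two captures if node looks like a definition form, else None."""
    kind, children = node
    if kind != "list_lit" or len(children) < 2:
        return None
    first, second = children[0], children[1]
    # Both leading children must be symbols, and the first must match the regex.
    if first[0] == "sym_lit" and second[0] == "sym_lit" and DEF_RE.match(first[1]):
        return {"def_kw": first[1], "def_name": second[1]}
    return None

# Hand-built trees standing in for parser output (shapes are illustrative).
defn_form = ("list_lit", [("sym_lit", "defn"), ("sym_lit", "greet"), ("vec_lit", [])])
call_form = ("list_lit", [("sym_lit", "println"), ("sym_lit", "x")])

print(match_def(defn_form))  # {'def_kw': 'defn', 'def_name': 'greet'}
print(match_def(call_form))  # None
```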

lispers-anonymous14:01:33

Beyond syntax highlighting, navigation and indentation are really the only things to worry about. Indentation is complicated but not too much.

borkdude14:01:46

ah yes, makes sense. so by matching def + defn you can also infer the "function" names maybe?

borkdude14:01:00

and what about locals?

lispers-anonymous14:01:44

That can also be done by writing queries to match bindings inside let type blocks.

lispers-anonymous14:01:27

BUT all this can be thwarted by users. Imagine

(defmacro my-improved-let ...)
Similar with def. Users can always sidestep this stuff with macros. That's why we don't codify any of that in tree-sitter-clojure. We can get false negatives and false positives trying to identify semantic things in the language, because users can change the language's semantics
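A quick Python sketch of how both kinds of error can show up with purely name-based heuristics. Forms are written as nested Python lists with symbols as strings, which is a simplification; the LET_HEADS whitelist, the helper names, and the example forms other than my-improved-let are hypothetical.

```python
import re

DEF_RE = re.compile(r"^def.*")
LET_HEADS = {"let", "if-let", "when-let", "loop"}  # hypothetical whitelist of binding forms

def looks_like_def(form):
    """True when the first two elements are symbols and the first starts with 'def'."""
    if len(form) < 2:
        return False
    head, name = form[0], form[1]
    return isinstance(head, str) and isinstance(name, str) and bool(DEF_RE.match(head))

def let_locals(form):
    """Names bound by a let-like form, judged only by its head symbol."""
    if len(form) < 2:
        return []
    head, bindings = form[0], form[1]
    if isinstance(head, str) and head in LET_HEADS and isinstance(bindings, list):
        return bindings[0::2]  # a binding vector alternates name, value
    return []

real_def = ["def", "port", "default-port"]
plain_call = ["default-handler", "request"]      # an ordinary call whose name starts with "def"
real_let = ["let", ["x", "1", "y", "2"], "x"]
macro_let = ["my-improved-let", ["x", "1"], "x"]  # user macro that binds like let

print(looks_like_def(real_def))    # True
print(looks_like_def(plain_call))  # True, a false positive
print(let_locals(real_let))        # ['x', 'y']
print(let_locals(macro_let))       # [], a false negative
```

Extending the whitelist fixes one macro at a time, but the heuristic still cannot know what an arbitrary user macro does with its arguments.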

borkdude14:01:07

@UKFSJSM38 Does clojure-lsp colorize locals based on clj-kondo analysis, I think so right? This also has support for macros (when you configure it correctly)

borkdude14:01:41

Also, navigation to locals works like you would expect, e.g. even when they have the same name as a var. But I assume you can also make that work with tree-sitter

lispers-anonymous14:01:34

Maybe. Tree sitter lacks the context of the rest of the document though. It doesn't know what was declared as a macro somewhere else, or a binding up above.

lispers-anonymous14:01:00

In trying to do that with tree-sitter I would end up doing a lot of the same work that clj-kondo does, and clojure-lsp

borkdude14:01:24

I wonder what the difference in experience is with tree-sitter vs clojure-lsp for highlighting, I hope @UKFSJSM38 knows a bit about this.

ericdallo14:01:56

Clojure-lsp does colorize locals as variables using kondo analysis

lispers-anonymous14:01:22

Yeah, it doesn't do general highlighting for the entire document right?

ericdallo14:01:40

IMO clojure-lsp semantic tokens are smarter since we have more data from kondo, so we know if a call is from a function or macro for example and other things

lispers-anonymous14:01:54

Yeah, I don't have access to that info at all in tree-sitter. BUT the semantic tokens work very well on top of tree-sitter highlighting in Emacs. I've used them both at the same time.

lispers-anonymous14:01:57

Tree-sitter is great at describing the syntax of a language. Clojure doesn't have much syntax, so as a result the tree-sitter grammar is extremely simple. In other languages that are not as flexible as Clojure or other lisps it can also be good at understanding the semantics. But for Clojure, understanding the semantics means understanding the program, because the program can change the semantics. Tree-sitter can't do that, it's not even possible for it to backtrack a single character when parsing. Some basic semantics, like the def example above, can be considered, but it's always possible for it to be wrong. Normal clojure-mode can run into this as well. That's why more sophisticated tools like clojure-lsp are such a good supplement.

ericdallo14:01:13

That's my point of view as well, agreed

mauricio.szabo19:01:46

Well, I was trying to follow this: https://tree-sitter.github.io/tree-sitter/syntax-highlighting, where it does mention something like:

> The Tree-sitter highlighting system works by annotating ranges of source code with logical “highlight names” like function.method, type.builtin, keyword, etc. In order to decide what color should be used for rendering each highlight, a theme is needed.

When I tried to use tree-sitter-clojure, none of these names like function were returned anywhere, so I'm completely lost...
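One way to picture the missing piece, as a Python sketch rather than the real tree-sitter API: the highlight names are not produced by the grammar, they are capture names that a separate highlights query attaches to nodes, and a theme then maps those names to styles. The captures list, the theme, and resolve_style below are all hypothetical; the "longest matching dotted prefix" fallback reflects my reading of the linked documentation and should be treated as an assumption.

```python
def resolve_style(highlight_name, theme):
    """Pick the style for the longest dotted prefix of highlight_name found in theme."""
    parts = highlight_name.split(".")
    while parts:
        key = ".".join(parts)
        if key in theme:
            return theme[key]
        parts.pop()  # drop the most specific segment and retry
    return None

# Hypothetical theme: highlight name -> colour.
theme = {"function": "blue", "function.method": "cyan", "keyword": "magenta"}

# Hypothetical output of a highlights query: (source text, capture name).
captures = [
    ("defn", "keyword"),
    ("greet", "function"),
    ("str/join", "function.method.builtin"),
]

styled = [(text, resolve_style(name, theme)) for text, name in captures]
print(styled)

# With a grammar that ships no highlights query there are no captures at all,
# so nothing gets a name and nothing gets a colour.
no_captures = []
print([(text, resolve_style(name, theme)) for text, name in no_captures])  # []
```

Under that reading, names like function only appear once someone writes a query that captures, say, the second symbol of a def form under that name.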