#clojure-europe
2021-04-04
simongray 06:04:39

I managed to basically completely replicate the Clojure regex functions while wrapping Stanford’s Semgrex DSL, since the Java classes underneath themselves mimic the Java regex classes. Pretty fun exercise! https://github.com/simongray/datalinguist/blob/master/src/dk/simongray/datalinguist/dependency.clj#L295-L349

orestis 16:04:48

Why? I mean what’s the reason to use a different regex engine?

orestis 16:04:23

Oh it’s not regex at all. I got confused. Sorry!

simongray 08:04:28

Yup, it's a DSL for matching against dependency grammar.

simongray 08:04:20

While regex matches characters in strings, this matches grammar and other kinds of language data.

simongray 08:04:37

Within a directed graph of nodes (words) related by grammatical relations.
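To make the contrast with character-level regex concrete, here is a minimal Python sketch. It is not Semgrex or CoreNLP: it hand-codes a toy dependency graph for "The cat chased the mouse" as labelled edges (UD-style relation names such as `nsubj` and `obj` are an assumption), then looks for any word that governs both a subject and an object. That is roughly what a Semgrex pattern along the lines of `{}=verb >nsubj {}=subj >obj {}=obj` expresses; the function names here are hypothetical.

```python
# Toy dependency graph for "The cat chased the mouse", stored as
# (governor, relation, dependent) edges. Words are (index, text) pairs so
# that repeated words like "the" stay distinct. UD-style labels are assumed.
EDGES = [
    ((3, "chased"), "nsubj", (2, "cat")),
    ((3, "chased"), "obj", (5, "mouse")),
    ((2, "cat"), "det", (1, "The")),
    ((5, "mouse"), "det", (4, "the")),
]


def children(edges, node, relation):
    """Dependents of `node` reached via an outgoing edge labelled `relation`."""
    return [dep for gov, rel, dep in edges if gov == node and rel == relation]


def find_subject_verb_object(edges):
    """Yield (verb, subject, object) for every node that has both an nsubj
    and an obj dependent: a hand-rolled stand-in for a graph pattern."""
    governors = {gov for gov, _, _ in edges}
    for verb in sorted(governors):
        for subj in children(edges, verb, "nsubj"):
            for obj in children(edges, verb, "obj"):
                yield verb, subj, obj


for verb, subj, obj in find_subject_verb_object(EDGES):
    print(f"verb={verb[1]} subj={subj[1]} obj={obj[1]}")
```

The matcher walks edges between word nodes rather than scanning characters left to right, which is the difference being described.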

reefersleep 22:04:54

That sounds pretty cool!

reefersleep 22:04:27

Are there examples of interesting usages?

simongray 06:04:11

Not really interesting usages, but there are a few examples in the rich comment block https://github.com/simongray/datalinguist/blob/master/src/dk/simongray/datalinguist/dependency.clj#L351-L376

simongray 06:04:08

I am thinking about making another DSL on top of it since I actually kinda dislike using text-based DSLs in Clojure 😆

simongray 06:04:36

since it is matching against nodes in a directed graph it should be possible to represent it using Datomic-style triples
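A rough sketch of what a triple-based pattern could look like, assuming a Datomic-like query shape: the sentence is a set of (governor, relation, dependent) triples, and a pattern is a list of triples in which strings starting with `?` are logic variables. This is a naive hand-written unifier in Python for illustration only. It is not Datomic, not a Datalog engine, and not anything in datalinguist, and all names in it are hypothetical.

```python
# The same toy sentence as triples: (governor, relation, dependent).
# Tokens are "text-index" strings; UD-style relation names are assumed.
TRIPLES = {
    ("chased-3", "nsubj", "cat-2"),
    ("chased-3", "obj", "mouse-5"),
    ("cat-2", "det", "The-1"),
    ("mouse-5", "det", "the-4"),
}


def is_var(term):
    """Logic variables are strings starting with '?', as in Datomic queries."""
    return isinstance(term, str) and term.startswith("?")


def unify(pattern, fact, bindings):
    """Return `bindings` extended so that `pattern` equals `fact`, or None."""
    result = dict(bindings)
    for p, f in zip(pattern, fact):
        if is_var(p):
            if p in result and result[p] != f:
                return None  # variable already bound to something else
            result[p] = f
        elif p != f:
            return None  # constant term does not match
    return result


def query(patterns, facts):
    """Return every binding map satisfying all patterns (a naive nested-loop join)."""
    facts = sorted(facts)  # deterministic order, and safe to iterate repeatedly
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for bindings in solutions
            for fact in facts
            if (extended := unify(pattern, fact, bindings)) is not None
        ]
    return solutions


# Roughly "a verb that has both a subject and an object".
svo = [("?verb", "nsubj", "?subj"), ("?verb", "obj", "?obj")]
print(query(svo, TRIPLES))
```

Shared variables such as `?verb` act as the join between the two clauses, which is what lets plain data stand in for a text-based pattern DSL. A real version would presumably compile to Semgrex or walk CoreNLP's graph directly; this join only shows that the triple representation can express the same kind of match.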

simongray 06:04:57

I want to use it to build queries that detect various Chinese sentence patterns

simongray 06:04:56

I actually made my own Java API for doing the same stuff years ago, not knowing CoreNLP included such a feature already… https://github.com/simongray/StatementAnnotator/tree/master/src/main/java/statements/patterns

simongray 06:04:46

(and good morning)