#announcements
2022-07-03
simongray 07:07:27

New release of https://github.com/simongray/datalinguist (0.2.171), the Clojure wrapper for Stanford CoreNLP. This release...
• bumps the CoreNLP dependencies to 4.4.0 (the latest)
• adds support for Tregex (grammatical constituency tree pattern matching)
• adds support for TokensRegex (token-based pattern matching)
• removes the ML contribution by Carsten Behring at his own request

👀 3
👍 5
🎉 5
peterh 08:07:13

Great work, thanks for doing this! Having had no previous experience with NLP libraries, I was wondering why I couldn’t get your examples to work. Then I realized that I had to download CoreNLP first from https://stanfordnlp.github.io/CoreNLP/ and add stanford-corenlp-4.4.0/* to the classpath. Everything worked fine after that. Is this what you are supposed to do? It wasn’t mentioned in the readme, so I was wondering if I did something wrong here or if it is more obvious to people who have already worked with CoreNLP.

simongray 08:07:44

Hm… CoreNLP should be already added to the classpath if you’re using this as a library. However, you do need to download a language model to get most of the examples to work. This is mentioned in the README: https://github.com/simongray/datalinguist#language-models

simongray 08:07:04

Perhaps I should make an example project to make this a bit clearer.

peterh 08:07:02

This is strange, then maybe something with my setup wasn’t right. I have a minimal deps.edn that looks like this:

{:paths ["stanford-corenlp-4.4.0/*"]
 :deps {edu.stanford.nlp/stanford-corenlp$models-english {:mvn/version "4.4.0"}
        dk.simongray/datalinguist {:mvn/version "0.2.171"}}}
So I also had the language model already in there, but I had to add the path to get it to work. In my source file, I required [dk.simongray.datalinguist :refer :all] and recreated the example in your readme.

peterh 08:07:48

Yes, an example project could be very helpful I believe.

simongray 09:07:40

Here you go: https://github.com/simongray/datalinguist-example This works for me using the Clojure CLI in IntelliJ/Cursive.

peterh 09:07:04

Thanks for the example! You used edu.stanford.nlp/stanford-corenlp$models instead of specifying the language model like I did before and it seems like this was the problem. I wasn’t aware that I also need the models library without any language suffix, but now it works fine.
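
For readers following along, the fix described above can be sketched as a deps.edn like the one below. This is a reconstruction from the messages in this thread, not a copy of the example project's file: the coordinates and versions are the ones already quoted here, and whether the English-specific classifier is still needed alongside the plain models artifact is an assumption, so it is left commented out.

```clojure
;; deps.edn (sketch, reconstructed from this thread; not the example project's actual file)
{:deps {dk.simongray/datalinguist {:mvn/version "0.2.171"}

        ;; The models artifact WITHOUT a language suffix; adding this is
        ;; what resolved the problem described above.
        edu.stanford.nlp/stanford-corenlp$models {:mvn/version "4.4.0"}

        ;; Assumption: the English-specific classifier is optional on top
        ;; of the artifact above. Uncomment if an annotator reports a
        ;; missing English model.
        ;; edu.stanford.nlp/stanford-corenlp$models-english {:mvn/version "4.4.0"}
        }}
```

With the models coming in as Maven dependencies, the manual `:paths ["stanford-corenlp-4.4.0/*"]` entry from the earlier deps.edn should no longer be required.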

simongray 09:07:59

It's also confusing to me 😁 basically, most annotators need some data to work and in the case of English this data is not in a single place. Part of the motivation of this library is to make CoreNLP more accessible, since using it from Java is even more confusing IMO and requires some significant boilerplate.

simongray 09:07:06

If I could, I would just include all of the official language models in datalinguist, but that constitutes a multi-gigabyte dependency.

peterh 09:07:16

Yeah, I wouldn’t even know where to start when using the library from Java. Maybe a little note in the “Language models” section of your readme would clarify that you also need the package without the suffix? Otherwise I think it is really easy to set up.

👍 2
simongray 10:07:10

I need to do a bunch of stuff, including rewriting the README and the examples 🙂

Ivar Refsdal 18:07:26

New release of https://github.com/ivarref/double-trouble (0.1.96), a library to handle re-tried Datomic transactions and similar situations. This release adds:
• A set-and-change function, :dt/sac, that cancels a transaction if a value does not change.
• A just-increment-it function, :dt/jii, that increments the value of an attribute.

👍 1
🎉 3