#off-topic
2021-03-06
jaihindhreddy08:03:05

Will anyone on http://lobste.rs consider giving me an invite?

Ben Sless15:03:01

Does anyone have a recommended Clojure library for simple text analysis in English?

borkdude15:03:03

What do you mean by simple text analysis? Part of speech? You can try Stanford NLP. Here is a demo: https://corenlp.run/

borkdude15:03:24

@simongray has made a little wrapper lib for this

simongray15:03:28

yup, it’s available at https://github.com/simongray/datalinguist, but currently requires you to use deps.edn since I have not packaged it as a JAR yet. Nevertheless, it’s probably still the most full-featured CoreNLP experience you will get in Clojure right now.

Ben Sless15:03:47

I was looking for that!

Ben Sless15:03:48

Google and GitHub did not cooperate with me

Ben Sless15:03:53

oh wow, the models are heavy. Does not seem suitable for a small script?

simongray15:03:56

Yeah, they're usually a couple hundred MBs apiece AFAIK. I think most language models produced through machine learning tend to be quite heavy and the memory requirements are usually pretty substantial too for most of the interesting things you wanna do.

simongray15:03:42

Another option is to use CoreNLP directly through interop, but I don’t recommend that… there’s a reason I’m trying to wrap it.

orestis18:03:31

How do you use NLP? Like, as a better full-text search or doing more interesting stuff like trying to extract information from texts?

orestis18:03:51

Ouch, Stanford and CoreNLP are GPL -- probably no go for us then 😞

simongray18:03:22

Yup - it sucks

lread18:03:04

I used the old Clojure version of Duckling, https://github.com/facebookarchive/duckling_old, on a personal project a couple of years ago when I was starting my Clojure journey. I enjoyed the experience. The current version is at https://github.com/facebook/duckling.

Ben Sless20:03:19

I just wanted a simple way to lint commit messages

lread21:03:59

Ah @ben.sless, maybe just roll your own then? Depending on how sophisticated your linting is… that may be easier than figuring out some NLP thingy.
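
For what it's worth, a "roll your own" commit message linter can be quite small. The sketch below is only an illustration in Python. The function name `lint_commit_message`, the rule set, and the 50/72 character limits are all assumptions borrowed from common Git conventions, not anything specified in this conversation.

```python
from typing import List

# Hypothetical thresholds, taken from common Git conventions.
MAX_SUBJECT_LEN = 50
MAX_BODY_LINE_LEN = 72


def lint_commit_message(message: str) -> List[str]:
    """Return a list of human-readable problems found in a commit message."""
    problems = []
    lines = message.rstrip("\n").split("\n")
    subject = lines[0]

    # An empty subject makes the remaining checks meaningless, so stop early.
    if not subject.strip():
        problems.append("subject line is empty")
        return problems

    if len(subject) > MAX_SUBJECT_LEN:
        problems.append(
            f"subject line is longer than {MAX_SUBJECT_LEN} characters"
        )
    if subject.endswith("."):
        problems.append("subject line ends with a period")
    if not subject[0].isupper():
        problems.append("subject line does not start with a capital letter")

    # The line after the subject should be blank when a body follows.
    if len(lines) > 1 and lines[1].strip():
        problems.append("second line is not blank")

    # Body lines start at line 3 of the message.
    for number, line in enumerate(lines[2:], start=3):
        if len(line) > MAX_BODY_LINE_LEN:
            problems.append(
                f"line {number} is longer than {MAX_BODY_LINE_LEN} characters"
            )
    return problems
```

Something like this could be called from a Git `commit-msg` hook, which receives the path of the message file as its first argument. No language model is needed unless the rules have to understand grammar, such as checking for the imperative mood.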