#announcements
2021-12-28
simongray13:12:27

https://github.com/simongray/datalinguist is a REPL-friendly Clojure wrapper for Stanford University’s CoreNLP. Datalinguist tries to make CoreNLP (https://stanfordnlp.github.io/CoreNLP/) more manageable by introducing a more data-oriented API. Several Clojure protocols are supported, e.g. Loom graph protocols and Datafy. This makes for a much smoother experience than the fairly cumbersome Java API. Currently, the full language annotation pipeline is available in the core datalinguist namespace, while certain post-annotation features such as dependency parsing, semgrex (regex for dependency parse graphs), and triple generation have received wrappers too, in separate namespaces. You can use the various annotators through their supported language models (downloaded as separate dependencies; see README) to extract valuable information from text. Have fun!

💯 14
👀 6
👍 2
❤️ 1
simongray13:12:41

I should probably qualify what I mean by data-oriented. The native Java objects that are the outcome of the language annotation process are used directly as input for the various post-annotation functions, since this allows for the most direct mapping to CoreNLP’s various methods; however, almost all of them can be recursively datafied at any point should you wish. Furthermore, the config used to construct a language processing pipeline is just a regular Clojure map. The point is to maximise integration with both CoreNLP and a typical Clojure workflow.
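
To illustrate the "config is just a regular Clojure map" point: CoreNLP itself is configured through `java.util.Properties` with dotted keys such as `quote.extractUnclosedQuotes`. The sketch below shows one way a nested map could be flattened into such properties. `flatten-config` and `->properties` are hypothetical helpers written for this example, not datalinguist's actual implementation, and the annotator list is only illustrative.

```clojure
(require '[clojure.string :as str])
(import '(java.util Properties))

(defn flatten-config
  "Hypothetical helper: flattens a nested config map into a flat map of
  dotted string keys to string values, e.g. {:quote {:x \"true\"}}
  becomes {\"quote.x\" \"true\"}. Sequential values are comma-joined."
  ([m] (flatten-config nil m))
  ([prefix m]
   (reduce-kv
     (fn [acc k v]
       (let [k' (if prefix (str prefix "." (name k)) (name k))]
         (cond
           (map? v)        (merge acc (flatten-config k' v))
           (sequential? v) (assoc acc k' (str/join "," v))
           :else           (assoc acc k' (str v)))))
     {}
     m)))

(defn ->properties
  "Hypothetical helper: copies the flattened config into a Properties object."
  [config]
  (let [props (Properties.)]
    (doseq [[k v] (flatten-config config)]
      (.setProperty props k v))
    props))

;; An ordinary Clojure map, so it can be merged, printed and diffed like any other data.
(def config
  {:annotators ["tokenize" "ssplit" "pos" "lemma"]
   :quote      {:extractUnclosedQuotes "true"}})

(def props (->properties config))
```

Because the config is plain data, variations on a pipeline can be built with `assoc`/`merge` at the REPL before anything Java-specific is constructed.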

slimslenderslacks20:12:11

A Clojure CLI tool (https://clojure.org/reference/deps_and_cli#_using_named_tools) for packaging Clojure apps into container images using jib (https://github.com/GoogleContainerTools/jib, docker-less build/push). https://github.com/vehvis/lein-jib-build already exists for Leiningen; this one supports jib builds in the deps.edn world, and comes with some opinions on how to layer jar dependencies in container images. https://github.com/atomisthq/jibbit

👏 11
🐳 4
📦 2
orestis09:12:22

Woo I was looking at juxt pack just yesterday since it was the only thing that Google brought up for this. Glad to see this. Thanks!

eskos15:12:37

I don’t have an env handy for testing this immediately so I’ll ask a dumb question instead - does this tool add dependencies as JARs in a separate layer instead of making an uberjar? Because if yes, that’s 💯🎉

slimslenderslacks16:12:03

@U8SFC8HLP ya, that's exactly what it does. It uses the basis from deps.edn to copy all of the dependencies into one layer. Then it generates a Class-Path manifest entry in the application jar and copies that to its own layer.
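
For readers unfamiliar with the Class-Path manifest approach described above, here is a minimal sketch of building such a manifest with the JDK's `java.util.jar` classes. It is not jibbit's code: `class-path-manifest`, the main class name and the `lib/...` paths are all made up for illustration.

```clojure
(require '[clojure.string :as str])
(import '(java.util.jar Manifest Attributes$Name))

(defn class-path-manifest
  "Hypothetical helper: builds a jar manifest whose Class-Path entry lists
  the dependency jars, so `java -jar app.jar` can find them without an uberjar."
  [main-class jar-paths]
  (let [manifest (Manifest.)
        attrs    (.getMainAttributes manifest)]
    ;; Manifest-Version must be present for the manifest to be written out.
    (.put attrs Attributes$Name/MANIFEST_VERSION "1.0")
    (.put attrs Attributes$Name/MAIN_CLASS main-class)
    ;; Class-Path entries are space-separated and resolved relative to the app jar.
    (.put attrs Attributes$Name/CLASS_PATH (str/join " " jar-paths))
    manifest))

;; Assumed layout: dependency jars live in one image layer under lib/,
;; while the small application jar sits in its own layer.
(def manifest
  (class-path-manifest "my.app.main" ["lib/clojure.jar" "lib/some-dep.jar"]))
```

The appeal of this layout is that the dependency layer only changes when deps.edn changes, so most pushes only need to upload the thin application layer.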

👍 1
slimslenderslacks16:12:22

@U7PBP4UVA feel free to reach out if you have any issues. I've been using this to deploy a pretty broad set of clojure projects (mostly deploying to gcr and ecr so far) but my entrypoints are very boring (`java -jar app.jar` with a configurable namespace for the -main) - I sort of anticipate that this might not be sufficient. Also happy to add some more authenticators for image pushes - I now know more than I'd like to about registry authentication so I'd like that to be useful somehow.

orestis16:12:06

I'm still evaluating containers for production etc but in our case we're on AWS so ECR etc.

slimslenderslacks17:12:40

Cool, the author of lein-jib-build had already worked through most of the details for authenticating to ECR via environment variables, aws profiles, assume role, ... It's working well in practice.

Ivar Refsdal10:12:39

Neat project! I would recommend adding "-Dclojure.main.report=stderr" "-Dfile.encoding=UTF-8" to the java command: the first prints helpful error messages to stderr on exceptions (instead of writing them to a file), and the second sets the default encoding to UTF-8 (IIRC this will affect e.g. (slurp (io/resource "some-classpath-file"))). I would also advise against using plain clojure.core/read-string, as it can execute code; use clojure.edn/read-string instead.
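
A small sketch of the read-string difference mentioned above. With the default `*read-eval*` setting, `clojure.core/read-string` honours the `#=` reader macro and evaluates the form, while `clojure.edn/read-string` rejects it. `safe-read` is a hypothetical wrapper written for this example.

```clojure
(require '[clojure.edn :as edn])

;; A string that could come from an untrusted config file or request.
(def untrusted "#=(+ 1 2)")

;; clojure.core/read-string evaluates the #= form while reading it.
(def evaluated (read-string untrusted)) ; => 3

(defn safe-read
  "Hypothetical wrapper: reads a string as EDN and returns :rejected
  instead of throwing when the input is not valid EDN."
  [s]
  (try
    (edn/read-string s)
    (catch Exception _
      :rejected)))

(def rejected (safe-read untrusted))      ; #= is not part of EDN
(def parsed   (safe-read "{:port 8080}")) ; plain data reads fine
```

Here the evaluated form is harmless arithmetic, but the same mechanism can run arbitrary code, which is why the EDN reader is the safer default for data.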

❤️ 1
slimslenderslacks15:12:00

@UGJE0MM0W thanks! Those do seem like better defaults - have added them now. Good catch on read-string too. Thanks again.

❤️ 1
eskos11:01:04

Those two have different semantics though, which is why https://github.com/clojure/tools.reader exists 🙂

Ivar Refsdal11:01:13

What exactly are you trying to say @U8SFC8HLP?
> Presuming you're reading typical edn data [on the JVM], clojure.edn is the preferred option.
Rough quote from Alex Miller: https://groups.google.com/g/clojure/c/d61ImK2VCag.