
What kind of combination are you thinking about, @conaw ?


Oh, and regarding your string question: I did something similar for finding regexps in a query language, which were enclosed by slashes and allowed backslash-escaped slashes inside. The regexp for this was so weird that I completely forgot how it worked, but here it is:

REGEXP = <'/'> #'(?:.(?!(?<![\\\\])/))+.?' <'/'>
(the grammar is defined in a Clojure string, thus the massive escaping)
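To see what that pattern does outside a grammar, here is a minimal sketch using plain Clojure `re-find`. As a regex literal the pattern needs only one level of escaping. The name `regexp-body` and the sample inputs are made up for illustration.

```clojure
;; The same pattern as a Clojure regex literal (single escaping instead
;; of the doubled escaping needed inside a grammar string).
;; Each character is consumed unless an unescaped slash follows it;
;; the trailing .? then picks up the last character before that slash.
(def regexp-body #"(?:.(?!(?<![\\])/))+.?")

;; Hypothetical inputs: the text that follows the opening slash.
(re-find regexp-body "ab+c/ rest")    ; match stops before the closing slash
(re-find regexp-body "a\\/b/ rest")   ; the backslash-escaped slash stays in the match
```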


not sure yet to be honest — I’d like to be doing POS tagging, and tokenizing, but really enjoying instaparse and curious if anyone has used it in conjunction with something like opennlp


I once did a workshop on Clojure with very basic NLP examples (it was at a faculty for computational linguistics), but I did not combine it with any existing NLP libraries. Here at work, the NLP stuff is mostly self-written as much of it predates the open source libs. And we do not (yet?) use Clojure in that area.


Hm, looks like I never polished that workshop to put it online somewhere. Sorry.


(enough boasting now; please excuse the self-plugging)


Not boasting at all, I appreciate the link.


Another thing: is there an idiomatic way to get the matched portion of a string for a given portion of a parse into the final transformed Clojure data?


I’m trying to parse the same text multiple times iteratively, passing the result to a different, more granular parser based on the first


basically I’m trying to split the text up using a parse


spans looks like


There is a :partial option but it only returns the parse tree as far as it could be parsed. Maybe the total mode would help? Can't say. Sorry.


@conaw, I just found the span function, which takes a parse tree (the result of parsing) and returns the start and end index into the string. So, you could first parse partially and then ask your input string for the covered substring.


Like this:

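;; assumes instaparse.core is required with the alias i
(let [s "abcd"
      g "Q='a' 'b'"
      p (i/parser g)
      t (p s :partial true)]   ; parse only as far as the grammar matches
  ;; i/span returns [start end] for the region covered by the tree
  (apply subs (into [s] (i/span t))))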


(sorry for the broken indentation)
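The same idea might extend to splitting a text with a coarse parse and handing each piece to a finer parser. What follows is only a sketch: the grammar, the names `coarse` and `node-text`, and the sample string are assumptions for illustration, and it relies on instaparse attaching span metadata to each non-terminal node of the default hiccup output.

```clojure
(require '[instaparse.core :as insta])

;; Hypothetical coarse grammar: a sequence of words and whitespace runs.
(def coarse
  (insta/parser
    "S = (word | ws)+
     word = #'[a-z]+'
     ws = #'\\s+'"))

(defn node-text
  "Returns the substring of s covered by a parse-tree node."
  [s node]
  (apply subs s (insta/span node)))

(def pieces
  (let [s "ab cd"
        tree (coarse s)]
    ;; (rest tree) skips the :S tag and leaves the child nodes
    (mapv #(node-text s %) (rest tree))))
```

Each string in `pieces` could then be passed to a second, more granular parser.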