I'm working on my first project to scrape specific data out of human-readable text reports. I need to scan for a specific string and then apply some logic to material on either side of it. Is there a recommended library for that kind of thing?


hey @nathansmutz instaparse is a great library when you have to make a parser from a grammar, but for ordinary text, it may be that all you need are regular expressions. Are you familiar with those? Or even clojure.string/split may be enough.


hey fellow clojurians, i'm starting a new web project and, being fairly new to clojure in production, i could use a hint or two 😉 it's a website i need to rebuild, so i want to use the chestnut leiningen template with clojure/clojurescript, ring, compojure and om.


i need hints on best practices for project setup, e.g. all the samples i have seen use only a few routes on the server side and a single clojurescript file managing the whole app on the client side, but i need a lot more routes and views 😉


any pointers will be much appreciated


Thanks @ckarlsen and @jonahbenton. Looking at it, I see I can accomplish this with Clojure's built-in string stuff. I've been meaning to get deeper into regex. I guess now I've "got two problems" :-) I'm mining some converted PDFs for data on courses students need to graduate. I'm probably still thinking about it too procedurally, but what has to happen is: after a certain trigger phrase, look for the string "Still Needed:" and grab a number. Then, depending on whether the next word is "credits" or "class", either search to the left of "Still Needed:" for a phrase describing the type of credits, or search to the right of "Still Needed:" for course names. A course subject might be followed by multiple course numbers identifying multiple classes: "CS 161 and 162" etc.
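
The steps described above could be sketched with plain regular expressions roughly as follows. This is only a minimal sketch under assumptions: the function names `parse-still-needed` and `expand-courses` are hypothetical, the exact wording and layout of the real reports is not known here, course numbers are assumed to be three digits, and the "trigger phrase" step is left out.

```clojure
(require '[clojure.string :as str])

;; Assumed line shape: "<left text>Still Needed: <n> <credits|class...><right text>"
(def still-needed-re
  #"(.*?)Still Needed:\s*(\d+)\s+(credits?|class(?:es)?)\b(.*)")

(defn parse-still-needed
  "Returns a map describing one 'Still Needed:' line, or nil if the line
  does not match. For credits the detail comes from the text to the left;
  for classes it comes from the text to the right."
  [line]
  (when-let [[_ left n unit right] (re-find still-needed-re line)]
    (let [kind (if (str/starts-with? unit "credit") :credits :classes)]
      {:count  (Long/parseLong n)
       :kind   kind
       :detail (str/trim (if (= kind :credits) left right))})))

(defn expand-courses
  "Expands a subject followed by several numbers, e.g. \"CS 161 and 162\",
  into one entry per class. Assumes 2-4 uppercase letters for the subject
  and three-digit course numbers."
  [text]
  (for [[_ subject numbers]
        (re-seq #"([A-Z]{2,4})\s+(\d{3}(?:(?:\s*,\s*|\s+and\s+|\s+or\s+)\d{3})*)" text)
        number (re-seq #"\d{3}" numbers)]
    (str subject " " number)))
```

For a classes line, the `:detail` string from `parse-still-needed` can be passed straight to `expand-courses`. Lines that do not contain "Still Needed:" simply yield `nil`, which keeps the special cases in one place.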


@nathansmutz: have you looked into instaparse?


@dev-hartmann: I personally split up my UI into separate namespaces, then require them all in core and do my routing/general architecture there


@dev-hartmann: In fact, my core ns is pretty sparse, as its only real purpose is to bring all of the other pieces of the app together cohesively


so UI, state, routing, etc. all live in their own ns


I like doing this because each ns will have a couple functions that other ns's need, and the rest is hidden, so I only need to think about one ns at a time
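
A rough illustration of that layout, as a sketch only: the namespace names (`myapp.state`, `myapp.ui`, `myapp.core`) and every function in them are made up for this example, and the UI is reduced to a string so the snippet does not depend on Om or any other library. In a real project each ns would live in its own file; they are shown together here just to keep it short.

```clojure
;; --- state: owns the app state, exposes two functions ---
(ns myapp.state)

(defonce ^:private app-state (atom {:page :home}))

(defn current-page [] (:page @app-state))

(defn go-to! [page]
  (swap! app-state assoc :page page)
  page)

;; --- ui: one public function, helpers stay private ---
(ns myapp.ui
  (:require [myapp.state :as state]))

(defn- title-for [page]          ; hidden from other namespaces
  (case page
    :home  "Home"
    :about "About"
    "Not found"))

(defn render []                  ; the only function other namespaces need
  (str "<h1>" (title-for (state/current-page)) "</h1>"))

;; --- core: sparse, only wires the pieces together ---
(ns myapp.core
  (:require [myapp.state :as state]
            [myapp.ui :as ui]))

(defn show! [page]
  (state/go-to! page)
  (ui/render))
```

With `defn-` and `^:private`, the helpers cannot be called from other namespaces by accident, so only `current-page`, `go-to!`, `render` and `show!` form the surface you have to keep in your head.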


@futuro: thanks for your reply!


@nathansmutz: in processing messy text, such as converted PDFs, a quick-and-dirty, mechanical, procedural approach is usually best. functional processing opportunities usually only kick in once the data is clean and orderly enough to process without special cases.