#admin-announcements
2016-01-25
nathansmutz 02:01:13

I'm working on my first project to scrape specific data out of human-readable text reports. I need to scan for a specific string and then apply some logic to material on either side of it. Is there a recommended library for that kind of thing?

jonahbenton 02:01:43

hey @nathansmutz, instaparse is a great library when you need to build a parser from a grammar, but for ordinary text, regular expressions may be all you need. Are you familiar with those? Or even clojure.string/split may be enough.
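To make the regex/split suggestion concrete, here is a minimal sketch in Python (in Clojure the same idea would use `clojure.string/split` or `re-find`). It splits a line around a marker string so the text on either side can be handled separately, then uses a regular expression on the right-hand side. The marker "Still Needed:" comes from the use case described later in this thread; the sample line and the `split_around` helper are hypothetical.

```python
import re

def split_around(text, marker):
    """Return (before, after) around the first occurrence of marker, or None if absent."""
    before, found, after = text.partition(marker)
    if not found:
        return None
    return before.strip(), after.strip()

# Hypothetical report line, for illustration only.
line = "Major Electives Still Needed: 6 credits in upper-division CS"
parts = split_around(line, "Still Needed:")
print(parts)  # → ('Major Electives', '6 credits in upper-division CS')

# A regular expression can then pull a count and a unit out of the right-hand side.
m = re.match(r"(\d+)\s+(credits?|class(?:es)?)", parts[1])
print(m.group(1), m.group(2))  # → 6 credits
```

`str.partition` only looks at the first occurrence, which keeps the "either side of the marker" logic simple; a line with several markers would need `re.finditer` instead.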

dev-hartmann 07:01:20

Hey fellow Clojurians, I'm starting a new web project, and since I'm fairly new to Clojure in production I could use a hint or two 😉 It's a website I need to rebuild, so I want to use the chestnut Leiningen template with Clojure/ClojureScript, Ring, Compojure and Om.

dev-hartmann 07:01:44

I need hints on best practices for project setup. E.g., all the samples I have seen use only a few routes on the server side and a single ClojureScript file managing the whole app on the client side, but I need a lot more routes and views 😉

dev-hartmann 07:01:46

Any pointers would be much appreciated!

nathansmutz 08:01:36

Thanks @ckarlsen and @jonahbenton. Looking at it, I see I can accomplish this with Clojure's built-in string functions. I've been meaning to get deeper into regex; I guess now I've "got two problems" :-) I'm mining some converted PDFs for data on the courses students need to graduate. I'm probably still thinking about it too procedurally, but what has to happen is: after a certain trigger phrase, look for the string "Still Needed:" and grab a number. Then, depending on whether the next word is "credits" or "class", either search to the left of "Still Needed:" for a phrase describing the type of credits, or search to the right of "Still Needed:" for course names. A course subject might be followed by multiple course numbers identifying multiple classes, e.g. "CS 161 and 162".
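A rough, procedural sketch of those steps, written in Python rather than Clojure. Everything specific here is an assumption for illustration: the trigger phrase `"Degree Requirements"`, the sample report, the idea that a credits line's category is the text to the left of "Still Needed:" on the same line, and the course format of a 2 to 4 letter subject followed by 3-digit numbers joined by "and", "or" or commas. Real PDF-converted text will almost certainly need more special cases.

```python
import re

# Hypothetical trigger phrase; the real one depends on the reports.
TRIGGER = "Degree Requirements"

# "Still Needed:" + a count + the word that decides what to do next.
# IGNORECASE also tolerates "Still needed:".
STILL_NEEDED = re.compile(r"Still Needed:\s*(\d+)\s+(credits?|class(?:es)?)\b", re.IGNORECASE)

# Assumed course format: subject code, then numbers joined by "and"/"or"/commas,
# e.g. "CS 161 and 162".
COURSES = re.compile(r"\b([A-Z]{2,4})\s+(\d{3}(?:\s*(?:,|and|or)\s*\d{3})*)")

def expand_courses(text):
    """Turn 'CS 161 and 162' into ['CS 161', 'CS 162']."""
    result = []
    for subject, numbers in COURSES.findall(text):
        for number in re.findall(r"\d{3}", numbers):
            result.append(f"{subject} {number}")
    return result

def parse_requirements(report):
    start = report.find(TRIGGER)
    if start == -1:
        return []  # trigger phrase not present: nothing to extract
    requirements = []
    for line in report[start + len(TRIGGER):].splitlines():
        m = STILL_NEEDED.search(line)
        if not m:
            continue
        count, unit = int(m.group(1)), m.group(2).lower()
        if unit.startswith("credit"):
            # Credits: the category description sits to the left of "Still Needed:".
            category = line[:m.start()].strip()
            requirements.append({"count": count, "unit": "credits", "category": category})
        else:
            # Classes: the course names sit to the right of the match.
            courses = expand_courses(line[m.end():])
            requirements.append({"count": count, "unit": "class", "courses": courses})
    return requirements

# Hypothetical sample report.
report = """Header text
Degree Requirements
Upper Division Electives Still Needed: 12 credits
Core Sequence Still Needed: 2 classes in CS 161 and 162
"""

for req in parse_requirements(report):
    print(req)
# → {'count': 12, 'unit': 'credits', 'category': 'Upper Division Electives'}
# → {'count': 2, 'unit': 'class', 'courses': ['CS 161', 'CS 162']}
```

The regexes use only constructs that Java's regex engine also supports, so they should carry over to Clojure's `re-find`/`re-seq` with `(?i)` in place of `re.IGNORECASE`.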

swizzard 18:01:33

@nathansmutz: have you looked into instaparse?

futuro 19:01:36

@dev-hartmann: I personally split up my UI into separate namespaces, then require them all in core and do my routing/general architecture there

futuro 19:01:12

@dev-hartmann: In fact, my core ns is pretty sparse, as its only real purpose is to bring all of the other pieces of the app together cohesively

futuro 19:01:35

so UI, state, routing, etc. all live in their own ns

futuro 19:01:45

I like doing this because each ns will have a couple of functions that other ns's need, and the rest is hidden, so I only need to think about one ns at a time

dev-hartmann 19:01:52

@futuro: thanks for your reply!

jonahbenton 19:01:10

@nathansmutz: in processing messy text, such as in converted PDFs, a q&d mechanical procedural approach is usually best. functional processing opportunities usually only kick in when data is nice and clean and orderly and amenable to processing without special conditions.