off-topic 2020-07-21 | Slack Archive

Aron 09:07:20

I understand the reasoning but it still seems weird that when I search for a phrase containing the word "property" on the clojurescript website, it also finds results with no property but with "properly" in them. It doesn't do that if I just search for "property" alone.

jlmr 15:07:24

I’m looking for some advice on how to transform complex xml documents to a clear edn format. I know about the various libraries to parse xml and have played around with some of them (clojure.data.xml, tupelo.forest, etc) and I get how to extract specific parts of the xml. However the specification for the xml I’m dealing with (if you can call it a specification…) allows for a lot of variation. So different documents will have the same information at slightly different locations/paths. Also some documents don’t follow the spec completely. Has anyone dealt with similar problems? Is there a smart way to tackle this?

noisesmith 16:07:57

one big-picture approach to try: use a normal library (e.g. clojure.data.xml) to turn the xml into a tree of clojure data, then use tree-seq plus filter to find subtrees matching some pattern. Using the args to tree-seq you can "prune" subtrees (e.g. if the parent is a comment form, don't search its child nodes for data; or only look for certain tags inside specific relevant parent tags)
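A minimal sketch of the tree-seq idea, not code from this conversation. The sample document, the tag names, and the helpers `branch?`, `children`, and `customer-names` are made up for illustration. The nodes are written by hand in the `:tag`/`:attrs`/`:content` shape that parsers such as clojure.data.xml produce, so the snippet runs without extra dependencies.

```clojure
;; Hypothetical parsed document: the same kind of information
;; (a customer name) appears at two different depths.
(def doc
  {:tag :order :attrs {}
   :content [{:tag :meta :attrs {}
              :content [{:tag :name :attrs {} :content ["internal"]}]}
             {:tag :customer :attrs {}
              :content [{:tag :name :attrs {} :content ["Ada"]}]}
             {:tag :items :attrs {}
              :content [{:tag :customer :attrs {}
                         :content [{:tag :name :attrs {} :content ["Bob"]}]}]}]})

(defn branch?
  "Descend only into element nodes, and prune everything under :meta."
  [node]
  (and (map? node) (not= :meta (:tag node))))

(defn children
  "Element children of a node (text nodes are skipped)."
  [node]
  (filter map? (:content node)))

(defn customer-names
  "Text of every :name directly inside a :customer, wherever it sits."
  [root]
  (for [node  (tree-seq branch? children root)
        :when (= :customer (:tag node))
        child (children node)
        :when (= :name (:tag child))]
    (apply str (:content child))))

(customer-names doc)
;; => ("Ada" "Bob")
```

The `:name` under `:meta` is never reached because `branch?` refuses to descend into it, which is the "pruning" mentioned above.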

noisesmith 16:07:19

that said I find xml frustrating (I think we all do), and I've never been fully satisfied with how I've handled xml data

dominicm 17:07:02

You might find Tim Baldridge's Odin useful

jlmr 20:07:59

Thanks for the tips. The last idea I had was using a zipper to walk depth-first through the doc, keeping track of where you are in the tree and matching on patterns like you said @noisesmith. That way I can build up the simpler tree I want to get out of the data.
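A rough sketch of such a zipper walk, using clojure.zip from the standard library. The sample document and the helpers `tag-path` and `leaf-values` are hypothetical, not something posted in this thread. The idea is to record each text leaf under the tag path where it was found, so that variant locations can later be mapped onto one key.

```clojure
(require '[clojure.zip :as zip])

;; Hypothetical parsed document in the :tag/:content shape.
(def person
  {:tag :person
   :content [{:tag :name :content ["Ada"]}
             {:tag :address
              :content [{:tag :city :content ["London"]}]}]})

(defn tag-path
  "Tags of all ancestors of loc, followed by the tag of loc itself."
  [loc]
  (conj (mapv :tag (zip/path loc)) (:tag (zip/node loc))))

(defn leaf-values
  "Walk the tree depth-first and collect the text of every element
   whose content is only strings, keyed by its tag path."
  [root]
  (loop [loc (zip/xml-zip root)
         acc {}]
    (if (zip/end? loc)
      acc
      (let [node (zip/node loc)
            text-leaf? (and (map? node)
                            (seq (:content node))
                            (every? string? (:content node)))]
        (recur (zip/next loc)
               (if text-leaf?
                 (update acc (tag-path loc) (fnil conj [])
                         (apply str (:content node)))
                 acc))))))

(leaf-values person)
;; => {[:person :name] ["Ada"], [:person :address :city] ["London"]}
```

Because the result is a plain map from paths to values, normalizing documents that put the same information at slightly different paths becomes a matter of renaming keys.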

noisesmith 20:07:51

yeah - the difference between a zipper and tree-seq is that with tree-seq you get every subtree in a lazy-seq and can keep the ones you think are relevant (you get the subtree itself, not the path that led to it), while with a zipper you have a place-oriented navigation API. Both can accomplish the same thing; I am prejudiced toward values over places

jlmr 21:07:43

I think I see what you mean about using tree-seq, although I guess it could be quite finicky to rebuild parts of the tree. But I will try again tomorrow.

noisesmith 21:07:15

yeah - that approach is about extracting values, rather than transforming the tree

flowthing 18:07:17

Using XPath (e.g. via https://github.com/eerohele/sigel) might also be an option.
