I understand the reasoning but it still seems weird that when I search for a phrase containing the word "property" on the clojurescript website, it also finds results with no property but with "properly" in them. It doesn't do that if I just search for "property" alone.


I’m looking for some advice on how to transform complex xml documents to a clear edn format. I know about the various libraries to parse xml and have played around with some of them (tupelo.forest, etc.) and I get how to extract specific parts of the xml. However, the specification for the xml I’m dealing with (if you can call it a specification…) allows for a lot of variation, so different documents will have the same information at slightly different locations/paths. Also, some documents don’t follow the spec completely. Has anyone dealt with similar problems? Is there a smart way to tackle this?


one big picture approach to try: use normal libraries to turn xml into a tree of clojure data, then use tree-seq plus filter to find subtrees matching some pattern. Using the args to tree-seq you can "prune" subtrees (eg. if the parent is a comment form, don't search into the child nodes for data, or only look for certain tags inside specific relevant parent tags...)
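A rough sketch of that idea, assuming the xml is parsed with the built-in clojure.xml (any parser producing {:tag :attrs :content} maps should work the same way). The documents, the tag names and the helpers parse-str, subtrees and customer-names are all made up for illustration:

```clojure
(require '[clojure.xml :as xml])
(import '(java.io ByteArrayInputStream))

;; Hypothetical helper: parse an xml string into {:tag :attrs :content} maps.
(defn parse-str [s]
  (xml/parse (ByteArrayInputStream. (.getBytes ^String s "UTF-8"))))

;; Two made-up documents carrying the same information at different paths.
(def doc-a
  (parse-str (str "<order><customer><name>Ann</name></customer>"
                  "<meta><customer><name>ignored</name></customer></meta></order>")))
(def doc-b
  (parse-str "<order><party><customer><name>Bob</name></customer></party></order>"))

;; Every subtree as a lazy seq. The branch? argument prunes :meta elements:
;; the :meta node itself is still returned, but its children are never visited.
(defn subtrees [root]
  (tree-seq (fn [node] (and (map? node) (not= :meta (:tag node))))
            :content
            root))

;; Find the text of every :name directly under a :customer, wherever it sits.
(defn customer-names [root]
  (for [node  (subtrees root)
        :when (and (map? node) (= :customer (:tag node)))
        child (:content node)
        :when (and (map? child) (= :name (:tag child)))]
    (first (:content child))))

(customer-names doc-a) ;; => ("Ann")
(customer-names doc-b) ;; => ("Bob")
```

The matching only looks at a node and its direct children, so it does not care how deep the :customer element is nested in a given document.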


that said I find xml frustrating (I think we all do), and I've never been fully satisfied with how I've handled xml data


You might find Tim Baldridge's Odin useful


Thanks for the tips. Last idea I had was using a zipper to walk depth first through the doc, keeping track of where you are in the tree and matching on patterns like you said @U051SS2EU. That way I can build up the simpler tree I want to get out of the data.


yeah - the difference between a zipper and tree-seq is that with tree-seq you get every subtree in a lazy-seq, and can filter out the ones you think are relevant based on the full path, and with a zipper you have a place-oriented navigation API. Both can accomplish the same thing, I am prejudiced toward values over places
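For comparison, a minimal sketch of the zipper variant using clojure.zip: walk depth first, look at the path of tags leading to the current location, and build up a flat map from it. The sample data, the rules map and the helpers tag-path, ends-with? and extract are hypothetical, not from any library:

```clojure
(require '[clojure.zip :as zip])

;; Made-up document in the shape clojure.xml produces.
(def doc
  {:tag :order :attrs nil
   :content [{:tag :party :attrs nil
              :content [{:tag :customer :attrs nil
                         :content [{:tag :name :attrs nil :content ["Bob"]}]}]}
             {:tag :total :attrs nil :content ["42"]}]})

;; Tags from the root down to (and including) the current location.
(defn tag-path [loc]
  (conj (mapv :tag (zip/path loc)) (:tag (zip/node loc))))

(defn ends-with? [path suffix]
  (and (>= (count path) (count suffix))
       (= suffix (subvec path (- (count path) (count suffix))))))

;; rules maps a path suffix to the key to use in the output map.
(defn extract [root rules]
  (loop [loc (zip/xml-zip root)
         acc {}]
    (if (zip/end? loc)
      acc
      (let [node (zip/node loc)
            k    (when (map? node)
                   (some (fn [[suffix k]]
                           (when (ends-with? (tag-path loc) suffix) k))
                         rules))]
        (recur (zip/next loc)
               (if k (assoc acc k (first (:content node))) acc))))))

(extract doc {[:customer :name] :customer-name
              [:total]          :total})
;; => {:customer-name "Bob", :total "42"}
```

Matching on path suffixes rather than full paths is one way to tolerate the same element showing up at slightly different depths.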


I think I see what you mean using tree-seq, although I guess it could be quite finicky to rebuild parts of the tree. But I will try again tomorrow


yeah - that approach is about extracting values, rather than transforming the tree


Using XPath might also be an option.
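A small sketch of what that could look like without any extra library, assuming plain Java interop with the JDK's javax.xml.xpath is acceptable. The helpers parse-dom and xpath-strings, the sample xml and the expression are made up for illustration:

```clojure
(import '(java.io ByteArrayInputStream)
        '(javax.xml.parsers DocumentBuilderFactory)
        '(javax.xml.xpath XPathConstants XPathFactory)
        '(org.w3c.dom Document NodeList))

;; Hypothetical helper: parse an xml string into a W3C DOM document.
(defn parse-dom ^Document [^String s]
  (-> (DocumentBuilderFactory/newInstance)
      (.newDocumentBuilder)
      (.parse (ByteArrayInputStream. (.getBytes s "UTF-8")))))

;; Evaluate an XPath expression and return the text of every matching node.
(defn xpath-strings [^Document doc ^String expr]
  (let [xpath           (.newXPath (XPathFactory/newInstance))
        ^NodeList nodes (.evaluate xpath expr doc XPathConstants/NODESET)]
    (mapv #(.getTextContent (.item nodes (int %)))
          (range (.getLength nodes)))))

;; "//customer/name" matches a name under a customer at any depth.
(xpath-strings
  (parse-dom "<order><party><customer><name>Bob</name></customer></party></order>")
  "//customer/name")
;; => ["Bob"]
```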