#meander
2020-07-01
jlmr08:07:13

Hi, I’m hoping to use meander to extract the relevant information from some XML. I’ve used clojure.data.xml to parse the following XML. I would like to extract some fields for each of the :tag :record records. I don’t expect a fully formed pattern but it would be great if someone could point me in the right direction.

noprompt08:07:49

You could start with m/$ which is kind of like jQuery:

;; Assuming `data` is the data you provided.
(m/search data
  (m/$ {:tag :record :as ?data})
  ?data)
;; =>
({:attrs {},
  :content
  ({:attrs {:status "deleted"},
    :content
    ({:attrs {},
      :content ("l4l:oai:…"),
      :tag :identifier}
     {:attrs {}, :content ("2019-03-30T00:07:07Z"), :tag :datestamp}
     {:attrs {}, :content ("l4l"), :tag :setSpec}),
    :tag :header}),
  :tag :record}
 ,,,)
This will find all the {:tag :record} maps. 👍

jlmr09:07:08

@U06MDAPTP thanks! But then I would want to dive deeper into the records, extract only the relevant fields, and put them into a flat Clojure map: {:title <extracted title> :description <extracted description> :more :fields :like :this}. How would I go about that?

jlmr09:07:11

I’ve gotten as far as:

(m/search xml
    (m/$ {:tag :record
          :content (m/$ {:tag :title
                         :content (m/$ {:tag :langstring
                                        :content (?title)})})})
    {:title ?title})

jlmr09:07:39

Is this the best way to continue?

noprompt09:07:16

You could do that as well. However, if you know the data you are interested in exists in a certain location in the :content you can simply draw that as a pattern:

{:tag :record 
 :content (m/scan {:tag :title :content (?title)})}

noprompt09:07:47

Judging from the data, I think I would recommend separating that out as a separate step rather than doing it all in the pattern match.

jlmr09:07:47

Ok, I will play with it some more, thanks for the tips!

jlmr11:07:50

@U06MDAPTP one more question if you have time:

(defn record
  [record]
  (m/find record
    (m/separated {:tag :metadata
                  :content (m/scan {:tag :lom
                                    :content (m/separated {:tag :general
                                                           :content (m/separated {:tag :title
                                                                                  :content (m/scan {:content ?title})}
                                                                                 {:tag :description
                                                                                  :content ?description})})})})
    {:title ?title
     :description ?description}))
This is how far I’ve gotten. It mostly works, but I would expect (m/scan {:content ?title}) to return a sequence of all the titles. The data for that particular element looks like this:
({:attrs {:xml/lang "en"}
  :content ("English title")
  :tag :langstring}
 {:attrs {:xml/lang "nl"}
  :content ("Dutch title")
  :tag :langstring})

noprompt17:07:09

I think you want to switch from find to search to yield all the results.

👍 3
markaddleman23:07:19

Is there a more idiomatic way to write

(m/rewrite query
    {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
    {:source {:scope {:type          "apps"
                      :apps          ?apps
                      :segment-query {:where {:op   "in"
                                              :args [{:path ["event-attr" "appKey"]}
                                                     {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}
                      &              ?scope-rest}
              &      ?source-rest}
     &       ?rest})
In deeply nested maps like this, it's mildly cumbersome to manage all of the & xyz terms

Jimmy Miller23:07:19

I don’t know of any different way to express that rewrite. But you can put the & ?rest terms at the beginning if that helps you keep track of them better.

(m/rewrite query
  {:source {:scope {:type "apps" :apps (m/some ?apps) & ?scope-rest} & ?source-rest} & ?rest}
  {& ?rest
   :source 
   {& ?source-rest
    :scope 
    {& ?scope-rest
     :type "apps"
     :apps ?apps
     :segment-query {:where {:op "in"
                             :args [{:path ["event-attr" "appKey"]}
                                    {:op "UNNEST" :args [{:op "ARRAY" :args ?apps}]}]}}}}})

Jimmy Miller23:07:43

Other than syntactic things though, I can’t think of anything I would do differently.

markaddleman23:07:15

Thanks. Yeah, reordering will help a bit.

markaddleman23:07:01

My use case has a lot of this sort of thing. I was wondering if there is broad value in a special kind of rewrite, something like rewrite-merge, where the syntax favors merging new information into the map.