meander

Ben Sless 2022-02-09T11:40:10.175019Z

I'm rewriting HTML I parsed with crouton and trying to collect the text from within nested tags, but I'm missing something in the flattening part. What's missing in this pattern?

{:tag (m/or :code :span :p :div :em :a)
 :content (m/some [!x ...])}

[(m/cata !x) ...]

Richie 2022-02-09T12:57:32.334349Z

I started writing code like https://clojurians.slack.com/archives/C03S1KBA2/p1644357312001629 instead of cata. It lets me debug it more easily.

Ben Sless 2022-02-09T13:01:17.713699Z

I'd still rather do it with cata...

Richie 2022-02-09T13:22:29.207949Z

It’s not a list, is it?

(m/rewrite '(1 2 3)
  [!x ...] [!x ...])
;; nil

Ben Sless 2022-02-09T13:32:18.483299Z

No, it's nested parsed HTML

Richie 2022-02-09T13:34:05.146049Z

Sorry, I’m not sure I understood your answer. If the value of :content isn’t a vector, then [!x ...] won’t match.

(m/rewrite '(1 2 3)
  (m/seqable !x ...) [!x ...])
;; [1 2 3]
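A minimal plain-Clojure illustration of Richie's point (clojure.core only, no meander; assumes Clojure 1.9+ for seqable?): the quoted list from the examples above answers false to vector? and true to seqable?, which is consistent with [!x ...] returning nil while (m/seqable !x ...) matches.

```clojure
;; Plain clojure.core only (no meander). A quoted list is seqable but is
;; not a vector, so a vector-shaped pattern has nothing to match.
(def xs '(1 2 3))

(vector? xs)        ;; => false
(seqable? xs)       ;; => true
(vector? (vec xs))  ;; => true, once converted with vec
```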

Ben Sless 2022-02-09T13:54:31.324239Z

The value of :content is a vector whose items are strings or maps, and those maps contain more :content.

2022-02-09T14:09:08.499869Z

I recommend adding a key as the first element of the argument you pass to cata; the key tells us what should happen.

Ben Sless 2022-02-09T14:10:21.327899Z

Not sure I follow. Specifically, I'm parsing an HTML table:

(m/rewrite (second tables)

  {:tag :table
   :content [(m/cata !m) ...]}
  [!m ...]

  {:tag :tbody :content nil} {}

  {:tag :thead
   :content
   [{:content
     [{:content [?parameter]}
      {:content [?description]}]}]}

  {:parameter ?parameter
   :description ?description}

  {:tag :tbody
   :content [!tr ...]}

  [(m/cata !tr) ...]

  {:tag :tr
   :content
   [{:tag :td :content [?key ?desc]}
    {:tag :td :content [(m/cata !doc) ...]}]}

  {:field (m/cata ?key)
   :type (m/cata ?desc)
   :doc [!doc ...]}

  {:tag (m/or :a :code :span :p :div :em :ul :li :i)
   :content (m/some [(m/cata !xs) ...])}
  [!xs ...]

  {:tag (m/or :a :code :span :p :div :em :ul :li :i)} nil

  {:tag :br} "\n"

  ?x ?x)

2022-02-09T14:10:59.967959Z

{:tag (m/or :code :span :p :div :em :a)
 :content (m/some [(m/cata [:flatten !x]) ...])}
[:flatten [!xs ...]]
[!xs ...]
[:flatten ?x]
?x

Ben Sless 2022-02-09T14:11:23.479559Z

ah, tag the data

2022-02-09T14:11:55.529429Z

This way, every time you use cata anywhere, it will either unpack the vector or return the argument unchanged.

2022-02-09T14:12:11.759149Z

Much easier to debug and read, IMHO.
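A rough plain-Clojure analogue of tagging the data, offered as a sketch rather than as how meander works internally: rw is a hypothetical multimethod that dispatches on the first element of its [tag value] argument, so one recursive entry point can either collect a node's text or unpack nested vectors. The :node tag is invented for this illustration; only :flatten comes from the thread.

```clojure
;; Hypothetical plain-Clojure analogue (no meander) of tagging the data
;; passed to a recursive call: every call is a [tag value] vector and the
;; tag alone decides what the call does.
(defmulti rw first)

;; [:node m]: collect the text inside a crouton-style node, i.e. a map
;; whose :content holds strings and further nodes.
(defmethod rw :node [[_ m]]
  (rw [:flatten (mapv (fn [x] (if (map? x) (rw [:node x]) x))
                      (:content m))]))

;; [:flatten xs]: unpack nested vectors one level and keep any other
;; item unchanged.
(defmethod rw :flatten [[_ xs]]
  (vec (mapcat (fn [x] (if (vector? x) x [x])) xs)))

(rw [:node {:tag :p
            :content ["Yes" {:tag :p :content ["No"]}]}])
;; => ["Yes" "No"]
```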

Ben Sless 2022-02-09T14:12:49.099669Z

I'm missing the RHS for the map example.

2022-02-09T14:14:49.912479Z

Can you give a piece of HTML so we're working with the same input?

2022-02-09T14:15:54.100589Z

BTW, try this:

{:tag :table
 :content [& [(m/cata !m) ...]]}
[!m ...]

2022-02-09T14:16:09.648139Z

It should work like into.

Ben Sless 2022-02-09T14:17:22.583969Z

Even a tiny example like this:

{:tag :p
 :content
 [{:tag :p
   :content
   [{:tag :p
     :content ["Hello"]}
    {:tag :p
     :content ["world"]}]}
  {:tag :p
   :content
   [{:tag :p
     :content ["Yes"
               {:tag :p
                :content ["No"]}]}]}]}

2022-02-09T14:26:39.038269Z

(m/rewrite data

  {:tag :table
   :content [& [(m/cata !m) ...]]}
  [!m ...]

  {:tag :tbody :content nil} {}

  {:tag :thead
   :content
   [{:content
     [{:content [?parameter]}
      {:content [?description]}]}]}

  {:parameter ?parameter
   :description ?description}

  {:tag :tbody
   :content [!tr ...]}

  [& [(m/cata !tr) ...]]

  {:tag :tr
   :content
   [{:tag :td :content [?key ?desc]}
    {:tag :td :content [& [(m/cata !doc) ...]]}]}

  {:field (m/cata ?key)
   :type (m/cata ?desc)
   :doc [!doc ...]}

  {:tag (m/or :a :code :span :p :div :em :ul :li :i)
   :content (m/some [(m/cata !xs) ...])}
  (m/cata [!xs ...])

  {:tag (m/or :a :code :span :p :div :em :ul :li :i)} nil

  {:tag :br} "\n"

  (m/with [%a (m/some !xs)
           %b [(m/or %b %a) ...]
           %c (m/or %b %a)]
    %c)
  [!xs ...]
  ?x ?x)
;; => ["Hello" "world" "Yes" "No"]

2022-02-09T14:28:21.942569Z

This works for now, but it can be done better.

Ben Sless 2022-02-09T14:29:10.564419Z

👍

Ben Sless 2022-02-09T19:44:00.946319Z

This + hiccup example:

;; Collect all content
(m/with [%p {:content [(m/or (m/pred string? !s) %p %q) ...]}
         %q {:content nil}]
  %p)
[!s ...]

Ben Sless 2022-02-09T19:44:16.080149Z

This collects all the strings directly, with no need to collect and then flatten.
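For comparison, the same single-pass collection can be sketched without meander using clojure.core/tree-seq, treating maps as branches and their :content as children. This swaps in a different technique; collect-strings and sample (Ben's tiny example from earlier) are names introduced here, and the expected result matches the one reported above.

```clojure
;; Plain clojure.core sketch (no meander): walk the tree once, treating
;; maps as branches whose children are their :content, and keep only the
;; string leaves. Nothing is collected into nested vectors first.
(defn collect-strings [node]
  (->> (tree-seq map? :content node)
       (filter string?)
       vec))

(def sample
  {:tag :p
   :content
   [{:tag :p
     :content
     [{:tag :p :content ["Hello"]}
      {:tag :p :content ["world"]}]}
    {:tag :p
     :content
     [{:tag :p
       :content ["Yes" {:tag :p :content ["No"]}]}]}]})

(collect-strings sample)
;; => ["Hello" "world" "Yes" "No"]
```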

Ben Sless 2022-02-09T19:48:32.457899Z

Slightly more verbose, but clearer about what's going on:

(m/with [%s (m/pred string? !s)  ;; string ref: collect it into !s
         %q {:content nil}       ;; a node with empty content
         %c (m/or %s %p %q)      ;; a content item can be any of these
         %p {:content [%c ...]}] ;; mutually recursive with %c
  %p)
[!s ...]
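A hypothetical plain-Clojure analogue of the mutual recursion (not meander): the local functions item and node stand in for %c and %p, and a node whose :content is nil (like %q) contributes nothing. One difference is flagged in the comments: m/or fails on any other value, while this sketch skips it.

```clojure
;; Hypothetical plain-Clojure analogue (no meander) of the mutually
;; recursive refs: item plays the role of %c and node the role of %p.
(defn collect-strings* [root]
  (letfn [;; %c: a content item is a string (%s) or a node (%p / %q)
          (item [x]
            (cond
              (string? x) [x]       ; %s: keep the string
              (map? x)    (node x)  ; %p, or %q when :content is nil
              :else       []))      ; unlike m/or, skip instead of failing
          ;; %p: a node's strings are those of each item in its :content
          (node [m]
            (vec (mapcat item (:content m))))]
    (node root)))

(collect-strings* {:content ["a" {:content nil} {:content ["b"]}]})
;; => ["a" "b"]
```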
