#meander
2022-02-09
Ben Sless11:02:10

I'm rewriting HTML that I parsed with crouton and trying to collect the text from within nested tags, but I'm missing something in the flattening part. What's missing in this pattern?

{:tag (m/or :code :span :p :div :em :a)
   :content (m/some [!x ...])} 

[(m/cata !x) ...]

Richie12:02:32

I started writing code like https://clojurians.slack.com/archives/C03S1KBA2/p1644357312001629 instead of cata. It lets me debug it more easily.

Ben Sless13:02:17

I'd still rather do it with cata...

Richie13:02:29

It’s not a list, is it?

(m/rewrite '(1 2 3)
  [!x ...] [!x ...])
;; nil

Ben Sless13:02:18

No, it's nested parsed HTML

Richie13:02:05

Sorry, I’m not sure from your answer. If the value of :content isn’t a vector, then [!x ...] won’t match.

(m/rewrite '(1 2 3)
  (m/seqable !x ...) [!x ...])
;; [1 2 3]

Ben Sless13:02:31

The value of :content is a vector of strings or maps, and those maps contain more :content

ribelo14:02:08

I recommend passing a key to cata as the first element of its argument, which tells us what should happen to the value

Ben Sless14:02:21

Not sure I follow. Specifically, I'm parsing an HTML table:

(m/rewrite (second tables)

  {:tag :table
   :content [(m/cata !m) ...]}
  [!m ...]

  {:tag :tbody :content nil} {}

  {:tag :thead
   :content
   [{:content
     [{:content [?parameter]}
      {:content [?description]}]}]}

  {:parameter ?parameter
   :description ?description}

  {:tag :tbody
   :content [!tr ...]}

  [(m/cata !tr) ...]

  {:tag :tr
   :content
   [{:tag :td :content [?key ?desc]}
    {:tag :td :content [(m/cata !doc) ...]}]}

  {:field (m/cata ?key)
   :type (m/cata ?desc)
   :doc [!doc ...]}

  {:tag (m/or :a :code :span :p :div :em :a :ul :li :i)
   :content (m/some [(m/cata !xs) ...])} [!xs ...]

  {:tag (m/or :a :code :span :p :div :em :a :ul :li :i)} nil

  {:tag :br} "\n"

  ?x ?x)

ribelo14:02:59

{:tag (m/or :code :span :p :div :em :a)
 :content (m/some [(m/cata [:flatten !x]) ...])}
[:flatten [!xs ...]]
[!xs ...]
[:flatten ?x]
?x

Ben Sless14:02:23

ah, tag the data

ribelo14:02:55

this way, every time you use cata in any place, it will either unpack the vector or return the argument unchanged

ribelo14:02:11

much easier to debug and read code IMHO

Ben Sless14:02:49

I'm missing the rhs for the map example

ribelo14:02:49

can you give a piece of HTML so we have the same?

ribelo14:02:54

btw, try this

{:tag :table
   :content [& [(m/cata !m) ...]]}
  [!m ...]

ribelo14:02:09

it should work like into

Ben Sless14:02:22

Even a tiny example like

{:tag :p
 :content
 [{:tag :p
   :content
   [{:tag :p
     :content ["Hello"]}
    {:tag :p
     :content ["world"]}]}
  {:tag :p
   :content
   [{:tag :p
     :content ["Yes"
               {:tag :p
                :content ["No"]}]}]}]}

ribelo14:02:39

(m/rewrite data

  {:tag :table
   :content [& [(m/cata !m) ...]]}
  [!m ...]

  {:tag :tbody :content nil} {}

  {:tag :thead
   :content
   [{:content
     [{:content [?parameter]}
      {:content [?description]}]}]}

  {:parameter ?parameter
   :description ?description}

  {:tag :tbody
   :content [!tr ...]}

  [& [(m/cata !tr) ...]]

  {:tag :tr
   :content
   [{:tag :td :content [?key ?desc]}
    {:tag :td :content [& [(m/cata !doc) ...]]}]}

  {:field (m/cata ?key)
   :type (m/cata ?desc)
   :doc [!doc ...]}

  {:tag (m/or :a :code :span :p :div :em :a :ul :li :i)
   :content (m/some [(m/cata !xs) ...])} (m/cata [!xs ...])

  {:tag (m/or :a :code :span :p :div :em :a :ul :li :i)} nil

  {:tag :br} "\n"

  (m/with [%a (m/some !xs)
           %b [(m/or %b %a) ...]
           %c (m/or %b %a)]
    %c)
  [!xs ...]
  ?x ?x)
;; => ["Hello" "world" "Yes" "No"]

ribelo14:02:21

this works for now, but it can be done better

Ben Sless19:02:00

This + hiccup example:

;; Collect all content
  (m/with [%p {:content [(m/or (m/pred string? !s) %p %q) ...]}
           %q {:content nil}]
    %p)
  [!s ...]

Ben Sless19:02:16

collects all strings directly, so there's no need to collect and then flatten

Ben Sless19:02:32

Slightly more verbose but clearer what's going on:

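(m/rewrite data ;; e.g. the nested :p example above
  (m/with [%s (m/pred string? !s)   ;; a string, collected into !s
           %q {:content nil}        ;; a node with no content
           %c (m/or %s %p %q)       ;; what an element of a content vector can be
           %p {:content [%c ...]}]  ;; a node; recursion, mutual with %c
    %p)
  [!s ...])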

👍 1
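
Editor's sketch (not part of the thread): for checking a Meander result against a known-good answer, a plain-Clojure baseline can help. The code below is an assumption-laden illustration, not the approach anyone in the conversation used. It walks the tiny nested :p example from earlier and collects every string depth-first, left to right. The name `collect-strings` is hypothetical, and unlike the table rewrite above it does not turn `{:tag :br}` into "\n".

```clojure
;; Plain Clojure, no Meander. `collect-strings` is a hypothetical helper
;; name introduced for illustration only.
(def data
  {:tag :p
   :content
   [{:tag :p
     :content
     [{:tag :p :content ["Hello"]}
      {:tag :p :content ["world"]}]}
    {:tag :p
     :content
     [{:tag :p
       :content ["Yes"
                 {:tag :p :content ["No"]}]}]}]})

(defn collect-strings
  "Returns a vector of every string under `node`, depth-first, left to right.
  Maps are descended through :content; a missing or nil :content yields []."
  [node]
  (cond
    (string? node) [node]
    (map? node)    (into [] (mapcat collect-strings) (:content node))
    :else          []))

(collect-strings data)
;; => ["Hello" "world" "Yes" "No"]
```

The result matches the vector reported for the sample data earlier in the thread, so it can serve as a reference value when experimenting with the cata and m/with variants.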