#meander
2021-04-25
wilkerlucio 02:04:42

Is Meander appropriate for writing a tokenizer (start from a single string and break it into tokens)? If it is, what would a base for that look like?

Jimmy Miller 20:04:27

Sorry, meant to reply to this. We haven't focused much on text, so I'm not sure there would be much benefit in using Meander for this. Might be possible though.

JAtkins 20:04:41

Hard to say without looking at what you are doing, but I’d first look at Instaparse and maybe pair it with meander.

Jimmy Miller 20:04:41

Yeah, that is a pretty good combination.

wilkerlucio 21:04:57

thanks guys, I find Instaparse a bit too bulky for what I’m doing, because it’s not much of a syntax; it’s more like a text email that I’m trying to extract data from. I’ve been doing OK using just Meander and regex so far. One trick that made my life easier was to pre-parse the text and transform it into a “hiccup-like” structure, with one entry per line. This makes it easier to match on specific line numbers (when they make sense) and also avoids dealing with line breaks:

(defn text->hiccup [text]
  (into []
        (map-indexed
          #(vector (keyword (str "l" %)) %2))
        (-> text
            (str/split-lines))))
then I can match like this:
(-> (m/search hiccup
      (m/scan [:l0 ?store])
      {:riviera-delivery.order/store ?store}
      
      (m/scan [:l2 (m/re #"Pedido número: #(\d+)" [_ ?id])])
      {:riviera-delivery.order/id ?id}
      
      (m/scan [_ (m/re #".*Total Geral: R\$(\d+,\d+).*" [_ ?total])])
      {:riviera-delivery.order/total (u/parse-br-money ?total)}

      ;; lazy .*? so a multi-digit quantity such as "12 x" is captured whole
      (m/scan [_ (m/re #".*?(\d+) x \.\.\.\.\.\. (.+?) \.\.\.\.\.\. R\$(\d+,\d+).*" [_ ?q ?n ?p])])
      {:riviera-delivery.order/items
       [{:riviera-delivery.item/quantity ?q
         :riviera-delivery.item/price    (u/parse-br-money ?p)
         :riviera-delivery.item/name     ?n}]})
    (->> (apply merge-with into)))
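
The helper `u/parse-br-money` is not shown in the thread. A minimal sketch of what it might do, assuming it converts a Brazilian-format amount such as "1,10" into a BigDecimal (the body, the return type, and the handling of "." as a thousands separator are all guesses, not the author's code):

```clojure
(require '[clojure.string :as str])

(defn parse-br-money
  "Hypothetical stand-in for u/parse-br-money: \"1.234,56\" -> 1234.56M."
  [s]
  (-> s
      (str/replace "." "")   ; drop thousands separators (assumed format)
      (str/replace "," ".")  ; decimal comma -> decimal point
      (bigdec)))

(parse-br-money "1,10")
;; => 1.10M
```

Returning a BigDecimal rather than a double avoids rounding surprises when the item prices are later summed or compared with the order total.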

wilkerlucio 21:04:59

example text that I’m matching against:

Padaria Bella Riviera

Pedido número: #1619127436602

Status: Pedido Entregue

Wilker, você será notificado (a) à cada nova alteração de status.

_______________________________________________________________________________________________________

Produtos:


3 x ...... Pão Francês (1unid) ...... R$1,10

1 x ...... Pão Ciabata (1unid) ...... R$5,50
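
For reference, here is a sketch of what the `text->hiccup` function posted above would produce for the first lines of this sample. It assumes the blank lines between entries are really present in the email (the `:l2` match on the order number suggests so, but the archive's rendering may have altered them); the `sample` var is made up for illustration.

```clojure
(require '[clojure.string :as str])

;; Same function as in the message above, repeated so the example is self-contained.
(defn text->hiccup [text]
  (into []
        (map-indexed #(vector (keyword (str "l" %1)) %2))
        (str/split-lines text)))

;; Hypothetical excerpt of the email, with blank lines assumed to be real.
(def sample
  "Padaria Bella Riviera\n\nPedido número: #1619127436602\n\nStatus: Pedido Entregue")

(text->hiccup sample)
;; => [[:l0 "Padaria Bella Riviera"]
;;     [:l1 ""]
;;     [:l2 "Pedido número: #1619127436602"]
;;     [:l3 ""]
;;     [:l4 "Status: Pedido Entregue"]]
```

Each line becomes a `[keyword string]` pair, so a pattern can either pin a line by its position (`:l0`, `:l2`) or ignore the position with `_` and match on the content alone.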

noprompt 22:04:28

I’m planning to add m/str eventually for the purpose of matching/yielding strings (along with m/bytes) but I’m focused on hitting the zeta compiler goals I mentioned previously.

noprompt 22:04:44

But you could try something like this:

(me/rewrite "def foo { bar }"
  (me/re #"^\s*([{}])\s*(.*)" [_ ?brace ?tail])
  ([:brace ?brace] & (me/cata ?tail))

  (me/re #"^\s*def\s(.*)" [_ ?tail])
  ([:def] & (me/cata ?tail))

  (me/re #"^\s*([a-zA-Z]+)\s*(.+)" [_ ?identifier ?tail])
  ([:identifier ?identifier] & (me/cata ?tail))

  (me/re #"\s*")
  ()

  ?unknown
  ([:unknown ?unknown]))
;; =>
([:def]
 [:identifier "foo"]
 [:brace "{"]
 [:identifier "bar"]
 [:brace "}"])

🙏 2
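
To make the recursion in that snippet easier to follow, here is a rough plain-Clojure equivalent without Meander. It is only a sketch: it assumes `me/re` matches the whole string the way `re-matches` does, and the function name `tokenize` is made up. Each clause peels one token off the front and recurses on the tail, which is the role `me/cata` plays in the rewrite rules.

```clojure
(defn tokenize
  "Hypothetical plain-Clojure version of the rewrite rules above."
  [s]
  (or
    ;; a single brace, then the rest of the input
    (when-let [[_ brace tail] (re-matches #"^\s*([{}])\s*(.*)" s)]
      (cons [:brace brace] (tokenize tail)))
    ;; the keyword def followed by whitespace
    (when-let [[_ tail] (re-matches #"^\s*def\s(.*)" s)]
      (cons [:def] (tokenize tail)))
    ;; an identifier followed by at least one more character
    (when-let [[_ ident tail] (re-matches #"^\s*([a-zA-Z]+)\s*(.+)" s)]
      (cons [:identifier ident] (tokenize tail)))
    ;; only whitespace left: stop (an empty list is truthy in Clojure)
    (when (re-matches #"\s*" s) ())
    ;; anything else is reported as-is
    (list [:unknown s])))

(tokenize "def foo { bar }")
;; => ([:def] [:identifier "foo"] [:brace "{"] [:identifier "bar"] [:brace "}"])
```

As in the original rules, clause order matters: the `def` clause has to come before the identifier clause, otherwise `def` would be tokenized as an identifier.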
wilkerlucio 22:04:16

thanks for the snippet, I can see a parser from it 🙂

👍 3