#meander
2020-03-02
noprompt 04:03:25

[meander/epsilon "0.0.402"]

grounded_sage 18:03:16

So I have meandered on over to tech.ml.dataset for processing the column-wise data that is CSV. Sorely missing the clarity of Meander patterns.

noprompt 18:03:07

I’m sorry. 🙂

grounded_sage 18:03:26

As someone with a 10,000-foot view and little idea about the internals, I’m wondering if there is room for a new ns in Meander to handle large column data, or something like a meander-csv that could leverage the codebase inside tech.ml.dataset.

noprompt 20:03:47

I’m hoping that zeta will lay the groundwork for doing that: being able to deal with bytes, etc., and then building more sophisticated matching on top.

grounded_sage 22:03:23

Cool. At present I’m dealing with two CSVs, one with 1.5M rows and another with 300k. The dataset library handles them extremely well. It’s just a little lower level than I would like to be working at, having seen how nice things can be :)

noprompt 23:03:44

The goal (for me) is to be able to achieve low-level performance from the comfort of a high-level interface. I believe we can get there via the right pattern matching primitives and pattern aliases (`defsyntax`). So, what I’m saying is, this use case is motivating for me too. 🙂

👍 4
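
For context, a minimal sketch of what a pattern alias via `defsyntax` can look like in epsilon today; the `non-archived` alias and the row shape it names are invented for illustration:

(require '[meander.epsilon :as me])

;; Hypothetical alias: name the "kept" row shape once so call sites
;; stay readable. The body returns the pattern form to expand to.
(me/defsyntax non-archived [pattern]
  `{:archived false, :completed true :as ~pattern})

(me/match {:archived false, :completed true}
  (non-archived ?story)
  ?story)
;; => {:archived false, :completed true}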
timothypratley 15:03:58

Hi! Just so I can understand the desire here, I’ll attempt to rephrase it as:

I wish Meander sequence patterns like `(!xs ...)` used transducers instead of memory variables.
^^ Is this accurate? i.e., is the issue that very long sequences don’t fit in memory? Or is it a different problem?

timothypratley 15:03:56

;; assumes (require '[meander.epsilon :as m]
;;                   '[meander.strategy.epsilon :as s])
(defn unarchived' [stories]
  (remove (fn [{:keys [archived completed]}]
            (and archived (not completed)))
          stories))

(def unarchived
  (s/rewrite
    ((m/or {:archived  true
            :completed false}
           !stories) ...)
    ;;>
    (!stories ...)))
^^ For a really big CSV, `!stories` needs to be a sequence, not an array. Conversely, when do we need an array rather than a sequence?
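
As a quick sanity check, a sketch (assuming the `m`/`s` aliases required above) showing that both formulations agree on a small sample:

(def sample
  '({:archived true,  :completed true}
    {:archived false, :completed true}
    {:archived true,  :completed false}))

(unarchived' sample)
;; => ({:archived true, :completed true} {:archived false, :completed true})

(unarchived sample)
;; => ({:archived true, :completed true} {:archived false, :completed true})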

grounded_sage 16:03:35

I’m still new to all of this so having some trouble keeping up. Though I am willing to dive in and contribute to this problem with a bit of guidance :)

noprompt 21:03:58

;; assumes (require '[meander.epsilon :as me])
(keep
 (fn [value]
   (me/rewrite value
     {:archived false, :completed true :as ?it}
     ?it
     ;; catch-all clause: rewrite non-matching rows to nil so keep drops
     ;; them (without it, rewrite throws a non-exhaustive match error)
     _ nil))
 '({:archived true, :completed true}
   {:archived false, :completed true}
   {:archived true, :completed false}
   {:archived false, :completed false}))
;; => 
({:archived false, :completed true})
would be decent.

noprompt 21:03:19

This also works

(me/rewrites '({:archived true, :completed true}
               {:archived false, :completed true}
               {:archived true, :completed false}
               {:archived false, :completed false})
  (me/scan {:archived false, :completed true :as ?it})
  ?it)
;; => 
({:archived false, :completed true})
but `rewrites` doesn’t support `cata`, FYI.
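
For anyone following along, a sketch of what `cata` adds to `rewrite`; the nested `:pair` data is invented for illustration:

;; me/cata re-applies the rewrite to a subterm, giving recursion.
;; This works in me/rewrite but is rejected by me/rewrites.
(me/rewrite [:pair 1 [:pair 2 3]]
  [:pair ?a ?b] {:left (me/cata ?a), :right (me/cata ?b)}
  ?leaf ?leaf)
;; => {:left 1, :right {:left 2, :right 3}}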

timothypratley 22:03:16

oh good thinking.

timothypratley 22:03:31

Does that help with the original question of “Meander to handle large column data”?

noprompt 22:03:59

It can. It just depends on what you are using. If you use a single `...` in a pattern, Meander has to apply pattern matching to everything in the collection in question. If you can rephrase the pattern in such a way that search becomes applicable, it’s nice to go that way.
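
A sketch of the difference on toy rows (invented data; assumes the `me` alias from above):

(def rows
  '({:archived true,  :completed true}
    {:archived false, :completed true}
    {:archived true,  :completed false}))

;; `...` forces a full traversal: every element must be accounted for
;; before the right-hand side runs.
(me/match rows
  ((me/or {:archived true, :completed false} !keep) ...)
  !keep)
;; => [{:archived true, :completed true} {:archived false, :completed true}]

;; Rephrased so search applies: each match is emitted independently.
(me/search rows
  (me/scan {:completed true :as ?hit})
  ?hit)
;; => ({:archived true, :completed true} {:archived false, :completed true})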

grounded_sage 10:03:14

Yeah, all of the interesting transformations, and where Meander has value for me, is when I use