Hello! I ran into a performance issue with the code below. The execution time seems to increase exponentially with every m<nr> I add on the LHS. The code below takes about 10 seconds to execute. Any ideas?

(m/rewrite {0 {:id 0
                   :m1 {:a {:id :a, :x 0}}
                   :m2 {:b {:id :b, :x 0}}
                   :m3 {:c {:id :c, :x 0}}}
                1 {:id 1
                   :m1 {:a {:id :a, :x 0}}
                   :m2 {:b {:id :b, :x 0}}
                   :m3 {:c {:id :c, :x 0}}}}
      (m/map-of _
                {:id !id
                 :m1 {& (m/seqable [!m1-ids !m1s] ..!m1-cnt)}
                 :m2 {& (m/seqable [!m2-ids !m2s] ..!m2-cnt)}
                 :m3 {& (m/seqable [!m3-ids !m3s] ..!m3-cnt)}})

      [{:id !id
        :m1-ids [!m1-ids ..!m1-cnt]
        :m2-ids [!m2-ids ..!m2-cnt]
        :m3-ids [!m3-ids ..!m3-cnt]
        } ...])


It seems to be the compilation that takes time.


I spent some time narrowing it down to test-m2 being slow compared to the others below:

;; fast
(defn test-m1 [x]
    (m/rewrite x
               (m/map-of _
                         {:id !id
                          :m1 [!m1s ...]
                          :m2 [!m2s ...]
                          :m3 [!m3s ...]
                          :m4 [!m4s ...]})
               [{:id !id
                 } ...]))
;; slow
(defn test-m2 [x]
    (m/rewrite x
               (m/map-of _
                         {:id !id
                          :m1 [[!m1-ids !m1s] ...]
                          :m2 [[!m2-ids !m2s] ...]
                          :m3 [[!m3-ids !m3s] ...]
                          :m4 [[!m4-ids !m4s] ...]})
               [{:id !id
                 } ...]))
;; fast
(defn test-m3 [x]
  (m/rewrite x
    {:id !id
     :m1 [[!m1-ids !m1s] ...]
     :m2 [[!m2-ids !m2s] ...]
     :m3 [[!m3-ids !m3s] ...]
     :m4 [[!m4-ids !m4s] ...]}
    [{:id !id
      } ...]))


test-m1 is faster than test-m2 because it has fewer memory variables. Repeating the pattern up to m8 in test-m1 makes it as slow as test-m2. test-m3 does not show the same behaviour. So my working conclusion is that the slowdown is related to map-of and the number of memory variables.


I guess this could be a similar issue to #234. I will leave it at that for now and try a different solution.

Lidor Cohen 13:08:59

Hello! I'm starting to learn my way through meander and I hoped someone could help me with (what I believe is) a simple use case I struggle with: I have 2 csvs (vector of vectors) with one holding a vector of ids in the other csv (tags). I want to collect all the tags that match the relevant item. Something like this, but I haven't wrapped my head around matching \ collecting \ spreading:

(defn data-mapper [data]
  (m/search data

            {:data (m/scan _ ?product)
             :tags (m/scan _ (m/pred #(contains? (split (?product 36) ";") (% 0)) tag?))}

            {:products {"name" (?product 12)
                        "description" (?product 13)
                        "price" (js/parseInt (?product 17))
                        "media" {"data" {"src" (?product 24)}}
                        "product_tags" {"data" [{"name" (tag? 3)} ...]}}}))
If anyone can point in the right direction that would be great ^_^


Hi Lidor, are you able to provide a sample of your input data and expected output data? Also, note that it is only safe to use ?product in the m/pred function like that for small maps e.g. PersistentArrayMap.

Lidor Cohen 17:08:20

Well, it will be pretty hard because the input is quite dirty. It is basically a parsed CSV (vector of vectors) with 54 columns, the first vector being the headers, so something like this:

{:products [["name" "some" "unimportant" "values" ... "description" ... "price" ... "media" ... "tags" ...]
            ["some-name" "bla"  "bla" "bla" ... "some long description" ... "42" ... "" ... "238;239;756;785;1111;" ...]
 :tags     [["Category ID" ... "Category Name" ... "Parent" ...]
            ["238" ... "catcat" ... "catcat's mom"]
            ["239" ... "catcat's mom" ... ""]
And the output should look something like this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}
                 {"name" "catcat's mom" "parent" ""}
I'm guessing I didn't use meander right for this task, I'm just starting to learn its deeper powers 😅

Lidor Cohen 17:08:02

I could clean the input before inputting into meander but I was hoping to be able to use meander for that as well...


For the CSV, personally, I would zipmap the fields or pluck them out with nth, etc. in a preprocessing step and then run them through meander.


I would index the tags as well. You can then do the joins much more easily (and more legibly).


You could also drop the first rows (headers) from each of the two data sets.


Then you could

{:products (m/scan {:as ?product ,,,})
 :tags (m/scan {:name (m/pred #(contains ,,,) ,,,})}
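
The preprocessing suggested above (zipmap the fields, drop the header rows, index the tags) could look roughly like the sketch below. It uses only clojure.core and clojure.string, so it should behave the same in Clojure and ClojureScript. The helper names `rows->maps`, `index-by`, and `product-tags` are made up for illustration, and the sample data is a trimmed-down stand-in for the real 54-column CSV, not the actual input.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turn parsed CSV rows (first row = headers)
;; into a vector of maps keyed by header name. The header row is dropped.
(defn rows->maps [[headers & rows]]
  (mapv #(zipmap headers %) rows))

;; Hypothetical helper: index a collection of maps by the value under key k.
(defn index-by [k maps]
  (into {} (map (juxt #(get % k) identity)) maps))

;; Made-up sample with only a few columns (assumption: column names as below).
(def sample
  {:products [["name" "price" "tags"]
              ["some-name" "42" "238;239;"]]
   :tags     [["Category ID" "Category Name" "Parent"]
              ["238" "catcat" "catcat's mom"]
              ["239" "catcat's mom" ""]]})

(def products (rows->maps (:products sample)))
(def tags-by-id (index-by "Category ID" (rows->maps (:tags sample))))

;; Join: look up each tag id of a product in the index.
;; Ids without a matching tag row are skipped by `keep`.
(defn product-tags [product]
  (->> (str/split (get product "tags") #";")
       (keep tags-by-id)
       (mapv (fn [t] {"name"   (get t "Category Name")
                      "parent" (get t "Parent")}))))

(product-tags (first products))
```

With the rows already shaped as maps and the tags indexed by id, the join becomes a plain lookup, and the cleaned-up maps can then be handed to meander for the remaining reshaping.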

Lidor Cohen 18:08:26

Great I was wondering where I should do the cleaning, now I have an answer 😁


LMK if you need help with next steps. 🙂

Lidor Cohen 23:08:43

So I came up with this:

(m/search data

            {:products (m/scan {"Product Name" ?product-name
                                "Meta Tag Description" ?description
                                "Price" ?price
                                "Image(Main image)" ?media
                                "Length" ?length
                                "Width" ?width
                                "height" ?height
                                "Weight" ?weight
                                "Manufacturer" ?manufacturer
                                "Product Tags" ?product-tags
                                "Categories id" ?categories-id})

             :categories (m/scan {"Category ID" (m/pred #(some #{%} (split ?categories-id ";")))
                                  "Category Name" ?category-name
                                  "Parent" ?parent})}

            {"name" ?product-name
             "description" ?description
             "price" ?price
             "media" {"data" {"src" ?media}}
             "categories-id" (split ?categories-id ";")
             "category-name" [?category-name]})
And it seems that the second scan emits every match as its own entry so instead of this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}
                 {"name" "catcat's mom" "parent" ""}
I get this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}]}
{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat's mom" "parent" ""}]}
which actually makes sense as I understand meander better, but for my task I need to aggregate all of the tags under the relevant product's "product_tags"

Lidor Cohen 07:08:31

I figured what I'm looking for is a join + group by

Lidor Cohen 08:08:18

I found a previous answer that suggests doing the group-by outside of meander. Can't wait to see how far you will go with meander; for now I'm happy with the current solution, thank you! P.S. I invested time learning meander as I believe in it as a general declarative solution for the complicated data transformations in our company, so... rooting for ya!
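
For reference, a group-by outside of meander might look something like the sketch below. It is not necessarily what the previous answer proposed. It assumes the search results are maps shaped like the per-match output shown earlier, each carrying a one-element "product_tags" vector. `merge-tags` and the `matches` data are made up for illustration.

```clojure
;; Made-up rows shaped like the one-entry-per-match results shown earlier.
(def matches
  [{"name" "some-name"
    "price" 42
    "product_tags" [{"name" "catcat" "parent" "catcat's mom"}]}
   {"name" "some-name"
    "price" 42
    "product_tags" [{"name" "catcat's mom" "parent" ""}]}])

;; Hypothetical helper: group the rows by everything except the tags,
;; then concatenate the tag vectors of each group back onto the product.
;; Note: group-by returns a map, so the order of products in the result
;; is not guaranteed for larger inputs.
(defn merge-tags [matches]
  (->> matches
       (group-by #(dissoc % "product_tags"))
       (mapv (fn [[product rows]]
               (assoc product
                      "product_tags"
                      (into [] (mapcat #(get % "product_tags")) rows))))))

(merge-tags matches)
```

Grouping on the whole product map (minus the tags) avoids having to pick a single identifying column, at the cost of treating two rows as the same product only when every other field matches.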