#meander
2022-08-23
dbj 12:08:45

Hello! I ran into a performance issue with the code below. The execution time seems to increase exponentially with every m<nr> I add on the LHS. The code below takes about 10 seconds to execute. Any ideas?

(m/rewrite {0 {:id 0
                   :m1 {:a {:id :a, :x 0}}
                   :m2 {:b {:id :b, :x 0}}
                   :m3 {:c {:id :c, :x 0}}}
                1 {:id 1
                   :m1 {:a {:id :a, :x 0}}
                   :m2 {:b {:id :b, :x 0}}
                   :m3 {:c {:id :c, :x 0}}}}
      (m/map-of _
                {:id !id
                 :m1 {& (m/seqable [!m1-ids !m1s] ..!m1-cnt)}
                 :m2 {& (m/seqable [!m2-ids !m2s] ..!m2-cnt)}
                 :m3 {& (m/seqable [!m3-ids !m3s] ..!m3-cnt)}})

      [{:id !id
        :m1-ids [!m1-ids ..!m1-cnt]
        :m2-ids [!m2-ids ..!m2-cnt]
        :m3-ids [!m3-ids ..!m3-cnt]
        } ...])

dbj 12:08:52

It seems to be the compilation that takes the time.

dbj 20:08:18

I spent some time narrowing it down: test-m2 is slow compared to the others below.

;; fast
(defn test-m1 [x]
    (m/rewrite x
               (m/map-of _
                         {:id !id
                          :m1 [!m1s ...]
                          :m2 [!m2s ...]
                          :m3 [!m3s ...]
                          :m4 [!m4s ...]})
               [{:id !id
                 } ...]))
;; slow
(defn test-m2 [x]
    (m/rewrite x
               (m/map-of _
                         {:id !id
                          :m1 [[!m1-ids !m1s] ...]
                          :m2 [[!m2-ids !m2s] ...]
                          :m3 [[!m3-ids !m3s] ...]
                          :m4 [[!m4-ids !m4s] ...]
                          })
               [{:id !id
                 } ...]))
;; fast
(defn test-m3 [x]
  (m/rewrite x
    {:id !id
     :m1 [[!m1-ids !m1s] ...]
     :m2 [[!m2-ids !m2s] ...]
     :m3 [[!m3-ids !m3s] ...]
     :m4 [[!m4-ids !m4s] ...]
     }
    [{:id !id
      } ...]))

dbj 20:08:33

test-m1 is faster than test-m2 because it has fewer memory variables. Repeating the pattern up to m8 in test-m1 makes it perform like test-m2. test-m3 does not show the same behaviour. So my working conclusion is that the slowdown is related to map-of combined with the number of memory variables.

dbj 06:08:40

I guess this could be a similar issue to #234: https://github.com/noprompt/meander/issues/234#issue-1292196144. I will leave it at that for now and try a different solution.

Lidor Cohen 13:08:59

Hello! I'm starting to learn my way through meander, and I hoped someone could help me with what I believe is a simple use case that I'm struggling with. I have two CSVs (vectors of vectors), where one holds a vector of IDs that refer to rows in the other CSV (tags). I want to collect all the tags that match the relevant item. Something like this, though I haven't yet wrapped my head around matching / collecting / spreading:

(defn data-mapper [data]
  (m/search data

            {:data (m/scan _ ?product)
             :tags (m/scan _ (m/pred #(contains? (split (?product 36) ";") (% 0)) tag?))}

            {:products {"name" (?product 12)
                        "description" (?product 13)
                        "price" (js/parseInt (?product 17))
                        "media" {"data" {"src" (?product 24)}}
                        "product_tags" {"data" [{"name" (tag? 3)} ...]}}}))

If anyone can point me in the right direction, that would be great ^_^

noprompt 17:08:56

Hi Lidor, are you able to provide a sample of your input data and expected output data? Also, note that it is only safe to use ?product in the m/pred function like that for small maps e.g. PersistentArrayMap.

Lidor Cohen 17:08:20

Well, that will be pretty hard because the input is quite dirty. It is basically a parsed CSV (vector of vectors) with 54 columns, with the first vector being the headers, so something like this:

{:products [["name" "some" "unimportant" "values" ... "description" ... "price" ... "media" ... "tags" ...]
            ["some-name" "bla"  "bla" "bla" ... "some long description" ... "42" ... "" ... "238;239;756;785;1111;" ...]
            ...]
 :tags     [["Category ID" ... "Category Name" ... "Parent" ...]
            ["238" ... "catcat" ... "catcat's mom"]
            ["239" ... "catcat's mom" ... ""]
            ...]}
And the output should look something like this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}
                 {"name" "catcat's mom" "parent" ""}
                 ...]}
 ...
]

I'm guessing I didn't use meander right for this task; I'm just starting to learn its deeper powers 😅

Lidor Cohen 17:08:02

I could clean the input before passing it to meander, but I was hoping to be able to use meander for that as well...

noprompt 17:08:12

For the CSV, personally, I would zipmap the fields or pluck them out with nth, etc. in a preprocessing step and then run them through meander.

noprompt 17:08:11

I would also index the tags. You can then do the joins much more easily (and more legibly).

noprompt 17:08:24

You could also drop the first rows (headers) from each of the two data sets.

noprompt 17:08:16

Then you could

{:products (m/scan {:as ?product ,,,})
 :tags (m/scan {:name (m/pred #(contains ,,,) ,,,})}
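
The preprocessing suggested above (zipmap the header row onto each data row, drop the header rows, index the tags by ID) can be sketched in plain Clojure before anything reaches meander. This is only a rough sketch: the helper names `rows->maps`, `index-by-id`, and `product-categories` are hypothetical, the sample rows are made up for illustration, and the column names are the ones that appear in Lidor's later pattern.

```clojure
(require '[clojure.string :as str])

;; Hypothetical helper: turn a parsed CSV (header row first) into a seq of
;; maps keyed by column name. The header row itself is dropped.
(defn rows->maps [[header & rows]]
  (map #(zipmap header %) rows))

;; Hypothetical helper: index category maps by their "Category ID" column.
(defn index-by-id [categories]
  (into {} (map (juxt #(get % "Category ID") identity)) categories))

;; Made-up sample input, far narrower than the real 54-column CSV.
(def raw
  {:products   [["Product Name" "Price" "Categories id"]
                ["some-name" "42" "238;239;"]]
   :categories [["Category ID" "Category Name" "Parent"]
                ["238" "catcat" "catcat's mom"]
                ["239" "catcat's mom" ""]]})

(def cleaned
  {:products   (rows->maps (:products raw))
   :categories (rows->maps (:categories raw))})

(def categories-by-id (index-by-id (:categories cleaned)))

;; Look up every category referenced by a product; unknown IDs are skipped.
(defn product-categories [product]
  (keep categories-by-id
        (str/split (get product "Categories id") #";")))
```

With the data in this shape, the join becomes a map lookup per ID rather than a predicate that scans every tag row for every product.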

Lidor Cohen 18:08:26

Great, I was wondering where I should do the cleaning. Now I have an answer 😁

noprompt 18:08:26

LMK if you need help with next steps. 🙂

Lidor Cohen 23:08:43

So I came up with this:

(m/search data

            {:products (m/scan {"Product Name" ?product-name
                                "Meta Tag Description" ?description
                                "Price" ?price
                                "Image(Main image)" ?media
                                "Length" ?length
                                "Width" ?width
                                "height" ?height
                                "Weight" ?weight
                                "Manufacturer" ?manufacturer
                                "Product Tags" ?product-tags
                                "Categories id" ?categories-id})

             :categories (m/scan {"Category ID" (m/pred #(some #{%} (split ?categories-id ";")))
                                  "Category Name" ?category-name
                                  "Parent" ?parent})}


            {"name" ?product-name
             "description" ?description
             "price" ?price
             "media" {"data" {"src" ?media}}
             "categories-id" (split ?categories-id ";")
             "category-name" [?category-name]})

It seems that the second scan emits every match as its own entry, so instead of this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}
                 {"name" "catcat's mom" "parent" ""}
                 ...]}
 ...
]
I get this:
[{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat" "parent" "catcat's mom"}]}
{"name" "some-name"
 "description" "some long description"
 "price" 42
 "media" {"data" {"src" ""}}
 "product_tags" [{"name" "catcat's mom" "parent" ""}]}
 ...
]

This actually makes sense now that I understand meander better, but for my task I need to aggregate all of the tags under the relevant product's "product_tags".

Lidor Cohen 07:08:31

I figured out that what I'm looking for is a join + group by.

Lidor Cohen 08:08:18

I found a previous answer that suggests doing the group-by outside of meander. I can't wait to see how far you will go with meander. For now I'm happy with the current solution, thank you! P.S. I invested time in learning meander because I believe in it as a general declarative solution for the complicated data transformations in our company, so... rooting for ya!
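
For readers following along, doing the group-by outside of meander might look roughly like the sketch below. It assumes the search has already produced one flat map per product/category match. The `flat-results` data and the `collect-tags` helper are hypothetical, not taken from the thread or from the meander API.

```clojure
;; Hypothetical flat results: one row per (product, category) match, as a
;; search over two scans would emit them.
(def flat-results
  [{"name" "some-name" "price" "42"
    "category-name" "catcat" "parent" "catcat's mom"}
   {"name" "some-name" "price" "42"
    "category-name" "catcat's mom" "parent" ""}])

;; Group rows that share the same product fields, then fold the per-match
;; category fields into a single "product_tags" vector per product.
;; Note: group-by returns a map, so product order is not guaranteed.
(defn collect-tags [results]
  (->> results
       (group-by #(dissoc % "category-name" "parent"))
       (mapv (fn [[product matches]]
               (assoc product "product_tags"
                      (mapv (fn [m]
                              {"name"   (get m "category-name")
                               "parent" (get m "parent")})
                            matches))))))
```

Calling `(collect-tags flat-results)` yields a single product map whose "product_tags" vector holds both categories, which is the aggregated shape asked for earlier in the thread.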