#clojure
2023-11-24
Jonas Östlund 20:11:09

I frequently want to perform a group-by but also perform some operation on the elements being grouped. For instance, I may want to group the entries of a map by a function applied to their keys and form groups with their values. I might have

(group-by (fn [[k v]] (mod k 3)) {1 "one" 2 "two" 3 "three" 4 "four" 5 "five"})
but want the result to be
{1 ["one" "four"], 2 ["two" "five"], 0 ["three"]}
instead of the groups being the key-value pairs. One way to accomplish this would be to have a variation of group-by that accepts a transducer, just like into does. Something like this, maybe:
(defn group-by2
  ([f xform coll]
   (let [step (xform conj)]
     (persistent!
      (reduce
       (fn [ret x]
         (let [k (f x)]
           (assoc! ret k (step (get ret k (step)) x))))
       (transient {}) coll)))))
and then call it like this:
(group-by2 (fn [[k v]] (mod k 3)) 
           (map val)
           {1 "one" 2 "two" 3 "three" 4 "four" 5 "five"})
in order to get the result that I want. How do you approach these situations?

Ben Sless 20:11:54

your approach is correct, but notice that you can generalize it by passing an rf instead of an xform; then you don't have to accumulate into (conj), which is an empty vector
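A minimal sketch of what passing a reducing function could look like. The name group-by-rf is hypothetical, and this version ignores early termination via reduced and does not call the completing (one-argument) arity of rf:

```clojure
(defn group-by-rf
  "Like group-by, but folds each group with the reducing function rf.
  (rf) supplies the initial value of a group, (rf acc x) adds one element.
  Hypothetical helper: no support for `reduced`, no completion step."
  [f rf coll]
  (persistent!
   (reduce (fn [ret x]
             (let [k (f x)]
               (assoc! ret k (rf (get ret k (rf)) x))))
           (transient {})
           coll)))

(def numbers {1 "one" 2 "two" 3 "three" 4 "four" 5 "five"})

;; Collect only the values of each group.
(def values-by-mod
  (group-by-rf (fn [[k _]] (mod k 3))
               (fn ([] []) ([acc [_ v]] (conj acc v)))
               numbers))
;; => {1 ["one" "four"], 2 ["two" "five"], 0 ["three"]}

;; The accumulator no longer has to be a vector, e.g. count each group.
(def counts-by-mod
  (group-by-rf (fn [[k _]] (mod k 3))
               (fn ([] 0) ([n _] (inc n)))
               numbers))
;; => {1 2, 2 2, 0 1}
```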

pppaul 20:11:15

Prismatic plumbing has a function like this. But I find group-by followed by update-vals is usually OK. I don't use the more complicated group-by variant in Prismatic. Also, Specter offers easy transformations, but at the cost of learning Specter
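For the example from the question, the group-by plus update-vals combination might look like the sketch below. It assumes Clojure 1.11 or newer, where update-vals is part of clojure.core:

```clojure
(def numbers {1 "one" 2 "two" 3 "three" 4 "four" 5 "five"})

;; group-by yields vectors of map entries; update-vals then
;; replaces each vector of entries with a vector of their values.
(def grouped
  (-> (group-by (fn [[k _]] (mod k 3)) numbers)
      (update-vals #(mapv val %))))
;; => {1 ["one" "four"], 2 ["two" "five"], 0 ["three"]}
```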

Jonas Östlund 21:11:08

Thanks! Good advice. I also realized now that my implementation group-by2 may not work correctly if the transducer is stateful because the same state will be shared between different keys. And flushing would be needed, too.
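One possible way to address both points, sketched under assumptions: give every key its own step function (so stateful transducers keep separate state per group) and call the one-argument arity at the end to flush. The name group-by-xf is hypothetical and this is not a tuned implementation:

```clojure
(defn group-by-xf
  "Groups coll by f, running a separate instance of xform per key.
  Hypothetical sketch: each key gets its own (xform conj) step function,
  groups that terminate early (e.g. via take) stop accepting elements,
  and every group is completed at the end so buffered items are flushed."
  [f xform coll]
  (let [steps    (volatile! {})          ; key -> private step function
        step-for (fn [k]
                   (or (get @steps k)
                       (let [step (xform conj)]
                         (vswap! steps assoc k step)
                         step)))
        groups   (reduce
                  (fn [ret x]
                    (let [k    (f x)
                          step (step-for k)
                          acc  (if (contains? ret k) (get ret k) (step))]
                      (if (reduced? acc)
                        ret                      ; this group already terminated
                        (assoc ret k (step acc x)))))
                  {}
                  coll)]
    ;; Completion: unwrap reduced values and flush each group's state.
    (reduce-kv (fn [m k acc]
                 (assoc m k ((step-for k) (unreduced acc))))
               {}
               groups)))

(def pairs-by-parity (group-by-xf odd? (partition-all 2) [1 2 3 4 5]))
;; => {true [[1 3] [5]], false [[2 4]]}

(def first-by-parity (group-by-xf odd? (take 1) [1 2 3 4]))
;; => {true [1], false [2]}
```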

nivekuil 22:11:52

seconding xforms, by-key is very useful

DrLjótsson 20:11:56

I'm running mysqldump from Clojure to compress a MySQL database using clojure.java.shell/sh. Right now I'm receiving the dump file in the :out key. What would be the best way to (1) save the dump file to disk instead and (2) compress the data using gzip? The file can be several hundred MB. Should I somehow stream it to the compression program?

p-himik 21:11:22

Build a command that pipes the data into gzip and then into a file. You'll probably need to use sh -c for that.

DrLjótsson 21:11:25

How do I pipe data using sh ? 🙂

p-himik 21:11:48

Something like (shell/sh "sh" "-c" "mysqldump ... | gzip -c c > /tmp/data.gzip").

DrLjótsson 21:11:50

Right now I'm doing (apply shell/sh "mysqldump" vec-of-arguments-to-mysqldump)

DrLjótsson 21:11:47

Oh, so using -c I can paste the full mysqldump command?

DrLjótsson 21:11:59

It ran, but the resulting file was 0 bytes

DrLjótsson 21:11:38

The dump works, if I use > dump.sql at the end instead of piping to gzip, I get the dump file.

DrLjótsson 21:11:54

Oh, I removed the extra c in your example and then it worked perfectly

DrLjótsson 21:11:22

Seems to work even without an additional -c: (shell/sh "sh" "-c" "mysqldump ... | gzip > /tmp/data.gzip")
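A small sketch of wrapping this sh -c pipeline so that failures are noticed. The helper name dump-gzipped! is hypothetical, printf stands in for the real mysqldump command, and the output path is interpolated without any shell quoting, so it must not contain spaces or shell metacharacters:

```clojure
(require '[clojure.java.shell :as shell]
         '[clojure.java.io :as io])

(defn dump-gzipped!
  "Runs `dump-command | gzip > out-path` through sh -c.
  Returns the size of the output file in bytes, or throws on a
  non-zero exit code. Hypothetical helper: out-path is not quoted."
  [dump-command out-path]
  (let [command            (str dump-command " | gzip > " out-path)
        {:keys [exit err]} (shell/sh "sh" "-c" command)]
    ;; Under plain sh the exit code of a pipeline is that of its last
    ;; command (gzip here), so a failing dump command can go unnoticed.
    (when-not (zero? exit)
      (throw (ex-info "Pipeline failed" {:command command :exit exit :err err})))
    (.length (io/file out-path))))

;; Stand-in for "mysqldump ...": any command that writes to stdout.
(def demo-file (str (java.io.File/createTempFile "demo" ".gz")))
(def demo-size (dump-gzipped! "printf 'hello\\n'" demo-file))
```

Because the data flows from one process to the other inside the shell, the dump never passes through the JVM and nothing large ends up in the :out key.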

DrLjótsson 21:11:54

mysqldump warns that providing the password on the command line is insecure. Is this a real issue when sending commands to the shell like this? If it's an issue, can I somehow "send in" an option file (from a string) with the password (as described here https://dev.mysql.com/doc/refman/8.0/en/password-security-user.html) or must I create the option file with the password, pass it as argument, and then delete it?

p-himik 21:11:16

Oh, sorry - it was supposed to be gzip -c -. But gzip seems to be clever enough so that it doesn't need all that.

p-himik 21:11:30

> Is this a real issue when sending commands to the shell like this?
It is - any other software running under the same user will be able to see the password while that sh -c ... is running. Also, if that command is logged by your system or app, anything that can access those logs will also be able to see the password in plain text. An option file is probably safer, but isn't perfect unless the files of the user that runs your app are inaccessible to anything else. The best solution is probably to send the password to the mysqldump process directly, if it supports that - e.g. via its stdin.

p-himik 21:11:41

Just took a look at that link - yeah, it pretty much says the same thing.
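The "create the option file, pass it as an argument, then delete it" route could be sketched roughly as follows. The helper with-option-file is hypothetical, the permissions part only works on POSIX file systems, and the password is written without escaping, so quotes or backslashes in it would need extra care. MySQL client programs accept --defaults-extra-file=<path>, which has to be the first option on the command line:

```clojure
(import '(java.nio.file Files)
        '(java.nio.file.attribute FileAttribute PosixFilePermissions))

(defn with-option-file
  "Writes a temporary MySQL option file containing password, readable
  and writable by the current user only, calls (f path-string), and
  deletes the file afterwards. Hypothetical helper, POSIX only."
  [password f]
  (let [perms (PosixFilePermissions/asFileAttribute
               (PosixFilePermissions/fromString "rw-------"))
        path  (Files/createTempFile "mysql-opts" ".cnf"
                                    (into-array FileAttribute [perms]))]
    (try
      ;; Assumption: the password needs no escaping inside the quotes.
      (spit (.toFile path) (str "[client]\npassword=\"" password "\"\n"))
      (f (str path))
      (finally
        (Files/deleteIfExists path)))))

;; Demo that only reads the file back; a real caller would build
;; (str "--defaults-extra-file=" path) and pass it to mysqldump.
(def demo-contents (with-option-file "s3cret" slurp))
;; => "[client]\npassword=\"s3cret\"\n"
```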

DrLjótsson 21:11:19

There doesn’t seem to be a way to pass the password through stdin?

p-himik 23:11:04

Probably, I don't know the MySQL client well enough. BTW another potential approach here could be embedding the client library itself via JNI. But that's much more of a hassle, you probably don't need it.

Caio Cascaes 21:11:18

I've once seen this kind of assertion:

(is (= (-> response :body (json/read-str :key-fn keyword))
               [{:paper-quantity-standard 512
                 :nickname                "Galactic printer"
                 :brand                   "EPSON"
                 :prints-color?           true
                 :papers-quantity-photo   35
                 :prints-photo?           true
                 :public-id               logic.common/valid-uuid-str?
                 :model                   "GalaxyPrint 3000"}]))
Here a predicate function is placed inside the expected value to "nest" a check, as in this part: :public-id logic.common/valid-uuid-str? How can I achieve this? Which lib? Or whatever? Note #1: I'm not sure if it's (is (=... , maybe something with ...equals... Note #2: I asked this question in #C0JKW8K62 but that did not seem to be the correct place for it, so I'm asking in the proper place

vemv 22:11:49

I use matcher-combinators from time to time. Although using nested destructuring can get you surprisingly far, e.g.:

(let [[{:keys [public-id]
        :as parsed-response}
       :as all]
      (-> response :body (json/read-str :key-fn keyword))]
  (testing (pr-str all)
    (is (logic.common/valid-uuid-str? public-id))))
Pros:
• It's just clojure, you already know it
• You get to master / continuously exercise destructuring skills - otherwise it's easy to lose this 'muscle'
• Universal integration with test runners

Caio Cascaes 23:11:59

Thank you very much!!

Caio Cascaes 00:11:13

Worked! Thanks again!

Caio Cascaes 00:11:16

What I had seen once was probably something like this: (is (match? {:alpha keyword?} {:alpha :beta}))
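To illustrate the idea behind such an assertion without any dependency, here is a plain-Clojure sketch. It is not how matcher-combinators is implemented; matches? and uuid-str? are hypothetical helpers, and unlike a real matcher library this gives no detailed mismatch report:

```clojure
(require '[clojure.test :refer [deftest is]])

(defn matches?
  "True when actual matches expected. Functions in expected are used
  as predicates, maps are compared key by key (extra keys in actual
  are ignored), sequential values element by element, anything else
  with =. Hypothetical helper."
  [expected actual]
  (cond
    (fn? expected)         (boolean (expected actual))
    (map? expected)        (and (map? actual)
                                (every? (fn [[k v]]
                                          (and (contains? actual k)
                                               (matches? v (get actual k))))
                                        expected))
    (sequential? expected) (and (sequential? actual)
                                (= (count expected) (count actual))
                                (every? true? (map matches? expected actual)))
    :else                  (= expected actual)))

(defn uuid-str?
  "Hypothetical stand-in for a predicate like valid-uuid-str?."
  [s]
  (and (string? s)
       (try (java.util.UUID/fromString s)
            true
            (catch IllegalArgumentException _ false))))

(deftest printer-response-test
  (is (matches? [{:brand "EPSON" :public-id uuid-str?}]
                [{:brand     "EPSON"
                  :model     "GalaxyPrint 3000"
                  :public-id "123e4567-e89b-12d3-a456-426614174000"}])))
```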