#beginners
2024-02-11
Omer ZAK00:02:28

I am trying to understand transducers. I noticed that some uses of transducers reduce a sequence to a single item (such as (transduce xform f init coll)). Other uses transform a sequence into another sequence by filtering out some values and applying a function to others (such as eduction). However, I found no example (at least in https://clojure.org/reference/transducers) of a transducer which expands a sequence by stuffing extra values into it. How can I implement such a transducer? As a concrete example: how do I write a transducer which, given a sequence, creates another sequence in which each element of the input sequence having the numerical value N is repeated N times? The following is an example of the desired behavior. Input: '(1 2 3 "a" :3 "2" 2) Output: '(1 2 2 3 3 3 "a" :3 "2" 2 2)

Sam Ferrell00:02:47

(sequence (mapcat (fn [x] (range x 5))) [1 2 3]) works by returning a collection for each input and concatenating the results

Bob B00:02:50

the short answer is mapcat

phronmophobic00:02:50

The shorter answer is cat

Omer ZAK03:02:25

Thanks, the following worked for me:

(defn multifrob
  "Given argument, repeat it N times if it is integer having value N.
  Otherwise, return the argument."
  [arg]
  (if (integer? arg)
    (take arg (repeat arg))
    (list arg)))

(defn transmultifrob
  "Apply multifrob to data."
  [data]
  (sequence (mapcat multifrob) data))
However, a version which uses cat did not work for me. When trying to understand it, I found that the Clojure documentation is very unclear about the argument to cat. (doc cat) states that the argument must be an rf, and the same is true of https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/cat. Neither definition says what rf should be. Please help, preferably by helping me locate the documentation which explains rf and the other abbreviations used as argument names in the output of (doc ...).

Bob B04:02:10

rf is a reducing function. As you've seen, transducers take a reducing function and return a new reducing function. If we look at something like map, the one-arg arity returns a transducer, and if we look at the code for the one-arg arity, we see it returns something that starts with (fn [rf]..., and rf here represents the same general idea. The difference is that where things like map and filter need a function to use before they return a transducer, cat doesn't have that property... it only concatenates, there are no arguments needed to make it a transducer. When cat is composed with other transducers, e.g. using comp, it can be used as just cat, as opposed to calling map with an argument.
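To make rf concrete, here is a minimal sketch of a hand-written transducer for the question above. The name repeat-ints is hypothetical (it does not appear in the thread). Like cat, it is a function that takes a reducing function rf and returns a new one; the new step arity calls rf N times for an integer N and once for anything else.

```clojure
(defn repeat-ints
  "Hypothetical transducer: takes a reducing function rf, returns a new one."
  [rf]
  (fn
    ([] (rf))                  ; init arity
    ([result] (rf result))     ; completion arity
    ([result input]            ; step arity
     (if (integer? input)
       ;; Call rf `input` times. If rf signals early termination with
       ;; `reduced`, wrap it again so the inner reduce stops and the
       ;; outer process still sees a reduced value.
       (reduce (fn [acc x]
                 (let [ret (rf acc x)]
                   (if (reduced? ret) (reduced ret) ret)))
               result
               (repeat input input))
       (rf result input)))))

(sequence repeat-ints '(1 2 3 "a" :3 "2" 2))
;; => (1 2 2 3 3 3 "a" :3 "2" 2 2)

;; Early termination from a downstream transducer is respected:
(into [] (comp repeat-ints (take 4)) [1 2 3])
;; => [1 2 2 3]
```

In practice (mapcat f) already covers this case; the sketch is only meant to show where rf comes from and how one input can produce several outputs.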

Bob B04:02:01

So, for a toy example, maybe we want to get a range of numbers for each number in some input:

(let [inputs [5 6 7]
      xf     (map range)]
  (sequence xf inputs))
=> ((0 1 2 3 4) (0 1 2 3 4 5) (0 1 2 3 4 5 6))
then, let's say we only want the ranges that have an odd length:
(let [inputs [5 6 7]
      xf     (comp (map range) (filter #(odd? (count %))))]
  (sequence xf inputs))
=> ((0 1 2 3 4) (0 1 2 3 4 5 6))
and finally, we want to mash those ranges together into one big sequence:
(let [inputs [5 6 7]
      xf     (comp (map range) (filter #(odd? (count %))) cat)]
  (sequence xf inputs))
=> (0 1 2 3 4 0 1 2 3 4 5 6)
note that we didn't call cat while composing it... cat is itself a function that takes a reducing function, so its 'value' is ostensibly (fn [rf]..., but map and filter don't have a 'value' of (fn [rf]..., they return (fn [rf]... when called with one argument.
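Applied to the original question, the same idea might look like the sketch below. multifrob is redefined here only so the snippet is self-contained (using repeat with a count rather than take), and input is the sample from the question.

```clojure
(defn multifrob
  "Repeat an integer N, N times; wrap anything else in a one-element list."
  [arg]
  (if (integer? arg)
    (repeat arg arg)
    (list arg)))

(def input '(1 2 3 "a" :3 "2" 2))

;; cat is already a transducer, so it is passed to comp uncalled.
(sequence (comp (map multifrob) cat) input)
;; => (1 2 2 3 3 3 "a" :3 "2" 2 2)

;; The one-argument arity of mapcat is the same composition.
(sequence (mapcat multifrob) input)
;; => (1 2 2 3 3 3 "a" :3 "2" 2 2)
```

Calling (cat something) while building the pipeline treats that something as a reducing function, which is a likely reason a cat-based attempt would fail.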

sheluchin13:02:30

Given a large collection of homogeneous maps, what's an efficient way to rename the keys?

phill13:02:46

For a sufficiently large collection, the most efficient thing is to not rename the keys.

daveliepmann13:02:43

It depends but the first thing I'd ask is if https://clojuredocs.org/clojure.set/rename-keys does the trick

Noah Bogart14:02:26

If you have maps with guaranteed keys and you want a simple one to one transformation, mapv will do it: (mapv (fn [m] {:new1 (m :old1) :new2 (m :old2) ...}) coll)

Noah Bogart14:02:55

If the input contains nils, use keep as a transducer: (into [] (keep (fn [m] (when m {…}))) coll)

Noah Bogart14:02:02

Rename-keys is really really slow comparatively, because it has to walk the old->new map every time and it uses variadic dissoc
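A small sketch of the two approaches side by side. The data and the key names (:old1, :new1, and so on) are hypothetical placeholders, and nothing here measures speed; benchmarking on real data is the only way to confirm the difference.

```clojure
(require '[clojure.set :as set])

;; Hypothetical input, including a nil entry.
(def coll [{:old1 1 :old2 2} nil {:old1 3 :old2 4}])

;; Explicit map literal: one lookup per key, no walking of a rename map.
(defn rename [m]
  {:new1 (:old1 m)
   :new2 (:old2 m)})

;; keep drops the nil that (when m ...) returns for nil entries.
(into [] (keep (fn [m] (when m (rename m)))) coll)
;; => [{:new1 1, :new2 2} {:new1 3, :new2 4}]

;; The same result with clojure.set/rename-keys, for comparison.
(into []
      (comp (remove nil?)
            (map #(set/rename-keys % {:old1 :new1 :old2 :new2})))
      coll)
;; => [{:new1 1, :new2 2} {:new1 3, :new2 4}]
```

Note that the literal version only keeps the keys it names, while rename-keys leaves any other keys in place.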

teodorlu14:02:55

> large collection of homogeneous maps
When seq-of-maps is getting slow, I’d consider using a dataset instead of a seq of maps. You can think of a dataset as an effective way to store a seq of maps. I’ve used https://github.com/scicloj/tablecloth a bit. To change column names (“rename map keys”), you can use https://scicloj.github.io/tablecloth/#map. Raw https://github.com/techascent/tech.ml.dataset is another option.
---
Other than that, I’d map over the collection and clojure.set/rename-keys as mentioned above!

Bob B17:02:21

there's also update-keys in clojure.core if the rename is more a function of the original name

;; assumes (require '[clojure.string :as str])
(update-keys {"abc" 1 "def" 2} str/reverse)
=> {"cba" 1, "fed" 2}
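update-keys can also drive a fixed rename table. A minimal sketch, assuming Clojure 1.11 or later (where update-keys was added); renames and rename-all are hypothetical names for illustration.

```clojure
;; Hypothetical rename table: old key -> new key.
(def renames {:old1 :new1, :old2 :new2})

(defn rename-all
  "Rename keys in every map of coll; keys not in `renames` are kept as-is."
  [coll]
  (mapv (fn [m] (update-keys m #(get renames % %))) coll))

(rename-all [{:old1 1 :old2 2 :other 3}])
;; => [{:new1 1, :new2 2, :other 3}]
```

The (get renames % %) default matters: passing the rename map itself as the function would turn every unlisted key into nil.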

Danil Shingarev15:02:29

I feel like it’s a very dumb question. I was playing with integrant this week for my learning project, and I love a lot about what it has to offer, but I’m unsure if I’m using it correctly in the context of my app that I’m building now, need some guidance (details in thread)

Danil Shingarev15:02:48

Context:
• The app is going to be a CLI tool that processes a lot (~100k) of JSON files and extracts various structured elements out of them.
• Every file has an “entity-id” and refers to other files by their ids.
• I’m using datalevin as intermediate storage to index everything into a kv database, and then I’m going to make queries to that db.

Danil Shingarev15:02:53

What I like is how integrant lets me keep various component configurations in edn. Like this:

{[:db/kv :db/reference-data] {:name "rt-db"}
[:table/kv :table/strings] {:name "strings" :db #ig/ref :db/reference-data}
[:reader/json :reader/strings] {:path "data/Strings/", :filters ["Names" "Descriptions"]}
[:table/kv :table/assets] {:name "assets" :db #ig/ref :db/reference-data}
[:reader/json :reader/assets] {:path "data/Assets/", :filters []}}
The idea is that I’d have components for db/kv (init datalevin storage), table/kv (table operations wrapper), reader/json (read files from path). And then I’d use those components in my application code:
;; after init state
((:table/strings ig-state/system) d/get-value "entity-id")

Danil Shingarev15:02:47

But I don’t feel like integrant is meant to be used this way. Composite components are tedious to request from ig-state/system; I have to use a custom function to extract a component by partial key:

(defn component [suffix]
  (let [system-map ig-state/system
        full-key (first (filter #(= suffix (last %)) (keys system-map)))]
    (when full-key
      (get system-map full-key))))
The system initiates all the components, whereas I’d prefer it not do anything until I request a specific component. In all the integrant examples/articles, I feel like it’s geared toward web applications or applications with state; I’m primarily interested in the DI part of the library and don’t care much about the state.

Danil Shingarev15:02:26

The only way out that I see is defining higher-level components as well, i.e.

;; for system.edn
 :processor/strings {:reader #ig/ref :reader/strings, :table #ig/ref :table/strings}
 :processor/assets {:reader #ig/ref :reader/assets, :table #ig/ref :table/assets}
 :cli-cmd/process {:processors [:processor/strings :processor/assets]}
 :cli-cmd/query {:table/strings #ig/ref :table/strings, :table/assets #ig/ref :table/assets}
...
With my “architecture”, just initializing some of these components will take time, and I really don’t want to init :cli-cmd dependencies if I’m running the app for the query command. Am I using the wrong tool for this problem, am I using the tool incorrectly, or what am I missing?
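Two parts of Integrant's API may address the concerns above: ig/init accepts an optional collection of keys and only initiates those keys plus whatever they reference, and ig/find-derived-1 looks up an entry by one component of a composite key. The sketch below assumes the integrant library is on the classpath; the init-key bodies are placeholders, not real Datalevin or file-reading code, and the config is a trimmed copy of the one in the thread.

```clojure
(require '[integrant.core :as ig])

;; Placeholder init-key methods; real ones would open the db, read files, etc.
(defmethod ig/init-key :db/kv [_ opts]
  {:db-name (:name opts)})

(defmethod ig/init-key :table/kv [_ opts]
  {:table (:name opts) :db (:db opts)})

(defmethod ig/init-key :reader/json [_ opts]
  opts)

(def config
  {[:db/kv :db/reference-data]    {:name "rt-db"}
   [:table/kv :table/strings]     {:name "strings"
                                   :db   (ig/ref :db/reference-data)}
   [:reader/json :reader/strings] {:path    "data/Strings/"
                                   :filters ["Names" "Descriptions"]}})

;; Initiate only :table/strings and the keys it references.
;; The reader is never initiated.
(def system (ig/init config [:table/strings]))

;; Look up a composite key by one of its component keywords.
;; find-derived-1 returns a map entry, so take its val.
(def strings-table (val (ig/find-derived-1 system :table/strings)))
```

With this shape, a query command could initiate only the table keys it needs, and the custom partial-key lookup function becomes unnecessary. Whether this fits better than a different DI library is a judgment call.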

Danil Shingarev21:02:13

A little update after today’s session. I ran into issues with my naive async processing approach, and now I’m thinking about reorganizing my code into a series of core.async processes that run independently and communicate via channels. So it makes a little more sense to load components on init and keep them online.

practicalli-johnny21:02:04

https://github.com/donut-party/system may be a more relevant approach if the concern is still aimed more toward DI (although system can also manage the state too)

Danil Shingarev21:02:46

looks like what I’m looking for. I’ll study it. Thank you!

Danil Shingarev22:02:14

assuming you’re responsible for https://practical.li/, thank you for that! It gave me (as a beginner) more understanding of integrant than the official docs.

practicalli-johnny07:02:55

Yes, that's me. Thank you. I did a page on using donut, although its official docs are very good too https://practical.li/clojure-web-services/service-repl-workflow/donut-system/