This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2024-02-11
I am trying to understand transducers.
I noticed that some uses of transducers reduce a sequence to a single item (such as (transduce xform f init coll)).
Other uses transform a sequence into another sequence by filtering some values and applying a function to other values (such as eduction).
However, I found (at least in https://clojure.org/reference/transducers) no example of a transducer which expands a sequence by stuffing extra values into it. How do I implement such a transducer?
As a concrete example: how do I write a transducer which, given a sequence, creates another sequence in which each element of the input sequence having the numerical value N is repeated N times? The following is an example of the desired behavior of the transducer.
Input: '(1 2 3 "a" :3 "2" 2)
Output: '(1 2 2 3 3 3 "a" :3 "2" 2 2)
(sequence (mapcat (fn [x] (range x 5))) [1 2 3]) works by returning a collection and cat-ing it
Thanks, the following worked for me:
(defn multifrob
  "Given argument, repeat it N times if it is integer having value N.
  Otherwise, return the argument."
  [arg]
  (if (integer? arg)
    (take arg (repeat arg))
    (list arg)))
(defn transmultifrob
  "Apply multifrob to data."
  [data]
  (sequence (mapcat multifrob data)))
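For comparison, here is a minimal sketch of the transducer arity. In the code above, (mapcat multifrob data) is the ordinary lazy-sequence call, so no transducer is involved yet. Calling mapcat with only the function returns a transducer, which should behave the same as composing (map f) with cat. multifrob is restated here (using the two-argument repeat) only so the block stands alone.

```clojure
;; Restated helper: an integer N becomes N copies of N,
;; anything else becomes a one-element list.
(defn multifrob [arg]
  (if (integer? arg)
    (repeat arg arg)
    (list arg)))

(def input '(1 2 3 "a" :3 "2" 2))

;; Transducer arity of mapcat: the collection is passed to sequence, not to mapcat.
(def via-mapcat (sequence (mapcat multifrob) input))

;; Same idea spelled out: map to collections, then flatten one level with cat.
;; Note that cat is passed as-is, not called.
(def via-cat (sequence (comp (map multifrob) cat) input))

via-mapcat
;; => (1 2 2 3 3 3 "a" :3 "2" 2 2)
```

Both forms produce the output requested in the question for that input.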
However, a version which uses cat did not work for me.
When trying to understand it, I found that the Clojure documentation is very unclear about the argument to cat.
(doc cat) states that the argument must be a rf, and the same is true also for https://clojure.github.io/clojure/clojure.core-api.html#clojure.core/cat
Neither definition says what rf should be.
Please help, preferably by helping me locate the documentation which explains rf and the other abbreviations used as argument names in outputs of (doc ....).
rf is a reducing function. As you've seen, transducers take a reducing function and return a new reducing function. If we look at something like map, the one-arg arity returns a transducer, and if we look at the code for the one-arg arity, we see it returns something that starts with (fn [rf]..., and rf here represents the same general idea. The difference is that where things like map and filter need a function to use before they return a transducer, cat doesn't have that property... it only concatenates, there are no arguments needed to make it a transducer. When cat is composed with other transducers, e.g. using comp, it can be used as just cat, as opposed to calling map with an argument.
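To make the (fn [rf] ...) shape concrete, here is a rough sketch of a hand-written transducer for the original question. The name repeat-by-value is made up for illustration and is not part of clojure.core. Like cat, it takes rf directly, so it is passed without being called. It expands the input simply by calling rf more than once per element, and it stops early if a downstream step (such as take) signals completion with reduced.

```clojure
(defn repeat-by-value
  "Hypothetical transducer: an integer N is emitted N times (non-positive
  integers are dropped); any other value is passed through once."
  [rf]
  (fn
    ([] (rf))                 ; init arity: delegate downstream
    ([result] (rf result))    ; completion arity: delegate downstream
    ([result input]           ; step arity: may call rf several times
     (if (integer? input)
       (loop [acc result, n input]
         (if (or (reduced? acc) (not (pos? n)))
           acc
           (recur (rf acc input) (dec n))))
       (rf result input)))))

(sequence repeat-by-value [1 2 3 "a" :3 "2" 2])
;; => (1 2 2 3 3 3 "a" :3 "2" 2 2)

(into [] (comp repeat-by-value (take 4)) [3 3])
;; => [3 3 3 3]
```

In everyday code (mapcat f) is usually the simpler choice; the hand-written version is mainly useful for seeing where rf fits in.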
So, for a toy example, maybe we want to get a range of numbers for each number in some input:
(let [inputs [5 6 7]
      xf (map range)]
  (sequence xf inputs))
=> ((0 1 2 3 4) (0 1 2 3 4 5) (0 1 2 3 4 5 6))
then, let's say we only want the ranges that have an odd length:
(let [inputs [5 6 7]
      xf (comp (map range) (filter #(odd? (count %))))]
  (sequence xf inputs))
=> ((0 1 2 3 4) (0 1 2 3 4 5 6))
and finally, we want to mash those ranges together into one big sequence:
(let [inputs [5 6 7]
      xf (comp (map range) (filter #(odd? (count %))) cat)]
  (sequence xf inputs))
=> (0 1 2 3 4 0 1 2 3 4 5 6)
note that we didn't call cat while composing it... cat is itself a function that takes a reducing function, so its 'value' is ostensibly (fn [rf]..., but map and filter don't have a 'value' of (fn [rf]..., they return (fn [rf]... when called with one argument.
Given a large collection of homogeneous maps, what's an efficient way to rename the keys?
For a sufficiently large collection, the most efficient thing is to not rename the keys.
It depends but the first thing I'd ask is if https://clojuredocs.org/clojure.set/rename-keys does the trick
If you have maps with guaranteed keys and you want a simple one to one transformation, mapv will do it: (mapv (fn [m] {:new1 (m :old1) :new2 (m :old2) ...}) coll)
If the input contains nils, use keep as a transducer: (into [] (keep (fn [m] (when m {…}))) coll)
Rename-keys is really really slow comparatively, because it has to walk the old->new map every time and it uses variadic dissoc
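A small sketch of the two approaches side by side, in case it helps. The sample data and key names are made up for illustration. It only shows that both give the same result; which one is faster for a given dataset is something to measure rather than assume.

```clojure
(require '[clojure.set :as set])

;; Hypothetical sample data, including a nil entry.
(def people
  [{:first "Ada" :last "Lovelace"}
   nil
   {:first "Alan" :last "Turing"}])

;; Generic approach: rename-keys on each non-nil map.
(def via-rename-keys
  (into []
        (keep (fn [m]
                (when m
                  (set/rename-keys m {:first :given-name
                                      :last  :family-name}))))
        people))

;; Direct approach: build each new map from the known keys.
(def via-literal
  (into []
        (keep (fn [m]
                (when m
                  {:given-name  (m :first)
                   :family-name (m :last)})))
        people))

(= via-rename-keys via-literal)
;; => true
```

The literal version assumes every map has exactly the listed keys; rename-keys leaves any other keys untouched.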
> large collection of homogeneous maps
When seq-of-maps is getting slow, I’d consider using a dataset instead of a seq of maps. You can think of a dataset as an effective way to store a seq of maps.
I’ve used https://github.com/scicloj/tablecloth a bit. To change column names (“rename map keys”), you can use https://scicloj.github.io/tablecloth/#map.
Raw https://github.com/techascent/tech.ml.dataset is another option.
---
Other than that, I’d map over the collection and clojure.set/rename-keys as mentioned above!
there's also update-keys in clojure.core if the rename is more a function of the original name
(update-keys {"abc" 1 "def" 2} str/reverse)
=> {"cba" 1, "fed" 2}
I feel like it’s a very dumb question. I was playing with integrant this week for my learning project, and I love a lot about what it has to offer, but I’m unsure if I’m using it correctly in the context of my app that I’m building now, need some guidance (details in thread)
Context:
• The app is going to be a CLI tool that processes a lot (~100k) of JSON files and extracts various structured elements out of them.
• Every file has an “entity-id” and refers to other files by their ids.
• I’m using datalevin as intermediate storage to index everything into a kv database, and then I’m going to make queries to that db.
What I like is how integrant lets me keep various component configurations in edn. Like this:
{[:db/kv :db/reference-data] {:name "rt-db"}
[:table/kv :table/strings] {:name "strings" :db #ig/ref :db/reference-data}
[:reader/json :reader/strings] {:path "data/Strings/", :filters ["Names" "Descriptions"]}
[:table/kv :table/assets] {:name "assets" :db #ig/ref :db/reference-data}
[:reader/json :reader/assets] {:path "data/Assets/", :filters []}}
The idea is that I’d have components for db/kv (init datalevin storage), table/kv (table operations wrapper), reader/json (read files from path).
And then I’d use those components in my application code:
;; after init state
((:table/strings ig-state/system) d/get-value "entity-id")
But I don’t feel like integrant is meant to be used this way:
Composite components are tedious to request from ig-state/system; I have to use a custom function to extract a component by partial key:
(defn component [suffix]
  (let [system-map ig-state/system
        full-key (first (filter #(= suffix (last %)) (keys system-map)))]
    (when full-key
      (get system-map full-key))))
The system initializes all the components, whereas I’d prefer it not do anything until I request a specific component.
In all integrant examples/articles, I feel like it’s geared toward web applications or applications with state. I’m primarily interested in the DI part of the library and don’t care about the state too much.
The only way out that I see is defining higher-level components as well, i.e.
;; for system.edn
:processor/strings {:reader #ig/ref :reader/strings, :table #ig/ref :table/strings}
:processor/assets {:reader #ig/ref :reader/assets, :table #ig/ref :table/assets}
:cli-cmd/process {:processors [:processor/strings :processor/assets]}
:cli-cmd/query {:table/strings #ig/ref :table/strings, :table/assets #ig/ref :table/assets
...
With my “architecture”, just initializing some of these components will take time, and I really don’t want to init :cli-cmd dependencies if I’m running the app for the query command.
Am I using the wrong tool for this problem, am I using the tool incorrectly, or what am I missing?
Little update after today’s session. I ran into issues with my naive async processing approach, and now I’m thinking about reorganizing my code into a series of core.async processes that run independently and communicate via channels. So it makes a little bit more sense to load components on init and keep them online.
https://github.com/donut-party/system may be a more relevant approach if the concern is still aimed more toward DI (although system can also manage the state too)
looks like what I’m looking for. I’ll study it. Thank you!
assuming you’re responsible for https://practical.li/, thank you for that! it gave me (as a beginner) more understanding of integrant than official docs.
Yes, that's me. Thank you. I did a page on using donut, although its official docs are very good too https://practical.li/clojure-web-services/service-repl-workflow/donut-system/