2021-09-02
Someone in here recently said something like, “Don’t do I/O in a transducer, but you can do it in a reducing fn.” Am I remembering that right? Anyone have a rationale or a source for that?
> Doing IO in a transducer is putting a suspendible, memory allocating, and long running operation on a background thread pool designed for the opposite.
IIUC this is regarding the use of transducers on chans (mostly). I/O doesn’t line up semantically with a transducer, but there’s no functional issue with doing I/O in a local `into` xform. Is that right?
Not sure what you mean by "functional issue" exactly, but doing IO in a transducer will work. Just like building your whole app out of singletons. ;)
(Local being an operative word here. Locality means this xform cannot be reused in another context.)
Right. The same 100% you'd get when doing IO elsewhere, assuming you know exactly when and where that IO ends up being done.
Yeah, I’m not really recommending this as a pattern. I just wanted to make sure I fully understand what’s being said.
Although, I am curious what people do when they need to chain I/O calls (e.g. for pagination). I’ve seen—and been bitten by—`lazy-seq`. `(into [] (mapcat fetch-page) pages)` works, but breaks said rule. Is everyone really loop/recurring around this problem?
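A rough sketch of the loop/recur alternative, assuming a hypothetical fetch-page! that takes a page token and returns {:items [...] :next <token-or-nil>}:
(defn fetch-all!
  [fetch-page! first-token]
  (loop [token first-token
         acc   []]
    (let [{:keys [items next]} (fetch-page! token)  ; blocking IO happens here
          acc (into acc items)]
      (if next
        (recur next acc)
        acc))))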
Hold on. Why is `(into [] (mapcat fetch-page) pages)` breaking the rule? In the end, `fetch-page` becomes a part of the reducing function - it's not called in the transducer itself.
This is the transducer arity of `map`:
([f]
 (fn [rf]
   (fn
     ([] (rf))                  ; init arity
     ([result] (rf result))     ; completion arity
     ([result input]            ; step arity
      (rf result (f input)))
     ([result input & inputs]   ; variadic step arity
      (rf result (apply f input inputs))))))
`f` is your `fetch-page`. `(fn [rf] ...)` is the transducer. `(fn ...)` is the reducer. `f` is not called in the transducer, it's called only in the reducer.
K, so your argument is, “the last step of a transducer chain is technically part of the reducer” - is that correct?
The last step of a transducer chain is another transducer.
`into []` reduces its input using the transducer chain. Transducers don't even know about the `into`.
`(mapcat fetch-page)` - this is your whole transducer chain in the example above.
This is another example:
(comp
(mapcat fetch-page)
(filter seq))
No reduction is happening anywhere in that code. `fetch-page` is never called. It's only called when you reduce over some input with one of those transducers.
Transducers are similar to the threading macros, only they're done specifically at runtime and specifically on reducing functions.
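A small sketch of that point, with a println stub standing in for real IO (fetch-page! and pos? are just placeholders):
(defn fetch-page! [n]       ; stand-in for a real IO call
  (println "fetching" n)
  (range n))

(def xf (comp (mapcat fetch-page!) (filter pos?)))  ; nothing printed yet

(into [] xf [1 2 3])        ; "fetching" printed here, during the reduction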
Are you suggesting that the only I/O you should avoid is during initialization of a transducer chain?
The reducer is part of `into`, yes.
Both the "transducer" and the "reducer" have very precise definitions, as far as I can tell.
The initial thesis explicitly states that one should not do IO in the former and instead should do it in the latter.
So yes, you should not do IO when turning one reducer into another in a transducer's body.
In other words, the example with `(into [] ...)` above is completely fine, in the context of the initial thesis.
> you should not do IO when turning one reducer into another in a transducer’s body
Let me make sure we’re clear here (because I’m not certain about the definitions). The `(fn [rf] ...)` form is called exactly once, immediately prior to execution, is this correct?
And, once that form is called on an xform, a fn chain is returned that will be used to do the transformation.
In the context of the above example with `into` - yes. In some other situation, it might also be called immediately, but the resulting reducer execution might be delayed.
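One quick way to see that: a sketch of a map-like transducer that prints when its (fn [rf] ...) runs versus when its step runs (noisy-map is made up):
(defn noisy-map [f]
  (fn [rf]
    (println "transducer applied - once, before any reduction step")
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result input]
       (println "step" input)
       (rf result (f input))))))

(into [] (noisy-map inc) [1 2])
;; prints the "transducer applied" line once, then "step 1", "step 2"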
And the thesis is not mine. :) It's from that thread. I'm just trying to convey its meaning using the existing definitions.
So what you’re saying is that as long as you return that fn chain w/o doing any I/O, you’re good, is that correct?
Ok, my understanding was:
Doing I/O in a `mapcat` was bad because, for example, you can hand it to a chan and then you’re doing I/O on the chan’s threadpool—which is reserved for coordination.
It had nothing to do with initialization, and everything to do with where an I/O fn happens to be called.
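A sketch of that concern with core.async (slow-fetch! is a stand-in for blocking IO): an xform attached to a channel runs on whatever thread performs the put, which for a go block is the limited dispatch pool.
(require '[clojure.core.async :as a])

(defn slow-fetch! [x]       ; stand-in for blocking IO
  (Thread/sleep 1000)
  x)

(def ch (a/chan 1 (map slow-fetch!)))

(a/go (a/>! ch 1))          ; slow-fetch! runs on a dispatch-pool thread here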
Not "you're good", but "you're not bad" from the perspective of the thesis. You can still get screwed by sticking that transducer into sequence
just because now all your IO is lazy - when you might not expect it.
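For example (fetch-page! again being a stand-in for real IO):
(defn fetch-page! [n]
  (println "fetching" n)
  (range n))

(def s (sequence (mapcat fetch-page!) [1 2 3]))  ; nothing fetched yet
(first s)  ; fetching happens as the seq is realized (possibly in chunks), maybe long after this line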
Yes. But I wouldn't conflate it with channels. In principle, computation composition should be pure. That's it, that's all there is to it. You don't want your `->` macro to start doing DB queries.
From impurity in that context all sorts of nasty things might pop up, including hogging the main thread or a thread from a reserved/limited pool.
I mean, I think we agree about the usage (It’s fine. It will work.). But I have no idea how you perceive that my example is in line with the original statement.
yeah, but you later said that, “well it’s not good to hand that same transducer to `sequence` because it’s lazy”
It's a completely different topic - you can get screwed by lazy sequences if you use them. The transducers are fine, the reducer is fine. But laziness might bite you if you are not careful.
"IO in transducers" is orthogonal to "IO in lazy contexts". Above, I was just nitpicking at the wording "you're good". Because you aren't magically out of the water if you do IO in a reducer - because other things might happen to you, still.
> The composed xf transducer will be invoked left-to-right with a final call to the reducing function f. In the last example, input values will be filtered, then incremented, and finally summed.
e.g. that sentence talks about the “composed xf transducer” as the thing that’s returned from the initializer
I use the definition exactly from the "Terminology" section of that page.
You quoted the section that uses `transduce` - it calls both the transducer and the resulting reducer. Take a look at its implementation.
actually, I’m pretty sure the reducing fn is, e.g. the last thing you pass to `transduce`
The thesis operates on transducers and reducers. They are both well defined. `transduce` is a separate entity.
I’m saying that the third arg to transduce—for example—is the only reducer in the form.
It takes an xform and a separate reducing fn. Nothing in the xform arg would be called a “reducer”.
I assume you mean the second arg. Because these are the arities of `transduce`:
[xform f coll]
[xform f init coll]
`f` is a reducer, yes. `xform` is a transducer. The very first thing `transduce` does is replace `f` with `(xform f)` - actually executing the transducer, the step where IO should not be done.
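For reference, the body of transduce is roughly this (docstring elided); the (xform f) line is the step being discussed:
(defn transduce
  ([xform f coll] (transduce xform f (f) coll))
  ([xform f init coll]
   (let [f (xform f)   ; the transducer itself runs here, exactly once
         ret (if (instance? clojure.lang.IReduceInit coll)
               (.reduce ^clojure.lang.IReduceInit coll f init)
               (clojure.core.protocols/coll-reduce coll f init))]
     (f ret))))        ; completion arity of the transformed reducer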
And that's my point exactly. Transducers do not call the reducer. It's completely fine to have IO in `fetch-page` and use it in `(mapcat fetch-page)`, because `fetch-page` will not be called there. It will be called only during the reduction phase, whenever it happens.
I think you already said that, but I’m removing a lot of the details to see if we agree 😄
i.e. because `into` immediately starts reducing everything, by definition, my transduction is in the reduction phase.
It's not about eagerness. I'm sorry I've brought up lazy collections - forget about them. Forget about being eager either. And about threads and channels.
There are two things of interest going on:
• Computation composition (always higher-order functions - transducers in our case, but could be `comp`, `partial`, `complement`, etc). The way to define a computation in advance, without running it
• Running the computation (reduction in our case)
The first step should only deal with what it's named after. Compose the "recipe" out of existing functions. It should not "do" anything.
The second step should do all the "doing".
The first step can be removed from the second in any sense of that word - they can be separated in time, in place, or even in languages. And it's fine exactly thanks to the first step not "doing" anything.
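The two steps side by side, as a small sketch (odd? and inc just stand in for real work):
(def recipe (comp (filter odd?) (map inc)))  ; step 1: compose - nothing runs

;; step 2: run it, possibly elsewhere and much later
(transduce recipe + 0 (range 10))  ;=> 30
(into [] recipe (range 10))        ;=> [2 4 6 8 10]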
The whole point of this is that during execution you shouldn’t do I/O if you’re on a chan.
iow — my understanding of the original statement is, “Because you don’t know where execution is going to happen, you shouldn’t assume it’s okay to do I/O anywhere in an xform. They’re designed to be portable, and I/O is fundamentally anti-portable.”
Your statement is a specific case of a more general description that I've attempted to provide.
This is really cool stuff! I'm under the impression you could simplify the contract a lot
(iteration step! {:vf vf :kf kf :some? s? :initk initk})
is the same as
(->> (iteration (comp (fnil step! initk) kf))
     (take-while s?)
     (map vf))
(sorry for the edits, newlines on slack are hard)
Well, it's mostly the same maybe, with `kf` now having to handle nils and all. But the point is still that most of these extra keys could be handled elsewhere. I guess I'd keep `initk` for this to correspond to `reduce` nicely.
it's an abstracted version of what's in almost every lib that has to handle pagination (like the common jdbc drivers & other db libs)
I think next.jdbc does this internally, the c* driver too, and the early jdbc driver from ghadi, squee, was doing the same (probably the first lib doing that in the wild that I know of)
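As a rough sketch of that pagination pattern using iteration's keyword options (:vf, :kf, :somef, :initk); fetch-page! and its response shape below are made up:
(defn fetch-page! [token]        ; stand-in for a real paginated API call
  (let [page (or token 0)]
    (when (< page 3)
      {:items      [(* 10 page) (inc (* 10 page))]
       :next-token (when (< (inc page) 3) (inc page))})))

(def pages
  (iteration fetch-page!
             :vf :items          ; what to yield for each response
             :kf :next-token))   ; continuation token for the next call

(into [] cat pages)              ; nothing is fetched until this reduction runs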
I remember an interesting argument from tim baldridge. He argued that when several functions work together like this he preferred making a protocol. I think it was in a video rather than an article but i'd like to rewatch that.
i think it was on the video site he was publishing on but i don't think it was a core.async talk specifically
instead of: Fetch Page 1, Consume Page 1, Fetch Page 2, Consume Page 2, ... you can: Fetch Page 1, Fetch Page 2, Fetch Page 3, ... Consume Page 1, Consume Page 2, ...
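One rough way to do that with core.async (a sketch; fetch-page! is hypothetical, and the buffer size controls how far ahead fetching runs):
(require '[clojure.core.async :as a])

(defn prefetch-pages
  [fetch-page! page-ids]
  (let [out (a/chan 4)]          ; buffer of 4 = fetch up to 4 pages ahead
    (a/thread                    ; real thread, so blocking IO is fine
      (doseq [id page-ids]
        (a/>!! out (fetch-page! id)))
      (a/close! out))
    out))

(comment
  ;; consume while fetching continues in the background
  (let [ch (prefetch-pages my-fetch-page! [1 2 3 4 5])]
    (loop []
      (when-some [page (a/<!! ch)]
        ;; consume page here
        (recur)))))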
oh nice. I've been meaning to ask if there was a change in how Cognitect allocates time to Clojure dev work versus time to consulting jobs following the acquisition. Curious if the amount of time allocated has changed
Yeah that’s^ the only solution I’ve seen that kind of addresses all the issues w/ streaming inputs.
in clojure, do namespaces have to follow the directory structure? I don't quite understand why they are needed if they just mirror the way the directory is structured
if you use load-file your code could come from anywhere, but require needs the classpath relative path to match the ns
this includes non file resources, eg entries in jars
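A rough illustration of the mapping require relies on (the path and ns name below are made up; assuming src is on the classpath, and dashes in the ns name become underscores on disk):
;; file: src/my_app/util/strings.clj
(ns my-app.util.strings)

;; (require 'my-app.util.strings) looks for my_app/util/strings.clj
;; (or .cljc, or a compiled class) relative to the classpath - including
;; entries inside jars.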
I wrote a post about this recently https://code.thheller.com/blog/shadow-cljs/2021/05/13/paths-paths-paths.html
if I may paraphrase the question: couldn't the namespace just be implied from the directory structure? no chance of mismatch
I guess this is a minor thing, but I do understand where the question is coming from.
not files but require
one resource can define any number of namespaces that don't match its path, but we don't do that because it would be terrible
@noisesmith that is less true for ClojureScript where I think @zuwadihi is coming from, although he or she didn't specify that. correct me if I'm wrong @zuwadihi
anyway, I do think it contributes to clarity to not derive this info from the dir structure ;)
you can also scatter a single namespace over multiple files; clojure itself (clojure.core is split over multiple files, clojure.pprint, etc.) is the only large project I've seen do that
It even has partial classes where you can later define other parts of classes in other files
In C#, this was commonly used to facilitate codegen workflows, especially IDE-based ones where part of the code was modified directly by the tooling. It was a significant help for these workflows, especially when used judiciously. Since Clojure has macros, you don't need to do that.
Are you implying that this is no longer a commonly used feature in C#, @russell.mull?
I'm a bit newer, and more familiar with how things work on the js side of things, where `require` works more like `load-file`
@borkdude "Was" is strictly in my experience... I haven't seriously used C# in about a decade. So I suppose the past-tense is not necessarily appropriate :)
@zuwadihi Clojure is more dynamic than CLJS with respect to namespaces, vars, eval, etc.
which might be a blessing in disguise really on the CLJS side, at least for static analysis, etc.
yeah I would imagine less dynamic namespacing would work a bit better with static analyzers
Clojure does not seem to respect `:refer-clojure :exclude`:
(ns user
  (:refer-clojure :exclude [compile]))
(defn compile [] 42)
$ clj
WARNING: compile already refers to: #'clojure.core/compile in namespace: user, being replaced by: #'user/compile
$ clj --version
Clojure CLI version 1.10.3.943