
Someone in here recently said something like, “Don’t do I/O in a transducer, but you can do it in a reducing fn.” Am I remembering that right? Anyone have a rationale or a source for that?


here's the thread


> Doing IO in a transducer is putting a suspendible, memory allocating, and long running operation on a background thread pool designed for the opposite.

IIUC this is regarding the use of transducers on chans (mostly). I/O doesn’t line up semantically with a transducer, but there’s no functional issue with doing I/O on a local into xform. Is that right?


Not sure what you mean by "functional issue" exactly, but doing IO in a transducer will work. Just like building your whole app out of singletons. ;)


(Local being an operative word here. Locality means this xform cannot be reused in another context.)


“functional issue” meaning, it will 100% work, without exception


Right. The same 100% you'd get when doing IO elsewhere, assuming you know exactly when and where that IO ends up being done.

✔️ 2

I guess it falls into the "you shouldn't but you can"


if you're the only consumer and you know what you're doing


Right, I mentioned in the thread that “local” was an important word in this context


Cannot reuse a local xform 😄


Yeah, I’m not really recommending this as a pattern. I just wanted to make sure I fully understand what’s being said.


same, I don't think I would do this


Although, I am curious what people do when they need to chain I/O calls (e.g. for pagination). I’ve seen—and been bitten by—`lazy-seq`. `(into [] (mapcat fetch-page) pages)` works, but breaks said rule. Is everyone really loop/recurring around this problem?


I guess you could say “use core.async”.


there's a nice function coming for 1.11 for this


Yeah I’ve heard of such an animal.


You happen to have the source for it?


the one from ghadi?


Hold on. Why is `(into [] (mapcat fetch-page) pages)` breaking the rule? In the end, `fetch-page` becomes a part of the reducing function - it's not called in the transducer itself.


I apparently misunderstand, so you might need to define things in baby steps for me 🙂


This is the transducer arity of map:

    (fn [rf]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result input]
         (rf result (f input)))
        ([result input & inputs]
         (rf result (apply f input inputs)))))

`f` is your `fetch-page`. `(fn [rf] ...)` is the transducer. The inner `(fn ...)` is the reducer. `f` is not called in the transducer, it's called only in the reducer.
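To see it in action, here's a small sketch: `(map inc)` returns a transducer of exactly that shape, with `f` bound to `inc`.

```clojure
;; (map inc) returns a transducer like the one above, with f = inc
(def xf (map inc))      ; no call to inc yet - just the transducer
(def rf (xf conj))      ; still no call to inc - composing the reducer
(reduce rf [] [1 2 3])  ; inc runs only now, during the reduction
;; => [2 3 4]
```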


K, so your argument is, “the last step of a transducer chain is technically part of the reducer” is that correct?


The last step of a transducer chain is another transducer. into [] reduces its input using the transducer chain. Transducers don't even know about the into.


`(mapcat fetch-page)` - this is your whole transducer chain in the example above. This is another example:

    (comp
      (mapcat fetch-page)
      (filter seq))

No reduction is going on anywhere in that code. `fetch-page` is never called. It's only called when you reduce over some input with one of those transducers.
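A small sketch to make that concrete; here `fetch-page` is a made-up stand-in that just counts how many times it runs:

```clojure
(def calls (atom 0))

(defn fetch-page [k]   ; fake stand-in for a real IO call
  (swap! calls inc)    ; imagine real IO here
  [k])

(def xf (comp (mapcat fetch-page)   ; fetch-page is NOT called here
              (filter some?)))
;; at this point @calls is still 0 - composing the chain did no IO

(into [] xf [1 2 3])
;; only now did fetch-page run: @calls is 3
```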


Transducers are similar to the threading macros, only they're applied at runtime and specifically to reducing functions.


The reducer here is part of into is it not? It just conjs items into a vector.


Are you suggesting that the only I/O you should avoid is during initialization of a transducer chain?


iiuc that’s what the (fn [rf] ...) form does—initializes the chain.


Because that does not line up with my understanding of what’s being said.


The reducer is part of into, yes. Both the "transducer" and the "reducer" have very precise definitions, as far as I can tell. The initial thesis explicitly states that one should not do IO in the former and instead should do it in the latter. So yes, you should not do IO when turning one reducer into another in a transducer's body.


In other words, the example with (into [] ...) above is completely fine, in the context of the initial thesis.


> you should not do IO when turning one reducer into another in a transducer’s body

Let me make sure we’re clear here (because I’m not certain about the definitions). The `(fn [rf] ...)` form is called exactly once, immediately prior to execution, is this correct?


And, once that form is called on an xform, a fn chain is returned that will be used to do the transformation.


In the context of the above example with into - yes. In some other situation, it might also be called immediately, but the resulting reducer execution might be delayed.


Does that matter for your thesis?


Absolutely not.


And the thesis is not mine. :) It's from that thread. I'm just trying to convey its meaning using the existing definitions.


So what you’re saying is that as long as you return that fn chain w/o doing any I/O, you’re good, is that correct?


Yeah yeah, you’re the only one explaining this to me tho 😄


Ok, my understanding was: Doing I/O in a mapcat was bad because, for example, you can hand it to a chan and then you’re doing I/O on the chan’s threadpool—which is reserved for coordination.


It had nothing to do with initialization, and everything to do with where an I/O fn happens to be called.


Not "you're good", but "you're not bad" from the perspective of the thesis. You can still get screwed by sticking that transducer into sequence just because now all your IO is lazy - when you might not expect it.


Yes. But I wouldn't conflate it with channels. In principle, computation composition should be pure. That's it, that's all there is to it. You don't want your -> macro to start doing DB queries.


(the -> macro analogy is only muddying the water for me 🙂 )


From impurity in that context all sorts of nasty things might pop up, including hogging the main thread or a thread from a reserved/limited pool.


I mean, I think we agree about the usage (It’s fine. It will work.). But I have no idea how you perceive that my example is in line with the original statement.


Because in your example there's no IO done when the transducer is called. That's it.


yeah, but you later said that, “well it’s not good to hand that same transducer to sequence because it’s lazy”


So the xform is busted in some sense.


Because it doesn’t follow the rule.


It's a completely different topic - you can get screwed by lazy sequences if you use them. The transducers are fine, the reducer is fine. But laziness might bite you if you are not careful.


"IO in transducers" is orthogonal to "IO in lazy contexts". Above, I was just nitpicking at the wording "you're good". Because you aren't magically out of the water if you do IO in a reducer - because other things might happen to you, still.


I think you might be off in your definitions of “transducer” and “reducer”


> The composed xf transducer will be invoked left-to-right with a final call to the reducing function f. In the last example, input values will be filtered, then incremented, and finally summed.


I’m pretty sure the reducing form is just (fn [result input] …)


e.g. that sentence talks about the “composed xf transducer” as the thing that’s returned from the initializer


I use the definition exactly from the "Terminology" section of that page. You quoted the section that uses transduce - it calls both the transducer and the resulting reducer. Take a look at its implementation.


actually, I’m pretty sure the reducing fn is, e.g. the last thing you pass to transduce


sry the 3rd arg to transduce


transduce != "a transducer".


I know that


So how is the section about transduce relevant here?


The thesis operates on transducers and reducers. They are both well defined. transduce is a separate entity.


I’m saying that the third arg to transduce—for example—is the only reducer in the form.


It takes an xform and a separate reducing fn. Nothing in the xform arg would be called a “reducer”.


I assume you mean the second arg. Because these are the arities of transduce:

    [xform f coll]
    [xform f init coll]

`f` is a reducer, yes. `xform` is a transducer. The very first thing transduce does is replace `f` with `(xform f)` - actually executing the transducer, the step where IO should not be done. And that's my point exactly. Transducers do not call the reducer. It's completely fine to have IO in `fetch-page` and use it in `(mapcat fetch-page)`, because `fetch-page` will not be called there. It will be called only during the reduction phase, whenever it happens.
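A simplified sketch of that mechanic (the real transduce also handles IReduceInit and other details):

```clojure
;; simplified sketch of transduce: compose first, then reduce
(defn my-transduce [xform f init coll]
  (let [rf (xform f)]             ; the transducer runs here - pure, no IO
    (rf (reduce rf init coll))))  ; reduction + completion - IO would run here

(my-transduce (map inc) + 0 [1 2 3])
;; => 9
```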


sry yes, second arg


Ok, so your argument is entirely: Because it’s eager, it works just fine.


Is that true?


I think you already said that, but I’m removing a lot of the details to see if we agree 😄


i.e. because into immediately starts reducing everything, by definition, my transduction is in the reduction phase.


It's not about eagerness. I'm sorry I've brought up lazy collections - forget about them. Forget about being eager either. And about threads and channels. There are two things of interest going on:
• Computation composition (always higher-order functions - transducers in our case, but could be comp, partial, complement, etc). The way to define a computation in advance, without running it
• Running the computation (reduction in our case)
The first step should only deal with what it's named after. Compose the "recipe" out of existing functions. It should not "do" anything. The second step should do all the "doing". The first step can be removed from the second in any sense of that word - they can be separated in time, in place, or even in languages. And it's fine exactly thanks to the first step not "doing" anything.
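The same two steps with plain function composition, as a tiny sketch:

```clojure
(def recipe (comp inc #(* 2 %)))  ; step 1: compose - nothing runs yet
(recipe 3)                        ; step 2: execute - the work happens now
;; => 7
```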


I understand fn composition vs execution.


The whole point of this is that during execution you shouldn’t do I/O if you’re on a chan.


AFAICT it has nothing to do with composition itself.


What do you mean by "on a chan"?


using the (chan xform) form


iow — my understanding of the original statement is, “Because you don’t know where execution is going to happen, you shouldn’t assume it’s okay to do I/O anywhere in an xform. They’re designed to be portable, and I/O is fundamentally anti-portable.”


Your statement is a specific case of a more general description that I've attempted to provide.


Yeah this has the same lazy-seq issue that I was mentioning before.


and the alternative is loop/recur
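For comparison, a hedged sketch of the loop/recur version; the `:items`/`:next` page shape here is made up for illustration:

```clojure
;; eager pagination with loop/recur, assuming each page looks like
;; {:items [...] :next <key-or-nil>}
(defn fetch-all [fetch-page first-k]
  (loop [k first-k, acc []]
    (let [{:keys [items next]} (fetch-page k)
          acc (into acc items)]
      (if next
        (recur next acc)
        acc))))
```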




well, idk, at least it takes care of some of the machinery in a single fn call


What’s the issue? It offers both looping and lazy seqing


Lazy seqs are busted if they ever hit an exception


They happily return whatever was realized up to the point of exception.


So you have to be careful in how you use them.
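For reference, a hedged sketch of that coming function, `iteration`, assuming the keyword options it shipped with in 1.11 (`:kf`, `:initk`, etc.); `pages` and `fetch-page` are made-up stand-ins for a real paginated API:

```clojure
(def pages {0 {:items [:a :b] :next 1}
            1 {:items [:c]    :next nil}})

(defn fetch-page [k] (get pages k))  ; imagine an HTTP call here

(into []
      (mapcat :items)
      (iteration fetch-page
                 :kf :next    ; how to get the key for the next call
                 :initk 0))   ; key for the first call
;; => [:a :b :c]
```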

Maciej Szajna 11:09:04

This is really cool stuff! I'm under the impression you could simplify the contract a lot: `(iteration step! {:vf vf :kf kf :some? s? :initk initk})` is the same as

    (->> (iteration (comp (fnil step! initk) kf))
         (take-while s?)
         (map vf))

Maciej Szajna 11:09:36

(sorry for the edits, newlines on slack are hard)

Maciej Szajna 11:09:00

Well, it's mostly the same maybe, with kf now having to handle nils and all. But the point is still that most of these extra keys could be handled elsewhere. I guess I'd keep initk for this to correspond to reduce nicely.


it's an abstracted version of what's in almost every lib that has to handle pagination (like the common jdbc drivers & other db libs)


I think next.jdbc does this internally, the c* driver too, and the early jdbc driver from ghadi, squee, was doing the same (probably the first lib doing that in the wild that I know of)


I remember an interesting argument from tim baldridge. He argued that when several functions work together like this he preferred making a protocol. I think it was in a video rather than an article but i'd like to rewatch that.


isn't it in one of his core.async talks?


I found his way a bit odd, but maybe that was just the way it was presented


i think it was on the video site he was publishing on but i don't think it was a core.async talk specifically


i think it is this one;


but i stopped my subscription a while back.


there's a core.async version brewing @mpenet @dpsutton


that fetches 'pages' with concurrent run-ahead of the consumption


instead of: Fetch Page 1, Consume Page 1, Fetch Page 2, Consume Page 2 ... you can: Fetch Page 1, Fetch Page 2, Fetch Page 3 ... Consume Page 1, Consume Page 2, ...
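The run-ahead idea can be sketched without core.async too, using only a JDK blocking queue; `fetch-page` and the page shape are assumptions, and `(fetch-page nil)` is assumed to return nil:

```clojure
(import '[java.util.concurrent ArrayBlockingQueue])

;; hedged sketch: a producer thread fetches pages ahead of consumption
(defn page-queue [fetch-page first-k]
  (let [q    (ArrayBlockingQueue. 2)  ; capacity = run-ahead depth
        done (Object.)]               ; sentinel marking the end of pages
    (future
      (loop [k first-k]
        (if-some [page (fetch-page k)]
          (do (.put q page)           ; blocks when 2 pages are unconsumed
              (recur (:next page)))
          (.put q done))))
    (fn next-page []                  ; call repeatedly; nil when exhausted
      (let [x (.take q)]
        (when-not (identical? x done) x)))))
```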


oh nice. I've been meaning to ask if there was a change in how Cognitect allocates time to Clojure dev work versus time to consulting jobs following the acquisition. Curious if the amount of time allocated has changed


Yeah that’s^ the only solution I’ve seen that kind of addresses all the issues w/ streaming inputs.


But it’s kind of a pain for anyone who doesn’t core.async :allthethings:




in clojure, do namespaces have to follow the directory structure? I don't quite understand why they are needed if they just mirror the way the directory is structured


that's actually a good question :)


if you use load-file your code could come from anywhere, but require needs the classpath-relative path to match the ns


this includes non file resources, eg entries in jars


it's about CLJS but applies to CLJ as well


if I may paraphrase the question: couldn't the namespace just be implied from the directory structure? no chance of mismatch


well technically you don't need files at all


you can build your entire program one form at a time


sure, but if you do use files


I guess this is a minor thing, but I do understand where the question is coming from.


not files but require


one resource can define any number of namespaces that don't match its path, but we don't do that because it would be terrible


@noisesmith that is less true for ClojureScript where I think @zuwadihi is coming from, although he or she didn't specify that. correct me if I'm wrong @zuwadihi


anyway, I do think it contributes to clarity to not derive this info from the dir structure ;)


you can also scatter a single namespace over multiple files; clojure itself (clojure.core is split over multiple files, as is clojure.pprint, etc.) is the only large project I've seen do that


I wish it didn't though


Btw, C# also has this feature I think, it's way less strict than Java


It even has partial classes where you can later define other parts of classes in other files


Why would you want to do that, though?


Seems like a recipe for unreadable spaghetti soup.


Exactly, don't do it, it complicates tooling


oh i forgot about partial classes

Russell Mull 16:09:05

In C#, this was commonly used to facilitate codegen workflows, especially IDE-based ones where part of the code was modified directly by the tooling. It was a significant help for these workflows, especially when used judiciously. Since Clojure has macros, you don't need to do that.


yes, im coming from the cljs side, but thought it should apply the same here as well


Are you implying that this is no longer a commonly used feature in C#, @russell.mull?


im a bit newer, and more familiar with how things work on the js side of things, where require works more like load-file

Russell Mull 16:09:59

@borkdude "Was" is strictly in my experience... I haven't seriously used C# in about a decade. So I suppose the past-tense is not necessarily appropriate :)


Same here ;)


@zuwadihi Clojure is more dynamic than CLJS with respect to namespaces, vars, eval, etc.


which might be a blessing in disguise really on the CLJS side, at least for static analysis, etc.


yeah I would imagine less dynamic namespacing would work a bit better with static analyzers


Clojure does not seem to respect :refer-clojure :exclude:

    (ns user
      (:refer-clojure :exclude [compile]))

    (defn compile [] 42)

    $ clj
    WARNING: compile already refers to: #'clojure.core/compile in namespace: user, being replaced by: #'user/compile
    $ clj --version
    Clojure CLI version


User is magic

👍 2

The user namespace is already created and set up, referring in all of clojure.core, before your file is loaded


:exclude just stops that ns form from referring it in, it doesn't remove it if already referred
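A hedged workaround for an already-set-up ns like user: drop the referred var explicitly before redefining it.

```clojure
(ns-unmap *ns* 'compile)  ; remove the referred clojure.core/compile
(defn compile [] 42)      ; no WARNING now
(compile)
;; => 42
```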