
Is there a way to short-circuit the keep-indexed transducer?


do you mean like this?

(into [] (comp (keep-indexed vector) (take 5)) (range 100))

👍 3

yes, that's how take and halt-when work. What you can't do is return something wrapped in reduced from the function you pass to map or keep-indexed and expect it to stop processing the rest of the data. So this won't stop at 3:

(into [] (map (fn [x] (if (< 3 x) (reduced x) x))) (range 10))


pretty much all the reduce-looking things in core support reduced, like transduce and reductions and so on ... so this will result in 3

(reduce (fn [r i] (if (< 3 i) (reduced r) i)) (range))
rather than running through the infinite seq forever


user=> (into [] (keep-indexed (fn [idx i] (reduced idx))) (range 3))
[#object[clojure.lang.Reduced 0x7c6189d5 {:status :ready, :val 0}] #object[clojure.lang.Reduced 0x4248e66b {:status :ready, :val 1}] #object[clojure.lang.Reduced 0x3e6534e7 {:status :ready, :val 2}]]
I must be doing it wrong


Oh, @U0P0TMEFJ already said that 😄


I'm not sure I'm following, nothing here seems to work for me


Well, except for halt-when.


(into []
      (keep-indexed
       (fn [idx item]
         (println idx)
         (when (= 3 item)
           (reduced idx))))
      (range 100))
This prints all elements. So I can only assume the keep-indexed didn't short-circuit.


@U064X3EF3 Are you sure you are supposed to be able to use reduced with keep-indexed transducer? Is it a bug then?


yes @U0K064KQV that's exactly correct behaviour. You can't just return a reduced element into a collection; you have to return the whole collection via reduced.
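To make that concrete, here is a minimal sketch at the plain-`reduce` level (the name `take-until-big` is hypothetical, not core API): the reducing function wraps the whole accumulated result in `reduced`, not the individual element.

```clojure
(defn take-until-big
  "Collect elements until one exceeds `limit`. Illustrative sketch only."
  [limit coll]
  (reduce (fn [acc x]
            (if (> x limit)
              (reduced acc)   ; short-circuit: wrap the WHOLE result so far
              (conj acc x)))
          []
          coll))

(take-until-big 3 (range 100))
;; => [0 1 2 3]
```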


that's what halt-when, `take`, `take-while`, etc. do
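If I remember halt-when's default contract right, with no `retf` it returns the halting input itself, and transduce's completing step unwraps the sentinel so the whole reduction stops:

```clojure
;; halt-when wraps the halting value in a sentinel map that transduce's
;; completing arity unwraps, short-circuiting the whole reduction
(transduce (halt-when #(> % 4)) conj [] (range 10))
;; => 5
```

Note this is demoed with transduce rather than into; I believe halt-when composes poorly with into because of its transient handling.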


if you're trying to compose something like take-while after a keep, remember that keep expands to a `map` and a `remove nil?` so

(into [] (comp (keep-indexed (fn [idx item] (prn idx) (when (< idx 3) item)))
               (take-while (complement nil?)))
      (range 5))
is the same as saying "map all the elements, and when we've seen more than 3 return nil, then remove all the nils, then stop if we see nil" ... which will consume the whole range ... right?


Yes, right now keep-indexed works like: keep ALL indexed items which match pred. But it'd be nice to be able to say: keep UP-TO the reduced index which matches pred. Because that's one downside when switching to the transducer; something like this:

(keep-indexed
 (fn [idx item]
   (println idx)
   (when (= 3 item)
     idx))
 (range 100))
will only consume up to chunk-size elements until it finds the first non-nil thing that is kept. But the transducer will actually be consuming the full list no matter what.


Hum, ok actually maybe I'm wrong, this does seem to work, which I saw you had showed before but had missed:

(into []
      (comp (keep-indexed
             (fn [idx item]
               (println idx)
               (when (= 3 item)
                 idx)))
            (take 1))
      (range 100))


yeah ... if I just wanted the first thing out of a list, with a transducer, I'd compose (take 1) in there, rather than adding first ...


Ah, that's what Alex meant by you can use reduced; it will short-circuit if a nested transducer returns reduced to it


also, with transducers, the laziness is about how you apply that transducer


(first (sequence (keep-indexed (fn [idx item] (prn idx) (when (< idx 40) item))) (range 50)))
so that will behave like the lazy-seq version of keep-indexed and consume a chunk at a time


if you're using into you're producing a fully realised vector, then taking the first one, which is why it consumes the whole sequence
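A small sketch of that difference, counting how many elements actually flow through the xform (the `seen` atom is just instrumentation for the demo):

```clojure
(def seen (atom 0))

;; `into` realizes the full vector, so every element flows through the xform
(into [] (map (fn [x] (swap! seen inc) x)) (range 100))
(println "into saw:" @seen)      ; 100

;; `sequence` is incremental: taking the first element realizes only a
;; chunk's worth of input, not the whole range
(reset! seen 0)
(first (sequence (map (fn [x] (swap! seen inc) x)) (range 100)))
(println "sequence saw:" @seen)  ; far fewer than 100
```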


It's not exactly the same; this is still iterating through each transducer 1 element at a time, whereas lazy-seq will iterate 32 at a time per sequence function


well ... it'll do enough work to produce 32 elements in the lazy seq being returned by sequence


but yes, it's not producing intermediate lazy-seqs in between each step


is that what you mean?


Ya, but it will still do reduce-like behavior. Like if you have (comp A B C) it takes ele1 and sends it through A, then B, then C. Then it takes ele2 and sends it through A, B and C, etc. Whereas with lazy-seq, if you have (->> coll A B C) it will take 32 elements and run them all through A, then send the 32 results out of A to B, and then 32 out of B to C
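You can observe that ordering directly by recording each step in an atom (a demo sketch; the `:a`/`:b` tags are arbitrary):

```clojure
(def order (atom []))

;; transducer stack: each element flows through A then B before the next one
(into [] (comp (map (fn [x] (swap! order conj [:a x]) x))
               (map (fn [x] (swap! order conj [:b x]) x)))
      [1 2])
(println @order)  ; [[:a 1] [:b 1] [:a 2] [:b 2]]

;; lazy seqs: each map processes the whole (chunked) input before the next
(reset! order [])
(doall (->> [1 2]
            (map (fn [x] (swap! order conj [:a x]) x))
            (map (fn [x] (swap! order conj [:b x]) x))))
(println @order)  ; [[:a 1] [:a 2] [:b 1] [:b 2]]
```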


yeah ... which is why transducers are more efficient ... cos they're not producing all those lazy seqs ... right?


Unless you use some special transducers that collect things, like partition-by. And I thought keep-indexed did too, but I was wrong.


Ya, because each batch of "32" is wrapped in an extra object container, and the creation of that object and garbage collection is what slows down lazy-seq


I mean, its not just the batch, like even unchunked seq will wrap the single element in an extra object.


So using (sequence ...) with transducers should still be more performant than lazy-seq, because it will only create a chunk for each 32 results of the full chain, not for any of the intermediate ones


It's what this sentence from the guide basically means: > The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences.


Anyways, thanks! You gave me my answer


I was hoping returning a (reduced) would do the trick, but it seems not

Jim Newton09:08:23

I discovered something nice that seems to accidentally work. One of my randomly-generated test suites was causing a java.lang.StackOverflowError exception. I wanted to know what input data was triggering the error. So I set up the following, to catch, warn, and rethrow the exception

(try (unary-test-fun data)
     (catch java.lang.StackOverflowError e
       (cl-format true "~&Stack overflow on ~A~%" data)
       (throw e)))
I was half expecting it not to work, but it seems to work beautifully, at least in my case.


Try/Catch->Rethrow is something commonly done in Java and C# as far as I remember :thinking_face:


I would say that is supposed to work (not only accidentally)


It is definitely supposed to work, and it's certainly common. One of the common uses of it (at least on my long term codebase at my job) is to log evidence of a certain problem (the exception) when you can't trust the calling code to properly deal with the exception.


Building a JAR for a Clojure library with tools.deps; no :deps in my project's deps.edn and no aliases engaged; running clojure -T:build jar with a stock jar task (copied from fogus' blog post), the final pom.xml file still has a dependency on org.clojure/clojure (version 1.10.1, which isn't even the one in the :deps of my root deps.edn file). I can manually dissoc org.clojure/clojure from the :libs of the basis to keep it out of the final pom, but I'm wondering if I'm overlooking something simple, or if there's an expectation that even libs now have a dependency on Clojure and consumers can just override deps (including Clojure) as they see fit.


that is happening because org.clojure/clojure is listed as a dependency in the root deps.edn provided by the Clojure CLI. To avoid using it you can specify the option :root nil for the create-basis function:

(b/create-basis {:project "deps.edn" :root nil})

☝️ 3
thanks3 3

@U04V4KLKC thank you! that solves my problem. I must be mistaken about which deps.edn is my root one, because the Clojure version being included isn't the same as the one specified at /usr/local/lib/clojure/deps.edn but that's a different mystery to solve


there are three “main” deps.edn files:
• the clojure cli specific one
• the one from the user’s home
• your project’s deps.edn
Run clj -Sdescribe and look at :config-files and you will see full paths to those files. The create-basis function allows you to override each of them:

thanks3 3

I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries


sure, but root deps.edn is declaring org.clojure/clojure {:mvn/version "1.10.3"} which could be too high version for distribution of some library


does clj really pick a higher versioned deeper dep over a lower versioned top level one?


I'd consider any behavior other than using the version explicitly in your deps file a bug, and not declaring a version is asking for trouble


hm… no, probably I had something different that influenced which version to use for pom.xml


> and not declaring a version is asking for trouble
In my head it was always expressed in the form: “clojure is a library for java” -> so it can load other “clojure” libraries -> so those libraries should not declare a dependency on some version of clojure. I got this impression after looking at a number of “contrib” libraries such as data.json


> I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries I prefer not to include Clojure as a dependency for Clojure libraries I distribute, with the expectation that consumers will either use the Clojure version of their system or specific project. Every Clojure library specifying a Clojure dependency adds to the noise of dependency trees and their resolution. Maven's dependency scopes provide a story for "indicate that my code uses this dependency, but expect the consumer to provide a concrete dependency on it downstream", but I believe dependency scopes of this nature are intentionally not supported by tools.deps


Ya, tools.deps always picks (or (declared version in deps.edn) (highest version))


@U11SSJP2A Chiming in late here. It's fairly typical for Clojure libraries to list as a dependency the minimum version of Clojure they work with. Sure, not all libraries do that -- some just assume Clojure will be "provided" -- but I think it's a good idea if there are (earlier) versions of Clojure a library will not work with.


@U04V70XH6 That's certainly a fair reason and in my case applies. Thanks!


@U04V4KLKC The root deps.edn (built-in for t.d.a) declares a default dependency for whatever is the current version of Clojure when that version of t.d.a was released. So "by default" when using the CLI, you get a "recent" version of Clojure -- and that's reflected in the version of the CLI itself: -- by default uses Clojure 1.10.3. See -- there were a few 1.10.2.x versions earlier this year and it was 1.10.1.x all of last year.


What’s the preferred EDN serialization/deserialization ? pr-str and clojure.edn/read-string ?


that's a fine combination; I tend to use pr-str together with


why instead of clojure.edn?


tools.reader README has a nice rationale that says it better than I could rephrase it:

Jim Newton11:08:22

trying to debug a stack overflow problem. Is setting the maximum stack depth something I can change from Clojure or do I have to add some flag to the :jvm-opts of my project.clj file?


stack size is a global JVM property. You can configure it by passing -Xss100M as an example


but instead of increasing the stack size I can recommend changing the code so it won't consume stack. There are some handy functions in clojure.core: trampoline as an example

👍 3
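trampoline handles mutual recursion too; here's a minimal sketch (the `my-even?`/`my-odd?` names are made up for the demo): each function returns a thunk instead of calling the other directly, so the stack stays flat no matter how deep the "recursion" goes.

```clojure
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

;; trampoline keeps calling the returned thunks until a non-fn comes back
(trampoline my-even? 1000000)
;; => true
```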

Setting Xss100M is bonkers. Here's a more reasonable Xss (plus a comment on how to figure out a good value for your machine), and XX:MaxJavaStackTraceDepth, which is also relevant


cool comments! thumbsup_all thanks!

🍻 3
Jim Newton11:08:15

My suspicion is that the lazy functions are triggering the stack overflow. I recently refactored lots of functions to return lazy lists. This means that functions which do not appear to be heavy stack users, all of a sudden become compute intensive. For example (first (rest ...)) now has to compute the 2nd element of the sequence. Anyway, currently this is only a suspicion. Maybe my bug is elsewhere, or maybe I really have introduced a logical bug in the lazy-list refactoring.

Jim Newton11:08:17

@, I haven't used trampolining yet. It was my impression that it is intended for direct recursion, not for meta-circular dependencies.


if you're getting stack overflows with lazyness, have you seen this?

delaguardo11:08:30 yes, laziness might bring such problems. Here is a post about this

👍 3
😭 3
Jim Newton11:08:07

@U45T93RA6 is the intent of your post that I put a :jvm-opts section into my project.clj file?


hahaha I was about to post that link too. Talk about hive minds 🐝


yes, profiles.clj has the same syntax as project.clj; you'd simply have to copy the Xss and XX:MaxJavaStackTraceDepth entries

✔️ 3
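For reference, such an entry might look like this in a profiles.clj or project.clj (the specific values here are illustrative assumptions, not recommendations):

```clojure
;; illustrative values only: pick an -Xss suited to your machine and workload
{:jvm-opts ["-Xss4m"
            "-XX:MaxJavaStackTraceDepth=1000000"]}
```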

> It was my impression that it is intended for direct recursion, not for meta-circular dependencies.
not necessarily. look at this example -

Jim Newton12:08:28

the tldr of is to avoid concat, if I understand correctly. I'm not really using concat directly, but I am using several calls to mapcat, which internally uses concat. And in my recent refactoring I created my own lazy/mapcat which is based off my lazy/concat ... with the goal of 1-chunking rather than Clojure's default 32-chunking.

👍 3
Jim Newton12:08:08

the motivation being that 32-chunking is intended to optimize long thin sequences. Instead my application has short fat sequences.

Jim Newton12:08:09

@U04V4KLKC, yes nice example. But it still seems to me trampolining is when a closed set of functions need to call each other in a lexically concise way. In my case I have several generic functions which operate on trees whose nodes are sequences of other trees. Many operations are dependent on other operations, and even some operations are defined in different namespaces. So while it might be possible to refactor to use trampolining, it is not apparent to me how to do so.

Jim Newton12:08:22

That being said, it certainly would be nice if the clojure compiler knew how to efficiently compile tail calls of functions defined within the same letfn . That's not the problem I'm facing here, but it would be an interesting optimization.


As a quick observation, sometimes a SO error doesn't really indicate a categorical flaw in your code... Clojure programs are hungrier than Java programs, so the JVM default settings don't always fit. This expresses itself quite often in programs using walk, but also with various other functional patterns. I've seen this in well-known libs; there was no bug, one simply has to set Xss intentfully.

Jim Newton12:08:46

@U45T93RA6 good to know. In my case since I just did a big refactoring, I have to really consider whether I did in fact introduce subtle bugs into the program.

👍 3
Jim Newton12:08:43

Here are the lazy functions I am using. I have some unit tests which do sanity checks to assure that the functions behave with the same semantics as the clojure.core functions they eclipse. However, there may indeed be hidden greedy stack consumers in there.

Jim Newton12:08:49

@U45T93RA6 with these changes you suggested, I'm still getting stack overflows, but now the stack traces are vvvvveeeeeeerrrrrrrryyyyyyyy long. Is there a way to tell clojure to prune the stack trace it prints?


You can undo or tweak XX:MaxJavaStackTraceDepth; it only affects reported stacktraces, nothing else. I do find large values for it useful. Often with SOs the first few thousand entries will be repetitive and will hide the root ns that is invoking that code in the first place


When exactly does the garbage collector free memory? When references no longer exist? From time to time? In some kind of random way? The point is I see heap usage on a chart and I am wondering whether it reflects the current need for memory or also includes a significant amount of data which is no longer needed, but which the GC didn't remove yet.


I think most GCs don't free memory at all, in the sense that their Linux process' memory can only grow or keep its size. G1GC does free memory as soon as it performs a GC


I don't understand the second paragraph, you might want to reword it


> I think most GCs don’t free memory at all, I don’t understand what you mean


oh ok, to be precise:


I mean free memory inside Java application, not for system

👍 3

I am thinking how to interpret used heap: 1) heap which is currently used by the app, to which the app refers 2) the point above + memory which is not needed anymore, but which the GC didn't free yet


So take code which used a 1GB vector in a function which has ended. This data is not used by the app anymore. Is this 1GB still in the used heap, waiting for the GC to remove it? I think yes. For how long?


> When exactly garbage collector free memory? When references no longer exist? From time to time? In some kind of random way?
Answering again then: it will depend on the choice of GC and its parameters (which can be many). I guess a sensible tldr is that if the GC thinks you're about to run out of memory, and it's good timing to perform a GC, then it will do so. Overall it's a non-deterministic process, although (System/gc) can nudge it for the sake of experimentation.


> GC thinks you’re about to run out of memory, and it’s good timing to perform a GC, then it will do so
Yes, this one for sure when it is close to out of memory. But what about before? Does it run every 15 minutes or something like that?


Without that knowledge I don’t know how to think about heap usage


But hmm. Maybe I just have to accept that I can't know what the memory heap contains, I mean how much old data.


I don't think any GC will have a hard-and-fast rule like "every 15m" or such. They're really complicated programs (which is why runtimes other than the JVM have subpar GCs)


ok, so then it means I can't really use heap usage as a way to know how much memory the app needs at that moment


In your screenshot, Used Heap includes used and unused object references. I know this by pure logic: the graph descends from time to time (once per each performed GC), which implies it accumulates garbage as the program runs


exactly, so I can’t rely on this to debug memory usage

Colin P. Hill13:08:31

If you google around, you should be able to find tools for dumping JVM memory and then exploring the dumped snapshot. These tools aren’t super easy to use, but if you need to know what’s eating up your memory, they’re a good way to explore it.


In my very limited understanding of memory usage that graph looks normal to me (depends on exactly what it’s doing at the end there though). What are you trying to debug?


@U029J729MUP I am waiting for cannot allocate memory to get a heap dump file. But I am really not sure if it will help me. Debugging this is hard with very limited information. Especially with anonymous functions, which are named in a way where you really don't know what part of the code it is.


@U0VP19K6K I am trying to fix “Cannot allocate memory”


Is that what happens at the end of the graph?

Colin P. Hill13:08:42

if it’s a problem with JVM memory, I would expect an error, not an exception :thinking_face:


It is probably a memory leak in the app, but I don't know for sure, or where.

Colin P. Hill13:08:34

googling that error message suggests that this may be an allocation failure in a child process

Colin P. Hill13:08:53

if the JVM itself were running out of memory, you’d get an OutOfMemoryError, not an IOException


it is happening when downloading AWS S3 file

kwladyka13:08:54 - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" Cannot allocate memory


before, we were using Java 8 and we had an out of memory exception; after updating to Java 11 we have the one above


What’s the max heap size?


so quite a lot


Is it maybe then a case that the JVM just needs more heap than it’s allowed to use? (eyeballing graph above)


we were increasing it a few times and it always needs more

Colin P. Hill13:08:56

Note carefully the difference between an exception and an error. This is an exception, implying that it’s something the JVM program may in principle be able to handle. That means the JVM itself is not running out of memory – something else, in native code, is failing to allocate, which may be for a number of reasons.

☝️ 3

@U029J729MUP can you give such example?


I think you'd have better luck starting this thread over again, stating that you get a Cannot allocate memory, ideally attaching a redacted stacktrace, and the specific things you've tried for solving that specific error (generic OOM hunting doesn't count; that assumes a specific root cause). tldr: this doesn't particularly smell like an OOM; you can get better help by simply stating your problem and letting the experts who hang out in #clojure help (certainly not me in this case)

Colin P. Hill13:08:34

I don’t have a lot of experience with native code, but, for example, something might be trying to allocate a contiguous block that is larger than any free spot in system memory

Colin P. Hill13:08:59

which isn’t quite the same thing as simply running out of memory, and is very different from the JVM running out of memory


ok, so let’s be clear about what Cannot allocate memory means. Are you saying it is not about heap or Java memory, but system memory, for 100%?


or only it can mean this


Yeah… I would also consider that there are multiple factors here… I wouldn’t assume it’s a leak. Not sure what you’re doing with the stream, but if it’s big enough and you’re trying to consume the whole thing in one go it could surface other issues like others have mentioned? Just a stab in the dark 🙂


or it exactly mean this

Colin P. Hill13:08:31

Yes. If the JVM were running out of memory, it would be an OutOfMemoryError. An IOException implies that it’s a problem encountered while interfacing with something else on the system. Googling the error message reveals that people encounter this most often when working with child processes.


what exactly do you mean by child process here?


asking differently: async doesn’t use child processes, right?


if so we don’t have child processes unless AWS libraries have :thinking_face:

Colin P. Hill13:08:52

That’s probably a bit more than I can take the time to explain. It’s a core concept in operating systems – I suggest googling “child process” and just reading up a bit. But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.

Colin P. Hill13:08:38

It’s not at all unlikely that the AWS libraries spawn child processes or do some unexpected shenanigans with native code

Colin P. Hill13:08:13

The log message that you see occurs in the close method of that class, so it’s probably not saying anything about the root cause of your error, just that the error interrupted the read

Colin P. Hill13:08:32

but that tells you something about when it’s happening


> But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.
Yes, but I don’t think we have such a one. I also didn’t see it using jcmd.


But it is already progress, because I thought this exception was a Java memory issue, not memory outside Java (system memory).


If I will figure out this I will let you know 😉

Colin P. Hill13:08:17

Yeah I’m just speculating about it being a child process, but I’m positive that it isn’t the JVM running out of memory. This is native code somewhere. Maybe in a native system call in low-level library code. Do you have a full stack trace?


2021-06-22 18:19:33,837 [cli-planner thread] WARN - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" Cannot allocate memory
        at java.io.FileOutputStream.writeBytes(Native Method)
        at clojure.java.io$fn__11002.invokeStatic(io.clj:307)
        at clojure.java.io$fn__11002.invoke(io.clj:302)
        at clojure.lang.MultiFn.invoke(
        at clojure.java.io$fn__11006.invokeStatic(io.clj:321)
        at clojure.java.io$fn__11006.invoke(io.clj:319)
        at clojure.lang.MultiFn.invoke(
        at clojure.java.io$copy.invokeStatic(io.clj:406)
        at clojure.java.io$copy.doInvoke(io.clj:391)
        at clojure.lang.RestFn.invoke(
        at personal_shopper.core$download_file$fn__20988.invoke(core.clj:801)
        at personal_shopper.core$download_file.invokeStatic(core.clj:799)
        at personal_shopper.core$download_file.invoke(core.clj:797)
        at personal_shopper.core$plan_batches.invokeStatic(core.clj:853)
        at personal_shopper.core$plan_batches.invoke(core.clj:849)
        at personal_shopper.core$plan_supplier_BANG_.invokeStatic(core.clj:893)
        at personal_shopper.core$plan_supplier_BANG_.invoke(core.clj:882)
        at personal_shopper.core$shop_supplier_BANG_.invokeStatic(core.clj:903)
        at personal_shopper.core$shop_supplier_BANG_.invoke(core.clj:899)
        at personal_shopper.core$fn__21162$fn__21165.invoke(core.clj:1013)


but you are right about cannot allocate memory. It means a different thing. I was going off the previous out of memory error with Java 8.


How big is the stream and how much memory does the machine have available?


about 150 MB if I remember


but let’s hold on. We changed the memory limits yesterday. If it was really about system memory and there is no “memory leak” in the system, then maybe everything will work from yesterday on.


I have to just wait a couple of days


besides the heap, the jvm needs off-heap memory to do gc and other management tasks; maybe all memory is being assigned to the heap. Or heap memory is unbounded (grows with workload), and no memory is left for housekeeping or OS tasks. wild guess is S3ObjectInputStream or something is using bytebuffers or native memory, and there's contention with the heap

Colin P. Hill13:08:17

at java.io.FileOutputStream.writeBytes(Native Method) sounds like a malloc fail or something like that, which is definitely not managed by JVM memory options but is also beyond my knowledge (I’m not really a native developer)


The stacktrace really piqued my curiosity... You might have luck reproducing the problem in a production repl by performing the problematic copy of a large file. Maybe 1000 times in a row. Seems better than waiting :) was a nice one. It links to this blog post (otherwise the link is dead)


@U45T93RA6 yeah I would like to have full access to production 🙂

😄 3

the same stackoverflow post provides other workarounds, like tweaking the heap size, to make more memory available to the OS


it's back: java.lang.OutOfMemoryError: Java heap space 😱

😢 3
😱 3
🙀 3
Ivar Refsdal07:08:03

Are you setting -Xmx (heap size) to the container/(VM?) limit? To about 14 GB? Note that -Xmx is for the heap, and that the JVM can and will use more memory than just the size allocated to the heap. Source: Could the problem go away by simply omitting -Xmx? You should not need -Xmx: I could be totally wrong in suggesting this, but I've recently been battling an OOM while also setting a very high -Xmx.

Ivar Refsdal07:08:21

Yesterday I launched my container with max memory 8 GB (to Azure) and Xmx8g to the JVM, and it has been restarting/OOMing like crazy since then. Edit: This morning I redeployed without -Xmx --- and I'm waiting for the results.


Xmx and Xms are set to 14GB

Ivar Refsdal08:08:06

And how much memory is available for the system/container/(vm?) as a whole?

Ivar Refsdal08:08:27

OK. You may want to think about simply dropping -Xmx, re my comment and links above. How/what are you deploying into?


Ivar, have you read the thread? This didn't smell like a vanilla OOM to most of us


FWIW dropping Xmx will leave it at its default, which is typically 1GB. Not good.

Ivar Refsdal09:08:46

I did read it vemv. I don't think Xmx is 1GB by default since Java 10, re my links above.

> With the release of Java 10, the JVM now recognizes constraints set by container control groups (cgroups). Both memory and cpu constraints can be used to manage Java applications directly in containers; these include:
> adhering to memory limits set in the container

Ivar Refsdal09:08:26

At least if he is deploying into a container

Ivar Refsdal09:08:30

Regarding -Xmx default value: The default value is chosen at runtime based on system configuration.


> At least if he is deploying into a container
Certainly. My humble input would be to avoid going in circles; the error message / stacktrace is very specific, and similar problems had a solution to be applied at the Linux level, not the JVM (see the link from yesterday). I'd still recommend starting the thread again and listing the things you've googled and tried (such as, again, that one, among a few other things). Else you have people shooting in the dark for you


It was a thread with async which didn't close properly and referred to data. Maybe just a deadlock, or maybe just a very non-optimal pipeline in async. I didn't analyse it further.


But all in all I have to wait longer to be really sure and confirm the exception will not be back.


I am writing this, because maybe it will be useful for somebody.


Can #? be extended? We are looking at using the magic of .cljc to share code between backend and front, but "front" has to work for both web and some RN wrapper. And they ^^^ differ in re, say, talking to MQTT via Paho. I see these #? options in the doc example:

#?(:clj  (Clojure expression)
   :cljs (ClojureScript expression)
   :cljr (Clojure CLR expression)
   :default (fallthrough expression))

Can we extend that ourselves? Thx! 🙏


I think babashka supports its own tag in the same way ... I suspect that "extend that ourselves" is going to mean "write a compiler" 😉

💯 3

I think clojure is modular enough that you just need to implement your own reader


I believe shadow-cljs has this as a feature


I've used it before to share code between browser and node.js code


I believe in a normal JVM Clojure context it should just skip over any conditional that doesn't match :clj


👍 .. cool ... I didn't know shadow had implemented that ... that's super useful ... every day's a school day 😉

Alex Miller (Clojure team)18:08:14

it can probably not be extended in the way you want to extend it

Alex Miller (Clojure team)18:08:41

it is designed as an extensible system, and you can pass platform identifiers to the reader in its option map when you invoke it
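A sketch of what that looks like: `read-string` (since 1.7) accepts an opts map with `:read-cond` and `:features`. My understanding is that the platform feature (`:clj` on the JVM) is always present, and custom features are added on top; the first matching branch wins.

```clojure
;; a custom :custom feature is supplied via the opts map; :clj is implicit
(read-string {:read-cond :allow :features #{:custom}}
             "#?(:custom 1 :clj 2 :default 3)")
;; => 1

;; without the custom feature, the :clj branch matches instead
(read-string {:read-cond :allow}
             "#?(:custom 1 :clj 2 :default 3)")
;; => 2
```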


I think the difference between cljs in web and cljs in react-native is not like the difference between clj and cljs - regular conditional forms will work for what you want without extending the reader conditionals


"regular conditional forms will work". Not sure what "regular conditional forms" are, but then I am no Clojure guru. Is this some other reader macrology? I only knew of the cljs/clj variants. We can certainly write app code that tests some run-time variable to decide which platform to cater to, but I am worried about the NS and project.clj dependencies. Do we solve all this with some black-belt (or trivial) deps.edn work?


Wait ENVARs? Well, still need conditional (ns (:require....???))


yeah the real PITA sharing code between two platforms (web and rn) are conditional requires


you really want something that works at the compiler level


since you can't really conditionally require code in CLJS


you can conditionally require by using the require function inside a conditional, then use some variety of DI to provide the right platform dependent implementation


integrant / component / etc. make this easy
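The DI seam might look like this minimal sketch (all names here are hypothetical): shared code depends only on a protocol, and each platform's entry point constructs its own implementation.

```clojure
;; the protocol is the platform-neutral seam shared code programs against
(defprotocol Transport
  (publish! [this topic msg]))

;; one implementation per platform (stubs here for illustration)
(defrecord WebTransport []
  Transport
  (publish! [_ topic msg] [:web topic msg]))

(defrecord RnTransport []
  Transport
  (publish! [_ topic msg] [:rn topic msg]))

;; shared code never names a platform...
(defn announce [transport]
  (publish! transport "status" "up"))

;; ...the platform-specific entry point picks the implementation
(announce (->WebTransport))
;; => [:web "status" "up"]
```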


what makes this harder in cljs is that you can't use resolve


no, you cannot use require inside of a conditional in CLJS


all dependencies within a namespace must be statically put in the (ns ,,, (:require ,,,)) in ClojureScript


oh, I forgot that, thanks


I am having fantasies of two projects, one dedicated to Web, one to RN, that include whatever can be platform-neutral in one shared project, and then each pulls in a platform-specific project that supplies (essentially) a common API to the ultimate client apps. insert hand-waving. We could even leave out the platform-specific stuff, but I kinda like the idea of shared code bases as much as possible in an enterprise situation where code reuse has a chance.

Alex Miller (Clojure team)18:08:05

but you cannot change the platform identifiers used when reading clj or cljs source itself


the reader conditionals prevent compiler errors, I can't imagine what wouldn't even compile?


In transduce, I'm a bit confused about when f is involved. It seems that the 2-arity of my f is called once at the beginning with the init, but the element is the element returned by the xf. So I'm confused; it's like my f is plugged in after the xf, but it receives the init?


Can you put a simple case here?


(defn index-of
   ([element coll]
    (index-of element coll []))
   ([element coll idxs]
    (transduce
     (keep-indexed
      (fn [idx item]
        (if (sequential? item)
          (index-of element item (conj idxs idx))
          (when (= element item) (conj idxs idx)))))
     (fn
      ([] :init)
      ([acc e] (println acc e) (when (some? e) (reduced e)))
      ([done] (first done)))
     coll)))

(index-of 3 [1 2 3 4 5 6 7])

;; prints:
:init [2]
;; returns:


Transduce is something like ((xf f)(reduce (xf f) init coll))


So f is not invoked directly, and xf has control of what f sees
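That shape can be spelled out as a small sketch, using the trivial xform (map inc) and rf conj:

```clojure
;; transduce ~ apply xf to f once, reduce with the resulting step fn,
;; then call its completion (1-arity) on the final accumulator
(let [g ((map inc) conj)]
  (g (reduce g [] [1 2 3])))
;; => [2 3 4], same result as (transduce (map inc) conj [] [1 2 3])
```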


I think your code is just kind of buggy? Like, inside your mapped keep-indexed function, you are conjing indices onto the idxs passed in to the whole index-of function


Which maybe is a thing you want, but seems highly unlikely


Hum, ya it might be haha, I was more focused on the transduce bit.


I think what you are missing is how transducers work, where transduce is kind of the trivial application of


If you can dig up rich's original blog post about transducers it might be helpful


Let's use this one instead:

(defn index-of
   [element coll]
   (transduce
    (keep-indexed
     (fn [idx item]
       (when (= element item) idx)))
    (fn
     ([] :init)
     ([acc e] (println acc e) (when (some? e) (reduced e)))
     ([done] done))
    coll))

(index-of 3 [1 2 3 4 5 6 7])
:init 2


The thing about transducers that seems to confuse everyone is that there are three arities and one of them (0-arity) is never called 🙂


it is called in transduce


there are actually multiple 0 arities available to transduce, I believe it will call (f), but before applying xf, which is the confusing thing


afaict, the initial value is produced by calling the reducing function f rather than calling the transducer, xform


It would call my f with no arg if no init is passed


reducing function can of course have transducers

(let [rf ((map inc) conj)]
  (transduce (map inc) rf [] (range 4)))


does that ever call the transducer's 0 arity?


that would call the 0 arity of (map inc)


which will delegate to the underlying reducing function conj which will return []
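You can poke at that delegation directly at the REPL (a sketch):

```clojure
;; the step fn built by applying (map inc) to conj delegates
;; its 0-arity to conj, whose 0-arity returns []
(((map inc) conj))
;; => []
```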


Are you sure, that's not what the doc says.


> If init is not supplied, (f) will be called to produce it.


Maybe it's implied: only if the xform's 0-arity decides to call it


(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]


unless you are pointing out that i accidentally supplied the init


(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
      ([done] (println "xf done: " done) done)
      ([acc e] (println "xf rf: " acc e) e)))
  (fn ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])

f init
xf rf:  :init 1
xf rf:  1 2
xf rf:  2 3
xf rf:  3 4
xf rf:  4 5
xf rf:  5 6
xf rf:  6 7
xf rf:  7 8
xf rf:  8 9
xf done:  9


I don't see xf init being called


I'm honestly super confused by what I see, did I do something wrong?


Oh, forgot to call rf in the other 2


(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
      ([done] (println "xf done: " done) (rf done))
      ([acc e] (println "xf rf: " acc e) (rf acc e))))
  (fn ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])
f init
xf rf:  :init 1
f rf:  :init 1
xf rf:  1 2
f rf:  1 2
xf rf:  2 3
f rf:  2 3
xf rf:  3 4
f rf:  3 4
xf rf:  4 5
f rf:  4 5
xf rf:  5 6
f rf:  5 6
xf rf:  6 7
f rf:  6 7
xf rf:  7 8
f rf:  7 8
xf rf:  8 9
f rf:  8 9
xf done:  9
f done:  9
You can still see it never calls the init arity of xf


Which is what I said, yes.


The 0-arity of f is called. That is not the 0-arity of the transducer. f should be callable with 0 or 2 arguments. The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used), 1 (completing), 2 (reducing).


i posted an example above where a transducer's 0 arity was called


No. Your example has a reducing step function. That's not a transducer.


In the transduce call, the transducer is the first argument (`xform`).


Not all transducer-related functions have a reducing step function.


(`sequence`, eduction, etc)


didn't i use a transducer as a function from one reducing function to another?


(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]
xf1 is a transducer here correct?


*stares at the code* Hmm, yeah, so a transducer will have its 0-arity called only when it is used to create a reducing step function from another reducing (step) function. How/where is that actually done in the wild?


Yeah, it is


But xf1 is not passed to transduce


it is in a slightly different context but it is an example of where the 0-arity of a transducer is called


so i was just pointing out that this was a bit over eager is all:
> The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used),


This gets it to really picky terminology


But xform -- which is what is normally referred to as a transducer -- never gets its 0-arity called in any transducer-related functions.


The transducer being a function from step function to step function has no 0-arity


yeah i agree. i've also gotten a bit bogged down in figuring out which functions are transducers, which are reducing functions, if there's a name for a reducing function that has a 0-arity version, etc


a name for a reducing function that also has a completion arity, etc


True, the transducer itself really only has a 1-arity version...


Yeah, I guess it's sloppy to refer to a transducer having a 0-arity at all?


But people are generally pretty lax about what is called a transducer


"The inner function is defined with 3 arities used for different purposes:" -- from


So when it is said that transduce doesn't invoke the 0-arity, what is meant is that it doesn't invoke the 0-arity of the step function created by applying xform to the step function


Sorry, I guess I'll be more careful with terminology from now on 😐 There's at least one very confused SO post about this...


And given that meaning it is still true with dpsuttons example


That does make me ask my other Q again tho' @U11BV7MTK: where in the wild do we see transducers applied to reducing step functions to create new reducing step functions? Normally we just see the xform as a comp of a bunch of transducers.


(it is now clear to me that is what the reference doc is actually describing, no matter how many times I've read it in the past!)


I've seen the places where xform is called on a reducing function called a "reducing context"


we've got a few at work


Ah, interesting... in OSS? Link? Or just in blog posts about transducers?


So there is one of those inside transduce, and sequence, and core.async channels, and if you were creating your own reducing context


let me see if i can find them. at metabase we're all open source so i can share


I forget you work there 🙂


going on about a year now


this stuff takes me a while to remember how everything works but here's an example


but that will be super hard to follow without some navigation and knowing what is going on


And just above that, there's the same confusion we just had here: -- histogram isn't a transducer, it's a reducing step function.


yes good point. there is no other rf involved so it immediately stands out


(and then it's used in ((filter real-number?) histogram) at the end of that block)


i think the first time i saw a pattern like this was in the history of clojure paper


OK, I'll bear Metabase in mind when this subject comes up again (because it will). Thank you! I've been perpetuating incorrect information because I had my terminology wrong.


every time i reason about these things some new piece clicks into place


So I'm confused as to what receives the init value first. Does it first call the 2-arity of xf with [init first-element], and then keep-indexed calls my f's 2-arity, passing it the init untouched and the transformed element? But if so, I should see it printing a bunch of nils. So I think keep-indexed chooses not to call my f until there is a non-nil transformed element, but then I'm confused how my f at that point receives init? Is that just what keep-indexed does under the hood?


f and xf are not distinct


as I said transduce is something like ((xf f) (reduce (xf f) init coll))


transduce builds a new function g by applying xf to f and then reduces with g


Ya, I think I'm just surprised by the behavior of the keep-indexed transducer. It's like it keeps track of the init even though it's later in the reduction, and passes it to my f as the first value. Also, is there no way to have init be the first element? Like what reduce does?


there is not


reduce's behavior there is in some ways considered to be a mistake


Interesting, in what sense? I feel like most of my reduce use case start with the first two elements.


And so I often need to find a kind of identity for the init, and sometimes that can be tricky.


Assumes the accumulator and the elements are the same type


Which is almost never the case for complex folds


It is why reducers have distinct combining and reducing functions
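For reference, clojure.core.reducers/fold takes a combining function and a reducing function as separate arguments (a minimal sketch; here the combinef happens to be plain +):

```clojure
(require '[clojure.core.reducers :as r])

;; combinef (+) merges partition results and supplies the partition init
;; via its 0-arity; reducef folds individual elements into a partition
(r/fold + (fn [acc x] (+ acc x)) (vec (range 100)))
;; => 4950
```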


Hum... I mean, but before you at least had a choice, if they are the same type, don't pass init, if they are not, pass an init. Now when they are the same type, you need to find a value of that type that somehow will result in an identity when your reducing function is first called


Or call first and rest yourself


I thought about that, felt it was weird. Wouldn't it mess up the coll if it was on a reduced fast path?


It is what reduce has to do anyway if you don't supply an init


there is no other behavior for keep-indexed as a transducer that makes sense


Feel it would have been better to pass the index down


that would throw away the accumulator in the reduce and makes no sense


Hum, I was thinking like you'd get the accumulated list of things kept till now as the accumulator, and maybe the index as the element. But I think you're right, what they did is probably better


the index parameter muddies things, so it might be clearer if you think about it just in terms of filter, or possibly start from filter and see what it takes to add the index
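In concrete terms, the keep-indexed transducer only decides which values reach the inner rf; the accumulator flows through untouched (a sketch):

```clojure
;; indices of odd items; nils from the fn are simply not passed down
(into [] (keep-indexed (fn [idx item] (when (odd? item) idx)))
      [10 11 12 13])
;; => [1 3]
```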

Frank Henard22:08:39

I'm writing a script that migrates data. I want to grab a list of IDs, and send them to n threads to run concurrently. I was chunking them, but some threads finish early, and it slows down as it nears the end. I would like to use a queue for this. Should I use core.async, or can I just use an atom with a list as a queue?

Frank Henard22:08:37

The problem I see with the atom is that getting the first element in the list, and updating to rest needs to happen in the same operation to avoid race conditions, and I'm not readily seeing how to do that
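One way to do that pop atomically (since Clojure 1.9) is swap-vals!, which returns both the old and new value of the atom in one operation (a sketch):

```clojure
;; swap-vals! returns [old new]; the popped element is the head of old
(def work (atom [:a :b :c]))

(let [[old _new] (swap-vals! work rest)]
  (first old))
;; => :a
```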


just use an ExecutorService


max N threads, submit all your tasks


That doesn't work great if the queue is large or if a task requires too much data. Had OOMs before because of it.


that is why you make filling the work queue a task on the work queue


Could you elaborate a bit? I'm probably too sleepy to grasp it right away.


the problem with an atom is it is non-blocking, and for a work queue you generally want something blocking, or else you end up polling for work
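A blocking alternative from java.util.concurrent (a sketch):

```clojure
(import 'java.util.concurrent.LinkedBlockingQueue)

;; .take blocks until an element is available; .put blocks when a bounded
;; queue is full, which gives you back pressure for free
(def q (LinkedBlockingQueue. 10))
(.put q :job-1)
(.take q)
;; => :job-1
```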


say you have some how some database query that you can fetch in pages, and you want to do work to each page

Frank Henard22:08:14

thanks for the quick responses! I gotta run, but I'm going to come back and look through what you sent


you write some code that grabs N pages, puts them on the work queue, then queues itself on the work queue to do the next N, etc


so you never over fill the work queue and oom because of the queue size


(1 at a time is never, N at a time is greatly reduced, but whatever)
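A hedged sketch of that self-requeueing pattern. The filler is itself a task on the pool, so the queue never holds more than roughly n unprocessed pages at a time. fetch-page, handler, and done are all hypothetical names here:

```clojure
(import 'java.util.concurrent.Executors
        'java.util.concurrent.TimeUnit)

(defn queue-pages
  "Submit n pages' worth of work, then requeue ourselves for the next
   batch. fetch-page returning nil for a page number ends the run."
  [pool fetch-page start n handler done]
  (.submit pool
    ^Runnable
    (fn []
      (let [pages (keep fetch-page (range start (+ start n)))]
        (doseq [p pages]
          (.submit pool ^Runnable (fn [] (handler p))))
        (if (seq pages)
          (queue-pages pool fetch-page (+ start n) n handler done)
          (deliver done true))))))

;; usage sketch: 25 fake "pages", fetched 5 at a time, summed into an atom
(def total (atom 0))
(let [pool (Executors/newFixedThreadPool 4)
      done (promise)]
  (queue-pages pool (fn [i] (when (< i 25) i)) 0 5 #(swap! total + %) done)
  @done
  (.shutdown pool)
  (.awaitTermination pool 5 TimeUnit/SECONDS))
@total
;; => 300 (the sum 0 + 1 + ... + 24)
```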


Ah, right, makes sense. I was stuck on thinking about it in the context of the problem that I had - a queue of manageable size already in memory, so I never had to fetch any pages. Just scheduling it all was blowing things up because threads aren't that light-weight, even if you don't feed them much data. (or rather, not threads themselves since there's a limited amount but the scheduled task in the executor)


depends, if you use a fixed size threadpool (like the static method I linked to) and fill the queue in the way I described then there is pressure, if you are interacting with the executor externally it is tricky but doable, you may have to cps your code though


@U0NCTKEV8 To approach it from a different side - why would you not want to use core.async? Any other reasons besides "it's easy enough to do with a fixed thread pool and a queue"?


if you are connecting core.async code to an executor used for io, it usually suffices to ignore any futures the executor creates, and instead queue up tasks that deliver their results to a channel, and have the core.async code park on the channel


core.async is kind of 3 things: channels, the go macro, and threadpools


alts falls out of channels with cancelable/idempotent callbacks


the 3 things really synergize, but they may not match what you are doing, and none of them is a fixed pool executor


so like you can do io on a async/thread, but async/thread creation is unbounded


which may or may not be fine, depending


channels are great, and when you need them there is no substitute, but a lot of the time you can get by with some kind of queue from java.util.concurrent


(import 'java.util.concurrent.ExecutorCompletionService)
(import 'java.util.concurrent.Executors)

(defn do-concurrently
  "Executes each task in tasks with concurrency c, assuming side-effects,
   and run handler on their results as they complete. Handler is called
   synchronously from the calling thread."
  [tasks c handler]
  (let [executor (Executors/newFixedThreadPool c)
        cs (ExecutorCompletionService. executor)
        initial (take c tasks)
        remaining (drop c tasks)]
    ;; Submit initial batch of tasks to run concurrently.
    (doseq [task initial]
      (-> cs (.submit task)))
    (doseq [task remaining]
      ;; Block until any task completes.
      (let [result (-> cs .take .get)]
        ;; When there remains tasks, submit another one to
        ;; replace the one that just completed.
        (-> cs (.submit task))
        ;; Handle the result of the task that just completed.
        (handler result)))
    ;; Since we submitted an initial batch, but only handled a remaining
    ;; number of tasks, some tasks are left un-handled, and we need to handle
    ;; them.
    (doseq [_ initial]
      (handler (-> cs .take .get)))
    ;; shutdown executor once all tasks have been processed
    (-> executor .shutdown)))

(defn io
  "Simulating an IO operation by sleeping the calling thread
   for the given amount-of-time. Returns the amount-of-time."
  [amount-of-time]
  (Thread/sleep amount-of-time)
  amount-of-time)

;;; Run io 10000 times at 10 ms per io call with up to 100 concurrent calls
;;; and sum up all results.
;;; Then print the time it took and the resulting sum.
(let [sum (atom 0)]
  (time
   (do-concurrently (repeat 10000 (partial io 10)) 100 #(swap! sum + %)))
  (println @sum))


The trick is that you first submit c number of tasks to be executed concurrently. In this case, I've chosen to make 100 concurrent calls at a time. The call to submit is non blocking and will return immediately. After you've initiated your first batch, you block on cs, which will wait till any of them complete, and when one does, it will unblock and return the result of the task that just completed. When that happens, we will submit another task, so that we maintain our concurrency level, and we will call our handler with the result. In effect, we're saying, perform n number of calls up to c at a time. We are handling the results on the thread which submits the remaining tasks as they complete. This means that if our handler is very slow, it will delay our re-queuing of remaining tasks, so that's something to keep in mind. Finally, we have to handle the remaining batch of un-handled tasks, and shutdown the executor to release the resources associated with it.


core.async pipelines are vaguely like an executor, but not really (pipelines have more ordering which will limit concurrency)


Perhaps a naive question. Why would this be bad? Assuming we want exactly (+ 2 (.. Runtime getRuntime availableProcessors)) concurrently running tasks, as pmap gives us.

(->> tasks
     (pmap do-stuff))
tasks itself could be a chunked lazy seq that won't realize too much data ahead.


pmap is the worst


I like pmap 😛 , it actually does something similar in trying to stay ahead, but you can't control the number of threads, and it retains the head of whatever you are doing.


Though it retains order I think, so might not be as fast in any case


pmaps limiting to 2+ is of course broken (because of chunking), and the way it combines laziness and concurrency is bad, and the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called


yes, and the ordering thing


Well, the chunking is actually a blessing in disguise, because you can now control pmap's concurrency based on your chunk size 😛


> limiting to 2+ is of course broken (because of chunking)
Doesn't lazy-seq there basically disable chunking? Since it advances one at a time. Duh, there's an extra map.


I was just looking at some migration code, where the migration is written as a reduce (a fold) over each users data, and then the reduce operation is customized to run each reduce step on an executor, and enqueue the next step to run when it is done


so then you can throw them all on a single executor and they share time


user=> (seq [1 2 3 4])
(1 2 3 4)
user=> (class (seq [1 2 3 4]))
clojure.lang.PersistentVector$ChunkedSeq
user=> (lazy-seq (seq [1 2 3 4]))
(1 2 3 4)
user=> (class (lazy-seq (seq [1 2 3 4])))
clojure.lang.LazySeq
user=> (class (seq (lazy-seq (seq [1 2 3 4]))))
clojure.lang.PersistentVector$ChunkedSeq


You can see it going in batches of 32 here:

(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (range 100))


The customized reduce is just

(defn exec-reduce [exec fun init coll]
  (if (seq coll)
    (exec (fn []
            (if-not (reduced? init)
              (exec-reduce exec fun (fun init (first coll)) (rest coll))
              (fun (unreduced init)))))
    (fun (unreduced init))))
so exec is expected to be a function that queues another function on the executor
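A usage sketch of that exec-reduce, using a trivial run-it-now function as the exec (the definition is repeated here for completeness):

```clojure
(defn exec-reduce [exec fun init coll]
  (if (seq coll)
    (exec (fn []
            (if-not (reduced? init)
              (exec-reduce exec fun (fun init (first coll)) (rest coll))
              (fun (unreduced init)))))
    (fun (unreduced init))))

;; with a same-thread "executor" this degenerates to a plain reduce;
;; note fun is also called with one arg at the end, which + tolerates
(exec-reduce (fn [f] (f)) + 0 [1 2 3 4])
;; => 10
```

In real use, exec would instead do something like `(.submit pool ^Runnable f)`, so each reduce step becomes its own task and many folds can share one executor.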


@U0NCTKEV8 Is your example really correct, given that you explicitly wrap a chunked seq?

(defn step [x]
  (lazy-seq
   (when (pos? x)
     (println x)
     (cons x (step (dec x))))))

(first (step 100))
The above will print 100 only once. That's what I meant by "removes chunking", and that's similar to what pmap is using. However, it uses map in between, which is chunked, and that's where that n gets ignored.


no that is not correct


lazy-seq is not a defense against chunking


you don't have a chunked seq there, which is why you don't see it behave like one


Oh, true, my wording sucks, apologies. Replace "removes chunking" with "allows creating lazy seqs without chunking".


you can create lazy-seqs without chunking, but some people have things like vectors and like to map over them


or have a pipeline like (->> v (map ..) (map ...) (pmap ...)) or something


where because it started as a vector you get chunking


TIL map by itself does not induce chunking.
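You can check this with chunked-seq? (a sketch): it's the source seq that determines chunking, and map just preserves it:

```clojure
;; a vector's seq is chunked, so map over it yields chunked output
(chunked-seq? (seq (map inc (vec (range 40)))))   ;; => true

;; an unchunked lazy source stays unchunked through map
(chunked-seq? (seq (map inc (iterate inc 0))))    ;; => false
```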


For example:

(defn re-chunk [n xs]
  (lazy-seq
   (when-let [s (seq (take n xs))]
     (let [cb (chunk-buffer n)]
       (doseq [x s] (chunk-append cb x))
       (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))

(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (re-chunk 50 (range 100)))
Now it is doing 50 at a time.


Going back to your initial reply on pmap, and ignoring the chunking for now.
> the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called
I can see that, although creating thread pools willy-nilly I guess would be roughly the same. So treating a single usage of pmap as if it were a creation of a new thread pool should deal with that concern.
> the way it combines laziness and concurrency is bad
This is probably the most interesting. Could you say a couple more words on it?


just a general statement I guess, the laziness limits your ability to control execution, the use of concurrency implies you care about execution


I think for a script, you can probably trust something like:

(dorun (pmap #(handle (some-io %)) coll))
Keeping in mind that your handler will run in parallel as well.


It'll go 32 at a time by default, and you can re-chunk if you want it to go faster. Though you can't slow it down much more than that, since it'll be num of threads + 2 at a minimum, chunk-size otherwise


Thank you both, I learned something new.


But honestly, the CompletionService in my opinion is the best way to go if you want to run a bunch of tasks in parallel batches and go as fast as you can, handling each result where order doesn't matter.

Frank Henard21:08:39

Thanks everyone. I ended up using the CompletionService and am very happy!


I want to call the second method listed here:

(ins)org.noisesmith.gamey=> (-> (reflect/reflect GLFW) :members (->> (filter (fn [x] (= (:name x) 'glfwCreateWindow)))) pprint)
({:name glfwCreateWindow,
  :return-type long,
  :declaring-class org.lwjgl.glfw.GLFW,
  :parameter-types [int int java.nio.ByteBuffer long long],
  :exception-types [],
  :flags #{:public :static}}
 {:name glfwCreateWindow,
  :return-type long,
  :declaring-class org.lwjgl.glfw.GLFW,
  :parameter-types [int int java.lang.CharSequence long long],
  :exception-types [],
  :flags #{:public :static}})
when I call it as follows:
(GLFW/glfwCreateWindow 300 300 "Hello, World!" nil nil)
I get
Execution error (IllegalArgumentException) at org.noisesmith.gamey/init (gamey.clj:18).
No matching method glfwCreateWindow found taking 5 args
the javadoc for this method (the glfwCreateWindow(int, int, CharSequence, long, long) overload) tells me I should be providing NULL for my last two arguments. what's the trick to getting clojure to find the right method here? hint the nil as a Long or something?


those are most likely pointers and they probably mean a null pointer (eg. 0) rather than java null


I've never seen 0 called NULL in javadoc


I'll try it though


or they copy and pasted from the glfw docs


My bet that it's the case. The original function definition:

GLFWwindow* glfwCreateWindow(int          width,
                             int          height,
                             const char*  title,
                             GLFWmonitor* monitor,
                             GLFWwindow*  share);


yeah, it's just shitty docs and it wanted 0, thanks


> [in] monitor The monitor to use for full screen mode, or NULL for windowed mode.
> [in] share The window whose context to share resources with, or NULL to not share resources.


I guess I'll keep that in mind when I see random "long" args that are actually pointers


I think I got my signals crossed because under X11 you really do look up screens and windows with numeric ids that aren't pointers


It’s been a while since I’ve had to interop with a vararg Java method. How would I call a static method with this type signature? of(double[][] data, java.lang.String... names)


My understanding is that you just pass arrays to vararg functions.


(.of O data (into-array String [...]))


Aha, the array is only for the variable part.
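The same pattern works with a stdlib vararg method, String.format(String, Object...) (a sketch):

```clojure
;; only the vararg tail becomes an array; the fixed args are passed as-is
(String/format "%s and %s" (into-array Object ["spoon" "fork"]))
;; => "spoon and fork"
```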



(DataFrame/of (into-array (map int-array
                               [[0 0 0 0]
                                [0 0 0 0]
                                [0 0 0 0]]))
              (into-array java.lang.String


your spec above was doubles not ints, but yeah


if you are just messing around with quaternions and whatnot maybe take a look at neanderthal


or tech ml dataset for the dataframe type stuff


or python integration