This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-08-17
Channels
- # announcements (13)
- # beginners (56)
- # brompton (1)
- # cider (2)
- # cljsrn (10)
- # clojure (369)
- # clojure-australia (4)
- # clojure-boston (1)
- # clojure-europe (28)
- # clojure-nl (1)
- # clojure-spec (1)
- # clojure-uk (18)
- # clojurescript (26)
- # data-science (2)
- # datahike (4)
- # datalog (2)
- # datasplash (6)
- # datomic (9)
- # events (1)
- # kaocha (4)
- # macro (1)
- # malli (22)
- # meander (40)
- # membrane (30)
- # music (1)
- # nbb (3)
- # news-and-articles (3)
- # off-topic (12)
- # practicalli (1)
- # re-frame (19)
- # remote-jobs (1)
- # sci (22)
- # shadow-cljs (15)
- # spacemacs (4)
- # tools-deps (40)
- # xtdb (26)
You can use reduced
yes, that's how take
and halt-when
work. What you can't do is return something from the function you pass to map
or keep-indexed
wrapped in reduce and expect it to stop processing the rest of the data. So this won't stop at 3:
(into [] (map (fn [x] (if (< 3 x) (reduced x) x))) (range 10))
pretty much all the reduce-looking things in core support reduced. Like transduce
and reductions
and so on ... so this will result in 3
(reduce (fn [r i] (if (< 3 i) (reduced r) i)) (range))
rather than running through the infinite seq forever
user=> (into [] (keep-indexed (fn [idx i] (reduced idx))) (range 3))
[#object[clojure.lang.Reduced 0x7c6189d5 {:status :ready, :val 0}] #object[clojure.lang.Reduced 0x4248e66b {:status :ready, :val 1}] #object[clojure.lang.Reduced 0x3e6534e7 {:status :ready, :val 2}]]
I must be doing it wrong
Oh, @U0P0TMEFJ already said that 😄
(into []
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
(reduced idx))))
(range 100))
This prints all elements. So I can only assume the keep-indexed didn't short-circuit.
@U064X3EF3 Are you sure you are supposed to be able to use reduced with the keep-indexed transducer? Is it a bug then?
yes @U0K064KQV that's exactly correct behaviour. You can't just return a reduced
element into a collection, you have to return the whole collection via reduced
.
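To illustrate the point, here is a minimal sketch of a custom transducer that short-circuits correctly: the step function wraps the accumulated result (not the element) in reduced, so the whole transducible process stops. The name first-when is hypothetical.

```clojure
;; hypothetical transducer: stop after the first element matching pred,
;; wrapping the *accumulated result* in reduced (not the bare element)
(defn first-when [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result x]
       (if (pred x)
         (reduced (rf result x)) ;; short-circuit: the whole result is reduced
         result)))))

(into [] (first-when #(< 3 %)) (range 10))
;; => [4]
```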
if you're trying to compose something like take-while
after a keep
, remember that keep
expands to a map
and a remove nil?
so
(into [] (comp (keep-indexed (fn [idx item] (prn idx) (when (< idx 3) item)))
(take-while (complement nil?)))
(range 5)
is the same as saying "map all the elements, and when we've seen more than 3 return nil, then remove all the nils, then stop if we see nil" ... which will consume the whole range ... right?
Yes, right now keep-indexed works like: keep ALL indexed elements which match the pred. But it would be nice to be able to say: keep indexed elements which match the pred UP TO a reduced. Because that's one downside when switching to the transducer; something like this:
(first
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
idx))
(range 100)))
Will only consume up to chunk-size elements until it finds the first non-nil thing that is kept.
But the transducer will actually be consuming the full list no matter what.
Hum, ok actually maybe I'm wrong, this does seem to work, which I saw you had shown before but had missed:
(into []
(comp
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
idx)))
(take 1))
(range 100))
yeah ... if I just wanted the first thing out of a list, with a transducer, I'd compose (take 1)
in there, rather than adding first
...
Ah, that's what Alex meant by you can use reduced, like it will short-circuit if a nested transducer returns reduced to it
(first (sequence (keep-indexed (fn [idx item] (prn idx) (when (< idx 40) item))) (range 50)))
so that will behave like the lazy-seq version of keep-indexed
and consume a chunk at a time
if you're using into
you're producing a fully realised vector, then taking the first
one, which is why it consumes the whole sequence
It's not exactly the same; this is still iterating through each transducer 1 element at a time, whereas lazy-seq will iterate 32 at a time per sequence function
well ... it'll do enough work to produce 32 elements in the lazy seq being returned by sequence
Ya, but it will still do reduce-like behavior. Like if you have (comp A B C) it takes ele1 and sends it through A, then B, then C. Now it takes ele2 and sends it through A, B and C, etc. Whereas with lazy-seq, if you have (->> coll A B C) it will take 32 elements and run them all through A, then send the 32 results out of A to B, and then 32 out of B to C
yeah ... which is why transducers are more efficient ... cos they're not producing all those lazy seqs ... right?
Unless you use some special transducers that collect things, like partition-by. And I thought keep-indexed did too, but I was wrong.
Ya, because each batch of "32" is wrapped in an extra object container, and the creation of that object and garbage collection is what slows down lazy-seq
I mean, it's not just the batch; even an unchunked seq will wrap the single element in an extra object.
So using (sequence ...)
with transducers should still be more performant than lazy-seq. Because it will only create a chunk for each 32 results of the full chain, not any of the intermediate ones
It's what this sentence from the guide basically means: > The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences.
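A quick way to see the difference is to count side effects; this is a sketch, and the first count assumes current JVM Clojure chunking behavior (RT/chunkIteratorSeq realizes 32 results at a time):

```clojure
;; sketch: compare how many inputs each approach realizes
(def consumed (atom 0))

;; transducer + sequence: realizes a chunk of *final* results at a time
(reset! consumed 0)
(first (sequence (map (fn [x] (swap! consumed inc) x)) (range 100)))
@consumed ;; => 32 on current JVM Clojure (chunked sequence over the iterator)

;; into realizes everything, since it builds the full vector first
(reset! consumed 0)
(first (into [] (map (fn [x] (swap! consumed inc) x)) (range 100)))
@consumed ;; => 100
```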
I discovered something nice that seems to accidentally work. One of my randomly-generated test suites was causing a java.lang.StackOverflowError
exception. I wanted to know what input data was triggering the error. So I set up the following, to catch, warn, and rethrow the exception
(try (unary-test-fun data)
(catch java.lang.StackOverflowError e
(cl-format true "~&Stack overflow on ~A~%" data)
(throw e)))
I was half expecting it not to work, but it seems to work beautifully, at least in my case.
Try/Catch->Rethrow is something commonly done in Java and C# as far as I remember :thinking_face:
It is definitely supposed to work, and it's certainly common. One of the common uses of it (at least on my long term codebase at my job) is to log evidence of a certain problem (the exception) when you can't trust the calling code to properly deal with the exception.
Building a JAR for a Clojure library with tools.build, tools.deps; no :deps
in my project's deps.edn
and no aliases engaged; running clojure -T:build jar
with a stock jar task (copied from fogus' blog post), the final pom.xml
file still has a dependency on org.clojure/clojure
(version 1.10.1, which isn't even the one in the :deps
of my root deps.edn
file).
I can manually dissoc org.clojure/clojure
from the :libs
from the basis to keep it out of the final pom, but I'm wondering if I'm overlooking something simple, or if there's an expectation that even libs now have a dependency on Clojure and consumers can just override deps (including Clojure) as they see fit.
that is happening because org.clojure/clojure
is listed as a dependency in the root deps.edn provided by the clojure cli
To avoid using it you can specify option :root nil
for create-basis
function
(b/create-basis {:project "deps.edn" :root nil})
@U04V4KLKC thank you! that solves my problem. I must be mistaken about which deps.edn
is my root one, because the Clojure version being included isn't the same as the one specified at /usr/local/lib/clojure/deps.edn
but that's a different mystery to solve
there are three “main” deps.edn files:
• clojure cli specific
• the one from user’s home
• your project’s deps.edn
clj -Sdescribe
and look at :config-files
you will see full paths to those files
create-basis
function allows you to override each of them: https://clojure.github.io/tools.build/clojure.tools.build.api.html#var-create-basis
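Concretely, a hedged sketch of a build.clj basis that ignores both the CLI-provided root deps.edn and the user-level one (per the create-basis docs, each dep source can be set to nil to omit it):

```clojure
;; build.clj sketch: drop the root (CLI-provided) and user deps.edn
;; so the basis only reflects the project's own deps.edn
(require '[clojure.tools.build.api :as b])

(def basis
  (b/create-basis {:project "deps.edn"
                   :root nil    ;; omit the CLI's built-in root deps.edn
                   :user nil})) ;; omit the user-level deps.edn
```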
I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries
sure, but root deps.edn is declaring org.clojure/clojure {:mvn/version "1.10.3"}
which could be too high a version for distribution of some library
does clj really pick a higher versioned deeper dep over a lower versioned top level one?
I'd consider any behavior other than using the version explicitly in your deps file a bug, and not declaring a version is asking for trouble
hm… no, probably I had something different that influenced which version to use for pom.xml
> and not declaring a version is asking for trouble
in my head it was always expressed in the form - “clojure is a library for java” -> so it can load other “clojure” libraries -> so those libraries should not declare a dependency on some version of clojure
got this impression after looking at a number of “contrib” libraries such as data.json
> I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries
I prefer not to include Clojure as a dependency for Clojure libraries I distribute, with the expectation that consumers will either use the Clojure version of their system or specific project. Every Clojure library specifying a Clojure dependency adds to the noise of dependency trees and their resolution. Maven's dependency scopes provide a story for "indicate that my code uses this dependency, but expect the consumer to provide a concrete dependency on it downstream", but I believe dependency scopes of this nature are intentionally not supported by tools.deps
@U11SSJP2A Chiming in late here. It's fairly typical for Clojure libraries to list as a dependency the minimum version of Clojure they work with. Sure, not all libraries do that -- some just assume Clojure will be "provided" -- but I think it's a good idea if there are (earlier) versions of Clojure a library will not work with.
@U04V70XH6 That's certainly a fair reason and in my case applies. Thanks!
@U04V4KLKC The root deps.edn
(built-in for t.d.a) declares a default dependency for whatever is the current version of Clojure when that version of t.d.a was released. So "by default" when using the CLI, you get a "recent" version of Clojure -- and that's reflected in the version of the CLI itself: 1.10.3.933 -- by default uses Clojure 1.10.3. See https://clojure.org/releases/tools -- there were a few 1.10.2.x versions earlier this year and it was 1.10.1.x all of last year.
What’s the preferred EDN serialization/deserialization ? pr-str
and clojure.edn/read-string
?
that's a fine combination; I tend to use pr-str
together with clojure.tools.reader.edn/read-string
why clojure.tools.reader.edn
instead of clojure.edn
?
tools.reader README has a nice rationale that says it better than I could rephrase it: https://github.com/clojure/tools.reader#rationale
trying to debug a stack overflow problem. Is setting the maximum stack depth something I can change from clojure or do I have to add some flag to the :jvm-opts
of my project.clj
file?
stack size is a global JVM property. You can configure it by passing -Xss100M
as an example
but instead of increasing stack size I can recommend changing the code so it won’t consume stack. There are some handy functions in clojure core: trampoline
as an example
Setting Xss100M is bonkers. Here's a more reasonable Xss (plus a comment on how to figure out a good value for your machine) and XX:MaxJavaStackTraceDepth, which also is relevant: https://github.com/reducecombine/.lein/blob/e05d6a2d22c0990a88a660c25fe8c5e51a3c6b1a/profiles.clj#L11-L43
My suspicion is that the lazy functions are triggering the stack overflow. I recently refactored lots of functions to return lazy lists. This means that functions which do not appear to be heavy stack users, all of a sudden become compute intensive. For example (first (rest ...))
now has to compute the 2nd element of the sequence.
Anyway, currently this is only a suspicion. Maybe my bug is elsewhere, or maybe I really have introduced a logical bug in the lazy-list refactoring.
@https://app.slack.com/team/U04V4KLKC, I haven't used trampolining yet. It was my impression that it is intended for direct recursion, not for meta-circular dependencies.
if you're getting stack overflows with lazyness, have you seen this? https://stuartsierra.com/2015/04/26/clojure-donts-concat
https://stuartsierra.com/2015/04/26/clojure-donts-concat yes, laziness might bring such problems. Here is a post about this
@U45T93RA6 is the intent of your post that I add a :jvm-options
section into my project.clj
file?
yes, profiles.clj has the same syntax as project.clj; you'd simply have to copy the Xss and XX:MaxJavaStackTraceDepth entries
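Concretely, something like this in project.clj (the values here are only illustrative; tune Xss per the comments in the linked profiles.clj):

```clojure
;; illustrative project.clj excerpt -- values are examples, not recommendations
:jvm-opts ["-Xss6144k"                           ;; per-thread stack size
           "-XX:MaxJavaStackTraceDepth=1000000"] ;; longer reported stacktraces
```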
> It was my impression that it is intended for direct recursion, not for meta-circular dependencies.
not necessarily. look at this example - https://clojuredocs.org/clojure.core/trampoline#example-5552b71ee4b01ad59b65f4cf
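i.e. trampoline handles mutual recursion too; the classic sketch:

```clojure
;; classic mutual-recursion example: each fn returns a thunk instead of
;; calling the other directly, and trampoline drives the loop on the heap
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

(trampoline my-even? 1000000) ;; => true, no StackOverflowError
```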
the tldr of https://stuartsierra.com/2015/04/26/clojure-donts-concat is to avoid concat
, if I understand correctly. I'm not really using concat
directly, but I am using several calls to mapcat
which internally uses concat. And in my recent refactoring I created my own lazy/mapcat
which is based of my lazy/concat
... with the goal of 1-chunking rather than Clojure's default 32-chunking.
the motivation being that 32-chunking is intended to optimize long thin sequences. Instead my application has short, fat sequences.
@U04V4KLKC, yes nice example. But it still seems to me trampolining is when a closed set of functions need to call each other in a lexically concise way. In my case I have several generic functions which operate on trees whose nodes are sequences of other trees. Many operations are dependent on other operations, and even some operations are defined in different namespaces. So while it might be possible to refactor to use trampolining, it is not apparent to me how to do so.
That being said, it certainly would be nice if the clojure compiler knew how to efficiently compile tail calls of functions defined within the same letfn
. That's not the problem I'm facing here, but it would be an interesting optimization.
As a quick observation, sometimes a SO error doesn't really indicate a categorical flaw in your code... clojure programs are hungrier than Java programs, so the JVM default settings don't always fit
This expresses itself quite often in programs using walk
, but also with various other functional patterns
I've seen this in well-known libs; there was no bug, one simply has to set Xss intentionally.
@U45T93RA6 good to know. In my case since I just did a big refactoring, I have to really consider whether I did in fact introduce subtle bugs into the program.
Here are the lazy functions I am using. I have some unit tests which do sanity checks to assure that the functions have the same semantics as the clojure.core functions they eclipse. However, there may indeed be greedy stack consumers hidden in there. https://gitlab.lrde.epita.fr/jnewton/clojure-rte/-/blob/295d0a287eb5fed51bd37cbcc3c4fc82400c2310/src/clojure_rte/lazy.clj
@U45T93RA6 with these changes you suggested, I'm still getting stack overflows, but now the stack traces are vvvvveeeeeeerrrrrrrryyyyyyyy long. Is there a way to tell clojure to prune the stack trace it prints?
You can undo or tweak XX:MaxJavaStackTraceDepth; it only affects reported stacktraces, nothing else. I do find large sizes for it useful. Often with SOs the first few thousand entries will be repetitive and will hide the root ns that is invoking that code in the first place
When exactly does the garbage collector free memory? When references no longer exist? From time to time? In some kind of random way? The point is I see heap usage on a chart and I am wondering if it is the current need for memory or if it also includes a significant part of data which is no longer needed, but which the GC didn’t remove yet.
I think most GCs don't free memory at all; their Linux process' memory can only grow or keep its size
G1GC does free memory as soon as it performs a GC
I am thinking how to interpret used heap: 1) heap which is currently used by the app, i.e. which the app refers to 2) the point above + memory which is not needed anymore, but which the GC didn’t free yet
so say the code was using a 1GB vector in a function which ended. This data is not used by the app anymore. Is this 1GB still in the used heap, waiting for the GC to remove it? I think yes. For how long?
> When exactly garbage collector free memory? When references no longer exist? From time to time? In some kind of random way?
Answering again then: will depend on the choice of GC and its parameters (which can be many).
I guess a sensible tldr is that if the GC thinks you're about to run out of memory, and it's good timing to perform a GC, then it will do so.
Overall it's a non-deterministic process although (System/gc)
can nudge it for the sake of experimentation.
> GC thinks you’re about to run out of memory, and it’s good timing to perform a GC, then it will do so Yes this one for sure when it is close to out of memory. But what before? Does it run every 15 minutes or something like that?
But hmm. Maybe I just have to accept I can’t know what memory heap contain. I mean how many old data.
I don't think any GC will have a hard-and-fast rule like "every 15m" or such. They're really complicated programs (which is why runtimes other than the JVM have subpar GCs)
ok so then it means I can’t really use heap usage as a way to know how much memory the app needs at that moment
In your screenshot, Used Heap
includes used and unused object references. I know this by pure logic: the graph descends from time to time (once per each performed GC), which implies it accumulates garbage as the program runs
If you google around, you should be able to find tools for dumping JVM memory and then exploring the dumped snapshot. These tools aren’t super easy to use, but if you need to know what’s eating up your memory, they’re a good way to explore it.
In my very limited understanding of memory usage that graph looks normal to me (depends on exactly what it’s doing at the end there though). What are you trying to debug?
@U029J729MUP I am waiting for “cannot allocate memory” to get a heap dump file. But I am really not sure if it will help me. Debugging this is hard, with very limited information. Especially with anonymous functions, which are named in a way where you really don’t know what part of the code it is.
@U0VP19K6K I am trying to fix “java.io.IOException: Cannot allocate memory”
if it’s a problem with JVM memory, I would expect an error, not an exception :thinking_face:
googling that error message suggests that this may be an allocation failure in a child process
if the JVM itself were running out of memory, you’d get an OutOfMemoryError, not an IOException
com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" java.io.IOException: Cannot allocate memory
before we were using Java 8 and we had out of memory exception, after update to Java 11 we have the one above
Is it maybe then a case that the JVM just needs more heap than it’s allowed to use? (eyeballing graph above)
Note carefully the difference between an exception and an error. This is an exception, implying that it’s something the JVM program may in principle be able to handle. That means the JVM itself is not running out of memory – something else, in native code, is failing to allocate, which may be for a number of reasons.
@U029J729MUP can you give such example?
I think you'd have better luck starting this thread over again stating that you get a java.io.IOException: Cannot allocate memory
and ideally attaching a redacted stacktrace, and the specific things you've tried for trying to solve that specific error (generic OOM hunting doesn't count; that assumes a specific root cause)
tldr this doesn't particularly smell like an OOM, you can get better help by simply stating your problem and letting the experts who hang out in #clojure help
(certainly not me in this case)
I don’t have a lot of experience with native code, but, for example, something might be trying to allocate a contiguous block that is larger than any free spot in system memory
which isn’t quite the same thing as simply running out of memory, and is very different from the JVM running out of memory
ok so let’s be clear about what java.io.IOException: Cannot allocate memory
mean
Are you saying it is not about heap or Java memory, but system memory for 100%?
Yeah… I would also consider that there are multiple factors here… I wouldn’t assume it’s a leak, not sure what you’re doing with the stream but if it’s big enough and you’re trying to consume the whole thing in one go it could surface other issues like others have mentioned? Just a stab in the dark 🙂
Yes. If the JVM were running out of memory, it would be an OutOfMemoryError. An IOException implies that it’s a problem encountered while interfacing with something else on the system. Googling the error message reveals that people encounter this most often when working with child processes.
That’s probably a bit more than I can take the time to explain. It’s a core concept in operating systems – I suggest googling “child process” and just reading up a bit. But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.
It’s not at all unlikely that the AWS libraries spawn child processes or do some unexpected shenanigans with native code
The log message that you see occurs in the close method of that class, so it’s probably not saying anything about the root cause of your error, just that the error interrupted the read https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174
but that tells you something about when it’s happening
> But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.
Yes, but I don’t think we have such one. I also didn’t see it using jcmd
.
But it is already progress, because I thought this exception was a Java memory issue. Not memory outside Java (system memory).
Yeah I’m just speculating about it being a child process, but I’m positive that it isn’t the JVM running out of memory. This is native code somewhere. Maybe in a native system call in low-level library code. Do you have a full stack trace?
2021-06-22 18:19:33,837 [cli-planner thread] WARN com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" java.io.IOException: Cannot allocate memory
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at $fn__11002.invokeStatic(io.clj:307)
at $fn__11002.invoke(io.clj:302)
at clojure.lang.MultiFn.invoke(MultiFn.java:238)
at $fn__11006.invokeStatic(io.clj:321)
at $fn__11006.invoke(io.clj:319)
at clojure.lang.MultiFn.invoke(MultiFn.java:238)
at $copy.invokeStatic(io.clj:406)
at $copy.doInvoke(io.clj:391)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at personal_shopper.core$download_file$fn__20988.invoke(core.clj:801)
at personal_shopper.core$download_file.invokeStatic(core.clj:799)
at personal_shopper.core$download_file.invoke(core.clj:797)
at personal_shopper.core$plan_batches.invokeStatic(core.clj:853)
at personal_shopper.core$plan_batches.invoke(core.clj:849)
at personal_shopper.core$plan_supplier_BANG_.invokeStatic(core.clj:893)
at personal_shopper.core$plan_supplier_BANG_.invoke(core.clj:882)
at personal_shopper.core$shop_supplier_BANG_.invokeStatic(core.clj:903)
at personal_shopper.core$shop_supplier_BANG_.invoke(core.clj:899)
at personal_shopper.core$fn__21162$fn__21165.invoke(core.clj:1013)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:748)
but you are right about cannot allocate memory. It means a different thing. The previous out of memory with Java 8 suggested it to me.
but let’s hold on. We changed memory limits yesterday. If it was really about system memory and there is no “memory leak” in the system, then maybe everything will work from yesterday.
besides the heap, the jvm needs off-heap memory to do gc and other management tasks; maybe all memory is being assigned to the heap. Or heap memory is unbounded (grows with workload), and no memory is left for housekeeping or OS tasks.
wild guess: S3ObjectInputStream or something is using a bytebuffer or native memory, and there's contention with the heap
at
sounds like a malloc fail or something like that, which is definitely not managed by JVM memory options but is also beyond my knowledge (I’m not really a native developer)
The stacktrace really piqued my curiosity... You might have luck reproducing the problem in a production repl by performing the problematic copy of a large file. Maybe 1000 times in a row. Seems better than waiting :) https://stackoverflow.com/a/57004096 was a nice one. It links to this blog post (otherwise the link is dead) https://archive.is/aScEq
the same stackoverflow post provides other workarounds, like tweaking heap size, to make more memory available to the OS
Are you setting -Xmx (heap size) to the container/(VM?) limit? To about 14 GB? Note that -Xmx is for the heap, and that the JVM can and will use more memory than just the size allocated to the heap. Source: https://stackoverflow.com/a/14763095 Could the problem go away by simply omitting -Xmx? You should not need -Xmx: https://www.eclipse.org/openj9/docs/xxusecontainersupport/ I could be totally wrong in suggesting this, but I've recently been battling an OOM and also setting a very high -Xmx.
Yesterday I launched my container with max memory 8 GB (to Azure) and Xmx8g to the JVM, and it has been restarting/OOMing like crazy since then. Edit: This morning I redeployed without -Xmx --- and I'm waiting for the results.
And how much memory is available for the system/container/(vm?) as a whole?
OK. You may want to think about simply dropping -Xmx, re my comment and links above. How/what are you deploying into?
I did read it vemv. I don't think Xmx is 1GB by default since java 10, re my links above.
With the release of Java 10, the JVM now recognizes constraints set by container control groups (cgroups). Both memory and cpu constraints can be used manage Java applications directly in containers, these include:
adhering to memory limits set in the container
...
https://www.docker.com/blog/improved-docker-container-integration-with-java-10/
At least if he is deploying into a container
Regarding -Xmx default value:
The default value is chosen at runtime based on system configuration.
https://docs.oracle.com/en/java/javase/11/tools/java.html#GUID-3B1CE181-CD30-4178-9602-230B800D4FAE
> At least if he is deploying into a container
Certainly
My humble input would be, to avoid going in circles: the error message / stacktrace is very specific and similar problems had a solution to be applied at Linux level, not JVM (see the link from yesterday)
I'd still recommend to start the thread again and list the things you've googled and tried (such as, again, the http://archive.is one. Among a few other things). Else you have people shooting in the dark for you
It was a thread with async which didn’t close properly and referred to data. Maybe just a deadlock, or maybe just a very non-optimal pipeline in async. I didn’t analyse it further.
But all in all I have to wait longer to be really sure the exception will not come back.
Can #?
be extended?
We are looking at using the magic of .cljc to share code between backend and front, but "front" has to work for both web and some RN wrapper.
And they ^^^ differ in re, say, talking to MQTT via Paho.
I see these #? options in the doc example:
#?(:clj (Clojure expression)
:cljs (ClojureScript expression)
:cljr (Clojure CLR expression)
:default (fallthrough expression))
Can we extend that ourselves? Thx! 🙏
I think babashka supports its own tag in the same way ... I suspect that "extend that ourselves" is going to mean "write a compiler" 😉
I think clojure is modular enough that you just need to implement your own reader
I believe in a normal JVM Clojure context it should just skip over any conditional that doesn't match :clj
👍 .. cool ... I didn't know shadow had implemented that ... that's super useful ... every day's a school day 😉
it can probably not be extended in the way you want to extend it
it is designed as an extensible system, and you can pass platform identifiers to the reader in its option map when you invoke it
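For example, when invoking the reader yourself you can supply a custom feature set (a sketch; :bb here is just an arbitrary keyword standing in for a custom platform identifier):

```clojure
;; sketch: reading with a custom reader-conditional feature set
(read-string {:read-cond :allow :features #{:bb}}
             "#?(:bb :bb-branch :default :fallthrough)")
;; => :bb-branch
```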
I think the difference between cljs in web and cljs in react-native is not like the difference between clj and cljs - regular conditional forms will work for what you want without extending the reader conditionals
"regular conditional forms will work". Not sure what "regular conditional forms" are, but then I am no Clojure guru. Is this some other reader macrology? I only knew of the cljs/clj variants. We can certainly write app code that tests some run-time variable to decide which platform to cater to, but I am worried about the NS and project.clj dependencies. Do we solve all this with some black-belt (or trivial) deps.edn work?
Wait ENVARs? Well, still need conditional (ns (:require....???))
yeah, the real PITA of sharing code between two platforms (web and rn) is conditional requires
you can conditionally require by using the require function inside a conditional, then use some variety of DI to provide the right platform dependent implementation
integrant / component / etc. make this easy
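On the JVM side that can look like this sketch (the namespaces and the system property are hypothetical; this relies on require and resolve being callable at runtime):

```clojure
;; JVM-Clojure-only sketch: pick an implementation at runtime
;; (my-app.mqtt.paho / my-app.mqtt.web are hypothetical namespaces)
(def mqtt-client
  (if (= "react-native" (System/getProperty "my-app.platform"))
    (do (require 'my-app.mqtt.paho)
        (resolve 'my-app.mqtt.paho/client))
    (do (require 'my-app.mqtt.web)
        (resolve 'my-app.mqtt.web/client))))
```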
what makes this harder in cljs is that you can't use resolve
all dependencies within a namespace must be statically put in the (ns ,,, (:require ,,,))
in ClojureScript
oh, I forgot that, thanks
I am having fantasies of two projects, one dedicated to Web, one to RN, that include whatever can be platform-neutral in one shared project, and then each pulls in a platform-specific project that supplies (essentially) a common API to the ultimate client apps. insert hand-waving. We could even leave out platform-specific stuff, but I kinda like the idea of shared code bases as much as possible in an enterprise situation where code reuse has a chance.
but you cannot change the platform identifiers used when reading clj or cljs source itself
the reader conditionals prevent compiler errors, I can't imagine what wouldn't even compile?
In transduce
, I'm a bit confused about when f
is involved. It seems that the 2-ary of my f
is called once at the beginning with the init, but the element is like the element returned by the xf. So I'm confused; it's like my f
is plugged in after the xf, but it receives the init?
(defn index-of
([element coll]
(index-of element coll []))
([element coll idxs]
(transduce
(comp (keep-indexed
(fn [idx item]
(if (sequential? item)
(index-of element item (conj idxs idx))
(when (= element item) (conj idxs idx))))))
(fn
([acc e] (println acc e) (when (some? e) (reduced e)))
([done] (first done)))
:init
coll)))
(index-of 3 [1 2 3 4 5 6 7])
;; prints:
:init [2]
;; returns:
2
I think your code is just kind of buggy? Like, inside your keep-indexed function, you are conjing indices onto the idxs passed in to the whole index-of function
I think what you are missing is how transducers work, where transduce is kind of the trivial application of
Lets use this one instead:
(defn index-of
[element coll]
(transduce
(keep-indexed
(fn [idx item]
(when (= element item) idx)))
(fn
([acc e] (println acc e) (when (some? e) (reduced e)))
([done] done))
:init
coll))
(index-of 3 [1 2 3 4 5 6 7])
:init 2
2
The thing about transducers that seems to confuse everyone is that there are three arities and one of them (0-arity) is never called 🙂
where?
there are actually multiple 0-arities available to transduce; I believe it will call (f), but before applying xf, which is the confusing thing
afaict, the initial value is produced by calling the reducing function f
rather than calling the transducer, xform
https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L6904
reducing function can of course have transducers
(let [rf ((map inc) conj)]
  (transduce (map inc) rf [] (range 4)))
does that ever call the transducer's 0 arity?
(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]
(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
        ([done] (println "xf done: " done) done)
        ([acc e] (println "xf rf: " acc e) e)))
  (fn
    ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])
f init
xf rf: :init 1
xf rf: 1 2
xf rf: 2 3
xf rf: 3 4
xf rf: 4 5
xf rf: 5 6
xf rf: 6 7
xf rf: 7 8
xf rf: 8 9
xf done: 9
9
(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
        ([done] (println "xf done: " done) (rf done))
        ([acc e] (println "xf rf: " acc e) (rf acc e))))
  (fn
    ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])
f init
xf rf: :init 1
f rf: :init 1
xf rf: 1 2
f rf: 1 2
xf rf: 2 3
f rf: 2 3
xf rf: 3 4
f rf: 3 4
xf rf: 4 5
f rf: 4 5
xf rf: 5 6
f rf: 5 6
xf rf: 6 7
f rf: 6 7
xf rf: 7 8
f rf: 7 8
xf rf: 8 9
f rf: 8 9
xf done: 9
f done: 9
9
You can still see it never calls the init arity of xf
Which is what I said, yes.
The 0-arity of f is called. That is not the 0-arity of the transducer. f should be callable with 0 or 2 arguments. The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used), 1 (completing), 2 (reducing).
No. Your example has a reducing step function. That's not a transducer.
In the transduce
call, the transducer is the first argument (`xform`).
Not all transducer-related functions have a reducing step function. (sequence, eduction, etc)
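To make that concrete, a small sketch showing that eduction captures only the xform and the coll; the reducing step function only shows up later, supplied by whatever consumes the result:

```clojure
;; eduction holds the transducer + coll; no step function yet
(def xs (eduction (comp (map inc) (filter even?)) (range 10)))

;; the step function (conj, +, ...) arrives at reduction time
(into [] xs)     ;; => [2 4 6 8 10]
(reduce + 0 xs)  ;; => 30
```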
(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]
xf1 is a transducer here, correct?
*stares at the code* Hmm, yeah, so a transducer will have its 0-arity called only when it is used to create a reducing step function from another reducing (step) function. How/where is that actually done in the wild?
it is in a slightly different context but it is an example of where the 0-arity of a transducer is called
so i was just pointing out that this was a bit over eager is all > The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used),
But xform -- which is what is normally referred to as a transducer -- never gets its 0-arity called in any transducer-related functions.
yeah i agree. i've also gotten wallowed a bit in figuring out which functions are transducers, which are reducing functions, if there's a name for a reducing function that has a 0-arity version, etc
True, the transducer itself really only has a 1-arity version...
Yeah, I guess it's sloppy to refer to a transducer having a 0-arity at all?
"The inner function is defined with 3 arities used for different purposes:" -- from https://clojure.org/reference/transducers
So when it is said that transduce doesn't invoke the 0-arity, what is meant is that it doesn't invoke the 0-arity of the step function created by applying xform to the step function
Sorry, I guess I'll be more careful with terminology from now on 😐 There's at least one very confused SO post about this...
That does make me ask my other Q again tho' @U11BV7MTK: where in the wild do we see transducers applied to reducing step functions to create new reducing step functions? Normally we just see the xform
as a comp
of a bunch of transducers.
(it is now clear to me that is what the reference doc is actually describing, no matter how many times I've read it in the past!)
I've seen the places where xform is called on a reducing function called a "reducing context"
Ah, interesting... in OSS? Link? Or just in blog posts about transducers?
So there is one of those inside transduce, and sequence, and core.async channels, and if you were creating your own reducing context
I forget you work there 🙂
but that will be super hard to follow without some navigation and knowing what is going on
And just above that, there's the same confusion we just had here: https://github.com/metabase/metabase/blob/master/src/metabase/sync/analyze/fingerprint/fingerprinters.clj#L206-L221 -- histogram
isn't a transducer, it's a reducing step function.
(and then it's used in ((filter real-number?) histogram) at the end of that block)
OK, I'll bear Metabase in mind when this subject comes up again (because it will). Thank you! I've been perpetuating incorrect information because I had my terminology wrong.
So I'm confused as to what receives the init value first? Does it first call the 2-arity of xf with [init first-element], and then keep-indexed calls my f's 2-arity but passes it the untouched init and the transformed element? But if so, I should see it printing a bunch of nils.
So I think keep-indexed chooses not to call my f until there is a non-nil transformed element, but then I'm confused how my f at that point receives init? Is that just what keep-indexed does under the hood?
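A simplified sketch of what the keep-indexed transducer does under the hood (my-keep-indexed is an illustrative stand-in, modeled on the clojure.core version): it threads the accumulator through every step, and only calls the wrapped rf when f returns non-nil, which is why f first sees the untouched :init paired with the first non-nil value:

```clojure
(defn my-keep-indexed [f]
  (fn [rf]
    (let [i (volatile! -1)]            ;; per-reduction index state
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result item]
         (let [v (f (vswap! i inc) item)]
           (if (nil? v)
             result                    ;; nil: pass the accumulator through untouched
             (rf result v))))))))     ;; non-nil: only now is the inner rf called

(into [] (my-keep-indexed (fn [idx x] (when (= 3 x) idx))) [1 2 3 4])
;; => [2]
```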
Ya, I think I'm just surprised by the behavior of the keep-indexed transducer. It's like it keeps track of the init even though it's later in the reduction, and passes it to my f as the first value. Also, is there no way to have the init be the first element? Like what reduce does?
Interesting, in what sense? I feel like most of my reduce use cases start with the first two elements.
And so I often need to find a kind of identity for the init, and sometimes that can be tricky.
Hum... I mean, before you at least had a choice: if they are the same type, don't pass an init; if they are not, pass one. Now when they are the same type, you need to find a value of that type that somehow will act as an identity when your reducing function is first called
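For contrast, a quick sketch of reduce's two calling conventions next to transduce, which always takes its init either explicitly or from calling (f) with no args:

```clojure
;; 2-arg reduce: the first two elements seed the reduction
(reduce + [1 2 3 4])       ;; => 10

;; 3-arg reduce: explicit init, first call is (f init first-element)
(reduce + 100 [1 2 3 4])   ;; => 110

;; transduce has no "first two elements" variant: here the init is (+) => 0,
;; and the elements are inc'd before + sees them
(transduce (map inc) + [1 2 3 4])  ;; => 14
```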
I thought about that, felt it was weird. Wouldn't it mess up the coll if it was on a reduced fast path?
Hum, I was thinking like you'd get the accumulated list of things kept till now as the accumulator, and maybe the index as the element. But I think you're right, what they did is probably better
the index parameter muddies things, so it might be clearer if you think about it just in terms of filter, or possibly start from filter and see what it takes to add the index
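Following that suggestion, a sketch: a bare filter transducer first, then the one change (a stateful volatile counter) needed to add the index. The names are illustrative:

```clojure
;; a plain filter transducer
(defn my-filter [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result x]
       (if (pred x) (rf result x) result)))))

;; adding the index: the only change is the volatile counter
(defn my-filter-indexed [pred]
  (fn [rf]
    (let [i (volatile! -1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result x]
         (if (pred (vswap! i inc) x) (rf result x) result))))))

(into [] (my-filter odd?) (range 6))
;; => [1 3 5]
(into [] (my-filter-indexed (fn [i _] (even? i))) [:a :b :c :d])
;; => [:a :c]
```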
I'm writing a script that migrates data. I want to grab a list of IDs, and send them to n threads to run concurrently. I was chunking them, but some threads finish early, and it slows down as it nears the end. I would like to use a queue for this. Should I use core.async, or can I just use an atom with a list as a queue?
The problem I see with the atom is that getting the first element in the list, and updating to rest
needs to happen in the same operation to avoid race conditions, and I'm not readily seeing how to do that
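One way to make that atomic with just an atom, assuming Clojure 1.9+ for swap-vals!, which returns both the old and new values so the popped head can be read without a race:

```clojure
(def work-queue (atom (list :a :b :c)))

(defn pop-work!
  "Atomically take the head of the queue; returns nil when empty."
  []
  ;; swap-vals! returns [old-value new-value]; the head of the old
  ;; value is exactly the item this call removed
  (let [[old _new] (swap-vals! work-queue rest)]
    (first old)))

(pop-work!) ;; => :a
(pop-work!) ;; => :b
```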
That doesn't work great if the queue is large or if a task requires too much data. Had OOMs before because of it.
but yeah, use an executor https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
the problem with an atom is it is non-blocking, and for a work queue you generally want something blocking, or else you end up polling for work
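A minimal sketch of that blocking-queue shape: a fixed pool of workers that block on a java.util.concurrent LinkedBlockingQueue, with a poison-pill keyword to signal shutdown (the names here are illustrative):

```clojure
(import '(java.util.concurrent LinkedBlockingQueue Executors TimeUnit))

(defn run-workers
  "Start n workers that .take from queue until they see ::done."
  [n ^LinkedBlockingQueue queue handle]
  (let [pool (Executors/newFixedThreadPool n)]
    (dotimes [_ n]
      (.execute pool
                (fn []
                  (loop []
                    (let [item (.take queue)]  ;; blocks until work arrives
                      (when-not (= ::done item)
                        (handle item)
                        (recur)))))))
    pool))

;; usage sketch
(def results (atom []))
(def q (LinkedBlockingQueue.))
(def pool (run-workers 4 q #(swap! results conj %)))
(doseq [i (range 10)] (.put q i))
(dotimes [_ 4] (.put q ::done))  ;; one pill per worker
(.shutdown pool)
(.awaitTermination pool 5 TimeUnit/SECONDS)
(count @results)  ;; => 10
```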
say you somehow have a database query that you can fetch in pages, and you want to do work on each page
thanks for the quick responses! I gotta run, but I'm going to come back and look through what you sent
you write some code that grabs N pages, puts them on the work queue, then queues itself on the work queue to do the next N, etc
Ah, right, makes sense. I was stuck on thinking about it in the context of the problem that I had - a queue of manageable size already in memory, so I never had to fetch any pages. Just scheduling it all was blowing things up because threads aren't that light-weight, even if you don't feed them much data. (or rather, not threads themselves since there's a limited amount but the scheduled task in the executor)
depends, if you use a fixed size threadpool (like the static method I linked to) and fill the queue in the way I described then there is pressure, if you are interacting with the executor externally it is tricky but doable, you may have to cps your code though
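One way to get that pressure when submitting from outside, sketched with plain java.util.concurrent: a ThreadPoolExecutor over a bounded queue with CallerRunsPolicy, so a full queue makes the submitting thread run the task itself and slow down:

```clojure
(import '(java.util.concurrent ThreadPoolExecutor TimeUnit ArrayBlockingQueue
                               ThreadPoolExecutor$CallerRunsPolicy))

(defn bounded-pool
  "Fixed pool of n threads whose submission backs off (by running the
  task in the caller) once queue-size pending tasks have accumulated."
  [n queue-size]
  (ThreadPoolExecutor. n n 0 TimeUnit/MILLISECONDS
                       (ArrayBlockingQueue. queue-size)
                       (ThreadPoolExecutor$CallerRunsPolicy.)))

;; usage sketch: producers can't run away from the workers
(def done (atom 0))
(def pool (bounded-pool 4 16))
(dotimes [_ 1000]
  (.execute pool (fn [] (swap! done inc))))
(.shutdown pool)
(.awaitTermination pool 10 TimeUnit/SECONDS)
@done  ;; => 1000
```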
@U0NCTKEV8 To approach it from a different side - why would you not want to use core.async? Any other reasons besides "it's easy enough to do with a fixed thread pool and a queue"?
if you are connecting core.async code to an executor used for io, it usually suffices to ignore any futures the executor creates, and instead queue up tasks that deliver their results to a channel, and have the core.async code park on the channel
the 3 things really synergize, but they may not match what you are doing, and none of them is a fixed pool executor
channels are great, and when you need them there is no substitute, but a lot of the time you can get by with some kind of queue from java.util.concurrent
I'm surprised no one mentioned CompletionService: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionService.html
(import 'java.util.concurrent.ExecutorCompletionService)
(import 'java.util.concurrent.Executors)
(defn do-concurrently
  "Executes each task in tasks with concurrency c, assuming side effects,
  and runs handler on their results as they complete. Handler is called
  synchronously from the calling thread."
  [tasks c handler]
  (let [executor (Executors/newFixedThreadPool c)
        cs (ExecutorCompletionService. executor)
        initial (take c tasks)
        remaining (drop c tasks)]
    ;; Submit initial batch of tasks to run concurrently.
    (doseq [task initial]
      (.submit cs task))
    (doseq [task remaining]
      ;; Block until any task completes.
      (let [result (-> cs .take .get)]
        ;; While tasks remain, submit another one to
        ;; replace the one that just completed.
        (.submit cs task)
        ;; Handle the result of the task that just completed.
        (handler result)))
    ;; We submitted an initial batch of c tasks but only handled one result
    ;; per remaining task, so c results are still un-handled; handle them now.
    (doseq [_ initial]
      (handler (-> cs .take .get)))
    ;; Shut down the executor once all tasks have been processed.
    (.shutdown executor)))
(defn io
  "Simulates an IO operation by sleeping the calling thread
  for the given amount-of-time. Returns the amount-of-time."
  [amount-of-time]
  (Thread/sleep amount-of-time)
  amount-of-time)
;;; Run io 10000 times at 10 ms per io call with up to 100 concurrent calls
;;; and sum up all results.
;;; Then print the time it took and the resulting sum.
(let [sum (atom 0)]
  (time
    (do-concurrently (repeat 10000 (partial io 10)) 100 #(swap! sum + %)))
  (println @sum))
The trick is that you first submit c tasks to be executed concurrently. In this case, I've chosen to make 100 concurrent calls at a time. The call to submit is non-blocking and will return immediately. After you've initiated your first batch, you block on cs, which will wait till any of them complete; when one does, it will unblock and return the result of the task that just completed. When that happens, we submit another task, so that we maintain our concurrency level, and we call our handler with the result. In effect, we're saying: perform n calls, up to c at a time. We handle the results on the thread which submits the remaining tasks as they complete. This means that if our handler is very slow, it will delay our re-queuing of remaining tasks, so that's something to keep in mind. Finally, we have to handle the remaining batch of un-handled tasks, and shut down the executor to release the resources associated with it.
core.async pipelines are vaguely like an executor, but not really (pipelines have more ordering which will limit concurrency)
Perhaps a naive question. Why would this be bad? Assuming we want exactly (+ 2 (.. Runtime getRuntime availableProcessors))
concurrently running tasks, as pmap
gives us.
(->> tasks
     (pmap do-stuff)
     vec)
tasks itself could be a chunked lazy seq that won't realize too much data ahead.
I like pmap 😛, it actually does something similar in trying to stay ahead, but you can't control the number of threads, and it retains the head of whatever you are doing.
pmap's limiting to 2+ is of course broken (because of chunking), and the way it combines laziness and concurrency is bad, and the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called
Well, the chunking is actually a blessing in disguise, because you can now control pmap's concurrency based on your chunk size 😛
> limiting to 2+ is of course broken (because of chunking)
Doesn't the extra lazy-seq there basically disable chunking? Since it advances one at a time.
Duh, there's map.
I was just looking at some migration code, where the migration is written as a reduce (a fold) over each users data, and then the reduce operation is customized to run each reduce step on an executor, and enqueue the next step to run when it is done
user=> (seq [1 2 3 4])
(1 2 3 4)
user=> (class (seq [1 2 3 4]))
clojure.lang.PersistentVector$ChunkedSeq
user=> (lazy-seq (seq [1 2 3 4]))
(1 2 3 4)
user=> (class (lazy-seq (seq [1 2 3 4])))
clojure.lang.LazySeq
user=> (class (seq (lazy-seq (seq [1 2 3 4]))))
clojure.lang.PersistentVector$ChunkedSeq
user=>
You can see it going in batch of 32 here:
(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (range 100))
The customized reduce is just
(defn exec-reduce [exec fun init coll]
  (if (seq coll)
    (exec (fn []
            (if-not (reduced? init)
              (exec-reduce exec fun (fun init (first coll)) (rest coll))
              (fun (unreduced init)))))
    (fun (unreduced init))))
so exec is expected to be a function that queues another function on the executor
@U0NCTKEV8 Is your example really correct, given that you explicitly wrap a chunked seq?
(defn step [x]
  (lazy-seq
    (if (pos? x)
      (do
        (println x)
        (cons x (step (dec x))))
      [x])))
(first (step 100))
The above will print 100 only once. That's what I meant by "removes chunking", and that's similar to what pmap is using. However, it uses map in between, which is chunked, and that's where that n gets ignored.
Oh, true, my wording sucks, apologies. Replace "removes chunking" with "allows creating lazy seqs without chunking".
you can create lazy-seqs without chunking, but some people have things like vectors and like to map over them
For example:
(defn re-chunk [n xs]
  (lazy-seq
    (when-let [s (seq (take n xs))]
      (let [cb (chunk-buffer n)]
        (doseq [x s] (chunk-append cb x))
        (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (re-chunk 50 (range 100)))
Now it is doing 50 at a time.
Going back to your initial reply on pmap, and ignoring the chunking for now.
> the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called
I can see that, although creating thread pools willy-nilly I guess would be roughly the same. So treating a single usage of pmap as if it were the creation of a new thread pool should deal with that concern.
> the way it combines laziness and concurrency is bad
This is probably the most interesting. Could you say a couple more words on it?
just a general statement I guess, the laziness limits your ability to control execution, the use of concurrency implies you care about execution
I think for a script, you can probably trust something like:
(dorun (pmap #(handle (some-io %)) coll))
Keeping in mind that your handler will run in parallel as well.
It'll go 32 at a time by default, and you can re-chunk if you want it to go faster. Though you can't slow it down much more than that, since it'll be num of threads + 2 at a minimum, chunk-size otherwise
But honestly, the CompletionService in my opinion is the best way to go if you want to run a bunch of tasks in parallel batches and go as fast as you can, handling each result where order doesn't matter.
Thanks everyone. I ended up using the CompletionService and am very happy!
I want to call the second method listed here:
(ins)org.noisesmith.gamey=> (-> (reflect/reflect GLFW) :members (->> (filter (fn [x] (= (:name x) 'glfwCreateWindow)))) pprint)
({:name glfwCreateWindow,
:return-type long,
:declaring-class org.lwjgl.glfw.GLFW,
:parameter-types [int int java.nio.ByteBuffer long long],
:exception-types [],
:flags #{:public :static}}
{:name glfwCreateWindow,
:return-type long,
:declaring-class org.lwjgl.glfw.GLFW,
:parameter-types [int int java.lang.CharSequence long long],
:exception-types [],
:flags #{:public :static}})
nil
when I call it as follows:
(GLFW/glfwCreateWindow 300 300 "Hello, World!" nil nil)
I get
Execution error (IllegalArgumentException) at org.noisesmith.gamey/init (gamey.clj:18).
No matching method glfwCreateWindow found taking 5 args
the javadoc for this method tells me I should be providing nil for my last two arguments https://javadoc.lwjgl.org/org/lwjgl/glfw/GLFW.html#glfwCreateWindow(int,int,java.lang.CharSequence,long,long)
what's the trick to getting clojure to find the right method here? hint the nil as a Long or something?
those are most likely pointers and they probably mean a null pointer (eg. 0) rather than java null
I've never seen 0 called NULL in javadoc
I'll try it though
or they copy and pasted from the glfw docs
My bet that it's the case. The original function definition:
GLFWwindow* glfwCreateWindow ( int width,
int height,
const char * title,
GLFWmonitor * monitor,
GLFWwindow * share
)
https://www.glfw.org/docs/latest/group__window.html#ga5c336fddf2cbb5b92f65f10fb6043344
yeah, it's just shitty docs and it wanted 0, thanks
> [in] monitor The monitor to use for full screen mode, or NULL for windowed mode. > [in] share The window whose context to share resources with, or NULL to not share resources.
I guess I'll keep that in mind when I see random "long" args that are actually pointers
I think I got my signals crossed because under X11 you really do look up screens and windows with numeric ids that aren't pointers
(IIRC)
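For the record, a sketch of the call that works: just pass 0 for the pointer arguments. LWJGL 3 also exposes MemoryUtil/NULL (which is 0L) if you want the intent to be explicit; this assumes LWJGL is on the classpath and GLFW has been initialized:

```clojure
(import 'org.lwjgl.glfw.GLFW
        'org.lwjgl.system.MemoryUtil)

;; NULL monitor => windowed mode; NULL share => no shared context
(GLFW/glfwCreateWindow 300 300 "Hello, World!" MemoryUtil/NULL MemoryUtil/NULL)
```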
It’s been a while since I’ve had to interop with a vararg Java method. How would I call a static method with this type signature? *of*(double[][] data, java.lang.String... names)
(O/of data (into-array String [...]))
Brilliant:
(DataFrame/of (into-array (map int-array
                               [[0 0 0 0]
                                [0 0 0 0]
                                [0 0 0 0]]))
              (into-array java.lang.String
                          ["w"
                           "x"
                           "y"
                           "z"]))