This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-08-17
Channels
- # announcements (13)
- # beginners (56)
- # brompton (1)
- # cider (2)
- # cljsrn (10)
- # clojure (369)
- # clojure-australia (4)
- # clojure-boston (1)
- # clojure-europe (28)
- # clojure-nl (1)
- # clojure-spec (1)
- # clojure-uk (18)
- # clojurescript (26)
- # data-science (2)
- # datahike (4)
- # datalog (2)
- # datasplash (6)
- # datomic (9)
- # events (1)
- # kaocha (4)
- # macro (1)
- # malli (22)
- # meander (40)
- # membrane (30)
- # music (1)
- # nbb (3)
- # news-and-articles (3)
- # off-topic (12)
- # practicalli (1)
- # re-frame (19)
- # remote-jobs (1)
- # sci (22)
- # shadow-cljs (15)
- # spacemacs (4)
- # tools-deps (40)
- # xtdb (26)
You can use reduced
yes, that's how take
and halt-when
work. What you can't do is return something from the function you pass to map
or keep-indexed
wrapped in reduce and expect it to stop processing the rest of the data. So this won't stop at 3:
(into [] (map (fn [x] (if (< 3 x) (reduced x) x))) (range 10))
pretty much all the reduce-looking things in core support reduced. Like transduce
and reductions
and so on ... so this will result in 3
(reduce (fn [r i] (if (< 3 i) (reduced r) i)) (range))
rather than running through the infinite seq forever
user=> (into [] (keep-indexed (fn [idx i] (reduced idx))) (range 3))
[#object[clojure.lang.Reduced 0x7c6189d5 {:status :ready, :val 0}] #object[clojure.lang.Reduced 0x4248e66b {:status :ready, :val 1}] #object[clojure.lang.Reduced 0x3e6534e7 {:status :ready, :val 2}]]
I must be doing it wrong
Oh, @U0P0TMEFJ already said that 😄
(into []
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
(reduced idx))))
(range 100))
This prints all elements. So I can only assume the keep-indexed didn't short-circuit.
@U064X3EF3 Are you sure you are supposed to be able to use reduced with the keep-indexed transducer? Is it a bug then?
yes @U0K064KQV that's exactly correct behaviour. You can't just return a reduced
element into a collection, you have to return the whole collection via reduced
.
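To illustrate the point, here is a minimal sketch of a custom transducer that short-circuits correctly: the step function wraps the accumulated result (not the element) in reduced, so the whole transducible process stops. The name first-when is hypothetical.

```clojure
;; hypothetical transducer: stop after the first element matching pred,
;; wrapping the *accumulated result* in reduced (not the bare element)
(defn first-when [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result x]
       (if (pred x)
         (reduced (rf result x)) ;; short-circuit: the whole result is reduced
         result)))))

(into [] (first-when #(< 3 %)) (range 10))
;; => [4]
```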
if you're trying to compose something like take-while
after a keep
, remember that keep
expands to a map
and a remove nil?
so
(into [] (comp (keep-indexed (fn [idx item] (prn idx) (when (< idx 3) item)))
(take-while (complement nil?)))
(range 5)
is the same as saying "map all the elements, and when we've seen more than 3 return nil, then remove all the nils, then stop if we see nil" ... which will consume the whole range ... right?
Yes, right now keep-indexed works like: keep ALL indexed elements which match the pred. But it would be nice to be able to say: keep indexed elements which match the pred UP TO a reduced. Because that's one downside when switching to the transducer; something like this:
(first
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
idx))
(range 100)))
Will only consume up to chunk-size elements until it finds the first non-nil thing that is kept.
But the transducer will actually be consuming the full list no matter what.
Hum, ok actually maybe I'm wrong, this does seem to work, which I saw you had shown before but had missed:
(into []
(comp
(keep-indexed
(fn [idx item]
(println idx)
(when (= 3 item)
idx)))
(take 1))
(range 100))
yeah ... if I just wanted the first thing out of a list, with a transducer, I'd compose (take 1)
in there, rather than adding first
...
Ah, that's what Alex meant by you can use reduced, like it will short-circuit if a nested transducer returns reduced to it
(first (sequence (keep-indexed (fn [idx item] (prn idx) (when (< idx 40) item))) (range 50)))
so that will behave like the lazy-seq version of keep-indexed
and consume a chunk at a time
if you're using into
you're producing a fully realised vector, then taking the first
one, which is why it consumes the whole sequence
It's not exactly the same; this is still iterating through each transducer 1 element at a time, whereas lazy-seq will iterate 32 at a time per sequence function
well ... it'll do enough work to produce 32 elements in the lazy seq being returned by sequence
Ya, but it will still do reduce-like behavior. Like if you have (comp A B C) it takes ele1 and sends it through A, then B, then C. Now it takes ele2 and sends it through A, B and C, etc. Whereas with lazy-seq, if you have (->> coll A B C) it will take 32 elements and run them all through A, then send the 32 results out of A to B, and then 32 out of B to C
yeah ... which is why transducers are more efficient ... cos they're not producing all those lazy seqs ... right?
Unless you use some special transducers that collect things, like partition-by. And I thought keep-indexed did too, but I was wrong.
Ya, because each batch of "32" is wrapped in an extra object container, and the creation of that object and garbage collection is what slows down lazy-seq
I mean, it's not just the batch; even an unchunked seq will wrap the single element in an extra object.
So using (sequence ...)
with transducers should still be more performant than lazy-seq. Because it will only create a chunk for each 32 results of the full chain, not any of the intermediate ones
It's what this sentence from the guide basically means: > The resulting sequence elements are incrementally computed. These sequences will consume input incrementally as needed and fully realize intermediate operations. This behavior differs from the equivalent operations on lazy sequences.
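A quick way to see the difference is to count side effects; this is a sketch, and the first count assumes current JVM Clojure chunking behavior (RT/chunkIteratorSeq realizes 32 results at a time):

```clojure
;; sketch: compare how many inputs each approach realizes
(def consumed (atom 0))

;; transducer + sequence: realizes a chunk of *final* results at a time
(reset! consumed 0)
(first (sequence (map (fn [x] (swap! consumed inc) x)) (range 100)))
@consumed ;; => 32 on current JVM Clojure (chunked sequence over the iterator)

;; into realizes everything, since it builds the full vector first
(reset! consumed 0)
(first (into [] (map (fn [x] (swap! consumed inc) x)) (range 100)))
@consumed ;; => 100
```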
I discovered something nice that seems to accidentally work. One of my randomly-generated test suites was causing a java.lang.StackOverflowError
exception. I wanted to know what input data was triggering the error. So I set up the following, to catch, warn, and rethrow the exception
(try (unary-test-fun data)
(catch java.lang.StackOverflowError e
(cl-format true "~&Stack overflow on ~A~%" data)
(throw e)))
I was half expecting it not to work, but it seems to work beautifully, at least in my case.
Try/Catch->Rethrow is something commonly done in Java and C# as far as I remember :thinking_face:
It is definitely supposed to work, and it's certainly common. One of the common uses of it (at least on my long term codebase at my job) is to log evidence of a certain problem (the exception) when you can't trust the calling code to properly deal with the exception.
Building a JAR for a Clojure library with tools.build, tools.deps; no :deps
in my project's deps.edn
and no aliases engaged; running clojure -T:build jar
with a stock jar task (copied from fogus' blog post), the final pom.xml
file still has a dependency on org.clojure/clojure
(version 1.10.1, which isn't even the one in the :deps
of my root deps.edn
file).
I can manually dissoc org.clojure/clojure
from the :libs
from the basis to keep it out of the final pom, but I'm wondering if I'm overlooking something simple, or if there's an expectation that even libs now have a dependency on Clojure and consumers can just override deps (including Clojure) as they see fit.
that is happening because org.clojure/clojure
is listed as a dependency in the root deps.edn provided by the clojure cli
To avoid using it you can specify option :root nil
for create-basis
function
(b/create-basis {:project "deps.edn" :root nil})
@U04V4KLKC thank you! that solves my problem. I must be mistaken about which deps.edn
is my root one, because the Clojure version being included isn't the same as the one specified at /usr/local/lib/clojure/deps.edn
but that's a different mystery to solve
there are three “main” deps.edn files:
• clojure cli specific
• the one from user’s home
• your project’s deps.edn
clj -Sdescribe
and look at :config-files
you will see full paths to those files
create-basis
function allows you to override each of them: https://clojure.github.io/tools.build/clojure.tools.build.api.html#var-create-basis
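Concretely, a hedged sketch of a build.clj basis that ignores both the CLI-provided root deps.edn and the user-level one (per the create-basis docs, each dep source can be set to nil to omit it):

```clojure
;; build.clj sketch: drop the root (CLI-provided) and user deps.edn
;; so the basis only reflects the project's own deps.edn
(require '[clojure.tools.build.api :as b])

(def basis
  (b/create-basis {:project "deps.edn"
                   :root nil    ;; omit the CLI's built-in root deps.edn
                   :user nil})) ;; omit the user-level deps.edn
```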
I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries
sure, but root deps.edn is declaring org.clojure/clojure {:mvn/version "1.10.3"}
which could be too high a version for distribution of some library
does clj really pick a higher versioned deeper dep over a lower versioned top level one?
I'd consider any behavior other than using the version explicitly in your deps file a bug, and not declaring a version is asking for trouble
hm… no, probably I had something different that influenced which version to use for pom.xml
> and not declaring a version is asking for trouble
in my head it was always expressed in the form - “clojure is a library for java” -> so it can load other “clojure” libraries -> so those libraries should not declare a dependency on some version of clojure
got this impression after looking at a number of “contrib” libraries such as data.json
> I'd expect every library to have deps for the libraries it needs in order to work, I consider clojure one of those libraries
I prefer not to include Clojure as a dependency for Clojure libraries I distribute, with the expectation that consumers will either use the Clojure version of their system or specific project. Every Clojure library specifying a Clojure dependency adds to the noise of dependency trees and their resolution. Maven's dependency scopes provide a story for "indicate that my code uses this dependency, but expect the consumer to provide a concrete dependency on it downstream", but I believe dependency scopes of this nature are intentionally not supported by tools.deps
@U11SSJP2A Chiming in late here. It's fairly typical for Clojure libraries to list as a dependency the minimum version of Clojure they work with. Sure, not all libraries do that -- some just assume Clojure will be "provided" -- but I think it's a good idea if there are (earlier) versions of Clojure a library will not work with.
@U04V70XH6 That's certainly a fair reason and in my case applies. Thanks!
@U04V4KLKC The root deps.edn
(built-in for t.d.a) declares a default dependency for whatever is the current version of Clojure when that version of t.d.a was released. So "by default" when using the CLI, you get a "recent" version of Clojure -- and that's reflected in the version of the CLI itself: 1.10.3.933 -- by default uses Clojure 1.10.3. See https://clojure.org/releases/tools -- there were a few 1.10.2.x versions earlier this year and it was 1.10.1.x all of last year.
What’s the preferred EDN serialization/deserialization ? pr-str
and clojure.edn/read-string
?
that's a fine combination; I tend to use pr-str
together with clojure.tools.reader.edn/read-string
why clojure.tools.reader.edn
instead of clojure.edn
?
tools.reader README has a nice rationale that says it better than I could rephrase it: https://github.com/clojure/tools.reader#rationale
trying to debug a stack overflow problem. Is setting the maximum stack depth something I can change from clojure or do I have to add some flag to the :jvm-opts
of my project.clj
file?
stack size is a global JVM property. You can configure it by passing -Xss100M
as an example
but instead of increasing stack size I can recommend changing the code so it won’t consume stack. There are some handy functions in clojure core: trampoline
as an example
Setting Xss100M is bonkers. Here's a more reasonable Xss (plus a comment on how to figure out a good value for your machine) and XX:MaxJavaStackTraceDepth, which also is relevant: https://github.com/reducecombine/.lein/blob/e05d6a2d22c0990a88a660c25fe8c5e51a3c6b1a/profiles.clj#L11-L43
My suspicion is that the lazy functions are triggering the stack overflow. I recently refactored lots of functions to return lazy lists. This means that functions which do not appear to be heavy stack users, all of a sudden become compute intensive. For example (first (rest ...))
now has to compute the 2nd element of the sequence.
Anyway, currently this is only a suspicion. Maybe my bug is elsewhere, or maybe I really have introduced a logical bug in the lazy-list refactoring.
@https://app.slack.com/team/U04V4KLKC, I haven't used trampolining yet. It was my impression that it is intended for direct recursion, not for meta-circular dependencies.
if you're getting stack overflows with lazyness, have you seen this? https://stuartsierra.com/2015/04/26/clojure-donts-concat
https://stuartsierra.com/2015/04/26/clojure-donts-concat yes, laziness might bring such problems. Here is a post about this
@U45T93RA6 is the intent of your post that I add a :jvm-options
section into my project.clj
file?
yes, profiles.clj has the same syntax as project.clj; you'd simply have to copy the Xss and XX:MaxJavaStackTraceDepth entries
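Concretely, something like this in project.clj (the values here are only illustrative; tune Xss per the comments in the linked profiles.clj):

```clojure
;; illustrative project.clj excerpt -- values are examples, not recommendations
:jvm-opts ["-Xss6144k"                           ;; per-thread stack size
           "-XX:MaxJavaStackTraceDepth=1000000"] ;; longer reported stacktraces
```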
> It was my impression that it is intended for direct recursion, not for meta-circular dependencies.
not necessarily. look at this example - https://clojuredocs.org/clojure.core/trampoline#example-5552b71ee4b01ad59b65f4cf
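i.e. trampoline handles mutual recursion too; the classic sketch:

```clojure
;; classic mutual-recursion example: each fn returns a thunk instead of
;; calling the other directly, and trampoline drives the loop on the heap
(declare my-odd?)

(defn my-even? [n]
  (if (zero? n) true #(my-odd? (dec n))))

(defn my-odd? [n]
  (if (zero? n) false #(my-even? (dec n))))

(trampoline my-even? 1000000) ;; => true, no StackOverflowError
```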
the tldr of https://stuartsierra.com/2015/04/26/clojure-donts-concat is to avoid concat
, if I understand correctly. I'm not really using concat
directly, but I am using several calls to mapcat
which internally uses concat. And in my recent refactoring I created my own lazy/mapcat
which is based of my lazy/concat
... with the goal of 1-chunking rather than Clojure's default 32-chunking.
the motivation being that 32-chunking is intended to optimize long thin sequences. Instead my application has short, fat sequences.
@U04V4KLKC, yes nice example. But it still seems to me trampolining is when a closed set of functions need to call each other in a lexically concise way. In my case I have several generic functions which operate on trees whose nodes are sequences of other trees. Many operations are dependent on other operations, and even some operations are defined in different namespaces. So while it might be possible to refactor to use trampolining, it is not apparent to me how to do so.
That being said, it certainly would be nice if the clojure compiler knew how to efficiently compile tail calls of functions defined within the same letfn
. That's not the problem I'm facing here, but it would be an interesting optimization.
As a quick observation, sometimes a SO error doesn't really indicate a categorical flaw in your code... clojure programs are hungrier than Java programs, so the JVM default settings don't always fit
This expresses itself quite often in programs using walk
, but also with various other functional patterns
I've seen this in well-known libs; there was no bug, one simply has to set Xss intentionally.
@U45T93RA6 good to know. In my case since I just did a big refactoring, I have to really consider whether I did in fact introduce subtle bugs into the program.
Here are the lazy functions I am using. I have some unit tests which do sanity checks to assure that the functions have the same semantics as the clojure.core functions they eclipse. However, there may indeed be greedy stack consumers hidden in there. https://gitlab.lrde.epita.fr/jnewton/clojure-rte/-/blob/295d0a287eb5fed51bd37cbcc3c4fc82400c2310/src/clojure_rte/lazy.clj
@U45T93RA6 with these changes you suggested, I'm still getting stack overflows, but now the stack traces are vvvvveeeeeeerrrrrrrryyyyyyyy long. Is there a way to tell clojure to prune the stack trace it prints?
You can undo or tweak XX:MaxJavaStackTraceDepth; it only affects reported stacktraces, nothing else. I do find large sizes for it useful. Often with SOs the first few thousand entries will be repetitive and will hide the root ns that is invoking that code in the first place
When exactly does the garbage collector free memory? When references no longer exist? From time to time? In some kind of random way? The point is I see heap usage on a chart and I am wondering if it is the current need for memory or if it also includes a significant part of data which is no longer needed, but which the GC didn’t remove yet.
I think most GCs don't free memory at all; their Linux process' memory can only grow or keep its size
G1GC does free memory as soon as it performs a GC
I am thinking how to interpret used heap: 1) heap which is currently used by the app, i.e. which the app refers to 2) the point above + memory which is not needed anymore, but which the GC didn’t free yet
so say the code was using a 1GB vector in a function which ended. This data is not used by the app anymore. Is this 1GB still in the used heap, waiting for the GC to remove it? I think yes. For how long?
> When exactly garbage collector free memory? When references no longer exist? From time to time? In some kind of random way?
Answering again then: will depend on the choice of GC and its parameters (which can be many).
I guess a sensible tldr is that if the GC thinks you're about to run out of memory, and it's good timing to perform a GC, then it will do so.
Overall it's a non-deterministic process although (System/gc)
can nudge it for the sake of experimentation.
> GC thinks you’re about to run out of memory, and it’s good timing to perform a GC, then it will do so Yes this one for sure when it is close to out of memory. But what before? Does it run every 15 minutes or something like that?
But hmm. Maybe I just have to accept I can’t know what memory heap contain. I mean how many old data.
I don't think any GC will have a hard-and-fast rule like "every 15m" or such. They're really complicated programs (which is why runtimes other than the JVM have subpar GCs)
ok so then it means I can’t really use heap usage as a way to know how much memory the app needs at that moment
In your screenshot, Used Heap
includes used and unused object references. I know this by pure logic: the graph descends from time to time (once per each performed GC), which implies it accumulates garbage as the program runs
If you google around, you should be able to find tools for dumping JVM memory and then exploring the dumped snapshot. These tools aren’t super easy to use, but if you need to know what’s eating up your memory, they’re a good way to explore it.
In my very limited understanding of memory usage that graph looks normal to me (depends on exactly what it’s doing at the end there though). What are you trying to debug?
@U029J729MUP I am waiting for “cannot allocate memory” to get a heap dump file. But I am really not sure if it will help me. Debugging this is hard, with very limited information. Especially with anonymous functions, which are named in a way where you really don’t know what part of the code it is.
@U0VP19K6K I am trying to fix “java.io.IOException: Cannot allocate memory”
if it’s a problem with JVM memory, I would expect an error, not an exception :thinking_face:
googling that error message suggests that this may be an allocation failure in a child process
if the JVM itself were running out of memory, you’d get an OutOfMemoryError, not an IOException
com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" java.io.IOException: Cannot allocate memory
before we were using Java 8 and we had out of memory exception, after update to Java 11 we have the one above
Is it maybe then a case that the JVM just needs more heap than it’s allowed to use? (eyeballing graph above)
Note carefully the difference between an exception and an error. This is an exception, implying that it’s something the JVM program may in principle be able to handle. That means the JVM itself is not running out of memory – something else, in native code, is failing to allocate, which may be for a number of reasons.
@U029J729MUP can you give such example?
I think you'd have better luck starting this thread over again stating that you get a java.io.IOException: Cannot allocate memory
and ideally attaching a redacted stacktrace, and the specific things you've tried for trying to solve that specific error (generic OOM hunting doesn't count; that assumes a specific root cause)
tldr this doesn't particularly smell like an OOM, you can get better help by simply stating your problem and letting the experts who hang out in #clojure help
(certainly not me in this case)
I don’t have a lot of experience with native code, but, for example, something might be trying to allocate a contiguous block that is larger than any free spot in system memory
which isn’t quite the same thing as simply running out of memory, and is very different from the JVM running out of memory
ok so let’s be clear about what java.io.IOException: Cannot allocate memory
mean
Are you saying it is not about heap or Java memory, but system memory for 100%?
Yeah… I would also consider that there are multiple factors here… I wouldn’t assume it’s a leak, not sure what you’re doing with the stream but if it’s big enough and you’re trying to consume the whole thing in one go it could surface other issues like others have mentioned? Just a stab in the dark 🙂
Yes. If the JVM were running out of memory, it would be an OutOfMemoryError. An IOException implies that it’s a problem encountered while interfacing with something else on the system. Googling the error message reveals that people encounter this most often when working with child processes.
That’s probably a bit more than I can take the time to explain. It’s a core concept in operating systems – I suggest googling “child process” and just reading up a bit. But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.
It’s not at all unlikely that the AWS libraries spawn child processes or do some unexpected shenanigans with native code
The log message that you see occurs in the close method of that class, so it’s probably not saying anything about the root cause of your error, just that the error interrupted the read https://github.com/aws/aws-sdk-java/blob/master/aws-java-sdk-s3/src/main/java/com/amazonaws/services/s3/internal/S3AbortableInputStream.java#L174
but that tells you something about when it’s happening
> But in a nutshell, it’s another program outside of the JVM that the JVM is talking to.
Yes, but I don’t think we have such one. I also didn’t see it using jcmd
.
But it is already progress, because I thought this exception was a Java memory issue. Not memory outside Java (system memory).
Yeah I’m just speculating about it being a child process, but I’m positive that it isn’t the JVM running out of memory. This is native code somewhere. Maybe in a native system call in low-level library code. Do you have a full stack trace?
2021-06-22 18:19:33,837 [cli-planner thread] WARN com.amazonaws.services.s3.internal.S3AbortableInputStream - Not all bytes were read from the S3ObjectInputStream, aborting HTTP connection. This is likely an error and may result in sub-optimal behavior. Request only the bytes you need via a ranged GET or drain the input stream after use.
Exception in thread "cli-planner thread" java.io.IOException: Cannot allocate memory
at java.io.FileOutputStream.writeBytes(Native Method)
at java.io.FileOutputStream.write(FileOutputStream.java:326)
at $fn__11002.invokeStatic(io.clj:307)
at $fn__11002.invoke(io.clj:302)
at clojure.lang.MultiFn.invoke(MultiFn.java:238)
at $fn__11006.invokeStatic(io.clj:321)
at $fn__11006.invoke(io.clj:319)
at clojure.lang.MultiFn.invoke(MultiFn.java:238)
at $copy.invokeStatic(io.clj:406)
at $copy.doInvoke(io.clj:391)
at clojure.lang.RestFn.invoke(RestFn.java:425)
at personal_shopper.core$download_file$fn__20988.invoke(core.clj:801)
at personal_shopper.core$download_file.invokeStatic(core.clj:799)
at personal_shopper.core$download_file.invoke(core.clj:797)
at personal_shopper.core$plan_batches.invokeStatic(core.clj:853)
at personal_shopper.core$plan_batches.invoke(core.clj:849)
at personal_shopper.core$plan_supplier_BANG_.invokeStatic(core.clj:893)
at personal_shopper.core$plan_supplier_BANG_.invoke(core.clj:882)
at personal_shopper.core$shop_supplier_BANG_.invokeStatic(core.clj:903)
at personal_shopper.core$shop_supplier_BANG_.invoke(core.clj:899)
at personal_shopper.core$fn__21162$fn__21165.invoke(core.clj:1013)
at clojure.lang.AFn.run(AFn.java:22)
at java.lang.Thread.run(Thread.java:748)
but you are right about cannot allocate memory. It means a different thing. The previous out of memory with Java 8 suggested it to me.
but let’s hold on. We changed memory limits yesterday. If it was really about system memory and there is no “memory leak” in the system, then maybe everything will work from yesterday.
besides the heap, the jvm needs off-heap memory to do gc and other management tasks; maybe all memory is being assigned to the heap. Or heap memory is unbounded (grows with workload), and no memory is left for housekeeping or OS tasks.
wild guess: S3ObjectInputStream or something is using a bytebuffer or native memory, and there's contention with the heap
at
sounds like a malloc fail or something like that, which is definitely not managed by JVM memory options but is also beyond my knowledge (I’m not really a native developer)
The stacktrace really piqued my curiosity... You might have luck reproducing the problem in a production repl by performing the problematic copy of a large file. Maybe 1000 times in a row. Seems better than waiting :) https://stackoverflow.com/a/57004096 was a nice one. It links to this blog post (otherwise the link is dead) https://archive.is/aScEq
the same stackoverflow post provides other workarounds, like tweaking heap size, to make more memory available to the OS
Are you setting -Xmx (heap size) to the container/(VM?) limit? To about 14 GB? Note that -Xmx is for the heap, and that the JVM can and will use more memory than just the size allocated to the heap. Source: https://stackoverflow.com/a/14763095 Could the problem go away by simply omitting -Xmx? You should not need -Xmx: https://www.eclipse.org/openj9/docs/xxusecontainersupport/ I could be totally wrong in suggesting this, but I've recently been battling an OOM and also setting a very high -Xmx.
Yesterday I launched my container with max memory 8 GB (to Azure) and Xmx8g to the JVM, and it has been restarting/OOMing like crazy since then. Edit: This morning I redeployed without -Xmx --- and I'm waiting for the results.
And how much memory is available for the system/container/(vm?) as a whole?
OK. You may want to think about simply dropping -Xmx, re my comment and links above. How/what are you deploying into?
I did read it vemv. I don't think Xmx is 1GB by default since java 10, re my links above.
With the release of Java 10, the JVM now recognizes constraints set by container control groups (cgroups). Both memory and cpu constraints can be used manage Java applications directly in containers, these include:
adhering to memory limits set in the container
...
https://www.docker.com/blog/improved-docker-container-integration-with-java-10/
At least if he is deploying into a container
Regarding -Xmx default value:
The default value is chosen at runtime based on system configuration.
https://docs.oracle.com/en/java/javase/11/tools/java.html#GUID-3B1CE181-CD30-4178-9602-230B800D4FAE
> At least if he is deploying into a container
Certainly
My humble input would be, to avoid going in circles: the error message / stacktrace is very specific and similar problems had a solution to be applied at Linux level, not JVM (see the link from yesterday)
I'd still recommend to start the thread again and list the things you've googled and tried (such as, again, the http://archive.is one. Among a few other things). Else you have people shooting in the dark for you
It was a thread with async which didn’t close properly and referred to data. Maybe just a deadlock, or maybe just a very non-optimal pipeline in async. I didn’t analyse it further.
But all in all I have to wait longer to be really sure the exception will not come back.
Can #?
be extended?
We are looking at using the magic of .cljc to share code between backend and front, but "front" has to work for both web and some RN wrapper.
And they ^^^ differ in re, say, talking to MQTT via Paho.
I see these #? options in the doc example:
#?(:clj (Clojure expression)
:cljs (ClojureScript expression)
:cljr (Clojure CLR expression)
:default (fallthrough expression))
Can we extend that ourselves? Thx! 🙏
I think babashka supports its own tag in the same way ... I suspect that "extend that ourselves" is going to mean "write a compiler" 😉
I think clojure is modular enough that you just need to implement your own reader
I believe in a normal JVM Clojure context it should just skip over any conditional that doesn't match :clj
👍 .. cool ... I didn't know shadow had implemented that ... that's super useful ... every day's a school day 😉
it can probably not be extended in the way you want to extend it
it is designed as an extensible system, and you can pass platform identifiers to the reader in its option map when you invoke it
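For example, when invoking the reader yourself you can supply a custom feature set (a sketch; :bb here is just an arbitrary keyword standing in for a custom platform identifier):

```clojure
;; sketch: reading with a custom reader-conditional feature set
(read-string {:read-cond :allow :features #{:bb}}
             "#?(:bb :bb-branch :default :fallthrough)")
;; => :bb-branch
```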
I think the difference between cljs in web and cljs in react-native is not like the difference between clj and cljs - regular conditional forms will work for what you want without extending the reader conditionals
"regular conditional forms will work". Not sure what "regular conditional forms" are, but then I am no Clojure guru. Is this some other reader macrology? I only knew of the cljs/clj variants. We can certainly write app code that tests some run-time variable to decide which platform to cater to, but I am worried about the NS and project.clj dependencies. Do we solve all this with some black-belt (or trivial) deps.edn work?
Wait ENVARs? Well, still need conditional (ns (:require....???))
yeah, the real PITA of sharing code between two platforms (web and rn) is conditional requires
you can conditionally require by using the require function inside a conditional, then use some variety of DI to provide the right platform dependent implementation
integrant / component / etc. make this easy
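On the JVM side that can look like this sketch (the namespaces and the system property are hypothetical; this relies on require and resolve being callable at runtime):

```clojure
;; JVM-Clojure-only sketch: pick an implementation at runtime
;; (my-app.mqtt.paho / my-app.mqtt.web are hypothetical namespaces)
(def mqtt-client
  (if (= "react-native" (System/getProperty "my-app.platform"))
    (do (require 'my-app.mqtt.paho)
        (resolve 'my-app.mqtt.paho/client))
    (do (require 'my-app.mqtt.web)
        (resolve 'my-app.mqtt.web/client))))
```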
what makes this harder in cljs is that you can't use resolve
all dependencies within a namespace must be statically put in the (ns ,,, (:require ,,,))
in ClojureScript
oh, I forgot that, thanks
I am having fantasies of two projects, one dedicated to Web, one to RN, that include whatever can be platform-neutral in one shared project, and then each pulls in a platform-specific project that supplies (essentially) a common API to the ultimate client apps. insert hand-waving. We could even leave out platform-specific stuff, but I kinda like the idea of shared code bases as much as possible in an enterprise situation where code reuse has a chance.
but you cannot change the platform identifiers used when reading clj or cljs source itself
the reader conditionals prevent compiler errors, I can't imagine what wouldn't even compile?
In transduce
, I'm a bit confused about when f
is involved. It seems that the 2-ary of my f
is called once at the beginning with the init, but the element is like the element returned by the xf. So I'm confused; it's like my f
is plugged in after the xf, but it receives the init?
(defn index-of
([element coll]
(index-of element coll []))
([element coll idxs]
(transduce
(comp (keep-indexed
(fn [idx item]
(if (sequential? item)
(index-of element item (conj idxs idx))
(when (= element item) (conj idxs idx))))))
(fn
([acc e] (println acc e) (when (some? e) (reduced e)))
([done] (first done)))
:init
coll)))
(index-of 3 [1 2 3 4 5 6 7])
;; prints:
:init [2]
;; returns:
2
I think your code is just kind of buggy? Like, inside your keep-indexed function, you are conjing indices onto the idxs passed in to the whole index-of function
I think what you are missing is how transducers work, where transduce is kind of the trivial application of
Lets use this one instead:
(defn index-of
[element coll]
(transduce
(keep-indexed
(fn [idx item]
(when (= element item) idx)))
(fn
([acc e] (println acc e) (when (some? e) (reduced e)))
([done] done))
:init
coll))
(index-of 3 [1 2 3 4 5 6 7])
:init 2
2
The thing about transducers that seems to confuse everyone is that there are three arities and one of them (0-arity) is never called 🙂
where?
there are actually multiple 0-arities available to transduce; I believe it will call (f), but before applying xf, which is the confusing thing
afaict, the initial value is produced by calling the reducing function f
rather than calling the transducer, xform
https://github.com/clojure/clojure/blob/master/src/clj/clojure/core.clj#L6904
reducing function can of course have transducers
(let [rf ((map inc) conj)]
  (transduce (map inc) rf [] (range 4)))
does that ever call the transducer's 0 arity?
(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]
(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
        ([done] (println "xf done: " done) done)
        ([acc e] (println "xf rf: " acc e) e)))
  (fn
    ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])
f init
xf rf: :init 1
xf rf: 1 2
xf rf: 2 3
xf rf: 3 4
xf rf: 4 5
xf rf: 5 6
xf rf: 6 7
xf rf: 7 8
xf rf: 8 9
xf done: 9
9
(transduce
  (fn [rf]
    (fn ([] (println "xf init") (rf))
        ([done] (println "xf done: " done) (rf done))
        ([acc e] (println "xf rf: " acc e) (rf acc e))))
  (fn
    ([] (println "f init") :init)
    ([done] (println "f done: " done) done)
    ([acc e] (println "f rf: " acc e) e))
  [1 2 3 4 5 6 7 8 9])
f init
xf rf: :init 1
f rf: :init 1
xf rf: 1 2
f rf: 1 2
xf rf: 2 3
f rf: 2 3
xf rf: 3 4
f rf: 3 4
xf rf: 4 5
f rf: 4 5
xf rf: 5 6
f rf: 5 6
xf rf: 6 7
f rf: 6 7
xf rf: 7 8
f rf: 7 8
xf rf: 8 9
f rf: 8 9
xf done: 9
f done: 9
9
You can still see it never calls the init arity of xf
Which is what I said, yes.
The 0-arity of f is called. That is not the 0-arity of the transducer. f should be callable with 0 or 2 arguments. The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used), 1 (completing), 2 (reducing).
No. Your example has a reducing step function. That's not a transducer.
In the transduce
call, the transducer is the first argument (`xform`).
Not all transducer-related functions have a reducing step function. (sequence, eduction, etc)
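To make that concrete, a small sketch showing that eduction captures only the xform and the coll; the reducing step function only shows up later, supplied by whatever consumes the result:

```clojure
;; eduction holds the transducer + coll; no step function yet
(def xs (eduction (comp (map inc) (filter even?)) (range 10)))

;; the step function (conj, +, ...) arrives at reduction time
(into [] xs)     ;; => [2 4 6 8 10]
(reduce + 0 xs)  ;; => 30
```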
(let [xf1 (fn [rf]
            (fn
              ([] (println "I am called") (rf))
              ([result] (rf result))
              ([result x] (rf result x))))
      rf (xf1 conj)]
  (transduce (map inc) rf (range 5)))
I am called
[1 2 3 4 5]
xf1 is a transducer here, correct?
*stares at the code* Hmm, yeah, so a transducer will have its 0-arity called only when it is used to create a reducing step function from another reducing (step) function. How/where is that actually done in the wild?
it is in a slightly different context but it is an example of where the 0-arity of a transducer is called
so i was just pointing out that this was a bit over eager is all > The transducer is definitionally required to have three arities: 0 (supposedly "init" but it is never used),
But xform -- which is what is normally referred to as a transducer -- never gets its 0-arity called in any transducer-related functions.
yeah i agree. i've also gotten wallowed a bit in figuring out which functions are transducers, which are reducing functions, if there's a name for a reducing function that has a 0-arity version, etc
True, the transducer itself really only has a 1-arity version...
Yeah, I guess it's sloppy to refer to a transducer having a 0-arity at all?
"The inner function is defined with 3 arities used for different purposes:" -- from https://clojure.org/reference/transducers
So when it is said that transduce doesn't invoke the 0-arity, what is meant is that it doesn't invoke the 0-arity of the step function created by applying xform to the step function
Sorry, I guess I'll be more careful with terminology from now on 😐 There's at least one very confused SO post about this...
That does make me ask my other Q again tho' @U11BV7MTK: where in the wild do we see transducers applied to reducing step functions to create new reducing step functions? Normally we just see the xform
as a comp
of a bunch of transducers.
(it is now clear to me that is what the reference doc is actually describing, no matter how many times I've read it in the past!)
I've seen the places where xform is called on a reducing function called a "reducing context"
Ah, interesting... in OSS? Link? Or just in blog posts about transducers?
So there is one of those inside transduce, and sequence, and core.async channels, and if you were creating your own reducing context
I forget you work there 🙂
but that will be super hard to follow without some navigation and knowing what is going on
And just above that, there's the same confusion we just had here: https://github.com/metabase/metabase/blob/master/src/metabase/sync/analyze/fingerprint/fingerprinters.clj#L206-L221 -- histogram
isn't a transducer, it's a reducing step function.
(and then it's used in ((filter real-number?) histogram) at the end of that block)
OK, I'll bear Metabase in mind when this subject comes up again (because it will). Thank you! I've been perpetuating incorrect information because I had my terminology wrong.
So I'm confused as to what receives the init value first? Does it first call the 2-arity of xf with [init first-element], and then keep-indexed calls my f's 2-arity but passes it the untouched init and the transformed element? But if so, I should see it printing a bunch of nils.
So I think keep-indexed chooses not to call my f until there is a non-nil transformed element, but then I'm confused how my f at that point receives init? Is that just what keep-indexed does under the hood?
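A simplified sketch of what the keep-indexed transducer does under the hood (my-keep-indexed is an illustrative stand-in, modeled on the clojure.core version): it threads the accumulator through every step, and only calls the wrapped rf when f returns non-nil, which is why f first sees the untouched :init paired with the first non-nil value:

```clojure
(defn my-keep-indexed [f]
  (fn [rf]
    (let [i (volatile! -1)]            ;; per-reduction index state
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result item]
         (let [v (f (vswap! i inc) item)]
           (if (nil? v)
             result                    ;; nil: pass the accumulator through untouched
             (rf result v))))))))     ;; non-nil: only now is the inner rf called

(into [] (my-keep-indexed (fn [idx x] (when (= 3 x) idx))) [1 2 3 4])
;; => [2]
```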
Ya, I think I'm just surprised by the behavior of the keep-indexed transducer. It's like it keeps track of the init even though it's later in the reduction, and passes it to my f as the first value. Also, is there no way to have the init be the first element? Like what reduce does?
Interesting, in what sense? I feel like most of my reduce use cases start with the first two elements.
And so I often need to find a kind of identity for the init, and sometimes that can be tricky.
Hum... I mean, before you at least had a choice: if they are the same type, don't pass an init; if they are not, pass one. Now when they are the same type, you need to find a value of that type that somehow will act as an identity when your reducing function is first called
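For contrast, a quick sketch of reduce's two calling conventions next to transduce, which always takes its init either explicitly or from calling (f) with no args:

```clojure
;; 2-arg reduce: the first two elements seed the reduction
(reduce + [1 2 3 4])       ;; => 10

;; 3-arg reduce: explicit init, first call is (f init first-element)
(reduce + 100 [1 2 3 4])   ;; => 110

;; transduce has no "first two elements" variant: here the init is (+) => 0,
;; and the elements are inc'd before + sees them
(transduce (map inc) + [1 2 3 4])  ;; => 14
```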
I thought about that, felt it was weird. Wouldn't it mess up the coll if it was on a reduced fast path?
Hum, I was thinking like you'd get the accumulated list of things kept till now as the accumulator, and maybe the index as the element. But I think you're right, what they did is probably better
the index parameter muddies things, so it might be clearer if you think about it just in terms of filter, or possibly start from filter and see what it takes to add the index
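Following that suggestion, a sketch: a bare filter transducer first, then the one change (a stateful volatile counter) needed to add the index. The names are illustrative:

```clojure
;; a plain filter transducer
(defn my-filter [pred]
  (fn [rf]
    (fn
      ([] (rf))
      ([result] (rf result))
      ([result x]
       (if (pred x) (rf result x) result)))))

;; adding the index: the only change is the volatile counter
(defn my-filter-indexed [pred]
  (fn [rf]
    (let [i (volatile! -1)]
      (fn
        ([] (rf))
        ([result] (rf result))
        ([result x]
         (if (pred (vswap! i inc) x) (rf result x) result))))))

(into [] (my-filter odd?) (range 6))
;; => [1 3 5]
(into [] (my-filter-indexed (fn [i _] (even? i))) [:a :b :c :d])
;; => [:a :c]
```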
I'm writing a script that migrates data. I want to grab a list of IDs, and send them to n threads to run concurrently. I was chunking them, but some threads finish early, and it slows down as it nears the end. I would like to use a queue for this. Should I use core.async, or can I just use an atom with a list as a queue?
The problem I see with the atom is that getting the first element in the list, and updating to rest
needs to happen in the same operation to avoid race conditions, and I'm not readily seeing how to do that
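One way to make that atomic with just an atom, assuming Clojure 1.9+ for swap-vals!, which returns both the old and new values so the popped head can be read without a race:

```clojure
(def work-queue (atom (list :a :b :c)))

(defn pop-work!
  "Atomically take the head of the queue; returns nil when empty."
  []
  ;; swap-vals! returns [old-value new-value]; the head of the old
  ;; value is exactly the item this call removed
  (let [[old _new] (swap-vals! work-queue rest)]
    (first old)))

(pop-work!) ;; => :a
(pop-work!) ;; => :b
```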
That doesn't work great if the queue is large or if a task requires too much data. Had OOMs before because of it.
but yeah, use an executor https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/Executors.html#newFixedThreadPool-int-
the problem with an atom is it is non-blocking, and for a work queue you generally want something blocking, or else you end up polling for work
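A minimal sketch of that blocking-queue shape: a fixed pool of workers that block on a java.util.concurrent LinkedBlockingQueue, with a poison-pill keyword to signal shutdown (the names here are illustrative):

```clojure
(import '(java.util.concurrent LinkedBlockingQueue Executors TimeUnit))

(defn run-workers
  "Start n workers that .take from queue until they see ::done."
  [n ^LinkedBlockingQueue queue handle]
  (let [pool (Executors/newFixedThreadPool n)]
    (dotimes [_ n]
      (.execute pool
                (fn []
                  (loop []
                    (let [item (.take queue)]  ;; blocks until work arrives
                      (when-not (= ::done item)
                        (handle item)
                        (recur)))))))
    pool))

;; usage sketch
(def results (atom []))
(def q (LinkedBlockingQueue.))
(def pool (run-workers 4 q #(swap! results conj %)))
(doseq [i (range 10)] (.put q i))
(dotimes [_ 4] (.put q ::done))  ;; one pill per worker
(.shutdown pool)
(.awaitTermination pool 5 TimeUnit/SECONDS)
(count @results)  ;; => 10
```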
say you somehow have a database query that you can fetch in pages, and you want to do work on each page
thanks for the quick responses! I gotta run, but I'm going to come back and look through what you sent
you write some code that grabs N pages, puts them on the work queue, then queues itself on the work queue to do the next N, etc
Ah, right, makes sense. I was stuck on thinking about it in the context of the problem that I had - a queue of manageable size already in memory, so I never had to fetch any pages. Just scheduling it all was blowing things up because threads aren't that light-weight, even if you don't feed them much data. (or rather, not threads themselves since there's a limited amount but the scheduled task in the executor)
depends, if you use a fixed size threadpool (like the static method I linked to) and fill the queue in the way I described then there is pressure, if you are interacting with the executor externally it is tricky but doable, you may have to cps your code though
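One way to get that pressure when submitting from outside, sketched with plain java.util.concurrent: a ThreadPoolExecutor over a bounded queue with CallerRunsPolicy, so a full queue makes the submitting thread run the task itself and slow down:

```clojure
(import '(java.util.concurrent ThreadPoolExecutor TimeUnit ArrayBlockingQueue
                               ThreadPoolExecutor$CallerRunsPolicy))

(defn bounded-pool
  "Fixed pool of n threads whose submission backs off (by running the
  task in the caller) once queue-size pending tasks have accumulated."
  [n queue-size]
  (ThreadPoolExecutor. n n 0 TimeUnit/MILLISECONDS
                       (ArrayBlockingQueue. queue-size)
                       (ThreadPoolExecutor$CallerRunsPolicy.)))

;; usage sketch: producers can't run away from the workers
(def done (atom 0))
(def pool (bounded-pool 4 16))
(dotimes [_ 1000]
  (.execute pool (fn [] (swap! done inc))))
(.shutdown pool)
(.awaitTermination pool 10 TimeUnit/SECONDS)
@done  ;; => 1000
```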
@U0NCTKEV8 To approach it from a different side - why would you not want to use core.async? Any other reasons besides "it's easy enough to do with a fixed thread pool and a queue"?
if you are connecting core.async code to an executor used for io, it usually suffices to ignore any futures the executor creates, and instead queue up tasks that deliver their results to a channel, and have the core.async code park on the channel
the 3 things really synergize, but they may not match what you are doing, and none of them is a fixed pool executor
channels are great, and when you need them there is no substitute, but a lot of the time you can get by with some kind of queue from java.util.concurrent
I'm surprised no one mentioned CompletionService: https://docs.oracle.com/javase/8/docs/api/java/util/concurrent/CompletionService.html
(import 'java.util.concurrent.ExecutorCompletionService)
(import 'java.util.concurrent.Executors)
(defn do-concurrently
  "Executes each task in tasks with concurrency c, assuming side effects,
  and runs handler on their results as they complete. Handler is called
  synchronously from the calling thread."
  [tasks c handler]
  (let [executor (Executors/newFixedThreadPool c)
        cs (ExecutorCompletionService. executor)
        initial (take c tasks)
        remaining (drop c tasks)]
    ;; Submit initial batch of tasks to run concurrently.
    (doseq [task initial]
      (.submit cs task))
    (doseq [task remaining]
      ;; Block until any task completes.
      (let [result (-> cs .take .get)]
        ;; While tasks remain, submit another one to
        ;; replace the one that just completed.
        (.submit cs task)
        ;; Handle the result of the task that just completed.
        (handler result)))
    ;; We submitted an initial batch of c tasks but only handled one result
    ;; per remaining task, so c results are still un-handled; handle them now.
    (doseq [_ initial]
      (handler (-> cs .take .get)))
    ;; Shut down the executor once all tasks have been processed.
    (.shutdown executor)))
(defn io
  "Simulates an IO operation by sleeping the calling thread
  for the given amount-of-time. Returns the amount-of-time."
  [amount-of-time]
  (Thread/sleep amount-of-time)
  amount-of-time)
;;; Run io 10000 times at 10 ms per io call with up to 100 concurrent calls
;;; and sum up all results.
;;; Then print the time it took and the resulting sum.
(let [sum (atom 0)]
  (time
    (do-concurrently (repeat 10000 (partial io 10)) 100 #(swap! sum + %)))
  (println @sum))
The trick is that you first submit c tasks to be executed concurrently. In this case, I've chosen to make 100 concurrent calls at a time. The call to submit is non-blocking and will return immediately. After you've initiated your first batch, you block on cs, which will wait till any of them complete; when one does, it will unblock and return the result of the task that just completed. When that happens, we submit another task, so that we maintain our concurrency level, and we call our handler with the result. In effect, we're saying: perform n calls, up to c at a time. We handle the results on the thread which submits the remaining tasks as they complete. This means that if our handler is very slow, it will delay our re-queuing of remaining tasks, so that's something to keep in mind. Finally, we have to handle the remaining batch of un-handled tasks, and shut down the executor to release the resources associated with it.
core.async pipelines are vaguely like an executor, but not really (pipelines have more ordering which will limit concurrency)
Perhaps a naive question. Why would this be bad? Assuming we want exactly (+ 2 (.. Runtime getRuntime availableProcessors))
concurrently running tasks, as pmap
gives us.
(->> tasks
     (pmap do-stuff)
     vec)
tasks itself could be a chunked lazy seq that won't realize too much data ahead.
I like pmap 😛, it actually does something similar in trying to stay ahead, but you can't control the number of threads, and it retains the head of whatever you are doing.
pmap's limiting to 2+ is of course broken (because of chunking), and the way it combines laziness and concurrency is bad, and the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called
Well, the chunking is actually a blessing in disguise, because you can now control pmap's concurrency based on your chunk size 😛
> limiting to 2+ is of course broken (because of chunking)
Doesn't the extra lazy-seq there basically disable chunking? Since it advances one at a time.
Duh, there's map.
I was just looking at some migration code, where the migration is written as a reduce (a fold) over each users data, and then the reduce operation is customized to run each reduce step on an executor, and enqueue the next step to run when it is done
user=> (seq [1 2 3 4])
(1 2 3 4)
user=> (class (seq [1 2 3 4]))
clojure.lang.PersistentVector$ChunkedSeq
user=> (lazy-seq (seq [1 2 3 4]))
(1 2 3 4)
user=> (class (lazy-seq (seq [1 2 3 4])))
clojure.lang.LazySeq
user=> (class (seq (lazy-seq (seq [1 2 3 4]))))
clojure.lang.PersistentVector$ChunkedSeq
user=>
You can see it going in batch of 32 here:
(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (range 100))
The customized reduce is just
(defn exec-reduce [exec fun init coll]
  (if (seq coll)
    (exec (fn []
            (if-not (reduced? init)
              (exec-reduce exec fun (fun init (first coll)) (rest coll))
              (fun (unreduced init)))))
    (fun (unreduced init))))
so exec is expected to be a function that queues another function on the executor
@U0NCTKEV8 Is your example really correct, given that you explicitly wrap a chunked seq?
(defn step [x]
  (lazy-seq
    (if (pos? x)
      (do
        (println x)
        (cons x (step (dec x))))
      [x])))
(first (step 100))
The above will print 100 only once. That's what I meant by "removes chunking", and that's similar to what pmap is using. However, it uses map in between, which is chunked, and that's where that n gets ignored.
Oh, true, my wording sucks, apologies. Replace "removes chunking" with "allows creating lazy seqs without chunking".
you can create lazy-seqs without chunking, but some people have things like vectors and like to map over them
For example:
(defn re-chunk [n xs]
  (lazy-seq
    (when-let [s (seq (take n xs))]
      (let [cb (chunk-buffer n)]
        (doseq [x s] (chunk-append cb x))
        (chunk-cons (chunk cb) (re-chunk n (drop n xs)))))))
(pmap #(do (.println System/out (str "map: " %)) (Thread/sleep 500) %) (re-chunk 50 (range 100)))
Now it is doing 50 at a time.
Going back to your initial reply on pmap, and ignoring the chunking for now.
> the way it encapsulates the execution means even its broken limits don't apply if you have multiple pmaps being called
I can see that, although creating thread pools willy-nilly I guess would be roughly the same. So treating a single usage of pmap as if it were the creation of a new thread pool should deal with that concern.
> the way it combines laziness and concurrency is bad
This is probably the most interesting. Could you say a couple more words on it?
just a general statement I guess, the laziness limits your ability to control execution, the use of concurrency implies you care about execution
I think for a script, you can probably trust something like:
(dorun (pmap #(handle (some-io %)) coll))
Keeping in mind that your handler will run in parallel as well.
It'll go 32 at a time by default, and you can re-chunk if you want it to go faster. Though you can't slow it down much more than that, since it'll be num of threads + 2 at a minimum, chunk-size otherwise
But honestly, the CompletionService in my opinion is the best way to go if you want to run a bunch of tasks in parallel batches and go as fast as you can, handling each result where order doesn't matter.
Thanks everyone. I ended up using the CompletionService and am very happy!
I want to call the second method listed here:
(ins)org.noisesmith.gamey=> (-> (reflect/reflect GLFW) :members (->> (filter (fn [x] (= (:name x) 'glfwCreateWindow)))) pprint)
({:name glfwCreateWindow,
:return-type long,
:declaring-class org.lwjgl.glfw.GLFW,
:parameter-types [int int java.nio.ByteBuffer long long],
:exception-types [],
:flags #{:public :static}}
{:name glfwCreateWindow,
:return-type long,
:declaring-class org.lwjgl.glfw.GLFW,
:parameter-types [int int java.lang.CharSequence long long],
:exception-types [],
:flags #{:public :static}})
nil
when I call it as follows:
(GLFW/glfwCreateWindow 300 300 "Hello, World!" nil nil)
I get
Execution error (IllegalArgumentException) at org.noisesmith.gamey/init (gamey.clj:18).
No matching method glfwCreateWindow found taking 5 args
the javadoc for this method tells me I should be providing nil for my last two arguments https://javadoc.lwjgl.org/org/lwjgl/glfw/GLFW.html#glfwCreateWindow(int,int,java.lang.CharSequence,long,long)
what's the trick to getting clojure to find the right method here? hint the nil as a Long or something?
those are most likely pointers and they probably mean a null pointer (eg. 0) rather than java null
I've never seen 0 called NULL in javadoc
I'll try it though
or they copy and pasted from the glfw docs
My bet that it's the case. The original function definition:
GLFWwindow* glfwCreateWindow ( int width,
int height,
const char * title,
GLFWmonitor * monitor,
GLFWwindow * share
)
https://www.glfw.org/docs/latest/group__window.html#ga5c336fddf2cbb5b92f65f10fb6043344
yeah, it's just shitty docs and it wanted 0, thanks
> [in] monitor The monitor to use for full screen mode, or NULL for windowed mode. > [in] share The window whose context to share resources with, or NULL to not share resources.
I guess I'll keep that in mind when I see random "long" args that are actually pointers
I think I got my signals crossed because under X11 you really do look up screens and windows with numeric ids that aren't pointers
(IIRC)
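For the record, a sketch of the call that works: just pass 0 for the pointer arguments. LWJGL 3 also exposes MemoryUtil/NULL (which is 0L) if you want the intent to be explicit; this assumes LWJGL is on the classpath and GLFW has been initialized:

```clojure
(import 'org.lwjgl.glfw.GLFW
        'org.lwjgl.system.MemoryUtil)

;; NULL monitor => windowed mode; NULL share => no shared context
(GLFW/glfwCreateWindow 300 300 "Hello, World!" MemoryUtil/NULL MemoryUtil/NULL)
```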
It’s been a while since I’ve had to interop with a vararg Java method. How would I call a static method with this type signature? *of*(double[][] data, java.lang.String... names)
(O/of data (into-array String [...]))
Brilliant:
(DataFrame/of (into-array (map int-array
                               [[0 0 0 0]
                                [0 0 0 0]
                                [0 0 0 0]]))
              (into-array java.lang.String
                          ["w"
                           "x"
                           "y"
                           "z"]))