This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-05-09
Channels
- # babashka (176)
- # beginners (26)
- # cider (1)
- # cljsrn (3)
- # clojure (60)
- # clojure-australia (3)
- # clojure-europe (8)
- # clojure-uk (3)
- # clojurescript (1)
- # code-reviews (4)
- # community-development (3)
- # cursive (6)
- # depstar (1)
- # emacs (11)
- # figwheel (3)
- # helix (13)
- # integrant (2)
- # jackdaw (1)
- # kaocha (1)
- # leiningen (3)
- # luminus (2)
- # malli (4)
- # music (1)
- # nrepl (2)
- # off-topic (1)
- # practicalli (3)
- # reitit (3)
- # shadow-cljs (16)
- # slack-help (14)
- # sql (13)
- # vim (12)
Am I doing something wrong here or is there a bug?
bb repl
Babashka v0.4.0 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.
user=> (require '[babashka.process :refer [process sh]])
nil
user=> (sh ["youtube-dl" "--dump-json" "--ignore-errors" "--skip-download" ""])
Segmentation fault (core dumped)
Even more minimal example:
bb repl
Babashka v0.4.0 REPL.
Use :repl/quit or :repl/exit to quit the REPL.
Clojure rocks, Bash reaches.
user=> (require '[babashka.process :refer [process sh]])
nil
user=> (sh ["youtube-dl"])
Segmentation fault (core dumped)
@ben.sless are you using the static Linux binary perhaps?
I've linked you to the thread where we are discussing the issues
@ben.sless what OS and kernel versions are you using? I will try to dig a bit deeper
Maybe we should enable musl then? https://github.com/oracle/graal/issues/571#issuecomment-618508346
ah
By default, --static will statically link your native-image against your host libc (glibc)
https://github.com/oracle/graal/issues/2691#issuecomment-670010352
This could explain the segfaults
i would advocate for using musl too
will try this
I think even when linking statically, glibc still tries to access some system files at runtime
@ben.sless Any chance you could test this (musl-linked) static binary for linux with the same program that didn't work for you this morning? https://github.com/babashka/babashka/files/6448129/babashka-0.4.1-SNAPSHOT-linux-amd64-static-musl.tar.gz
This is with a more recent musl: https://github.com/babashka/babashka/files/6448198/babashka-0.4.1-SNAPSHOT-linux-amd64-static-musl.tar.gz
@rahul080327 Seems like musl is really the way to go 🙂
Would it make sense that if you run the babashka static binary, then you also download static pods?
or would it be best to control this via an explicit env var:
BABASHKA_PODS_STATIC=true
If the pods are guaranteed to be on the same machine, which I think they are, I would say it makes sense to make it the default
Most users may not think of that env var, I think?
in this case we must capture the fact that babashka has been compiled statically (not sure if graalvm already provides a system property for this that you can read at compile time)
I'm not sure of it
This is easily done using (def static? (System/getenv "BABASHKA_STATIC")) in the relevant pods library namespace, and using this to download the right pod
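A minimal sketch of that idea (the var and function names here are illustrative, not the actual babashka pods implementation):

```clojure
;; Hypothetical sketch: choose a pod artifact based on an env var.
;; `pod-artifact-name` and the file-name pattern are assumptions for
;; illustration only.
(def static?
  ;; set when running the statically linked bb
  (= "true" (System/getenv "BABASHKA_STATIC")))

(defn pod-artifact-name
  "Pick between a statically and dynamically linked pod archive."
  [pod-name version]
  (if static?
    (str pod-name "-" version "-linux-amd64-static.tar.gz")
    (str pod-name "-" version "-linux-amd64.tar.gz")))
```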
Meanwhile, @thiagokokada are you going to do a similar PR to babashka for the static one?
Another question: we can "prefer" a static pod if bb is itself static, but what if there is no static linux pod available, but there is a dynamic one? Should we select that one, or just crash with "no such pod"?
@thiagokokada My impression was that you were going to update bb static itself with musl too?
I thought we would use the @rahul080327 approach instead of mine, so I was waiting for his PR
But I can do it too if @rahul080327 is not interested in porting the approach from pods-aws-babashka
About the question of whether we should try to download dynamic pods in case the static one is unavailable: no, I don't think so
maybe automatically selecting a static or dynamic one is also too much magic - that is why I proposed an env var to control this
That can work, but then you'd have to change the code if you deploy your code from a mac to the cloud for example
It also has the advantage of making something like "static with dynamic fallback" possible
Maybe let's test this out a bit for a few months and if the static one works out ok, we can just switch over all linux pods to it
> It would be much simpler if we could just always use the static pod on linux
Maybe we could go even further and eventually simply remove the dynamically compiled version of Babashka. If someone wants to use Babashka with dynamic linking, they can always depend on their distro's packages.
How would a dynamically linked bb magically appear in "their distros packages" if we don't provide it?
For example, NixOS's Babashka package is dynamically linked. I was imagining the distros themselves packaging Babashka.
> I am all for it, the only reason I would not do it is if it wouldn't work for someone
Yeah, I think we can work on this static version now and ask for more people to test it, and if everything looks good we could eventually remove the non-static version
I can secretly use the static aws pod in the manifest and see if someone starts yelling?
> I can secretly use the static aws pod in the manifest and see if someone starts yelling?
Well, LGTM. I don't know if someone has some objection against this
let's do it, when it breaks certain environments, we will switch back and find another solution
Thanks @U04V15CAJ. Just did another PR to make sure that we are using the correct ZLIB sources (avoiding possible security issues): https://github.com/babashka/babashka/pull/830
(I mean, after downloading a pod we should verify it against a sha in the manifest)
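A sketch of what that verification could look like (the function names and manifest shape here are illustrative, not the actual babashka pods code):

```clojure
;; Hypothetical sketch: verify a downloaded pod archive against a sha256
;; recorded in the pod manifest. Uses only JDK classes available in bb.
(require '[clojure.java.io :as io])
(import '[java.security MessageDigest])

(defn sha256-hex
  "Hex-encoded SHA-256 of a file's contents."
  [file]
  (let [digest (MessageDigest/getInstance "SHA-256")
        buf    (byte-array 8192)]
    (with-open [in (io/input-stream file)]
      (loop []
        (let [n (.read in buf)]
          (when (pos? n)
            (.update digest buf 0 n)
            (recur)))))
    (apply str (map #(format "%02x" %) (.digest digest)))))

(defn verify-pod!
  "Throw if the downloaded pod does not match the manifest's sha."
  [pod-file expected-sha]
  (when-not (= expected-sha (sha256-hex pod-file))
    (throw (ex-info "Pod checksum mismatch" {:file pod-file}))))
```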
> Hmm, I just enabled the musl static build on master
Strange, it seems to work fine on my machine
> which graalvm version did you use to compile locally?
I used the version included in the Dockerfile
In that previous version I built musl from scratch, while this one uses the musl-tools prebuilt from Debian
Any chance you were using 21.1 the first time around? I just merged into my 21.1 branch and there it seems to work fine...
I see. Note that you can bump up the BABASHKA_XMX value if you want to speed up your local compile a bit, btw
No, there is a graal21.1 branch which is the same as master but with 21.1 instead of 21.0
Nope, I also got an exception. So maybe the difference is the musl version :thinking_face:?
Well, downgrading to whatever musl version comes with Debian fixed the clj-http.lite.client-test tests but broke the clojure.data.generators-test
:man-shrugging:
So, a summary:
- musl 1.2.2 + GraalVM 21.0: broke clj-http.lite.client-test
- musl 1.1.21 (I think; this is the musl from Debian Buster, but I am not sure this is the version we use) + GraalVM 21.0: broke clojure.data.generators-test
- musl 1.2.2 + GraalVM 21.1: works fine with all the libs that we have tests for
So if I run the tests once it works, again it segfaults, again it works, again it segfaults...
@U04V15CAJ All I can say for now is that the tests are really really flaky with those static binaries. We could add a retry and run it a few times, but it doesn't seem to be a good sign considering that this behavior doesn't happen with the old glibc binaries CC @rahul080327
I do think we should report this to GraalVM itself since it seems to be an issue with them, but I don't know how we could report this (we probably need some way to reproduce this with a smaller example that can trigger this behavior)
My suggestion is maybe releasing this static version with musl separately from the other static binary, at least until we can figure out why these tests are flaky. But I will leave the decision to @U04V15CAJ
Just to note it here: I also tried to build with Debian Buster (we seem to use Debian Stretch actually, since the GCC used by the image is 6.3 instead of 8.3), since Native Image in the end builds everything with GCC, and 6.3 is a quite old GCC version. I ended up using openjdk-17-lein-2.9.6-buster, which also uses updated versions of lein and OpenJDK, since I didn't find a way to bump only the Debian version. But no dice, same behavior.
> My suggestion is maybe releasing this static version with musl separated from the other static binary, at least until we can figure out why these tests are flaky.
Actually, after more testing, maybe it is fine. It is just some libs that fail sometimes (but when it works, all tests pass), and this musl-based binary should work on any Linux system, so this is still a win in my book. The current static binary based on glibc seems more useless in comparison: when it fails, it fails hard, and most of the time people are better off using the non-static version anyway. I also don't think the static glibc version works in Alpine, since Alpine is a musl-based system (I may be wrong here). My opinion is to release it as is, just noting somewhere in the release notes that this release is more unstable than the non-static build, so the non-static build is still preferred. My 2c.
(And we keep track of new GraalVM releases, I think it is a bug there that will eventually be fixed. It does seem that with GraalVM 21.1 this bug is more difficult to trigger, but this is completely a non-scientific observation.)
Hmm. The static image was compiled to work with Alpine, that was the only reason it was there. If the musl build has issues that the other static image doesn’t have, I’m inclined to roll back changes until we sort it out. Cc @rahul080327
I think I found the issue. The static binary stackoverflows much sooner than the non-static one. E.g. try:
./bb -e "(defn foo [x] (prn x) (foo (inc x))) (foo 0)"
The non-static goes to about 20k, while the static one only gets to 186 or so.
> musl allocates memory of very small size for thread stacks. musl documentation says it's only 80KB (now increased to 128KB) by default, which is rather small compared to 2-10MB of glibc. And that's been said to be the major reason for stack-consuming executables to crash (segmentation fault) in a musl libc environment.
Very interesting find! My opinion here would be that we make it explicit that we are using musl and point out the caveats. If this is the reason for the flakiness, I think we can go ahead with musl with the warning stickers. The ability to run in most places should outweigh the stack size issues. musl has quite widespread production usage and I think this would be known to people
We could try patching the binary, but again I get shivers when we change the defaults of a well-used lib, especially when it comes to memory regions
Things like Rust, Zig and others rely solely on musl for static links; I would say that's a good reason to stick with it
@rahul080327 Patching the binary with muslstack didn't work btw. Can't we set a higher stack limit during compilation?
This seems to be a similar bug: https://bugs.webkit.org/show_bug.cgi?id=187485
I am trying now with: "-H:CCompilerOption=-Wl,-z,stack-size=2097152"
So far it seems to work... more testing now
@thiagokokada Perhaps you can also try this option locally. I'm running the build a few times over to see if works repeatedly. https://app.circleci.com/pipelines/github/babashka/babashka?branch=stack-size
Doesn't seem to work. The recursion depth is also still more limited with this binary, so maybe it was just luck that it worked a couple of times.
> What kind of issue did you see with data generators, also a stackoverflow?
Yeah, but I only saw this issue once. The issue with clj-http-lite seems to occur more frequently (and it does not seem related to the stack, since it occurs randomly; or maybe it is, I will try it later with the recent stack changes)
> Very interesting find! My opinion here would be that we could make it explicit that we are using musl and point out the caveats. [...]
I concur with @rahul080327 here.
I've gone deep into the weeds with a GraalVM dev already. It's related to what musl does with the stack size. We should resolve this before publishing the new static bb; I find this unacceptable.
You can find some details here: https://github.com/oracle/graal/issues/3398
I am building Babashka with "-H:CCompilerOption=-Wl,-z,stack-size=2097152" just to run the tests and see what we get
@thiagokokada I already tried this and it doesn't have any effect
@thiagokokada @rahul080327 This seems to be a workaround that works: https://github.com/babashka/babashka/commit/3bc0f3ffad557bd5e89a1fb1337ce5fc1d8716b5
The workaround is to start bb in a thread with a better default for the stack size. The rest of the threads should get their stack size from the stack-size argument. It does incur some extra startup time, so I will make that workaround conditional based on some env var
We don't even have to set the stack size in Java, as long as we start everything in a thread and pass a better default for the stack size, things should work again
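A minimal sketch of the workaround idea (not the actual babashka code; `run-bb` is a placeholder for the real entry point, and the 16 MiB default is an assumption). It relies on the JDK's `Thread(ThreadGroup, Runnable, String, long stackSize)` constructor, which lets you request a stack size for a new thread:

```clojure
;; Sketch: run the real main logic on a thread created with an explicit
;; stack size, so the tiny default musl stack of the initial thread no
;; longer limits recursion depth.
(defn run-bb
  "Placeholder for babashka's real entry point."
  [args]
  (println "running bb with" args))

(defn -main [& args]
  (let [stack-size (* 16 1024 1024)               ; assumed default: 16 MiB
        t (Thread. nil #(run-bb args) "main" stack-size)]
    (.start t)
    (.join t)))
```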
> It does incur some extra startup time so I will make that workaround conditional based on some env var
So we will have something like BABASHKA_THREAD_STACK_SIZE that can be set by the user if they want it? Is that the idea?
LGTM by the way
What is the recommended way to have a bb task that consists of a Unix pipeline? I tried {:tasks {clean (-> (shell "find . -type f -name '*~' -print") (shell "xargs rm -f"))}} but bb clean gives me a java.util.concurrent.ExecutionException.
I think shell could support this though (made an issue: https://github.com/babashka/babashka/issues/829)
What is also possible:
(let [out (:out (shell {:out :string} "find . -type f -name '*~' -print"))]
(shell {:in out} "xargs rm -f"))
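Another sketch, using babashka.process directly: threading one process into the next pipes the stdout of the first into the stdin of the second (per the babashka.process README; worth verifying against the version you use):

```clojure
;; Pipe `find` into `xargs rm -f` without buffering the whole output
;; into a string first. `check` waits for the process and throws on a
;; non-zero exit code.
(require '[babashka.process :refer [process check]])

(-> (process ["find" "." "-type" "f" "-name" "*~" "-print"])
    (process ["xargs" "rm" "-f"])
    check)
```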