#babashka
2021-06-16
dabrazhe06:06:58

What is the break-even point for bb, where it becomes too slow and I have to switch to plain Clojure?

borkdude07:06:04

@dennisa That's hard to say in general, but as a rule of thumb I would say, scripts that take longer than 5 seconds are probably worth running on the JVM. Are you hitting any limits?

jeroenvandijk07:06:11

@borkdude Btw, I was thinking about this. I think you discovered that a GraalVM process can be replaced with another process, right? Could this mean that you could start a babashka process together with a clj process and have the clj process take over after X time? This assumes, of course, that there are no port collisions or other side effects.

borkdude07:06:06

"after X time", this would just be an explicit call. you could e.g. do the CLI parsing in bb and then hand over control to the clj process

jeroenvandijk08:06:44

Can imagine it is even nice for user feedback. E.g. prepend a spinner to a slow starting jvm process https://github.com/clj-commons/spinner

grazfather12:06:50

But in this case you still have to pay the startup cost of the jvm, it’s not like it’s starting up ‘in the background’, since it’s an exec call.

kokada13:06:20

Nice that babashka has exec on its roadmap

kokada13:06:27

I can probably find some uses for it

borkdude13:06:46

@UFDRD93RR it's honestly not so hard to add, I'm just more worried that people use it in a way to shoot themselves in the foot

borkdude13:06:21

e.g. when using this with tasks, the tasks aren't supervised anymore: when one dependency uses exec, the entire tree of tasks will suddenly become that process

kokada13:06:28

It is still useful. Python has this in its stdlib, and when you need exec it is the only option

borkdude14:06:22

so what's something you would use this for, as opposed to just creating a child process and waiting for it to finish?

kokada14:06:54

I needed to call a second program once where I didn't want to pay the memory consumption of my own program, so exec was the answer (also, I didn't need the result of the program, just calling it)

kokada14:06:58

So I used exec

borkdude14:06:41

but the memory consumption of bb is very little

kokada14:06:21

Still, I didn't need it

kokada14:06:43

Like I said, I didn't need the result of the program

kokada14:06:22

Also, I needed to return the real exit code of the exec'd program

kokada14:06:42

And I can do this with subprocess, but exec does this without needing special handling
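
The difference between the two routes can be sketched in Python, the language kokada is referring to. This is a minimal illustration under stated assumptions, not babashka code: `run_and_propagate` and `replace_process` are hypothetical helper names, and `os.execvp` is assumed to run on a POSIX system.

```python
import os
import subprocess
import sys


def run_and_propagate(cmd):
    """Child-process route: start cmd, wait for it, and return its exit code.

    The caller stays alive the whole time and must forward the code
    itself, e.g. with sys.exit(run_and_propagate(cmd)).
    """
    completed = subprocess.run(cmd)
    return completed.returncode


def replace_process(cmd):
    """exec route: the current process image is replaced by cmd.

    On success this call never returns. Whatever exit code cmd produces
    is what the parent of this process observes, with no forwarding step.
    """
    os.execvp(cmd[0], cmd)
```

With the child-process route the wrapper keeps its own memory until the child finishes; with exec the wrapper is gone as soon as the call succeeds, which is also why nothing is left to supervise the new program.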

borkdude14:06:58

correct. do you think this function belongs in babashka.process?

kokada14:06:14

Not sure, in python it is part of os

kokada14:06:49

(I would argue that technically not, because exec is not a subprocess)

kokada14:06:29

exec is a Unix system call, so it is better suited to a place that groups system calls

borkdude14:06:59

babashka.core? ;P

borkdude14:06:08

babashka.system?

borkdude14:06:16

we could do babashka.os

kokada14:06:51

babashka.os seems great

kokada14:06:07

I remembered the discussion about setenv now 🙂

borkdude14:06:08

we didn't add setenv because it would be very confusing since the env is cached in the jvm

kokada14:06:11

BTW, I think exec may compose badly with other parts of Babashka. Like, you can't set an environment 😅

borkdude14:06:29

this is why I'm not eager to add it yet

borkdude14:06:45

there may be reasons the java folks don't support this

kokada14:06:05

exec in Java doesn't really make sense

kokada14:06:15

If exec was possible in the JVM, you would exit the JVM

borkdude14:06:48

so? if exec is possible in bb you would exit bb. same for python. what's the difference?

kokada14:06:06

JVM needs to do cleanup, exec is like a kill -9

kokada14:06:14

This would probably break something

kokada14:06:47

(Not saying that this doesn't in Python, it is just that Python programs generally have a good behavior on kill -9)

kokada14:06:33

But maybe it is just that the Java folks don't want to be too tightly coupled to Unix either

kokada14:06:02

Both setenv and exec are kind of Unix-specific (environments exist in Windows, but their behavior is different)

borkdude14:06:05

if Java needs clean up, how is this different for bb?

kokada14:06:00

I just figured that native programs don't need as much cleanup as a VM as big as Java

kokada14:06:29

But this is just an assumption, maybe my second reasoning about Unix specific calls makes more sense

borkdude14:06:48

I will leave the issue open to collect more info

kokada14:06:30

Anyway, I still find it bizarre that getenv is cached in Java

kokada14:06:57

This seems wrong to me for some reason

kokada14:06:29

It is not like getenv is slow

borkdude14:06:44

maybe getting the entire environment map is slow

kokada14:06:46

Maybe it is slow in some specific *nix?

kokada14:06:52

And this is why it is cached?

kokada14:06:13

> maybe getting the entire environment map is slow
Yeah, this is the part that doesn't make sense to me. AFAIK, getenv in Linux is fast

borkdude14:06:15

don't know, who is the developer from 96 to ask this?

kokada14:06:50

The only thing I can think of is, like, getenv being slow in Solaris or HP-UX or whatever

kokada14:06:18

BTW, I found a bug report about this issue of System.getenv: https://bugs.java.com/bugdatabase/view_bug.do?bug_id=8173654 Interesting that it is marked as fixed (but we either hit another issue or something else, since this issue was resolved in 2017)

borkdude14:06:00

we didn't set the env var via jni

kokada14:06:53

Yep, but the issue is similar in that System.getenv is returning the old value (and the explanation is what the developer from GraalVM said)

kokada15:06:01

Here is a more complete history of the issue: https://bugs.openjdk.java.net/browse/JDK-8173654

kokada15:06:03

Good reading BTW

kokada15:06:36

> But independent of that, the caching of the environment on first use (and its immutability except when creating a subprocess) was a deliberate design decision back in the 5.0 days. So no JDK bug here. OTOH ... I don't think the caching behavior was ever specified, and it might be useful to users to know the rules.

borkdude15:06:07

hmm weird that we came across it in graalvm then, if this is supposed to be fixed

kokada15:06:40

It is still caching though, that is very clear from the issue discussion

borkdude15:06:54

can you summarize it for me? I'm doing other stuff meanwhile :)

kokada15:06:03

So I think they just fixed whatever part of the code broke this specifically for JNI

borkdude15:06:06

(adding higher order function arity linting to clj-kondo)

borkdude15:06:26

oh so perhaps they could also fix it for the graalvm specific interop

borkdude15:06:43

btw, I think changing the dir may have pretty weird side effects on relative classpaths

kokada15:06:00

> can you summarize it for me? I'm doing other stuff meanwhile 🙂
Sure:
- Like you said, this issue was a regression with calling setenv in JNI. It used to work before JDK 8u60 and stopped working after that version
- Martin Buchholz says that there is cache-on-first-access for System.getenv (actually, this seems to come from code in ProcessBuilder that System.getenv reuses)
- The cache is an explicit design decision, however it is not documented
- Also, changing environments using JNI is unsupported and may crash the JVM (I think this is highly unlikely unless you change some environment variable that the JVM itself uses, but well)
- The issue is marked as fixed without explanation, and I can only assume they fixed whatever caused the regression in 8u60, but this wasn't the expected behavior anyway since the cache is an explicit design decision
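
The stale-snapshot effect summarized above has a close analogue in Python, which makes it easy to see outside the JVM. The sketch below is an analogy only, not the JVM mechanism: Python's `os.environ` is a mapping captured at startup, and the documented behavior of `os.putenv` is that it changes the real process environment without updating that mapping. The variable name and helper names are hypothetical, and `shell_child_sees` assumes a POSIX shell.

```python
import os

NAME = "BB_DEMO_CACHED_VAR"  # hypothetical variable name, for illustration only


def snapshot_sees(name):
    """Check os.environ, the mapping Python captured at interpreter startup."""
    return name in os.environ


def shell_child_sees(name, expected):
    """Check the real process environment by asking a shell child (POSIX only).

    os.system runs the command through the C library, so the child
    inherits the C-level environment rather than the os.environ snapshot.
    """
    return os.system(f'test "${name}" = "{expected}"') == 0


# os.putenv changes the real environment but, as documented, does not
# update os.environ: the snapshot is now stale.
os.putenv(NAME, "fresh")
stale = not snapshot_sees(NAME)
inherited = shell_child_sees(NAME, "fresh")
```

Here both `stale` and `inherited` end up true: the in-process view never learns about the change, while a child process does see it. That is the same split discussed above between reading the environment in-process and passing it on to another program.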

kokada15:06:55

> oh so perhaps they could also fix it for the graalvm specific interop
Maybe it would be a good idea to open a similar issue in the GraalVM issue tracker and see what the GraalVM devs think

borkdude15:06:56

ok, so all in all, if you try to implement this, you're pretty much on your own

kokada15:06:10

Pretty much

borkdude15:06:14

@UFDRD93RR what you could hack is an intermediate C-style function that will call setenv and then exec, to get around the setenv problem

borkdude15:06:42

I mean, I think our original setenv would work for exec, for passing through env variables?

borkdude15:06:07

that's an assumption

borkdude15:06:17

but you could offer this in a combined API, like exec + env vars

borkdude15:06:24

and expose it only there
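
A combined "exec + env vars" call of this kind exists in Python as `os.execvpe`, which gives a feel for what such an API could look like. This is a sketch under assumptions, not a proposal for the actual babashka API: `exec_with_env` is a hypothetical helper name and the code assumes a POSIX system.

```python
import os


def exec_with_env(cmd, extra_env):
    """Replace the current process with cmd, adding extra_env to its environment.

    The environment is handed straight to the new program, so no setenv
    call is made in the calling process and no cached view can go stale.
    On success this call never returns.
    """
    env = dict(os.environ)  # start from the inherited environment
    env.update(extra_env)   # overlay the caller's variables
    os.execvpe(cmd[0], cmd, env)
```

Because the variables only ever exist in the new process, this shape sidesteps the question of what an in-process getenv should return afterwards.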

kokada15:06:47

Or maybe having babashka.os.setenv documented with "if you use this function and expect it to take effect in the current process, please use babashka.os.getenv instead of System/getenv"

borkdude15:06:00

but you could also hack a bash script that sets env vars and then does exec, and exec to that bash script from bb

kokada15:06:37

> I mean, I think our original setenv would work for exec, for passing through env variables?
Yeah, it should work. The only issue I see with setenv is with System/getenv, which we could work around with a wrapper around getenv from C

borkdude15:06:50

yeah. there was also the Windows incompatibility with setenv/getenv, the Windows C lib calls this differently

borkdude15:06:00

all in all, it's a bit of a pain to maintain

kokada15:06:19

Maybe we can see how Python implements this :thinking_face: ?

borkdude15:06:35

yeah, please do look it up

kokada15:06:37

But in the end, as far as I understand the code, you either call setenv() or _wputenv()

borkdude15:06:40

so even the public API is different in Python?

kokada15:06:07

No, they offer the same public API

kokada15:06:50

os_putenv_impl is the actual implementation, where on POSIX it compiles to call setenv(), while on Windows it compiles to _wputenv()

kokada15:06:51

Since setenv() takes three arguments, it calls setenv(env_var, value, 1), while on Windows they do _wputenv(env_var + "=" + value), AFAIK
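
The two call shapes can be sketched from Python with `ctypes`. This is an illustration under assumptions, not CPython's or babashka's implementation: `posix_setenv` calls the C library's `setenv` directly and assumes a POSIX system, while `windows_putenv_argument` is a hypothetical helper that only builds the single `NAME=value` string a `_wputenv`-style call would receive, with a deliberately minimal validity check.

```python
import ctypes
import os


def posix_setenv(name, value):
    """Call the C library's three-argument setenv(name, value, overwrite)."""
    libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process, incl. libc
    libc.setenv.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_int]
    libc.setenv.restype = ctypes.c_int
    # The final 1 means "overwrite the variable if it already exists".
    if libc.setenv(os.fsencode(name), os.fsencode(value), 1) != 0:
        raise OSError(ctypes.get_errno(), "setenv failed")


def windows_putenv_argument(name, value):
    """Build the single "NAME=value" string that a _wputenv-style call takes.

    Name and value travel in one string, so the name must be checked
    before concatenating (here: non-empty and free of "=").
    """
    if not name or "=" in name:
        raise ValueError("invalid environment variable name")
    return f"{name}={value}"
```

The POSIX form keeps name and value apart, so the C library can reject a bad name on its own; the concatenated form pushes that checking onto the caller.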

borkdude15:06:54

yep, when I saw that I was like: eeeeeh, I'm having second thoughts

kokada15:06:44

(The call to _wputenv() ends up doing a bunch of validation because of this concat, this is why the code is so big)

kokada15:06:34

TL;DR: Win32 sucks 🤷

kokada15:06:26

I can give it a try if you want @borkdude. I mean, I am probably the only person interested in this right now 😆 Since you already did the hard work of figuring out how to call C code in the setenv branch, I think now it is mostly writing C+Java code

kokada15:06:35

No promises though, but should be a nice weekend project

borkdude15:06:32

OK, I merged the master branch into the set-env branch. No promises that if you make it work, that I will merge the branch, but feel free to try it :)

kokada15:06:51

Yeah, please review the code and draw your own conclusions. I mean, it is a pretty niche case

dabrazhe09:06:29

@borkdude I've got a lot of map/filter code that in itself runs fast enough. But printing becomes super slow after a while, especially in the REPL, where it takes 500 ms for one line. And printing is important for my scripting.

Martín Varela10:06:31

When you say "in the repl", are you working from Emacs by any chance? If you print a lot, it does tend to become laggy... clearing the REPL output helps in this case (usually)

dabrazhe10:06:49

I work with VSC. The repl printing becomes so slow I have to restart the Repl, and VSC at some point. I am afraid it will happen in CLI/prod and become a bottleneck

borkdude10:06:15

@dennisa Is it possible for you to make a minimal repro for this? I may have an idea where a performance problem with println could come from and I might be able to optimize it

borkdude11:06:41

It may just be an issue with your editor, so I'd like to have some kind of editor-independent repro

borkdude11:06:51

Like, how many lines / items are you printing

dabrazhe12:06:44

I need to find a way to separate the business logic from the printing code. Do you have ideas how to do it?

borkdude12:06:30

as a first step, you could try to run your scripts outside of the editor and see if your problem is editor related

Tomas Brejla12:06:09

@dennisa printing in VSC (you probably mean vs code + Calva?) does indeed become slower and slower with more lines in output.calva-repl "file".

grazfather12:06:47

Can you maybe mitigate (in editor) by shrinking some backlog setting?

Tomas Brejla13:06:54

btw @dennisa in case it's really the issue with Calva and its slow appending to output.calva-repl, I've already tried doing some optimizations in Calva in the past. You may check this archive: https://clojurians-log.clojureverse.org/calva/2021-03-30 Basically I just added batching to the append function inside results-doc.ts. It made quite a big difference, especially when you append many lines one by one. Here's a YouTube video https://www.youtube.com/watch?v=GufgU7C4n6s showing the slowness and how it might be optimized. Unfortunately I didn't have time back then to fully finish this effort. If this slowness is what you're experiencing, then we should probably continue the conversation in the #calva channel instead.

dabrazhe21:06:05

It's likely you are right, guys. I had 11K lines in the Calva output, and once I removed them the performance was back up. :) Will give it a try and get back

cldwalker12:06:32

Is there documentation anywhere that compares bash scripting to bb? Would like to point coworkers to this to make it easier for them to try bb. If not, was thinking of starting a wiki page

borkdude13:06:03

@cldwalker The wiki is open I believe.

borkdude13:06:11

There is also a github discussion about this. I'm also willing to incorporate this in the book at some point, but I'd be fine if someone else took initiative on this as well or maintained some page

cldwalker14:06:04

Created https://github.com/babashka/babashka/wiki/Tasks:-Bash-and-Babashka-equivalents as a first pass. Happy to move to the book at some point. Fixes and more contributions welcome 🙂

borkdude14:06:41

@cldwalker Good start! Perhaps explain what shell is since not all people might be familiar with bb.edn's tasks setup. The shell function comes from babashka.tasks which is based on babashka.process/process

Bob B17:06:52

I want to ask if this is a well-known thing before opening an issue/continuing a discussion; I've done a cursory search through the issues and the book... running "one-liners" (passing forms on the command line without -e) on Windows will throw if the form contains "illegal" path characters, e.g. bb "(zero? 1)" will throw because of the '?'

borkdude18:06:01

@U013JFLRFS8 This may just be a shell-specific thing? Which shell is this, powershell or cmd.exe?

borkdude18:06:05

@U013JFLRFS8 Ah yes, I see the issue

borkdude18:06:46

For now just use explicit -e

bb -e "(some? 1)"
true

borkdude18:06:01

but I think it's good to fix

Bob B18:06:50

I'll open an issue so it's sort of 'written down' if that's ok, and then go from there

borkdude18:06:48

yep, I like that approach

nyor.tr18:06:27

Would it be possible to create a babashka pod from a clojure library that depends on Java libraries, for instance javax.xml.stream?

borkdude18:06:51

absolutely, as long as the libraries are compatible with graalvm native-image (if you want to create a pod with fast startup)