#babashka
2022-03-16
borkdude16:03:46

We now have an "official" badge that you can use in your libraries or projects to indicate that it runs with babashka: https://github.com/babashka/babashka#projects Thanks to @rahul080327

ghadi18:03:36

observation without analysis: `bb` startup is nearly instant when run in succession, but when I open a new iTerm, it takes an extra 500ms on the first run

ghadi18:03:28

aaaand I withdraw that anecdote ^ it's due to my lazy-loaded powerlevel10k zsh profile

borkdude19:03:11

Another situation in which the first invocation of bb might be slower is when it fetches deps declared in bb.edn
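
For context, deps are declared under the :deps key of bb.edn. A minimal illustrative fragment (the medley dependency is only an example and was not mentioned in this thread):

```clojure
;; bb.edn (illustrative; the dependency below is just an example)
{:deps {medley/medley {:mvn/version "1.3.0"}}}
```

The first bb invocation in a project with such a file has to resolve and download the deps, so it takes longer; later runs should reuse what was already fetched.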

borkdude20:03:44

A demo of fs/walk-file-tree that walks the tree to find all .git dirs: https://gist.github.com/borkdude/123ad8a889e3261178054ee3867a8a3c

ghadi20:03:05

the Files/walkFileTree api is so bad

ghadi20:03:37

can't deal with it without mutation (like most visitor pattern APIs)

ghadi20:03:23

there's a Files/walk API that returns Stream<Path>

ghadi20:03:36

(but cannot short-circuit)

borkdude20:03:01

I came up with that example because someone complained about the performance of fs/glob compared to find. It turned out that with find you can filter on file type and skip the glob matching entirely, which is why it's faster. If you do the same manually using fs/walk-file-tree, the bb example is several times faster.
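
A minimal sketch of that manual approach, assuming babashka.fs is available (it is built into bb). This is not the code from the gist; find-git-dirs is a hypothetical name. It only looks at directories, and it skips the subtree once a .git dir is found, so nothing inside .git gets visited:

```clojure
(require '[babashka.fs :as fs])

(defn find-git-dirs
  "Hypothetical helper: returns the paths (as strings) of all .git
  directories under root, without descending into them."
  [root]
  (let [found (atom [])]            ; the visitor API forces us to collect via mutation
    (fs/walk-file-tree
     root
     {:pre-visit-dir
      (fn [dir _attrs]
        (if (= ".git" (fs/file-name dir))
          (do (swap! found conj (str dir))
              :skip-subtree)        ; prune: do not walk the contents of .git
          :continue))})
    @found))
```

Usage would look like `(find-git-dirs ".")`. Returning :terminate from a visitor function instead stops the whole walk early.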

ghadi20:03:40

how does fs/glob work?

borkdude20:03:03

it's implemented on top of fs/walk-file-tree which is in turn implemented on top of the java nio stuff

borkdude20:03:17

it uses the regular java nio glob stuff
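
A simplified sketch of that layering, not the actual babashka.fs source: a java.nio PathMatcher built from a "glob:" pattern, applied to every path seen by fs/walk-file-tree. simple-glob is a hypothetical name, and matching against the path relative to the root is an assumption made for this example; the real fs/glob also handles options such as :hidden.

```clojure
(require '[babashka.fs :as fs])
(import '[java.nio.file FileSystems Path PathMatcher])

(defn simple-glob
  "Hypothetical, stripped-down glob: returns every path under root whose
  path relative to root matches the glob pattern."
  [root pattern]
  (let [^Path root (fs/path root)
        ^PathMatcher matcher (.getPathMatcher (FileSystems/getDefault)
                                              (str "glob:" pattern))
        results (atom [])
        match!  (fn [^Path p]
                  (when (.matches matcher (.relativize root p))
                    (swap! results conj p)))]
    (fs/walk-file-tree root
                       {:pre-visit-dir (fn [dir _attrs] (match! dir) :continue)
                        :visit-file    (fn [f _attrs] (match! f) :continue)})
    @results))
```

For example, `(simple-glob "." "**/*.txt")` would collect nested .txt files. Every visited path is run through the matcher, which is the per-file cost the hand-written walk avoids.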

borkdude20:03:00

I guess we could make fs/glob take an ordinary predicate instead of a glob too, or make a variation

borkdude20:03:21

but so far I haven't really needed it

borkdude20:03:10

@ghadi Maybe iteration could deal with file tree walking? ;)

ghadi20:03:30

or file-seq

borkdude20:03:51

yeah, file-seq always works... but not much you can control there

ghadi20:03:58

(everything we do, Rich did in 2008 🙂 )

ghadi20:03:39

I'm just not a fan of the APIs that require mutation in userspace code (the one that takes a FileVisitor: https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/nio/file/FileVisitor.html )

borkdude20:03:40

This has already been running for over a minute here:

$ time bb -e '(count (filter #(= ".git" (fs/file-name %)) (file-seq (io/file "."))))'

ghadi20:03:02

Files/walk might be better (although streams are not directly reducible)

borkdude20:03:09

Yes, that example was a performance optimization over using the non-mutative fs/glob :)

borkdude20:03:46

This is the "recommended" approach if you don't care about performance vs. find:

$ bb -e '(time (count (filter fs/directory? (fs/glob "." "**/.git" {:hidden true}))))'

borkdude21:03:22

In babashka.fs I didn't try to make things fancy, it's just a thin layer over java nio, generally

borkdude21:03:55

Doesn't Files/walk basically do what file-seq does? What's the advantage of using the former?

ghadi21:03:38

presumably Files/walk is more efficient

ghadi21:03:00

file-seq delegates to tree-seq and checks if every file is a directory
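
For reference, file-seq in clojure.core is roughly the following. my-file-seq is a hypothetical name used here to avoid shadowing the core function:

```clojure
(require '[clojure.java.io :as io])

(defn my-file-seq
  "Rough equivalent of clojure.core/file-seq: a lazy depth-first seq of
  dir and everything below it."
  [^java.io.File dir]
  (tree-seq
   (fn [^java.io.File f] (.isDirectory f))       ; branch? asked for every single file
   (fn [^java.io.File d] (seq (.listFiles d)))   ; children: one directory listing each
   dir))
```

The isDirectory check runs once per node, which is the per-file overhead mentioned above.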

borkdude21:03:56

This works:

bb -e '(-> (java.nio.file.Files/walk (.toPath (io/file ".")) (into-array java.nio.file.FileVisitOption [])) (.iterator) (iterator-seq))'
But then you still need to filter out the things you want.

ghadi21:03:15

clojure and babashka are pretty good at filtering

ghadi21:03:58

(into [] (filter git-dir?) (walk ...))
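
Spelled out as a sketch: git-dir? and git-dirs are hypothetical helpers, not from any library. The Stream is not reducible by itself, so it is bridged through iterator-seq, and with-open closes the stream afterwards. Written for JVM Clojure; it is assumed, not verified here, that the same interop works in bb.

```clojure
(require '[clojure.java.io :as io])
(import '[java.nio.file Files FileVisitOption LinkOption Path])

(defn git-dir?
  "Hypothetical predicate: true when p is a directory named .git."
  [^Path p]
  (and (= ".git" (some-> (.getFileName p) str))
       (Files/isDirectory p (into-array LinkOption []))))

(defn git-dirs
  "Walks root with Files/walk and eagerly collects the .git dirs.
  The whole tree is still visited; there is no pruning or early exit."
  [root]
  (with-open [stream (Files/walk (.toPath (io/file root))
                                 (into-array FileVisitOption []))]
    ;; into is eager, so the stream is fully consumed before it is closed
    (into [] (filter git-dir?) (iterator-seq (.iterator stream)))))
```

Usage would be `(git-dirs ".")`.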

borkdude21:03:41

Yeah, but it's much much slower. Terminating early makes the example 4x faster, which was the original argument with someone on Twitter vs find ;)

$ time bb -e '(-> (java.nio.file.Files/walk (.toPath (io/file ".")) (into-array java.nio.file.FileVisitOption [])) (.iterator) (iterator-seq) (count))'
192654
bb -e    0.52s  user 1.74s system 48% cpu 4.699 total
But yeah, Files/walk is much faster than file-seq which still doesn't terminate after a minute on my home dir :)

borkdude21:03:07

$ time bb -e '(count (fs/glob "." "**" {:hidden true}))'
192653
bb -e '(count (fs/glob "." "**" {:hidden true}))'   1.24s  user 1.77s system 99% cpu 3.032 total

borkdude21:03:56

(the difference of 1 is because glob doesn't include the root dir)