#babashka-sci-dev
2022-03-20
cap10morgan16:03:46

Seeing hashing times in the hundreds of milliseconds for some typical pod binaries. Which is a big chunk of the caching speed up. Wondering if it defeats the purpose…

cap10morgan16:03:07

Was looking into perf optimizations too

borkdude16:03:39

perhaps sha-1 then?

cap10morgan16:03:04

Perhaps checksumming might be appropriate here? I would want to keep the :cache param if so because it can (very rarely) have collisions.

cap10morgan16:03:16

Tried sha-1 too and it was about the same

borkdude16:03:24

you mean md5 or so?

cap10morgan16:03:46

CRC-32 or I found another one called Adler-32
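
For reference, a minimal sketch of the two checksums named here next to a cryptographic hash. It uses Python's standard `zlib` and `hashlib` modules as a stand-in, since the actual pod code is Clojure; the helper name `checksums` is hypothetical and not part of babashka.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Return CRC-32, Adler-32 and SHA-1 of the same bytes as hex strings.

    Hypothetical helper for illustration only. The two checksums are 32 bits
    wide, so unrelated inputs can collide; SHA-1 produces 160 bits.
    """
    return {
        # Mask to an unsigned 32-bit value before formatting as 8 hex digits.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "adler32": format(zlib.adler32(data) & 0xFFFFFFFF, "08x"),
        "sha1": hashlib.sha1(data).hexdigest(),
    }
```

Whether either checksum would actually beat the hash timings reported above depends on the implementation and file size, so it would need measuring on real pod binaries.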

borkdude16:03:06

we could do md5 + add the filename to the cache file: /Users/borkdude/pod-foobar => dcccadfc-pod-foobar.cache
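
A rough sketch of the naming scheme suggested here, in Python for illustration. It assumes the prefix is the first 8 hex characters of the MD5 of the pod's contents (the message does not say what is hashed or how the prefix is derived), and the function name `cache_file_name` is made up.

```python
import hashlib
from pathlib import PurePosixPath


def cache_file_name(pod_path: str, content: bytes) -> str:
    """Build '<md5 prefix>-<pod file name>.cache'.

    Assumptions (not confirmed by the discussion): the digest is taken over
    the pod binary's bytes, and only its first 8 hex characters are kept.
    """
    prefix = hashlib.md5(content).hexdigest()[:8]
    # Keep only the last path component, e.g. 'pod-foobar'.
    name = PurePosixPath(pod_path).name
    return f"{prefix}-{name}.cache"
```

Including the file name keeps two different pods apart even if their shortened digests happen to match.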

cap10morgan16:03:33

MD5 didn't seem any faster either

cap10morgan16:03:11

I tried using streams instead of reading the whole file into a byte array but that was slower
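
The streaming variant described here could look roughly like the sketch below, again in Python as a stand-in for the Clojure code. The function name `hash_stream`, the default algorithm, and the 1 MiB chunk size are all assumptions. Chunk size is one knob that affects throughput; whether it explains the slowdown reported above is unknown.

```python
import hashlib
import io


def hash_stream(stream, algorithm: str = "sha512", chunk_size: int = 1 << 20) -> str:
    """Hash a binary stream in fixed-size chunks instead of one big read.

    Hypothetical helper: 'algorithm' is any name accepted by hashlib.new,
    and 'chunk_size' is an arbitrary starting point, not a tuned value.
    """
    digest = hashlib.new(algorithm)
    # read() returns b"" at end of stream, which stops the iteration.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest()
```

The result is identical to hashing the whole byte array at once; only the memory use and the number of read calls differ.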

cap10morgan16:03:21

So might still be doing something wrong

cap10morgan16:03:18

Anyway, going to keep trying for a bit to see if I can get it sped up and if not might just go back to :cache param

borkdude16:03:45

Or we could make a cache relative to the bb.edn file instead of the global one and indeed the :cache param. Let's just do that

cap10morgan16:03:36

Not sure I follow

borkdude16:03:31

Since local pods aren't global, maybe it doesn't make sense to store the cache globally, but rather relative to the project.

borkdude16:03:43

And then we could just use the :cache to avoid caching in dev.

borkdude16:03:07

and not do the hashing stuff because it's just too slow

cap10morgan16:03:59

Ah I see. So the goal of storing the cache in project is to make it more obvious to the user that it exists?

borkdude16:03:18

similar to .cpcache

cap10morgan16:03:19

👍

borkdude16:03:54

maybe similar to .lsp/.cache and .clj-kondo/.cache we could do .babashka/.cache

cap10morgan17:03:34

Or… we could tell people using local pods in prod that if they want caching they need to store the sha-512 hash of their pod binary in name-of-pod.sha512 in the same dir, and then we read that and use it for caching? Then they can just build the hashing into their build pipeline. It really only needs to be calculated once when the pod is compiled. Or we could compute it ourselves if the file isn't found and then just check whether its timestamp is still newer than the pod's? Maybe getting too complicated… What do you think?
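
To make the proposal concrete, here is a sketch of the sidecar-file idea with the timestamp fallback, written in Python for illustration. The function name `cached_pod_hash` and the exact rule used (trust the `.sha512` file when it is at least as new as the pod, otherwise recompute and rewrite it) are one reading of the message, not an agreed design.

```python
import hashlib
from pathlib import Path


def cached_pod_hash(pod: Path) -> str:
    """Return the pod's SHA-512, reusing '<pod name>.sha512' when possible.

    Assumed rule: the sidecar is trusted without rehashing as long as its
    modification time is not older than the pod binary's.
    """
    sidecar = pod.with_name(pod.name + ".sha512")
    if sidecar.exists() and sidecar.stat().st_mtime >= pod.stat().st_mtime:
        # Fast path: no hashing, just read the stored value.
        return sidecar.read_text().strip()
    # Slow path: hash the binary once and store the result next to it.
    digest = hashlib.sha512(pod.read_bytes()).hexdigest()
    sidecar.write_text(digest + "\n")
    return digest
```

Note that the fast path trusts whatever the sidecar contains, so a stale or hand-edited file would go unnoticed until the pod's timestamp changes.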

borkdude17:03:47

Sounds too complicated.

borkdude17:03:19

you still need to calculate the sha512 of the pod to compare it with the stored sha512

borkdude17:03:38

and this will still be slower than just starting the pod and reading the uncached describe message probably

cap10morgan17:03:55

Well, the idea would be that you don't do that and it's up to the pod builder to bust the cache by updating that themselves

borkdude17:03:29

let's go with caching by default and in dev be able to opt out

cap10morgan17:03:46

But yeah, I'm not sure it's a great approach. Was just trying to think of a way to only compute the hash when it changes.

borkdude17:03:03

note that we only do this for local pods, not for registry pods which are always cached.