babashka-sci-dev

cap10morgan 2022-03-20T16:43:46.874309Z

Seeing hashing times in the hundreds of milliseconds for some typical pod binaries, which is a big chunk of the caching speedup. Wondering if it defeats the purpose…

cap10morgan 2022-03-20T16:44:07.217889Z

Was looking into perf optimizations too

borkdude 2022-03-20T16:44:39.307189Z

perhaps a sha 1 then?

cap10morgan 2022-03-20T16:45:04.391889Z

Perhaps checksumming might be appropriate here? If so, I would want to keep the :cache param, because a checksum can (very rarely) have collisions.

cap10morgan 2022-03-20T16:45:16.020209Z

Tried sha-1 too and it was about the same

borkdude 2022-03-20T16:45:24.166419Z

you mean md5 or so?

cap10morgan 2022-03-20T16:45:46.970749Z

CRC-32, or I found another one called Adler-32
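
For comparison, here is a minimal sketch of the two checksums mentioned above next to a cryptographic hash. It is written in Python purely for illustration (the pods code itself runs on the JVM), and the helper name `checksums` is hypothetical.

```python
import hashlib
import zlib


def checksums(data: bytes) -> dict:
    """Return CRC-32, Adler-32 and SHA-1 of the same bytes as hex strings."""
    return {
        # 32-bit checksums: cheap to compute, but collisions are possible.
        "crc32": format(zlib.crc32(data) & 0xFFFFFFFF, "08x"),
        "adler32": format(zlib.adler32(data) & 0xFFFFFFFF, "08x"),
        # Cryptographic hash: far larger output, more work per byte.
        "sha1": hashlib.sha1(data).hexdigest(),
    }


if __name__ == "__main__":
    for name, value in checksums(b"hello").items():
        print(name, value)
```

A 32-bit checksum has only about four billion possible values, which is why an explicit opt-out such as the :cache param remains useful if one is used as a cache key.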

borkdude 2022-03-20T16:46:06.302019Z

we could do md5 + add the filename to the cache file: /Users/borkdude/pod-foobar => dcccadfc-pod-foobar.cache
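
A rough sketch of that naming scheme, in Python for illustration only. It assumes the prefix is the first 8 hex characters of the MD5 of the binary's contents (the message does not say what exactly gets hashed), and `cache_file_name` is a hypothetical helper, not part of babashka.

```python
import hashlib
from pathlib import PurePosixPath


def cache_file_name(pod_path: str, content: bytes, prefix_len: int = 8) -> str:
    """Build '<md5-prefix>-<file name>.cache' for a pod binary.

    Assumption: the digest covers the binary's contents and is truncated
    to prefix_len hex characters, as in 'dcccadfc-pod-foobar.cache'.
    """
    digest = hashlib.md5(content).hexdigest()
    file_name = PurePosixPath(pod_path).name
    return f"{digest[:prefix_len]}-{file_name}.cache"


if __name__ == "__main__":
    print(cache_file_name("/Users/borkdude/pod-foobar", b"example bytes"))
```

Including the file name keeps cache files readable and makes an accidental clash between two different pods less likely than with a truncated digest alone.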

cap10morgan 2022-03-20T16:46:33.216899Z

MD-5 didn't seem any faster either

cap10morgan 2022-03-20T16:47:11.873419Z

I tried using streams instead of reading the whole file into a byte array but that was slower
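
The two approaches can be sketched as follows, in Python for illustration rather than the JVM code under discussion. SHA-512 and the 1 MiB chunk size are assumptions. One common reason for a streamed version being slower is a very small read buffer, though nothing in this conversation confirms that was the cause here.

```python
import hashlib


def hash_whole(path: str) -> str:
    """Read the entire file into memory, then hash it in one call."""
    with open(path, "rb") as f:
        return hashlib.sha512(f.read()).hexdigest()


def hash_streamed(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash the file chunk by chunk so memory use stays bounded."""
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()
```

Both functions produce the same digest; they differ only in peak memory use and in how many read calls they make.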

cap10morgan 2022-03-20T16:47:21.042619Z

So might still be doing something wrong

cap10morgan 2022-03-20T16:48:18.602039Z

Anyway, going to keep trying for a bit to see if I can get it sped up and if not might just go back to :cache param

borkdude 2022-03-20T16:49:45.439129Z

Or we could make a cache relative to the bb.edn file instead of the global one and indeed the :cache param. Let's just do that

cap10morgan 2022-03-20T16:50:36.374319Z

Not sure I follow

borkdude 2022-03-20T16:51:31.609209Z

Since local pods aren't global, maybe it doesn't make sense to store the cache globally, but rather relative to the project.

borkdude 2022-03-20T16:51:43.895819Z

And then we could just use the :cache to avoid caching in dev.

borkdude 2022-03-20T16:52:07.741919Z

and not do the hashing stuff because it's just too slow

cap10morgan 2022-03-20T16:52:59.521869Z

Ah I see. So the goal of storing the cache in project is to make it more obvious to the user that it exists?

borkdude 2022-03-20T16:53:15.286859Z

yeah

borkdude 2022-03-20T16:53:18.598359Z

similar to .cpcache

cap10morgan 2022-03-20T16:53:19.286409Z

šŸ‘šŸ»

borkdude 2022-03-20T16:53:54.585149Z

maybe similar to .lsp/.cache and .clj-kondo/.cache we could do .babashka/.cache
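
As a sketch of that layout, the cache directory could be derived from the location of bb.edn. This is Python for illustration only, and `project_cache_dir` is a hypothetical helper, not an existing babashka function.

```python
from pathlib import Path


def project_cache_dir(bb_edn: Path) -> Path:
    """Return the project-local cache directory next to the given bb.edn."""
    return bb_edn.parent / ".babashka" / ".cache"


if __name__ == "__main__":
    print(project_cache_dir(Path("/path/to/project/bb.edn")))
```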

cap10morgan 2022-03-20T16:54:27.324779Z

Yeah

cap10morgan 2022-03-20T17:00:34.148499Z

Or… we could tell people using local pods in prod that if they want caching they need to store the sha-512 hash of their pod binary in name-of-pod.sha512 in the same dir, and then we read that and use it for caching? Then they can just build the hashing into their build pipeline. It really only needs to be calculated once when the pod is compiled. Or we could compute it ourselves if the file isn't found, and afterwards just check whether its timestamp is still newer than the binary's? Maybe getting too complicated… What do you think?
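
A minimal sketch of the sidecar-file idea with the timestamp fallback, in Python for illustration. The helper names `sidecar_path` and `pod_hash` are hypothetical, and treating the sidecar as valid whenever its modification time is at least as new as the binary's is an assumption about what "still newer" would mean.

```python
import hashlib
from pathlib import Path


def sidecar_path(pod: Path) -> Path:
    """name-of-pod -> name-of-pod.sha512 in the same directory."""
    return pod.with_name(pod.name + ".sha512")


def pod_hash(pod: Path) -> str:
    """Return the pod's SHA-512, trusting an up-to-date sidecar file."""
    sidecar = sidecar_path(pod)
    # Trust the stored hash as long as the sidecar is not older than the pod.
    if sidecar.exists() and sidecar.stat().st_mtime >= pod.stat().st_mtime:
        return sidecar.read_text().strip()
    # Otherwise hash the binary once and store the result for next time.
    digest = hashlib.sha512(pod.read_bytes()).hexdigest()
    sidecar.write_text(digest + "\n")
    return digest
```

Note that this version never verifies the stored hash against the binary, so a stale or wrong sidecar with a fresh timestamp would be trusted as-is.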

borkdude 2022-03-20T17:01:47.771159Z

Sounds too complicated.

borkdude 2022-03-20T17:02:19.976309Z

you still need to calculate the sha512 of the pod to compare it with the stored sha512

borkdude 2022-03-20T17:02:38.123449Z

and this will still be slower than just starting the pod and reading the uncached describe message probably

cap10morgan 2022-03-20T17:02:55.337329Z

Well, the idea would be that you don't do that and it's up to the pod builder to bust the cache by updating that themselves

borkdude 2022-03-20T17:03:29.704539Z

let's go with caching by default and in dev be able to opt out

cap10morgan 2022-03-20T17:03:46.747629Z

But yeah, I'm not sure it's a great approach. Was just trying to think of a way to only compute the hash when it changes.

cap10morgan 2022-03-20T17:03:51.333779Z

Sounds good

borkdude 2022-03-20T17:04:03.138199Z

note that we only do this for local pods, not for registry pods which are always cached.

cap10morgan 2022-03-20T17:04:07.658049Z

Yep