Fork me on GitHub
#babashka-sci-dev
<
2022-10-12
>
teodorlu12:10:34

Hi 🙂 I feel a bit bad for https://github.com/babashka/neil/pull/110 which I haven't worked on since August. Status: • The neil dep upgrade command seems to be working as expected • But the tests are failing, and I don't know exactly why. I had some trouble getting the same results locally and on CI. Part of the reason is probably that I don't do any JVM Clojure work day to day. So I haven't been able to use neil for anything useful lately. (though avoiding JVM startup makes me prefer neil to other alternatives, and I think it's a great example of good use cases for babashka). So - If anyone wants to take over the PR to get it merged, I would be happy with that, and happy to discuss if anything comes up. 🙂 Teodor

borkdude13:10:21

@cap10morgan I pushed a babashka branch with your pod PR in it. There are some failing tests. Could you guide me a little bit through this PR by writing up some details about the changes in one or both PRs?

cap10morgan15:10:10

Yeah, I need a linux/aarch64 clj-kondo to continue debugging the tests. Any chance of a release soon?

borkdude15:10:20

We just had a release so this might take a while. Maybe there are other pods you can test with?

cap10morgan16:10:01

well, I'm having some strange test failures w/ babashka's test suite even on master and the most recent release tag on my dev machine. so I wanted to run those tests, unmodified, in a docker container to establish a baseline of 100% passing tests before digging into what may be breaking some in my changes.

cap10morgan16:10:08

the test suite wants to use clj-kondo

cap10morgan16:10:37

maybe a linux/aarch64 build of the most recent release would be doable?

cap10morgan16:10:22

or maybe I can comment those tests out for now and then see if everything passes in CI once pushed

borkdude18:10:22

@cap10morgan I think it would be ok if we uploaded an aarch64 release manualy for the previous release

borkdude18:10:28

and then updated the existing manifest

borkdude18:10:44

I'll try now

cap10morgan15:10:11

This is more of a user-focused doc, but useful for understanding the PR anyway. I'm happy to provide some implementation notes too where helpful.

borkdude15:10:55

Please do :)

cap10morgan15:10:21

well, the basic high level thing is that when a namespace gets required that bb isn't already aware of (let's say ns.one.two.three), it adds ns/one/two/three/pod-manifest.edn, ns/one/two/pod-manifest.edn, and ns/one/pod-manifest.edn to the source-for-namespace search paths and then tries to see if it can load a resource from the classpath at any of those paths (basically the same way it would try to load a .clj, .cljc, or .bb file). if it can, it uses some new code to load a pod from the manifest, which is kind of inverted from the usual code path where you have a pod name first, then you get the manifest from the registry or the cache.

cap10morgan15:10:22

and, of course, those pod-manifest.edn files get onto the classpath by putting a pod lib in your :deps map

borkdude19:10:39

What if I wanted to expose clj-kondo.core in my library for clj via normal code but for bb via the pod, so people can use the same namespace? Then the manifest has to live in /clj_kondo/pod-manifest.edn - but what if there are mare libs that have the clj_kondo prefix? It feels a little bit brittle. What I expected instead (in previous discussion) is that we would have clj_kondo/core.pod.edn for example, so it becomes unambiguous. Yes, you would have to insert the manifest potentially n times, each time for every namespace, but to circumvent that, you could support a redirection in the .pod.edn to clj_kondo/clj_kondo.pod.edn or so which then contains the complete manifest.

borkdude19:10:07

(also commented this in PR)

cap10morgan19:10:27

That’s why I wanted an unlikely to be used by something else filename (in this case pod-manifest.edn instead of manifest.edn). I think it’s far simpler to just have one manifest for the whole pod, and highly unlikely to step on other stuff. You would put it in clj_kondo/core/pod-manifest.edn in this case (has to be at least two ns segments). This seems like a thing we already have to deal with all the time with JAR path flattening.

borkdude19:10:50

What I mean if that when I expose 2 namespaces in my pod, I'd have to move my manifest to clj_kondo but then this manifest potentially conflicts with other manifests

cap10morgan19:10:00

well, under the current implementation you can't move it to clj_kondo. it doesn't look in the top-level ns segment b/c they're often TLDs or pod or similar. and like you said, there's a lot of conflict potential then. but yeah, I see your point. for two-level namespace pods you're back to manifest-per-ns anyway. clj_kondo/pod-manifest.edn is probably OK on a classpath, but com/pod-manifest.edn or pod/pod-manifest.edn obviously isn't. :thinking_face:

cap10morgan20:10:54

yeah... I'm coming around to path/to/my/ns.pod.edn or similar. there's not enough consistency in how namespaces are segmented to be able to reliably start with one and turn it into an unambiguous manifest path for the whole pod.

borkdude20:10:52

The ns.pod.edn could be one level of indirection though: it could have 1. The location of the full manifest on the classpath 2. A mapping of namespace -> pod namespace. E.g. clj-kondo.core -> pod.babashka.clj-kondo It might be good to first flesh out these details before changing a lot of code.

borkdude20:10:17

Or the full manifest could have that mapping

borkdude20:10:09

What we could also do is this:

clj_kondo/core.bb
which already has preference, will execute some pod related code, which will then ensure, that the namespace exists. E.g. just (load-pod ...)

borkdude20:10:49

And this will just load the pod for the same version as the library

cap10morgan15:10:57

I needed to sleep on all this (in part b/c I've been sick this week so my brain runs out of the thinky stuff earlier in the day than usual). So fundamentally, what we need to accomplish is going from a namespace (since that's all that require gives us) that babashka hasn't seen yet (since otherwise we have a better way to load it) to a pod binary we (may) need to download and load. I agree with you that full-ns -> full-ns.pod.edn or similar is a nice reliable approach. Clojure/bb's built-in code-loading either finds a one-to-one mapping from a ns to a file like this or throws an error. For very good reasons no doubt. However, there will be use cases where having a file in every namespace-path that you should be able to load in "pod mode" will be cumbersome (if the pod dev has to put those files there manually; perhaps we can find an approach where they don't). Like in the now-deprecated pod-babashka-aws pod which walked over all of the namespaces in the aws client library to map them into the pod's describe map. That one may be deprecated, but it's a use case I think we should still keep in mind.

borkdude15:10:58

That is why I proposed an indirection

borkdude15:10:14

or a .bb solution

cap10morgan15:10:18

I think it gets really cumbersome even if those files are just indirections

cap10morgan15:10:23

even if they're empty

cap10morgan15:10:19

but what about a build tool helper for these kinds of pods that can e.g. run your pod, get its describe map, and then generate all of the little indirection files for you? most pod authors won't need it. but for those that do, it would make a big difference.

borkdude15:10:24

What we could also do is pre-scan the classpath for pod stuff and cache that result, so we can skip that work the second time

cap10morgan15:10:55

a tools.build lib or similar

borkdude15:10:42

I would say: first make the mechanism, then make it easier, else it feels like premature work

cap10morgan15:10:47

hmm... interesting. and use the classpath itself (or a hash or whatever) as the cache key?

cap10morgan15:10:01

certainly, that would be the order we'd do it in

cap10morgan15:10:18

but I wouldn't want to do all the work if we didn't at least have an idea of how to make that use case feasible

borkdude15:10:46

clj-kondo also has a classpath scan mechanism: it scans the classpath for clj-kondo.exports directories and copies its contents to the .clj-kondo config directory

cap10morgan15:10:00

that's an interesting idea

borkdude15:10:19

but clj-kondo only does it on demand. lsp which uses kondo does it for you though

borkdude15:10:03

bb might need such a mechanism for data_readers.clj as well btw, right now it doesn't support that

borkdude15:10:13

(I left that out for performance reasons)

cap10morgan15:10:13

perhaps we could provide a --skip-pod-lib-scan or similar flag for those who know they don't have any and don't want to pay the perf hit?

borkdude15:10:33

bad idea, nobody will use that flag

borkdude15:10:47

it should work out of the box without any config

borkdude15:10:19

but caching might be ok

cap10morgan15:10:26

right, that's what I'm getting at. the feature works out of the box. if you notice the perf hit and don't like it and complain (which some will), then we point you to that flag.

cap10morgan15:10:36

it's an opt-out, not an opt-in

borkdude15:10:48

yeah, but if we cache then nobody will need that flag

cap10morgan15:10:31

not sure I agree. think ephemeral environments where the cache doesn't stick around

cap10morgan15:10:53

but that's a secondary concern and one we can address when/if it's an issue

borkdude15:10:14

I think such a scan should not take more than 10-50 ms or so so probably most people won't notice

borkdude15:10:38

but there is the concern that your classpath will have libs with pod manifests that you don't want or need to download

cap10morgan15:10:56

yeah, it would just be a combination of like ephemeral env & critical startup time (like aws lambda)

cap10morgan15:10:47

hmm... right. like hybrid libs w/ some nses that are native bb compatible and others that need a pod?

cap10morgan15:10:59

and we don't yet know which nses that project is going to load?

cap10morgan15:10:51

this was all much simpler when people had to declare their pods up front 😆

borkdude15:10:53

sorry to hear you were sick btw, hope you're doing better

cap10morgan15:10:03

yeah, mostly better today. thanks!

borkdude15:10:10

maybe that's why we started with it. and why we have load-pod :)

borkdude15:10:24

so I was thinking, we could have just a .bb file that calls load-pod that overrides the ns?

cap10morgan15:10:45

that would still rely on the centralized registry?

borkdude15:10:04

we could extend load-pod with support for loading a registry from the classpath?

borkdude15:10:14

eh manifest

cap10morgan15:10:25

yeah, my PR already adds load-pod-from-manifest

cap10morgan15:10:44

that is an elegant, generic mechanism

borkdude15:10:56

and if you want to "prepare" then you just load these namespaces up front...? but then they also execute...

cap10morgan15:10:00

prepare may need to copy uberscript's ns & require eval'ing for this anyway

borkdude15:10:25

but prepare now also supports downloading for other archs/oss?

cap10morgan15:10:02

yeah, with certain env vars set.

cap10morgan15:10:03

it turns on a "download only" mode when those don't match the system, so perhaps we can hook into that and only do the download and cache even though we started w/ eval'ing a require

cap10morgan15:10:22

only for prepare, of course

cap10morgan16:10:27

well, actually, for prepare, I guess it already doesn't handle pods that are explicitly loaded w/ load-pod but not declared in bb.edn :pods. and eval'ing ns and require forms wouldn't help there if we use .bb files that call load-pod or load-from-manifest. so perhaps what it would really need to do is just find those forms too (i.e. load-pod*) and download and cache the pods they refer to. they wouldn't need to "eval" them per se, but just pick out the relevant info. hmm... perhaps treating code as data in a too-brittle way then...

cap10morgan16:10:01

although... if it isn't already, we could expose the download-only? opt to those fns (it's there in the call chain pretty soon anyway if not already directly available) and just force it to true in this context and eval away