Fork me on GitHub
#polylith
<
2022-11-09
>
seancorfield00:11:45

@tengstrand FYI, I think we're running into yet more problems with the classloader isolation stuff and tests. We're seeing IncompatibleClassChangeError and other similar failures which all point to classloader-related problems (e.g., classes not being equal because they're in different classloaders etc, due to cross-thread loading and so on). And it nearly always only happens in CI -- the exact same commands tend to run just fine locally most of the time -- making it extremely hard to debug. I think we need a test runner that uses a fully-isolated subprocess instead of the futzing around with classloader isolation (which is nearly impossible to get right -- and Polylith does not get it completely right).

👀 1
tengstrand09:11:02

Okay. I’m on holiday this week till Sunday. Is it possible for you, @U2BDZ9JG3, or someone else to start looking at this immediately?

furkan3ayraktar10:11:33

I’m interested in discussing this issue and possibly helping implement a solution. I think the current classloader solution is good for many cases (we haven’t had any problems so far at work), and it’s suitable for newcomers since it’s fast. On top of that, we can implement an additional test runner which runs the tests by starting a new JVM per project on a separate process. To make it fast, we could start multiple processes simultaneously for different projects instead of running them sequentially. What do you think about this idea?

seancorfield17:11:24

I plan to start building it today. If I can do it as a pluggable test runner I will.

seancorfield17:11:51

My first observation is that this can't be built with the existing pluggable test runner machinery because that already assumes a classloader and multiple calls to eval-in-project -- which is where the problems are, for us.

furkan3ayraktar17:11:06

Yes, it needs to be added first to the main polylith project. We can then maybe give the option to the pluggable test runner developer to choose classloader or subprocess, although that’s tomorrow’s problem.

seancorfield18:11:10

I will be working in my fork of Polylith and try to do the minimal amount of surgery needed to make this work. The other wrinkle is that it will need a small, standalone test runner script with minimal dependencies that can be run as a subprocess from Polylith (to run the setup/teardown fns and iterate over the namespaces to test) -- so that will be impossible to do from an uberjar of the tool, only from a git dep (checked out) version.

seancorfield18:11:43

But I guess the main poly tool/uberjar could explicitly use tools.deps.alpha to make that dependency available at runtime, as another project in the Polylith repo. I'll cross that bridge when I come to it.

seancorfield18:11:17

@U2BDZ9JG3 @U08BJGV6E Am I reading the orchestrator code correctly that you can specify and run multiple test runners over the code? 👀 It looks like it runs setup-fn just once, then multiple test-runner passes, then teardown-fn (again, just once). I never knew that was even a possibility...

seancorfield19:11:24

Interesting, so for testing purposes, you run both the default test runner and the kaocha test runner to make sure both pass?

imre19:11:10

Yes and it doesn't make too much sense in this very case other than demonstrating the feature. I added it in response to an issue in poly, let me find it

imre19:11:16

It can be useful in a case when the different runners cover different "tests", like one for clojure test, another for eastwood

imre19:11:27

And on the note of your problem @U04V70XH6, I'd love to see a minimal repro case of it, where the current classloader-based stuff causes problems.

imre19:11:46

It is also possible that there could be bugs in the current classloader-based system. I think https://github.com/lambdaisland/classpath could be q useful utility in perhaps fixing it if that's the case

seancorfield19:11:28

A repro is really hard to create. The outline is that you need to have some pretty heavily async multi-threaded code in play, where you are reifying and/or proxying classes into a JDK "global" thread pool (such as the fork/join pool). The current poly isolation still shares the core JDK classloader so you can end up with multiple instances of the "same" class in that core resource and you start to get weird class cast/comparison/instance of errors because the Java classes that the Clojure depends on get loaded into different classloaders. My colleague did a fair bit of analysis on this and it depends on memory pressure and all sorts of other things -- which is why we usually tend to see it in CI but not locally, or we get different errors between local and CI or between his machine and mine etc.

seancorfield19:11:20

TBH, we hit some of these exact same issues with Boot "pods" because they also tried to use classloader isolation. It really just doesn't work at scale when you have dynamic loading of classes.

seancorfield19:11:46

(I'm repeating my understanding of my colleague's analysis but he's on vacation this week -- TL;DR: classloader isolation works for "simple" cases pretty well but always breaks down at some point in complex cases)

seancorfield19:11:56

My initial poking at the orchestrator indicates the pluggable test runner concept is still useful -- in terms of "are tests present?" etc -- but the eval in project stuff is problematic (and, in particular, trying to only run the setup/teardown exactly once across multiple test runners). If you're running in a subprocess, you have to run setup/teardown in that process, even if you have multiple test runners, so I think the safest approach here is to run setup/teardown for each test runner -- whether that ends up being in-project or in a separate process.

imre19:11:52

Sounds reasonable

furkan3ayraktar20:11:06

I agree; they should run for each runner.

seancorfield21:11:47

That makes my life simpler 🙂

seancorfield17:11:52

Changing the code to run startup/teardown per-test-runner simplifies things and I think it could simplify the transduce short-circuit too, since run-tests-for-project no longer needs to return truthy/falsey -- it can just rely on exception being thrown from run-tests-for-project-with-test-runner (which already throws for setup failure -- so I've made it throw for teardown failure as well, which was previously reported but then returned nil or false to short-circuit the transduce).

seancorfield17:11:11

Hmm, not quite: there's an edge case(?) where none of the test-runners see any test sources and that currently prevents any additional projects from being tested... which seems a bit weird but maybe @tengstrand can shed some light on that when he's back next week.

seancorfield17:11:07

I think I have a solid handle on how to write the external test runner at this point: I can leverage the existing TestRunner protocol but I need to add a method, so I'm going to add an optional ExternalTestRunner protocol that such processes can implement (in the reify call in their create function that returns the main namespace of the test runner to be used.

1
👍 1
seancorfield22:11:31

OK, here's the first cut of an external test runner, built right into the poly tool itself. I believe it could be a separate project like the Kaocha plugin but, at least right now, it depends on both itself and the poly tool being source checkouts, i.e., local or git deps, because it assumes that when it runs, it can find two specific namespaces as file-based resources.

👍 2
1