Fork me on GitHub
#core-async
<
2023-02-23
>
hiredman17:02:46

can someone remind me post https://clojure.atlassian.net/browse/ASYNC-248 what do you do to get around stuff like

user=> (require 'clojure.core.async)
nil
user=> (require 'clojure.core.async.impl.ioc-macros)
Execution error (NoSuchFieldError) at clojure.tools.analyzer.jvm.utils__init/load (REPL:259).
__thunk__0__
user=>

Alex Miller (Clojure team)17:02:58

You may need to proactively compile async or the ioc ns

Alex Miller (Clojure team)17:02:47

But I would like to know more about the classpath context where you see that

Alex Miller (Clojure team)17:02:04

And what has been aot compiled to classes and what has not

hiredman17:02:06

this is in a repl started using one of our production uberjars

hiredman17:02:28

the build process is a whole thing (polylith aot compiled, etc)

Alex Miller (Clojure team)17:02:58

So that makes sense - ioc-macros is needed only at compile time and your prod jar will not have it in either src or class form

hiredman17:02:13

it does have it

hiredman17:02:39

it is loading ioc-macros, which is causing it to try and load clojure.tools.analyzer.jvm.utils

hiredman17:02:49

and that is what is throwing the field error

Alex Miller (Clojure team)17:02:25

Is ioc-macros a clj or class? Is tools.analyzer there and if so clj or class?

hiredman17:02:44

I am actively trying to debug something else, and as part of that I tried to load some code in the repl that used the go macro

hiredman17:02:59

which is what lead to this error

hiredman17:02:17

as far as I can tell toosl.analyzer is present in the jar as both source and classes

hiredman17:02:57

(the error message of course indicates it is at least present as classes because otherwise you wouldn't get an error from the verifier)

Alex Miller (Clojure team)17:02:38

the change in 248 moves ioc-macros to be dynamically loaded only when compiling go macros. if you only do that at aot compile time, you don't need to load it or its deps at runtime. you're describing a case where you do want to compile a go macro at runtime, so ioc-macros will be loaded at that point and needs its dependencies to do so. it seems like there is at this point in the chain, an issue, but I don't yet have a candidate as to the cause

hiredman17:02:57

anyway I created https://ask.clojure.org/index.php/12690/can-async-248-be-reverted I need to figure some other way to debug my actual issue right now

Alex Miller (Clojure team)17:02:21

not planning to revert 248, but you could use the prior version of core.async to not have it

hiredman17:02:42

we haven't changed our build process at all from pre-248 to this, and pre-248 this worked, and now it doesn't

Alex Miller (Clojure team)17:02:43

I would like to have a reproducible case where this happens to try to understand and fix it though

Alex Miller (Clojure team)18:02:54

anecdotally, explicitly aot compiling ioc-macros seems to avoid it

hiredman18:02:46

ah, right, I looked this before

Alex Miller (Clojure team)18:02:48

I assume this is some flavor of the issue with compiling protocols over src and getting instances against a different class

hiredman18:02:53

it is a non-reproducible builds issue

hiredman18:02:42

it is because the compiler uses an identity hashmap for fields, so everytime you compile fields get different orders

Alex Miller (Clojure team)18:02:35

you still did not say whether ioc-macros is a clj or class above, I assume clj

hiredman18:02:18

both clj and classes are present for core.async

Alex Miller (Clojure team)18:02:33

so what is the class with different field ordering?

Alex Miller (Clojure team)18:02:20

so far, seems like everything is already a class so there is no compilation, except whatever go macro you are loading - is that an issue with not re-finding the compiled classes from a go loop?

hiredman18:02:41

no, the issue is the compiler ioc namespace references a field in the compiled tools.analyzer code that doesn't exist

Alex Miller (Clojure team)18:02:44

and that's because the tools.analyzer class in your prod jar is different than the one at the time ioc-macros was compiled?

hiredman18:02:44

like maybe something in the build is explicitly requiring ioc-macros, which doesn't have the lock, so wouldn't be protected from concurrent loads with requiring-resolve

Alex Miller (Clojure team)18:02:22

a build should be serial so I don't think that hypothesis holds

Alex Miller (Clojure team)18:02:38

unless you are doing something I don't understand with your build

Alex Miller (Clojure team)18:02:14

this could fall out of doing things like compiling libs separately from the app and combining

hiredman18:02:19

I mean, we've been through this before with core.async getting loaded concurrently by the tools.deps maven stuff

Alex Miller (Clojure team)18:02:44

well, that's not impossible, but most of that is no longer concurrent. and I don't understand how you'd end up there during building

hiredman18:02:42

the modify time on the analyzer utils namespace class is before the modify time on the ioc macros namespace class

hiredman18:02:22

imply the analyzer namespace was aot compile first (but doesn't prove the ioc macros were compiled against those same class files)

Alex Miller (Clojure team)18:02:04

wouldn't that always be the case? ioc-macros uses analyzer uses utils so compiling ioc-macros would lead to a class for utils first. unless you mean the timestamp is significantly different somehow

hiredman18:02:33

it rules out the utils class files from the ioc-macros compilation being discarded and replaced with class files from a later compilation or something

hiredman18:02:44

well, I have started digging through our build process, which I really don't want to do, and hit this right off the bat

Execution error (NullPointerException) at clojure.tools.build.tasks.uber/copy-stream! (uber.clj:50).

Alex Miller (Clojure team)18:02:26

version of tools.build?

Alex Miller (Clojure team)18:02:44

I think that was fixed in 0.9.2

hiredman18:02:45

let me make sure I am kicking this build off right, I haven't done a local build since before this all got rewritten to use tools.build I think

Alex Miller (Clojure team)18:02:02

I believe that was a bug introduced in 0.9.1 and fixed in 0.9.2

hiredman18:02:29

I can't imagine @U04V70XH6 has us on anything old

Alex Miller (Clojure team)18:02:16

well, maybe it's something I'm not aware of (still likely introduced in 0.9.1 though)

Alex Miller (Clojure team)18:02:49

not obvious to me how that could happen - has to be a nil input stream, which should only happen with dirs, and all uses of copy-stream! are downstream of a check skipping dirs

hiredman18:02:04

I dunno, there are an awful lot of checks around it for something that can't happen

hiredman19:02:04

local version of tools.build with an assert around isFile

Assert failed: "/..../target/classes"
(.isFile f)

Alex Miller (Clojure team)19:02:04

I made a commit with more debugging if that helps to try at 727ca15c1f55b2062f6b7f4ba3d8e98981f629ad

Alex Miller (Clojure team)19:02:21

what does the .isFile line above mean?

Alex Miller (Clojure team)19:02:26

well what's up with .... there?

seancorfield19:02:37

(we're having our standup -- he'll be back online in a few)

hiredman19:02:15

I just elided a big long path

Alex Miller (Clojure team)19:02:13

it does, and all of the code downstream of this point may have a nil input stream, but also has a flag saying whether it's a dir and does not do file stuff (like copying streams) if it is a dir (explode1's first check includes a check on dir?)

Alex Miller (Clojure team)19:02:44

this could happen in the case where something is deleted from a directory between the point of collection and copying, for example by a concurrent process

Alex Miller (Clojure team)19:02:45

(that concurrent process could even be say the os doing something with editor or metadata files)

Alex Miller (Clojure team)19:02:18

not sure what or where you were asserting above so hard for me to guess

hiredman19:02:57

assert that the file from the list of files the loop is going over isFile

Alex Miller (Clojure team)19:02:40

dirs are expected there

Alex Miller (Clojure team)19:02:15

so that assert is not valid

hiredman19:02:41

found it, it is due to a symlink to a non-existent file

Alex Miller (Clojure team)19:02:16

interesting. is that something you will fix in your setup? I can look at better catching this case, but still seems like it should be an error?

hiredman19:02:22

yeah, it was emacs helpful droping wierd dot files

Alex Miller (Clojure team)19:02:12

is there a file name pattern that we could automatically exclude? we do that for some other stuff

Alex Miller (Clojure team)19:02:51

things like .DS_Store on Mac

hiredman20:02:04

it did start with a .

hiredman20:02:42

.# but I really should just have emacs set to either not drop those or drop them somewhere else

Alex Miller (Clojure team)20:02:30

21cb08cfadb2a37038c7cb6b0115992bcd0528db now has that as an automatic exclusion for uberjars, will be in next release

hiredman20:02:42

The lookup thunk in question has got to be https://github.com/clojure/tools.analyzer.jvm/blob/master/src/main/clojure/clojure/tools/analyzer/jvm/utils.clj#L259 which looks like the only the ng top-level that would generate one

seancorfield20:02:29

I have a tiny repro for the _thunk__0__ problem...

{:paths ["src" "resources"]
 :deps {org.clojure/clojure {:mvn/version "1.11.1"}
        org.clojure/core.async {:mvn/version "1.6.673"}}
 :aliases
 {:build {:deps {io.github.clojure/tools.build
                 {:git/tag "v0.9.2" :git/sha "fe6b140"}}
          :ns-default build}}}
and
(ns build
  (:require [clojure.tools.build.api :as b]))

(defn aot [_]
  (b/delete {:path "target"})
  (b/compile-clj {:basis (b/create-basis {})
                  :class-dir "target/classes"
                  :uber-file "target/example.jar"
                  :ns-compile ['clojure.core.async
                               'clojure.tools.analyzer.jvm.utils]}))
then clojure -T:build aot produces:
Execution error (NoSuchFieldError) at clojure.tools.analyzer.jvm.utils__init/load (REPL:259).
__thunk__0__

Full report at:
/tmp/clojure-11118341833155594225.edn
Execution error (ExceptionInfo) at clojure.tools.build.tasks.compile-clj/compile-clj (compile_clj.clj:114).
Clojure compilation failed, working dir preserved: /tmp/compile-clj13767047880507073890

Full report at:
/tmp/clojure-3581990850772262725.edn

seancorfield20:02:00

(the project otherwise has one source file containing -main but that's not relevant)

seancorfield20:02:08

If you swap the order of the nses in :ns-compile, you don't get the error at compile time but if you build an uber and run that as a REPL (as @U0NCTKEV8 did above), the _thunk__0_ error occurs on requiring the ioc-macros ns.

Alex Miller (Clojure team)20:02:44

that makes sense to me as how this could happen

seancorfield20:02:09

I thought it was CLJ-1886 but that seemed to point to another AOT'd lib but it seems to be an interaction between core.async and tools.analyzer directly when they are AOT compiled?

seancorfield20:02:35

Confirmed that core.async 1.5.648 does not cause this issue.

Alex Miller (Clojure team)20:02:07

so in your actual build, what are you actually compiling?

seancorfield20:02:48

Code that depends on core.async 🙂 -- but the above is the smallest repro I could create for this.

seancorfield20:02:22

We just compile the main ns for each of our projects -- so whatever it pulls in transitively is what gets compiled.

seancorfield20:02:34

(and that's a lot of code)

hiredman20:02:17

To be clear, @U04V70XH6 you are saying when you aot compile the utils namespace first, then core.async, then you get the error from the uberjar?

seancorfield20:02:52

Compilation itself fails. That project doesn't build an uberjar.

hiredman21:02:38

it looks like you are saying different compilation orders do different things: 1. ioc-macros 2. jvm utils 3. compilation failure 1. jvm utils 2. ioc-macros 3. failure when requiring ioc-macros in uberjar

seancorfield21:02:26

Yeah, if I built the uberjar in that project, I'm sure that would be the case -- but that you can't even compile tools.analyzer after compiling core.async is enough to highlight the problem.

seancorfield21:02:15

So I'm inclined to downgrade core.async at work to 1.5.648 until this is resolved...

hiredman21:02:50

I don't think that repro case is actually analogous to our build though because it passes multiple namespaces to compile-clj, compile-clj is going to do a topo sort of them based on static dependencies, not going to find any dependencies, so the order may be randomsih that they are compiled in, but our build only ever passes a single namespace to compile-clj

seancorfield21:02:40

No, :ns-compile overrides the topo-sort.

seancorfield21:02:22

If you don't provide :ns-compile then it tries to topo-sort all nses it can find. With :ns-compile, it just compiles the specified nses in that order.

seancorfield21:02:02

We only compile the main ns in our JARs so we just rely on transitive compilation to handle everything else 😐

hiredman21:02:23

ah, but still, the repo case is calling compile multiple times (which can easily result in double compiling a namespace and weirdness as a result)

hiredman21:02:30

but our build is not

hiredman21:02:08

(compile bypasses the already loaded lib check require does)

seancorfield21:02:10

True, but I suspect that repro project makes a good canary for the problem we are seeing... For our build at work, if we omit the main ns, compile-clj finds nothing to compile because our projects don't have source code: everything is a dependency (via :local/root for our own code)

seancorfield21:02:52

(I found that repro case by adding additional nses to :ns-compile in our build)

pmooser11:02:35

I also stumbled across this when I happened to call compile-clj and didn't explicitly specify namespaces, so it appears to have AOT-compiled everything, which I didn't imagine to be a problem, except now anything that depends on that compiled artifact can't do anything involving core.async without getting that same thunk error.

pmooser11:02:39

I'm not sure what I can even do to stop this from happening, since I don't see any way of AOT compiling without it transitively following things and compiling core.async too, except maybe if I always explicitly specify the AOT namespaces.

hiredman16:02:30

ASYNC-249 has what I think is a pretty damning repo case on it now, we've rolled core.async back a version at work. Fixing it will require reverting the change to core.async, or fixing the compiler.

hiredman16:02:36

It's not actually a core.async bug, but a bug in the clojure compiler that leaks the current compilation state into the compilation of any code loaded during macro expansion

pmooser17:02:38

I hope it gets addressed.

Alex Miller (Clojure team)17:02:31

it is high on my list for this week

Alex Miller (Clojure team)17:02:10

I appreciate all the info on the ticket @U0NCTKEV8

2