#clojure-dev
2021-11-04
Eddie15:11:18

Would it be crazy to ask for additional compiler options to set *compile-path* and *compile-files* via java system properties at startup?

Eddie15:11:54

My current solution is a bit of a hack: I call (.bindRoot Compiler/COMPILE_FILES true) on load of a particular namespace.

Alex Miller (Clojure team)16:11:05

could you explain what you're trying to do?

Eddie17:11:05

I am working in the context of distributed computing on HPC clusters. Most platforms that I am aware of use a JVM on each worker and expect all classes to be sent to each worker so that distributed tasks can be run. I use my hack to get class files from evaluations in my Clojure repl and then send them to a cluster of remote JVMs. It has been working well so far. For prior art, the Scala REPL always writes class files for each command and these are commonly sent to remote JVMs.

Eddie17:11:42

The general problem statement is "Without the ability to write class files from the Clojure compiler, it is difficult to sync with remote JVMs."

Alex Miller (Clojure team)18:11:03

if you're at the repl, I don't understand why you can't just dynamically bind those. why do you care about system properties?

Alex Miller (Clojure team)18:11:34

writing classes is literally the only thing the compiler does, so I don't understand the problem

Eddie18:11:00

> if you're at the repl, I don't understand why you can't just dynamically bind those. why do you care about system properties?
This is what we do today. It works for code compiled after the binding is set, as long as there are no dependencies on classes compiled before the binding. If system properties could be used to set the vars at startup, all classes created by the Clojure compiler would be portable.

> writing classes is literally the only thing the compiler does, so I don't understand the problem
My understanding from https://stackoverflow.com/questions/69356231/get-or-write-class-files-for-the-classes-in-the-dynamicclassloader/69833025#69833025 was that the compiler creates classes in-memory and adds them to the DynamicClassLoader. If true, there are never any portable class files created. Is that correct?

Alex Miller (Clojure team)19:11:46

whether to write them to .class files depends on *compile-files* ... which you can bind when you need to

Eddie21:11:23

Here is an example:

(defn foo [x] x)
=> #'user/foo

(binding [*compile-files* true]
  (eval '(defn bar [x] (foo x))))
=> #'user/bar
The *compile-path* will only have 1 class file, user$bar.class. Based on my experiments, the bar class file seems to be unusable in a remote JVM because it references foo which I think will ask the class loader to load a class user$foo which won’t exist in the remote JVM.
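
A minimal sketch of how the written files can be inspected, assuming a hypothetical output directory `target/repl-classes` (not from the discussion). Here both definitions are evaluated while `*compile-files*` is bound to true, so each should get a class file; it illustrates the mechanism only and is not a fix for functions defined before the binding.

```clojure
(require '[clojure.java.io :as io])

;; Hypothetical output directory; any writable path would do.
(def out-dir (io/file "target" "repl-classes"))

;; Create the directory up front so the compiler has somewhere to write.
(.mkdirs out-dir)

(binding [*compile-files* true
          *compile-path*  (.getPath out-dir)]
  ;; Both forms are compiled while *compile-files* is true,
  ;; so class files are written for foo and for bar.
  (eval '(defn foo [x] x))
  (eval '(defn bar [x] (foo x))))

;; Names of the .class files now present in the output directory.
(def written
  (->> (.listFiles out-dir)
       (map #(.getName %))
       (filter #(.endsWith ^String % ".class"))
       sort))

written
```

The class file names are prefixed with whatever namespace was current when the forms were evaluated.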

Eddie21:11:12

The reason for my request to set the var at startup is that (I believe) it would allow me to keep the entire set of classes in sync across JVMs. I apologize if I am thinking about this all wrong. I can see why this is an uncommon scenario.

Alex Miller (Clojure team)21:11:42

I think you're starting to approach a better problem statement :)

🙏 1
Alex Miller (Clojure team)21:11:04

compilation is a side effect of load, which is inherently namespace-oriented

Alex Miller (Clojure team)21:11:37

you seem to want to compile functions to classes for remote portability

Alex Miller (Clojure team)21:11:58

which necessarily requires the ability to resolve other functions

Eddie21:11:34

Correct! You absolutely trimmed the fat on my problem statement! For completeness, here is another example where a single form will result in multiple class files.

(defn i-must-be-portable
  [x]
  (let [me-too (fn [y] (+ x y))]
    (me-too x)))
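
A rough sketch of how to see this, assuming a scratch temp directory as the compile path (the directory prefix and the var names `tmp-dir` and `class-names` are illustrative only). The listing is expected to contain one class for the outer function and one for the inner `fn`.

```clojure
(import '(java.nio.file Files)
        '(java.nio.file.attribute FileAttribute))

;; Scratch directory for the generated classes (the prefix is arbitrary).
(def tmp-dir
  (.toFile (Files/createTempDirectory "portable-classes"
                                      (make-array FileAttribute 0))))

(binding [*compile-files* true
          *compile-path*  (.getPath tmp-dir)]
  ;; A single top-level form that contains a nested fn.
  (eval '(defn i-must-be-portable
           [x]
           (let [me-too (fn [y] (+ x y))]
             (me-too x)))))

;; Names of the class files produced by that one form.
(def class-names
  (sort (map #(.getName %) (.listFiles tmp-dir))))

class-names
```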

Alex Miller (Clojure team)21:11:33

in the prior case what do you want? the set of all classes needed to run a function (both bar and foo)?

Eddie21:11:35

My current solution is to bind the var and ask users not to call functions that were defined before the var was bound. It works, but feels wrong.

Alex Miller (Clojure team)21:11:34

and you want an interactive environment, not one where foo is already on the remote machine

Alex Miller (Clojure team)21:11:26

(that latter scenario is something we've been thinking about for a long time with rehydrating vars)

Eddie21:11:02

Yes. Everything works fine with an AOT compiled jar that is placed on all remote machines. Not so much in the interactive case.

Alex Miller (Clojure team)21:11:13

several people have built things like this over the years, for cascalog/storm, for spark, etc

Alex Miller (Clojure team)21:11:46

can't say I've looked at any of them closely enough to compare

Eddie21:11:17

Ah, Spark is my use case as well. Do you recall which framework tackled this? I am familiar with most of them but I have never seen a solution to this.

Eddie21:11:35

FWIW, the Scala interface to Spark does exactly what I am hoping to accomplish here: the Scala repl compiles all the incoming code from the user to physical class files and then broadcasts them to each worker JVM. The same could be done for Clojure if there were a way to set *compile-files* before any functions get compiled.

Eddie21:11:31

Thanks. I’ve looked at Sparkling quite a bit but I must have missed it. I’ll look deeper.

Alex Miller (Clojure team)21:11:20

I feel like there is another one too that I'm missing. certainly there were some older projects going way back, but I feel like there's something else

Eddie21:11:46

In addition to Sparkling, I have spent a bunch of time with https://github.com/sorenmacbeth/flambo and https://github.com/zero-one-group/geni. They either restrict deployment to AOT compiled uberjars (no interaction) or restrict the API to functions that are known upfront (no user defined functions).

slipset21:11:54

@cgrand did something along these lines? Called ourobourus or some such. Gave a talk at a conj about it IIRC

Alex Miller (Clojure team)21:11:04

that is what I'm thinking of

Alex Miller (Clojure team)21:11:15

but I am not coming up with the name :)

Eddie21:11:38

Ah ok. Thanks I haven’t looked into that project yet. Much appreciated.

slipset21:11:58

Powderkeg?

Eddie21:11:43

Awesome. I always look forward to a good conj talk.

Eddie21:11:48

Regardless, it sounds like you aren’t in favor of adding a way to set *compile-files* at startup. Do you think there is value in putting together a question on http://ask.clojure.org on the topic for the sake of record keeping? If so, I’m happy to write one.

Alex Miller (Clojure team)21:11:22

I think that's maybe part of an answer, not a problem :)

Alex Miller (Clojure team)21:11:00

the problem is much more interesting and there are probably other solutions that have nothing to do with system properties :)

Alex Miller (Clojure team)21:11:45

the talk directly addresses all this btw

👍 1
Eddie21:11:17

Sure. I can imagine that this might be the only problem for which a possible solution is “get class files for everything”.

Alex Miller (Clojure team)21:11:08

people have run into this same general problem with Hadoop, with database functions in Datomic, etc

Eddie21:11:12

On the other hand, it was generally confusing to me that there wasn’t a way to specify all the compiler vars on startup.

Alex Miller (Clojure team)21:11:13

system properties are generally a pretty gross solution as they bake in the assumption that everything in jvm has the same goal and exactly one answer

Eddie21:11:06

Gotcha. That makes sense. If my project requires property A to be X and yours requires A to be Y, we can’t compose.

Eddie22:11:20

Wow. That talk is exactly what I am looking for. Thanks so much @alexmiller and @slipset!