clojure-dev 2019-07-02 | Slack Archive

has there been any work done on making clojure objects (e.g. functions, anything reified) efficiently serializable between processes which are running the same code?

nathanmarz17:07:26

seems like this would need deep compiler integration in order to correctly account for closures

ghadi17:07:50

to what end?

nathanmarz17:07:14

like create an object in one process and send it to another process on another machine

nathanmarz17:07:24

that's running the same code

ghadi17:07:29

like CORBA?

nathanmarz17:07:40

not very familiar with that

ghadi17:07:43

or Java serialization?

nathanmarz17:07:00

specifically, I need to be able to take a function instance and send it to another process

ghadi17:07:10

send data and eval it

nathanmarz17:07:11

or an instance of an object defined via reify that has some random closure

ghadi17:07:22

actual data or a sexp

nathanmarz17:07:51

that seems too inefficient, and how would you send the closure?

ghadi17:07:12

but why?

mikerod17:07:37

@ghadi I’ve faced this a lot of times

mikerod17:07:43

seems there are plenty of reasons

mikerod17:07:46

distributed processing especially

nathanmarz17:07:59

yes, my case would be distributed processing

mikerod17:07:00

it’s not always super trivial to just eval it on the other side either

nathanmarz17:07:30

anyway, just wondering if there was some branch of clojure somewhere tackling this or any related work

mikerod17:07:46

1) you have to set up the same context around it (maybe not trivial), 2) I’ve hit issues with max method size errors trying to eval across as well, depending on what you need to send.

ghadi17:07:47

there are some examples of serializing functions in portkey

ghadi17:07:01

using java serialization IIRC

cgrand17:07:52

Nope, kryo

ghadi17:07:21

👌🏽

mikerod17:07:13

I remember doing some extensions and improvements over this https://github.com/technomancy/serializable-fn in a private repo in the past

mikerod17:07:27

but it was based around just eval’ing forms later, but could get a bit more sophisticated with it.

mikerod17:07:51

I think the standard Serializable fails for certain closure situations.
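A quick way to see where plain Java serialization breaks down for closures is to try writing one. The sketch below uses only `java.io`; the helper names `java-serialize` and `serializable?` are hypothetical. Writing fails as soon as any closed-over value is not `Serializable`, and even when writing succeeds, reading the bytes back requires a class with the same compiler-generated name on the receiving side.

```clojure
(ns serializable-check
  (:import (java.io ByteArrayOutputStream ObjectOutputStream
                    NotSerializableException)))

(defn java-serialize
  "Writes x with plain Java serialization and returns the resulting bytes."
  [x]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out (ObjectOutputStream. baos)]
      (.writeObject out x))
    (.toByteArray baos)))

(defn serializable?
  "Returns true if Java serialization can write x, false if some object
  reachable from x (for example a closed-over local) is not Serializable."
  [x]
  (try
    (java-serialize x)
    true
    (catch NotSerializableException _ false)))

;; A closure over a plain java.lang.Object cannot be written:
(serializable? (let [o (Object.)] (fn [] o)))
;; => false
```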

mikerod17:07:32

(also, I’ve only done work on any of this for function instances, not things like reify)

mikerod17:07:49

oh and lastly - concerning eval

nathanmarz17:07:50

i'll take a look at portkey

mikerod17:07:54

it is quite slow to eval a bunch of times. you want to “batch eval”, but in a way that doesn’t break method size limits

nathanmarz17:07:09

yea

mikerod17:07:17

We did some work on this for the clara-rules serialization

nathanmarz17:07:41

the efficient way to do this would be to serialize an id for the class and the values in its closure

mikerod17:07:47

It was super slow to eval tons of fns back again, so we put them all in one big data structure to “batch eval” them and then relink them to their appropriate places afterwards
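A minimal sketch of the batching idea, assuming the fns arrive as quoted forms: evaluating one vector per chunk means one compile per chunk instead of one per form, while `chunk-size` limits how many fn constructions land in the single method compiled for each chunk. `batch-eval` is a hypothetical name, and the relink step is left to the caller, since results come back in input order.

```clojure
(ns batch-eval-sketch)

(defn batch-eval
  "Evaluates a seq of fn forms in chunks rather than one eval call per form.
  Each chunk is compiled as a single vector form, so chunk-size bounds how
  many fn constructions end up in one generated method (the JVM caps a
  method at 64KB of bytecode)."
  [forms chunk-size]
  (into []
        (mapcat (fn [chunk] (eval (vec chunk))))
        (partition-all chunk-size forms)))

;; Three fns compiled in two eval calls (chunks of 2 and 1).
(def fs (batch-eval '[(fn [x] (inc x))
                      (fn [x] (* 2 x))
                      (fn [x] (- x))]
                    2))

(map #(% 1) fs)
;; => (2 2 -1)
```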

ghadi17:07:54

data does not have edge cases: maps, sets, lists, scalars

mikerod17:07:48

sure, but there are useful applications …

ghadi17:07:34

you can send bytecode, you can have a classloader that is network-controlled

ghadi17:07:06

sending arbitrary things is an arbitrary requirement, and is going to be challenging

andy.fingerhut17:07:29

Sometimes, the customer really wants a machete, and all you can do is point out the sharp edges and hope they don't return it later with a bandage wrapped around their hand.

mikerod17:07:02

> Sometimes, the customer really wants a machete, and all you can do is point out the sharp edges and hope they don’t return it later with a bandage wrapped around their hand.

eh, I don’t really consider this constructive here

mikerod17:07:12

distributing work across processes is common and this comes up

mikerod17:07:29

There are several clj libs out there doing things around this. Ones built around Hadoop stuff/Spark etc

mikerod17:07:56

But yeah, I’ve specifically targeted fns before, not just “any object”. I know there will be limitations

mikerod17:07:30

and I’m not even sure I think there is a “good solution” for clj alone. Basically I’ve always come back to the eval approach, but then have to do things like batching forms together, etc.

mikerod17:07:42

if you don’t want it to be extremely slow for larger things

hiredman17:07:43

http://paul.stadig.name/2009/03/clojure-terracotta-we-have-repl.html was a thing

Alex Miller (Clojure team)17:07:54

I was actually working at Terracotta at this time, helping Paul, before I was a Clojure user :)

hiredman17:07:20

very cool

Alex Miller (Clojure team)17:07:54

that might have been my first intro to Clojure actually

mikerod17:07:07

networked style classloader etc, seems interesting indeed

hiredman17:07:54

which is an interesting question, if you want to send closures around, why not just turn all your machines into a single shared memory space

ghadi17:07:04

even with the very general problem statement expressed, I still think you shouldn't pass around closures

ghadi17:07:31

if you want to send code, have a control plane be in charge of worker machines or classloaders from above

ghadi17:07:13

roll the code forwards or what not, but limit what's passed in between workers to be data

ghadi17:07:45

unless that's impossible for the SLA, in which case send the code to the data like Datomic Ions do

ghadi17:07:50

it works, and it's Really Fast

Alex Miller (Clojure team)17:07:35

working on sending function instances is not something we're likely to do in Clojure proper anytime soon

Alex Miller (Clojure team)17:07:51

being able to send rehydratable var refs is something we're working on

Alex Miller (Clojure team)17:07:47

vars are actually serializable now (since 1.9? 1.10? can't remember) and when deserialized, they will re-resolve in the target

Alex Miller (Clojure team)17:07:31

what we're looking at is making the reader support that too

Alex Miller (Clojure team)17:07:20

so can read a var and have it become a resolved var again
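The Java-serialization behavior of vars can be checked with a short round trip. This assumes a Clojure version in which `clojure.lang.Var` is `Serializable` (the exact release is left open above); `roundtrip` is a hypothetical helper. Only the var's namespace and name are written, so what comes back is a reference to the var already interned in the reading runtime, not a copy of its value.

```clojure
(ns var-roundtrip
  (:import (java.io ByteArrayInputStream ByteArrayOutputStream
                    ObjectInputStream ObjectOutputStream)))

(defn roundtrip
  "Writes x with Java serialization and immediately reads it back."
  [x]
  (let [baos (ByteArrayOutputStream.)]
    (with-open [out (ObjectOutputStream. baos)]
      (.writeObject out x))
    (with-open [in (ObjectInputStream.
                    (ByteArrayInputStream. (.toByteArray baos)))]
      (.readObject in))))

;; The deserialized var re-resolves to the one already in this runtime.
(identical? #'clojure.core/inc (roundtrip #'clojure.core/inc))
;; => true
```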

mikerod17:07:56

all seems reasonable. I just chimed in based on some past experiences anyways. fortunately for me, I haven’t been fighting with this situation in recent times.

mikerod17:07:02

this var addition seems interesting

hiredman17:07:32

the other place this pops up (thorny serialization issues) is image based development

ghadi17:07:37

the var serialization stuff is still a reference to the var, not the actual contents

Alex Miller (Clojure team)17:07:04

https://clojure.atlassian.net/browse/CLJ-2165

john18:07:27

Nice!

john18:07:22

@nathanmarz I've been working on a similar thing in CLJS, [tau.alpha](https://github.com/johnmn3/tau.alpha/tree/master/src/tau/alpha), which I've recently been porting to Node.js.

john18:07:17

And I've been reworking the fn serialization stuff, and it works pretty much like shipping around bytecode

john18:07:40

but javascript as bytecode 🙂

john18:07:58

And in a web worker env or a tightly controlled cluster, I think it makes most sense to use a fully connected mesh so they appear to have a single shared memory space

john18:07:59

like @hiredman was saying

john19:07:04

the more recent version for Node is aware of locals and grabs them too, if necessary, so the code looks a little more traditional. The version linked above uses a more explicit binding-conveyance mechanism for the `on` macro. Anyway, ping me if you have any questions about it. I'll hopefully have a rough Node.js version out soon

nathanmarz19:07:40

@john cool, my case though is specifically that the code itself is shared between processes

nathanmarz19:07:46

so no need to send bytecode, since it's already there

nathanmarz19:07:15

just need to send an id for the class and whatever the fields are, which would be the closure
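A sketch of the capture half of that idea, using plain Java reflection: the Clojure compiler stores closed-over locals in instance fields of the generated fn or reify class, so the class name plus those field values is the data described here. `closure-snapshot`, `make-fn` and `make-reified` are hypothetical names; the sketch relies on compiler implementation details, and rebuilding an instance on the receiving side (finding the matching constructor) is not shown.

```clojure
(ns closure-snapshot
  (:import (java.lang.reflect Field Modifier)))

(defn closure-snapshot
  "Returns the class name of a compiled fn or reify instance plus the values
  of its non-static declared fields. Closed-over locals live in such fields;
  reify classes also carry a __meta field."
  [obj]
  {:class  (.getName (class obj))
   :fields (into {}
                 (for [^Field f (.getDeclaredFields (class obj))
                       :when (not (Modifier/isStatic (.getModifiers f)))]
                   (do (.setAccessible f true)
                       [(.getName f) (.get f obj)])))})

(defn make-fn [a] (fn [] a))
(defn make-reified [a] (reify Object (toString [_] (str a))))

(get-in (closure-snapshot (make-fn 1)) [:fields "a"])
;; => 1
(get-in (closure-snapshot (make-reified 2)) [:fields "a"])
;; => 2
```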

ghadi19:07:52

since you're not dynamically generating code, you don't need to do any of this: whichever class generates the closures should be the target

ghadi19:07:08

ask instances of that class to generate the closures

ghadi19:07:59

(aside: this is no different semantically than passing around RPC-style maps with a target "op" key)
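A minimal sketch of the RPC-style alternative, assuming both processes load the same code: only a data map with an `:op` key crosses the wire, and a multimethod on the receiving side turns it back into a closure. `rehydrate` and the `:scale`/`:offset` ops are hypothetical.

```clojure
(ns op-dispatch
  (:require [clojure.edn :as edn]))

;; Hypothetical op registry: both processes load this same code, so only
;; the op key and its plain-data arguments need to cross the wire.
(defmulti rehydrate
  "Rebuilds a closure from an RPC-style data map, dispatching on :op."
  :op)

(defmethod rehydrate :scale [{:keys [factor]}]
  (fn [x] (* factor x)))

(defmethod rehydrate :offset [{:keys [amount]}]
  (fn [x] (+ x amount)))

;; Sender: serialize plain data, never the closure itself.
(def wire (pr-str {:op :scale :factor 3}))

;; Receiver: read the data back and ask the shared code for the closure.
(def g (rehydrate (edn/read-string wire)))

(g 4)
;; => 12
```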

john20:07:35

@nathanmarz that's sort of how I'm doing it though. The reason what I'm doing mostly works is because the same compiled code is on both sides

john20:07:07

The byte code becomes mostly the serialized calling convention

john20:07:02

You can just use tagged literals for the IDs and hydrate them with the read function on the other side
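A sketch of the tagged-literal variant using custom `:readers` with `clojure.edn`. The `#my/fn` tag, the `factories` registry and the `adder` factory are hypothetical; the receiving side maps an id back to shared code and calls it with the transmitted arguments.

```clojure
(ns tagged-fn
  (:require [clojure.edn :as edn]))

;; Hypothetical factory loaded by both processes.
(defn adder [n] (fn [x] (+ x n)))

;; Hypothetical registry from factory id to factory fn.
(def factories {'adder adder})

(defn ->wire
  "Encodes a factory id and its arguments as a #my/fn tagged literal."
  [id & args]
  (str "#my/fn " (pr-str (into [id] args))))

(defn <-wire
  "Reads a #my/fn literal and calls the named factory with its arguments."
  [s]
  (edn/read-string
   {:readers {'my/fn (fn [[id & args]] (apply (factories id) args))}}
   s))

((<-wire (->wire 'adder 5)) 10)
;; => 15
```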

nathanmarz20:07:47

@ghadi how do you ask an instance of a function to generate its closure?

nathanmarz20:07:47

you mean with reflection?

ghadi20:07:06

no i mean call a Regular Function that returns a closure

ghadi20:07:38

with normal arguments, return function that closes over arguments

nathanmarz20:07:34

do you mean to write or generate the functions that need to be serializable in a special way?

ghadi20:07:33

(defn callme
  [a b c]
  (fn [x]
    ...use a b c))

nathanmarz20:07:58

not sure what you mean by that example

nathanmarz20:07:05

the code I need to be able to write would be like this:

nathanmarz20:07:09

(defn foo [a] (fn [] a))
(def f (foo 1))
(defn bar [a] (reify SomeInterface (someMethod [this] a)))
(def r (bar 2))
...
(send-to-other-process (serialize f))
(send-to-other-process (serialize r))

ghadi20:07:37

I get the impression that you're focused on mechanism and might need to step back into the problem space

ghadi20:07:07

if you control code on all nodes, you don't need to send code

ghadi20:07:28

send data that can recreate the code

nathanmarz20:07:29

yes, that's been established

nathanmarz20:07:44

the question is how do I take an instance of an arbitrary function and send that data

nathanmarz20:07:04

the data would be the class and the fields that comprise that instance (the closure)

nathanmarz20:07:47

or an instance of some object made with reify

ghadi20:07:53

send data that can recreate the objects, you can use metadata on the objects

nathanmarz20:07:54

so how would you imagine my code sample working then?

nathanmarz20:07:05

foo has to annotate its return with metadata as to what is in its closure?
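One way to read the metadata suggestion, sketched under the assumption that you control the factory functions: the factory attaches a rebuild recipe (factory symbol plus arguments) as metadata, and only that recipe is sent. `make-adder`, `fn->data` and `data->fn` are hypothetical names. This only covers closures produced by such factories, not arbitrary fn or reify instances.

```clojure
(ns meta-recreate
  (:require [clojure.edn :as edn]))

(defn make-adder
  "Hypothetical factory that records how to rebuild its own return value."
  [n]
  (with-meta (fn [x] (+ x n))
    {:factory `make-adder, :args [n]}))

(defn fn->data
  "Extracts the rebuild recipe from a closure's metadata."
  [f]
  (select-keys (meta f) [:factory :args]))

(defn data->fn
  "Resolves the factory symbol in this runtime and calls it again."
  [{:keys [factory args]}]
  (apply (resolve factory) args))

;; Round trip through an EDN string, as if crossing a process boundary.
(def h (-> (make-adder 5) fn->data pr-str edn/read-string data->fn))

(h 10)
;; => 15
```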

john21:07:20

If foo and bar can be custom types, you can customize how they print (e.g. with print-method) so that the receiving side knows to call the foo and bar constructors on them. Assuming you control foo and bar.

nathanmarz21:07:18

no, that's not the case

nathanmarz21:07:27

this needs to work for any function instance or reify instance

nathanmarz21:07:38

it sounds like the answer to my question is there has been no experimental work on the clojure compiler for this

john21:07:28

Shipping code between runtimes is generally frowned upon. I think there hasn't been a lot of interest in it until recently

ghadi21:07:05

it's one of those things that every generation re-learns is a bad idea

ghadi21:07:47

where is the #corba channel? 😉