Fork me on GitHub
#clojure-dev
<
2023-02-09
>
dmiller19:02:21

Looking at PersistentVector$TransientVector we have

Node ensureEditable(Node node){
		if(node.edit == root.edit)
			return node;
		return new Node(root.edit, node.array.clone());
	}
Straightforward. Return the existing node or create a new one. Notably, no mutation of anything. Then we get to
private Object[] editableArrayFor(int i){
		if(i >= 0 && i < cnt)
			{
			if(i >= tailoff())
				return tail;
			Node node = root;
			for(int level = shift; level > 0; level -= 5)
				node = ensureEditable((Node) node.array[(i >>> level) & 0x01f]);
			return node.array;
			}
		throw new IndexOutOfBoundsException();
	}
Unless I'm missing something, this potentially creates new new nodes all the way through the iteration, throwing away all but the last one, and for the last one, we don't need the node, just a possibly a copy of its array. Would the effect be the same as this:
private Object[] editableArrayFor(int i){
		if(i >= 0 && i < cnt)
			{
			if(i >= tailoff())
				return tail;
			Node node = root;
			for(int level = shift; level > 0; level -= 5)
				node = (Node) node.array[(i >>> level) & 0x01f];
            if (node.edit == root.edit) 
               return node.array;
            else  
               return node.array.clone();
			}
		throw new IndexOutOfBoundsException();
	}
except without all the potential copying of nodes and their arrays?

hiredman20:02:46

I think editableArrayFor could be removed entirely, and replaced with arrayFor + clone

hiredman20:02:14

I guess not entirely because you want to check edit before cloning

dmiller20:02:39

One could combine with arrayFor with a boolean parameter (yes, I know some people say these are bad) to indicate if a copying should be checked for. A question of style at this point.

hiredman20:02:50

the new copy is assigned to node, which is a local, which doesn't appear to escape the editableArrayFor, and is potentially overwritten as the for loop traverses down the tree, so it creates a new tree for mutation, but the root of the new tree doesn't appear to be saved anywhere?

hiredman20:02:31

it looks like where editableArrayFor is called (https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L854-L871) there is a second pass with ensureEditable constructing a second new tree where the tail from editableArrayFor is stuffed into

cgrand20:02:13

You are right, I replied too hastily without waiting to be in front of a proper screen. :man-bowing: The one in cljd is similar except that the new node is stored in its parent array.

(defn- tv-editable-array-for
  "Returns the editable array where i is located."
  [^TransientVector tv ^int i]
  (loop [^VectorNode node (set! (.-root tv) (tv-ensure-editable (.-edit tv) (.-root tv)))
         ^int level (.-shift tv)]
    (if (< 0 level)
      (let [arr (.-arr node)
            j (bit-and (u32-bit-shift-right i level) 31)]
        (recur (aset arr j (tv-ensure-editable (.-edit tv) (aget arr j))) (- level 5)))
      (.-arr node))))
and turns out I’m the one to blame for this code https://github.com/clojure/clojure/commit/da412909d36551a526ed090d8161cb59542db8b6 :man-bowing: :man-bowing:

dmiller20:02:41

Good to go straight to the source. 🙂

dmiller20:02:49

A tiny inefficiency waiting 11.5 years to be discovered.

😁 2
🤯 2
dmiller21:02:04

While we are on the topic, looking at TransientVector.pop()

public TransientVector pop(){
		ensureEditable();
		if(cnt == 0)
			throw new IllegalStateException("Can't pop empty vector");
		if(cnt == 1)
			{
			cnt = 0;
			return this;
			}
		int i = cnt - 1;
		//pop in tail?
		if((i & 0x01f) > 0)
			{
			--cnt;
			return this;
			}

      ...
shouldn't we be nulling out the entries in the tail array? Are we hanging on to values that potentially could be GC'd? (I haven' t gotten to the rest of pop yet, so there may be more when we are copying arrays.)

cgrand22:02:39

Only for a while as they are cleared in the call to persistent(). Is the trade off worth it? Especially nowadays where transient are not thread bound anymore :man-shrugging:

dmiller22:02:00

I just got to the point of porting persistent(). Agreed not worth it. Could you elaborate on the not thread bound anymore? Looking at all initialization and assignment to Node.edit in this code, could this field be a UUID or the value from a global counter?

cgrand22:02:43

ensureEditable used to check the current thread was the thread stored in the atomic ref. Nowadays refs are still initialized with the current thread but it’s purely vestigial. You could use an AtomicBoolean it would make no difference.

cgrand23:02:41

In fact you don’t even need a mutable reference if you can tag node references with 1 bit. That’s what we do for hash maps in Cljd.

dmiller23:02:07

Your last response came in just as I was writing a question about Atomic-anything being needed. Not sure about the 1-bit mention. don't you still need some kind of unique identification. Otherwise you do a transient, then persist it. Come back and transient on something that shares the non-root nodes that were marked (and never unmarked) during the first attempt. Don't you need to know that the you have a new transient going on, else this transient will think it already owns those nodes?

cgrand23:02:37

This one immutable bit tells whether the parent has unique ownership of the child. A node is editable if you have a continuous chain of exclusive ownership from root to node. No mutation required.

dmiller23:02:38

You are assuming some object that holds that bit and we are using reference equality, rather than a naked bit in each node? (I'm now guessing my current confusion is just my misinterpretation.)

cgrand23:02:59

If only we could really tag pointers it would be broadly applicable. That’s why we use it only in hahsmaps. In our hashmap implementation we use a 64-bit bitmap to allow tighter packing of the array. So two bits per child. Thus 4 possible states. At first we had usage only for 3 (empty, in-line pair, child node) so 41% of a perfectly working bit was going to waste 😀. So now we have 4 states : empty, in-line, shared child, exclusive child. And this frees us from having a ref to a mutable box. This makes up for the larger bitmap.

dmiller00:02:46

Lisp implementers closer to the metal had so many cool tricks involving tagging pointers. Sigh. I played around last year with doing a more F#-idiomatic approach to PersistentHashMap. Ran out of time back then. Will be revisiting soon. I'll take a look at what you've done to get some ideas. For PersistentVector, I'm not sure I can do better than a transaction id. Whether that is an int generated globally or an object with a boolean field, I'm not going to get much out of it. But an AtomicReference<Thread> really does seem like overkill.

cgrand07:02:08

How would you implement it with a transaction ID? You would need a global registry of in-flight transactions. Sounds more complex than a object with a mutable boolean field no?

dmiller13:02:45

RT.nextId() ?

cgrand13:02:08

I mean: you need to be able to tell if “transientction” is still going, who would you do it with just an id?

dmiller13:02:15

But AtomicBoolean is fine.

cgrand13:02:51

yes it provides unicity through, well, its own identity and state through its boolean field.