clojure-dev

dmiller 2023-02-09T19:51:21.935569Z

Looking at PersistentVector$TransientVector we have

Node ensureEditable(Node node){
		if(node.edit == root.edit)
			return node;
		return new Node(root.edit, node.array.clone());
	}
Straightforward. Return the existing node or create a new one. Notably, no mutation of anything. Then we get to
private Object[] editableArrayFor(int i){
		if(i >= 0 && i < cnt)
			{
			if(i >= tailoff())
				return tail;
			Node node = root;
			for(int level = shift; level > 0; level -= 5)
				node = ensureEditable((Node) node.array[(i >>> level) & 0x01f]);
			return node.array;
			}
		throw new IndexOutOfBoundsException();
	}
Unless I'm missing something, this potentially creates new new nodes all the way through the iteration, throwing away all but the last one, and for the last one, we don't need the node, just a possibly a copy of its array. Would the effect be the same as this:
private Object[] editableArrayFor(int i){
		if(i >= 0 && i < cnt)
			{
			if(i >= tailoff())
				return tail;
			Node node = root;
			for(int level = shift; level > 0; level -= 5)
				node = (Node) node.array[(i >>> level) & 0x01f];
            if (node.edit == root.edit) 
               return node.array;
            else  
               return node.array.clone();
			}
		throw new IndexOutOfBoundsException();
	}
except without all the potential copying of nodes and their arrays?

2023-02-09T20:28:46.078989Z

I think editableArrayFor could be removed entirely, and replaced with arrayFor + clone

2023-02-09T20:37:14.151569Z

I guess not entirely because you want to check edit before cloning

dmiller 2023-02-09T20:53:39.184149Z

One could combine with arrayFor with a boolean parameter (yes, I know some people say these are bad) to indicate if a copying should be checked for. A question of style at this point.

2023-02-09T20:02:50.679439Z

the new copy is assigned to node, which is a local, which doesn't appear to escape the editableArrayFor, and is potentially overwritten as the for loop traverses down the tree, so it creates a new tree for mutation, but the root of the new tree doesn't appear to be saved anywhere?

2023-02-09T20:06:31.755299Z

it looks like where editableArrayFor is called (https://github.com/clojure/clojure/blob/master/src/jvm/clojure/lang/PersistentVector.java#L854-L871) there is a second pass with ensureEditable constructing a second new tree where the tail from editableArrayFor is stuffed into

cgrand 2023-02-09T20:44:13.212759Z

You are right, I replied too hastily without waiting to be in front of a proper screen. 🙇‍♂️ The one in cljd is similar except that the new node is stored in its parent array.

(defn- tv-editable-array-for
  "Returns the editable array where i is located."
  [^TransientVector tv ^int i]
  (loop [^VectorNode node (set! (.-root tv) (tv-ensure-editable (.-edit tv) (.-root tv)))
         ^int level (.-shift tv)]
    (if (< 0 level)
      (let [arr (.-arr node)
            j (bit-and (u32-bit-shift-right i level) 31)]
        (recur (aset arr j (tv-ensure-editable (.-edit tv) (aget arr j))) (- level 5)))
      (.-arr node))))
and turns out I’m the one to blame for this code https://github.com/clojure/clojure/commit/da412909d36551a526ed090d8161cb59542db8b6 🙇‍♂️ 🙇‍♂️

dmiller 2023-02-10T13:04:45.164679Z

RT.nextId() ?

cgrand 2023-02-10T13:06:08.426349Z

I mean: you need to be able to tell if “transientction” is still going, who would you do it with just an id?

dmiller 2023-02-10T13:07:15.332679Z

But AtomicBoolean is fine.

cgrand 2023-02-10T13:11:51.419599Z

yes it provides unicity through, well, its own identity and state through its boolean field.

dmiller 2023-02-09T20:54:41.499429Z

Good to go straight to the source. 🙂

dmiller 2023-02-09T20:55:49.958919Z

A tiny inefficiency waiting 11.5 years to be discovered.

🤯 1
😁 1
baptiste-from-paris 2023-02-09T21:11:13.073039Z

Lol, funny

dmiller 2023-02-09T21:18:04.283029Z

While we are on the topic, looking at TransientVector.pop()

public TransientVector pop(){
		ensureEditable();
		if(cnt == 0)
			throw new IllegalStateException("Can't pop empty vector");
		if(cnt == 1)
			{
			cnt = 0;
			return this;
			}
		int i = cnt - 1;
		//pop in tail?
		if((i & 0x01f) > 0)
			{
			--cnt;
			return this;
			}

      ...
shouldn't we be nulling out the entries in the tail array? Are we hanging on to values that potentially could be GC'd? (I haven' t gotten to the rest of pop yet, so there may be more when we are copying arrays.)

cgrand 2023-02-09T22:28:39.223259Z

Only for a while as they are cleared in the call to persistent(). Is the trade off worth it? Especially nowadays where transient are not thread bound anymore 🤷‍♂️

dmiller 2023-02-09T22:39:00.645209Z

I just got to the point of porting persistent(). Agreed not worth it. Could you elaborate on the not thread bound anymore? Looking at all initialization and assignment to Node.edit in this code, could this field be a UUID or the value from a global counter?

cgrand 2023-02-09T22:59:43.230379Z

ensureEditable used to check the current thread was the thread stored in the atomic ref. Nowadays refs are still initialized with the current thread but it’s purely vestigial. You could use an AtomicBoolean it would make no difference.

cgrand 2023-02-09T23:06:41.663669Z

In fact you don’t even need a mutable reference if you can tag node references with 1 bit. That’s what we do for hash maps in Cljd.

dmiller 2023-02-09T23:11:07.399579Z

Your last response came in just as I was writing a question about Atomic-anything being needed. Not sure about the 1-bit mention. don't you still need some kind of unique identification. Otherwise you do a transient, then persist it. Come back and transient on something that shares the non-root nodes that were marked (and never unmarked) during the first attempt. Don't you need to know that the you have a new transient going on, else this transient will think it already owns those nodes?

cgrand 2023-02-09T23:12:37.652029Z

This one immutable bit tells whether the parent has unique ownership of the child. A node is editable if you have a continuous chain of exclusive ownership from root to node. No mutation required.

dmiller 2023-02-09T23:19:38.987529Z

You are assuming some object that holds that bit and we are using reference equality, rather than a naked bit in each node? (I'm now guessing my current confusion is just my misinterpretation.)

cgrand 2023-02-09T23:56:59.770949Z

If only we could really tag pointers it would be broadly applicable. That’s why we use it only in hahsmaps. In our hashmap implementation we use a 64-bit bitmap to allow tighter packing of the array. So two bits per child. Thus 4 possible states. At first we had usage only for 3 (empty, in-line pair, child node) so 41% of a perfectly working bit was going to waste 😀. So now we have 4 states : empty, in-line, shared child, exclusive child. And this frees us from having a ref to a mutable box. This makes up for the larger bitmap.

dmiller 2023-02-10T00:43:46.411369Z

Lisp implementers closer to the metal had so many cool tricks involving tagging pointers. Sigh. I played around last year with doing a more F#-idiomatic approach to PersistentHashMap. Ran out of time back then. Will be revisiting soon. I'll take a look at what you've done to get some ideas. For PersistentVector, I'm not sure I can do better than a transaction id. Whether that is an int generated globally or an object with a boolean field, I'm not going to get much out of it. But an AtomicReference<Thread> really does seem like overkill.

cgrand 2023-02-10T07:40:08.225319Z

How would you implement it with a transaction ID? You would need a global registry of in-flight transactions. Sounds more complex than a object with a mutable boolean field no?