Tobias Sjögren 16:10:51

From what I understand, a query to the EAVT index for every attribute and value associated with a specific entity is much faster than the same query against the historic order log would be. Why is that? (I know the EAVT index contains datoms sorted by the “E” position.)


EAVT is indexed by E; the log is ordered by TX.


actually what do you mean by the “historic order log”, the history index or the tx-log?

Tobias Sjögren 16:10:01

Should that be understood as datoms sorted by E compared to sorted by TX, with no other difference?

Tobias Sjögren 16:10:41

The doc says “transaction data in historic order”, that’s why I used that wording..


that sounds like the tx-log


it’s just “every transaction ever, in the order it was written”


so if you want to know about datoms related to a specific E, you would have to inspect all transactions for that.

Tobias Sjögren 16:10:37

instead of having them grouped together as in the EAVT index?
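To make the log-versus-index difference concrete, here is a toy sketch in Java (not Datomic's actual API or storage format; the Datom and LogVsEavt classes are hypothetical). The same datoms are stored twice: once appended in transaction order, once in a set sorted by (e, a, v, tx). Finding one entity in the log means looking at every datom ever written, while the sorted copy can jump straight to the contiguous range for that E.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// A toy datom: entity, attribute, value, transaction id (hypothetical, simplified types).
final class Datom {
    final long e; final String a; final String v; final long tx;
    Datom(long e, String a, String v, long tx) { this.e = e; this.a = a; this.v = v; this.tx = tx; }
}

final class LogVsEavt {
    // Toy tx-log: datoms appended in transaction order, nothing else.
    final List<Datom> log = new ArrayList<>();
    // Toy EAVT index: the same datoms, kept sorted by (e, a, v, tx).
    final NavigableSet<Datom> eavt = new TreeSet<>(
        Comparator.comparingLong((Datom d) -> d.e)
                  .thenComparing((Datom d) -> d.a)
                  .thenComparing((Datom d) -> d.v)
                  .thenComparingLong((Datom d) -> d.tx));
    // How many datoms the last log query had to inspect.
    int scanned;

    void add(Datom d) { log.add(d); eavt.add(d); }

    // Log lookup: the log is not ordered by E, so every datom must be checked.
    List<Datom> entityFromLog(long e) {
        scanned = 0;
        List<Datom> out = new ArrayList<>();
        for (Datom d : log) {
            scanned++;
            if (d.e == e) out.add(d);
        }
        return out;
    }

    // EAVT lookup: all datoms for E sit next to each other, so take the range [e, e + 1).
    List<Datom> entityFromEavt(long e) {
        Datom lo = new Datom(e, "", "", Long.MIN_VALUE);     // smallest possible key for e
        Datom hi = new Datom(e + 1, "", "", Long.MIN_VALUE); // first key past e (exclusive)
        return new ArrayList<>(eavt.subSet(lo, true, hi, false));
    }
}
```

With 1,000 transactions spread over 100 entities, both lookups return the same 10 datoms for one entity, but the log lookup inspects all 1,000 datoms to find them.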

Tobias Sjögren 16:10:53

I need a better understanding of why datoms that sit next to each other (as in the EAVT index) are faster to query than ones that are “spread out” (as in the TX log)..


The EAVT index is a sorted, B-tree-like structure with a high branching factor and three levels. So if you know an E, and that corresponds to the sort order, finding that E's EAVT data requires reading at least 3 segments (6 if you want history), and the parent segments are more likely to be in memory anyway, since they are shared by many reads. Finding the E in the tx-log, by contrast, means examining all data ever written: with respect to E, it's as good as being unsorted.
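The arithmetic behind "three levels" can be sketched as follows. The branching factor of 1,000 and the database size of one billion datoms are hypothetical numbers chosen for illustration, not Datomic's actual segment sizes. A keyed lookup reads one segment per level, while an unsorted scan reads every segment.

```java
final class TreeCost {
    // Smallest depth d with branching^d >= datoms: the number of segments
    // (one per level) a single keyed lookup has to read.
    static int levelsNeeded(long datoms, long branching) {
        int depth = 1;
        long capacity = branching;
        while (capacity < datoms) {
            capacity = Math.multiplyExact(capacity, branching);
            depth++;
        }
        return depth;
    }

    // Segments an unsorted scan must read: all of them, ceil(datoms / perSegment).
    static long segmentsForFullScan(long datoms, long perSegment) {
        return (datoms + perSegment - 1) / perSegment;
    }

    public static void main(String[] args) {
        long branching = 1_000;        // hypothetical fan-out per segment
        long datoms = 1_000_000_000L;  // hypothetical database size
        // prints "index lookup reads ~3 segments"
        System.out.println("index lookup reads ~" + levelsNeeded(datoms, branching) + " segments");
        // prints "log scan reads 1000000 segments"
        System.out.println("log scan reads " + segmentsForFullScan(datoms, branching) + " segments");
    }
}
```

The high branching factor is what keeps the tree shallow: each extra level multiplies capacity by 1,000 in this sketch, so three reads cover a billion datoms, and the root and middle segments are the ones most likely to already be cached.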

Tobias Sjögren 17:10:50

Would you say putting an entity with all its attributes/key-values in a JSON object is somehow similar to the EAVT B-tree ?


depends on the details. JSON objects are typically implemented as hashmaps, so probably not. At small scale it wouldn't matter, but then nothing matters at a small enough scale

Linus Ericsson 09:10:33

Please look into this introduction to Clojure PersistentVector to understand what optimizations it offers. This structure is (more or less) used in Datomic.

Linus Ericsson 09:10:22

HashMaps are in general not as fast as PersistentVector and other B-tree-like structures (if you know what you're looking for, that is), because they have to look up data stored at random positions in memory. CPUs really like data stored in sequential memory blocks, in the same 4 KB RAM page, etc.

Tobias Sjögren 10:10:30

What about data on disk, in contrast to in memory? (I need to learn more about this stuff..)

Linus Ericsson 10:10:51

Datomic is a typical SSD database, and paging there is similar to how it works in the CPU cache pipeline. But a spinning disk also has the concept of pages (with memory-mapped paging to disk heavily optimized in the OS kernel). Datomic doesn't use this directly, though; it benefits through the databases that do (DynamoDB, for instance).

Tobias Sjögren 05:10:14

So far I've just had a quick look at the link about PersistentVector. It seems to solve the problem of combining immutability and non-redundancy: there's immutability but no copying. Is that correct?

Tobias Sjögren 09:10:09

(the author says ”persistence” to mean ”immutability”, I guess)


I'll try to untangle the words "persistence" and "immutability". Someone please correct me if I'm wrong.


"Persistence" means that when you do any operation on a data structure, you can still access the previous version. There are a bunch of techniques to implement this. For example, you could imagine that (assoc A 0 :foo) first made a copy A', then modified A' in memory and returned that. Or that each element of A keeps track of version numbers, such that assoc creates an entry v2 for the first element, and you coordinate things such that A' points to v2 and A points to v1. Persistence alone doesn't imply immutability or sharing of structure; you can have persistence without immutability.
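The version-number technique described above can be sketched in Java like this. It is a hypothetical "fat node" array, not how Clojure implements vectors: each slot keeps every value it has ever held, keyed by version, and a version handle is just a number. The shared storage is mutated on every assoc, yet old versions stay readable, which is persistence without immutability. To keep the sketch simple it only allows updating the newest version (so-called partial persistence); branching from old versions would need a version tree.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Hypothetical fat-node array: each slot maps version -> value.
final class VersionedArray {
    private final List<TreeMap<Integer, Object>> slots = new ArrayList<>();
    private int latestVersion = 0;

    VersionedArray(Object... initial) {
        for (Object o : initial) {
            TreeMap<Integer, Object> history = new TreeMap<>();
            history.put(0, o);  // every slot starts at version 0
            slots.add(history);
        }
    }

    // A version handle: just a number pointing into the shared, mutable storage.
    final class Version {
        final int v;
        Version(int v) { this.v = v; }

        // Value of slot i as of this version: the newest entry with version <= v.
        Object get(int i) { return slots.get(i).floorEntry(v).getValue(); }

        // "assoc": mutate the shared storage in place and return a handle to the new version.
        Version assoc(int i, Object value) {
            if (v != latestVersion) {
                throw new IllegalStateException("only the newest version can be updated in this sketch");
            }
            latestVersion++;
            slots.get(i).put(latestVersion, value);
            return new Version(latestVersion);
        }
    }

    Version initial() { return new Version(0); }
}
```

After a2 = a.assoc(0, "foo"), a.get(0) still returns the old value and a2.get(0) returns "foo", even though both handles read from the same mutated storage.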


"Immutability" refers to the fact that a data structure, once created, cannot be modified. The only way to modify it would be to copy it first. Taken together, persistence and immutability give you some very strong guarantees about your data structures. Concurrent access becomes much simpler, since any reference to A will always be valid and always return the same value.


In Clojure, it's not true that there's no copying involved; otherwise things wouldn't be immutable. But there is "as little" copying as possible, and thanks to the tree implementation of vectors that "little" really is quite small: only the path from the root to the modified part is copied.
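Path copying can be sketched with a small 32-way trie in Java. This is loosely modeled on the idea behind Clojure's PersistentVector (32 children per node, 5 index bits per level) but is not its actual source: the PathCopyVector class is hypothetical, fixed-size, and has no append. An update copies only the nodes on the root-to-leaf path; every other node is shared between the old and new versions, and neither version is ever modified.

```java
import java.util.Arrays;

// Hypothetical immutable vector stored as a 32-way trie of Object[] nodes.
final class PathCopyVector {
    static final int BITS = 5;
    static final int WIDTH = 1 << BITS;  // 32 children per node
    static final int MASK = WIDTH - 1;   // low 5 bits select a child

    final Object[] root;
    final int shift;  // BITS * (number of levels - 1)
    final int size;

    private PathCopyVector(Object[] root, int shift, int size) {
        this.root = root; this.shift = shift; this.size = size;
    }

    static PathCopyVector of(Object... values) {
        int shift = 0;
        // Add levels until the trie can hold all values (capacity = 32 << shift).
        while (((long) WIDTH << shift) < values.length) shift += BITS;
        return new PathCopyVector(build(values, 0, shift), shift, values.length);
    }

    private static Object[] build(Object[] values, int offset, int shift) {
        Object[] node = new Object[WIDTH];
        for (int i = 0; i < WIDTH; i++) {
            int start = offset + (i << shift);  // first index covered by child i
            if (start >= values.length) break;
            node[i] = shift == 0 ? values[start] : build(values, start, shift - BITS);
        }
        return node;
    }

    Object get(int index) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        Object[] node = root;
        for (int s = shift; s > 0; s -= BITS) node = (Object[]) node[(index >>> s) & MASK];
        return node[index & MASK];
    }

    // Returns a new vector; the original is untouched.
    PathCopyVector assoc(int index, Object value) {
        if (index < 0 || index >= size) throw new IndexOutOfBoundsException(String.valueOf(index));
        return new PathCopyVector(assocIn(root, shift, index, value), shift, size);
    }

    private static Object[] assocIn(Object[] node, int shift, int index, Object value) {
        Object[] copy = Arrays.copyOf(node, node.length);  // copy only this node
        if (shift == 0) {
            copy[index & MASK] = value;
        } else {
            int child = (index >>> shift) & MASK;
            copy[child] = assocIn((Object[]) node[child], shift - BITS, index, value);
        }
        return copy;
    }
}
```

For a 100-element vector, assoc(70, "foo") copies two 32-slot arrays (the root and the leaf holding indices 64 to 95); the other three leaves are shared by both versions.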

Tobias Sjögren 11:10:54

From the ”Programming Clojure” book (2018): ”When all data is immutable, “update” translates into “create a copy of the original data, plus my changes.”” ”…persistent means that the data structures preserve old copies of themselves by efficiently sharing structure between older and newer versions.”

Tobias Sjögren 11:10:23

I'm used to seeing the word ”persistence” in the context of ”persistent storage” and ”persistence layer”, which is something else..