This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
From what I understand, a query to the EAVT index to get every attribute and value associated with a specific entity is much faster than the same query against the log of transactions in historic order would be. Why is that? (I know the EAVT index contains datoms sorted by the “E” position.)
Actually, what do you mean by the “historic order log”: the history index or the tx-log?
Should that be understood as datoms sorted by E compared to datoms sorted by TX, with no other difference?
The doc says “transaction data in historic order”; that’s why I used that wording.
so if you want to know about datoms related to a specific E, you would have to inspect all transactions for that.
instead of having them grouped together as in the EAVT index?
I need a better understanding of why datoms that sit next to each other (as in the EAVT index) are faster to query than ones that are “spread out” (as in the TX log).
The EAVT index is a sorted, B-tree-like structure with a high branching factor and three levels. If you know an E, and that corresponds to the sort order, finding its EAVT data requires reading at least 3 segments (6 if you want history), and the parent segments are more likely to be in memory anyway since they are shared by many reads. Finding the E in the tx-log means examining all data ever written; for that query the log is as good as unsorted.
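A toy Python model of that difference, which is not Datomic's actual storage: datoms are plain `(e, a, v, tx)` tuples, the "tx-log" is a list in transaction order, and the "EAVT index" is the same datoms sorted, with binary search standing in for walking down the B-tree. `make_log`, `scan_log`, `lower_bound` and `lookup_eavt` are hypothetical helpers made up for this illustration.

```python
import random

def make_log(n_entities, attrs_per_entity, seed=0):
    """Hypothetical tx-log: (e, a, v, tx) datoms appended in transaction order."""
    rng = random.Random(seed)
    writes = [(e, a) for e in range(n_entities) for a in range(attrs_per_entity)]
    rng.shuffle(writes)  # entities are written in no particular order over time
    return [(e, f"attr{a}", f"v{e}-{a}", tx) for tx, (e, a) in enumerate(writes, start=1)]

def scan_log(log, e):
    """No sort order on E: every datom in the log has to be examined."""
    return [d for d in log if d[0] == e], len(log)

def lower_bound(items, key):
    """Index of the first item >= key, plus the number of comparisons made."""
    lo, hi, compared = 0, len(items), 0
    while lo < hi:
        mid = (lo + hi) // 2
        compared += 1
        if items[mid] < key:
            lo = mid + 1
        else:
            hi = mid
    return lo, compared

def lookup_eavt(eavt, e):
    """Datoms sorted by (e, a, v, tx): one entity's datoms form a contiguous run."""
    start, c1 = lower_bound(eavt, (e,))      # (e,) sorts before any (e, ...) tuple
    end, c2 = lower_bound(eavt, (e + 1,))    # first datom of the next entity
    return eavt[start:end], c1 + c2 + (end - start)

log = make_log(n_entities=1000, attrs_per_entity=5)
eavt = sorted(log)  # sort order E, A, V, T
from_log, examined_log = scan_log(log, 42)
from_index, examined_index = lookup_eavt(eavt, 42)
```

With 1000 entities of 5 attributes each, the log scan examines all 5000 datoms, while the sorted lookup needs at most about 30 comparisons and reads (two binary searches plus the 5 matching datoms). Segments, caching and history are left out; only the effect of sort order is shown.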
Would you say putting an entity with all its attributes/key-values in a JSON object is somehow similar to the EAVT B-tree ?
Depends on the details. JSON objects are typically implemented as hashmaps, so probably not. At small scale it wouldn’t matter, but then nothing matters at a small enough scale.
Please look into this introduction to Clojure PersistentVector to understand what optimizations it offers. This structure is (more or less) used in Datomic. https://hypirion.com/musings/understanding-persistent-vector-pt-1
Hash maps are in general not as fast as PersistentVector and other B-tree-like structures (if you know what you're looking for, that is), because they have to look for data stored at random positions in memory. CPUs really like it when data is stored in sequential memory blocks, in the same 4 KB RAM page, etc.
What about data on disk, in contrast to in memory? (I need to learn more about this stuff..)
Datomic is a typical SSD-based database, and there paging works similarly to how it works in the CPU/cache pipeline. But a spinning disk also has the concept of pages (with memory-mapped paging to disk heavily optimized in the OS kernel). Datomic doesn't use this directly, though, but benefits through the databases that do (DynamoDB, for instance).
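To put numbers on why wide, shallow trees suit paged storage: if every level visited costs roughly one page or segment read, the number of reads is the tree height. This Python sketch (the helper `levels_needed` is hypothetical) computes the smallest height for a given branching factor; the one billion entries and the branching factor of 1000 are arbitrary illustrative figures, not Datomic's actual segment sizes.

```python
def levels_needed(n_entries, branching):
    """Smallest number of tree levels whose fan-out reaches n_entries leaf slots."""
    levels, capacity = 1, branching
    while capacity < n_entries:
        capacity *= branching
        levels += 1
    return levels

# Each level visited is roughly one page/segment read when it is not already cached.
binary_tree = levels_needed(10**9, 2)     # 30 levels
wide_tree = levels_needed(10**9, 1000)    # 3 levels
```

A binary tree over a billion entries is 30 levels deep, while a 1000-way tree needs only 3, and its few upper nodes are shared by every lookup, so they tend to stay cached.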
So far I’ve just had a quick look at the link about PersistentVector - it seems it solves the problem of combining immutability and non-redundancy, there’s immutability but no copying - is that correct?
(the author says ”persistence” to mean ”immutability”, I guess)
I'll try to untangle the words "persistence" and "immutability". Someone please correct me if I'm wrong.
"Persistence" means that when you do any operation on a data structure, you can still access the previous version. There's a bunch of techniques to implement this. For example, you could imagine that `(assoc A 0 :foo)` first made a copy `A'`, then modified `A'` in memory and returned that. Or that each element of `A` keeps track of version numbers, such that `assoc` creates an entry `v2` for the first element, and you coordinate things such that `A'` points to `v2` and `A` points to `v1`. Persistence alone doesn't imply immutability or sharing of structure; you can have persistence without immutability.
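A Python sketch of the version-number technique, often called the "fat node" method: each slot keeps a list of `(version, value)` entries that is mutated in place, and a view only reads entries up to its own version. `VersionedVector` and `VersionView` are hypothetical names. For simplicity only the newest version can be updated (partial persistence), since branching off an old version would need a version tree rather than a single counter.

```python
class VersionedVector:
    """Shared, mutable storage: every slot records its value history."""

    def __init__(self, items):
        self.slots = [[(1, x)] for x in items]  # (version, value), oldest first
        self.latest = 1

    def view(self):
        """A handle on the newest version."""
        return VersionView(self, self.latest)


class VersionView:
    """Reads the vector as it was at a fixed version number."""

    def __init__(self, store, version):
        self.store = store
        self.version = version

    def get(self, i):
        # Newest entry in slot i that is not newer than this view.
        for v, value in reversed(self.store.slots[i]):
            if v <= self.version:
                return value
        raise IndexError(i)

    def assoc(self, i, value):
        if self.version != self.store.latest:
            # Partial persistence: old versions stay readable but not writable.
            raise ValueError("only the newest version can be updated")
        self.store.latest += 1
        self.store.slots[i].append((self.store.latest, value))  # in-place mutation
        return VersionView(self.store, self.store.latest)

    def to_list(self):
        return [self.get(i) for i in range(len(self.store.slots))]


a = VersionedVector([1, 2, 3]).view()   # a sees version 1
a2 = a.assoc(0, "foo")                  # a2 sees version 2
```

Here `a.to_list()` still returns `[1, 2, 3]` and `a2.to_list()` returns `['foo', 2, 3]`, even though the underlying slot list was mutated: persistent, but not immutable.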
"Immutability" refers to the fact that a data structure, once created, cannot be modified. The only way to modify it would be to copy it first. Taken together, persistence and immutability give you some very strong guarantees about your data structures. Concurrent access becomes much simpler, since any reference to A will always be valid and always return the same value.
In Clojure, it's not true that there's no copying involved. Otherwise things wouldn't be immutable. But there is "as little" copying as possible, and due to tree-implementation of vectors that "little" really is quite small, copying only the paths to the modified parts.
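A stripped-down Python sketch of path copying in a 32-way trie of immutable tuples, loosely modelled on the structure described in the linked PersistentVector article. It only supports building a trie and `assoc` on an existing index (no `conj`, no tail optimization), and `build`, `get` and `assoc` are hypothetical helpers, not Clojure's implementation.

```python
BITS = 5
WIDTH = 1 << BITS   # 32-way branching
MASK = WIDTH - 1

def build(items):
    """Pack items into a trie of nested tuples; returns (root, shift)."""
    nodes = [tuple(items[i:i + WIDTH]) for i in range(0, len(items), WIDTH)] or [()]
    shift = 0
    while len(nodes) > 1:  # group nodes 32 at a time until one root remains
        nodes = [tuple(nodes[i:i + WIDTH]) for i in range(0, len(nodes), WIDTH)]
        shift += BITS
    return nodes[0], shift

def get(root, shift, i):
    node = root
    while shift > 0:  # use 5 bits of the index per level to pick a child
        node = node[(i >> shift) & MASK]
        shift -= BITS
    return node[i & MASK]

def assoc(node, shift, i, value):
    """Return a new root; only the nodes on the path to index i are copied."""
    idx = (i >> shift) & MASK
    child = value if shift == 0 else assoc(node[idx], shift - BITS, i, value)
    return node[:idx] + (child,) + node[idx + 1:]  # siblings are shared, not copied

v1, shift = build(list(range(2000)))   # root -> 2 internal nodes -> 63 leaves
v2 = assoc(v1, shift, 1500, "foo")
```

After the update, `v1` still has 1500 at index 1500 and `v2` has `"foo"`, yet `v2[0] is v1[0]` holds: only 3 of the 66 nodes (the root, one internal node and one leaf) were copied.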
From the ”Programming Clojure” book (2018): ”When all data is immutable, “update” translates into “create a copy of the original data, plus my changes.”” ”…persistent means that the data structures preserve old copies of themselves by efficiently sharing structure between older and newer versions.”
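Taken literally, "create a copy of the original data, plus my changes" is the naive approach: copy everything on every update. Python tuples are immutable, so they can illustrate it; `assoc_copy` is a hypothetical helper named after Clojure's `assoc`.

```python
def assoc_copy(vec, i, value):
    """Return a new tuple equal to vec except at index i, by copying all of vec."""
    new = list(vec)    # full O(n) copy of the original
    new[i] = value     # apply the change to the copy only
    return tuple(new)  # freeze the new version

a = (1, 2, 3)
a2 = assoc_copy(a, 0, "foo")

in_place_failed = False
try:
    a[0] = "foo"       # tuples reject in-place modification
except TypeError:
    in_place_failed = True
```

`a` is still `(1, 2, 3)` for anyone holding it and `a2` is `('foo', 2, 3)`, but each update costs a copy proportional to the whole collection, which is the cost that sharing structure between versions avoids.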
I’m used to seeing the word ”persistence” in the context of ”persistent storage” and ”persistence layer”, which is something else.