I just discovered Data Semantics (J. R. Abrial, 1974) and Database Design – The Semantic Modeling Approach (Naphtali Rishe, 1992). Has anyone here read them? They seem like early predecessors of Datomic.
Abrial’s “Data Semantics” appears to be the first data modeling paper to make explicit use of binary relations as the foundational construct, which naturally yields triples...
You have me curious - don't suppose you have a link/copy handy?
This is another paper that is relevant I think: https://cake.fiu.edu/Publications/Rishe-90-SB.Semantic_Binary_Database_Model.published-scanned.pdf
Nice find! All the Date/Darwen 6NF history is also very relevant. As is http://daslab.seas.harvard.edu/reading-group/papers/dsm.pdf
The docs state: > Diagnostic: if your [Memcache] average is substantially less than 1.0, then you can serve more reads from Memcache by increasing the size of your memcached cluster. But I’m a bit unclear on what “average” would be here in the process monitor logs. Is it the ratio of sum to count?
:Memcache {:lo 0, :hi 1, :sum 93, :count 137}I see, that makes sense and also likely since we are giving quite a bit of ram
SUM/COUNT
is 0.67 considered “substantially less than 1.0”. It would not shock me that we could increase memcached cluster capacity, however when looking at the utilization stats it doesn’t appear to be particularly full or have high eviction
So, that count is pretty low per minute meaning your object cache hit ratio is probably actually pretty good.
Is there a way to achieve the same performance as On Prem d/index-range but with reverse index traversal over tuple indices using something like d/datoms?
not currently but we're considering it. Can you describe the use case and what your schema is?
It's for https://github.com/theronic/eacl, my SpiceDB-compatible ReBAC authorization library built in Clojure and backed by Datomic.
EACL calculates multiple paths through the permission graph and then recursively traverses those paths "in parallel", interleaving + deduplicating results in a stable return order.
Since parallel paths can emit duplicate results, EACL lazily deduplicates results and return either subjects or resources in a stable order that matches the order of resource eids stored at-rest, from one of two Relationship indices. Return order must be stable to enable cursor-based pagination, since sorting later would require materialising the full index (can be large, >1M results).
The https://github.com/theronic/eacl/blob/main/src/eacl/datomic/schema.clj#L180-L211 is two tuple indices that store a sorted set of Relationships when either the subject or resource is known:
• :eacl.relationship/subject-type+subject+relation-name+resource-type+resource for LookupResources, where subject is known, but resources are unknown, and
• :eacl.relationship/resource-type+resource+relation-name+subject-type+subject for LookupSubjects, where resource is known but matching subjects are unknown, as well as CheckPermission.
The lack of reverse-order traversal in d/index-range makes it hard to do cursor-based pagination in reverse. This makes "< Previous Page" navigation costly: to go back a page, you need to hold on to all previously seen cursors in a list of cursors and pop cursors off the list to navigate to the previous page, since you cannot construct the previous page using only the current cursor.
Queries look https://github.com/theronic/eacl/blob/main/src/eacl/datomic/impl/indexed.clj#L264-L288, where we first calculate possible paths through the graph and then traverse those paths recursively, returning lazy deduplicated seqs. I'm sure there are many areas to optimize and functions can be pulled apart, but it works 🙂
You can get close but not exactly the same with d/index-pull. There's an additional cost for what you select, even if what you select is already in the datom you are scanning.
does d/index-pull support nil wildcards in tuple end values like d/index-range?
nil inside a tuple isn't really a wildcard in either case. It's just a value that sorts lower than any other
hmm, good point I can just ask for Long/MAX_VALUE as "wildcard" in end value and take N using d/index-pull?
generally if you're termination condition depends on values within a tuple, omit a termination value and terminate via take-while with appropriate predicate
this is true for all the lazy scanners (d/datoms, d/seek-datoms, d/index-range, d/index-pull)
we don't have an official "highest value" or "lowest value" sentinel value
Roughly how much slower is d/index-pull vs d/index-range to gain reverse traversal?
like 10%? 2x?
all the extra work is satisfying whatever :selector asks for
so how much slower depends on what's in there
in my case, I only care about the tail position of the tuple value, which happens to be an eid
so it will cost you an AEVT per item
that said, I hit the returned results with a d/entity lookup later on anyway, so maybe that can be consolidated
d/entity would use EAVT, so it would be superior to that
so would standalone d/pull
d/pull-many, query :find's pull, and index-range all bias towards AEVT for reads to satisfy the pull expression, on the theory that AEVT is more selective when pulling a few datoms from a large number of entities
d/pull and d/entity assume you want a lot from a single entity, so favor EAVT
thanks for the info. I get the idea. (Can't click into AGPL stuff though)
Can't click into? Fully intend to relax the licence to something more permissive soon, just need to get approval from employer.
@favila oh, so d/index-pull https://docs.datomic.com/indexes/index-pull.html walk :aevt or :avet, but not a specific tuple index? that's gonna be way too slow. I was hoping it could take a tuple attr as :index.
the people need an d/index-range with :reverse true, please
d/index-pull can do :avet , which is exactly what d/index-range does
d/index-pull can only do :avet or :aevt
I'm not sure what you mean by "not a specific tuple index". :start requires an a
(d/index-range db :attr [start tuple] nil) is like (d/index-pull db {:index :avet :start [:attr [start tuple]})
ohh, I see. It was not clear to me that :start could have [:tuple-attr start-tuple], because I wasn't sure how it would destructure it.
A tuple example in the docs would go a long way. I totally discarded index-pull as a solution from reading the docs, because I assumed it could only query simple attrs on entities. This can work then.
tuples are just values 🤷 no difference
so if you want to traverse a tuple attr with d/index-pull and you specify :start as [:year+semester [2000 1]], what do you put in :selector, :course/year? I don't understand how it destructures the tuple value, if it all.
In my case, I want the tupleAttr in the https://github.com/theronic/eacl/blob/main/src/eacl/datomic/schema.clj#L187 which is :eacl.relationship/resource (for lookup-resources)
it does not destructure the tuple value, it pulls from E
equivalent to this
right, so it matches on the tuple, but selects over :e, and in my case the :selector would be [:eacl.relationship/resource]?
(->> (d/index-range db :year+semester [2000 1] nil)
(map (fn [[e]]
(d/pull db selector e))))sure, or you could pull the tuple itself if you want the whole thing
would putting the whole tuple be any faster?
no
it's going to do that AVET read to satisfy the pull no matter what
that's what makes this inferior to a true reverse-index scan for your use case
I'll need to benchmark. Luckily there are plenty of low-hanging optimizations remaining in EACL, like caching permission paths, using refs for relations & permissions instead of matching on keywords, and a few other things, possibly pmap, and optimizing lazy-dedupe-merge-sort. Thankfully SpiceDB does not support reverse lookup 😅, but we want it internally.
@favila why do you say d/index-pull has to do an AVET read if the :e is known? Wouldn't that be an EAVT given that :e of the attr (e.g. tuple) is available?
as per equivalence,
(->> (d/index-range db :year+semester [2000 1] nil)
(map (fn [[e]]
(d/pull db [:year] e))))
...wouldn't d/pull select over e via :eavt?Sorry meant aEvt
Code is imprecise. More precise would be using d/pull-many
OK, I think I understand: it uses :AE* since both A & E are known, it should be better performance than EA* since num of e's are large, and have better...locality?
i.e. in a DB with millions of entities, not all e's will have a given A, so traversing via AE* first is more selective than EA*, which might span more pages in storage.
If you want few A over many E, EAVT is almost guaranteed to have A you don’t want
(Per segment of index)
Whereas AEVT will only at worse have E you don’t want