Might be a little pedantic, so feel free to ignore.
Entity identifiers (https://docs.datomic.com/transactions/transaction-data-reference.html#entity-identifiers) consist of, among other things, an eid = nat-int.
The pull signature is (pull db pattern eid & {:as options}), but eid here seems to refer to any of the entity-identifiers, except temp-id.
Under list forms (https://docs.datomic.com/transactions/transaction-data-reference.html#list-forms), :db/add references an entity-id but, as far as I can see, only supports an eid, while :db/retract supports "entity-identifiers, except temp-id".
I'm trying to make an internal API clear.
What do people use (as in variable names) to refer to:
• eid = nat-int
• "entity-identifiers, except temp-id"
• entity-identifiers as defined in the first link
:db/add accepts any entity identifier. Datomic public APIs only have two classes of input: entity identifier, and entity identifier + tempid.
If you are designing internal apis that care about distinguishing and accepting only certain identifier types and not others (why?), you should use specs and docs and not rely on idioms
> If you are designing internal apis that care about distinguishing and accepting only certain identifier types and not others (why?)
it seems :db/add works for existing entity-identifiers (ie. a lookup-ref that has already been transacted) while the pull and map-transact syntax doesn't care.
So :db/add supports temp-id or existing other entity-identifiers
where there are others that support existing entity-identifiers and non-existing entity identifiers but not temp-ids
> you should use specs and docs and not rely on idioms
Sure I do have schemas and specs, but I was hoping for some better names for those
As context this is part of migrating data from one system to another
Since we're doing it gradually, some things may already exist in the target and can be updated
Some may not, and need to be transacted for the first time.
We were originally planning on doing the diff and returning a list of datoms :db/add :db/retract to be transacted later, but we can't know the db/id ahead of time, and we can't upsert in a :db/add.
This caught me off guard, and that started the discussion about what we name the identifiers in different cases
Not a big point of contention though, there are ways around it
(yes I know we can use temp-ids in the list of datoms, it just complicates the logic much more than using lookup-refs)
I have noticed the inconsistency in the documentation as well. And we have struggled at my company with choosing good idiomatic names for entity identifiers. It's not a big deal, but it would be nice to have an unambiguous vocabulary word for that which can be used in a transaction and that which can be used in a query.
> We were originally planning on doing the diff and returning a list of datoms :db/add :db/retract to be transacted later, but we can't know the db/id ahead of time, and we can't upsert in a :db/add.
Doesn't this mean it's not type that matters, but provenance? You care about the source of an eid, not whether it's a lookup ref or not
@helberg.andre, when referring to internal Datomic :db/id Long IDs, I use symbol eid.
For identifiers like [:unique/ident 123], I use ident, which can be passed to d/entity or d/entid.
@helberg.andre what do you mean by you can't upsert with :db/add?
@petrus That's good, but doesn't help in the case where the field can be some superset of those: ident|eid|lookup-ref
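One way to pin the three names down is to write them as specs. This is only a sketch: the spec names (`::eid`, `::ident`, `::lookup-ref`, `::tempid`, `::entity-ref`, `::tx-entity-ref`) are hypothetical and not part of Datomic, and tempids are simplified to strings here.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical names, chosen for illustration; Datomic does not ship these specs.
(s/def ::eid nat-int?)                         ; internal Long entity id
(s/def ::ident keyword?)                       ; a :db/ident keyword
(s/def ::lookup-ref (s/tuple keyword? some?))  ; [unique-attr value]
(s/def ::tempid string?)                       ; assumption: string tempids only

;; "entity-identifiers, except temp-id"
(s/def ::entity-ref
  (s/or :eid ::eid :ident ::ident :lookup-ref ::lookup-ref))

;; entity identifier + tempid
(s/def ::tx-entity-ref
  (s/or :tempid ::tempid :entity-ref ::entity-ref))

;; s/conform tags the value with the branch that matched:
(s/conform ::entity-ref [:unique/ident 123])
(s/valid? ::entity-ref "new-thing")
(s/valid? ::tx-entity-ref "new-thing")
```

The `s/or` tags make it cheap to branch on which kind of identifier was passed without inventing a separate predicate per kind.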
A few days ago we had an issue where, in a Datomic Cloud query group, CPU spiked to 100% and free memory dropped to zero. We are looking into the cause of this, but I've found these messages in the logs. As they are not Alerts, should I just consider them expected Datomic behaviour?
{
  "Msg": "RestartingDaemonException",
  "Name": "adopter-2",
  "Ex": {
    "Via": [
      {
        "Type": "clojure.lang.ExceptionInfo",
        "Message": "Unable to load index root ref 456ed91a-ca5f-4448-869b-bbb7e7538625",
        "Data": {
          "Ret": {
            "CognitectAnomaliesCategory": "CognitectAnomaliesFault",
            "CognitectAnomaliesMessage": "Unable to execute HTTP request: Request did not complete before the request timeout configuration.",
            "Error": "Unable to execute HTTP request: Request did not complete before the request timeout configuration."
          },
          "DbId": "456ed91a-ca5f-4448-869b-bbb7e7538625"
        },
        "At": [
          "datomic.cloud.index$require_ref_map",
          "invokeStatic",
          "index.clj",
          858
        ]
      }
Also, as long as I remember we have always had Alerts w/ CreateUpdateSystemFailed message in the logs. Ions deployments work, app functionality too so until now I have not bothered looking into it. These happen in all the query groups and (in the past) the primary compute group too. This is not related to the previous post.
{
  "Msg": "CreateUpdateSystemFailed",
  "Ex": {
    "Via": [
      {
        "Type": "clojure.lang.ExceptionInfo",
        "Message": "Unable to execute HTTP request: Request did not complete before the request timeout configuration.",
        "Data": {
          "CognitectAnomaliesCategory": "CognitectAnomaliesFault",
          "CognitectAnomaliesMessage": "Unable to execute HTTP request: Request did not complete before the request timeout configuration.",
          "Error": "Unable to execute HTTP request: Request did not complete before the request timeout configuration."
        },
        "At": [
          "datomic.core.anomalies$throw_if_anom",
          "invokeStatic",
          "anomalies.clj",
          94
        ]
      }
    ],
    "Trace": [
      [
        "datomic.core.anomalies$throw_if_anom",
        "invokeStatic",
        "anomalies.clj",
        94
      ],
Can you share what the EC2 InstanceTypes are for the Primary Compute Group and the Query Groups?
And the Datomic Cloud version you're using?
(And whether you're storing large strings in your system)
Currently:
• i3.large for one query group
• t3.medium for the primary compute group and other query groups
It appears in all the query groups except the primary compute group where we do not deploy anything at the moment. This error was also present in logs in the past when we were running everything in a single compute group (i3.large or t3.xlarge).
We are running the latest version of everything, I'll look up versions in a sec
No need.
Do you know how far back (months/ releases/ wall clock time) you started seeing these errors?
> (And whether you're storing large strings in your system)
And we generally do not save long strings but I am pretty sure we have data close to the 4096 string limit.
> Do you know how far back (months/ releases/ wall clock time) you started seeing these errors?
Not really. I am 90% sure it was present one year ago (100% October) but as we do not retain logs that long I cannot really prove it.
We have been running pretty much the latest Datomic all the time.
I recall Jaret noticing the last error during one of our discussions but we did not look further.
Ok, well I'll have a release I think you'll want coming soon for Cloud.
I am waiting for all the Cloud releases like Christmas.
Datomic Lucene Custom Parser request: can we please get custom Lucene parsers for Datomic fulltext search in a future Datomic version? I'm having to do a lot of sanitation and parsing Lucene grammar to offer user-friendly fulltext to end-users.
I'm seeing some unexpected behaviour when using tuples in Datalog rules. I'm trying to optimize a complex query by introducing tuples for composite values to reduce the number of clauses. I added tuples for all the heavy stuff, but my tests fail when I replace N clauses with a single tuple-matching clause. I would expect composite tuples to match exactly against any existing clauses that cover the tuple attrs, i.e. if a tuple has attrs [A B C], then in any query which has a clause to constrain A, B & C, then adding the tuple clause on [A B C] with the same logic vars, should not change the output of the query. Why are my queries returning different results? (in this case it's true|false result from an authorization system)
Managed to get a 250x+ speedup (https://github.com/theronic/eacl/pull/6/files) on EACL with the help of Claude o4 Opus (MAX). Tests passing excl. expand-permission-tree, which is not implemented.
OK Mr @favila, in this PR (https://github.com/theronic/eacl/pull/4) I've modelled arrow permissions under their own set of attrs to re-enable the unique ident on arrow 4-tuple (avoids the nils in tuples).
Then, branching off that in https://github.com/theronic/eacl/pull/5/files, I've started to make different rulesets for can?, lookup-subjects and lookup-resources (`build-can?-rules` vs. build-slow-rules) so I can take out the slow [?resource :resource/type ?resource-type] clause at the top of each rule that matches everything. It already speeds things up a lot when I move that clause to the bottom.
The bulk of changes in PR 4 are in:
• https://github.com/theronic/eacl/pull/4/files#diff-ad9e4d12c19c764fd2c47886c4eae769fe13a2a1d44f02259270e9ba818cdc95, and
• https://github.com/theronic/eacl/pull/4/files#diff-49a2e83709d07fd8c2db4bec3b6ca8d4280ce245fb20d101dac487bf30292bce.
I'm wondering if those bindings "destructuring" the matching tuple values could move down for speed, or if they could be pulled into a second phase. Something like: first traverse the graph for all reachability paths, and then bind values to cull the search space.
Any of your expert feedback is most appreciated. I'll try to benchmark next to see how it performs with ~10k entities.
1. Meta: composite unique tuples can be footguns: https://favila.github.io/2023-07-28/unique-composite-attribute-footguns/
2. Syntactically, you can't just plop a vector into a data match clause, you need a single binding.
3. Adding a composite doesn't automatically backfill that value to existing entities. Did you backfill?
4. Tuple values are not interpreted for entity identifiers (i.e. they are "raw" when they contain refs). Are ?resource and ?subject values normalized to entity ids in your tuple?
Illustrating 2:
[(tuple ?resource ?relation-name ?subject) ?r+rn+s]
[?relationship :eacl.relationship/resource+relation-name+subject ?r+rn+s]
1. Thanks @favila :) I'm aware of the nil problem (wish this was configurable)
2. Ah, I tried using the tuple fn and get the same results:
'[(has-permission ?subject ?permission-name ?resource)
  [?resource :resource/type ?resource-type]
  [?relationship :eacl.relationship/resource ?resource]
  [?relationship :eacl.relationship/relation-name ?relation-name]
  [?relationship :eacl.relationship/subject ?subject]
  ; I now use (tuple ...):
  [(tuple ?resource-type ?relation-name ?subject) ?res-type+rel-name+subject]
  [?relationship :eacl.relationship/resource+relation-name+subject ?res-type+rel-name+subject]
  ...]
3. Yes, this is in test suite that runs against fresh in-memory Datomic each time from empty.
4. Re: interpreted as for entity identifiers, I think this also relates to using tuple fn?
tuple is just an alias for vector
it doesn't interpret anything (and can't! it doesn't get a db!)
The problem must be somewhere else that you're not showing me
I notice ?permission-name
that's a join further down. I'll push a more complete example to GH
and you know for sure that ?subject and ?resource are entity ids?
(stepping back, I assume this rule is just for testing this out--it doesn't make sense to join on individual items then join on the composite too)
Here is the schema: https://github.com/theronic/eacl/blob/main/src/eacl/datomic/schema.clj (before I tried to add tuples). And here are https://github.com/theronic/eacl/blob/main/src/eacl/datomic/impl.clj#L50 to `can?` (https://github.com/theronic/eacl/blob/main/src/eacl/datomic/impl.clj#L183), lookup-resources and lookup-subjects.
(yes, joining on items as well as composite tuple was just to test)
?subject & ?resource are entity IDs, and ?relation-name is a keyword.
Could this be related to the order of the tupleAttrs, or that ?relation-name is a keyword?
OK sorry there's a bug in my example above (tuple ?resource-type ... should be (tuple ?resource...) (fixing)
Hmm, even if I make a different tuple with just resource + subject, tests still fail:
[(tuple ?resource ?subject) ?resource+subject]
[?relationship :eacl.relationship/resource+subject ?resource+subject]
(combined with the other clauses)
(also tried removing :db/unique constraint)
have you inspected the entity you expect this to read?
If you just do (d/pull db '[*] [:eacl.relationship/resource+relation-name+subject [resource-id relation-name-kw subject-id]]) do you see it?
If I query the relationships pertaining to my test resource :test/server1, I see:
[{:db/id 17592186045465, :eacl.relationship/subject #:db{:id 17592186045452}, :eacl.relationship/relation-name :account, :eacl.relationship/resource #:db{:id 17592186045464}, :eacl.relationship/resource+subject [17592186045464 17592186045452], :eacl.relationship/resource+relation-name+subject [17592186045464 :account 17592186045452]}]
So I can see the tuples are populated. Query:
(d/q '[:find [(pull ?rel [*]) ...]
       :where
       [?rel :eacl.relationship/resource :test/server1]]
     (d/db conn))
and you are absolutely sure that can? is getting subject-id and resource-id as entity ids? because there's no normalization to db-id happening
that if-not guard also doesn't make sense unless they're not entity ids
aha, this could be the issue! I see in can? I am passing the idents through. let me try that...
This is my point 4
> Tuple values are not interpreted for entity identifiers (i.e. they are "raw" when they contain refs). Are ?resource and ?subject values normalized to entity ids in your tuple?
gotcha, thanks! I thought I was passing in eids, but can? function was passing potential idents
Consider [(datomic.api/entid $ ?subject) ?subject-eid] etc in your rule, because with tuples it matters
or normalize on the outside
or write a function that does this whole thing for you, gives you the eid, bypass query completely
(defn relationship-matching [db subject relation resource] -> eid)
so can? now looks like this:
(defn can?
  "Returns true if subject has permission on resource. Copied from core2.
  Note: we are not checking subject & resource types, but we probably should."
  [db subject-id permission resource-id]
  (let [{:as _subject-ent, subject-eid :db/id}   (d/entity db subject-id)
        {:as _resource-ent, resource-eid :db/id} (d/entity db resource-id)]
    (if-not (and subject-eid resource-eid)
      false
      (->> (d/q '[:find ?subject . ; Using . to find a single value, expecting one or none
                  :in $ % ?subject ?perm ?resource
                  :where
                  (has-permission ?subject ?perm ?resource)] ; do we still need this?
                db
                rules
                subject-eid
                permission
                resource-eid)
           (boolean)))))
note subject-eid and resource-eid. Is that what you mean?
multiple relationships could match to confer a given permission, so not sure how I would do that in the recursive query, or do you mean like a DB function?
I mean just the "lookup the relationship" part, not the whole rule
this thing about refs in tuples needing to be eids to match into indexes is a fiddly bit that is easy to get wrong in composition
you should isolate that requirement into something that composes well and does the fiddly bit correctly
the rule signature gives the impression that it doesn't matter; it appears your code assumes not-entity-ids
both of these are at odds with the requirement
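A sketch of isolating the fiddly bit inside one rule, combining the `datomic.api/entid` suggestion with the single-binding tuple clause shown earlier. The rule name `relationship` and the var names are made up for illustration; the tuple attribute is the one from this thread, and this is untested against a real database.

```clojure
;; Rules are plain data, so this form can be defined without Datomic loaded.
;; Hypothetical rule: normalize both ends to entity ids, then do one exact
;; match on the composite tuple attribute.
(def relationship-rules
  '[[(relationship ?subject ?relation-name ?resource ?relationship)
     ;; accept eids, idents or lookup refs and resolve them to eids first
     [(datomic.api/entid $ ?subject) ?subject-eid]
     [(datomic.api/entid $ ?resource) ?resource-eid]
     ;; a tuple value must be bound to a single var before the data clause
     [(tuple ?resource-eid ?relation-name ?subject-eid) ?r+rn+s]
     [?relationship :eacl.relationship/resource+relation-name+subject ?r+rn+s]]])

;; Assumed usage (needs datomic.api required as d and a db value):
;; (d/q '[:find ?rel .
;;        :in $ % ?s ?rn ?r
;;        :where (relationship ?s ?rn ?r ?rel)]
;;      db relationship-rules :test/user1 :account :test/server1)
```

Callers can then pass idents or eids without caring, and only this one rule has to get the normalization right. `:test/user1` in the usage comment is a placeholder.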
OK, it works now.
But frustratingly, I still need the other binding clauses along with the tuple clause, because I reuse these rules in lookup-subjects and lookup-resources:
[?relationship :eacl.relationship/resource ?resource]
[?relationship :eacl.relationship/relation-name ?relation-name]
[?relationship :eacl.relationship/subject ?subject]
I assume this would still be a speed up if the tuple clause is first by constraining matches?
Don't you have these values already by virtue of using the tuple?
I see, they are not bound
that's what I expected, but lower down I have this clause:
[(not= ?subject ?resource)]
and Datomic complains that it is not sufficiently constrained if I don't also bind the other clauses:
":db.error/insufficient-binding [?relation-name] not bound in expression clause: [(= ?relation-name ?relation-name-in-perm-def)]"
I would expect that the ?relation-name in tuple would sufficiently constrain it:
[(tuple ?resource ?relation-name ?subject) ?resource+rel-name+subject]
[?relationship :eacl.relationship/resource+relation-name+subject ?resource+rel-name+subject]
you can't use tuple matching for that
You need index-range specifically
data clauses match exact values, you want a cursor range
so you definitely need a helper fn
can I use helper functions from rules? (never tried)
yes
you already are, e.g. tuple
tuple is not an intrinsic, it's just a fn
how would I use index-range in a query like this to match on multiple values? (I'll have to do some reading)
Do I understand correctly that even if I have to provide all the bindings, as long as the big tuple exact match is at the top, it should speed up the query by constraining the search space and avoiding set disjunctions (right word?) ?
if you use the tuple, it gives you a single index on which to get candidates (including exact match). But it can only do that with prefix-matches.
so lookup-subject is pointless for example
the subject is the last thing in the tuple
"this thing about refs in tuples needing to be eids to match into indexes is a fiddly bit that is easy to get wrong in composition" ^^^ this. I wrote a general purpose db fn to resolve to an EID, but usage can still be clunky ...and you have to remember to use it.
yeah, I suspected I'd need a different set of rules to optimize can?, lookup-subjects and lookup-resources, and probably play with the order of tuples to support prefix-matching, e.g. most systems have many resources but fewer subjects.
using the same rule name to execute different search strategies depending on what is bound is not a thing datomic can do
right, I'd have to pass in different set of rules in can? vs lookup-subjects vs lookup-resources
so I think making lookup-subjects, lookup-resources, and can? all use has-permission as-is is not possible
@cch1 would you mind sharing how that looks and how to use it?
Example of index-range: (d/index-range db tuple-attr [resource-eid nil nil] [(inc resource-eid) nil nil]) gives you a seq of all datoms where resource-eid is the first element in the tuple
can you do that in a d/q, or do you pass that into a d/q?
I would extract to a fn you call in a d/q
(or call in a rule)
in on-prem there's no point to the "make everything pure query" discipline. Stuff that is syntax pain in datalog just encapsulate into fns
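A small sketch of the "extract to a fn" idea for the index-range example above. The helper name `resource-prefix-bounds` is hypothetical; it only builds the start and end tuples, and the actual `d/index-range` call is shown as a comment because it needs a db value.

```clojure
;; Hypothetical helper: start/end bounds that cover every tuple whose first
;; element is resource-eid, following the index-range example in this thread.
(defn resource-prefix-bounds
  [resource-eid]
  {:start [resource-eid nil nil]
   :end   [(inc resource-eid) nil nil]})

;; Assumed usage with on-prem datomic.api required as d:
;; (let [{:keys [start end]} (resource-prefix-bounds resource-eid)]
;;   (map :e (d/index-range db
;;                          :eacl.relationship/resource+relation-name+subject
;;                          start end)))

(resource-prefix-bounds 17592186045464)
```

Because the resource is the first tuple element, this only helps lookups that start from a known resource; a subject-first lookup would need a tuple attribute with the subject in front.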
Here is my PR: https://github.com/theronic/eacl/pull/4. Sorry it's a bit big, but the new schema w/tuples is in schema.clj, and the new rules are under fn build-fast-rules. In eacl/datomic/impl.clj I just added the tuple constraints above the other bindings. Next steps: I'll try to use index-range and have different rules for different purposes.
Direct links:
• https://github.com/theronic/eacl/blob/optimize/tuples/src/eacl/datomic/schema.clj#L3
• https://github.com/theronic/eacl/blob/optimize/tuples/src/eacl/datomic/impl.clj#L142 (hopefully faster. I need to benchmark next)
> [?resource :resource/type ?resource-type] ; would this be faster lower down?
Depends on whether ?resource is bound
if not bound (e.g., lookup subjects) it is way slower
building set of all relationship types from all relations
Really important to use the rule binding syntax if you are optimizing rules to make them direction-specific
what do you mean by rule-binding syntax? I don't quite understand direction-specific
"required bindings" https://docs.datomic.com/query/query-data-reference.html#rule-required-bindings
Looks like [(subject-has-permission [?subject] ?x ?y) ...] (note extra brackets around ?subject)
This causes the query to error if you call a rule without the bindings it expects
"direction specific" means you are optimizing your rules for a certain direction of traversal (e.g. subject to relationship vs resource to relationship)
maybe not precise way of putting it. Essentially what you know changes how you learn new things.
abstract datalog/logic, nothing changes (you're describing the same relations), but implementation wise what indexes you use in what order changes
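To make "direction-specific" concrete, here is a sketch of two rule sets describing the same relation but each requiring a different var to be bound. The rule names are hypothetical and the bodies are deliberately simplified to the three relationship attributes from this thread; real permission rules would have more clauses.

```clojure
;; Same logical relation, two traversal directions. The extra brackets in the
;; rule head mark required bindings: calling the rule with that var unbound
;; is an error instead of a silent full scan.
(def subject-first-rules
  '[[(subject-relationship [?subject] ?relation-name ?resource)
     [?relationship :eacl.relationship/subject ?subject]
     [?relationship :eacl.relationship/relation-name ?relation-name]
     [?relationship :eacl.relationship/resource ?resource]]])

(def resource-first-rules
  '[[(resource-relationship [?resource] ?relation-name ?subject)
     [?relationship :eacl.relationship/resource ?resource]
     [?relationship :eacl.relationship/relation-name ?relation-name]
     [?relationship :eacl.relationship/subject ?subject]]])

;; The required-bindings vector is just the second element of the rule head:
(second (ffirst subject-first-rules))
```

Each query entry point can then pass in the rule set that matches what it already knows, instead of sharing one rule whose clause order is only right for one direction.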
OK, great! Thanks so much for your help, @favila. Re: directionality, it's a bit tricky because subjects & resources are connected via the relation in one or more Relationships and you can inherit permissions via other resources, e.g. user Favila owns Account A, under which fall 10 servers and that confers <these> permissions. So you kind of need to traverse that graph and it's not always obvious in which direction to search:
• up from resource to subject? (probably), or
• down from subject to resource?
and then the permissions fall out of the schema for those Relations mentioned in various Relationships. Hopefully EACL can be useful to you and other Datomic users who need Spice-like AuthZ (once it's fast enough). The goal is not to try and match SpiceDB, just make it "good enough" for 10k-100k entities. (I posted my https://x.com/PetrusTheron/status/1927409042659422478, if you're interested)