#datomic
2017-07-10
beders05:07:16

I'm pretty green with datomic still, so forgive me if this is a dumb question: Is there some set semantics I can enforce on an entity level? Here's the example: Parsing e-mail addresses and storing them as :email/personal and :email/address where :email/address is unique, however, personal might differ. That is ok, unless I'm trying to transact this:

[{ :email/address "", :email/personal "Joe User"},
 { :email/address "", :email/personal "some other name"}]
IllegalArgumentExceptionInfo :db.error/datoms-conflict Two datoms in the same transaction conflict
{:d1 [17592186054378 :email/personal "Joe User" 13194139543273 true],
 :d2
 [17592186054378 :email/personal "some other name" 13194139543273 true]}
  datomic.error/deserialize-exception (error.clj:124)
Any way to get around that?

misha08:07:03

@ezmiller77 1. You can construct the query programmatically (http://docs.datomic.com/query.html#sec-6); it would look like:

[:find (pull ?doc [*])
 :in $ ?t1 ?t2
 :where
 [?t1 :metadata/tag :tag1]
 [?t2 :metadata/tag :tag2]
 [?doc :metadata/tags ?t1]
 [?doc :metadata/tags ?t2]]
Not sure whether it would be more readable/performant/maintainable/etc. than post-processing, though.
2. I'd say your attribute names look confusing. I'd use :metadata.tag/name for individual tags, or even, if you need those only as enum values, {:db/ident :metadata.tag/tag1}. Your schema makes it look (to me) like both :metadata/tag and :metadata/tags belong to the same entity and do not represent a relationship.
3. Also, if you supply tag ids to the query instead of the actual tag keywords, you might be able to use your initial implementation. That would be "preprocessing" with something like http://docs.datomic.com/clojure/#datomic.api/entid (which can be done inline, by the way), I guess.

hmaurer10:07:12

@beders is your :email/personal attribute marked as cardinality many? It seems that you are trying to attach multiple personal names to your entity

beders17:07:19

hmaurer: If I marked them as many, I don't get the set semantic I'd like, i.e. adding an entity with the same email/name combo twice leads to two copies of the same name. I guess I want this: email: e1 name: n1, n2, n3, n4 where nx are the different names being used for the same e-mail address. I can achieve that by declaring cardinality of /personal to many, but then I get duplicates. I.e. I want n1, n2, n3, n4 to be unique as well

favila17:07:32

What is it precisely you want unique? entity+attribute+value is always unique, so you can't have duplicate names for the same email address if you make :email/personal cardinality-many.

favila17:07:13

i.e {:email/address "" :email/personal #{"foo" "foo"}} is literally impossible.

beders17:07:43

So the correct way is to look up the entity first for the e-mail address, then assert additional facts. I wanted to avoid the extra lookup, but it seems it is unavoidable.

beders17:07:53

thanks for the help

favila18:07:04

you could make email address upserting

favila18:07:12

if that semantic makes sense for your application

favila18:07:15

Compare section "Unique Identities" with the following section "Unique Values"

favila18:07:30

seems like you have :email/address as a unique value, maybe you want unique identity

beders18:07:38

it's unique identity. I still will not be able to insert the example I gave above in one go, due to :db.error/datoms-conflict Two datoms in the same transaction conflict

beders18:07:41

I'll do the extra round-trip to get the e-mail's e before inserting the actual e-mail entity (which contains attributes for :sender, :recipients, etc. )

beders18:07:47

thanks again

favila18:07:10

My point is that is not necessary with unique-identity

favila18:07:39

@beders See this example:

beders18:07:41

ok, I found the error in my original schema: personal was not set to many (and address was set to 'identity') With

{:db/ident       :email/address
 :db/valueType   :db.type/string
 :db/cardinality :db.cardinality/one
 :db/unique      :db.unique/identity}

{:db/ident       :email/personal-m
 :db/valueType   :db.type/string
 :db/cardinality :db.cardinality/many}
I can now transact [{:email/address "", :email/personal-m "Bubu"} {:email/address "" :email/personal-m "Lala"}] and it works as expected!

favila18:07:42

(def uri "datomic:")
;=> #'user/uri
(d/create-database uri)
;=> true
(def c (d/connect uri))
;=> #'user/c
@(d/transact c [{:db/ident :email/address
                 :db/cardinality :db.cardinality/one
                 :db/valueType :db.type/string
                 :db/unique :db.unique/identity}
                {:db/ident :email/personal
                 :db/cardinality :db.cardinality/many
                 :db/valueType :db.type/string}])
;=>
;{:db-before datomic.db.Db@5c351827,
; :db-after datomic.db.Db@ecb17425,
; :tx-data [#datom[13194139534312
;                  50
;                  #inst"2017-07-10T18:29:14.565-00:00"
;                  13194139534312
;                  true]
;           #datom[63 10 :email/address 13194139534312 true]
;           #datom[63 41 35 13194139534312 true]
;           #datom[63 40 23 13194139534312 true]
;           #datom[63 42 38 13194139534312 true]
;           #datom[64 10 :email/personal 13194139534312 true]
;           #datom[64 41 36 13194139534312 true]
;           #datom[64 40 23 13194139534312 true]
;           #datom[0 13 64 13194139534312 true]
;           #datom[0 13 63 13194139534312 true]],
; :tempids {-9223301668109598144 63, -9223301668109598143 64}}
@(d/transact c [{:email/address "", :email/personal "Joe User"},
                {:email/address "", :email/personal "some other name"}])
;=>
;{:db-before datomic.db.Db@ecb17425,
; :db-after datomic.db.Db@1f5fd569,
; :tx-data [#datom[13194139534313
;                  50
;                  #inst"2017-07-10T18:29:36.232-00:00"
;                  13194139534313
;                  true]
;           #datom[17592186045418 63 "" 13194139534313 true]
;           #datom[17592186045418 64 "Joe User" 13194139534313 true]
;           #datom[17592186045418 64 "some other name" 13194139534313 true]],
; :tempids {-9223301668109598142 17592186045418,
;           -9223301668109598141 17592186045418}}

beders18:07:52

thank you!

favila18:07:11

Your transaction error tells me that you definitely do not have :db.unique/identity set

favila18:07:36

(on :email/address)

isaac11:07:42

Why doesn't Datomic support resolving the partition from a string tempid?

Ethan Miller13:07:58

@misha thanks for the helpful feedback. i think you are right that the attributes could use some simplifying/clarifying. i can't use enums for these tags because i want the tags to be definable by the end-user. what my schema describes is a simple entity, an arb, that has three things: an id, a value, and metadata. the metadata is a ref with a cardinality of many. its ident is :arb/metadata. then i've defined a set of attributes that can be included as ref-ed values for :arb/metadata, including the one we were discussing: :metadata/tags. all the attrs meant to be refs for :arb/metadata start with :metadata. your comment is helpful because it makes me think that 1) i was misunderstanding the conventions for the . and the / in datomic, and 2) that i could have done this more simply by simply associating a series of attributes using the style you suggested :metadata.tag, where the first part clearly indicates that it's metadata and the second indicates what kind of metadata. then i could skip the whole ref thing. did i get you right? regarding the meaning of the . and / and their conventional use, is this documented somewhere? or discussed in a blog post perhaps?

favila14:07:58

:x.y/z is keyword syntax from clojure (if you are not aware of clojure)

favila14:07:55

x.y (before the slash) is called the "namespace"; the clojure.core/namespace function will give you this part. z (after the slash) is the "name"; get it with clojure.core/name
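A minimal sketch of the namespace/name split described above. It uses only clojure.core; the keywords are the ones mentioned in this thread, not anything from a real schema.

```clojure
;; Keyword taken from the discussion above; only clojure.core is used.
(def attr :metadata.tag/name)

(namespace attr) ;=> "metadata.tag"
(name attr)      ;=> "name"

;; A keyword without a slash has no namespace at all, so a dot on its own
;; does not separate anything: the whole thing is the name.
(namespace :metadata.tags) ;=> nil
(name :metadata.tags)      ;=> "metadata.tags"
```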

favila14:07:05

I advise never putting . in the name part

favila14:07:08

I usually put some indicator of the entity type the attribute appears on in the namespace part

favila14:07:44

@ezmiller77 I'm guessing you want your final result to look like {:db/id ... :metadata/tags [:tag1 :tag2]}? You can't both reify tags and get this result directly from a pull expression. Just accept that you will post-process the result

favila14:07:49

If you don't reify tags (i.e. if metadata/tags is cardinality-many scalar type, no data on tags), you can do this.
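A rough sketch of the non-reified option, assuming the Datomic peer library is on the classpath. The in-memory URI, the :doc/id attribute and the tag values are hypothetical; only :metadata/tags comes from the discussion, and here it is a cardinality-many keyword attribute rather than a ref.

```clojure
(require '[datomic.api :as d])

;; Hypothetical in-memory database, for illustration only.
(def uri "datomic:mem://tags-sketch")
(d/create-database uri)
(def conn (d/connect uri))

;; Tags stored as plain keywords directly on the document entity.
@(d/transact conn [{:db/ident       :metadata/tags
                    :db/valueType   :db.type/keyword
                    :db/cardinality :db.cardinality/many}
                   {:db/ident       :doc/id
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/identity}])

@(d/transact conn [{:doc/id "doc-1" :metadata/tags [:tag1 :tag2]}
                   {:doc/id "doc-2" :metadata/tags [:tag1]}])

;; Pull returns the tag keywords directly; their order is not guaranteed.
(def pulled (d/pull (d/db conn) [:metadata/tags] [:doc/id "doc-1"]))

;; Documents that carry both tags: one clause per required tag.
(def both-tags
  (d/q '[:find [?id ...]
         :where
         [?doc :metadata/tags :tag1]
         [?doc :metadata/tags :tag2]
         [?doc :doc/id ?id]]
       (d/db conn)))
```

The trade-off is the one mentioned above: with scalar tags there is no tag entity to hang extra data on.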

hmaurer14:07:17

Hi! Could someone email to me or link a good article on the performance characteristics of Datomic’s “filter” function? It takes the current db and an arbitrary predicate on datoms. How can that be executed efficiently?

favila14:07:19

Or you can weak-reference the tags with another entity (but from a data-modeling perspective, this is not a good idea)

favila14:07:45

@hmaurer the predicate is run on each and every candidate datom

favila14:07:03

so make sure your predicate is fast 😀

hmaurer14:07:17

@favila are there any predicates that cannot be used? for example, can I do a datalog query in the predicate to check an ACL or something?

favila14:07:42

you can do literally whatever you want, as long as it's synchronous

hmaurer14:07:00

right, but if I do a datalog query in the predicate I assume that will become very very slow on a large DB?

favila14:07:01

but again, speed is important

favila14:07:09

not necessarily

hmaurer14:07:12

I am not quite sure how many “candidate datoms” there are for a given query

favila14:07:36

Ah. You can determine that clause-by-clause

hmaurer14:07:50

so candidate datoms are datoms that match all clauses?

favila14:07:01

no, datoms are fetched lazily as needed

favila14:07:18

so the index segment that would be visible (without filtering) for a given clause is put through the filter

favila14:07:45

e.g. first clause of a query is [?e :some-attr "some-value"]

hmaurer14:07:59

So it is not crazy performance-wise to use filters for access control?

favila14:07:21

so the candidate datoms are (d/datoms db :avet :some-attr "some-value")

hmaurer14:07:22

Sorry, keep going with your explanation on lazy fetching, it’s interesting 🙂

hmaurer14:07:41

so it will proceed and fetch clause by clause

hmaurer14:07:54

and it will also apply the security filter clause by clause

favila14:07:56

and then the datoms from the next clause are necessarily limited by whatever could bind to ?e

favila14:07:07

this is a simplification, because there is some parallelism going on

hmaurer14:07:15

so I guess in some cases it could even make a query more performant?

favila14:07:22

But clauses are not reordered

hmaurer14:07:22

since you are narrowing down the sets

favila14:07:33

yeah, good point, that could be possible

favila14:07:45

it really hinges on how fast and selective your filter function is

favila14:07:53

you want that to be as fast as possible always

favila14:07:04

Many people do use db filters for access control

favila14:07:16

(not really access, visibility)

favila14:07:46

it's simple and brute force, but it gets the job done

hmaurer14:07:51

with your explanation on lazy fetching it makes a lot more sense

hmaurer14:07:34

a bit brute-force, but conceptually it’s really neat to be able to filter the database based on security rules

hmaurer14:07:50

and let the user query that filtered db arbitrarily

favila14:07:01

just make sure your filter function is fast is all

favila14:07:09

tune the heck out of those

favila14:07:46

and try to use the same query object with parameters, so that query plan doesn't get regenerated every time
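A sketch of a visibility filter along the lines discussed, assuming the Datomic peer library is on the classpath. The in-memory URI, the :doc/* attributes and the visible-db helper are hypothetical names for this example. To keep the predicate cheap, the hidden entity ids are computed once per db value, so the per-datom check is a single set lookup; whether that beats a lookup inside the predicate is something to measure.

```clojure
(require '[datomic.api :as d])

;; Hypothetical in-memory database and attributes, for illustration only.
(def uri "datomic:mem://filter-sketch")
(d/create-database uri)
(def conn (d/connect uri))

@(d/transact conn [{:db/ident       :doc/title
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}
                   {:db/ident       :doc/owner
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one}])

@(d/transact conn [{:doc/title "alice's notes" :doc/owner "alice"}
                   {:doc/title "bob's notes"   :doc/owner "bob"}])

(defn visible-db
  "Hypothetical helper: a view of db that hides every datom belonging to an
  entity owned by someone other than user."
  [db user]
  (let [hidden (set (d/q '[:find [?e ...]
                           :in $ ?user
                           :where
                           [?e :doc/owner ?o]
                           [(!= ?o ?user)]]
                         db user))]
    ;; d/filter passes the unfiltered db and each candidate datom to the predicate.
    (d/filter db (fn [_db datom]
                   (not (contains? hidden (:e datom)))))))

;; One query value, defined once and reused across calls.
(def titles-query '[:find ?title :where [_ :doc/title ?title]])

(def alice-titles
  (set (d/q titles-query (visible-db (d/db conn) "alice"))))
```

The query itself knows nothing about visibility; it just runs against the filtered db value.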

hmaurer14:07:57

I guess I could even pre-load the ACL for every user in memory so there are no remote calls in the filter predicate

hmaurer14:07:28

although Datomic’s peer caching should do the job too I guess

favila14:07:53

yes it often works just as well

favila14:07:02

you just need to try it

favila14:07:06

see what happens

hmaurer14:07:23

out of curiosity, is there any way to control the peer cache?

hmaurer14:07:35

e.g. force to keep some segments in memory, view what’s current in memory, etc

favila14:07:42

you can control how big it is, but that's it

hmaurer14:07:53

and in general, are there tools to debug datomic’s internals?

hmaurer14:07:57

observe what the peer is doing, etc

favila14:07:30

maybe there is a metrics callback for peers? don't remember

favila14:07:38

but there are no tools

favila14:07:43

internals are pretty black-box

misha15:07:11

@ezmiller77 you can think of an attribute's namespace as an SQL table name, and the attribute's name as a column name. So :metadata/tags would be equivalent to a tags column in a metadata table, and :metadata.tag/name to a name column in a metadata.tag table (as opposed to your name column in the same metadata table).

misha15:07:41

such thinking is a bit limiting though, because you can have attributes with different namespaces on the same entity (e.g. {:db/id 1 :foo/bar 2 :baz/quux 3}), but it explains my earlier comment well enough.

hmaurer17:07:09

@favila does datomic re-assert existing facts?

hmaurer17:07:24

e.g if I transact a fact, then transact the same fact again later on

favila17:07:30

it used to, but not anymore

hmaurer17:07:32

will it ignore the transaction or double assert it?

hmaurer17:07:00

so transactions are always idempotent? (excluding possibly tx functions)

favila17:07:44

I want to say yes, but not sure about new rules for schema

favila17:07:58

I think still yes though

favila17:07:23

however a new :db/txInstant is still asserted

favila17:07:35

so it's not completely idempotent

favila17:07:04

(d/transact conn []) always asserts at least one datom--the tx-instant

hmaurer17:07:50

oh, that was my main question/concern

hmaurer17:07:07

so if you transact the same fact two times, the txInstant will be the latest one?

favila17:07:17

no it will be the earlier one

hmaurer17:07:46

oh so you mean it will create an “empty” transaction

favila17:07:47

if an E+A+V assertion would be redundant, it is not reasserted

hmaurer17:07:48

with a txInstant

favila17:07:19

well, it's not an empty transaction

favila17:07:28

it has a txInstant assertion

favila17:07:48

What I mean is every transaction has at least one assertion, the assertion of the time the transaction occurred

favila17:07:57

even if you assert nothing else, calling d/transact will assert that

hmaurer17:07:59

“It will create a transaction entity, but no datom will be associated with that transaction”

hmaurer17:07:03

actually this makes me wonder

favila17:07:11

no, you misunderstand?

hmaurer17:07:21

are attributes of a transaction (e.g. txInstant) associated with their own transaction?

favila17:07:05

datom looks like [tx-id :db/txInstant #inst "..." tx-id true]

favila17:07:43

that is in fact the index used by tx-log to determine tx ids

hmaurer17:07:16

I get it now; thanks for your patience 🙂
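A small sketch of the behaviour described in this exchange, assuming the Datomic peer library is on the classpath and a version that no longer re-asserts redundant facts. The in-memory URI and the example address are hypothetical.

```clojure
(require '[datomic.api :as d])

;; Hypothetical in-memory database, for illustration only.
(def uri "datomic:mem://tx-instant-sketch")
(d/create-database uri)
(def conn (d/connect uri))

@(d/transact conn [{:db/ident       :email/address
                    :db/valueType   :db.type/string
                    :db/cardinality :db.cardinality/one
                    :db/unique      :db.unique/identity}])

(def fact [{:email/address "joe@example.com"}])

;; The same fact transacted twice; the second map upserts onto the first entity.
(def first-tx  @(d/transact conn fact))
(def second-tx @(d/transact conn fact))

;; First transaction: the txInstant datom plus the new :email/address datom.
(count (:tx-data first-tx))

;; Second transaction: per the discussion, the redundant assertion is dropped
;; and only the txInstant datom remains.
(count (:tx-data second-tx))

;; The txInstant datom is about the transaction entity itself, so its entity
;; position and its tx position hold the same id.
(every? #(= (:e %) (:tx %)) (:tx-data second-tx))
```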

wei21:07:11

is anyone validating data going into datomic using spec? in general, how are folks approaching data validation?