Fork me on GitHub
#datomic
<
2018-11-13
>
lwhorton20:11:53

> maybe slack isn’t the place to post this and i should take it to the datomic forum. if so, please let me know and i’ll clean it out. tldr; how do i more-simply configure/load/structure my database when foreign-keys rule my schema? --- after playing around with datomic for a few days in a “real” domain, i feel like i still have a fundamental conceptual misunderstanding around entities, attributes, and refs. i’m attempting to load two disparate datasets into datomic in a way that is easy to work with. “forecast”, a dataset for scheduling personnel resources into the future, assigning them to projects, etc., and “harvest”, which is the other half of forecast dealing with personnel logging hours worked to a particular project, etc. the issue i’m having starts just at the fundamental schema level - how do i model my own schema to either attempt to unify the two disparate datasets under common entities, or keep the systems separate but setup the refs properly so working with them is easy? a real example is probably the clearest way to get across my confusion: harvest has a “user”, which has a uuid harvest has a “project”, which has a uuid harvest has a “time entry”, which has a uuid, and points to a user uuid, and points to a project uuid forecast has a “user”, which has a uuid AND a “harvest-user-id”, which points to a harvest user forecast has a “project”, which has an id AND a “harvest-project-id”, which points to a harvest project forecast has a “assignment” which point to a user id, and points to a project id my first approach was to invoke “one big fn” that extracted all this data from a dataset, picked out the relevant attributes, and transacted everything into a db. after playing around with it, it’s really cumbersome to query anything meaningful because I have to keep working from “the top down” and chase attrs across entities based on some foreign keys. for example, if i’m trying to find out which user is assigned to which projects:

(defn assigned-to-projects [db lname]
  (d/q
    '[:find
      (pull ?p [*])
      :in $ ?lname
      :where
      [?u :user/last-name ?lname]
      [?u :user/forecast-id ?fid]

      [?a :assignment/user ?fid]
      [?a :assignment/project ?pid]

      [?p :project/forecast-id ?pid]]
    db lname))
every query i write resolves to some version of getting the id from harvest/forecast and tracing it through the domain entities until we can link up database entities. the way the foreign keys are linked in the domains and how they interact with datomic just feels wrong, like i’m doing something stupid. additionally, these arbitrary foreign key linkages aren’t really reified anywhere in my schema, and it’s like i have to go searching for “out of band” documentation somewhere to understand how to query the system. is this just a consequence of my domain? of bad schema modeling? is it my responsibility to better shape this data before transacting into datomic? do i load specific entities first (such as users) and then assign other entities (such as assignments) to the datomic identifiers so the refs are entity-id based and not foreign key based?

benoit21:11:34

@lwhorton As you suspected, your main issue is that you try to link your entities using your generated identifiers instead of actually linking the entities in Datomic with a ref attribute.

lwhorton21:11:21

i’m not sure i follow, so if you dont mind i want to keep with the example so maybe i can understand “in the small”:

(def assignment
  [{:db/ident :assignment/id
    :db/valueType :db.type/long
    :db/cardinality :db.cardinality/one
    :db/unique :db.unique/identity}
   {:db/ident :assignment/user
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one}
   {:db/ident :assignment/project
    :db/valueType :db.type/ref
    :db/cardinality :db.cardinality/one}
   ])

lwhorton21:11:19

is this fundamentally just a bad schema? should i declare idents assignment/user or assignment/project to instead be ? (what)? a more explicit schema like assignment/forecast.user.id? my relational-biased brain is trying to think that I can point my assignment/user to an entity that is a user, but really there’s no such thing, there’s just a collection of attributes that represent a user entity.

lwhorton21:11:26

is “the way” to configure a user schema :db/ident :forecast.user/id, load all the users first, and then (while loading assignments) point each :assignment/user at a db attr :forecast.user/id? if so, given a “assignment” entity (from the external dataset) with a forecast-user-id, wouldn’t it be required to perform a lookup for the db’s ref to the forecast.user/id for each “assignment”?

😕 4
Joe Lane21:11:34

@lwhorton I’d read up again on datomics universal schema. I usually only have Id’s for external systems and then have entities point to each other with refs. That should help with modeling. sorry I dont have more time right now to help.

lwhorton21:11:24

> then have entities point to each other with refs do you mean have an attribute belonging to an entity point to another attribute belonging to a different entity? weird sentence, but maybe that’s where i dont understand.

benoit21:11:18

Attributes do not belong to entities, they're relationships between two entities (ref attributes) or 1 entity and 1 value (all attributes other than ref).

lwhorton21:11:59

so does that mean when transaction assertions, in order to assign all facts to the related entity id, you have to assert 1 fact first, grab the eid, then use the eid to assert future facts? i know there’s shorthand map syntax for doing this, which means the T part of ETL should be responsible for shaping (joining) these maps before transacting?

ro616:11:03

Working my way through this conversation after the fact, so sorry if this got resolved later, but I think this is what temp-ids are for in transactions. If you want to assert a bunch of facts about the same entity, and the entity doesn't exist yet, you just provide a placeholder string in place of the entity id (the same string in each fact).

benoit21:11:37

@lwhorton your schema is fine. It is I think your query that was not right.

lwhorton21:11:48

if i do as i have done so far, which is just “dump a big old pile of facts into datomic all at once”, i can’t utilize any entity refs for queries, and have to resort to foreign keys… i think?

lwhorton21:11:01

gah i don’t get it. i have to go watch more videos and read more docs.

benoit21:11:25

You should not have to resort to foreign keys.

benoit21:11:13

This should give you the projects for a user with a given last name:

(defn assigned-to-projects
    [db lname]
    (d/q '[:find (pull ?p [*])
           :in $ ?lname
           :where
           [?u :user/last-name ?lname]
           [?a :assignment/user ?u]
           [?a :assignment/project ?p]]
         db lname))

benoit21:11:46

When you transact data, Datomic identify entities for you: https://docs.datomic.com/on-prem/transactions.html#identify-entities

favila21:11:13

@lwhorton what are your goals here? if the goal is to keep separate (possibly messy) datasets as-is, what you are doing (joining via values on attrs) is appropriate. You can even keep them in separate datomic dbs if you want and quary across them

👍 4
favila21:11:15

if you goal is to harmonize them and produce a consistent view of the universe, then your schema will look more like refs and a single entity will have attributes from both datasets. but you have a bit of an ETL job ahead of you to harmonize them

favila21:11:56

you can also use the first to help you build the second

lwhorton21:11:41

i think that is perhaps my problem: my schema and etl->{transacting data} is not correct. assuming datomic generates temporary ids and automatically joins them to setup proper refs, i think i just have missing refs in the schema (as well as bad rename-keys, etc. from the external data).

favila21:11:03

fundamental question: is a thing one entity or multiple

lwhorton21:11:34

i do like the idea of using a temporary datomic db to hold all the data as an intermediary ETL step

favila21:11:05

if a thing is always one entity you can still have assertions about that thing from multiple sources. they are joined by entity id so that you know the assertions are all about the same "thing"

favila21:11:32

if a thing may be many entities, you are siloing assertions by provenance

lwhorton21:11:52

that makes sense. thanks @me1740 and @favila for sticking with me. my root misunderstanding is just that i’m missing steps to unify two disparate data sources into a single datomic schema from which i can perform EID-based queries instead of attribute-based joins.

favila22:11:03

join-by-value is absolutely appropriate if you can't trust that all their assertions can be "said-of" the same entity

favila22:11:43

join-by-ref is what you want when you have created a consistent view of the world

ro616:11:03

Working my way through this conversation after the fact, so sorry if this got resolved later, but I think this is what temp-ids are for in transactions. If you want to assert a bunch of facts about the same entity, and the entity doesn't exist yet, you just provide a placeholder string in place of the entity id (the same string in each fact).