Fork me on GitHub
#xtdb
<
2022-06-10
>
zeitstein16:06:19

A data modelling question. Here's an example, expressed as clj data for simplicity:

[{:entity/id 1
  :entity/props-and-vals [{:prop 2 :val "today"}
                          {:prop 3 :enum 4}]}

 {:prop/id 2 :prop/name "due date"}
 {:prop/id 3 :prop/name "status" :prop/enums [4 5]}

 {:enum/id 4 :enum/val "todo"}
 {:enum/id 5 :enum/val "done"}]
Requirements: 1. 3 is a prop which has enumerated values, 2 is a prop whose value is a string. 2. Props are defined db-wide, enumerated values are defined per prop, while string values are defined per each entity. 3. Maps inside props-and-vals need to be ordered per each entity. 4. Queries need to answer questions like "find all entities that have this :prop and this :val/`:enum`" . Matter of fact, these will probably be among the most frequent queries. The questions is: how should props-and-vals be modelled?

zeitstein16:06:23

Given requirements and especially 4, I'd make it a "multi-cardinality join", i.e.

[{:entity/id 1 :entity/props-and-vals [-1 -2]}

 {:pv/id -1 :pv/prop 2 :pv/val "today"}
 {:pv/id -2 :pv/prop 3 :pv/enum 4}]
But I dislike the idea of storing these :pv/id docs per each entity. But I probably dislike slowing down queries even more. What would you do? Is there a more creative solution? Only thing that comes to mind is a minor optimization: :pv/id -2 is not really necessary – having the ref to enum 4 would be enough to reconstruct what the corresponding prop is. But maybe there's a solution that looks less like the clj example and so is non-obvious?

jarohen16:06:10

bikeshedding, but might this work?

[{:entity/id 1
  :due-date "today"
  :status :done
  :prop-order [:due-date :status]}]

zeitstein16:06:18

I don't think so. What if the user changes the :prop/name? (Maybe should've included that in the requirements.)

jarohen16:06:21

maybe the following, if you wanted to query the metadata too:

[{:entity/id 1
  :prop/due-date "today"
  :prop/status :status/done
  :props [:prop/due-date :prop/status]}

 {:xt/id :prop/due-date
  :label "due date"}

 {:xt/id :prop/status
  :label "status"}

 {:xt/id :status/done
  :enum :prop/status
  :label "done"}]

jarohen16:06:46

heh, messages crossed 🙂

🙂 1
jarohen16:06:14

yeah - could introduce a surrogate id for props if you didn't want to include it in the name itself

zeitstein16:06:13

Interesting! Really need random ids for the props though. But they would need to be keywords, correct? And queries like these would work?

(xt/q db
      '{:find [e]
        :in [enum-id enum-val simple-id simple-val]
        :where [[e enum-id enum-val] [e simple-id simple-val]]}
      :enum-prop-1 :val-id   ; status
      :simple-prop-1 "today" ; due date
  )

zeitstein16:06:47

I'll also just add that it's not metadata, really. Quite important for the prop to be linked to its enumerated values. (I messed up the requirements again!)

jarohen16:06:05

yes, they need to be keywords to be document attributes - but :prop/<uuid-str> is a valid keyword, for example

zeitstein16:06:07

Well I'm chuffed 🙂 Shows the flexibility of the schema-less approach. I will give this a go, thank you! (This has been on my mind for months...)

☺️ 1
jarohen16:06:14

> And queries like these would work? those and many others, I suspect - you could have some good fun with graph/pull queries there

jarohen16:06:48

I often use the document's :xt/id for such things, fwiw - means pull becomes more powerful

zeitstein16:06:48

Yeah, of course they would, I was just a bit stunned 🙂

zeitstein16:06:56

> I often use the document's :xt/id for such things, fwiw - means pull becomes more powerful I plan on adopting the strategy

{:xt/id uuid :entity/uuid uuid ...}
for that reason. And to be able to easily distinguish between doc types.

👌 2
zeitstein18:06:08

You know, I might try not using this. God forbid there is any duplication 😆 All I need to distinguish doc types is for each type to have one always-appearing keyword 🙂 E.g. [e :prop/name] would filter to props.

zeitstein19:06:23

Actually the :prop/<uuid-str> seems like an interesting convention to adopt for :xt/id generally – can tell the type just by looking at the id. Downsides?

nivekuil19:06:37

I don't like using :xt/id for anything in your application logic because it complects entities and documents, information and data. so I always use maps, like :xt/id {:type uuid}. Then when we want another document for that logical entity, to avoid write duplication for example, we can do something like :xt/id {:type uuid :qualifier :x}

nivekuil19:06:16

this does mean that pull maybe becomes more trouble than its worth, but imo EQL doesn't make sense without a schema anyway

zeitstein19:06:08

That's interesting. I might have a use case for that, but I'll have to think about it. @U050V1N74 This query throws Query didn't match expected structure

(xt/q
  (xt/db node)
  '{:find [e]
    :in [enum-id]
    :where [[e enum-id]]}
  :prop/status-id)
I don't think you can pass in attributes as args? This works:
(xt/q
  (xt/db node)
  '{:find [e]
    :where [[e :prop/status-id]]})

zeitstein20:06:22

From the https://docs.xtdb.com/language-reference/1.21.0/datalog-queries/#datascript-differences > XTDB does not support: > • vars in the attribute position, such as [e ?a "Ivan"] or [e _ "Ivan"] So I guess I'll just have to build :where before running the query.

zeitstein20:06:15

Also, if one doesn't know beforehand which props will be in an entity, can't join the props in pull. E.g:

['* {:prop/status-id [:prop/name]}]
Can't work if I don't know :prop/status-id in advance. Not really an issue for me (not going to use pull), but worth mentioning.