datomic 2022-10-13 | Slack Archive

Pascale Audet 15:10:06

Hello! We have a particular use case and a mandate to stick with Datomic as much as possible. We are planning a lot of programmability into the system, so that end users can dynamically define entity types, attributes, etc.; in other words, add their own schemas. Our first intuition is not to let them create actual schema attributes, but to represent them as data in our domain rather than in Datomic's schema domain (we thought of Thing→Data from Reddit or some RDF variant). Of course, by doing this, we lose the power of Datalog over regular schema. We believe we cannot use the typical vertical table (entity id, attribute, value) to store the entity data, so we thought of using tuples (entity id, attributes, values) and zipping the attribute and value tuples together (in the same table or in separate ones; see the diagrams). We will deal with limiting the length of the tuples. If you have experience with this type of design and its trade-offs, we'd love to hear it.
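To make the proposed tuple design concrete, here is a minimal pure-Clojure sketch of the "zip" step: one tuple holds the user's attribute names, a parallel tuple holds the values, and reading an entity means zipping them back into a map. The function name `zip-user-fields` and the `max-fields` cap are hypothetical; the cap only stands in for "limiting the length of the tuples" (Datomic tuples have their own size limit, which should be checked in the schema docs).

```clojure
(ns example.tuple-zip)

;; Hypothetical cap on the number of user fields per entity.
(def max-fields 8)

(defn zip-user-fields
  "Zips a tuple of attribute names and a parallel tuple of values into a
  map. Throws if the tuples differ in length or exceed max-fields."
  [attr-names values]
  (when (not= (count attr-names) (count values))
    (throw (ex-info "attribute and value tuples differ in length"
                    {:attrs attr-names :values values})))
  (when (> (count attr-names) max-fields)
    (throw (ex-info "too many user-defined fields"
                    {:count (count attr-names) :max max-fields})))
  (zipmap attr-names values))

(zip-user-fields ["color" "size"] ["red" "L"])
;; => {"color" "red", "size" "L"}
```

Note that the positional coupling between the two tuples is what the thread's replies push back on: every query has to unzip the tuples itself, and Datalog can no longer join on an individual user attribute.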

robert-stuttaford 16:10:45

all in one multi-tenant database?

robert-stuttaford 16:10:14

you have a ceiling on idents, 32k items if i recall

dvingo 16:10:20

I was working on an application a couple of months ago with a similar use case: letting users define custom forms (a collection of form types). We decided not to allow users to create schema programmatically. In the end, I think we agreed the system design would have been a lot simpler if we had let them transact actual schema attributes. So I would recommend thinking through a system design that allows that. Namespaced keywords make this sort of thing tractable.

Dustin Getz 17:10:08

Do the user schemas overlap or do they naturally shard per tenant? Idea: one database per user tenant

steveb8n 21:10:00

We have done this at Nextdoc. I can’t share all the IP but can show you the base storage design which simplifies how you store attributes.

steveb8n 21:10:27

there’s another set of metadata that maps customer entities to these entities, and a layer of CRUD fns to abstract it all. that’s too much to share, but the underlying storage can point you in a direction that works

steveb8n 21:10:10

we use Lacinia and it makes the abstraction of customer entities easier too

favila 22:10:54

The approach I’ve taken is to create one attribute per value type, let users create attributes as data entities that reference one of those, and model “assertions” as entities with a ref to the user attribute plus a value stored under the matching value-type attribute. E.g.

;; schema attributes to support user-attribute models
[{:db/ident       :user-defined-attribute/name
  :db/cardinality :db.cardinality/one
  :db/valueType   :db.type/string
  :db/unique      :db.unique/value}
 {:db/ident       :user-defined-attribute/valueType
  :db/cardinality :db.cardinality/one
  :db/valueType   :db.type/ref}   ; points at one of the :user-data/value-* attributes

 ;; links a user-data "assertion" entity to its user attribute definition
 {:db/ident       :user-data/attribute
  :db/cardinality :db.cardinality/one
  :db/valueType   :db.type/ref}
 {:db/ident       :user-data/value-string
  :db/cardinality :db.cardinality/one
  :db/valueType   :db.type/string}
 {:db/ident       :user-data/value-strings
  :db/cardinality :db.cardinality/many
  :db/valueType   :db.type/string}
 ,,,]

;; User attribute definition
[{:user-defined-attribute/name      "my-attribute"
  :user-defined-attribute/valueType :user-data/value-string ,,,}]

;; User attribute "assertion"
[{:user-data/attribute    [:user-defined-attribute/name "my-attribute"]
  :user-data/value-string "my-value"
  ,,,}]

;; Resolve each assertion's user attribute, then read the value stored under
;; the Datomic attribute that the user attribute's valueType points at.
(d/q '[:find ?data-e ?data-attr ?data-val
       :where
       [?data-e :user-data/attribute ?data-attr]
       [?data-attr :user-defined-attribute/valueType ?datomic-a]
       [?data-e ?datomic-a ?data-val]]
     db)
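The query above is a two-step join: assertion → user attribute → the real Datomic attribute holding the value. A minimal, dependency-free sketch of the same join shape over an in-memory vector of `[e a v]` datoms is shown below. The entity ids (100, 200) are made up, and the valueType is stored as a keyword for simplicity; in Datomic it is a ref that resolves to an attribute entity id, which the `[?data-e ?datomic-a ?data-val]` clause accepts in attribute position. A real database answers each step from its indexes rather than by scanning like this.

```clojure
(ns example.user-data-join)

;; Hypothetical datoms: 100 is a user attribute definition,
;; 200 is one user-data "assertion" entity.
(def datoms
  [[100 :user-defined-attribute/name      "my-attribute"]
   [100 :user-defined-attribute/valueType :user-data/value-string]
   [200 :user-data/attribute              100]
   [200 :user-data/value-string           "my-value"]])

(defn- values-of
  "All values v for datoms [e a v] (linear scan, for illustration only)."
  [e a]
  (for [[e' a' v] datoms :when (and (= e e') (= a a'))] v))

(defn user-data-triples
  "Mirrors the three :where clauses: data entity -> user attribute ->
  value-type attribute -> stored value. Returns [data-e data-attr data-val]."
  []
  (for [[data-e a data-attr] datoms
        :when (= a :user-data/attribute)
        datomic-a (values-of data-attr :user-defined-attribute/valueType)
        data-val  (values-of data-e datomic-a)]
    [data-e data-attr data-val]))

(user-data-triples)
;; => ([200 100 "my-value"])
```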

favila 22:10:40

It’s also possible to write a pull expression + xform that will “lift” an entity which represents a “user data” into a single map entry

favila 22:10:40

e.g.

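;; Assumed to live in a namespace called my.app (placeholder name): pull's
;; :xform takes a fully qualified symbol naming a function on the classpath,
;; and on Datomic Cloud it may also need to be listed under :xforms in ion-config.
(defn lift-user-data [elem]
  {(-> elem :user-data/attribute :user-defined-attribute/name)
   (get elem (-> elem :user-data/attribute :user-defined-attribute/valueType :db/ident))})

;; The pattern is quoted so the (attr :xform fn) list is not evaluated as a call.
(d/pull db '[{(:entity/user-data-element :xform my.app/lift-user-data)
              [{:user-data/attribute [:user-defined-attribute/name
                                      {:user-defined-attribute/valueType [:db/ident]}]}
               :user-data/value-string
               :user-data/value-strings]}]
        e)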
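To see what the lifting does without a database, the sketch below applies the same `lift-user-data` function to a plain map shaped like the nested pull result for one user-data element. The `pulled` map is hypothetical sample data, not output from a real query. If you end up lifting several elements yourself, merging the single-entry maps with `into` gives one map of user attributes (a later element wins if two share a name).

```clojure
(ns example.lift)

(defn lift-user-data
  "Turns one pulled user-data entity into a single-entry map
  {user-attribute-name value}, reading the value from whichever
  :user-data/value-* attribute the user attribute's valueType names."
  [elem]
  {(-> elem :user-data/attribute :user-defined-attribute/name)
   (get elem (-> elem :user-data/attribute
                 :user-defined-attribute/valueType :db/ident))})

;; Hypothetical pull result for one element, following the pattern above.
(def pulled
  {:user-data/attribute {:user-defined-attribute/name "my-attribute"
                         :user-defined-attribute/valueType
                         {:db/ident :user-data/value-string}}
   :user-data/value-string "my-value"})

(lift-user-data pulled)
;; => {"my-attribute" "my-value"}

;; Merging several lifted elements into one map.
(into {} (map lift-user-data) [pulled])
;; => {"my-attribute" "my-value"}
```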

tatut 05:10:53

as you can use multiple databases in queries, wouldn’t the approach of having a “tenant custom db” for each tenant be good? having a metamodel on top of the actual model seems cumbersome

tatut 05:10:49

like having 1 shared main database that has all the common things and then each tenant would have a separate custom db for their attrs

octahedrion 08:10:28

I don't think Datomic Cloud supports multiple databases in queries, but OnPrem might

robert-stuttaford 09:10:26

on-prem does yes

favila 10:10:25

using multiple dbs isn’t that great operationally unless you have a fixed number and they are smallish

favila 10:10:22

Connecting is slow, so they need to be connected all the time practically speaking. If they all share a transactor, a need for indexing on one blocks indexing on the others. It’s hard to predict object cache utilization. Maybe it makes sense sometimes. However I think both schema and databases are meant to be provisioned and manipulated by devs not users (i.e. carefully and thoughtfully)

robert-stuttaford 10:10:18

yes, all dbs in use need to fit their roots into peer memory

tatut 10:10:13

good to know

Pascale Audet 10:10:40

Hi @U0509NKGK (@U09K620SG and @U11SJ6Q0K too), ideally everything would be in one database; about 90% of the data will be shared across tenants. But I may be wrong in my reasoning. And thanks for the note about the ident limit; I didn't know that. I'll add it to our notes.

Pascale Audet 10:10:03

@U0CKDHF4L, we are on Datomic Cloud

Pascale Audet 10:10:36

@U051V5LLP, thanks for your experience!

Pascale Audet 10:10:53

@U0510KXTU what happens if someone needs a 15th field?

Pascale Audet 10:10:22

@U09R86PA4, can you tell us how big your table is at this point? How long does it take to search on a user-defined attribute?

favila 11:10:26

We have a 17-billion-datom db, but these user attributes are a fraction of that: tens of millions of entities at most, across tens of thousands of tenants. You’ll have to be more specific about what you mean by “search”. The operations we do usually start from a user attribute (unique per tenant) or a thing which has an attribute on it. Both of these are fast enough that we don’t notice.

favila 11:10:32

as you can see it’s an extra join or two onto likely-to-be-loaded (low-cardinality, very shared) entities.

Pascale Audet 11:10:01

Thanks for the details! You've also responded to the "search" question.

Pascale Audet 16:10:19

I can see that a vertical table would work for our use case. However, I think most of you are on Datomic On-Prem? Do you think it would be the same on Datomic Cloud?

steveb8n 21:10:05

We just add more attributes as needed. You will always need some limit on the number of columns, or you can be DOS’d or DOW’d, so having limits in this dimension is consistent with that

favila 21:10:50

yeah, even in your datomic schema cardinality limits are a good idea (i.e. cardinality-many rarely really means “infinity”)
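The limits discussed here can be enforced with a small guard before schema-like data is transacted. A minimal sketch, assuming a hypothetical per-tenant cap (`max-attributes-per-tenant`) and assuming each map in `tx-data` defines one user attribute; `existing-count` would come from a query counting the tenant's user-defined attributes, which is not shown. A check done outside the transaction is racy under concurrent writes, so a transaction function would be the stricter place for it.

```clojure
(ns example.limits)

;; Hypothetical cap; pick a number that fits the product.
(def max-attributes-per-tenant 50)

(defn check-attribute-limit
  "Returns tx-data unchanged if adding it keeps the tenant within the cap,
  otherwise throws ex-info. Assumes one user attribute per map in tx-data."
  [existing-count tx-data]
  (let [new-count (+ existing-count (count tx-data))]
    (if (<= new-count max-attributes-per-tenant)
      tx-data
      (throw (ex-info "tenant attribute limit exceeded"
                      {:limit     max-attributes-per-tenant
                       :requested new-count})))))
```

The same idea applies to cardinality-many values: count what is already asserted plus what is being added, and reject past a chosen bound.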
