#datomic
2023-11-08
onetom05:11:34

We are still frequently getting exceptions when trying to get a database value (with d/db) for the first time after a Datomic Cloud Ion deployment:

Execution error (ExceptionInfo) at datomic.core.anomalies/throw-if-anom (anomalies.clj:94).
Loading database
clojure.lang.ExceptionInfo: Loading database #:cognitect.anomalies{:category :cognitect.anomalies/unavailable, :message "Loading database"}
	at datomic.core.anomalies$throw_if_anom.invokeStatic(anomalies.clj:94)
	at datomic.core.anomalies$throw_if_anom.invoke(anomalies.clj:88)
	at datomic.core.anomalies$throw_if_anom.invokeStatic(anomalies.clj:89)
	at datomic.core.anomalies$throw_if_anom.invoke(anomalies.clj:88)
	at datomic.cloud.client.local.Client$thunk__29787.invoke(local.clj:175)
	at datomic.cloud.client.local$create_db_proxy.invokeStatic(local.clj:282)
	at datomic.cloud.client.local$create_db_proxy.invoke(local.clj:280)
	at datomic.cloud.client.local.Connection.db(local.clj:103)
	at datomic.client.api$db.invokeStatic(api.clj:181)
	at datomic.client.api$db.invoke(api.clj:170)
Is there any way to avoid this? Is this a known issue? Do we really need to put a retry around the d/db call too, like we did for the d/connect, as recommended by https://github.com/Datomic/ion-starter/blob/master/src/datomic/ion/starter.clj#L21-L24?
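For now we're considering wrapping d/db the same way. A minimal sketch of such a retry (the helper name and retry policy here are ours, not from ion-starter), retrying only on the :cognitect.anomalies/unavailable category shown above:

(require '[datomic.client.api :as d])

(defn db-with-retry
  "Call (d/db conn), retrying while the database is still loading."
  [conn & {:keys [max-tries sleep-ms] :or {max-tries 10 sleep-ms 500}}]
  (loop [attempt 1]
    (let [result (try
                   (d/db conn)
                   (catch clojure.lang.ExceptionInfo e
                     (if (and (< attempt max-tries)
                              (= :cognitect.anomalies/unavailable
                                 (:cognitect.anomalies/category (ex-data e))))
                       ::retry
                       (throw e))))]
      (if (= ::retry result)
        (do (Thread/sleep sleep-ms)
            (recur (inc attempt)))
        result))))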

Lone Ranger18:11:27

I could use some advice on best practices for sensitive personal account data in #C03RZMDSH. I'm currently using CouchDB because it's very easy to keep user account information segregated via one database per user. Now I know Nubank uses Datomic for banking info, so it's clearly good enough for the job. The only thing that concerns me a bit is that I find it a little dicey leaving all of the work of data segregation and access controls up to the application layer. So for instance -- would you make a database per user, or would you simply be very careful in the application layer and try to build an abstraction/middleware chain that prevents you from doing anything stupid?

favila18:11:01

A database per tenant isn’t realistic--databases are fairly high-cost abstractions and aren’t designed for having large numbers of them.

favila18:11:54

I suspect most people do it in the app layer entirely. However, if you are careful to assign tenant ownership to entities in your schema in a consistent and discoverable way, and to be explicit about which ref attributes can cross tenant boundaries and which cannot, you can install a lot of guardrails and affordances into your system
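For example, a minimal schema sketch (attribute names here are only illustrative) where every owned entity carries exactly one tenant ref:

[{:db/ident       :tenant/id
  :db/valueType   :db.type/uuid
  :db/cardinality :db.cardinality/one
  :db/unique      :db.unique/identity}

 ;; asserted on every tenant-owned entity; never allowed to cross tenants
 {:db/ident       :entity/tenant
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one
  :db/doc         "Owning tenant of this entity."}]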

favila18:11:33

At one extreme, you can apply a d/filter predicate to a database to filter datoms by tenant
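A sketch of that with the peer library (d/filter lives in datomic.api rather than the client API), reusing the illustrative :entity/tenant attribute above; the per-datom entity lookup is exactly where the cost below comes from:

(require '[datomic.api :as d])

(defn tenant-scoped-db
  "A view of db that only sees datoms whose entity is owned by tenant-eid.
  Datoms on entities with no :entity/tenant (schema, idents, ...) pass through."
  [db tenant-eid]
  (d/filter db
            (fn [unfiltered-db datom]
              (let [owner (get-in (d/entity unfiltered-db (:e datom))
                                  [:entity/tenant :db/id])]
                (or (nil? owner) (= owner tenant-eid))))))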

favila18:11:01

but that has a performance cost which you may not want to bear at scale; so instead you could use application code and :db/ensure to enforce tenancy boundaries on write, and your queries can trust that they were enforced
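A sketch of that write-side guardrail with :db/ensure (the spec name, attributes, and predicate are hypothetical; the predicate var takes [db eid] and has to be deployed where transactions run):

(require '[datomic.client.api :as d])

;; entity spec, transacted once like any other schema
[{:db/ident        :tenant/owned
  :db.entity/attrs [:entity/tenant]
  :db.entity/preds ['myapp.tenancy/refs-stay-in-tenant?]}]

;; writers opt in per entity; the transaction aborts if the spec fails
;; (tenant-uuid is a placeholder for the tenant's :tenant/id value)
(d/transact conn
            {:tx-data [{:order/id      "o-123"
                        :entity/tenant [:tenant/id tenant-uuid]
                        :db/ensure     :tenant/owned}]})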

favila18:11:13

an offline job could double-check tenancy boundaries periodically; if you find boundary violations and put enough metadata on your transactions, you should be able to squash those bugs fairly easily, or at least have an audit trail of the write.

favila18:11:39

this is a more out-there idea, but you could encode tenancy into the entity partition (which you probably should do anyway for performance), and that would give you a blast-damage-limiting, very cheap d/filter that would at least ensure a bug can’t break out of the tenants in a partition
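A peer-library sketch of that cheap filter, assuming each tenant's entities were all created in one installed partition; it only inspects the partition of the entity id, so there is no extra lookup per datom:

(require '[datomic.api :as d])

(defn partition-scoped-db
  [db tenant-part-eid]
  (let [system-parts #{(d/entid db :db.part/db) (d/entid db :db.part/tx)}]
    (d/filter db
              (fn [_ datom]
                (let [p (d/part (:e datom))]
                  (or (contains? system-parts p)
                      (= tenant-part-eid p)))))))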

favila18:11:51

anyway, I think the most important thing is make sure tenancy is very clearly expressed in your schema, and then you can build on that to the degree you’re willing to trade safety for performance

favila18:11:32

or application-run code for datomic-run code

Lone Ranger19:11:59

you're using some terms I'm not familiar with and I just want to make sure that I understand what you're implying:
tenancy -- would this translate to a db.type/ref for a user identity?
partitions -- I thought about this. I have no idea if a partition per tenant is feasible or not, but that makes a lot of sense.
metadata -- do you mean literal metadata in the Clojure sense of metadata, or database metadata encoded via schema?

Lone Ranger19:11:20

these are great suggestions, btw, thank you

favila19:11:01

tenancy: Roughly “ordinary scopes of read and write”--in your example each user would be a tenant. You might have smaller units or larger ones, or even hierarchical ones. I’m just using a general term for this problem.

favila19:11:42

partitions: you probably want to assign partitions via a hash function rather than explicitly. Partitions are really a performance optimization to increase locality, and my suggestion is kind of an abuse of them. Note also that you can’t change the partition of an entity once assigned (it’s part of the entity id), so if you make a mistake this could be a real inconvenience
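A peer-library sketch of that (partition names and ring size are made up): install a fixed ring of partitions once, then always derive the partition from a hash of the tenant id, so the choice is deterministic and never has to be remembered anywhere:

(require '[datomic.api :as d])

(def n-parts 32)

;; one-time schema tx installing the ring of partitions
(def partition-tx
  (for [i (range n-parts)]
    {:db/id                 (d/tempid :db.part/db)
     :db/ident              (keyword "myapp.part" (str "p" i))
     :db.install/_partition :db.part/db}))

(defn tenant-partition [tenant-id]
  (keyword "myapp.part" (str "p" (mod (hash tenant-id) n-parts))))

;; new entities for a tenant then get tempids in that partition:
;; (d/tempid (tenant-partition tenant-id))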

favila19:11:04

metadata: I mean attributes asserted on the transaction entity itself. This is just schema.

favila19:11:21

things you might assert as metadata: a reference to the authenticated user that performed the operation; the name of the operation in your code; a reference to the tenant, etc
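For example (attribute names are made up; "datomic.tx" is the reserved tempid that resolves to the transaction entity itself; conn and tenant-uuid are placeholders):

(d/transact conn
            {:tx-data [{:db/id        "datomic.tx"
                        :audit/user   [:user/email "alice@example.com"]
                        :audit/tenant [:tenant/id tenant-uuid]
                        :audit/op     :order/create}
                       {:order/id      "o-42"
                        :entity/tenant [:tenant/id tenant-uuid]}]})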

Lone Ranger19:11:33

Yeah, makes sense. You know what I think is interesting here is using Clojure in conjunction with Datomic for something like banking. I would imagine you'd want something more structured, like Java classes, to put some compile-time enforcement in place as a backup.

Lone Ranger19:11:26

but obviously the case study says that it works. And from what I understand, non-Datomic bank code is a huge mess anyway

favila19:11:53

I’m not sure how any compile-time approach would work for what is fundamentally a data reference problem

favila19:11:46

the enforcement has to live over the data at runtime; the question is just how much is lowered into the database runtime and how much perf you’re willing to trade for enforcement.

Lone Ranger19:11:23

My biggest concern is inappropriate egress, which could come from inappropriately transacting combined user data, which in turn could come from merging hashmaps that should not be merged. Can this be handled without classes? Yes. But it's harder to inappropriately merge objects than it is to inappropriately merge hashmaps.

Lone Ranger19:11:11

Similarly, it's harder to egress "too much" data when a hashmap is marshalled into a class and then serialized.

favila19:11:16

are you talking about queries that unavoidably cross tenants and aggregate somehow? otherwise I’m not sure how classes could enforce that e.g. every entity is from user A.

Lone Ranger19:11:16

And yes, I know you could use spec to do this, but I try to avoid anything that relies too much on discipline if that discipline can be outsourced to a different layer like the compiler or the database

favila19:11:26

if you’re just talking about what attributes on an entity are exposed, this isn’t really a tenancy problem. You control that with d/pull on the read-from-db side and explicit serialization code (e.g. pulling only certain fields, renaming fields, etc.) on the write-to-wire side
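For example, an explicit allow-list at that boundary (attribute names and the status enum are illustrative):

(require '[clojure.set :as set]
         '[datomic.client.api :as d])

;; only these attributes can ever reach the wire, regardless of what else
;; happens to be asserted on the entity
(def order-wire-pattern
  [:order/id :order/total {:order/status [:db/ident]}])

(defn order->wire [db order-eid]
  (-> (d/pull db order-wire-pattern order-eid)
      (set/rename-keys {:order/id    :id
                        :order/total :total})))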

Lone Ranger19:11:41

And maybe this is what I need to hear -- that ultimately it is an application layer problem, no matter which way you slice it, and that I need to evaluate if the benefits of having datomic outweigh my concerns about data tenancy.

favila19:11:22

well it doesn’t have to be application layer--some databases lower quite complex authentication and row-and-column-level authorization controls into their database, but arguably the database is functioning as an application layer at this point

favila19:11:45

you gotta pay for that safety somewhere

Lone Ranger19:11:51

yeah but who wants to use other databases besides datomic 😛

Lone Ranger19:11:03

thanks, as always, @U09R86PA4. You probably have no idea how many times you've pulled my @$$ out of the fire regarding Datomic over the years 😅

refset10:11:36

> A database per tenant isn’t realistic--databases are fairly high-cost abstractions and aren’t designed for having large numbers of them.
it's perhaps worth noting that there are people trying to address this (but I can't say how successfully), e.g. with Postgres https://www.thenile.dev/blog/introducing-nile

favila12:11:53

I meant in datomic specifically

👍 1
favila12:11:40

Datahike apparently is much closer to zero cost

favila12:11:55

However any system like this still needs to grapple with the fact that shared-nothing means things like schema aren’t shared, so migrations involve a lot of orchestration

favila12:11:23

I wish tenancy were a first-class concept in more databases

danbunea16:11:11

I use namespaced keywords per tenant. Many tenants in the same db, such as:

:tenant-1.order/total
:tenant-2.order/total 
etc

Lone Ranger17:11:50

oh interesting @U0GE6JTKK . does that imply a schema per tenant?

danbunea10:11:47

yes, but in my case schemas between tenants may have differences. There are some common namespaces as well. I call a tenant a workspace so you'd have: [:workspace/id :tenant-1] [:workspace/id :tenant-2] and the permissions of who can access what.

bhurlow15:11:48

my two cents here is that this is really more of an application concern. Datomic provides a good set of building blocks to implement what you need – you’ll eventually need to write tests that prove the application boundaries anyway