Fork me on GitHub
#architecture
<
2018-01-18
>
Drew Verlee19:01:52

Anyone have any insight into effectively handling db constraints across your entire system? An example will make this clear. Say you have a uniqueness constraint on a db column. A caller can potentially violate this constraint. The callers options seem to be 1) check the constrain violation before issuing the command 2) issue the command and handle the possible violation (which is typically an exception). In the case of the first option, it seems redundant to check both before and after. With a database, this would additional have to be done in a transaction. In the later case, it seems unclear how callers are supposed to A) know about the constraints in the first place B) handle them adequately Maybe its reasonable for all constraints to return an error code that callers are aware of.

try
   command
catch (exception e)
     case e.code
      "1" "duplicate name error"
      "2" "no name error"
       etc...
But i have never seen this done, so i’m hesitant to start using it in my own system. I’m curious what other people do, or if there is another way to approach this type of problem.

Drew Verlee19:01:22

A third more radical options would be to not through exceptions and make the db smart enough to know what todo depening on the context.

result = command()
// result can be error or success
If result.error()
   ...
else 
   ....
This is similar to the second option above, just without exceptions… as its not clear to me why i would want a system crash as the default.

fellshard19:01:58

(IIRC, that's one of my peeves about many SQL-based DBs - they generally have very poor error-handling mechanics.)

fellshard19:01:34

It's hard to reason about failure on the client side.

seancorfield20:01:05

How you handle this really depends on your problem domain (how likely constraint violations actually are) and what error reporting model you want for the "client" of your data access code.

seancorfield20:01:15

We have two typical patterns around this: 1) where the likelihood of an insert failing is "high" (due to potentially competing requests to insert conflicting things), we tend to do (try (insert-the-data) (catch SQLException e (if (insert-conflict e) (update-the-data) (throw e)))); 2) where the likelihood of an insert failing is "low", we tend to do (if (data-not-present) (insert-the-data) (report-duplicate))

seancorfield20:01:28

Of course the second approach can fail (and throw an exception) and we just treat that as a failed operation and let the client try again (and they'll either get the duplicate error or another failure).

seancorfield20:01:05

The second approach is good when you want to trap a condition at a system boundary (or in the UI) -- where it's worth doing the test first to indicate whether the operation is likely to succeed or fail -- and the client is likely to be able to alter course (e.g., a member selecting a username when registering on a site).

seancorfield20:01:56

(the likelihood of two different users simultaneously attempting registration with the same username or email address is extremely low)

Drew Verlee22:01:28

@seancorfield So in the cases i’m envisioning, the conflicts aren’t likely because of a timing issue, so you recommend testing the condition(s) first. I’m ok with this approach, but in the case of a relational DB it seems hard to organize and re-use the code. Take the uniqueness constraint example. I don’t know an easy way to invoke that as even a sql function as a pre-condition and even then It still feels like duplicating the constraints intent. Does this type of thing change much when using Datomic?

seancorfield22:01:06

(if (username-exists? db new-user)
  (username-already-taken new-user)
  (register-user db new-user))
Not sure what's hard to organize and re-use about that -- could you elaborate?

seancorfield22:01:10

Even if you abstract the data access out of that completely, assuming a readonly view of the DB passed in and a set of updates to perform passed out, the logic stays the same.

seancorfield22:01:39

Are you saying that you "don't know" what the constraints in the DB are?

fellshard22:01:37

I get the impression you're looking for 'constraints-as-data'?

seancorfield22:01:53

Over the years, we've tended to lean to having as few constraints in the DB as we can get away with (for all sorts of reasons) -- I think most of what you're thinking of as "constraints" @drewverlee are really Business Domain concerns (and, in my opinion, don't really belong in the DB -- it should be just a backing store for your data).

seancorfield22:01:34

Going back to the username issue -- over the years we've changed the business rules about usernames several times, including the scope of their uniqueness (we have 100+ websites and a global database). Having those constraints in the DB would have been very constricting. For example, if we decided to make usernames unique per site, the DB could not have the constraint added if we'd previously allowed non-unique usernames per site.

Drew Verlee22:01:23

@seancorfield Your right, they are business domain concerns. But I don't understand why those shouldn't be db constraints. I need to find some materials on that topic, as my hammock time hasn't come up with a reason yet. Your example starts to highlight the cons, thanks for bring it up.

seancorfield22:01:26

Having some of your business domain constraints in code and some in the database can be a maintenance nightmare. Having unnecessary constraints in the database makes adjusting your business domain rules hard, as well as potentially impacting DB migrations, data transformations / fixes.

seancorfield22:01:59

BTW, here's an interesting paper on just how difficult database constraints can make schema refactoring https://arxiv.org/ftp/arxiv/papers/0912/0912.1016.pdf -- that proposes special metadata tooling to extract all the constraints, drop them, perform the schema refactoring, and then re-apply the appropriate/updated constraints. Ouch!

fellshard23:01:44

The difficulty with putting it in code is that it's usually difficult to keep the constraints uniformly applied when multiple codebases touch the same database. This is less an argument for moving constraints to the database, though, and more for not using databases as integration points.

Drew Verlee19:01:03

> This is less an argument for moving constraints to the database, though, and more for not using databases as integration points. Hm. If your not using databases for integration points, then what are you using for integration points?

fellshard19:01:11

Processes, via facade or domain APIs, though perhaps Datomic / CQRS-based DB designs alleviate the old 'everyone connect to one relational DB' problem.

fellshard19:01:36

In the end, competing domains end up corrupting one another's' data models if you try and just shove everything into a single relational DB. The more domains you add, the harder it is for them to preserve understanding for all other domains.

Drew Verlee20:01:13

I think understand what your suggesting, but i’m not sure how not using a db resolves it. I think what your suggesting is more that we can build more flexible entry points ontop of immutable data. Streaming queries over Immutable kafka logs thorough onyx dont have to worry about stepping on each others toes (no master schema).

fellshard21:01:26

Precisely. Not saying to not use a relational DB, but rather to scope each relational DB to a single domain. Assuming that new domains can just connect to the database and share it seamlessly with other domains will get you into deep trouble, as relational DBs just aren't flexible enough to allow for independent change. Datomic's datoms are one way you can still centralize your data source without necessarily forcing other apps to integrate. Similarly, abstracting your data as an immutable log provides flexible writes and reads, with lower chance of breakage.