sql

2023-07-25T16:39:39.568899Z

How would you represent a sum type (tagged union, discriminated union… the beast has many names) in a relational database?

2023-07-25T16:41:47.446319Z

Tried googling for ‘union type’ but get the union sql keyword, tried googling for ‘sum type’, but get the sum sql function :’)

isak 2023-07-25T16:47:35.348859Z

The simplest would be a json(b)? array: ['mytag', 456, 'my-value'], ["my-other-tag", "myvalue"]

2023-07-25T16:50:23.436379Z

I was kinda hoping to maintain sane things like non-null constraint, so I’m looking for a way to leverage the database schema

isak 2023-07-25T16:54:47.362429Z

Do they all need to fit in 1 table? I guess then you'd have to create a lot of generic storage locations, like text1, text2, number1, number2, and then have a SUM type definition that you point to, and wrap all inserts/updates in a sql function/procedure that does the validation.

respatialized 2023-07-25T17:29:27.199179Z

Postgres has composite types https://www.postgresql.org/docs/current/rowtypes.html

2023-07-25T17:29:55.713409Z

oh interesting

2023-07-25T17:30:12.561379Z

I dont have an arbitrary product type btw, I can list the branches

respatialized 2023-07-25T17:33:16.773479Z

What purpose are these types going to serve in your application?

2023-07-25T17:34:17.991099Z

its just one of a hand full of shapes, so I’m looking to enforce consistency “if vehicle is a car, it has at least these properties, if it is a bike, it has these properties”

respatialized 2023-07-25T17:34:47.828639Z

That sounds like a good candidate for composite types (Edit: maybe not)

2023-07-25T17:36:43.418259Z

I guess the ways could be a wide table with fields of all branches plus a constraint, or a table per branch, but I’m looking for why I would take one over the other — what are my considerations

2023-07-25T17:36:51.863229Z

If that makes sense

2023-07-25T17:38:33.183279Z

Or another form I couldn’t dream up just yet, composite types sound cool, but no non-null constraints, the docs say 😞

isak 2023-07-25T17:39:57.205359Z

Does pg not support check(bool expr) type constraints?

2023-07-25T17:40:06.650289Z

Yeah it does

isak 2023-07-25T17:40:40.507299Z

But you don't want to use those to enforce non-null?

2023-07-25T17:41:25.925359Z

Thats a way, I wonder what choices I have and why I would pick one over the other — what upsides and downsides do I get with each ^^

👌 1
respatialized 2023-07-25T17:46:21.862589Z

This isn't exactly root cause analysis but i think the "5 whys" might be illuminating here; the specific way you need to model the data depends on why you need to enforce this property constraint to begin with https://en.m.wikipedia.org/wiki/Five_whys

🆒 1
isak 2023-07-25T17:48:00.208509Z

I've had bad experiences with trying to use advanced custom types, but that was with SQL Server. It makes the normal problem of the app and schema having different lifetimes worse.

Max 2023-07-26T04:55:51.088529Z

I’ve run into the sun-type-in-rdbms problem several times in the past; there’s not a great solution. Here’s a few different approaches I’ve seen work and why you might choose them: • One table with all the columns. An additional discriminator column allows you to tell which member of the sum type each row is. You can use a CHECK constraint to enforce required/prohibited fields for each sum member. • One table with common field columns and a discriminator column, plus a JSONB column. You can use a CHECK constraint to enforce the shape of the JSON. • Two tables with entirely separate schemas The tradeoffs that inform which of these options you choose are primarily around your query patterns. For example, if rows across different members of the sum type should share the same ID space, then you probably should not choose the two-tables approach. If the sum type members have a large number of fields or complicated constraints, then it may be worth the additional query complexity to split it into two tables. If the number of differing fields between the sum type members is small and/or the constraints on those fields are complicated, then it may make sense to use the all-the-columns approach. And if the fields differ significantly and/or the sum type members have a large number of fields, then it may be worth it to use the JSON approach.

👍 1
Max 2023-07-26T05:02:54.087559Z

An example: I worked on a project that bridged in-app chat to SMS via Twilio. Incoming messages from Twilio would have a bunch of Twilio-specific fields, and outgoing messages from the app would have a bunch of app-specific fields. In our case, since our query patterns were all around fetching a slice of messages from a conversation and all messages shared an ID space, we decided against the two-table approach. The number of differing fields between incoming and outgoing messages was relatively small, so we went with the all-the-columns approach. I forget if we wrote a big CHECK constraint to verify the shape of both message types or not, it wasn’t a huge concern since the likelihood of someone accidentally writing to some Twilio field on an outgoing message was low (what would they put there?) and vice versa for the incoming messages.

2023-07-26T07:48:35.731499Z

Thanks ^^!