How would you represent a sum type (tagged union, discriminated union… the beast has many names) in a relational database?
Tried googling for ‘union type’ but get the union sql keyword, tried googling for ‘sum type’, but get the sum sql function :’)
The simplest would be a json(b)? array: ['mytag', 456, 'my-value'], ["my-other-tag", "myvalue"]
I was kinda hoping to maintain sane things like non-null constraint, so I’m looking for a way to leverage the database schema
Do they all need to fit in 1 table? I guess then you'd have to create a lot of generic storage locations, like text1, text2, number1, number2, and then have a SUM type definition that you point to, and wrap all inserts/updates in a sql function/procedure that does the validation.
Postgres has composite types https://www.postgresql.org/docs/current/rowtypes.html
oh interesting
I dont have an arbitrary product type btw, I can list the branches
What purpose are these types going to serve in your application?
its just one of a hand full of shapes, so I’m looking to enforce consistency “if vehicle is a car, it has at least these properties, if it is a bike, it has these properties”
That sounds like a good candidate for composite types (Edit: maybe not)
I guess the ways could be a wide table with fields of all branches plus a constraint, or a table per branch, but I’m looking for why I would take one over the other — what are my considerations
If that makes sense
Or another form I couldn’t dream up just yet, composite types sound cool, but no non-null constraints, the docs say 😞
Does pg not support check(bool expr) type constraints?
Yeah it does
But you don't want to use those to enforce non-null?
Thats a way, I wonder what choices I have and why I would pick one over the other — what upsides and downsides do I get with each ^^
This isn't exactly root cause analysis but i think the "5 whys" might be illuminating here; the specific way you need to model the data depends on why you need to enforce this property constraint to begin with https://en.m.wikipedia.org/wiki/Five_whys
I've had bad experiences with trying to use advanced custom types, but that was with SQL Server. It makes the normal problem of the app and schema having different lifetimes worse.
I’ve run into the sun-type-in-rdbms problem several times in the past; there’s not a great solution. Here’s a few different approaches I’ve seen work and why you might choose them:
• One table with all the columns. An additional discriminator column allows you to tell which member of the sum type each row is. You can use a CHECK constraint to enforce required/prohibited fields for each sum member.
• One table with common field columns and a discriminator column, plus a JSONB column. You can use a CHECK constraint to enforce the shape of the JSON.
• Two tables with entirely separate schemas
The tradeoffs that inform which of these options you choose are primarily around your query patterns. For example, if rows across different members of the sum type should share the same ID space, then you probably should not choose the two-tables approach. If the sum type members have a large number of fields or complicated constraints, then it may be worth the additional query complexity to split it into two tables. If the number of differing fields between the sum type members is small and/or the constraints on those fields are complicated, then it may make sense to use the all-the-columns approach. And if the fields differ significantly and/or the sum type members have a large number of fields, then it may be worth it to use the JSON approach.
An example: I worked on a project that bridged in-app chat to SMS via Twilio. Incoming messages from Twilio would have a bunch of Twilio-specific fields, and outgoing messages from the app would have a bunch of app-specific fields. In our case, since our query patterns were all around fetching a slice of messages from a conversation and all messages shared an ID space, we decided against the two-table approach. The number of differing fields between incoming and outgoing messages was relatively small, so we went with the all-the-columns approach. I forget if we wrote a big CHECK constraint to verify the shape of both message types or not, it wasn’t a huge concern since the likelihood of someone accidentally writing to some Twilio field on an outgoing message was low (what would they put there?) and vice versa for the incoming messages.
Thanks ^^!