Fork me on GitHub
#off-topic
<
2023-01-09
>
marrs17:01:12

Are there any opinions here about using PRIMARY KEY in an RDBMS to actually identify the set of columns (tuple) that identify an individual record? This is what the primary key actually refers to in relational theory but I've never seen it used this way in the wild. Usually, a single id column is used instead. I'm toying with the idea of using the primary key in this old-fashioned sense, but keeping an id column for use as a foreign key. I'm using Postgres for my project, but I'm curious about the approach generally.

pavlosmelissinos18:01:31

It also supports composite compound foreign keys, so you might not even need that id column: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK

marrs18:01:51

I just read this in the docs: > PRIMARY KEY enforces the same data constraints as a combination of UNIQUE and NOT NULL. However, identifying a set of columns as the primary key also provides metadata about the design of the schema, since a primary key implies that other tables can rely on this set of columns as a unique identifier for rows.

marrs18:01:58

I'd like to keep the id column because its data will be immutable whereas the primary key as I described it (let's call it the natural key from now on) could change its data.

isak18:01:29

I like surrogate keys better, because sometimes you want to write logic that touches multiple tables, or doesn't care about the table, and that gets a lot harder if every table can have its own type of primary key.

marrs18:01:17

Thanks. I think I'll stick with surrogate primary key for the most part. I'm curious about using a compound primary key if the members of that key are guaranteed to be immutable although I take your point about consistency, @U08JKUHA9.

2
mauricio.szabo18:01:36

The problem is that even if a compound (or even non-compound) primary key is immutable but actually represents something about the data, you can have a whole lot of other problems. Sure, the data may be for internal use today, but will it be tomorrow? What if the user wants his data to be deleted from DB, and that primary key is also eligible for deletion, but you can't actually do it because it'll break constraints over the code? And what if the "guaranteed to be immutable" become not that guaranteed tomorrow because of some extreme edge-case? Lots of questions...

2
marrs18:01:31

So you would avoid a compound foreign key altogther?

Cora (she/her)19:01:05

people make a lot of bad assumptions about what's immutable and what's unique, especially over time

hiredman19:01:31

a lot of shops just have a rule, every row gets a synthetic key

Cora (she/her)19:01:12

I'd only trust something that's specifically designed to be unique across the domain and has no other meaning. that way the meaning can't change and invalidate your assumptions since it's not tied to anything else

👍 2
Cora (she/her)19:01:39

and storage is cheap these days ¯\(ツ)

Cora (she/her)19:01:49

plus, you can share an arbitrary surrogate key without sharing what could amount to private information (PII or otherwise) which is a handy property to have in a key

marrs22:01:36

"storage is expensive" is usually the first reason given for using surrogate keys in the first place! 😛

Cora (she/her)22:01:36

I've never heard that argument but I guess it must be a thing

Cora (she/her)22:01:02

I suppose indexes of compound primary keys and using them for foreign keys would take up more space

Cora (she/her)22:01:10

so that makes sense

marrs22:01:13

Big indices used to be a concern. Maybe they still are. I've never really used them

marrs22:01:25

I've only ever worked with surrogate primary keys, partly because most interaction was through ORMs, and partly because it was the received wisdom that that's what you do

marrs22:01:18

but yeah, people used to stuff table data into indices to optimise query performance, but the trade-off was that you could fill up your disc space

Cora (she/her)22:01:17

people still do. on postgres, at least, you can do index-only scans with covering indexes

marrs23:01:09

That's good to know. I don't think I've seen anyone optimise a query in at least 10 years

mauricio.szabo02:01:18

@UVDD67FFX weird, I do this all the time. In fact, most problems I solved in the last few years were about poor queries, usually made in ORMs

Cora (she/her)03:01:04

I've had jobs where I do that a lot and jobs where I don't. the different kinds of problems to solve are not evenly distributed

fraxamo08:01:29

If you decide to go down the route of using a surrogate key to identify a record in a table you should also create a UNIQUE constraint against some of the columns (what you refer to as the natural key above) to prevent duplicates creeping in. An automatically-generated surrogate key on its own will not prevent duplicates.

mpenet16:01:16

it's quite common in cassandra schemas for a bunch of reasons

mpenet16:01:45

it's mostly an artifact of how it works internally and access patterns you might need