This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-01-09
Channels
- # announcements (9)
- # beginners (69)
- # cider (4)
- # clj-kondo (8)
- # cljdoc (1)
- # clojure (52)
- # clojure-austin (4)
- # clojure-europe (22)
- # clojure-nl (2)
- # clojure-norway (14)
- # clojure-uk (3)
- # clojurescript (9)
- # conjure (4)
- # cursive (3)
- # datalevin (13)
- # datomic (4)
- # events (2)
- # fulcro (59)
- # graalvm (17)
- # helix (25)
- # inf-clojure (4)
- # integrant (4)
- # introduce-yourself (2)
- # java (5)
- # kaocha (1)
- # leiningen (3)
- # meander (7)
- # nbb (4)
- # off-topic (30)
- # portal (4)
- # rdf (1)
- # reagent (5)
- # sci (1)
- # shadow-cljs (57)
- # sql (8)
- # tools-deps (39)
- # uncomplicate (3)
- # vim (3)
- # xtdb (8)
Are there any opinions here about using PRIMARY KEY
in an RDBMS to actually identify the set of columns (tuple) that identify an individual record? This is what the primary key actually refers to in relational theory but I've never seen it used this way in the wild. Usually, a single id
column is used instead. I'm toying with the idea of using the primary key in this old-fashioned sense, but keeping an id
column for use as a foreign key. I'm using Postgres for my project, but I'm curious about the approach generally.
Postgres supports composite compound primary keys: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-PRIMARY-KEYS
It also supports composite compound foreign keys, so you might not even need that id column: https://www.postgresql.org/docs/current/ddl-constraints.html#DDL-CONSTRAINTS-FK
I just read this in the docs:
> PRIMARY KEY
enforces the same data constraints as a combination of UNIQUE
and NOT NULL
. However, identifying a set of columns as the primary key also provides metadata about the design of the schema, since a primary key implies that other tables can rely on this set of columns as a unique identifier for rows.
I'd like to keep the id
column because its data will be immutable whereas the primary key as I described it (let's call it the natural key from now on) could change its data.
I like surrogate keys better, because sometimes you want to write logic that touches multiple tables, or doesn't care about the table, and that gets a lot harder if every table can have its own type of primary key.
Thanks. I think I'll stick with surrogate primary key for the most part. I'm curious about using a compound primary key if the members of that key are guaranteed to be immutable although I take your point about consistency, @U08JKUHA9.
The problem is that even if a compound (or even non-compound) primary key is immutable but actually represents something about the data, you can have a whole lot of other problems. Sure, the data may be for internal use today, but will it be tomorrow? What if the user wants his data to be deleted from DB, and that primary key is also eligible for deletion, but you can't actually do it because it'll break constraints over the code? And what if the "guaranteed to be immutable" become not that guaranteed tomorrow because of some extreme edge-case? Lots of questions...
I would
people make a lot of bad assumptions about what's immutable and what's unique, especially over time
I'd only trust something that's specifically designed to be unique across the domain and has no other meaning. that way the meaning can't change and invalidate your assumptions since it's not tied to anything else
and storage is cheap these days ¯\(ツ)/¯
plus, you can share an arbitrary surrogate key without sharing what could amount to private information (PII or otherwise) which is a handy property to have in a key
"storage is expensive" is usually the first reason given for using surrogate keys in the first place! 😛
I've never heard that argument but I guess it must be a thing
I suppose indexes of compound primary keys and using them for foreign keys would take up more space
so that makes sense
I've only ever worked with surrogate primary keys, partly because most interaction was through ORMs, and partly because it was the received wisdom that that's what you do
but yeah, people used to stuff table data into indices to optimise query performance, but the trade-off was that you could fill up your disc space
people still do. on postgres, at least, you can do index-only scans with covering indexes
That's good to know. I don't think I've seen anyone optimise a query in at least 10 years
@UVDD67FFX weird, I do this all the time. In fact, most problems I solved in the last few years were about poor queries, usually made in ORMs
I've had jobs where I do that a lot and jobs where I don't. the different kinds of problems to solve are not evenly distributed
If you decide to go down the route of using a surrogate key to identify a record in a table you should also create a UNIQUE constraint against some of the columns (what you refer to as the natural key above) to prevent duplicates creeping in. An automatically-generated surrogate key on its own will not prevent duplicates.