#architecture
2023-07-27
Marius18:07:34

Hi all, I’d like your opinion on the following approach: I want to store monthly financial data in a relational database. Let’s assume I have to use PostgreSQL. Usually there would be one column per field and some logic to (de)construct the map to/from DB columns. Having quite a number of values (around 80+) in a map hierarchy, being too lazy to create a table with 80+ columns, and having no need (yet) to aggregate numbers in the DB, I was thinking about just flattening each map into multiple rows of:
- column id (type UUID) = some unique id
- column key (type text) = [:key1 :sub_key1 …] (a vector as used for get-in / assoc-in)
- column value (type text) = (prn-str value)
When reading from the DB, it only takes a reduce with assoc-in and read-string to get the original map back. What I like about this approach is that it’s quite flexible: any map can be stored. Should the map structure change, I can run a SQL migration with an UPDATE query to rewrite the field names to reflect the new structure. What I dislike is that I don’t use the DB as intended, because I am essentially just storing key as string and value as string. I would forgo all benefits of the database, in particular native data types (I store only strings) and aggregation. While typing this it becomes more or less clear that this is not a good idea, but I am still curious: what is your opinion?
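A minimal sketch of that flattening scheme in plain Clojure (the names flatten-map and restore-map are mine, not from any library): nested maps become rows of [key-path value-string], and a reduce with assoc-in plus clojure.edn/read-string rebuilds the original. The id column is left out since it doesn't affect the round trip.

```clojure
(require '[clojure.edn :as edn])

(defn flatten-map
  "Flatten a nested map into rows of [key-path (prn-str value)],
   i.e. the proposed key/value columns."
  ([m] (flatten-map [] m))
  ([path m]
   (mapcat (fn [[k v]]
             (if (map? v)
               (flatten-map (conj path k) v)
               [[(conj path k) (prn-str v)]]))
           m)))

(defn restore-map
  "Rebuild the original map from the flattened rows."
  [rows]
  (reduce (fn [acc [path s]]
            (assoc-in acc path (edn/read-string s)))
          {}
          rows))

(def finances {:income   {:salary 4200.0}
               :expenses {:rent 1300.0 :food 450.0}})

(= finances (restore-map (flatten-map finances)))
;; => true
```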

vemv19:07:47

What's your opinion on json/jsonb postgresql column types? You'd have a single column to cram the less predictable/structured data

Lennart Buit19:07:40

I think that’s key here: if your data has a predictable schema, many columns win for me over a single EDN-string or JSON/JSONB column. JSON/JSONB, for me, is for semi-structured or many-shaped data.

Marius19:07:52

@U45T93RA6 I thought about that, but I find that the deserialization from JSON is a bit tricky because the type information is lost. Using (prn-str) and (read-string) I get exactly the same types back.
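To illustrate the point with some made-up values: a round trip through prn-str and clojure.edn/read-string preserves keywords, ratios, and sets, which plain JSON would collapse into strings and arrays. (clojure.edn/read-string is the safer choice over clojure.core/read-string, since it never evaluates anything.)

```clojure
(require '[clojure.edn :as edn])

(def v {:rate    1/3
        :tags    #{:rent :utilities}
        :booked? true})

;; The EDN round trip returns structurally identical data,
;; with every type intact.
(= v (edn/read-string (prn-str v)))
;; => true
```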

👍 1
Marius19:07:39

One could use e.g. Malli coercion afterwards of course
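A sketch of that with Malli (the schema and values here are invented for illustration): m/decode with malli.transform/string-transformer coerces the stringly-typed values that a JSON deserializer hands back.

```clojure
(require '[malli.core :as m]
         '[malli.transform :as mt])

;; Hypothetical schema for one entry of the financial data.
(def Entry
  [:map
   [:amount :double]
   [:category :keyword]])

;; What JSON parsing (with keywordized keys) might return: string values.
(m/decode Entry {:amount "12.5" :category "rent"} mt/string-transformer)
;; => {:amount 12.5, :category :rent}
```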

Lennart Buit19:07:25

JSON(B) is queryable in Postgres, that’s also something to consider. But still: tables and columns if you can 😉

Marius19:07:24

@UDF11HLKC Good point, definitely a plus. But then I was wondering: when would I actually need that? Probably mostly for migrations, to change all JSON blobs at once. But that is also possible (if not easier) with the columns, as you write, or with the idea I sketched above.

Marius19:07:17

In case I would go with individual columns, I wonder if it is a good idea to programmatically generate the columns (generate the table schema) from my Malli spec… Seems like a fragile approach?
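Whether generating DDL is fragile depends mostly on how stable the schema is. As a dependency-free illustration (using a plain field→column-type map as a stand-in for a real Malli schema; the table and field names here are hypothetical):

```clojure
(require '[clojure.string :as str])

;; Stand-in for a Malli schema: field name -> Postgres column type.
(def fields
  {:amount    "numeric(12,2)"
   :category  "text"
   :booked_at "date"})

(defn create-table-ddl
  "Render a CREATE TABLE statement from the field description."
  [table fields]
  (str "CREATE TABLE " (name table)
       " (id uuid PRIMARY KEY, "
       (str/join ", " (for [[k t] fields]
                        (str (name k) " " t)))
       ")"))

(create-table-ddl :monthly_finances fields)
```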

Lennart Buit19:07:33

I mean, what you describe exists: it’s EAV databases like Datomic. It is an idea that has merit on its own

Marius19:07:04

Yeah I feel like imitating Datomic in PostgreSQL 😄

😅 1
Lennart Buit19:07:54

Reddit famously ha(d|s) it, that’s why so many things are upvoteable. Anyhow, I’ll stop, this is a distraction 😛

Marius20:07:39

Sorry, Reddit has what?

Lennart Buit20:07:03

They used(?) to have an EAV database schema. Anyhow, I want to steer away from this haha, your use case sounds structured, so that is where I would go — I’m merely saying that systems exist with less rigidly structured data, and EAV databases can help with that. Their ups and downs are well understood, I think ^^.

Marius20:07:09

Yeah, I have to admit I had forgotten the term EAV, so that pointer is helpful. And yes, you are right, in my case it is fairly structured, so I guess going for individual columns would be the right way. Thanks for your time and advice!

👍 2
seancorfield20:07:04

Another option, if you don't plan to query by column on that data or run numeric computations over it, is to use EDN for the whole map instead of JSON(B). But it really does depend on what you're planning to do with the data in terms of SQL queries.
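A sketch of that single-blob layout (the row shape and names are my assumption, not from the thread): a primary key, one queryable field, and a TEXT column holding the whole map as EDN.

```clojure
(require '[clojure.edn :as edn])

(defn ->row
  "Serialize a data map for a table with columns id, month, data,
   where month is the one field we still query by."
  [id month data]
  {:id id :month month :data (pr-str data)})

(defn row->data
  "Deserialize the EDN blob back into the original map."
  [row]
  (edn/read-string (:data row)))

(def m {:income {:salary 4200.0} :expenses {:rent 1300.0}})

(= m (row->data (->row 1 "2023-07" m)))
;; => true
```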

seancorfield20:07:05

We have some data that we never query except for one field, so our DB table has: PK, that one field name, and then a TEXT blob containing the EDN. Perfect for our use case. We also have some data that potentially has a lot of columns and we need to query by any/all of them, so we use EAV for that rather than actual DB columns.

Marius20:07:16

Hi @U04V70XH6, storing the whole map as EDN in a TEXT blob was in fact my first implementation. Quick and easy, but I am a bit worried about how I would handle future migrations of the map structure once I have all those EDN blobs in my DB. I guess I could add a version number to the map and then read it accordingly (migrate on the fly — I guess that’s how MongoDB and the likes do it, right?) or write a one-time migration script, but it still feels a bit shaky. Therefore I reinvented the EAV wheel at first, but now I guess I’ll go back to the roots using good old columns. 😉
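The migrate-on-the-fly idea can be sketched like this (the version numbers and the example migration are invented): every stored map carries a :version, and a chain of migration functions upgrades any old blob as it is read.

```clojure
;; One migration fn per version it upgrades *from*;
;; each must bump :version, or migrate would loop forever.
(def migrations
  {1 (fn [m] (-> m (assoc :currency "EUR") (assoc :version 2)))})

(defn migrate
  "Apply migrations until the map's version has no successor."
  [m]
  (if-let [f (migrations (:version m 1))]
    (recur (f m))
    m))

(migrate {:version 1 :rent 1300.0})
;; => {:version 2, :rent 1300.0, :currency "EUR"}
```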

Jorin11:07:01

All of these approaches are totally reasonable, and the reason Postgres is so popular is that it is this flexible. Wide tables, JSON columns, EDN blobs, KV tables, … they all have their use cases. And the beauty of Postgres is that you can cover them all with a single tool, and it’s good enough for 99% of cases. For many cases, starting with any of the approaches is probably fine. Once you learn how it works in practice and what exactly your specific pain points are, you can reevaluate.

Marius18:07:53

Thanks @U8ZN5EHGU, that gives me some confidence. The hardest decisions are often the ones where you have equally good options… 🙂

🙌 2