architecture

Marius 2023-07-27T18:56:34.696609Z

Hi all, I’d like your opinion on the following approach: I want to store monthly financial data in a relational database. Let’s assume I have to use PostgreSQL. Usually there would be one column per field and some logic to (de)construct the map to/from DB columns. Having quite a number of values (around 80+) in a map hierarchy, being lazy to create a table with 80+ columns, and having and no need (yet) to aggregate numbers in the DB, I was thinking about just flattening each map into multiple rows of column id (type UUID) = some unique id column key (type text) = [:key1 :sub_key1 …] (vector like used for get-in / assoc-in) column value (type text) = (prn-str value) When reading from DB, it would only require to reduce with assoc-in and read-string to get the original map. What I like about this approach is that it’s quite flexible, any map can be stored. Should the map structure change, I can run a SQL migration to update the field names with an UPDATE query to reflect the new structure. What I dislike is that I don’t use the DB as intended, because I am essentially just storing key as string and value as string. I would ignore all benefits of the database, in particular native data types (I store only strings) and aggregation. While typing this it becomes more or less clear, that this is not a good idea, but still I am curious, what is your opinion?

2023-07-29T11:25:01.517359Z

All of these approaches are totally reasonable and the reason Postgres is so popular is that it is this flexible. Wide tables, JSON colums, EDN blobs, KV tables, ... they all have their use cases. And the beauty of Postgres is that you can cover them all with a single tool. And it's good enough for 99% of cases. For many cases starting with any of the approaches is probably fine. Once you learn how it works in practice and what exactly your specific pain points are you can reevaluate.

vemv 2023-07-27T19:08:47.719609Z

What's your opinion on json/jsonb postgresql column types? You'd have a single column to cram the less predictable/structured data

2023-07-27T19:47:40.285519Z

I think thats key here, if your data has a predictable schema, many columns win for me over singular EDN-string or JSON/JSONB columns. JSON/JSONB, for me, are for semi-structured or many-shaped data.

Marius 2023-07-27T19:48:52.130549Z

@vemv I thought about that, but I find that the deserialization from JSON is a bit tricky because the type info is lost. Using (prn-str) and (read-string) I get exactly the same type back.

👍 1
Marius 2023-07-27T19:50:39.732619Z

One could use e.g. Malli coercion afterwards of course

2023-07-27T19:51:25.006759Z

JSON(B)? is queryable in Postgres, thats also something to consider — but still, tables and columns if you can 😉

Marius 2023-07-27T19:54:24.357509Z

@lennart.buit Good point, definitely a plus. But then I was wondering when I would need that? Probably mostly for migrations, to change all JSON blobs at once. But then this is also possible (if not easier) with the columns, as you write, or with the idea I sketched above.

Marius 2023-07-27T19:57:17.340889Z

In case I would go with individual columns, I wonder if it is a good idea to programmatically generate the columns (generate the table schema) using my Malli spec……. seems to be a fragile approach?

2023-07-27T19:57:33.147749Z

I mean, what you say exists, its EAV databases like Datomic. It is an idea that has merit on its own. Might be worth to look those up, see what their ups and downs are

Marius 2023-07-27T19:59:04.925009Z

Yeah I feel like imitating Datomic in PostgreSQL 😄

😅 1
2023-07-27T19:59:54.108899Z

Reddit famously ha(d|s) it, thats why so many things are upvoteable. Anyhow, I’ll stop, this is a distraction 😛

Marius 2023-07-27T20:00:39.053829Z

Sorry, Reddit has what?

2023-07-27T20:01:03.324019Z

They used(?) to have an EAV database schema. Anyhow, I want to steer away from this haha, your usecase sounds structured, so that is where I would go — I’m merely saying that systems exist with less rigidly structured data, and EAV databases can help with that. Their ups and downs are well understood I think ^^.

Marius 2023-07-27T20:09:09.134389Z

Yeah I have to admit that I was not aware of the term EAV anymore, so that pointer is helpful. And yes, you are right, in my case it is fairly structured so I guess going for individual columns would be the right way. Thanks for you time and advice!

👍 1
seancorfield 2023-07-27T20:32:04.025269Z

Another option, if you don't plan to query by column on that data or run numeric computations over it, is to use EDN for the whole map instead of JSON(B). But it really does depend on what you're planning to do with the data in terms of SQL queries.

seancorfield 2023-07-27T20:34:05.762979Z

We have some data that we never query except for one field, so our DB table has: PK, that one field name, and then a TEXT blob containing the EDN. Perfect for our use case. We also have some data that potentially has a lot of columns and we need to query by any/all of them, so we use EAV for that rather than actual DB columns.

Marius 2023-07-28T20:00:16.022199Z

Hi @seancorfield, storing the whole map as EDN in a TEXT blob was in fact my first implementation. Quick and easy, but I am a bit worried about how I would handle future migrations to the map structure, once I have all those EDN blobs in my DB. I guess I could add a version number to the map and then read it accordingly (migrate on the fly, I guess thats how MongoDB and the likes do it, right?) or write a one-time migration script, but it still feels a bit shaky. Therefore reinvented the EAV wheel at first, but now I guess I’ll go back to the roots using good old columns. 😉

Marius 2023-07-31T18:24:53.674399Z

Thanks @hi895, that gives me some confidence. The hardest decisions are often the ones where you have equally good options… 🙂

🙌 1