#xtdb
2022-10-09
Christian Pekeler 10:10:32

To store or not to store derived attributes? In the past, when using mutable relational DBs, I’d generally not persist calculated attributes, i.e. values derived from other attributes. For example, I’d store a person’s weight and height, but not their body mass index (BMI). That way, the data would be normalized and changes to the BMI calculation wouldn’t require a migration. I would only make an exception if the calculation was very expensive or if this was for a reporting DB. Now that I’m using an immutable, more document-centric DB and care about prior versions, I have become less sure about this pattern. If I don’t persist the BMI, viewing prior versions of a person is more cumbersome (either for the implementor or the user). And if, say, the history viewer calculates the BMI and shows it, history might look incorrect if the implementation of BMI changed at some point. Instead of also keeping all versions of the BMI function around at runtime, it seems more pragmatic to just explicitly store the BMI value. Also, now that I have versions in my transactional DB, it has become more useful for analytical tasks, so I might not need a separate reporting DB. And if I do reporting against the transactional DB, storing derived values makes a lot of sense. Curious what everyone’s thinking on this subject is.
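
To make the trade-off concrete, here is a minimal sketch in plain Python (not XTDB's API). The two BMI functions, the field names, and the `make_version` helper are all hypothetical; `bmi_v2` merely stands in for "the implementation of BMI changed at some point". It shows how a stored value and a value re-derived from an old version can disagree.

```python
def bmi_v1(weight_kg: float, height_m: float) -> float:
    """Standard BMI: weight divided by height squared."""
    return round(weight_kg / height_m ** 2, 1)


def bmi_v2(weight_kg: float, height_m: float) -> float:
    """Hypothetical later revision of the calculation (assumed for illustration)."""
    return round(1.3 * weight_kg / height_m ** 2.5, 1)


def make_version(weight_kg: float, height_m: float, bmi_fn=None) -> dict:
    """Build one immutable document version; persist the BMI only if bmi_fn is given."""
    doc = {"weight_kg": weight_kg, "height_m": height_m}
    if bmi_fn is not None:
        doc["bmi"] = bmi_fn(weight_kg, height_m)
    return doc


# A version written while bmi_v1 was the deployed calculation.
old_version = make_version(80.0, 1.80, bmi_fn=bmi_v1)

# Later, the deployed calculation is bmi_v2 and a history viewer opens old_version.
stored_view = old_version["bmi"]  # what the user saw when the version was written
derived_view = bmi_v2(old_version["weight_kg"], old_version["height_m"])  # recomputed today

print(stored_view, derived_view)  # the two values differ
```

Storing the value keeps the historical view stable at the cost of extra space and a possible mismatch with the current formula; deriving it keeps the data normalized but lets old versions shift whenever the code does.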

Hukka 08:10:14

We store the data that is shown to the customers, so that we can go back and offer historic reports later. But our case is a reporting system, so it might be crucial. If we didn't have bitemporality, we would store them in tables next to each other, but that makes it a bit more cumbersome at this point, when nobody has actually requested it yet.

👍 2
Hukka 08:10:04

Spinning up old versions of the code is not really viable, since so many other things might have changed.

Hukka 08:10:22

I could see doing it for some really special-case forensics, but not as part of normal features.

malcolmsparks 11:10:50

It's partly a classic time/space tradeoff. I would urge pragmatism. I'd err on the side of storing derived data in many cases, such as account balances, though arguably discrepancies might occur.

👍 1
cjohansen 20:10:56

I avoid storing derived data as much as possible. I much prefer the flexibility of deriving in code. I get the point about being able to “correctly” view historic data, but since your code lives its life separate from your data, I don’t think it’s practical to fully solve this problem by storing derived data. In my apps, I transact the git sha of the code during startup. This way, you can grab a snapshot of the database and know what exact version of your codebase was relevant at the time. Something like that could be used to build an accurate historic reporting service.

💡 1
👍 3
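
The idea of transacting the git sha at startup can be sketched roughly as follows. This is plain Python with an in-memory list standing in for the database, not XTDB's API; `record_startup`, `sha_at`, and the example shas are hypothetical names chosen for illustration. Only `git rev-parse HEAD` is a real git command.

```python
import bisect
import subprocess
from datetime import datetime, timezone


def current_git_sha() -> str:
    """Ask git which commit the running code was started from."""
    return subprocess.check_output(["git", "rev-parse", "HEAD"], text=True).strip()


# In-memory stand-in for the database: (started_at, sha) pairs, kept sorted by time.
deployments: list[tuple[datetime, str]] = []


def record_startup(sha: str, started_at: datetime) -> None:
    """Record which commit went live at started_at (would be a transaction in a real DB)."""
    deployments.append((started_at, sha))
    deployments.sort(key=lambda entry: entry[0])


def sha_at(when: datetime):
    """Return the sha that was live at `when`, or None if nothing was recorded yet."""
    times = [started_at for started_at, _ in deployments]
    idx = bisect.bisect_right(times, when) - 1
    return deployments[idx][1] if idx >= 0 else None


# Example with made-up shas; a real app would call
# record_startup(current_git_sha(), datetime.now(timezone.utc)) on boot.
record_startup("abc123", datetime(2022, 1, 1, tzinfo=timezone.utc))
record_startup("def456", datetime(2022, 6, 1, tzinfo=timezone.utc))

print(sha_at(datetime(2022, 3, 1, tzinfo=timezone.utc)))
```

Given a snapshot time, a reporting service could look up the matching sha and then decide which version of a calculation to apply, which is the part that stays hard in practice.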