#datahike
2023-01-20
Francesco Pischedda12:01:09

Hi there! 🙂 I have started using datahike recently and I am enjoying it a lot, thanks for the great work! Now that my little project is growing I am starting to change the schema (so far only adding things, no changes to the existing schema) and I am not sure how to handle incremental changes properly. Is there a common approach to handling schema changes in an existing db?

Francesco Pischedda12:01:32

Currently I keep all the bits of the current schema in separate pieces in https://github.com/fpischedda/unrefined/blob/main/src/clj/fpsd/unrefined/persistence/datahike/schema.clj and concat them all into a vector in https://github.com/fpischedda/unrefined/blob/main/src/clj/fpsd/unrefined/persistence/datahike.clj#L18, which works fine when starting from scratch. Now that I have to make incremental changes to an existing db, this is clearly not possible anymore. For future changes I was thinking of keeping track of the changes applied so far (both in the db and in code); at startup the application would verify whether the db is up to date with what the code expects and, if some changes are missing, apply them. Or maybe something more manual, run with bb. Or maybe I am just overcomplicating everything and there is an easy solution to this problem 🙂
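Roughly, the current setup looks like this; a minimal sketch, assuming Datahike's d/transact API, with illustrative attribute names rather than the project's real schema:

(require '[datahike.api :as d])

;; schema kept in separate pieces...
(def ticket-schema
  [{:db/ident       :ticket/id
    :db/valueType   :db.type/string
    :db/unique      :db.unique/identity
    :db/cardinality :db.cardinality/one}])

(def user-schema
  [{:db/ident       :user/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; ...concatenated into one vector and transacted at startup;
;; fine from scratch, but it says nothing about an existing db
(def full-schema (vec (concat ticket-schema user-schema)))

(defn init-schema! [conn]
  (d/transact conn full-schema))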

timo21:01:34

Hi @U0165ADKUDU, I am working on a migration tool right now; @konrad.kuehne has almost finished it and I am testing it. It needs documentation, but if you are up for trying things out I can throw it over the fence to you... maybe you have good ideas on what could be better.

Francesco Pischedda15:01:07

Hi @timok! This is great to hear, looking forward to trying it! 🙂 But at the same time, yesterday evening I started working on my own (naive) version of a migration tool for my little project; I will probably continue with it just as an exercise 🙂 The basic idea is:
• somewhere I maintain a list of changes I want to apply to a db; changes are written manually, with no automatic tool to generate them
• I feed the list to the "migrator"
• it compares the list with what has already been applied to the db
◦ in the happy path it applies the changes and records the migration
◦ in case of errors (the applied-migrations history is not aligned with the app's "view" of the migrations), it reports them and stops
This process can happen either at application startup time (the current implementation) or outside the main app process (e.g. bb scripts); a rough sketch is below. I am curious to see how you have approached the problem!
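Everything here is hypothetical: the :migration/id attribute, the shape of the migrations vector, and the helper names are illustrative, not the actual code:

(require '[datahike.api :as d])

;; assumes an attribute like :migration/id has been transacted up
;; front so applied migrations can be recorded and queried
(def migration-schema
  [{:db/ident       :migration/id
    :db/valueType   :db.type/keyword
    :db/unique      :db.unique/identity
    :db/cardinality :db.cardinality/one}])

(def migrations
  ;; hand-written, ordered list of changes (tx-data shortened)
  [{:id :add-ticket-id
    :tx-data [{:db/ident       :ticket/id
               :db/valueType   :db.type/string
               :db/unique      :db.unique/identity
               :db/cardinality :db.cardinality/one}]}])

(defn applied-ids
  "Set of migration ids already recorded in the db."
  [db]
  (set (d/q '[:find [?id ...]
              :where [_ :migration/id ?id]]
            db)))

(defn migrate!
  "Apply every migration not yet recorded, recording each one in the
  same transaction that applies it."
  [conn migrations]
  (let [done (applied-ids @conn)]
    (doseq [{:keys [id tx-data]} migrations
            :when (not (done id))]
      (d/transact conn (conj tx-data {:migration/id id})))))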

Francesco Pischedda15:01:27

Hello @timok and @konrad.kuehne 👋 Today I finished my initial version of the migration tool for my app; it can be seen in this commit: https://github.com/fpischedda/unrefined/pull/40/commits/a524e49aff760fb73b61db99a48931a54007eb21 The migrator ns handles the core logic and the datahike ns takes care of applying the changes, if any, when its subsystem is started. The idea is that the application keeps track of its migrations in a vector; these migrations are compared with what is already in the db, and any that are missing are applied. As you can see this is very naive, not unit-tested at all, and probably too specific to my needs. Curious to see your migration tool!

timo18:01:55

Hi @U0165ADKUDU, I need a day or two to open the PR but happy to talk about it then 👍

🙏 2
kkuehne08:01:39

Very nice approach. If you like, I could add some comments there.

Francesco Pischedda08:01:52

> Very nice approach.
Thanks! 🙂
> If you like, I could add some comments there.
Yes, please 🙏

timo09:01:01

@U0165ADKUDU here you are with our solution: https://github.com/replikativ/datahike/pull/598 Feel free to comment 👍

👀 2
Francesco Pischedda19:01:08

I feel I am missing a bit of nomenclature here, and experience with datomic 🙂
• what does "norm" stand for?
• the discussion linked in the PR mentions datomic-specific stuff in stork and conformity; can someone briefly describe what should be ported to datahike?
On the plus side:
• transaction data as edn is clearly a good choice
• the code of the PR looks simple and direct enough
I feel I need a bit more context, I'll try to collect the missing info 🙂

Francesco Pischedda19:01:50

questions:
• why is a :norm key needed in the transaction edn?
• why isn't it unique?
• isn't the filename enough?
• maybe it would be more helpful to have a :description key instead

Francesco Pischedda07:01:13

> isn't the filename enough?
ah, I had missed https://github.com/replikativ/datahike/pull/598/files#diff-2f747a48a15846b2f6af676194a5da9552437e28558e46ec718d5916abdc7089R38 earlier 🤦 yeah, it makes sense to use the filename if the :norm key is not specified 👍 What if the filename contains spaces? keyword does not complain, but the result looks weird:

user> (keyword "with spaces")
:with spaces
Also nice that https://github.com/replikativ/datahike/pull/598/files#diff-2f747a48a15846b2f6af676194a5da9552437e28558e46ec718d5916abdc7089R57 makes it possible to feed a sequence of migrations instead of reading them from the resources. In my implementation I also stored the timestamp of when each norm was applied, together with what was applied, to help debug issues with migrations; do you think that would make sense? Another feature that I think is important is to ensure that the migrations someone wants to apply are in line with what has already been recorded; do you think it would make sense to add that as well?
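To make the filename/:norm discussion concrete, I imagine a norm file looking something like this; only the :norm key comes from the PR, the :tx-data key and the content are my guesses:

;; e.g. resources/migrations/001-add-ticket-id.edn
{:norm :add-ticket-id
 :tx-data [{:db/ident       :ticket/id
            :db/valueType   :db.type/string
            :db/unique      :db.unique/identity
            :db/cardinality :db.cardinality/one}]}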

timo08:01:04

Adding a timestamp is redundant because you always have a timestamp with your transactions anyway or am I missing something?
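Something like this should recover when a migration was applied, assuming a (made-up) :migration/id attribute marks the migration transaction; :db/txInstant is added to every transaction anyway:

(require '[datahike.api :as d])

(d/q '[:find ?id ?applied-at
       :where
       [?e :migration/id ?id ?tx]
       [?tx :db/txInstant ?applied-at]]
     @conn)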

Francesco Pischedda08:01:20

you are totally right! I am going to remove it from my implementation as well 🙂

timo08:01:46

I don't quite get what you mean by 'in line' in your last question. If you mean something like serialization, then yes, that totally makes sense: in case someone creates the db from scratch, the migrations need to be applied in a certain order. I need to think about it, but we will have to compile the norms first and transact them in order... 🤔
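Maybe something along these lines for the ordering, reading the norm files sorted by filename; the directory layout and the helper are just a sketch:

(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(defn ordered-norms
  "Read every *.edn norm file under dir, sorted by filename, so that
  e.g. 001-base.edn is always transacted before 002-users.edn."
  [dir]
  (->> (file-seq (io/file dir))
       (filter #(re-find #"\.edn$" (.getName ^java.io.File %)))
       (sort-by #(.getName ^java.io.File %))
       (map (comp edn/read-string slurp))))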

Francesco Pischedda08:01:52

Yes, I am thinking about that use case; another concern I have is that multiple people may be working on the same project and adding migrations in two different branches, so there is a chance of conflicting transactions. Another source of problems could be running some migrations from your machine against a staging or production db and then forgetting to commit the migration itself; this happened at $work before we introduced automatic migrations during deployments, and it is probably an edge case of our "peculiar" setup at that time 🙂

Francesco Pischedda08:01:35

(using a relational db and the rambler migration tool, nothing related to datahike or other datalog dbs)

timo08:01:54

Some mistakes cannot be remedied by a migration tool. But as a linux user I am not used to filenames with spaces, so that is a good point. Thanks for your help 👍
