Does anybody have a good set of techniques for "tagging" entities?
I'm doing a data import from another application into a Datomic DB, and I'm a little uncertain that I've got all of my bases covered when it comes to mapping the source database's values onto my own entities.
I was thinking an attribute like :tag/bad-import would be a handy thing to append to the entity when I'm transacting so I've got a really easy thing to query for when checking the import.
But if y'all have a different and/or better system, or some experience reports around what worked and didn't work for you in prod, I'd love to hear it.
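For concreteness, here is a minimal sketch of what the tag idea could look like as plain Datomic schema and query data. Only `:tag/bad-import` comes from the question; making it a boolean, and the names `tag-schema` and `flagged-query`, are assumptions for illustration.

```clojure
;; Assumed schema: :tag/bad-import modeled as a boolean flag.
(def tag-schema
  [{:db/ident       :tag/bad-import
    :db/valueType   :db.type/boolean
    :db/cardinality :db.cardinality/one
    :db/doc         "True when the import could not fully map this entity."}])

;; Query data that finds every flagged entity. Pass it, together with a
;; db value, to the `q` function of whichever Datomic API you use.
(def flagged-query
  '[:find ?e
    :where [?e :tag/bad-import true]])
```

Both values are ordinary data, so they can be inspected at the REPL before anything is transacted.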
How do you know something is a :tag/bad-import? Is it when an import job fails in the middle? Or when the conversion from the external entity types to your own may be shoddy?
I'm not sure how closely related this is but it seems adjacent and is fresh on my mind
I've been working on importing some data that is essentially a bunch of records with strings for all the values, even though they are often semantically numbers/instants/enums/etc and there's no real safe way to parse them. There's also a lot of attributes in the source data, and I only need a small subset and I don't really want to have to make decisions about everything up front. So I've just been importing ALL the raw data into Datomic as strings ("automatically" deriving schema for the attributes), e.g.
{:raw.RecordType/name "product1" :raw.RecordType/price "123.00" ...}
Then I have a higher level where, when I actually want to use an attribute in my app, I add a corresponding attribute, manually adding schema for it with the type I actually want (even if I still just want a string):
{:app.RecordType/name "product1" :app.RecordType/price 123.00M ...}
And then I have a job that goes through the raw data and transacts the derived app level data.
This way I have all the raw data dumped into Datomic, which is itself very useful to explore, but I know my app layer is fully curated and all my assumptions can be checked during the raw->app derivation process.
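A rough sketch of what one step of that raw->app derivation might look like, as a pure function that can be tried without a database. The `derivations` table and `derive-app-entity` are hypothetical names, and collecting parse failures under `:errors` is just one assumed way to check assumptions during the job.

```clojure
;; Hypothetical mapping: raw attribute -> [app attribute, parse fn].
(def derivations
  {:raw.RecordType/name  [:app.RecordType/name  identity]
   :raw.RecordType/price [:app.RecordType/price #(BigDecimal. ^String %)]})

(defn derive-app-entity
  "Returns {:tx-map ... :errors [...]} for one raw entity map.
  Only attributes listed in `derivations` are carried over, and a parse
  failure is recorded instead of aborting the whole job."
  [raw]
  (reduce-kv
    (fn [acc raw-attr [app-attr parse]]
      (if-some [v (get raw raw-attr)]
        (try
          (assoc-in acc [:tx-map app-attr] (parse v))
          (catch Exception e
            (update acc :errors conj
                    {:attr raw-attr :value v :error (ex-message e)})))
        acc))
    {:tx-map {} :errors []}
    derivations))
```

The `:tx-map` is what the job would transact; anything that lands in `:errors` points at a raw value that broke an assumption about its type.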
I think this is something like the https://dataengineering.wiki/Concepts/Data+Architecture/Medallion+Architecture
I’ve done similar things with a :system/admin-notes attribute before. It was definitely helpful when trying to figure out what happened with a particular entity.
@jjttjj
> when the conversion from the external entity types to your own may be shoddy?
This is exactly the situation: there are entities in the source database with attributes which I'm pretty sure I have mapped over to my own via keywords, but there may be attributes out there that I haven't accounted for, and if one sneaks into my database, I want a super obvious indicator that there's an entity with an attribute in there that I haven't mapped onto my own model.
Starting with the raw strings and then gradually refining up to a fully curated and typed set of data when the set of raw strings becomes more or less "known" is an approach I hadn't considered at all. Cool way to both ingest data and not prematurely reject something out of hand because it doesn't map onto your early, preconceived notion of what this external data source contains.
> And then I have a job that goes through the raw data and transacts the derived app level data.
I'm using Conformity for exactly this, and it has worked quite well: https://github.com/qtrfeast/conformity
> I think this is something like the https://dataengineering.wiki/Concepts/Data+Architecture/Medallion+Architecture
Super cool concept with Medallion Architecture! I had not heard of that before, and it will be a great model to inform my thinking about these kinds of tasks.
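To make the "super obvious indicator" idea concrete, here is a small sketch of flagging unmapped source attributes while building the transaction map. Everything except `:tag/bad-import` is hypothetical: the `source->attr` mapping, the `:product/*` attributes, and `:tag/unmapped-attrs` (assumed to be a cardinality-many string attribute) are made up for illustration.

```clojure
;; Hypothetical mapping from source keys to attributes in my own model.
(def source->attr
  {"name"  :product/name
   "price" :product/price})

(defn source-record->tx-map
  "Builds a transaction map from one source record. Keys missing from
  `source->attr` are not imported; instead the entity is tagged with
  :tag/bad-import and the offending keys are recorded on it."
  [record]
  (let [unmapped (remove #(contains? source->attr %) (keys record))
        mapped   (into {}
                       (keep (fn [[k v]]
                               (when-let [attr (source->attr k)]
                                 [attr v])))
                       record)]
    (cond-> mapped
      (seq unmapped) (assoc :tag/bad-import true
                            :tag/unmapped-attrs (set unmapped)))))
```

Recording which keys were unmapped, not just the flag, means checking the import can say why an entity was tagged as well as which entities were.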