Hi all,
Is it better to have a parent reference its children with :db.cardinality/many, or have the children reference their parent with a :parent ref attribute?
I'd mostly be looking up the children of a specific parent.
Also, writing 560k entities takes up about 250 MB of space. Is this a normal size for that many entities (I know there's currently no compression)? 98% of the entities have about 4 short attributes: valid-from, valid-to, parent-ref, level, value.
Lastly, how can I set the initial db size for just one db? I see the *init-db-size* var, but wouldn't it affect all dbs?
We all are 🙂
For me, the direction of a parent-child relationship depends on how I transact the data. E.g., the data might come from two sources that get transacted separately, in which case the relation is attached to the source that gets transacted second. But most of the time it doesn't make much of a difference.
Thanks. I have lots of child entities and was wondering if making them child refs would result in faster queries. Regarding the db size: is resizing an expensive operation, especially going from, e.g., 100 GB to 200 GB? Does LMDB have to rearrange pages? I expect my db size to grow up to 500 GB and shrink back to 1 GB once I prune data, many times over. Having it resize that often would be quite expensive; I'd much rather keep it at a minimum of 500 GB.
Pass {:kv-opts {:mapsize 500000}} as the option when you call get-conn.
That's about 500 GiB (the value is in MiB).
When you open the DB, it costs nothing to pass in a bigger size.
Nothing is rearranged.
mmap needs a size. That's all.
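For reference, here is a minimal sketch of opening a connection with a large map size. The directory path and the empty schema are placeholders, and the comment about units assumes :mapsize is given in MiB. Since the option is passed per get-conn call, it should only affect this one database rather than every DB in the process.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical directory; replace with your own path.
(def db-dir "/tmp/datalevin-big-db")

;; :mapsize is assumed to be in MiB, so 500000 is roughly 500 GiB.
;; This sets the size of the memory map up front; per the messages above,
;; opening with a bigger size costs nothing and nothing is rearranged.
(def conn
  (d/get-conn db-dir {} {:kv-opts {:mapsize 500000}}))

;; ... transact and query as usual ...

(d/close conn)
```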
Yes, a triple store takes up a lot of space. In the future, even after we have compression, it is still going to take more space than an RDBMS, because a triple store indexes everything by default, whereas an RDBMS indexes nothing by default.
See also https://kb.symas.com/design/lmdb-database-file-sizes-and-memory-utilization
So I would recommend always passing in a max size when opening the DB. It costs nothing, and it saves you from having to resize the DB during a transaction, which is very slow.
I would recommend not using :db.cardinality/many, as it triggers the slower pathway during query.
I would prefer to add a reference at the many side of the relationship, i.e., normalize the data as much as possible. Datalevin's query engine is optimized for highly normalized data. For example, Datalevin beats PostgreSQL in querying the Join Order Benchmark data set, which is highly normalized.
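A minimal sketch of what "reference at the many side" could look like: each child carries a cardinality-one ref to its parent, and the children of a specific parent are found with a plain join. The :node/* attribute names, the sample data, and the temp directory are all made up for illustration.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical attribute names, loosely following the attributes in the question.
(def schema
  {:node/name   {:db/valueType :db.type/string
                 :db/unique    :db.unique/identity}
   ;; Each child points at its parent; cardinality defaults to one.
   :node/parent {:db/valueType :db.type/ref}
   :node/level  {:db/valueType :db.type/long}
   :node/value  {:db/valueType :db.type/string}})

;; Fresh throwaway directory so the example can be re-run.
(def conn
  (d/get-conn (str "/tmp/datalevin-parent-ref-" (java.util.UUID/randomUUID))
              schema))

;; Negative :db/id values are temp ids that tie children to the parent.
(d/transact! conn
  [{:db/id -1 :node/name "root" :node/level 0 :node/value "r"}
   {:db/id -2 :node/name "a" :node/parent -1 :node/level 1 :node/value "x"}
   {:db/id -3 :node/name "b" :node/parent -1 :node/level 1 :node/value "y"}])

;; Children of one specific parent, looked up by the parent's name.
(def children
  (d/q '[:find ?name ?value
         :in $ ?parent-name
         :where
         [?parent :node/name ?parent-name]
         [?child :node/parent ?parent]
         [?child :node/name ?name]
         [?child :node/value ?value]]
       (d/db conn) "root"))

(d/close conn)
```

Here children holds one tuple per child of "root".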
:db.cardinality/many violates first normal form (1NF). It is a convenient feature, not a preferred one.
Furthermore, pragmatically, pull queries are more ergonomic when they traverse a relationship that's cardinality-one. If the relationship is cardinality-many (or is a _reverse relationship) you get an array back, which is a headache if you know there will only ever be one member.
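To illustrate the pull ergonomics, here is a small sketch that pulls the same relationship in both directions. The :node/* names and the data are hypothetical; the point is only the shape of the nested result.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: a child holds a cardinality-one ref to its parent.
(def schema
  {:node/name   {:db/valueType :db.type/string
                 :db/unique    :db.unique/identity}
   :node/parent {:db/valueType :db.type/ref}})

(def conn
  (d/get-conn (str "/tmp/datalevin-pull-demo-" (java.util.UUID/randomUUID))
              schema))

(d/transact! conn
  [{:db/id -1 :node/name "root"}
   {:db/id -2 :node/name "a" :node/parent -1}])

(def db (d/db conn))

;; Resolve entity ids by name with scalar find specs.
(def child-id (d/q '[:find ?e . :where [?e :node/name "a"]] db))
(def root-id  (d/q '[:find ?e . :where [?e :node/name "root"]] db))

;; Forward, cardinality-one: the nested parent comes back as a single map.
(def forward
  (d/pull db [:node/name {:node/parent [:node/name]}] child-id))

;; Reverse (_parent): the nested value is a collection, even for one child.
(def reverse-pull
  (d/pull db [:node/name {:node/_parent [:node/name]}] root-id))

(d/close conn)
```

With the forward pull you can read (:node/parent forward) directly; with the reverse pull you have to unwrap a collection first.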
Database size allocation increases in chunks, something like 250 MB, 1 GB, 10 GB, etc. But that's not the actual storage size.
> I would prefer to add a reference at the many side of the relationship, i.e., normalize the data as much as possible.
How would we use :db/isComponent in this way?
I think you can't. You'd have to manually retract the references. At least that's what I do.
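One way the manual cleanup could look, as a rough sketch: look up the children that point at a parent and retract them in the same transaction as the parent. The helper name retract-with-children! and the :node/* attributes are made up, and it only goes one level deep, so a deeper tree would need recursion.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: children reference their parent via :node/parent.
(def schema
  {:node/name   {:db/valueType :db.type/string
                 :db/unique    :db.unique/identity}
   :node/parent {:db/valueType :db.type/ref}})

(def conn
  (d/get-conn (str "/tmp/datalevin-retract-demo-" (java.util.UUID/randomUUID))
              schema))

(d/transact! conn
  [{:db/id -1 :node/name "root"}
   {:db/id -2 :node/name "a" :node/parent -1}
   {:db/id -3 :node/name "b" :node/parent -1}])

(defn retract-with-children!
  "Hypothetical helper: retracts the parent and its direct children
  (one level only) in a single transaction."
  [conn parent-id]
  (let [child-ids (d/q '[:find [?c ...]
                         :in $ ?p
                         :where [?c :node/parent ?p]]
                       (d/db conn) parent-id)
        tx        (mapv (fn [e] [:db/retractEntity e])
                        (conj (vec child-ids) parent-id))]
    (d/transact! conn tx)))

(def root-id
  (d/q '[:find ?e . :where [?e :node/name "root"]] (d/db conn)))

(retract-with-children! conn root-id)

;; Names still present after the retraction.
(def remaining
  (d/q '[:find [?n ...] :where [_ :node/name ?n]] (d/db conn)))

(d/close conn)
```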