2025-01-23 datalevin | Clojure Slack Archive

datalevin 2025-01-23

Jeremy 2025-01-23T07:16:15.893689Z

Hi all, is it better to have a parent reference its children with :cardinality/many, or have the children reference their parent with a :parent ref attribute? I'd mostly be looking up the children of a specific parent. Also, writing 560k entities takes up about 250 MB of space. Is this a normal size for that many entities (I know there's currently no compression)? 98% of the entities have about 4 short attributes: valid-from, valid-to, parent-ref, level, value. Lastly, how can I set the initial db size for just one db? I see the *init-db-size* var, but wouldn't it affect all dbs?

Jeremy 2025-01-25T19:04:38.420909Z

Thanks @huahaiy @phill. datalevin is my first datalog db, so I'm still finding best practices

Huahai 2025-01-25T21:47:35.242769Z

We all are 🙂

2025-01-23T08:00:10.806369Z

For me, the parent-child relationship depends on how I transact the data. E.g., the data might come from two sources that get transacted separately, in which case the relation will be attached to the source that gets transacted second. But most of the time it doesn't make much of a difference.

👍 1

Jeremy 2025-01-23T08:29:22.851019Z

Thanks. I have lots of child entities and was wondering if giving them a ref to their parent would result in faster queries. Regarding the db size: is resizing an expensive operation, especially going from, e.g., 100 GB to 200 GB? Does LMDB have to rearrange pages? I expect my db size to grow up to 500 GB and back to 1 GB, once I prune data, many times. Having it resize that often would be quite expensive; I'd much rather keep it at a minimum of 500 GB.

Huahai 2025-01-23T16:46:31.327589Z

{:kv-opts {:mapsize 500000}} pass this as the options when calling get-conn

Huahai 2025-01-23T16:46:50.485149Z

that's 500 GiB.
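
A minimal sketch of how that option might be passed, assuming the three-argument form of get-conn (directory, schema, options). The directory path and the schema are hypothetical and only there to make the call complete; the :mapsize value is the one given above.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema, for illustration only.
(def schema {:parent {:db/valueType :db.type/ref}})

;; Open this one database with a large map size.
;; The option applies only to the connection it is passed to.
(def conn
  (d/get-conn "/tmp/sized-db-example"   ; hypothetical path
              schema
              {:kv-opts {:mapsize 500000}}))

;; ... transact and query as usual ...

(d/close conn)
```

Because the option is passed per connection, other databases opened in the same process keep their own sizes.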

Huahai 2025-01-23T16:47:16.185539Z

When you open the DB, it costs nothing to pass in a bigger size.

Huahai 2025-01-23T16:47:32.768199Z

Nothing is rearranged.

Huahai 2025-01-23T16:48:18.304979Z

mmap needs a size. That's all.

Huahai 2025-01-23T16:50:28.590049Z

Yes, a triple store takes up a lot of space. In the future, even after we have compression, it is still going to take more space than an RDBMS, because a triple store indexes everything by default, whereas an RDBMS indexes nothing by default.

Huahai 2025-01-23T16:58:08.546949Z

So I would recommend always passing in a max size when opening the DB. It costs nothing, but it will save you from having to resize the DB during a transaction, which is very slow.

🧠 1

Huahai 2025-01-23T16:59:32.225549Z

I would recommend not using :cardinality/many as it triggers the slower pathway during query.

👍 1

Huahai 2025-01-23T17:03:13.216459Z

I would prefer to add a reference at the many side of the relationship, i.e., normalize the data as much as possible. Datalevin's query engine is optimized for highly normalized data. For example, Datalevin beats PostgreSQL in querying the Join Order Benchmark data set, which is highly normalized.
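
A rough sketch of what "a reference at the many side" could look like: each child carries a cardinality-one :parent ref, and the children of a parent are found by querying on that attribute. The attribute names :node/name and :level, the data, and the path are assumptions made up for this example.

```clojure
(require '[datalevin.core :as d])

;; Hypothetical schema: the child points at its parent.
(def schema
  {:node/name {:db/valueType :db.type/string
               :db/unique    :db.unique/identity}
   :parent    {:db/valueType :db.type/ref}    ; cardinality-one by default
   :level     {:db/valueType :db.type/long}})

(def conn (d/get-conn "/tmp/parent-child-example" schema)) ; hypothetical path

;; -1 is a temporary id that links the children to the new parent.
(d/transact! conn
             [{:db/id -1 :node/name "root"}
              {:node/name "child-a" :parent -1 :level 1}
              {:node/name "child-b" :parent -1 :level 1}])

;; Find the names of all children of a given parent.
(def children
  (d/q '[:find ?name
         :in $ ?parent-name
         :where
         [?p :node/name ?parent-name]
         [?c :parent ?p]
         [?c :node/name ?name]]
       (d/db conn) "root"))

(d/close conn)
```

The parent entity itself stores nothing about its children here; the relationship lives entirely on the child side.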

Huahai 2025-01-23T17:15:07.225319Z

:cardinality/many violates first normal form (1NF). It is a convenient feature, not a preferred one.

2025-01-23T23:40:40.533929Z

Furthermore, pragmatically, pull queries are more ergonomic when they traverse a relationship that's cardinality-one. If the relationship is cardinality-many (or is a _reverse relationship) you get an array back, which is a headache if you know there will only ever be one member.
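
To illustrate the difference in result shape, here is a small sketch comparing a forward pull over a cardinality-one ref with a reverse pull over the same attribute. The schema, data, and path are hypothetical, and the lookup refs rely on :node/name being declared unique.

```clojure
(require '[datalevin.core :as d])

(def conn
  (d/get-conn "/tmp/pull-example"   ; hypothetical path
              {:node/name {:db/valueType :db.type/string
                           :db/unique    :db.unique/identity}
               :parent    {:db/valueType :db.type/ref}}))

(d/transact! conn
             [{:db/id -1 :node/name "root"}
              {:node/name "child-a" :parent -1}])

(def db (d/db conn))

;; Forward, cardinality-one: :parent comes back as a single nested map.
(def forward
  (d/pull db [:node/name {:parent [:node/name]}] [:node/name "child-a"]))

;; Reverse (:_parent): the result is a collection of maps,
;; even when only one child exists.
(def reverse-pull
  (d/pull db [:node/name {:_parent [:node/name]}] [:node/name "root"]))

(d/close conn)
```

With the forward form you can read (get-in forward [:parent :node/name]) directly; with the reverse form you first have to pick an element out of the collection.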

🤯 1

2025-01-23T07:58:08.647099Z

Database size allocation increases in chunks, something like 250 MB, 1 GB, 10 GB, etc. But that's not the actual storage size.

Ahmed Hassan 2025-02-06T19:58:35.008219Z

> I would prefer to add a reference at the many side of the relationship, i.e normalize the data as much as possible.

How would we use :db/isComponent in this way?

Jeremy 2025-02-07T20:23:53.365849Z

I think you can't. You'd have to manually retract the references. At least that's what I do.
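
One possible reading of "manually retract", sketched under assumptions: before removing a parent, look up the entities that point at it through :parent and retract them in the same transaction. The helper name retract-with-children! is hypothetical, and it only handles one level of nesting.

```clojure
(require '[datalevin.core :as d])

(defn retract-with-children!
  "Hypothetical helper: retracts the parent entity and every entity
   that references it through :parent. Not recursive."
  [conn parent-eid]
  (let [children (d/q '[:find [?c ...]
                        :in $ ?p
                        :where [?c :parent ?p]]
                      (d/db conn) parent-eid)
        tx-data  (conj (mapv (fn [c] [:db/retractEntity c]) children)
                       [:db/retractEntity parent-eid])]
    (d/transact! conn tx-data)))
```

If the children should survive and only the link should go, the transaction could instead contain [:db/retract child-eid :parent parent-eid] entries.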
