2020-08-17 datalog | Clojure Slack Archive

Ben Sless 2020-08-17T06:51:37.125Z

Hello, how should I model the schema of entities whose identity is predicated on several attributes, some of which are optional? Assuming the attributes are :a0, :a1, :opt0, :opt1, the following are valid distinct entities:

{:a0 0 :a1 1}, {:a0 0 :a1 1 :opt0 0}, {:a0 0 :a1 1 :opt1 1}, {:a0 0 :a1 1 :opt0 0 :opt1 1}

Huahai 2020-08-19T18:15:41.127200Z

In systems that are similar to Datomic (Datascript, Datalevin, Datahike), the entity id is nothing but a system generated integer, so you really don’t need anything special to model “entities whose identity is predicated on several attributes”, because that’s already the case.

Huahai 2020-08-19T18:19:23.127400Z

you can directly transact this data (d/transact conn [{:db/id -1 :a0 0 :a1 1}, {:db/id -2 :a0 0 :a1 1 :opt0 0}, {:db/id -3 :a0 0 :a1 1 :opt1 1}, {:db/id -4 :a0 0 :a1 1 :opt0 0 :opt1 1}]) in Datascript, Datalevin or Datahike, and it will work, as schema is optional for these systems.
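A minimal sketch of the transaction above, assuming DataScript (`datascript.core`) is on the classpath. Note that DataScript names the function `d/transact!`. The two queries are illustrative additions: the first matches all four entities by their required attributes, the second uses DataScript's built-in `missing?` predicate to pick out the entity that carries neither optional attribute.

```clojure
(require '[datascript.core :as d])

;; Schema-less, in-memory connection.
(def conn (d/create-conn))

;; Negative :db/id values are temporary ids; each map becomes its own entity.
(d/transact! conn
  [{:db/id -1 :a0 0 :a1 1}
   {:db/id -2 :a0 0 :a1 1 :opt0 0}
   {:db/id -3 :a0 0 :a1 1 :opt1 1}
   {:db/id -4 :a0 0 :a1 1 :opt0 0 :opt1 1}])

;; Every entity that has the required attributes with these values.
(def all-ids
  (d/q '[:find [?e ...]
         :where [?e :a0 0]
                [?e :a1 1]]
       @conn))

;; Only the entity that has neither optional attribute.
(def bare-ids
  (d/q '[:find [?e ...]
         :where [?e :a0 0]
                [?e :a1 1]
                [(missing? $ ?e :opt0)]
                [(missing? $ ?e :opt1)]]
       @conn))
```

The entities stay distinct because each one gets its own system-generated id, not because of any declared composite key.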

Huahai 2020-08-19T18:32:29.128Z

If you care about read performance on durable storage, give Datalevin a try, which stores data solely in LMDB. As far as I know, LMDB is the fastest kv-store for read intensive workload.

Ben Sless 2020-08-19T18:34:11.128200Z

Alright, I'll give it a shot, thank you! If I were to use a system where schemas aren't optional, how would I do it? How would I distinguish when querying between entities with the same required attributes which might not even have the optional attributes? (one entity has to have it for the use case to be interesting, ofc)

Huahai 2020-08-19T18:34:56.128400Z

Datalevin is specifically optimized for LMDB to maximize the performance, unlike other systems that try to enable pluggable storage.

Huahai 2020-08-19T18:36:50.128700Z

Datalevin is more like Datomic in terms of schema. Schema is optional, but if you want to do range queries on an attribute, you should define the :db/valueType for it, as the keys are compared bitwise.

Huahai 2020-08-19T18:39:06.128900Z

I am not sure Datalog cares whether an attribute is required or not; that seems to be an application-level concern.

Huahai 2020-08-19T18:42:33.129100Z

your application code should maintain such constraints. I don’t think Datalog has a facility that lets you say “this attribute is required for this entity”, because entities are generic, not typed.

Huahai 2020-08-19T18:43:50.129300Z

but in Clojure, we do use some conventions to facilitate this, for example, namespaced keys

Huahai 2020-08-19T18:45:02.129500Z

for example, a “sales” entity will have attributes such as :sales/company, :sales/category, etc.

Huahai 2020-08-19T18:46:05.129700Z

but these are just conventions, database is not aware of these

Huahai 2020-08-19T18:48:53.129900Z

in your application code, you can use libraries such as prismatic/schema to maintain such constraint, but the database is not going to enforce it.
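A minimal sketch of such an application-level check. It swaps in `clojure.spec.alpha` for prismatic/schema, only because spec ships with Clojure and needs no extra dependency. The spec names and the helper `valid-entity?` are hypothetical, and the `int?` value types are an assumption based on the example data.

```clojure
(require '[clojure.spec.alpha :as s])

;; Assumed value types: the example entities only use integers.
(s/def ::a0 int?)
(s/def ::a1 int?)
(s/def ::opt0 int?)
(s/def ::opt1 int?)

;; :a0 and :a1 are required; :opt0 and :opt1 may be absent.
;; :req-un / :opt-un match the unqualified keys used in the example maps.
(s/def ::entity
  (s/keys :req-un [::a0 ::a1]
          :opt-un [::opt0 ::opt1]))

(defn valid-entity?
  "Hypothetical helper: true when m satisfies the ::entity spec."
  [m]
  (s/valid? ::entity m))
```

Calling `valid-entity?` before handing a map to the database keeps the constraint in application code, which is where the database leaves it.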

Huahai 2020-08-19T18:50:20.130100Z

the only constraints that Datomic-flavored Datalog systems enforce are uniqueness, references, and data types, if you specify them.

Huahai 2020-08-19T18:53:15.130300Z

of course, you can always write transaction functions to check whatever properties you want to check, including “this kind of entity requires this key”
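A rough sketch of that idea, assuming DataScript on the JVM and its `:db.fn/call` transaction form, which calls a plain function with the current database value and expects transaction data back. The function `add-checked` and its list of required keys are hypothetical.

```clojure
(require '[datascript.core :as d])

(def conn (d/create-conn))

(defn add-checked
  "Hypothetical transaction function: returns tx data for entity,
   or throws if a required attribute is absent."
  [db entity]
  (let [absent (remove #(contains? entity %) [:a0 :a1])]
    (when (seq absent)
      (throw (ex-info "Entity lacks required attributes"
                      {:missing (vec absent) :entity entity})))
    [entity]))

;; Accepted: both required attributes are present.
(d/transact! conn [[:db.fn/call add-checked {:a0 0 :a1 1 :opt0 0}]])

;; Rejected: :a1 is absent, so the function throws and nothing is added.
(def rejection
  (try
    (d/transact! conn [[:db.fn/call add-checked {:a0 0 :opt1 1}]])
    (catch clojure.lang.ExceptionInfo e
      (ex-data e))))
```

Because the check runs inside the transaction, a failed check aborts the whole transaction rather than leaving a partial write.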

Huahai 2020-08-19T18:54:35.130500Z

Right now, Datalevin doesn’t support persistent transaction functions, it’s on our roadmap though

refset 2020-08-17T08:56:26.125100Z

Hi :) with Crux you can use these maps directly as explicit IDs

Ben Sless 2020-08-17T09:52:56.125300Z

I'm still not sure which implementation I'll be going with. It's good to know Crux supports it. I can always artificially collect these into a keyword in order to create an identifier in the system. How's Crux's read performance? I'm considering using Datalog for a very read-intensive system, where bitemporality is less important.

refset 2020-08-17T10:13:29.125500Z

Crux's read performance is pretty great because it delegates so much of the real work to the KV store. LMDB can be up to 3x faster than RocksDB for reads in some of our measurements so might be the better choice for you. I can't comment on comparative performance with the other Datalog engines you might be looking at, but we run graph query "stress test" benchmarks every night and hold up fairly well against the likes of RDF4J and Neo4j (which have benefited from decades of performance engineering)

Ben Sless 2020-08-17T12:57:46.126600Z

Well, in the meanwhile I'm in the proof of concept stage, more worried about my domain modelling. If you could point me to the specific part of the docs I'd be much obliged 🙂

👍 1

refset 2020-08-17T14:18:25.126900Z

A lot of Crux users seem to be happy doing the schema modelling purely in spec. What Crux enforces is very minimal, so you probably want to write a few transaction functions to enforce any invariants you might need: https://www.opencrux.com/reference/transactions.html It is usually best to model relationships (ref attributes) as reverse references, where child entities point to their parents (child->parent). The trade-off, though, is that modelling in the forward direction (parent->child) benefits from native sorting in the indexes

🙏 1

