#xtdb
2022-08-11
Carlo 12:08:09

I'd love to ask a more generic question here. It is a commonly held opinion that using relational databases as triple stores is a bad idea for performance (I can link references). Is there some characteristic of XTDB (or Datomic, for that matter) that makes using them as (essentially) triple stores more palatable? Or is my mental image wrong, and XTDB is not (essentially) a triple store? I ask mainly because I'm very attracted by the simplicity of the modeling that Datomic can achieve, but scared by the performance implications, and curious to know if there's some secret sauce that's not in Postgres ☺️

tatut 13:08:29

PostgreSQL is just one option for storing the tx log and docs… so it isn't a case of using a “relational database as a triple store”

Carlo 15:08:45

Thank you, but I meant: what's the architectural difference that makes working with a triple store performant in datomic/xtdb?

Karim 17:08:49

@UA7E6DU04 I'm not sure about the XTDB implementation, but Datalog as a query language makes a big difference when querying data with as many predicates as you want. From the book Designing Data-Intensive Applications (the example is in Cypher, but it's the same idea):
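To make the point about stacking predicates concrete, here is a deliberately naive Datalog-style evaluator in Python. It is a hypothetical sketch, not XTDB's query engine: each `where` clause is an (entity, attribute, value) pattern, strings starting with `?` are logic variables, and clauses are joined with nested loops rather than index lookups and a query planner. The data and all names are made up for illustration.

```python
# Hypothetical toy facts: (entity, attribute, value).
TRIPLES = [
    (1, "name", "Ada"), (1, "city", "London"),
    (2, "name", "Bob"), (2, "city", "Oslo"), (2, "knows", 1),
    (3, "name", "Cy"), (3, "city", "Oslo"), (3, "knows", 2),
]


def is_var(x):
    """Logic variables are strings that start with '?'."""
    return isinstance(x, str) and x.startswith("?")


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if p in new and new[p] != t:
                return None  # variable already bound to a different value
            new[p] = t
        elif p != t:
            return None  # constant does not match
    return new


def query(find, where, triples):
    """Join the clauses left to right; each clause narrows the bindings."""
    bindings = [{}]
    for pattern in where:
        next_bindings = []
        for b in bindings:
            for t in triples:
                b2 = match(pattern, t, b)
                if b2 is not None:
                    next_bindings.append(b2)
        bindings = next_bindings
    return {tuple(b[v] for v in find) for b in bindings}


# "People in Oslo, and the name of someone they know."
result = query(
    ["?name", "?friend"],
    [
        ("?p", "city", "Oslo"),
        ("?p", "knows", "?f"),
        ("?p", "name", "?name"),
        ("?f", "name", "?friend"),
    ],
    TRIPLES,
)
print(sorted(result))  # [('Bob', 'Ada'), ('Cy', 'Bob')]
```

Each extra predicate is one more line in `where` over the same set of facts, which is the ergonomic difference Karim is pointing at.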

tatut 04:08:35

There are probably many specifics to XTDB/Datomic/something else, but afaict it's the different covering indexes (EAV, AEV) that make Datalog queries performant… and one of the reasons we can do it is that storage is cheap and fast these days
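A toy model of the covering-index idea, assuming in-memory sorted lists stand in for the on-disk sorted key-value storage a real database would use. The names `TRIPLES`, `prefix_scan`, `entity` and `entities_with` are hypothetical, and XTDB's and Datomic's actual index layouts differ (Datomic, for example, documents EAVT, AEVT, AVET and VAET indexes).

```python
from bisect import bisect_left

# Hypothetical toy data: (entity, attribute, value) facts.
TRIPLES = [
    (1, "name", "Ada"),
    (1, "lang", "en"),
    (2, "name", "Bob"),
    (2, "city", "Oslo"),
    (3, "lang", "no"),
]

# Two covering indexes: the same facts in two sort orders. Each key holds
# the whole fact, so a lookup never needs to go back to a separate "row".
EAV = sorted((e, a, v) for e, a, v in TRIPLES)
AEV = sorted((a, e, v) for e, a, v in TRIPLES)


def prefix_scan(index, prefix):
    """Return the contiguous run of sorted keys that start with `prefix`."""
    start = bisect_left(index, prefix)  # a shorter tuple sorts before its extensions
    out = []
    for key in index[start:]:
        if key[: len(prefix)] != prefix:
            break
        out.append(key)
    return out


def entity(e):
    """All attributes of one entity: a range scan over EAV."""
    return {a: v for _, a, v in prefix_scan(EAV, (e,))}


def entities_with(attr):
    """All entities that have `attr`: a range scan over AEV."""
    return [e for _, e, _ in prefix_scan(AEV, (attr,))]


print(entity(1))              # {'lang': 'en', 'name': 'Ada'}
print(entities_with("lang"))  # [1, 3]
```

Every additional sort order (say, AVE for lookups by value) is another full copy of the facts, which is the storage trade-off that cheap, fast disks make acceptable.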

tatut 04:08:51

and having a SQL table like create table triples (e bigint, a text, v text) would be cumbersome to work with even if you had indexed it by every combination
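A runnable illustration of that SQL shape, using Python's built-in `sqlite3` module as a stand-in for PostgreSQL (an assumption; the data, index names and query are hypothetical). Even with covering indexes on `(e, a, v)` and `(a, e, v)`, each attribute the query touches becomes another self-join, and because every value shares one `text` column, numbers have to be cast back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table triples (e bigint, a text, v text)")
# Indexing "by every combination" means one index per column order;
# two of the possible orders are shown here.
conn.execute("create index triples_eav on triples (e, a, v)")
conn.execute("create index triples_aev on triples (a, e, v)")
conn.executemany(
    "insert into triples values (?, ?, ?)",
    [
        (1, "name", "Ada"), (1, "city", "London"), (1, "age", "36"),
        (2, "name", "Bob"), (2, "city", "Oslo"), (2, "age", "41"),
    ],
)

# "Names of people in Oslo older than 40": one self-join per attribute,
# plus a cast because age is stored as text like every other value.
rows = conn.execute(
    """
    select n.v
    from triples c
    join triples n on n.e = c.e and n.a = 'name'
    join triples g on g.e = c.e and g.a = 'age'
    where c.a = 'city' and c.v = 'Oslo'
      and cast(g.v as integer) > 40
    """
).fetchall()
print(rows)  # [('Bob',)]
```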

refset 11:08:29

> having a SQL table like create table triples (e bigint, a text, v text) would be cumbersome to work with even if you had indexed it by every combination

This is the correct answer IMO. Most OLTP ("OnLine Transaction Processing") relational databases use a row-oriented storage model, which strongly couples the physical data layout to the logical data model. Using sparse, wide rows is inefficient in these systems (e.g. NULLs take up space, and reading a full row wastes I/O when you only need a couple of columns), and running lots of joins is typically also inefficient/problematic. By contrast, systems like Datomic and XT encourage performing lots of joins over sparse indexes, because doing so can make complex modelling a lot simpler.
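A small, hypothetical sketch of the sparse-data point, counting cells rather than bytes: how much NULL padding a wide row-oriented layout needs compared with storing only the facts that exist, and how much each layout touches to read a single attribute. The real on-disk cost of a NULL depends on the engine (PostgreSQL, for instance, records NULLs in a per-row null bitmap), so the numbers are illustrative only.

```python
# Hypothetical sparse entities: each has only a few of many possible attributes.
entities = {
    1: {"name": "Ada", "email": "ada@example.com"},
    2: {"name": "Bob", "phone": "555-0100"},
    3: {"name": "Cy", "twitter": "@cy", "city": "Oslo"},
}

# Row-oriented: one column per attribute seen anywhere, None (NULL) elsewhere.
columns = sorted({a for attrs in entities.values() for a in attrs})
rows = [[attrs.get(c) for c in columns] for attrs in entities.values()]
null_cells = sum(cell is None for row in rows for cell in row)

# Triple-oriented: one fact per attribute that actually has a value.
triples = [(e, a, v) for e, attrs in entities.items() for a, v in attrs.items()]

# Reading "name" for every entity: a row store reads whole rows, while an
# AEV-style range scan would touch only the "name" facts.
cells_read_rows = len(rows) * len(columns)
facts_read_aev = sum(1 for _, a, _ in triples if a == "name")

print(columns)                          # ['city', 'email', 'name', 'phone', 'twitter']
print(null_cells, len(triples))         # 8 7
print(cells_read_rows, facts_read_aev)  # 15 3
```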