Hello, I want to better understand how XTDB works and the underlying tools it uses. I have read the docs, which state that it uses S3 as storage (with the Apache Arrow format) and Kafka as the write-ahead log, but how does it store the columnar data? Is it using Postgres in some way? I would be glad if you could point me to more detailed docs or videos.
(execute! conn {:insert-into :test-01
                :records (map xt-jdbc/->pg-obj
                              [{:_id 0
                                :name "apple"
                                :price 20}])})
(execute! conn {:insert-into :test-01
                :records (map xt-jdbc/->pg-obj
                              [{:_id 0
                                :name "apple"
                                :price 25}])})
(execute! conn {:insert-into :test-01
                :records (map xt-jdbc/->pg-obj
                              [{:_id 0
                                :name "apple"
                                :price 45}])})
When "modifying" the apple entity like this, does XTDB keep the unmodified fields unchanged and point to them, or does it duplicate them?
How are they stored in comparison to Datomic?
[{:_id 0,
  :name "apple",
  :price 45,
  :_system_from #inst "2025-02-18T09:48:40.855695000-00:00",
  :_system_to nil}
 {:_id 0,
  :name "apple",
  :price 25,
  :_system_from #inst "2025-02-18T09:34:02.891368000-00:00",
  :_system_to #inst "2025-02-18T09:48:40.855695000-00:00"}
 {:_id 0,
  :name "apple",
  :price 20,
  :_system_from #inst "2025-02-18T09:09:27.993840000-00:00",
  :_system_to #inst "2025-02-18T09:34:02.891368000-00:00"}]

XTDB holds a duplicate of the data for every version of a row, but because of the compression that columnar storage allows, the impact is often not significant in practice. You'll probably be interested to read this thread: https://discuss.xtdb.com/t/v2-best-way-to-handle-frequent-updates-that-might-not-contain-any-changes/404
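To give a rough feel for why duplicated row versions can be cheap in a columnar layout, here is a minimal plain-Clojure sketch. It is an illustration only, not XTDB's or Arrow's actual encoding: the helper names (`->columns`, `run-length-encode`) are made up for this example, and run-length encoding is simply standing in for whatever compression the real storage applies.

```clojure
;; Illustration only: plain Clojure, not XTDB's actual storage code.
;; The three versions of the apple row, each stored in full.
(def versions
  [{:_id 0 :name "apple" :price 20}
   {:_id 0 :name "apple" :price 25}
   {:_id 0 :name "apple" :price 45}])

(defn ->columns
  "Pivot a seq of row maps into a map of column key -> vector of values.
  Assumes every row has the same keys as the first row."
  [rows]
  (into {}
        (for [k (keys (first rows))]
          [k (mapv k rows)])))

(defn run-length-encode
  "Collapse consecutive equal values into [value count] pairs."
  [xs]
  (mapv (fn [run] [(first run) (count run)])
        (partition-by identity xs)))

;; Encode each column independently.
(def encoded
  (into {}
        (map (fn [[k col]] [k (run-length-encode col)]))
        (->columns versions)))

;; encoded
;; => {:_id [[0 3]], :name [["apple" 3]], :price [[20 1] [25 1] [45 1]]}
```

The unchanged `:_id` and `:name` columns each collapse to a single run, while only the `:price` column, which actually changed, keeps one entry per version.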
Hi @itai, good questions. The documentation here is somewhat sparse, which reflects the fact that there's been churn in the design (see https://xtdb.com/blog/dev-diary-feb-24), although it has now largely stabilised. I think you can get a feel for it from various PR descriptions, e.g. https://github.com/xtdb/xtdb/pull/4114, https://github.com/xtdb/xtdb/pull/3191, https://github.com/xtdb/xtdb/pull/3282, https://github.com/xtdb/xtdb/pull/2741, https://github.com/xtdb/xtdb/pull/3171. Essentially, though, there's a primary-key LSM-tree structure (per table) that uses a trie for the keys, and those keys interleave bits from both the ID and the timestamp information. We will be documenting the design more thoroughly once we have the General Availability (GA) release out of the door in a few weeks, but for now we're more focused on stability and the user-facing docs. Hope that helps!
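As a rough idea of what "interleaving bits" from two values can look like, here is a small generic sketch in plain Clojure. This is not XTDB's code, and the real key layout may well differ: the function name `interleave-bits`, the choice of which value goes in which bit position, and the bit widths are all assumptions made for illustration.

```clojure
;; Illustration only: generic bit interleaving, not XTDB's key format.
(defn interleave-bits
  "Interleave the low n bits of a and b into one number.
  Bit i of a lands at position 2i+1, bit i of b at position 2i.
  Keep n <= 31 so the result fits in a positive long."
  [a b n]
  (reduce (fn [acc i]
            (let [a-bit (bit-and (bit-shift-right a i) 1)
                  b-bit (bit-and (bit-shift-right b i) 1)]
              (-> acc
                  (bit-or (bit-shift-left a-bit (inc (* 2 i))))
                  (bit-or (bit-shift-left b-bit (* 2 i))))))
          0
          (range n)))

;; Hypothetical 2-bit "ID" and 2-bit "timestamp":
(interleave-bits 2r11 2r00 2) ;; => 10 (2r1010)
(interleave-bits 2r00 2r11 2) ;; => 5  (2r0101)
(interleave-bits 2r10 2r01 2) ;; => 9  (2r1001)
```

With keys built this way, two entries that agree on the high bits of both values share a key prefix, so they end up under the same branch of a trie.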