#datalevin
2022-08-12
Athan09:08:21

Measuring transaction throughput with the Datomic in-memory database and the Datalevin storage engine. Hi, I thought you might find this interesting. First, see https://clojurians.slack.com/archives/C03RZMDSH/p1660170818216469?thread_ts=1660170818216469&cid=C03RZMDSH and the results below for the experiment. 1. It's important to explain quickly why I think there is such bad performance with the Datomic in-memory client; I expect analogous results with https://github.com/tonsky/datascript. It's because of the data structures. To maximize in-memory write speed, one has to use different memory management and data structures, similar to those used in pyarrow and numpy. So, how about building an in-memory Datalog query engine on top of pyarrow? 2. It was relatively easy to deploy, configure and test the transaction throughput of the key-value storage engine (LMDB) behind https://github.com/juji-io/datalevin. I would expect a Datomic transactor on AWS DynamoDB Local, or https://docs.xtdb.com/storage/rocksdb/, to have similar performance.

;; 5.2 sec for 23 cols x 10000 rows
;; 3.2MB books.csv file
;; Elapsed time: 5.2 secs
;; datoms inserted: 229,956
;; transaction throughput: 229,956 / 5.2 sec ≈ 44,222 datoms/sec
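For reference, a rough sketch of the kind of transact! loop that could produce numbers like these (illustrative only; the file name, attribute keywords, and naive comma splitting are assumptions, not the exact benchmark code):

;; Sketch: transact CSV rows as entity maps; the header row supplies the attribute keywords.
(require '[datalevin.core :as d]
         '[clojure.string :as str])

(def conn (d/get-conn "/tmp/books-db"))

(let [[header & rows] (->> (slurp "books.csv")
                           str/split-lines
                           (map #(str/split % #","))) ; naive split, ignores quoted commas
      attrs (map keyword header)]
  (time
    (d/transact! conn (mapv #(zipmap attrs %) rows))))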
I have met similar problems in the past when I was testing the write performance of the Redis KV storage engine. All these KV engines (Redis, DynamoDB, LMDB) are very good at point queries, but they perform really badly when you want to write (import) a big volume of data. You may argue that write performance is not critical for a transactional (OLTP) DBMS, but it becomes super important when you want to import your data from another system/project, integrate a big volume of data from other sources, or do analytics without adding another storage engine. In fact, what we are discussing here is the price you pay for having a flexible universal data model based on EAV/RDF triples. It's a similar case when you try to construct a relational, tuple-based data model on top of a KV storage engine or an object-like memory structure (Python/Clojure). The physical layout must be appropriate for such a data model, and the best candidate I have found from my personal research and experiments is a columnar layout. Why not add a Datalog query engine on top of a columnar database engine based on LSM-tree data structures, such as ClickHouse or SingleStore (MemSQL)?

Huahai17:08:35

I assume that you are suggesting a columnar storage format? In that case, you should realize that all these triple stores already do that. Most triple stores store data in both columnar (e.g. AVE) and row formats (e.g. EAV) in order to support flexible queries. Because of the redundant storage, the write speed is going to be slower than with a single storage format.
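To make the row-versus-column point concrete, here is a toy sketch of the same datoms ordered as EAV and as AVE (illustration only, not Datalevin's actual key encoding):

;; Four datoms [e a v], sorted two ways.
(def datoms [[1 :person/name "Ada"]
             [1 :person/age  36]
             [2 :person/name "Alan"]
             [2 :person/age  41]])

;; EAV (row-like): all attributes of an entity sit next to each other.
(sort-by (juxt first second) datoms)
;; => ([1 :person/age 36] [1 :person/name "Ada"] [2 :person/age 41] [2 :person/name "Alan"])

;; AVE (column-like): all values of an attribute sit next to each other, like a column.
(sort-by (juxt second last) datoms)
;; => ([1 :person/age 36] [2 :person/age 41] [1 :person/name "Ada"] [2 :person/name "Alan"])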

Huahai17:08:31

There are going to be tradeoffs. More flexible queries mean slower writes. There's no free lunch.

Huahai17:08:25

For bulk loading datoms in Datalevin, you should not be doing transactions; you should use init-db to load datoms directly, which bypasses the transaction logic and can be much faster. For example, it took less than half a second to load 100k datoms, but one to two seconds to transact them in our benchmark, depending on how many datoms you write in one shot. Preferably, you should write all datoms in a single transaction, as that is faster.

🙌 1
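A minimal sketch of the init-db path described above (assuming datalevin.core exposes the DataScript-style init-db and datom functions; exact arities and options may differ between versions):

;; Bulk load by constructing datoms directly and calling init-db,
;; bypassing the transaction machinery (sketch; check your Datalevin
;; version for the exact init-db signature).
(require '[datalevin.core :as d])

(def schema {:book/title {:db/valueType :db.type/string}})

(def datoms
  (map-indexed (fn [i title] (d/datom (inc i) :book/title title))
               ["SICP" "PAIP" "AIMA"]))

;; Datoms, a directory for the LMDB files, and a schema, loaded in one shot.
(def db (d/init-db datoms "/tmp/bulk-db" schema))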
Huahai17:08:42

Finally, I don’t think an LSM-based structure will make a difference; it is the default for the XTDB index (they use RocksDB by default). Writes are a little faster, but only for small data sizes. If the data is big, i.e. more than a few KB a piece, LMDB is actually faster. Of course, LMDB is much faster than LSM-based stores when reading data. So you will have to choose. Datalevin chooses to focus on read speed, as we are an OLTP store.

Huahai18:08:15

Datalevin is designed as an OLTP DB. For analytical workloads, you really need an OLAP store. There’s no such thing as one DB that is great for both OLTP and OLAP. Triple stores can be thought of as such, but as you already see, the write speed is going to suffer, because they write the same data multiple times. You simply cannot have everything.

Huahai18:08:14

The next version of Datalevin will have even slower writes, in order to build more sophisticated indices and get query speed on par with relational DBs. So if your data is tabular in nature and your workload is mostly analytical, you should stick with an OLAP store.

Huahai18:08:16

We will be adding a read-only transaction log soon. Presumably, it can be stored in a columnar format to facilitate analytical workloads. Of course, it will not always be up to date, but that is less important for analytics anyway. We will see.

Huahai18:08:56

So, ideally, one can use one instance of Datalevin as the operational store and another instance as a read-only replica, and run the analytical workload on the replica. After all, the strength of a triple store is its flexibility. We pay the price of redundant storage so we can reap the benefits. Just don’t run both workloads on the same instance.

Athan10:08:32

@U0A74MRCJ thank you for taking the time to respond. It's the physical layout of the data that makes the difference, both for in-memory and on-disk storage. Data are stored separately for each column (attribute) in contiguous arrays (memory/disk blocks), then combined to form tables. The line between OLTP and OLAP is blurred these days because of the exact implementation details of the physical storage and the distinction between columns and rows, both physically and logically. There is a very nice article, "https://www.singlestore.com/blog/should-you-use-a-rowstore-or-a-columnstore/". I'll extract the following passage to make my point.
• Rowstores are better at random reads and random writes.
• Columnstores are better at sequential reads and sequential writes.
Columnstores are excellent at reads despite using sequential access; I have seen that in practice. Their only disadvantage is when you want to update/delete specific records, and even in that case there are methods of adding new records and then replacing and merging them with the old ones. Of course, the value proposition for a Datalog DBMS is not to compete on storage technology (that is why everyone is using an existing storage engine) but to make a difference with the flexible schema and the functional/logic programming style (Clojure/Datalog). And yes, generally speaking, more flexible queries mean slower writes, but that holds for the same storage engine, changing only the complexity of the logical layer. That said, I do recognize that in practice you cannot have a perfect separation between the physical and logical layers, and exactly how those two are linked and communicate is a big open research field in my opinion. You mentioned XTDB is using RocksDB, which is based on an LSM-tree, but it is optimized as a key-value store, not as a column store. There are subtle differences in how ClickHouse/SingleStore have used similar technology (LSM-trees) for their column-store engines. Moreover, building a column index (logically) on top of a KV store is completely different from building the same index (physically) in a column store. Finally, I will give you a real example of the difference a really fast columnar store makes:
DBMS: ClickHouse
Dataset: 1,200,241,561 triples (1.2 billion)
Time to import the dataset: 146 min !!! (bulk loading from a CSV file)
Write throughput: 137,013 triples/sec
Size on disk: 19 GB !!!
Index: EAV
Query: SELECT FROM dataset WHERE sub = '*0242ac12' (no caching)
Result: 20 rows in set. Elapsed: 0.010 sec. Processed 49.15 thousand rows !!!
Machine: workstation with Intel Xeon E5-1650 v4 @ 3.60GHz × 6 cores, 32GB RAM
PS: You suggested doing bulk loading of datoms using init-db; this is a good point, I have to try it and see the difference it makes. You are reporting a write time of less than a second for 100k datoms: is that a single process writing, what are the specs of that machine, what type of SSD, and what is the size of the database on disk?

Huahai16:08:11

I am not sure what point you are trying to make by showing that ClickHouse can load data fast. ClickHouse is squarely in the analytics camp, so it must load data fast. However, you cannot possibly use it as an OLTP store: it does not have transactions, has no deletes or updates, has no constraint enforcement, and so on. When people try to do both OLTP and OLAP in one store, they use the OLTP store as the basis, not the other way around. Datalog stores can be thought of as such an effort. Loading 100k datoms in less than half a second is in the same ballpark as what you are seeing. So init-db increases speed around 2x to 5x compared to transactions. Not a whole lot.

Huahai16:08:32

One point you seem to have missed is that when you have a physical layout that is great for sequential reads and writes, you necessarily have slow updates, because you normally have to rebuild a large structure for minor changes. In Datalevin, we already see this in search. Our built-in search engine has a data structure optimized for sequential reads and writes, and we beat Lucene in search speed. But it is really slow if one wants to update an already indexed document; it is sometimes faster to rebuild the whole index if many documents need to be updated. By the way, LMDB should be thought of as an alternative to a file system, not a DB per se, so one can put any data structure in it, which is what we do in Datalevin. By choosing LMDB as the base, we have a lot of freedom in deciding our data format. I will not rule out the possibility of using a format such as Arrow for the transaction log, for example.
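As an illustration of the "LMDB as a file system" point, a short sketch using Datalevin's key-value API (open-kv, open-dbi, transact-kv, get-value; exact signatures may vary by version):

;; Using LMDB through Datalevin's key-value layer to store arbitrary
;; Clojure data structures under named sub-databases (DBIs).
(require '[datalevin.core :as d])

(def kvdb (d/open-kv "/tmp/kv-db"))
(d/open-dbi kvdb "search-index")

;; Any EDN-serializable structure can be the value, e.g. a posting list.
(d/transact-kv kvdb
               [[:put "search-index" "clojure" #{101 205 312}]])

(d/get-value kvdb "search-index" "clojure")
;; => #{101 205 312}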

Athan17:08:11

Full-text search is a different story; I wouldn't mix it with the performance of a strictly columnar or Datalog-based engine. Datomic, for example, has this feature as an option.

Huahai17:08:42

Btw, the physical data structures used in columnar stores are often bitmaps. RoaringBitmap is normally used; it is used in ClickHouse and many other analytical stores. It is also used in our search engine. What I am saying is that these tradeoffs are well known, and we are not missing anything.
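For illustration, a small Clojure interop sketch of the kind of bitmap posting lists mentioned above, using the org.roaringbitmap library (the terms and document ids are made up; this is not Datalevin's or ClickHouse's actual index code):

;; Toy example: one RoaringBitmap of document ids per term,
;; intersected to answer an AND query.
;; Requires the org.roaringbitmap/RoaringBitmap dependency.
(import '[org.roaringbitmap RoaringBitmap])

(def term->docs
  {"database" (RoaringBitmap/bitmapOf (int-array [1 3 5 70000]))
   "columnar" (RoaringBitmap/bitmapOf (int-array [3 70000 90000]))})

;; Documents containing both terms = bitwise AND of the two bitmaps.
(def both (RoaringBitmap/and (term->docs "database")
                             (term->docs "columnar")))

(vec (.toArray both))
;; => [3 70000]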

Athan17:08:03

@U0A74MRCJ just to lighten the discussion: everybody is missing everything. In some other logical form, Socrates said, "I know one thing: that I know nothing" 🙂. There are many things I am trying to understand more deeply. It's always good to discuss with experts in the field.

Huahai17:08:20

That’s not the impression I got from your original message. You talked as if we (the Datalog people in the Clojure world) don't know what we are doing. I am a straight talker, my apologies. I thank you for trying out Datalevin. Cheers.

❤️ 1
Athan17:08:13

I have admitted in public, see for example this message in the Clojure group, that the reason I was attracted to the Clojure communities is the Datalog query engine itself and how you managed to combine it with data store engines. A clean separation of the two is in fact something I advocate in posts I write and in a hyper-graph system I built for analytics. But I believe, and you made some points here, that there is a lot of room for improvement in the marriage of the two.

👍 1
Huahai18:08:29

Definitely, there is tons of room for improvement. That's what keeps me interested in this space. There are a lot of things to explore, and it is exciting for me personally. Hopefully someone finds it useful as well.

Athan18:08:54

It is marvelous that you are an open-source contributor, open to sharing your expertise and implementation with others.