Fork me on GitHub
#xtdb
<
2022-03-21
>
J14:03:22

Hi guys! I’m a new user of xtdb and I want some informations on the node object create with xt/start-node. I use postgres for document and transaction store, rockdb for the index store. My question is why the size node is so big? Is because xtdb load the index store in memory?

refset14:03:07

Hey @UHZPYLPU1 🙂 what is the mm/measure function from? RocksDB uses a bunch of native memory allocations for caches, so it might be reflecting that. Similarly, XT also creates some (configurable) caches, so this size might be reflecting a warm cache size, depending on what operations you were using. What size of data did you load into the node?

J14:03:52

Hey @U899JBRPF thanks to answser me. I use this lib to measure object https://github.com/clojure-goes-fast/clj-memory-meter I just put few documents into the node.

refset14:03:16

Interesting, thanks, I haven't tried using that lib before but it looks neat! I'm not sure exactly what it will be measuring though :thinking_face: Are you trying to make XT work on a particularly memory-constrained system? Or are you just concerned about whether it is configured correctly?

J15:03:25

I’m concerned about the memory and it is configured correctly. I ask myself a lot of questions on xtdb ^^, especially on a multi-instance architecture.

👍 1
jarohen15:03:02

nothing springs to mind I'm afraid - the top-level number doesn't give us an awful lot of visibility into what it could be. do you see the same/similar results and/or more details if you run XT through a memory profiler?

J15:03:47

I haven’t used the profiler yet. Maybe I have misconfigured the xtdb config. In Postgres, I have only a tx_events table. It’s correct? (for handle document-store and tx-log).

refset15:03:49

That sounds about right, but feel free to paste your full config map (passed to start-node). What happens when you submit lots more data, does the memory usage shown by mm/measure increase to match it? Or does it stay ~constant?

J16:03:50

Here the config map passed to start-node

{:xtdb/index-store
 {:kv-store
  {:xtdb/module xtdb.rocksdb/->kv-store,
   :db-dir #object[java.io.File 0x4039850f "/tmp/indexes"]}},
 :xtdb.jdbc/connection-pool
 {:dialect #:xtdb{:module xtdb.jdbc.psql/->dialect},
  :db-spec
  {:host "0.0.0.0",
   :port 5432,
   :dbname "test",
   :user "....",
   :password "...."}},
 :xtdb/tx-log
 {:xtdb/module xtdb.jdbc/->tx-log,
  :connection-pool :xtdb.jdbc/connection-pool},
 :xtdb/document-store
 {:xtdb/module xtdb.jdbc/->document-store,
  :connection-pool :xtdb.jdbc/connection-pool}}

J16:03:05

When I transact lots of data (~1000 docs) the memory seems constant. And the result with the shallow option (:shallow true calculates only memory occupied by the object itself, without children)

J16:03:52

Alright. Thanks you @U899JBRPF 🙂

🙏 1
jarohen16:03:34

there are a number of such caches throughout XTDB too - as @U899JBRPF alludes to, we consciously trade off higher memory usage for faster ingest/query.

jarohen16:03:45

i.e. while we're obviously not being careless with memory, we're not specifically optimising for restricted-memory scenarios.

✔️ 1
J16:03:10

Thanks @U050V1N74 for you precisions 🙂

🙏 1
Martynas Maciulevičius08:03:31

Which version of Java did you use? I have these versions and none of them could run the measure method for xtdb node. Some of them needed command line params too.

java-11-temurin
  java-17-openjdk
  java-17-temurin (default)
  java-8-adoptopenjdk
  java-8-openjdk
Can you share your args and version of JDK?

J08:03:14

Hello @U028ART884X I use this version of java:

1
ogi15:03:05

I have a couple of questions about the XTDB SQL: 1. Is it possible to have the aggregates (`stddev`, median, ...) in the :xtdb.sql/table-query datalog query? 2. I’d like to deploy an XT node with SQL API, for example`{:xtdb.calcite/server {:port 1501}}` , to Kubernetes. Is there an endpoint on HTTP port 1501 to check the status?

refset15:03:05

Hey @UCBBG7HFX 🙂 1. Yes, the aggregates in the stored queries should work fine (although beware that they are necessarily calculated "from scratch" each time) 2. I don't know the real answer, but I expect that there is something that can be done. The Calcite server project is called Avatica: https://calcite.apache.org/avatica/docs/ and they have Docker images, as per https://calcite.apache.org/avatica/docs/docker.html and https://hub.docker.com/r/apache/calcite-avatica ...so I imagine there is a trodden path here already

ogi16:03:06

Thank you Jeremy! 🙂 I’ll take a look at the Calcite docs. I wasn’t able to run aggregate in the table query. If I have query '{:find [stddev(price)] … } what should go into :xtdb.sql/table-columns for the stddev column?

refset16:03:40

Ah, good point, so on the Clojure side you can usually treat quoted expressions as if they were symbols, for instance you can do :order-by [[stddev(price) :asc]], but I doubt this can work at all nicely from SQL given how the column names get translated to uppercase (as per https://github.com/xtdb/xtdb/blob/1693a633910b9a7815ad514da2e0685b4c21cc4f/modules/sql/src/xtdb/calcite.clj#L444) As a workaround, you could push the whole query into a subquery, and then the outer query would have "normal" symbol vars, e.g. like:

{:find [e sum-bar]
 :where [[(q $ {:find [e (sum bar)]
                :where [[e :baz bar]]}) [[e sum-bar]]]]}

refset16:03:38

Thinking again about the other issue, given that Avatica protocol is JSON-over-HTTP - you should be able to send some predictable/well-known SQL statement to act as a status check, e.g. like an "ALL_TABLES" table (apologies for the very low quality screenshot I just dug out from https://www.youtube.com/watch?v=StXLmWvb5Xs)

👍 1
ogi17:03:47

Using a subquery for the aggregates works. Thank you !

🙏 1