#xtdb
2021-06-10
dvingo22:06:59

is it noted anywhere in the docs that there is a query planner? I am struggling to find any mention of it - and that you don't need to worry about your clause order. This is an amazing feature and somewhat of a selling point, I think, so I would expect to see it

refset22:06:47

Not explicitly, currently. The idea of a DBMS without a query planner is pretty obscure, but you're right that it's worth pointing out for the Clojure audience who may be preconditioned by other systems 🙂 It did get a mention in the latest tutorial:
> The order of the data patterns does not matter. Crux ignores the user-provided clause ordering so the query engine can optimize query execution.
> https://nextjournal.com/try/learn-crux-datalog-today/learn-crux-datalog-today
The other context here is that we don't offer any real manual control of the join order, and this can sometimes be unhelpful when Crux's query planner does something inefficient. However, without explaining the query engine implementation in detail or providing tooling to understand whether Crux is selecting a good join order, it has been easiest to present Crux as a black box and simply wait until users raise issues about painfully slow queries.
Out of interest, have you looked at the crux.query DEBUG logs before?
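A minimal sketch of what "clause order does not matter" looks like in practice, assuming Crux 1.x (`crux.api`) is on the classpath. The documents and attribute names are hypothetical, chosen only for illustration; the two queries differ only in the order of their `:where` clauses and are expected to return the same result set.

```clojure
;; Assumes a Crux 1.x artifact providing crux.api is on the classpath.
(require '[crux.api :as crux])

;; An empty config map starts an in-memory node (nothing is persisted).
(def node (crux/start-node {}))

;; Hypothetical documents, for illustration only.
(crux/await-tx
 node
 (crux/submit-tx
  node
  [[:crux.tx/put {:crux.db/id :alice  :person/name "Alice" :person/city :london}]
   [:crux.tx/put {:crux.db/id :bob    :person/name "Bob"   :person/city :paris}]
   [:crux.tx/put {:crux.db/id :london :city/name "London"}]
   [:crux.tx/put {:crux.db/id :paris  :city/name "Paris"}]]))

(def db (crux/db node))

;; Same three clauses, written in opposite orders.
(def q-forward
  '{:find  [?name]
    :where [[?p :person/name ?name]
            [?p :person/city ?c]
            [?c :city/name "London"]]})

(def q-reversed
  '{:find  [?name]
    :where [[?c :city/name "London"]
            [?p :person/city ?c]
            [?p :person/name ?name]]})

;; Both orderings describe the same query, so the result sets should match.
(def same-results? (= (crux/q db q-forward) (crux/q db q-reversed)))
```

This only shows that results are independent of clause order; it says nothing about which join order the planner actually picks.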

dvingo12:06:45

That's a pretty good point that until my experience with datomic, all the other databases I've used (the usual sql list, neo4j, and mongo) include them, but do not call them out because they are assumed to be part of a database. They do have documentation about their query/execution plans though and then notes on how these relate to query perf (https://docs.mongodb.com/manual/core/query-plans/ https://neo4j.com/docs/cypher-manual/current/execution-plans/) and how to introspect the db to tune your queries or how to store your data to fit the constraints of the design of the database.
I'm currently working with a datomic system at work, and because it doesn't provide any introspection mechanisms around these things it is a constant ambient background pain and anxiety knowing that the database could start suffering performance problems because the size of the data has changed and what used to be performant queries no longer are, or some seemingly minor query change could tank performance, and not being given any tools to deal with these problems is very frustrating.
I guess what my original question is really getting at, is that it would be helpful to have a page in the crux docs similar to those linked above: how crux constructs a query plan, what work is needed to fulfill the query and what data is available from the db to see, in a particular database, why a specific plan was chosen. I understand given the age of crux it is entirely reasonable that this isn't a priority. It is also not something I need to have to successfully use crux, but is something I would want to have, because if I'm going to make crux the core of a system, then having a solid understanding of what it is doing, and why, is a good practice from an engineering perspective of knowing the limitations/parameters of your tools.
Related to just tuning queries, knowing this information about how the database translates a query into a specific set of steps to retrieve your documents would help in designing the shapes of your documents. So that would be another area where I would find this information useful - to help guide schema shape decisions when modeling a domain.
I didn't know about crux.query how would I find out about this (or how do/did other people)? Maybe notes on how to view these logs can be added to the docs 🙂
Nothing here is pressing, I think if you are interested in knowing this information you can currently piece it together via the documentation, the various recorded talks online, and looking at the source code, but it would be nice to have some more notes in the official docs, even just collecting together those materials and a note saying more content will be added later. I didn't realize I would have this much to say about it so thanks for the follow!

🙌 2
refset16:06:26

Wow, thanks for the write-up! 🙏 It really helps to hear all this feedback and perspective 🙂 I'll make sure it all flows through the project board in due course.
> I understand given the age of crux it is entirely reasonable that this isn't a priority
Pretty much this, yep 😅
> I didn't know about `crux.query` how would I find out about this (or how do/did other people)?
I've mentioned it here and on Zulip a few times, possibly in an issue or two also, but I guess looking through the source is the main prompt.

dvingo18:06:05

Makes sense regarding the crux.query info! No problem, I'm glad you're receptive to feedback and use it to build a better product 🙂

❤️ 2
Steven Deobald02:06:26

@dvingo @refset For future reference (since this won't be the last time this question is asked), crux.query debugging is documented here: https://opencrux.com/community/faq.html#observequeries
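A minimal sketch of switching on the `crux.query` DEBUG logs from a REPL, assuming Logback (`logback-classic`) is the SLF4J backend on the classpath. That backend is an assumption about your project rather than something stated in the thread, and `set-log-level!` is a hypothetical helper, not part of Crux.

```clojure
;; Assumes ch.qos.logback/logback-classic is the active SLF4J binding.
(import '(org.slf4j LoggerFactory)
        '(ch.qos.logback.classic Level Logger))

(defn set-log-level!
  "Sets the level of the named Logback logger and returns the logger.
  Hypothetical helper; works only when Logback is the SLF4J backend."
  [^String logger-name ^Level level]
  (let [^Logger logger (LoggerFactory/getLogger logger-name)]
    (.setLevel logger level)
    logger))

;; Emit whatever the crux.query namespace logs at DEBUG.
(set-log-level! "crux.query" Level/DEBUG)
```

The same effect can usually be had declaratively in the logging configuration file; the REPL form is convenient when investigating a single slow query.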

Steven Deobald02:06:22

> it would be nice to have some more notes in the official docs, even just collecting together those materials and a note saying more content will be added later.
@dvingo This is a really strong point. As one of those "the universe is trying to tell me something" moments, I was just reading this same suggestion today in Fogel's Producing Open Source Software ... I think I've always felt adding "TODOs" in the docs looks unpolished. His argument is that these markers serve two purposes: "[empathic reassurance] that users don't face a struggle to convince the project of what's important" and "a legitimate open request for volunteer help."

dvingo14:06:58

@Steven Deobald thanks for the link! And yea, that's not a bad way of looking at things 🙂

🙏 2
rschmukler22:06:53

Do I need to do anything to rebuild a node's local document store / index store beyond deleting the files?

refset22:06:58

The document store should never need rebuilding, unless you're talking about the "local-document-store", but yep you can just delete the index-store KV directory

rschmukler22:06:28

I do mean the local document store

👍 2
refset22:06:31

You may also need to clear the checkpoint store / point at a new one

rschmukler22:06:44

So, interestingly, I've seen this happen now 2x...

rschmukler22:06:06

Basically I start the crux node and it doesn't seem to rebuild the local indices

rschmukler22:06:42

Furthermore, a newly submitted tx gets a lower ID than it should

rschmukler22:06:11

```clojure
(defsys *config*
  {:kafka-config
   {:crux/module       'crux.kafka/->kafka-config
    :bootstrap-servers "localhost:9092"}
   :local-rocks
   {:crux/module 'crux.rocksdb/->kv-store
    :db-dir      (io/file "data/rocksdb")}
   :crux/tx-log
   {:crux/module  'crux.kafka/->tx-log
    :kafka-config :kafka-config}
   :crux/document-store
   {:crux/module          'crux.kafka/->document-store
    :kafka-config         :kafka-config
    :local-document-store {:kv-store :local-rocks}}
   :crux/index-store
   {:kv-store :local-rocks}
   :teknql.crux-geo/geo-store
   {:backend
    {:crux/module 'teknql.crux-geo.spatialite/->backend
     :db-path     "data/geo.sqlite"}}})
```
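A minimal sketch of the "just delete the files" step discussed above, assuming the node is stopped first and that, as in this config, the `:local-rocks` KV store under `data/rocksdb` backs both the index store and the local document store. `delete-dir!` is a hypothetical helper, not part of Crux.

```clojure
(require '[clojure.java.io :as io])

(defn delete-dir!
  "Recursively deletes dir and everything beneath it.
  Returns true if dir no longer exists afterwards.
  Hypothetical helper; only call it while the node is stopped."
  [dir]
  (let [root (io/file dir)]
    ;; file-seq lists parents before their children, so reversing it
    ;; deletes files and subdirectories before the directories holding them.
    (doseq [f (reverse (file-seq root))]
      (.delete f))
    (not (.exists root))))

(comment
  ;; With the config above, this removes the shared RocksDB directory so
  ;; the node starts from empty local state on its next start.
  (delete-dir! "data/rocksdb"))
```

As noted earlier in the thread, a checkpoint store, if one is configured, may also need clearing or pointing at a new location.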

rschmukler22:06:50

Do you see anything in the above that would make it seemingly overwrite the existing transaction log?

refset22:06:43

It shouldn't be possible to overwrite the existing transaction log. Crux doesn't do anything destructive with Kafka, except via evictions, so your data should be safe

rschmukler22:06:05

That's why this is so troubling 😛

refset22:06:07

Can you verify that what's on Kafka looks about right?

rschmukler22:06:09

Unless my docker volumes are screwed up

rschmukler22:06:17

taking a look

rschmukler22:06:59

It's indeed a docker issue. Whew!

rschmukler22:06:08

Sorry for the false alarm

refset09:06:26

No problem, I'm glad you have it sorted 🙂