#xtdb
2022-11-16
Hukka 07:11:15

What factors affect the time it takes an xtdb node to get in sync from a cold startup? For some reason my development machine syncs in 150 seconds, but the deployed instance takes 1000 seconds. The deployed instance is closer to the stores (lower network latency) and has better bandwidth too, but it has less memory and fewer CPU cores. I'm thinking about testing with a couple of differently sized instances, but it's a bit of a blind search.

tatut 07:11:49

cold as in, no local indexes at all?

Hukka 07:11:01

That's right

tatut 07:11:26

I think it depends on the type of local indexes being used

Hukka 07:11:18

You mean RocksDB vs. something else? Both runs used the same configuration, with local indexes on RocksDB.

tatut 07:11:58

yeah, RocksDB and LMDB each have different settings for tuning performance

Hukka 09:11:15

It seemed to be related to throttling in the deployed environment. At current data volumes, I couldn't see significant differences between 2 and 6 CPUs, or between 4 and 8 GB of RAM.

refset 12:11:03

Hi @U8ZQ1J1RR - very late to the party here (sorry!), but throttling sounds plausible. Can you share which tx-log/doc-store combo this was? Have you figured out how to disable/reduce the throttling?

Hukka 12:11:21

We are on GCP, with the nodes on Cloud Run and the stores on a long-running Postgres (not the cloud storage that Google offers). The crucial setting was the Cloud Run option for whether the CPU is always allocated. We had missed it because we already keep a minimum of one instance running at all times to avoid long delays in serving requests.
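For reference, a node set up the way Hukka describes (tx-log and document store on Postgres via XTDB's JDBC module, local indexes in RocksDB) would be configured roughly as below in XTDB 1.x. This is a sketch: the host, database name, credentials and directory path are placeholders, not values from this thread, and the option keys should be checked against the XTDB JDBC and RocksDB module docs. On the Cloud Run side, the "CPU is always allocated" setting corresponds to the `--no-cpu-throttling` flag of `gcloud run deploy` / `gcloud run services update`.

```clojure
(require '[clojure.java.io :as io])

;; Sketch of an XTDB 1.x node config: Postgres (xtdb-jdbc module) for the
;; tx-log and document store, RocksDB (xtdb-rocksdb module) for local indexes.
;; All connection details and paths below are placeholders.
(def node-config
  {:xtdb.jdbc/connection-pool
   {:dialect {:xtdb/module 'xtdb.jdbc.psql/->dialect}
    :db-spec {:host "postgres.example.internal" ; placeholder
              :dbname "xtdb"                    ; placeholder
              :user "xtdb"                      ; placeholder
              :password "change-me"}}           ; placeholder

   ;; tx-log and document store share the connection pool above
   :xtdb/tx-log
   {:xtdb/module 'xtdb.jdbc/->tx-log
    :connection-pool :xtdb.jdbc/connection-pool}

   :xtdb/document-store
   {:xtdb/module 'xtdb.jdbc/->document-store
    :connection-pool :xtdb.jdbc/connection-pool}

   ;; local indexes: with no local indexes (a cold start), the node
   ;; rebuilds them by replaying the tx-log
   :xtdb/index-store
   {:kv-store {:xtdb/module 'xtdb.rocksdb/->kv-store
               :db-dir (io/file "/var/lib/xtdb/indexes")}}}) ; placeholder path

;; With xtdb-core, xtdb-jdbc and xtdb-rocksdb on the classpath:
;; (xtdb.api/start-node node-config)
```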

refset 12:11:51

Ah, that's very interesting to hear, thanks for summarising 🙂

zeitstein 08:11:37

Looking at https://github.com/xtdb/xtdb/blob/e2f51ed99fc2716faa8ad254c0b18166c937b134/test/test/xtdb/query_test.clj#L1153, using literal sets in clauses (in the v position) seems to work as though the clause were expanded into an `or` rule. What are the performance characteristics of using the literal vs. `or` vs. `or-join`? Basically, I'm wondering if I can/should replace my `or-join`s with literals; `or-join`s are somewhat annoying to deal with programmatically.

refset 12:11:25

Hey @U02E9K53C9L, the literals are always going to be faster: `or`/`or-join` compiles to a subquery, and the overhead of that is non-zero, whereas the set literal is essentially just a tiny virtual relation.
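To illustrate the programmatic side of zeitstein's question: a set-literal clause is plain data that can be built from any collection in one step, whereas the equivalent `or-join` needs one branch per value. The helpers and the `:status` attribute below are hypothetical, made up for illustration; per refset's answer, both clauses should match the same entities, with the set literal expected to be faster.

```clojure
;; Hypothetical helpers: restrict `attr` of `e-var` to one of `values`.

(defn set-literal-clause
  "Set literal in the v position: a single triple clause."
  [e-var attr values]
  [e-var attr (set values)])

(defn or-join-clause
  "Equivalent or-join: one branch per value (compiled to a subquery)."
  [e-var attr values]
  (apply list 'or-join [e-var]
         (map (fn [v] [e-var attr v]) values)))

(defn status-query
  "Builds a query map using the set-literal form."
  [values]
  {:find '[?e]
   :where [(set-literal-clause '?e :status values)]})

(or-join-clause '?e :status [:active :pending])
;; => (or-join [?e] [?e :status :active] [?e :status :pending])

(status-query [:active :pending])
;; :where holds a single clause whose value position is #{:active :pending}
```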

refset 12:11:53

(sorry for the very delayed response 🙈)

zeitstein 20:11:24

Awesome, thanks 🙂 (And no worries about the delay!)

J 09:11:10

Hi, I have the following data in the db:

(def data
  [{:name "foo"
    :settings nil}
   {:name "bar"
    :settings {:width 300}}
   {:name "fifoo"
    :settings {:end-date nil}}
   {:name "baz"
    :settings {:end-date (belt/new-instant 2022 12 12)}}])
I want only the items whose settings are not nil and whose settings.end-date is nil or missing. I use this query:
{:find [(pull ?item [*])]
 :where [[?item :name]
         (or-join [?item]
           (and [?item :settings ?settings]
                [(some? ?settings)]
                [?settings :end-date ?end-date]
                [(nil? ?end-date)])
           (and [?item :settings ?settings]
                [(some? ?settings)]
                (not [?settings :end-date])))]}
But I don’t understand why the item baz, whose end-date is not nil, is returned.

zeitstein 09:11:51

`data` is definitely not exactly what's in your db (e.g. it's missing :xt/id), so I'm not sure this applies, but: the :end-date maps will not be indexed if your actual docs in the db look like

{:xt/id ...
 :name "fifoo"
 :settings {:end-date nil}}

J 09:11:38

Sorry @U02E9K53C9L, the :xt/id values are in place, like: (xt/submit-tx xtdb (map #(vector ::xt/put (assoc % :xt/id (:name %))) data))

zeitstein 09:11:11

You can either change your data model as described at the link, or make the :end-date maps their own entities; I don't know what makes sense for your use case. It should also be possible to retrieve the map inside the query. Something like:

(xt/q (xt/db xtdb)
    '{:find [e m]
      :where [[e :settings m]
              [(contains? m :end-date)]]})
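Building on that suggestion, here is a sketch of J's original intent (settings present and not nil, end-date nil or missing) that inspects the map in a predicate instead of joining on it. As zeitstein notes, the nested map isn't indexed as an entity, so in J's query `[?settings :end-date ?end-date]` presumably finds no entity with that map as its id, which lets the `(not [?settings :end-date])` branch succeed even for baz. The `no-end-date?` helper is hypothetical, and the query assumes it lives in the `user` namespace so XTDB can resolve the fully qualified symbol; the data uses `#inst` in place of `belt/new-instant`, whose definition isn't shown.

```clojure
;; Hypothetical predicate: true when settings is a map whose :end-date is
;; nil or missing. A nil (or any non-map) settings value fails.
(defn no-end-date? [settings]
  (and (map? settings)
       (nil? (:end-date settings))))

;; Same shape as J's data, with #inst instead of belt/new-instant.
(def data
  [{:name "foo"   :settings nil}
   {:name "bar"   :settings {:width 300}}
   {:name "fifoo" :settings {:end-date nil}}
   {:name "baz"   :settings {:end-date #inst "2022-12-12"}}])

;; Query sketch: ?settings is bound to the map value and tested by the
;; predicate, rather than being used as an entity id.
;; Assumes no-end-date? is defined in the `user` namespace.
(def query
  '{:find [(pull ?item [*])]
    :where [[?item :name]
            [?item :settings ?settings]
            [(user/no-end-date? ?settings)]]})

;; Plain-Clojure check of the predicate against the sample data:
(map :name (filter (comp no-end-date? :settings) data))
;; => ("bar" "fifoo")
```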

J 09:11:13

Nice catch @U02E9K53C9L thanks!

sakalli 15:11:40

hi, :limit seems to return duplicates if there are fewer results than the limit. if i aggregate a logic variable with distinct in :find, i don't get the duplicates. now, how would i combine both pull and distinct in the :find clause? or is there a more idiomatic way of getting a response with only unique items?
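No answer appears in this part of the thread. One client-side workaround, sketched here and not an XTDB feature, is to drop `:limit` from the query and deduplicate the rows before taking the first n; the helper name `distinct-limit` is hypothetical. The trade-off is that the node may return more rows than are finally kept.

```clojure
;; Hypothetical helper: keep the first n distinct rows, in order.
;; The distinct transducer compares rows by value, so rows holding
;; identical pulled maps count as duplicates; take stops early once
;; n rows have been kept.
(defn distinct-limit [n rows]
  (into [] (comp (distinct) (take n)) rows))

;; Usage sketch, where `db` and `q-without-limit` are placeholders:
;; (distinct-limit 10 (xt/q db q-without-limit))

(distinct-limit 2 [[{:xt/id :a}] [{:xt/id :a}] [{:xt/id :b}] [{:xt/id :c}]])
;; => [[{:xt/id :a}] [{:xt/id :b}]]
```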