#xtdb
2022-04-04
tatut05:04:23

I have a deep structure shredded into documents that link A->B->C (document A has an attribute that is the id of B, and so on). Now I want to find As by an attribute in C. This requires the query to consider all As, and it is quite slow. In Datomic this kind of query is much faster, because it has the VAE index for all entity references. Any tricks to make it faster, other than pulling the search fields up to A and maintaining them manually?

refset06:04:17

Are the attributes always known in advance? If so, XT has an AVE index to avoid scanning, but the index may not be factored into the join order in an optimal way. Please can you share the query and the :vars-in-join-order from xtdb.query/query-plan-for?
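To make the AVE point concrete, here is a toy Java model. It is not XTDB's implementation, and the class name AveIndex and the attribute names a/b, b/c and c/name are hypothetical. Because a reference attribute's value is just the id of another entity, an attribute -> value -> entities lookup on that attribute can walk the chain backwards from C to A without scanning every A, provided the join order starts from the C side.

```java
import java.util.*;

// Toy model of an AVE (attribute -> value -> entities) index.
// NOT XTDB code: it only illustrates why value lookups on reference
// attributes let a query walk A -> B -> C backwards without scanning all As.
class AveIndex {
    // attribute -> value -> ids of entities that have that attribute value
    private final Map<String, Map<Object, Set<String>>> ave = new HashMap<>();

    void put(String entity, String attribute, Object value) {
        ave.computeIfAbsent(attribute, a -> new HashMap<>())
           .computeIfAbsent(value, v -> new TreeSet<>())
           .add(entity);
    }

    // Direct lookup: all entities whose `attribute` equals `value`.
    Set<String> lookup(String attribute, Object value) {
        return ave.getOrDefault(attribute, Map.of()).getOrDefault(value, Set.of());
    }

    // Start from C's attribute, then find Bs referencing those Cs, then As
    // referencing those Bs. Each hop is an index lookup, not a scan.
    // The attribute names "b/c" and "a/b" are hypothetical.
    Set<String> findAsByCAttribute(String cAttr, Object cValue) {
        Set<String> result = new TreeSet<>();
        for (String c : lookup(cAttr, cValue)) {
            for (String b : lookup("b/c", c)) {
                result.addAll(lookup("a/b", b));
            }
        }
        return result;
    }

    public static void main(String[] args) {
        AveIndex idx = new AveIndex();
        idx.put("a1", "a/b", "b1");
        idx.put("a2", "a/b", "b2");
        idx.put("b1", "b/c", "c1");
        idx.put("b2", "b/c", "c2");
        idx.put("c1", "c/name", "foo");
        idx.put("c2", "c/name", "bar");
        System.out.println(idx.findAsByCAttribute("c/name", "foo")); // [a1]
    }
}
```

If the planner instead starts from the A side, it has to enumerate every A and test each chain, which matches the slowdown described above.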

tatut06:04:49

attributes are known, the Cs are found using text-search

tatut06:04:39

I can try to create an equivalent repro case as I can't share this as is

tatut06:04:50

on second look, this looks to be another or-clause slowdown

tatut06:04:16

I had an or clause filter that checked whether either an attribute in A matches a text-search or an attribute in C matches a text-search... searching from C alone, the query performs well enough

tatut07:04:02

tried looking at flamegraphs, but I don't understand the internals well enough... each clause in the or is fast on its own (the whole query with one clause takes 200ms), but when they are combined it goes to 16 seconds... the layered-idx->seq step seems to feature heavily, but I can't say that it is the culprit

👍 1
refset07:04:52

Sadly, regular flamegraphs can't meaningfully follow the execution of individual clauses, but even a very crudely anonymised version of the query (+ :vars-in-join-order), with the logic vars replaced with foo/bar/etc, should definitely be enlightening. Equally, feel free to DM me.

refset07:04:37

It sounds possible that the text-search may be re-evaluating multiple times

tatut08:04:17

that sounds like a reasonable guess... it is much faster if I replace the text-search or clauses with regular triple clauses

tatut08:04:51

tried some regular triple clauses and am back in the 400ms range for the query

tatut08:04:10

is there an easy way to verify? I could probably monkey patch the text search code in the REPL...

tatut08:04:10

I added a call counter to xtdb.lucene/resolve-search-results-a-v, and that one, at least, is called 18282 times in my query. Is it called for each potential result?

👍 1
refset08:04:53

right, it sounds like one or both of those text-searches are running multiple times
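A toy Java model of the re-evaluation hypothesis follows. It is not XTDB's query engine, and textSearch, filterPerCandidate and filterOnce are hypothetical names. If the branch containing a search is evaluated once per candidate binding of the outer query, the search runs once per candidate, which would be consistent with a counter reaching thousands of calls. Running the search once and probing its result set keeps it to a single call.

```java
import java.util.*;

// Toy model (NOT XTDB internals) of a search predicate being re-run per
// candidate binding versus run once and reused.
class SearchReevaluation {
    // Counts invocations, like the call counter added in the REPL.
    static int searchCalls = 0;

    // Hypothetical stand-in for a full-text search: ids whose text contains term.
    static Set<String> textSearch(Map<String, String> docs, String term) {
        searchCalls++;
        Set<String> hits = new HashSet<>();
        for (Map.Entry<String, String> e : docs.entrySet()) {
            if (e.getValue().contains(term)) hits.add(e.getKey());
        }
        return hits;
    }

    // Naive: the search is re-evaluated for every candidate binding.
    static List<String> filterPerCandidate(List<String> candidates,
                                           Map<String, String> docs, String term) {
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            if (textSearch(docs, term).contains(c)) out.add(c);
        }
        return out;
    }

    // Search once, then probe the result set for each candidate.
    static List<String> filterOnce(List<String> candidates,
                                   Map<String, String> docs, String term) {
        Set<String> hits = textSearch(docs, term);
        List<String> out = new ArrayList<>();
        for (String c : candidates) {
            if (hits.contains(c)) out.add(c);
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, String> docs = Map.of("x", "hello world", "y", "goodbye", "z", "world peace");
        List<String> candidates = List.of("x", "y", "z");
        searchCalls = 0;
        System.out.println(filterPerCandidate(candidates, docs, "world") + " calls=" + searchCalls);
        searchCalls = 0;
        System.out.println(filterOnce(candidates, docs, "world") + " calls=" + searchCalls);
    }
}
```

Both filters return the same rows; only the number of search calls differs (3 versus 1 here), and the gap grows with the number of candidates.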

emccue19:04:22

Is there a way to know exactly how nested fields get broken up into datoms? Like, exactly which facts can be matched against in a query?

Steven Deobald20:04:41

Can you provide an example? In general, nested fields won't match xtdb datalog queries, so I'm not sure what you mean. (Only top-level attrs in a doc are matched.)
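A simplified Java model of which facts a document yields, assuming top-level-only indexing as described above. The Shredder class and its Triple record are hypothetical, not XTDB's API. The model also assumes (my understanding of XTDB 1.x, worth checking against the docs) that the elements of a top-level vector or set are indexed individually, while a nested map is kept as one opaque value. It needs Java 16+ for records and pattern-matching instanceof.

```java
import java.util.*;

// Simplified model (NOT XTDB code) of shredding a document into
// (entity, attribute, value) facts: only top-level keys become attributes.
class Shredder {
    record Triple(Object entity, String attribute, Object value) {}

    static List<Triple> shred(Map<String, Object> doc) {
        Object id = doc.get("xt/id");
        List<Triple> out = new ArrayList<>();
        for (Map.Entry<String, Object> e : doc.entrySet()) {
            Object v = e.getValue();
            if (v instanceof Collection<?> coll) {
                // Assumption: top-level collections are indexed per element.
                for (Object item : coll) out.add(new Triple(id, e.getKey(), item));
            } else {
                // Scalars and nested maps become a single fact; the nested
                // map's own keys are never exposed as attributes.
                out.add(new Triple(id, e.getKey(), v));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("xt/id", "p1");
        doc.put("name", "Pat");
        doc.put("address", Map.of("city", "Oslo"));
        doc.put("tags", List.of("a", "b"));
        shred(doc).forEach(System.out::println);
    }
}
```

In this model a clause on city can never match, because city only exists inside the address value; to query it, it would have to be lifted to a top-level attribute.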

emccue20:04:40

right. i must have known that at some point

emccue19:04:32

And then - this might be a silly thing - but is there any particular reason the Java API exposes/works with Dates instead of Instants?

Linus Ericsson19:04:22

My guess is that this is because of historical reasons/backward compatibility.

Steven Deobald19:04:53

Largely this, yes.

✔️ 1
emccue20:04:26

well if java usage takes off, mayhaps something to consider

Steven Deobald20:04:51

The dev team is taking a good hard look at data types right now, and this is definitely getting a serious revisit. The tentative plan is to come up with a stricter set of types which are deeply and precisely supported.

👍 2
refset21:04:47

it's at least partly due to a preference for handling every temporal coordinate internally as a long (64 bits), which has a direct and widely recognised correspondence with Date
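Plain JDK calls show the Date/long correspondence: a java.util.Date wraps a single long of milliseconds since the Unix epoch, so converting between long, Date and Instant is direct. This sketch uses only standard JDK methods (getTime, new Date(long), toInstant, Date.from); the TemporalCoordinates wrapper is hypothetical and says nothing about XTDB's own API. One caveat: Date carries only millisecond precision, so an Instant with nanoseconds loses its sub-millisecond part on the way through.

```java
import java.time.Instant;
import java.util.Date;

// Hypothetical helper showing how java.util.Date, a 64-bit epoch-millis long,
// and java.time.Instant convert into each other using only JDK methods.
class TemporalCoordinates {
    // Date -> milliseconds since 1970-01-01T00:00:00Z
    static long toMillis(Date d) { return d.getTime(); }

    // milliseconds since the epoch -> Date
    static Date fromMillis(long ms) { return new Date(ms); }

    static Instant toInstant(Date d) { return d.toInstant(); }

    // Instant -> Date; anything finer than a millisecond is dropped.
    static Date fromInstant(Instant i) { return Date.from(i); }

    public static void main(String[] args) {
        Instant day = Instant.parse("2022-04-04T00:00:00Z");
        long ms = toMillis(fromInstant(day));
        System.out.println(ms);                          // 1649030400000
        System.out.println(toInstant(fromMillis(ms)));   // 2022-04-04T00:00:00Z

        Instant precise = Instant.parse("2022-04-04T00:00:00.123456789Z");
        System.out.println(toInstant(fromInstant(precise))); // 2022-04-04T00:00:00.123Z
    }
}
```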

👍 2
emccue20:04:15

While I'm asking dumb questions, are there plans out there for managed installs? Like, heroku addons, that kind of stuff

Steven Deobald20:04:54

Not a dumb question at all. 🙂 Almost clairvoyant. We spoke to a large number of folks recently about what they'd like from xt and "management" (for various values of "management") came up repeatedly. On one end, that might just be a simpler Docker install or an AWS Marketplace component. On the other end, there's full-blown DBaaS.

emccue20:04:29

For us, we'd like something we can just slap in pulumi

👍 2
Steven Deobald20:04:38

@U3JH98J4R Out of curiosity, is a Heroku Addon specifically the thing you'd like to see us implement?

emccue20:04:52

Depends what hat i'm wearing

Steven Deobald20:04:18

I'd like to hear what the answer is for each hat, if you have a sec to list them out.

emccue20:04:57

for the company I work for, we are doing all our infra via pulumi on AWS and we have a strong preference for a managed service over anything we roll ourselves

emccue20:04:00

for the people I teach, where I'd want to hand them the Java library or the HTTP API, I would want something single-click on Heroku and a straightforward local install, as convenient as Postgres or more so

👍 1
emccue20:04:23

for my personal doodlings I would want heroku and/or a docker image just so i can use it without thinking much

Steven Deobald20:04:03

> managed service

Just to completely disambiguate, this means a DBaaS that offers you an API but no direct access to the hardware, correct? (As opposed to something in AWS Marketplace, say, where you'd still manage the instance yourself.)

emccue20:04:03

For option number one, yes

emccue20:04:47

at least I think so

emccue20:04:56

i'd have to ask our pulumi "person"

Steven Deobald20:04:42

@U3JH98J4R We'd love to know. There's a wide spectrum between producing prebuilt marketplace binaries and actually managing other people's data. We're not averse to either, but actually running a full-blown DBaaS means investing heavily in admin/ops work. Managing other people's data isn't something to take lightly. 🙂

emccue00:04:55

for our financial stuff I think keeping all the data in our VPC would be important

emccue00:04:05

so maybe the AWS marketplace solution would be better?

👍 1
Martynas Maciulevičius05:04:59

I think that, as xtdb supports a variety of storage backends, a one-click deployment should somehow let you choose with the backend in mind. For instance, I imagine that on Heroku you could take a Postgres image and use XTDB with that Postgres as a storage engine. But some apps may run XTDB in an in-memory-only configuration. And then there's Kafka and others. So I think every one-click deployment could be very different in what it does.