#xtdb
2019-09-28
Ivan Fedorov14:09:35

@hoppy could you share a shallow sketch of the timecard structure? Are they multiple entities or a single one? If you work with versions of a single entity you can use history-range, which has four boundaries: two for valid time and two for transaction time. If this doesn't help, you're welcome to elaborate on your thoughts in GitHub issues; this indices idea may be very helpful. https://github.com/juxt/crux/issues
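
A rough sketch of the history-range call mentioned above, for a single timecard entity. The `node` argument, the entity id `:timecard/alice-2019-09` and all dates are hypothetical, and the argument order (valid-time start, transaction-time start, valid-time end, transaction-time end) is an assumption based on the Crux API of that period, so check it against the docs for your release.

```clojure
(require '[crux.api :as crux])

;; Assumption: `node` is an already-started Crux node.
;; Assumption: the four time boundaries are passed in the order shown below.
(defn timecard-history
  [node]
  (crux/history-range node
                      :timecard/alice-2019-09   ; hypothetical entity id
                      #inst "2019-09-01"        ; valid-time start
                      #inst "2019-09-01"        ; transaction-time start
                      #inst "2019-09-30"        ; valid-time end
                      #inst "2019-09-30"))      ; transaction-time end
```

This only applies if each timecard is one entity with many versions; if timecards are separate entities, a query is the more natural fit.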

hoppy14:09:54

I have ~120K of those

hoppy14:09:02

this one takes ~3 secs to run, 94 hits

refset14:09:47

@hoppy My understanding is that this should make use of efficient range scans over the indexes. Have you benchmarked running this without using args?

hoppy15:09:33

AFK, but I will. Does the type mess things up?

hoppy02:09:42

ok, looked a bunch more at this. My conclusion is that the indexes are not getting leveraged at all for inequality queries.

hoppy02:09:01

I'm getting ~4ish seconds for things that would constitute full scans.

hoppy02:09:59

I will make a quick gist of these so you can take a look.

refset16:09:26

thanks @hoppy, a minimal gist would be very helpful, the team will attempt to figure this out tomorrow

hoppy01:09:24

ok, I went a little past that. I put a project on github that is representative of my use case, which exhibits the same "shape"

refset08:09:11

Thanks! I will take a look 🙂 In the meantime I think this analysis addresses the immediate problem: "We support java.util.Date for range scans, but not Joda Time. You can try adding some form of extension of ValueToBuffer for the date type, or you can model it on the normal date implementation; that might speed it up."

hoppy10:09:06

I had that thought as well. However, in the case I sent you, it appears that even for integers the index isn't exploited, so I didn't go beyond that in the example.

refset11:10:17

@hoppy as a brief update: I can reproduce your results and timings on my machine (again, thanks for publishing the repo!). I'm currently trying to whittle it down to an even more minimal example

hoppy11:10:28

@refset, I therefore assume that you agree this result is unexpected?

refset12:10:55

Yes, although that is just my own personal assessment right now. For instance, it seems that doing [e :r r][(= r 308)] is slower than a direct [e :r 308] 🤔
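
For reference, the two forms being compared can be written out as full query maps. This is a minimal sketch: the attribute `:r` and the value 308 come from the message above, while the var names `q-predicate` and `q-literal` and the `node` in the commented-out timing calls are assumptions for illustration.

```clojure
;; Form 1: bind the value to a logic variable, then constrain it with a predicate.
(def q-predicate
  '{:find [e]
    :where [[e :r r]
            [(= r 308)]]})

;; Form 2: put the literal value directly in the triple clause.
(def q-literal
  '{:find [e]
    :where [[e :r 308]]})

;; Assumption: `node` is a started Crux node and crux.api is aliased as `crux`.
(comment
  (time (count (crux/q (crux/db node) q-predicate)))
  (time (count (crux/q (crux/db node) q-literal))))
```

Both queries should return the same entities, so any timing gap between them points at query planning rather than at the data.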

hoppy12:10:42

What is our workflow here? Are you basically triaging this to see if it needs to become a ticket?

refset12:10:09

That's my plan. I'll get a second opinion (internally) and open a ticket for this today if we can't resolve it.

refset14:10:34

@hoppy I've determined that this is due to an unfortunate miscalculation of the join order for certain trivial cases. The workaround for the current release (`1.4.0`) is to add extra dummy clauses so that the relevant lvar (logical variable) is more frequent. E.g. this is based on a lightly modified version of your example with 200k samples and runs in ~13ms

range-test=>   (def q4a {:find '[e]
        #_=>            :where '[[e :counter ?r]
        #_=>                     [_ :counter ?r]
        #_=>                     [(<= ?r 263)]
        #_=>                     [(>= ?r 263)]]})
#'range-test/q4a
range-test=>   (time (def r1 (crux/q (crux/db @node) q4a))) (count r1)
15:22:22.474 [clojure-agent-send-off-pool-6] DEBUG crux.query - :query {:find [e], :where [[e :counter ?r] [_ :counter ?r] [(= ?r 263)] [(= ?r 263)]]}
15:22:22.475 [clojure-agent-send-off-pool-6] DEBUG crux.query - :join-order :ave ?r e {:e e, :a :counter, :v ?r}
15:22:22.475 [clojure-agent-send-off-pool-6] DEBUG crux.query - :join-order :ave ?r _13910 {:e _13910, :a :counter, :v ?r}
15:22:22.476 [clojure-agent-send-off-pool-6] DEBUG crux.query - :where [[:triple {:e e, :a :counter, :v ?r}] [:triple {:e _, :a :counter, :v ?r}] [:range [[:sym-val {:op =, :sym ?r, :val 263}]]] [:range [[:sym-val {:op=, :sym ?r, :val 263}]]]]
15:22:22.476 [clojure-agent-send-off-pool-6] DEBUG crux.query - :vars-in-join-order [?r e _13910]
15:22:22.476 [clojure-agent-send-off-pool-6] DEBUG crux.query - :attr-stats {:crux.db/id 200000, :counter 200000}
15:22:22.476 [clojure-agent-send-off-pool-6] DEBUG crux.query - :var->bindings {_13910 #crux.query.VarBinding{:e-var _13910, :var _13910, :attr :crux.db/id, :result-index 2, :join-depth 2, :result-name _13910, :type :entity, :value? false}, e #crux.query.VarBinding{:e-var e, :var e, :attr :crux.db/id, :result-index 1, :join-depth 1, :result-name e, :type :entity, :value? false}, ?r #crux.query.VarBinding{:e-var _13910, :var ?r, :attr :counter, :result-index 0, :join-depth 2, :result-name _13910, :type :entity, :value? false}}
15:22:22.480 [clojure-agent-send-off-pool-6] DEBUG crux.query - :query-time-ms 11
15:22:22.480 [clojure-agent-send-off-pool-6] DEBUG crux.query - :query-result-size 8
"Elapsed time: 12.565868 msecs"
#'range-test/r1
8
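
The same dummy-clause workaround applied to a genuine range, rather than a single value, might look like the sketch below. The bounds 100 and 300 and the name `q-range` are made up for illustration, and whether the join order improves in the same way for a wider range has not been measured here.

```clojure
(def q-range
  {:find '[e]
   :where '[[e :counter ?r]
            [_ :counter ?r]   ; dummy clause: makes ?r appear more often, per the workaround
            [(>= ?r 100)]     ; hypothetical lower bound
            [(<= ?r 300)]]})  ; hypothetical upper bound
```

It can be run the same way as q4a, and the :join-order lines in the DEBUG output show whether ?r is joined first via :ave.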

hoppy14:10:14

well, that's neat. I will try that out later with my case after you guys are asleep.

hoppy15:10:51

@refset I presume this is going to get fixed at some point?

refset15:10:52

@hoppy Yes, definitely. I'm working on a PR candidate at the moment but I will open the issue (as promised!) if I can't figure out the solution myself today.