#xtdb
2020-11-26
nivekuil 07:11:51

here's something weird I'm experiencing, still on 20.11. await-tx here blocks forever:

(crux/await-tx node
               (crux/submit-tx
                node
                [[:crux.tx/put {:crux.db/id 14 :whatever/id "a"}]]))

nivekuil 07:11:20

but only if it's a :whatever/id

nivekuil 07:11:47

not a :whatever/idfoo or a :whatever/idbar, only if (name attr) is "id", including a bare :id

nivekuil 07:11:28

now, I know that some other docs with a :foo/id attribute have transacted into the entirely in-memory stores just fine, but at some point this seems to have broken

nivekuil 07:11:56

ah, it only happens with string values. I assume this is Lucene's doing

nivekuil 08:11:26

;; returns true
(crux/tx-committed? node
                    (crux/await-tx node
                                   (crux/submit-tx
                                    node
                                    [[:crux.tx/put {:crux.db/id 512 :oaeueoaueoa/id 1}]])))

;; blocks forever
(crux/tx-committed? node
                    (crux/await-tx node
                                   (crux/submit-tx
                                    node
                                    [[:crux.tx/put {:crux.db/id 512 :oaeueoaueoa/id "1"}]])))

;; returns true
(crux/tx-committed? node
                    (crux/await-tx node
                                   (crux/submit-tx
                                    node
                                    [[:crux.tx/put {:crux.db/id 512 :oaeueoaueoa/idd "1"}]])))
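
While reproducing a bug like this, a REPL call that blocks forever is awkward. A generic guard is to run the blocking call on another thread and give up after a deadline. The sketch below uses a hypothetical `call-with-timeout` helper built on `java.util.concurrent.CompletableFuture`; it is not part of the Crux API. Depending on your Crux version, `crux/await-tx` may also accept a timeout argument directly, which would be preferable, so check the API docs for your release.

```clojure
(ns example.await-timeout
  (:import (java.util.concurrent CompletableFuture TimeUnit TimeoutException)
           (java.util.function Supplier)))

(defn call-with-timeout
  "Calls the zero-arg fn `f` on another thread and waits at most `timeout-ms`
  for its result. Returns `timeout-val` if the deadline passes; note the
  background call itself is not interrupted and may keep running."
  [f timeout-ms timeout-val]
  (let [cf (CompletableFuture/supplyAsync (reify Supplier (get [_] (f))))]
    (try
      (.get cf (long timeout-ms) TimeUnit/MILLISECONDS)
      (catch TimeoutException _
        ;; Marks the future as cancelled; it does not stop `f`.
        (.cancel cf true)
        timeout-val))))

;; Hypothetical usage against a node (not run here):
;; (call-with-timeout #(crux/await-tx node tx) 5000 ::timed-out)
```

The trade-off is that a timed-out call still holds a thread until it finishes, so this is a debugging aid rather than a fix.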

jonpither 11:11:26

@kevin842 Have raised an issue for the attribute being :id - https://github.com/juxt/crux/issues/1274

jonpither 11:11:43

Although your exact example is fixed by https://github.com/juxt/crux/pull/1273

jonpither 11:11:32

Thanks for kicking the tyres

nivekuil 11:11:22

my pleasure! thanks for getting it all fixed

jonpither 13:11:58

If you try this snapshot, it should have the fixes: [juxt/crux-lucene "20.09-1.11.1-alpha-SNAPSHOT"]

jonpither 13:11:05

@kevin842 ^ - thanks again.

nivekuil 13:11:16

works great :) I noticed something with the performance; using `or` with even a single clause slows it down dramatically. I was hoping to query two attributes in one query, but it looks like I'm better off making two crux/qs?

user> (quick-bench (q '{:find  [?e ?v]
                        :where [(or [(text-search :view/feed "xk*") [[?e ?v]]])
                                [?e :view/id]]}))
Evaluation count : 6 in 6 samples of 1 calls.
             Execution time mean : 245.380112 ms
    Execution time std-deviation : 217.685410 ms
   Execution time lower quantile : 151.759149 ms ( 2.5%)
   Execution time upper quantile : 622.555545 ms (97.5%)
                   Overhead used : 1.788354 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe   1 (16.6667 %)
 Variance from outliers : 83.1035 % Variance is severely inflated by outliers
nil
user> (quick-bench (q '{:find  [?e ?v]
                        :where [[(text-search :view/feed "xk*") [[?e ?v]]]
                                [?e :view/id]]}))
Evaluation count : 108 in 6 samples of 18 calls.
             Execution time mean : 1.339781 ms
    Execution time std-deviation : 28.631001 µs
   Execution time lower quantile : 1.313759 ms ( 2.5%)
   Execution time upper quantile : 1.385600 ms (97.5%)
                   Overhead used : 1.788354 ns

Found 1 outliers in 6 samples (16.6667 %)
    low-severe   1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
nil

nivekuil 14:11:36

oh, the search value can't be parameterized with :in?

(db/q
 '{:find  [?id ?v]
   :in    [input]
   :where [[(text-search :view/feed input) [[?e ?v]]]
           [?e :view/id ?id]]}
 "xk*")
;; throws class crux.query.VarBinding cannot be cast to class java.lang.String

nivekuil 14:11:08

resorting to this for now

(db/q
 `{:find  [?id ?v]
   :where [[(~(symbol "text-search") :view/feed ~input) [[?e ?v]]]
           [?e :view/id ?id]]})
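
In that workaround, syntax-quote namespace-qualifies every bare symbol (so `?id` is read as something like `app.db/?id`), which is why `text-search` has to be spliced in via `~(symbol ...)`. An alternative sketch, not taken from the Crux docs: build the query map as plain data with `quote` and `list`, so the search string is inlined as a literal and the logic variables stay unqualified. `feed-search-query` is a hypothetical helper name.

```clojure
(ns example.text-search-query)

(defn feed-search-query
  "Hypothetical helper: returns a query map that text-searches :view/feed
  for `input`. The map is built as plain data, so `input` is inlined as a
  string literal and the logic variables stay unqualified."
  [input]
  {:find  '[?id ?v]
   :where [[(list 'text-search :view/feed input) '[[?e ?v]]]
           '[?e :view/id ?id]]})

;; Usage (not run here): (db/q (feed-search-query "xk*"))
```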

nivekuil 14:11:32

ah, here's a bug for sure.

nivekuil 14:11:32

(q '{:find  [?e ?v]
       :where [[(text-search :view/title "/") [[?e ?v]]]
               [?e :crux.db/id]]})

;; throws 
2. Unhandled org.apache.lucene.queryparser.classic.ParseException
   Cannot parse '/': Lexical error at line 1, column 2.  Encountered: <EOF>
   after : ""

      QueryParserBase.java:  114  org.apache.lucene.queryparser.classic.QueryParserBase/parse
                lucene.clj:  135  crux.lucene/search
                lucene.clj:  126  crux.lucene/search
                lucene.clj:  150  crux.lucene/full-text
                lucene.clj:  149  crux.lucene/full-text
                lucene.clj:  163  crux.lucene/pred-constraint/pred-get-attr-constraint
                 query.clj: 1161  crux.query/constrain-join-result-by-constraints/fn
                  core.clj: 2681  clojure.core/every?
                  core.clj: 2672  clojure.core/every?
                 query.clj: 1160  crux.query/constrain-join-result-by-constraints
                 query.clj: 1158  crux.query/constrain-join-result-by-constraints
                 query.clj: 1501  crux.query/build-sub-query/constrain-result-fn
                 query.clj: 1512  crux.query/build-sub-query
                 query.clj: 1474  crux.query/build-sub-query
                 query.clj: 1674  crux.query/query
                 query.clj: 1654  crux.query/query
                 query.clj: 1792  crux.query.QueryDatasource/fn
                 query.clj: 1791  crux.query.QueryDatasource/openQuery
                 query.clj: 1754  crux.query.QueryDatasource/query
                   api.clj:  374  crux.api/eval18945/fn
                   api.clj:  249  crux.api/eval18783/fn/G
                   api.clj:  337  crux.api/q
                   api.clj:  331  crux.api/q
               RestFn.java:  425  clojure.lang.RestFn/invoke
                      REPL:   15  app.db/q
                      REPL:   13  app.db/q
                      REPL:  496  app.db/eval122334
                      REPL:  496  app.db/eval122334
             Compiler.java: 7177  clojure.lang.Compiler/eval
             Compiler.java: 7132  clojure.lang.Compiler/eval
                  core.clj: 3214  clojure.core/eval
                  core.clj: 3210  clojure.core/eval
    interruptible_eval.clj:   91  nrepl.middleware.interruptible-eval/evaluate/fn
                  main.clj:  437  clojure.main/repl/read-eval-print/fn
                  main.clj:  437  clojure.main/repl/read-eval-print
                  main.clj:  458  clojure.main/repl/fn
                  main.clj:  458  clojure.main/repl
                  main.clj:  368  clojure.main/repl
               RestFn.java:  137  clojure.lang.RestFn/applyTo
                  core.clj:  665  clojure.core/apply
                  core.clj:  660  clojure.core/apply
                regrow.clj:   20  refactor-nrepl.ns.slam.hound.regrow/wrap-clojure-repl/fn
               RestFn.java: 1523  clojure.lang.RestFn/invoke
    interruptible_eval.clj:   84  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:   56  nrepl.middleware.interruptible-eval/evaluate
    interruptible_eval.clj:  155  nrepl.middleware.interruptible-eval/interruptible-eval/fn/fn
                  AFn.java:   22  clojure.lang.AFn/run
               session.clj:  190  nrepl.middleware.session/session-exec/main-loop/fn
               session.clj:  189  nrepl.middleware.session/session-exec/main-loop
                  AFn.java:   22  clojure.lang.AFn/run
               Thread.java:  832  java.lang.Thread/run

1. Caused by org.apache.lucene.queryparser.classic.TokenMgrError
   Lexical error at line 1, column 2.  Encountered: <EOF> after : ""

QueryParserTokenManager.java: 1119  org.apache.lucene.queryparser.classic.QueryParserTokenManager/getNextToken
          QueryParser.java:  822  org.apache.lucene.queryparser.classic.QueryParser/jj_scan_token
          QueryParser.java:  666  org.apache.lucene.queryparser.classic.QueryParser/jj_3R_3
          QueryParser.java:  702  org.apache.lucene.queryparser.classic.QueryParser/jj_3_1
          QueryParser.java:  646  org.apache.lucene.queryparser.classic.QueryParser/jj_2_1
          QueryParser.java:  225  org.apache.lucene.queryparser.classic.QueryParser/Query
          QueryParser.java:  215  org.apache.lucene.queryparser.classic.QueryParser/TopLevelQuery
      QueryParserBase.java:  109  org.apache.lucene.queryparser.classic.QueryParserBase/parse
                lucene.clj:  135  crux.lucene/search
                lucene.clj:  126  crux.lucene/search
                lucene.clj:  150  crux.lucene/full-text
                lucene.clj:  149  crux.lucene/full-text
                lucene.clj:  163  crux.lucene/pred-constraint/pred-get-attr-constraint
                 query.clj: 1161  crux.query/constrain-join-result-by-constraints/fn
                  core.clj: 2681  clojure.core/every?
                  core.clj: 2672  clojure.core/every?
                 query.clj: 1160  crux.query/constrain-join-result-by-constraints
                 query.clj: 1158  crux.query/constrain-join-result-by-constraints
                 query.clj: 1501  crux.query/build-sub-query/constrain-result-fn
                 query.clj: 1512  crux.query/build-sub-query
                 query.clj: 1474  crux.query/build-sub-query
                 query.clj: 1674  crux.query/query
                 query.clj: 1654  crux.query/query
                 query.clj: 1792  crux.query.QueryDatasource/fn
                 query.clj: 1791  crux.query.QueryDatasource/openQuery
                 query.clj: 1754  crux.query.QueryDatasource/query

nivekuil 14:11:07

it does not seem to like slashes in the search. done poking around for now, thanks @U050DD55V
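
Lucene's classic query parser treats `/` as the delimiter of a regular-expression term, so a lone `/` is an unterminated regex, which is consistent with the lexical error above. One client-side workaround, assuming the input should be matched literally, is to backslash-escape the parser's special characters before building the search string. Lucene's own `QueryParser.escape` does this; the character set below is assumed to match that method's list, and `escape-lucene` is a hypothetical helper. Escaping `*` also disables prefix searches like `xk*`, so escape only the literal part and append the wildcard afterwards.

```clojure
(ns example.lucene-escape)

(def lucene-special-chars
  "Characters the classic Lucene QueryParser treats as syntax
  (assumed to match the list escaped by QueryParser.escape)."
  (set "\\+-!():^[]\"{}~*?|&/"))

(defn escape-lucene
  "Backslash-escapes every Lucene query-syntax character in `s`,
  so that e.g. \"/\" is searched for literally instead of starting a regex."
  [s]
  (apply str (mapcat (fn [c]
                       (if (contains? lucene-special-chars c)
                         [\\ c]
                         [c]))
                     s)))

;; Escape the literal part, then add the wildcard yourself:
;; (str (escape-lucene user-input) "*")
```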

jonpither 15:11:22

thanks @kevin842 will look at this one ^ - thanks again

refset 15:11:11

> using `or` with even a single clause slows it down dramatically. I was hoping to query two attributes in one query, but it looks like I'm better off making two crux/qs?

In your example the contents of the `or` are treated as a subquery that implicitly executes after its sibling clauses. This means the engine actually scans through all `[?e :view/id]` entities and runs the text search for each one. In theory the query engine could be smart enough to lift the contents of the `or`, given that it only has one leg, and avoid the scan altogether, but I'm sure that idea would involve other trade-offs. Is this essentially the question you are hoping to model: "Find all view entities (which necessarily have a :view/id) with their :view/feed value where that value matches xk*"? Or is it something more subtle?

nivekuil 15:11:46

@U899JBRPF: I want to find all entities that match some query with either of two attributes:

(db/q
 `{:find  [?id ?v ?s]
   :where [(~(symbol "or")
            [(~(symbol "text-search") :view/feed ~input) [[?e ?v ?s]]]
            [(~(symbol "text-search") :view/title ~input) [[?e ?v ?s]]])
           [?e :view/id ?id]]})

nivekuil 15:11:17

I believe that has the semantics I want but is terribly slow
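
Following the suggestion of making two separate `crux/q` calls, one way to keep the single-attribute speed while still searching both attributes is to run one query per attribute and union the result sets client-side; this should give the same rows as the `or` version. A sketch under those assumptions: `attr-search-query` and `search-attrs` are hypothetical helpers, and `run-q` stands in for something like `#(crux/q db %)` so the example runs without a node. The score `?s` is left out, since scores from separate Lucene searches may not be directly comparable.

```clojure
(ns example.multi-attr-search)

(defn attr-search-query
  "Hypothetical helper: a query map that text-searches one attribute."
  [attr input]
  {:find  '[?id ?v]
   :where [[(list 'text-search attr input) '[[?e ?v]]]
           '[?e :view/id ?id]]})

(defn search-attrs
  "Runs one query per attribute via `run-q` and unions the result sets."
  [run-q attrs input]
  (into #{}
        (mapcat #(run-q (attr-search-query % input)))
        attrs))

;; With a real node (not run here):
;; (search-attrs #(crux/q (crux/db node) %) [:view/feed :view/title] "xk*")
```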

refset 17:11:14

Thanks, will give this some thought 🙂 Please see our conclusions on querying with / https://github.com/juxt/crux/issues/1278#issuecomment-734396067 (documentation to follow!)

nivekuil 17:11:07

from what I can see, or-text-search searches for one of many values? My use case is maybe a little odd, in that I want to search for one value across many attributes

nivekuil 17:11:15

I think the many-values case might be covered with a regex? But I definitely don't have a clear view of Lucene... lots going on with the tokenizers and analyzers

nivekuil 18:11:25

Well, I don't think it's that odd of a use case really. You could think of a google search result as a title and a description that a query is run against

refset 18:11:55

Ah yep, that makes sense, it would be like or-wildcard-search I guess! The idea with that defmethod approach is you can define these search predicates as a user and make it work exactly as you wish. We've only just done the refactoring to make this possible though, and we will try to get it merged in time for the new release (within the next few days)

nivekuil 18:11:48

Oh, I totally missed that it's user defined. Interesting!

nivekuil 08:11:35

in summary, a doc with a tuple where (= (name attr) "id") and value is a string will never be indexed. My other domain entities with :foo/id point to uuids so they were fine, but one uses a string value for :bar/id

nivekuil 08:11:29

and those docs were entirely missing
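
The condition in that summary can be written as a predicate, which may help locate the documents that went missing so they can be re-checked after upgrading. `affected-by-id-bug?` is a hypothetical helper that encodes the condition exactly as stated (any attribute whose name is `id`, in any namespace, with a string value). Whether a string `:crux.db/id` was itself affected isn't said in the thread, but this predicate would flag it too.

```clojure
(ns example.id-bug-audit)

(defn affected-by-id-bug?
  "True if `doc` has any keyword attribute whose name is \"id\"
  (e.g. :bar/id or a bare :id) with a string value. Hypothetical helper;
  it also matches a string :crux.db/id."
  [doc]
  (boolean
   (some (fn [[k v]]
           (and (keyword? k)
                (= "id" (name k))
                (string? v)))
         doc)))

;; e.g. (filter affected-by-id-bug? docs) to list candidates for re-submission
```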

jonpither 10:11:04

thanks @kevin842, we will take a look

sjharms 22:11:12

For the document store, is there compression, or is that part of the choice to make?

refset 23:11:10

There's no explicit compression in the Crux DocumentStore protocol or in our doc-store module implementations. However RocksDB, for instance, certainly provides a lot of useful low-level compression by default (with many config options available also). I don't have data points to offer on exactly how well Rocks or any of the other doc-store backends typically fare with our Nippy-ified documents. I believe there's a lot of theoretical scope for doc-store implementations to add some document-level structural sharing before any low-level compression takes place, i.e. do things more like git. Working in this area is not a near-term priority for us at the moment, but it's certainly important in the long run 🙂

sjharms 23:11:21

Thank you for the considerate answer. I am storing random JSON I scrape from the web, and was just doing some testing to compare on-disk space across git, Elasticsearch, PostgreSQL and ScyllaDB. I was going to test out Crux next, so that is helpful

refset 00:11:31

No problem, and thank you for the extra context - although I am really not sure which of those I'd bet money on being the most space-efficient! I would be quite interested to find out what you discover.