Fork me on GitHub
#xtdb
<
2022-12-08
>
Stas Makarov13:12:46

Hi folks! I have troubles with full text search via Lucene among several related entities. I have data of the following shape:

{:user/name "John", :xt/id ...} ;; 10 entities

{:client/first-name "Bird" 
 :client/last-name "Whatever",
 :client/company-name "Optional"
 :xt/id ...} ;; ~100k entities

{:invoice/client :client-xt-id
 :invoice/user :user-xt-id
 ...
 ;; some other invoice fields like `reference`, `note` etc
} ;; ~100k entities
I want to find invoices that either: • has a client with s-str in “first-name”, “last-name” or “company-name” • has a user with s-str in “name” • has s-str in any of its attributes I usually need to retrieve small amount of items (10-100) from possibly pretty big dataset (100k+ items). I also need to do pagination and sorting on the result. I’ve come up with this query, which seems to work, but I guess it’s far form optimal.
(xt/q
  (db)
  {:find
   '[(pull ?invoice
           [*
            {:inv/client [:xt/id :client/first-name :client/last-name]}
            {:inv/user [:xt/id :user/name]}])]

   :limit 20

   :where
   [['?invocie :xt/id]

    (list 'or-join ['?invoice]
          (list 'and
                [(list 'lucene-text-search
                       (str "client\\/first-name: %1$s* OR"
                            "client\\/last-name: %1$s* OR"
                            "client\\/company-name: %1$s*")
                       s-str)
                 [['?client]]]
                '[?invoice :inv/client ?client])
          (list 'and
                [(list 'lucene-text-search
                       "user\\/name: %1$s*"
                       s-str)
                 [['?user]]]
                '[?invoice :inv/user ?user])

          [(list 'lucene-text-search
                 (str "invoice\\/note: %1$s* OR"
                      "invoice\\/reference: %1$s*")
                 s-str)
           [['?invoice]]])]})
I suspect the best way would be to implement custom Indexer, which would “gather” multiple related entities (Invoice, Client, User) into single Lucene document, that I can than query with single lucene-text-search. Am I right on that? Are there any other approaches?

tatut05:12:33

We’ve had similar issues with searching for a given parent doc, where the nested children may have some text… we ended up pulling some text search attributes into the parent as :search/… fields that are used for text searches

👍 2
Stas Makarov13:12:24

Regarding pagination and sorting: I’ve already discovered that naive approach with :offset and :limit leads to ineffective xtdb queries, so I guess I need to look into doing that with Lucene too. Something like that? https://stackoverflow.com/questions/963781/how-to-achieve-pagination-in-lucene

tatut06:12:19

for many use cases (like let’s say a customer list in a crm) where there won’t be tens of thousands of rows, it is possible to just query ids and the order column… then take a slice and do pulls for the actual info you need

tatut06:12:34

some cases an infinite scroll works, where you can filter based > than the last result

tatut06:12:12

sadly I haven’t found a one size fits all solution to this