This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-12-08
Hi folks! I'm having trouble with full-text search via Lucene across several related entities. My data has the following shape:
{:user/name "John", :xt/id ...} ;; 10 entities
{:client/first-name "Bird"
 :client/last-name "Whatever",
 :client/company-name "Optional"
 :xt/id ...} ;; ~100k entities
{:invoice/client :client-xt-id
 :invoice/user :user-xt-id
 ...
 ;; some other invoice fields like `reference`, `note` etc
 } ;; ~100k entities
I want to find invoices that either:
• have a client with s-str in “first-name”, “last-name” or “company-name”
• have a user with s-str in “name”
• have s-str in any of their own attributes
I usually need to retrieve a small number of items (10-100) from a possibly pretty big dataset (100k+ items).
I also need to do pagination and sorting on the result.
I’ve come up with this query, which seems to work, but I guess it’s far from optimal.
(xt/q
 (db)
 {:find
  '[(pull ?invoice
          [*
           {:inv/client [:xt/id :client/first-name :client/last-name]}
           {:inv/user [:xt/id :user/name]}])]
  :limit 20
  :where
  [['?invoice :xt/id]
   (list 'or-join ['?invoice]
         ;; invoices whose client matches s-str
         (list 'and
               [(list 'lucene-text-search
                      ;; trailing spaces keep "OR" separate from the next field name
                      (str "client\\/first-name: %1$s* OR "
                           "client\\/last-name: %1$s* OR "
                           "client\\/company-name: %1$s*")
                      s-str)
                [['?client]]]
               '[?invoice :inv/client ?client])
         ;; invoices whose user matches s-str
         (list 'and
               [(list 'lucene-text-search
                      "user\\/name: %1$s*"
                      s-str)
                [['?user]]]
               '[?invoice :inv/user ?user])
         ;; invoices whose own text attributes match s-str
         [(list 'lucene-text-search
                (str "invoice\\/note: %1$s* OR "
                     "invoice\\/reference: %1$s*")
                s-str)
          [['?invoice]]])]})
I suspect the best way would be to implement a custom Indexer that “gathers” multiple related entities (Invoice, Client, User) into a single Lucene document, which I can then query with a single lucene-text-search. Am I right about that? Are there any other approaches?
We’ve had similar issues with searching for a given parent doc, where the nested children may have some text… we ended up pulling some text search attributes into the parent as :search/… fields that are used for text searches
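The idea of copying searchable text into the parent document could be sketched roughly as below. This is only an illustration under assumptions: the invoice, client and user documents are taken to be already loaded as plain maps with string values, and both the function name `with-search-fields` and the `:search/text` attribute are hypothetical, not something XTDB provides.

```clojure
(require '[clojure.string :as str])

(defn with-search-fields
  "Returns `invoice` with a hypothetical :search/text attribute holding the
  searchable text of the invoice itself, its client and its user."
  [invoice client user]
  (let [parts [(:invoice/reference invoice)
               (:invoice/note invoice)
               (:client/first-name client)
               (:client/last-name client)
               (:client/company-name client)
               (:user/name user)]]
    ;; nil and blank values are skipped so the field stays compact
    (assoc invoice :search/text (str/join " " (remove str/blank? parts)))))

(with-search-fields
  {:xt/id :inv-1 :invoice/reference "INV-001"}
  {:client/first-name "Bird" :client/last-name "Whatever"}
  {:user/name "John"})
;; => {:xt/id :inv-1, :invoice/reference "INV-001", :search/text "INV-001 Bird Whatever John"}
```

With a field like this, one text search against the invoice documents would cover all three entities. The trade-off is that the invoice has to be re-submitted whenever the related client or user changes.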
Regarding pagination and sorting: I’ve already discovered that the naive approach with :offset and :limit leads to inefficient XTDB queries, so I guess I need to look into doing that with Lucene too. Something like this? https://stackoverflow.com/questions/963781/how-to-achieve-pagination-in-lucene
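One commonly suggested way to paginate ranked search hits is to request the top `page * page-size` hits and skip everything before the requested page. The sketch below shows only that arithmetic on an already ranked sequence. `page-window` and `take-page` are hypothetical helper names, and how the window would be passed to the Lucene module is left open here.

```clojure
(defn page-window
  "For a 1-based `page` and a `page-size`, returns how many top hits to
  request (:top-n) and how many of those to skip (:skip)."
  [page page-size]
  {:top-n (* page page-size)
   :skip  (* (dec page) page-size)})

(defn take-page
  "Returns the hits for `page` from `ranked-hits`, a sequence that is
  assumed to be already sorted by relevance (or by the desired sort key)."
  [ranked-hits page page-size]
  (let [{:keys [top-n skip]} (page-window page page-size)]
    (->> ranked-hits
         (take top-n)
         (drop skip)
         vec)))

(take-page (range 100) 3 20)
;; => [40 41 42 ... 59], i.e. hits 40 through 59
```

The cost grows with the page number because every earlier hit is still collected, so this suits the "first few pages" case better than deep paging.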