This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-10-31
Is there a way to write a custom LuceneIndexer such that, for the doc {:xt/id 1 :text [{:text/html "foo"}]}, association of 1 :text "foo" is made in Lucene?
I've taken a crack at it, but text searches through xt/q return nothing. Using xtdb.lucene/search I see the documents have been indexed. My guess is that I'm missing the fact that the nested map (with :text/html) is hashed before being indexed by XT?
Hey @U02E9K53C9L I suspect the issue is that the field-xt-val is expected to be identical to the actual v in the doc during this resolving function (which is essentially a temporal filter) https://github.com/xtdb/xtdb/blob/2fc600840c11aae31ee0246b2d400928d98e3f4e/modules/lucene/src/xtdb/lucene.clj#L190
I would recommend just working with xtdb.lucene/search directly if you can (and not attempting to use/customise/fork the text-search Datalog function)
Thanks, Jeremy. If I could hash the actual v (the map) before passing it to field-xt-val, would that work?
As I understand it: {:text/html "foo"} is hashed (based on its Nippy serialisation) and stored as v in the ave index (in RocksDB) but with your example doc you are storing "foo" as the field-xt-val (in the Lucene index) - so upon querying via text-search the resolve-search-results-a-v function (linked above) will therefore not be able to find any entries for "foo" as the v value in ave and the result will be filtered out.
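The mismatch described above can be illustrated with plain Clojure data. This is only a sketch: it makes no XTDB or Lucene calls, and the var names (`doc`, `xt-v`, `lucene-field-val`) are made up for illustration. It assumes, as the thread describes, that the nested map is what XT treats as v while the custom indexer stores the inner string.

```clojure
;; Illustration only: plain Clojure data, no XTDB or Lucene calls.
;; All var names here are hypothetical.
(def doc {:xt/id 1 :text [{:text/html "foo"}]})

;; The value XT sees for :text is the nested map (per the thread, it is
;; hashed from its Nippy serialisation before landing in the ave index).
(def xt-v (first (:text doc)))

;; The value a custom indexer like the one described would hand to Lucene
;; as field-xt-val: the inner string.
(def lucene-field-val (:text/html xt-v))

;; The resolve step needs these two to line up, and they do not.
(= xt-v lucene-field-val) ;; => false
```

Since the two values differ, a lookup keyed on "foo" has nothing to match against an entry keyed on the map, which is consistent with the empty text-search results.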
In principle you could override the definition stored under field-xt-val if you never needed it for wildcard searching, however I don't think it would be straightforward to calculate and store the hash needed for comparison, since Lucene isn't storing binary here, it's only working with text.
For your use case it's likely more useful to view the xtdb.lucene namespace as illustrative, e.g. if you don't need field-xt-val then just don't store it (but that also means you have to handle your own temporal 'resolve' step)
Thank you for the thorough explanation! I can kind of see how it all fits together. But, I think I'll leave the exercise for the future and just drop the nested map for now.
Can I use the same strategy in a custom resolve-search-results-a-v-wildcard or does db/ave expect a nippy'd v? (I'm assuming using the index is faster than xt/q above.)
:thinking_face: I'd also like to replace the text-search q predicate, since I'm using it as part of larger queries.
I believe db/ave can accept a raw value, because it uses https://github.com/xtdb/xtdb/blob/f65c4a398584ec03830509baa2085f3a880a1694/core/src/xtdb/kv/index_store.clj#L97-L106 within https://github.com/xtdb/xtdb/blob/f65c4a398584ec03830509baa2085f3a880a1694/core/src/xtdb/kv/index_store.clj#L757

> I'm assuming using the index is faster than xt/q above
quite possibly, but I wouldn't be surprised if the difference is marginal (might be worth a quick microbench)
I'm curious to know more about implementing custom backends using XTDB. I looked through the source code, already, but I'm not really sure what I'm looking for. Could one glue XTDB to plain text backends, or git repositories? Assuming you had the appropriate transaction history?
Hey @UF7A9T2P4 in principle, yep, you can do such things by creating your own independent modules and wiring them into your config. The pluggable storage protocols in the 1.x are relatively small and well defined: https://github.com/xtdb/xtdb/blob/9a379bcb188ab37451344c5c935017a9d163addb/core/src/xtdb/db.clj#L72-L86 and https://github.com/xtdb/xtdb/blob/9a379bcb188ab37451344c5c935017a9d163addb/core/src/xtdb/db.clj#L100-L112 You can see implementations of those protocols in various places around the repo, e.g. for KV storage https://github.com/xtdb/xtdb/tree/master/core/src/xtdb/kv
Just messing around, might be a terrible idea, might be cool. Seems like a good learning experience regardless.
> a good learning experience regardless
for sure! I learned a lot building https://github.com/xtdb-labs/crux-redis/blob/master/src/crux/redis.clj KV module for Redis and getting the generative tests working etc.
@U050CTFRT has at least thought about implementing a git backend, though possibly at the site level rather than XT? Either way he might have some insight