#xtdb
2023-09-23
Vincent 19:09:33

Hi, I'm interested in doing "near match" queries for band names. My use case: musicians and music listeners add tracks to my database, and I want to search them by band name and return a list of all the matches. But band names are notoriously hard to spell: sometimes they use numbers, sometimes the word "The" is in there, sometimes it's missing. So I'm wondering if any of you seasoned xtdb vets could give me some pointers on doing a fuzzy match, preferably one that I can de-fuzz over time. I'm thinking of adding "project-ids" for each band/project, but in the beginning it's a more laissez-faire upload/submit style, and I want to encourage more uploads with less rigid rules. Meaning, fuzziness encouraged in the beginning 😅 Okay, I'm noticing I am reinventing Elasticsearch.
1. How would I do this [without ES -- I'm an engineer and want to know :D]
2. Can I do it in less than 2 weeks? 😃
3. How much complexity / overhead / dependency leaning is involved in adding ES (if you know)

Vincent 19:09:10

to be clear, there will be a limited number of possible results and they will likely be super close if not always identical to the actual spelling, so i sense that ES is overkill
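
For a small candidate set like the one described here, a rough in-memory approach may already be enough. The following is a minimal sketch, not anything xtdb provides: it assumes the band names have already been pulled out of the database into a list, and `normalize` and `near_matches` are hypothetical helper names. It strips a leading "The", punctuation and case, then ranks candidates with Python's standard `difflib` similarity ratio.

```python
import difflib
import re


def normalize(name: str) -> str:
    """Lowercase, drop a leading 'the', keep only letters and digits."""
    s = name.lower().strip()
    s = re.sub(r"^the\s+", "", s)
    return re.sub(r"[^a-z0-9]", "", s)


def near_matches(query, band_names, cutoff=0.8, limit=5):
    """Return stored spellings whose normalized form is close to the query."""
    # Group the original spellings under their normalized key.
    by_key = {}
    for name in band_names:
        by_key.setdefault(normalize(name), []).append(name)
    # difflib ranks keys by similarity ratio and drops those below the cutoff.
    keys = difflib.get_close_matches(
        normalize(query), list(by_key), n=limit, cutoff=cutoff
    )
    return [name for key in keys for name in by_key[key]]


bands = ["The Beatles", "Beatles", "Radiohead", "Blink-182", "Led Zeppelin"]
print(near_matches("beetles", bands))
print(near_matches("Blink 182", bands))
```

Grouping by normalized key is also one possible route to "de-fuzzing" later: each key could be mapped to a project id once the spellings have been reviewed. The `cutoff` of 0.8 is an arbitrary starting point, not a tuned value.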

Vincent 19:09:43

can i squish the name into a number and then do a range query over numbers

Vincent 19:09:25

in theory that letter-by-letter tree representation of a string could be numberified, but then maybe range queries would not make sense unless the numbers were very large polynomials lol
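
To make the "squish the name into a number" idea concrete, here is a small sketch under stated assumptions: names are reduced to their first 8 letters a-z (`WIDTH` and the base-27 scheme are illustrative choices, and `encode` and `prefix_range` are hypothetical names). The encoding preserves alphabetical order, so a numeric range query behaves like a prefix search. It does not capture edit distance: a typo near the start of the name lands far outside the range.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
WIDTH = 8  # hypothetical fixed key width, in letters


def letters_of(name: str):
    """First WIDTH letters a-z of the name, lowercased; other characters are dropped."""
    return [c for c in name.lower() if c in ALPHABET][:WIDTH]


def encode(name: str) -> int:
    """Base-27 integer: digit 0 is padding, a=1 ... z=26, most significant first."""
    letters = letters_of(name)
    value = 0
    for i in range(WIDTH):
        digit = ALPHABET.index(letters[i]) + 1 if i < len(letters) else 0
        value = value * 27 + digit
    return value


def prefix_range(prefix: str):
    """Inclusive numeric range covering every name that starts with the prefix."""
    low = encode(prefix)
    high = low + 27 ** (WIDTH - len(letters_of(prefix))) - 1
    return low, high


low, high = prefix_range("radio")
print(low <= encode("radiohead") <= high)   # same prefix, inside the range
print(low <= encode("tadiohead") <= high)   # first-letter typo, outside the range
```

So a range over such numbers is roughly a sorted-string prefix scan in disguise, which is why a dedicated fuzzy index is usually the better fit for misspellings.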

Vincent 19:09:43

I guess my ask from this whole inquiry is: would be awesome to have an "enable fuzzy search on this param" flag in xtdb that built a fuzzy index for it. if you want me to roll my own or use elasticsearch instead, i can also hold my peace and walk away 😂

ianjones 21:09:15

forgive my ignorance but is this not something Apache Lucene could do for you? https://docs.xtdb.com/extensions/full-text-search/

😮 1
🙌 1
✅ 1
Vincent 21:09:58

jackpot. thanks, i believe that is the right tool for the job. i guess it was liminally or non existently on my radar 😅 ty

ianjones 21:09:38

I’ve wanted something similar, haven't gotten around to implementing it yet but it looks pretty straightforward!

Felipe 22:09:26

and by the way Elasticsearch is Lucene in a way 🙂