How do you do phrase searches in Datalog? I read the search doc (https://github.com/juji-io/datalevin/blob/master/doc/search.md) but that only mentions how to do it in the search syntax DSL. I have my own working implementation, but I would like to use the built-in one, especially if I can combine it with boolean expressions (which also only seem to be for the special search DSL?). And sorry for only checking out the new capabilities now, @huahaiy !
That’s not a phrase if it has stop words in it. “a” is a stop word. What you see is entirely expected.
A phrase should not contain stop words.
Hmm ok. I guess I could go without stop words then?
And why do "had a" and "a little" both work then? It doesn't really seem consistent.
Seems more like a bug to me 🤷
"had a" is "had", "a little" is "little"; both are single words.
A single word will match.
Ok, so it automatically reduces into a single word if the edge is a stop word?
stop words are removed
Aha, now I get it.
Thanks for explaining. I'll have to build some kind of workaround.
you can supply your own analyzer
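To make the stop-word behaviour above concrete, here is a toy model in Python. It is not Datalevin's code or API: the stop-word list, `analyze`, and `phrase_match` are all hypothetical, and it assumes that document positions keep the gaps left by removed stop words while the remaining query terms are treated as adjacent. Under those assumptions it reproduces what was reported: "had a" and "a little" collapse to single words and match, "a little lamb" matches, and "had a little" does not.

```python
import re

# Illustrative stop-word subset only; not Datalevin's actual list.
STOP_WORDS = frozenset({"a", "an", "the", "of", "as", "was"})


def analyze(text, stop_words=STOP_WORDS):
    """Lower-case and tokenize, drop stop words, keep original word positions."""
    words = re.findall(r"\w+", text.lower())
    return [(w, i) for i, w in enumerate(words) if w not in stop_words]


def phrase_match(doc, phrase, stop_words=STOP_WORDS):
    """True if the phrase's remaining terms occur at consecutive positions in doc.

    Assumption: query terms are renumbered as adjacent after stop-word
    removal, while document positions still count the removed words.
    """
    positions = {}
    for word, pos in analyze(doc, stop_words):
        positions.setdefault(word, set()).add(pos)
    query = [word for word, _ in analyze(phrase, stop_words)]
    if not query:
        return False  # nothing left to search for
    return any(
        all(start + k in positions.get(word, ()) for k, word in enumerate(query))
        for start in positions.get(query[0], ())
    )


DOC = "Mary had a little lamb whose fleece was red as fire."

print(phrase_match(DOC, "had a"))          # reduces to the single word "had"
print(phrase_match(DOC, "a little lamb"))  # "little" and "lamb" are adjacent
print(phrase_match(DOC, "had a little"))   # "had" and "little" are two apart
# A custom analyzer without stop words keeps "a" in both index and query.
print(phrase_match(DOC, "had a little", stop_words=frozenset()))
```

In this model, passing an empty stop-word set stands in for supplying your own analyzer: once "a" is indexed like any other word, "had a little" is an ordinary three-word phrase.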
The search DSL is just a boolean search expression, where phrase search is just one of its features. The fulltext function takes such an expression as the query. A string query is just a special case.
What's the question?
Oh, you mean the Datalog search example in the doc is not the same as the standalone search one. It is the same. Just replace "red fox" with a search expression. It works.
Let me update the doc.
Please see the updated doc.
It also shows the domain semantics.
Phrase search requires :index-position? to be true.
Thanks, @huahaiy, I'll try that out 🙏
The second example query in the search doc returns the empty set on my machine, not #{[2 :text "Mary had a little lamb whose fleece was red as fire."]}. Hmmm? 🤔 I am on Datalevin 0.9.22.
It's in the test: https://github.com/juji-io/datalevin/blob/7deb6ba431cdb716332fc02e0e131e3e1e0d5668/test/datalevin/test/query_fns.clj#L456
Notice you need this: https://github.com/juji-io/datalevin/blob/7deb6ba431cdb716332fc02e0e131e3e1e0d5668/test/datalevin/test/query_fns.clj#L435
Ah, I think it might have something to do with the fact that an old temp db wasn’t removed prior to running the code. I see that in your test code it is a randomly generated dir, that is the only real difference I can spot.
I am running into other issues though. The phrase "had a little" doesn’t return any results, for example, while "a little lamb" does. I was thinking maybe this was stopword-related, but "a little" and "had a" both return results.