I’m trying to learn more about Datalevin (the new idoc feature looks especially interesting).
I got my test db into a state in which I have a datom:
[18 :doc/md {:something "Officia laboris id laborum."}]
and in a datalog query it seems I can find (fulltext $ "something") but not (fulltext $ "Officia")
(i.e. the former returns correct ref and the latter empty results set)
This is purely educational for me atm, I’m just trying to understand how indexing and search work.
Unfortunately, like I said, I was just messing around in a very unsystematic way, so I can’t really reproduce what got me into that state,
I know there were some (d/re-index ...) calls thrown into the mix 🤷
I was just curious to know if that is expected in certain situations and is there a fix once one gets into this state?
fulltext requires :doc/md to have :db/fulltext true. Is that the case?
yes it is (via (d/schema …))
:doc/md
#:db{:doc "Markdown idoc",
:valueType :db.type/idoc,
:idocFormat :markdown,
:fulltext true,
:aid 13}
there's no such property fulltext, nor aid
the above is copied directly from the output of calling (d/schema conn)
the schema was declared as
:doc/md {:db/doc "Markdown idoc"
:db/valueType :db.type/idoc
:db/idocFormat :markdown
:db/fulltext true}
if that’s what you mean?
what's the transacted data? markdown takes a string
yes it was transacted as a string like this one: "# something\n Officia laboris id laborum"
when I list the datoms I can see, something along the lines of:
[18 :doc/md {:something "Officia laboris id laborum."}]
are you able to idoc-get that?
let me check, I haven’t tried that
yes, that works
(def doc (:doc/md (dl/entity (get-db) 18)))
(doc :something)
;; => "Officia laboris id laborum. [truncated]"
it could be a bug, file a GitHub issue if you can isolate a test case.
Thanks for looking at this! I will try to reproduce but it was messy I can’t quite figure out what got me to this state
What’s interesting is that it seems to be ‘broken’ on the level of terms i.e. if I insert new content with the same words, those are not findable either e.g.
(dl/transact! conn
[{:doc/md "# New thing\nOfficia laboris id laborum. Some new content-123"}])
…still (fulltext $ "Officia") won’t work (as it didn’t for previously inserted md content), but the ‘new’ words e.g. (fulltext $ "content-123") are findable 🤷
I will try to find a way to reproduce it, but just leaving the above in case it’s enough of a clue (no need to reply, if busy).
could be a tokenization problem. insert an "An" in front of "Officia"
also remove capitalization and try again
Added a test, couldn't reproduce the problem.
so transacting a new :doc/md with “An officia” does not help (i.e. still can’t find “officia”) but “Anofficia” helps, as it is a completely new word I suppose
(dl/transact! conn
[{:doc/md "# Yet another new thing\nAn officia laboris id laborum. Some new content-456"}])
;; vs
(dl/transact! conn
[{:doc/md "# Yet another new thing\nAnofficia laboris id laborum. Some new content-456"}])
you said you tried re-index, can you try again?
ran (d/re-index conn schema {}), that closes the db, got the connection again, queried, same result i.e. those queries that previously failed are still failing
Then it is not a search engine term index corruption problem, because re-index is basically dump and load. The search engine is recreated. You can look at the dump to see what the datom entries look like.
you can dump the db, and attach the file if you are willing to file an issue.
sure, of course, it will be a very small file anyway since this was new db, for me to try and understand idoc etc.
thanks!
I’ve created an issue. Not sure how helpful it’s going to be, as there are no clear steps to reproduce; I was just playing around with the db. There is also a repo with schema and dumps linked. I will keep an eye on this and see if this issue repeats for me. Thanks again!
Thx