datalevin

prnc 2026-02-09T17:19:59.254539Z

I’m trying to learn more about datalevin (the new idoc is especially interesting). I got my test db into a state in which I have a datom: [18 :doc/md {:something "Officia laboris id laborum."}] and in a datalog query it seems I can find (fulltext $ "something") but not (fulltext $ "Officia") (i.e. the former returns the correct ref and the latter an empty result set). This is purely educational for me atm, I’m just trying to understand how indexing and search work. Unfortunately, like I said, I was just messing around in a very unsystematic way, so I can’t really reproduce what got me into that state; I know there were some (d/re-index ...) calls thrown into the mix 🤷 I was just curious to know whether that is expected in certain situations, and is there a fix once one gets into this state?

Huahai 2026-02-09T17:25:19.132079Z

fulltext requires :doc/md to have :db/fulltext true. Is that the case?

prnc 2026-02-09T17:26:37.674829Z

yes it is (via (d/schema …))

:doc/md
 #:db{:doc "Markdown idoc",
      :valueType :db.type/idoc,
      :idocFormat :markdown,
      :fulltext true,
      :aid 13}

Huahai 2026-02-09T17:27:37.903719Z

there's no such property fulltext, nor aid

prnc 2026-02-09T17:28:38.945829Z

the above is copied directly from the output of calling (d/schema conn)

prnc 2026-02-09T17:29:40.869499Z

the schema was declared as

:doc/md {:db/doc        "Markdown idoc"
         :db/valueType  :db.type/idoc
         :db/idocFormat :markdown
         :db/fulltext   true}
if that’s what you mean?

Huahai 2026-02-09T17:31:03.052969Z

what's the transacted data? markdown takes a string

prnc 2026-02-09T17:33:25.699009Z

yes it was transacted as a string like this one: "# something\n Officia laboris id laborum"

prnc 2026-02-09T17:33:53.121069Z

when I list the datoms I can see, something along the lines of: [18 :doc/md {:something "Officia laboris id laborum."}]

Huahai 2026-02-09T17:33:56.951719Z

are you able to idoc-get that?

prnc 2026-02-09T17:35:42.692299Z

let me check, I haven’t tried that

prnc 2026-02-09T17:38:39.722939Z

yes, that works

(def doc (:doc/md (dl/entity (get-db) 18)))
(doc :something)
;; => "Officia laboris id laborum. [truncated]"

Huahai 2026-02-09T17:39:21.266229Z

it could be a bug, file a GitHub issue if you can isolate a test case.

prnc 2026-02-09T17:40:32.774839Z

Thanks for looking at this! I will try to reproduce, but it was messy and I can’t quite figure out what got me to this state.

prnc 2026-02-09T17:59:02.918979Z

What’s interesting is that it seems to be ‘broken’ on the level of terms i.e. if I insert new content with the same words, those are not findable either e.g.

(dl/transact! conn
    [{:doc/md "# New thing\nOfficia laboris id laborum. Some new content-123"}]) 
…still (fulltext $ "Officia") won’t work (as it didn’t for previously inserted md content), but the ‘new’ words e.g. (fulltext $ "content-123") are findable 🤷 I will try to find a way to reproduce it, but just leaving the above in case it’s enough of a clue (no need to reply, if busy).

Huahai 2026-02-09T18:03:46.616339Z

could be a tokenization problem; insert an "An" in front of "Officia"

Huahai 2026-02-09T18:04:49.801569Z

also remove capitalization and try again

Huahai 2026-02-09T18:08:16.710839Z

Added a test, couldn't reproduce the problem.

prnc 2026-02-09T18:13:05.595069Z

so transacting a new :doc/md with “An officia” does not help (i.e. still can’t find “officia”) but “Anofficia” helps, as it is a completely new word I suppose

prnc 2026-02-09T18:14:04.652669Z

(dl/transact! conn
    [{:doc/md "# Yet another new thing\nAn officia laboris id laborum. Some new content-456"}])

  ;; vs

  (dl/transact! conn
    [{:doc/md "# Yet another new thing\nAnofficia laboris id laborum. Some new content-456"}])

Huahai 2026-02-09T18:15:32.364089Z

you said you tried re-index, can you try again?

prnc 2026-02-09T18:19:31.660649Z

ran (d/re-index conn schema {}), that closes the db, got the connection again, queried, same result i.e. those queries that previously failed are still failing

Huahai 2026-02-09T18:23:56.232089Z

Then it is not a search engine term index corruption problem, because re-index is basically dump and load. The search engine is recreated. You can look at the dump to see what the datom entries look like.

Huahai 2026-02-09T18:25:05.029569Z

you can dump the db, and attach the file if you are willing to file an issue.

prnc 2026-02-09T18:26:57.282479Z

sure, of course, it will be a very small file anyway since this was a new db, for me to try and understand idoc etc.

prnc 2026-02-09T18:27:32.161629Z

thanks!

prnc 2026-02-11T18:14:24.906959Z

I’ve created an issue. Not sure how helpful it’s going to be, as there are no clear steps to reproduce; I was just playing around with the db. There is also a repo linked with the schema and dumps. I will keep an eye on this and see if the issue repeats for me. Thanks again!

Huahai 2026-02-11T18:17:26.513289Z

Thx