#datalevin
2023-11-21
mhuebert 21:11:43

"Key cannot be larger than 511 bytes." - this seems to limit the size of any attribute that is fulltext indexed?

Huahai 21:11:23

No. If you are using Datalog, this is an implementation detail that doesn’t concern you. Data has no practical size limit.

Huahai 21:11:14

This 511-byte limit applies to the key-value store. Since you mentioned “attribute”, I assume you are not talking about the KV store feature, in which case there is no practical limit. More precisely, the limit is 2GB, the maximum byte buffer size on the JVM. So it is perfectly OK to store huge documents in Datalevin, unlike Datomic.

mhuebert 02:11:00

I'm getting the error while transacting to Datalog, where an attribute's value is larger than 511 bytes. The error stack shows that it's within the search code, so I was thinking the value was being used as an LMDB key somewhere in the search indexing process. Can post the full error tomorrow.

mhuebert 10:11:57

Execution error (ExceptionInfo) at datalevin.binding.java.LMDB/transact_kv (java.clj:498).
Fail to transact to LMDB: #error {
 :cause "Key cannot be larger than 511 bytes."
 :data {:input [221156 235 "XXX 527 chars long string XXX"]}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "Key cannot be larger than 511 bytes."
   :data {:input [221156 235 "XXX 527 chars long string XXX"]}
   :at [datalevin.binding.java.DBI put_key "java.clj" 183]}]
 :trace
 [[datalevin.binding.java.DBI put_key "java.clj" 183]
  [datalevin.binding.java$transact_STAR_ invokeStatic "java.clj" 256]
  [datalevin.binding.java$transact_STAR_ invoke "java.clj" 242]
  [datalevin.binding.java.LMDB transact_kv "java.clj" 487]
  [datalevin.search$add_doc_STAR_ invokeStatic "search.clj" 688]
  [datalevin.search$add_doc_STAR_ invoke "search.clj" 644]
  [datalevin.search.SearchEngine add_doc "search.clj" 349]
  [datalevin.storage$fulltext_index invokeStatic "storage.cljc" 531]
  [datalevin.storage$fulltext_index invoke "storage.cljc" 526]
  [datalevin.storage.Store load_datoms "storage.cljc" 393]

mhuebert 10:11:37

Using this analyzer:

(sut/create-analyzer
  {:tokenizer     (sut/create-regexp-tokenizer
                    #"[\s:/\.;,!=?\"'()\[\]{}|<>&@#^*\\~`]+")
   :token-filters [(sut/create-ngram-token-filter 2)]})