#datahike
2021-10-20
Jack Park 21:10:21

I got the postgres example test running. I learned that it stores an entire Datom as a binary blob in a k-v store, which leads to this question: is it safe to assume it indexes e, a, and v outside that, so you don't need to break them out into columns in the table? There are two indexes, konserve_data_id_key and konserve_data_id -- I'm trying to figure out how it works, and the source code doesn't make that easy.

metasoarous 21:10:19

IIUC, the values (in the kv store) aren't single datoms, but rather blocks of some number of datoms, ordered according to the particular index. It's somewhat more complicated than this, because datahike is based on the hitchhiker tree data structure (hence the name). If you're interested in digging into the internals, I recommend learning more about the hitchhiker tree. You can start here on github: https://github.com/datacrypt-project/hitchhiker-tree
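
The index-ordered view described above is what the datahike API exposes through its datoms function, so there is no need to decode the konserve blobs by hand. A minimal sketch, assuming an existing connection conn; the exact datoms arity has varied between datahike versions, so the positional form below is illustrative only:

```clojure
(require '[datahike.api :as d])

;; EAVT index: datoms sorted by entity, then attribute, then value.
(take 5 (d/datoms @conn :eavt))

;; AEVT index: the same datoms, grouped by attribute first.
(take 5 (d/datoms @conn :aevt))

;; Newer datahike versions may prefer the arg-map form, e.g.:
;; (d/datoms @conn {:index :eavt :components []})
```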

metasoarous 21:10:59

Actually, it's probably best to start with the inventor's Strange Loop talk: https://www.youtube.com/watch?v=jdn617M3-P4

metasoarous 21:10:52

In short, this is by design: you shouldn't expect to be able to query the underlying database directly and have it make any sense. Stick to the datahike API.
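
To make that concrete, here is a minimal sketch of going through the datahike API end to end against a postgres-backed store. The :store keys below are assumptions standing in for whatever the datahike-postgres / datahike-jdbc docs specify for your version; everything else uses documented datahike.api functions.

```clojure
(require '[datahike.api :as d])

;; Assumed/placeholder store config -- check the postgres backend docs
;; for the exact keys and :backend value in your datahike version.
(def cfg {:store {:backend  :pg
                  :host     "localhost"
                  :port     5432
                  :username "datahike"
                  :password "secret"
                  :path     "/datahike"}})

(d/create-database cfg)
(def conn (d/connect cfg))

;; Schema first (datahike defaults to schema-on-write).
(d/transact conn [{:db/ident       :note/title
                   :db/valueType   :db.type/string
                   :db/cardinality :db.cardinality/one}])

(d/transact conn [{:note/title "Hitchhiker trees"}])

;; Query through the API, never against the konserve_* tables directly.
(d/q '[:find ?e ?title
       :where [?e :note/title ?title]]
     @conn)
```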

Jack Park 22:10:25

Indeed! But... what about the kinds of deep text search that, e.g., :lucene adds?

metasoarous 22:10:23

Great question! I'm curious what @U1C36HC6N, @UB95JRKM3 et al. have to say about this, but my guess is that if you want to do something like that, you'll need to copy the text data over to another database. Presumably you'd only want this for certain text attributes, so you could use a listener (https://cljdoc.org/d/io.replikativ/datahike/0.3.6/api/datahike.api#listen) to catch additions or retractions related to whatever set of attributes you decide on, and update your secondary store accordingly.

👍 2
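
A rough sketch of that listener idea, assuming the datascript-style listen API that the linked datahike.api/listen docs describe; index-text! and remove-text! are hypothetical stand-ins for whatever full-text store (Lucene or otherwise) you sync into.

```clojure
(require '[datahike.api :as d])

;; Attributes whose values should be mirrored into the secondary text index.
(def fulltext-attrs #{:note/title :note/body})

;; Hypothetical stand-ins -- replace with calls into your actual search store.
(defn index-text!  [e v] (println "index"   e v))
(defn remove-text! [e v] (println "unindex" e v))

;; On every transaction, forward added/retracted datoms for the chosen
;; attributes to the secondary store.
(d/listen conn :fulltext-sync
  (fn [{:keys [tx-data]}]
    (doseq [datom tx-data
            :when (contains? fulltext-attrs (:a datom))]
      (if (:added datom)
        (index-text! (:e datom) (:v datom))      ;; addition
        (remove-text! (:e datom) (:v datom)))))) ;; retraction
```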
Linus Ericsson 12:10:46

I'm only guessing, but Lucene also produces binary index data structures, and those could probably be serialized and stored inside datahike's own data structures.

Jack Park 16:10:04

@UQY3M3F6D that's a surprising idea. I always imagined that Lucene indexing would occur outside, as a kind of "consultant" - only really needed for things like deep text queries. There's a plugin for postgres that indexes any text field you choose with Lucene and streamlines queries to use it.

Jack Park 16:10:38

In truth, my questions in this thread are my way of learning Datomic and Clojure together. For me, it's not quite enough to just use Datahike as documented; I want to fully understand its code and how it works. For the time being, the way the indexes are built in postgres is completely opaque to me.

richiardiandrea 21:10:37

FWIW, xtdb added Lucene support, and it also stores data as binary, so it might be doable. I found some docs here: https://github.com/xtdb/xtdb/blob/fde3663243a1b92ed24ae75b40c6d891c423ef39/docs/extensions/modules/ROOT/pages/full-text-search.adoc#L6

👍 1
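
For reference, the linked xtdb docs describe roughly the following usage. This is a hedged sketch (module name, config key, and query shape as I understand them for xtdb 1.x, with the xtdb-lucene module on the classpath), not a verified example.

```clojure
(require '[xtdb.api :as xt])

;; Start a node with the Lucene module enabled; it keeps its index in :db-dir.
(def node (xt/start-node {:xtdb.lucene/lucene-store {:db-dir "lucene-dir"}}))

(xt/submit-tx node [[::xt/put {:xt/id :ivan :person/name "Ivan Ivanov"}]])
(xt/sync node)

;; Full-text query via the text-search predicate the module provides.
(xt/q (xt/db node)
      '{:find  [?e ?v]
        :where [[(text-search :person/name "Ivan*") [[?e ?v]]]]})
```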
kkuehne 19:10:02

Interesting, thanks. I should have a look at how xtdb integrates that.