xtdb 2019-06-02 | Slack Archive

hoppy02:06:32

having a bit of trouble grokking this statement from the docs: "This is possible because Crux is "schemaless" and automatically indexes the top-level fields in all of your documents to support efficient ad-hoc joins and retrievals."

hoppy02:06:01

If a top-level field holds, say, a big map, are the contents of that map indexed? If not, is burying data in a map a viable way of deciding what should/should not be indexed? The statement seems to imply that in some way, but it only works if the indexer doesn't recurse into the top-level values.

hoppy02:06:43

not sure if it's trying to say "everything should be a top level field" or "top level fields are special"

Jorin10:06:57

I asked the same question in here a while ago. Unfortunately Slack history doesn’t stay. As far as we could figure out, the values of the top-level keys are turned into bytes, limited to the first X, and then a hash is used for the index id. If I remember correctly it’s somewhere around this logic in the code: https://github.com/juxt/crux/blob/master/src/crux/codec.clj#L100

refset21:06:13

Thanks @U8ZN5EHGU, and yep that's about right. It's definitely a case of "top level fields are special". The idea being that you can choose to recursively break your documents into document subgraphs as/when it makes sense (either at the point of ingestion or later), for more granular indexing. We don't have any generalised helper code or decorator for this just yet, but that's the plan
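
To make the "break your documents into document subgraphs" idea concrete, here is a minimal sketch in Python rather than Crux's own Clojure. It is not Crux code and not the helper mentioned above (which did not exist yet). The function name `split_document`, the string key "crux.db/id" standing in for the document id, and the path-style child ids are all assumptions made for illustration. Nested maps become separate flat documents, and the parent keeps only a reference to each child's id, so every field ends up as a top-level field of some document.

```python
def split_document(doc, id_key="crux.db/id"):
    """Split a nested document into a list of flat documents.

    Hypothetical helper: each nested map becomes its own document and the
    parent stores the child's id in place of the map. Lists of maps are
    left untouched in this sketch.
    """
    docs = []
    flat = {}
    for key, value in doc.items():
        if isinstance(value, dict):
            child = dict(value)
            # Assumed id scheme: parent id plus the key, unless the child already has an id.
            child_id = child.setdefault(id_key, f"{doc[id_key]}/{key}")
            docs.extend(split_document(child, id_key))
            flat[key] = child_id
        else:
            flat[key] = value
    docs.append(flat)
    return docs


order = {
    "crux.db/id": "order-1",
    "status": "paid",
    "customer": {"name": "Ada", "address": {"city": "London"}},
}
flat_docs = split_document(order)
by_id = {d["crux.db/id"]: d for d in flat_docs}
```

After this step "city" is a top-level field of the document "order-1/customer/address", so an index over top-level fields could reach it, at the cost of an extra join from the order to the customer to the address.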

hoppy14:06:42

yeah, so still not getting it then.

hoppy14:06:08

if my doc is "keys" + "wad of stuff", I'm not sure how "wad of stuff" would not end up expressed (recursively) in some top level field

refset22:06:02

@U19EVCEBV "wad of stuff" doesn't get indexed recursively/automatically. Right now you would have to do that yourself
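
A toy model of what "top-level only" indexing means, sketched in Python. This is not Crux's codec: the JSON encoding, the SHA-1 hash, and the names `value_hash`, `build_index` and `lookup` are assumptions chosen for illustration, and the byte limit mentioned earlier in the thread is left out. Each top-level value is serialised as a whole and hashed, so a nested map can only be matched as one opaque value, and its inner keys are invisible to the index.

```python
import hashlib
import json
from collections import defaultdict


def value_hash(value):
    # Assumed encoding: canonical JSON bytes, then SHA-1. Crux uses its own codec.
    encoded = json.dumps(value, sort_keys=True).encode("utf-8")
    return hashlib.sha1(encoded).hexdigest()


def build_index(docs, id_key="id"):
    """Index every top-level (attribute, hashed value) pair to the ids of matching docs."""
    index = defaultdict(set)
    for doc in docs:
        for attr, value in doc.items():
            index[(attr, value_hash(value))].add(doc[id_key])
    return index


def lookup(index, attr, value):
    return index.get((attr, value_hash(value)), set())


docs = [{"id": "doc-1", "name": "hoppy", "wad": {"colour": "red", "size": 3}}]
index = build_index(docs)

by_name = lookup(index, "name", "hoppy")                          # top-level scalar: found
by_whole_wad = lookup(index, "wad", {"size": 3, "colour": "red"})  # whole map as one value: found
by_inner_key = lookup(index, "colour", "red")                      # nested key: not indexed
```

In this model the "wad of stuff" still costs one index entry for the whole value, but nothing inside it is queryable unless it is split out into its own document.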

hoppy17:06:53

my question revolves more around how to avoid it by hiding it from the indexer, and not have it be treated like a ByteArray that was serialized, or somesuch.

jjttjj21:06:21

Hi all, first day playing around with Crux. Is it possible to query for entities that fall within a range of valid times?

refset21:06:49

Hi 🙂 good question. You can do this with a regular Datalog query if you also include the valid time as a user-defined timestamp inside your document (i.e. user-defined time or domain time). Unfortunately the existing bitemporal indexes can't help with temporal range queries over multiple entities like this, although see history-range for doing so at the level of a single entity.
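
A rough sketch of the workaround described above, in Python rather than Datalog. The attribute name "event/valid-time", the function `entities_in_range` and the half-open range are assumptions for illustration. In Crux the same filter would be written as a regular Datalog query over that attribute, and the exact query is not shown in this thread. The point is only that the timestamp is ordinary document data, so a range over many entities becomes an ordinary attribute filter.

```python
from datetime import datetime


def entities_in_range(docs, start, end, time_key="event/valid-time", id_key="id"):
    """Return ids of docs whose user-defined timestamp falls in [start, end)."""
    return sorted(
        doc[id_key]
        for doc in docs
        if time_key in doc and start <= doc[time_key] < end
    )


docs = [
    {"id": "trade-1", "event/valid-time": datetime(2019, 5, 30)},
    {"id": "trade-2", "event/valid-time": datetime(2019, 6, 1)},
    {"id": "trade-3", "event/valid-time": datetime(2019, 6, 2)},
    {"id": "note-1"},  # no user-defined timestamp, so never matched
]

june_first_only = entities_in_range(docs, datetime(2019, 6, 1), datetime(2019, 6, 2))
```

The duplicated timestamp has to be written by the application at ingestion time, alongside whatever valid time is passed to the transaction.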

jjttjj21:06:53

Cool, thank you!

