xtdb 2020-10-18 | Slack Archive

witek10:10:42

Does the document-oriented model of Crux, in contrast to the fact-oriented model of Datomic, affect how I should model my entities/documents? Examples: 1. Is it OK to store frequently changed attributes like :tweet/view-count together with more static and "large" attributes like :tweet/text? Or should I have two documents? 2. Is it OK to model arity-many references as attributes of a document, even if the attribute is expected to have thousands of values? Or should I use arity-one reverse references instead?

refset13:10:41

As it stands today, none of Crux's doc-store module implementations attempt to do any clever structural sharing across revisions, so there will invariably be a storage (and network) overhead when frequently submitting large and mostly unchanged documents. Whether this inefficiency is cost-prohibitive for a given use case would require some analysis. We have no immediate plans to add structural sharing to the existing doc-store implementations, but it's something we could turn our attention to fairly quickly if required. Note that a doc-store only ever stores one physical copy of any given revision, which is achieved by using the content hash as the key.

Splitting fast-changing attributes into a separate document (and therefore entity) is a useful pattern for performance optimisation and would have benefits regardless of whether the doc-store implements structural sharing. However, the pattern shouldn't be applied too liberally, as it introduces new complications when transacting and querying. In the extreme case of attempting to use a distinct document (& entity) per attribute you will almost certainly find performance is a lot worse 🙂 The Rocks/LMDB indexes do benefit from structural sharing though, so at least the storage overheads only accumulate in the one central place (the doc-store).

How all this should impact your modelling is debatable, but certainly there are some advantages in being able to use vectors or sets as forward-reference containers. The disadvantages are mostly performance related when dealing with large documents, as you point out, but also positional information from vectors is effectively invisible to the indexes & Datalog, so the utility is limited (`eql/project` and `entity` will respect the original vector ordering, however). In general, though, I would usually lean towards modelling with reverse references by default (arity-one or otherwise).
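
A minimal sketch of the two patterns discussed above, written as plain Clojure data so it runs without a Crux node. Only `:tweet/text` and `:tweet/view-count` come from the question; the ids, the `:tweet-stats/tweet`, `:like/tweet` and `:like/user` attributes, and the helper names are hypothetical and chosen for illustration only.

```clojure
;; Static, "large" attributes stay on the tweet document itself.
(def tweet
  {:crux.db/id :tweet/t1
   :tweet/text "A long and rarely edited body of text ..."})

;; The fast-changing counter is split into its own small document (and entity).
;; :tweet-stats/tweet is a hypothetical arity-one reference back to the tweet.
(def tweet-stats
  {:crux.db/id        :tweet-stats/t1
   :tweet-stats/tweet :tweet/t1
   :tweet/view-count  42})

;; Reverse references: each like points at its tweet, instead of the tweet
;; document carrying a set with thousands of user ids.
(def likes
  [{:crux.db/id :like/l1 :like/tweet :tweet/t1 :like/user :user/alice}
   {:crux.db/id :like/l2 :like/tweet :tweet/t1 :like/user :user/bob}])

(defn put-ops
  "Wraps each document in a Crux put operation."
  [docs]
  (mapv (fn [doc] [:crux.tx/put doc]) docs))

(defn bump-view-count-ops
  "Builds the put for a new revision of the small stats document only."
  [stats]
  (put-ops [(update stats :tweet/view-count inc)]))

;; Datalog query (as data) joining the tweet, its stats and its likes.
(def tweet-with-likers-query
  '{:find  [?text ?views ?user]
    :where [[?tweet :tweet/text ?text]
            [?stats :tweet-stats/tweet ?tweet]
            [?stats :tweet/view-count ?views]
            [?like :like/tweet ?tweet]
            [?like :like/user ?user]]})
```

With a running node, the put operations would be passed to `crux.api/submit-tx` and the query to `crux.api/q`. The cost of the split shows up in the query, which needs an extra join clause per additional document.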

euccastro11:10:40

is the use of custom EDN data literals supported? I'm using crux/tick and trying to store/query java.time.LocalDate instances, using the time-literals support provided by tick. all seems to work fine until I query the values, where I get a list like `(. java.time.LocalDate parse "2020-10-18")` instead of an instance of java.time.LocalDate (i.e., what that list would evaluate to)

euccastro11:10:47

I'm almost certain I'm hitting https://github.com/henryw374/time-literals/issues/3, which is an issue with clojure.core/read-string. I guess nippy uses that for deserializing, somehow?

euccastro11:10:26

anyway, I know how to work around that on my end, but is this kind of thing [P.S.: i.e., custom tagged literals] supported at all? otherwise I'll do some more custom EDN serialization instead

refset16:10:39

Hey again, sorry to chime in so late here. We haven't committed to supporting custom edn tags for the indefinite future but will have a firm position either way in the next ~month or two. We are aiming to limit the scope of the value types we have to support following a General Availability release (e.g. currently Nippy will happily freeze arbitrary Java objects, but it's not really something we want to tie Crux to).

refset16:10:17

There's an issue here which you'd be very welcome to comment on if you wish: https://github.com/juxt/crux/issues/312

dominicm22:10:06

@euccastro I think this is not related to crux. How are you creating your document?

euccastro00:10:04

thanks for replying! I know the bug is not in crux. the README of https://github.com/henryw374/time-literals acknowledges the limitation that read-string needs to be further passed to eval, and apparently that somehow affects nippy too. my question is whether using custom tagged literals (and hence, custom data types) as attribute values is supported in crux. user-defined tagged literals aren't mentioned anywhere in the docs AFAICT (in particular, not in crux's own EDN primer, but that could be because it's just an overview)
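
The read-without-eval behaviour described above can be sketched in plain Clojure. This assumes the installed tag reader returns a code form rather than an object, which is what the symptom in this thread suggests; `date-tag->form` is a hypothetical stand-in and not the actual time-literals reader.

```clojure
(import 'java.time.LocalDate)

;; Hypothetical stand-in for a tag reader that returns a code form
;; instead of constructing the LocalDate itself.
(defn date-tag->form [s]
  (list '. 'java.time.LocalDate 'parse s))

;; read-string only reads; it never evaluates what the tag reader returns,
;; so the caller receives the form as a plain list.
(def read-result
  (binding [*data-readers* {'time/date date-tag->form}]
    (read-string "#time/date \"2020-10-18\"")))

(println (pr-str read-result))

;; Evaluating the form is what finally produces the object.
(def evaluated (eval read-result))

(println (class evaluated))
```

Anything that deserializes through the reader without a subsequent `eval` would hand back the list, which matches what the queries in this thread returned.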

euccastro00:10:50

this would inform me to either do a specific workaround for this bug until it's fixed upstream, or instead make a serdes layer around the DB, so java.time.LocalDates etc. are seen as strings by crux
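
A minimal sketch of such a serdes layer, assuming dates are stored as ISO-8601 strings and that the application knows which attributes hold dates. The names `date-attrs`, `dates->strings` and `strings->dates`, and the `:person/birth-date` attribute, are hypothetical.

```clojure
(require '[clojure.walk :as walk])
(import 'java.time.LocalDate)

;; Hypothetical set of top-level attributes known to hold dates. The strings
;; carry no type information, so the read side needs this to know what to parse.
(def date-attrs #{:person/birth-date})

(defn dates->strings
  "Replaces every LocalDate anywhere in the document with its ISO-8601 string."
  [doc]
  (walk/postwalk #(if (instance? LocalDate %) (str %) %) doc))

(defn strings->dates
  "Parses the string values of the known top-level date attributes back
  into LocalDate instances. Other attributes are left untouched."
  [doc]
  (reduce (fn [m k]
            (if (string? (get m k))
              (update m k (fn [^String s] (LocalDate/parse s)))
              m))
          doc
          date-attrs))
```

The write side would call `dates->strings` on each document before submitting it, and the read side would call `strings->dates` on whatever `entity` returns. A convenient property of ISO-8601 date strings is that they sort lexicographically in date order.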

euccastro00:10:03

to reproduce, add a dependency on juxt/tick (or time-literals directly) and require the namespaces indicated in the README of either project to install the java.time serdes, transact a document that has a java.time.LocalDate instance as an attribute value, query for that attribute via either `q` or `entity`, and check the type of what you get

dominicm07:10:34

Crux and data literals are basically unrelated :). They simply are expanded to a data structure which crux reads.

dominicm07:10:49

Well, object, not data structure.

dominicm07:10:02

Crux only sees the local date (when this is done correctly)

euccastro12:10:19

yes, crux does get a LocalDate when I try and transact my data (I don't even use tagged literals up to that point; I pass programmatically constructed objects). but then, crux (or nippy) seems to be honoring any custom installed printers and readers for serdes

euccastro12:10:48

so the custom printer that time-literals installs will serialize my date as "#time/date \"2020-06-06\"" (which then nippy further compresses, I guess) and when I query that I get what the tag handler installed by time-literals would give me

euccastro12:10:00

indeed, I've just tried to see what happens if I try and transact a LocalDate without having installed time-literals, and I get an exception from nippy: `Unfreezable type: class java.time.LocalDate`

dominicm12:10:38

Nippy is dumber or smarter than I thought, depends where you're sat.

dominicm13:10:29

Yeah, this is generally fully supported afaik. I realize I didn't actually answer that.

dominicm13:10:39

Nippy is part of the contract.

euccastro16:10:12

thank you!
