#datomic
2017-10-25
csm00:10:48

we’re using a separate attribute that contains the ordering as an edn string (it’s a list of #uuids, since we give each entity a GUID primary key)
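A minimal sketch of that approach, assuming the Datomic peer API and hypothetical attribute names (:item/id, :list/items, :list/item-order):

;; assumes (require '[datomic.api :as d] '[clojure.edn :as edn])
(def schema
  [{:db/ident       :item/id            ; GUID primary key per entity
    :db/valueType   :db.type/uuid
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   {:db/ident       :list/items
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/many}
   {:db/ident       :list/item-order    ; the ordering, as an edn string of #uuids
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one}])

;; Write the order with pr-str; read it back with edn/read-string and
;; sort the pulled items by it.
(defn ordered-items [db list-eid]
  (let [{:keys [list/items list/item-order]}
        (d/pull db [:list/item-order {:list/items [:item/id]}] list-eid)
        position (zipmap (edn/read-string item-order) (range))]
    (sort-by (comp position :item/id) items)))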

potetm02:10:13

@alexisvincent The other choices I know of are: an index attr (array) or an attr that points to the next item (linked list).
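Rough sketches of what those two models can look like, with made-up attribute names:

;; 1. Index attribute: each item carries its own position.
(def index-attr-schema
  [{:db/ident       :item/position
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one}])

(defn items-in-order [db list-eid]
  (->> (d/q '[:find ?item ?pos
              :in $ ?list
              :where
              [?list :list/items ?item]
              [?item :item/position ?pos]]
            db list-eid)
       (sort-by second)
       (map first)))

;; 2. Linked list: each item points at the next one.
(def linked-list-schema
  [{:db/ident       :item/next
    :db/valueType   :db.type/ref
    :db/cardinality :db.cardinality/one}])

;; Walk the list from a known head entity id.
(defn walk-list [db head-eid]
  (->> head-eid
       (iterate #(:db/id (:item/next (d/entity db %))))
       (take-while some?)))

The index attribute typically needs re-numbering on inserts; the linked list inserts cheaply but has to be walked rather than sorted inside a single query.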

devn05:10:25

RE: lucene I really wish there were some tiny amount of support for custom tokenization, even if using it meant all performance guarantees were off.

alexisvincent09:10:19

@csm Hm, I suppose that’s an approach. But you lose queryability.

alexisvincent09:10:24

@potetm I’m wondering if a generic data-structure lib for Datomic would be useful, or if these structures should be baked into each instance by hand

Matt Butler10:10:46

@devn I agree, it’s a great feature (fulltext) that’s almost there, so even if it’s not exposing lucene directly, some form of control would make all the difference, at least in my case 🙂

gerstree13:10:27

I was wondering if anyone has a good strategy to verify a Datomic backup.

gerstree13:10:27

We run the backup job every hour, backing up to s3. When the backup finishes, we sync the s3 folder to something not AWS.

gerstree13:10:18

There we restore using a local transactor with dev storage

gerstree13:10:29

While all the different steps work perfectly fine, I would love to be able to verify that we restore exactly what we backup.

gerstree13:10:46

As far as I understand, running bin/datomic list-backups gives a list of t's that are based on folder names in 'backup-folder/roots'. Nothing more, nothing less. Is that correct?

gerstree13:10:35

Ideally I am looking for a checksum for 't' at backup time that I can verify at restore time for that same 't'.
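There is no built-in checksum as far as I know, but one lightweight post-restore check is to compare the restored basis-t against the t reported by list-backups and to re-run a few domain queries whose answers were recorded at backup time. A sketch, with placeholder URI and a hypothetical :user/email attribute:

;; assumes (require '[datomic.api :as d])
(def restored-uri "datomic:dev://localhost:4334/restored") ; placeholder

(defn check-restore [expected-t]
  (let [db (d/db (d/connect restored-uri))]
    {:basis-t-matches? (= expected-t (d/basis-t db))
     ;; spot check: a count recorded against the same t at backup time
     :user-count (d/q '[:find (count ?e) .
                        :where [?e :user/email]]
                      db)}))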

robert-stuttaford13:10:16

@alexisvincent if you take the [[eavt] [eavt] ..] data model - how would you model arbitrary cardinality-many order with it? this is what Datomic has to somehow do for you. it turns out that either you have to model it with extra eavt’s yourself (index attr / linked list), or you have to affect the index’s own sort, which of course messes with the indexing algorithms. guess which one Rich picked 🙂

alexisvincent13:10:29

@robert-stuttaford thanks for the answer, the choice makes sense. I’ve been thinking about this a bit and here’s what I’ve more or less arrived at:
1. Order shouldn’t be baked into the data itself (since we can have multiple orderings per list), but rather is a semantic structure on top.
2. Order isn’t only a performance booster (i.e. give me everything, I’ll sort it myself), but is also vital for expressive queries (e.g. limiting a query to the first 5 items of an ordering).
Maybe an approach would be to provide user-defined indexes as named orderings that can be specified at query time?

alexisvincent13:10:41

@robert-stuttaford Do you do this via data structures embedded in Datomic?

robert-stuttaford13:10:56

yeah, that old chestnut … performant sort + pagination

robert-stuttaford13:10:37

it’s an interesting problem, with no one correct answer

augustl13:10:53

seems to me that any ordering mechanism other than "whatever it is ordered in when you walk the data" requires the whole dataset to be in memory for a sort first, no matter how you do it

augustl13:10:49

so, you can get insertion order in datomic, at least. Right?

alexisvincent13:10:09

@augustl Could also have a lazy index

augustl13:10:34

is that a lazy index?

augustl13:10:52

if so, Datomic kind of has that I suppose, since it merges the actual main datom tree periodically, not on every transaction

alexisvincent13:10:53

So for instance, given ordering based on popularity of photos, you’re more likely to view top photos, and so that would be hot in cache

alexisvincent13:10:40

I meant on demand order resolution, only when you need it

augustl13:10:52

as in you maintain a subset of popular photos and sort those only?

robert-stuttaford14:10:28

you’d have to maintain this recency/popularity index yourself - to add things when they become recent/popular, and to remove them when they stop being either
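One way to do that maintenance, sketched with an invented :photo/popularity attribute: keep an indexed long score up to date yourself and read it back in value order from the AVET index.

(def popularity-schema
  [{:db/ident       :photo/popularity
    :db/valueType   :db.type/long
    :db/cardinality :db.cardinality/one
    :db/index       true}])               ; puts the attribute in AVET

;; update when popularity changes; retract when a photo falls out of the hot set
(defn set-popularity! [conn photo-eid score]
  @(d/transact conn [[:db/add photo-eid :photo/popularity score]]))

;; walk the AVET index in ascending value order and keep the tail (top n),
;; rather than sorting the whole set in memory
(defn top-photos [db n]
  (->> (d/index-range db :photo/popularity nil nil)
       (map :e)
       (take-last n)))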

alexisvincent14:10:45

@robert-stuttaford I’ve also run into this brain bug where I’m not so sure how to handle versioning. Say, for instance, you want to track file revisions: you could use Datomic’s ‘as-of’, but… versioning is actually a ‘first class’ problem of the domain. Also, when you deal with data imports you might want to specify a real-world time, not a Datomic transaction time. How do people solve these problems in the Datomic world?

robert-stuttaford14:10:26

you can annotate your entities with any datetime value, including transaction entities

robert-stuttaford14:10:51

and then write explicit queries against those
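For example, with an invented :import/source-time attribute asserted on the transaction entity (via the reserved "datomic.tx" tempid) and queried explicitly:

(def tx-time-schema
  [{:db/ident       :import/source-time
    :db/valueType   :db.type/instant
    :db/cardinality :db.cardinality/one}])

;; the datom lands on the tx entity, next to :db/txInstant
@(d/transact conn
   [{:db/id "datomic.tx"
     :import/source-time #inst "2017-10-01T12:00:00.000-00:00"}
    {:db/id "new-user"
     :user/email "someone@example.com"}])  ; hypothetical domain data

(d/q '[:find ?e ?source-time
       :where
       [?e :user/email _ ?tx]
       [?tx :import/source-time ?source-time]]
     db)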

robert-stuttaford14:10:10

as-of and since are great for slicing and dicing what the transactor did. they are both performant implementations of d/filter, which you can make yourself
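A rough sketch of what making it yourself with d/filter could look like, keeping only datoms from transactions at or below some t (semantically like as-of, though not how as-of is actually implemented):

;; d/filter's predicate receives the unfiltered db and each datom
(defn as-of-like [db max-t]
  (let [max-tx (d/t->tx max-t)]
    (d/filter db (fn [_db datom]
                   (<= (:tx datom) max-tx)))))

;; usage: (d/q '[:find ?e :where [?e :user/email]] (as-of-like db 1234))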

robert-stuttaford14:10:33

after having tried this, i’ve found that a straightforward datalog query got me there quicker 🙂

augustl14:10:42

some colleagues of mine have run into something similar. They want to use Datomic's time model for tracking the position of some GPS data. But the GPS data can be delayed. So they insert the data into Datomic through a queue that delays it by the maximum expected real-world delay

augustl14:10:16

@robert-stuttaford using d/filter instead of as-of, interesting

augustl14:10:41

makes sense, as-of just does the same binary search on the sorted sets that all other operations do on the index, I suppose

alexisvincent14:10:33

@augustl I actually mean: define a global ordering on all the photos, then in your query’s where clause you do something like this

[?id :user/photo ?photo]
[(order ?photo :date)]

The order index could then be constructed lazily. You would still need to scan all the data, but you wouldn’t need to store it all in memory all the time.

alexisvincent14:10:40

@robert-stuttaford In this case the time actually needs to be set on the datom (arrow), which unfortunately isn’t a first-class entity in Datomic (I think)

alexisvincent14:10:17

the best we have is groups of datoms as entities (transactions)

alexisvincent14:10:51

So, for instance, we could need to set different times for different datoms in a transaction. For the moment I’m approaching this by adding a multi-relational arrow, implemented via an intermediary entity. Anyway, don’t want to pull you away from your work 🙂

Brendan van der Es18:10:41

Anyone know if there are any reusable specs for the Datomic API? [ e.g. (s/def :db/id (s/or :kw keyword? :int int? :vec vector?)) ]
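A hand-rolled sketch of that kind of spec (purely illustrative, not exhaustive of what :db/id accepts) might look like:

(require '[clojure.spec.alpha :as s])

;; entity id (long), ident (keyword), lookup ref, or string tempid
(s/def ::lookup-ref (s/tuple keyword? any?))
(s/def :db/id (s/or :eid    int?
                    :ident  keyword?
                    :lookup ::lookup-ref
                    :tempid string?))

(s/valid? :db/id 17592186045418)         ;; => true
(s/valid? :db/id [:user/email "a@b.c"])  ;; => true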

hmaurer20:10:41

@augustl wouldn’t it make more sense to have two notions of time in the system, the time of events and the recording time of events (the latter being the “datomic time”)?
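For instance, a domain entity could carry its own event time (an invented :event/occurred-at attribute here) while :db/txInstant on the transaction keeps the recording time, and a query can bind both:

(d/q '[:find ?e ?occurred-at ?recorded-at
       :where
       [?e :event/occurred-at ?occurred-at ?tx]
       [?tx :db/txInstant ?recorded-at]]
     db)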

augustl20:10:30

maybe, I wasn't aware of using d/filter instead of d/as-of as @robert-stuttaford mentioned previously