#datomic
2015-12-29
robert-stuttaford04:12:16

@currentoor: we use tx annos in a couple ways. 'who': every web-generated tx is tagged with the signed-in user who created it = easy audit trail. and our back-end event-stream processor tags its own txes as 'processed-by' so that it can keep track of what work it's done and has to do. i've also seen examples mentioned like marking a past tx as 'error', or marking a new tx as a 'correction'
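The 'who' annotation described above can be sketched as a reified transaction: the transaction itself is an entity, so you can assert extra facts on its tempid alongside the domain change. The attribute and entity names here (`:audit/user`, `conn`, `order-id`, `user-id`) are hypothetical, not from the log:

```clojure
(require '[datomic.api :as d])

;; minimal sketch: annotate the tx entity with the signed-in user.
;; (d/tempid :db.part/tx) resolves to the transaction being created.
@(d/transact conn
   [{:db/id (d/tempid :db.part/tx)
     :audit/user user-id}                    ; 'who' made this tx
    {:db/id        order-id                  ; the actual domain change
     :order/status :order.status/shipped}])
```

This assumes `:audit/user` is already installed in the schema; the same pattern works for a 'processed-by', 'error', or 'correction' flag.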

robert-stuttaford04:12:17

@currentoor: on pagination, it's actually an interesting problem to solve. the problem is fundamentally this: Datomic doesn't do arbitrary sorting for you like SQL or Mongo do, beyond the sort order present in the 4 indexes (EAVT, AEVT, AVET, VAET). if you needed to e.g. sort a 3-'column' dataset by any of its columns, ascending or descending, you're on your own. it's easy to implement, but not performant in the large. i went down the road of caching large datasets in redis to make paginating and re-sorting the set faster, because all the work required to get the dataset to the point where it's sortable and ready for render is slow once you get to 10,000s and 100,000s of records

currentoor04:12:16

hmm, and how did that perform for you?

currentoor04:12:19

redis i mean

robert-stuttaford04:12:36

generating the initial set is still slow, but once cached, it's very fast

robert-stuttaford04:12:50

but this just moves the problem somewhere else: cache expiry

robert-stuttaford04:12:27

using core.memoize, the cache key is all the fn's args, one of which is the datomic db
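The core.memoize setup described here might look like the following sketch. `sorted-report` and `run-report-query` are illustrative names, not from the log; the point is that the db value is one of the memoized fn's args, so a new db basis means a cold cache:

```clojure
(require '[clojure.core.memoize :as memo])

;; sketch: cache the expensive query+sort keyed on [db filters sort-key].
;; every new db value produces a new cache key, i.e. a cache miss.
(def sorted-report
  (memo/lru
    (fn [db filters sort-key]
      ;; expensive part: query, join, and sort the full dataset
      (sort-by sort-key (run-report-query db filters)))
    :lru/threshold 32))
```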

robert-stuttaford04:12:40

so every time a new db is used, the cache is empty

robert-stuttaford04:12:00

so, we're now looking into ways to reduce the total dataset size before you start sorting, by warning the user of the dataset size up-front and prompting them to apply filters to reduce it

robert-stuttaford04:12:19

because the likelihood that you're going to page through 1000s of records is ultra low

robert-stuttaford04:12:18

actual pagination code is very easy: (->> (d/datoms ...) seq (drop (* page-index page-size)) (take page-size))

robert-stuttaford04:12:58

you could have a datalog query or any other collection-producing code at the beginning, of course, and you'd also sort before you drop+take
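Putting those pieces together, a paginated query might look like this sketch. The attributes (`:user/name`, `:user/email`) and fn name are hypothetical:

```clojure
(require '[datomic.api :as d])

;; sketch: query -> sort -> drop+take, as described above.
(defn page-of-users
  [db page-index page-size]
  (->> (d/q '[:find [(pull ?e [:user/name :user/email]) ...]
              :where [?e :user/email]]
            db)
       (sort-by :user/name)                 ; sort before paging
       (drop (* page-index page-size))
       (take page-size)))
```

Note that the whole result set is still realized and sorted on every call, which is exactly the cost the caching discussion above is about.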

currentoor05:12:03

thanks for showing

tcrayford10:12:44

@robert-stuttaford: would recommend also tagging transactions with: a) git sha of the process that produced it b) basic info about the http request (I just do method and path)

robert-stuttaford16:12:42

i assume you inject the git sha into your build artifact somehow

bkamphaus17:12:49

Just as a side note, any generated or domain supplied unique identifier on a transaction is great for dealing with retry logic, since you have to sync/coordinate after unavailability to see what made it in otherwise.
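One way to sketch that retry pattern: put a domain-unique id on the tx entity, then on retry check whether the original attempt landed before transacting again. The attribute `:tx/request-id` (assumed to be `:db.unique/identity`) and the fn name are hypothetical:

```clojure
(require '[datomic.api :as d])

;; sketch: idempotent transact via a unique id on the tx entity.
(defn transact-once!
  [conn request-id tx-data]
  (if (d/entity (d/db conn) [:tx/request-id request-id])
    :already-applied                         ; earlier attempt made it in
    @(d/transact conn
       (conj tx-data
             {:db/id          (d/tempid :db.part/tx)
              :tx/request-id  request-id}))))
```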

bkamphaus17:12:56

Tim Ewald also covered some other use cases of Reified Transactions at the Datomic Conf portion of the Conj this year: http://www.datomic.com/videos.html

davebryand19:12:59

how do you guys think about creating partitions for your data? Should I be creating a different partition for every type of entity? So, if we had a notion of users, teams, games, stadiums we would do a separate partition for each?

curtosis20:12:33

@davebryand: as I understand it, partitions (primarily) drive index locality, so you want to keep entities you work with together a lot under the same partition. It really depends on how you use teams/games/stadiums/etc.

curtosis20:12:06

(I can imagine use cases for those entities where each strategy could be more appropriate.)

curtosis20:12:31

presumably someone with more experience will correct me if I'm wrong simple_smile

davebryand20:12:18

gotcha—so depending on the app logic, it might make sense to have a partition per team or something, if that’s a common query pattern?

davebryand21:12:23

anyone know if there is a way to expand a transaction map form into a list form for debugging?

kschrader23:12:54

has anyone tried to use multiple count functions in a query?

kschrader23:12:21

I’m seeing behavior where it seems to sum up across all of the counts instead of giving me individual counts

kschrader23:12:42

or perhaps multiplying both values together…

bkamphaus23:12:47

@kschrader: can you share an example of what you want the output to look like and a version of the query, obfuscated from your domain if need be?

kschrader23:12:26

@bkamphaus: give me a second to put together a minimal example

kschrader23:12:23

If I only use one of the count statements I get the expected result

kschrader23:12:35

@bkamphaus: but using both of them seems to multiply the values together and return that value for both statements

kschrader23:12:03

it should be 7 projects and 1264 stories

bkamphaus23:12:38

@kschrader: let me think through setting up an analogous query with mbrainz to test, and see expected behavior. What happens if you put ?org in a :with clause ( http://docs.datomic.com/query.html#with )
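The multiplied result is consistent with a cross product: if projects and stories join only through the org, each group contains #projects × #stories rows, and a plain `count` over either variable counts those joined rows. One hedged sketch of a fix, with guessed attribute names, is to count distinct values instead (or run two separate queries):

```clojure
;; sketch: count-distinct avoids counting the cross-product rows.
;; :project/org and :story/org are hypothetical attribute names.
(d/q '[:find ?org (count-distinct ?project) (count-distinct ?story)
       :where
       [?project :project/org ?org]
       [?story   :story/org   ?org]]
     db)
```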

kschrader23:12:58

same behavior

kschrader23:12:03

@bkamphaus: need to head home, can you email me (kurt at http://clubhouse.io)?

currentoor23:12:08

Should I store a JSON blob as a string or bytes?