#datomic
2017-01-11
robert-stuttaford06:01:14

@favila, may i suggest you scan through the Datomic changelog?

robert-stuttaford06:01:38

http://my.datomic.com/downloads/free > click Changes in the first row of the table

pesterhazy09:01:07

what does everyone else use in terms of quick 'n dirty db manipulation/transformation helpers?

robert-stuttaford09:01:09

@pesterhazy every-val is (into #{} (map :v (seq (d/datoms db :aevt attr)))) 🙂

pesterhazy09:01:59

indeed it is

robert-stuttaford09:01:06

can do the same with (map :e) for every-eid

robert-stuttaford10:01:43

and you can switch every-entity to use a transducer (sequence (map #(d/entity db %)) (every-eid db attr))

robert-stuttaford10:01:27

oh, you want a set; (into #{} (map #(d/entity db %)) (every-eid db attr))

robert-stuttaford10:01:56

you could do the transducer thing in every-val too of course
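Collected as a sketch, the helpers under discussion might look like this (the names follow the conversation; assumes datomic.api is required as d):

```clojure
(require '[datomic.api :as d])

;; Every value asserted for attr, via the AEVT index.
(defn every-val [db attr]
  (into #{} (map :v) (d/datoms db :aevt attr)))

;; Every entity id that has attr asserted.
(defn every-eid [db attr]
  (into #{} (map :e) (d/datoms db :aevt attr)))

;; Entities for those ids, realized lazily through a transducer;
;; wrap in (into #{} ...) if a set is wanted.
(defn every-entity [db attr]
  (sequence (map #(d/entity db %)) (every-eid db attr)))
```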

pesterhazy10:01:52

good thing about datoms is that it's lazy

pesterhazy10:01:08

so maybe don't do the set thing...

kurt-yagram10:01:29

In previous versions of Datomic, one would use :db/id #db/id[:db.part/user -X] to add entity references in a transaction. In the newer versions, one can use :db/id "someid", which is really cool. However, what if the entity is in another part of the database? (If one has an entity reference like :db/id #db/id[:db.part/custom -X], how does one use the newer syntax to reference it? :db/id "someid" will write it to db.part/user, but I want it in another partition.)

robert-stuttaford11:01:31

@kurt-yagram the v1 syntax is still supported

kurt-yagram11:01:00

yeah, I know, question is, can we use the 2nd syntax together with the 'named' references somehow?
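For illustration, the two syntaxes side by side (assumes a connection conn, an installed :db.part/custom partition, and a hypothetical :item/name attribute):

```clojure
;; v1 syntax: explicit partition via d/tempid, still supported.
@(d/transact conn [{:db/id (d/tempid :db.part/custom)
                    :item/name "widget"}])

;; string tempid syntax: convenient, but the new entity
;; lands in :db.part/user.
@(d/transact conn [{:db/id "someid"
                    :item/name "widget"}])
```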

zalky14:01:39

Hi all. I've been trying to understand the performance implications of Datomic's partitions. Specifically, the Datomic docs state that "entities you'll often query across... should be in the same partition to increase query performance". It just so happened that I ended up with a group of entities with a similar ":entity/type" attribute spread across two partitions that were being returned in a query result, and when I fixed it, there was no effect on performance. The database is relatively small. My thinking was that maybe it had to do with how the db is being cached in memory, but I'm really not sure. Can anyone shed more detailed light on how poor use of partitions might decrease performance? Specifically, is there a simple test I could run where I would see the performance effect of partitions? Thanks!

robert-stuttaford14:01:21

@zalky fewer index segments need to be read into peer memory from storage if all the datoms necessary for a query are partitioned together

robert-stuttaford14:01:33

this means less read-side pressure — on storage itself, in the peer cache (when it fills up) and in the 2nd-tier cache (memcached). for small databases (or really any database that fits in peer ram) this doesn’t really matter at all. large databases can suffer read performance if they have to read a lot of segments in for normal queries and so cause the cache to churn

robert-stuttaford14:01:51

basically, unless you think you’re going to have a large database, i wouldn’t worry about it 🙂

robert-stuttaford14:01:32

the fact that the newest release gives us a simpler partitioning scheme (essentially just sticking to one user partition) shows that it’s not such a worry for most users

zalky14:01:41

Thanks @robert-stuttaford for the response. So to test out the performance effects, do you think it would be sufficient to generate a db of sufficient size, then run a query across partitions? A second question: are there any performance drawbacks of just using a single partition? Or is it just the flip side of the coin: not much to worry about for most use cases.

robert-stuttaford14:01:56

you’d have to generate a substantial database, and be watching your peer and storage metrics pretty carefully to notice a difference
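If you do run that experiment, a quick way to check which partition an entity actually landed in (assumes a db value and an entity id eid):

```clojure
;; d/part extracts the partition encoded in an entity id;
;; d/ident resolves it to its keyword name.
(d/ident db (d/part eid))
;; => :db.part/user, for example
```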

robert-stuttaford14:01:28

unless you’re planning a big database and you have strict read-side SLAs to conform to, i wouldn’t worry for now

zalky15:01:37

Gotcha, thanks again for the response.

sova-soars-the-sora19:01:21

What's the preferred way of keeping track of an ordered sequence in a Datomic store? Say I have lots of elements that users of my site vote on, and from these votes I derive a ranking... how can I keep track of that ranking? :item/rank? Seems a bit funky to try to keep uniqueness.

robert-stuttaford19:01:47

this is a great question, @sova.

robert-stuttaford19:01:30

you could use a linked list - 2 links to 1, 3 links to 2, which makes it so small changes don’t require renumbering everything

robert-stuttaford19:01:42

but this does make determining position costly, due to having to traverse the list
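As a sketch of that linked-list shape (the :item/prev attribute and function name are hypothetical):

```clojure
;; Hypothetical schema: each item references the item ranked just above it.
[{:db/ident       :item/prev
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}]

;; Small moves only touch a couple of links, but finding an item's
;; position means walking the chain, which is O(n).
(defn items-from-tail
  "Entities from the tail of the list back to the head."
  [db tail-eid]
  (->> (d/entity db tail-eid)
       (iterate :item/prev)
       (take-while some?)))
```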

eoliphant19:01:42

question, datomic/datalog doesn’t seem to have a concept of paging, offset, etc? How do folks usually implement something like this? Just walk back and forth through the list of keys?

robert-stuttaford19:01:52

or you could store a vector of entity ids in a string as a pr-str edn blob

zalky19:01:53

robert, if you modified your suggestion to also have :item/rank, reindexing would be a matter of swapping :item/rank, no?

robert-stuttaford19:01:17

but that means managing a large string for large collections

robert-stuttaford19:01:42

if you do rank, then (as with linked list) items can only participate in a single ranking list

robert-stuttaford19:01:16

if you do rank, you’d have to re-calc all the items between the lower and upper rank for any given change

zalky19:01:30

yes indeed

robert-stuttaford19:01:31

e.g. something moving from 72 to 45 means altering all of 45 through 72

robert-stuttaford19:01:46

which may be totally ok
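As a hedged sketch of that renumbering (attribute name :item/rank as discussed; in practice this would likely run inside a transaction function to avoid races):

```clojure
(defn move-item-tx
  "Tx data moving item-eid to new-rank, shifting every rank in between."
  [db item-eid new-rank]
  (let [old-rank (:item/rank (d/entity db item-eid))
        [lo hi shift] (if (< new-rank old-rank)
                        [new-rank (dec old-rank) inc]   ; e.g. 72 -> 45: 45..71 shift up
                        [(inc old-rank) new-rank dec])  ; e.g. 45 -> 72: 46..72 shift down
        between (d/q '[:find ?e ?r
                       :in $ ?lo ?hi
                       :where [?e :item/rank ?r]
                              [(<= ?lo ?r)]
                              [(<= ?r ?hi)]]
                     db lo hi)]
    (conj (mapv (fn [[e r]] [:db/add e :item/rank (shift r)]) between)
          [:db/add item-eid :item/rank new-rank])))
```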

marshall19:01:29

It's the linked list vs array CS question. Depends on your use pattern

robert-stuttaford19:01:21

do you know of anyone actually doing either at large scale, @marshall ? e.g. 1000s or 10,000s or gulp 1,000,000s?

marshall19:01:11

Not off the top of my head

sova-soars-the-sora20:01:50

Well the rankings will be changing rather rapidly...

sova-soars-the-sora20:01:58

As people vote on things

sova-soars-the-sora20:01:09

Maybe it's better to calculate them on the fly. But I like the idea of persisting :item/rank in storage. Then I can do some excellent time-travel and see how the entity went down or up in rank over time.

sova-soars-the-sora20:01:50

As you said, Robert, "or you could store a vector of entity ids in a string as a pr-str edn blob" .. this is an interesting suggestion. Just keeping a file of all the entity-ids in their rank-order.... Hmm.. I'll have to consider my options a bit.

sova-soars-the-sora20:01:24

@eoliphant i'm curious about pagination as well. Based on some googling... http://docs.datomic.com/clojure/#datomic.api/seek-datoms seems very useful
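A sketch of index-based paging with d/seek-datoms (helper name and page size are illustrative; for :aevt the seek components are attribute, then entity id):

```clojure
(defn page-of-datoms
  "Up to page-size datoms of attr, starting at start-eid (0 for the first page)."
  [db attr start-eid page-size]
  (let [attr-id (d/entid db attr)]
    (->> (d/seek-datoms db :aevt attr start-eid)
         ;; seek-datoms keeps going past attr once it is exhausted, so stop there
         (take-while #(= attr-id (:a %)))
         (take page-size))))

;; next page: call again with (inc (:e (last previous-page)))
```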

sova-soars-the-sora20:01:08

In my own use case I think I'm set on every item having its own unique :item/rank ... so I'll have to make sure they are unique and that they get iteratively updated when the rankings change... I'm curious how this will function on the order of thousands of elements. will let y'all know eventually 🙂

jdkealy20:01:45

If I wanted to import data into a local DB and then push it to prod, can I use datomic:free locally and restore to datomic:pro?

eoliphant20:01:00

Yeah i’d looked at that @sova, but i’m needing to do it with query results sometimes

jaret21:01:18

@jdkealy Yes, you can back up a free db and restore it into a pro one. The only restriction is that backing up to S3 is available only in Datomic Pro

jdkealy21:01:16

ah cool thanks... so when you back up... it makes many directories. would you then tar it, scp onto your transactor server and then restore from local?

marshall21:01:20

You can do that. Alternatively, if you're using pro or starter locally you can backup directly to s3 and restore in production from s3
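For reference, a sketch of that workflow using the backup-db and restore-db commands from the Datomic distribution (URIs and paths are illustrative):

```
# back up the free db to a local directory (this produces many files)
bin/datomic backup-db datomic:free://localhost:4334/mydb file:/tmp/mydb-backup

# tar, scp to the pro environment, untar, then restore there
bin/datomic restore-db file:/tmp/mydb-backup datomic:dev://localhost:4334/mydb

# with pro or starter, s3 URIs work directly for both commands
bin/datomic backup-db datomic:dev://localhost:4334/mydb s3://my-bucket/backups/mydb
```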

timgilbert22:01:04

FWIW, in a past project I just kept an :item/rank integer and recalculated it on move. It worked fine for me. I was always using pretty small lists (on the order of 10-30 elements or so), and reading the lists was much more common than reordering the lists

jdkealy22:01:48

i can never seem to get the export to s3 command right

jdkealy22:01:28

not sure what i'm doing wrong but i actually just tried on a fresh install after running ensure transactor

timgilbert22:01:41

Say, I have a general question. I want to produce a filtered database value representing the universe of data that a specific user can access (based on security rules). I'm thinking about doing this by using (d/with) to nuke a bunch of entities out of the database value before passing it down to code that will run client-provided pull patterns against it. Is this likely to be performant, or is there a better way to do it?

timgilbert22:01:26

Another option I've considered is using (d/filter), but my rules are too general to filter at the datom level (eg, Bob can see Alice if and only if Bob's project has the same company as Alice's project), so it doesn't seem applicable

favila22:01:14

@timgilbert surely you can express that as a query?

timgilbert22:01:58

Yeah, and to a limited extent I've been doing it with rules

timgilbert22:01:54

But my dream system does this at the top level and then I don't need to add the same boilerplate logic into every single one of my queries

Lambda/Sierra22:01:30

@timgilbert It is difficult to do this kind of access filtering while still supporting the full syntax of datalog queries or pull expressions. It is usually easier to define your own query syntax, perhaps as a subset of datalog or pull expressions, and enforce the access rules in your query evaluator.

Lambda/Sierra22:01:00

Retracting a large number (thousands?) of entities with d/with is unlikely to perform well. d/filter with complex queries will also not be "fast."

timgilbert22:01:03

Hmm, ok, I guess I'll reevaluate my approach

favila22:01:10

@timgilbert d/filter may be fast enough. It doesn't sound like you have tried it yet?

timgilbert22:01:17

The problem we're trying to address is that if our clients send us straight-up pull patterns, they can traverse from entities they should be able to access to entities they shouldn't, e.g. Alice is an admin for two different companies, and suddenly Bob from BobCo can see all the data in CarlCo by back-navigating through :company/_admin or whatnot

timgilbert22:01:16

@favila, I'll think about it some more but since the argument to the filter predicate is a single datom I don't think it will work

timgilbert22:01:39

Like I have a semantic context that I'm trying to enforce

favila22:01:44

@timgilbert it's a db and also a single datom

timgilbert22:01:14

Oh, hmm, didn't realize that, thanks!
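For illustration, a two-argument filter predicate (same-company? and viewer-eid are hypothetical):

```clojure
;; The predicate receives a db value along with each datom, so it can
;; consult the rest of the database rather than the datom alone.
(def visible-db
  (d/filter db
            (fn [db datom]
              (same-company? db viewer-eid (:e datom)))))
```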

favila22:01:47

@timgilbert that said, sounds like you could also either whitelist pull attrs (preprocess the pull expr) or completely cover over the search/retrieval "ops" the users are allowed, so that you know they are safe
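A sketch of the whitelisting idea (the allowed set and attribute names are hypothetical):

```clojure
(def allowed #{:user/name :user/email :project/title})

(defn sanitize-pull
  "Keep only whitelisted attrs, recursing into nested map patterns."
  [pattern]
  (vec (keep (fn [x]
               (cond
                 (keyword? x) (when (allowed x) x)
                 (map? x) (let [m (into {}
                                        (for [[k v] x :when (allowed k)]
                                          [k (sanitize-pull v)]))]
                            (when (seq m) m))
                 :else nil))
             pattern)))

;; (sanitize-pull [:user/name :user/ssn {:user/projects [:project/title]}])
;; => [:user/name]  (the un-whitelisted attrs and join are dropped)
```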

Lambda/Sierra22:01:47

If you can't trust the client, don't accept raw d/pull patterns. It will be hard to ensure you've covered all the cases for restricting access.

timgilbert22:01:21

Yeah, that's what I've been learning. 😉

Lambda/Sierra22:01:55

Think of it like SQL: you wouldn't let your clients send in raw SQL queries.

timgilbert22:01:23

I mean, I wouldn't accept raw SQL for a postgres-backed service... haha jinx

timgilbert22:01:00

Ok, well I'll think about this some more. Thanks for the advice @stuartsierra and @favila