#datomic
2016-05-18
casperc17:05:46

I am looking to generate some queries based on user input. I think this is doable, but given that Datomic doesn’t have a query optimiser, how do I make sure that I order the clauses in my :where the right (or at least a reasonably right) way?

casperc17:05:10

Are there some guidelines that I can go by when generating the query?

gardnervickers17:05:50

Are datomic s3 backups just the new transactions since the previous backup?

sdegutis17:05:50

Is it possible for a transaction to have only partially completed due to java.lang.OutOfMemoryError?

marshall18:05:24

@gardnervickers: Since 0.9.5130 Datomic backups have been incremental if they’re issued against the same storage location: http://docs.datomic.com/backup.html#differential-backup

marshall18:05:17

@sdegutis: Transactions are atomic, so they either complete successfully or fail, there is no way to have a ‘partial transaction’; did you see an OOM error on the transactor?

marshall18:05:08

@casperc: Are you generating the entire query or just altering parameters based on the user input?

sdegutis18:05:31

Phew. @marshall I just verified that it did not partially go through. Thank goodness for ACID compliance I suppose.

sdegutis18:05:17

@marshall: I think so... I tried to d/transact-async a hundred thousand entities into existence, and got java.lang.OutOfMemoryError.

marshall18:05:14

@sdegutis: create 100,000 entities within a single transaction? that is a bit on the large side for the number of datoms in a single txn - do you need to be able to create those together atomically?

sdegutis18:05:43

@marshall: Probably not, I'm devising a way of splitting this data migration into multiple transactions. Think I found a way.

sdegutis18:05:56

@marshall: It does need to be done within the same 20 minutes though.

marshall18:05:45

I’d definitely recommend splitting something that size up. I don’t think it should be particularly hard to get it through in 20 minutes; of course, it will depend on the specifics of your system and schema, etc.
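[Editor's note: one way to split a migration like this, sketched with hypothetical names - `conn`, `all-tx-data`, and the batch size of 1000 are assumptions to tune, not part of the conversation:]

```clojure
(require '[datomic.api :as d])

;; Chunk the migration's tx-data and submit each chunk as its own
;; transaction. `conn` and `all-tx-data` are assumed to exist;
;; 1000 datoms per transaction is just a starting point.
(defn transact-in-batches [conn all-tx-data batch-size]
  (doseq [batch (partition-all batch-size all-tx-data)]
    @(d/transact conn batch)))
```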

sdegutis18:05:46

@marshall: Someone yesterday mentioned that Datomic recommends a maximum of 10 billion datoms in a database. After this migration we'll have gone from 5 million to 7 million, which eases my mind considering it's not even a 1000th of the max.

sdegutis18:05:07

But I just didn't anticipate that it would be too big for a single transaction.

sdegutis18:05:23

But yeah I've got me an idea for splitting it up.

marshall18:05:19

Incidentally, the “Understanding and Using Reified Transactions” talk here: http://www.datomic.com/videos.html discusses a few approaches to large operations that span transaction boundaries

casperc20:05:39

@marshall: I am generating the entire query. Our data model forms a DAG and I am generating a :where clause joining from (if that is the right way to put it) one of the leaf nodes to the root.

casperc20:05:32

It might just be that it is not a problem though if I put the clauses with input parameters first and then just join up towards the root.

marshall20:05:19

@casperc: If your user-parameterized clauses narrow the dataset fairly substantially, that sounds like a reasonable place to start. I’d recommend against premature optimization and tend to worry about making it faster only if you see significant perf issues.
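[Editor's note: a minimal sketch of the ordering advice, using a hypothetical :person/* schema. Datomic evaluates :where clauses top to bottom, so binding from the narrow user input first keeps the intermediate result sets small:]

```clojure
;; The first clause binds ?p from the user-supplied ?email, usually
;; to a single entity; the broader join clauses then run over that
;; already-small binding set instead of the whole database.
(def selective-first
  '[:find ?name
    :in $ ?email
    :where
    [?p :person/email ?email]   ; most selective clause first
    [?p :person/friend ?f]      ; broad joins afterwards
    [?f :person/name ?name]])
```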

bkamphaus20:05:11

@casperc: might be helpful to look at the code in the mbrainz sample database for generating rules: https://github.com/Datomic/mbrainz-sample/blob/master/src/clj/datomic/samples/mbrainz/rules.clj and the resulting rules: https://github.com/Datomic/mbrainz-sample/blob/master/resources/rules.edn for graph traversal for collaborating artists.

casperc20:05:17

@marshall: Sound advice, I’ll see how it performs. 🙂 I guess I was looking for some reference material of some sort for generating the query

casperc20:05:54

@bkamphaus: Perfect, I got my wish delivered 🙂

marshall20:05:09

Right, the other thing I was going to say was that it sounded like a recursive rule might fit the problem, depending on your schema.
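[Editor's note: a minimal recursive rule of the kind marshall suggests, assuming a hypothetical :node/parent attribute linking each DAG node to its parent. `(ancestor ?c ?a)` walks from a leaf up toward the root:]

```clojure
;; Base case: ?a is the direct parent of ?c.
;; Recursive case: ?a is an ancestor of ?c's parent.
(def rules
  '[[(ancestor ?c ?a)
     [?c :node/parent ?a]]
    [(ancestor ?c ?a)
     [?c :node/parent ?p]
     (ancestor ?p ?a)]])
```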

bkamphaus20:05:30

@sdegutis: if you haven’t yet, might want to check out the transaction pipeline example here: http://docs.datomic.com/best-practices.html#pipeline-transactions — though that’s for the step after you break up the transaction. (you put transactions on a channel that the tx-pipeline function would take from).
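[Editor's note: rough shape of the pipelining idea - a sketch, not the best-practices doc's exact code. Batches of tx-data are taken from a channel with a bounded number of in-flight transact-async calls:]

```clojure
(require '[clojure.core.async :as a]
         '[datomic.api :as d])

;; Take tx-data batches from tx-chan, keeping at most `in-flight`
;; transactions outstanding at once; completed results land on `done`.
(defn tx-pipeline [conn in-flight tx-chan]
  (let [done (a/chan)]
    (a/pipeline-blocking
     in-flight
     done
     (map (fn [tx-data] @(d/transact-async conn tx-data)))
     tx-chan)
    done))
```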

bvulpes21:05:25

is there any way that calling (d/tempid :db.part/user) in lazy-seq's would result in producing the same db/id?

bvulpes21:05:38

(when i go to transact the lazy seq, i mean)

bkamphaus21:05:34

@bvulpes: so not generally but two possible issues: 1 - messing up the code so you just generate it once and repeat the generated value, and 2 - transaction functions that generate tempids can unintentionally conflict with tempids generated on the peer
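[Editor's note: the first failure mode bkamphaus mentions, sketched with a hypothetical :person/name attribute and a `names` seq:]

```clojure
(require '[datomic.api :as d])

;; Bug: the tempid is generated once outside the fn, so every
;; entity map re-uses the same :db/id.
(let [tid (d/tempid :db.part/user)]
  (map (fn [n] {:db/id tid :person/name n}) names))

;; Fix: call d/tempid inside the fn so each entity gets a
;; distinct tempid.
(map (fn [n] {:db/id (d/tempid :db.part/user)
              :person/name n})
     names)
```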

bvulpes21:05:00

thanks bkamphaus

bvulpes21:05:18

ran it down to a mistaken db/unique