#datomic
2016-04-25
pheuter 13:04:02

Is it considered idiomatic to sort Datomic result sets outside of the query? For larger result sets, is it more performant to use Datomic functions?

pheuter 13:04:47

Upon an initial read through the docs, it doesn’t seem like there’s an equivalent of SQL ORDER BY.

Ben Kamphaus 14:04:33

@pheuter: that’s correct, there’s no ORDER BY equivalent at present, so you need to sort outside the query. If the default sort order of an indexed attribute (in the :avet index) works for your use case, `datoms`, `seek-datoms`, and `index-range` are tools you can use to lazily iterate or page through facts.
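Since ordering happens outside the query, a minimal sketch is to treat the query result (a set of tuples, as `d/q` returns) as ordinary Clojure data: sort it, then page through it. The sample `results` data and the `sort-results` and `page` helpers are hypothetical; in real code the set would come from a query.

```clojure
;; Hypothetical query result: a set of [name age] tuples with no guaranteed order.
(def results #{["carol" 41] ["alice" 29] ["bob" 35]})

(defn sort-results
  "Sorts query tuples by the value at column index `col`,
  optionally in descending order."
  [tuples col & {:keys [descending?]}]
  (let [cmp (if descending? #(compare %2 %1) compare)]
    (sort-by #(nth % col) cmp tuples)))

(defn page
  "Returns page `n` (zero-based) of `size` items from a sorted seq."
  [sorted-tuples size n]
  (->> sorted-tuples (drop (* n size)) (take size)))

;; Sort by age ascending, then take the first page of two tuples.
(page (sort-results results 1) 2 0)
;; => (["alice" 29] ["bob" 35])
```

This realizes and sorts the whole result in memory; for large sets where the attribute's index order is already the order you want, walking the index lazily avoids that.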

pheuter 14:04:33

bkamphaus: thanks! that’s good to know, index-range looks like what I was looking for.

pheuter 14:04:59

i wonder how expensive avet indexes are

Ben Kamphaus 14:04:42

It’s not really expensive to create or maintain. I’d just turn it on for any attribute I ever suspected I’d want to look something up by 🙂 The cost incurred by indexing is fairly trivial, and storage is usually really cheap given the size of most Datomic databases. We’ve considered indexing all attributes by default but so far haven’t done so. (The only thing you want to avoid indexing is anything overly blob- or document-like.)
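Turning on the AVET index is a schema setting. A minimal sketch of an attribute definition with `:db/index true`, written as plain Clojure data; the attribute `:person/email` and the `with-avet` helper are hypothetical, and the exact shape of schema maps (for example whether `:db/id` and `:db.install/_attribute` entries are required) depends on your Datomic version.

```clojure
;; Hypothetical attribute definition as plain data.
;; :db/index true asks Datomic to maintain AVET entries for this attribute.
;; Assumption: a Datomic version that accepts attribute maps without an
;; explicit :db/id / :db.install/_attribute (older versions need them).
(def person-email-attr
  {:db/ident       :person/email
   :db/valueType   :db.type/string
   :db/cardinality :db.cardinality/one
   :db/index       true
   :db/doc         "Email address, indexed so it can be looked up via AVET."})

(defn with-avet
  "Adds :db/index true to every attribute map except those whose
  :db/ident is in `skip` (e.g. blob- or document-like attributes)."
  [attrs skip]
  (mapv #(if (contains? skip (:db/ident %)) % (assoc % :db/index true)) attrs))
```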

pheuter 14:04:24

makes sense, good to know nothing funky going on with avet

pheuter 14:04:41

not being turned on by default spooked me a little bit

casperc 14:04:41

I am having some trouble with a transaction where d/with and d/transact give different answers. Am I correct in thinking this should never happen?

casperc 14:04:59

The discrepancy between d/with and d/transact, I mean.

pheuter 14:04:59

I believe transact takes a connection and applies the tx-data to the latest version of the database, whereas with takes a particular database value, which may not be the latest one.

pheuter 14:04:31

also, applying transactions using with will not affect the source database passed in

Ben Kamphaus 14:04:48

@casperc: as @pheuter mentions, there could be a time discrepancy with what happens over the conn, since you don’t transact against a db value directly. Another case is using with on a database that has, e.g., an as-of filter: the filter will prevent the resulting database from seeing the data added by with (because that data comes after the as-of t).

casperc 15:04:55

Yes, I should mention that the database is not being written to, so the db is not changing and should be up to date. I am using a transaction function to remove values containing lookup refs if the entity they reference doesn’t exist. d/with correctly removes the lookup refs, but when I transact the same data, the transaction fails because the lookup ref is not removed. So the function is not behaving the same when run on the transactor as it does on the peer via d/with.

Ben Kamphaus 15:04:00

@casperc: with a transaction function there are definitely some possible differences, the primary one being how arguments to the transaction function are serialized. Only the Java-level interface behavior is preserved in parameters passed over the wire (i.e. they arrive as java.util.Collections, etc.), so Clojure-specific collection logic on the args can behave differently. For example, if you check whether something is a vector, the check evaluates true when the transaction function runs in the peer (where the arguments don’t go over the wire) but false on the transactor.
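The difference is easy to reproduce in a plain REPL. As a rough stand-in for what a transaction function might receive after serialization, this sketch uses a `java.util.ArrayList`; that concrete class is an assumption, since the thread only says that the Java collection interfaces are preserved.

```clojure
;; On the peer (d/with), a lookup ref arrives as the Clojure vector you built.
(def on-peer [:person/email "a@example.com"])

;; Rough stand-in for the same value after going over the wire: only the
;; java.util.List interface is guaranteed, not the Clojure vector type.
;; (ArrayList is an assumption chosen for illustration.)
(def over-wire (java.util.ArrayList. ^java.util.Collection on-peer))

(vector? on-peer)                      ;; => true
(vector? over-wire)                    ;; => false, the surprising case
(instance? java.util.List on-peer)     ;; => true
(instance? java.util.List over-wire)   ;; => true
```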

casperc 15:04:02

That would explain it. I am indeed checking whether something is a vector to see if it is a lookup ref.

casperc 15:04:20

@bkamphaus: How would you recommend that I check if something is a lookup ref?

casperc 15:04:56

This is my current check:

(and (vector? v) (= 2 (count v)) (keyword? (first v)) (= :db.type/ref (:db/valueType (d/entity db k))))

casperc 15:04:34

Where v is the value and k is the attribute.
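One way to make the check behave the same on the peer and on the transactor is to test for the `java.util.List` interface instead of `vector?`. A minimal sketch that keeps the rest of the check above; to stay runnable without Datomic, the `(d/entity db k)` lookup is replaced by a hypothetical `value-type-of` function supplied by the caller, and `value-types` is made-up schema data.

```clojure
(defn lookup-ref?
  "True if `v` looks like a lookup ref for attribute `k`: a two-element
  list whose first element is a keyword, where `k` is a ref attribute.
  Checks java.util.List rather than vector? so it also matches the Java
  collections a transaction function receives on the transactor.
  `value-type-of` is a hypothetical stand-in for
  (fn [k] (:db/valueType (d/entity db k)))."
  [value-type-of k v]
  (and (instance? java.util.List v)
       (= 2 (count v))
       (keyword? (first v))
       (= :db.type/ref (value-type-of k))))

;; Hypothetical schema lookup for the example.
(def value-types {:order/customer :db.type/ref
                  :order/note     :db.type/string})

(lookup-ref? value-types :order/customer [:customer/id "c-1"])   ;; => true
(lookup-ref? value-types :order/customer
             (java.util.ArrayList. ^java.util.Collection [:customer/id "c-1"]))   ;; => true
(lookup-ref? value-types :order/note [:customer/id "c-1"])       ;; => false
```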

Ben Kamphaus 15:04:58

From the use case, I’m not sure I’d use a transaction function. Is there a race between peers to write something first? If the attribute is :db.unique/identity and what you want is “write a new entity if this doesn’t exist, otherwise add to or change the existing entity”, that’s basically how upsert with :db.unique/identity behaves.

Ben Kamphaus 15:04:49

For the other use case, just removing transaction data that doesn’t match anything: if peers aren’t expected to race to create entities, it might make more sense to run that processing on the peer and then submit the cleaned transaction.

Ben Kamphaus 15:04:17

{:db/id (d/tempid :db.part/user)
 :unique/id "myId"
 :some/fact "some value"}
{:db/id [:unique/id "myId"]
 :some/fact "some value"}
If an entity with :unique/id "myId" exists and :unique/id is a :db.unique/identity attribute, upsert means those two transactions behave the same. Under the same assumptions, if the entity doesn’t exist, the first form will create a new one and the second form will fail.
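To make the difference between the two forms concrete, here is a toy in-memory model of the behavior just described. It is not Datomic's implementation or API: the database is a plain map of hypothetical integer ids to attribute maps, `:unique/id` plays the role of the :db.unique/identity attribute, and the keyword `:tempid` stands in for `(d/tempid :db.part/user)`.

```clojure
(defn find-by
  "Toy lookup: id of the entity whose attribute `a` equals `v`, or nil."
  [db a v]
  (some (fn [[eid attrs]] (when (and (some? v) (= v (get attrs a))) eid)) db))

(defn transact-map
  "Toy model: applies one entity map to `db` (a map of id -> attribute map)."
  [db {:keys [db/id] :as tx-map}]
  (let [attrs (dissoc tx-map :db/id)
        eid (if (instance? java.util.List id)
              ;; lookup ref [attr value]: must resolve to an existing entity
              (or (find-by db (first id) (second id))
                  (throw (ex-info "Lookup ref did not resolve" {:lookup-ref id})))
              ;; tempid: upsert onto the entity with the same :unique/id,
              ;; otherwise allocate a new id
              (or (find-by db :unique/id (:unique/id attrs))
                  (inc (reduce max 0 (keys db)))))]
    (update db eid merge attrs)))

(def db {1 {:unique/id "myId" :some/fact "old value"}})

;; When "myId" exists, both forms update entity 1:
(transact-map db {:db/id :tempid :unique/id "myId" :some/fact "some value"})
;; => {1 {:unique/id "myId", :some/fact "some value"}}
(transact-map db {:db/id [:unique/id "myId"] :some/fact "some value"})
;; => {1 {:unique/id "myId", :some/fact "some value"}}

;; Against an empty db, the tempid form creates entity 1,
;; while the lookup-ref form throws.
(transact-map {} {:db/id :tempid :unique/id "myId" :some/fact "some value"})
;; => {1 {:unique/id "myId", :some/fact "some value"}}
```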

casperc 15:04:03

Well, I might get by using d/sync, and that is probably the route I will end up taking, but there are multiple threads transacting at the same time, which can cause a race condition in my specific case.

Ben Kamphaus 15:04:32

Right, which does imply handling it in a transaction function; I just wasn’t sure what handling behavior you want. I.e., if the lookup ref fails, do you just drop those datoms on the floor? Or create the entity instead of using the lookup ref? The latter points toward relying on upsert behavior instead.

casperc 15:04:34

Ah, if it doesn’t exist, I drop the datom with the lookup ref.

Ben Kamphaus 15:04:45

and no retry of that fact later, or reporting anywhere that the fact was invalid?

casperc 15:04:31

Currently our dataset is incomplete, which is why I drop the refs that don’t exist.

casperc 15:04:47

Later it will result in a failed transaction.

casperc 15:04:03

I can tell that you have some amount of aversion to me using transaction functions though, so I’ll take that into account 😉

Ben Kamphaus 15:04:52

If it’s a temporary solution, I would probably just run the logic that drops invalid lookup refs on the peer when the transaction is built (before it is submitted). It’s an odd use case to me, though, because the precision/isolation guarantee seems somewhat arbitrary. I do get that it’s not going to be the production logic in the end.
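A minimal sketch of doing that cleanup on the peer before submitting: filter list-form `[:db/add e a v]` datoms, dropping any whose value is a lookup ref that does not resolve. `resolves?` is a hypothetical caller-supplied predicate (in real code it would need to consult a current database value); here a set of known lookup refs stands in for it, and the entity ids are made up. It also assumes only ref attributes take two-element list values.

```clojure
(defn lookup-ref?
  "True if `v` is a two-element list starting with a keyword."
  [v]
  (and (instance? java.util.List v) (= 2 (count v)) (keyword? (first v))))

(defn unresolved-ref-datom?
  "True for an [:db/add e a v] datom whose lookup-ref value `resolves?` rejects.
  Entity maps and other tx forms are never matched."
  [resolves? datom]
  (and (sequential? datom)
       (let [[op _ _ v] datom]
         (and (= op :db/add) (lookup-ref? v) (not (resolves? v))))))

(defn drop-unresolved-refs
  "Returns tx-data without the datoms whose lookup refs don't resolve."
  [resolves? tx-data]
  (vec (remove #(unresolved-ref-datom? resolves? %) tx-data)))

;; Hypothetical: the set of lookup refs known to resolve.
(def existing #{[:customer/id "c-1"]})

(drop-unresolved-refs existing
  [[:db/add "order-1" :order/customer [:customer/id "c-1"]]
   [:db/add "order-2" :order/customer [:customer/id "c-404"]]
   [:db/add "order-1" :order/note "rush"]])
;; => [[:db/add "order-1" :order/customer [:customer/id "c-1"]]
;;     [:db/add "order-1" :order/note "rush"]]
```

The cleaned vector is what would then be handed to d/transact. Because the check runs against the peer's view of the database, another peer could still create or remove the referenced entity before the transaction commits.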

Ben Kamphaus 15:04:33

The aversion is that transaction functions that do things like walk all the transaction data to clean it can end up being huge performance bottlenecks 🙂 I always try to push back to see whether there’s an optimistic concurrency strategy that makes sense on the peer side, or whether the time-based logic isn’t really about serialized, atomic updates of entities but just about preferring whatever time happens to show up on the transactor vs. the peer.
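As a rough illustration of the optimistic concurrency idea (read a value, compute the change on the peer, and let the write succeed only if nothing changed in the meantime, retrying otherwise), here is a sketch that uses a Clojure atom and `compare-and-set!` as a stand-in for the database. In Datomic itself the analogous building block is the built-in `:db.fn/cas` transaction function; the atom model, `submit-with-retry`, and the counter data are hypothetical.

```clojure
(defn submit-with-retry
  "Optimistic update: read the current state, compute the new state locally,
  and commit only if nobody else changed it in between. Retries up to
  `max-tries` times; returns the committed state or throws."
  [state-atom f max-tries]
  (loop [attempt 1]
    (let [before @state-atom
          after  (f before)]
      (cond
        (compare-and-set! state-atom before after) after
        (< attempt max-tries) (recur (inc attempt))
        :else (throw (ex-info "Gave up after conflicting writes"
                              {:attempts attempt}))))))

(def counter (atom {:count 0}))

;; Ten concurrent writers; conflicting attempts are simply retried.
(let [workers (doall (for [_ (range 10)]
                       (future (submit-with-retry counter #(update % :count inc) 100))))]
  (run! deref workers))

@counter ;; => {:count 10}
```

Each failed attempt means some other writer committed in between, so with ten writers no one retries more than nine times; the expensive part (computing `after`) happens outside the serialized commit.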

casperc 15:04:35

Point taken 🙂 And honestly, a d/sync would probably do the trick, but since I was seeing a difference between d/with and d/transact, I was wondering why.

casperc 15:04:42

Seems like a bit of a gotcha that should probably be documented if it isn’t 🙂

Ben Kamphaus 15:04:24

I agree. I’ll take a stab at adding it to the transaction function docs.

casperc 15:04:52

Sounds good. Thanks for the help making sense of things though.