#datomic
2015-08-26
sdegutis03:08:01

Is it typical that (d/connect "datomic:) takes 1678 msecs?

tcrayford10:08:53

@sdegutis: it depends on a lot of things

tcrayford10:08:09

e.g. if the dev transactor is doing a full GC, that latency could be even larger

viesti10:08:09

(hello “world”)

viesti10:08:19

was chatting with colleagues about using Datomic in an open source project

viesti10:08:51

Datomic is really neat technically, but how do you tackle the question of keeping your data in a closed system?

robert-stuttaford10:08:55

@jonas i can talk to you about pagination

tcrayford10:08:39

@robert-stuttaford: curious to hear your take. I have a lot of thoughts about it

robert-stuttaford10:08:59

@sdegutis: a new connect is also downloading the live index from the transactor

tcrayford10:08:20

I'd expect that to typically be very small in the dev transactor, but maybe that's just my use case 😉

robert-stuttaford10:08:27

we’re actually struggling with pagination at the moment

robert-stuttaford10:08:47

but i think our issue is conceptual more than it is a fault of Datomic's

robert-stuttaford10:08:31

you have 100k entities. you want to see the 100 most recently active ones. you have to get all 100k, sort by activity date descending, then grab the first 100

robert-stuttaford10:08:37

how to break this work up?

robert-stuttaford10:08:08

we’re doing things like memoizing (on a redis backend) with the db value so that once you’ve generated the set, pages 2..n are super fast

robert-stuttaford10:08:18

but making that initial set is still super slow

robert-stuttaford10:08:40

we’re pre-calculating as much as we can, too, so that queries only need to look at a single attr per entity

tcrayford10:08:47

@robert-stuttaford: in that case, raw index walking isn't too hard, right?

robert-stuttaford10:08:49

but that’s not always possible

robert-stuttaford10:08:07

it is if you want descending order

tcrayford10:08:34

just do an arohner and write an attribute in 2512 or whatever

robert-stuttaford10:08:53

storing ever descending values doesn’t work. it’s not really a solution if you have lots of existing data

robert-stuttaford10:08:51

if it were easy to enable some sort of ‘clutch’ and allow edits in the past, then it’s easy to fix with ETL

robert-stuttaford10:08:18

but it’s not. to do that we’d have to retransact our entire db in time order and alter txes on the fly

robert-stuttaford10:08:36

we’re at over 40mil txes

tcrayford10:08:41

oh, he stores them in ascending order by inverting them from Long/MAX_VALUE. Now I understand 😉
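
A minimal sketch of the inversion trick described here, in plain Clojure with no Datomic dependency. The helper names and sample timestamps are hypothetical; the only idea taken from the chat is subtracting from Long/MAX_VALUE so that an ascending walk yields the newest values first.

```clojure
;; Hypothetical helper names; not part of any Datomic API.
(defn invert-ts
  "Maps an epoch-millis timestamp to a key that sorts in the opposite order."
  [^long ts]
  (- Long/MAX_VALUE ts))

(defn restore-ts
  "Recovers the original timestamp from an inverted key."
  [^long inverted]
  (- Long/MAX_VALUE inverted))

;; Three sample activity timestamps (epoch millis), oldest first.
(def sample-timestamps [1440000000000 1440500000000 1440580000000])

;; A sorted set iterates ascending, like walking an index in its natural order.
(def inverted-index (into (sorted-set) (map invert-ts sample-timestamps)))

;; Taking from the front of the ascending walk yields the most recent entries.
(def two-most-recent (map restore-ts (take 2 inverted-index)))
```

As noted later in the conversation, this only helps for values written in inverted form from the start; existing data would still need to be backfilled.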

tcrayford10:08:40

conceptually all you'd need is access to an inverted index (which seems… relatively doable?)

robert-stuttaford10:08:54

right now, we take the sort dimension you want to use, realise the full set for just that one ‘attr’ (might be computed, might be direct lookup), sort, paginate, then realise the rest of the data for each ‘row’
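
The approach described above can be sketched in plain Clojure as follows. This is an illustration under stated assumptions, not the code from the chat: the in-memory `entities` map stands in for the database, and every name in it (`sort-keys`, `page-ids`, `realise-rows`, `:last-active`) is hypothetical.

```clojure
;; Hypothetical in-memory stand-in for entity data; a real system would read
;; these values from the database instead.
(def entities
  {1 {:name "alpha" :last-active 300}
   2 {:name "beta"  :last-active 100}
   3 {:name "gamma" :last-active 500}
   4 {:name "delta" :last-active 200}
   5 {:name "eps"   :last-active 400}})

(defn sort-keys
  "Step 1: realise only [id sort-value] pairs for the chosen sort dimension."
  [entities sort-attr]
  (map (fn [[id e]] [id (get e sort-attr)]) entities))

(defn page-ids
  "Steps 2 and 3: sort descending by the sort value, then cut out one page."
  [pairs page-number page-size]
  (->> pairs
       (sort-by second #(compare %2 %1))
       (drop (* page-number page-size))
       (take page-size)
       (map first)))

(defn realise-rows
  "Step 4: load the full data only for the ids on the requested page."
  [entities ids]
  (mapv #(assoc (get entities %) :id %) ids))

;; Page 0 with two rows per page: the two most recently active entities.
(def first-page
  (realise-rows entities (page-ids (sort-keys entities :last-active) 0 2)))
```

The sort in `page-ids` still touches every pair, which is the cost the conversation keeps returning to; only the final row lookup is limited to one page.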

jonas10:08:10

tcrayford: surely that depends on the value type?

robert-stuttaford10:08:30

we’ve cut a lot of processing time like this, and i’ve got it all using datalog and transducers as much as possible

robert-stuttaford10:08:37

but it still takes long for big sets

tcrayford10:08:57

(many folk have asked for inverted indexes though, so I assume they have good reasons for not doing it yet)

robert-stuttaford10:08:38

we have to find a better way. if we didn’t need to sort, then you can paginate very easily. unfortunately, unsorted data is fairly useless in a reporting context. sorting’s the real perf pain.

tcrayford10:08:10

yeah 😞 And conceptually, the indexes have the already sorted data, just there's no way to ask datomic for it 😞

robert-stuttaford10:08:20

i would actually dig to have a 1 or 2 hour hangout with you tom, and whoever else has tried their hand at this to talk about novel options

robert-stuttaford10:08:27

i can talk through what we’ve done so far

robert-stuttaford10:08:40

what’s worked, how well, etc

tcrayford10:08:47

I uh, haven't tried anything

tcrayford10:08:58

I could write Yeller's database down on two or three sheets of paper

tcrayford10:08:04

so I don't have a sorting problem 🙂

jonas10:08:14

@robert-stuttaford: That would be great! I will need to read through and respond later. I have a few ideas myself as well

tcrayford10:08:57

(actually it's bigger than that now, thinking about it. Still, like the number of entities is below 1k)

robert-stuttaford11:08:19

yeah no we are WAY beyond that

robert-stuttaford11:08:24

3 years of user data

tcrayford11:08:40

@robert-stuttaford: aren't y'all paid users? Lean on dat support contract

robert-stuttaford11:08:01

it’s not a datomic support issue. datomic isn’t doing anything wrong

robert-stuttaford11:08:08

it’d be a consulting gig

tcrayford11:08:43

no, but a "how do I use your product to do $COMMON_TASK" thing imo (and I think it is a datomic issue, because there's no inverted index access)

tcrayford11:08:27

like, if you had a d/datoms-reversed or whatever, this'd be trivial

robert-stuttaford11:08:50

for descending sorts, yes

robert-stuttaford11:08:01

i’m looking into pre-processing with Onyx and creating Sorted Sets in Redis now

jonas11:08:32

I would like (the possibility) to get sorted sets out of the datalog queries where you can specify the sort-order. Then you could also specify offset/limit

robert-stuttaford11:08:36

as Onyx is processing our data tx by tx, it can update many sets pretty quickly

robert-stuttaford11:08:03

@jonas: yep, although it’s all still going to happen app-side

robert-stuttaford11:08:30

if Datomic provides this, it’s going to be a layer around d/q, not a new internal part of it

robert-stuttaford11:08:38

and we can pretty much do that ourselves

robert-stuttaford11:08:56

that’s a big fat assumption on my part, of course

robert-stuttaford11:08:10

i don’t have any sort of insider knowledge or anything 😁

jonas11:08:36

I agree we can do it ourselves and that’s an idea I’m exploring

robert-stuttaford11:08:25

anyway. i have to be off. i’d love to show you guys what we’re doing at the mo, as it might help you, but also it’ll probably help me because you’ll likely poke holes in all of it 🙂

robert-stuttaford11:08:43

perhaps a hangout sometime in September?

tcrayford11:08:24

I'm interested, but September is kinda bad for me 😐

shofetim13:08:34

So from the docs http://docs.datomic.com/clojure/index.html#datomic.api/q it looks like I can (and perhaps should prefer to?) write queries as maps rather than vectors, but whenever I try it, I get "java.lang.IllegalArgumentException Don't know how to create ISeq from: clojure.lang.Symbol". Am I doing it wrong, or maybe the docs are describing an as-yet-unreleased API? (I'm running 0.9.5206, which I think is the latest)

jonas13:08:51

I don’t think the map form is preferred (except for when you’re generating queries programmatically). The IllegalArgumentException is probably unrelated. Note that when using the map form you need to wrap the “arguments” in an extra vector (or list): {:find [?a ?b ?c] …} instead of [:find ?a ?b ?c …]
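
To make the difference in shape concrete, here is a small sketch in plain Clojure that treats both query forms as data. `:person/name` is a hypothetical attribute and `list-form->map-form` is a hypothetical helper, not part of the Datomic API; nothing here runs a query.

```clojure
;; The same query in both shapes. :person/name is a hypothetical attribute.
(def query-list-form
  '[:find ?e ?name
    :where [?e :person/name ?name]])

(def query-map-form
  '{:find  [?e ?name]
    :where [[?e :person/name ?name]]})

(defn list-form->map-form
  "Groups the clauses that follow each top-level keyword into a vector.
   A simplification: assumes keywords appear only as section markers."
  [query]
  (->> (partition-by keyword? query)
       (partition 2)
       (map (fn [[[k] clauses]] [k (vec clauses)]))
       (into {})))
```

With the map form, building a query programmatically is ordinary map manipulation, for example `(update query-map-form :where conj '[?e :person/email ?email])`, where `:person/email` is again hypothetical.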

sdegutis14:08:28

What settings do you use for the development transactor?

sdegutis14:08:12

Do you ever change the min/max memory for it?

sdegutis19:08:01

What are the advantages or disadvantages of using maps to describe transactions vs using vectors?

sdegutis19:08:20

What would you typically want to use and when would you use the other kind?

bensu19:08:59

@sdegutis: maps usually refer to a single entity and they are easier to generate since you can assoc attributes with values in.

bensu20:08:25

@sdegutis: I'm not 100% confident on this next point: vectors might be the only way to leverage user-defined or built-in functions like :db.fn/retractEntity
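
A rough sketch of how the two transaction shapes relate, in plain Clojure with no Datomic dependency. The entity id, the `:person/*` attributes, and the `map->adds` helper are all hypothetical; the expansion shown is a simplified illustration of the idea that a map is shorthand for a set of assertions about one entity.

```clojure
;; Hypothetical entity id and attributes, used only for illustration.
(def entity-id 17592186045418)

;; Map form: one map describes several attributes of a single entity.
(def tx-map-form
  {:db/id entity-id
   :person/name "Ada"
   :person/email "ada@example.com"})

(defn map->adds
  "Expands a flat entity map into one [:db/add e a v] list per attribute.
   A simplification: ignores nested maps and cardinality-many collections."
  [entity-map]
  (let [eid (:db/id entity-map)]
    (for [[attr value] (dissoc entity-map :db/id)]
      [:db/add eid attr value])))

;; List form: each assertion is spelled out individually.
(def tx-list-form (vec (map->adds tx-map-form)))

;; A retraction is not an assertion, so it is written in list form.
(def retract-email [:db/retract entity-id :person/email "ada@example.com"])
```

Because the map form is just a Clojure map, it can be built up with `assoc`, which is the convenience mentioned above.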

sdegutis20:08:16

I'm thinking so too.

sdegutis20:08:46

When would you want to use transact-async over transact?

sdegutis21:08:50

I guess I don't understand why transact returns a future when it's not async and waits for it to complete anyway.

Alex Miller (Clojure team)21:08:14

so the result is not built if it's not needed

Alex Miller (Clojure team)21:08:40

(would be my guess - I'm not on the datomic team)

sdegutis21:08:09

Oh that could be it.

arohner21:08:26

am I allowed to assume txids are monotonically increasing?

arohner21:08:38

@alexmiller: those are ts, though, not txids?

arohner21:08:19

looks like I can avoid it and just d/pull the txid’s txInstant, and sort those

Alex Miller (Clojure team)21:08:49

I think it is logical that txids would be as well (for ordering in the index, plus they are serialized at creation time), but I don't know that that is guaranteed

sdegutis22:08:27

I have some pretty messy code to automatically resolve the :tempids of a transaction. Is this common, or is there a better pattern that people use?

bensu22:08:01

@sdegutis: for what it's worth I also have a tx->ids function.

bensu22:08:07

(not pretty)