#datomic
2017-10-23
Empperi13:10:30

hi, I have a very specific problem to solve which might sound like a very generic one at first. In short, I’m trying to implement LIMIT functionality with Datomic, but my use case really doesn’t allow me to use, for example, the datoms API, and the pull-based approach doesn’t work either since it does the limiting at the attribute level

Empperi13:10:18

so, I’m kinda like trying to solve the LIMIT problem just by using the q based queries

Empperi13:10:50

since queries done with it are eager and it doesn’t support limit itself, I’m kinda out of ideas
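To make the situation concrete: since d/q is eager, the only "LIMIT" a caller can apply is truncation after the fact. A stand-in sketch in plain Clojure, with a hard-coded set playing the role of a d/q result (the tuples are made up for illustration):

```clojure
;; Stand-in for a d/q result: an eagerly computed set of tuples.
;; By the time any "limit" can run, the whole set is already in memory.
(def results
  #{[1 "Alice"] [2 "Bob"] [3 "Carol"] [4 "Dave"]})

;; Truncating afterwards is the only option q itself offers:
(def limited (take 2 results))

(count limited)  ; => 2
```

The truncation saves nothing on the query side; it only reduces what the caller holds onto afterwards.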

Empperi13:10:41

the reason why I need to stick with q and really cannot use the datoms API is that we are creating a SPARQL endpoint for our data, and thus we really need to perform actual datalog queries

Empperi13:10:02

also, I don’t think I could solve this via the datoms API either, since when making actual queries the query engine uses all the indices available, whereas with datoms I’m restricting myself to a single index

Empperi13:10:19

so even if I do get access to all datoms via datoms API, I cannot exactly do stuff like:

Empperi13:10:30

that executes but will not provide the same results as straight query due to different indices (or well, that exact query just might, but in general you can’t rely on that)
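The snippet itself didn’t survive in the log, but from the follow-up it was presumably something along these lines (hypothetical reconstruction, Datomic peer API, with conn standing in for an existing connection):

```clojure
(require '[datomic.api :as d])

;; Hypothetical: "limiting" by taking from a raw index scan.
;; d/datoms walks a single index (here :eavt) lazily, so take stops
;; early -- but this is NOT equivalent to running the datalog query
;; and limiting its results afterwards.
(take 100 (d/datoms (d/db conn) :eavt))
```

It executes, but the datoms come back in index order, not in whatever order the full query's joins would have produced.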

Empperi13:10:54

and besides, doing the limit at the datoms level wouldn’t give me correct results anyway, since the limiting is done too early

Empperi13:10:15

so, am I just screwed or is there some hidden feature somewhere which would allow me to do LIMIT?

Empperi13:10:37

let me read that through with thought

dominicm13:10:23

LIMIT somewhat implies order afaik, which I don't think Datomic has. Datomic uses sets. We've run into a desire for LIMIT before, and had to work around it using d/datoms (and we could, in our situation).

dominicm13:10:40

Perhaps you could write a custom aggregate for this, but I suspect there isn't one for a reason.
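For reference, Datomic does support custom aggregation functions via the (aggregate ?f ?x) form on the peer, with the function passed in as a fully qualified symbol. A hypothetical "limit" aggregate (attribute name and conn are made up), which illustrates dominicm's suspicion: the aggregate only caps the output, not the work, because it receives the already-computed values:

```clojure
(require '[datomic.api :as d])

;; Hypothetical custom aggregate: receives every matched value,
;; so the full result set has been realised before it runs.
(defn limit-agg [xs] (take 100 xs))

(d/q '[:find (aggregate ?f ?name)
       :in $ ?f
       :where [_ :person/name ?name]]   ; hypothetical attribute
     (d/db conn)
     'user/limit-agg)                   ; fully qualified symbol
```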

Empperi13:10:59

yeah, we are actually going to need ordering too but we already have a plan for that

Empperi13:10:24

that is: add the ordering information as metadata and do the ordering post-query in code

Empperi13:10:46

not optimal but nothing too bad really

Empperi13:10:58

limit is a much harder nut to crack and has much bigger performance implications

dominicm13:10:48

I think your LIMIT solution is coupled to your ordering situation. You can't LIMIT until you've ORDERed

Empperi13:10:07

yeah, true

Empperi13:10:14

this sucks 😕

Empperi13:10:29

love datomic but this is a real problem

Empperi13:10:40

I guess it would be there already if it were straightforward to do

Empperi13:10:37

did a quick test with sample; execution time is the same with sample as when you just get all the data out

Empperi13:10:42

so not really helping

Empperi13:10:42

I’m slowly starting to lean towards adding LIMIT to the “not supported” list

Empperi13:10:49

which would be a huge letdown

augustl13:10:33

if you need to sort and limit a dataset, you'll need to have the whole thing in memory at some point, won't you?

augustl13:10:06

I can't think of any way for an RDBMS to sort and then limit, on the database server, without having the whole working set in memory

augustl13:10:40

(and if I can't think of an algorithm to do that off the top of my head, then obviously it cannot exist, right?)

Empperi13:10:43

I guess doing it naively after making the query is our best bet here; at least we can reduce the amount of data sent over the wire that way

Empperi13:10:02

and at least it would improve performance in the ORDER BY + LIMIT scenario, since one only needs to keep a sorted collection of at most LIMIT items
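That observation can be made concrete with a bounded sorted structure: keep at most n items while scanning, so memory and comparison cost depend on the limit rather than the full result size. A sketch in plain Clojure (a sorted-set is fine here, since Datomic query results are sets of distinct tuples anyway):

```clojure
(defn top-n
  "Keeps only the n smallest items (by compare) seen so far, so memory
   stays bounded by n instead of the full input size. Note: sorted-set
   deduplicates, which matches set-valued query results."
  [n coll]
  (->> coll
       (reduce (fn [acc x]
                 (let [acc' (conj acc x)]
                   (if (> (count acc') n)
                     (disj acc' (first (rseq acc')))  ; drop current max
                     acc')))
               (sorted-set))
       seq))

(top-n 3 [5 1 4 2 8])  ; => (1 2 4)
```

This still has to scan every result the query produced, which is why the performance benefit over sort-then-take may indeed be marginal unless the limit is far smaller than the result set.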

Empperi13:10:25

but then again, I’d guess the performance benefit would be marginal at best

Empperi13:10:37

just trying to find something positive here 😄

hmaurer14:10:54

@niklas.collin so, I haven’t gotten into a case where I had to implement this yet, but this problem bothered me a bit and I thought of two solutions:

hmaurer14:10:33

(1) if there is only one ordering you care about (e.g. a newsfeed, where you want to retrieve the top N entries), store the data in a format which allows for this specific query to be efficient. e.g. a linked list
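Idea (1) could be sketched as a schema where each entry holds a ref to the next one, so "top N" becomes a chain of N lookups instead of a query over the whole set (all idents here are hypothetical):

```clojure
;; Hypothetical newsfeed schema: a singly linked list of entries.
[{:db/ident       :entry/title
  :db/valueType   :db.type/string
  :db/cardinality :db.cardinality/one}
 {:db/ident       :entry/next        ; ref to the next-newest entry
  :db/valueType   :db.type/ref
  :db/cardinality :db.cardinality/one}]
```

Reading the top N is then N pulls following :entry/next from the head entity, never touching more than N entries.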

hmaurer14:10:05

(2) if (1) does not work (e.g. because you need to sort arbitrarily), build a materialised view of your database using the tx report queue
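A rough shape of idea (2), assuming the peer API's d/tx-report-queue; conn, the view's keying, and the update logic are all placeholders:

```clojure
(require '[datomic.api :as d])

;; Hypothetical materialised view: a sorted structure kept up to date
;; from the transaction report queue, so ORDER BY + LIMIT reads become
;; cheap scans over this view instead of full q queries.
(def view (atom (sorted-map)))          ; keyed by the sort attribute

(def queue (d/tx-report-queue conn))    ; conn: an existing connection

(future
  (loop []
    (let [{:keys [tx-data]} (.take queue)]
      (doseq [[e _a v _tx added?] tx-data]  ; datoms destructure positionally
        (if added?
          (swap! view assoc v e)            ; placeholder update logic
          (swap! view dissoc v)))
      (recur))))
```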

Empperi17:10:11

@hmaurer both good ideas but unfortunately not usable in this case, thanks for your input and ideas though 👍

rrevo20:10:52

I was trying to look for information on AWS cross-region fail-over support. https://github.com/awslabs/dynamodb-cross-region-library from Amazon can clone from one DynamoDB table to another across regions. And DynamoDB streams (http://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Streams.html) guarantee that the records appear once and in sequence. Does this satisfy the consistent copy options for HA in Datomic, or is something else missing? http://docs.datomic.com/ha.html#other-consistent-copy-options

mike_ananev21:10:14

Hi, Datomic Team! I would like to recommend a new host DB for Datomic – Tarantool DB.

mike_ananev21:10:25

Sub-1 ms latency • 100K–300K QPS per CPU core • 100K updates per node • Small number of nodes (money saver) • Expiration • Always up, no maintenance windows • Optimized for heavy parallel workloads

mike_ananev21:10:44

+ Full ACID DB

mike_ananev21:10:28

You can avoid HornetQ, because Tarantool can work as a queue too.

mike_ananev21:10:39

Tarantool is a cache + ACID DB in one solution. Proven in production for many years on high-load services: Badoo, Avito, http://Mail.ru

mike_ananev21:10:12

It has an app server in the DB, so you can write stored procedures in a high-level language