hi, I have a very specific problem to solve, which might sound like a very generic one at first. In short, I’m trying to implement LIMIT functionality with Datomic, but my use case really doesn’t allow me to use, for example, the datoms API, and the pull-based approach doesn’t work either since it applies the limit at the attribute level


so, I’m kinda like trying to solve the LIMIT problem just by using the q based queries


since queries done with it are eager and it doesn’t support limit itself I’m kinda out of ideas


the reason why I need to stick with q and really cannot use datoms API is that we are creating a SPARQL endpoint to our data and thus we really need to actually perform datalog queries


also, I think I wouldn’t be able to solve this via the datoms API either, since when making actual queries I think the query engine uses all the indices available, whereas with datoms I’m restricting myself to a single index


so even if I do get access to all datoms via datoms API, I cannot exactly do stuff like:


that executes but will not provide the same results as straight query due to different indices (or well, that exact query just might, but in general you can’t rely on that)


and besides, doing limit functionality on datoms level there wouldn’t give me correct results since then limiting is done too early


so, am I just screwed or is there some hidden feature somewhere which would allow me to do LIMIT?


let me read that through with thought


LIMIT somewhat implies order afaik, which I don't think Datomic has. Datomic uses sets. We've run into a desire for LIMIT before, and had to work around it using d/datoms (and we could, in our situation).


Perhaps you could write a custom aggregate for this, but I suspect there isn't one for a reason.


yeah, we are actually going to need ordering too but we already have a plan for that


that is: add ordering information as metadata, do the ordering at post via code


not optimal but nothing too bad really
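To make the "order in code after the query" plan concrete, here is a minimal sketch in Python (chosen only as a neutral illustration language; the real endpoint would presumably do this in Clojure). It assumes the query result is a set of tuples and that the ordering metadata has already been turned into a list of `(column_index, descending)` pairs. The function name `order_and_limit` and that pair format are hypothetical, not part of Datomic or SPARQL tooling.

```python
def order_and_limit(rows, order_by, limit=None, offset=0):
    """Sort query result tuples in code, then apply OFFSET and LIMIT.

    rows:     any iterable of tuples (e.g. the set a query returned)
    order_by: list of (column_index, descending), most significant key first
    """
    result = list(rows)
    # Python's sort is stable, so sorting by the least significant key first
    # and the most significant key last gives a correct multi-key ordering.
    for index, descending in reversed(order_by):
        result.sort(key=lambda row: row[index], reverse=descending)
    end = None if limit is None else offset + limit
    return result[offset:end]


if __name__ == "__main__":
    rows = {("alice", 30), ("bob", 25), ("carol", 30), ("dave", 41)}
    # Roughly: ORDER BY DESC(?age) ASC(?name) LIMIT 2 OFFSET 1
    print(order_and_limit(rows, [(1, True), (0, False)], limit=2, offset=1))
```

This still realises and sorts the whole result, so it only answers the correctness question, not the performance one.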


limit is a much harder nut to crack and has much bigger performance implications


I think your LIMIT solution is coupled to your ordering situation. You can't LIMIT until you've ORDERed


yeah, true


this sucks 😕


love datomic but this is a real problem


I can guess it would be there if it was straightforward to do


did a quick test with the sample aggregate: execution time is the same as when you just get all the data out


so not really helping


I’m slowly starting to lean towards adding LIMIT to the “not supported” list


which would be a huge letdown


if you need to sort and limit a dataset, you'll need to have the whole thing in memory at some point, won't you?


I can't think of any way for an RDBMS to sort and then limit, on the database server, without having the whole working set in memory


(and if I can't think of an algorithm to do that off the top of my head, then obviously it cannot exist, right?)
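For what it's worth, one well-known technique for "sort, then limit" without holding the whole working set is a bounded heap: scan the rows once and keep only the best N seen so far. The sketch below uses Python's standard `heapq.nsmallest`, which works this way when given a plain iterator. The row shape and the `fake_rows` generator are made up for illustration. Every row still has to be visited once, and since `q` is eager the full result already exists on the Datomic side, so this only bounds the memory and sorting cost of the post-processing step.

```python
import heapq


def limited_ordered(rows, limit, key):
    """Return the `limit` smallest rows by `key`, in ascending order.

    When `rows` is a lazy iterator, heapq.nsmallest keeps only about
    `limit` rows in memory instead of sorting everything.
    """
    return heapq.nsmallest(limit, rows, key=key)


def fake_rows(n):
    """Hypothetical stand-in for a stream of (entity, score) result tuples."""
    for i in range(n):
        yield (f"entity-{i}", (i * 7919) % 1_000_003)


if __name__ == "__main__":
    # Roughly: ORDER BY ?score LIMIT 3 over a million rows, one pass.
    for row in limited_ordered(fake_rows(1_000_000), 3, key=lambda r: r[1]):
        print(row)
```

The work is roughly O(rows × log N) instead of a full sort, which matters mostly when N is small compared to the result size.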


I guess doing it naively after making the query is our best bet here, at least we can reduce the amount of data sent over the wire that way


and at least it would improve performance in the ORDER BY + LIMIT scenario, since one needs to keep sorting only until the LIMIT has been reached


but then again, I’d guess the performance benefit would be marginal at best


just trying to find something positive here 😄


@niklas.collin so, I haven’t gotten into a case where I had to implement this yet, but this problem bothered me a bit and I thought of two solutions:


(1) if there is only one ordering you care about (e.g. a newsfeed, where you want to retrieve the top N entries), store the data in a format which allows for this specific query to be efficient. e.g. a linked list


(2) if (1) does not work (e.g. because you need to sort arbitrarily), build a materialised view of your database using the tx report queue
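Idea (2) could look roughly like the sketch below: a view that is kept sorted as changes arrive, so a "top N" read is just a slice. It is written in Python purely for illustration. The `("add" | "retract", sort_key, entity_id)` event shape is an assumption made for this sketch and is not the shape of Datomic's tx reports; a real consumer would first have to translate each report's datoms into such events.

```python
import bisect


class SortedView:
    """A tiny in-memory materialised view, kept ordered by (sort_key, entity_id)."""

    def __init__(self):
        self._items = []  # always sorted

    def apply(self, op, sort_key, entity_id):
        """Apply one change event (hypothetical format, see lead-in)."""
        item = (sort_key, entity_id)
        i = bisect.bisect_left(self._items, item)
        present = i < len(self._items) and self._items[i] == item
        if op == "add":
            if not present:
                self._items.insert(i, item)
        elif op == "retract":
            if present:
                del self._items[i]
        else:
            raise ValueError(f"unknown op: {op!r}")

    def first(self, limit, offset=0):
        """LIMIT/OFFSET over the already-ordered view: just a slice."""
        return self._items[offset:offset + limit]


if __name__ == "__main__":
    view = SortedView()
    for event in [("add", 30, "e1"), ("add", 10, "e2"),
                  ("add", 20, "e3"), ("retract", 10, "e2")]:
        view.apply(*event)
    print(view.first(2))
```

Reads become cheap at the cost of maintaining one such view per ordering, and inserting into a Python list is linear in its size, so a real implementation would likely use a tree or a sorted store instead.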


@hmaurer both good ideas but unfortunately not usable in this case, thanks for your input and ideas though 👍


I was trying to look for information on AWS cross-region fail-over support. Amazon can clone from one DynamoDB table to another across regions, and DynamoDB Streams guarantee that the records appear once and in sequence. Does this satisfy the consistent copy options for HA in Datomic, or is something else missing?


Hi, Datomic Team! I would like to recommend a new host DB for Datomic: Tarantool DB.


Sub 1 ms latency • 100K-300K QPS per one CPU core • 100K updates per node • Small number of nodes (money saver) • Expiration • Always up, no maintenance windows • Optimized for heavy parallel workloads


+ Full ACID DB


You can avoid HornetQ, cause Tarantool can work as queue too.


Tarantool is a cache + acid db in one solution. Proven in production many years on highload services: Badoo, Avito,


It has an app server in the DB, so you can write stored procedures in a high-level language