
I was talking here a few days ago about LIMIT functionality and the lack of it in Datomic. I want to continue a bit on the subject: after thinking about it in detail, I can totally see the basic reason why it is not supported. However, I think it should be relatively easy to support IF the query API returned a lazy sequence instead of an eager one. So my question goes to that department now: why exactly are the query API results eager and not lazy?


I’m guessing it has something to do with the different indices within Datomic, the query planner, and combining the datasets from these indices reliably, but this is mostly guesswork and I would love to hear some ideas


I would guess that the where clauses are handled via reducers within Datomic (that would just make sense), and based on that assumption, creating a lazy sequence shouldn’t be too much of a problem


but I’m pretty certain I’m missing something here, otherwise we would be receiving lazy sequences already. I want to understand the internals of Datomic a bit better so that I can work around its limitations and use its advantages more efficiently


I would imagine it's something that could be included in the query engine if you're OK with limiting when a certain condition is met and the ordering being "whatever order the query engine iterates the indices in"


it is fundamentally walking a lazy tree of chunks, after all


I actually think it is not, as long as the result set is eager


because the where clauses are applied one by one


so in order to do the LIMIT you need to apply all of them


and at that point you already have all the data, without the LIMIT applied, and since it is in memory due to the peer cache, what’s the point in returning just a subset? Just return it all and let the client do the limiting


but if this processing were lazy, then you could do a depth-first traversal of the where clauses instead of a breadth-first one (which I guess is what currently happens)


then one could just simply do (take limit query-results) at Datomic level


and it would work exactly the way people would want it to work
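
To make the eager vs. lazy argument above concrete, here is a toy sketch in Python. It is not how Datomic evaluates queries: the rows, the two "where clauses" and both query functions are made-up stand-ins, and `itertools.islice` plays the role of `(take limit query-results)`. It only shows that a depth-first, lazy pipeline can stop as soon as the limit is reached, while the breadth-first, eager version runs every clause over the whole intermediate result first.

```python
from itertools import islice

# Hypothetical stand-in data: (entity, age, active) rows, not real Datomic datoms.
ROWS = [(e, 20 + e % 50, e % 2 == 0) for e in range(1000)]

# Hypothetical "where clauses", modelled as plain predicates.
CLAUSES = [lambda r: r[1] >= 30, lambda r: r[2]]


def eager_query(rows, clauses, counter):
    """Breadth-first: run each clause over the whole intermediate result."""
    result = list(rows)
    for clause in clauses:
        kept = []
        for row in result:
            counter[0] += 1  # count one clause evaluation
            if clause(row):
                kept.append(row)
        result = kept
    return result


def lazy_query(rows, clauses, counter):
    """Depth-first: push one row through every clause before touching the next."""
    for row in rows:
        matched = True
        for clause in clauses:
            counter[0] += 1  # count one clause evaluation
            if not clause(row):
                matched = False
                break
        if matched:
            yield row


# Eager: everything is computed, the limit is applied afterwards by the caller.
eager_count = [0]
eager_first5 = eager_query(ROWS, CLAUSES, eager_count)[:5]

# Lazy: islice stops pulling rows once five results have been produced.
lazy_count = [0]
lazy_first5 = list(islice(lazy_query(ROWS, CLAUSES, lazy_count), 5))
```

Both variants return the same first five rows, but in this toy model the lazy one evaluates far fewer clauses. The result order is simply the iteration order of `ROWS`, which matches the "whatever order the engine iterates in" caveat mentioned earlier.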


but I actually just got another idea why it is like this: to optimize the peer cache population


right, it must be actually because of that


because you actually want to go breadth-first: that way, after doing the first where clause, you know the absolute worst case of data you’re going to need in order to do the rest of the query


then you can retrieve that from the storage backend with one sweep and do the rest of the stuff in memory
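
The guess above can be sketched as a toy model too. Again, this is Python and purely hypothetical: `FakeStorage`, `segment_of`, the segment size of 100 and both traversal functions are invented for illustration and say nothing about Datomic's real storage or peer cache. The point is only the trade-off: knowing the worst-case set of segments up front allows one bulk fetch, whereas a lazy, depth-first walk discovers what it needs segment by segment.

```python
class FakeStorage:
    """Hypothetical storage backend that counts round trips."""

    def __init__(self, segments):
        self._segments = segments
        self.round_trips = 0

    def fetch(self, keys):
        # One call equals one round trip, no matter how many segments it returns.
        self.round_trips += 1
        return {k: self._segments[k] for k in keys}


def segment_of(entity):
    # Assumption for this sketch: 100 entities per storage segment.
    return entity // 100


def make_segments():
    # Ten segments, each mapping entity id -> some attribute value.
    return {s: {e: e % 7 for e in range(s * 100, (s + 1) * 100)} for s in range(10)}


def breadth_first(storage, candidates, predicate):
    """Fetch every segment the candidates could need in one sweep, then filter in memory."""
    needed = sorted({segment_of(e) for e in candidates})
    cache = storage.fetch(needed)
    return [e for e in candidates if predicate(cache[segment_of(e)][e])]


def depth_first(storage, candidates, predicate, limit):
    """Fetch segments on demand and stop once `limit` matches are found."""
    cache = {}
    matches = []
    for e in candidates:
        seg = segment_of(e)
        if seg not in cache:
            cache.update(storage.fetch([seg]))
        if predicate(cache[seg][e]):
            matches.append(e)
            if len(matches) == limit:
                break
    return matches


# Pretend the first where clause left us with these candidate entities.
candidates = list(range(0, 1000, 3))


def is_match(value):
    return value == 0


bulk_storage = FakeStorage(make_segments())
all_matches = breadth_first(bulk_storage, candidates, is_match)

lazy_storage = FakeStorage(make_segments())
first_40 = depth_first(lazy_storage, candidates, is_match, limit=40)
```

In this model the breadth-first run needs a single round trip, while the depth-first run with a fairly large limit needs several. With a small limit the lazy walk could still win, so this only illustrates why an engine might prefer the one-sweep approach, not that it always pays off.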


We’re currently running an index creation migration and would like to get a sense of how long it will take to finish, but I’m not sure how to check on that. Any pointers?


In the docs it says the client library can support non-JVM languages. Are there any examples of that? Can we, for example, use Datomic’s new client library from a Ruby process?