
I wonder if it’s possible to know in advance how many segments will be fetched from storage during query execution?


I mean, I understand that it depends on the query itself, the schema, data that was already fetched, etc.


Something like “select explain”


i saw someone had made a library to attempt to count datoms at each clause, but that was before all the new stuff was added


from reading the questions the author was asking and the sort of answers he was getting from Cognitect, it touched directly on Datomic’s secret sauce and no real progress was made


if such a facility exists, it’ll be because Cognitect provides it


anyone using the pull spec have a simple way to flatten the result such that all nested maps are merged with the root map?
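
One possible approach, as a minimal sketch: walk the pull result and merge every nested map into the root. `flatten-pull` is a hypothetical helper name, not part of Datomic. The sketch assumes that root keys should win on collision (nested entities often repeat `:db/id`) and it leaves collections of maps (cardinality-many refs) untouched.

```clojure
(defn flatten-pull
  "Merges all nested maps of a pull result into one flat map.
  Scalar keys on the root always win over colliding nested keys."
  [m]
  (reduce-kv
    (fn [acc k v]
      (if (map? v)
        ;; acc is the right-hand argument, so keys already collected
        ;; override keys coming from the nested map
        (merge (flatten-pull v) acc)
        (assoc acc k v)))
    {}
    m))

;; Example input shaped like a nested pull result (attribute names are made up)
(flatten-pull {:db/id 1
               :user/name "Ann"
               :user/address {:db/id 2
                              :address/city "Kyiv"}})
```

The key of the nested map itself (`:user/address` above) is dropped, because only its contents are lifted to the root.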


@kachayev: i think the best thing you can do is monitor your storage. as far as i know, index segments have a fixed size in kilobytes, so even your data influences how many segments are needed for a query. the peer also caches all segments, so running a reasonably small query again should not reach out to storage a second time. in queries, it’s important to order the clauses by the number of datoms they bind: putting the most specific clause first gives the best performance.
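
To illustrate the clause-ordering advice, here is a small sketch of the same query written two ways. The attributes `:user/active` and `:user/email` and the helper `where-clauses` are assumptions made up for the example. Both forms describe the same result; they differ only in which clause narrows the set of datoms first.

```clojure
;; Broad clause first: every active user is bound before the email narrows it down.
(def broad-first
  '[:find ?e
    :in $ ?email
    :where
    [?e :user/active true]
    [?e :user/email ?email]])

;; Most specific clause first: the email is expected to match very few
;; entities, so the second clause only has to check those.
(def selective-first
  '[:find ?e
    :in $ ?email
    :where
    [?e :user/email ?email]
    [?e :user/active true]])

(defn where-clauses
  "Returns the clauses that follow :where in a vector-form query.
  Only valid when :where is the last section, as in the queries above."
  [query]
  (rest (drop-while #(not= :where %) query)))

;; Same clauses, different order
(= (set (where-clauses broad-first))
   (set (where-clauses selective-first)))
```

Which clause is actually the most selective depends on the data, so the ordering above is only a guess about a typical user table.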


the problem is that it’s hard to keep track of “order the clauses by the number of datoms” when you have many queries


and/or dynamically built queries


it’s also hard to “play” with data locality - it just takes too much time to do: change schema, load everything, run a lot of queries, then analyze charts about network consumption & storage performance (and they are not that obvious usually)


dynamically built queries are not such a good idea. d/q caches the preparatory work it does for its first param. better to have standard queries with dynamic :in values
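
A rough sketch of the difference, assuming a hypothetical `:user/email` attribute. The first function builds a new query form for every value, so each call hands `d/q` a different first argument. The second keeps one constant query and passes the value through `:in`. The `d/q` call itself is shown only as a comment, since it needs `datomic.api` and a database value.

```clojure
;; Dynamically built: the email is baked into the query form,
;; so every distinct email yields a distinct query.
(defn query-for-email [email]
  [:find '?e
   :where ['?e :user/email email]])

;; Standard query with a dynamic :in value: the form never changes.
(def user-by-email-query
  '[:find ?e
    :in $ ?email
    :where [?e :user/email ?email]])

;; Two emails, two different query forms
(not= (query-for-email "a@example.com")
      (query-for-email "b@example.com"))

;; Usage, assuming (require '[datomic.api :as d]) and a db value:
;; (d/q user-by-email-query db "a@example.com")
```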


hear you on the ease of play thing. immutability does have its downsides :)


i’ve just checked, we have 500+ invocations of d/q in our projects


and we’ve done ok with perf testing each one as we go


ordering clauses such that :in values are handled early and :find values late, and then testing swapping things around in the middle


on network traffic, you can stick memcached in the middle to get a big overall read perf boost


didn’t get the idea about “immutability” and “data locality” (in terms of “immutability downsides”). they’re orthogonal concepts to me


@kachayev: right, a query planner may be helpful in bigger projects. I once heard from Rich that he likes the control one has when there is no query planner. I think he was bitten by some SQL query planner in the past. If he still thinks the same and paying customers do not complain a lot, do not expect a query planner very soon.


@kachayev: I expect data locality is also not easy to track down inside say Oracle accessing files in a SAN. Other than that there is better tooling around.


I wouldn’t call it a “complaint”, just curiosity. I understand that most modern databases don’t provide any tooling for this either, so it’s not a “must-have” and definitely not a “deal-breaker”.


i was talking more to the busy-work of having to recreate databases with new schema etc to test different setups


Is the Datomic documentation available offline (as pdf, dash docset, repo)?


@akiel: note: not all segments are cached, log segments aren't, neither are things from the gc


@tcrayford: what do you mean by gc?


the "list of segments to gc" is stored in storage somewhere. When you run d/gc-storage it has to query that stuff and it's not cached


ah ok - so this is only a maintenance thing