This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
- # beginners (3)
- # boot (8)
- # cljs-dev (10)
- # clojure (87)
- # clojure-art (6)
- # clojure-dev (13)
- # clojure-japan (8)
- # clojure-russia (60)
- # clojure-sg (2)
- # clojurescript (126)
- # clojurewerkz (1)
- # core-logic (10)
- # cursive (6)
- # datomic (30)
- # editors (10)
- # ldnclj (7)
- # off-topic (114)
- # onyx (7)
- # re-frame (7)
- # reagent (37)
I wonder if it’s possible to know in advance how many segments will be fetched from storage during query execution?
I mean, I understand that it depends on the query itself, the schema, the data that was already fetched, etc.
i saw someone had made a library to attempt to count datoms at each clause, but that was before all the new stuff was added
from reading the questions the author was asking and the sort of answers he was getting from Cognitect, it ran straight into Datomic’s secret sauce and no real progress was made
anyone using the pull spec have a simple way to flatten the result such that all nested maps are merged with the root map?
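(For the question above: a minimal sketch of one way to do it, assuming a single level of nesting and a pull result shaped like the hypothetical example below; the helper name and attributes are made up.)

```clojure
;; Hypothetical helper: lift the keys of any nested map up into the root map.
;; Only handles one level of nesting; repeat or recurse for deeper pulls.
(defn flatten-entity [m]
  (reduce-kv (fn [acc k v]
               (if (map? v)
                 (merge acc v)      ; merge nested map's entries into the root
                 (assoc acc k v)))  ; keep scalar values as-is
             {}
             m))

(flatten-entity {:artist/name    "Bowie"
                 :artist/country {:country/name "UK"}})
;; => {:artist/name "Bowie", :country/name "UK"}
```

Note this drops the nesting key itself (`:artist/country` above), so key collisions between the root and nested maps would silently overwrite.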
@kachayev: i think the best thing you can do is monitor your storage. as far as i know, index segments have a fixed size in kilobytes, so the shape of your data also influences how many segments a query needs. the peer also caches all segments, so running a reasonably small query again should not reach out to storage twice. in queries, it’s important to order the clauses by the number of datoms they bind. by putting the most specific clause first, you will get the best performance.
the problem is that it’s hard to keep track of “order the clauses by the number of datoms” when you have many queries
it’s also hard to “play” with data locality - it just takes too much time: change the schema, load everything, run a lot of queries, then analyze charts of network consumption & storage performance (and those charts are usually not that easy to read)
dynamically built queries are not such a good idea. d/q caches the preparatory work it does for its first param, so it’s better to have fixed queries with dynamic :in values
ordering clauses such that :in values are handled early and :find values late, and then testing by swapping things around in the middle
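(A sketch of the two pieces of advice above, using a made-up `:person/email` / `:person/name` schema. The query itself is a constant, so d/q can cache its analysis; the dynamic part comes in through `:in`, and the most selective clause comes first.)

```clojure
;; Fixed query as a constant; only the :in bindings vary per call.
(def user-name-by-email
  '[:find ?name
    :in $ ?email                    ; dynamic input, bound up front
    :where
    [?e :person/email ?email]       ; most selective clause first
    [?e :person/name  ?name]])      ; :find variable bound last

;; Usage (conn is an existing Datomic connection):
;; (d/q user-name-by-email (d/db conn) "someone@example.com")
```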
on network traffic, you can stick memcached in the middle to get a big overall read perf boost
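(For reference, a hedged sketch of how memcached is typically wired in; double-check the exact property names against the Datomic documentation for your version.)

```shell
# On each peer JVM, point the system property at your memcached cluster:
java -Ddatomic.memcachedServers=m1.example.com:11211,m2.example.com:11211 ...

# The transactor can use the same cluster via its properties file:
#   memcached=m1.example.com:11211,m2.example.com:11211
```

Segments are immutable, so the cache never needs invalidation; a shared memcached tier lets all peers benefit from each other’s reads.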
didn’t get the point about “immutability” and “data locality” (framed as an “immutability downside”). they’re orthogonal concepts, as far as I can tell
@kachayev: right, a query planner may be helpful in bigger projects. I once heard from Rich that he likes the control one has when there is no query planner - I think he was bitten by some SQL query planner in the past. If he still thinks the same and paying customers don’t complain a lot, don’t expect a query planner any time soon.
@kachayev: I expect data locality is also not easy to track down inside, say, Oracle accessing files on a SAN. Other than that, there is better tooling around.
I can’t say it’s a “complaint” exactly, just curiosity. I understand that most modern databases don’t provide tooling for this either, so it’s not a “must-have” and definitely not a “deal-breaker”.
i was talking more about the busy-work of having to recreate databases with a new schema etc. to test different setups
@akiel: note: not all segments are cached - log segments aren’t, nor are things related to the gc
the "list of segments to gc" is stored in storage somewhere. When you run
d/gc-storage it has to query that stuff and it's not cached
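(For context, `d/gc-storage` takes a connection and a cutoff instant; a minimal sketch, assuming an existing connection bound to `conn`:)

```clojure
(require '[datomic.api :as d])

;; Reclaim storage garbage older than roughly 30 days. As noted above,
;; this reads the (uncached) gc bookkeeping from storage, so expect
;; real storage I/O when it runs.
(let [older-than (java.util.Date.
                   (- (System/currentTimeMillis)
                      (* 30 24 60 60 1000)))]
  (d/gc-storage conn older-than))
```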