This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-02-04
Channels
- # atom-editor (2)
- # babashka (39)
- # beginners (123)
- # calva (90)
- # cider (57)
- # clojure (103)
- # clojure-australia (13)
- # clojure-europe (38)
- # clojure-france (3)
- # clojure-italy (13)
- # clojure-nl (7)
- # clojure-norway (2)
- # clojure-seattle (1)
- # clojure-uk (22)
- # clojurescript (28)
- # conjure (20)
- # cursive (2)
- # datascript (1)
- # datomic (20)
- # depstar (23)
- # emacs (1)
- # events (2)
- # graphql (12)
- # honeysql (4)
- # jobs (2)
- # kaocha (2)
- # malli (14)
- # music (2)
- # off-topic (103)
- # pathom (8)
- # polylith (7)
- # quil (3)
- # reagent (7)
- # reitit (2)
- # remote-jobs (1)
- # shadow-cljs (55)
- # slack-help (4)
- # spacemacs (29)
- # xtdb (16)
what shared resources does open-db take up in particular? It seems to be a rocksdb snapshot, any idea how much that costs to keep around? I was thinking of just letting GC take care of old handles, because managing lifetimes in a dynamic lang, taking into account caching etc. is just too much for me
Hey @U797MAJ8M 👋 tl;dr is if you're not keen on closing it, it'd be worth using plain old `db` instead, as that doesn't open any of its own resources
`open-db` opens up the RocksDB snapshot, as you've said, and also an entity resolution cache, which caches the current versions of entities as of the DB timestamp. I'm not aware of anything in this cache that wouldn't get GC'd, though.
The RocksDB snapshot is a different matter - I'd have to look at the RocksDB Java source, but IIUC it's backed by a natively allocated object, so it won't get GC'd. The memory impact of this is pretty small - it just stores the latest version number from when the snapshot was taken - but (again IIUC) this number is then used to determine which files Rocks can compact away, i.e. if there are old snapshots open, I'm not sure it can compact files into higher levels of the LSM tree, so leaking them will affect query performance over time.
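For what it's worth, the deterministic-release pattern is just Clojure's `with-open` around `crux/open-db` - a minimal sketch, assuming a started `node` and `crux.api` aliased as `crux`:

```clojure
(require '[crux.api :as crux])

;; open-db returns a Closeable; with-open releases the native RocksDB
;; snapshot when the body exits, so old snapshots never pile up and
;; block compaction
(with-open [db (crux/open-db node)]
  (crux/entity db :test))

;; plain db opens no resources of its own, so nothing needs closing
(crux/entity (crux/db node) :test)
```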
hi :) I was thinking, inchoately, about doing something like keeping track of opened dbs, and if the tx basis is the same as one already existing in memory, reusing that one instead of making a trip to crux. Does that have any merit to it? Not sure how much it'd save
I didn't know about the entity resolution cache. I guess after that it's some sort of on-disk cache, and then the doc store past that?
It's just an in-memory cache - it only caches the mapping between entity-id and content-hash at that DB basis, the 'temporal resolution' of each entity. Doesn't sound like much, but we end up using this mapping quite a lot throughout the query engine.
And I don't know, I'm afraid - it'd depend quite a lot on your use case. The Rocks snapshot and the cache are pretty cheap to create, certainly - the questions would be around how often you re-use the same tx-basis, and how frequently you access the same entities in those queries. As always, best to measure 🙂
thanks for the info! real simple criterium for anyone curious:
```clojure
(put {:crux.db/id :test :foo 1 :bar "1"})
(put {:crux.db/id :test2 :foo 2 :bar "2"})

(do (println "open-db")
    (criterium.core/quick-bench
      (let [db (crux/open-db node)]
        (crux/entity db :test)
        (crux/entity db :test2))))

(do (println "db")
    (criterium.core/quick-bench
      (do (crux/entity (db) :test)
          (crux/entity (db) :test2))))
```

```
open-db
Evaluation count : 10422 in 6 samples of 1737 calls.
             Execution time mean : 57.281102 µs
    Execution time std-deviation : 399.468510 ns
   Execution time lower quantile : 57.014872 µs ( 2.5%)
   Execution time upper quantile : 57.946446 µs (97.5%)
                   Overhead used : 1.604431 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers

db
Evaluation count : 7656 in 6 samples of 1276 calls.
             Execution time mean : 75.881244 µs
    Execution time std-deviation : 1.356425 µs
   Execution time lower quantile : 74.723975 µs ( 2.5%)
   Execution time upper quantile : 78.143538 µs (97.5%)
                   Overhead used : 1.604431 ns

Found 1 outliers in 6 samples (16.6667 %)
	low-severe	 1 (16.6667 %)
 Variance from outliers : 13.8889 % Variance is moderately inflated by outliers
```
validity of microbenchmarks notwithstanding, it seems like avoiding a full call to (crux/db node) pays off quite easily
and with the cache:
```clojure
(def db-cache (cc/lru-cache-factory {} :threshold 1000))

(defn db
  ([] (crux/db node))
  ([valid-time-or-basis]
   (cc/lookup-or-miss db-cache valid-time-or-basis
                      (fn [_] (crux/db node valid-time-or-basis)))))

(do (println "db-cached")
    (criterium.core/quick-bench
      (let [date (tick/inst)]
        (do (crux/entity (db date) :test)
            (crux/entity (db date) :test2)))))
```

```
db-cached
Evaluation count : 12114 in 6 samples of 2019 calls.
             Execution time mean : 50.338854 µs
    Execution time std-deviation : 415.897755 ns
   Execution time lower quantile : 50.013153 µs ( 2.5%)
   Execution time upper quantile : 50.958765 µs (97.5%)
                   Overhead used : 1.604431 ns
```
not sure if it's better to cache a `db` or an `open-db`, but at least I'm not thinking about lifetimes anymore
thinking about it more, if you wanted to do some deferred computation that makes db calls in general, you would want to explicitly capture and cache the tx basis alongside it. Actually a very interesting feature of crux that you can do that at all.
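A rough sketch of that basis-pinning idea, assuming a started `node`, `crux.api` aliased as `crux`, and the valid-time/tx-time arity of `db` (`latest-completed-tx` is part of `crux.api`):

```clojure
;; capture the tx basis at the moment the deferred work is created
(let [{:crux.tx/keys [tx-time]} (crux/latest-completed-tx node)
      report (delay
               ;; pin both valid-time and tx-time to the captured basis,
               ;; so the computation sees the same snapshot no matter
               ;; how much later it actually runs
               (crux/entity (crux/db node tx-time tx-time) :test))]
  ;; transactions submitted between here and the deref don't affect the result
  @report)
```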
Is there a way to search all documents that had a certain attribute/value, including the ones that were deleted, throughout all history (not in a specific point in time)?
Hi @U01DZ6JEPH6 we don't have an index or API that specifically supports such a query today, but we hinted in a recent blog post that more advanced temporal queries are on our roadmap: https://opencrux.com/blog/dev-diary-jan-21.html#_future Can you describe the domain of data you're working with, and what you would need such queries for? It would be very helpful for our designs to learn more about any such use-cases. For now, it would probably be better to handle such information with regular Date values.
That's great news! My use case would be accidental deletions. Suppose you have a 1-to-many relationship, so it makes sense to store the "1" ref on the many side. Like:

```clojure
{:crux.db/id 123 :user/name "User"}
{:crux.db/id 456 :doc/title "My Doc" :doc/owner 123}
```

And the user has lots of docs. If he accidentally deletes a doc, I would like there to be functionality like "Recover deleted docs". I know I can keep the many refs somewhere or implement it in some other way, but I think that since I have an immutable DB, I shouldn't keep a :deleted attribute on "deleted" docs. Thank you for helping out!
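For the single-document "undelete" case, where the doc id is still known, `crux.api/entity-history` already gets close - a sketch using doc `456` from the example above:

```clojure
;; deleted versions appear in the history with a nil :crux.db/doc,
;; so the newest non-nil doc is the version to restore
(->> (crux/entity-history (crux/db node) 456 :desc {:with-docs? true})
     (keep :crux.db/doc)
     first)
```

This still requires knowing the id, so it doesn't cover "search all deleted docs by attribute" - that part would need the more advanced temporal queries on the roadmap.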