#datahike
2021-11-02
Björn Ebbinghaus 13:11:54

Is there a recommended storage backend? So far I have always used the file backend. Are there benchmarks comparing the different backends?
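A rough latency comparison between backends can also be made locally by timing the same read against databases on each backend. A single `time` call is sensitive to JIT warm-up and caching, so the sketch below (plain Clojure only) runs the function a few times untimed and then reports the median of repeated timed runs. `median-ms` is a hypothetical helper, not part of Datahike; a library such as Criterium (`criterium.core/quick-bench`) does this more rigorously. Lazy sequences returned by the function are fully realized inside the timed region, so lazily deferred work is not silently left out.

```clojure
(defn median-ms
  "Hypothetical helper. Calls `f` `warmup` times without timing, then `n`
  times (n >= 1) with timing, and returns the median wall-clock time in
  milliseconds (the upper median for even n). If `f` returns a lazy
  sequence, it is fully realized inside the timed region."
  [n warmup f]
  (letfn [(run []
            (let [r (f)]
              ;; realize lazy results so their work is measured too
              (when (seq? r) (dorun r))))]
    (dotimes [_ warmup] (run))
    (let [samples (sort (for [_ (range n)]
                          (let [start (System/nanoTime)]
                            (run)
                            (/ (- (System/nanoTime) start) 1e6))))]
      (nth samples (quot n 2)))))
```

Usage would look like `(median-ms 50 10 #(run-read db))`, where `run-read` stands for whatever query, pull or entity navigation is being compared, run once per backend with otherwise identical data.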

Jack Park 21:11:38

I have done a few experiments with Postgres 13, and it does seem to work fine. I'm not making recommendations, just pointing out that it's possible.
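For reference, in Datahike the storage backend is chosen by the `:store` map of the database config. The maps below are a sketch: the key names are assumed from the Datahike 0.4 and datahike-jdbc configuration format and should be checked against the version in use, and the path, host and credentials are placeholders. The JDBC (Postgres) backend lives in the separate datahike-jdbc library, which has to be on the classpath and loaded before such a config is used.

```clojure
;; Sketch of Datahike :store configs (key names assumed, values are placeholders).

;; File backend: index fragments are stored as files below :path.
(def file-cfg {:store {:backend :file
                       :path    "/tmp/datahike-example"}})

;; In-memory backend: nothing is persisted; useful for tests.
(def mem-cfg {:store {:backend :mem
                      :id      "example"}})

;; PostgreSQL through the separate datahike-jdbc library (assumed on the classpath).
(def pg-cfg {:store {:backend  :jdbc
                     :dbtype   "postgresql"
                     :host     "localhost"
                     :port     5432
                     :user     "datahike"
                     :password "secret"
                     :dbname   "datahike"}})

;; Usage, with datahike.api aliased as d:
;; (d/create-database pg-cfg)
;; (def conn (d/connect pg-cfg))
```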

kkuehne 07:11:06

Hi, since we rewrote the backend, I have started on a notebook for benchmarks inside Datahike. As for storage in general, I would not recommend using the file backend for really large databases, because it creates a new file for each index fragment, which can lead to folders with millions of files.
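To see how many files a file-backend store has accumulated, one can count the regular files below its configured `:path`. `count-store-files` is a hypothetical helper built only on `clojure.java.io` and `file-seq`; it is not part of Datahike.

```clojure
(require '[clojure.java.io :as io])

(defn count-store-files
  "Hypothetical helper (not part of Datahike): counts the regular files
  below `path`, e.g. the :path of a file-backend store. Directories
  themselves are not counted."
  [path]
  (->> (file-seq (io/file path))          ; walks the whole directory tree
       (filter #(.isFile ^java.io.File %)) ; keep regular files only
       count))
```

For example, `(count-store-files "/tmp/datahike-example")`, with the path standing for the store's own `:path`. On a store with millions of fragments this walk is itself slow, which is the same effect kkuehne describes.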

Björn Ebbinghaus 12:11:30

@UB95JRKM3 I don't have a large database (thousands of datoms), and I don't expect it to grow beyond tens of thousands. I am mostly interested in latency and, in the future, the ability to have multiple services running on the same database. Any recommendations for this case?

A few weeks ago I benchmarked some of my code that sorts the entities of a to-many attribute by their sub-entities. There were 80-150 entities involved (15 entities to sort, each with 5-10 sub-entities to sort by). Using d/entity and then plain Clojure functions took 60-70 ms. Touching the entity first, or pulling, took about 34 ms. Getting the needed values with a query and then sorting them took 5-6 ms.

This made me aware of how much time Datahike needs to get data from disk, and it made me rethink my use of entities. I had thought entities were the way to go for me, because the code involved is much simpler: it is just normal Clojure. If you are interested in what I did specifically:

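;; Assumes an open Datahike connection `conn` and that the aliases
;; process, proposal and opinion refer to the application's own namespaces.
(require '[datahike.api :as d])

(def db (d/db conn))

(defn- newest-first [proposal]
  (- (:db/id proposal)))

(defn approvals [proposal]
  (reduce + 0 (map ::opinion/value (::proposal/opinions proposal))))

(defn ranked-proposals
  "Return an ordered list of proposals, highest approval first and,
  among ties, newest (highest :db/id) first."
  [process]
  (let [proposals (::process/proposals process)]
    ;; Negate the approvals instead of reversing the result: reversing
    ;; would also undo newest-first and put the oldest ties first.
    (sort-by (juxt (comp - approvals) newest-first) proposals)))

;; 60-70ms: lazy entity navigation, attributes are read during the sort
(time (ranked-proposals (d/entity db [::process/slug "der-naechste-schritt"])))

;; ~34ms: one pull fetches everything the sort needs
(time (ranked-proposals (d/pull db [{::process/proposals [:db/id {::proposal/opinions [::opinion/value]}]}]
                          [::process/slug "der-naechste-schritt"])))

;; ~5-6ms: the query aggregates, Clojure only sorts the [id sum] tuples.
;; Note: it counts only opinions with value +1, so it matches `approvals`
;; only if opinion values are never negative.
;; :with ?dp keeps one row per opinion, so equal ?value rows are not
;; merged before sum is applied.
(time (mapv #(d/entity db (first %))              ; mapv, so the result is realized inside `time`
        (reverse (sort-by (juxt second first)      ; by approval, then :db/id, both descending
                   (d/q '[:find ?proposal (sum ?value) :with ?dp
                          :in $ ?process
                          :where
                          [?process ::process/proposals ?proposal]
                          (or-join [?proposal ?dp ?value]
                            ;; one row per +1 opinion, contributing 1
                            (and
                              [?proposal ::proposal/opinions ?opinion]
                              [?opinion ::opinion/value +1]
                              [(identity ?opinion) ?dp]
                              [(ground 1) ?value])
                            ;; one row per proposal, contributing 0, so that
                            ;; proposals without any +1 opinion still appear
                            (and
                              [(identity ?proposal) ?dp]
                              [(ground 0) ?value]))]
                        db
                        [::process/slug "der-naechste-schritt"])))))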
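The ranking rule and the role of `:with ?dp` can both be tried without a database. The sketch below uses plain maps; `:opinions`, `:value`, `sample-proposals`, `approval-sum` and `rank` are hypothetical, simplified stand-ins for the namespaced attributes and functions above. It sorts by descending approval and, among ties, puts the highest `:db/id` (the newest entity) first. The last two lines illustrate why the query needs `:with ?dp`: Datalog find results are sets, so without it, several opinions that each contribute 1 to the same proposal would collapse into a single row before `sum` runs.

```clojure
;; Hypothetical, simplified data: plain keywords instead of the
;; namespaced attributes (::proposal/opinions, ::opinion/value).
(def sample-proposals
  [{:db/id 10 :opinions [{:value 1} {:value 1}]}
   {:db/id 11 :opinions [{:value 1} {:value -1}]}
   {:db/id 12 :opinions [{:value 1} {:value 1}]}
   {:db/id 13 :opinions []}])

(defn approval-sum
  "Sum of all opinion values of a proposal."
  [proposal]
  (reduce + 0 (map :value (:opinions proposal))))

(defn rank
  "Highest approval first; among ties, highest :db/id (newest) first."
  [proposals]
  (sort-by (juxt (comp - approval-sum) (comp - :db/id)) proposals))

(map :db/id (rank sample-proposals)) ; => (12 10 13 11)

;; Set semantics versus keeping duplicate rows apart, as :with does:
(reduce + (set [1 1 1])) ; => 1, three equal rows deduplicated
(reduce + [1 1 1])       ; => 3, one row per opinion
```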