This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-11-24
Channels
- # announcements (1)
- # asami (10)
- # aws (1)
- # babashka (1)
- # beginners (105)
- # cider (13)
- # cljsrn (6)
- # clojure (42)
- # clojure-australia (4)
- # clojure-dev (7)
- # clojure-europe (26)
- # clojure-nl (2)
- # clojure-uk (13)
- # clojurescript (19)
- # code-reviews (3)
- # conjure (18)
- # core-async (4)
- # core-matrix (5)
- # cryogen (3)
- # datomic (27)
- # depstar (21)
- # emacs (2)
- # figwheel-main (9)
- # fulcro (18)
- # helix (7)
- # jobs (3)
- # jobs-discuss (15)
- # juxt (7)
- # kaocha (4)
- # lambdaisland (2)
- # leiningen (11)
- # luminus (1)
- # malli (6)
- # meander (9)
- # minimallist (4)
- # mount (3)
- # off-topic (3)
- # pathom (8)
- # pedestal (28)
- # rdf (13)
- # re-frame (7)
- # reagent (5)
- # shadow-cljs (3)
@quoll This is an interesting benchmark. Do you have a reproducible version somewhere? We can compare the numbers with the hitchhiker-tree then. Was this with or without persisting to a durable medium?
I’ve been messing with it since 🙂 And there’s a bug that I’m tracking down, though I don’t think it shows up in the version that led to that benchmark.
The string pool does not store short strings. Instead, it encodes them directly into a negative number. Longer strings are stored in the pool itself.
A lot of the document was short strings, so it’s hard to say what the ratio of encoding into numbers vs storing was. To figure that out, I told it to store everything. And it turns out that there’s a bug when the stored string is shorter than the size of the data in a tree node. I’m debugging this now. 🙂
So it was a bit premature to be talking about that above, but I was really excited that it worked!
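The short-string trick described above can be sketched in a few lines. This is a toy illustration only, assuming a 7-byte payload packed into a long; `encode-short` and `decode-short` are hypothetical names, and Asami's actual encoding may differ.

```clojure
;; NOTE: encode-short/decode-short are hypothetical names; this is a toy
;; sketch of the technique, not Asami's real format.
(defn encode-short
  "Pack a string of up to 7 UTF-8 bytes into a negative long:
  the sign bit marks an inline value, bits 56-58 hold the length,
  and the string bytes fill the high end of the low 56 bits.
  Returns nil for longer strings, which would go to storage instead."
  [^String s]
  (let [bs (.getBytes s "UTF-8")
        n  (alength bs)]
    (when (<= n 7)
      (reduce (fn [acc i]
                (bit-or acc (bit-shift-left (bit-and 255 (long (aget bs i)))
                                            (* 8 (- 6 i)))))
              (bit-or Long/MIN_VALUE (bit-shift-left (long n) 56))
              (range n)))))

(defn decode-short
  "Recover the string packed by encode-short, or nil for a
  non-negative id (i.e. a reference into storage)."
  [^long code]
  (when (neg? code)
    (let [n  (bit-and 7 (unsigned-bit-shift-right code 56))
          bs (byte-array n)]
      (dotimes [i n]
        (aset-byte bs i (unchecked-byte
                         (unsigned-bit-shift-right code (* 8 (- 6 i))))))
      (String. bs "UTF-8"))))

(decode-short (encode-short "cat"))   ;; => "cat"
(encode-short "pride and prejudice")  ;; => nil (too long; would go to storage)
```

The payoff is that looking up a short string never touches the tree or the disk: the id *is* the data.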
It’s possible to reproduce what I was talking about, though it’s tricky:
• checkout the “storage” branch
• checkout the working commit: 0b6031f
• execute the code in the attached thread…
(require '[clojure.string :as s])
(require '[asami.durable.common :refer :all])
(require '[asami.durable.pool :refer [create-pool]])
;; read a book
(def book (slurp "resources/pride_and_prejudice.txt"))
(def words (s/split book #" "))
;; create a data pool
(def pool (create-pool "book"))
;; load words into the pool, accumulating the numbers the pool assigns to the words
(def coded-pool
  (time
    (reduce (fn [[ids p] w]
              (let [[id p'] (write! p w)]
                [(conj ids id) p']))
            [[] pool]
            words)))
(def coded (first coded-pool))
(def bpool (second coded-pool))
;; transactions are handled outside of the pool, so get transaction point for later
(def root (:root-id bpool))
;; coded now contains numbers for every word in the document
(count coded)
(take 10 coded)
;; ask the data pool for the data associated with each number
(def output-words (map #(find-object bpool %) coded))
(time (count output-words))
;; 3.1s 3.4s
(= words output-words)
;; truncate the files to just what is in use
(close bpool)
;; come back and reopen the file. Use the transaction point from above
(def pool2 (create-pool "book" root))  ;; root was 68 in this run
;; ask the data pool again for the data associated with each number in coded
(def output-words2 (map #(find-object pool2 %) coded))
;; is it all still available?
(count output-words2)
(= words output-words2)