This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-05-19
I've been poking at the asami database locally and reading the dev docs. I think I was able to get the index, but I wasn't able to figure out how to seek to a particular spot and read tuples.
on-disk
OK, from a Database (found via (asami.core/db the-connection)), you get the graph via (asami.storage/graph the-db).
That’s an instance of asami.durable.graph/BlockGraph
This has 4 indexes in it (I’m thinking of reducing it to 3):
spot
post
ospt
tspo
You’re probably more familiar with the names:
eavt
avet
veat
teav
These indexes can return all statements… as a quad of numbers.
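A minimal sketch of what the four orderings mean, assuming a statement is a plain [s p o t] vector of IDs. The `index-orders` map and `index-key` function are hypothetical illustrations, not Asami code. They only show how the same quad is rearranged into each index's key order.

```clojure
;; Hypothetical illustration: for each index, the positions of the
;; [s p o t] statement that make up its key, in order.
(def index-orders
  {:spot [0 1 2 3]   ; eavt
   :post [1 2 0 3]   ; avet
   :ospt [2 0 1 3]   ; veat
   :tspo [3 0 1 2]}) ; teav

(defn index-key
  "Rearranges an [s p o t] statement into the key order of index `idx`."
  [idx statement]
  (mapv statement (index-orders idx)))

(index-key :post [10 20 30 1])
;; => [20 30 10 1]
```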
The numbers get converted into the actual things via another field in the BlockGraph called pool.
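As a toy illustration of that conversion, here a plain Clojure map stands in for the pool (`id->value` and `resolve-statement` are hypothetical names, not Asami's API). Turning the numbers of a statement back into values is just one lookup per position:

```clojure
;; Hypothetical stand-in for the pool's id -> value direction.
(def id->value {1 :person/name, 2 "Alice", 3 :node-1})

(defn resolve-statement
  "Replaces each ID in an [s p o] statement with the value it stands for."
  [statement]
  (mapv id->value statement))

(resolve-statement [3 1 2])
;; => [:node-1 :person/name "Alice"]
```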
Anyway… each of spot, post, and ospt is an instance of asami.durable.tuples/TupleIndex (the teav index is just a flat array).
Iterating through any of the TupleIndex objects is usually done with the function asami.durable.common/find-tuples. This just calls https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/durable/tuples.cljc#L466
oh, cool
It calls find-coord, which will return either a single map that contains a coordinate of the thing looked for, or a vector of 2 maps, indicating a pair of coordinates (immediately before and immediately after the thing being looked for).
So if the result satisfies map?, then it’s a matching value, and it starts iterating through the index.
The coord is a map of {:node nnn :pos ppp}
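A simplified sketch of that find-coord contract over a single sorted vector of full 4-tuples, with no tree and no blocks (`find-pos` is a hypothetical name). It returns a map when the tuple is present, otherwise a pair of positions immediately before and after where it would sit. Clojure's `compare` is lexicographic only for vectors of equal length, which is why the sketch sticks to complete tuples.

```clojure
(defn find-pos
  "Binary search in the sorted vector `tuples`. Returns {:pos i} on an exact
  match, otherwise [{:pos before} {:pos after}] around the insertion point
  (before is -1 at the start, after is (count tuples) at the end)."
  [tuples target]
  (loop [lo 0
         hi (dec (count tuples))]
    (if (> lo hi)
      ;; lo is now the insertion point
      [{:pos (dec lo)} {:pos lo}]
      (let [mid (quot (+ lo hi) 2)
            c (compare (tuples mid) target)]
        (cond
          (zero? c) {:pos mid}
          (neg? c) (recur (inc mid) hi)
          :else (recur lo (dec mid)))))))
```

Checking `(map? result)` then separates an exact hit from a miss, as described above.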
Where nnn is the node number in the tree, and ppp is the position in the block associated with the node. If you’ve read the structure description, then you know that each node has a reference to a block in another file, and the block has between 0 and 512 tuples in it.
The function tuple-seq will return a seq with all of the tuples in it. It works by iterating through all of the tuples in a block, and when it gets to the end of the block, the node reference moves to the next node in the tree, and the block position gets reset to 0.
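A hypothetical in-memory version of that walk, where each node is simply a vector (its block) of tuples: start at `pos` in block number `node`, and when a block runs out, move to the next node and reset the position to 0. Empty blocks are skipped along the way. `walk-blocks` is an illustrative name, not an Asami function.

```clojure
(defn walk-blocks
  "Lazy seq of the entries in `blocks` (a vector of vectors), starting at
  position `pos` of block number `node`."
  [blocks node pos]
  (lazy-seq
    (when (< node (count blocks))
      (let [block (blocks node)]
        (if (< pos (count block))
          (cons (block pos) (walk-blocks blocks node (inc pos)))
          ;; end of this block: move to the next node, position reset to 0
          (walk-blocks blocks (inc node) 0))))))

(walk-blocks [[:a :b] [] [:c]] 0 1)
;; => (:b :c)
```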
The seq finishes when it encounters the first tuple that doesn’t match the provided search criteria
So if I look for [100 200]
The first few tuples returned might be ([100 200 1 5] [100 200 2 5] [100 200 3 5] [100 200 4 6] …)
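The stopping rule is a prefix test. Assuming an ordered seq of 4-tuples, a minimal sketch with hypothetical `matches-prefix?` and `prefix-seq` helpers built on `take-while`:

```clojure
(defn matches-prefix?
  "True when the first (count prefix) elements of tuple equal prefix."
  [prefix tuple]
  (= prefix (subvec tuple 0 (count prefix))))

(defn prefix-seq
  "Tuples from an ordered seq, stopping at the first one that doesn't match."
  [prefix tuples]
  (take-while #(matches-prefix? prefix %) tuples))

(prefix-seq [100 200] [[100 200 1 5] [100 200 2 5] [100 200 3 5]
                       [100 200 4 6] [100 201 1 5]])
;; => ([100 200 1 5] [100 200 2 5] [100 200 3 5] [100 200 4 6])
```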
when I call asami.durable.common/find-tuples, the second argument is the tuple to start the seq from. It seems like it's expecting a vector of numbers?
so if I have the :post index, I need a way to convert [:my/attribute "my-value"] into the tuple format?
Take the pool value from the graph, and call (asami.durable.common/find-id pool :my/attribute). You’ll get back a number.
Same for the value. find-id will return a number for any value, or nil if the thing hasn’t been stored.
If you look at resolve-triple in the BlockGraph, you’ll see that it implements the search for patterns like [?e :has-value ?v]. It does this by calling get-id on each element of the provided vector. That’s just a small wrapper function that ignores symbols, or else calls find-id on the argument.
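A sketch of the get-id idea just described, with a plain map standing in for the pool's find-id (all names here are hypothetical). Symbols such as ?e are left alone as query variables. Everything else is looked up, which yields nil for values that were never stored.

```clojure
;; Hypothetical stand-in for the pool's value -> id lookup.
(def value->id {:has-value 7, "x" 8})

(defn get-id*
  "Leaves query variables (symbols) untouched; otherwise looks up the ID."
  [v]
  (if (symbol? v) v (value->id v)))

(defn pattern->ids
  "Converts every element of a pattern such as [?e :has-value ?v]."
  [pattern]
  (mapv get-id* pattern))

(pattern->ids '[?e :has-value ?v])
;; => [?e 7 ?v]
```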
So if I wanted to find tuples in an index that come after something like [:my/attribute "asami-"], since "asami-" isn't in the db yet, how would I get an ID for it from find-id?
Do I have to DataStorage/write! it?
Hmmm… You’d have to find the point immediately after it, which means either writing that value or calling the internal functions.
But… well… it doesn’t return statements in lexical order. It returns them in ID order, and the IDs are based on the insertion order
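To see why, here is a toy allocator (hypothetical, not Asami's) that hands out IDs in insertion order. Sorting by ID then follows the order in which values were written, not their lexical order.

```clojure
(defn assign-ids
  "Gives each distinct value the next ID, in insertion order."
  [values]
  (reduce (fn [m v] (if (contains? m v) m (assoc m v (inc (count m)))))
          {}
          values))

(let [ids (assign-ids ["zebra" "apple" "mango"])]
  [(sort-by ids (keys ids))   ; ID order
   (sort (keys ids))])        ; lexical order
;; => [("zebra" "apple" "mango") ("apple" "mango" "zebra")]
```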
Yea, I noticed find-id returns non-nil for some novel values
and I think numbers too 😄
Yea, I didn't realize the indices were sorted by pool id and not by value.
Oh, and the pool ids that don't need to be in the index are the negative ones?
Wow, I feel like I just learned a lot. Thanks for your help
Objects can be encapsulated in a 64-bit long if they are:
• a string 7 bytes long, or fewer
• a keyword, 7 bytes long, or fewer
• Longs that can be stored in 60 bits (negative or positive)
• Instants or Dates, where the ms-since-epoch values can fit in a long
If they’re encapsulated, the top bit will be set (hence the negative numbers).
Note that strings are 7 bytes or fewer. It uses UTF-8. If you have a string like “hello🙂” then it won’t fit. But “hi🙂” does fit.
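A small sketch of the size rules above, under stated assumptions: `utf8-length`, `inline-string?`, `inline-long?` and `tag-encapsulated` are hypothetical helpers, the 60-bit rule is read as a signed range from -2^59 to 2^59 - 1, and Asami's actual bit layout for encapsulated values is not reproduced. `tag-encapsulated` only shows why setting the top bit yields a negative long. An emoji such as 🙂 takes 4 bytes in UTF-8, so "hello🙂" is 9 bytes and "hi🙂" is 6.

```clojure
(defn utf8-length
  "Number of bytes in the UTF-8 encoding of s."
  [^String s]
  (alength (.getBytes s "UTF-8")))

(defn inline-string?
  "True if s is 7 bytes or fewer in UTF-8."
  [s]
  (<= (utf8-length s) 7))

(def limit-60 (bit-shift-left 1 59)) ; 2^59

(defn inline-long?
  "True if n fits in a signed 60-bit range: -2^59 <= n < 2^59."
  [n]
  (and (<= (- limit-60) n) (< n limit-60)))

(defn tag-encapsulated
  "Sets the top bit of a 64-bit payload, which makes the long negative.
  (Illustration only; not Asami's real encoding.)"
  [payload]
  (bit-or Long/MIN_VALUE payload))
```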
But for stored objects (like strings, keywords, and URIs) then they’re all stored lexically
It’s a bit harder to work with, since it doesn’t expect you to iterate through this structure
The function at https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/durable/pool.cljc#L63-L68 converts the thing you’re looking for into a sequence of bytes (actually, 2 sequences; the first part is all that’s stored in a tree node). Then it uses this to find the tree node (line 66). The first part of the bytes is used for searching through the tree of nodes, but as soon as it finds a complete match, it needs to check whether the remaining string is less-than/equal/greater-than, so it has to look at the rest of the bytes.
That find-node function returns the node. You could use this to create a node seq, and iterate through the tree from that point.
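A rough sketch of that two-part comparison, assuming that a node stores only a fixed-length byte prefix (the 8-byte `prefix-len` is made up, and `compare-stored` is a hypothetical name, not the pool's code). Prefixes are compared first, and only on a tie are the remaining bytes compared. Both steps treat bytes as unsigned, so the overall result is plain lexicographic order of the UTF-8 encodings.

```clojure
(def prefix-len 8) ; hypothetical size of the part stored in a tree node

(defn compare-bytes
  "Lexicographic comparison of two byte seqs, treating bytes as unsigned."
  [a b]
  (let [c (->> (map #(compare (bit-and %1 0xff) (bit-and %2 0xff)) a b)
               (drop-while zero?)
               first)]
    (or c (compare (count a) (count b)))))

(defn compare-stored
  "Compares two strings by node prefix first, then by the remaining bytes."
  [^String x ^String y]
  (let [[hx tx] (split-at prefix-len (vec (.getBytes x "UTF-8")))
        [hy ty] (split-at prefix-len (vec (.getBytes y "UTF-8")))
        c (compare-bytes hx hy)]
    (if (zero? c)
      (compare-bytes tx ty) ; complete prefix match: check the rest
      c)))
```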
So, in theory, if I locally turned off the small-value optimization, I could scan by value (at the cost of extra space and slower operations)?
I'm not sure I'll do that, but just trying to make sure I'm following along
I’ve considered storing all strings (and keywords, and URIs) in a Patricia tree on disk instead. That would take a bit of work, and be a little slower to read, but might be interesting, because it would allow for fast regexes.
I've worked with databases before, but haven't poked too much at the internals. It's very cool stuff
For instance, that ospt index is fast to read, but it’s expensive to write to, and it’s not read often enough to justify having it. I need to remove it and emulate it with the other indices.
For whatever reason, I was just happy to see that asami didn't use nippy to serialize values
To me, it was just an indication that asami cares about my data enough to own the serialization format.
It's also really cool that asami works with native image and in the browser
Well, I have to update it to use promises if I want it to use local storage in the browser 😕
if I get that done, then I can use everything that exists now, and just create a new block manager
Because all these indexes are built on blocks, and they can be stored anywhere. Once the transaction that creates them is over, then they’re immutable
Which means that I can store blocks in any system, and they can be replicated without any concerns
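A minimal sketch of why immutability helps, assuming a hypothetical `BlockStore` protocol (not Asami's block manager API). If blocks are only ever written once and then read, any key/value backend can implement the same two operations, and replicating a block is just copying it under the same ID.

```clojure
(defprotocol BlockStore
  (put-block! [store block] "Stores an immutable block and returns its ID.")
  (fetch-block [store id] "Returns the block stored under id."))

;; In-memory backend: an atom holding a vector of blocks.
(defrecord MemoryStore [blocks]
  BlockStore
  (put-block! [_ block]
    ;; Append only: existing entries are never overwritten,
    ;; so the new block's ID is the vector's size before the append.
    (let [[old _] (swap-vals! blocks conj block)]
      (count old)))
  (fetch-block [_ id]
    (get @blocks id)))

(defn memory-store [] (->MemoryStore (atom [])))
```

Another backend (browser local storage, a remote object store) would only need to implement the same two functions.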

Just fyi, I was able to create a lazy seq that seeks to a specific spot in the index 🎉. It's just a slightly modified, poorly named version of the fns in asami.durable.tuples.
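```clojure
;; Assumes these forms are evaluated in the asami.durable.tuples namespace, where
;; find-coord, next-populated, tuple-at, get-count, get-block, get-block-ref and
;; the tree alias (asami.durable.tree) are already in scope.

(defn tuple-seq-infinite
  "Lazy seq of the tuples in the index, starting at position `offset` in the
  block of `node`. Unlike tuple-seq, it never compares against a search tuple,
  so it only ends when it runs out of nodes."
  [index blocks node offset]
  (when-let [node (next-populated index node)]
    (let [nodes (tree/node-seq index node)
          block-for (fn [n] (and n (get-block blocks (get-block-ref n))))
          nested-seq (fn nested-seq' [[n & ns :as all-ns] blk offs]
                       (when n
                         (let [t (tuple-at blk offs)
                               nxto (inc offs)
                               ;; At the end of a block, skip empty nodes and
                               ;; restart at position 0 of the next block.
                               [nnodes nblock noffset]
                               (if (= nxto (get-count n))
                                 (loop [[fns & rns :as alln] ns]
                                   (if (and fns (zero? (get-count fns)))
                                     (recur rns)
                                     [alln (block-for fns) 0]))
                                 [all-ns blk nxto])]
                           (cons t (lazy-seq (nested-seq' nnodes nblock noffset))))))]
      (nested-seq nodes (block-for node) offset))))

(defn seek-index
  "Lazy seq of tuples starting at `tuple`, or at the first tuple after the
  point where `tuple` would be if it isn't in the index."
  ([tuple-index tuple]
   (seek-index (:index tuple-index) (:blocks tuple-index) tuple))
  ([index blocks tuple]
   (let [coord (find-coord index blocks tuple)
         ;; find-coord returns a map for an exact match, otherwise the pair of
         ;; coords just before and just after; seek to the one after.
         coord (if (map? coord) coord (second coord))]
     (when (map? coord)
       (let [{:keys [node pos]} coord]
         (tuple-seq-infinite index blocks node pos))))))
```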
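For experimenting away from the on-disk structures, the same seek-then-scan behaviour can be mimicked with an in-memory sorted set. This is a hypothetical sketch, not Asami code. `lex-compare` orders tuples element by element so that a partial tuple like [100 200] sorts just before every tuple it prefixes (Clojure's built-in `compare` on vectors checks length first, so it can't be used directly). `subseq` then does the seek, and `take-while` applies the prefix cutoff that tuple-seq uses.

```clojure
(defn lex-compare
  "Element-by-element comparison; a prefix sorts before longer tuples."
  [a b]
  (loop [[x & xs :as a] (seq a)
         [y & ys :as b] (seq b)]
    (cond
      (and (nil? a) (nil? b)) 0
      (nil? a) -1
      (nil? b) 1
      :else (let [c (compare x y)]
              (if (zero? c) (recur xs ys) c)))))

(def sample-index
  (sorted-set-by lex-compare
                 [100 199 7 1] [100 200 1 5] [100 200 2 5] [100 201 1 5]))

(defn seek
  "All tuples at or after start, similar to seek-index."
  [idx start]
  (subseq idx >= start))

(defn seek-prefix
  "Only the tuples that start with prefix."
  [idx prefix]
  (take-while #(= prefix (subvec % 0 (count prefix))) (seek idx prefix)))

(seek-prefix sample-index [100 200])
;; => ([100 200 1 5] [100 200 2 5])
```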