
Not publicly, but it's easy to get. Ping me tomorrow and I can show you


I've been poking at the asami database locally and reading the dev docs. I think I was able to get the index, but I wasn't able to figure out how to seek to a particular spot and read tuples.


Are you referring to the on-disk index, or is it in memory?


OK, from a Database (found via (asami.core/db the-connection)), you get the graph, via ( the-db). That’s an instance of asami.durable.graph/BlockGraph. This has 4 indexes in it (I’m thinking of reducing it to 3): spot, post, ospt, tspo. You’re probably more familiar with the names: eavt, avet, veat, teav


These indexes can return all statements… as a quad of numbers. The numbers get converted into the actual things via another field in the BlockGraph called pool
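A toy illustration of that decoding step. A plain map stands in for the pool here (the real pool is a disk-backed structure), `toy-pool` and `decode-tuple` are made-up names, and treating only the first three positions as pool ids is an assumption made for the example:

```clojure
;; Hypothetical stand-in for the pool: id -> value.
(def toy-pool {1 :person/alice, 2 :name, 3 "Alice Liddell"})

(defn decode-tuple
  "Turns the numbers in the first three positions of a quad back into values.
  The fourth position is passed through untouched (an assumption for this sketch)."
  [pool [s p o t]]
  [(pool s) (pool p) (pool o) t])

(decode-tuple toy-pool [1 2 3 5])
```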


Anyway… each of spot, post, and ospt is an instance of asami.durable.tuples/TupleIndex (the teav index is just a flat array)


Iterating through any of the TupleIndex objects is usually done with the function asami.durable.common/find-tuples. This just calls


It calls find-coord, which will return either a single map that contains a coordinate of the thing looked for, or a vector of 2 maps, indicating a pair of coordinates (immediately before and immediately after the thing being looked for)


So if it matches map? then it’s a matching value, and it starts iterating through the index


The coord is a map of {:node nnn :pos ppp} Where the nnn value is the node number in the tree, and the ppp is the position in the block associated with the node. If you’ve read the structure description, then you know that each node has a reference to a block in another file, and the block has between 0 and 512 tuples in it
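A small sketch of how a caller might handle the two result shapes when it wants the first tuple at or after a key. `start-coord` is a hypothetical helper, not part of asami, and the coordinates are hand-written maps rather than real find-coord output:

```clojure
(defn start-coord
  "Picks a coordinate to start reading from, given a find-coord style result.
  A single map is an exact match. A vector of two maps brackets the spot where
  the tuple would have been, so the second map is the first tuple after it."
  [result]
  (if (map? result)
    result
    (second result)))

;; exact match: read from the match itself
(start-coord {:node 7 :pos 12})

;; no match: read from the coordinate immediately after the gap
(start-coord [{:node 7 :pos 12} {:node 7 :pos 13}])
```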


The function tuple-seq will return a seq with all of the tuples in it. It works by iterating through all of the tuples in a block, and when it gets to the end of the block, the node reference moves to the next node in the tree, and the block position gets reset to 0


The seq finishes when it encounters the first tuple that doesn’t match the provided search criteria


So if I look for [100 200] The first few tuples returned might be ([100 200 1 5] [100 200 2 5] [100 200 3 5] [100 200 4 6] …)


as soon as something doesn’t have a matching prefix of [100 200] then it stops
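The iteration described above can be modeled without any storage at all. In this sketch each node is reduced to just its block (a vector of sorted tuples), and `tuples-from`, `matching-prefix?` and `find-prefix` are invented names; it only mimics the behaviour described, not the real asami.durable.tuples code:

```clojure
;; Toy index: a vector of blocks, each block a vector of sorted tuples.
(def toy-blocks
  [[[100 199 9 4] [100 200 1 5]]
   []                                   ; empty blocks are skipped
   [[100 200 2 5] [100 200 3 5]]
   [[100 200 4 6] [100 201 1 6]]])

(defn tuples-from
  "Lazy seq of tuples starting at block index `node`, position `pos`.
  When a block runs out, moves to the next block and resets pos to 0."
  [blocks node pos]
  (lazy-seq
    (when (< node (count blocks))
      (let [block (nth blocks node)]
        (if (< pos (count block))
          (cons (nth block pos) (tuples-from blocks node (inc pos)))
          (tuples-from blocks (inc node) 0))))))

(defn matching-prefix?
  "True when the tuple starts with the given prefix."
  [prefix tuple]
  (= prefix (subvec tuple 0 (count prefix))))

(defn find-prefix
  "Reads from the coordinate until the first tuple that lacks the prefix."
  [blocks {:keys [node pos]} prefix]
  (take-while #(matching-prefix? prefix %) (tuples-from blocks node pos)))

(find-prefix toy-blocks {:node 0 :pos 1} [100 200])
```

Dropping the take-while step gives a seq that keeps going past the prefix, which is the difference between a lookup and a seek.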


when I call asami.durable.common/find-tuples, the second argument is the tuple to start from for the seq. It seems like it's expecting a vector of numbers?


Yes. That’s what you’re looking for


On disk, all tuples are numbers


so if I have the :post index, I need a way to convert [:my/attribute "my-value"] into the tuple format?


That’s what pool is for


is that DataStorage/find-id?


Take the pool value from the graph, and call (asami.durable.common/find-id pool :my/attribute). You’ll get back a number


Same for the value. find-id will return a number for any value, or nil if the thing hasn’t been stored


If you look at resolve-triple in the BlockGraph you’ll see that it implements the search for patterns like [?e :has-value ?v]. It does this by calling get-id on each element of the provided vector. That’s just a small wrapper function that ignores symbols, or else it calls find-id on the argument


Then it passes that along to get-from-index
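A rough sketch of that wrapper idea, using a plain map in place of the pool. `toy-ids` and `toy-get-id` are hypothetical names and the ids are made up; the real get-id works against the pool via find-id:

```clojure
;; Hypothetical stand-in for the pool lookup: value -> id, nil when absent.
(def toy-ids {:has-value 11, "some text" 42})

(defn toy-get-id
  "Leaves symbols (query variables such as ?e) alone and looks up anything else."
  [ids x]
  (if (symbol? x)
    x
    (get ids x)))

;; The pattern keeps its variables; only the concrete element becomes a number.
(mapv #(toy-get-id toy-ids %) '[?e :has-value ?v])
```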


So if I wanted to find tuples in an index that come after something like [:my/attribute "asami-"], since "asami-" isn't in the db yet, how would I get an id from find-id for it?


Do I have to DataStorage/write! it?


Hmmm…. You’d have to find the point immediately after. Which either needs you to write that, or you can call the internal functions


But… well… it doesn’t return statements in lexical order. It returns them in ID order, and the IDs are based on the insertion order


Mind you… the ID for "asami-" is calculated, rather than retrieved from storage


That’s because it’s short enough to fit into a 64-bit number


So… Asami storage may not do what you’re looking for?


Yea, I noticed find-id returns non-nil for some novel values


Yup. Short strings and keywords can be encoded as a number


and I think numbers too 😄


That way, the pool index doesn’t have to be hit unnecessarily


Yea, I didn't realize the indices were sorted by pool id and not by value.


Though, you CAN get a sequence of pool ids, and they’re in lexical order


But short strings aren’t in there


Oh, and the pool ids that don't need to be in the index are the negative ones?


Yes 🙂


Wow, I feel like I just learned a lot. Thanks for your help


Objects can be encapsulated in a 64-bit long if they are:
• a string, 7 bytes long or fewer
• a keyword, 7 bytes long or fewer
• Longs that can be stored in 60 bits (negative or positive)
• Instants or Dates, where the ms-since-epoch values can fit in a long
If they’re encapsulated, the top bit will be set (hence the negative numbers)


Note that strings are 7 bytes or fewer. It uses UTF-8. If you have a string like “hello:slightly_smiling_face:” then it won’t fit. But “hi:slightly_smiling_face:” does fit


Oh… wait. That fits


It can’t fit: “hello:woman-shrugging:”


Basically, I was looking for multi-byte chars, and simple_smile is only 2 bytes
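A quick way to check the byte limit from a JVM Clojure REPL. This is only a sketch of the 7-byte rule and the sign-bit observation above; `utf8-length`, `short-enough?` and `encapsulated-id?` are illustrative names, not asami functions, and the actual encoding logic is not reproduced here:

```clojure
(defn utf8-length
  "Number of bytes in the UTF-8 encoding of s (JVM only)."
  [^String s]
  (count (.getBytes s "UTF-8")))

(defn short-enough?
  "True when s is 7 UTF-8 bytes or fewer, the limit described above."
  [s]
  (<= (utf8-length s) 7))

(defn encapsulated-id?
  "With the top bit set, a 64-bit signed long reads as negative."
  [id]
  (neg? id))

(short-enough? "asami-")   ; 6 ASCII characters, 6 bytes
(short-enough? "日本語")    ; 3 characters, but 3 bytes each
(encapsulated-id? (bit-shift-left 1 63))
```

Character count and byte count only agree for ASCII, which is why a short-looking string with multi-byte characters can miss the cutoff.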


But stored objects (like strings, keywords, and URIs) are all stored lexically


It’s a bit harder to work with, since it doesn’t expect you to iterate through this structure


The* function converts the thing you’re looking for into a sequence of bytes (actually, 2 sequences. The first part is all that’s stored in a tree node). Then it uses this to find the tree node (line 66). The first part of the bytes is used for searching through the tree of nodes, but as soon as it finds a complete match, then it needs to check if the remaining string is less-than/equal/greater-than, so it has to look at the rest of the bytes.


That find-node function returns the node. You could use this to create a node seq, and iterate through the tree from that point.


So, in theory, if I locally turned off the small value optimization, I could scan by value (at the cost of using extra space and slowing things down)?




I'm not sure I'll do that, but just trying to make sure I'm following along


I’ve considered storing all strings (and keywords, and URIs) in a Patricia tree on disk instead. That would take a bit of work, and be a little slower to read, but might be interesting, because it would allow for fast regexs


I've worked with databases before, but haven't poked too much at the internals. It's very cool stuff


There are tradeoffs in all sorts of places 🙂


For instance, that ospt index is fast to read, but costs to write to it, and it’s not read enough to justify having it. I need to remove it and emulate it with other indices


For whatever reason, I was just happy to see that asami didn't use nippy to serialize values


I do suffer a bit from “not invented here”


To me, it was just an indication that asami cares about my data enough to own the serialization format.


It's also really cool that asami works with native image and in the browser


Well, I have to update it to use promises if I want it to use local storage in the browser 😕


Though, theoretically, promesa could give me that with minimal effort… I hope


if I get that done, then I can use everything that exists now, and just create a new block manager


I also want to create a block manager for Redis.


Because all these indexes are built on blocks, and they can be stored anywhere. Once the transaction that creates them is over, then they’re immutable


Which means that I can store blocks in any system, and they can be replicated without any concerns


I just need to get IDs for each one


Just fyi, I was able to create a lazy seq that seeks to a specific spot in the index 🎉 . It's just a slightly modified, poorly named version of the fns in asami.durable.tuples.

(defn tuple-seq-infinite
  [index blocks tuple node offset]
  (when-let [node (next-populated index node)]
    (let [nodes (tree/node-seq index node)
          block-for (fn [n] (and n (get-block blocks (get-block-ref n))))
          nested-seq (fn nested-seq' [[n & ns :as all-ns] blk offs]
                       (when n
                         (let [t (tuple-at blk offs)]
                           (let [nxto (inc offs)
                                 [nnodes nblock noffset] (if (= nxto (get-count n))
                                                           (loop [[fns & rns :as alln] ns]
                                                             (if (and fns (zero? (get-count fns)))
                                                               (recur rns)
                                                               [alln (block-for fns) 0]))
                                                           [all-ns blk nxto])]
                             (cons t (lazy-seq (nested-seq' nnodes nblock noffset)))))))]
      (nested-seq nodes (block-for node) offset))))

(defn seek-index
  ([tuple-index tuple]
   (seek-index (:index tuple-index)
               (:blocks tuple-index)
               tuple))
  ([index blocks tuple]
   (let [coord (find-coord index blocks tuple)
         coord (if (map? coord)
                 ;; exact match: start at the matching coordinate
                 coord
                 ;; no match: use node after where this tuple would be found as coord
                 (second coord))]
     (when (map? coord)
       (let [{:keys [node pos]} coord]
         (tuple-seq-infinite index blocks tuple node pos))))))