#asami
2022-05-19
quoll04:05:02

Not publicly, but it's easy to get. Ping me tomorrow and I can show you

gratitude 1
🙏 1
phronmophobic17:05:08

I've been poking at the asami database locally and reading the dev docs. I think I was able to get the index, but I wasn't able to figure out how to seek to a particular spot and read tuples.

quoll18:05:02

Are you referring to the on-disk index, or is it in memory?

quoll18:05:59

OK, from a Database (found via (asami.core/db the-connection)), you get the graph via (asami.storage/graph the-db). That’s an instance of asami.durable.graph/BlockGraph. This has 4 indexes in it (I’m thinking of reducing it to 3): spot, post, ospt, tspo. You’re probably more familiar with the names eavt, avet, veat, teav.

👍 1
quoll18:05:33

These indexes can return all statements… as a quad of numbers. The numbers get converted into the actual things via another field in the BlockGraph called pool

quoll18:05:42

Anyway… each of spot, post, and ospt is an instance of asami.durable.tuples/TupleIndex (the teav index is just a flat array)

quoll18:05:19

Iterating through any of the TupleIndex objects is usually done with the function asami.durable.common/find-tuples. This just calls https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/durable/tuples.cljc#L466

quoll18:05:09

It calls find-coord, which will return either a single map that contains a coordinate of the thing looked for, or a vector of 2 maps, indicating a pair of coordinates (immediately before and immediately after the thing being looked for)

quoll18:05:08

So if it matches map? then it’s a matching value, and it starts iterating through the index

quoll18:05:58

The coord is a map of {:node nnn :pos ppp}, where nnn is the node number in the tree, and ppp is the position in the block associated with the node. If you’ve read the structure description, then you know that each node has a reference to a block in another file, and the block has between 0 and 512 tuples in it
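
To make the two return shapes concrete, here is a minimal sketch over a plain sorted vector rather than Asami's tree of blocks. The name `find-coord-sketch`, the sample data, and the `:pos`-only maps are assumptions made for illustration; this is not Asami's implementation, and the real coordinates also carry a `:node`.

```clojure
;; Hypothetical sample data: tuples sorted as they would be in an index.
(def sample-tuples
  [[100 150 1 5]
   [100 200 1 5]
   [100 200 2 5]
   [100 300 1 6]])

(defn prefix-compare
  "Compares a search prefix with the leading elements of tuple t.
  Assumes the prefix is no longer than the tuple."
  [prefix t]
  (compare (vec prefix) (subvec t 0 (count prefix))))

(defn find-coord-sketch
  "Returns {:pos i} for the first tuple matching the prefix, or a pair
  [{:pos before} {:pos after}] when nothing matches. At the edges the
  pair may contain -1 or (count tuples)."
  [tuples prefix]
  (let [n (count tuples)
        ;; binary search for the first index whose tuple is >= the prefix
        idx (loop [lo 0 hi n]
              (if (< lo hi)
                (let [mid (quot (+ lo hi) 2)]
                  (if (pos? (prefix-compare prefix (nth tuples mid)))
                    (recur (inc mid) hi)
                    (recur lo mid)))
                lo))]
    (if (and (< idx n) (zero? (prefix-compare prefix (nth tuples idx))))
      {:pos idx}
      [{:pos (dec idx)} {:pos idx}])))

(find-coord-sketch sample-tuples [100 200]) ;=> {:pos 1}
(find-coord-sketch sample-tuples [100 250]) ;=> [{:pos 2} {:pos 3}]
```

A caller can then branch on `map?` to tell a hit from a miss, which is the check described in the messages above.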

quoll18:05:16

The function tuple-seq will return a seq with all of the tuples in it. It works by iterating through all of the tuples in a block, and when it gets to the end of the block, the node reference moves to the next node in the tree, and the block position gets reset to 0
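
The block-to-block walk can be sketched with plain vectors. In this model each "node" is simply an index into a vector of blocks, and `tuples-from` is a hypothetical name; the real code follows node references in a tree and reads blocks from a separate file.

```clojure
(defn tuples-from
  "Lazily yields every tuple from coordinate {:node n :pos p} onward.
  `blocks` is a vector of blocks, each block a vector of tuples."
  [blocks {:keys [node pos]}]
  (lazy-seq
    (when (< node (count blocks))
      (let [block (nth blocks node)]
        (if (< pos (count block))
          ;; still inside this block: emit the tuple and advance the position
          (cons (nth block pos)
                (tuples-from blocks {:node node :pos (inc pos)}))
          ;; end of the block: move to the next node and reset the position to 0
          (tuples-from blocks {:node (inc node) :pos 0}))))))

;; Hypothetical data: three blocks, the middle one empty.
(def sample-blocks
  [[[1 1 1 1] [1 2 1 1]]
   []
   [[2 1 1 1]]])

(tuples-from sample-blocks {:node 0 :pos 1})
;=> ([1 2 1 1] [2 1 1 1])
```

Empty blocks are skipped naturally here because the position check fails immediately and the walk moves on to the next node.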

quoll18:05:50

The seq finishes when it encounters the first tuple that doesn’t match the provided search criteria

quoll18:05:09

So if I look for [100 200], the first few tuples returned might be ([100 200 1 5] [100 200 2 5] [100 200 3 5] [100 200 4 6] …)

quoll18:05:34

as soon as something doesn’t have a matching prefix of [100 200] then it stops
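
The stop-on-first-mismatch behaviour can be modelled in a few lines. This is a sketch over an in-memory sorted vector with hypothetical names (`matches-prefix?`, `prefix-scan`); it finds the start with `drop-while` for brevity, whereas the real index seeks to it through the tree.

```clojure
(defn matches-prefix?
  "True when the tuple starts with every element of the prefix."
  [prefix tuple]
  (= prefix (take (count prefix) tuple)))

(defn prefix-scan
  "Yields the contiguous run of tuples that start with the prefix."
  [tuples prefix]
  (->> tuples
       (drop-while #(not (matches-prefix? prefix %)))
       (take-while #(matches-prefix? prefix %))))

;; Hypothetical sorted index contents.
(def sample-index
  [[100 100 9 5]
   [100 200 1 5]
   [100 200 2 5]
   [100 200 3 5]
   [100 200 4 6]
   [100 201 1 6]
   [101 200 1 6]])

(prefix-scan sample-index [100 200])
;=> ([100 200 1 5] [100 200 2 5] [100 200 3 5] [100 200 4 6])
```

Because the tuples are sorted, every match is adjacent, so stopping at the first mismatch loses nothing.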

phronmophobic18:05:00

when I call asami.durable.common/find-tuples, the second argument is the tuple to start from for the seq. It seems like it's expecting a vector of numbers?

quoll18:05:30

Yes. That’s what you’re looking for

quoll18:05:40

On disk, all tuples are numbers

phronmophobic18:05:52

so if I have the :post index, I need a way to convert [:my/attribute "my-value"] into the tuple format?

quoll18:05:03

That’s what pool is for

phronmophobic18:05:12

is that DataStorage/find-id?

👍 1
quoll18:05:22

Take the pool value from the graph, and call (asami.durable.common/find-id pool :my/attribute). You’ll get back a number

👍 1
quoll18:05:54

Same for the value. find-id will return a number for any value, or nil if the thing hasn’t been stored

quoll19:05:01

If you look at resolve-triple in the BlockGraph you’ll see that it implements the search for patterns like [?e :has-value ?v]. It does this by calling get-id on each element of the provided vector. That’s just a small wrapper function that ignores symbols, or else it calls find-id on the argument

quoll19:05:43

Then it passes that along to get-from-index
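
A rough model of that translation step, with the pool replaced by a plain map. The names `get-id-sketch` and `pattern->ids`, the sample IDs, and the choice to leave symbols in place are all assumptions for illustration; the real get-id lives in Asami and calls find-id on the pool.

```clojure
;; Hypothetical pool: value -> numeric ID.
(def sample-pool
  {:has-value 17
   "my-value" 42})

(defn get-id-sketch
  "Leaves symbols (pattern variables) alone; looks everything else up.
  Returns nil for a value the pool has never seen."
  [pool x]
  (if (symbol? x)
    x
    (get pool x)))

(defn pattern->ids
  "Converts a pattern of values and variables into its numeric form."
  [pool pattern]
  (mapv #(get-id-sketch pool %) pattern))

(pattern->ids sample-pool '[?e :has-value ?v])
;=> [?e 17 ?v]
(pattern->ids sample-pool '[?e :has-value "missing"])
;=> [?e 17 nil]
```

The numeric positions are what an index lookup would consume, and a nil marks a value that is not in storage.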

phronmophobic19:05:50

So if I wanted to find tuples in an index that come after something like [:my/attribute "asami-"], since "asami-" isn't in the db yet, how would I get an ID from find-id for it?

phronmophobic19:05:45

Do I have to DataStorage/write! it?

quoll19:05:04

Hmmm…. You’d have to find the point immediately after. Which either needs you to write that, or you can call the internal functions

quoll19:05:59

But… well… it doesn’t return statements in lexical order. It returns them in ID order, and the IDs are based on the insertion order

quoll19:05:13

Mind you… the ID for "asami-" is calculated, rather than retrieved from storage

quoll19:05:32

That’s because it’s short enough to fit into a 64-bit number

quoll19:05:05

So… Asami storage may not do what you’re looking for?

phronmophobic19:05:06

Yea, I noticed find-id returns non-nil for some novel values

quoll19:05:44

Yup. Short strings and keywords can be encoded as a number

phronmophobic19:05:09

and I think numbers too 😄

quoll19:05:30

That way, the pool index doesn’t have to be hit unnecessarily

phronmophobic19:05:45

Yea, I didn't realize the indices were sorted by pool id and not by value.
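
A small model shows why ID order and value order differ. It assumes IDs are handed out sequentially at insertion time, which is a simplification of the point above, and it uses strings longer than 7 bytes so that they would need to be stored rather than encoded inline.

```clojure
;; Hypothetical values, listed in the order they were inserted.
(def insertion-order
  ["pineapple" "blueberry" "watermelon" "cranberry"])

;; Assumed ID scheme for this sketch: 1, 2, 3, ... in insertion order.
(def ids
  (zipmap insertion-order (iterate inc 1)))

;; What an index sorted by ID would give back.
(sort-by ids insertion-order)
;=> ("pineapple" "blueberry" "watermelon" "cranberry")

;; What a scan in lexical order would give back.
(sort insertion-order)
;=> ("blueberry" "cranberry" "pineapple" "watermelon")
```

A range scan such as "everything after a given string" only makes sense against the second ordering.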

quoll19:05:23

Though, you CAN get a sequence of pool ids, and they’re in lexical order

quoll19:05:41

But short strings aren’t in there

phronmophobic19:05:28

Oh, and the pool ids that don't need to be in the index are the negative ones?

quoll19:05:04

Yes 🙂

🆒 1
phronmophobic19:05:54

Wow, I feel like I just learned a lot. Thanks for your help gratitude

quoll19:05:05

Objects can be encapsulated in a 64-bit long if they are:
• a string 7 bytes long, or fewer
• a keyword, 7 bytes long, or fewer
• Longs that can be stored in 60 bits (negative or positive)
• Instants or Dates, where the ms-since-epoch values can fit in a long
If they’re encapsulated, the top bit will be set (hence the negative numbers)

quoll19:05:25

Note that strings are 7 bytes or fewer. It uses UTF-8. If you have a string like “hello:slightly_smiling_face:” then it won’t fit. But “hi:slightly_smiling_face:” does fit

quoll19:05:56

Oh… wait. That fits

quoll19:05:06

It can’t fit: “hello:woman-shrugging:”

😆 1
quoll19:05:10

Basically, I was looking for multi-byte chars, and simple_smile is only 2 bytes
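
The 7-byte rule for strings is easy to check by hand on the JVM. This sketch only tests eligibility by UTF-8 length and shows that setting the top bit of a long makes it negative; `fits-inline?` is a hypothetical name and the actual encoding is not reproduced here.

```clojure
(defn utf8-length
  "Number of bytes the string occupies when encoded as UTF-8."
  [^String s]
  (count (.getBytes s "UTF-8")))

(defn fits-inline?
  "True when the string is 7 UTF-8 bytes or fewer."
  [s]
  (<= (utf8-length s) 7))

(fits-inline? "asami-")      ;=> true  (6 bytes)
(fits-inline? "h\u00e9llo")  ;=> true  (6 bytes: é takes 2)
(fits-inline? "my-value")    ;=> false (8 bytes)

;; Setting the top bit of a 64-bit long always yields a negative number.
(neg? (bit-or Long/MIN_VALUE 42)) ;=> true
```

Counting bytes rather than characters is what matters, since any multi-byte character uses up the budget faster.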

quoll19:05:41

But stored objects (like strings, keywords, and URIs) are all stored lexically

quoll19:05:15

It’s a bit harder to work with, since it doesn’t expect you to iterate through this structure

quoll19:05:16

The https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/durable/pool.cljc#L63-L68* function converts the thing you’re looking for into a sequence of bytes (actually, 2 sequences. The first part is all that’s stored in a tree node). Then it uses this to find the tree node (line 66). The first part of the bytes is used for searching through the tree of nodes, but as soon as it finds a complete match, then it needs to check if the remaining string is less-than/equal/greater-than, so it has to look at the rest of the bytes.

quoll19:05:19

That find-node function returns the node. You could use this to create a node seq, and iterate through the tree from that point.

phronmophobic19:05:41

So, in theory, if I locally turned off the small value optimization, I could scan by value (at the cost of extra space and slower reads)?

quoll19:05:55

yes

🆒 1
phronmophobic19:05:16

I'm not sure I'll do that, but just trying to make sure I'm following along

quoll19:05:19

I’ve considered storing all strings (and keywords, and URIs) in a Patricia tree on disk instead. That would take a bit of work, and be a little slower to read, but might be interesting, because it would allow for fast regexes

phronmophobic19:05:31

I've worked with databases before, but haven't poked too much at the internals. It's very cool stuff

quoll19:05:37

There are tradeoffs in all sorts of places 🙂

quoll19:05:14

For instance, that ospt index is fast to read, but costs to write to it, and it’s not read enough to justify having it. I need to remove it and emulate it with other indices

phronmophobic19:05:48

For whatever reason, I was just happy to see that asami didn't use nippy to serialize values

quoll19:05:15

I do suffer a bit from “not invented here”

phronmophobic19:05:06

To me, it was just an indication that asami cares about my data enough to own the serialization format.

phronmophobic19:05:11

It's also really cool that asami works with native image and in the browser

quoll20:05:29

Well, I have to update it to use promises if I want it to use local storage in the browser 😕

quoll20:05:58

Though, theoretically, promesa could give me that with minimal effort… I hope

quoll20:05:21

if I get that done, then I can use everything that exists now, and just create a new block manager

quoll20:05:48

I also want to create a block manager for Redis.

quoll20:05:49

Because all these indexes are built on blocks, and they can be stored anywhere. Once the transaction that creates them is over, then they’re immutable

quoll20:05:30

Which means that I can store blocks in any system, and they can be replicated without any concerns

😎 1
clojure-spin 1
quoll20:05:03

I just need to get IDs for each one

phronmophobic17:05:07

Just fyi, I was able to create a lazy seq that seeks to a specific spot in the index 🎉 . It's just a slightly modified, poorly named version of the fns in asami.durable.tuples.

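;; Assumes the helpers used below (next-populated, get-block, get-block-ref,
;; tuple-at, get-count) and the tree alias are in scope.
(defn tuple-seq-infinite
  "Lazy seq of every tuple from `offset` in `node`'s block onward.
  It has no stop condition, so it runs to the end of the index.
  `tuple` is unused here."
  [index blocks tuple node offset]
  (when-let [node (next-populated index node)]
    (let [nodes (tree/node-seq index node)
          block-for (fn [n] (and n (get-block blocks (get-block-ref n))))
          nested-seq (fn nested-seq' [[n & ns :as all-ns] blk offs]
                       (when n
                         (let [t (tuple-at blk offs)
                               nxto (inc offs)
                               ;; at the end of a block, skip empty nodes and restart at offset 0
                               [nnodes nblock noffset] (if (= nxto (get-count n))
                                                         (loop [[fns & rns :as alln] ns]
                                                           (if (and fns (zero? (get-count fns)))
                                                             (recur rns)
                                                             [alln (block-for fns) 0]))
                                                         [all-ns blk nxto])]
                           (cons t (lazy-seq (nested-seq' nnodes nblock noffset))))))]
      (nested-seq nodes (block-for node) offset))))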

(defn seek-index
  "Lazy seq of tuples starting at `tuple`, or just after where it would be found."
  ([tuple-index tuple]
   (seek-index (:index tuple-index)
               (:blocks tuple-index)
               tuple))
  ([index blocks tuple]
   (let [coord (find-coord index blocks tuple)
         coord (if (map? coord)
                 coord
                 ;; no exact match: use the coord after where this tuple would be found
                 (second coord))]
     (when (map? coord)
       (let [{:keys [node pos]} coord]
         (tuple-seq-infinite index blocks tuple node pos))))))