#asami
2022-07-06
folcon11:07:17

So having looked at naga and rete: naga processes rule clauses to work out which rules need to be run. However, I'm wondering whether, instead of just recording the counts, we could also capture the matching datoms. This might be all that's needed to provide something akin to materialised views with incremental maintenance[0] or differential dataflow[1]. I.e., we register some queries with the db, which are then cached; then, instead of querying the entire db when a transaction changes its state, we can tell which clauses were affected and conj/disj the appropriate triples into the cached query. Now this may need to be a different mechanism so we don't interfere with naga's performance. This is also done in 3DF[2][3][4]. Am I missing anything obvious here? This is an area of interest to me and I'm trying to figure out if it's possible to do this in clojure, what would be involved and the best way to tackle it. Please let me know if anyone else is also interested 😃...
• [0]: https://www.wotbrew.com/relic/materialization/
• [1]: https://timelydataflow.github.io/differential-dataflow/introduction.html
• [2]: https://www.youtube.com/watch?v=ZgqFlowyfTA
• [3]: clojure client https://github.com/sixthnormal/clj-3df
• [4]: rust server https://github.com/comnik/declarative-dataflow
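
A minimal sketch of the conj/disj idea in the message above, in plain Clojure with no Asami or Naga dependency. The names `matches?`, `register-view` and `update-view`, and the `{:asserted … :retracted …}` delta shape, are hypothetical and chosen only for illustration. It caches the matches for a single pattern; a query that joins several clauses would need more bookkeeping than this.

```clojure
;; Hypothetical helpers for illustration only; none of these are Asami or Naga functions.

(defn matches?
  "True when the triple satisfies the pattern. Symbols starting with ? match anything."
  [pattern triple]
  (every? (fn [[p t]]
            (or (and (symbol? p) (= \? (first (name p))))
                (= p t)))
          (map vector pattern triple)))

(defn register-view
  "Scans the triples once and caches the ones matching the pattern."
  [pattern triples]
  {:pattern pattern
   :matches (into #{} (filter #(matches? pattern %)) triples)})

(defn update-view
  "Applies a transaction delta to the cached matches without rescanning the store."
  [view {:keys [asserted retracted]}]
  (let [hit? #(matches? (:pattern view) %)]
    (-> view
        (update :matches #(reduce disj % (filter hit? retracted)))
        (update :matches #(reduce conj % (filter hit? asserted))))))

;; Usage: one cached pattern, then one transaction's worth of changes.
(def view (register-view '[?e :age 10] [[:a :age 10] [:b :age 12]]))

(def updated
  (update-view view {:asserted  [[:c :age 10] [:c :name "C"]]
                     :retracted [[:a :age 10]]}))

(:matches updated)
;; => #{[:c :age 10]}
```

Only triples that hit the cached pattern touch the view, so the cost of an update depends on the size of the transaction rather than the size of the store.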

quoll11:07:37

Yes, this is indeed possible! Naga relies on counts because it was written to talk to any graph database, via a protocol. In most cases this means that the data returned is not lazy, and is uncached.

quoll11:07:12

However, I have been wanting to reintegrate Naga into Asami. This will allow it to do some interesting new things, such as tracking generated statements, cheap deltas, and… sometimes… resolution caching.

folcon11:07:25

Oh awesome! How hard would it be to do and what would be a good path for me to get started? This is a really painful problem, and if I can get a performant solution to it, that would be awesome!

quoll11:07:36

One problem is that resolutions CAN be expensive, since in memory they're generators, and a cache holds onto the head. OTOH, on disk they're just a pointer and a pattern, so they're cheap to cache.

quoll11:07:28

Hmmm. It might be necessary to create a new function on the storage protocol to assist with the caching

folcon11:07:34

Right, but if we can do some reasonable profiling, we can test both and if it's more expensive to run the resolution then we can just not record that query to be resolved?

quoll11:07:56

It's expensive in memory, rather than time

folcon11:07:10

Sure, but that's the tradeoff right 😃...

folcon12:07:34

Hmm, what would be a good route to getting started on this? I'd like to help if possible as it would give me an opportunity to get more comfortable with the underlying system as well as better understand the rules execution side (naga) vs the materialised views side (query caching) =)...

quoll12:07:11

The tedious work will be bringing Naga back into Asami, updating the transact function to call it, and then stripping down Naga’s use of storage to make it natively talk to Asami, rather than trying to handle it all via the protocol

folcon12:07:11

I could just pull naga's source and start tinkering if that's the best way to go about this?

folcon12:07:12

Ok, I've got a few other things to resolve this/next week, but this sounds like a workable direction, so when I get some free time I'm going to dig into this 😃...

quoll12:07:44

If it were me, I’d refamiliarize myself with the Naga implementation of the storage protocol https://github.com/quoll/naga/blob/main/src/naga/storage/asami/core.cljc

quoll12:07:26

This jumps through some hoops so that it can look more like another database. There’s code that can be removed

quoll12:07:44

a LOT of code can be removed

folcon12:07:51

Btw, with regards to storage, does it just generically store anything? Or is the storage format fixed? I'm wondering what happens if I try and store edn data.

quoll12:07:13

Should work

quoll12:07:38

Do you mean, store edn as a blob?

quoll12:07:49

In memory, an EDN structure is a reference like any other.

quoll12:07:48

On disk, it’s trickier. But it comes down to storing seqs

quoll12:07:37

Hmmm… it does actually miss a couple of things from EDN. It doesn’t store lists (it sees them as a seq, and stores them as vectors instead), and it doesn’t do metadata

quoll12:07:01

Just looking at edn. So it falls apart a couple of ways:
• no set support #{} (this would be very easy to add)
• no list support () (again, this would be easy)
• No support for tagging, though this may be OK
It looks like metadata is not needed (whew)
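
To make the list and set gaps above concrete, here is a minimal sketch of tagging each collection with a type code so that it survives a round trip. The tags (`:list`, `:set`, `:vec`, `:map`, `:val`) and the `encode`/`decode` functions are hypothetical and are not Asami's on-disk codes or API. They only show why a separate code per collection type is enough to stop lists coming back as vectors.

```clojure
;; Hypothetical type tags for illustration; Asami's real storage codes are not shown here.

(defn encode
  "Wraps a value in [tag payload] so the collection type is recorded explicitly."
  [x]
  (cond
    (map? x)    [:map (mapv (fn [[k v]] [(encode k) (encode v)]) x)]
    (set? x)    [:set (mapv encode x)]
    (vector? x) [:vec (mapv encode x)]
    (seq? x)    [:list (mapv encode x)]
    :else       [:val x]))

(defn decode
  "Rebuilds the original value, using the tag to pick the collection type."
  [[tag payload]]
  (case tag
    :map  (into {} (map (fn [[k v]] [(decode k) (decode v)])) payload)
    :set  (into #{} (map decode) payload)
    :vec  (mapv decode payload)
    :list (apply list (map decode payload))
    :val  payload))

;; Usage: the list stays a list and the set stays a set.
(def round-tripped (decode (encode {:queue '(1 2 3) :seen #{:a :b}})))
```

Note that `(= '(1 2) [1 2])` is true in Clojure, so a round-trip check has to test the type with `list?` or `vector?` rather than rely on equality alone.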

folcon12:07:51

For some rules, it would be useful to store more complex data structures like priority hash maps, which can be worked with. I've taken to just using :tx-triples to add these, but I'm not confident that they would survive the roundtrip to disk if I used anything other than mem-store. For some cases of forward chaining it would be useful to store an edn representation of a function, or some other structure describing what to do. If they had to be triplets, then at the very least being able to record db functions somewhere and then spitting out triplets of:

[:db/fn.call :db-function/name [args]]
Could cover this?
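
One way to read the `[:db/fn.call :db-function/name [args]]` idea is as a placeholder triple that gets expanded before the data is stored. The sketch below is plain Clojure and purely illustrative: the `registry` map, the `:db-function/add-tag` function and `expand-calls` are hypothetical, and nothing here is an Asami or Naga feature. It also assumes each registered function receives the current triples as its first argument, standing in for the db.

```clojure
;; Hypothetical registry of named db functions; not part of Asami.
;; Each function takes the current triples first, then the call's args,
;; and returns the triples that should replace the call.
(def registry
  {:db-function/add-tag
   (fn [_current entity tag]
     [[entity :tag tag]])})

(defn expand-calls
  "Replaces every [:db/fn.call fn-name args] triple with the triples its function returns."
  [registry current tx-triples]
  (mapcat (fn [[e a v :as triple]]
            (if (= e :db/fn.call)
              (if-let [f (get registry a)]
                (apply f current v)
                (throw (ex-info "Unknown db function" {:name a})))
              [triple]))
          tx-triples))

;; Usage: the call triple is replaced by the triple it generates.
(def expanded
  (expand-calls registry
                []
                [[:a :name "A"]
                 [:db/fn.call :db-function/add-tag [:bob "vip"]]]))
;; expanded is ([:a :name "A"] [:bob :tag "vip"])
```

Keeping the functions in a registry outside the store means only keywords and plain argument vectors need to be written to disk.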

folcon12:07:49

For one of my projects, I'm exploring the viability of a single db that can be used for local functioning as well as somewhere remote like on aws ec2 or lambda 😃...

folcon12:07:37

One gap here is that datascript / datomic give you the database as the first arg inside db functions, so you can use the current database in your function. I've not worked out how to do this in asami, so this area is a bit underexplored as I'm looking at rewriting all of my current db functions to use this. Is this for performance reasons? Even access to the current graph would be helpful 😃... As I have some idea how to go about working with the graph.

folcon12:07:14

Hmm, this discussion is giving me a reasonable idea of action items and what needs to be looked at on my side, so that's good 😃...
• Look at naga clauses and how they're handled, see how hard it is to have them depend on asami
• Look at storing edn data into disk storage in a way that returns the original data
• Look at optionally allowing the user to store more complex datastructures like priority maps, perhaps with a similar whitelisting mechanism of asami.query/*override-restrictions*
• See if current db op behaviour can be extended to have the db or graph being the first arg

quoll13:07:31

I think we probably need a conversation, rather than delayed Slack messages 🙂

quoll13:07:04

• Naga clauses used to be a conjunction of simple patterns. But they now support a WHERE clause with any constructs. These can only be built using the API, and not with the datalog language. Some way to expose these for users to not need an API would be nice
• Well, it looks like I just need to define codes for sets and lists, and write the handlers to serialize and deserialize these (a couple of lines each)
• Need to discuss what you mean when you say that you want to store a priority map
• Are you suggesting that if the db function were to take a database as the first argument, then it would just act as an identity function?

folcon14:07:12

> I think we probably need a conversation, rather than delayed Slack messages 🙂
Agreed, let me know when you'd like to do that 😃...
> Need to discuss what you mean when you say that you want to store a priority map
Specifically in this case, org.clojure/data.priority-map. The main use case was to be able to store data which tracks some queuing behaviour. There are two ways to do this:
1. allow for the storage of more complex data types, with some optional way to extend it to more data structures.
2. store it outside the db, but then we need some key / value store, for example, as well as some logic around storing the data pre transact mapped to some key and then retrieving it post query.
> Are you suggesting that if the db function were to take a database as the first argument, then it would just act as an identity function?
I don't believe so? I'm comparing the db function behaviour of:
• datomic[0] and datascript[1], both of which allow you access to the database as an argument to the function you define.
• asami[2], where for retrieve-op there doesn't appear to be a similar capability to have access to the database or graph.
I've tried doing this, for example:

;; Attempt to hand the database to a predicate by passing $ as an argument
(let [pred (fn [db e a]
             (= a (:age (asami/entity db e))))]
  (binding [asami.query/*override-restrictions* true]
    (asami/q '[:find ?e
               :in $ ?pred
               :where
               [?e :food ?a]
               [(?pred $ ?e 10)]]
             db pred)))
However, $ is just a symbol.
• [0]: https://docs.datomic.com/on-prem/reference/database-functions.html
• [1]: https://github.com/tonsky/datascript/blob/a6127c4886c93b2c43584fdf57daaeb97cbf86f6/test/datascript/test/transact.cljc#L200
• [2]: https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/query.cljc#L198