asami

2022-07-06T11:37:17.170269Z

So having looked at naga and rete: naga processes rule clauses to work out which rules need to be run. However, I'm wondering whether, instead of just recording the counts, we could also capture the matching datoms. This might be all that's needed to provide something akin to materialised views with incremental maintenance[0] or differential dataflow[1]. I.e., we register some queries with the db, which are then cached; then, instead of querying the entire db when a transaction changes its state, we can tell which clauses were affected and conj/disj triples into the cached query result accordingly. This may need to be a separate mechanism so we don't interfere with naga's performance. This is also done in 3DF[2][3][4]. Am I missing anything obvious here? This is an area of interest to me and I'm trying to figure out whether it's possible to do this in Clojure, what would be involved, and the best way to tackle it. Please let me know if anyone else is also interested 😃...
• [0]: https://www.wotbrew.com/relic/materialization/
• [1]: https://timelydataflow.github.io/differential-dataflow/introduction.html
• [2]: https://www.youtube.com/watch?v=ZgqFlowyfTA
• [3]: clojure client https://github.com/sixthnormal/clj-3df
• [4]: rust server https://github.com/comnik/declarative-dataflow
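A minimal sketch of the conj/disj idea, in plain Clojure with no Asami or Naga calls. Everything here is an assumption for illustration: `register-view`, `apply-delta`, `matches?` and the `{:added ... :retracted ...}` delta shape are hypothetical names, and each view is a single triple pattern, so joins across clauses (the hard part that differential dataflow addresses) are not covered.

```clojure
;; Hypothetical sketch: none of these names come from Asami or Naga.

(defn variable?
  "A pattern element is a variable when it is a symbol starting with ?."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn matches?
  "True when every constant in the pattern equals the triple's element."
  [pattern triple]
  (every? identity
          (map (fn [p t] (or (variable? p) (= p t))) pattern triple)))

(defn register-view
  "Runs the pattern once against all triples and caches the matching rows."
  [views id pattern triples]
  (assoc views id {:pattern pattern
                   :rows    (set (filter #(matches? pattern %) triples))}))

(defn apply-delta
  "Updates every cached view from a transaction's delta only,
  without rescanning the full set of triples."
  [views {:keys [added retracted]}]
  (reduce-kv
    (fn [acc id {:keys [pattern rows] :as view}]
      (let [hit?  #(matches? pattern %)
            rows' (as-> rows r
                    (reduce disj r (filter hit? retracted))
                    (reduce conj r (filter hit? added)))]
        (assoc acc id (assoc view :rows rows'))))
    {}
    views))

;; Usage: one cached view over [?e :food ?f], then one transaction delta.
(def triples [[:alice :age 30] [:alice :food :pizza] [:bob :food :ramen]])
(def views   (register-view {} :foods '[?e :food ?f] triples))
(def views'  (apply-delta views
                          {:added     [[:carol :food :tacos] [:carol :age 41]]
                           :retracted [[:bob :food :ramen]]}))
```

Only triples in the delta are tested against each pattern, which is where the saving over re-running the query would come from. The memory cost is the cached `:rows` set per view.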

quoll 2022-07-06T11:51:37.955029Z

Yes, this is indeed possible! Naga relies on counts because it was written to talk to any graph database, via a protocol. In most cases this means that the data returned is not lazy, and is uncached.

quoll 2022-07-06T11:55:12.771479Z

However, I have been wanting to reintegrate Naga into Asami. This will allow it to do some interesting new things, such as tracking generated statements, cheap deltas, and… sometimes… resolution caching.

2022-07-06T11:55:25.712219Z

Oh awesome! How hard would it be to do, and what would be a good path for me to get started? This is a really painful problem, and getting a performant solution to it would be awesome!

quoll 2022-07-06T11:56:36.973849Z

One problem is that resolutions CAN be expensive: in memory they're a generator, and a cache holds onto the head. OTOH, on disk they're just a pointer and a pattern, so they're cheap to cache.

quoll 2022-07-06T11:57:28.587089Z

Hmmm. It might be necessary to create a new function on the storage protocol to assist with the caching

2022-07-06T11:57:34.496859Z

Right, but if we can do some reasonable profiling, we can test both and if it's more expensive to run the resolution then we can just not record that query to be resolved?

quoll 2022-07-06T11:57:56.379619Z

It's expensive in memory, rather than time

2022-07-06T11:58:10.233419Z

Sure, but that's the tradeoff right 😃...

2022-07-06T12:00:34.889339Z

Hmm, what would be a good route to getting started on this? I'd like to help if possible as it would give me an opportunity to get more comfortable with the underlying system as well as better understand the rules execution side (naga) vs the materialised views side (query caching) =)...

quoll 2022-07-06T12:01:11.642709Z

The tedious work will be bringing Naga back into Asami, updating the transact function to call it, and then stripping down Naga’s use of storage to make it natively talk to Asami, rather than trying to handle it all via the protocol

2022-07-06T12:01:11.776389Z

I could just pull naga's source and start tinkering if that's the best way to go about this?

2022-07-06T12:05:12.816009Z

Ok, I've got a few other things to resolve this/next week, but this sounds like a workable direction, so when I get some free time I'm going to dig into this 😃...

quoll 2022-07-06T12:05:44.541389Z

If it were me, I’d refamiliarize myself with the Naga implementation of the storage protocol https://github.com/quoll/naga/blob/main/src/naga/storage/asami/core.cljc

quoll 2022-07-06T12:06:26.487039Z

This jumps through some hoops so that it can look more like another database. There’s code that can be removed

quoll 2022-07-06T12:06:44.167339Z

a LOT of code can be removed

2022-07-06T12:06:51.361479Z

Btw, with regards to storage, does it just generically store anything? Or is the storage format fixed? I'm wondering what happens if I try and store edn data.

quoll 2022-07-06T12:07:13.616579Z

Should work

quoll 2022-07-06T12:07:21.623349Z

🤞

quoll 2022-07-06T12:07:38.489639Z

Do you mean, store edn as a blob?

quoll 2022-07-06T12:09:49.226379Z

In memory, an EDN structure is a reference like any other.

quoll 2022-07-06T12:10:48.188939Z

On disk, it’s trickier. But it comes down to storing seqs

quoll 2022-07-06T12:11:37.961089Z

Hmmm… it does actually miss a couple of things from EDN. It doesn’t store lists (it sees them as a seq, and stores them as vectors instead), and it doesn’t do metadata

quoll 2022-07-06T12:16:01.843389Z

Just looking at edn. So it falls apart in a couple of ways:
• no set support #{} (this would be very easy to add)
• no list support () (again, this would be easy)
• no support for tagging, though this may be OK
It looks like metadata is not needed (whew)

quoll 2022-07-06T12:16:27.927859Z

Support for this is all in https://github.com/quoll/asami/blob/main/src/asami/durable/encoder.clj
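A rough sketch of why separate type codes matter for the list/vector/set gap. This is not the format used by Asami's encoder: the `:vec`, `:list`, `:set` and `:scalar` codes and the `encode`/`decode` functions are hypothetical, and a real encoder would write bytes rather than nested vectors.

```clojure
;; Hypothetical tagged encoding: each value carries a code naming its
;; collection type, so the decoder can rebuild the same type.

(defn encode [x]
  (cond
    (vector? x) (into [:vec] (map encode) x)
    (set? x)    (into [:set] (map encode) x)
    (seq? x)    (into [:list] (map encode) x)  ; lists and other seqs
    :else       [:scalar x]))                  ; everything else stored as-is

(defn decode [[code & items]]
  (case code
    :vec    (mapv decode items)
    :set    (set (map decode items))
    :list   (apply list (map decode items))
    :scalar (first items)))

;; Usage: a list, a set and a vector all come back as the same type.
(def data      ['(1 2) #{:a :b} [3 4]])
(def roundtrip (decode (encode data)))
```

Without the `:list` code, `'(1 2)` would come back as a vector. Clojure's `=` treats `'(1 2)` and `[1 2]` as equal, so a round-trip check needs `list?` as well as `=`.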

2022-07-06T12:36:51.125739Z

For some rules, it would be useful to store more complex data structures, like priority maps, and work with them directly. I've taken to just using :tx-triples to add these, but I'm not confident that they would survive the round trip to disk if I used anything other than the mem-store. For some cases of forward chaining it would be useful to store an edn representation of a function, or some other structure describing what to do. If they had to be triples, then at the very least being able to record db functions somewhere and then emitting triples of:

[:db/fn.call :db-function/name [args]]
Could cover this?
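A minimal sketch of how such a call triple could be expanded before transacting, in plain Clojure. This is not an existing Asami feature: `db-fns`, `register-db-fn!`, `expand-triple` and `expand-tx` are hypothetical, the graph is just a vector of triples, and passing the graph as the first argument is an assumption modelled on the datomic/datascript convention.

```clojure
;; Hypothetical registry: function names live in the data, the functions
;; themselves live in the running process.
(def db-fns (atom {}))

(defn register-db-fn! [fn-name f]
  (swap! db-fns assoc fn-name f))

(defn expand-triple
  "Replaces [:db/fn.call name args] with the triples returned by the
  registered function, which receives the current graph first."
  [graph [e a v :as triple]]
  (if (= e :db/fn.call)
    (if-let [f (get @db-fns a)]
      (apply f graph v)
      (throw (ex-info "Unknown db function" {:name a})))
    [triple]))

(defn expand-tx [graph tx-triples]
  (vec (mapcat #(expand-triple graph %) tx-triples)))

;; Usage: a db function that reads the current age from the graph.
(register-db-fn! :db-function/inc-age
  (fn [graph e n]
    (let [age (some (fn [[e' a v]] (when (and (= e' e) (= a :age)) v)) graph)]
      [[e :age (+ age n)]])))

(def graph    [[:alice :age 30]])
(def expanded (expand-tx graph [[:alice :food :pizza]
                                [:db/fn.call :db-function/inc-age [:alice 1]]]))
;; expanded => [[:alice :food :pizza] [:alice :age 31]]
```

Only the keyword name and the argument vector would ever be stored, so nothing function-valued has to survive a trip to disk.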

2022-07-06T12:39:49.069319Z

For one of my projects, I'm exploring the viability of a single db that can be used for local functioning as well as somewhere remote like on aws ec2 or lambda 😃...

2022-07-06T12:42:37.830519Z

One gap here is that datascript / datomic give you the database as the first arg inside db functions, so you can use the current database in your function. I've not worked out how to do this in asami, so this area is a bit underexplored, as I'm looking at rewriting all of my current db functions to use this. Is this for performance reasons? Even access to the current graph would be helpful 😃... as I have some idea of how to go about working with the graph.

2022-07-06T12:48:14.362079Z

Hmm, this discussion is giving me a reasonable idea of action items and what needs to be looked at on my side, so that's good 😃...
• Look at naga clauses and how they're handled, see how hard it is to have them depend on asami
• Look at storing edn data into disk storage in a way that returns the original data
• Look at optionally allowing the user to store more complex data structures like priority maps, perhaps with a whitelisting mechanism similar to asami.query/*override-restrictions*
• See if current db op behaviour can be extended to have the db or graph as the first arg

quoll 2022-07-06T13:33:31.994749Z

I think we probably need a conversation, rather than delayed Slack messages 🙂

quoll 2022-07-06T13:40:04.382199Z

• Naga clauses used to be a conjunction of simple patterns. But they now support a WHERE clause with any constructs. These can only be built using the API, and not with the datalog language. Some way to expose these so that users don't need an API would be nice
• Well, it looks like I just need to define codes for sets and lists, and write the handlers to serialize and deserialize these (a couple of lines each)
• Need to discuss what you mean when you say that you want to store a priority map
• Are you suggesting that if the db function were to take a database as the first argument, then it would just act as an identity function?

2022-07-06T14:12:12.021429Z

> I think we probably need a conversation, rather than delayed Slack messages 🙂
Agreed, let me know when you'd like to do that 😃...
> Need to discuss what you mean when you say that you want to store a priority map
Specifically in this case, org.clojure/data.priority-map. The main use case was to be able to store data which tracks some queuing behaviour. There are two ways to do this:
1. Allow for the storage of more complex data types, with some optional way to extend it to more data structures.
2. Store it outside the db, but then we need, for example, a key/value store, as well as some logic around storing the data pre-transact mapped to some key and then retrieving it post-query.
> Are you suggesting that if the db function were to take a database as the first argument, then it would just act as an identity function?
I don't believe so? I'm comparing the db function behaviour of:
• datomic[0] and datascript[1], both of which allow you access to the database as an argument to the function you define.
• asami[2], where for retrieve-op there doesn't appear to be a similar capability to have access to the database or graph.
I've tried doing this for example:

(let [pred (fn [db e a]
             (= a (:age (asami/entity db e))))]
  (binding [asami.query/*override-restrictions* true]
    (asami/q '[:find ?e
               :in $ ?pred
               :where [?e :food ?a]
               [(?pred $ ?e 10)]]
             db pred)))
However $ is just a symbol.
• [0]: https://docs.datomic.com/on-prem/reference/database-functions.html
• [1]: https://github.com/tonsky/datascript/blob/a6127c4886c93b2c43584fdf57daaeb97cbf86f6/test/datascript/test/transact.cljc#L200
• [2]: https://github.com/quoll/asami/blob/db42e4c1f4593ee1e22d5a644af21bb72d539417/src/asami/query.cljc#L198