is there support for with-db capabilities in asami? Reading through the channel history and issues it seems yes. If so, is there a reason this API hasn't been copied over from datomic?
is there a path to get (defn in-memory-with [db tx] ..)? I.e. support the full transaction data API. The my-with you shared above expects new and remove triples (and takes a connection, which is irrelevant to this discussion)
when working in memory I'd prefer to instantiate an empty db and with over it as necessary, avoiding the entire connection API
IIUC it's build-triples
does this look OK to you?
Thank you… you found a bug for me! 🙂
The :tx-data isn’t right, since the add/`remove` seqs will contain triples (each triple is a 3 element vector). So the resulting :tx-data will just be a large list of triples that were removed and added with no explanation of what is what.
I would build these lazily: (concat (map #(apply vector :db/retract %) remove) (map #(apply vector :db/add %) add))
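A minimal sketch of that expression wrapped in a function, so the output shape is visible. The name `tag-triples` and its parameter names are hypothetical and not part of Asami; the node ids are only sample data.

```clojure
;; Hypothetical helper (not Asami API): prefix each raw triple with its operation.
(defn tag-triples
  [additions removals]
  (concat (map #(apply vector :db/retract %) removals)
          (map #(apply vector :db/add %) additions)))

(tag-triples [[:a/node-1 :foo 2]] [[:a/node-1 :foo 1]])
;; => ([:db/retract :a/node-1 :foo 1] [:db/add :a/node-1 :foo 2])
```

Both `map` calls and `concat` are lazy, so nothing is rebuilt until the result is actually read.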
(I might have done it differently, and not created this as a seq of triple vectors if I was starting this again, but this is where I am) 🙂
It survived as a bug because I have never looked at this output
I just looked at the original transact-async again, and I remember what a mess it is. I was trying to add a lot of functionality without changing the architecture. This included accumulating the triples as a transaction log (I chose a volatile for this due to speed). The way it passes back and forth between functions in core and the transact functions in the Connection implementations is dizzying. (Sorry!)
rebuilding the vectors seems wasteful indeed. I thought build-triples could return the triples with add and retract markers at the beginning, but it's the same amount of waste. Maybe :tx-data is a bad idea and one should split it into :tx-added and :tx-retracted
Rebuilding is wasteful, yes… if that’s what you’re doing. Which I guess you will be. However, I’ve never looked at this data before, so I would never see it
For now, you’ll have to rebuild them. But I should log a bug on this, and then the code to strip the first element can be put into the transaction, rather than the build-triples function
https://github.com/quoll/asami/wiki/5.-Entity-Structure#basic-structure doesn't the print of tx-data here show the missing add/retract information?
Hmmm… I made a point of running code before putting it into the docs, so maybe I’m missing something
using the above code, running (with (->db) [{:db/id :a/foo :foo 1}]) throws
OK… I found what I was missing. It happens in common_index.cljc
graph-transact is in there
It removes the retractions, and if they result in a deletion, then it constructs the Datom for them and appends it. It then does the same with additions… if the triple is added then a Datom is created and appended
So you were right the first time, and there was no bug in my code. The bug is in my head
the exception from above worries me more 🙂
Oh, I think I might have started looking for specific nodes types, and throwing exceptions when I don’t get them. Let me look
is {:db/id :a/foo} not valid? What can I pass as an id
You’re supposed to pass nodes that were created for the graph. I check for this, even though it’s not really needed. (It indicates that you probably had some unexpected data if you don’t pass this test)
You create a node with:
(zuko.node/new-node the-graph)
But if it’s in memory, it just creates a keyword that looks like :a/node-123
The test is at: https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/graph.cljc#L112
It’s looking for a keyword with a namespace of "a" and a name that starts with "node-"
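A rough sketch of a predicate with the shape described above. This is not the code at the linked line; `internal-node-like?` is a hypothetical name and the body is reconstructed only from the description (namespace "a", name starting with "node-").

```clojure
(require '[clojure.string :as str])

;; Hypothetical predicate, written from the description rather than copied from Asami.
(defn internal-node-like?
  [x]
  (and (keyword? x)
       (= "a" (namespace x))
       (str/starts-with? (name x) "node-")))

(internal-node-like? :a/node-123) ; => true
(internal-node-like? :a/foo)      ; => false
```

Under this check, `:a/foo` fails on the name prefix even though its namespace is right.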
that makes the API above (`with`) tricky, since you take a db and a tx. I'd like to specify my own node id as a keyword (preferably arbitrary, not in the a namespace). I cannot do that because new-node wants the graph as an argument
Yes, it’s the graph that defines how it wants its own nodes made
(with (->db) [{:db/id ??}])
https://github.com/quoll/asami/wiki/5.-Entity-Structure#dbid says you can define your own ids, or am I misreading
https://github.com/quoll/asami/issues/18
Until then you could use with-redefs on asami.graph/node-type?
?
the part I’m looking at is this:
That’s saying that you can specify an existing node… or a node you create on your own.
But not one that you’ve built manually
You’ll note that the example node follows the pattern: :a/node-XXXX
Which is what the test expects
I see. Thanks. An alternative is to use tempids I guess, it's just more cumbersome
I’m sorry. I would do a release where I address it, but I don’t have time for a few days
Just like Datomic Pro (which was Datomic local) there is no need for a with-db. Just use a normal db
help me finish the picture: using a normal db will work for queries, but what if I want to fork the db or "speculatively" apply transactions? Think git branch
I actually prevent a with database from being saved, but there’s no reason they can’t be. The reason I never allowed it was because I don’t have a good strategy for merging (or rebasing) branches.
I asked a few people, but no one saw the need
I'm exploring an in memory use case, so saving is not interesting. What could be interesting is applying some transactions and compare query results from db1 and db2. I understand if db1 and db2 share the same root (all db1 txs are in db2) we don't need branching. This is all theoretical at this point
5 more minutes and I’ll be back at my keyboard. I can talk better then
OK… I’m here
It may be helpful to explain the architecture
I’ll stick to in-memory, though on-disk is similar
It’s layered. Everything actually happens in the Graph. This is a protocol in https://github.com/quoll/asami/blob/main/src/asami/graph.cljc and the in-memory version is just a set of supporting functions around 3 PersistentHashMaps: https://github.com/quoll/asami/blob/main/src/asami/index.cljc
(There’s a historical simpler version too, if that would help)
Wrapping the graph is the Database (or “db”). This is just a protocol to make it look like Datomic.
Finally, there is an object called the Connection. This holds atoms that point to the latest Database, and an array of all the old databases.
If you do a query, it asks the Connection for the latest database, then asks the database for the graph that it wraps, and then queries the graph.
If you do a transaction, then it asks the Connection for the latest database, then asks the database for the graph that it wraps, then does the transaction on the graph. Once the transaction is done, it gets wrapped in a new Database object, and this Database is conjed onto the array that the Connection holds, and the “latest” database reference is updated.
If you want to do a with, then you’re just getting the graph (as usual), and working with it normally. Any transactions still get wrapped in a Database, but the Connection doesn’t get updated.
The Connection is mutable, but Database/Graphs are immutable, just like any other Clojure objects (because that’s what they are)
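The layering can be modelled in a few lines of plain Clojure. This is a toy model only, assuming a graph is just a set of triples; every name here (`apply-triples`, `new-connection`, `with-tx`, `transact!`) is hypothetical and none of it is Asami's implementation.

```clojure
;; Toy "graph": an immutable set of [entity attribute value] triples.
(defn apply-triples
  [graph additions removals]
  (into (reduce disj graph removals) additions))

;; Toy "connection": an atom holding a vector of every committed db.
(defn new-connection []
  (atom [{:graph #{}}]))

(defn latest-db [conn]
  (peek @conn))

;; Speculative update: returns a new db value, the connection is not touched.
(defn with-tx
  [db additions removals]
  {:graph (apply-triples (:graph db) additions removals)})

;; Committed update: same work as with-tx, then the connection is advanced.
(defn transact!
  [conn additions removals]
  (peek (swap! conn (fn [dbs] (conj dbs (with-tx (peek dbs) additions removals))))))

(def conn (new-connection))
(def db1 (transact! conn [[:a/node-1 :name "Alice"]] []))
(def db2 (with-tx db1 [[:a/node-1 :age 32]] []))

(count @conn)             ; => 2, the initial empty db plus db1
(= db1 (latest-db conn))  ; => true, db2 never reached the connection
```

Because `db1` and `db2` are plain immutable values, both stay available to compare side by side, which is the "branch" behaviour asked about earlier.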
Oh, dammit. I thought I had with checked in. Ugh
You can still do it with in-memory just fine. But I need it transparent for the on-disk version
Get the graph, transact against it, and then wrap it in a new Database. Unfortunately, the “transact” part is not standard. Instead you need a seq of triples to add, and a seq of triples to remove:
(defn my-with
  [connection new-triples remove-triples]
  (let [graph (core/as-graph connection)
        new-graph (graph/graph-transact graph new-triples remove-triples)]
    (memory/as-database new-graph)))
This is mostly what a transaction does anyway 🙂
Except the transact has to deal with a whole lot of other possible parameters. e.g. nested maps, [:db/add …] and [:db/retract …] elements, seqs of triples… lots of stuff.
is the with on your pc the my-with you posted here or one that papers over the internals and takes a datomic-like transaction
Oh, it uses the whole transaction operation structure (the entity handling is the hardest part there). The one I’ve been working on is not released because it deals with persistent databases. If it’s in memory, then it’s just what I said. But if it’s persistent, then I need a https://github.com/quoll/asami/blob/main/src/asami/wrapgraph.cljc between a transaction commit-point on disk, and an in-memory graph.
In general, I don’t want on-disk and in-memory to have any differences, so hiding the difference here is what’s delayed it. But in-memory is as trivial as what I showed you
except it expects a list of new and remove triples 🙂
Yes. But that’s what I usually have. 🙂
If I’m getting :db/add and :db/retract statements, then you can just filter for those values in the transaction seq, and then map them to the subvec that skips the first value.
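A small sketch of that filter-and-subvec step, assuming every statement is a vector and ignoring entity maps and temporary ids. `split-statements` is a hypothetical name, not an Asami function.

```clojure
;; Hypothetical helper: split [:db/add ...] / [:db/retract ...] statements
;; into plain triples by dropping the leading operation keyword.
(defn split-statements
  [tx-data]
  (let [triples-for (fn [op]
                      (->> tx-data
                           (filter #(and (vector? %) (= op (first %))))
                           (map #(subvec % 1))))]
    {:add    (triples-for :db/add)
     :remove (triples-for :db/retract)}))

(split-statements [[:db/add :a/node-1 :foo 1]
                   [:db/retract :a/node-1 :bar 2]])
;; => {:add ([:a/node-1 :foo 1]), :remove ([:a/node-1 :bar 2])}
```

The two resulting seqs are the inputs that the `my-with` shown earlier expects.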
The actual implementation is harder, because negative numbers require new nodes in the graph, and… Well, look for yourself:
https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/entities.cljc#L154
yikes
Yeah… sorry!
You’ll see there are LOTS of comments in there, because it would be impossible for anyone (including me!!!) to navigate it otherwise
A function should do just ONE thing with ONE data structure. But when a function can handle many, many types of data structures like Datomic transactions do… well then I have to do it even if I don’t like it
That’s all assuming there’s no map in the structure. If there’s a map, then it’s an entity, and that means building triples from the entity. That’s in the same file on lines 55-136
do you dislike the datomic API?
line 174 does the (if (map? obj) …) and then sends it off to entity-triples if it is.
In general, I like it a lot. But the transaction API takes a single seq, and that seq can hold a lot of different kinds of things in it. That makes it easy for users, which I like, but OMG… it’s a mess behind the scenes, which I don’t like!
I like programming interfaces to handle one type of data per function, but user interfaces typically need to be the opposite
Basically, I don’t like users 🙃 🤣
(I do like users. They’re the reason we even do this. But user interactions are hard)
The build-triples function got more and more complex as I supported more and more features. I’m looking at it now, and realize that I should refactor it.
It should be a simple loop over the transaction data, which dispatches as appropriate.
Actually… it is a simple loop. But instead of “dispatch” it handles everything in a single function (`add-triples` is inline from 168-241). That should be broken up into multiple functions, and the add-triples function becomes a simple cond or something like that.
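For illustration, a loop-plus-`cond` shape could look like the sketch below. It is a simplification under loud assumptions: entity maps are flat and already carry a `:db/id`, statements are vectors, and there are no temporary ids or nested entities. `entity->triples` and `build-triples-sketch` are hypothetical names and do not reflect the real `build-triples`.

```clojure
;; Assumption: a flat entity map that already has a :db/id.
(defn entity->triples
  [m]
  (let [id (:db/id m)]
    (for [[k v] (dissoc m :db/id)]
      [id k v])))

;; One pass over the transaction data, dispatching on the kind of element.
;; Returns [additions removals].
(defn build-triples-sketch
  [tx-data]
  (reduce (fn [[adds removes] obj]
            (cond
              (map? obj)                  [(into adds (entity->triples obj)) removes]
              (= :db/add (first obj))     [(conj adds (subvec obj 1)) removes]
              (= :db/retract (first obj)) [adds (conj removes (subvec obj 1))]
              :else (throw (ex-info "Unsupported transaction element" {:element obj}))))
          [[] []]
          tx-data))

(build-triples-sketch [{:db/id :a/node-1 :foo 1}
                       [:db/add :a/node-1 :bar 2]
                       [:db/retract :a/node-1 :baz 3]])
;; => [[[:a/node-1 :foo 1] [:a/node-1 :bar 2]] [[:a/node-1 :baz 3]]]
```

Each branch of the `cond` could then be pulled out into its own function as the element kinds grow.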
maybe you could load it in an in-memory graph and perform rewrites on it 😉
Hah
I have a code parser that converts Clojure into a graph
And does it on Datomic even!
do show
I should switch to using Borkdude’s parser though, since I based it on Rich’s Java-based parser
> do show
OK, but please forgive me… it was written in 2015!
Found it! https://github.com/quoll/cast
BTW, I have to thank you. It’s conversations like that which reinvigorate me to get back to working on it
I like to understand a library and its state before thinking of adopting it. Conversations like this help me make an educated decision, so thank you too 😉
I think a good way to understand Asami is to see that it was built as a very simple system, and then expanded as features were added. But the essential architecture remained the same throughout. So if you look at the original version, then that may help
The first iteration of the Graph protocol had just 3 functions: https://github.com/threatgrid/naga/blob/d149904da8ecb510ddbbb5b73824c74da4b05d77/src/naga/storage/memory/index.clj#L44-L47
The function implementations are very short too
no worries
what would it take to replicate the entity API? At least like (d/get node :name) and reverse lookup (d/get node :_friends)
to compare with datascript - https://github.com/tonsky/datascript/blob/85f9b5d4deadba2841fa6e1ef5cc14225fcb4987/src/datascript/impl/entity.cljc#L166
• :db/id check
• reverse lookup conditional
• cache check
• I don't know what touched checks
• sift through the datoms
• update cache
Yeah… I like mine more 🙂 I was never quite happy with the way datascript does things, though I know that others are not happy about Asami, so I guess we’re even
the additional conditionals aside (I don't understand why they are necessary), I think the only major difference is caching, no? Given a not-in-memory database, reads are expensive. I think datomic caches the entity lookups too
Actually… no, it doesn’t. It relies on the indexes being fast. This is possibly a good candidate for caching.
That said… the in-memory indexes ARE fast (they’re just a call to (some-> spo node attribute))
And the on-disk indexes ARE cached, at a few different levels.
the get from above for transacted {:foo 1} querying :foo returns ([1]). Why the double nesting? Is ffirst safe here? The entity API typically returns a single (latest) value for normal and a sequence for reverse lookup
Well, you’re right… it’s assuming that every attribute can have multi-cardinality. The first is necessary. The remaining [1] is the full collection of values for that attribute. If you use ffirst then you’ll always get the first value. So long as you never store more than one value in that attribute, then it will always return the correct thing.
Here’s an example:
Say you have 2 nodes representing people: :node1 and :node2
If the first has a name of “Alice” and an age of 32, and the second is “Bob” with an age of 31, then the spo index will look like this:
{:node1 {:name ["Alice"]
         :age [32]}
 :node2 {:name ["Bob"]
         :age [31]}}
a quick benchmark against datascript, can we do better?
Normally, if you look up the name for :node2 you would use:
(ffirst (graph/resolve-triple spo :node2 :name '?name))
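To see where the double nesting comes from, here is a sketch over the example index above. `resolve-values` is a hypothetical stand-in for resolving `[node attr ?v]`; it only assumes the index shape from this example and is not how Asami resolves patterns internally.

```clojure
(def spo
  {:node1 {:name ["Alice"] :age [32]}
   :node2 {:name ["Bob"]   :age [31]}})

;; Hypothetical stand-in: one binding row (a vector) per stored value.
(defn resolve-values
  [index node attr]
  (map vector (get-in index [node attr])))

(resolve-values spo :node2 :name)          ; => (["Bob"])
(ffirst (resolve-values spo :node2 :name)) ; => "Bob"
```

The outer seq is the list of result rows and the inner vector holds the bindings of one row, so `ffirst` picks the first binding of the first row and silently ignores any further values.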
Oh well… they’re faster 🙂
well I've run https://github.com/lambdaisland/datalog-benchmarks/blob/main/src/datalog_benchmarks/scratch.clj where asami is 50x faster on my machine for both queries
The issue may have nothing to do with the insertion of data. I think the most expensive part is the translation of the map into triples.
I'm only timing the "get" which I called ? here, the entity attribute retrieval
Triples lookups are reasonably good on Asami. But building triples out of entities…. yes, that’s expensive. I did triple the speed a few years ago, so it used to be even slower
I had a JSON dataset that took 6 seconds to load, and I took it down to less than a second
So it used to be worse
you're talking about insertion but the timings are for reading, or am I missing something
Oh, I wasn’t reading properly (I’m in a block of code for work. I’m also about to get on a call)
I don’t know what your ? is doing. Is it calling entity? Is it doing something else?
thought I shared it, my bad. I rewrote it just now, now it beats datascript 😛
I profiled it and >50% of time was spent in resolve-triple, and most of its time was in get-from-index. I know which index I need so I can just use it directly
If you want to get an attribute from a database, then you can define something like:
(defn get-attr [db node attr]
  (some-> db :graph :spo node attr first))
almost, you need to get the graph first right
get-from-index is just how I map from the arguments to the index that’s needed, and how to look that index up. There’s probably something faster I can do there 🤔
Yes, the graph… let me edit it
There’s no testing needed for the first few levels so…
(defn get-attr [db node attr]
  (some-> ((db :graph) :spo) node attr first))
I can’t remember the results, but I recall using Criterium to check which of the following was fastest:
(index :key)
(:key index)
(get index :key)
There’s a small difference. From memory, the get function is not the fastest one
https://gist.github.com/xificurC/df079feb85bb118efd305764c8714cf5
the use case: given a root node traverse a tree lazily. Instead of pulling the whole nodes pull a subset of the eavs, or one by one on demand. I see pull query API is missing too
Pull is missing, yes. Mostly because I never learned it 😳
Doing a get like that is one-liner. 3 lines if it’s handling reverse directions like that
you mean transforming it into a d/q query. I thought some way of direct read of the EAV would be more efficient? For the non-reverse lookup
No, the d/q function is definitely not the way to do it. I’d extract the graph object, and look up the index directly
I’m AFK right now, but the answer to “what would it take” is “very little”. The graph API is how all this sort of work is done (it’s how both the entity and q functions are built)
and the graph API allows reverse lookups as well?
Well… it allows lookups.
If you look up:
[entity attribute value]
and you check attribute and discover that (= \_ (first (name attribute))) then you change it to look up:
[value attribute entity]
by lookup you mean a correct asami.graph/resolve-triple incantation?
So, get becomes:
(defn get
  [db node attr]
  (let [g (core/as-graph db)]
    (if (and (keyword? attr)
             (= \_ (first (or (namespace attr) (name attr)))))
      (graph/resolve-triple g '?v attr node)
      (graph/resolve-triple g node attr '?v))))
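A self-contained sketch of the same idea over a plain vector of triples, so it can be tried without a database. One assumption goes beyond the snippet above: the leading underscore is stripped before matching, since the stored attribute would be `:friends` rather than `:_friends`. All names (`reverse-attr?`, `forward-attr`, `lookup`) are hypothetical.

```clojure
(def triples
  [[:node1 :name "Alice"]
   [:node1 :friends :node2]
   [:node2 :name "Bob"]])

;; Same test as in the chat: underscore at the start of the namespace,
;; or at the start of the name when there is no namespace.
(defn reverse-attr?
  [attr]
  (and (keyword? attr)
       (= \_ (first (or (namespace attr) (name attr))))))

;; Assumption: :_friends -> :friends and :_person/friends -> :person/friends.
(defn forward-attr
  [attr]
  (if-let [nspace (namespace attr)]
    (keyword (subs nspace 1) (name attr))
    (keyword (subs (name attr) 1))))

(defn lookup
  [triples node attr]
  (if (reverse-attr? attr)
    (let [fwd (forward-attr attr)]
      (for [[e a v] triples :when (and (= a fwd) (= v node))] e))
    (for [[e a v] triples :when (and (= e node) (= a attr))] v)))

(lookup triples :node1 :friends)  ; => (:node2)
(lookup triples :node2 :_friends) ; => (:node1)
```

With this test, `:person/_friends` is treated as a forward attribute, matching the intent stated later in the conversation.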
P.S. sorry to take so long. Slack is possibly my least-favorite Clojure IDE!
ha, datomic allows both :_person/friends and :person/_friends?
thanks for explaining the architecture, makes the whole thing clearer for me. Your wiki pages are also helpful in this regard
> ha, datomic allows both :_person/friends and :person/_friends?
Oh… did I get that wrong? I tried to make it only work with :_person/friends and :_friends, but not with :person/_friends
no, it's just me parsing it incorrectly
Well, you have me testing it 🙂