asami

xificurC 2023-12-12T09:59:02.501989Z

is there support for with-db capabilities in asami? Reading through the channel history and issues it seems yes. If so, is there a reason this API hasn't been copied over from datomic?

xificurC 2023-12-13T15:31:49.743699Z

is there a path to get (defn in-memory-with [db tx] ..)? I.e. support the full transaction data API. The my-with you shared above expects new and remove triples (and takes a connection, which is irrelevant to this discussion)

xificurC 2023-12-13T15:32:47.427439Z

when working in memory I'd prefer to instantiate an empty db and with over it as necessary, avoiding the entire connection API

xificurC 2023-12-13T16:01:22.692879Z

IIUC it's build-triples

xificurC 2023-12-13T16:35:38.394819Z

does this look OK to you?

quoll 2023-12-13T20:13:35.431659Z

Thank you… you found a bug for me! 🙂 The :tx-data isn’t right, since the add/`remove` seqs will contain triples (each triple is a 3 element vector). So the resulting :tx-data will just be a large list of triples that were removed and added with no explanation of what is what. I would build these lazily: (concat (map #(apply vector :db/retract %) remove) (map #(apply vector :db/add %) add)) (I might have done it differently, and not created this as a seq of triple vectors if I was starting this again, but this is where I am) 🙂

quoll 2023-12-13T20:15:00.974369Z

It survived as a bug because I have never looked at this output

quoll 2023-12-13T20:17:49.403959Z

I just looked at the original transact-async again, and I remember what a mess it is. I was trying to add a lot of functionality without changing the architecture. This included accumulating the triples as a transaction log (I chose a volatile for this due to speed). The way it passes back and forth between functions in core and the transact functions in the Connection implementations is dizzying. (Sorry!)

xificurC 2023-12-13T20:21:07.639489Z

rebuilding the vectors seems wasteful indeed. I thought build-triples could return the triples with add and retract markers at the beginning, but it's the same amount of waste. Maybe :tx-data is a bad idea and one should split it into :tx-added and :tx-retracted

quoll 2023-12-13T20:23:59.412059Z

Rebuilding is wasteful, yes… if that’s what you’re doing. Which I guess you will be. However, I’ve never looked at this data before, so I would never see it

quoll 2023-12-13T20:25:25.932399Z

For now, you’ll have to rebuild them. But I should log a bug on this, and then the code to strip the first element can be put into the transaction, rather than the build-triples function

xificurC 2023-12-13T20:26:17.247289Z

https://github.com/quoll/asami/wiki/5.-Entity-Structure#basic-structure doesn't the print of tx-data here show the missing add/retract information?

quoll 2023-12-13T20:27:42.155619Z

Hmmm… I made a point of running code before putting it into the docs, so maybe I’m missing something

xificurC 2023-12-13T20:31:23.759839Z

using the above code, running (with (->db) [{:db/id :a/foo :foo 1}]) throws

quoll 2023-12-13T20:36:26.164009Z

OK… I found what I was missing. It happens in common_index.cljc

quoll 2023-12-13T20:36:38.420249Z

graph-transact is in there

quoll 2023-12-13T20:38:02.642699Z

It removes the retractions, and if they result in a deletion, then it constructs the Datom for them and appends it. It then does the same with additions… if the triple is added then a Datom is created and appended
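
A minimal sketch of the behaviour described above, assuming the graph is modelled as a plain set of [e a v] triples. The name `transact-with-log` and the datom shape [e a v tx added?] are hypothetical and are not Asami's actual `graph-transact` or Datom type; the point is only that a datom is emitted when a retraction really deletes something or an addition really inserts something.

```clojure
;; Sketch only: the graph is a plain set of [e a v] triples.
;; `transact-with-log` and the datom shape are hypothetical, not Asami's API.
(defn transact-with-log
  [graph tx-id additions retractions]
  (let [;; only retractions that are actually present cause a deletion
        removed       (distinct (filter graph retractions))
        after-removal (reduce disj graph removed)
        ;; only triples that are not already present are really added
        added         (distinct (remove after-removal additions))
        new-graph     (into after-removal added)
        datom         (fn [added? [e a v]] [e a v tx-id added?])]
    {:graph   new-graph
     ;; retraction datoms first, then addition datoms
     :tx-data (concat (map (partial datom false) removed)
                      (map (partial datom true) added))}))
```

With this shape, the add/retract information lives in the last element of each datom, so nothing has to be re-tagged after the fact.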

quoll 2023-12-13T20:38:19.061869Z

So you were right the first time, and there was no bug in my code. The bug is in my head

xificurC 2023-12-13T21:09:02.364549Z

the exception from above worries me more 🙂

quoll 2023-12-13T21:12:19.111059Z

Oh, I think I might have started looking for specific node types, and throwing exceptions when I don’t get them. Let me look

xificurC 2023-12-13T21:12:53.103629Z

is {:db/id :a/foo} not valid? What can I pass as an id

quoll 2023-12-13T21:17:35.443199Z

You’re supposed to pass nodes that were created for the graph. I check for this, even though it’s not really needed. (It indicates that you probably had some unexpected data if you don’t pass this test)

quoll 2023-12-13T21:18:31.349179Z

You create a node with:

(zuko.node/new-node the-graph)

quoll 2023-12-13T21:20:13.116939Z

But if it’s in memory, it just creates a keyword that looks like :a/node-123

quoll 2023-12-13T21:21:22.200119Z

It’s looking for a keyword with a namespace of "a" and a name that starts with "node-"
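
The check described here can be sketched as a small predicate. `looks-like-node?` is a hypothetical name used for illustration; it is not the real `asami.graph/node-type?`, which may test more than this.

```clojure
(require '[clojure.string :as str])

;; Hypothetical predicate mirroring the description above:
;; a keyword in the "a" namespace whose name starts with "node-".
(defn looks-like-node?
  [x]
  (and (keyword? x)
       (= "a" (namespace x))
       (str/starts-with? (name x) "node-")))
```

Under this rule :a/node-123 passes, while :a/foo fails because its name does not start with "node-".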

xificurC 2023-12-13T21:21:36.140949Z

that makes the API above (`with`) tricky, since you take a db and a tx. I'd like to specify my own node id as a keyword (preferably arbitrary, not in the a namespace). I cannot do that because new-node wants the graph as an argument

quoll 2023-12-13T21:22:05.826139Z

Yes, it’s the graph that defines how it wants its own nodes made

xificurC 2023-12-13T21:22:11.857609Z

(with (->db) [{:db/id ??}])

xificurC 2023-12-13T21:27:37.899489Z

https://github.com/quoll/asami/wiki/5.-Entity-Structure#dbid says you can define your own ids, or am I misreading

quoll 2023-12-13T21:28:17.714289Z

https://github.com/quoll/asami/issues/18 Until then you could use with-redefs on asami.graph/node-type? ?

quoll 2023-12-13T21:29:20.888239Z

the part I’m looking at is this:

quoll 2023-12-13T21:29:45.646159Z

That’s saying that you can specify an existing node… or a node you create on your own.

quoll 2023-12-13T21:30:04.465189Z

But not one that you’ve built manually

quoll 2023-12-13T21:30:28.006419Z

You’ll note that the example node follows the pattern: :a/node-XXXX

quoll 2023-12-13T21:30:39.699559Z

Which is what the test expects

xificurC 2023-12-13T21:50:21.080219Z

I see. Thanks. An alternative is to use tempids I guess, it's just more cumbersome

quoll 2023-12-13T22:10:39.150569Z

I’m sorry. I would do a release where I address it, but I don’t have time for a few days

quoll 2023-12-12T13:51:15.222659Z

Just like Datomic Pro (which was Datomic local) there is no need for a with-db. Just use a normal db

xificurC 2023-12-12T13:57:43.012119Z

help me finish the picture, using a normal db will work for queries, but what if I want to fork the db or "speculatively" apply transactions. Think git branch

quoll 2023-12-12T14:03:18.820089Z

I actually prevent a with database from being saved, but there’s no reason they can’t be. The reason I never allowed it was because I don’t have a good strategy for merging (or rebasing) branches. I asked a few people, but no one saw the need

xificurC 2023-12-12T14:08:42.885249Z

I'm exploring an in memory use case, so saving is not interesting. What could be interesting is applying some transactions and compare query results from db1 and db2. I understand if db1 and db2 share the same root (all db1 txs are in db2) we don't need branching. This is all theoretical at this point

quoll 2023-12-12T14:26:59.450599Z

5 more minutes and I’ll be back at my keyboard. I can talk better then

quoll 2023-12-12T14:33:39.478869Z

OK… I’m here

quoll 2023-12-12T14:34:12.473009Z

It may be helpful to explain the architecture

quoll 2023-12-12T14:34:29.149189Z

I’ll stick to in-memory, though on-disk is similar

quoll 2023-12-12T14:36:21.937419Z

It’s layered. Everything actually happens in the Graph. This is a protocol in https://github.com/quoll/asami/blob/main/src/asami/graph.cljc and the in-memory version is just a set of supporting functions around 3 PersistentHashMaps: https://github.com/quoll/asami/blob/main/src/asami/index.cljc

quoll 2023-12-12T14:37:03.521249Z

(There’s a historical simpler version too, if that would help)

quoll 2023-12-12T14:37:37.318439Z

Wrapping the graph is the Database (or “db”). This is just a protocol to make it look like Datomic.

quoll 2023-12-12T14:38:44.024539Z

Finally, there is an object called the Connection. This holds atoms that point to the latest Database, and an array of all the old databases.

quoll 2023-12-12T14:39:20.754389Z

If you do a query, it asks the Connection for the latest database, then asks the database for the graph that it wraps, and then queries the graph.

quoll 2023-12-12T14:40:49.683049Z

If you do a transaction, then it asks the Connection for the latest database, then asks the database for the graph that it wraps, then does the transaction on the graph. Once the transaction is done, it gets wrapped in a new Database object, and this Database is conjed onto the array that the Connection holds, and the “latest” database reference is updated.

quoll 2023-12-12T14:41:53.797129Z

If you want to do a with, then you’re just getting the graph (as usual), and working with it normally. Any transactions still get wrapped in a Database, but the Connection doesn’t get updated.

quoll 2023-12-12T14:42:37.153479Z

The Connection is mutable, but Database/Graphs are immutable, just like any other Clojure objects (because that’s what they are)
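
A toy model of the layering described above, assuming a graph is just a set of triples and a database is a map wrapping it. Every name here (`new-connection`, `apply-tx`, `transact!`, `with-db`) is hypothetical; the sketch only shows why a with needs no Connection: it is the same pure step, minus the state update.

```clojure
;; Hypothetical model; none of these names are Asami's.
(defn new-connection []
  ;; the Connection is the only mutable piece
  (atom {:latest {:graph #{}} :history []}))

(defn apply-tx
  "Pure: returns a new database wrapping the updated graph."
  [db additions retractions]
  {:graph (into (reduce disj (:graph db) retractions) additions)})

(defn transact!
  "Mutates the connection: conjes the new db onto the history and marks it latest."
  [conn additions retractions]
  (:latest
   (swap! conn (fn [{:keys [latest history]}]
                 (let [db (apply-tx latest additions retractions)]
                   {:latest db :history (conj history db)})))))

(defn with-db
  "Speculative: the same pure step, but the connection is never touched."
  [db additions retractions]
  (apply-tx db additions retractions))
```

Because databases are immutable values, the result of `with-db` can be queried next to the database it was derived from, which is the db1 versus db2 comparison mentioned earlier.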

quoll 2023-12-12T14:46:18.303249Z

Oh, dammit. I thought I had with checked in. Ugh

quoll 2023-12-12T14:46:43.492999Z

You can still do it with in-memory just fine. But I need it transparent for the on-disk version

quoll 2023-12-12T14:53:24.623779Z

Get the graph, transact against it, and then wrap it in a new Database. Unfortunately, the “transact” part is not standard. Instead you need a seq of triples to add, and a seq of triples to remove:

(defn my-with
  [connection new-triples remove-triples]
  (let [graph (core/as-graph connection)
        new-graph (graph/graph-transact graph new-triples remove-triples)]
    (memory/as-database new-graph)))

quoll 2023-12-12T14:54:08.976079Z

This is mostly what a transaction does anyway 🙂

quoll 2023-12-12T14:55:53.615209Z

Except the transact has to deal with a whole lot of other possible parameters. e.g. nested maps, [:db/add …] and [:db/retract …] elements, seqs of triples… lots of stuff.

xificurC 2023-12-12T14:57:40.129349Z

is the with on your pc the my-with you posted here or one that papers over the internals and takes a datomic-like transaction

quoll 2023-12-12T15:00:41.399439Z

Oh, it uses the whole transaction operation structure (the entity handling is the hardest part there). The one I’ve been working on is not released because it deals with persistent databases. If it’s in memory, then it’s just what I said. But if it’s persistent, then I need a https://github.com/quoll/asami/blob/main/src/asami/wrapgraph.cljc between a transaction commit-point on disk, and an in-memory graph.

quoll 2023-12-12T15:01:25.242029Z

In general, I don’t want on-disk and in-memory to have any differences, so hiding the difference here is what’s delayed it. But in-memory is as trivial as what I showed you

xificurC 2023-12-12T15:01:59.569369Z

except it expects a list of new and remove triples 🙂

quoll 2023-12-12T15:14:35.914639Z

Yes. But that’s what I usually have. 🙂 If I’m getting :db/add and :db/retract statements, then you can just filter for those values in the transaction seq, and then map them to the subvec that skips the first value. The actual implementation is harder, because negative numbers require new nodes in the graph, and… Well, look for yourself: https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/entities.cljc#L154
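
The filter-and-subvec step mentioned above could look roughly like this. `split-statements` is a hypothetical helper; it only handles [:db/add …] and [:db/retract …] vectors and silently ignores entity maps, tempids and everything else the real build-triples deals with.

```clojure
;; Hypothetical helper: split a transaction seq into the two triple seqs
;; that a my-with style function expects.
(defn split-statements
  [tx]
  (let [triples-for (fn [op]
                      (->> tx
                           ;; keep only statements tagged with this operation
                           (filter #(and (vector? %) (= op (first %))))
                           ;; drop the leading :db/add or :db/retract
                           (map #(subvec % 1))))]
    {:new-triples    (triples-for :db/add)
     :remove-triples (triples-for :db/retract)}))
```

Note that `subvec` shares structure with the original vector, so skipping the first element does not copy the statement.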

xificurC 2023-12-12T15:16:06.823069Z

yikes

quoll 2023-12-12T15:18:34.563839Z

Yeah… sorry!

quoll 2023-12-12T15:19:15.125559Z

You’ll see there are LOTS of comments in there, because it would be impossible for anyone (including me!!!) to navigate it otherwise

quoll 2023-12-12T15:20:19.876619Z

A function should do just ONE thing with ONE data structure. But when a function has to handle many, many types of data structures, like Datomic transactions do… well then I have to do it even if I don’t like it

quoll 2023-12-12T15:21:32.657659Z

That’s all assuming there’s no map in the structure. If there’s a map, then it’s an entity, and that means building triples from the entity. That’s in the same file on lines 55-136

xificurC 2023-12-12T15:22:27.758019Z

do you dislike the datomic API?

quoll 2023-12-12T15:22:46.401949Z

line 174 does the (if (map? obj) …) and then sends it off to entity-triples if it is.

quoll 2023-12-12T15:23:43.343779Z

In general, I like it a lot. But the transaction API takes a single seq, and that seq can hold a lot of different kinds of things in it. That makes it easy for users, which I like, but OMG… it’s a mess behind the scenes, which I don’t like!

quoll 2023-12-12T15:24:49.892369Z

I like programming interfaces to handle one type of data per function, but user interfaces typically need to be the opposite

quoll 2023-12-12T15:25:21.410849Z

Basically, I don’t like users 🙃 🤣

quoll 2023-12-12T15:25:40.053219Z

(I do like users. They’re the reason we even do this. But user interactions are hard)

quoll 2023-12-12T15:27:11.103639Z

The build-triples function got more and more complex as I supported more and more features. I’m looking at it now, and realize that I should refactor it. It should be a simple loop over the transaction data, which dispatches as appropriate.

quoll 2023-12-12T15:29:05.037489Z

Actually… it is a simple loop. But instead of “dispatch” it handles everything in a single function (`add-triples` is inline from 168-241). That should be broken up into multiple functions, and the add-triples function becomes a simple cond or something like that.

xificurC 2023-12-12T15:29:16.831479Z

maybe you could load it in an in-memory graph and perform rewrites on it 😉

quoll 2023-12-12T15:29:25.684689Z

Hah

quoll 2023-12-12T15:29:39.453439Z

I have a code parser that converts Clojure into a graph

quoll 2023-12-12T15:29:47.044119Z

And does it on Datomic even!

xificurC 2023-12-12T15:29:52.805999Z

do show

quoll 2023-12-12T15:30:14.308259Z

I should switch to using Borkdude’s parser though, since I based it on Rich’s Java-based parser

quoll 2023-12-12T15:30:42.933759Z

> do show OK, but please forgive me… it was written in 2015!

quoll 2023-12-12T15:32:27.541929Z

Found it! https://github.com/quoll/cast

quoll 2023-12-12T15:33:34.327969Z

BTW, I have to thank you. It’s conversations like that that reinvigorate me to get back to working on it

xificurC 2023-12-12T15:41:45.374289Z

I like to understand a library and its state before thinking of adopting it. Conversations like this help me make an educated decision, so thank you too 😉

quoll 2023-12-12T15:52:52.517049Z

I think a good way to understand Asami is to see that it was built as a very simple system, and then expanded as features were added. But the essential architecture remained the same throughout. So if you look at the original version, then that may help

quoll 2023-12-12T15:54:17.140469Z

The first iteration of the Graph protocol had just 3 functions: https://github.com/threatgrid/naga/blob/d149904da8ecb510ddbbb5b73824c74da4b05d77/src/naga/storage/memory/index.clj#L44-L47

quoll 2023-12-12T15:54:45.955579Z

The function implementations are very short too

xificurC 2023-12-14T09:00:25.595499Z

no worries

xificurC 2023-12-12T10:39:35.115219Z

what would it take to replicate the entity API? At least like (d/get node :name) and reverse lookup (d/get node :_friends)

xificurC 2023-12-13T12:51:10.608299Z

to compare with datascript - https://github.com/tonsky/datascript/blob/85f9b5d4deadba2841fa6e1ef5cc14225fcb4987/src/datascript/impl/entity.cljc#L166 • :db/id check • reverse lookup conditional • cache check • I don't know what touched checks • sift through the datoms • update cache

quoll 2023-12-13T20:14:40.052629Z

Yeah… I like mine more 🙂 I was never quite happy with the way datascript does things, though I know that others are not happy about Asami, so I guess we’re even

xificurC 2023-12-13T20:16:54.170799Z

the additional conditionals aside (I don't understand why they are necessary) I think the only major difference is caching, no? Given a not-in-memory database reads are expensive. I think datomic caches the entity lookups too

quoll 2023-12-13T20:19:58.331129Z

Actually… no, it doesn’t. It relies on the indexes being fast. This is possibly a good candidate for caching.

quoll 2023-12-13T20:22:16.989219Z

That said… the in-memory indexes ARE fast (they’re just a call to (some-> spo node attribute)) And the on-disk indexes ARE cached, at a few different levels.

xificurC 2023-12-13T20:37:24.764209Z

the get from above for transacted {:foo 1} querying :foo returns ([1]). Why the double nesting? Is ffirst safe here? The entity API typically returns a single (latest) value for normal and a sequence for reverse lookup

quoll 2023-12-13T20:41:27.984939Z

Well, you’re right… it’s assuming that every attribute can have multi-cardinality. The first is necessary. The remaining [1] is the full collection of values for that attribute. If you use ffirst then you’ll always get the first value. So long as you never store more than one value in that attribute, then it will always return the correct thing.

quoll 2023-12-13T20:45:29.677839Z

Here’s an example: Say you have 2 nodes representing people: :node1 and :node2 If the first has a name of “Alice” and an age of 32, and the second is “Bob” with an age of 31, then the spo index will look like this:

{:node1 {:name ["Alice"]
         :age [32]}
 :node2 {:name ["Bob"]
         :age [31]}}
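
To make the shape above concrete, here is a sketch that builds such a nested map from triples and reads one attribute back. It follows the simplified picture in this message (a vector of values per attribute); Asami's real in-memory index may use different leaf collections, and `add-triple` and `attr-values` are hypothetical names.

```clojure
;; Hypothetical helpers over a plain nested map, not Asami's index code.
(defn add-triple
  "Adds [s p o] to a subject -> predicate -> [objects] map."
  [spo [s p o]]
  (update-in spo [s p] (fnil conj []) o))

(def spo
  (reduce add-triple {}
          [[:node1 :name "Alice"] [:node1 :age 32]
           [:node2 :name "Bob"]   [:node2 :age 31]]))

(defn attr-values
  "Returns the full collection of values, or nil if the node or attribute is absent."
  [spo node attr]
  (some-> spo node attr))
```

Here `(attr-values spo :node2 :name)` yields the whole collection for that attribute, and taking `first` of it gives the single value when the attribute is only ever stored once.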

xificurC 2023-12-13T20:47:20.193179Z

a quick benchmark against datascript, can we do better?

quoll 2023-12-13T20:47:27.563379Z

Normally, if you look up the name for :node2 you would use: (ffirst (graph/resolve-triple spo :node2 :name '?name))

quoll 2023-12-13T20:48:07.719019Z

Oh well… they’re faster 🙂

xificurC 2023-12-13T20:48:55.971839Z

well I've run https://github.com/lambdaisland/datalog-benchmarks/blob/main/src/datalog_benchmarks/scratch.clj where asami is 50x faster on my machine for both queries

quoll 2023-12-13T20:49:33.920819Z

The issue may have nothing to do with the insertion of data. I think the most expensive part is the translation of the map into triples.

xificurC 2023-12-13T20:50:06.039529Z

I'm only timing the "get" which I called ? here, the entity attribute retrieval

quoll 2023-12-13T20:50:26.201519Z

Triples lookups are reasonably good on Asami. But building triples out of entities…. yes, that’s expensive. I did triple the speed a few years ago, so it used to be even slower

quoll 2023-12-13T20:51:14.881259Z

I had a JSON dataset that took 6 seconds to load, and I took it down to less than a second

quoll 2023-12-13T20:51:26.287989Z

So it used to be worse

xificurC 2023-12-13T20:51:42.278219Z

you're talking about insertion but the timings are for reading, or am I missing something

quoll 2023-12-13T21:01:24.164599Z

Oh, I wasn’t reading properly (I’m in a block of code for work. I’m also about to get on a call)

quoll 2023-12-13T21:01:48.829069Z

I don’t know what your ? is doing. Is it calling entity? Is it doing something else?

xificurC 2023-12-13T21:04:53.794559Z

thought I shared it, my bad. I rewrote it just now, now it beats datascript 😛

xificurC 2023-12-13T21:06:05.725789Z

I profiled it and >50% of time was spent in resolve-triple, and most of its time was in get-from-index. I know which index I need so I can just use it directly

quoll 2023-12-13T21:06:24.647609Z

If you want to get an attribute from a database, then you can define something like:

(defn get-attr [db node attr]
  (some-> db :graph :spo node attr first))

xificurC 2023-12-13T21:07:31.765959Z

almost, you need to get the graph first right

quoll 2023-12-13T21:07:55.345899Z

get-from-index is just how I map from the arguments to the index that’s needed, and how to look that index up. There’s probably something faster I can do there 🤔

quoll 2023-12-13T21:08:07.661359Z

Yes, the graph… let me edit it

quoll 2023-12-13T21:08:42.870629Z

There’s no testing needed for the first few levels so…

quoll 2023-12-13T21:09:18.541449Z

(defn get-attr [db node attr]
  (some-> ((db :graph) :spo) node attr first))

quoll 2023-12-13T21:10:33.057659Z

I can’t remember the results, but I recall using Criterium to check which of the following was fastest:

(index :key)
(:key index)
(get index :key)
There’s a small difference. From memory, the get function is not the fastest one

xificurC 2023-12-12T10:43:20.689469Z

the use case: given a root node traverse a tree lazily. Instead of pulling the whole nodes pull a subset of the eavs, or one by one on demand. I see pull query API is missing too

quoll 2023-12-12T13:52:31.848049Z

Pull is missing, yes. Mostly because I never learned it 😳

quoll 2023-12-12T13:53:16.318739Z

Doing a get like that is a one-liner. 3 lines if it’s handling reverse directions like that

xificurC 2023-12-12T13:54:50.317069Z

you mean transforming it into a d/q query. I thought some way of direct read of the EAV would be more efficient? For the non-reverse lookup

quoll 2023-12-12T13:58:12.612109Z

No, the d/q function is definitely not the way to do it. I’d extract the graph object, and look up the index directly

quoll 2023-12-12T14:00:03.884779Z

I’m AFK right now, but the answer to “what would it take” is “very little”. The graph API is how all this sort of work is done (it’s how both the entity and q functions are built)

xificurC 2023-12-12T14:58:40.168779Z

and the graph API allows reverse lookups as well?

quoll 2023-12-12T15:03:14.746929Z

Well… it allows lookups. If you look up: [entity attribute value] and you check attribute and discover that (= \_ (first (name attribute))) then you change it to look up: [value attribute entity]

xificurC 2023-12-12T15:09:23.701529Z

by lookup you mean a correct asami.graph/resolve-triple incantation?

quoll 2023-12-12T15:10:50.957349Z

So, get becomes:

(defn get
 [db node attr]
 (let [g (core/as-graph db)]
   (if (and (keyword? attr)
            (= \_ (first (or (namespace attr) (name attr)))))
     (graph/resolve-triple g '?v attr node)
     (graph/resolve-triple g node attr '?v))))
P.S. sorry to take so long. Slack is possibly my least-favorite Clojure IDE!
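
One detail the snippet above leaves open is whether the underscore-prefixed attribute should be turned back into the stored attribute before the lookup. The sketch below assumes it should; that is an assumption, not something confirmed in this thread. It only builds the lookup pattern and does not call Asami, and `reverse-attr?`, `forward-attr` and `lookup-pattern` are hypothetical names.

```clojure
;; Hypothetical helpers; they build a pattern and never touch a graph.
(defn reverse-attr?
  "True for :_friends and :_person/friends, false for :person/_friends."
  [attr]
  (and (keyword? attr)
       (= \_ (first (or (namespace attr) (name attr))))))

(defn forward-attr
  "Strips the leading underscore: :_friends -> :friends,
   :_person/friends -> :person/friends."
  [attr]
  (if-let [ns-part (namespace attr)]
    (keyword (subs ns-part 1) (name attr))
    (keyword (subs (name attr) 1))))

(defn lookup-pattern
  "Returns the [e a v] pattern to resolve, swapping ends for reverse attributes."
  [node attr]
  (if (reverse-attr? attr)
    ['?v (forward-attr attr) node]
    [node attr '?v]))
```

The resulting vector could then be applied to a resolve-triple style function, keeping the direction logic separate from the index access.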

xificurC 2023-12-12T15:12:21.576339Z

ha, datomic allows both :_person/friends and :person/_friends?

xificurC 2023-12-12T15:13:29.061079Z

thanks for explaining the architecture, makes the whole thing clearer for me. Your wiki pages are also helpful in this regard

quoll 2023-12-12T15:15:58.718949Z

> ha, datomic allows both :_person/friends and :person/_friends? Oh… did I get that wrong? I tried to make it only work with :_person/friends and :_friends, but not with :person/_friends

xificurC 2023-12-12T15:16:40.395179Z

no, it's just me parsing it incorrectly

quoll 2023-12-12T15:17:38.633949Z

Well, you have me testing it 🙂

quoll 2023-12-12T15:18:02.290069Z