#asami
2023-12-12
xificurC09:12:02

is there support for with-db capabilities in asami? Reading through the channel history and issues it seems yes. If so, is there a reason this API hasn't been copied over from datomic?

quoll13:12:15

Just like Datomic Pro (which was Datomic local) there is no need for a with-db. Just use a normal db

xificurC13:12:43

Help me finish the picture: using a normal db will work for queries, but what if I want to fork the db or "speculatively" apply transactions? Think git branch

quoll14:12:18

I actually prevent a with database from being saved, but there’s no reason they can’t be. The reason I never allowed it was because I don’t have a good strategy for merging (or rebasing) branches. I asked a few people, but no one saw the need

xificurC14:12:42

I'm exploring an in memory use case, so saving is not interesting. What could be interesting is applying some transactions and compare query results from db1 and db2. I understand if db1 and db2 share the same root (all db1 txs are in db2) we don't need branching. This is all theoretical at this point

quoll14:12:59

5 more minutes and I’ll be back at my keyboard. I can talk better then

quoll14:12:39

OK… I’m here

quoll14:12:12

It may be helpful to explain the architecture

quoll14:12:29

I’ll stick to in-memory, though on-disk is similar

quoll14:12:21

It’s layered. Everything actually happens in the Graph. This is a protocol in https://github.com/quoll/asami/blob/main/src/asami/graph.cljc and the in-memory version is just a set of supporting functions around 3 PersistentHashMaps: https://github.com/quoll/asami/blob/main/src/asami/index.cljc
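The three maps are alternate orderings of the same triples. A hedged sketch (the index names spo, pos, and osp come from asami.index; the data here is invented, with leaf vectors mirroring the example later in this conversation):

```clojure
(def spo {:node1 {:name ["Alice"]}})   ;; subject -> predicate -> objects
(def pos {:name {"Alice" [:node1]}})   ;; predicate -> object -> subjects
(def osp {"Alice" {:node1 [:name]}})   ;; object -> subject -> predicates

;; A triple pattern is answered by picking the index whose bound terms come first:
(get-in spo [:node1 :name]) ;; => ["Alice"]
```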

quoll14:12:03

(There’s a historical simpler version too, if that would help)

quoll14:12:37

Wrapping the graph is the Database (or “db”). This is just a protocol to make it look like Datomic.

quoll14:12:44

Finally, there is an object called the Connection. This holds atoms that point to the latest Database, and an array of all the old databases.

quoll14:12:20

If you do a query, it asks the Connection for the latest database, then asks the database for the graph that it wraps, and then queries the graph.

quoll14:12:49

If you do a transaction, then it asks the Connection for the latest database, then asks the database for the graph that it wraps, then does the transaction on the graph. Once the transaction is done, the result gets wrapped in a new Database object, this Database is conjed onto the array that the Connection holds, and the Connection’s “latest” pointer is updated.
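A minimal, self-contained sketch of that flow. Every name here (transact*, :latest, :history, apply-triples) is invented for illustration; Asami's real transact handles far more:

```clojure
(defn apply-triples
  "Pure function: assert triples into a nested {entity {attr [values]}} map."
  [graph assertions]
  (reduce (fn [g [e a v]] (update-in g [e a] (fnil conj []) v))
          graph
          assertions))

(defn transact*
  [conn assertions]
  (let [db     @(:latest conn)        ;; ask the Connection for the latest db
        graph  (:graph db)            ;; the immutable graph it wraps
        graph' (apply-triples graph assertions)
        db'    {:graph graph'}]       ;; wrap the new graph in a new Database
    (swap! (:history conn) conj db')  ;; conj onto the array of old databases
    (reset! (:latest conn) db')       ;; move the "latest" pointer
    db'))

;; usage
(def conn {:latest (atom {:graph {}}) :history (atom [])})
(transact* conn [[:node1 :name "Alice"]])
@(:latest conn) ;; => {:graph {:node1 {:name ["Alice"]}}}
```

A with-style speculative transaction is the same sketch minus the swap!/reset! lines: the new Database is returned but the Connection is never touched.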

quoll14:12:53

If you want to do a with, then you’re just getting the graph (as usual), and working with it normally. Any transactions still get wrapped in a Database, but the Connection doesn’t get updated.

quoll14:12:37

The Connection is mutable, but Database/Graphs are immutable, just like any other Clojure objects (because that’s what they are)

quoll14:12:18

Oh, dammit. I thought I had with checked in. Ugh

quoll14:12:43

You can still do it with in-memory just fine. But I need it transparent for the on-disk version

quoll14:12:24

Get the graph, transact against it, and then wrap it in a new Database. Unfortunately, the “transact” part is not standard. Instead you need a seq of triples to add, and a seq of triples to remove:

(defn my-with
  [connection new-triples remove-triples]
  (let [graph (core/as-graph connection)   ;; extract the immutable graph
        new-graph (graph/graph-transact graph new-triples remove-triples)]
    (memory/as-database new-graph)))       ;; wrap it in a fresh Database

quoll14:12:08

This is mostly what a transaction does anyway 🙂

quoll14:12:53

Except the transact has to deal with a whole lot of other possible parameters. e.g. nested maps, [:db/add …] and [:db/retract …] elements, seqs of triples… lots of stuff.

xificurC14:12:40

is the with on your PC the my-with you posted here, or one that papers over the internals and takes a Datomic-like transaction?

quoll15:12:41

Oh, it uses the whole transaction operation structure (the entity handling is the hardest part there). The one I’ve been working on is not released because it deals with persistent databases. If it’s in memory, then it’s just what I said. But if it’s persistent, then I need a https://github.com/quoll/asami/blob/main/src/asami/wrapgraph.cljc between a transaction commit-point on disk, and an in-memory graph.

quoll15:12:25

In general, I don’t want on-disk and in-memory to have any differences, so hiding the difference here is what’s delayed it. But in-memory is as trivial as what I showed you

xificurC15:12:59

except it expects a list of new and remove triples 🙂

quoll15:12:35

Yes. But that’s what I usually have. 🙂 If I’m getting :db/add and :db/retract statements, then I can just filter for those values in the transaction seq, and then map them to the subvec that skips the first value. The actual implementation is harder, because negative numbers require new nodes in the graph, and… Well, look for yourself: https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/entities.cljc#L154
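The filter-and-subvec step described here can be sketched like this (the transaction data and the helper name triples-for are invented):

```clojure
(def tx-data [[:db/add :n1 :name "Alice"]
              [:db/retract :n1 :age 31]
              [:db/add :n1 :age 32]])

(defn triples-for
  "Keep the statements tagged with op, dropping the leading op keyword."
  [op tx]
  (map #(subvec % 1) (filter #(= op (first %)) tx)))

(triples-for :db/add tx-data)      ;; => ([:n1 :name "Alice"] [:n1 :age 32])
(triples-for :db/retract tx-data)  ;; => ([:n1 :age 31])
```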

quoll15:12:34

Yeah… sorry!

quoll15:12:15

You’ll see there are LOTS of comments in there, because it would be impossible for anyone (including me!!!) to navigate it otherwise

quoll15:12:19

A function should do just ONE thing with ONE data structure. But when a function can handle many, many types of data structures like Datomic transactions do… well, then I have to do it even if I don’t like it

quoll15:12:32

That’s all assuming there’s no map in the structure. If there’s a map, then it’s an entity, and that means building triples from the entity. That’s in the same file on lines 55-136

xificurC15:12:27

do you dislike the datomic API?

quoll15:12:46

line 174 does the (if (map? obj) …) and then sends it off to entity-triples if it is.

quoll15:12:43

In general, I like it a lot. But the transaction API take a single seq, and that seq can hold a lot of different kinds of things in it. That makes it easy for users, which I like, but OMG… it’s a mess behind the scenes, which I don’t like!

quoll15:12:49

I like programming interfaces to handle one type of data per function, but user interfaces typically need to be the opposite

quoll15:12:21

Basically, I don’t like users 🙃 :rolling_on_the_floor_laughing:

quoll15:12:40

(I do like users. They’re the reason we even do this. But user interactions are hard)

quoll15:12:11

The build-triples function got more and more complex as I supported more and more features. I’m looking at it now, and realize that I should refactor it. It should be a simple loop over the transaction data, which dispatches as appropriate.

quoll15:12:05

Actually… it is a simple loop. But instead of “dispatch” it handles everything in a single function (`add-triples` is inline from 168-241). That should be broken up into multiple functions, and the add-triples function becomes a simple cond or something like that.

xificurC15:12:16

maybe you could load it in an in-memory graph and perform rewrites on it 😉

quoll15:12:39

I have a code parser that converts Clojure into a graph

quoll15:12:47

And does it on Datomic even!

quoll15:12:14

I should switch to using Borkdude’s parser though, since I based mine on Rich’s Java-based parser

quoll15:12:42

> do show
OK, but please forgive me… it was written in 2015!

quoll15:12:34

BTW, I have to thank you. It’s conversations like this that reinvigorate me to get back to working on it

xificurC15:12:45

I like to understand a library and its state before thinking of adopting it. Conversations like this help me make an educated decision, so thank you too 😉

quoll15:12:52

I think a good way to understand Asami is to see that it was built as a very simple system, and then expanded as features were added. But the essential architecture remained the same throughout. So if you look at the original version, then that may help

quoll15:12:45

The function implementations are very short too

xificurC15:12:49

is there a path to get (defn in-memory-with [db tx] ..)? I.e. support the full transaction data API. The my-with you shared above expects new and remove triples (and takes a connection, which is irrelevant to this discussion)

xificurC15:12:47

when working in memory I'd prefer to instantiate an empty db and with over it as necessary, avoiding the entire connection API

xificurC16:12:22

IIUC it's build-triples

xificurC16:12:38

does this look OK to you?

quoll20:12:35

Thank you… you found a bug for me! 🙂 The :tx-data isn’t right, since the add/`remove` seqs will contain triples (each triple is a 3-element vector). So the resulting :tx-data will just be a large list of triples that were removed and added, with no explanation of which is which. I would build these lazily:

(concat (map #(apply vector :db/retract %) remove)
        (map #(apply vector :db/add %) add))

(I might have done it differently, and not created this as a seq of triple vectors if I was starting again, but this is where I am) 🙂

quoll20:12:00

It survived as a bug because I have never looked at this output

quoll20:12:49

I just looked at the original transact-async again, and I remember what a mess it is. I was trying to add a lot of functionality without changing the architecture. This included accumulating the triples as a transaction log (I chose a volatile for this due to speed). The way it passes back and forth between functions in core and the transact functions in the Connection implementations is dizzying. (Sorry!)

xificurC20:12:07

rebuilding the vectors seems wasteful indeed. I thought build-triples could return the triples with add and retract markers at the beginning, but it's the same amount of waste. Maybe :tx-data is a bad idea and one should split it into :tx-added and :tx-retracted

quoll20:12:59

Rebuilding is wasteful, yes… if that’s what you’re doing. Which I guess you will be. However, I’ve never looked at this data before, so I would never see it

quoll20:12:25

For now, you’ll have to rebuild them. But I should log a bug on this, and then the code to strip the first element can be put into the transaction, rather than the build-triples function

xificurC20:12:17

https://github.com/quoll/asami/wiki/5.-Entity-Structure#basic-structure doesn't the print of tx-data here show the missing add/retract information?

quoll20:12:42

Hmmm… I made a point of running code before putting it into the docs, so maybe I’m missing something

xificurC20:12:23

using the above code, running (with (->db) [{:db/id :a/foo :foo 1}]) throws

quoll20:12:26

OK… I found what I was missing. It happens in common_index.cljc

quoll20:12:38

graph-transact is in there

quoll20:12:02

It removes the retractions, and if they result in a deletion, then it constructs the Datom for them and appends it. It then does the same with additions… if the triple is added then a Datom is created and appended

quoll20:12:19

So you were right the first time, and there was no bug in my code. The bug is in my head

xificurC21:12:02

the exception from above worries me more 🙂

quoll21:12:19

Oh, I think I might have started looking for specific node types, and throwing exceptions when I don’t get them. Let me look

xificurC21:12:53

is {:db/id :a/foo} not valid? What can I pass as an id

quoll21:12:35

You’re supposed to pass nodes that were created for the graph. I check for this, even though it’s not really needed. (It indicates that you probably had some unexpected data if you don’t pass this test)

quoll21:12:31

You create a node with:

(zuko.node/new-node the-graph)

quoll21:12:13

But if it’s in memory, it just creates a keyword that looks like :a/node-123

quoll21:12:22

It’s looking for a keyword with a namespace of "a" and a name that starts with "node-"
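That test can be sketched as a small predicate. The name memory-node? and the exact check are guesses based on the description above, not Asami's actual implementation:

```clojure
(require '[clojure.string :as str])

(defn memory-node?
  "True for keywords shaped like the in-memory graph's generated nodes."
  [k]
  (and (keyword? k)
       (= "a" (namespace k))
       (str/starts-with? (name k) "node-")))

(memory-node? :a/node-123) ;; => true
(memory-node? :a/foo)      ;; => false
```

This is why the {:db/id :a/foo …} transaction above throws: the id is in the a namespace but doesn't match the node- pattern.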

xificurC21:12:36

that makes the API above (`with`) tricky, since you take a db and a tx. I'd like to specify my own node id as a keyword (preferably arbitrary, not in the a namespace). I cannot do that because new-node wants the graph as an argument

quoll21:12:05

Yes, it’s the graph that defines how it wants its own nodes made

xificurC21:12:11

(with (->db) [{:db/id ??}])

xificurC21:12:37

https://github.com/quoll/asami/wiki/5.-Entity-Structure#dbid says you can define your own ids, or am I misreading

quoll21:12:17

https://github.com/quoll/asami/issues/18
Until then you could use with-redefs on asami.graph/node-type? ?

quoll21:12:20

the part I’m looking at is this:

quoll21:12:45

That’s saying that you can specify an existing node… or a node you create on your own.

quoll21:12:04

But not one that you’ve built manually

quoll21:12:28

You’ll note that the example node follows the pattern: :a/node-XXXX

quoll21:12:39

Which is what the test expects

xificurC21:12:21

I see. Thanks. An alternative is to use tempids I guess, it's just more cumbersome

quoll22:12:39

I’m sorry. I would do a release where I address it, but I don’t have time for a few days

xificurC10:12:35

what would it take to replicate the entity API? At least like (d/get node :name) and reverse lookup (d/get node :_friends)

xificurC10:12:20

the use case: given a root node traverse a tree lazily. Instead of pulling the whole nodes pull a subset of the eavs, or one by one on demand. I see pull query API is missing too

quoll13:12:31

Pull is missing, yes. Mostly because I never learned it 😳

quoll13:12:16

Doing a get like that is a one-liner. 3 lines if it’s handling reverse directions like that

xificurC13:12:50

you mean transforming it into a d/q query. I thought some way of direct read of the EAV would be more efficient? For the non-reverse lookup

quoll13:12:12

No, the d/q function is definitely not the way to do it. I’d extract the graph object, and look up the index directly

quoll14:12:03

I’m AFK right now, but the answer to “what would it take” is “very little”. The graph API is how all this sort of work is done (it’s how both the entity and q functions are built)

xificurC14:12:40

and the graph API allows reverse lookups as well?

quoll15:12:14

Well… it allows lookups. If you look up:
[entity attribute value]
and you check attribute and discover that (= \_ (first (name attribute))), then you change it to look up:
[value attribute entity]

xificurC15:12:23

by lookup you mean a correct asami.graph/resolve-triple incantation?

👍 1
quoll15:12:50

So, get becomes:

(defn get
  [db node attr]
  (let [g (core/as-graph db)]
    (if (and (keyword? attr)
             (= \_ (first (or (namespace attr) (name attr)))))
      ;; reverse lookup: strip the leading underscore, then flip the pattern
      (let [a (if (namespace attr)
                (keyword (subs (namespace attr) 1) (name attr))
                (keyword (subs (name attr) 1)))]
        (graph/resolve-triple g '?v a node))
      (graph/resolve-triple g node attr '?v))))
P.S. sorry to take so long. Slack is possibly my least-favorite Clojure IDE!

xificurC15:12:21

ha, datomic allows both :_person/friends and :person/_friends?

xificurC15:12:29

thanks for explaining the architecture, makes the whole thing clearer for me. Your wiki pages are also helpful in this regard

quoll15:12:58

> ha, datomic allows both :_person/friends and :person/_friends?
Oh… did I get that wrong? I tried to make it only work with :_person/friends and :_friends, but not with :person/_friends

xificurC15:12:40

no, it's just me parsing it incorrectly

quoll15:12:38

Well, you have me testing it 🙂

xificurC12:12:10

to compare with datascript - https://github.com/tonsky/datascript/blob/85f9b5d4deadba2841fa6e1ef5cc14225fcb4987/src/datascript/impl/entity.cljc#L166
• :db/id check
• reverse lookup conditional
• cache check
• I don't know what touched checks
• sift through the datoms
• update cache

quoll20:12:40

Yeah… I like mine more 🙂 I was never quite happy with the way datascript does things, though I know that others are not happy about Asami, so I guess we’re even

xificurC20:12:54

the additional conditionals aside (which I don't understand why they are necessary), I think the only major difference is caching, no? Given a not-in-memory database, reads are expensive. I think datomic caches the entity lookups too

quoll20:12:58

Actually… no, it doesn’t. It relies on the indexes being fast. This is possibly a good candidate for caching.

quoll20:12:16

That said… the in-memory indexes ARE fast (they’re just a call to (some-> spo node attribute)) And the on-disk indexes ARE cached, at a few different levels.

xificurC20:12:24

the get from above for transacted {:foo 1} querying :foo returns ([1]). Why the double nesting? Is ffirst safe here? The entity API typically returns a single (latest) value for normal and a sequence for reverse lookup

quoll20:12:27

Well, you’re right… it’s assuming that every attribute can have multi-cardinality. The first is necessary. The remaining [1] is the full collection of values for that attribute. If you use ffirst then you’ll always get the first value. So long as you never store more than one value in that attribute, then it will always return the correct thing.

quoll20:12:29

Here’s an example: Say you have 2 nodes representing people, :node1 and :node2. If the first has a name of “Alice” and an age of 32, and the second is “Bob” with an age of 31, then the spo index will look like this:

{:node1 {:name ["Alice"]
         :age [32]}
 :node2 {:name ["Bob"]
         :age [31]}}

xificurC20:12:20

a quick benchmark against datascript, can we do better?

quoll20:12:27

Normally, if you look up the name for :node1 you would use: (ffirst (graph/resolve-triple spo :node1 :name '?name))

quoll20:12:07

Oh well… they’re faster 🙂

quoll20:12:33

The issue may have nothing to do with the insertion of data. I think the most expensive part is the translation of the map into triples.

xificurC20:12:06

I'm only timing the "get" which I called ? here, the entity attribute retrieval

quoll20:12:26

Triple lookups are reasonably good on Asami. But building triples out of entities… yes, that’s expensive. I did triple the speed a few years ago, so it used to be even slower

quoll20:12:14

I had a JSON dataset that took 6 seconds to load, and I took it down to less than a second

quoll20:12:26

So it used to be worse

xificurC20:12:42

you're talking about insertion but the timings are for reading, or am I missing something

quoll21:12:24

Oh, I wasn’t reading properly (I’m in a block of code for work. I’m also about to get on a call)

quoll21:12:48

I don’t know what your ? is doing. Is it calling entity? Is it doing something else?

xificurC21:12:53

thought I shared it, my bad. I rewrote it just now, now it beats datascript 😛

xificurC21:12:05

I profiled it and >50% of time was spent in resolve-triple, and most of its time was in get-from-index. I know which index I need so I can just use it directly

quoll21:12:24

If you want to get an attribute from a database, then you can define something like:

(defn get-attr [db node attr]
  (some-> db :graph :spo node attr first))

xificurC21:12:31

almost, you need to get the graph first, right?

quoll21:12:55

get-from-index is just how I map from the arguments to the index that’s needed, and how to look that index up. There’s probably something faster I can do there :thinking_face:

quoll21:12:07

Yes, the graph… let me edit it

quoll21:12:42

There’s no testing needed for the first few levels so…

quoll21:12:18

(defn get-attr [db node attr]
  (some-> ((db :graph) :spo) node attr first))

quoll21:12:33

I can’t remember the results, but I recall using Criterium to check which of the following was fastest:

(index :key)
(:key index)
(get index :key)
There’s a small difference. From memory, the get function is not the fastest one
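A comparison like that can be reproduced with Criterium; a hedged sketch, assuming the criterium dependency is on the classpath (the example index is invented):

```clojure
;; Benchmark the three map-lookup forms mentioned above.
(require '[criterium.core :as crit])

(def index {:key 42})

(crit/quick-bench (index :key))      ;; map invoked as a function
(crit/quick-bench (:key index))      ;; keyword invoked as a function
(crit/quick-bench (get index :key))  ;; explicit get call
```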