2023-12-12
# asami
is there support for `with-db` capabilities in asami? Reading through the channel history and issues it seems yes. If so, is there a reason this API hasn't been copied over from datomic?
Just like Datomic Pro (which was Datomic local) there is no need for a `with-db`. Just use a normal db
help me finish the picture: using a normal db will work for queries, but what if I want to fork the db or "speculatively" apply transactions? Think git branch
I actually prevent a `with` database from being saved, but there’s no reason they can’t be. The reason I never allowed it was because I don’t have a good strategy for merging (or rebasing) branches.
I asked a few people, but no one saw the need
I'm exploring an in-memory use case, so saving is not interesting. What could be interesting is applying some transactions and comparing query results from db1 and db2. I understand if db1 and db2 share the same root (all db1 txs are in db2) we don't need branching. This is all theoretical at this point
It’s layered. Everything actually happens in the `Graph`. This is a protocol in https://github.com/quoll/asami/blob/main/src/asami/graph.cljc and the in-memory version is just a set of supporting functions around 3 PersistentHashMaps: https://github.com/quoll/asami/blob/main/src/asami/index.cljc
Wrapping the graph is the Database (or “db”). This is just a protocol to make it look like Datomic.
Finally, there is an object called the Connection. This holds atoms that point to the latest Database, and an array of all the old databases.
If you do a query, it asks the Connection for the latest database, then asks the database for the graph that it wraps, and then queries the graph.
If you do a transaction, then it asks the Connection for the latest database, then asks the database for the graph that it wraps, then does the transaction on the graph. Once the transaction is done, it gets wrapped in a new Database object, and this Database is `conj`ed onto the array that the Connection holds, and the “latest” database pointer is updated.
If you want to do a `with`, then you’re just getting the graph (as usual), and working with it normally. Any transactions still get wrapped in a Database, but the Connection doesn’t get updated.
The Connection is mutable, but Database/Graphs are immutable, just like any other Clojure objects (because that’s what they are)
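To make that flow concrete, here is a minimal sketch using the Datomic-style asami.core API (connect / transact / db / q); the URI, entity data, and result shown here are placeholder assumptions, not taken from the conversation:

```clojure
;; minimal sketch of the query/transaction flow described above,
;; using the Datomic-style asami.core API (URI and data are placeholders)
(require '[asami.core :as d])

;; Connection: mutable, holds an atom pointing at the latest Database
(def conn (d/connect "asami:mem://example"))

;; A transaction builds a new Graph, wraps it in a new Database,
;; and updates the Connection's "latest" pointer
@(d/transact conn {:tx-data [{:db/ident "alice"
                              :name     "Alice"
                              :age      32}]})

;; A query asks the Connection for the latest Database, unwraps its Graph,
;; and resolves against the indexes
(d/q '[:find ?name
       :where [?e :name ?name]]
     (d/db conn))
;; => (["Alice"]) or a similar seq of bindings
```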
You can still do it with in-memory just fine. But I need it transparent for the on-disk version
Get the graph, transact against it, and then wrap it in a new Database. Unfortunately, the “transact” part is not standard. Instead you need a seq of triples to add, and a seq of triples to remove:
(defn my-with
  [connection new-triples remove-triples]
  (let [graph     (core/as-graph connection)
        new-graph (graph/graph-transact graph new-triples remove-triples)]
    (memory/as-database new-graph)))
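As a hypothetical usage of the sketch above: the original Connection is left untouched, and the node ids and triples here are placeholders (see the later discussion of node-type? for what Asami actually accepts as node ids):

```clojure
;; hypothetical usage of my-with: speculatively apply raw triples without
;; updating the Connection (node ids and values are placeholders)
(def speculative-db
  (my-with conn
           [[:a/node-1 :name "Alice"]   ; triples to add
            [:a/node-1 :age  32]]
           []))                         ; triples to remove
```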
Except the `transact` has to deal with a whole lot of other possible parameters, e.g. nested maps, `[:db/add …]` and `[:db/retract …]` elements, seqs of triples… lots of stuff.
is the `with` on your pc the `my-with` you posted here or one that papers over the internals and takes a datomic-like transaction?
Oh, it uses the whole transaction operation structure (the entity handling is the hardest part there). The one I’ve been working on is not released because it deals with persistent databases. If it’s in memory, then it’s just what I said. But if it’s persistent, then I need a https://github.com/quoll/asami/blob/main/src/asami/wrapgraph.cljc between a transaction commit-point on disk, and an in-memory graph.
In general, I don’t want on-disk and in-memory to have any differences, so hiding the difference here is what’s delayed it. But in-memory is as trivial as what I showed you
Yes. But that’s what I usually have. 🙂
If I’m getting `:db/add` and `:db/retract` statements, then you can just filter for those values in the transaction seq, and then map them to the subvec that skips the first value.
The actual implementation is harder, because negative numbers require new nodes in the graph, and… Well, look for yourself:
https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/entities.cljc#L154
You’ll see there are LOTS of comments in there, because it would be impossible for anyone (including me!!!) to navigate it otherwise
A function should do just ONE thing with ONE data structure. But when a function has to handle many, many types of data structures, like Datomic transactions do… well then I have to do it even if I don’t like it
That’s all assuming there’s no map in the structure. If there’s a map, then it’s an entity, and that means building triples from the entity. That’s in the same file on lines 55-136
In general, I like it a lot. But the transaction API takes a single seq, and that seq can hold a lot of different kinds of things in it. That makes it easy for users, which I like, but OMG… it’s a mess behind the scenes, which I don’t like!
I like programming interfaces to handle one type of data per function, but user interfaces typically need to be the opposite
The `build-triples` function got more and more complex as I supported more and more features. I’m looking at it now, and realize that I should refactor it.
It should be a simple loop over the transaction data, which dispatches as appropriate.
Actually… it is a simple loop. But instead of “dispatch” it handles everything in a single function (`add-triples` is inline from 168-241). That should be broken up into multiple functions, and the `add-triples` function becomes a simple `cond` or something like that.
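For illustration only, the “simple loop with dispatch” shape might look something like this; it is not Asami’s actual code, and `entity->triples` is a stand-in for the entity handling on lines 55-136:

```clojure
;; rough sketch of the dispatching loop described above (NOT Asami's actual code)
(declare entity->triples)   ; stand-in for the entity handling mentioned earlier

(defn build-triples* [graph tx-data]
  (reduce (fn [acc element]
            (cond
              (map? element)                   ; an entity map -> expand into triples
              (update acc :add into (entity->triples graph element))

              (= :db/add (first element))      ; explicit assertion
              (update acc :add conj (subvec element 1))

              (= :db/retract (first element))  ; explicit retraction
              (update acc :retract conj (subvec element 1))

              :else                            ; assume a bare [e a v] triple
              (update acc :add conj element)))
          {:add [] :retract []}
          tx-data))
```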
I should switch to using Borkdude’s parser though, since I based it on Rich’s Java-based parser
BTW, I have to thank you. It’s conversations like this that reinvigorate me to get back to working on it
I like to understand a library and its state before thinking of adopting it. Conversations like this help me make an educated decision, so thank you too 😉
I think a good way to understand Asami is to see that it was built as a very simple system, and then expanded as features were added. But the essential architecture remained the same throughout. So if you look at the original version, then that may help
The first iteration of the Graph protocol had just 3 functions: https://github.com/threatgrid/naga/blob/d149904da8ecb510ddbbb5b73824c74da4b05d77/src/naga/storage/memory/index.clj#L44-L47
is there a path to get `(defn in-memory-with [db tx] ..)`? I.e. support the full transaction data API. The `my-with` you shared above expects new and remove triples (and takes a connection, which is irrelevant to this discussion)
when working in memory I'd prefer to instantiate an empty db and `with` over it as necessary, avoiding the entire connection API
Thank you… you found a bug for me! 🙂
The `:tx-data` isn’t right, since the `add`/`remove` seqs will contain triples (each triple is a 3 element vector). So the resulting `:tx-data` will just be a large list of triples that were removed and added with no explanation of what is what.
I would build these lazily: `(concat (map #(apply vector :db/retract %) remove) (map #(apply vector :db/add %) add))`
(I might have done it differently, and not created this as a seq of triple vectors if I was starting this again, but this is where I am) 🙂
I just looked at the original `transact-async` again, and I remember what a mess it is. I was trying to add a lot of functionality without changing the architecture. This included accumulating the triples as a transaction log (I chose a volatile for this due to speed). The way it passes back and forth between functions in core and the transact functions in the Connection implementations is dizzying. (Sorry!)
rebuilding the vectors seems wasteful indeed. I thought `build-triples` could return the triples with add and retract markers at the beginning, but it's the same amount of waste. Maybe `:tx-data` is a bad idea and one should split it into `:tx-added` and `:tx-retracted`
Rebuilding is wasteful, yes… if that’s what you’re doing. Which I guess you will be. However, I’ve never looked at this data before, so I would never see it
For now, you’ll have to rebuild them. But I should log a bug on this, and then the code to strip the first element can be put into the transaction, rather than the `build-triples` function
doesn't the print of tx-data at https://github.com/quoll/asami/wiki/5.-Entity-Structure#basic-structure show the missing add/retract information?
Hmmm… I made a point of running code before putting it into the docs, so maybe I’m missing something
It removes the retractions, and if they result in a deletion, then it constructs the Datom for them and appends it. It then does the same with additions… if the triple is added then a Datom is created and appended
So you were right the first time, and there was no bug in my code. The bug is in my head
Oh, I think I might have started looking for specific node types, and throwing exceptions when I don’t get them. Let me look
You’re supposed to pass nodes that were created for the graph. I check for this, even though it’s not really needed. (It indicates that you probably had some unexpected data if you don’t pass this test)
The test is at: https://github.com/quoll/asami/blob/169301f0a783a40a73ad730b0645cacba7ea571f/src/asami/graph.cljc#L112
that makes the API above (`with`) tricky, since you take a db and a tx. I'd like to specify my own node id as a keyword (preferably arbitrary, not in the `a` namespace). I cannot do that because `new-node` wants the graph as an argument
https://github.com/quoll/asami/wiki/5.-Entity-Structure#dbid says you can define your own ids, or am I misreading
https://github.com/quoll/asami/issues/18
Until then you could use `with-redefs` on `asami.graph/node-type?`?
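Something like this, perhaps; a sketch of that workaround, untested, and relaxing the check may have other consequences:

```clojure
;; sketch of the with-redefs workaround suggested above: relax node-type?
;; so plain keywords are accepted as node ids (untested)
(require '[asami.graph :as graph])

(let [original graph/node-type?]
  (with-redefs [graph/node-type? (fn [n] (or (keyword? n) (original n)))]
    ;; transact and query with arbitrary keyword node ids inside this scope
    ))
```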
I’m sorry. I would do a release where I address it, but I don’t have time for a few days
what would it take to replicate the entity API? At least like `(d/get node :name)` and reverse lookup `(d/get node :_friends)`
the use case: given a root node, traverse a tree lazily. Instead of pulling whole nodes, pull a subset of the EAVs, or one by one on demand. I see the pull query API is missing too
Doing a get like that is a one-liner. 3 lines if it’s handling reverse directions like that
you mean transforming it into a `d/q` query. I thought some way of directly reading the EAV index would be more efficient? For the non-reverse lookup
No, the `d/q` function is definitely not the way to do it. I’d extract the graph object, and look up the index directly
I’m AFK right now, but the answer to “what would it take” is “very little”. The graph API is how all this sort of work is done (it’s how both the `entity` and `q` functions are built)
Well… it allows lookups.
If you look up:
`[entity attribute value]`
and you check `attribute` and discover that `(= \_ (first (name attribute)))`, then you change it to look up:
`[value attribute entity]`
So, `get` becomes:
(defn get
  [db node attr]
  (let [g (core/as-graph db)]
    (if (and (keyword? attr)
             (= \_ (first (or (namespace attr) (name attr)))))
      (graph/resolve-triple g '?v attr node)
      (graph/resolve-triple g node attr '?v))))
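Hypothetical usage, with db and node standing in for a database value and a previously transacted node; as discussed below, the result is a seq of bindings rather than a bare value:

```clojure
;; hypothetical usage of the get above, for a node transacted with {:foo 1}
(get db node :foo)            ;=> ([1])  ; a seq of bindings, not a bare value
(ffirst (get db node :foo))   ;=> 1
```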
P.S. sorry to take so long. Slack is possibly my least-favorite Clojure IDE!
thanks for explaining the architecture, makes the whole thing clearer for me. Your wiki pages are also helpful in this regard
> ha, datomic allows both `:_person/friends` and `:person/_friends`?
Oh… did I get that wrong? I tried to make it only work with `:_person/friends` and `:_friends`, but not with `:person/_friends`
to compare with datascript - https://github.com/tonsky/datascript/blob/85f9b5d4deadba2841fa6e1ef5cc14225fcb4987/src/datascript/impl/entity.cljc#L166
- :db/id check
- reverse lookup conditional
- cache check
- I don't know what touched checks
- sift through the datoms
- update cache
Yeah… I like mine more 🙂 I was never quite happy with the way datascript does things, though I know that others are not happy about Asami, so I guess we’re even
the additional conditionals aside (which I don't understand why they are necessary) I think the only major difference is caching, no? Given a not-in-memory database reads are expensive. I think datomic caches the entity lookups too
Actually… no, it doesn’t. It relies on the indexes being fast. This is possibly a good candidate for caching.
That said… the in-memory indexes ARE fast (they’re just a call to `(some-> spo node attribute)`)
And the on-disk indexes ARE cached, at a few different levels.
the `get` from above for transacted `{:foo 1}` querying `:foo` returns `([1])`. Why the double nesting? Is `ffirst` safe here? The entity API typically returns a single (latest) value for normal and a sequence for reverse lookup
Well, you’re right… it’s assuming that every attribute can have multi-cardinality. The `first` is necessary. The remaining `[1]` is the full collection of values for that attribute. If you use `ffirst` then you’ll always get the first value. So long as you never store more than one value in that attribute, then it will always return the correct thing.
Here’s an example:
Say you have 2 nodes representing people: `:node1` and `:node2`
If the first has a name of “Alice” and an age of 32, and the second is “Bob” with an age of 31, then the spo index will look like this:
{:node1 {:name ["Alice"]
         :age  [32]}
 :node2 {:name ["Bob"]
         :age  [31]}}
Normally, if you look up the name for `:node1` you would use:
(ffirst (graph/resolve-triple spo :node1 :name '?name))
well I've run https://github.com/lambdaisland/datalog-benchmarks/blob/main/src/datalog_benchmarks/scratch.clj where asami is 50x slower on my machine for both queries
The issue may have nothing to do with the insertion of data. I think the most expensive part is the translation of the map into triples.
Triple lookups are reasonably good on Asami. But building triples out of entities…. yes, that’s expensive. I did triple the speed a few years ago, so it used to be even slower
I had a JSON dataset that took 6 seconds to load, and I took it down to less than a second
you're talking about insertion but the timings are for reading, or am I missing something
Oh, I wasn’t reading properly (I’m in a block of code for work. I’m also about to get on a call)
I profiled it and >50% of time was spent in `resolve-triple`, and most of its time was in `get-from-index`. I know which index I need so I can just use it directly
If you want to get an attribute from a database, then you can define something like:
(defn get-attr [db node attr]
  (some-> db :graph :spo node attr first))
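Hypothetical usage, reusing the node id from the earlier illustrative Alice/Bob spo example (and assuming the in-memory Database exposes its graph under `:graph`, as the snippet above does):

```clojure
;; hypothetical usage of get-attr, reusing the earlier illustrative spo example
(get-attr (d/db conn) :node1 :name)   ;=> "Alice"
```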
`get-from-index` is just how I map from the arguments to the index that’s needed, and how to look that index up. There’s probably something faster I can do there :thinking_face: