Hi Graph people... I have to do some experiments on a rather big graph (800K nodes and 4.2M edges). Is that actually possible with Clojure? Has anyone done anything like that?
Hey @thomas, I think this depends on what kind of experiment you’re doing, but in principle I don’t see why not. There’s also #rdf, where lots of people are doing stuff with large datasets.
I need to do some traversals in the graph and find subtrees
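For reference, outside any database, collecting a subtree (everything reachable from a root) is a breadth-first walk. A minimal sketch, assuming the edges are held in memory as a map from node to a set of child nodes; `subtree` and `edges` are hypothetical names, not part of any library mentioned here:

```clojure
(defn subtree
  "Returns the set of all nodes reachable from `root` (including `root`),
  following `children`, a map of node -> set of child nodes.
  The `seen` set doubles as the result and stops cycles from looping forever."
  [children root]
  (loop [queue (conj clojure.lang.PersistentQueue/EMPTY root)
         seen  #{root}]
    (if (seq queue)
      (let [node     (peek queue)
            new-kids (remove seen (get children node))]
        (recur (into (pop queue) new-kids)
               (into seen new-kids)))
      seen)))

;; hypothetical example graph; note the cycle :b -> :d -> :a -> :b
(def edges {:a #{:b :c}, :b #{:d}, :c #{}, :d #{:a}, :e #{:f}})

(= #{:a :b :c :d} (subtree edges :b))
;; => true
```

At 800K nodes and 4.2M edges a map like this fits comfortably in a JVM heap, so the question is more about how the data gets loaded than about Clojure itself.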
At the very least, thanks to Java interop, you can safely say that if it can be done in Java, it can also be done in Clojure. I'm mostly thinking of the Java ecosystem that becomes available to you.
we have been looking at Neo4j as a solution as well. There we can read in the data no problem, but unfortunately we're having some problems getting the right subtrees.
Check out the dialogue between @quoll, @simongray and @rowland.watkins in the #rdf channel on August 23rd. They discuss performance across frameworks and mention actual estimates for data sets of about your size.
I’m working with Stardog at the moment (like, literally, as I type this, it’s in the next window), with 18 million edges. I was calculating subtrees using transitive predicates, but that was taking 66 seconds on this laptop. I’m feeling happy right now because I precalculated the transitive edges (lifting my edges from 11 million triples to the 18 million I have now), and consequently I got a result in 3 seconds. That’s better.
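The trade-off described there, storing the transitive edges once so that each subtree query becomes a single lookup, can be illustrated in plain Clojure. This is only a sketch of the idea, not Stardog's mechanism; `reachable-from` and `transitive-closure` are hypothetical helpers over a node -> set-of-children map:

```clojure
(defn reachable-from
  "All nodes reachable from `node` by following `children` (node -> set of
  children). Depth-first with an explicit stack; `seen` prevents revisits."
  [children node]
  (loop [stack (vec (get children node))
         seen  #{}]
    (if (empty? stack)
      seen
      (let [n     (peek stack)
            stack (pop stack)]
        (if (contains? seen n)
          (recur stack seen)
          (recur (into stack (get children n)) (conj seen n)))))))

(defn transitive-closure
  "Precomputes node -> set of all descendants for every key in `children`.
  More stored pairs, but each subtree query afterwards is a plain `get`."
  [children]
  (into {} (map (fn [n] [n (reachable-from children n)])) (keys children)))

;; hypothetical tree with 4 direct edges
(def tree {:a #{:b}, :b #{:c :d}, :c #{}, :d #{:e}})
(def closure (transitive-closure tree))

(get closure :d)
;; => #{:e}
```

Here the 4 direct edges become 8 stored pairs, the same kind of growth as going from 11 million to 18 million triples above.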
Stardog is all Java, but there’s a Clojure API. I know this, because I wrote it 😜
I saw that mentioned somewhere... really cool!
I think I can do similar things with Asami, but… that takes weekends
I don’t support queries on multiple graphs yet, but I plan that!
sounds good.
I know the syntax and everything
I just… haven’t done it yet 😬 But doing it in Stardog has me motivated to get it done in Asami
I might give Asami a try... not ideal, as all the data we need to access is in Postgres at the moment, but it could work as a PoC.
Basically, think of Asami as a flexible index. Which also describes other graph stores. But the data-in/data-out story is better in systems like Stardog
ok that sounds good
Please keep in touch with any issues you encounter. There are always a lot of things to do, and having users need something is great for both setting priorities, and for my own motivation
I'll keep that in mind
I want to extend an Asami db so that I can pull (https://github.com/lilactown/pyramid/blob/main/src/pyramid/pull.cljc#L8) data out of it using EQL. Is there a concrete type that I can extend a protocol to that would work for all database types?
Hmm, give me a second…
OK, sorry.
The storage abstraction is the asami.graph/Graph protocol
The asami.storage/Database protocol is just supposed to be a wrapper for this, and it just provides things like as-of, as-of-t, and so on. It also has the graph function, which returns the underlying graph.
here's what I got so far
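(ns example.asami-pull ; hypothetical ns name; requires/imports added so the snippet compiles
  (:require [asami.core :as a]
            [asami.graph :as ag]
            [pyramid.pull :as pull])
  (:import (asami.memory MemoryDatabase)
           (asami.durable.store DurableDatabase)))
(defn- resolve-entity
  [db [k v]]
  ;; find the node whose attribute k has value v, then load it as an entity
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node)))
;; with extend-protocol, all arities of a method go in ONE form;
;; in Clojure, separate forms would leave only the last arity defined
(extend-protocol pull/IPullable
  MemoryDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found)))
  DurableDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found))))
The functions on a graph cover modifying the data (graph-add, graph-delete, graph-transact) and ALL of the read operations, particularly resolve-triple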
So MemoryDatabase and DurableDatabase are both records
not protocols
right... I don't think I can extend my protocol to another protocol?
Oh, of course, I’m reading it upside down. Doh. Excuse me. I’m not feeling great today 🙂
No worries! I want two things:
1. I want to easily be able to go from an "ident" tuple [:person/id 1] to the entity map
2. I want to be able to pass any arbitrary asami database to pyramid.core/pull and it Just Works™
I think that leads me to using asami.core/entity and extending MemoryDatabase and DurableDatabase
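Because records are concrete classes, they can be targets of `extend-protocol`, whereas a protocol cannot be extended to another protocol. A self-contained sketch of that pattern; `IResolve`, `MemDb` and `DiskDb` are hypothetical stand-ins for pyramid's IPullable and Asami's two database records, and `lookup` is a hypothetical helper:

```clojure
(defprotocol IResolve
  (resolve-ref [db ref] [db ref not-found]))

;; hypothetical stand-ins for two unrelated database record types
(defrecord MemDb [entities])
(defrecord DiskDb [entities])

(defn- lookup
  "Finds the first entity map whose attribute k has value v."
  [entities [k v]]
  (first (filter #(= v (get % k)) entities)))

;; when extending (not implementing inline), group the arities in one form
(extend-protocol IResolve
  MemDb
  (resolve-ref
    ([db ref] (lookup (:entities db) ref))
    ([db ref not-found] (or (lookup (:entities db) ref) not-found)))
  DiskDb
  (resolve-ref
    ([db ref] (lookup (:entities db) ref))
    ([db ref not-found] (or (lookup (:entities db) ref) not-found))))
```

The duplicated bodies could also be shared by building one method map, e.g. `{:resolve-ref (fn ([db ref] ...) ([db ref nf] ...))}`, and passing it to `extend` once per record type.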
I’m kinda thinking that just using the entity function on asami.storage.Database already does this for you?
I think I would need to provide my own wrapper type then that implements IPullable
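The wrapper alternative might look roughly like this: one record that holds any database value plus a lookup function, and implements the protocol once. `IResolve`, `PullableDb`, `pullable` and the `lookup-fn` field are hypothetical names; in practice the protocol would be pyramid's IPullable and the lookup function something like the resolve-entity function above.

```clojure
(defprotocol IResolve
  (resolve-ref [db ref] [db ref not-found]))

;; Wraps an arbitrary database value `db` together with a function
;; (lookup-fn db [k v]) that resolves a lookup ref to an entity map.
(defrecord PullableDb [db lookup-fn]
  IResolve
  ;; inside defrecord, each arity is written as its own method form
  (resolve-ref [_ ref] (lookup-fn db ref))
  (resolve-ref [_ ref not-found] (or (lookup-fn db ref) not-found)))

(defn pullable
  "Wraps `db` so it can be passed to code that expects IResolve."
  [db lookup-fn]
  (->PullableDb db lookup-fn))
```

The cost of this design is that callers must remember to wrap the database before pulling; the benefit is that it does not depend on which concrete record type the database is.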
OK. But don’t do the work you’re doing in resolve-entity
See how your code says:
(when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
(a/entity db node))
when I do this
(a/entity (a/db conn) [:person/id 0])
;; => nil
vs
(resolve-entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident :a/node-22665} #:person{:id 2, :name "Cassie"} {:person/id 3, :person/name "Jake", :friend/best #:person{:id 1}} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax", :species "andalite"}]}
I can share my code if you want to play with it
I honestly can’t see how they would be different. Both MemoryDatabase and DurableDatabase pass the parameters through untouched to asami.entities.reader/ident->entity
Here is the code in that function:
(when-let [eid (or (and (seq (node/find-triple graph [ident '?a '?v])) ident)
(ffirst (node/find-triple graph ['?eid :db/ident ident]))
(ffirst (node/find-triple graph ['?eid :id ident])))]
(ref->entity graph eid nested?)))
The value you passed in is ident
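The fallback order in that function (use the value directly as a node if it appears as a subject, else look for a :db/ident triple, else an :id triple) can be mimicked over a plain set of [s p o] triples. This illustrates only the order of the checks, not Asami's implementation; `find-triples` and `ident->node` are hypothetical helpers, and nil is used as a wildcard:

```clojure
(defn find-triples
  "Returns the triples matching the pattern [s p o]; nil acts as a wildcard."
  [triples [s p o]]
  (filter (fn [[ts tp to]]
            (and (or (nil? s) (= s ts))
                 (or (nil? p) (= p tp))
                 (or (nil? o) (= o to))))
          triples))

(defn ident->node
  "Resolves `ident` to a node, trying the same three cases in order."
  [triples ident]
  (or (when (seq (find-triples triples [ident nil nil])) ident)
      (ffirst (find-triples triples [nil :db/ident ident]))
      (ffirst (find-triples triples [nil :id ident]))))

(def triples
  #{[:node-1 :db/ident [:person/id 0]]
    [:node-1 :person/name "Rachel"]
    [:node-2 :id "marco"]
    [:node-2 :person/name "Marco"]})

(ident->node triples [:person/id 0])
;; => :node-1
```

With data transacted only as {:person/id 0 ...}, none of the three checks match [:person/id 0], which is consistent with a/entity returning nil for it.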
maybe it's the way I'm transacting these maps? I've got some other weirdness
{:tx-data
[{:person/id 0,
:person/name "Rachel",
:friend/list
({:person/id 1, :person/name "Marco", :db/id -2}
{:person/id 2, :person/name "Cassie", :db/id -3}
{:person/id 3, :person/name "Jake", :db/id -4}
{:person/id 4, :person/name "Tobias", :db/id -5}
{:person/id 5, :person/name "Ax", :db/id -6}),
:db/id -1}
{:person/id 1,
:friend/best {:person/id 3, :friend/best #:person{:id 1}, :db/id -4},
:db/id -2}
{:species
{:andalites [{:person/id 5, :person/species "andalite", :db/id -6}]}}]}
Oh… wait. I see the difference
Your pair there… is that supposed to be :person/id as an identifying property, and 0 is the key? Or is it a compound key of [:person/id 0]?
I had presumed you had a compound key, but now I’m thinking it’s a key/value pair
yeah, OK. It’s a key/value pair. I get it. Carry on, you’re doing the correct thing.
yeah it's a key/value pair is the way I'm treating it rn. so [:person/id 0] supposedly refers to the entity that has {:person/id 0}
what's the "asami" way to do this?
I only expect entity keys of :db/ident or :id
something like this?
{:db/ident [:person/id 0] ,,,}
gotcha
yes, that’s what I thought you were doing
Sorry!
Told you… my brain is foggy today
hey no worries at all. I'm learning asami too
(I had a flu shot last night, and woah)
Incidentally, you CAN have compound keys like I was just showing
In case you’re ever interested 🤣
I'm planning out when I'm going to get the new omicron vaccine... might do the flu shot too just to get all the yuck over with at once
a compound key is like {:db/ident [:person/id 0]} ? or something else?
Exactly like that
Easy to do in memory (just throw an object into the value position). Trickier on disk, but I’m clever 😜
I think I'll assume people are doing that, rather than trying to look things up by some map ID.
maybe you can also show me what I'm doing wrong with my transaction
given this tx on an in-memory store
(def tx {:tx-data '[{:db/ident [:person/id 0]
:person/id 0
:person/name "Rachel"
:friend/list ({:db/ident [:person/id 1]
:person/id 1
:person/name "Marco"}
{:db/ident [:person/id 2]
:person/id 2
:person/name "Cassie"
:db/id -3}
{:db/ident [:person/id 3]
:person/id 3
:person/name "Jake"}
{:db/ident [:person/id 4]
:person/id 4
:person/name "Tobias"}
{:db/ident [:person/id 5]
:person/id 5
:person/name "Ax"})}
{:db/ident [:person/id 1]
:person/id 1
:friend/best {:db/ident [:person/id 1]
:person/id 3
:friend/best #:person{:id 1}}}]})
the result:
(a/create-database "asami:")
(def conn (a/connect "asami:"))
(a/transact conn tx)
(a/entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident [:person/id 1]} #:person{:id 2, :name "Cassie"} #:person{:id 3, :name "Jake"} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax"}]}
for some reason the first item in the :friend/list is not an entity map
Oh drat… that’s a bug
When you’re storing nested objects like that, it’s supposed to pull it apart into its component triples
UNLESS the key is :id or :db/ident
In that case, it shouldn’t be pulled apart
I noticed this before too, when I was using temporary :db/id values like -1
when I remove the second map I'm transacting, it works as expected
I'll open up a gh issue
What we’re seeing in the above is that the first object correctly left :db/ident alone, but then dropped all the other triples. The remaining objects might be OK, because I think :db/ident properties are hidden when returning from ident->entity
Oh, wait! It may be working fine! Because your object that is keyed as {:db/ident [:person/id 1]} is a top level entity, and you don’t have the nested? flag set. This means that it sees the object, but rather than returning you the whole thing, it just gives you the identifier for it
None of the other objects appear outside of that structure, so they will always be embedded inside of it
hmm I see
And their :db/ident fields are hidden when you ask for them by entity
if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a "top level" entity?
See what you get back if you update:
(defn- resolve-entity
[db [k v]]
(when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
(a/entity db node true)))
> if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a “top level” entity?
Yes
gotcha
I think I want pyramid to be lazy in how it pulls entities out of the db, so that I can rely on each entity map either containing :db/ident OR having all of its information in it
Alternatively:
(let [gr (a/graph db)
person2 (ffirst (ag/resolve-triple gr '?e :db/ident [:person/id 2]))]
(ag/graph-add gr person2 :a/entity true))
Will also make it a top level entity.
BTW, when I said that the :db/ident values don’t show up inside the entities when you retrieve them? It happens on this line:
https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L132
As for the nested structures thing, that’s done here: https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L83-L87
(seen v) means that the object has already been included in a structure, so don’t recurse into it (this is to break loops in the graph).
Then it checks that you don’t want nested structures and it’s a top level entity. If so, then emit a short map, that’s either {:db/ident v}, or {:id v} or, finally, {:db/id v}
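The short-reference rule described here can be sketched as a small function. This mirrors only the stated preference order; the `short-ref` name and the entity representation (a node id plus a map of the node's properties) are hypothetical, not Asami's internals:

```clojure
(defn short-ref
  "Returns the abbreviated reference emitted in place of a full entity:
  {:db/ident ...} if the entity has a :db/ident, else {:id ...} if it has
  an :id, else {:db/id node-id}."
  [node-id props]
  (cond
    (contains? props :db/ident) {:db/ident (:db/ident props)}
    (contains? props :id)       {:id (:id props)}
    :else                       {:db/id node-id}))

(short-ref :node-7 {:person/name "Ax"})
;; => {:db/id :node-7}
```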
I mean… it’s trying to be extremely clever here. And that works for many cases. But if you’re not doing the “standard” thing (whatever that means), then I get that it’s confusing. Sorry!!!!
This all came about due to user requests 😄
Yeah I want to make pyramid match up with what's "standard"
Well, hopefully by showing you some of the bits of code that are making these decisions, you’ll be able to work with the system, and not fight it
(Can you tell I’ve done several years of Aikido?)
Then again, I have a black belt in TKD, which is more of a “hit it until it breaks” style 😜
my black belt in BJJ taught me to use my head. as a battering ram, when necessary
I think the core issue I'm running into now is that pyramid expects collections to be homogeneous, i.e. either it's a collection of "lookup refs" like [:person/id 0] OR it's a collection of other stuff
My first thought is to ask you to ensure that this is what you store 🙂
I'm trying to think of how I can use this on a well constructed asami db
assuming you are transacting entities with :db/ident, my approach now is to replace maps of the form {:db/ident x} with the tuple [:db/ident x] in the data returned by (pyramid.pull/resolve-ref db y)
the tuple [:db/ident x] would signal to pyramid to call (resolve-ref db [:db/ident x]) and then it would get the entity data via (a/entity db x)
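That replacement step can be done generically with clojure.walk/postwalk. A minimal sketch; `ident-map->ref` and `refs->tuples` are hypothetical helpers, and the code assumes that any map whose only key is :db/ident is a reference rather than data:

```clojure
(require '[clojure.walk :as walk])

(defn ident-map->ref
  "If `x` is a map whose only key is :db/ident, returns [:db/ident v];
  otherwise returns `x` unchanged."
  [x]
  (if (and (map? x) (= [:db/ident] (keys x)))
    [:db/ident (:db/ident x)]
    x))

(defn refs->tuples
  "Replaces every short {:db/ident x} map inside an entity with [:db/ident x].
  Entity maps that carry other keys besides :db/ident are left as maps."
  [entity]
  (walk/postwalk ident-map->ref entity))

(refs->tuples {:person/id 0 :friend/list [{:db/ident 1} {:db/ident 2}]})
;; => {:person/id 0, :friend/list [[:db/ident 1] [:db/ident 2]]}
```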
this works as long as your collections are homogeneous, i.e. if (a/entity db [:db/ident 0]) returns
{:person/id 0
:friend/list [{:db/ident 1} {:db/ident 2}]}
however, if you have a mix of top level entities and nested entities, then pyramid doesn't know what to do:
{:person/id 0
:friend/list [{:db/ident 1} {:person/id 2 :person/name "Cassie"}]}
I'll have to think about this. Perhaps I can relax the requirement in pyramid that collections containing references be homogeneous
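The homogeneity requirement can be expressed as a predicate over a collection, applied after the {:db/ident x} to [:db/ident x] replacement. A sketch, assuming a "reference" is any two-element vector whose first element is a keyword; `lookup-ref?` and `homogeneous?` are hypothetical names, not pyramid's API:

```clojure
(defn lookup-ref?
  "True for two-element vectors whose first element is a keyword,
  e.g. [:person/id 0] or [:db/ident 1]."
  [x]
  (and (vector? x) (= 2 (count x)) (keyword? (first x))))

(defn homogeneous?
  "True when every element of `coll` is a lookup ref, or none is."
  [coll]
  (or (every? lookup-ref? coll)
      (not-any? lookup-ref? coll)))

(homogeneous? [[:db/ident 1] {:person/id 2 :person/name "Cassie"}])
;; => false
```

Relaxing the requirement would then mean resolving elements one at a time based on `lookup-ref?`, instead of deciding once per collection.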
So you have a choice:
• make everything a top level entity, and get back [{:db/ident 1} {:db/ident 2} …]
• call a/entity with nested? set to true
nested? will resolve the entire tree, right?
But then you could get deep objects coming back. However, as I mentioned earlier, it DOES break loops
what happens when it reaches a cycle? what kind of data does it return?
It resolves the tree, yes, but not if it’s seen a node before. In that case, it puts in a short reference to it instead
I actually showed this earlier
ah sorry. I'll look back
See how it checks if it’s “seen” the value v? If so, then it puts in the short map of {:db/id …}, {:db/ident …}, or {:id …}
ah ok cool
that still runs into the issues of heterogeneous collections but probably would work for more data in the interim
So if person 1 includes person 0 in their friend list, and you ask for person 0, you’ll get back something like:
{:db/ident [:person/id 0]
:person/id 0
:person/name "Rachel"
:friend/list ({:person/id 1
:person/name "Marco"
:friend/list ({:db/ident [:person/id 0]})})}
Without the call to “seen” it would have put in the entire object, which would lead to infinite recursion. Always fun to process infinite recursion
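The loop-breaking idea can be reproduced in a few lines of plain Clojure. A sketch over a hypothetical in-memory map of ident -> entity, where :friend/list holds idents; `expand` and `people` are hypothetical names, and here `seen` tracks the idents already expanded on the current path (this is not Asami's reader code):

```clojure
(def people
  {[:person/id 0] {:db/ident [:person/id 0] :person/id 0 :person/name "Rachel"
                   :friend/list [[:person/id 1]]}
   [:person/id 1] {:db/ident [:person/id 1] :person/id 1 :person/name "Marco"
                   :friend/list [[:person/id 0]]}})

(defn expand
  "Expands the entity stored under `ident`, replacing each ident in
  :friend/list with its full entity. An ident already in `seen` is emitted
  as the short map {:db/ident ident}, which stops infinite recursion."
  ([db ident] (expand db ident #{}))
  ([db ident seen]
   (if (contains? seen ident)
     {:db/ident ident}
     (let [seen   (conj seen ident)
           entity (get db ident)]
       (cond-> entity
         (contains? entity :friend/list)
         (update :friend/list #(mapv (fn [f] (expand db f seen)) %)))))))

(get-in (expand people [:person/id 0]) [:friend/list 0 :friend/list])
;; => [{:db/ident [:person/id 0]}]
```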
yeah. pyramid allows you to recurse up to a point, which is nice. well, actually you can infinitely recurse if you want, but you can also put limits on it
e.g.
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list ...}]}]}]
follows all references, recursing forever
or you can provide limits. this will recurse up to 3 times
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list 3}]}]}]
or you can specify directly when to terminate by eliding the :friend/list selection
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list [:person/name]}]}]}]
EQL is cool
it looks like I could probably extend asami.memory/MemoryDatabase and asami.durable.store/DurableDatabase?