This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-09-22
Hi Graph people... I have to run some experiments on a rather big graph (800K nodes and 4.2M edges). Is that actually possible with Clojure? Has anyone done anything like that?
Hey @thomas, this depends on what kind of experiment you’re doing, I think, but in principle I don’t see why not. There’s also #rdf, where lots of people are doing stuff with large datasets.
At the very least, thanks to Java interop, you can safely say that if it can be done in Java, it can also be done in Clojure. I'm mostly thinking of the Java ecosystem that becomes available to you
we have been looking at Neo4j as a solution as well; there we can read in the data no problem, but unfortunately we're having some problems getting the right subtrees.
Check out the dialogue between @U051N6TTC, @U4P4NREBY and @U03FKR4EU5S in the #rdf channel on August 23rd. They're discussing performance across frameworks, and concrete estimates for data sets of your size are mentioned.
I’m working with Stardog at the moment (like, literally, as I type this, it’s in the next window), with 18 million edges. I was calculating subtrees using transitive predicates, but that was taking 66 seconds on this laptop. I’m feeling happy right now because I precalculated it (lifting my edges from 11 million triples to the 18 million I have now), and as a result I got the query down to 3 seconds. That’s better!
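Stardog's transitive predicates aren't reproduced here, but the precalculation trade-off can be sketched in plain Clojure. This is a toy, assuming a hypothetical in-memory map of parent to direct children (`children`, `descendants-of` and `precompute-closure` are illustrative names, not any library's API): the closure is computed once, so a subtree query becomes a single map lookup, at the cost of storing many more node/descendant pairs.

```clojure
;; Hypothetical example data: parent -> set of direct children.
(def children
  {:a #{:b :c}
   :b #{:d}
   :c #{:e}
   :d #{}
   :e #{:f}
   :f #{}})

(defn descendants-of
  "Returns every node reachable from `node` via one or more edges,
  i.e. the transitive closure starting at `node`. The `seen` set
  stops the walk from looping if the edges contain a cycle."
  [edges node]
  (loop [frontier (vec (get edges node))
         seen     #{}]
    (if (empty? frontier)
      seen
      (let [n    (peek frontier)
            more (pop frontier)]
        (if (contains? seen n)
          (recur more seen)
          (recur (into more (get edges n)) (conj seen n)))))))

(defn precompute-closure
  "Materialises the closure for every node up front, trading storage
  (many more node->descendant pairs) for a single lookup per query."
  [edges]
  (into {} (map (fn [n] [n (descendants-of edges n)])) (keys edges)))

(def closure (precompute-closure children))
```

After this, `(get closure :a)` returns the set of `:b :c :d :e :f` without walking the graph again, which is the same idea (at toy scale) as growing 11 million stored triples to 18 million in exchange for faster subtree queries.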
I just… haven’t done it yet 😬 But doing it in Stardog has me motivated to get it done in Asami
I might give Asami a try... not ideal, as all the data we need to access is in Postgres at the moment, but it could work as a PoC.
Basically, think of Asami as a flexible index. Which also describes other graph stores. But the data-in/data-out story is better in systems like Stardog
Please keep in touch with any issues you encounter. There are always a lot of things to do, and having users need something is great for both setting priorities, and for my own motivation
I want to extend an asami db to allow https://github.com/lilactown/pyramid/blob/main/src/pyramid/pull.cljc#L8 data out of it using EQL. is there a concrete type that I can extend a protocol to that would work for all database types?
The `asami.storage/Database` protocol is just supposed to be a wrapper for this, and it just provides things like `as-of`, `as-of-t`, and so on. It also has the `graph` function, which returns the underlying graph
here's what I got so far
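(ns example.asami-pull ; hypothetical namespace name
  (:require [asami.core :as a]
            [asami.graph :as ag]
            [asami.memory]         ; loads the MemoryDatabase record class
            [asami.durable.store]  ; loads the DurableDatabase record class
            [pyramid.pull :as pull])
  (:import (asami.memory MemoryDatabase)
           (asami.durable.store DurableDatabase)))
(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node)))
(extend-protocol pull/IPullable ; all arities of resolve-ref go in one form
  MemoryDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found)))
  DurableDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found))))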
The functions on a graph are for modifying the data (`graph-add`, `graph-delete`, `graph-transact`), and ALL of the read operations, particularly `resolve-triple`
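To show why the resolver above takes `ffirst` of a `resolve-triple` result, here is a toy, in-memory stand-in for that kind of lookup. It is not Asami's implementation; it assumes (consistent with how `ffirst` is used in the code above) that a lookup returns one binding vector per matching triple, holding the values of the `?` variables in pattern order. `variable?`, `resolve-pattern` and the sample data are hypothetical.

```clojure
(defn variable?
  "True for pattern variables such as '?e (symbols starting with ?)."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn- matches?
  "Every constant position in `pattern` must equal the triple's value."
  [pattern triple]
  (every? (fn [[pat v]] (or (variable? pat) (= pat v)))
          (map vector pattern triple)))

(defn resolve-pattern
  "Toy lookup over a set of [subject predicate object] vectors.
  Returns one binding vector per matching triple, containing the
  values in the variable positions. Repeated variables such as
  [?x :p ?x] are not unified in this sketch."
  [triples s p o]
  (let [pattern [s p o]]
    (for [triple triples
          :when  (matches? pattern triple)]
      (vec (for [[pat v] (map vector pattern triple)
                 :when (variable? pat)]
             v)))))

;; Hypothetical data: two people stored as triples.
(def triples
  #{[:node-1 :person/id 0]
    [:node-1 :person/name "Rachel"]
    [:node-2 :person/id 1]
    [:node-2 :person/name "Marco"]})

(resolve-pattern triples '?e :person/id 0)          ;; => ([:node-1])
(ffirst (resolve-pattern triples '?e :person/id 0)) ;; => :node-1
```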
Oh, of course, I’m reading it upside down. Doh. Excuse me. I’m not feeling great today 🙂
No worries! I want two things:
1. I want to easily be able to go from an "ident" tuple `[:person/id 1]` to the entity map
2. I want to be able to pass any arbitrary asami database to `pyramid.core/pull` and have it Just Work™
I think that leads me to using `asami.core/entity` and extending `MemoryDatabase` and `DurableDatabase`
I’m kinda thinking that just using the `entity` function on `asami.storage.Database` already does this for you?
See how your code says:
(when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
  (a/entity db node))
vs
(resolve-entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident :a/node-22665} #:person{:id 2, :name "Cassie"} {:person/id 3, :person/name "Jake", :friend/best #:person{:id 1}} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax", :species "andalite"}]}
I honestly can’t see how they would be different. Both `MemoryDatabase` and `DurableDatabase` pass the parameters through untouched to `asami.entities.reader/ident->entity`
Here is the code in that function:
(when-let [eid (or (and (seq (node/find-triple graph [ident '?a '?v])) ident)
                   (ffirst (node/find-triple graph ['?eid :db/ident ident]))
                   (ffirst (node/find-triple graph ['?eid :id ident])))]
  (ref->entity graph eid nested?)))
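The excerpt above tries three ways to turn an ident into a node, in order. A minimal sketch of that fallback chain over a toy vector of triples (hypothetical `find-triples`, `ident->eid` and data; nil stands for "any value", which is a simplification of the `?`-variable patterns in the real code):

```clojure
(defn find-triples
  "Toy lookup: triples in `graph` matching [s p o], where nil in a
  position matches anything."
  [graph s p o]
  (filter (fn [[ts tp to]]
            (and (or (nil? s) (= s ts))
                 (or (nil? p) (= p tp))
                 (or (nil? o) (= o to))))
          graph))

(defn ident->eid
  "Same fallback order as the excerpt: the ident is already a node
  (it appears as a subject), else it is the value of :db/ident, else
  it is the value of :id. Returns nil when nothing matches."
  [graph ident]
  (or (when (seq (find-triples graph ident nil nil)) ident)
      (ffirst (find-triples graph nil :db/ident ident))
      (ffirst (find-triples graph nil :id ident))))

;; Hypothetical data.
(def graph
  [[:node-7 :db/ident [:person/id 1]]
   [:node-7 :person/name "Marco"]
   [:node-9 :id "cassie"]
   [:node-9 :person/name "Cassie"]])

(ident->eid graph :node-7)        ;; => :node-7 (already a subject)
(ident->eid graph [:person/id 1]) ;; => :node-7 (found via :db/ident)
(ident->eid graph "cassie")       ;; => :node-9 (found via :id)
```

Because the first successful branch wins, a lookup tuple like `[:person/id 1]` only resolves if it was stored as a `:db/ident` value (or `:id` value); it is never split into an attribute/value pair here.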
{:tx-data
 [{:person/id 0,
   :person/name "Rachel",
   :friend/list
   ({:person/id 1, :person/name "Marco", :db/id -2}
    {:person/id 2, :person/name "Cassie", :db/id -3}
    {:person/id 3, :person/name "Jake", :db/id -4}
    {:person/id 4, :person/name "Tobias", :db/id -5}
    {:person/id 5, :person/name "Ax", :db/id -6}),
   :db/id -1}
  {:person/id 1,
   :friend/best {:person/id 3, :friend/best #:person{:id 1}, :db/id -4},
   :db/id -2}
  {:species
   {:andalites [{:person/id 5, :person/species "andalite", :db/id -6}]}}]}
Your pair there… is that supposed to be `:person/id` as an identifying property, and `0` is the key? Or is it a compound key of `[:person/id 0]`?
yeah, I'm treating it as a key/value pair rn, so `[:person/id 0]` supposedly refers to the entity that has `{:person/id 0}`
I'm planning out when I'm going to get the new omicron vaccine... might do the flu shot too just to get all the yuck over with at once
Easy to do in memory (just throw an object into the value position). Trickier on disk, but I’m clever 😜
I think I'll assume people are doing that, rather than trying to look things up by some map ID.
given this tx on an in-memory store
(def tx {:tx-data '[{:db/ident [:person/id 0]
                     :person/id 0
                     :person/name "Rachel"
                     :friend/list ({:db/ident [:person/id 1]
                                    :person/id 1
                                    :person/name "Marco"}
                                   {:db/ident [:person/id 2]
                                    :person/id 2
                                    :person/name "Cassie"
                                    :db/id -3}
                                   {:db/ident [:person/id 3]
                                    :person/id 3
                                    :person/name "Jake"}
                                   {:db/ident [:person/id 4]
                                    :person/id 4
                                    :person/name "Tobias"}
                                   {:db/ident [:person/id 5]
                                    :person/id 5
                                    :person/name "Ax"})}
                    {:db/ident [:person/id 1]
                     :person/id 1
                     :friend/best {:db/ident [:person/id 1]
                                   :person/id 3
                                   :friend/best #:person{:id 1}}}]})
the result:
(a/create-database "asami:")
(def conn (a/connect "asami:"))
(a/transact conn tx)
(a/entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident [:person/id 1]} #:person{:id 2, :name "Cassie"} #:person{:id 3, :name "Jake"} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax"}]}
When you’re storing nested objects like that, it’s supposed to pull it apart into its component triples
What we’re seeing in the above is that the first object correctly left `:db/ident` alone, but then dropped all the other triples. The remaining objects might be OK, because I think `:db/ident` properties are hidden when returning from `ident->entity`
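As a rough illustration of what "pulling nested objects apart into component triples" means, here is a toy decomposition in plain Clojure. It is a sketch under simplifying assumptions, not Asami's storage layout: node ids come from a local counter (hypothetical `:node/1`, `:node/2`, ...), a nested map becomes its own node linked by reference, and a sequential value simply yields one triple per element (ordering is not preserved as data).

```clojure
(defn map->triples
  "Toy decomposition of a (possibly nested) entity map into
  [node attribute value] triples. Each map gets a fresh node id;
  nested maps are stored as their own node and linked by reference;
  sequential values produce one triple per element."
  [entity]
  (let [counter  (atom 0)
        new-node #(keyword "node" (str (swap! counter inc)))]
    (letfn [(emit [m]
              ;; Returns [node triples] for map m and everything nested in it.
              (let [node (new-node)]
                [node
                 (reduce-kv
                  (fn [acc k v]
                    (reduce (fn [acc item]
                              (if (map? item)
                                (let [[child child-triples] (emit item)]
                                  (into (conj acc [node k child]) child-triples))
                                (conj acc [node k item])))
                            acc
                            (if (sequential? v) v [v])))
                  []
                  m)]))]
      (second (emit entity)))))

(map->triples {:person/name "Rachel"
               :friend/list [{:person/name "Marco"} {:person/name "Cassie"}]})
;; => [[:node/1 :person/name "Rachel"]
;;     [:node/1 :friend/list :node/2]
;;     [:node/2 :person/name "Marco"]
;;     [:node/1 :friend/list :node/3]
;;     [:node/3 :person/name "Cassie"]]
```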
Oh, wait! It may be working fine! Because your object that is keyed as `{:db/ident [:person/id 1]}` is a top level entity, and you don’t have the `nested?` flag set. This means that it sees the object, but rather than returning you the whole thing, it just gives you the identifier for it
None of the other objects appear outside of that structure, so they will always be embedded inside of it
if I did another transaction with something about `{:db/ident [:person/id 2]}`, would it move that to a "top level" entity?
See what you get back if you update:
(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node true)))
> if I did another transaction with something about `{:db/ident [:person/id 2]}`, would it move that to a “top level” entity?
Yes
I think I want pyramid to be lazy about how it pulls entities out of the db, so I can rely on the entity map either containing `:db/ident` OR having all the information in it
Alternatively:
(let [gr      (a/graph db)
      person2 (ffirst (ag/resolve-triple gr '?e :db/ident [:person/id 2]))]
  (ag/graph-add gr person2 :a/entity true))
Will also make it a top level entity
BTW, when I said that the `:db/ident` values don’t show up inside the entities when you retrieve them? It happens on this line:
https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L132
As for the nested structures thing, that’s done here: https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L83-L87
`(seen v)` means that the object has already been included in a structure, so don’t recurse into it (this is to break loops in the graph).
Then it checks that you don’t want nested structures and that it’s a top level entity. If so, then it emits a short map: either `{:db/ident v}`, `{:id v}`, or, finally, `{:db/id v}`
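The decisions described above (the `seen` check, the `nested?` flag, and the short-map fallback order) can be modelled in a small pure-Clojure sketch. This is a toy, not the code in `asami.entities.reader`: it assumes a hypothetical `entities` map of node to attribute map (a value that is itself a key of `entities` counts as a reference), a hypothetical `top-level` set, and it tracks `seen` nodes per path, which may differ from how the reader tracks them.

```clojure
(defn short-ref
  "Short map for `node`: {:db/ident ...} if it has one, else {:id ...},
  else {:db/id node}."
  [entities node]
  (let [e (get entities node)]
    (cond
      (contains? e :db/ident) {:db/ident (:db/ident e)}
      (contains? e :id)       {:id (:id e)}
      :else                   {:db/id node})))

(defn rebuild
  "Toy entity reconstruction starting at `node`. A referenced node is
  replaced by its short map when it was already seen on the current
  path (breaking loops), or when it is in `top-level` and `nested?` is
  false; otherwise it is expanded in place. :db/ident is hidden from
  expanded entities."
  [entities top-level nested? node]
  (letfn [(expand [n seen]
            (let [seen (conj seen n)]
              (into {}
                    (map (fn [[k v]]
                           [k (if (sequential? v)
                                (mapv #(resolve-value % seen) v)
                                (resolve-value v seen))]))
                    (dissoc (get entities n) :db/ident))))
          (resolve-value [v seen]
            (cond
              (not (contains? entities v))      v
              (contains? seen v)                (short-ref entities v)
              (and (not nested?) (top-level v)) (short-ref entities v)
              :else                             (expand v seen)))]
    (expand node #{})))

;; Hypothetical data: Rachel and Marco are friends with each other.
(def people
  {:n0 {:db/ident [:person/id 0] :person/name "Rachel" :friend/list [:n1]}
   :n1 {:db/ident [:person/id 1] :person/name "Marco" :friend/list [:n0 :n2]}
   :n2 {:person/name "Cassie"}})

(rebuild people #{:n0 :n1} false :n0)
;; => {:person/name "Rachel", :friend/list [{:db/ident [:person/id 1]}]}
```

With `nested?` set to true, the same call expands Marco in place and only falls back to `{:db/ident [:person/id 0]}` when the walk comes back around to Rachel, which is the loop-breaking behaviour.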
I mean… it’s trying to be extremely clever here. And that works for many cases. But if you’re not doing the “standard” thing (whatever that means), then I get that it’s confusing. Sorry!!!!
Well, hopefully by showing you some of the bits of code that are making these decisions, you’ll be able to work with the system, and not fight it
Then again, I have a black belt in TKD, which is more of a “hit it until it breaks” style 😜
my black belt in BJJ taught me to use my head. as a battering ram, when necessary
I think the core issue I'm running into now is that pyramid expects collections to be homogeneous, i.e. either it's a collection of "lookup refs" like `[:person/id 0]` OR it's a collection of other stuff
assuming you are transacting entities with `:db/ident`, my approach is now to replace maps of the form `{:db/ident x}` with `[:db/ident x]` in the data returned by `(pyramid.pull/resolve-ref db y)`
the tuple `[:db/ident x]` would signal to pyramid to call `(resolve-ref db [:db/ident x])`, and then it would get the entity data via `(a/entity db x)`
this works as long as your collections are homogeneous, i.e. if `(a/entity db [:db/ident 0])` returns
{:person/id 0
 :friend/list [{:db/ident 1} {:db/ident 2}]}
however, if you have a mix of top level entities and nested entities then pyramid doesn't know what to do
{:person/id 0
 :friend/list [{:db/ident 1} {:person/id 2 :person/name "Cassie"}]}
I'll have to think about this. perhaps I can relax the requirement in pyramid that collections that contain references are homogeneous
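The replacement of `{:db/ident x}` short maps with `[:db/ident x]` tuples, and the homogeneity problem that follows, can be sketched with `clojure.walk/postwalk`. The helper names are hypothetical, and `lookup-ref?` is only a rough stand-in: pyramid's own rules for recognising lookup refs may differ.

```clojure
(ns example.pyramid-shape ; hypothetical namespace
  (:require [clojure.walk :as walk]))

(defn ident-map->lookup-ref
  "Turns a short map {:db/ident x} into the tuple [:db/ident x];
  every other form is returned unchanged."
  [form]
  (if (and (map? form) (= [:db/ident] (keys form)))
    [:db/ident (:db/ident form)]
    form))

(defn ->pyramid-shape
  "Rewrites every {:db/ident x} short map nested anywhere in `entity`."
  [entity]
  (walk/postwalk ident-map->lookup-ref entity))

(defn lookup-ref?
  "Rough test for a 2-element [attribute value] tuple."
  [x]
  (and (vector? x) (= 2 (count x)) (keyword? (first x))))

(defn homogeneous-refs?
  "True when `coll` is either all lookup refs or contains none of them."
  [coll]
  (or (every? lookup-ref? coll)
      (not-any? lookup-ref? coll)))

(->pyramid-shape {:person/id 0 :friend/list [{:db/ident 1} {:db/ident 2}]})
;; => {:person/id 0, :friend/list [[:db/ident 1] [:db/ident 2]]}
```

Running the mixed example through `->pyramid-shape` gives a `:friend/list` of `[[:db/ident 1] {:person/id 2 :person/name "Cassie"}]`, for which `homogeneous-refs?` is false: that is exactly the case pyramid can't handle yet.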
So you have a choice:
• make everything a top level entity, and get back [{:db/ident 1} {:db/ident 2} …]
• call `a/entity` with `nested?` set to true
But then you could get deep objects coming back. However, as I mentioned earlier, it DOES break loops
It resolves the tree, yes, but not if it’s seen a node before. In that case, it will update it
See how it checks if it’s “seen” the value `v`? If so, then it puts in the short map of `{:db/id …}`, `{:db/ident …}` or `{:id …}`
that still runs into the issues of heterogeneous collections but probably would work for more data in the interim
So if person 1 includes person 0 in their friend list, and you ask for person 0, you’ll get back something like:
{:db/ident [:person/id 0]
 :person/id 0
 :person/name "Rachel"
 :friend/list ({:person/id 1
                :person/name "Marco"
                :friend/list ({:db/ident [:person/id 0]})})}
Without the call to “seen” it would have put in the entire object, which would lead to infinite recursion. Always fun to process infinite recursion
yeah. pyramid allows you to recurse up to a point, which is nice. well, actually you can infinitely recurse if you want, but you can also put limits on it
e.g.
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list ...}]}]}]
follows all references, recursing forever
or you can provide limits. this will recurse up to 3 times
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list 3}]}]}]
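Pyramid's actual handling of `...` and numeric recursion limits isn't reproduced here. As a rough model of what a numeric limit buys you, here is a toy depth-limited pull over a hypothetical map of id to person. Counting `:friend/list` hops is one possible reading of the limit, and the per-path `seen` set (which silently drops friends already on the path) is an addition of this sketch so that cyclic data terminates even when unbounded.

```clojure
(defn pull-friends
  "Toy depth-limited pull over `db`, a map of id -> {:person/name ...
  :friend/list [ids]}. `depth` is the number of :friend/list hops to
  follow, or :unbounded to keep following until the data runs out.
  Friends already on the current path are dropped, so cycles end."
  ([db id depth] (pull-friends db id depth #{}))
  ([db id depth seen]
   (let [entity     (get db id)
         seen       (conj seen id)
         result     {:person/name (:person/name entity)}
         unbounded? (= depth :unbounded)
         friends    (remove seen (:friend/list entity))]
     (if (and (or unbounded? (pos? depth)) (seq friends))
       (assoc result :friend/list
              (mapv #(pull-friends db % (if unbounded? depth (dec depth)) seen)
                    friends))
       result))))

;; Hypothetical data; Marco lists Rachel back, forming a cycle.
(def db
  {0 {:person/name "Rachel" :friend/list [1]}
   1 {:person/name "Marco"  :friend/list [2 0]}
   2 {:person/name "Cassie" :friend/list [3]}
   3 {:person/name "Jake"   :friend/list []}})

(pull-friends db 0 1)
;; => {:person/name "Rachel", :friend/list [{:person/name "Marco"}]}
```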