#asami
2022-09-22
thomas08:09:00

Hi Graph people... I have to do some experiments on a rather big graph (800K nodes and 4.2M edges). Is that actually possible with Clojure? has anyone done anything like that?

simongray08:09:58

Hey @thomas, this depends on what kind of experiment you’re doing I think, but in principle I don’t see why not. There’s also #rdf where lots of people are doing stuff with large datasets.

thomas08:09:09

I need to do some traversals in the graph and find subtrees

Bart Kleijngeld08:09:53

At the very least, thanks to Java interop, you can safely say that if it can be done in Java, it can also be done in Clojure. I'm mostly thinking of the Java ecosystem that becomes available to you

thomas08:09:19

we have been looking at Neo4j as a solution as well; there we can read in the data no problem, but unfortunately we're having some problems getting the right subtrees.

Bart Kleijngeld13:09:38

Check out the dialogue between @U051N6TTC, @U4P4NREBY and @U03FKR4EU5S in the #rdf channel on August 23rd. They're discussing performance across frameworks, and they mention concrete estimates for data sets of a size like yours.

quoll14:09:52

I’m working with Stardog at the moment (like, literally, as I type this, it’s in the next window). With 18 million edges. I was calculating subtrees using transitive predicates, but that was taking 66 seconds on this laptop. I’m feeling happy right now because I precalculated it (lifting my edges from 11 million triples to the 18 million I have now), and consequently I got a result of 3 seconds. That’s better
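Precalculating a transitive predicate, as described above, means materialising its closure ahead of time, so a subtree query becomes a single lookup instead of a recursive traversal. The sketch below shows the idea in plain Clojure, assuming the edges fit in memory as a map from each node to a set of its children; `descendants-of` and `closure` are hypothetical helpers, not Stardog or Asami API.

```clojure
(defn descendants-of
  "Every node reachable from root along child edges. `children` maps a node
  to a set of child nodes. The visited set makes cycles terminate."
  [children root]
  (loop [stack (vec (get children root))
         seen  #{}]
    (if (empty? stack)
      seen
      (let [n (peek stack)]
        (if (contains? seen n)
          (recur (pop stack) seen)
          (recur (into (pop stack) (get children n)) (conj seen n)))))))

(defn closure
  "Precalculated transitive closure: node -> set of all its descendants.
  Only nodes that have children appear as keys."
  [children]
  (into {}
        (map (fn [n] [n (descendants-of children n)]))
        (keys children)))

;; Usage: the subtree under :a is now a single map lookup.
(def edges {:a #{:b :c}, :b #{:d}, :d #{:a}})
((closure edges) :a) ;; a set containing :a :b :c :d, since :d -> :a closes a cycle
```

Storing the closure trades extra edges for query speed, which is the same trade-off as growing from 11 million to 18 million edges in exchange for the 66 s to 3 s improvement.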

quoll14:09:19

Stardog is all Java, but there’s a Clojure API. I know this, because I wrote it 😜

thomas14:09:42

I saw that mentioned somewhere... really cool!

quoll14:09:52

I think I can do similar things with Asami, but… that takes weekends

quoll14:09:08

I don’t support queries on multiple graphs yet, but I plan that!

thomas14:09:23

sounds good.

quoll14:09:25

I know the syntax and everything

quoll14:09:54

I just… haven’t done it yet 😬 But doing it in Stardog has me motivated to get it done in Asami

thomas14:09:40

I might give asami a try... not ideal, as all the data we need to access is in Postgres at the moment, but it could work as a PoC.

quoll14:09:40

Basically, think of Asami as a flexible index. Which also describes other graph stores. But the data-in/data-out story is better in systems like Stardog

thomas14:09:48

ok that sounds good

quoll14:09:43

Please keep in touch with any issues you encounter. There are always a lot of things to do, and having users need something is great for both setting priorities, and for my own motivation

thomas14:09:16

I'll keep that in mind

lilactown14:09:33

I want to extend an asami db to allow pulling data out of it using EQL (https://github.com/lilactown/pyramid/blob/main/src/pyramid/pull.cljc#L8). Is there a concrete type that I can extend a protocol to that would work for all database types?

quoll14:09:32

Hmm, give me a second…

quoll14:09:42

OK, sorry.

quoll14:09:31

The storage abstraction is the asami.graph/Graph protocol

quoll14:09:00

The asami.storage/Database protocol is just supposed to be a wrapper for this, and it just provides things like as-of, as-of-t, and so on. It also has the graph function which returns the underlying graph

lilactown14:09:07

here's what I got so far

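(require '[asami.core :as a]
         '[asami.graph :as ag]
         '[asami.memory]
         '[asami.durable.store]
         '[pyramid.pull :as pull])
(import '(asami.memory MemoryDatabase)
        '(asami.durable.store DurableDatabase))

(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node)))

;; extend-protocol takes all arities of one method in a single form;
;; separate forms for the same method would override each other.
(extend-protocol pull/IPullable
  MemoryDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found)))

  DurableDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found))))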

quoll14:09:27

The functions on a graph are for modifying the data (graph-add, graph-delete, graph-transact), and ALL of the read operations, particularly resolve-triple
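To make the shape of a read concrete: in the resolve-entity code above, `(ffirst (ag/resolve-triple graph '?e k v))` takes the first binding of `?e`, which suggests each result is a row of values for the pattern's variables. The toy below mimics that shape over a plain set of `[s p o]` vectors. It is a hypothetical stand-in for illustration, not Asami's index implementation, and it does not enforce repeated variables such as `[?x :p ?x]`.

```clojure
(defn pattern-var?
  "Query variables are symbols whose name starts with ?."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn resolve-triple
  "Match the pattern [s p o] against every triple. For each match, return a
  vector of the values bound to the pattern's variables, in pattern order.
  Repeated variables are not checked for consistency in this sketch."
  [triples s p o]
  (let [pattern  [s p o]
        var-idxs (keep-indexed (fn [i x] (when (pattern-var? x) i)) pattern)]
    (for [t triples
          :when (every? true? (map (fn [x v] (or (pattern-var? x) (= x v)))
                                   pattern t))]
      (mapv #(nth t %) var-idxs))))

;; Usage, mirroring resolve-entity: find the node whose :person/id is 0.
(def triples #{[:n1 :person/id 0] [:n1 :person/name "Rachel"] [:n2 :person/id 1]})
(ffirst (resolve-triple triples '?e :person/id 0)) ;; => :n1
```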

quoll14:09:45

So MemoryDatabase and DurableDatabase are both records

quoll14:09:50

not protocols

lilactown14:09:05

right... I don't think I can extend my protocol to another protocol?

quoll14:09:42

Oh, of course, I’m reading it upside down. Doh. Excuse me. I’m not feeling great today 🙂

lilactown14:09:26

No worries! I want two things:
1. I want to easily be able to go from an "ident" tuple [:person/id 1] to the entity map
2. I want to be able to pass any arbitrary asami database to pyramid.core/pull and it Just Works™

lilactown14:09:55

I think that leads me to using asami.core/entity and extending MemoryDatabase and DurableDatabase

quoll14:09:04

I’m kinda thinking that just using the entity function on asami.storage.Database already does this for you?

lilactown14:09:38

I think I would need to provide my own wrapper type then that implements IPullable

quoll14:09:43

OK. But don’t do the work you’re doing in resolve-entity

quoll14:09:04

See how your code says:

(when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node))

lilactown14:09:23

when I do this

(a/entity (a/db conn) [:person/id 0])
;; => nil

lilactown14:09:17

vs

(resolve-entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident :a/node-22665} #:person{:id 2, :name "Cassie"} {:person/id 3, :person/name "Jake", :friend/best #:person{:id 1}} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax", :species "andalite"}]}

lilactown14:09:02

I can share my code if you want to play with it

quoll14:09:54

I honestly can’t see how they would be different. Both MemoryDatabase and DurableDatabase pass the parameters through untouched to asami.entities.reader/ident->entity

quoll14:09:19

Here is the code in that function:

(when-let [eid (or (and (seq (node/find-triple graph [ident '?a '?v])) ident)
                   (ffirst (node/find-triple graph ['?eid :db/ident ident]))
                   (ffirst (node/find-triple graph ['?eid :id ident])))]
  (ref->entity graph eid nested?))
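The lookup order in that snippet can be restated over plain data. The sketch below is a hypothetical re-expression, assuming a set of `[s p o]` vectors instead of Asami's graph: the ident is first tried as a node in its own right, then as the value of `:db/ident`, then as the value of `:id`. It also shows why `[:person/id 0]` came back nil earlier: a `:person/id` attribute is never consulted.

```clojure
(defn ident->node
  "Resolve `ident` to a node, trying in order: the ident is itself a subject,
  it is some subject's :db/ident value, it is some subject's :id value."
  [triples ident]
  (letfn [(subject-with [attr]
            (some (fn [[s p o]] (when (and (= p attr) (= o ident)) s)) triples))]
    (or (when (some (fn [[s _ _]] (= s ident)) triples) ident)
        (subject-with :db/ident)
        (subject-with :id))))

(def triples
  #{[:n1 :db/ident [:person/id 0]] [:n1 :person/name "Rachel"]
    [:n2 :person/id 1]            [:n2 :person/name "Marco"]})

(ident->node triples [:person/id 0]) ;; => :n1, found through :db/ident
(ident->node triples [:person/id 1]) ;; => nil, :person/id is not an identity key
```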

quoll14:09:07

The value you passed in is ident

lilactown14:09:47

maybe it's the way I'm transacting these maps? I've got some other weirdness

lilactown14:09:13

{:tx-data
 [{:person/id 0,
   :person/name "Rachel",
   :friend/list
   ({:person/id 1, :person/name "Marco", :db/id -2}
    {:person/id 2, :person/name "Cassie", :db/id -3}
    {:person/id 3, :person/name "Jake", :db/id -4}
    {:person/id 4, :person/name "Tobias", :db/id -5}
    {:person/id 5, :person/name "Ax", :db/id -6}),
   :db/id -1}
  {:person/id 1,
   :friend/best {:person/id 3, :friend/best #:person{:id 1}, :db/id -4},
   :db/id -2}
  {:species
   {:andalites [{:person/id 5, :person/species "andalite", :db/id -6}]}}]}

quoll14:09:14

Oh… wait. I see the difference

quoll14:09:25

Your pair there… is that supposed to be :person/id as an identifying property, and 0 is the key? Or is it a compound key of [:person/id 0]?

quoll14:09:48

I had presumed you had a compound key, but now I’m thinking it’s a key/value pair

quoll14:09:21

yeah, OK. It’s a key/value pair. I get it. Carry on, you’re doing the correct thing.

lilactown14:09:31

yeah, a key/value pair is the way I'm treating it rn, so [:person/id 0] supposedly refers to the entity that has {:person/id 0}

lilactown14:09:38

what's the "asami" way to do this?

quoll14:09:55

I only expect entity keys of :db/ident or :id

lilactown14:09:03

something like

{:db/ident [:person/id 0] ,,,}
?

quoll14:09:17

yes, that’s what I thought you were doing

quoll14:09:36

Told you… my brain is foggy today

lilactown14:09:41

hey no worries at all. I'm learning asami too

quoll14:09:47

(I had a flu shot last night, and woah)

quoll15:09:15

Incidentally, you CAN have compound keys like I was just showing

quoll15:09:27

In case you’re ever interested 🤣

lilactown15:09:39

I'm planning out when I'm going to get the new omicron vaccine... might do the flu shot too just to get all the yuck over with at once

lilactown15:09:10

a compound key is like {:db/ident [:person/id 0]} ? or something else?

quoll15:09:32

Exactly like that

quoll15:09:24

Easy to do in memory (just throw an object into the value position). Trickier on disk, but I’m clever 😜

lilactown15:09:50

I think I'll assume people are doing that, rather than trying to look things up by some map ID.

lilactown15:09:59

maybe you can also show me what I'm doing wrong with my transaction

lilactown15:09:14

given this tx on an in-memory store

(def tx {:tx-data '[{:db/ident [:person/id 0]
                       :person/id 0
                       :person/name "Rachel"
                       :friend/list ({:db/ident [:person/id 1]
                                      :person/id 1
                                      :person/name "Marco"}
                                     {:db/ident [:person/id 2]
                                      :person/id 2
                                      :person/name "Cassie"
                                      :db/id -3}
                                     {:db/ident [:person/id 3]
                                      :person/id 3
                                      :person/name "Jake"}
                                     {:db/ident [:person/id 4]
                                      :person/id 4
                                      :person/name "Tobias"}
                                     {:db/ident [:person/id 5]
                                      :person/id 5
                                      :person/name "Ax"})}
                      {:db/ident [:person/id 1]
                       :person/id 1
                       :friend/best {:db/ident [:person/id 1]
                                     :person/id 3
                                     :friend/best #:person{:id 1}}}]})

lilactown15:09:52

the result:

(a/create-database "asami:")
(def conn (a/connect "asami:"))
(a/transact conn tx)

(a/entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident [:person/id 1]} #:person{:id 2, :name "Cassie"} #:person{:id 3, :name "Jake"} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax"}]}

lilactown15:09:10

for some reason the first item in the :friend/list is not an entity map

quoll15:09:49

Oh drat… that’s a bug

quoll15:09:18

When you’re storing nested objects like that, it’s supposed to pull it apart into its component triples

quoll15:09:35

UNLESS the key is :id or :db/ident

quoll15:09:43

In that case, it shouldn’t be pulled apart

lilactown15:09:22

I noticed this earlier too, when I was using temporary IDs like :db/id -1

lilactown15:09:18

when I remove the second map I'm transacting, it works as expected

lilactown15:09:24

I'll open up a gh issue

quoll15:09:59

What we’re seeing in the above is that the first object correctly left :db/ident alone, but then dropped all the other triples. The remaining objects might be OK, because I think :db/ident properties are hidden when returning from ident->entity

quoll15:09:54

Oh, wait! It may be working fine! Because your object that is keyed as {:db/ident [:person/id 1]} is a top level entity, and you don’t have the nested? flag set. This means that it sees the object, but rather than returning you the whole thing, it just gives you the identifier for it

quoll15:09:15

None of the other objects appear outside of that structure, so they will always be embedded inside of it

quoll15:09:38

And their :db/ident fields are hidden when you ask for them by entity

lilactown15:09:31

if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a "top level" entity?

quoll15:09:52

See what you get back if you update:

(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node true)))

quoll15:09:25

> if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a “top level” entity?
Yes

lilactown15:09:48

I think I want pyramid to be lazy in how it pulls entities out of the db, so I can rely on the entity map either containing :db/ident OR having all the information in it

quoll15:09:20

Alternatively:

(let [gr (a/graph db)
      person2 (ffirst (ag/resolve-triple gr '?e :db/ident [:person/id 2]))]
  (ag/graph-add gr person2 :a/entity true))

This will also make it a top level entity

quoll15:09:16

BTW, when I said that the :db/ident values don’t show up inside the entities when you retrieve them? It happens on this line: https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L132

quoll15:09:59

(seen v) means that the object has already been included in a structure, so don’t recurse into it (this is to break loops in the graph). Then it checks that you don’t want nested structures and it’s a top level entity. If so, then emit a short map, that’s either {:db/ident v}, or {:id v} or, finally, {:db/id v}
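The rule described above can be sketched over plain maps. This is a hypothetical re-implementation for illustration, not the code in asami.entities.reader: `nodes` maps a node to its attribute map, any value that is itself a key of `nodes` counts as a reference, and `seen` tracks the nodes on the current path, which is enough to break loops (Asami's own bookkeeping may differ).

```clojure
(defn short-ref
  "Abbreviated reference: {:db/ident ..}, else {:id ..}, else {:db/id node}."
  [node attrs]
  (cond (contains? attrs :db/ident) {:db/ident (:db/ident attrs)}
        (contains? attrs :id)       {:id (:id attrs)}
        :else                       {:db/id node}))

(defn build-entity
  "Expand `root` into a nested map. A referenced node is abbreviated to a
  short map when it is already on the current path (breaking cycles), or
  when nested? is false and the node is in the `top-level` set."
  [nodes top-level nested? root]
  (letfn [(resolve-value [v seen]
            (cond
              (sequential? v)           (mapv #(resolve-value % seen) v)
              (not (contains? nodes v)) v
              (or (contains? seen v)
                  (and (not nested?) (contains? top-level v)))
              (short-ref v (nodes v))
              :else                     (expand v seen)))
          (expand [n seen]
            (let [seen (conj seen n)]
              (into {} (map (fn [[k v]] [k (resolve-value v seen)])) (nodes n))))]
    (expand root #{})))

;; Person 1 lists person 0 as a friend, so asking for person 0 loops back.
(def nodes
  {:p0 {:db/ident [:person/id 0] :person/id 0 :person/name "Rachel" :friend/list [:p1]}
   :p1 {:person/id 1 :person/name "Marco" :friend/list [:p0]}})

(build-entity nodes #{:p0} false :p0)
;; => {:db/ident [:person/id 0], :person/id 0, :person/name "Rachel",
;;     :friend/list [{:person/id 1, :person/name "Marco",
;;                    :friend/list [{:db/ident [:person/id 0]}]}]}
```

If `:p1` is also marked top level and nested? is false, it comes back as `{:db/id :p1}` instead, because it has neither `:db/ident` nor `:id`.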

quoll15:09:08

I mean… it’s trying to be extremely clever here. And that works for many cases. But if you’re not doing the “standard” thing (whatever that means), then I get that it’s confusing. Sorry!!!!

quoll15:09:34

This all came about due to user requests 😄

lilactown15:09:14

Yeah I want to make pyramid match up with what's "standard"

quoll15:09:28

Well, hopefully by showing you some of the bits of code that are making these decisions, you’ll be able to work with the system, and not fight it

quoll15:09:50

(Can you tell I’ve done several years of Aikido?)

quoll15:09:23

Then again, I have a black belt in TKD, which is more of a “hit it until it breaks” style 😜

lilactown15:09:41

my black belt in BJJ taught me to use my head. as a battering ram, when necessary

lilactown15:09:02

I think the core issue I'm running into now is that pyramid expects collections to be homogeneous, i.e. either it's a collection of "lookup refs" like [:person/id 0] OR it's a collection of other stuff

quoll15:09:44

My first thought is to ask you to ensure that this is what you store 🙂

lilactown15:09:15

I'm trying to think of how I can use this on a well constructed asami db

lilactown15:09:41

assuming you are transacting entities with :db/ident, my approach is now to replace maps of the form {:db/ident x} with the tuple [:db/ident x] in the data returned by (pyramid.pull/resolve-ref db y)

lilactown15:09:22

the tuple [:db/ident x] would signal to pyramid to call (resolve-ref db [:db/ident x]) and then it would get the entity data via (a/entity db x)

lilactown15:09:28

this works as long as your collections are homogeneous, i.e. if (a/entity db [:db/ident 0]) returns

{:person/id 0
 :friend/list [{:db/ident 1} {:db/ident 2}]}

lilactown15:09:35

however, if you have a mix of top level entities and nested entities then pyramid doesn't know what to do

{:person/id 0
 :friend/list [{:db/ident 1} {:person/id 2 :person/name "Cassie"}]}
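The rewrite described above, turning short `{:db/ident x}` maps into `[:db/ident x]` lookup tuples, could be done with `clojure.walk/postwalk`. A minimal sketch; the function names are hypothetical and not part of pyramid or Asami. Applied to the mixed collection just shown, it produces exactly the heterogeneous vector (one lookup tuple, one plain map) that pyramid currently rejects.

```clojure
(require '[clojure.walk :as walk])

(defn short-ident-map?
  "True for a map whose only key is :db/ident, such as {:db/ident 1}."
  [x]
  (and (map? x) (= [:db/ident] (keys x))))

(defn idents->lookup-refs
  "Replace every short {:db/ident x} map, at any depth, with [:db/ident x].
  Maps carrying other attributes are left untouched."
  [data]
  (walk/postwalk (fn [x]
                   (if (short-ident-map? x) [:db/ident (:db/ident x)] x))
                 data))

(idents->lookup-refs {:person/id 0 :friend/list [{:db/ident 1} {:db/ident 2}]})
;; => {:person/id 0, :friend/list [[:db/ident 1] [:db/ident 2]]}

(idents->lookup-refs {:person/id 0
                      :friend/list [{:db/ident 1} {:person/id 2 :person/name "Cassie"}]})
;; => {:person/id 0, :friend/list [[:db/ident 1] {:person/id 2, :person/name "Cassie"}]}
```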

lilactown15:09:15

I'll have to think about this. perhaps I can relax the requirement in pyramid that collections that contain references are homogeneous

quoll15:09:49

So you have a choice:
• make everything a top level entity, and get back [{:db/ident 1} {:db/ident 2} …]
• call a/entity with nested? set to true

lilactown15:09:26

nested? will resolve the entire tree, right?

quoll15:09:31

But then you could get deep objects coming back. However, as I mentioned earlier, it DOES break loops

lilactown15:09:58

what happens when it reaches a cycle? what kind of data does it return?

quoll15:09:03

It resolves the tree, yes, but not if it’s seen a node before. In that case, it puts in a short reference to it instead

quoll15:09:23

I actually showed this earlier

lilactown15:09:40

ah sorry. I'll look back

quoll15:09:55

See how it checks if it’s “seen” the value v? If so, then it puts in one of the short maps {:db/id …}, {:db/ident …} or {:id …}

lilactown15:09:54

that still runs into the issues of heterogeneous collections but probably would work for more data in the interim

quoll15:09:28

So if person 1 includes person 0 in their friend list, and you ask for person 0, you’ll get back something like:

{:db/ident [:person/id 0]
 :person/id 0
 :person/name "Rachel"
 :friend/list ({:person/id 1
                :person/name "Marco"
                :friend/list ({:db/ident [:person/id 0]})})}

quoll15:09:40

Without the call to “seen” it would have put in the entire object, which would lead to infinite recursion. Always fun to process infinite recursion

lilactown15:09:38

yeah. pyramid allows you to recurse up to a point, which is nice. well, actually you can infinitely recurse if you want, but you can also put limits on it

lilactown15:09:37

e.g.

[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list ...}]}]}]

follows all references, recursing forever. Or you can provide limits; this will recurse up to 3 times:

[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list 3}]}]}]

lilactown15:09:45

or you can specify directly when to terminate by eliding the :friend/list selection

[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list [:person/name]}]}]}]
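A numeric limit like `{:friend/list 3}` is roughly analogous to threading a depth counter through the join. The toy below is not pyramid's implementation and does not model exactly how pyramid counts levels; it assumes a hypothetical `db` map from person id to attributes and shows how a depth bound keeps a cyclic friend list from recursing forever.

```clojure
(defn pull-friends
  "Select :person/name for `id`, then follow :friend/list while depth > 0.
  `db` maps a person id to {:person/name .., :friend/list [ids ..]}."
  [db id depth]
  (let [{person-name :person/name friends :friend/list} (get db id)]
    (cond-> {:person/name person-name}
      (pos? depth) (assoc :friend/list
                          (mapv #(pull-friends db % (dec depth)) friends)))))

;; Rachel and Marco list each other, so an unbounded walk would never end.
(def db {0 {:person/name "Rachel" :friend/list [1]}
         1 {:person/name "Marco"  :friend/list [0]}})

(pull-friends db 0 2)
;; => {:person/name "Rachel",
;;     :friend/list [{:person/name "Marco",
;;                    :friend/list [{:person/name "Rachel"}]}]}
```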

lilactown14:09:46

it looks like I could probably extend asami.memory/MemoryDatabase and asami.durable.store/DurableDatabase?