asami

thomas 2022-09-22T08:23:00.847999Z

Hi Graph people... I have to do some experiments on a rather big graph (800K nodes and 4.2M edges). Is that actually possible with Clojure? has anyone done anything like that?

simongray 2022-09-22T08:42:58.848809Z

Hey @thomas, this depends on what kind of experiment you’re doing, I think, but in principle I don’t see why not. There’s also #rdf, where lots of people are doing stuff with large datasets.

thomas 2022-09-22T08:44:09.876129Z

I need to do some traversals in the graph and find subtrees

Bart Kleijngeld 2022-09-22T08:44:53.855589Z

At the very least, thanks to the Java interop capabilities, you can safely say that if it can be done in Java, it can also be done in Clojure. I'm mostly thinking of the Java ecosystem that becomes available to you

thomas 2022-09-22T08:47:19.825879Z

we have been looking at Neo4j as a solution as well; there we can read in the data no problem, but we're having some problems getting the right subtrees, unfortunately.

Bart Kleijngeld 2022-09-22T13:42:38.949209Z

Check out the dialogue between @quoll, @simongray and @rowland.watkins in the #rdf channel on August 23rd. They're discussing performance between frameworks and actual estimations of the size of your data set are mentioned.

🙏 1
quoll 2022-09-22T14:05:52.470889Z

I’m working with Stardog at the moment (like, literally, as I type this, it’s in the next window). With 18 million edges. I was calculating subtrees using transitive predicates, but that was taking 66 seconds on this laptop. I’m feeling happy right now because I precalculated it (lifting my edges from 11 million triples to the 18 million I have now), and consequently I got a result of 3 seconds. That’s better

quoll 2022-09-22T14:06:19.871139Z

Stardog is all Java, but there’s a Clojure API. I know this, because I wrote it 😜

thomas 2022-09-22T14:06:42.038619Z

I saw that mentioned somewhere... really cool!

quoll 2022-09-22T14:06:52.547359Z

I think I can do similar things with Asami, but… that takes weekends

quoll 2022-09-22T14:07:08.627399Z

I don’t support queries on multiple graphs yet, but I plan that!

thomas 2022-09-22T14:07:23.432519Z

sounds good.

quoll 2022-09-22T14:07:25.513609Z

I know the syntax and everything

quoll 2022-09-22T14:07:54.129229Z

I just… haven’t done it yet 😬 But doing it in Stardog has me motivated to get it done in Asami

thomas 2022-09-22T14:08:40.929889Z

I might give asami a try.... not ideal, as all the data we need to access is in Postgres at the moment, but it could work as a PoC.

quoll 2022-09-22T14:10:40.957259Z

Basically, think of Asami as a flexible index. Which also describes other graph stores. But the data-in/data-out story is better in systems like Stardog

thomas 2022-09-22T14:11:48.999699Z

ok that sounds good

quoll 2022-09-22T14:12:43.392939Z

Please keep in touch with any issues you encounter. There are always a lot of things to do, and having users need something is great for both setting priorities, and for my own motivation

🙏 1
thomas 2022-09-22T14:13:16.869679Z

I'll keep that in mind

lilactown 2022-09-22T14:34:33.254989Z

I want to extend an asami db to allow https://github.com/lilactown/pyramid/blob/main/src/pyramid/pull.cljc#L8 data out of it using EQL. is there a concrete type that I can extend a protocol to that would work for all database types?

quoll 2022-09-22T14:37:32.955669Z

Hmm, give me a second…

quoll 2022-09-22T14:38:42.409599Z

OK, sorry.

quoll 2022-09-22T14:39:31.161379Z

The storage abstraction is the asami.graph/Graph protocol

quoll 2022-09-22T14:41:00.046289Z

The asami.storage/Database protocol is just supposed to be a wrapper for this, and it just provides things like as-of, as-of-t, and so on. It also has the graph function which returns the underlying graph

lilactown 2022-09-22T14:42:07.278459Z

here's what I got so far

(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node)))

(extend-protocol pull/IPullable
  MemoryDatabase
  ;; both arities live in one form: repeating the method name in separate
  ;; forms would keep only the last one (at least in JVM Clojure)
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found)))

  DurableDatabase
  (resolve-ref
    ([db ref] (resolve-entity db ref))
    ([db ref not-found] (or (resolve-entity db ref) not-found))))

quoll 2022-09-22T14:42:27.247979Z

The functions on a graph are for modifying the data (graph-add, graph-delete, graph-transact), and ALL of the read operations, particularly resolve-triple
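A minimal sketch of reading through the Graph protocol, assuming asami.core and asami.graph are aliased as a and ag as in the snippet above. The database name and the sample entity are made up for illustration.

```clojure
(ns example.graph-read
  (:require [asami.core :as a]
            [asami.graph :as ag]))

;; Hypothetical in-memory database; the name "graph-read" is arbitrary.
(def db-uri "asami:mem://graph-read")
(a/create-database db-uri)
(def conn (a/connect db-uri))

;; Deref so the data is definitely stored before we read it back.
@(a/transact conn {:tx-data [{:person/id 0 :person/name "Rachel"}]})

;; The Database wraps a graph; a/graph hands back that underlying graph.
(def graph (a/graph (a/db conn)))

;; Symbols starting with ? are variables; fixed positions act as filters.
;; Each result row holds the bindings for the variables in the pattern.
(def rows (ag/resolve-triple graph '?e :person/id 0))

;; The node bound to ?e can be used as the subject of further lookups.
(def node (ffirst rows))
(def names (ag/resolve-triple graph node :person/name '?name))
```

Because resolve-triple is part of the protocol, the same calls should work whether the graph behind the database is in memory or durable.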

quoll 2022-09-22T14:43:45.958479Z

So MemoryDatabase and DurableDatabase are both records

quoll 2022-09-22T14:43:50.205419Z

not protocols

lilactown 2022-09-22T14:44:05.401369Z

right... I don't think I can extend my protocol to another protocol?

quoll 2022-09-22T14:44:42.629449Z

Oh, of course, I’m reading it upside down. Doh. Excuse me. I’m not feeling great today 🙂

lilactown 2022-09-22T14:45:26.171109Z

No worries! I want two things:
1. I want to easily be able to go from an "ident" tuple [:person/id 1] to the entity map
2. I want to be able to pass any arbitrary asami database to pyramid.core/pull and it Just Works™

lilactown 2022-09-22T14:45:55.535789Z

I think that leads me to using asami.core/entity and extending MemoryDatabase and DurableDatabase

quoll 2022-09-22T14:47:04.659999Z

I’m kinda thinking that just using the entity function on asami.storage.Database already does this for you?

lilactown 2022-09-22T14:48:38.240599Z

I think I would need to provide my own wrapper type then that implements IPullable
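A rough sketch of what such a wrapper type could look like. It assumes pyramid.pull/IPullable has the two resolve-ref arities shown earlier in this thread (any other methods the protocol may have are not covered), and the names PullableDb and pullable are hypothetical.

```clojure
(ns example.pullable-db
  (:require [asami.core :as a]
            [pyramid.pull :as pull]))

;; Hypothetical wrapper: holds any Asami database value and forwards
;; ref lookups to asami.core/entity, so it does not care whether the
;; database underneath is in-memory or durable.
(defrecord PullableDb [db]
  pull/IPullable
  (resolve-ref [_ ref] (a/entity db ref))
  (resolve-ref [_ ref not-found] (or (a/entity db ref) not-found)))

(defn pullable
  "Wraps the current database of `conn` so it can be handed to pyramid."
  [conn]
  (->PullableDb (a/db conn)))
```

The trade-off is that callers have to wrap the database before pulling, whereas extending the two record types directly lets a plain database value be passed in.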

quoll 2022-09-22T14:49:43.594879Z

OK. But don’t do the work you’re doing in resolve-entity

quoll 2022-09-22T14:50:04.830009Z

See how your code says:

(when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node))

lilactown 2022-09-22T14:50:23.944709Z

when I do this

(a/entity (a/db conn) [:person/id 0])
;; => nil

lilactown 2022-09-22T14:51:17.551909Z

vs

(resolve-entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident :a/node-22665} #:person{:id 2, :name "Cassie"} {:person/id 3, :person/name "Jake", :friend/best #:person{:id 1}} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax", :species "andalite"}]}

lilactown 2022-09-22T14:53:02.175789Z

I can share my code if you want to play with it

quoll 2022-09-22T14:53:54.073929Z

I honestly can’t see how they would be different. Both MemoryDatabase and DurableDatabase pass the parameters through untouched to asami.entities.reader/ident->entity

quoll 2022-09-22T14:54:19.512999Z

Here is the code in that function:

(when-let [eid (or (and (seq (node/find-triple graph [ident '?a '?v])) ident)
                   (ffirst (node/find-triple graph ['?eid :db/ident ident]))
                   (ffirst (node/find-triple graph ['?eid :id ident])))]
  (ref->entity graph eid nested?))

quoll 2022-09-22T14:55:07.379929Z

The value you passed in is ident

lilactown 2022-09-22T14:55:47.876949Z

maybe it's the way I'm transacting these maps? I've got some other weirdness

lilactown 2022-09-22T14:56:13.408029Z

{:tx-data
 [{:person/id 0,
   :person/name "Rachel",
   :friend/list
   ({:person/id 1, :person/name "Marco", :db/id -2}
    {:person/id 2, :person/name "Cassie", :db/id -3}
    {:person/id 3, :person/name "Jake", :db/id -4}
    {:person/id 4, :person/name "Tobias", :db/id -5}
    {:person/id 5, :person/name "Ax", :db/id -6}),
   :db/id -1}
  {:person/id 1,
   :friend/best {:person/id 3, :friend/best #:person{:id 1}, :db/id -4},
   :db/id -2}
  {:species
   {:andalites [{:person/id 5, :person/species "andalite", :db/id -6}]}}]}

quoll 2022-09-22T14:56:14.407319Z

Oh… wait. I see the difference

quoll 2022-09-22T14:57:25.462089Z

Your pair there… is that supposed to be :person/id as an identifying property, and 0 is the key? Or is it a compound key of [:person/id 0]?

quoll 2022-09-22T14:57:48.616739Z

I had presumed you had a compound key, but now I’m thinking it’s a key/value pair

quoll 2022-09-22T14:58:21.729519Z

yeah, OK. It’s a key/value pair. I get it. Carry on, you’re doing the correct thing.

lilactown 2022-09-22T14:58:31.403329Z

yeah, a key/value pair is the way I'm treating it rn. so [:person/id 0] supposedly refers to the entity that has {:person/id 0}

lilactown 2022-09-22T14:58:38.593089Z

what's the "asami" way to do this?

quoll 2022-09-22T14:58:55.534839Z

I only expect entity keys of :db/ident or :id

lilactown 2022-09-22T14:59:03.256129Z

something like

{:db/ident [:person/id 0] ,,,}
?

lilactown 2022-09-22T14:59:05.695579Z

gotcha

quoll 2022-09-22T14:59:17.939349Z

yes, that’s what I thought you were doing

quoll 2022-09-22T14:59:27.835879Z

Sorry!

quoll 2022-09-22T14:59:36.310209Z

Told you… my brain is foggy today

lilactown 2022-09-22T14:59:41.040879Z

hey no worries at all. I'm learning asami too

quoll 2022-09-22T14:59:47.643629Z

(I had a flu shot last night, and woah)

🤒 1
quoll 2022-09-22T15:00:15.869279Z

Incidentally, you CAN have compound keys like I was just showing

quoll 2022-09-22T15:00:27.573229Z

In case you’re ever interested 🤣

lilactown 2022-09-22T15:00:39.403529Z

I'm planning out when I'm going to get the new omicron vaccine... might do the flu shot too just to get all the yuck over with at once

lilactown 2022-09-22T15:01:10.542059Z

a compound key is like {:db/ident [:person/id 0]} ? or something else?

quoll 2022-09-22T15:03:32.838139Z

Exactly like that

quoll 2022-09-22T15:04:24.024489Z

Easy to do in memory (just throw an object into the value position). Trickier on disk, but I’m clever 😜

lilactown 2022-09-22T15:05:50.548599Z

I think I'll assume people are doing that, rather than trying to look things up by some map ID.

lilactown 2022-09-22T15:05:59.545619Z

maybe you can also show me what I'm doing wrong with my transaction

lilactown 2022-09-22T15:06:14.183609Z

given this tx on an in-memory store

(def tx {:tx-data '[{:db/ident [:person/id 0]
                       :person/id 0
                       :person/name "Rachel"
                       :friend/list ({:db/ident [:person/id 1]
                                      :person/id 1
                                      :person/name "Marco"}
                                     {:db/ident [:person/id 2]
                                      :person/id 2
                                      :person/name "Cassie"
                                      :db/id -3}
                                     {:db/ident [:person/id 3]
                                      :person/id 3
                                      :person/name "Jake"}
                                     {:db/ident [:person/id 4]
                                      :person/id 4
                                      :person/name "Tobias"}
                                     {:db/ident [:person/id 5]
                                      :person/id 5
                                      :person/name "Ax"})}
                      {:db/ident [:person/id 1]
                       :person/id 1
                       :friend/best {:db/ident [:person/id 1]
                                     :person/id 3
                                     :friend/best #:person{:id 1}}}]})

lilactown 2022-09-22T15:06:52.894259Z

the result:

(a/create-database "asami:")
(def conn (a/connect "asami:"))
(a/transact conn tx)

(a/entity (a/db conn) [:person/id 0])
;; => {:person/id 0, :person/name "Rachel", :friend/list [#:db{:ident [:person/id 1]} #:person{:id 2, :name "Cassie"} #:person{:id 3, :name "Jake"} #:person{:id 4, :name "Tobias"} #:person{:id 5, :name "Ax"}]}
 

lilactown 2022-09-22T15:07:10.453359Z

for some reason the first item in the :friend/list is not an entity map

quoll 2022-09-22T15:07:49.407259Z

Oh drat… that’s a bug

quoll 2022-09-22T15:08:18.701139Z

When you’re storing nested objects like that, it’s supposed to pull it apart into its component triples

quoll 2022-09-22T15:08:35.091239Z

UNLESS the key is :id or :db/ident

quoll 2022-09-22T15:08:43.039059Z

In that case, it shouldn’t be pulled apart

lilactown 2022-09-22T15:09:22.206909Z

I noticed this as well when I was using temporary :db/id -1 etc. before too

lilactown 2022-09-22T15:11:18.750209Z

when I remove the second map I'm transacting, it works as expected

lilactown 2022-09-22T15:11:24.861699Z

I'll open up a gh issue

quoll 2022-09-22T15:11:59.199989Z

What we’re seeing in the above is that the first object correctly left :db/ident alone, but then dropped all the other triples. The remaining objects might be OK, because I think :db/ident properties are hidden when returning from ident->entity

quoll 2022-09-22T15:14:54.528699Z

Oh, wait! It may be working fine! Because your object that is keyed as {:db/ident [:person/id 1]} is a top level entity, and you don’t have the nested? flag set. This means that it sees the object, but rather than returning you the whole thing, it just gives you the identifier for it

quoll 2022-09-22T15:15:15.540019Z

None of the other objects appear outside of that structure, so they will always be embedded inside of it

lilactown 2022-09-22T15:15:31.040029Z

hmm I see

quoll 2022-09-22T15:15:38.106259Z

And their :db/ident fields are hidden when you ask for them by entity

lilactown 2022-09-22T15:16:31.924589Z

if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a "top level" entity?

quoll 2022-09-22T15:16:52.156629Z

See what you get back if you update:

(defn- resolve-entity
  [db [k v]]
  (when-let [node (ffirst (ag/resolve-triple (a/graph db) '?e k v))]
    (a/entity db node true)))

quoll 2022-09-22T15:17:25.791129Z

> if I did another transaction with something about {:db/ident [:person/id 2]}, would it move that to a “top level” entity?
Yes

lilactown 2022-09-22T15:17:50.384349Z

gotcha

lilactown 2022-09-22T15:18:48.340199Z

I think I want pyramid to be lazy at how it pulls entities out of the db. so I can rely on either the entity map containing :db/ident OR it will have all the information in it

quoll 2022-09-22T15:20:20.018569Z

Alternatively:

(let [gr (a/graph db)
      person2 (ffirst (ag/resolve-triple gr '?e :db/ident [:person/id 2]))]
  (ag/graph-add gr person2 :a/entity true))
Will also make it a top level entity

quoll 2022-09-22T15:21:16.367669Z

BTW, when I said that the :db/ident values don’t show up inside the entities when you retrieve them? It happens on this line: https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L132

👍🏻 1
quoll 2022-09-22T15:23:25.329879Z

As for the nested structures thing, that’s done here: https://github.com/quoll/asami/blob/6b8c0fb68ba734f968a3e5906d338cfd6b49e2a6/src/asami/entities/reader.cljc#L83-L87

quoll 2022-09-22T15:26:59.984659Z

(seen v) means that the object has already been included in a structure, so don’t recurse into it (this is to break loops in the graph). Then it checks that you don’t want nested structures and it’s a top level entity. If so, then emit a short map, that’s either {:db/ident v}, or {:id v} or, finally, {:db/id v}
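To make the loop-breaking idea concrete, here is a toy model in plain Clojure. It is not Asami's code: the graph is just a map of id to entity, {:ref id} stands in for an edge to another entity, and the names people and expand are invented for the example.

```clojure
;; Toy graph: Rachel (0) and Marco (1) list each other as friends,
;; so naive expansion would recurse forever.
(def people
  {0 {:person/name "Rachel" :friend/list [{:ref 1}]}
   1 {:person/name "Marco"  :friend/list [{:ref 0}]}})

(defn expand
  "Expands entity `id` into a nested map. `seen` holds the ids already
  visited on the way down; a revisited id is emitted as the short map
  {:db/ident id} instead of being expanded again."
  ([graph id] (expand graph id #{}))
  ([graph id seen]
   (if (contains? seen id)
     {:db/ident id}
     (let [seen' (conj seen id)]
       (into {}
             (map (fn [[k v]]
                    [k (if (vector? v)
                         ;; a vector is a list of references: follow each one
                         (mapv #(expand graph (:ref %) seen') v)
                         v)]))
             (get graph id))))))

(expand people 0)
;; => {:person/name "Rachel",
;;     :friend/list [{:person/name "Marco", :friend/list [{:db/ident 0}]}]}
```

The short map marks the point where the cycle was cut, so a consumer can still look the entity up again by its identifier if it needs more.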

quoll 2022-09-22T15:28:08.310179Z

I mean… it’s trying to be extremely clever here. And that works for many cases. But if you’re not doing the “standard” thing (whatever that means), then I get that it’s confusing. Sorry!!!!

quoll 2022-09-22T15:28:34.186519Z

This all came about due to user requests 😄

lilactown 2022-09-22T15:31:14.049579Z

Yeah I want to make pyramid match up with what's "standard"

quoll 2022-09-22T15:33:28.336889Z

Well, hopefully by showing you some of the bits of code that are making these decisions, you’ll be able to work with the system, and not fight it

quoll 2022-09-22T15:33:50.986109Z

(Can you tell I’ve done several years of Aikido?)

😂 1
🥋 1
quoll 2022-09-22T15:35:23.447739Z

Then again, I have a black belt in TKD, which is more of a “hit it until it breaks” style 😜

lilactown 2022-09-22T15:37:41.254889Z

my black belt in BJJ taught me to use my head. as a battering ram, when necessary

🐐 1
lilactown 2022-09-22T15:39:02.097029Z

I think the core issue I'm running into now is that pyramid expects collections to be homogeneous, i.e. either it's a collection of "lookup refs" like [:person/id 0] OR it's a collection of other stuff

quoll 2022-09-22T15:39:44.804559Z

My first thought is to ask you to ensure that this is what you store 🙂

lilactown 2022-09-22T15:40:15.958959Z

I'm trying to think of how I can use this on a well constructed asami db

lilactown 2022-09-22T15:41:41.015649Z

assuming you are transacting entities with :db/ident, my approach is now to replace maps with {:db/ident x} with [:db/ident x] in the data returned by (pyramid.pull/resolve-ref db y)

lilactown 2022-09-22T15:42:22.530349Z

the tuple [:db/ident x] would signal to pyramid to call (resolve-ref db [:db/ident x]) and then it would get the entity data via (a/entity db x)
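The replacement step described above could be sketched in plain Clojure like this. The function names are hypothetical, and only short maps keyed by :db/ident are handled (not the {:id v} or {:db/id v} variants mentioned earlier).

```clojure
(ns example.ident-refs
  (:require [clojure.walk :as walk]))

(defn ident-map?
  "True for a map whose only key is :db/ident, i.e. the short form
  returned for a top level entity."
  [x]
  (and (map? x)
       (= 1 (count x))
       (contains? x :db/ident)))

(defn idents->refs
  "Walks entity data and rewrites every {:db/ident x} map into the
  tuple [:db/ident x], leaving everything else untouched."
  [entity]
  (walk/postwalk
   (fn [x]
     (if (ident-map? x)
       [:db/ident (:db/ident x)]
       x))
   entity))

(idents->refs {:person/id 0
               :friend/list [{:db/ident 1} {:db/ident 2}]})
;; => {:person/id 0, :friend/list [[:db/ident 1] [:db/ident 2]]}
```

Note that a collection mixing short maps with full entity maps comes out mixed as well: only the short maps become tuples.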

lilactown 2022-09-22T15:43:28.394499Z

this works as long as your collections are homogeneous, i.e. if (a/entity db [:db/ident 0]) returns

{:person/id 0
 :friend/list [{:db/ident 1} {:db/ident 2}]}

lilactown 2022-09-22T15:44:35.106329Z

however, if you have a mix of top level entities and nested entities then pyramid doesn't know what to do

{:person/id 0
 :friend/list [{:db/ident 1} {:person/id 2 :person/name "Cassie"}]}

lilactown 2022-09-22T15:46:15.837839Z

I'll have to think about this. perhaps I can relax the requirement in pyramid that collections that contain references are homogeneous

quoll 2022-09-22T15:46:49.540529Z

So you have a choice:
• make everything a top level entity, and get back [{:db/ident 1} {:db/ident 2} …]
• call a/entity with nested? set to true

lilactown 2022-09-22T15:47:26.905039Z

nested? will resolve the entire tree, right?

quoll 2022-09-22T15:47:31.788829Z

But then you could get deep objects coming back. However, as I mentioned earlier, it DOES break loops

lilactown 2022-09-22T15:47:58.794229Z

what happens when it reaches a cycle? what kind of data does it return?

quoll 2022-09-22T15:48:03.910899Z

It resolves the tree, yes, but not if it’s seen a node before. In that case, it will update it

quoll 2022-09-22T15:48:23.049259Z

I actually showed this earlier

lilactown 2022-09-22T15:48:40.133779Z

ah sorry. I'll look back

quoll 2022-09-22T15:49:55.514859Z

See how it checks if it’s “seen” the value v? If so, then it puts in the short map of {:db/id …} {:db/ident …} or {:id …}

lilactown 2022-09-22T15:50:22.153589Z

ah ok cool

lilactown 2022-09-22T15:50:54.986449Z

that still runs into the issues of heterogeneous collections but probably would work for more data in the interim

quoll 2022-09-22T15:52:28.328029Z

So if person 1 includes person 0 in their friend list, and you ask for person 0, you’ll get back something like:

{:db/ident [:person/id 0]
 :person/id 0
 :person/name "Rachel"
 :friend/list ({:person/id 1
                :person/name "Marco"
                :friend/list ({:db/ident [:person/id 0]})})}

👍🏻 1
quoll 2022-09-22T15:53:40.309499Z

Without the call to “seen” it would have put in the entire object, which would lead to infinite recursion. Always fun to process infinite recursion

lilactown 2022-09-22T15:54:38.384169Z

yeah. pyramid allows you to recurse up to a point, which is nice. well, actually you can infinitely recurse if you want, but you can also put limits on it

lilactown 2022-09-22T15:56:37.585259Z

e.g.

[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list ...}]}]}]
follows all references, recursing forever. or you can provide limits; this will recurse up to 3 times:
[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list 3}]}]}]

lilactown 2022-09-22T15:57:45.928479Z

or you can specify directly when to terminate by eliding the :friend/list selection

[{[:person/id 0] [:person/name {:friend/list [:person/name {:friend/list [:person/name]}]}]}]

lilactown 2022-09-22T15:58:00.444289Z

EQL is cool

lilactown 2022-09-22T14:36:46.733209Z

it looks like I could probably extend asami.memory/MemoryDatabase and asami.durable.store/DurableDatabase?