#asami
2022-05-16
phronmophobic03:05:52

I'm finally getting around to giving asami a try. It's really neat! I'm basically trying to load data using upsert. My initial attempts were:

@(d/transact db [{:db/ident "will" :name' "Fitzwilliam"}])
;; Execution error (ExceptionInfo) at asami.entities/eval75327$entity-triples$fn (entities.cljc:74).
;; Cannot update a non-existent node
and
@(d/transact db [{:db/ident "will"}
                 {:db/ident "will" :name' "Fitzwilliam"}])
;; Execution error (ExceptionInfo) at asami.entities/eval75327$entity-triples$fn (entities.cljc:74).
;; Cannot update a non-existent node
It seems like you can get upsert-like behavior by first transacting all the :db/idents and then transacting the rest of each entity as usual.
;; works
@(d/transact db [{:db/ident "will"}])
@(d/transact db [{:db/ident "will" :name' "Fitzwilliam"}])
I'm new to asami, so I'm probably doing something goofy. Is that a reasonable approach?
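A small sketch of the two-pass workaround described above, assuming every entity map carries a :db/ident. The helper name `two-phase-tx-data` is hypothetical and not part of Asami. It only splits a batch so that the ident-only maps can be transacted before the full maps; the commented transact calls assume `d` is an alias for `asami.core` and `conn` is a connection.

```clojure
(defn two-phase-tx-data
  "Splits entity maps into [ident-only-maps full-maps].
  Assumes every map has a :db/ident key."
  [entities]
  [(mapv #(select-keys % [:db/ident]) entities) ; phase 1: just create each entity
   (vec entities)])                             ; phase 2: full maps, incl. :name'

(comment
  ;; Assumes (require '[asami.core :as d]) and an Asami connection `conn`.
  (let [[idents full] (two-phase-tx-data [{:db/ident "will" :name' "Fitzwilliam"}])]
    @(d/transact conn idents)
    @(d/transact conn full)))
```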

zeitstein06:05:44

It makes sense that upsert will fail for an entity that doesn't exist yet. So transacting the entity first (you can include :name), then updating it through the replacement annotation, is reasonable. Though it would be nice if you could just do it all in one go, agreed.

Jakub Holý (HolyJak)08:05:55

Try with :tx-data to see if it makes a difference? Look at this passing test that creates a new entity, Betty: https://github.com/quoll/asami/blob/main/test/asami/api_test.cljc#L203

quoll12:05:04

:tx-data won't change anything.

quoll12:05:44

Yes, it's because the ' annotation on :name' is an update on the :name property, and that entity doesn't exist yet.

quoll12:05:30

If I ever get time to do the schemas, then that may work better for you

Jakub Holý (HolyJak)12:05:12

Ah, I overlooked the '. Right, so the solution is not to try an upsert.

quoll13:05:47

Yes. The ' explicitly says, "I already have a :name field, and I want to update it. I don't want to just add another :name field." To do this, it looks for the existing value of :name and removes it. Then it adds the name.

quoll13:05:35

But I'm thinking that it's safe to just ignore fields that don't already exist, and not report this error. That would be more user-friendly

phronmophobic19:05:54

The replacement annotation transaction succeeds even if the property doesn't exist, but it fails if the entity doesn't already exist.

quoll19:05:31

Oh, OK. This makes sense, since the entity ID gets identified. It then tries to do the “remove” before doing the “add”. I guess I just let it fail when the entity doesn’t exist, because there’s no entity to identify. But it could still allow it through
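To make the remove-then-add behaviour concrete, here is a toy model over a plain set of [entity attribute value] triples. It is not Asami's implementation, and `replace-attr` is a hypothetical name. It only mirrors what is described above: a missing attribute is fine, but a missing entity raises an error.

```clojure
(defn replace-attr
  "Toy model of the ' annotation on a set of [e a v] triples (not Asami's code):
  remove any existing [e a _] triples, then add [e a v]."
  [triples e a v]
  ;; No triple has e as its subject: the entity does not exist, so fail.
  (when-not (some (fn [[s]] (= s e)) triples)
    (throw (ex-info "Cannot update a non-existent node" {:entity e})))
  ;; Remove the old values of a (if there are any), then add the new one.
  (-> (into #{} (remove (fn [[s p]] (and (= s e) (= p a)))) triples)
      (conj [e a v])))

(replace-attr #{["will" :name "Will"]} "will" :name "Fitzwilliam")
;; => #{["will" :name "Fitzwilliam"]}
```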

phronmophobic19:05:23

:thumbsup: My use case is that I've been using gmail to answer support emails. I'm trying to automate parts of the process. Currently, I'm trying to implement a "sync" operation that just grabs the latest gmail threads and updates a db with the latest messages. It seems like upsert is not a very common operation. Asami is the 4th datalog db I've tried. So far, I like it the best. I should have tried asami first!

phronmophobic03:05:51

I was also wondering if using collections for :db/ident was supported. It seems to work, but I wasn't sure if that was in the "intended usage" category or the "undefined behavior" category. Example:

> @(d/transact db [{:db/ident [:foo :bar 42]}])

> @(d/transact db [{:db/ident [:foo :bar 42] :foo/bar' "forty-two"}])
> (d/entity db [:foo :bar 42])
;; #:foo{:bar "forty-two"}

> @(d/transact db [{:db/ident [:foo :bar 42] :foo/bar' 42}])
> (d/entity db [:foo :bar 42]) 
;; #:foo{:bar 42}

Jakub Holý (HolyJak)08:05:33

IMO you can use any value for :db/ident or :id

quoll12:05:07

Yes, it's supported, but it's brand new, so please report bugs with it! 🙂

quoll12:05:02

Usually, a collection would be destructured into triples. But for identifiers, they're kept intact as a value. If schemas ever come (see comment on previous thread) then everything will be able to have structures as values. But for now, it only works in IDs or if you insert them with :db/add

Bart Kleijngeld12:05:24

So, continuing our conversation in #rdf, @quoll: the idea is to extend d/entities with support for nodes of datatype URI, so that nested? will cause a recursive entity lookup for URIs as well. Correct?

Bart Kleijngeld12:05:36

Is there anything more I can do to help in this? I can lend a hand thinking this through, documenting, etc. I wouldn't feel up to the job of implementing this myself since I'm not experienced enough with Clojure for that.

quoll12:05:55

Try the branch at https://github.com/quoll/asami/tree/nodes There's a small regression that it introduces, but I think it's OK

quoll12:05:51

It's a minor change. It updates the recursion to accept more node types, and follows with a lookup

quoll13:05:02

I think the regression is because I was explicitly testing for things that LOOK like they're nested but aren't to get thrown away instead 🙂

quoll13:05:43

Ah! I see...

quoll13:05:11

It's throwing away an empty collection. Which I'm supporting. But I don't know if I need to

quoll13:05:46

What does it mean if a blank node exists, but it is not the subject of anything? In the context of entities, I've been assuming an empty entity

Bart Kleijngeld13:05:55

Ahh I get your point now. My initial response is that an empty entity is what I'd expect. The alternative would be an unresolved bnode identifier?

Bart Kleijngeld13:05:50

(Meanwhile I'm looking into how to get source code from GitHub in my Leiningen project 😉. Learning on all fronts here. Really curious to see the "fix" in action)

quoll13:05:11

Well, if you get asami, check out the "nodes" branch, then run lein install, and you should get asami-2.3.0-SNAPSHOT installed. Depend on that, and you'll be seeing my update.

quoll13:05:32

The regression is fixed now

Bart Kleijngeld13:05:50

Created /opt/asami/target/asami-2.3.1-SNAPSHOT.jar
Wrote /opt/asami/pom.xml
Installed jar and pom into local repo.
Alright, so far so good. And depending on it means referring to that JAR in my project file?

Bart Kleijngeld13:05:09

I think I got it installed as a local dependency now, I'm going to test

quoll13:05:24

Sorry for the wrong snapshot number! 😬

quoll13:05:36

But yes, you have it

Bart Kleijngeld13:05:48

😄 no problem, I was sloppy enough to not even notice the mistake

Bart Kleijngeld13:05:26

Something's still wrong, probably with the installing.

(require '[asami.core :as d])
; Execution error (FileNotFoundException) at asami.cache/eval10473$loading (cache.cljc:13).
; Could not locate clojure/data/priority_map__init.class, clojure/data/priority_map.clj or clojure/data/priority_map.cljc on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
VS Code doesn't autocomplete asami.core either.

Bart Kleijngeld13:05:52

In my project's pom.xml I do see the 2.3.1-SNAPSHOT though (I ran lein deps)

Bart Kleijngeld13:05:16

Sorry to bother you with these basic installation issues haha. I can probably figure this out using Google/SO 😉

quoll13:05:30

Yeah, I'm sorry. I'm about to be in meetings for several hours too, so I won't be available, sorry

Bart Kleijngeld13:05:13

No problem. Really glad you've made the changes this quickly. Definitely will test this and report back. Thanks!

Bart Kleijngeld15:05:44

It works! ❤️

Bart Kleijngeld15:05:53

There are two subtleties to note, however:
1. Blank node identifiers are not valid URIs because of the : in them, so I simply take the substring after the _: prefix and create a URI from that.
2. IRIs are a superset of URIs, meaning the URI data type is technically not wide enough to be sure to cover all possible names.
Curious about your ideas.

quoll15:05:00

to construct blank nodes, you'll want a blank node construction function like:

(require 'asami.core 'zuko.node)

(defn blank-node-constructor
  "Returns a memoized fn that creates one internal node per distinct blank node id."
  [connection-or-db]
  (let [graph (#'asami.core/as-graph connection-or-db)]
    (memoize (fn [id] (zuko.node/new-node graph)))))
This should return a function that will create an internal node for you each time you need one, returning the same object whenever you pass the same blank node identifier to it.

quoll15:05:58

As for IRIs... don't worry about them. Just use URIs. That's because java.net.URI does very little checking of its arguments, and will accept an IRI just fine.

quoll15:05:42

Note: the function asami.core/as-graph is marked private, which is why I had to use the reader macro to get the symbol

Bart Kleijngeld15:05:56

Haha, alright, great type safety, Java 😉. Well, I'll benefit from it any day. And I'll give that piece of code a try.

quoll15:05:47

I'm not sure of the reasoning for not checking URIs, but if it's not just laziness then I'll guess that it's because URIs have very few syntactic constraints, and it's not really worth it to check. It could potentially have large performance implications if you have a lot of them.

Bart Kleijngeld15:05:54

Hmm yes I can imagine the performance implications indeed

Bart Kleijngeld15:05:15

Thanks a lot for your help. It helps sell Clojure (and Asami) to my team too

quoll15:05:10

I'm just looking at the URI spec, and basically, the only thing not allowed is a space. Most other strings are allowable!

Bart Kleijngeld15:05:06

Wow, that's definitely lenient. Sounds convincing to keep us out of trouble indeed

quoll15:05:16

There can be a prelude matching [a-zA-Z][a-zA-Z0-9+.-]*: but if that's not there, then it's a relative URI. So if there is no : it's relative. If it's some long string followed by a :, and that string starts with a letter and only contains letters, numbers, plus, minus or dot, then it's a scheme prefix. Otherwise, it's still valid, and it's a relative URI. I mean... this gets complex very quickly.
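The scheme rule from RFC 3986 (a letter, then any mix of letters, digits, plus, minus or dot, then a colon) fits in a regular expression. This sketch only detects a scheme prefix and does not validate the rest of the string; `scheme-of` is a hypothetical helper name.

```clojure
;; RFC 3986: scheme = ALPHA *( ALPHA / DIGIT / "+" / "-" / "." ), then ":".
(def scheme-re #"^([A-Za-z][A-Za-z0-9+.-]*):")

(defn scheme-of
  "Returns the scheme prefix of s, or nil if s does not start with one."
  [s]
  (second (re-find scheme-re s)))

(scheme-of "http://example.org/a") ;; => "http"
(scheme-of "urn:isbn:0451450523")  ;; => "urn"
(scheme-of "_:b1")                 ;; => nil, "_" cannot start a scheme
(scheme-of "path/to/thing")        ;; => nil
```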

Bart Kleijngeld15:05:32

Yeah, and conversely: if there is a : it is assumed to be absolute, which caused the trouble I had with the Bnode identifiers

quoll15:05:34

There are rules for decoding various parts of it, but it's really about saying what's valid as a URI that you want to create, and not how you check if a URI is valid.

quoll15:05:32

I once went down the rabbit-hole of "what is valid as a URI?" only to learn that almost everything is! In the end, I gave up 🙂

Bart Kleijngeld15:05:01

Haha that reminds me of having some intensely complicated expression in math that ultimately resolves into 1 or something similarly trivial. Sometimes behind all sorts of (apparent?) complexity, there lies nothing but something very simple

quoll15:05:47

From memory (and I won’t commit to anything unless I’m looking at the RFCs, and even then I’m dubious as to my interpretation), the main difference between URIs and IRIs is that the authority (the user@hostname:port in a http URI) may have characters in the higher codepages of Unicode if it’s an IRI.

quoll15:05:39

Oh… no. URIs are supposed to be ASCII only.

quoll15:05:29

Well, I know from experience that almost every URI implementation allows non-ASCII characters, particularly in the path. So I guess they were always IRI implementations, and just never knew it 😜

Bart Kleijngeld15:05:54

Haha alright well that's comforting 🙂, because of our international collaboration there's a (probably small) chance we might need to support non-ASCII characters

quoll15:05:59

user=> (import '[java.net URI])
java.net.URI
user=> (def u (URI. ""))
#'user/u
user=> (.getAuthority u)
"見.香港"

quoll15:05:14

Or better yet:

user=> (def u (URI. ""))
#'user/u
user=> (.getAuthority u)
"見.香港"
user=> (.getPath u)
"/中国大陆报纸列表"

Bart Kleijngeld15:05:59

List of newspapers in mainland China

Bart Kleijngeld15:05:11

Haha. But yes, this certainly proves your point

quoll15:05:14

Yeah, I was just copy/pasting from some online examples, but it shows that java.net.URI is a perfectly fine replacement for an IRI. It just has the wrong name 🙂
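A quick way to check this in a REPL, using made-up example strings. java.net.URI accepts a non-ASCII authority and path, but it does still reject some inputs, for example a string containing a space, or `_:b1`, whose leading `_` cannot start a scheme name. `parse-uri` is a hypothetical helper.

```clojure
(import '[java.net URI URISyntaxException])

(defn parse-uri
  "Returns a java.net.URI for s, or nil if java.net.URI rejects it."
  [s]
  (try (URI. s)
       (catch URISyntaxException _ nil)))

;; Non-ASCII authority and path are accepted, so an IRI fits in a java.net.URI.
(def iri (parse-uri "http://例え.jp/ドキュメント"))
(.getAuthority iri) ;; => "例え.jp"
(.getPath iri)      ;; => "/ドキュメント"

;; ...but some strings are still rejected:
(parse-uri "has a space") ;; => nil
(parse-uri "_:b1")        ;; => nil ("_" cannot start a scheme name)
```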

quoll15:05:34

Anyway… try that. IRI strings get wrapped in java.net.URI, and blank node strings get sent to the node factory function that you create with my example code (different types of graphs are supposed to get different types of internal nodes, hence the need for a function that is based on the graph)
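A sketch of that dispatch, assuming RDF resources arrive as plain strings and blank node labels start with `_:`. `rdf-term->node` and `fake-blank-node` are hypothetical names. With Asami you would pass the function returned by `blank-node-constructor`; the stand-in here is memoized in the same way but just hands out keywords.

```clojure
(require '[clojure.string :as str])
(import '[java.net URI])

(defn rdf-term->node
  "Blank node labels (strings starting with \"_:\") go to blank-node-fn;
  every other string is wrapped in a java.net.URI."
  [blank-node-fn s]
  (if (str/starts-with? s "_:")
    (blank-node-fn s)
    (URI. s)))

;; Stand-in for the function returned by blank-node-constructor:
;; memoized, so the same label always yields the same (fake) node.
(def fake-blank-node
  (let [counter (atom 0)]
    (memoize (fn [_label] (keyword (str "node-" (swap! counter inc)))))))

(rdf-term->node fake-blank-node "_:b1")                 ;; => :node-1
(rdf-term->node fake-blank-node "_:b1")                 ;; => :node-1 again
(rdf-term->node fake-blank-node "http://example.org/C") ;; => a java.net.URI
```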

quoll15:05:21

You can always get the graph yourself, but by using the private asami.core/as-graph function, it will convert things for you automatically. The function just says: “Do you follow a protocol where you can return yourself as a graph? If so, then use that. Otherwise, are you a database? If so, then ask that for a graph. Otherwise, are you a connection? If so, get the most recent database, and ask that for its graph.”
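That fallback order can be illustrated with a protocol and a `cond`. Every name here (`AsGraph`, `Graph`, `Database`, `Connection`) is a hypothetical stand-in; Asami's actual protocols and types differ, so treat this only as a picture of the lookup order, not of `asami.core/as-graph` itself.

```clojure
;; Hypothetical stand-ins, not Asami's real protocols or types.
(defprotocol AsGraph
  (as-graph* [this] "Return this object viewed as a graph."))

(defrecord Graph [triples]
  AsGraph
  (as-graph* [this] this))

(defrecord Database [graph])       ; a database holds a graph
(defrecord Connection [databases]) ; a connection holds a vector of databases, newest last

(defn as-graph
  "Graph-like object first, then database, then connection (newest database)."
  [x]
  (cond
    (satisfies? AsGraph x)   (as-graph* x)
    (instance? Database x)   (:graph x)
    (instance? Connection x) (:graph (peek (:databases x)))
    :else (throw (ex-info "Cannot obtain a graph" {:value x}))))

(def g (->Graph #{[:a :b :c]}))
(as-graph (->Connection [(->Database (->Graph #{})) (->Database g)])) ;; => g
```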

Bart Kleijngeld15:05:59

Thanks! I'll give it a try. The URIs worked at least, that is with recursive fetching from d/entities. I'll try the function for the blank nodes instead of my hack :thumbsup:

quoll16:05:17

I really do need to glue in some RDF tools. But they’re mostly in Java, and I’m trying to make this work with ClojureScript as well. So right now I’ve been working on a Turtle parser in raw Clojure. Except… I’m building it as a state machine instead of using something like instaparse, because I need to parse multi-GB files, and I want it fast. So it’s taking me forever, because I’m an idiot

Bart Kleijngeld16:05:13

Haha you sound like quite the perfectionist. That's a blessing and a curse I guess (I speak from experience 😉). I would really love to lend a hand at some point, but that's provided that we end up working with Clojure, and I get more experienced.

quoll16:05:58

Help will be welcome. I’ve been getting some help from Jakub in the last 2 weeks, and I’ve been really poor about keeping up with him. I want to turn that around

Bart Kleijngeld07:05:44

Hmm it seems the nested? parameter of d/entities has no effect on the outcome, at least it doesn't for my URI-based nodes. That sounds like a bug

quoll12:05:54

It's for entities that reference other top-level entities.

Bart Kleijngeld12:05:49

I was aware of that, but maybe I misunderstood what makes an entity "top-level". I'll re-read the wiki to be sure :thumbsup:

quoll12:05:40

When entities are parsed in from JSON or EDN, they're marked as top-level entities unless they're already inside another entity. It's then possible to reference another top-level entity by ID. If you try to retrieve an entity that refers to another in this way and you don't have nested? turned on, then you'll just get the entity ID and not the full nested object.

Bart Kleijngeld12:05:36

Hmm yes that is how I understood it. And since my data consists of RDF triples (loaded using :tx-triples), I figured all of the resources in object positions should not be resolved if nested? is false

quoll12:05:47

Well, it's the idea of parsing JSON. If I read:

[{:id 1} {:id 2} {:id 3}]
then that's 3 top level entities. Any entities nested inside of them aren't. I mark top level entities with a property of :a/entity

quoll12:05:11

Yes, if nested? is false then it should just return a small object containing just the ID value

Bart Kleijngeld12:05:14

For example, given this RDF Turtle fragment:

shape:CShape
    a sh:NodeShape ;
    sh:targetClass vocab:C ;
When I ask for the shape:CShape entity, I don't expect vocab:C to be resolved with nested? set to false, since that is a top-level entity itself

quoll12:05:34

Ah… maybe it's because your objects don't have IDs?

Bart Kleijngeld12:05:03

Ah yeah maybe that's it. I'm using the URI objects now for resources, without having called the node generation function you referenced. I figured that was only needed for the blank nodes

quoll12:05:15

Though… that's basically a :db/ident… Ah, do you have a triple of: vocab:C :a/entity true ?

Bart Kleijngeld12:05:08

From what I understand, the :db/ident is the first position in the triple right? That corresponds to the URI instance for that resource IRI, yes.

Bart Kleijngeld12:05:38

What do you mean by :a/entity true here?

quoll12:05:57

You have to excuse me… I am rushing out the door to the dentist right now!

Bart Kleijngeld12:05:10

No problem, good luck!

quoll12:05:22

If you want to treat it as an entity you need to do the entity bookkeeping. This is how top level entities are marked

Bart Kleijngeld12:05:14

Ah, alright, now I understand 🙂. It's what you get for free in a nested structure like JSON, but with graph data you have to declare it yourself. Thanks
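For triple data, that bookkeeping amounts to adding an `[s :a/entity true]` triple for each subject that should count as top-level. A minimal sketch, assuming triples are vectors and using keywords in place of IRIs; `mark-top-level` is a hypothetical helper, and the commented transact call assumes `d` is an alias for `asami.core` and `conn` is a connection.

```clojure
(defn mark-top-level
  "Appends an [s :a/entity true] triple for each subject in `subjects`."
  [triples subjects]
  (into (vec triples) (map (fn [s] [s :a/entity true])) subjects))

;; e.g. mark the shape and the class it targets as top-level entities:
(def triples
  [[:shape/CShape :rdf/type :sh/NodeShape]
   [:shape/CShape :sh/targetClass :vocab/C]])

(mark-top-level triples [:shape/CShape :vocab/C])

(comment
  ;; Assumes (require '[asami.core :as d]) and an Asami connection `conn`.
  @(d/transact conn {:tx-triples (mark-top-level triples [:shape/CShape :vocab/C])}))
```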

Bart Kleijngeld15:05:58

@quoll In the meantime I discovered https://github.com/ont-app/vocabulary/issues/20 which maps between URI strings and keywords for me. I've found this to be more readable than the URI datatypes when printing the data, and because it's keywords, d/entities already resolves the node. I can imagine the support for URI is still a good idea, but I'll let you be the judge of that. I don't seem to be using it anymore though.

quoll15:05:15

Sounds nice. I like using keywords, from a syntactical point of view

quoll15:05:43

Ideal for predicates. I’d be cautious of using them for entities, unless it’s an in-memory graph

Bart Kleijngeld15:05:36

Yes this is just for an in-memory, on-the-fly graph db used for a single run that transforms a schema

Bart Kleijngeld12:07:34

@quoll Turns out I was mistaken and we actually do rely on the changes in your 2.3.1-SNAPSHOT (the nodes branch?). Other developers are going to get involved now as well so I was wondering if you could release this version on Clojars any time soon. Thanks!

quoll17:07:40

OK. I have a few things backed up, so I really should try to get this done soon. Sorry it’s taking so long right now

Bart Kleijngeld18:07:37

That sounds good, and no problem!

quoll02:07:10

Drat… I put out a release today, but missed this! I'll try to do it this week

quoll03:07:30

Looks like I had something else to fix! So I merged this in at the same time.

quoll03:07:36

Look for version 2.3.2

Bart Kleijngeld05:07:05

Haha lucky me. Thanks 🙂

Bart Kleijngeld05:07:49

Might be a good idea to announce it in the #announcements channel. I think it also will get picked up in the weekly Clojure Deref

quoll13:07:30

I tend to put larger releases there, and just use #releases for smaller updates like this

quoll13:07:48

Maybe when I get the with function going 🙂

Bart Kleijngeld13:07:25

What kind of with function are you working on? Sounds like a context manager?

quoll13:07:44

It’s a reimplementation of Datomic’s with

quoll13:07:07

So you can execute transactions against a database, and it doesn’t change the original database

quoll13:07:38

It will have some features that Datomic is missing (which has frustrated me a lot in the past)

Bart Kleijngeld14:07:10

Oh that sounds like an interesting feature!

quoll14:07:05

It also includes the ability to ask the temporary database what the deltas are from the original database
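Speculating on a database value and then asking for the deltas can be illustrated with Clojure's persistent sets. This is only a toy analogue: `with-tx` and `deltas` are hypothetical names, not the API of Asami's or Datomic's `with`.

```clojure
(require '[clojure.set :as set])

;; Toy model: a database value is an immutable set of [e a v] triples.
(defn with-tx
  "Returns a new database value with `retractions` removed and `assertions` added.
  The original value is unchanged, because Clojure sets are persistent."
  [db {:keys [assertions retractions]}]
  (into (set/difference db (set retractions)) assertions))

(defn deltas
  "Describes how `speculative` differs from `original`."
  [original speculative]
  {:added   (set/difference speculative original)
   :removed (set/difference original speculative)})

(def original #{[:will :name "Will"]})
(def speculative
  (with-tx original {:retractions [[:will :name "Will"]]
                     :assertions  [[:will :name "Fitzwilliam"]]}))

(deltas original speculative)
;; => {:added #{[:will :name "Fitzwilliam"]}, :removed #{[:will :name "Will"]}}
```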

Bart Kleijngeld14:07:11

That really sounds super cool. I'm not experienced enough to imagine all of the specific things this enables you to do, but I have a hunch it's a lot. Sounds like for instance you could do better version control and improve storage efficiency

quoll14:07:28

Well, it lets you speculate with your data, without affecting storage.

quoll14:07:01

But in Datomic, if you’ve done this, then there is no way to say, “I like these changes. Let’s push them into storage.”

Bart Kleijngeld14:07:40

Ahh right. That's definitely quite an addition. Nice 🙂

quoll14:07:42

I’m trying to provide a mechanism for this

Bart Kleijngeld14:07:32

Sounds like a lovely technical challenge you can really sink your teeth into 😄

quoll14:07:31

It’s also going to be the basis for faster transactions. i.e. a transaction will quickly create one of these databases and return it. A thread will get launched to merge the changes into the main database. Once the merge is done, then the database will internally switch over to the stored data

quoll14:07:59

The faster transaction part will be the challenge. The rest of it’s already done

Bart Kleijngeld14:07:09

So you're basically dividing up the work multi-threaded using these temporary databases, and safely merging it

quoll14:07:43

That’s the idea

quoll14:07:30

Less about dividing. More about doing transactions in memory, then merging storage in the background
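A toy version of "transact in memory, merge in the background", assuming an agent stands in for storage. `storage` and `transact-fast!` are hypothetical, and this does not model how Asami will switch over to the stored data once the merge finishes.

```clojure
;; Toy sketch: an agent stands in for durable storage (a set of triples).
(def storage (agent #{}))

(defn transact-fast!
  "Hypothetical: returns the updated in-memory database immediately, and
  queues the merge of `assertions` into storage on another thread."
  [current-db assertions]
  (send storage into assertions)  ; background merge
  (into current-db assertions))   ; in-memory result, available right away

(comment
  (transact-fast! #{} [[:a :b :c]]) ;; => #{[:a :b :c]} immediately
  (await storage)                   ;; wait for the background merge
  @storage)                         ;; => #{[:a :b :c]}
```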

phronmophobic19:05:35

I'm now looking at how upsert behaves with child references.

(def db
  (let [uri (str "asami:mem://" (name (gensym "my-in-mem-db-")))]
    (d/create-database uri)
    (d/connect uri)))

@(d/transact db [{:db/ident "parent"
                  :my/children [{:db/ident "child"
                                 :a/id "child"}]}])
(d/entity db "child")
;; #:a{:id "child"}

;; "child" entity disappears
@(d/transact db [{:db/ident "parent"
                  :my/children' [{:db/ident "child"
                                  :a/id "child"}]}])
(d/entity db "child")
;; nil

;; same transaction again. "child" entity reappears
@(d/transact db [{:db/ident "parent"
                  :my/children' [{:db/ident "child"
                                  :a/id "child"}]}])
(d/entity db "child")
;; #:a{:id "child"}
Again, I'm probably doing something goofy. Does it make sense to use upsert with nested children?

phronmophobic19:05:26

I'm also seeing that the replacement annotation doesn't apply for nested children.

(def db
  (let [uri (str "asami:mem://" (name (gensym "my-in-mem-db-")))]
    (d/create-database uri)
    (d/connect uri)))

@(d/transact db [{:db/ident "child"
                  :a/id "child"}])
(d/entity db "child")
;; #:a{:id "child"}

@(d/transact db [{:db/ident "parent"
                  :my/children [{:db/ident "child"
                                 :a/id' "child"}]}])
(d/entity db "child")
;; #:a{:id "child", :id' "child"}