This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-05-16
I'm finally getting around to giving asami a try. It's really neat! I'm basically trying to load data using upsert. My initial attempts were:
@(d/transact db [{:db/ident "will" :name' "Fitzwilliam"}])
;; Execution error (ExceptionInfo) at asami.entities/eval75327$entity-triples$fn (entities.cljc:74).
;; Cannot update a non-existent node
and
@(d/transact db [{:db/ident "will"}
                 {:db/ident "will" :name' "Fitzwilliam"}])
;; Execution error (ExceptionInfo) at asami.entities/eval75327$entity-triples$fn (entities.cljc:74).
;; Cannot update a non-existent node
It seems like you can get upsert-like behavior by first transacting all the :db/idents and then transacting the rest of each entity as usual.
;; works
@(d/transact db [{:db/ident "will"}])
@(d/transact db [{:db/ident "will" :name' "Fitzwilliam"}])
I'm new to asami, so I'm probably doing something goofy. Is that a reasonable approach?
It makes sense that upsert will fail for an entity that doesn't exist yet? So, transacting an entity first (you can include :name), then updating through append annotation is reasonable. Though, it would be nice if you could just do it all in one go, agreed.
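A minimal sketch of the two-step pattern described above, not part of asami itself: replace-key and upsert-steps are hypothetical helper names. They split one entity map into a transaction that creates the bare :db/ident and a second one that sets each attribute with the ' annotation. It assumes the ident does not exist yet; what happens when the first step is repeated for an existing ident is not checked here.

```clojure
(defn replace-key
  "Appends the ' annotation to an attribute keyword, e.g. :name -> :name'."
  [k]
  (keyword (namespace k) (str (name k) "'")))

(defn upsert-steps
  "Hypothetical helper: returns [create-tx update-tx] for one entity map
  that carries a :db/ident."
  [{ident :db/ident :as entity}]
  [[{:db/ident ident}]
   [(into {:db/ident ident}
          (map (fn [[k v]] [(replace-key k) v]))
          (dissoc entity :db/ident))]])

;; Usage, assuming `d` is asami.core and `db` is a connection as in the messages above:
;; (doseq [tx (upsert-steps {:db/ident "will" :name "Fitzwilliam"})]
;;   @(d/transact db tx))
```

The helpers only build transaction data, so they can be tried at a REPL without a database.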
Try with :tx-data if it makes a difference? Look at this passing test that creates a new entity Betty https://github.com/quoll/asami/blob/main/test/asami/api_test.cljc#L203
Yes, it's because the ' annotation on :name' is an update on the :name property, and that entity doesn't exist yet.
Ah, I have overlooked the '
Right, so the solution is not to try an upsert.
Yes. The ' explicitly says, "I already have a :name field, and I want to update it. I don't want to just add another :name field."
To do this, it looks for the existing value of :name and removes it. Then it adds the name.
But I'm thinking that it's safe to just ignore fields that don't already exist, and not report this error. That would be more user-friendly
The replacement annotation transaction succeeds even if the property doesn't exist, but it fails if the entity doesn't already exist.
Oh, OK. This makes sense, since the entity ID gets identified. It then tries to do the “remove” before doing the “add”. I guess I just let it fail when the entity doesn’t exist, because there’s no entity to identify. But it could still allow it through
:thumbsup: My use case is that I've been using gmail to answer support emails. I'm trying to automate parts of the process. Currently, I'm trying to implement a "sync" operation that just grabs the latest gmail threads and updates a db with the latest messages. It seems like upsert is not a very common operation. Asami is the 4th datalog db I've tried. So far, I like it the best. I should have tried asami first!
I was also wondering if using collections for :db/ident was supported. It seems to work, but I wasn't sure if that was in the "intended usage" category or the "undefined behavior" category.
Example:
> @(d/transact db [{:db/ident [:foo :bar 42]}])
> @(d/transact db [{:db/ident [:foo :bar 42] :foo/bar' "forty-two"}])
> (d/entity db [:foo :bar 42])
;; #:foo{:bar "forty-two"}
> @(d/transact db [{:db/ident [:foo :bar 42] :foo/bar' 42}])
> (d/entity db [:foo :bar 42])
;; #:foo{:bar 42}
IMO you can use any value for :db/ident or :id
Usually, a collection would be destructured into triples. But for identifiers, they're kept intact as a value. If schemas ever come (see comment on previous thread) then everything will be able to have structures as values. But for now, it only works in IDs or if you insert them with :db/add
So, in continuation of our conversation in #rdf, @quoll: the idea is to extend d/entities with support for nodes of datatype URI so that nested? will cause a recursive entity lookup for URIs as well. Correct?
Is there anything more I can do to help in this? I can lend a hand thinking this through, documenting, etc. I wouldn't feel up to the job of implementing this myself since I'm not experienced enough with Clojure for that.
Try the branch at https://github.com/quoll/asami/tree/nodes There's a small regression that it introduces, but I think it's OK
It's a minor change. It updates the recursion to accept more node types, and follows with a lookup
I think the regression is because I was explicitly testing for things that LOOK like they're nested but aren't to get thrown away instead 🙂
It's throwing away an empty collection. Which I'm supporting. But I don't know if I need to
What does it mean if a blank node exists, but it is not the subject of anything? In the context of entities, I've been assuming an empty entity
Ahh I get your point now. My initial response is that an empty entity is what I'd expect. The alternative would be an unresolved bnode identifier?
(Meanwhile I'm looking into how to get source code from GitHub in my Leiningen project 😉. Learning on all fronts here. Really curious to see the "fix" in action)
well, if you get asami, check out the "nodes" branch, then run lein install, you should get asami-2.3.0-SNAPSHOT installed. Depend on that, and you'll be seeing my update
Created /opt/asami/target/asami-2.3.1-SNAPSHOT.jar
Wrote /opt/asami/pom.xml
Installed jar and pom into local repo.
Alright, so far so good. And depending on it means referring to that JAR in my project file?
I think I got it installed as a local dependency now, I'm going to test
😄 no problem, I was sloppy enough to not even notice the mistake
Something's still wrong, probably with the installing.
(require '[asami.core :as d])
; Execution error (FileNotFoundException) at asami.cache/eval10473$loading (cache.cljc:13).
; Could not locate clojure/data/priority_map__init.class, clojure/data/priority_map.clj or clojure/data/priority_map.cljc on classpath. Please check that namespaces with dashes use underscores in the Clojure file name.
VS Code doesn't autocomplete asami.core either.
In my project's pom.xml I do see the 2.3.1-SNAPSHOT though (I ran lein deps)
Sorry to bother you with these basic installation issues haha. I can probably figure this out using Google/SO 😉
Yeah, I'm sorry. I'm about to be in meetings for several hours too, so I won't be available, sorry
No problem. Really glad you've made the changes this quickly. Definitely will test this and report back. Thanks!
It works! ❤️
There's two subtleties to note, however:
1. Blank node identifiers are not valid URIs because of the : in them, so I simply take the substring after the _: prefix and create a URI from that
2. IRIs are a superset of URIs, meaning the URI data type is technically not wide enough to be sure to cover all possible names.
Curious about your ideas.
to construct blank nodes, you'll want a blank node construction function like:
(defn blank-node-constructor
  [connection-or-db]
  (let [graph (#'asami.core/as-graph connection-or-db)]
    (memoize (fn [id] (zuko.node/new-node graph)))))
This should return a function that will create an internal node for you each time you need one, returning the same object whenever you pass the same blank node identifier to it
As for IRIs... don't worry about them. Just use URIs. That's because java.net.URI does not check its arguments, and will accept an IRI just fine
Note: the function asami.core/as-graph is marked private, which is why I had to use the reader macro to get the symbol
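The memoize behaviour that blank-node-constructor relies on can be checked without a database. This is only a sketch: node-factory is a hypothetical name, and the stand-in new-node-fn (a plain Object constructor here) is an assumption that takes the place of zuko.node/new-node.

```clojure
(defn node-factory
  "Same shape as blank-node-constructor above, but with a stand-in
  new-node-fn (an assumption) in place of zuko.node/new-node."
  [new-node-fn]
  (memoize (fn [_id] (new-node-fn))))

;; Each call to the stand-in creates a fresh object.
(def bnode (node-factory #(Object.)))

(identical? (bnode "_:b0") (bnode "_:b0")) ;; same label, same node
(identical? (bnode "_:b0") (bnode "_:b1")) ;; different labels, different nodes
```

The cache lives inside the returned function, so nodes are only shared between lookups that go through the same constructed function.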
Haha, alright, great type safety, Java 😉. Well, I'll benefit from it any day. And I'll give that piece of code a try.
I'm not sure of the reasoning for not checking URIs, but if it's not just laziness then I'll guess that it's because URIs have very few syntactic constraints, and it's not really worth it to check. It could potentially have large performance implications if you have a lot of them.
Hmm yes I can imagine the performance implications indeed
Thanks a lot for your help. It helps sell Clojure (and Asami) to my team too
I'm just looking at the URI spec, and basically, the only thing not allowed is a space. Most other strings are allowable!
Wow, that's definitely lenient. Sounds convincing to keep us out of trouble indeed
There can be a prelude of [a-zA-Z0-9+\-.]+: but if that's not there, then it's a relative URI.
So if there is no : it's relative. If it's some long string, followed by a : then if that string only contains letters, numbers, plus, minus or dot, then it's a scheme prefix. Otherwise, it's still valid, and it's a relative URI.
I mean... this gets complex very quickly.
Yeah, and conversely: if there is a : it is assumed to be absolute, which caused the trouble I had with the Bnode identifiers
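A small REPL sketch of how java.net.URI treats the colon, assuming JVM Clojure. parse-uri and bnode-label are hypothetical helper names; the second one mirrors the substring workaround mentioned earlier for blank node identifiers.

```clojure
(require '[clojure.string :as str])
(import '[java.net URI URISyntaxException])

(defn parse-uri
  "Returns a java.net.URI, or nil when the constructor rejects the string."
  [s]
  (try
    (URI. s)
    (catch URISyntaxException _ nil)))

(defn bnode-label
  "Returns the part after the _: prefix, or nil if s is not a blank node label."
  [s]
  (when (str/starts-with? s "_:")
    (subs s 2)))

(.getScheme (parse-uri "http://example.org/a")) ;; "http"
(.isAbsolute (parse-uri "a/b"))                  ;; false, there is no scheme
(parse-uri "_:b0")                               ;; nil, _ cannot start a scheme name
(parse-uri (bnode-label "_:b0"))                 ;; the relative URI b0
```

Everything before the first colon is read as a scheme name, which is why a label such as _:b0 is rejected while the bare b0 is accepted.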
There's rules for decoding various parts of it, but it's really about saying what's valid as a URI that you want to create, and not how you check if a URI is valid.
I once went down the rabbit-hole of "what is valid as a URI?" only to learn that almost everything is! In the end, I gave up 🙂
Haha that reminds me of having some intensely complicated expression in math that ultimately resolves into 1 or something similarly trivial.
Sometimes behind all sorts of (apparent?) complexity, there lies nothing but something very simple
From memory (and I won’t commit to anything unless I’m looking at the RFCs, and even then I’m dubious as to my interpretation), the main difference between URIs and IRIs is that the authority (the user@hostname:port in a http URI) may have characters in the higher codepages of Unicode if it’s an IRI.
Well, I know from experience that almost every URI implementation allows non-ascii characters, particularly in the path. So I guess they were always IRI implementations, and just never knew it 😜
Haha alright well that's comforting 🙂, because of our international collaboration there's a (probably small) chance we might need to support non-ASCII characters
user=> (import '[java.net URI])
java.net.URI
user=> (def u (URI. " "))
#'user/u
user=> (.getAuthority u)
"見.香港"
Or better yet:
user=> (def u (URI. ""))
#'user/u
user=> (.getAuthority u)
"見.香港"
user=> (.getPath u)
"/中国大陆报纸列表"
List of newspapers in mainland China
Haha. But yes, this certainly proves your point
Yeah, I was just copy/pasting from some online examples, but it shows that java.net.URI is a perfectly fine replacement for an IRI. It just has the wrong name 🙂
Exactly
Anyway… try that. IRI strings get wrapped in java.net.URI, and blank node strings get sent to the node factory function that you create with my example code (different types of graphs are supposed to get different types of internal nodes, hence the need for a function that is based on the graph)
You can always get the graph yourself, but by using the private asami.core/as-graph function, it will convert things for you automatically.
The function just says: “Do you follow a protocol where you can return yourself as a graph? If so, then use that. Otherwise, are you a database? If so, then ask that for a graph. Otherwise, are you a connection? If so, get the most recent database, and ask that for its graph.”
Thanks! I'll give it a try. The URIs worked at least, that is with recursive fetching from d/entities. I'll try the function for the blank nodes instead of my hack :thumbsup:
I really do need to glue in some RDF tools. But they’re mostly in Java, and I’m trying to make this work with ClojureScript as well. So right now I’ve been working on a Turtle parser in raw Clojure. Except… I’m building it as a state machine instead of using something like instaparse, because I need to parse multi-GB files, and I want it fast. So it’s taking me forever, because I’m an idiot
Haha you sound like quite the perfectionist. That's a blessing and a curse I guess (I speak from experience 😉). I would really love to lend a hand at some point, but that's provided that we end up working with Clojure, and I get more experienced.
Help will be welcome. I’ve been getting some help from Jakub in the last 2 weeks, and I’ve been really poor about keeping up with him. I want to turn that around
Hmm it seems the nested? parameter of d/entities has no effect on the outcome, at least it doesn't for my URI-based nodes. That sounds like a bug
I was aware of that, but maybe I misunderstood what constitutes an entity "top-level". I'll re-read the wiki to be sure :thumbsup:
When entities are parsed in from JSON or EDN then they're marked as being a top-level entity unless they're already inside of another entity. It's then possible to reference another top-level entity by ID. If you try to retrieve an entity that refers to another in this way and you don't have nested? turned on, then you'll just get the entity ID and not the full nested object
Hmm yes that is how I understood it. And since my data consists of RDF triples (loaded using :tx-triples), I figured all of the resources in object positions should not be resolved if nested? is false
Well, it's the idea of parsing JSON. If I read:
[{:id 1} {:id 2} {:id 3}]
then that's 3 top level entities. Any entities nested inside of them aren't.
I mark top level entities with a property of :a/entity
Yes, if nested? is false then it should just return a small object containing just the ID value
For example, given this RDF Turtle fragment:
shape:CShape
  a sh:NodeShape ;
  sh:targetClass vocab:C ;
When I ask for the shape:CShape entity, I don't expect vocab:C to be resolved with nested? set to false, since that is a top-level entity itself
Ah yeah maybe that's it. I'm using the URI objects now for resources, without having called the node generation function you referenced. I figured that was only needed for the blank nodes
Though… that's basically a :db/ident…
Ah, do you have a triple of: vocab:C :a/entity true?
From what I understand, the :db/ident is the first position in the triple right? That corresponds to the URI instance for that resource IRI, yes.
What do you mean by :a/entity true here?
No problem, good luck!
If you want to treat it as an entity you need to do the entity bookkeeping. This is how top level entities are marked
Ah, alright, now I understand 🙂. It's what you get for free in a nested structure like JSON, but with graph data you have to declare it yourself. Thanks
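A small sketch of that bookkeeping, assuming the data is loaded as plain [subject predicate object] triples as with :tx-triples above. entity-markers is a hypothetical helper, the sample triples are made up for illustration, and whether a marker triple alone is enough for a node to be treated as top-level is an assumption that is not verified here.

```clojure
(defn entity-markers
  "Hypothetical helper: one [subject :a/entity true] triple per distinct subject."
  [triples]
  (mapv (fn [s] [s :a/entity true])
        (distinct (map first triples))))

;; Illustrative data only, written with keywords in place of IRIs.
(def triples
  [[:shape/CShape :rdf/type :sh/NodeShape]
   [:shape/CShape :sh/targetClass :vocab/C]
   [:vocab/C :rdf/type :rdfs/Class]])

;; The original triples followed by one marker per subject.
(into triples (entity-markers triples))
```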
@quoll In the meantime I discovered https://github.com/ont-app/vocabulary/issues/20 which maps between URI strings and keywords for me. I've found this to be more readable than the URI datatypes when printing the data, and because it's keywords, d/entities already resolves the node.
I can imagine the support for URI is still a good idea, but I'll let you be the judge of that. I don't seem to be using it anymore though.
Ideal for predicates. I’d be cautious of using them for entities, unless it’s an in-memory graph
Yes this is just for an in-memory, on-the-fly graph db used for a single run that transforms a schema
@quoll Turns out I was mistaken and we actually do rely on the changes in your 2.3.1-SNAPSHOT (the nodes branch?).
Other developers are going to get involved now as well so I was wondering if you could release this version on Clojars any time soon.
Thanks!
OK. I have a few things backed up, so I really should try to get this done soon. Sorry it’s taking so long right now
That sounds good, and no problem!
Haha lucky me. Thanks 🙂
Might be a good idea to announce it in the #announcements channel. I think it also will get picked up in the weekly Clojure Deref
I tend to put larger releases there, and just use #releases for smaller updates like this
What kind of with function are you working on? Sounds like a context manager?
So you can execute transactions against a database, and it doesn’t change the original database
It will have some features that Datomic is missing (which has frustrated me a lot in the past)
Oh that sounds like an interesting feature!
It also includes the ability to ask the temporary database what the deltas are from the original database
That really sounds super cool. I'm not experienced enough to imagine all of the specific things this enables you to do, but I have a hunch it's a lot. Sounds like for instance you could do better version control and improve storage efficiency
But in Datomic, if you’ve done this, then there is no way to say, “I like these changes. Let’s push them into storage.”
Ahh right. That's definitely quite an addition. Nice 🙂
Sounds like a lovely technical challenge you can sink your teeth into deeply 😄
It’s also going to be the basis for faster transactions. i.e. a transaction will quickly create one of these databases and return it. A thread will get launched to merge the changes into the main database. Once the merge is done, then the database will internally switch over to the stored data
It’s all done by this namespace: https://github.com/quoll/asami/blob/main/src/asami/wrapgraph.cljc
So you're basically dividing up the work multi-threaded using these temporary databases, and safely merging it
Exciting 😄
Less about dividing. More about doing transactions in memory, then merging storage in the background
Got it
I'm now looking at how upsert behaves with child references.
(def db
  (let [uri (str "asami:mem://" (name (gensym "my-in-mem-db-")))]
    (d/create-database uri)
    (d/connect uri)))
@(d/transact db [{:db/ident "parent"
                  :my/children [{:db/ident "child"
                                 :a/id "child"}]}])
(d/entity db "child")
;; #:a{:id "child"}
;; "child" entity disappears
@(d/transact db [{:db/ident "parent"
                  :my/children' [{:db/ident "child"
                                  :a/id "child"}]}])
(d/entity db "child")
;; nil
;; same transaction again. "child" entity reappears
@(d/transact db [{:db/ident "parent"
                  :my/children' [{:db/ident "child"
                                  :a/id "child"}]}])
(d/entity db "child")
;; #:a{:id "child"}
Again, I'm probably doing something goofy. Does it make sense to use upsert with nested children?
I'm also seeing that the replacement annotation doesn't apply for nested children.
(def db
  (let [uri (str "asami:mem://" (name (gensym "my-in-mem-db-")))]
    (d/create-database uri)
    (d/connect uri)))
@(d/transact db [{:db/ident "child"
                  :a/id "child"}])
(d/entity db "child")
;; #:a{:id "child"}
@(d/transact db [{:db/ident "parent"
                  :my/children [{:db/ident "child"
                                 :a/id' "child"}]}])
(d/entity db "child")
;; #:a{:id "child", :id' "child"}