rdf

simongray 2023-03-31T07:44:54.344279Z

I wonder if it makes sense to create a simple QName type and then devise a protocol which uses @eric.d.scott's conversion method for safe conversions between QName, URI, and keyword.
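A minimal sketch of what such a QName type and conversion protocol could look like. This is not the code in the linked repository, nor @eric.d.scott's conversion method; the names `QName`, `ToQName`, `->qname`, `qname->keyword`, `qname->uri`, and the `prefixes` map are all hypothetical and only illustrate the idea.

```clojure
(require '[clojure.string :as str])
(import '(java.net URI))

;; Hypothetical prefix table; a real one would come from your own context.
(def prefixes
  {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

;; A plain record holding the two halves of a qualified name.
(defrecord QName [prefix local])

(defprotocol ToQName
  (->qname [x prefixes]
    "Converts x to a QName, or returns nil when no prefix applies."))

(extend-protocol ToQName
  QName
  (->qname [x _] x)

  clojure.lang.Keyword
  (->qname [k _]
    ;; Only namespaced keywords carry a prefix.
    (when-let [p (namespace k)]
      (->QName p (name k))))

  URI
  (->qname [uri prefixes]
    ;; Find the first known base that the URI starts with.
    (let [s (str uri)]
      (some (fn [[p base]]
              (when (str/starts-with? s base)
                (->QName p (subs s (count base)))))
            prefixes))))

(defn qname->keyword
  "Turns a QName into a namespaced keyword via a function call,
  so local names the reader rejects can still be represented."
  [{:keys [prefix local]}]
  (keyword prefix local))

(defn qname->uri
  "Expands a QName to a URI, or returns nil for an unknown prefix."
  [{:keys [prefix local]} prefixes]
  (when-let [base (get prefixes prefix)]
    (URI. (str base local))))
```

With this sketch, `(qname->keyword (->qname (URI. "http://www.w3.org/2000/01/rdf-schema#label") prefixes))` yields `:rdfs/label`, and `(qname->uri (->qname :rdfs/label prefixes) prefixes)` goes back to the URI.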

simongray 2023-03-31T09:23:54.675999Z

@eric.d.scott What do you think about this? https://github.com/kuhumcst/qname/blob/main/src/dk/cst/qname.clj Others are also welcome to weigh in.

2023-03-31T22:11:18.712059Z

Sorry, I'm not in a position to give a thoughtful response right now. I'll revisit this in a couple of days when I'm back home.

simongray 2023-03-31T09:28:28.136779Z

(should probably have been .cljc)

2023-03-31T10:47:05.293729Z

FWIW I’ve spent a while thinking about this, as it’s 9 years now since I created grafter. Grafter’s approach was to use symbols and mount them in namespaces, and simply name them like rdfs:label, and bind the names to URIs. We also used a simple “prefixer” function which you could give a base URI to, and it would concatenate the URI for you; and then adopted the convention of binding that prefixer to the prefix’s symbol in the namespace. https://github.com/Swirrl/grafter-vocabularies/blob/082ec0a05651316cd406faa13cc23671bfd83827/src/grafter/vocabularies/dcat.cljc#L4.

At the time I was intending to build something fancier on top of this; in particular dynamically generating such namespaces from RDF vocabularies etc, and attaching more clojure metadata forms… but in the end I thought better of it; because I didn’t want to bake these kinds of side effects and global state in at the bottom, as I wanted something pure. In my mind if you include state in a system you’re building a framework, and frameworks/state stray too much into application concerns; rather than library concerns, and I wanted to build a library. IIRC @eric.d.scott took this sort of approach; I vaguely remember discussing some of this with him at the time…

Anyway in the past 9 years, we’ve seen clojure/rich-hickey re-emphasise the fact that keywords are namespaced, and even extend clojure require with better support for keywords and namespaces with :as-alias. So ever since then I’ve also wanted to help close this gap. Our in memory graph database library Matcha also allows the use of keywords as identifiers to support this. Matcha was born around the same time as asami, and shares a similar rationale (schemaless to support RDF), though doesn’t care to be like datomic. If I’d known about Asami back then I’d almost certainly have used it instead of making Matcha.

Anyway, one of the biggest issues in my mind around this concerns prefixes and equality; as (not= (URI. "") :rdfs/label); hence handling qnames-as-keywords requires either a special comparison function (likely preloaded/partially applied with the prefixes) or a canonicalisation step. Also you can’t override equality deeply enough, such that it’s integrated into clojure sets etc without conversions, as it’s hard/impossible to make .equals commutative across types where you don’t control both of them.

There’s a bunch of other approaches I’ve tried; for example you can modify print-methods to use prefixes you’ve defined, and to pretty print URIs as prefixed keywords (but for them not to actually be keywords). The issue there is that you can’t copy/paste printed forms back into the REPL and have them read; though arguably you can’t really do that with URIs either, at least not without adding *data-readers*, which also works by the way, but you then have the extensibility/mutable state problem, which personally I can’t abide.

I think it’s worth mentioning I’ve often thought the other option is to use tagged readers and symbols for this, for example you could have #rdf/iri rdfs:label or #rdf/iri ../foo/bar and the reader could do some canonicalisation for you; you then of course have the problem of injecting state into the reader which is gross, but there are other options/trade offs you could make here.

2023-03-31T10:57:16.057419Z

also meant to mention a coercing tagged reader syntax can coerce across different representations such that they are equal… e.g. (= #rdf/iri "" #rdf/iri rdfs:label) ;; => true
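A rough sketch of the two routes described in these messages: a canonicalisation step with a comparison function preloaded with prefixes, and a coercing `#rdf/iri` reader. None of this is grafter's or Matcha's actual code. The `prefixer` here is a re-implementation from the description only, and `canonicalise`, `same-term?`, `iri-reader`, `read-rdf`, and the `prefixes` map are hypothetical names. The reader is passed to `clojure.edn/read-string` per call, which is one way to avoid installing global reader state. Relative forms such as `../foo/bar` are not handled.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])
(import '(java.net URI))

;; Hypothetical prefix map, passed explicitly rather than held in global state.
(def prefixes
  {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(defn prefixer
  "Returns a function that appends a local name to base-uri and yields a URI."
  [base-uri]
  (fn [local] (URI. (str base-uri local))))

(defn canonicalise
  "Expands a namespaced keyword with a known prefix into a URI.
  Anything else is returned unchanged."
  [prefixes x]
  (if-let [base (and (keyword? x) (get prefixes (namespace x)))]
    ((prefixer base) (name x))
    x))

(defn same-term?
  "Compares two terms after canonicalising both of them."
  [prefixes a b]
  (= (canonicalise prefixes a) (canonicalise prefixes b)))

(defn iri-reader
  "Builds a reader function for a hypothetical #rdf/iri tag. Strings are
  read as full IRIs; symbols such as rdfs:label are expanded via prefixes."
  [prefixes]
  (fn [form]
    (if (string? form)
      (URI. form)
      (let [[prefix local] (str/split (str form) #":" 2)
            base (get prefixes prefix)]
        (if (and base local)
          ((prefixer base) local)
          (throw (ex-info "Cannot expand #rdf/iri form" {:form form})))))))

(defn read-rdf
  "Reads EDN text with the #rdf/iri tag bound for this call only."
  [prefixes s]
  (edn/read-string {:readers {'rdf/iri (iri-reader prefixes)}} s))
```

Under these assumptions, `(partial same-term? prefixes)` gives the "partially applied" comparison function, and reading `#rdf/iri rdfs:label` produces the same `URI` value as reading the full IRI string under the same tag, so plain `=` works on the results.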

quoll 2023-03-31T14:33:19.310409Z

> If I’d known about Asami back then I’d almost certainly have used it instead of making Matcha.

And if I’d known about Datascript in 2016, then I’d have used that instead of Asami 🙂

2023-03-31T15:14:10.888269Z

Well I did know about Datascript back then but I needed something schemaless, and I needed it to accept URI’s and arbitrary types in the appropriate places to handle RDF. So FWIW I’m glad you didn’t know about Datascript back then, as it means Asami is now an option for me too! 🙂

quoll 2023-03-31T15:50:43.730189Z

lol

quoll 2023-03-31T15:50:59.537689Z

My weekends have been full recently. I need to do more with that project

quoll 2023-03-31T15:51:12.098899Z

(Raphael and Donatello are about to get hooked in)

quoll 2023-03-31T15:52:33.731079Z

I’m also looking to finish Twylyte

quoll 2023-03-31T15:53:02.663819Z

Twylyte is a SPARQL to Asami-query converter

quoll 2023-03-31T15:54:03.582529Z

It parses SPARQL into an AST, then transforms the AST into Asami queries as their internal representation 🙂

quoll 2023-03-31T15:54:40.460029Z

This is basically what Jena does. They parse SPARQL, then transform queries into a lisp-like syntax as the internal representation

2023-03-31T16:43:56.591779Z

Sounds like a fun project… I’ve often wanted a SPARQL -> EDN/AST -> SPARQL chain; for SPARQL query rewriting in clojure. I currently use the Jena stuff you’re referring to, to do some of that. It works but I’d much prefer a native clojure implementation.

2023-03-31T16:45:44.570459Z

I even wrote some preliminary code to do it using instaparse; but it’s a big job, and has always been much bigger than my free time… and it got parked a long time ago.

2023-04-04T08:27:37.027569Z

Yeah the instaparse bit is pretty simple; just paste the EBNF and tweak a few bits of syntax. The rest of the job is huge and requires a pretty large sustained and focussed effort

2023-04-04T08:27:58.568339Z

(Instaparse is pretty amazing though)

quoll 2023-04-03T15:39:27.145619Z

This is me too… I got the Instaparse portion working, but the transforms are large and non-trivial

2023-03-31T10:53:33.964999Z

I think you need to be clear about what problems you want to solve…
1. Printing of URIs at the REPL is gross as they’re very noisy… so removing this noise in printed output at a REPL is nice… e.g. you’ve run a big sparql query and are inspecting intermediate rdf/edn data in memory.
2. You want a terser clojure syntax for supplying URIs/identifiers in queries / data.
3. You want a homoiconic short hand for expressing a subset of URIs which can be both read/written (copy/pasted in REPL sessions)

simongray 2023-03-31T10:57:11.509519Z

4. I want a valid QName shorthand for representing an RDF resource in Clojure that is compatible with keywords, since most of Clojure wants and expects those. I think reader tags are definitely part of the solution too, but before one has tags, one must create conversion functions 🙂

simongray 2023-03-31T11:02:45.815589Z

The alternative to a QName type is having to write out the full IRI every time it conflicts with the Clojure reader.

2023-03-31T10:58:30.706719Z

yes definitely keywords are part of this too

2023-03-31T10:58:51.750879Z

and there are many possible conversion functions / scenarios

simongray 2023-03-31T10:59:07.885009Z

And it really bothers me that I can't write :prefix/123 because the Clojure reader breaks, so I guess the next best thing is to have something like #rdf/qname "prefix:123".

quoll 2023-03-31T11:17:39.311149Z

I would like to note that this is not a valid QName! Yes, Java’s QName class accepts it, and TTL/SPARQL parsers accept it, but that doesn’t change the fact that QNames are defined to not start the local name with a number

quoll 2023-03-31T11:18:30.075569Z

For this reason, when we have numerically identified objects, we use names like: prefix:_123

quoll 2023-03-31T11:19:33.499639Z

Though Fabian, Dean and Jim suggested things like: prefix:Q123

quoll 2023-03-31T11:21:49.605839Z

If you’re going to break the standard by using a local name that starts with a number, then I think that it’s a good thing that you need to create your keyword with a function call rather than inline syntax 😊
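For illustration, a small sketch of the options mentioned here. The prefix and local names are just the ones from the discussion, and the var names are made up.

```clojure
;; Built with a function call, since the thread reports that the
;; literal form :prefix/123 trips the Clojure reader.
(def numeric-term (keyword "prefix" "123"))

;; Alternatives whose local names do not start with a digit,
;; so ordinary keyword literals work.
(def underscore-term :prefix/_123)
(def lettered-term :prefix/Q123)
```

Here `(namespace numeric-term)` is `"prefix"` and `(name numeric-term)` is `"123"`.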

quoll 2023-03-31T11:23:18.275379Z

Note: I take advantage of the flexibility of parsers by declaring a SNOMED-CT namespace to deal with their horrible IRIs 😊

simongray 2023-03-31T11:39:01.625009Z

Aha, good to know.

simongray 2023-03-31T11:46:09.777479Z

I am actually not doing this, but it's not an impossibility to run into it. My main issue is that I want a slash in the local name, which breaks both the Clojure reader and the transit read handler. I am actually not sure the vocabulary library even fixes this issue.

quoll 2023-03-31T12:44:43.911589Z

Slashes are also not valid in QNames 🙂

simongray 2023-03-31T13:05:52.332579Z

Really?

simongray 2023-03-31T13:06:37.876359Z

Is there a source for what's a valid qname?

simongray 2023-03-31T13:08:55.319889Z

Aren't they valid according to this? https://en.wikipedia.org/wiki/QName

quoll 2023-03-31T13:09:06.522789Z

Wikipedia simplifies it: https://en.m.wikipedia.org/wiki/QName

quoll 2023-03-31T13:11:10.424249Z

The local part is an NCName, and those are defined as NameStartChar (NameChar)*. The NameStartChar does not include numbers

simongray 2023-03-31T13:11:53.424669Z

yeah, but I meant "Slashes are also not valid in QNames". it doesn't seem to state that?

simongray 2023-03-31T13:12:22.310709Z

I can't read what is in the code blocks, but the comment says:

> (* any Unicode char, excluding surrogate blocks FFFE and FFFF. *)

quoll 2023-03-31T13:15:33.946919Z

That’s Char not NameChar

quoll 2023-03-31T13:17:38.583669Z

It’s a weird definition… a Name but then anything with a : in it is excluded

quoll 2023-03-31T13:18:22.056219Z

The Char is only used in the “minus” expression

simongray 2023-03-31T13:18:34.759979Z

hm...

quoll 2023-03-31T13:20:28.287109Z

Basically, it’s a Name but exclude anything that matches the regex: #".*:.*"

quoll 2023-03-31T13:21:09.791419Z

I have never seen - used in a grammar like this!

quoll 2023-03-31T13:27:19.813019Z

Incidentally, for the sake of accuracy, the definition is at: https://www.w3.org/TR/REC-xml/#NT-Name

quoll 2023-03-31T13:27:55.792239Z

(That’s Name)

quoll 2023-03-31T13:28:24.199969Z

QName is: https://www.w3.org/TR/REC-xml-names/#NT-QName
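As a rough aid for checking the cases discussed above, here is a deliberately simplified validity check. It is an ASCII-only approximation and an assumption on my part: the real NameStartChar and NameChar productions also admit many Unicode ranges, so this rejects some valid names. The names `ncname-pattern`, `ncname?`, and `qname?` are hypothetical.

```clojure
(require '[clojure.string :as str])

;; ASCII-only approximation of NCName: a start character (letter or
;; underscore) followed by letters, digits, underscore, dot, or hyphen.
;; Colons are excluded because an NCName is a Name without any colon.
(def ncname-pattern #"[A-Za-z_][A-Za-z0-9_.-]*")

(defn ncname?
  "True when s matches the simplified NCName pattern."
  [s]
  (boolean (re-matches ncname-pattern s)))

(defn qname?
  "True when s is a single NCName, or prefix:local where both parts
  are NCNames under the simplified pattern."
  [s]
  ;; The -1 limit keeps empty trailing parts, so \"a:\" is rejected.
  (let [parts (str/split s #":" -1)]
    (and (<= 1 (count parts) 2)
         (every? ncname? parts))))
```

Under this approximation `(qname? "prefix:123")` and `(qname? "prefix:foo/bar")` are both false, while `(qname? "prefix:_123")` and `(qname? "prefix:Q123")` are true.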

simongray 2023-03-31T13:37:07.854769Z

I see. So perhaps I am imagining an issue that isn't really there in practice.

simongray 2023-03-31T13:37:24.195879Z

Thanks for clarifying!

simongray 2023-03-31T13:39:01.493389Z

It's hard to Google RDF stuff like "are slashes allowed" since you inevitably end up at long w3c specification with some terse grammar that needs to be parsed correctly 😅

quoll 2023-03-31T13:40:21.535689Z

TBH, I had to look up / (It’s #2F)

2023-03-31T10:59:10.676389Z

I’m just trying to put more options on the table for you 🙂

simongray 2023-03-31T10:59:34.228079Z

yes, thank you for your valuable input 🙂 I do appreciate it

2023-03-31T11:02:50.514819Z

FWIW I think the next best thing is actually #rdf/qname prefix:123

quoll 2023-03-31T14:31:09.259229Z

Now that I’m on a keyboard… I like using keywords, since the syntax is so close to QNames, and it’s convenient in code (e.g. :rdf/type). @rickmoynihan is correct about state, but on that I have a few things to say:
• Some namespaces will never change (well, they could, but that’s such a terrible idea that you deserve what you get if you try it 😜). So :rdf/type, :rdfs/range, :owl/TransitiveProperty and so on can always presume that their namespaces are the standard ones.
• Any use of another namespace has to occur within a context. That means passing around the context as an argument. Bind it to a dynamic var if you don’t want to pollute your function signatures, but there is always a possibility of error if you don’t make the current context available.
That second point is especially important in syntaxes like Turtle, since a prefix can get rebound during a document. I can’t say that would be best practice, but it’s possible, and if you don’t support it then you’re going to have a hard time when someone gives you a valid document that you can’t deal with.
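One way the two points above might look in code, as a sketch under assumptions: `standard-prefixes`, `*context*`, and `expand` are hypothetical names, and the `ex` namespace IRIs are placeholders. The standard namespaces are consulted first so a context cannot redefine them, and everything else depends on the currently bound context.

```clojure
(import '(java.net URI))

;; Namespaces treated as fixed; a context cannot override these.
(def standard-prefixes
  {"rdf"  "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   "rdfs" "http://www.w3.org/2000/01/rdf-schema#"
   "owl"  "http://www.w3.org/2002/07/owl#"})

;; Current prefix context; empty unless a caller binds it.
(def ^:dynamic *context* {})

(defn expand
  "Expands a namespaced keyword to a URI using the standard prefixes
  first, then the currently bound context. Throws when the prefix is
  unknown, which is the error case of a missing context."
  [kw]
  (let [prefix (namespace kw)
        base   (or (get standard-prefixes prefix)
                   (get *context* prefix))]
    (if base
      (URI. (str base (name kw)))
      (throw (ex-info "No namespace bound for prefix"
                      {:keyword kw :prefix prefix})))))
```

A prefix that is rebound part-way through a document, as Turtle allows, can then be modelled by nesting `binding` forms: `(binding [*context* {"ex" "http://example.org/a#"}] (expand :ex/thing))` and the same call with a different map for `"ex"` give different URIs, while `(expand :rdf/type)` is the same everywhere.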

2023-03-31T15:09:27.303279Z

yeah I totally agree with all that @quoll… It’s a good point about rebinding prefixes in turtle; there are numerous edge cases like that. My answer to that particular one is don’t trust 3rd party user supplied prefixes, or rather expand them on sight unless they are in your supplied context’s prefix map… Basically prefixes and IRI shorthands are useful in your context; typically they represent the vocabulary terms you are targeting in your app at a given point… i.e. the prefixes are a little like the schema for that bit of your application, if you never look at or match on the IRI terms it doesn’t matter. The issue is of course, sometimes the other terms do matter; e.g. when you want all ?p but don’t know what they are; and you’re combining results across contexts — which is when the equality issues strike.
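A sketch of the "expand on sight unless it is in your own context" policy, under assumptions: a prefixed term arrives as a `[prefix local]` pair together with the prefix map declared by the third-party document, and `normalise` plus both example maps are hypothetical. The third party's prefix labels are never kept; a term comes out as a keyword only if its namespace IRI is one our own context names.

```clojure
(import '(java.net URI))

(defn normalise
  "Resolves a [prefix local] pair against doc-prefixes (declared by the
  incoming document). Returns a keyword using our own prefix when the
  namespace is in my-prefixes, otherwise the fully expanded URI."
  [my-prefixes doc-prefixes [prefix local]]
  (let [base (get doc-prefixes prefix)
        ;; Look up our own label for the same namespace IRI, if any.
        mine (some (fn [[p b]] (when (= b base) p)) my-prefixes)]
    (cond
      (nil? base) (throw (ex-info "Prefix not declared by document"
                                  {:prefix prefix}))
      mine        (keyword mine local)
      :else       (URI. (str base local)))))

;; Example maps, both made up for illustration.
(def my-prefixes
  {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(def doc-prefixes
  {"r"   "http://www.w3.org/2000/01/rdf-schema#"
   "foo" "http://example.org/foo#"})
```

With these maps, `(normalise my-prefixes doc-prefixes ["r" "label"])` gives `:rdfs/label`, while `(normalise my-prefixes doc-prefixes ["foo" "bar"])` gives the expanded URI, so terms outside the app's vocabulary stay comparable across contexts.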

quoll 2023-03-31T15:11:44.877699Z

This is why I save IRIs in databases, and not QNames. It bothers me though… QNames are:
• Smaller, and take less space
• Often the original representation of the data
• What the user wants to see

2023-03-31T16:46:25.973789Z

Yeah.

2023-03-31T16:48:42.364799Z

I’ve argued similar things… e.g. in JSON-LD, if you have an API people want to use the context to provide a nice surface syntax… but the issue is that if unprefixed data leaks out, it’s a breaking change to apply a prefix later to it… hence I’ve argued people need to use the JSON-LD expanded form, which has all the URIs expanded. It sucks, but it’s consistent and at least there’s no risk of it breaking for cosmetic reasons.

2023-03-31T16:49:51.688939Z

The other option is to store the prefixes… which is nicer; but requires buy in from the upstream producers etc.