#rdf
2023-03-31
simongray07:03:54

I wonder if it makes sense to create a simple QName type and then devise a protocol that uses @eric.d.scott's conversion method for safe conversions between QName, URI, and keyword.
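A minimal sketch of what such a type and protocol could look like. Everything here is hypothetical: the `QName` record, the `AsQName` protocol, the helper names, and the fixed `prefixes` table are illustrations only, and this is not the conversion method referred to above nor the code in the linked repository.

```clojure
(require '[clojure.string :as str])
(import 'java.net.URI)

;; Hypothetical prefix table; only the rdfs namespace IRI is a real value.
(def prefixes {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

;; A minimal QName: a prefix plus a local name, both strings.
(defrecord QName [prefix local])

(defprotocol AsQName
  (as-qname [x] "Coerce x to a QName, or nil if no known prefix applies."))

(extend-protocol AsQName
  QName
  (as-qname [q] q)

  clojure.lang.Keyword
  ;; An unqualified keyword yields a QName with a nil prefix.
  (as-qname [k] (->QName (namespace k) (name k)))

  URI
  ;; Find the first known namespace IRI that the URI starts with.
  (as-qname [u]
    (let [s (str u)]
      (some (fn [[prefix base]]
              (when (str/starts-with? s base)
                (->QName prefix (subs s (count base)))))
            prefixes))))

(defn qname->keyword [{:keys [prefix local]}]
  (keyword prefix local))

(defn qname->uri [{:keys [prefix local]}]
  (when-let [base (prefixes prefix)]
    (URI. (str base local))))
```

With this shape, every conversion goes through `QName`, so a keyword and a URI can be compared by coercing both first.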

simongray09:03:54

@eric.d.scott What do you think about this? https://github.com/kuhumcst/qname/blob/main/src/dk/cst/qname.clj Others are also welcome to weigh in.

Eric Scott22:03:18

Sorry, I'm not in a position to give a thoughtful response right now. I'll revisit this in a couple of days when I'm back home.

simongray09:03:28

(should probably have been .cljc)

rickmoynihan10:03:05

FWIW I’ve spent a while thinking about this, as it’s 9 years now since I created grafter. Grafter’s approach was to use symbols and mount them in namespaces, and simply name them like rdfs:label, and bind the names to URIs. We also used a simple “prefixer” function which you could give a base URI to, and it would concatenate the URI for you; and then adopted the convention of binding that prefixer to the prefix’s symbol in the namespace. https://github.com/Swirrl/grafter-vocabularies/blob/082ec0a05651316cd406faa13cc23671bfd83827/src/grafter/vocabularies/dcat.cljc#L4.

At the time I was intending to build something fancier on top of this; in particular dynamically generating such namespaces from RDF vocabularies etc, and attaching more clojure metadata forms… but in the end I thought better of it, because I didn’t want to bake these kinds of side effects and global state in at the bottom, as I wanted something pure. In my mind if you include state in a system you’re building a framework, and frameworks/state stray too much into application concerns rather than library concerns, and I wanted to build a library. IIRC @eric.d.scott took this sort of approach; I vaguely remember discussing some of this with him at the time…

Anyway in the past 9 years, we’ve seen clojure/rich-hickey re-emphasise the fact that keywords are namespaced, and even extend clojure require with better support for keywords and namespaces with :as-alias. So ever since then I’ve also wanted to help close this gap. Our in-memory graph database library Matcha also allows the use of keywords as identifiers to support this. Matcha was born around the same time as asami, and shares a similar rationale (schemaless to support RDF), though doesn’t care to be like datomic. If I’d known about Asami back then I’d almost certainly have used it instead of making Matcha.

Anyway, one of the biggest issues in my mind around this concerns prefixes and equality, as (not= (URI. "") :rdfs/label); hence handling qnames-as-keywords requires either a special comparison function (likely preloaded/partially applied with the prefixes) or a canonicalisation step. Also you can’t override equality deeply enough, such that it’s integrated into clojure sets etc without conversions, as it’s hard/impossible to make .equals commutative across types where you don’t control both of them.

There’s a bunch of other approaches I’ve tried; for example you can modify print-methods to use prefixes you’ve defined, and to pretty print URIs as prefixed keywords (but for them not to actually be keywords). The issue there is that you can’t copy/paste printed forms back into the REPL and have them read; though arguably you can’t really do that with URIs either, at least not without adding *data-readers*, which also works by the way, but you then have the extensibility/mutable-state problem, which personally I can’t abide.

I think it’s worth mentioning I’ve often thought the other option is to use tagged readers and symbols for this, for example you could have #rdf/iri rdfs:label or #rdf/iri ../foo/bar and the reader could do some canonicalisation for you; you then of course have the problem of injecting state into the reader, which is gross, but there are other options/trade-offs you could make here.

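The prefixer convention and the equality problem from the message above can be sketched roughly as follows. This is an illustration under assumptions, not grafter's actual implementation: `prefixer`, `prefix->base`, `canonical` and `iri=` are hypothetical names, and only the rdfs namespace IRI is a real value.

```clojure
(import 'java.net.URI)

;; A "prefixer": close over a base IRI and return a function that
;; appends a local name to it.
(defn prefixer [base]
  (fn [local] (URI. (str base local))))

;; Convention: bind the prefixer to the prefix's symbol, then bind
;; individual terms to symbols named like QNames.
(def rdfs (prefixer "http://www.w3.org/2000/01/rdf-schema#"))
(def rdfs:label (rdfs "label"))

;; Hypothetical prefix table used for canonicalisation.
(def prefix->base {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(defn canonical
  "Expand a qualified keyword with a known prefix to a URI;
  return anything else unchanged."
  [x]
  (if (qualified-keyword? x)
    (if-let [base (prefix->base (namespace x))]
      (URI. (str base (name x)))
      x)
    x))

(defn iri=
  "Compare two identifiers after canonicalising both."
  [a b]
  (= (canonical a) (canonical b)))
```

Plain `=` still treats the URI and the keyword as different values, so a set or map only behaves as expected if every value is passed through `canonical` before it goes in.
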
rickmoynihan10:03:16

also meant to mention a coercing tagged reader syntax can coerce across different representations such that they are equal… e.g. (= #rdf/iri "" #rdf/iri rdfs:label) ;; => true
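A rough sketch of such a coercing reader, assuming a hypothetical `rdf/iri` tag handled by a hypothetical `read-iri` function and a fixed prefix table. The tag handler is passed to `clojure.edn/read-string` through its `:readers` option, so nothing is installed globally; the full rdfs IRI in the usage example is the standard one and is supplied here for illustration.

```clojure
(require '[clojure.edn :as edn]
         '[clojure.string :as str])
(import 'java.net.URI)

;; Hypothetical prefix table; only the rdfs namespace IRI is a real value.
(def prefix->base {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(defn read-iri
  "Tag handler: a string is taken as a full IRI, a symbol such as
  rdfs:label is split on its first colon and expanded via prefix->base."
  [form]
  (if (string? form)
    (URI. form)
    (let [[prefix local] (str/split (str form) #":" 2)
          base           (prefix->base prefix)]
      (if (and base local)
        (URI. (str base local))
        (throw (ex-info "Unknown prefix" {:form form}))))))

(defn read-rdf
  "Read one EDN form with the rdf/iri tag enabled for this call only."
  [s]
  (edn/read-string {:readers {'rdf/iri read-iri}} s))
```

Both `(read-rdf "#rdf/iri rdfs:label")` and `(read-rdf "#rdf/iri \"http://www.w3.org/2000/01/rdf-schema#label\"")` produce a `java.net.URI`, so the two spellings compare equal after reading.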

quoll14:03:19

> If I’d known about Asami back then I’d almost certainly have used it instead of making Matcha.

And if I’d known about Datascript in 2016, then I’d have used that instead of Asami 🙂

rickmoynihan15:03:10

Well I did know about Datascript back then but I needed something schemaless, and I needed it to accept URI’s and arbitrary types in the appropriate places to handle RDF. So FWIW I’m glad you didn’t know about Datascript back then, as it means Asami is now an option for me too! 🙂

quoll15:03:59

My weekends have been full recently. I need to do more with that project

quoll15:03:12

(Raphael and Donatello are about to get hooked in)

quoll15:03:33

I’m also looking to finish Twylyte

quoll15:03:02

Twylyte is a SPARQL to Asami-query converter

quoll15:03:03

It parses SPARQL into an AST, then transforms the AST into Asami queries as its internal representation 🙂

quoll15:03:40

This is basically what Jena does. They parse SPARQL, then transform queries into a lisp-like syntax as the internal representation

rickmoynihan16:03:56

Sounds like a fun project… I’ve often wanted a SPARQL -> EDN/AST -> SPARQL chain; for SPARQL query rewriting in clojure. I currently use the Jena stuff you’re referring to, to do some of that. It works but I’d much prefer a native clojure implementation.

rickmoynihan16:03:44

I even wrote some preliminary code to do it using instaparse; but it’s a big job, and has always been much bigger than my free time… and it got parked a long time ago.

quoll15:04:27

This is me too… I got the Instaparse portion working, but the transforms are large and non-trivial

rickmoynihan08:04:37

Yeah the instaparse bit is pretty simple; just paste the ABNF and tweak a few bits of syntax — the rest of the job is huge and requires a pretty large sustained and focussed effort

rickmoynihan08:04:58

(Instaparse is pretty amazing though)

rickmoynihan10:03:33

I think you need to be clear about what problems you want to solve…
1. Printing of URIs at the REPL is gross as they’re very noisy… so removing this noise in printed output at a REPL is nice… e.g. you’ve run a big SPARQL query and are inspecting intermediate rdf/edn data in memory.
2. You want a terser clojure syntax for supplying URIs/identifiers in queries / data.
3. You want a homoiconic shorthand for expressing a subset of URIs which can be both read/written (copy/pasted in REPL sessions)

simongray10:03:11

4. I want a valid QName shorthand for representing an RDF resource in Clojure that is compatible with keywords, since most of Clojure wants and expects those. I think reader tags are definitely part of the solution too, but before one has tags, one must create conversion functions 🙂

simongray11:03:45

The alternative to a QName type is having to write out the full IRI every time it conflicts with the Clojure reader.

rickmoynihan10:03:30

yes definitely keywords are part of this too

rickmoynihan10:03:51

and there are many possible conversion functions / scenarios

simongray10:03:07

And it really bothers me that I can't write :prefix/123 because the Clojure reader breaks, so I guess the next best thing is to have something like #rdf/qname "prefix:123" .

quoll11:03:39

I would like to note that this is not a valid QName! Yes, Java’s QName class accepts it, and TTL/SPARQL parsers accept it, but that doesn’t change the fact that QNames are defined to not start the local name with a number

quoll11:03:30

For this reason, when we have numerically identified objects, we use names like: prefix:_123

quoll11:03:33

Though Fabian, Dean and Jim suggested things like: prefix:Q123

quoll11:03:49

If you’re going to break the standard by using a local name that starts with a number, then I think that it’s a good thing that you need to create your keyword with a function call rather than inline syntax 😊
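As a small illustration of that point, the `keyword` function in `clojure.core` accepts a local name that the inline keyword syntax does not:

```clojure
;; Build the keyword with a function call instead of literal syntax.
(def numeric-id (keyword "prefix" "123"))

(namespace numeric-id) ; the prefix part, "prefix"
(name numeric-id)      ; the local part, "123"

;; Caveat: the printed form is :prefix/123, which is the same text the
;; reader rejects, so values like this do not round-trip via print/read.
(pr-str numeric-id)
```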

quoll11:03:18

Note: I take advantage of the flexibility of parsers by declaring a SNOMED-CT namespace to deal with their horrible IRIs 😊

simongray11:03:01

Aha, good to know.

simongray11:03:09

I am actually not doing this, but it's not an impossibility to run into it. My main issue is that I want a slash, which breaks both the Clojure reader and the transit read handler. I am actually not even sure the vocabulary library even fixes this issue.

quoll12:03:43

Slashes are also not valid in QNames 🙂

simongray13:03:37

Is there a source for what's a valid qname?

simongray13:03:55

Aren't they valid according to this? https://en.wikipedia.org/wiki/QName

quoll13:03:10

The local part is an NCName, and those are defined as NameStartChar (NameChar)* The NameStartChar does not include numbers

simongray13:03:53

yeah, but I meant "Slashes are also not valid in QNames". it doesn't seem to state that?

simongray13:03:22

I can't read what is in the code blocks, but the comment says:

> (* any Unicode char, excluding surrogate blocks FFFE and FFFF. *)

quoll13:03:33

That’s Char not NameChar

quoll13:03:38

It’s a weird definition… a Name but then anything with a : in it is excluded

quoll13:03:22

The Char is only used in the “minus” expression

quoll13:03:28

Basically, it’s a Name but exclude anything that matches the regex: #".*:.*"

quoll13:03:09

I have never seen - used in a grammar like this!

quoll13:03:19

Incidentally, for the sake of accuracy, the definition is at: https://www.w3.org/TR/REC-xml/#NT-Name

quoll13:03:55

(That’s Name)
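A deliberately simplified check for the rule discussed above. This sketch assumes ASCII-only names: the real NameStartChar and NameChar productions in the XML specification also admit many non-ASCII ranges, which are omitted here, and `ascii-ncname?` is a hypothetical helper name.

```clojure
;; ASCII-only approximation of NCName:
;;   first char: a letter or underscore (no digits, no colon)
;;   remaining:  letters, digits, underscore, dot, hyphen
(def ascii-ncname-pattern #"[A-Za-z_][A-Za-z0-9_.-]*")

(defn ascii-ncname?
  "True when s is a valid NCName under the ASCII-only simplification."
  [s]
  (boolean (re-matches ascii-ncname-pattern s)))

(ascii-ncname? "_123")    ; leading underscore is allowed
(ascii-ncname? "Q123")    ; leading letter is allowed
(ascii-ncname? "123")     ; leading digit is rejected
(ascii-ncname? "foo/bar") ; slash is not a NameChar
(ascii-ncname? "a:b")     ; colon is excluded from NCName
```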

simongray13:03:07

I see. So perhaps I am imagining an issue that isn't really there in practice.

simongray13:03:24

Thanks for clarifying!

simongray13:03:01

It's hard to Google RDF stuff like "are slashes allowed" since you inevitably end up at a long W3C specification with some terse grammar that needs to be parsed correctly 😅

quoll13:03:21

TBH, I had to look up / (It’s #2F)

rickmoynihan10:03:10

I’m just trying to put more options on the table for you 🙂

simongray10:03:34

yes, thank you for your valuable input 🙂 I do appreciate it

rickmoynihan11:03:50

FWIW I think the next best thing is actually #rdf/qname prefix:123

quoll14:03:09

Now that I’m on a keyboard… I like using keywords, since the syntax is so close to QNames, and it’s convenient in code (e.g. :rdf/type). @rickmoynihan is correct about state, but on that I have a few things to say:
• Some namespaces will never change (well, they could, but that’s such a terrible idea that you deserve what you get if you try it 😜). So :rdf/type, :rdfs/range, :owl/TransitiveProperty and so on can always presume that their namespaces are the standard ones.
• Any use of another namespace has to occur within a context. That means passing around the context as an argument. Bind it to a dynamic var if you don’t want to pollute your function signatures, but there is always a possibility of error if you don’t make the current context available.
That second point is especially important in syntaxes like Turtle, since a prefix can get rebound during a document. I can’t say that would be best practice, but it’s possible, and if you don’t support it then you’re going to have a hard time when someone gives you a valid document that you can’t deal with.
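The two points above could be sketched like this. The names `standard-prefixes`, `*context*` and `expand` are hypothetical; the rdf, rdfs and owl namespace IRIs are the standard ones, while the `ex` bindings are made-up examples.

```clojure
(import 'java.net.URI)

;; Namespaces assumed never to change.
(def standard-prefixes
  {"rdf"  "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
   "rdfs" "http://www.w3.org/2000/01/rdf-schema#"
   "owl"  "http://www.w3.org/2002/07/owl#"})

;; Every other prefix comes from a context: either passed explicitly
;; or bound to this dynamic var.
(def ^:dynamic *context* {})

(defn expand
  "Expand a qualified keyword to a URI using the standard prefixes
  first, then the given (or currently bound) context."
  ([kw] (expand *context* kw))
  ([context kw]
   (let [prefix (namespace kw)
         base   (or (standard-prefixes prefix) (context prefix))]
     (if base
       (URI. (str base (name kw)))
       (throw (ex-info "No namespace bound for prefix" {:keyword kw}))))))

;; The same keyword under two contexts, as when a Turtle document
;; rebinds a prefix partway through.
(binding [*context* {"ex" "http://example.org/a#"}] (expand :ex/thing))
(binding [*context* {"ex" "http://example.org/b#"}] (expand :ex/thing))
```

Calling `(expand :ex/thing)` with no context bound throws, which is the "possibility of error" mentioned above made explicit.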

rickmoynihan15:03:27

yeah I totally agree with all that @quoll… It’s a good point about rebinding prefixes in turtle; there are numerous edge cases like that. My answer to that particular one is don’t trust 3rd party user supplied prefixes, or rather expand them on sight unless they are in your supplied context’s prefix map… Basically prefixes and IRI shorthands are useful in your context; typically they represent the vocabulary terms you are targeting in your app at a given point… i.e. the prefixes are a little like the schema for that bit of your application, if you never look at or match on the IRI terms it doesn’t matter. The issue is of course, sometimes the other terms do matter; e.g. when you want all ?p but don’t know what they are; and you’re combining results across contexts — which is when the equality issues strike.
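One way to read "expand them on sight unless they are in your supplied context's prefix map", sketched under assumptions: `trusted` and `normalise` are hypothetical names, QNames arrive as plain `"prefix:local"` strings, and the third-party document's own prefix declarations are passed in as a map.

```clojure
(require '[clojure.string :as str])
(import 'java.net.URI)

;; The prefixes this application cares about (hypothetical table).
(def trusted {"rdfs" "http://www.w3.org/2000/01/rdf-schema#"})

(defn normalise
  "Expand a third-party QName string using the document's own prefix
  declarations, then shorten it to a keyword only when the resulting
  IRI falls under a trusted namespace. Otherwise keep the full URI."
  [doc-prefixes qname]
  (let [[prefix local] (str/split qname #":" 2)
        base (or (get doc-prefixes prefix)
                 (throw (ex-info "Undeclared prefix" {:qname qname})))
        iri  (str base local)]
    (or (some (fn [[p trusted-base]]
                (when (str/starts-with? iri trusted-base)
                  (keyword p (subs iri (count trusted-base)))))
              trusted)
        (URI. iri))))
```

The shorthand is decided by the namespace IRI, never by the prefix label the document happened to use, so a document that binds `rdfs:` to some other namespace does not produce `:rdfs/...` keywords.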

quoll15:03:44

This is why I save IRIs in databases, and not QNames. It bothers me though… QNames are:
• Smaller, and take less space
• Often the original representation of the data
• What the user wants to see

rickmoynihan16:03:42

I’ve argued similar things… e.g. in JSON-LD, if you have an API people want to use the context to provide a nice surface syntax… but the issue is that if unprefixed data leaks out, it’s a breaking change to apply a prefix later to it… hence I’ve argued people need to use the JSON-LD expanded syntax, which has all the URIs expanded. It sucks, but it’s consistent and at least there’s no risk of it breaking for cosmetic reasons.

rickmoynihan16:03:51

The other option is to store the prefixes… which is nicer; but requires buy in from the upstream producers etc.