rdf 2022-02-16 | Slack Archive

If you’ve not seen it, this looks great: https://clojurians.slack.com/archives/C06MAR553/p1644949288597739

@rickmoynihan Wow, that looks fantastic. I’ve been using the lisp-like-yet-superficially-datomic’ish query language of Aristotle and it works well for simple queries, but gets unwieldy for complex ones. Having to use SPARQL is a regression in other ways, so this solves that problem in a neat way.

Kelvin15:02:07

Going off of what you said, I wrote this lib because I was annoyed that there was literally no viable DSL for SPARQL or RDF queries except for Aristotle, which we previously used but quickly ran into its limitations.

simongray15:02:33

Aha, so you get precisely what I mean. Thanks for making this! How stable would you say it is?

Kelvin15:02:55

We're using Flint on our current projects at our company, so it's definitely stable enough.

Kelvin15:02:15

Though of course as a new library there will be bugs that'll be uncovered and will need to be ironed out.

👍 1

rickmoynihan15:02:33

@kelvin063 Picking up from the discussion here: https://clojurians.slack.com/archives/C06MAR553/p1645003835679289?thread_ts=1644949288.597739&cid=C06MAR553 https://clojurians.slack.com/archives/C06MAR553/p1645022454071689?thread_ts=1644949288.597739&cid=C06MAR553

rickmoynihan15:02:13

Regarding 1. use of protocols etc… I absolutely agree flint should remain independent of any particular backend. This is one of the things I wanted for such a lib too! However, that is different from having clean type interop with arbitrary backends via protocols.

rickmoynihan15:02:46

There are a few ways you could do this… One is to just provide the protocols and extend them to the types you already use; but allow others to extend them to rdf4j/jena/whatever-js-lib you want etc.

rickmoynihan15:02:50

You can also do conditional requires, that test if certain classes/libraries etc are available and load the protocols when they are.
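A minimal sketch of the conditional-loading idea, assuming a hypothetical `IRILike` protocol (not part of Flint) and using rdf4j's `org.eclipse.rdf4j.model.IRI` purely as an example of an optional dependency. The backend type is extended only when its class is found on the classpath:

```clojure
;; Hypothetical protocol for illustration; Flint does not define this.
(defprotocol IRILike
  (-iri-string [x] "Returns the IRI wrapped in angle brackets."))

;; A platform type that is always available on the JVM.
(extend-protocol IRILike
  java.net.URI
  (-iri-string [x] (str "<" x ">")))

(defn class-available?
  "True if the named class can be loaded from the current classpath."
  [class-name]
  (try
    (Class/forName class-name)
    true
    (catch ClassNotFoundException _ false)))

;; Extend the optional backend type only when it is present.
;; Assumption: (str x) yields the IRI text for that type.
(when (class-available? "org.eclipse.rdf4j.model.IRI")
  (extend (Class/forName "org.eclipse.rdf4j.model.IRI")
    IRILike
    {:-iri-string (fn [x] (str "<" x ">"))}))
```

With this shape the library itself carries no dependency on the backend; the extension simply never happens when the class is absent.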

rickmoynihan15:02:34

for example grafter (an old clojure lib for rdf that I still maintain) provides quite extensive type coercions for java and clojure types into xsd… e.g. java.time.LocalDate is an xsd:date; but java.time.LocalDateTime is an xsd:dateTime

Kelvin15:02:06

So the protocol approach might look something like this?

(defprotocol RDFDataType
  (-to-string [x]))

(defrecord MyCustomDataType [value]
  RDFDataType
  (-to-string [x] (str (:value x))))

rickmoynihan15:02:51

Yes, though you may also want a datatype-uri protocol too, that provides just the datatype URIs
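A sketch of what a protocol carrying both the lexical form and the datatype URI might look like, using the `java.time` mappings mentioned above. The protocol name, method names and `literal->sparql` helper are hypothetical, not Flint or grafter APIs:

```clojure
(import '(java.time LocalDate LocalDateTime)
        '(java.time.format DateTimeFormatter))

(def xsd "http://www.w3.org/2001/XMLSchema#")

;; Hypothetical protocol: one method for the lexical form,
;; one for the datatype URI.
(defprotocol RDFDatatype
  (-lexical-form [x])
  (-datatype-uri [x]))

(extend-protocol RDFDatatype
  LocalDate
  (-lexical-form [x] (str x)) ; ISO-8601 date, e.g. 2022-02-16
  (-datatype-uri [_] (java.net.URI. (str xsd "date")))

  LocalDateTime
  ;; The formatter always emits seconds, unlike LocalDateTime's toString.
  (-lexical-form [x] (.format DateTimeFormatter/ISO_LOCAL_DATE_TIME x))
  (-datatype-uri [_] (java.net.URI. (str xsd "dateTime"))))

(defn literal->sparql
  "Renders a value as a typed SPARQL literal."
  [x]
  (str "\"" (-lexical-form x) "\"^^<" (-datatype-uri x) ">"))

(literal->sparql (LocalDate/of 2022 2 16))
;; => "\"2022-02-16\"^^<http://www.w3.org/2001/XMLSchema#date>"
```

Other libraries could then extend `RDFDatatype` to their own literal types without the query DSL knowing about them.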

rickmoynihan15:02:21

We actually have a similar library to this btw; that a colleague of mine wrote; not yet open sourced. The main difference is that it provides a thin layer of macros to aid query building / binding runtime values into the query etc. It more or less provides full sparql 1.1 support. I’m personally on the fence about data vs macros; there are pros and cons. It’s interesting anyway because it works on top of a fine grained data AST. Anyway I mention it because the usecase I often want to support is query generation and splicing runtime values into queries etc… Which is why I was advocating protocols and not representing URIs as strings; in particular though I was wondering if the reason you have the “<http://foo/>” syntax instead of using (URI. "") is essentially because of quoting?

Kelvin14:02:40

So there were a couple of reasons I went with the angle bracket syntax:
1. The thought of using types/protocols simply did not occur to me, as I said in the main thread.
2. I wanted to have a SPARQL-like feel when using Flint, hence variables start with ?, blank nodes start with _, and yes IRIs are surrounded by angle brackets.
3. Related to the above, I thought that the angle bracket syntax was the easiest for a user to grasp, especially if they already had knowledge of SPARQL.
4. Specific to URI but I don't want to be dependent on Java classes like that (not really a problem with custom protocols but then that's another thing for the user to become familiar with).

rickmoynihan16:02:40

Firstly, I’m really sorry if I’m coming across as belligerent; I genuinely LOVE everything you’ve done with flint, except this one small detail. Again I’m very grateful for you sharing this! 🙇

> 2. I wanted to have a SPARQL-like feel when using Flint, hence variables start with ?, blank nodes start with _, and yes IRIs are surrounded by angle brackets.

Yes, I totally love this. I just feel like the problem is that URIs are the odd ones out, i.e. all of your other types are proper ones… e.g. you don’t parse strings to check if they’re xsd:dateTimes, you map the underlying #inst datatype. Years ago in an early version of grafter, I did the same, allowing strings to be treated like URIs in some contexts, and later regretted it a lot.

> 4. Specific to URI but I don’t want to be dependent on Java classes like that (not really a problem with custom protocols but then that’s another thing for the user to become familiar with).

Coming at this from a different angle: I think if you’re doing serious RDF work you’ll already be using a robust library for properly handling the datatypes somewhere. In the work I do, we care a lot about the correct, unambiguous interpretation of data. A lot of the time query results from one query will be used in a subsequent query, and having to coerce those types is a problem. I think a good division of responsibilities is for flint to provide protocols so that the coercion can be delegated to such a library and called at the right time (as late as possible). You don’t need to pick a backend or favour any particular one (I’d certainly rather you remained agnostic); however, supporting built-in platform-provided types and mapping them to xsd/rdf where possible makes sense and needn’t contradict this goal.

For example adding support for java.net.URI is unambiguous… and cljc would just use goog.Uri; a user could then optionally choose to use a library like: https://github.com/henryw374/uri
Similarly, another dep can always extend support for rdf4j/jena or rdf.js.

rickmoynihan16:02:55

incidentally you can resolve this problem:

(-> {:select *
 :where [[(URI. "") ?p ?o]]}
 :where
 first) ;; => '(URI. "")

By registering a data_reader and doing:

{:select *
 :where [[#flint/uri "" ?p ?o]]}

rickmoynihan16:02:34

You shouldn’t coin #uri in case clojure does in the future (I’d love it if it did)… but could probably use #f/uri as a namespace 🙂
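A small sketch of how such a tagged literal could be wired up. The tag `#f/uri` and the function `read-uri` are illustrative names only; in a real project the mapping would normally live in a `data_readers.clj` file at the classpath root, whereas here `*data-readers*` is bound dynamically to keep the example self-contained:

```clojure
(defn read-uri
  "Reader function for the illustrative #f/uri tag."
  [s]
  (java.net.URI. s))

;; Read a query from text with the tag registered for this call only.
(def query
  (binding [*data-readers* (assoc *data-readers* 'f/uri #'read-uri)]
    (read-string
     "{:select * :where [[#f/uri \"http://example.org/s\" ?p ?o]]}")))

(-> query :where first first class)
;; => java.net.URI
```

Because the reader constructs the value at read time, the quoted query already contains a real URI object rather than an unevaluated `(URI. ...)` form.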

Kelvin22:02:19

First of all, don't worry about coming off as belligerent; it's all constructive criticism, so it's all good. The main thing that's holding me back (besides obvious time constraints and such) is that this would be a major change to how Flint queries and updates are written, especially if we break backwards compatibility by getting rid of the <> syntax. (Though I guess it's an argument for "make this change early in the project's life.") I'll have to talk about it with my coworkers who are already using Flint before making such changes.

Kelvin22:02:40

As a side note, the quoting issue was actually never one of the reasons why I chose the <> syntax. In part because there are worse ways to run into quoting issues, e.g. when you're building Flint queries from the ground up (and in that case I just quote the symbols individually).

👍 1

rickmoynihan15:02:57

Sorry I’ll restate this… If you used underlying types, which I think would be more precise/secure and ultimately extensible you could write something like this:

(flint/format-query (let [dataset (URI. "")]
                      `{:select ~'*
                        :where [[~dataset ~'?p ~'?o]]}))

rickmoynihan15:02:20

Essentially a lot of SPARQL queries we write end up binding variables in the query with some user supplied data etc. So being able to validate that by reading it into a proper type rather than a string is useful and more consistent, and ensures types can be perfectly mapped/preserved into xsd.

rickmoynihan15:02:12

the ~' unquoting is the sort of thing my colleague’s macro layer lets you avoid (though I’m not advocating that)
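To make the quoting trade-off concrete, here is a sketch comparing three ways of writing the same query map. The dataset URI is a placeholder. Without `~'`, syntax quote would namespace-qualify the symbols (for example `?p` would become `user/?p`):

```clojure
(import 'java.net.URI)

(def dataset (URI. "http://example.org/dataset"))

;; Plain quote: nothing is evaluated, so dataset stays a symbol.
(def quoted '{:select * :where [[dataset ?p ?o]]})

;; Syntax quote: ~ splices the value in, ~' keeps symbols unqualified.
(def spliced `{:select ~'* :where [[~dataset ~'?p ~'?o]]})

;; No syntax quote: an ordinary map literal with individually quoted symbols.
(def built {:select '* :where [[dataset '?p '?o]]})

(= spliced built)
;; => true
```

The last form is the "quote the symbols individually" approach: noisier per symbol, but it needs no unquoting at all.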

quoll16:02:53

This may be really useful for me to do testing with. I’ll be wrapping Asami in RDF/SPARQL soon, since my new job is back in that sphere, and building strings in Clojure is really annoying

rickmoynihan14:02:32

whilst you’re here, so to speak… I was wondering if you could elaborate for me what asami/datomic’s approach to the quoting/binding issue with query data is? i.e. you have a query as data like:

'{:select *
  :where [[?s ?p ?o]]}

And you want to bind ?s to a specific URI?

quoll15:02:36

Well, if the ?s is supposed to be bound to a specific URI, then it would be ideal if the pattern were written that way. e.g. if the uri is . Also, I’m going to pretend that I have a uri reader so I can write it in code as #uri "" Then:

'{:select *
  :where [[#uri "" ?p ?o]]}

But I appreciate that you want to see a ?s bound in there. In SPARQL, then you’d just say:

'{:select [#uri "" :as ?s ?p ?o]
  :where [[#uri "" ?p ?o]]}

But neither of those answer the question 🙂 The way to do it in Asami is:

(q '{:find [?s ?p ?o]
     :in $ ?s
     :where [[?s ?p ?o]]}
   the-db #uri "")

Datomic is nearly the same:

(q '{:find [?s ?p ?o]
     :in $ ?s
     :where [[?p* :db/ident ?p] [?s ?p* ?o]]}
   the-db #uri "")

rickmoynihan15:02:51

Thanks… I was actually asking a slightly different question… How would you resolve this (deliberate) class of problem; in the asami/datomic world?

(let [s (URI. "http://s")]
   '{:select *
     :where [[s ?p ?o]]})
=> {:select * :where [[s ?p ?o]]} ;; <=== not what we want :-)

quoll16:02:37

Sorry, I don’t follow. Are you trying to get the * in the select clause to have 3 bindings (with ?s already set to the required URI)? Also, I’m presuming that you have pseudo-code there, since you didn’t unquote the s in the where clause

rickmoynihan16:02:18

the lack of unquote is the deliberate mistake

rickmoynihan16:02:27

i.e. a correct way to do it is here: https://clojurians.slack.com/archives/C09GHBXRC/p1645026177945909 I’m really just asking how people tend to do this dynamic query generation stuff with asami/datomic… in particular binding replacement… e.g. another way to do it, in a more restricted case, is in rdf4j, where you can take a query like:

SELECT * WHERE { ?s ?p ?o }

and provide essentially a map of bindings to rebind, e.g. {'?s (URI. "")}, which will then essentially rewrite that variable in the query.
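For a query held as data, the same rebinding idea can be sketched with `clojure.walk/postwalk-replace`. This is only a data-level analogue, not how rdf4j applies bindings; `bind-vars` and the example URI are hypothetical:

```clojure
(require '[clojure.walk :as walk])
(import 'java.net.URI)

(defn bind-vars
  "Replaces every occurrence of the variables in `bindings` with its value."
  [query bindings]
  (walk/postwalk-replace bindings query))

(def query '{:select * :where [[?s ?p ?o]]})

;; ?s is replaced wherever it appears; ?p and ?o are left untouched.
(def bound-query
  (bind-vars query {'?s (URI. "http://example.org/s")}))
```

Since the values are proper types rather than strings, user-supplied input can be validated when the URI is constructed, before it ever reaches the query.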

rickmoynihan16:02:30

a small issue with the way I linked is that the use of syntax quote makes the query noisier as you need to use ~' to avoid materialising namespace qualified symbols. I’m personally ok with it; but if you do dislike it you can also avoid it with a thin layer of macros.

quoll16:02:22

I’m still a bit confused as to how the binding with the :in clause doesn’t work?

quoll16:02:30

internally, Asami literally creates a map of {?s (URI. "")} when you do this

rickmoynihan16:02:31

ok that bit does answer the question — and is essentially the same approach as in rdf4j etc

rickmoynihan16:02:29

Thanks. I was basically asking, because I suspected the :in clause in datomic/asami did this… I was wondering because I thought this sort of thing might be an acceptable solution for flint — treating URI types more rigorously etc… cc @kelvin063

rickmoynihan16:02:08

@U051N6TTC is there a way in the api to do that kind of substitution on fragments of a query; or must it always be applied at the top?

quoll16:02:39

Come to think of it, it’s not quite a map. Asami results are just seqs of vectors, with metadata for the column names. So:

^{:cols '[?s ?p ?o]}
([s1 p1 o1]
 [s2 p2 o2]
 [s3 p3 o3])

What actually happens with the binding is that it starts resolving the query with:

^{:cols '[?s]}
([<>])

So it’s a single-row result with one column. This then gets joined to the result of [?s ?p ?o] which shares that ?s variable.
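The join described here can be illustrated with `clojure.set/join`. This is a sketch of the idea only, not Asami's implementation; `rows->rel` and the placeholder values `:s1`, `:p1` and so on are assumptions:

```clojure
(require '[clojure.set :as set])

(defn rows->rel
  "Turns a seq of row vectors with :cols metadata into a set of maps."
  [result]
  (let [cols (:cols (meta result))]
    (set (map #(zipmap cols %) result))))

;; The pre-bound value: a one-row, one-column result.
(def initial-binding
  (with-meta [[:s1]] {:cols '[?s]}))

;; What resolving [?s ?p ?o] alone might return.
(def pattern-result
  (with-meta [[:s1 :p1 :o1]
              [:s2 :p2 :o2]]
    {:cols '[?s ?p ?o]}))

;; Natural join on the shared ?s column: only the :s1 row survives.
(def joined
  (set/join (rows->rel initial-binding) (rows->rel pattern-result)))
```

Pre-binding several columns or rows works the same way: the starting relation just has more entries to join against.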

rickmoynihan16:02:00

:thumbsup: makes perfect sense

quoll16:02:01

Not sure what you mean by “fragments of a query”?

quoll16:02:53

You can provide bindings to start with, so it need not be one-column/one-row bindings. You can pre-bind multiple columns and rows, and start the query from that point

rickmoynihan16:02:58

say you’re dynamically generating a query; and have a sub function that will ultimately generate a FILTER NOT EXISTS { bgps } block… but some of those bgps contain lexically scoped bindings you want rewritten

quoll16:02:04

Hmmm, that’s interesting. I’ve never thought about prebinding things you want removed.

rickmoynihan16:02:39

Yeah; it’s a useful trick

quoll16:02:45

OK… in Asami there is currently no way. It would be easy enough to do (from a query resolution perspective), but I’d need to figure out the API to get the data into the right place

quoll16:02:21

especially if you had, say, 2 or more NOT EXISTS, then how do you get the bindings to the right one? 🙂

rickmoynihan16:02:02

lexical scope is one way

rickmoynihan16:02:20

e.g.

`{:select * 
 :where [[~(sub-query arg1)]
         [~(sub-query arg2)]]}

rickmoynihan16:02:14

then the subqueries internally replace through the discussed mechanism

quoll17:02:22

For this kind of thing, I build the query in code, and not as a series of quote/unquoting operations

quoll17:02:01

(assoc query :where (concat first-part second-part)) kind of thing
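A slightly fuller sketch of building the query in code rather than through quote/unquote. The helper names and the keyword subjects `:ex/a` and `:ex/b` are placeholders for illustration:

```clojure
(defn patterns-for
  "Hypothetical helper: triple patterns about one subject."
  [subject]
  [[subject '?p '?o]])

(defn add-patterns
  "Appends patterns to the query's :where clause, creating it if absent."
  [query patterns]
  (update query :where (fnil into []) patterns))

(def base-query {:select '*})

(-> base-query
    (add-patterns (patterns-for :ex/a))
    (add-patterns (patterns-for :ex/b)))
;; => {:select *, :where [[:ex/a ?p ?o] [:ex/b ?p ?o]]}
```

Each helper closes over its own arguments, so the lexical scoping comes from ordinary function calls instead of nested unquotes.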

rickmoynihan17:02:26

yeah at some point of complexity that’s always required; the question is how far can you get without sacrificing too much in the way of intent/expressivity 🙂

quoll17:02:24

I think it needs to be a part of the query grammar
