rdf

2022-02-16T09:35:11.960169Z

If you’ve not seen it, this looks great: https://clojurians.slack.com/archives/C06MAR553/p1644949288597739

simongray 2022-02-16T09:39:43.145499Z

@rickmoynihan Wow, that looks fantastic. I’ve been using the lisp-like-yet-superficially-datomic’ish query language of Aristotle and it works well for simple queries, but gets unwieldy for complex ones. Having to use SPARQL is a regression in other ways, so this solves that problem in a neat way.

Kelvin 2022-02-16T15:10:07.524679Z

Going off of what you said, I wrote this lib because I was annoyed that there was literally no viable DSL for SPARQL or RDF queries except for Aristotle, which we had previously used but whose limitations we quickly ran into.

simongray 2022-02-16T15:20:33.935749Z

Aha, so you get precisely what I mean. Thanks for making this! How stable would you say it is?

Kelvin 2022-02-16T15:29:55.020269Z

We're using Flint on our current projects at our company, so it's definitely stable enough.

Kelvin 2022-02-16T15:30:15.151319Z

Though of course, as a new library, it will have bugs that still need to be uncovered and ironed out.

2022-02-16T15:09:13.430569Z

Regarding 1. (use of protocols etc.)… I absolutely agree flint should remain independent of any particular backend. This is one of the things I wanted from such a lib too! However, that is different from having clean type interop with arbitrary backends via protocols.

2022-02-16T15:11:46.460859Z

There are a few ways you could do this… One is to just provide the protocols and extend them to the types you already use, but let others extend them to rdf4j/jena/whatever JS lib they want, etc.

2022-02-16T15:12:50.960489Z

You can also do conditional requires that test whether certain classes/libraries are available and load the protocol extensions only when they are.
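A minimal sketch of the conditional-require idea, assuming each backend's protocol extensions live in their own namespace. The helpers `class-available?` and `load-if-available!` and the namespace `my.flint.rdf4j` are hypothetical; `org.eclipse.rdf4j.model.IRI` is only used as an example class to probe for.

```clojure
(ns example.conditional-load)

(defn class-available?
  "Returns true if the named class can be loaded from the current classpath."
  [class-name]
  (try
    (Class/forName class-name)
    true
    (catch ClassNotFoundException _
      false)))

(defn load-if-available!
  "Requires ns-sym (e.g. a namespace of protocol extensions) only when
  class-name is on the classpath. Returns true if the require happened."
  [class-name ns-sym]
  (if (class-available? class-name)
    (do (require ns-sym) true)
    false))

;; Hypothetical usage: only load rdf4j extensions when rdf4j is present.
;; (load-if-available! "org.eclipse.rdf4j.model.IRI" 'my.flint.rdf4j)
```

The core library never references the optional classes directly, so it still loads when the backend is absent.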

2022-02-16T15:15:34.696029Z

For example, grafter (an old Clojure lib for RDF that I still maintain) provides quite extensive type coercions from Java and Clojure types into XSD, e.g. java.time.LocalDate is an xsd:date, but java.time.LocalDateTime is an xsd:dateTime.
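As a rough sketch of that kind of type-to-XSD mapping (not grafter's actual API), a protocol can be extended directly to host types. The protocol `XsdDatatype` and function `datatype-iri` are hypothetical names; only the mappings mentioned above plus two obvious ones (`String`, `Boolean`) are included.

```clojure
(ns example.xsd-types
  (:import (java.time LocalDate LocalDateTime)))

(def xsd "http://www.w3.org/2001/XMLSchema#")

;; Hypothetical protocol: maps a host type to the XSD datatype IRI
;; its values should be serialised with.
(defprotocol XsdDatatype
  (datatype-iri [x] "Returns the XSD datatype IRI (as a string) for x."))

(extend-protocol XsdDatatype
  LocalDate     (datatype-iri [_] (str xsd "date"))
  LocalDateTime (datatype-iri [_] (str xsd "dateTime"))
  String        (datatype-iri [_] (str xsd "string"))
  Boolean       (datatype-iri [_] (str xsd "boolean")))
```

The datatype is derived from the value's class, so no string inspection is needed to tell a date from a dateTime.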

Kelvin 2022-02-16T15:16:06.268159Z

So the protocol approach might look something like this?

(defprotocol RDFDataType
  (-to-string [x]))

(defrecord MyCustomDataType [value]
  RDFDataType
  (-to-string [x] (str (:value x))))

2022-02-16T15:22:51.891219Z

Yes, though you may also want a separate datatype-uri protocol that provides just the datatype URIs.
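A sketch of how the two protocols might combine into a SPARQL typed literal (`"lexical"^^<datatype>`). `RDFDatatypeURI`, `-datatype-uri`, `escape-lexical` and `typed-literal` are hypothetical, and the datatype IRI for the example record is a made-up placeholder.

```clojure
(ns example.typed-literal
  (:require [clojure.string :as string]))

;; One protocol for the lexical form, one for the datatype IRI.
(defprotocol RDFDataType
  (-to-string [x] "Returns the lexical form of x."))

(defprotocol RDFDatatypeURI
  (-datatype-uri [x] "Returns the datatype IRI of x, as a string."))

;; Example user type; the datatype IRI is a placeholder.
(defrecord MyCustomDataType [value]
  RDFDataType
  (-to-string [_] (str value))
  RDFDatatypeURI
  (-datatype-uri [_] "http://example.org/datatypes#custom"))

(defn escape-lexical
  "Escapes backslashes, double quotes and line breaks for a SPARQL
  double-quoted string literal."
  [s]
  (-> s
      (string/replace "\\" "\\\\")   ; backslashes first, so later escapes survive
      (string/replace "\"" "\\\"")
      (string/replace "\n" "\\n")
      (string/replace "\r" "\\r")))

(defn typed-literal
  "Renders x as a SPARQL typed literal: \"lexical\"^^<datatype>."
  [x]
  (str "\"" (escape-lexical (-to-string x)) "\"^^<" (-datatype-uri x) ">"))
```

Keeping the datatype IRI in its own protocol lets a type supply a datatype without also owning the serialisation logic.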

2022-02-16T15:36:21.875479Z

We actually have a similar library to this, btw, written by a colleague of mine and not yet open sourced. The main difference is that it provides a thin layer of macros to aid query building, binding runtime values into the query, etc. It provides more or less full SPARQL 1.1 support. I’m personally on the fence about data vs macros; there are pros and cons. It’s interesting anyway because it works on top of a fine-grained data AST. I mention it because the use case I often want to support is query generation and splicing runtime values into queries, which is why I was advocating protocols and not representing URIs as strings. In particular, I was wondering whether the reason you have the “<http://foo/>” syntax instead of using (URI. "") is essentially because of quoting?

Kelvin 2022-02-17T14:21:40.160819Z

So there were a few reasons I went with the angle bracket syntax:
1. The thought of using types/protocols simply did not occur to me, as I said in the main thread.
2. I wanted to have a SPARQL-like feel when using Flint, hence variables start with ?, blank nodes start with _, and yes, IRIs are surrounded by angle brackets.
3. Related to the above, I thought that the angle bracket syntax was the easiest for a user to grasp, especially if they already knew SPARQL.
4. Specific to URI, but I don't want to be dependent on Java classes like that (not really a problem with custom protocols, but then that's another thing for the user to become familiar with).

2022-02-17T16:05:40.799899Z

Firstly, I’m really sorry if I’m coming across as belligerent; I genuinely LOVE everything you’ve done with flint, except this one small detail. Again, I’m very grateful to you for sharing this! 🙇
> 2. I wanted to have a SPARQL-like feel when using Flint, hence variables start with ?, blank nodes start with _, and yes IRIs are surrounded by angle brackets.
Yes, I totally love this. I just feel like the problem is that URIs are the odd ones out, i.e. all of your other types are proper ones… e.g. you don’t parse strings to check if they’re xsd:dateTimes; you map the underlying #inst datatype. Years ago, in an early version of grafter, I did the same, allowing strings to be treated like URIs in some contexts, and later regretted it a lot.
> 4. Specific to URI but I don’t want to be dependent on Java classes like that (not really a problem with custom protocols but then that’s another thing for the user to become familiar with).
Coming at this from a different angle: I think if you’re doing serious RDF work, you’ll already be using a robust library for properly handling the datatypes somewhere. In the work I do, we care a lot about the correct, unambiguous interpretation of data. A lot of the time, results from one query will be used in a subsequent query, and having to coerce those types is a problem. I think a good division of responsibilities is for flint to provide protocols so that this coercion can be delegated to a library like this and called at the right time (as late as possible). You don’t need to pick a backend or favour any particular one (I’d certainly rather you remained agnostic); however, supporting built-in platform-provided types and mapping them to xsd/rdf where possible makes sense and needn’t contradict this goal.
For example, adding support for java.net.URI is unambiguous… and in cljc the ClojureScript side would just use goog.Uri; a user could then optionally choose to use a library like https://github.com/henryw374/uri. Similarly, another dep can always extend support for rdf4j/jena or rdf.js.
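A small sketch of why a real URI type removes the ambiguity: with a protocol extended to `java.net.URI` and `String`, only URI values render as IRIs, and strings always render as literals, so nothing has to be parsed to decide. `ToSparqlTerm` and `to-term` are hypothetical names, not Flint's API.

```clojure
(ns example.uri-render
  (:import (java.net URI)))

;; Hypothetical protocol for turning host values into SPARQL terms.
(defprotocol ToSparqlTerm
  (to-term [x] "Renders x as a SPARQL term string."))

(extend-protocol ToSparqlTerm
  URI
  (to-term [u] (str "<" u ">"))   ; a URI value is always an IRI
  String
  ;; A string is always a plain literal, never an IRI. pr-str escapes
  ;; quotes, backslashes and control characters such as \n using escape
  ;; sequences that SPARQL string literals also accept.
  (to-term [s] (pr-str s)))
```

The same text renders differently depending only on its type, which is the property the message above argues for.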

2022-02-17T16:10:55.097189Z

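Incidentally, you can resolve this problem, where a quoted query keeps the constructor call as an unevaluated list:

(-> '{:select *
      :where [[(URI. "") ?p ?o]]}
    :where
    ffirst) ;; => (URI. "") ; a list, not a java.net.URI
By registering a data reader (e.g. in data_readers.cljc) and writing:
'{:select *
  :where [[#flint/uri "" ?p ?o]]}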
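A self-contained sketch of that data reader. In a project the tag-to-function mapping would normally go in `data_readers.clj`/`.cljc`; here `*data-readers*` is bound around `read-string` instead so the example runs on its own. The `read-uri` function is hypothetical, and the tag follows the namespaced `#flint/uri` form used above.

```clojure
(ns example.uri-reader
  (:import (java.net URI)))

(defn read-uri
  "Reader function for a hypothetical #flint/uri tag: turns the string
  into a java.net.URI at read time."
  [s]
  (URI. s))

;; Binding *data-readers* stands in for a data_readers.clj entry.
(def query
  (binding [*data-readers* {'flint/uri read-uri}]
    (read-string
     "{:select * :where [[#flint/uri \"http://example.org/s\" ?p ?o]]}")))
```

Because the tag is resolved by the reader, the query data already holds a `java.net.URI` and quoting no longer matters for it.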

2022-02-17T16:12:34.345039Z

You shouldn’t coin #uri in case Clojure does in the future (I’d love it if it did)… but you could probably use a namespaced tag like #f/uri 🙂

Kelvin 2022-02-17T22:45:19.265949Z

First of all, don't worry about coming off as belligerent; it's all constructive criticism, so it's all good. The main thing that's holding me back (besides obvious time constraints and such) is that this would be a major change to how Flint queries and updates are written, especially if we break backwards compatibility by getting rid of the <> syntax. (Though I guess it's an argument for "make this change early in the project's life.") I'll have to talk about it with my coworkers who are already using Flint before making such changes.

Kelvin 2022-02-17T22:48:40.389749Z

As a side note, the quoting issue was actually never one of the reasons why I chose the <> syntax. That's partly because there are worse ways to run into quoting issues, e.g. when you're building Flint queries from the ground up (and in that case I just quote the symbols individually).

2022-02-16T15:42:57.945909Z

Sorry, I’ll restate this… If you used underlying types, which I think would be more precise/secure and ultimately extensible, you could write something like this:

(flint/format-query (let [dataset (URI. "")]
                      `{:select ~'*
                        :where [[~dataset ~'?p ~'?o]]}))

2022-02-16T15:46:20.550009Z

Essentially, a lot of SPARQL queries we write end up binding variables in the query to some user-supplied data. Being able to validate that data by reading it into a proper type rather than a string is useful and more consistent, and it ensures types can be perfectly mapped/preserved into xsd.
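A sketch of the validation point: parsing user input into `java.net.URI` rejects text that could not be a URI, including attempts to break out of `<...>` in a spliced query. `parse-iri` is a hypothetical helper, and `java.net.URI` follows RFC 2396 rather than the IRI spec, so treat it as an approximation of a real RDF library's IRI type.

```clojure
(ns example.validate-iri
  (:import (java.net URI URISyntaxException)))

(defn parse-iri
  "Parses user-supplied text into an absolute java.net.URI, or throws."
  [^String s]
  (let [u (URI. s)]   ; throws URISyntaxException on illegal characters
    (when-not (.isAbsolute u)
      (throw (IllegalArgumentException. (str "Not an absolute IRI: " s))))
    u))
```

An input such as `http://example.org/s> } DROP ALL #` fails at parse time because `>` and spaces are illegal in a URI, so it never reaches the query text.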

2022-02-16T15:58:12.987589Z

The ~' unquoting is the sort of thing my colleague’s macro layer lets you avoid (though I’m not advocating that).
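A short sketch of why the `~'` is needed and of the alternative Kelvin mentions (quoting symbols individually). Syntax quote namespace-qualifies symbols, so `` `?p `` becomes a qualified symbol and `` `* `` resolves to `clojure.core/*`; plain data with individually quoted symbols avoids that. The dataset URI is a placeholder.

```clojure
(ns example.quoting
  (:import (java.net URI)))

(def dataset (URI. "http://example.org/dataset"))

;; Syntax quote: ~' is required to keep ?p, ?o and * unqualified.
(def with-syntax-quote
  `{:select ~'* :where [[~dataset ~'?p ~'?o]]})

;; Plain data: quote each symbol, and the URI value is used directly.
(def with-plain-quote
  {:select '* :where [[dataset '?p '?o]]})
```

Both build the same map; the trade-off is noise in the template versus noise on each symbol.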

quoll 2022-02-16T16:17:53.111639Z

This may be really useful for me to do testing with. I’ll be wrapping Asami in RDF/SPARQL soon, since my new job is back in that sphere, and building strings in Clojure is really annoying.

2022-02-17T14:02:32.740229Z

Whilst you’re here, so to speak… could you elaborate on Asami/Datomic’s approach to the quoting/binding issue with query data? i.e. you have a query as data like:

'{:select *
  :where [[?s ?p ?o]]}
And you want to bind ?s to a specific URI?

quoll 2022-02-17T15:55:36.857179Z

Well, if the ?s is supposed to be bound to a specific URI, then it would be ideal if the pattern were written that way. e.g. if the uri is . Also, I’m going to pretend that I have a uri reader so I can write it in code as #uri "" Then:

'{:select *
  :where [[#uri "" ?p ?o]]}
But I appreciate that you want to see a ?s bound in there. In SPARQL, then you’d just say:
'{:select [#uri "" :as ?s ?p ?o]
  :where [[#uri "" ?p ?o]]}
But neither of those answer the question 🙂 The way to do it in Asami is:
(q '{:find [?s ?p ?o]
     :in $ ?s
     :where [[?s ?p ?o]]}
   the-db #uri "")
Datomic is nearly the same:
(q '{:find [?s ?p ?o]
     :in $ ?s
     :where [[?p* :db/ident ?p] [?s ?p* ?o]]}
   the-db #uri "")

2022-02-17T15:58:51.999539Z

Thanks… I was actually asking a slightly different question… How would you resolve this (deliberate) class of problem in the Asami/Datomic world?

(let [s (URI. "http://s")]
   '{:select *
     :where [[s ?p ?o]]})
=> {:select * :where [[s ?p ?o]]} ;; <=== not what we want :-)

quoll 2022-02-17T16:14:37.861959Z

Sorry, I don’t follow. Are you trying to get the * in the select clause to have 3 bindings (with ?s already set to the required URI)? Also, I’m presuming that you have pseudo-code there, since you didn’t unquote the s in the where clause

2022-02-17T16:15:18.478859Z

the lack of unquote is the deliberate mistake

2022-02-17T16:20:27.820559Z

i.e. a correct way to do it is here: https://clojurians.slack.com/archives/C09GHBXRC/p1645026177945909 I’m really just asking how people tend to do this dynamic query generation stuff with Asami/Datomic… in particular binding replacement… e.g. another, more restricted, way to do it is in rdf4j, where you can take a query like:

SELECT * WHERE { ?s ?p ?o }
and provide what is essentially a map of bindings to rebind, e.g. {'?s (URI. "")}, which will then essentially rewrite that variable in the query.
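The same binding-replacement idea can be sketched on query data with `clojure.walk/postwalk-replace`. This is not how rdf4j implements it; `bind-vars` is a hypothetical helper, and it is deliberately naive: it replaces the variable everywhere, including in a `:select` projection or nested sub-queries, which a real implementation would need to handle.

```clojure
(ns example.rebind
  (:require [clojure.walk :as walk])
  (:import (java.net URI)))

(defn bind-vars
  "Replaces query variables (symbols like ?s) with values throughout a
  query-as-data form."
  [bindings query]
  (walk/postwalk-replace bindings query))

(def query '{:select * :where [[?s ?p ?o]]})

(def bound (bind-vars {'?s (URI. "http://example.org/s")} query))
```

`bound` is the same query with a `java.net.URI` in the subject position and `?p`/`?o` left free.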

2022-02-17T16:23:30.295439Z

A small issue with the way I linked is that the use of syntax quote makes the query noisier, as you need to use ~' to avoid materialising namespace-qualified symbols. I’m personally OK with it, but if you dislike it you can also avoid it with a thin layer of macros.

quoll 2022-02-17T16:25:22.624609Z

I’m still a bit confused as to why the binding with the :in clause doesn’t work?

quoll 2022-02-17T16:27:30.423899Z

internally, Asami literally creates a map of {?s (URI. "")} when you do this

2022-02-17T16:27:31.434669Z

OK, that bit does answer the question, and it's essentially the same approach as in rdf4j etc.

2022-02-17T16:29:29.323489Z

Thanks. I was basically asking because I suspected the :in clause in Datomic/Asami did this… I thought this sort of thing might be an acceptable solution for flint, treating URI types more rigorously etc… cc @kelvin063

2022-02-17T16:32:08.092309Z

@quoll is there a way in the API to do that kind of substitution on fragments of a query, or must it always be applied at the top level?

quoll 2022-02-17T16:32:39.793689Z

Come to think of it, it’s not quite a map. Asami results are just seqs of vectors, with metadata for the column names. So:

^{:cols '[?s ?p ?o]}
([s1 p1 o1]
 [s2 p2 o2]
 [s3 p3 o3])
What actually happens with the binding is that it starts resolving the query with:
^{:cols '[?s]}
([<>])
So it’s a single-row result with one column. This then gets joined to the result of [?s ?p ?o] which shares that ?s variable.
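A toy version of that join, using the result shape described above (rows as vectors, column names in `:cols` metadata). `join` is a hypothetical, naive nested-loop natural join, not Asami's implementation, and keywords stand in for URIs.

```clojure
(ns example.join)

(defn join
  "Naive natural join of two result sets. Each is a seq of row vectors
  with its column names in :cols metadata."
  [left right]
  (let [lcols  (:cols (meta left))
        rcols  (:cols (meta right))
        lset   (set lcols)
        shared (filter lset rcols)   ; columns on both sides
        extra  (remove lset rcols)   ; columns only on the right
        lidx   (zipmap lcols (range))
        ridx   (zipmap rcols (range))]
    (with-meta
      (for [l left
            r right
            :when (every? #(= (l (lidx %)) (r (ridx %))) shared)]
        (into l (map #(r (ridx %)) extra)))
      {:cols (into (vec lcols) extra)})))

;; The pre-binding: one row, one column, like the bound ?s.
(def prebound (with-meta [[:s1]] {:cols '[?s]}))

;; Results for the pattern [?s ?p ?o].
(def pattern-results
  (with-meta [[:s1 :p1 :o1] [:s2 :p2 :o2]] {:cols '[?s ?p ?o]}))
```

Joining `prebound` with `pattern-results` keeps only the `:s1` row, with columns `[?s ?p ?o]`; pre-binding several rows or columns works the same way.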

2022-02-17T16:34:00.060139Z

👍 makes perfect sense

quoll 2022-02-17T16:34:01.073889Z

Not sure what you mean by “fragments of a query”?

quoll 2022-02-17T16:34:53.079879Z

You can provide bindings to start with, so it need not be one-column/one-row bindings. You can pre-bind multiple columns and rows, and start the query from that point

2022-02-17T16:35:58.939899Z

Say you’re dynamically generating a query, and have a sub-function that will ultimately generate a FILTER NOT EXISTS { bgps } block… but some of those BGPs contain lexically scoped bindings you want rewritten.

quoll 2022-02-17T16:48:04.552129Z

Hmmm, that’s interesting. I’ve never thought about prebinding things you want removed.

2022-02-17T16:48:39.159229Z

Yeah; it’s a useful trick

quoll 2022-02-17T16:49:45.695469Z

OK… in Asami there is currently no way. It would be easy enough to do (from a query resolution perspective), but I’d need to figure out the API to get the data into the right place

quoll 2022-02-17T16:50:21.300739Z

especially if you had, say, 2 or more NOT EXISTS, then how do you get the bindings to the right one? 🙂

2022-02-17T16:51:02.922459Z

lexical scope is one way

2022-02-17T16:57:20.011159Z

e.g.

`{:select ~'*
 :where [[~(sub-query arg1)]
         [~(sub-query arg2)]]}

2022-02-17T16:58:14.881429Z

Then the sub-queries internally replace their bindings through the mechanism discussed above.
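A sketch of fragment-level rebinding: each hypothetical sub-query function rewrites only its own template, so a binding stays scoped to the fragment it produced. The `[:filter (not-exists ...)]` shape loosely follows Flint's where-clause style but is an assumption here, and keywords stand in for URIs.

```clojure
(ns example.fragments
  (:require [clojure.walk :as walk]))

(defn not-exists-fragment
  "Hypothetical sub-query builder: ?x is the outer variable, ?s is
  rebound to s inside this fragment only."
  [s]
  (walk/postwalk-replace
   {'?s s}
   '[:filter (not-exists [[?x :ex/linkedTo ?s]])]))

(defn build-query
  "Composes the outer pattern with one fragment per value."
  [s1 s2]
  {:select '*
   :where  [['?x '?p '?o]
            (not-exists-fragment s1)
            (not-exists-fragment s2)]})

(def q (build-query :ex/a :ex/b))
```

Two fragments from the same template get different bindings, which is what makes lexical scope a workable answer to "which NOT EXISTS gets which binding".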

quoll 2022-02-17T17:11:22.464199Z

For this kind of thing, I build the query in code, and not as a series of quote/unquoting operations

quoll 2022-02-17T17:12:01.766309Z

(assoc query :where (concat first-part second-part)) kind of thing
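A minimal variant of that data-first approach. It uses `update` with `into` rather than `assoc` with `concat`, so `:where` stays a vector instead of becoming a lazy seq. `add-patterns` and the `:ex/...` predicates are illustrative names only.

```clojure
(ns example.compose)

(def base-query '{:select [?s ?o] :where [[?s :ex/p ?o]]})

(defn add-patterns
  "Appends extra where-clause patterns to a query map, keeping :where a vector."
  [query patterns]
  (update query :where into patterns))

(def extended (add-patterns base-query '[[?o :ex/q ?z]]))
```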

2022-02-17T17:15:26.812049Z

yeah at some point of complexity that’s always required; the question is how far can you get without sacrificing too much in the way of intent/expressivity 🙂

quoll 2022-02-17T17:16:24.961669Z

I think it needs to be a part of the query grammar