#rdf
2022-10-14
Rowland Watkins 01:10:01

@simongray this is awesome to hear, nicely done! I had a similar use case, needing to keep graphs separate and then reference them from within other graphs (digital signatures in RDF). Hope the rest of your work continues smoothly!

simongray 09:10:01

Hey SPARQL experts… is there any way to make a query like this performant? I have a bunch of many-to-many relationships between RDF resources of a specific rdf:type, and I want to capture all of these triples. My naïve query (using Aristotle's query syntax) is the following:

'[:bgp
  [?s1 :rdf/type :ontolex/LexicalConcept]
  [?s1 ?rel ?s2]
  [?s2 :rdf/type :ontolex/LexicalConcept]]
This is so incredibly slow that I don't know when it returns, so I have given up on it. Is there any way to construct a query that can capture the relations between :ontolex/LexicalConcept resources without stalling? As an alternative, I am currently attempting to simply get each ?s1 resource out of Jena (this takes about 1-2 seconds on my machine) and then query the graph for each of their relations separately in some kind of loop construct, since I will at least have some sense of progress when doing that…
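A minimal sketch of the set-based idea in plain Clojure, with no Jena or Aristotle involved: collect the :ontolex/LexicalConcept resources once, then keep only the triples whose subject and object are both in that set. The `triples` vector, the :c1/:w1 resources, the :ex/... properties and the `concept-relations` function are all hypothetical, for illustration only.

```clojure
;; Hypothetical toy data: each triple is a [subject predicate object] vector.
(def triples
  [[:c1 :rdf/type :ontolex/LexicalConcept]
   [:c2 :rdf/type :ontolex/LexicalConcept]
   [:c3 :rdf/type :ontolex/LexicalConcept]
   [:w1 :rdf/type :ontolex/LexicalEntry]
   [:c1 :ex/hypernym :c2]
   [:c2 :ex/hypernym :c3]
   [:c1 :ex/similar :c3]
   [:w1 :ontolex/evokes :c1]]) ; object is a concept, subject is not

(defn concept-relations
  "Hypothetical helper: returns every [s rel o] triple where both s and o
  are typed :ontolex/LexicalConcept."
  [triples]
  ;; Pass 1: build the set of concepts once.
  (let [concepts (into #{}
                       (keep (fn [[s p o]]
                               (when (and (= p :rdf/type)
                                          (= o :ontolex/LexicalConcept))
                                 s)))
                       triples)]
    ;; Pass 2: a constant-time set lookup per triple replaces a nested loop
    ;; over all pairs of concepts.
    (filterv (fn [[s _ o]]
               (and (contains? concepts s)
                    (contains? concepts o)))
             triples)))

(concept-relations triples)
;; => [[:c1 :ex/hypernym :c2] [:c2 :ex/hypernym :c3] [:c1 :ex/similar :c3]]
```

Two linear passes over the data, whatever the number of concepts; the :w1 triple is dropped because its subject is not a concept.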

Eric Scott 19:10:26

Do you know in advance which properties hold between Lexical Concepts? If so, and if Aristotle supports property paths, would it make sense to rephrase it as below?

Eric Scott 19:10:13

SELECT ?s1 ?s2 WHERE { ?s1 (prop:one|prop:two|...|prop:n) ?s2 . ?s1 a ontolex:LexicalConcept . ?s2 a ontolex:LexicalConcept . }

Eric Scott 19:10:13

Or, if you need to capture which relation holds, then maybe use a VALUES clause to bind ?rel instead (a property path doesn't bind the predicate).
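For intuition, an in-memory analogue of the VALUES approach, sketched in plain Clojure: ?rel is bound to each known predicate in turn, so only the triples using that predicate are looked up, and each result still records which relation matched. The data, `known-rels`, and `relations-for-known-rels` are hypothetical; a real triple store would normally already maintain a predicate index instead of building one with `group-by`.

```clojure
;; Hypothetical toy data and property names, for illustration only.
(def triples
  [[:c1 :rdf/type :ontolex/LexicalConcept]
   [:c2 :rdf/type :ontolex/LexicalConcept]
   [:c3 :rdf/type :ontolex/LexicalConcept]
   [:c1 :ex/hypernym :c2]
   [:c2 :ex/hypernym :c3]
   [:c1 :ex/similar :c3]
   [:c1 :rdfs/label "dog"]])

;; Plays the role of VALUES ?rel { ... }: the predicates are known up front.
(def known-rels [:ex/hypernym :ex/similar])

(defn relations-for-known-rels
  "Binds ?rel to each known predicate and looks up only the triples using
  that predicate, instead of scanning every triple with ?rel unbound."
  [triples rels]
  (let [by-pred  (group-by second triples) ; predicate -> triples index
        concept? (set (for [[s _ o] (by-pred :rdf/type)
                            :when (= o :ontolex/LexicalConcept)]
                        s))]
    (vec (for [rel     rels
               [s _ o] (by-pred rel)
               :when   (and (concept? s) (concept? o))]
           {:s1 s :rel rel :s2 o}))))

(relations-for-known-rels triples known-rels)
;; => [{:s1 :c1 :rel :ex/hypernym :s2 :c2}
;;     {:s1 :c2 :rel :ex/hypernym :s2 :c3}
;;     {:s1 :c1 :rel :ex/similar :s2 :c3}]
```

The :rdfs/label triple is never examined in the lookup loop because its predicate is not in `known-rels`.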

simongray 06:10:50

I know for sure the possible relations, but they are optional in every case. Would this make a difference?

simongray 06:10:48

Right now I simply fetch a set of all LexicalConcepts and mapcat them using the following function:

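(require '[clojure.string :as str]
         '[ont-app.vocabulary.core :as voc]) ; assumed source of voc/uri-for, voc/keyword-for
(import '[org.apache.jena.rdf.model Model Statement])

(defn synset-rel-table
  "A performant way to fetch synset->synset relations for `synset` in `model`.

  The function basically exists because I wasn't able to perform a similar query
  in a performant way, e.g. doing this for all synsets would take ~45 minutes."
  [^Model model synset]
  ;; Walk every statement whose subject is the synset's resource.
  (->> (.listProperties (.getResource model (voc/uri-for synset)))
       (iterator-seq)
       (keep (fn [^Statement statement]
               (let [prefix "" ; the synset URI prefix was lost from this log;
                               ; as written, "" matches every object
                     obj    (str (.getObject statement))]
                 ;; Keep only statements whose object looks like a synset URI.
                 (when (str/starts-with? obj prefix)
                   [synset
                    (str (.getPredicate statement))
                    (voc/keyword-for obj)]))))))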
this works fine.

Eric Scott 11:10:45

I don't think it would make a difference, and I'm not even sure if it would solve your performance problem. Looks like you have a solution.

rickmoynihan 08:10:10

How big is the set of :ontolex/LexicalConcepts? I'm assuming both sets are large, and the problem is that ?s1 ?rel ?s2 is ungrounded. What storage engine are you using? How is it configured? Is it TDB2? And what query optimizer are you using? Is it any faster if you reorder the clauses so that ?s1 and ?s2 are resolved first and ?s1 ?rel ?s2 last? If reordering doesn't help, my instinct for optimising would be, as Eric Scott says, to find the distinct set of ?rels used in lexical concepts and bind that with a VALUES clause or a property path.
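To try the reordering question without Jena, here is a toy left-to-right evaluator that reports how many partial solutions survive each pattern, so two clause orders can be compared on the same data. The helpers `variable?`, `unify` and `eval-bgp` and the data are hypothetical; this is not Jena's or Aristotle's API, and it deliberately has no optimiser and no index.

```clojure
(defn variable?
  "Query variables are symbols whose name starts with ?, e.g. ?s1."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn unify
  "Extends binding map b so that pattern matches triple t, or returns nil."
  [b pattern t]
  (reduce (fn [acc [term value]]
            (cond
              (not (variable? term)) (if (= term value) acc (reduced nil))
              (contains? acc term)   (if (= (get acc term) value) acc (reduced nil))
              :else                  (assoc acc term value)))
          b
          (map vector pattern t)))

(defn eval-bgp
  "Evaluates patterns strictly left to right: every partial solution is
  checked against every triple (a nested-loop join). Returns the solutions
  and the number of partial solutions that survive each pattern."
  [triples patterns]
  (reduce (fn [{:keys [solutions sizes]} pattern]
            (let [extended (vec (for [b solutions
                                      t triples
                                      :let [b' (unify b pattern t)]
                                      :when b']
                                  b'))]
              {:solutions extended
               :sizes     (conj sizes (count extended))}))
          {:solutions [{}] :sizes []}
          patterns))

;; Hypothetical toy data.
(def triples
  [[:c1 :rdf/type :ontolex/LexicalConcept]
   [:c2 :rdf/type :ontolex/LexicalConcept]
   [:c3 :rdf/type :ontolex/LexicalConcept]
   [:c1 :ex/hypernym :c2]
   [:c2 :ex/hypernym :c3]
   [:c1 :ex/similar :c3]])

(def as-asked
  '[[?s1 :rdf/type :ontolex/LexicalConcept]
    [?s1 ?rel ?s2]
    [?s2 :rdf/type :ontolex/LexicalConcept]])

(def types-first
  '[[?s1 :rdf/type :ontolex/LexicalConcept]
    [?s2 :rdf/type :ontolex/LexicalConcept]
    [?s1 ?rel ?s2]])

(:sizes (eval-bgp triples as-asked))    ;; => [3 6 3]
(:sizes (eval-bgp triples types-first)) ;; => [3 9 3]
```

Both orders return the same three solutions. On this toy data the types-first order peaks higher because the two rdf:type patterns share no variable and form every pair of concepts before ?s1 ?rel ?s2 filters them; real data may behave differently, so measuring both is worthwhile.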

simongray 14:10:36

@U06HHF230 This was for an in-memory graph with no reasoner attached, so it should be the absolute fastest Jena can possibly get. Gonna try your suggestions when I have my work computer available. Thanks!

simongray 14:10:14

(I obviously don't need this now, but it's always nice to gain a deeper understanding of the issue)

rickmoynihan 14:10:08

I wouldn’t for a second assume an in memory graph is fastest.

rickmoynihan 14:10:54

I'm guessing it's not actually bottoming out to SPARQL at all then, but is just using basic graph pattern joins.

rickmoynihan 14:10:48

Typically the graph API stuff doesn't do query planning etc., and isn't clever about the order of joins.

rickmoynihan 14:10:27

As @U051N6TTC said, when you know how triple stores are implemented, the performance profile is pretty predictable, though it can be unpredictable in its own way 🤣 if the distribution of the data isn't known, or if additional optimisations stop applying. Many triple stores evaluate queries in BGP order; it's the easy thing to do, as you don't need a query optimiser. As I understand it, the low-hanging fruit for query planning is to just shuffle the order of the BGPs in the query based on what you know about the cardinality of the indexes, so that you evaluate the most restrictive (smallest) sets first for joins. If there's no optimiser (as is usually the case with in-memory stuff), ordering the BGPs so the expected smallest come first is on you (the developer).
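A minimal sketch of that "smallest first" heuristic, assuming a pattern's cardinality can be estimated by counting the triples that match its constants (a crude stand-in for real index statistics). The helpers `variable?`, `cardinality` and `order-by-cardinality`, and the data, are hypothetical.

```clojure
(defn variable?
  "Query variables are symbols whose name starts with ?, e.g. ?s1."
  [x]
  (and (symbol? x) (= \? (first (name x)))))

(defn cardinality
  "Crude estimate: how many triples match pattern when every variable is a
  wildcard. Ignores repeated variables such as [?x ?p ?x]."
  [triples pattern]
  (count (filter (fn [t]
                   (every? (fn [[term value]]
                             (or (variable? term) (= term value)))
                           (map vector pattern t)))
                 triples)))

(defn order-by-cardinality
  "Returns patterns sorted so the most restrictive (smallest) come first.
  sort-by is stable, so ties keep their original order."
  [triples patterns]
  (sort-by #(cardinality triples %) patterns))

;; Hypothetical toy data: 4 concepts and 4 relations between them.
(def triples
  [[:c1 :rdf/type :ontolex/LexicalConcept]
   [:c2 :rdf/type :ontolex/LexicalConcept]
   [:c3 :rdf/type :ontolex/LexicalConcept]
   [:c4 :rdf/type :ontolex/LexicalConcept]
   [:c1 :ex/hypernym :c2]
   [:c2 :ex/hypernym :c3]
   [:c3 :ex/hypernym :c4]
   [:c1 :ex/similar :c3]])

(def thread-query
  '[[?s1 :rdf/type :ontolex/LexicalConcept]
    [?s1 ?rel ?s2]
    [?s2 :rdf/type :ontolex/LexicalConcept]])

(mapv #(cardinality triples %) thread-query) ;; => [4 8 4]
(order-by-cardinality triples thread-query)
;; => ([?s1 :rdf/type :ontolex/LexicalConcept]
;;     [?s2 :rdf/type :ontolex/LexicalConcept]
;;     [?s1 ?rel ?s2])
```

Sorting on raw cardinality alone moves both rdf:type patterns ahead of ?s1 ?rel ?s2. Because those two share no variable, joining them yields every pair of concepts (4 × 4 = 16 here) before the relation pattern filters them, which is why planners typically also favour patterns that share a variable with those already evaluated.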

simongray 15:10:16

I see... hm...

simongray 15:10:27

This is messing with my head.

simongray 11:10:21

… I managed to make it performant by doing some interop with the Java objects. Now it takes a few seconds to generate the output I need.