rdf

simongray 2023-08-07T12:17:27.536539Z

I’m toying with the idea of using word clouds to illustrate one-to-many relationships in my graphs, i.e. such relationships would be represented with a single edge per predicate going from the central subject resource to a single word cloud for that predicate, which then contains all of the object resources in ?s ?p ?o for that ?p. The resources contained in these word clouds would then be sized according to their indegree. In the end, you will have a subject and a handful of predicates, each illustrated with an edge going to a word cloud. The hypothesis here is that this would create a scalable way to illustrate otherwise messy graphs with lots of outgoing edges. Now, in order to do this I will need to precompute the indegree for a number of resources in my dataset. How do I best do this? I will need to limit it to only connections between resources of a specific rdf:type, since I am only interested in illustrating these. Trying to do this naively doesn’t seem performant at all, so might I be better served doing multiple queries and letting Clojure do the hard work? I’m using Apache Jena if that matters.

quoll 2023-08-11T15:36:31.856479Z

This is my question too. (also, there is a typo in the query, but I assume it just came up due to typing the query rather than copy/paste)

quoll 2023-08-11T15:42:01.526469Z

If, for some reason, the query planner chose to do both rdf:type expressions first due to them being the same expression (bound to different values) then this would be an outer product, which might explain the timeout. If that’s what’s happening, one possibility would be to filter the ?s values, either using:

WHERE {
  ?o rdf:type ontolex:LexicalConcept .
  ?s ?p ?o .
  FILTER (EXISTS {?s rdf:type ontolex:LexicalConcept}) .
}
Or maybe just bump the planner a bit with:
WHERE {
  ?o rdf:type ontolex:LexicalConcept .
  ?s ?p ?o .
  ?s rdf:type ?t .
  FILTER (?t = ontolex:LexicalConcept) .
}

quoll 2023-08-11T15:42:57.130149Z

It won’t execute the same way, but it ensures that you get the same results as the first working query, and then filters them down to just the subjects you want. So it should only be a little bit slower.

2023-08-09T16:21:09.023369Z

Does changing the order make a difference, or is the graph planner smart enough to do that for you?

simongray 2023-08-14T09:00:35.486649Z

Yes, the typo was just me being bad at copy-pasting. I will try to reorder the statements or use a filter. Thank you @quoll and @eric.d.scott!

simongray 2023-08-14T09:21:03.448439Z

Takes <10 minutes now! 😍

2023-08-14T23:12:35.132449Z

Which way was fastest?

simongray 2023-08-07T13:02:12.937109Z

Specifically, the basic query

SELECT ?o (COUNT(*) AS ?indegree)
WHERE {
  ?o rdf:type ontolex:LexicalConcept .
  ?s ?p ?o .
}
GROUP BY ?o
completes in ~6 minutes on my machine, which is decent enough. However, if I also try to limit the incoming connections to those emanating from an ontolex:LexicalConcept, I haven’t managed to make the query actually complete yet:
SELECT ?o (COUNT(*) AS ?indegree)
WHERE {
  ?o rdf:type ontolex:LexicalConcept .
  ?s ?p ?o .
  s rdf:type ontolex:LexicalConcept .
}
GROUP BY ?o
the only added part is s rdf:type ontolex:LexicalConcept .
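
For reference, here is a minimal sketch of what the second query is meant to compute, done client-side (the "let the client do the hard work" option). It is written in Python rather than Clojure purely for illustration, and it assumes the triples have already been fetched into memory as plain (subject, predicate, object) tuples. The function name `typed_indegree`, the `ex:` resources, and the sample predicates are hypothetical and not taken from the actual dataset.

```python
from collections import Counter

RDF_TYPE = "rdf:type"
CONCEPT = "ontolex:LexicalConcept"


def typed_indegree(triples, rdf_class=CONCEPT):
    """Count incoming edges per object, keeping only edges whose subject
    and object are both instances of rdf_class."""
    triples = set(triples)  # an RDF graph is a set, so drop duplicate triples
    # First pass: collect every resource typed as rdf_class.
    members = {s for s, p, o in triples if p == RDF_TYPE and o == rdf_class}
    # Second pass: count edges that start and end at a member.
    indegree = Counter()
    for s, p, o in triples:
        if s in members and o in members:
            indegree[o] += 1
    return indegree


# Illustrative data only (hypothetical resources).
triples = [
    ("ex:a", RDF_TYPE, CONCEPT),
    ("ex:b", RDF_TYPE, CONCEPT),
    ("ex:c", RDF_TYPE, CONCEPT),
    ("ex:w", RDF_TYPE, "ontolex:LexicalEntry"),
    ("ex:a", "skos:related", "ex:c"),
    ("ex:b", "skos:broader", "ex:c"),
    ("ex:w", "ontolex:evokes", "ex:c"),  # ignored: ex:w is not a LexicalConcept
    ("ex:a", "skos:broader", "ex:b"),
]
result = typed_indegree(triples)
```

The two passes mirror the two rdf:type patterns in the query: the membership set is built once, so the subject check becomes a set lookup rather than a second join. Whether this beats a single SPARQL query on a large Jena store is untested here.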

simongray 2023-08-07T13:03:33.701119Z

I can certainly live with the first query, but would definitely prefer the second one since it is slightly more fit for purpose.

simongray 2023-08-15T07:26:41.034189Z

I only tried it the first way, but can try the other way too