2022-10-15 rdf | Clojure Slack Archive

rdf

simongray 2022-10-15T11:39:46.656009Z

It might be Aristotle being funky.

quoll 2022-10-15T12:06:42.396409Z

Hopefully!

quoll 2022-10-15T12:10:23.211999Z

I have been getting some bad behavior on my notebook by Stardog lately. Perform a simple query, over and over, adding just a single bgp at a time, with each step taking ~300ms. Then add one more, and suddenly the query won’t return. Drop the database, recreate it, reload the data, and try the entire query again… 300ms.

quoll 2022-10-15T12:11:04.343579Z

Happened last week. Happened again yesterday. And that’s a commercial system

🤷 1

simongray 2022-10-17T09:01:31.397749Z

If I have one criticism of triplestores it would be that the computational complexity of certain queries can be a bit opaque... But then again, I've also had stupid things happening in SQL when I forgot to create an index in advance or some other things related to the stateful nature of SQL databases.

simongray 2022-10-17T09:03:48.703399Z

I do sometimes wish there were a way to explicitly prepare special indices for certain queries, though, like in SQL.

quoll 2022-10-17T12:58:53.966269Z

The benefit of RDF is that everything is already indexed (if you’re not Jena). The one exception to that is string indexing, but that’s why several databases use Lucene (Mulgara did this back in 2002. Stardog does it now. I presume others do too)

quoll 2022-10-17T13:00:02.800269Z

As for the computational complexity… I am protected from that since I’ve implemented these query engines more than once, so it seems clear to me 😊

Kelvin 2022-10-17T14:31:15.185419Z

> everything is already indexed (if you’re not Jena)

Wouldn’t that depend on the backend implementation? Though I guess you’re talking about Jena’s built-in backends like Fuseki (for our app we’re using AWS Neptune which definitely has indexing, but we’re using Jena in the application layer).

2022-10-17T17:29:43.296769Z

I had the impression that TDB fully indexes all triples. Am I mistaken?

Kelvin 2022-10-17T17:30:54.445149Z

https://jena.apache.org/documentation/tdb/architecture.html#triple-and-quad-indexes

quoll 2022-10-17T17:30:55.320399Z

It should do, yes. Waaaaaay back, Jena had no indexing. I think that is still possible.

quoll 2022-10-17T17:31:10.511049Z

(like… depending on the storage you use)

quoll 2022-10-17T17:31:40.159019Z

I have some idea of how Andy was implementing TDB, as he was asking me a lot of questions about it (this was 2006/2007)

quoll 2022-10-17T17:32:55.035609Z

The main point is that RDF is usually indexed every way. It’s more about the join operations, and the order they’re done in
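
To illustrate the point about indexes versus join order, here is a minimal sketch in Python. It is a toy in-memory store, not how Jena, Stardog, or Mulgara are actually implemented: the `TripleStore` class, the `evaluate` function, and the sample data are all hypothetical. Every triple is stored in three orderings (SPO, POS, OSP), so any single pattern is a cheap lookup; what changes the cost is the order in which the patterns are joined.

```python
from collections import defaultdict


class TripleStore:
    """Toy store that keeps every triple in three index orderings."""

    def __init__(self):
        self.triples = set()
        self.spo = defaultdict(lambda: defaultdict(set))
        self.pos = defaultdict(lambda: defaultdict(set))
        self.osp = defaultdict(lambda: defaultdict(set))

    def add(self, s, p, o):
        self.triples.add((s, p, o))
        self.spo[s][p].add(o)
        self.pos[p][o].add(s)
        self.osp[o][s].add(p)

    def match(self, s=None, p=None, o=None):
        """Return triples matching the pattern; None acts as a wildcard."""
        if s is not None and p is not None:
            objs = self.spo.get(s, {}).get(p, ())
            return [(s, p, x) for x in objs if o is None or x == o]
        if p is not None and o is not None:
            return [(x, p, o) for x in self.pos.get(p, {}).get(o, ())]
        if o is not None and s is not None:
            return [(s, x, o) for x in self.osp.get(o, {}).get(s, ())]
        if s is not None:
            return [(s, p2, o2) for p2, objs in self.spo.get(s, {}).items() for o2 in objs]
        if p is not None:
            return [(s2, p, o2) for o2, subs in self.pos.get(p, {}).items() for s2 in subs]
        if o is not None:
            return [(s2, p2, o) for s2, preds in self.osp.get(o, {}).items() for p2 in preds]
        return list(self.triples)


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def evaluate(store, patterns):
    """Nested-loop join of the patterns in the order given.

    Returns (solutions, lookups), where lookups counts index probes.
    """
    solutions = [{}]
    lookups = 0
    for pattern in patterns:
        next_solutions = []
        for binding in solutions:
            # Substitute variables bound so far; unbound ones become wildcards.
            bound = [binding.get(t) if is_var(t) else t for t in pattern]
            lookups += 1
            for triple in store.match(*bound):
                candidate = dict(binding)
                consistent = True
                for term, value in zip(pattern, triple):
                    if is_var(term) and candidate.setdefault(term, value) != value:
                        consistent = False
                        break
                if consistent:
                    next_solutions.append(candidate)
        solutions = next_solutions
    return solutions, lookups


# Hypothetical data: 100 people, exactly one of them named "Alice".
store = TripleStore()
for i in range(100):
    store.add(f"person{i}", "rdf:type", "Person")
    store.add(f"person{i}", "name", "Alice" if i == 42 else f"Name {i}")

type_pattern = ("?x", "rdf:type", "Person")
name_pattern = ("?x", "name", "Alice")

# Same query, two join orders.
sols_a, lookups_a = evaluate(store, [type_pattern, name_pattern])
sols_b, lookups_b = evaluate(store, [name_pattern, type_pattern])

print(sols_a == sols_b, lookups_a, lookups_b)
```

Both orders return the same single solution, but starting with the unselective `rdf:type` pattern probes the indexes once per person, while starting with the selective `name` pattern needs only two probes. A real query planner makes this choice from statistics about the data, which is one reason stale statistics can make a plan go bad.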

simongray 2022-10-18T06:22:53.757389Z

@kelvin063 Isn't Fuseki a web application frontend for Jena?

simongray 2022-10-18T06:25:16.324609Z

@quoll note that there is both TDB and TDB2 now, so things may have changed since then.

2022-10-19T09:00:17.146649Z

@quoll

> I have been getting some bad behavior on my notebook by Stardog lately. Perform a simple query, over and over, adding just a single bgp at a time, with each step taking ~300ms. Then add one more, and suddenly the query won’t return. Drop the database, recreate it, reload the data, and try the entire query again… 300ms.

We see this sort of thing too, quite frequently. You might want to try running stardog optimize rather than dropping and replacing the whole thing. The query plans can be sensitive to data changes.

Optimize in stardog does 3 things:

1. It recalculates stats
2. It clears any tombstoned records on deletions that may have occurred
3. It compacts the indexes

https://docs.stardog.com/operating-stardog/database-administration/storage-optimize#database-optimization

One problem we have is that afaik you can’t force stardog to do 2 and 3 inside an update transaction. Though you can force it to do 1 by setting index.statistics.update.blocking.ratio.

At least this was true of stardog 6; afaik much of it still stands on 7 and 8; though the performance profile of those has changed a little since they switched from their homegrown indexing to using rocksdb.

👍 1

quoll 2022-10-18T10:11:41.261059Z

I doubt things would get worse in the process! 🙂

simongray 2022-10-18T13:27:17.534219Z

Unless worse is better!

Kelvin 2022-10-18T14:24:23.751619Z

@simongray Fuseki is a SPARQL server: https://jena.apache.org/documentation/fuseki2/ - the “webapp” is just one form, the other being a more traditional RDF server (the latter of which we use for app dev)

quoll 2022-10-15T00:37:40.501849Z

Ummm… how many triples do you have?

quoll 2022-10-15T00:38:19.665119Z

Because that sounds ridiculously slow.

quoll 2022-10-15T00:39:37.902959Z

Jena used to be bad, but I thought it was better
