This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-10-15
I have been getting some bad behavior on my notebook by Stardog lately. Perform a simple query, over and over, adding just a single bgp at a time, with each step taking ~300ms. Then add one more, and suddenly the query won’t return. Drop the database, recreate it, reload the data, and try the entire query again… 300ms.
If I have one criticism of triplestores it would be that the computational complexity of certain queries can be a bit opaque... But then again, I've also had stupid things happening in SQL when I forgot to create an index in advance or some other things related to the stateful nature of SQL databases.
I do sometimes wish there were a way to explicitly prepare special indices for certain queries, though, like in SQL.
The benefit of RDF is that everything is already indexed (if you’re not Jena). The one exception to that is string indexing, but that’s why several databases use Lucene (Mulgara did this back in 2002. Stardog does it now. I presume others do too)
As for the computational complexity… I am protected from that since I’ve implemented these query engines more than once, so it seems clear to me 😊
> everything is already indexed (if you’re not Jena)
Wouldn’t that depend on the backend implementation? Though I guess you’re talking about Jena’s built-in backends like Fuseki (for our app we’re using AWS Neptune which definitely has indexing, but we’re using Jena in the application layer).
I had the impression that TDB fully indexes all triples. Am I mistaken?
It should do, yes. Waaaaaay back, Jena had no indexing. I think that is still possible.
I have some idea of how Andy was implementing TDB, as he was asking me a lot of questions about it (this was 2006/2007)
The main point is that RDF is usually indexed every way. It’s more about the join operations, and the order they’re done in
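To illustrate the point about indexing and join order, here is a minimal sketch in Python. It is a toy in-memory store, not how Stardog, Jena, or Mulgara are implemented; the names `TripleStore`, `match`, and `evaluate` are made up for this example. Each triple is indexed three ways (SPO, POS, OSP), so any single pattern is a cheap lookup, yet the same two-pattern query does very different amounts of work depending on which pattern runs first.

```python
class TripleStore:
    """Toy store that indexes every triple three ways: SPO, POS, OSP."""

    def __init__(self):
        self.spo, self.pos, self.osp = {}, {}, {}

    def add(self, s, p, o):
        self.spo.setdefault(s, {}).setdefault(p, set()).add(o)
        self.pos.setdefault(p, {}).setdefault(o, set()).add(s)
        self.osp.setdefault(o, {}).setdefault(s, set()).add(p)

    def match(self, s=None, p=None, o=None):
        """Yield triples matching a pattern; None acts as a wildcard."""
        if s is not None and p is not None:
            objs = self.spo.get(s, {}).get(p, set())
            for obj in (objs if o is None else objs & {o}):
                yield (s, p, obj)
        elif p is not None and o is not None:
            for subj in self.pos.get(p, {}).get(o, set()):
                yield (subj, p, o)
        elif o is not None and s is not None:
            for pred in self.osp.get(o, {}).get(s, set()):
                yield (s, pred, o)
        elif s is not None:
            for pred, objs in self.spo.get(s, {}).items():
                for obj in objs:
                    yield (s, pred, obj)
        elif p is not None:
            for obj, subjs in self.pos.get(p, {}).items():
                for subj in subjs:
                    yield (subj, p, obj)
        elif o is not None:
            for subj, preds in self.osp.get(o, {}).items():
                for pred in preds:
                    yield (subj, pred, o)
        else:
            for subj, by_pred in self.spo.items():
                for pred, objs in by_pred.items():
                    for obj in objs:
                        yield (subj, pred, obj)


def is_var(term):
    return isinstance(term, str) and term.startswith("?")


def evaluate(store, patterns):
    """Nested-loop join, left to right. Returns (solutions, triples touched)."""
    bindings, work = [{}], 0
    for pattern in patterns:
        next_bindings = []
        for b in bindings:
            # Substitute already-bound variables; unbound ones become wildcards.
            bound = [b.get(t) if is_var(t) else t for t in pattern]
            for triple in store.match(*bound):
                work += 1
                nb = dict(b)
                if all(nb.setdefault(t, v) == v
                       for t, v in zip(pattern, triple) if is_var(t)):
                    next_bindings.append(nb)
        bindings = next_bindings
    return bindings, work


store = TripleStore()
for i in range(1000):
    store.add(f"p{i}", "type", "Person")
store.add("p7", "name", "Alice")

type_pattern = ("?x", "type", "Person")
name_pattern = ("?x", "name", "Alice")

# Same query, two join orders.
solutions_a, work_a = evaluate(store, [type_pattern, name_pattern])
solutions_b, work_b = evaluate(store, [name_pattern, type_pattern])
print(work_a, work_b)
```

Both orders return the same single solution, but starting with the unselective `type` pattern touches every person first. A real planner picks the order from statistics, which is why stale statistics can turn a fast query into one that never returns.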
@U02FU7RMG8M Isn't Fuseki a web application frontend for Jena?
@U051N6TTC note that there is both TDB and TDB2 now, so things may have changed since then.
@U4P4NREBY Fuseki is a SPARQL server: https://jena.apache.org/documentation/fuseki2/ - the “webapp” is just one form, the other being a more traditional RDF server (the latter which we use for app dev)
@U051N6TTC
> I have been getting some bad behavior on my notebook by Stardog lately. Perform a simple query, over and over, adding just a single bgp at a time, with each step taking ~300ms. Then add one more, and suddenly the query won’t return.
Drop the database, recreate it, reload the data, and try the entire query again… 300ms.
We see this sort of thing too, quite frequently.
You might want to try running stardog optimize rather than dropping and replacing the whole thing.
The query plans can be sensitive to data changes.
Optimize in stardog does 3 things.
1. It recalculates stats
2. It clears any tombstoned records on deletions that may have occurred
3. It compacts the indexes
https://docs.stardog.com/operating-stardog/database-administration/storage-optimize#database-optimization
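The three steps above can be pictured with a toy model in Python. This is only a conceptual sketch under loud assumptions: `ToyIndex` and its methods are invented for illustration and say nothing about Stardog's actual storage format or internals. It assumes deletions only mark rows as tombstones and that statistics are refreshed only when `optimize` runs.

```python
from collections import Counter


class ToyIndex:
    """Hypothetical append-only index with tombstones and lazily updated stats."""

    def __init__(self):
        self.rows = []          # each row is [triple, is_tombstoned]
        self.stats = Counter()  # predicate -> count, as of the last optimize

    def add(self, triple):
        self.rows.append([triple, False])

    def delete(self, triple):
        # Deleting only marks the row; the space is reclaimed later.
        for row in self.rows:
            if row[0] == triple and not row[1]:
                row[1] = True

    def optimize(self):
        # Steps 2 and 3: drop tombstoned rows and rewrite the rest in sorted order.
        self.rows = sorted(row for row in self.rows if not row[1])
        # Step 1: recompute the statistics a planner would consult.
        self.stats = Counter(triple[1] for triple, _ in self.rows)


index = ToyIndex()
index.add(("a", "knows", "b"))
index.add(("b", "knows", "c"))
index.add(("a", "name", "Alice"))
index.delete(("b", "knows", "c"))

rows_before = len(index.rows)    # tombstoned row still takes up space
stats_before = dict(index.stats)  # stats have not seen any of the changes

index.optimize()
rows_after = len(index.rows)
stats_after = dict(index.stats)
```

In this model a planner reading `stats` between updates and `optimize` would be working from outdated counts, which is one way a query plan can become sensitive to data changes.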
One problem we have is that afaik you can’t force stardog to do 2 and 3 inside an update transaction. Though you can force it to do 1 by setting index.statistics.update.blocking.ratio.
At least this was true of stardog 6; afaik much of it still stands on 7 and 8; though the performance profile of those has changed a little since they switched from their homegrown indexing to using rocksdb.