rdf

simongray 2023-06-26T08:14:09.573449Z

Anybody here have any experience using GROUP BY and GROUP_CONCAT(...) in SPARQL? My use case is returning limited results in alphanumeric order, but each row is a potential one-to-many relationship. Is it possibly more performant to skip grouping and just use ORDER BY along with a bunch of additional queries? Unfortunately, it seems like GROUP_CONCAT basically just turns the grouped items into a concatenated string and even removes important information such as the enclosing <...>, which makes it a bit of a hack IMO.

simongray 2023-06-27T09:44:08.386659Z

Thanks for confirming my suspicions, Paula. I suspect that I will have to create a relatively complex query that fetches important relations first to get some of the properties of an ordered result…

simongray 2023-06-27T13:30:32.937499Z

I’m thinking that a way to get around the performance implications of ORDER BY might be to pull out a set of known relations first in a specific order and then just LIMIT the rest in another query, not caring particularly about order (do it client-side). It sucks that this will make my queries significantly more complex. :S
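A rough sketch of the client-side half of that idea, assuming the two queries have already been run and their rows are available as plain Python lists. The function name `merge_results` and the case-insensitive sort are hypothetical choices for illustration, not anything from a SPARQL library:

```python
def merge_results(priority_rows, other_rows, sort_key=str.casefold):
    """Put the explicitly ordered rows first, then the rest sorted client-side.

    priority_rows: rows from the first query, already in the desired order.
    other_rows:    rows from the second (LIMITed, unordered) query.
    """
    seen = set(priority_rows)
    # Drop rows that the first query already returned, then sort what is left.
    rest = sorted((row for row in other_rows if row not in seen), key=sort_key)
    return list(priority_rows) + rest


if __name__ == "__main__":
    known = ["b", "a"]            # order chosen by the first query
    unordered = ["d", "a", "C"]   # whatever the LIMITed query happened to return
    print(merge_results(known, unordered))
```

The sort only touches the LIMITed remainder, so the cost of ordering moves to the client and is bounded by the limit rather than by the full result size.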

quoll 2023-06-27T13:32:22.297529Z

LIMIT may speed things up, but sometimes it needs to do complex work to return the first item. It really depends on the query

quoll 2023-06-26T15:05:45.581699Z

The implementations I’m aware of (including the ones I’ve done) accomplish grouping by subquerying, i.e. the query is executed for each group. That sounds expensive, though it’s not. When you’re inside a query already, there isn’t a lot of overhead for an implementation to do it this way. You have a lot of data in the query already resolved, and caching is used a lot. But, it is an extra inner loop. On the other hand, ORDER BY is potentially very expensive, since the entire result needs to be eagerly returned, and then reordered. Quicksort is O(n log n) on average, but it can still take time if it’s a large result.

quoll 2023-06-26T15:08:05.409659Z

I would expect GROUP BY to perform better. Maybe accumulating results yourself would be a little quicker, or more deterministic than GROUP_CONCAT, but the grouping ought to be based on indexes, and therefore you’ll have your results grouped, even if the ordering in the groups isn’t predetermined.
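For the "accumulating results yourself" option, a minimal sketch, assuming the ungrouped query returns one (key, value) pair per row and that the rows have already been fetched into a Python list. The name `group_rows` and the example IRIs are made up for illustration:

```python
def group_rows(rows):
    """Collect (key, value) rows into {key: [values]}.

    Keys keep the order in which they first appear, and values keep
    row order, so the terms stay intact instead of being flattened
    into one concatenated string.
    """
    groups = {}
    for key, value in rows:
        groups.setdefault(key, []).append(value)
    return groups


if __name__ == "__main__":
    rows = [
        ("<http://example.org/a>", "<http://example.org/x>"),
        ("<http://example.org/b>", "<http://example.org/y>"),
        ("<http://example.org/a>", "<http://example.org/z>"),
    ]
    for key, values in group_rows(rows).items():
        print(key, values)
```

Unlike GROUP_CONCAT, this keeps each value as a separate item, at the cost of transferring one row per relation instead of one row per group.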

quoll 2023-06-26T15:09:06.721119Z

(`GROUP_CONCAT` demonstrates so much potential, and yet is almost useless. It frustrates me so much)