rdf

simongray 2023-04-18T09:21:31.309759Z

Anyone here have any suggestions for a predicate to use to mark former/alternative IDs?

simongray 2023-04-18T09:24:18.355129Z

I have split some resources in a new version of a dataset into multiple IDs, but would like to preserve the older ID in some way. Currently, I just define my own predicate. Perhaps that is preferable?

simongray 2023-04-18T13:18:28.572659Z

Unfortunately not:
> This property is intended to be used with non-literal values. This property is an inverse property of Is Replaced By.

I may have another use for it though. Thanks. I’m just gonna go with my own property 🙂

2023-04-18T13:18:46.565469Z

@simongray: are the IDs essentially strings? In which case there is https://www.w3.org/TR/skos-reference/#notations. The “recommendation” with notations is that you make them typed literals with a user-defined datatype, where the datatype is a link that identifies the notation scheme. I understand why they did this, but it can be a bit annoying if you have multiple notations and want to do things like group by them, as triple stores typically don’t index the datatype or provide a means to efficiently query/access it. In this case I’ve often argued for also creating a sub-property of skos:notation for each notation system, so you can more easily separate and query for them.

👍 2
🤔 1
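
To make the trade-off above concrete, here is a minimal sketch in plain Python rather than a real triple store. Triples are tuples, a literal is a (lexical form, datatype) pair, and every name in it (ex:formerId, ex:currentId, ex:OldScheme, ex:NewScheme, and the IDs themselves) is hypothetical. It only illustrates why a per-scheme sub-property can turn a scan-and-filter on datatypes into a lookup by predicate; how an actual store indexes this may differ.

```python
from collections import defaultdict

# All names below are hypothetical and only serve the illustration.
SKOS_NOTATION = "skos:notation"
EX_FORMER_ID = "ex:formerId"    # assumed sub-property of skos:notation
EX_CURRENT_ID = "ex:currentId"  # assumed sub-property of skos:notation

# Style A: a single predicate; the notation scheme lives only in the datatype.
style_a = [
    ("ex:r1", SKOS_NOTATION, ("A-17", "ex:OldScheme")),
    ("ex:r1", SKOS_NOTATION, ("2023-0042", "ex:NewScheme")),
    ("ex:r2", SKOS_NOTATION, ("A-17", "ex:OldScheme")),
]

# Style B: the same literals, but one sub-property per notation scheme.
style_b = [
    ("ex:r1", EX_FORMER_ID, ("A-17", "ex:OldScheme")),
    ("ex:r1", EX_CURRENT_ID, ("2023-0042", "ex:NewScheme")),
    ("ex:r2", EX_FORMER_ID, ("A-17", "ex:OldScheme")),
]


def by_datatype(triples, datatype):
    """Visit every notation triple and test its datatype, like a FILTER would."""
    return [
        (s, o[0])
        for s, p, o in triples
        if p == SKOS_NOTATION and o[1] == datatype
    ]


def build_predicate_index(triples):
    """Group triples by predicate, roughly what a predicate index provides."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o[0]))
    return index


former_via_filter = by_datatype(style_a, "ex:OldScheme")
former_via_predicate = build_predicate_index(style_b)[EX_FORMER_ID]
```

Both variables end up holding the same (resource, former ID) pairs, but the second is read straight out of the predicate grouping without inspecting any datatype.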
simongray 2023-04-20T15:30:56.594199Z

(I forgot to answer you directly: yes, the IDs are strings.)

curtosis 2023-04-18T19:01:22.950329Z

Another trick I’ve used is using language tags (in the x-foo private use tag space of RFC 5646), typically for things like “interpret this literal as a regex”. It’s (intentionally) not as interpretable as skos:notation, but it has the benefit that language tags are already handled as a way of selecting “versions” of strings. That said, if you’re splitting the concepts then dc:replaces carries that semantics along.

🙏 1
2023-04-21T09:02:25.903339Z

I think they generalise to the same thing, so solving one would solve 99% of the other. Though in my experience I’ve never found a store which optimised filters by converting them into direct index lookups analogous to BGPs. Though I do vaguely recall @quoll mentioning she knew of a store that did this.

quoll 2023-04-21T13:40:11.027589Z

This is actually the approach used in SQL engines

quoll 2023-04-21T13:41:01.743789Z

they use filters, unless the tested column is indexed, in which case the index is used instead

👍 1
quoll 2023-04-21T13:46:33.070359Z

If we have this in SPARQL then a WHERE clause that looks like:

?entity ex:property ?x . FILTER(?x = 42)
would be equivalent to:
?entity ex:property 42
But it depends entirely on the implementation
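
As a rough illustration of that equivalence, the sketch below models triples as Python tuples and compares "bind, then filter" with a direct lookup in a (predicate, object) index. The data and the function names are made up for the example. It uses plain Python integers, so it ignores the difference between SPARQL value equality in a FILTER and exact term matching in a triple pattern, which is one reason the rewrite depends on the implementation.

```python
from collections import defaultdict

# Hypothetical data, for illustration only.
triples = [
    ("ex:a", "ex:property", 42),
    ("ex:b", "ex:property", 7),
    ("ex:c", "ex:property", 42),
    ("ex:d", "ex:other", 42),
]


def scan_then_filter(triples, predicate, value):
    """?entity <predicate> ?x . FILTER(?x = value), evaluated naively."""
    bound = [(s, o) for s, p, o in triples if p == predicate]
    return sorted(s for s, o in bound if o == value)


def build_po_index(triples):
    """Map (predicate, object) to the set of subjects, like a POS-style index."""
    index = defaultdict(set)
    for s, p, o in triples:
        index[(p, o)].add(s)
    return index


def direct_lookup(po_index, predicate, value):
    """?entity <predicate> value, answered by a single index lookup."""
    return sorted(po_index.get((predicate, value), set()))


po_index = build_po_index(triples)
filtered = scan_then_filter(triples, "ex:property", 42)
looked_up = direct_lookup(po_index, "ex:property", 42)
```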

quoll 2023-04-21T14:13:40.054759Z

I know we did do this in Mulgara: if the filter said FILTER(?x > 42) then it became an index lookup. This was extended to boolean operations in filters, so that if you had FILTER(?x > 42 && ?x < 64), or even multiple filters like FILTER(?x > 42) . FILTER(?x < 64), then this became a single range lookup in the index. (I know this because I was the one to implement it! It was back in 2004, IIRC. I think I blogged about it.)
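
The idea of folding several comparisons into one range lookup can be sketched as follows. This is not Mulgara's code, just a small stdlib-only Python model under assumed names: the index is a sorted list of (value, subject) pairs, and only the strict operators > and < are handled.

```python
import bisect

# Hypothetical index on ex:property: (value, subject) pairs sorted by value.
index = sorted([
    (10, "ex:a"), (42, "ex:b"), (43, "ex:c"), (50, "ex:d"),
    (63, "ex:e"), (64, "ex:f"), (90, "ex:g"),
])
values = [value for value, _ in index]


def merge_bounds(filters):
    """Fold comparisons such as (">", 42) and ("<", 64) into one open interval."""
    low, high = float("-inf"), float("inf")
    for op, bound in filters:
        if op == ">":
            low = max(low, bound)
        elif op == "<":
            high = min(high, bound)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return low, high


def range_lookup(filters):
    """Answer all filters with a single slice of the sorted index."""
    low, high = merge_bounds(filters)
    start = bisect.bisect_right(values, low)  # first entry with value > low
    stop = bisect.bisect_left(values, high)   # first entry with value >= high
    return [subject for _, subject in index[start:stop]]


in_range = range_lookup([(">", 42), ("<", 64)])
```

Whether the two bounds arrive in one filter expression or in two separate filters makes no difference here, since both end up in the same list of comparisons before the lookup.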

curtosis 2023-04-21T14:20:01.246339Z

Indeed, that was the whole thing behind Netezza — just put filters in an FPGA at the hard drive controller and Always Be Table Scanning. 😛

curtosis 2023-04-21T17:02:49.222209Z

obviously less directly applicable without the seek penalty of spinning rust, but still was a neat idea at the time.

2023-04-19T08:33:24.003229Z

Why is that better than using a custom ^^regex datatype?

quoll 2023-04-19T14:03:32.203149Z

I’m with Rick on this one. Repurposing language tags may work for an internal application, but it hurts the portability of your data

curtosis 2023-04-19T17:17:47.287429Z

That’s specifically the reason — it’s purely for internal use (we use it to drive generated outputs from the authoritative RDF) and really shouldn’t have much meaning outside. More specifically, it’s how a downstream processor should interpret a given literal. We don’t currently use it, but the sub-tag mechanism could also be useful for future applications (e.g. en-US-x-regex). I think subtypes are the wrong way to do this, but haven’t really thought too hard about it.
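
A downstream processor of the kind described here might look roughly like the sketch below. It is a guess at the shape of such a processor, not the actual one: the helper names are invented, the private-use subtag "regex" follows the en-US-x-regex example above, and literals are passed in as a lexical form plus an optional language tag.

```python
import re


def has_private_subtag(tag, wanted):
    """True if `wanted` appears after the private-use singleton "x" in the tag."""
    if tag is None:
        return False
    parts = tag.lower().split("-")
    if "x" not in parts:
        return False
    return wanted.lower() in parts[parts.index("x") + 1:]


def to_matcher(lexical, tag):
    """Build a string predicate from a literal and its language tag."""
    if has_private_subtag(tag, "regex"):
        pattern = re.compile(lexical)             # interpret the literal as a regex
    else:
        pattern = re.compile(re.escape(lexical))  # interpret it as plain text
    return lambda text: pattern.fullmatch(text) is not None


is_colour = to_matcher("colou?r", "x-regex")  # tagged as a regex
is_plain = to_matcher("a.c", "en")            # ordinary English literal
```

The tag only changes how the consumer reads the literal; the stored string is the same either way.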

curtosis 2023-04-19T17:24:39.339049Z

As Rick also alluded to, unlike custom literal datatypes, filtering by language tags is (at least in theory) already built in to query engines. I should add that in this particular use case it’s always only a filter condition, and never part of a search query (“find all the s,p,o where o is a literal@x-private”). That may affect the choice we made.

curtosis 2023-04-19T17:31:13.231459Z

All that said, we did sort of start down that path for the reasons I mentioned above, without exhaustively considering the custom ^^regex approach. I still think it’s the right one for us, but could be convinced to revisit it.

curtosis 2023-04-19T17:32:00.813889Z

Appreciate the responses!

2023-04-20T12:23:52.809099Z

Well all compliant query engines should let you filter with something like:

FILTER(datatype(?foo) = regex:datatypeuri)
My point was that they’re not usually indexed or easily queryable outside of a filter, at least not without https://www.w3.org/TR/rdf11-mt/#rdf-entailment

👍 1
curtosis 2023-04-20T13:34:32.307519Z

yeah… and I suspect (but have not validated) that the language code path is such a common filter that it’s optimized in some ways, where the datatype path seems less likely to be. (naïvely: one possibly-interned string compare vs. two lookups and a uri comparison)

curtosis 2023-04-20T13:35:05.294869Z

standard have-not-written-my-own-query-engine-disclaimer.rdf