#rdf
2023-04-18
simongray09:04:31

Anyone here have any suggestions for a predicate to use to mark former/alternative IDs?

simongray09:04:18

I have split some resources in a new version of a dataset into multiple IDs, but would like to preserve the older ID in some way. Currently, I just define my own predicate. Perhaps that is preferable?

simongray13:04:28

Unfortunately not:

> This property is intended to be used with non-literal values. This property is an inverse property of Is Replaced By.

I may have another use for it though. Thanks. I’m just gonna go with my own property 🙂

rickmoynihan13:04:46

@simongray: are the IDs essentially strings? In which case there is https://www.w3.org/TR/skos-reference/#notations. The “recommendation” with notations is that you assign them a user-defined typed literal datatype, where the datatype is a link that defines the notation scheme. I understand why they did this, but it can be a bit annoying if you have multiple notations and you want to do things like group by them etc., as triple stores typically don’t index the datatype or provide a means to query/access it efficiently. In this case I’ve often argued for creating a subProperty of skos:notation for each notation system too, so you can more easily separate and query for them.

👍 3
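A minimal Turtle sketch of the sub-property idea above — every name in the `ex:` namespace is made up for illustration:

```turtle
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/ns#> .

# Hypothetical property dedicated to the legacy-ID notation system.
ex:legacyId rdfs:subPropertyOf skos:notation .

# The old ID survives as a typed literal; the (made-up) datatype IRI
# identifies the notation scheme the string belongs to.
ex:resource-123 ex:legacyId "A-0042"^^ex:legacyIdType .
```

With a dedicated sub-property you can query `ex:legacyId` directly instead of filtering all `skos:notation` values by datatype.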
simongray15:04:56

(I forgot to answer you directly: yes, the IDs are strings.)

curtosis19:04:22

Another trick I’ve used is language tags (in the x-foo RFC 5646 private-use tag space), typically for things like “interpret this literal as a regex”. It’s (intentionally) not as interpretable as skos:notation, but it has the benefit of being handled as a way of selecting “versions” of strings. That said, if you’re splitting the concepts, then dc:replaces carries that semantics along.

🙏 1
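A sketch of the private-use tag trick, assuming a hypothetical `x-regex` tag and `ex:` vocabulary:

```turtle
@prefix ex: <http://example.org/ns#> .

# RFC 5646 reserves the "x-" subtag space for private use; here the
# tag signals a downstream processor to read the literal as a regex.
ex:rule-1 ex:pattern "foo.*bar"@x-regex .
```

Turtle’s LANGTAG grammar accepts `x-regex`, and it is well-formed BCP 47, so the data stays syntactically valid RDF even though the tag only means something to your own tooling.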
rickmoynihan08:04:24

Why is that better than using a custom ^^regex datatype?

quoll14:04:32

I’m with Rick on this one. Repurposing language tags may work for an internal application, but it hurts the portability of your data

curtosis17:04:47

That’s specifically the reason — it’s purely for internal use (we use it to drive generated outputs from the authoritative RDF) and really shouldn’t have much meaning outside. More specifically, it’s how a downstream processor should interpret a given literal. We don’t currently use it, but the sub-tag mechanism could also be useful for future applications (e.g. en-US-x-regex). I think subtypes are the wrong way to do this, but haven’t really thought too hard about it.

curtosis17:04:39

As Rick also alluded to, unlike custom literal datatypes, filtering by language tags is (at least in theory) already built into query engines. I should add that in this particular use case it’s always only a filter condition, and never part of a search query (“find all the s,p,o where o is a literal@x-private”). That may affect the choice we made.

curtosis17:04:13

All that said, we did sort of start down that path for the reasons I mentioned above, without exhaustively considering the custom ^^regex approach. I still think it’s the right one for us, but could be convinced to revisit it.

curtosis17:04:00

Appreciate the responses!

rickmoynihan12:04:52

Well, all compliant query engines should let you filter with something like:

FILTER(datatype(?foo) = regex:datatypeuri)

My point was that they’re not usually indexed or easily queryable outside of a filter, at least not without https://www.w3.org/TR/rdf11-mt/#rdf-entailment

👍 2
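The two filtering styles under discussion, side by side, as a SPARQL sketch (the `ex:` datatype IRI and the `x-regex` tag are both hypothetical):

```sparql
PREFIX ex: <http://example.org/ns#>

# Datatype route: keep only literals typed with the custom datatype.
SELECT ?s ?o WHERE {
  ?s ?p ?o .
  FILTER(datatype(?o) = ex:regexDatatype)
}

# Language-tag route: the same selection via a private-use tag
# (shown commented out, since a query file holds one SELECT):
# SELECT ?s ?o WHERE {
#   ?s ?p ?o .
#   FILTER(lang(?o) = "x-regex")
# }
```

Both are plain FILTERs, so absent engine-specific optimization, neither is answered from an index the way a basic graph pattern would be.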
curtosis13:04:32

yeah… and I suspect (but have not validated) that the language-code path is such a common filter that it’s optimized in some ways, where the datatype path seems less likely to be. (Naïvely: one possibly-interned string compare vs. two lookups and a URI comparison.)

curtosis13:04:05

standard have-not-written-my-own-query-engine-disclaimer.rdf

rickmoynihan09:04:25

I think they generalise to the same thing, so solving one would solve 99% of the other. Though in my experience I’ve never found a store which optimised filters by converting them into direct index lookups analogous to BGPs. Though I do vaguely recall @U051N6TTC mentioning she knew of a store that did this.

quoll13:04:11

This is actually the approach used in SQL engines

quoll13:04:01

they use filters, unless the tested column is indexed, in which case the index is used instead

👍 2
quoll13:04:33

If we have this in SPARQL then a WHERE clause that looks like:

?entity ex:property ?x . FILTER(?x = 42)

would be equivalent to:

?entity ex:property 42

But it depends entirely on the implementation

quoll14:04:40

I know we did do this in Mulgara, since if the filter said FILTER(?x > 42) then it became an index lookup, and this was extended to boolean operations in filters, so that if you had: FILTER(?x > 42 && ?x < 64) or even multiple patterns like: FILTER(?x > 42) . FILTER(?x < 64) then this became a single range lookup in the index. (I know this because I was the one to implement it! It was back in 2004, IIRC. I think I blogged about it)

curtosis14:04:01

Indeed, that was the whole thing behind Netezza — just put filters in an FPGA at the hard drive controller and Always Be Table Scanning. 😛

curtosis17:04:49

Obviously less directly applicable without the seek penalty of spinning rust, but it was still a neat idea at the time.