This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2023-04-18
Anyone here have any suggestions for a predicate to use to mark former/alternative IDs?
I have split some resources in a new version of a dataset into multiple IDs, but would like to preserve the older ID in some way. Currently, I just define my own predicate. Perhaps that is preferable?
Would this work? https://www.dublincore.org/specifications/dublin-core/dcmi-terms/terms/replaces/
Unfortunately not:
> This property is intended to be used with non-literal values. This property is an inverse property of Is Replaced By.
I may have another use for it though. Thanks. I’m just gonna go with my own property 🙂
@simongray: are the IDs essentially strings? In which case there is https://www.w3.org/TR/skos-reference/#notations.
The “recommendation” with notations is that you assign them a user-defined typed literal datatype, where the datatype is a link that defines the notation scheme.
I understand why they did this, but it can be a bit annoying if you have multiple notations and you want to do things like group by them, since triple stores typically don’t index the datatype or provide a means to efficiently query or access it.
In this case I’ve often argued for creating a subProperty of skos:notation for each notation-system too, so you can more easily separate and query for them.
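To make the trade-off above concrete, here is a minimal sketch in plain Python. It is a toy in-memory model, not how any particular triple store works: triples are tuples, the "index" is a dict keyed by predicate, and every IRI under http://example.org/ (including oldId and newId) is hypothetical.

```python
from collections import defaultdict
from typing import NamedTuple


class Literal(NamedTuple):
    value: str
    datatype: str  # IRI of the user-defined notation datatype


# Hypothetical IRIs, for illustration only.
EX = "http://example.org/"
OLD_SCHEME = EX + "oldIdScheme"
NEW_SCHEME = EX + "newIdScheme"

# Style A: a single predicate, with the scheme carried by the literal datatype.
by_datatype = [
    (EX + "concept1", "skos:notation", Literal("A-17", OLD_SCHEME)),
    (EX + "concept1", "skos:notation", Literal("n0042", NEW_SCHEME)),
    (EX + "concept2", "skos:notation", Literal("A-18", OLD_SCHEME)),
]

# Style B: one (assumed) sub-property of skos:notation per notation scheme.
by_subproperty = [
    (EX + "concept1", EX + "oldId", "A-17"),
    (EX + "concept1", EX + "newId", "n0042"),
    (EX + "concept2", EX + "oldId", "A-18"),
]


def index_by_predicate(triples):
    """Group (subject, object) pairs under their predicate."""
    index = defaultdict(list)
    for s, p, o in triples:
        index[p].append((s, o))
    return index


def notations_style_a(triples, datatype):
    """Scan every skos:notation value and test its datatype (a filter)."""
    pairs = index_by_predicate(triples)["skos:notation"]
    return [(s, o.value) for s, o in pairs if o.datatype == datatype]


def notations_style_b(triples, predicate):
    """Read the pairs straight out of the predicate index (a lookup)."""
    return list(index_by_predicate(triples)[predicate])


old_a = notations_style_a(by_datatype, OLD_SCHEME)
old_b = notations_style_b(by_subproperty, EX + "oldId")
```

Both styles return the same pairs; the difference is that style A has to visit every notation and inspect its datatype, while style B can go directly to the entries for one scheme.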
Another trick I’ve used is using language tags (in the x-foo RFC5646 private use tag space), typically for things like “interpret this literal as a regex”. It’s (intentionally) not as interpretable as skos:notation, but it has the benefit of being handled as a way of selecting “versions” of strings.
That said, if you’re splitting the concepts then dc:replaces carries that semantics along.
Why is that better than using a custom ^^regex datatype?
I’m with Rick on this one. Repurposing language tags may work for an internal application, but it hurts the portability of your data
That’s specifically the reason: it’s purely for internal use (we use it to drive generated outputs from the authoritative RDF) and really shouldn’t have much meaning outside. More specifically, it’s how a downstream processor should interpret a given literal. We don’t currently use it, but the sub-tag mechanism could also be useful for future applications (e.g. en-US-x-regex). I think subtypes are the wrong way to do this, but haven’t really thought too hard about it.
As Rick also alluded to, unlike custom literal datatypes, filtering by language tags is (at least in theory) already built in to query engines. I should add that in this particular use case it’s always only a filter condition, and never part of a search query (“find all the s,p,o where o is a literal@x-private”). That may affect the choice we made.
All that said, we did sort of start down that path for the reasons I mentioned above, without exhaustively considering the custom ^^regex approach. I still think it’s the right one for us, but could be convinced to revisit it.
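As a rough illustration of the private-use tag idea discussed above, here is a small Python sketch of how a downstream processor might pick out literals marked for regex interpretation. The function names and the sample literals are assumptions made up for this example; the only thing taken from RFC 5646 is that everything after the single-letter x subtag is private use.

```python
def private_use_subtags(lang_tag: str) -> list[str]:
    """Return the subtags that follow the 'x' singleton, lower-cased."""
    parts = lang_tag.lower().split("-")
    if "x" not in parts:
        return []
    return parts[parts.index("x") + 1:]


def regex_literals(literals):
    """Keep the values whose language tag carries a private 'regex' subtag."""
    return [
        value
        for value, tag in literals
        if "regex" in private_use_subtags(tag)
    ]


# Hypothetical (value, language tag) pairs, for illustration only.
sample = [
    ("^[a-z]+$", "x-regex"),
    ("hello", "en"),
    ("^colou?r$", "en-US-x-regex"),
]

selected = regex_literals(sample)
```

Because the check only looks at the private-use part, a tag like en-US-x-regex keeps its ordinary language information while still carrying the internal hint.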
Well all compliant query engines should let you filter with something like:
FILTER(datatype(?foo) = regex:datatypeuri)
My point was that they’re not usually indexed or easily queryable outside of a filter, at least not without https://www.w3.org/TR/rdf11-mt/#rdf-entailment
yeah… and I suspect (but have not validated) that the language code path is such a common filter that it’s optimized in some ways, where the datatype path seems less likely to be. (naïvely: one possibly-interned string compare vs. two lookups and a uri comparison)
I think they generalise to the same thing, so solving one would solve 99% of the other. Though in my experience I’ve never found a store which optimised filters by converting them into direct index lookups analogous to BGPs. Though I do vaguely recall @U051N6TTC mentioning she knew of a store that did this.
they use filters, unless the tested column is indexed, in which case the index is used instead
If we have this in SPARQL then a WHERE clause that looks like:
?entity ex:property ?x . FILTER(?x = 42)
would be equivalent to:
?entity ex:property 42
But it depends entirely on the implementation.
I know we did do this in Mulgara, since if the filter said FILTER(?x > 42) then it became an index lookup, and this was extended to boolean operations in filters, so that if you had:
FILTER(?x > 42 && ?x < 64)
or even multiple patterns like:
FILTER(?x > 42) . FILTER(?x < 64)
Then this became a single range lookup in the index. (I know this because I was the one to implement it! It was back in 2004, IIRC. I think I blogged about it)
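For anyone following along, the general idea of turning conjunctive filters into one range lookup can be sketched like this. It is a toy illustration under simple assumptions (a sorted in-memory list standing in for the index, strict > and < bounds only) and is not Mulgara's actual implementation; the class and function names and the ex: subjects are made up.

```python
import bisect
import math


class SortedValueIndex:
    """Toy index: (value, subject) pairs kept sorted by value."""

    def __init__(self, pairs):
        self._pairs = sorted(pairs)
        self._values = [value for value, _ in self._pairs]

    def range(self, low, high):
        """Subjects whose value v satisfies low < v < high."""
        start = bisect.bisect_right(self._values, low)   # first v > low
        stop = bisect.bisect_left(self._values, high)    # first v >= high
        return [subject for _, subject in self._pairs[start:stop]]


def merge_bounds(filters):
    """Collapse filters such as ('>', 42), ('<', 64) into one (low, high)."""
    low, high = -math.inf, math.inf
    for op, bound in filters:
        if op == ">":
            low = max(low, bound)
        elif op == "<":
            high = min(high, bound)
        else:
            raise ValueError(f"unsupported operator: {op}")
    return low, high


index = SortedValueIndex(
    [(10, "ex:a"), (42, "ex:b"), (50, "ex:c"), (63, "ex:d"), (64, "ex:e")]
)

# FILTER(?x > 42 && ?x < 64), or the same two conditions as separate filters.
low, high = merge_bounds([(">", 42), ("<", 64)])
matches = index.range(low, high)
```

The two binary searches find the slice boundaries directly, so only the matching entries are touched instead of testing every value against the filter.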