#rdf
2022-08-15
Bart Kleijngeld14:08:18

I'm generating an Avro schema from a SHACL model, and an interesting question has come up: do I choose null or [] to represent absence?

Bart Kleijngeld14:08:06

To get to the point: say I have a resource Sensor, which can have zero or more measurements. I see two ways to encode "there are no measurements": 1. I make its type optional and use null, or 2. I use the empty array [] to do that. (Note that I've assumed a closed world here. Under the open world assumption there would've been a clear distinction: [] encodes the statement "there are no measurements", and null encodes "I don't know of any measurements".)
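For reference, the two encodings could look roughly like this as Avro schemas, written here as Python dicts that serialize to the schema JSON. This is only a sketch: the record name `Sensor`, the field name `measurements`, and the `double` item type are assumptions for illustration, not taken from the actual SHACL model.

```python
import json

# Hypothetical record/field names and item type, for illustration only.

# Option 1: nullable array. "null" is listed first in the union so that
# the field default can be null, as Avro requires for union defaults.
sensor_nullable = {
    "type": "record",
    "name": "Sensor",
    "fields": [
        {
            "name": "measurements",
            "type": ["null", {"type": "array", "items": "double"}],
            "default": None,
        }
    ],
}

# Option 2: plain (non-nullable) array, with the empty array as default.
sensor_empty_array = {
    "type": "record",
    "name": "Sensor",
    "fields": [
        {
            "name": "measurements",
            "type": {"type": "array", "items": "double"},
            "default": [],
        }
    ],
}

print(json.dumps(sensor_nullable, indent=2))
print(json.dumps(sensor_empty_array, indent=2))
```

Note that with option 1 a writer can still produce [], so a reader has to handle both null and [] for "no measurements".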

Bart Kleijngeld14:08:11

I was wondering if others have input here: am I correct in seeing semantic equivalence there under the closed world assumption? And if so, what reasons would I have to choose either option? Thanks!

quoll15:08:49

Yes, I believe you have semantic equivalence. To do it in an open world you’d need a collection, or an explicit “not exists” value to assert. To do it in a closed world, you can just make it optional, which is what an Avro null does. The empty collection is OK, since being explicit like that means the same thing under both assumptions.

quoll15:08:01

Though, I should check in to make sure… you said that you want to “represent absence”. Are you representing that the value is unset, or are you representing that the value is unknown? I’m presuming the former, but I shouldn’t make assumptions 🙂

Bart Kleijngeld15:08:52

Okay, glad to see my reasoning verified at least 🙂.

Bart Kleijngeld15:08:33

Regarding your question: good to check that, particularly since choosing the words "represent absence" was the hardest part of phrasing the question 😄. Anyways, I'm not sure I understand the difference between the two interpretations you suggest (within a closed world that is). Could you elaborate? To add a little context that might help: the code I'm writing aims to be a general generator of Avro schemas from a given SHACL model. That means I'm trying to map SHACL concepts onto Avro ones in a meaningful way (SHACL is of course more expressive), but also a practical way (the Avro schemas are going to be actually used, so things like schema evolution matter to the point where I might sacrifice some "transformational purity" if that makes sense)

Bart Kleijngeld15:08:52

My point is: all I get from a SHACL model is cardinality constraints, and all I have in Avro is the possibility of making a type nullable or not. At that degree of generality I have to make a choice whether I map SHACL's minCardinality = 0; maxCardinality > 1 to a nullable array or not.
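One possible shape for that mapping, as a minimal sketch rather than the actual generator: the function name `avro_type_for` and its parameters are hypothetical, `max_count=None` stands for "no upper bound", and `item_type` is assumed not to be a union itself (Avro does not allow a union directly inside a union). Avro arrays carry no length constraints, so bounds beyond "single value vs. array" are not expressed in the schema.

```python
from typing import Optional, Union

# An Avro type as it appears in schema JSON: a name, a complex type, or a union.
AvroType = Union[str, dict, list]


def avro_type_for(min_count: int, max_count: Optional[int], item_type: AvroType) -> AvroType:
    """Map a (min, max) cardinality pair to an Avro type (hypothetical helper).

    max_count=None means "unbounded".
    """
    if min_count < 0:
        raise ValueError("min_count must be non-negative")
    if max_count is not None and max_count < max(min_count, 1):
        raise ValueError("max_count must be at least 1 and at least min_count")
    if max_count == 1:
        # Exactly one value, or an optional single value.
        return item_type if min_count >= 1 else ["null", item_type]
    # Zero-or-more and one-or-more both become a non-nullable array;
    # "zero values" is then always the empty array.
    return {"type": "array", "items": item_type}


print(avro_type_for(0, None, "double"))  # zero or more
print(avro_type_for(0, 1, "string"))     # optional single value
print(avro_type_for(1, 1, "string"))     # required single value
```

The design choice here is that only the single-valued case ever produces a nullable type; multi-valued properties never do.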

quoll15:08:10

What I meant was basically 3 types of value:
• a value, such as a number.
• a null, to indicate that there is no value.
• an “unknown” to indicate that the value is not known.
We have a surprising number of these with medical data. For instance, if a thumb is broken, then the “laterality” property will be either left, or right, or unknown, but it can’t be null, since one of the 2 thumbs was broken. But the location of a neoplasm can have no laterality, if it is, for instance, in the center of the chest.
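The three kinds of value can be sketched in code like this. The `Laterality` enum and the `describe` function are made up for illustration and are not taken from any real medical data model: `None` plays the role of null ("no value applies"), while `UNKNOWN` is an ordinary, explicit value.

```python
from enum import Enum
from typing import Optional


class Laterality(Enum):
    LEFT = "left"
    RIGHT = "right"
    UNKNOWN = "unknown"  # a laterality exists, but it was not recorded


def describe(laterality: Optional[Laterality]) -> str:
    """Distinguish 'no value' (None) from 'value not known' (UNKNOWN)."""
    if laterality is None:
        return "no laterality applies"
    if laterality is Laterality.UNKNOWN:
        return "a laterality exists but is not known"
    return f"laterality is {laterality.value}"


# A broken thumb always has a laterality, even if nobody wrote it down.
print(describe(Laterality.UNKNOWN))
# A neoplasm in the center of the chest has no laterality at all.
print(describe(None))
```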

💯 3
quoll15:08:54

If you have an array, then I wouldn’t make it nullable, but that’s a personal preference. Clojure is great for treating empty seqables as nil, but only if you wrap them in seq. But in the non-Clojure world, you often need to explicitly check for null before you’re allowed to look at a collection, which is annoying, since you need separate code for that.
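A small sketch of what that extra null handling looks like on the consuming side. The function names are hypothetical and a decoded Avro array is assumed to arrive as a plain Python list.

```python
from typing import List, Optional


def count_above(measurements: List[float], threshold: float) -> int:
    """Non-nullable array: the empty list needs no special handling."""
    return sum(1 for m in measurements if m > threshold)


def count_above_nullable(measurements: Optional[List[float]], threshold: float) -> int:
    """Nullable array: None has to be handled before iterating."""
    if measurements is None:
        return 0
    return sum(1 for m in measurements if m > threshold)


print(count_above([], 1.0))
print(count_above_nullable(None, 1.0))
print(count_above_nullable([0.5, 2.0, 3.0], 1.0))
```

Passing None to `count_above` raises a TypeError, because None is not iterable. That is the separate code path a nullable array forces on every consumer.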

quoll15:08:11

The other thing is that you have 2 things with the same meaning: null and []

Bart Kleijngeld15:08:15

I love that example

Bart Kleijngeld15:08:53

Those kinds of practical consequences are the ones I'm looking for. That's a good argument for not making it nullable indeed.

quoll15:08:05

Well, some people will say that the model was incomplete and should have included “center” as an option. But in the real world ALL models are incomplete 🙂

Bart Kleijngeld15:08:57

Agree. That wouldn't be an argument against the "unknown" value imo

👍 1
quoll15:08:00

I’ve always marveled at how when I try to describe modeling to people via a simple example from the real world, someone will ALWAYS find exceptions to the model

☝️ 1
Bart Kleijngeld15:08:38

Haha yes that sounds like a familiar wall to bump into

Bart Kleijngeld20:08:43

I let it sink in and yes, [] indeed seems the more practical choice here :). An extra reason I came up with is: there's only one value to express a cardinality of zero then. I like having fewer choices, keeps things simple ;). Thanks for the input