Bart Kleijngeld13:08:03

Maybe some people have opinions on this here :)


Hmm… I’m spending so much time in RDFS and OWL at the moment. Other schemas just haven’t been on my radar


My first thought is that because it’s JSON, and that can be stored as a tree in the graph, then evolution can be done as if talking about functional data structures


So… if you have data like on the Java example site:

{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
     {"name": "name", "type": "string"},
     {"name": "favorite_number",  "type": ["int", "null"]},
     {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
And you want favorite_number to no longer be optional, then if this were a Clojure structure: (assoc-in schema ["fields" 1 "type"] "int")


If this were converted to a tree (and in Clojure, under the hood that’s what it is), then you’ll be updating the node that represents the type in the favorite_number entry. That means writing a new node for the type, a new node for the map that holds it (but still pointing to the existing name/`favorite_number` key/value pair), a new node for the containing list, and a new node for the top-level map. But there are still lots of nodes in that original tree that the new version of the tree can keep referencing
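
A minimal sketch of the structural sharing described above, assuming the schema has been parsed into plain Clojure maps and vectors with string keys. It uses assoc-in, which sets a value at a path (update-in would expect a function instead); the name schema-v2 is only illustrative.

```clojure
;; The Avro schema from the example, as plain Clojure data
;; (string keys, vectors for JSON arrays).
(def schema
  {"namespace" "example.avro"
   "type"      "record"
   "name"      "User"
   "fields"    [{"name" "name"            "type" "string"}
                {"name" "favorite_number" "type" ["int" "null"]}
                {"name" "favorite_color"  "type" ["string" "null"]}]})

;; assoc-in returns a new structure; the original is left untouched.
(def schema-v2 (assoc-in schema ["fields" 1 "type"] "int"))

(get-in schema    ["fields" 1 "type"]) ;; => ["int" "null"]
(get-in schema-v2 ["fields" 1 "type"]) ;; => "int"

;; Untouched subtrees are the very same objects in both versions.
(identical? (get-in schema ["fields" 0]) (get-in schema-v2 ["fields" 0])) ;; => true
(identical? (get-in schema ["fields" 2]) (get-in schema-v2 ["fields" 2])) ;; => true

;; Only the path from the root down to the change was rebuilt.
(identical? (get schema "fields") (get schema-v2 "fields")) ;; => false
(identical? schema schema-v2)                               ;; => false
```

Both versions of the schema remain usable side by side, which is what makes it cheap to keep every revision around.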


If you can’t tell, I’ve taken to building immutable structures in graphs, in almost exactly the same way that the graph indices have been built

😄 1
Bart Kleijngeld14:08:50

I see what you're getting at. My question was more about "what are good ideas to adhere to in order to ease the process of evolving schemas", not so much "how do you properly evolve a schema" in a more technical sense. I feel you've answered the latter. Just to make sure: did you catch the replies I posted on the original thread?


no! Sorry. I didn’t realize it was a thread 😳

Bart Kleijngeld14:08:51

No problem. I'm learning from the above as well!


OK… well my opinion is again skewed by RDF/OWL, as in… it’s all OWA! This makes things non-intuitive for many people (and I appreciate the problems there), but it also allows a lot of things to “just work”


The equivalent to OWA in this case is indeed to make everything optional


I’m less on board with the default values. The semantics of an application can often decide what the app wants to do in the case of data that wasn’t provided


which aligns with the “optional” data approach too, since you’ll often be missing data


Of course, any identifying values (compound keys, etc), should not be optional
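
One way to picture these points, as a rough sketch rather than a recommended implementation: a hypothetical consumer function describe-user that treats "name" as the identifying value and fails loudly without it, while the application itself decides what a missing favorite_number means instead of relying on a schema default. The function name and the messages are assumptions made for illustration.

```clojure
(defn describe-user
  "Describes a decoded User record (a map with string keys)."
  [user]
  ;; "name" plays the role of the identifying value: refuse to continue without it.
  (let [user-name (or (get user "name")
                      (throw (ex-info "Missing identifying field"
                                      {:field "name" :record user})))]
    ;; favorite_number is optional: no schema default, the app decides what absence means.
    (if-some [n (get user "favorite_number")]
      (str user-name "'s favorite number is " n)
      (str user-name " has not provided a favorite number"))))

(describe-user {"name" "Ben" "favorite_number" 7})
;; => "Ben's favorite number is 7"
(describe-user {"name" "Alyssa"})
;; => "Alyssa has not provided a favorite number"
```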


NB: this is all just opinion 😜

Bart Kleijngeld14:08:27

Haha yes, I'm aware of how complicated this can get, given all sorts of use cases, trade-offs and preferences out there. What you're saying aligns with what Chad Herrington (Lancaster library author, i.e. Avro for Clojure) advocates in my conversations with him as well. I'm starting to understand more deeply the reasons for preferring optional values this way, but a skeptical team and my lack of experience make it somewhat harder on me. For example, people wonder: if some required field is removed, and a consumer working with the older schema reads a new message, this breaks things (the change was not forward compatible). I can show them how optional fields avoid this, but they go on to ask: well, but don't you want this to break? The consumer can be expected to require that field, and now it is gone. This should trigger you to make changes. With an optional value the decoding might have gone well, but the field the consumer needs is still missing, so you can expect the consumer to break anyway.

Bart Kleijngeld14:08:19

(I have a tough time answering that one)

Bart Kleijngeld14:08:22

One thing I can imagine being relevant is: how often can you say confidently that a field is required for all consumers? It's one of these typical, rigid things. It feels safer to let consumers make things required or not, depending on their use case (it's a bit like Rich's point in the Maybe Not talk)


Well, you have a tradeoff then… do you want it to break with incompatible data, or do you want it to be resilient? Those are mutually exclusive, but it sounds like you’re looking for both 🙂 OTOH, if you start with the open world assumption, so you always expect things to possibly be missing, it may make the code longer (how to deal with missing values that you expect), but it makes you forward compatible, and gives you a consistent approach for resiliency
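
A small sketch of what "expect things to possibly be missing" could look like on the consumer side, assuming decoded records arrive as maps with string keys. The two requirement sets and the helper missing-fields are hypothetical names used only for illustration: each consumer states what it cannot work without, and the schema itself stays permissive.

```clojure
;; Hypothetical per-consumer requirements.
(def greeting-required #{"name"})
(def stats-required    #{"name" "favorite_number"})

(defn missing-fields
  "Returns the set of required keys that are absent or nil in record."
  [required record]
  (set (remove #(some? (get record %)) required)))

;; A message written after favorite_number was dropped by the producer.
(def message {"name" "Alyssa" "favorite_color" "blue"})

(missing-fields greeting-required message) ;; => #{}
(missing-fields stats-required message)    ;; => #{"favorite_number"}
```

Decoding succeeds for everyone; only the consumer that actually needs favorite_number has to decide whether to stop, skip the record, or degrade gracefully.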

Bart Kleijngeld14:08:44

I don't think I'm following how you mean the resiliency here. Resilient as in: receiving a null or ignoring a field instead of throwing an error?


as in, being able to continue operating in the face of missing expected data

Bart Kleijngeld14:08:24

Right. It's hard for me to play devil's advocate if I agree so much 😂. Another reason for me never to aspire to become a lawyer


Lawyering means working with people, and people are hard

Bart Kleijngeld14:08:35

Throw another reason in why don't you 😉 haha


Ambiguous rules that are subject to interpretation? I mean, that’s the entire basis for the profession!

😂 1
Bart Kleijngeld14:08:45

I guess people have a tough time understanding the benefits of resiliency in the schema layer (perhaps me too) if ultimately the consumer is going to have to deal with the (say) incomplete data anyway. If it breaks, it breaks. But again, I think my point that "fields might not actually be required by all consumers" is a good reason why resiliency is a good idea.