This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2022-08-25
Channels
- # announcements (4)
- # asami (26)
- # babashka (82)
- # beginners (27)
- # biff (6)
- # boot (1)
- # calva (42)
- # cider (2)
- # clj-commons (1)
- # clj-http-lite (2)
- # clj-kondo (37)
- # cljdoc (1)
- # clojure (46)
- # clojure-europe (34)
- # clojure-nl (1)
- # clojure-norway (7)
- # clojure-uk (2)
- # clojurescript (54)
- # code-reviews (18)
- # cursive (2)
- # datalevin (32)
- # datomic (7)
- # etaoin (1)
- # fulcro (9)
- # gratitude (3)
- # hyperfiddle (15)
- # introduce-yourself (1)
- # jobs (2)
- # lsp (32)
- # nrepl (1)
- # off-topic (18)
- # pathom (17)
- # pedestal (5)
- # polylith (89)
- # reitit (7)
- # releases (3)
- # remote-jobs (4)
- # shadow-cljs (52)
- # spacemacs (3)
- # squint (14)
- # tools-build (10)
- # tools-deps (18)
- # vim (4)
- # xtdb (34)
Maybe some people have opinions on this here :)
Hmm… I’m spending so much time in RDFS and OWL at the moment. Other schemas just haven’t been on my radar
My first thought is that because it’s JSON, which can be stored as a tree in the graph, evolution can be done as if you were working with functional data structures
So… if you have data like on the Java example site:
{"namespace": "example.avro",
 "type": "record",
 "name": "User",
 "fields": [
   {"name": "name", "type": "string"},
   {"name": "favorite_number", "type": ["int", "null"]},
   {"name": "favorite_color", "type": ["string", "null"]}
 ]
}
And you want `favorite_number` to no longer be optional, then if this were a Clojure structure:
(assoc-in schema ["fields" 1 "type"] "int")
If this were converted to a tree (and in Clojure, under the hood that’s what it is), then you’ll be updating the node that represents the type in the `favorite_number` entry. That means writing a new node for the type, a new node for the map that holds it (but still pointing to the existing `name`/`favorite_number` key/value pair), a new node for the containing list, and a new node for the top-level map. But there are still lots of existing nodes in that original tree that can still be referenced by the new version of the tree
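A small sketch of that structural sharing, using plain Clojure maps in place of a real Avro parser (the `schema` literal below just mirrors the JSON above; it is not produced by any Avro library):

```clojure
;; The Avro schema above, read into an ordinary Clojure map.
(def schema
  {"namespace" "example.avro"
   "type"      "record"
   "name"      "User"
   "fields"    [{"name" "name"            "type" "string"}
                {"name" "favorite_number" "type" ["int" "null"]}
                {"name" "favorite_color"  "type" ["string" "null"]}]})

;; Make favorite_number required by replacing its union type with "int".
(def schema' (assoc-in schema ["fields" 1 "type"] "int"))

;; Only the nodes on the path to the change are rewritten; siblings
;; like the untouched "name" field are shared between both versions.
(identical? (get-in schema  ["fields" 0])
            (get-in schema' ["fields" 0])) ;=> true

(get-in schema' ["fields" 1 "type"]) ;=> "int"
```

The `identical?` check is the point: the old and new schema versions physically share the unchanged subtrees, which is exactly what a graph-backed representation can exploit.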
If you can’t tell, I’ve taken to building immutable structures in graphs, in almost exactly the same way that the graph indices have been built
I see what you're getting at. My question was more about "what are good ideas to adhere to in order to ease the process of evolving schemas", not so much "how do you properly evolve a schema" in a more technical sense. I feel you've answered the latter. Just to make sure: did you catch the replies I made on the original thread?
No problem. I'm learning from the above as well!
OK… well my opinion is again skewed by RDF/OWL, as in… it’s all OWA! This makes things non-intuitive for many people (and I appreciate the problems there), but it also allows a lot of things to “just work”
I’m less on board with the default values. The semantics of an application can often decide what the app wants to do in the case of data that wasn’t provided
Haha yes, I'm aware of how complicated this can get, given all sorts of use cases, trade-offs and preferences out there. What you're saying aligns with what Chad Herrington (author of the Lancaster library, i.e. Avro for Clojure) advocates in my conversations with him as well. I'm starting to more deeply understand the reasons for preferring optional values this way, but a skeptical team and a lack of experience make it somewhat harder on me. For example, people wonder: if some required field was removed, and a consumer working with the older schema reads a new message, this breaks things (the change was not forward compatible). I can show them how optional fields avoid this, but they go on to ask: well, but don't you want this to break? The consumer should be expected to require that field, and now it is gone. This should trigger you to make changes. In the case of an optional value the decoding might have gone well, but the field the consumer requires is still missing, so you can still expect things to break.
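A tiny Clojure sketch of that last objection (the names and the `process` function are illustrative, not from any Avro API): the optional field lets decoding succeed, but the consumer's own requirement still fails, just at the application layer instead of inside the decoder:

```clojure
;; Decoded without error: the schema made favorite_number optional,
;; so a producer that dropped the field yields nil here.
(def decoded {"name" "Alice" "favorite_number" nil})

;; This consumer still treats the field as required for its own logic,
;; so the breakage surfaces here, with an app-level error, rather than
;; as a generic decoding failure.
(defn process [record]
  (let [n (get record "favorite_number")]
    (when (nil? n)
      (throw (ex-info "favorite_number required by this consumer"
                      {:record record})))
    n))
```

So the question "don't you want this to break?" is really about *where* it breaks: in the schema layer for every consumer, or in each consumer that actually depends on the field.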
(I have a tough time answering that one)
One thing I can imagine being relevant is: how often can you say confidently that a field is required for all consumers? It's one of these typical, rigid things. It feels safer to let consumers make things required or not, depending on their use case (it's a bit like Rich's point in the Maybe Not talk)
Well, you have a tradeoff then… do you want it to break with incompatible data, or do you want it to be resilient? Those are mutually exclusive, but it sounds like you’re looking for both 🙂 OTOH, if you start with the open world assumption, so you always expect things to possibly be missing, it may make the code longer (how to deal with missing values that you expect), but it makes you forward compatible, and gives you a consistent approach for resiliency
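A minimal sketch of that open-world, resilient style in Clojure (the record shape and the `:unknown` fallback are assumptions for illustration, not part of any Avro API): every consumer decides for itself what a missing value means.

```clojure
;; A decoded record from a newer producer that no longer sends
;; favorite_number at all.
(def msg {"name" "Alice" "favorite_color" "blue"})

(defn favorite-number
  "Returns favorite_number if the key is present; otherwise an
  app-level fallback chosen by this consumer."
  [record]
  (get record "favorite_number" :unknown))

(favorite-number msg) ;=> :unknown
```

The cost is a little more code at each use site (handling `:unknown`), but the payoff is the forward compatibility and consistent resiliency described above.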
I don't think I'm following how you mean the resiliency here. Resilient as in: receiving a null or ignoring a field instead of throwing an error?
Right. It's hard for me to play devil's advocate if I agree so much 😂. Another reason for me never to aspire to become a lawyer
Throw another reason in why don't you 😉 haha
Ambiguous rules that are subject to interpretation? I mean, that’s the entire basis for the profession!
I guess people have a tough time understanding the benefits of resiliency in the schema layer (perhaps me too) if ultimately the consumer is going to have to deal with the (say) incomplete data anyways. If it breaks it breaks. But again, I think my point of "fields might not actually be required by all consumers" is a good reason for why resiliency is a good idea.