Franco Gasperino 22:08:45

I'm looking for the preferred route on the following.

Requirement:
* Read JSON messages from external streaming source.
* Perform message schema validation. Search for required keys and data pairs.
* Conform message and filter invalid messages.

Question:
* Spec definitions are centered around keywords, not strings.
* Perform a recursive keywordize-keys call on original string keys?
* Alternative spec definition using strings instead of keywords?

Franco Gasperino 23:08:33

It appears the keys -> keywords model is at least optionally suggested in a couple of libraries, such as cheshire and clojure/data.json.
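
For reference, a minimal sketch of the keywordize-then-validate route using only core libraries. The message shape (`id`, `type`, `meta`) is a made-up assumption, and `raw` stands in for whatever the JSON parser returns with string keys. Both libraries mentioned can also produce keyword keys at parse time (`:key-fn keyword` in data.json, a truthy second argument to `parse-string` in cheshire), which would make the explicit walk unnecessary.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.walk :as walk])

;; Hypothetical message shape: these key names are assumptions for illustration.
(s/def ::id pos-int?)
(s/def ::type #{"create" "delete"})
(s/def ::message (s/keys :req-un [::id ::type]))

;; Stand-in for a parsed JSON message whose keys are still strings.
(def raw {"id" 42 "type" "create" "meta" {"source" "stream-a"}})

;; Recursively convert string keys to keywords before validating.
(def msg (walk/keywordize-keys raw))

(defn valid-messages
  "Keywordizes each parsed message and keeps only those that satisfy ::message."
  [parsed-messages]
  (filter #(s/valid? ::message %)
          (map walk/keywordize-keys parsed-messages)))
```

With this in place, `(valid-messages [raw {"id" -1 "type" "create"}])` would keep only the first message, since `-1` fails `pos-int?`.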


You need to be a bit careful about just converting all JSON input to keyword-based hash maps since a malicious user could bombard your server with random JSON with long, unique strings and potentially cause performance/heap problems for you.


(that said, I think a lot of people do simply read the JSON as keyword-based hash maps)


We no longer have the problem of old where keywords were interned and never GC'd so it's not as dangerous as it used to be 🙂
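
If unbounded keyword creation is still a concern, one possible mitigation is to keywordize only keys the application already knows about. This is a sketch, not an established recipe; `known-keys` and `keywordize-known` are hypothetical names.

```clojure
(require '[clojure.walk :as walk])

;; Hypothetical allow-list of key names the application knows about.
(def known-keys #{"id" "type" "meta" "source"})

(defn keywordize-known
  "Recursively converts only allow-listed string keys to keywords;
  any other key is left untouched as a string."
  [data]
  (walk/postwalk
    (fn [x]
      (if (map? x)
        (into {}
              (map (fn [[k v]]
                     [(if (contains? known-keys k) (keyword k) k) v]))
              x)
        x))
    data))
```

Unknown keys stay as strings, so random key names in a payload never become keywords, and a spec written with `:req-un` keys simply ignores them.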


Is there a known recipe for avoiding security pitfalls when using spec at the edges of a production app? Other than "don't" :)


At work, our APIs are all behind authentication so we can "trust" the input to some degree and we do read JSON to keyword-based hash maps and then validate it with Spec. You could write a Spec that just validated the top-level keys as strings -- using a set for the valid keys, but if you're accepting values that can also be structured data that will get a bit gnarly.
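
A rough sketch of the string-keyed variant described above, assuming a flat message with made-up keys `"id"`, `"type"`, and an optional `"note"`. A set works as the key predicate in `s/map-of`; the per-key value checks have to be spelled out by hand, which is where nested data starts to get gnarly.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.set :as set])

;; Hypothetical key names, kept as strings exactly as they arrive in the JSON.
(def required-keys #{"id" "type"})
(def allowed-keys (set/union required-keys #{"note"}))

(s/def ::string-keyed-message
  (s/and
    ;; Every key must be in the allow-list; values are unconstrained here.
    (s/map-of allowed-keys any?)
    ;; All required keys must be present.
    #(set/subset? required-keys (set (keys %)))
    ;; Value checks written out per key, since s/keys only works with keywords.
    #(pos-int? (get % "id"))
    #(string? (get % "type"))))
```

For example, `{"id" 1 "type" "create"}` should validate, while a map that is missing `"type"` or carries an unlisted key should not.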


As for security pitfalls with Spec being used on arbitrary data, I think you mostly need to ensure that validation can't be sent into a deep CPU hole because your specs allow arbitrary nesting and structure (again, the malicious user and the large payload issue).
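
One way to guard against that, sketched here as an assumption rather than a known recipe: reject payloads that exceed a nesting depth or total size before handing them to Spec. `within-limits?` is a hypothetical helper and the limits are arbitrary. It only inspects already-parsed data, so the parser itself would still need its own input-size cap.

```clojure
(defn within-limits?
  "True when data nests no deeper than max-depth and contains at most
  max-nodes values in total (map keys are not counted). Stops walking
  as soon as either limit is exceeded."
  [data max-depth max-nodes]
  (let [budget (volatile! max-nodes)]
    (letfn [(ok? [x depth]
              (and (<= depth max-depth)
                   ;; Spend one unit of budget per visited value.
                   (>= (vswap! budget dec) 0)
                   (cond
                     (map? x)        (every? #(ok? % (inc depth)) (vals x))
                     (sequential? x) (every? #(ok? % (inc depth)) x)
                     :else           true)))]
      (ok? data 0))))
```

A caller could then validate with something like `(and (within-limits? msg 8 10000) (s/valid? ::message msg))`, so oversized or deeply nested input is dropped cheaply before any spec runs.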


Sounds like a hard-to-write meta-spec :) I'm open to everything, though. It came to mind just now: maybe one could create a simple translation layer from spec to malli, which is more performant / fit for the use case. Obviously only a subset could be translated. There's the precedent of a unified DSL that can spit out Spec and Plumatic Schema alike.