#clojure-spec
2021-08-04
Franco Gasperino 22:08:45

I'm looking for the preferred route on the following.

Requirement:
* Read JSON messages from external streaming source.
* Perform message schema validation. Search for required keys and data pairs.
* Conform message and filter invalid messages.

Question:
* Spec definitions are centered around keywords, not strings.
* Perform a recursive keywordize-keys call on original string keys?
* Alternative spec definition using strings instead of keywords?

Franco Gasperino 23:08:33

It appears the keys -> keywords model is at least optionally supported in a couple of libraries, such as cheshire and clojure/data.json.
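A minimal sketch of the keywordize-then-validate route, assuming the messages have already been parsed into maps with string keys. The message shape (`::id`, `::type`) and the function name `valid-messages` are hypothetical and not from the discussion. Both cheshire (`parse-string` with a truthy second argument) and clojure/data.json (`read-str` with `:key-fn keyword`) can produce keyword keys at parse time instead, which would replace the `keywordize-keys` step here.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.walk :as walk])

;; Hypothetical message shape: every message needs an id and a type.
(s/def ::id pos-int?)
(s/def ::type #{"order" "refund"})
(s/def ::message (s/keys :req-un [::id ::type]))

(defn valid-messages
  "Recursively keywordizes the string keys of each parsed message and
  keeps only the messages that satisfy ::message."
  [parsed-messages]
  (->> parsed-messages
       (map walk/keywordize-keys)
       (filter #(s/valid? ::message %))))

;; Maps as a JSON parser would return them without a key function.
(def incoming
  [{"id" 1 "type" "order"}
   {"id" 2}                      ; missing "type"
   {"id" -3 "type" "refund"}])   ; id is not a positive integer

(valid-messages incoming)
;; => ({:id 1, :type "order"})
```

`s/keys` with `:req-un` matches the unqualified keywords `:id` and `:type`, which is what `keywordize-keys` produces from the JSON field names.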

seancorfield 23:08:48

You need to be a bit careful about just converting all JSON input to keyword-based hash maps since a malicious user could bombard your server with random JSON with long, unique strings and potentially cause performance/heap problems for you.
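One way to limit the exposure described above is to turn only expected keys into keywords and leave everything else as strings. This is a sketch under assumptions: the allowlist `known-keys` and both function names are made up for illustration, and only top-level keys are handled.

```clojure
;; Hypothetical allowlist of the keys the application expects.
(def known-keys #{"id" "type" "payload"})

(defn known-key->keyword
  "Returns a keyword for an expected key, and the original string otherwise."
  [k]
  (if (contains? known-keys k) (keyword k) k))

(defn keywordize-known-keys
  "Converts only the allowlisted top-level keys of m to keywords."
  [m]
  (reduce-kv (fn [acc k v] (assoc acc (known-key->keyword k) v)) {} m))

(keywordize-known-keys {"id" 1 "some-long-random-key" 2})
;; => {:id 1, "some-long-random-key" 2}
```

A function like `known-key->keyword` could presumably also be supplied as the key function to a JSON parser (for example `:key-fn` in clojure/data.json), so that unknown keys never become keywords in the first place.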

seancorfield 23:08:40

(that said, I think a lot of people do simply read the JSON as keyword-based hash maps)

seancorfield 23:08:34

We no longer have the problem of old where keywords were interned and never GC'd so it's not as dangerous as it used to be 🙂

vemv 23:08:12

Is there a known recipe for avoiding security pitfalls when using spec at the edges of a production app? Other than "don't" :)

seancorfield 23:08:16

At work, our APIs are all behind authentication so we can "trust" the input to some degree and we do read JSON to keyword-based hash maps and then validate it with Spec. You could write a Spec that just validated the top-level keys as strings -- using a set for the valid keys, but if you're accepting values that can also be structured data that will get a bit gnarly.
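A sketch of the string-key variant mentioned here, validating top-level keys against a set without keywordizing anything. The key names and the spec name `::string-message` are hypothetical; values are left unchecked (`any?`), which is exactly where it would get gnarly for nested data.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical shape: only these string keys are allowed at the top level,
;; and "id" and "type" must be present.
(s/def ::string-message
  (s/and (s/map-of #{"id" "type" "payload"} any?)
         #(contains? % "id")
         #(contains? % "type")))

(s/valid? ::string-message {"id" 1 "type" "order"})
;; => true
(s/valid? ::string-message {"id" 1})
;; => false (missing "type")
(s/valid? ::string-message {"id" 1 "type" "order" "unexpected" 2})
;; => false (key not in the allowed set)
```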

👍 2
seancorfield 23:08:32

As for security pitfalls with Spec being used on arbitrary data, I think you mostly need to ensure that validation can't be sent into a deep CPU hole because your specs allow arbitrary nesting and structure (again, the malicious user and the large payload issue).
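One possible guard against the deep or large payload problem is a cheap structural check that runs before any spec validation. The sketch below is an assumption about how that could look, not something proposed in the discussion; the function name `within-limits?` and the limits are hypothetical.

```clojure
(defn within-limits?
  "Returns true when collections in data nest no deeper than max-depth
  and data contains at most max-nodes values in total (map keys count).
  Walks with an explicit stack, so deep input cannot overflow the call
  stack, and stops as soon as a limit is exceeded."
  [data max-depth max-nodes]
  (loop [stack [[data 1]]
         seen  0]
    (if-let [[node depth] (peek stack)]
      (cond
        (and (coll? node) (> depth max-depth)) false
        (>= seen max-nodes)                    false
        :else
        (let [children (cond
                         (map? node)  (concat (keys node) (vals node))
                         (coll? node) (seq node)
                         :else        nil)]
          (recur (into (pop stack) (map (fn [c] [c (inc depth)]) children))
                 (inc seen))))
      true)))

(within-limits? {"a" {"b" 1}} 2 10)        ;; => true
(within-limits? {"a" {"b" {"c" 1}}} 2 10)  ;; => false (three levels deep)
(within-limits? [1 2 3 4 5] 5 3)           ;; => false (six values in total)
```

Messages that fail this check can be dropped before `s/valid?` or `s/conform` is ever called, so the cost of validation stays bounded by the chosen limits.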

👍 2
vemv 23:08:34

Sounds like a hard-to-write meta-spec :) I'm open to everything, though. It came to mind just now: maybe one could create a simple translation layer from spec to malli, which is more performant / fit for the use case. Obviously only a subset could be translated. There's the precedent of https://github.com/threatgrid/flanders, which is a unified DSL that can spit out Spec and Plumatic Schema alike.