This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2021-08-04
I'm looking for the preferred route on the following.
Requirement:
* Read JSON messages from external streaming source.
* Perform message schema validation. Search for required keys and data pairs.
* Conform message and filter invalid messages.
Question:
* Spec definitions are centered around keywords, not strings.
* Perform a recursive keywordize-keys call on original string keys?
* Alternative spec definition using strings instead of keywords?
it appears the keys -> keywords model is at least offered as an option in a couple of libraries, such as cheshire and clojure/data.json
You need to be a bit careful about just converting all JSON input to keyword-based hash maps since a malicious user could bombard your server with random JSON with long, unique strings and potentially cause performance/heap problems for you.
(that said, I think a lot of people do simply read the JSON as keyword-based hash maps)
We no longer have the problem of old where keywords were interned and never GC'd so it's not as dangerous as it used to be 🙂
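For reference, the "read as keywords, then validate" route could look roughly like the sketch below. It assumes `org.clojure/data.json` is on the classpath; the message shape (`:id`, `:type`) and the names `parse-message` and `valid-messages` are made up for illustration and do not come from the discussion.

```clojure
(require '[clojure.data.json :as json]
         '[clojure.spec.alpha :as s])

;; Hypothetical message shape: an :id string and a :type from a fixed set.
(s/def ::id string?)
(s/def ::type #{"order" "refund"})
(s/def ::message (s/keys :req-un [::id ::type]))

(defn parse-message
  "Parses a JSON string with keyword keys; returns nil if it is not valid JSON."
  [raw]
  (try
    (json/read-str raw :key-fn keyword)
    (catch Exception _ nil)))

(defn valid-messages
  "Keeps only the messages that parse and satisfy ::message."
  [raws]
  (->> raws
       (keep parse-message)
       (filter #(s/valid? ::message %))))

(valid-messages ["{\"id\":\"1\",\"type\":\"order\"}"
                 "{\"id\":1}"
                 "not json"])
;; => ({:id "1", :type "order"})
```

Note that `:key-fn keyword` turns every key in the payload into a keyword, including keys the spec never mentions, which is the concern raised above for untrusted input.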
Is there a known recipe for avoiding security pitfalls when using spec at the edges of a production app? Other than "don't" :)
At work, our APIs are all behind authentication so we can "trust" the input to some degree and we do read JSON to keyword-based hash maps and then validate it with Spec. You could write a Spec that just validated the top-level keys as strings -- using a set for the valid keys, but if you're accepting values that can also be structured data that will get a bit gnarly.
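A minimal sketch of the string-keyed alternative mentioned here, validating only the top level with sets of allowed and required keys. The key names (`"id"`, `"type"`, `"payload"`) are assumptions for illustration, and nested values are deliberately left unchecked.

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical key names; adjust to the real message schema.
(def allowed-keys  #{"id" "type" "payload"})
(def required-keys #{"id" "type"})

(s/def ::message
  (s/and map?
         ;; every top-level key must be one of the allowed strings
         #(every? allowed-keys (keys %))
         ;; every required key must be present
         #(every? (partial contains? %) required-keys)
         ;; example of a value check on a string key
         #(string? (get % "id"))))

(s/valid? ::message {"id" "42" "type" "order"})  ;; => true
(s/valid? ::message {"id" "42" "bogus" 1})       ;; => false
```

Because the keys stay as strings, nothing from the payload is ever turned into a keyword. The cost is that `s/keys` and its per-key specs are not available, so each value check has to be written by hand, which is where structured values get gnarly.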
As for security pitfalls with Spec being used on arbitrary data, I think you mostly need to ensure that validation can't be sent into a deep CPU hole because your specs allow arbitrary nesting and structure (again, the malicious user and the large payload issue).
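One way to keep validation out of that kind of hole is to reject oversized or overly nested input before any spec runs. The sketch below is only an illustration: the function names and the limits are arbitrary assumptions, and the size check has to happen on the raw string before parsing, since the parser itself also does work proportional to the payload.

```clojure
(def max-chars 10000) ; assumed limit on raw payload size
(def max-depth 5)     ; assumed limit on nesting

(defn small-enough?
  "True if the raw JSON string is within the size limit."
  [raw]
  (<= (count raw) max-chars))

(defn within-depth?
  "True if x nests collections at most `limit` levels deep.
  Stops descending once the limit is used up, so recursion is bounded by limit."
  [x limit]
  (cond
    (not (coll? x)) true
    (zero? limit)   false
    :else (every? #(within-depth? % (dec limit))
                  (if (map? x) (vals x) x))))

(within-depth? {"a" {"b" 1}} 2)  ;; => true
(within-depth? {"a" {"b" 1}} 1)  ;; => false
```

Only data that passes both checks would then be handed to `s/valid?` or `s/conform`.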
sounds like a hard-to-write meta-spec :) I'm open to everything, though. It came to mind just now: maybe one could create a simple translation layer from spec to malli, which is more performant / fit for the use case. Obviously only a subset could be translated. There's the precedent of https://github.com/threatgrid/flanders which is a unified DSL that can spit out Spec and Plumatic Schema alike