This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-03-21
@stathissideris @seancorfield regarding your conformer question above, use (s/conformer val) not second
Or if you're feeling bold you can wrap the semi hidden and possibly going away (s/nonconforming ...)
Ah, yes, it’s a MapEntry not a vector.
It's both! :)
I would use s/nonconforming
but you keep saying it might go away 😛
So why does it matter whether we use val or second?
(I’m happy to “do the right thing” but want to know why my current code is “wrong”)
val is direct field lookup
Ah, so it’s faster.
second will convert to a seq and pull two values
Should be much faster
I suspect I use second in other places where val would be better...
Just don't ever use second :)
Yeah, I have a whole bunch of (->> … (group-by something) (map second) …) forms!
What about destructuring, where you likely have (map (fn [[k v]] …) some-map): is it better to do (map (fn [me] … (key me) … (val me) …) some-map)?
I'd use the one that you find more readable
I'd destructure
I'd have to look at code to guess at any perf difference
If destructuring uses nth then it's probably a wash
@alexmiller thanks, I wasn't aware that using second instead of val for map entries is less performant
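A minimal sketch of the point above, using only clojure.core (map-entry? assumes Clojure 1.8 or later). It shows that a map entry is both a MapEntry and a vector, so val and second return the same value, and that the group-by pipeline mentioned earlier works unchanged with val. The names entry and groups are made up for illustration; no timing is claimed here.

```clojure
;; An entry pulled from a map is a MapEntry and also satisfies vector?.
(def entry (first {:a 1}))

(map-entry? entry)               ; true
(vector? entry)                  ; true

;; val reads the entry directly; second walks it as a seq. Same result.
(= (val entry) (second entry))   ; true

;; The group-by pipeline from the chat, with val in place of second.
(def groups (->> [1 2 3 4] (group-by odd?) (map val)))
```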
I made a custom tagless s/or adapted from the original, but had to copy half of spec's private functions like specize, pvalid?, explain-1 etc. Am I missing some public util functions?
https://github.com/yonatane/spec-grind/blob/master/src/spec_grind/impl.clj
A spec newbie question: How do I create expectation about non namespaced keyword keys of a map?
(do
(s/def ::customer-id int?)
(s/def ::category (s/keys :req [::customer-id]))
(s/explain-data ::category {:customer-id 2}))
You can use :req-un and :opt-un
There's some details on the about spec page I think https://clojure.org/about/spec
Thanks. We missed that you still need to pass namespaced keywords, but the expectation is non namespaced. Now it works
:thumbsup:
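A small sketch of the fix described above: the spec is still registered under a namespaced keyword, but :req-un makes s/keys look for the unqualified key in the map. This assumes the current clojure.spec.alpha namespace (at the time of this conversation it was clojure.spec).

```clojure
(require '[clojure.spec.alpha :as s])

;; The spec name is namespaced, the expected map key is not.
(s/def ::customer-id int?)
(s/def ::category (s/keys :req-un [::customer-id]))

(s/valid? ::category {:customer-id 2})    ; true
(s/valid? ::category {:customer-id "2"})  ; false, value fails int?
(s/valid? ::category {})                  ; false, required key missing
```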
@yonatanel no, those are internal and subject to violent change without warning :)
Another question: I wanted to use spec in my tests, where I do want to expect most keys in my map to be exactly this, but others like the id or created_at, which I can't predict, to just conform to predicate. For example, given this map:
{:id 4
:customer_id 5
:section "lala"}
I want to see that customer_id is exactly 5 and section is "lala", but for id I just want to know that it's an integer. Finally, it would be nice if the spec would fail if the result has other keys, but that's not required. Is that possible and how?
@igel for the first part, you can use sets as predicates, like #{5} or #{"lala"}. For the second part, you can do (s/and (s/keys ...) #(every? #{:your :keys} (keys %)))
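Putting both suggestions together for the example map, as a sketch only: the spec name ::row is made up for illustration, and clojure.spec.alpha is assumed as the namespace.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::id int?)               ; unpredictable value, only check the type
(s/def ::customer_id #{5})      ; a set as predicate: must be exactly 5
(s/def ::section #{"lala"})     ; must be exactly "lala"

;; s/keys checks the listed keys; the extra predicate rejects other keys.
(s/def ::row
  (s/and (s/keys :req-un [::id ::customer_id ::section])
         #(every? #{:id :customer_id :section} (keys %))))

(s/valid? ::row {:id 4 :customer_id 5 :section "lala"})           ; true
(s/valid? ::row {:id 4 :customer_id 6 :section "lala"})           ; false
(s/valid? ::row {:id 4 :customer_id 5 :section "lala" :extra 1})  ; false
```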
Another question related to (s/keys …)
Imagine I run this:
(s/def ::my-map (s/keys))
(s/valid? ::my-map {:foo/core "aa"})
If :foo/core is somehow defined in my spec registry, then its spec will run when calling (s/valid? ::my-map {:foo/core "aa"})
But it is not my intent at all
foo/core might be a namespace defined by a library that I am requiring
Is there a security risk?
Unexpected code can run based on user input
imagine I parse a JSON with {"foo/core" "aa"}
and decide to keywordize the keys
@viebel that's why it's recommended to use namespaces you own, otherwise you break others and others break you.
The problem is that once I require a library, I cannot prevent it from adding specs to the registry
So when I parse the JSON, I must be careful when keywordizing the keys !!!?!?!
@viebel I'm not sure spec is meant for portable data validation and coercion at all.
if you accept input with keywordized keys and use spec you must follow some rules yes (proper namespaces), otherwise just use the *-un variants
well, the point is that if you're using keys and accepting user data you don't have control over what the user will pass you
and might maliciously pass namespaced keys that force conform using an expensive conformer
we are talking about JSON that you parse with keywordizing the keys, which is quite common practice
still, if you do something like (comp clojure.walk/keywordize-keys cheshire.core/parse-string) over your input data
ah right, didn't realize that (keyword "event/type") would actually create a ns keyword, bummer
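The behavior being discussed can be reproduced with a short sketch. Here :foo/core stands in for a spec that some library registered, and a plain Clojure map with a string key stands in for the parsed JSON, so no JSON library is needed; clojure.spec.alpha is assumed as the namespace.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.walk :as walk])

;; Pretend a library we loaded registered this spec.
(s/def :foo/core int?)

;; Our own spec names no keys at all.
(s/def ::my-map (s/keys))

;; keyword splits on "/" and yields a namespaced keyword.
(namespace (keyword "foo/core"))                 ; "foo"

;; Stand-in for parsed JSON with string keys.
(def parsed {"foo/core" "aa"})

;; After keywordizing, s/keys checks :foo/core against the registered spec.
(s/valid? ::my-map (walk/keywordize-keys parsed))   ; false
;; A qualified key with no registered spec is ignored.
(s/valid? ::my-map {:bar/unregistered "aa"})        ; true
```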
@bronsa I guess you shouldn't blindly use (s/keys) on user input, or do anything blindly with user input.
@mpenet I read somewhere that keyword accepts anything exactly for this reason, to keywordize user input.
@yonatanel meh, i don't buy the argument "you should be careful with what you do while parsing user input" tho -- i'm conforming user input to check for valid data
then it's either live with the possibility of people exploiting this to overload the CPU while validating user input, or don't use namespaced keys at all
Hey, new here and new to clojure-spec, I was discussing with @viebel on this issue before he came here.
the thing that struck me is that (s/valid?) will call unexpected code on various input. as we saw here, it might be used to run malicious code on the server
I think the openness is a huge problem here, (s/valid?) shouldn't run anything that I didn't ask for
but sure, the more common problem is a bug in the library, or just a cpu-intensive code
@yonatanel yeah that's never going to happen
rich's last clojure/conj talk explains why closed specs make it harder to evolve systems
-Namespaced keyword specs can be checked even when no map spec declares those keys
This last point is vital when dynamically building up, composing, or generating maps. Creating a spec for every map subset/union/intersection is unworkable. It also facilitates fail-fast detection of bad data - when it is introduced vs when it is consumed.
You're asking spec to sanitize your inputs and conformers to be your lossy coercions. From what I gather it's not meant to do that (I do use it that way though :))
By that reasoning you'd also have to prohibit polymorphism across libraries
It's similar in a way that you potentially run code that you didn't know you were going to run from the point of calling a function
this issue implies that you can't run s/valid? on user-provided input before validation (which is what s/valid? is for?).
Now suppose one of your libs defines a spec for some internal validation and it's bugged in a way that causes certain inputs to never terminate parsing.
Now somebody discovers it, and maliciously provides those inputs to you and you have no way of avoiding that to run
let's wait to see what @alexmiller thinks about it
Yeah, "spec is not good for running on untrusted inputs" is a good observation
this to me feels similar to the problem that clojure.edn solved for parsing user input
but you are asking spec to run validation using s/keys, which will validate keys
if you don’t want that, don’t do it
Good point, hehe
I think you’re implicitly conflating multiple things into one question
spec has capabilities for validating data
one of them is to define a spec for a map as a collection of attributes
recognizing that many current maps have unqualified keys, s/keys provides :req-un and :opt-un options for validating unqualified keys
and that is one approach to validating json
but :req-un and :opt-un still accept fully qualified keys, which will cause the same issue
so don’t rely on spec to solve every problem for you
it's your responsibility as an application developer to think about how you handle untrusted inputs from the external world
>but you are asking spec to run validation using s/keys, which will validate keys
right, but even if I'm not while validating input data, I might in some internal function that my valid input data might get threaded through
>it's your responsibility as an application developer to think about how you handle untrusted inputs from the external world
which is why I'm validating it with spec :) it feels a bit odd to me that we have to be careful about passing user data to validating functions
if "spec is not the right tool at this level" is the answer I guess I'll live with that but I can imagine a lot of people will be confused by this answer?
I’m not saying it’s not the right tool, just that you should think critically about what you’re doing. sorry not sorry about that.
esp because people are using schema for doing that right now and will be tempted to switch from schema to spec w/o realizing the implication
you control the libraries you’re using, the code you’re running, and when and how you validate data
all your specs call predicate functions, maybe from other libraries - how is that any different?
that's all true, it still doesn't make me feel any less weird about having to validate my data in order to validate it through spec
it's a bit like adding eval to the code, without telling the non-expert user that you're doing it.
I was really surprised when @viebel pointed this feature out to me.
the issue is it appears it's not safe to validate/conform user-provided data with spec w/o.. validating it before passing it to spec
so it may work especially well for anyone in the know but will shoot the regular developer in the leg.
can you give an example of a spec that would be “not safe” in this way?
safety is not the issue, I don't think. I'm thinking of some malformed spec that causes non-termination with very simple inputs, which is not unlikely to happen
so, example?
if it’s a bug in a predicate or spec, then you would fix it, just like you would with any bug that has that effect in your code
I’m questioning the premise of this problem
you do - you chose to load it
you can choose to not load it
or to replace it
and you might not know about a bug in a function in a library you call
same for any function
The weird thing is that even if the spec is just there - defined in the lib - without even being in use, it will be in the registry and corrupt my code
it will not be unless you load the code
you control the registry
Should I check all the specs defined in all the libs I load?
should you check all the functions defined in all the libs you load?
right, it seems much more likely to be exploitable in a spec though than finding the correct data that causes an invocation of that function to trigger the bug
All the functions I use
But not the functions I don’t use
all the functions your functions call?
then yes, you should check all the specs too
this is not different
say I'm using a common library (say cheshire) and it upgraded with a buggy spec. now anyone who uses that lib is open to that bug, without explicitly calling the spec, just by loading cheshire
yes, exactly like if cheshire had a function that was buggy
I understand what you're saying, I don't agree that the two have the same severity tho
I cannot discover this bug with any kind of unit tests
(Even using test.check!)
that does not make sense
I have control over what functions I use and how I use them, I don't have control over what (loaded) specs some user data will invoke unless I validate that data before I spec/validate it (or don't use keys)
At the easiest you can trigger a stackoverflow if a user has a recursive schema somewhere defined.
anyway, it seems like the answer is "be careful", not sure it satisfies me but I can live with it
I'd love to vote on a ticket if one is created, because until 10min ago I didn't realize s/keys checked other keys implicitly.
@rauh if the spec throws an error, then it is doing its job in telling you that it’s invalid data
the implicitness of (s/keys) opens the code for trouble. an in-between solution would be that (s/keys) will invoke only locally namespaced specs, and not library-defined specs.
those are not two differentiable things
Every once in a while I wish for a s/keys version that allowed a whitelist of spec keywords or namespaces that it checks
the local registry could be defined as a subset of the global one, becoming effectively a whitelist
not doing that
s/keys combines multiple things. enhancement requests to do individual parts of that might be worth considering.
separating required/optional key checks from validation of specified keyed attributes from validation of all keyed attributes
@alexmiller A StackOverflowError which can easily be triggered by user input isn't something most programmers catch, and certainly valid? is expected to return true/false. A StackOverflowError doesn't even get caught with (a pretty wide) (catch Exception _)
again, how is this different from having a bad function that throws a StackOverflowError?
if it’s a bug, fix it
which is not at all different than multimethods or protocol extensions
there are several Clojure constructs that create runtime state on load
how is it any different than a bug in the validation function you were going to hand write instead?
the attack (from outside) is identical
the question is, if a programmer who isn't in this slack room uses clojure-spec, will he think about sanitizing the json before passing it to (s/valid?)
if not, then the buggy behavior will be widespread.
what is the exposure?
has this behavior of (s/keys) been discussed in regards to this issue before? if not, I think that it qualifies as a "surprising" side effect of clojure-spec
it’s been discussed many times. it’s discussed in the guide, and in the spec rationale, and in the doc string.
no, I mean for a particular application
if someone passes bad input, it could yield an error response
how is that different than any bad input
that’s the whole point of validation?
but it IS bad
how can it harm the machine?
you chose to load the “arbitrary code”
this is not code some attacker is supplying to you
all things that can happen from an invalid input also sent to a “bad” validation function
I don’t see any way to actually cause memory consumption or cpu overload in dangerous ways. stackoverflow maybe, although I don’t have an example of that either.
i think the claim being made is that the surface area of the possible "attack" (for lack of a better word) is potentially way larger with bugged specs than other bugged constructs
I have no examples of “faulty specs” that can cause improper machine resource usage.
what are the alternatives?
1) don’t validate and don’t be aware of invalid data
2) validate by using functions in either your code or libs
1 does not seem better and 2 does not seem effectively different to me
and why is that harmful?
a spec predicate that does something over the network (call a db?), parses string content (or whatever resource heavy operation you can think of)
IMO it'd be nice to have multiple repositories, for instance, I'm using :db/id, :object/id and :file/id in my application code. Down the road when many libraries are spec'ed this will get trampled and lead to issues. Or am I missing something?
it's the same problem with project names/ns/package. that said multiple repos would be nice for other reasons
@rauh you are missing the use of proper namespacing :)
@alexmiller Would you say datomic attributes should be qualified with a namespace you own, or is :album/year enough?
@yonatanel same advice as spec. if you’re providing data for use with others, you must use a qualifier that you "control" (reverse domain, trademarked entity, etc). if the data will be used only in an environment that you control, it must only be "sufficiently unique"
so in a generic open source library, use a qualifier you control. If confined in your app, do whatever you want. If in an organizational context, you might need to ensure uniqueness in your organization.
@alexmiller care to comment on the snippet above? How should one defend against an 'attack' like this?
Don't call valid?, don't load this spec, don't use s/keys, or use select-keys to pre-filter what you look at
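The select-keys option can be sketched as follows. This is an illustration under assumptions, not a recommended pattern from the thread: :lib/other stands in for a spec registered by some other library, ::m and input are made-up names, and clojure.spec.alpha is assumed as the namespace.

```clojure
(require '[clojure.spec.alpha :as s])

(s/def ::a int?)
(s/def :lib/other string?)          ; hypothetical spec from another library
(s/def ::m (s/keys :opt [::a]))

;; Input carries a key our spec never mentioned.
(def input {::a 1 :lib/other 42})

;; s/keys still checks :lib/other because a spec is registered for it.
(s/valid? ::m input)                        ; false

;; Pre-filtering keeps only the keys we intend to look at.
(s/valid? ::m (select-keys input [::a]))    ; true
```

The trade-off is that the key list is now stated twice, once in the spec and once in the select-keys call.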
Check whether your input contains 10000 nested maps
Again, also compare this with what you would do without spec too - is that prone to the same issue?
so is it better to notice the bad input or to pass 10000 nested maps around your system?
(provided your JSON parser didn't blow up with a StackOverflowError before that point anyway, hehe)
(or whatever your input format is)
And "attack" is a bit of a strong word here, considering it's an exception, not a granting of root privileges or something like that.
It isn't a bad input though. My spec says it might have an 'a' key and that's it. I do not care about the rest, if I did, I would have specified that in my spec.
@tbaldridge I agree about the attack part, hence the quotes (I'm pulling a trump here lol)
@moxaj that’s not what spec says that spec means
that spec says “a map that might have an ::a key, and where all values are valid according to the spec of their keys"
Is there any way to define coll-of specs such that the associated generator would yield a collection type such as PersistentQueue, sorted-set, etc? I assume I could do it with a custom generator, just wondering if there's a shorter path.
@moxaj if you want what you said, then just map? is sufficient
@dave.dixon look at the :into key
(s/def ::c (s/coll-of int? :into (sorted-set)))
The keys spec is perfect for me, except for the implicit part. But I guess that's not subject to change, so no point in arguing :)
well as I said above, an enhancement ticket that separates parts of keys seems like a reasonable idea
I do not know how Rich would react to it, but “decomplecting” is usually a good thing :)
I do a lot of ETL stuff where I start with a pile of strings and turn them into some sort of seq of data structures. The strings encode things like a map of key to string or key to set of numbers.
I've had some reasonable success in using spec to check that the strings are of a format that is coercible to something, e.g. "100" -> 100 or "100,101,102" -> #{100 101 102}
Is this a terrible, no good, pls stop use of spec?
are you checking that they are coercible or actually doing the coercion?
I've got a function that checks s/valid? using the spec and then s/conform using the same spec
so you are transforming the data via the conform
so the general caveat for stuff like that is not that it’s necessarily bad but that it has consequences that you should understand
namely that users of that spec cannot recover the original data (as you could via s/unform normally)
and if you put that spec in the registry, you’ve made that choice for consumers of the spec
if “consumers of the spec” == you, then that’s up to you :)
if you’re using conformers, and if the function is reversible, you can supply an unform function in conformer to retain the reciprocal nature of that (assuming it is worth the trouble for you, which it may not be)
I've not seen an unform example that does that yet. That would be interesting. After talking to seancorfield about it quite a bit I've been building up quite a few generators for testing things that have done a good job of driving out some bugs in my functions.
(and this is gold dust to me so thx. I'm glad it is a trade off I should think about rather than a terrible idea)
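For the comma-separated case mentioned above, a conformer with an unform function might look like this sketch. The names parse-id-set, unparse-id-set and ::id-set are hypothetical, clojure.spec.alpha is assumed as the namespace, and the round trip is only reciprocal up to ordering, since a set forgets the original order.

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

;; "100,101,102" -> #{100 101 102}, or ::s/invalid when a piece is not a number.
(defn parse-id-set [text]
  (try
    (into #{} (map #(Long/parseLong %)) (str/split text #","))
    (catch NumberFormatException _ ::s/invalid)))

;; #{100 101 102} -> "100,101,102" (sorted, so the output is stable).
(defn unparse-id-set [ids]
  (str/join "," (sort ids)))

;; string? guards the conformer so non-string input fails cleanly.
(s/def ::id-set
  (s/and string? (s/conformer parse-id-set unparse-id-set)))

(s/conform ::id-set "100,101,102")    ; #{100 101 102}
(s/valid? ::id-set "100,abc")         ; false
(s/unform ::id-set #{102 100 101})    ; "100,101,102"
```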
@alexmiller Thanks, that worked. Docs make it sound like :into is limited to [], (), {}, #{}, should have just tried it.
user=> (s/def ::s (s/conformer #(str "hi "%) #(subs % 3)))
:user/s
user=> (s/conform ::s "bruce")
"hi bruce"
user=> (s/unform ::s "hi bruce")
"bruce"
it’s just functions yo
Rich would also caution you against treating spec as a transformation engine rather than something that validates declarative statements about your data (ie “it’s not a meat grinder”).
but that capability is there for your abuse :)
@dave.dixon actually, I may have led you into the path of undefined behavior there
I think you’re right that the intention was to only support that fixed set in :into for the time being
alexmiller this is the "use clojure.core for this kind of thing" comment I've seen floating about
I think I'm still searching for a better way of turning other people's strings into data.
I quite liked the error msgs that I could get out of spec around what had failed when I tried to validate
@dave.dixon there is some ambiguity (I think due to impl churn) about whether :kind is supposed to be a function or a spec. Only a function works now, but the doc implies a spec (which could itself have a custom generator). There is a ticket regarding this issue and I haven’t yet gotten Rich to respond to me about it. :)
@alexmiller Thanks, that answers my next question. Will probably just go with it for now since it seems to work, and the only other option would appear to be to write a custom generator for every "non-standard" collection spec.
racket has a contract system which, while I don't think I have heard anyone on the core team say was inspiration, a lot of people just assume it must have been
the way spec validates sequential data structures is by "parsing" them using a parser based on parsing with derivatives (also a racket connection there)
@yonatanel Here is an interactive article that I wrote about the basics of "parsing with derivatives"
With interactive clojure code snippets