#clojure-spec
2016-10-10
jrheard00:10:02

gotcha, makes sense. thanks!

juliobarros17:10:21

I'd like to implement a multi-step data pipeline where each step's "spec" is slightly modified and based on the previous step's spec. For example at step 1 an address has some number of fields and at step 2 it has the same fields plus geo coordinates. Or in step 1 it can have optional string geo coordinates and after step 2 it has to have numeric ones. It seems like spec requires/encourages you to duplicate base address fields in two separate specs (rather than build on a base spec) and possibly have different names for the string geo coords and double geo coords (since otherwise they would be registered under the same name). This appears to be more straightforward in schema where I can modify the schema map as I go through the pipeline. Am I reading this right? Am I missing something? What's the best spec-based approach to this? How would you approach this? Thanks in advance.

hiredman17:10:01

if your changes are purely additive between pipeline steps, you can compose specs using s/and
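A minimal sketch of the additive case, assuming the `clojure.spec.alpha` namespace (at the time of this conversation it was still `clojure.spec`). The key names and the `:pipeline/...` spec names are hypothetical, chosen only to mirror the address example in the question:

```clojure
(require '[clojure.spec.alpha :as s])

;; Attribute specs, registered once and reused by every step.
(s/def :address/street string?)
(s/def :address/city string?)
(s/def :geo/lat double?)
(s/def :geo/lng double?)

;; Step 1: the base address.
(s/def :pipeline/address
  (s/keys :req [:address/street :address/city]))

;; Step 2: everything step 1 requires, plus coordinates.
;; The base fields are not repeated; the base spec is composed in.
(s/def :pipeline/geocoded-address
  (s/and :pipeline/address
         (s/keys :req [:geo/lat :geo/lng])))

(def raw {:address/street "1 Main St" :address/city "Springfield"})
(def geocoded (assoc raw :geo/lat 39.78 :geo/lng -89.65))

(s/valid? :pipeline/address raw)               ;=> true
(s/valid? :pipeline/geocoded-address raw)      ;=> false
(s/valid? :pipeline/geocoded-address geocoded) ;=> true
```

`s/merge` is another option for combining `s/keys` specs; `s/and` is used here because it is what the message above suggests.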

hiredman17:10:40

my perspective on specs is you should treat them like defining columns in a sql database. if I had a column named date in a sql database, would I want the schema to allow strings or dates? most likely not, I would have different columns for strings and dates.
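Applied to the geo coordinate question, the column analogy might look like the sketch below: one key for the string form, a different key for the numeric form. This assumes `clojure.spec.alpha`; the key names, the `:step1/...` and `:step2/...` spec names, and the `geocode` function are all hypothetical.

```clojure
(require '[clojure.spec.alpha :as s])

;; Two "columns": the raw string and the parsed number get separate names.
(s/def :geo.raw/lat string?)
(s/def :geo/lat double?)

(s/def :step1/address (s/keys :opt [:geo.raw/lat]))
(s/def :step2/address (s/keys :req [:geo/lat]))

(defn geocode
  "Hypothetical pipeline step: adds :geo/lat when a raw string is present.
  The raw key is kept, so the step only adds information."
  [m]
  (cond-> m
    (:geo.raw/lat m)
    (assoc :geo/lat (Double/parseDouble (:geo.raw/lat m)))))

(s/valid? :step1/address {:geo.raw/lat "39.78"})           ;=> true
(s/valid? :step2/address {:geo.raw/lat "39.78"})           ;=> false
(s/valid? :step2/address (geocode {:geo.raw/lat "39.78"})) ;=> true
```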

juliobarros17:10:09

Thanks @hiredman the sql table analogy is a good way to think about it.

mattly17:10:27

I think specs that coerce in a conformer are useful, but I've taken to naming them with a ! suffix

mattly17:10:11

or well, naming the function that the conformer calls with a ! suffix and typically only using them where absolutely necessary

jrheard18:10:11

seems like a reasonable pattern, thanks for sharing

Yehonathan Sharvit18:10:11

@mattly don’t forget to pass the 2nd param to s/conformer - otherwise s/unform will fail

mattly18:10:47

in this case I'm basically using it to validate environment variables
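A rough sketch of the coercing conformer discussed above, with the second argument to `s/conformer` supplied so that `s/unform` works. It assumes `clojure.spec.alpha`; `parse-port!` and `:env/port` are hypothetical names, and the input string stands in for a value read from an environment variable.

```clojure
(require '[clojure.spec.alpha :as s])

(defn parse-port!
  "Hypothetical coercion function, named with a ! suffix per the convention above.
  Returns a long, or the spec 'invalid' marker when the input cannot be parsed."
  [x]
  (try
    (Long/parseLong x)
    (catch Exception _
      :clojure.spec.alpha/invalid)))

;; First argument conforms (string -> long); second argument unforms (long -> string).
(s/def :env/port (s/conformer parse-port! str))

(s/conform :env/port "8080") ;=> 8080
(s/unform :env/port 8080)    ;=> "8080"
(s/valid? :env/port "abc")   ;=> false
```

With only one argument, `(s/conformer parse-port!)` still conforms, but calling `s/unform` on it throws because there is no unform function.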

bbloom23:10:14

@hiredman indeed the table/column analogy is interesting, but it’s not the whole story. i’ve been thinking about “refinement” (for lack of a better word) a bunch lately

bbloom23:10:00

specs (and for that matter: types) have an inherent problem when modeling time

bbloom23:10:38

for a simple example, consider before/after validation of some property

bbloom23:10:27

you can have :unvalidated/foo and :validated/foo, but it seems like a weird way to go about that if they are going to be identical? to each other in all cases

bbloom23:10:57

of course, you can always model the validation as coercion, such that you get the validated value or nil, so having two keys makes sense - and indeed that’s what i’d probably do with spec
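One way to read "validation as coercion" with two keys, sketched under the assumption of `clojure.spec.alpha`. The email attribute, the `@` check, and `validate-email` are hypothetical stand-ins for whatever property is being validated:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

(s/def :unvalidated/email string?)
(s/def :validated/email
  (s/nilable (s/and string? #(str/includes? % "@"))))

(defn validate-email
  "Hypothetical step: stores the value under :validated/email if it passes
  the check, otherwise stores nil. The unvalidated key is left in place."
  [m]
  (let [v (:unvalidated/email m)]
    (assoc m :validated/email
           (when (and (string? v) (str/includes? v "@")) v))))

(:validated/email (validate-email {:unvalidated/email "a@example.com"}))
;=> "a@example.com"
(:validated/email (validate-email {:unvalidated/email "nope"}))
;=> nil
```

When validation succeeds, the value under both keys is the same object, which is the redundancy the previous message points at.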

bbloom23:10:29

the real problem comes in when you talk about nested structure

bbloom23:10:24

you don’t want to do O(N) allocations to validate a structure

bbloom23:10:31

renaming all the keys, losing generality of functions

hiredman23:10:18

I don't think I follow

hiredman23:10:36

when I said sql column above, that is because I am more familiar with using sql, but obviously the best analogy is a datomic attribute

hiredman23:10:03

so stick all your maps in datomic so you can scrub forward and backwards in time on them 🙂

bbloom23:10:16

heh, sorry, let me try to explain again

bbloom23:10:16

so i’m talking about a more general problem, but i think this neatly specifies one specific instance of it: http://blog.ezyang.com/2013/05/the-ast-typing-problem/

bbloom23:10:04

the ast typing problem is double bad b/c they aren’t modeling things with extensible records

bbloom23:10:32

but the notion of having two recursive structures that are subtly distinct at different stages in a pipeline is a common theme

bbloom23:10:12

it sucks to have to do O(N) work (eg copy a whole AST) in order to make some small changes to a subset of nodes

hiredman23:10:48

isn't that the same argument against immutable datastructures?

bbloom23:10:29

not quite, it’s more related to the argument in favor of covariant arrays in java/c# 🙂

bbloom23:10:21

eg you have an array of integers and need an array of objects, so why should you have to copy every integer in order to satisfy the type checker? obviously you don’t have to do that with spec, but it’s still a problem if you have a recursive structure and use different keys for different specs

hiredman23:10:51

I mean, your options are to write novelty in place, or accrete it

bbloom23:10:00

sure, you’re totally right - this problem is usually easily avoidable in spec thanks to the dynamic instrumentation and extensible records

bbloom23:10:01

i just think it’s interesting to think about refinement of specifications without having to introduce new names

bbloom23:10:38

there are definitely use cases for “changed” structures vs “extended” structures, but yeah - go with the latter whenever possible