
gotcha, makes sense. thanks!


I'd like to implement a multi-step data pipeline where each step's "spec" is slightly modified and based on the previous step's spec. For example, at step 1 an address has some number of fields and at step 2 it has the same fields plus geo coordinates. Or in step 1 it can have optional string geo coordinates and after step 2 it has to have numeric ones. It seems like spec requires/encourages you to duplicate the base address fields in two separate specs (rather than build on a base spec) and possibly to use different names for the string geo coords and the double geo coords (since both would otherwise be registered under the same name). This appears to be more straightforward in schema, where I can modify the schema map as I go through the pipeline. Am I reading this right? Am I missing something? What's the best spec-based approach to this? How would you approach this? Thanks in advance.


if your changes are purely additive between pipeline steps, you can compose specs using s/and
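
a minimal sketch of that additive composition, assuming the clojure.spec.alpha namespace (older alphas used clojure.spec). the attribute names (:address/street, :geo/lat, and so on) and the :pipeline/... spec names are made up for illustration, they are not from the question:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical attribute names, chosen only for this example.
(s/def :address/street string?)
(s/def :address/city string?)
(s/def :geo/lat double?)
(s/def :geo/lng double?)

;; Step 1: the base address.
(s/def :pipeline/address
  (s/keys :req [:address/street :address/city]))

;; Step 2: everything step 1 requires, plus numeric coordinates.
(s/def :pipeline/geocoded-address
  (s/and :pipeline/address
         (s/keys :req [:geo/lat :geo/lng])))

(def raw {:address/street "1 Main St" :address/city "Springfield"})
(def geocoded (assoc raw :geo/lat 39.78 :geo/lng -89.65))

(s/valid? :pipeline/address raw)                ;; => true
(s/valid? :pipeline/geocoded-address raw)       ;; => false, coordinates missing
(s/valid? :pipeline/geocoded-address geocoded)  ;; => true
```

for map specs specifically, s/merge is another way to combine s/keys specs; which one fits better probably depends on whether you care about conforming and generators.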


my perspective on specs is you should treat them like defining columns in a sql database. if I had a column named date in a sql database, would I want the schema to allow strings or dates? most likely not, I would have different columns for strings and dates.


Thanks @hiredman the sql table analogy is a good way to think about it.


I think specs that coerce in a conformer are useful, but I've taken to naming them with a ! suffix


or well, naming the function that the conformer calls with a ! suffix and typically only using them where absolutely necessary


seems like a reasonable pattern, thanks for sharing

Yehonathan Sharvit 18:10:11

@mattly don’t forget to pass the 2nd param to s/conformer - otherwise s/unform will fail


in this case I'm basically using it to validate environment variables
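
a rough sketch of the coercing-conformer pattern with the second argument supplied, assuming clojure.spec.alpha. parse-port! and :env/port are hypothetical names picked to match the environment-variable use case, not code from this thread:

```clojure
(require '[clojure.spec.alpha :as s])

;; Hypothetical coercion helper; the ! suffix signals that it changes the
;; value (string -> long) instead of only checking it.
(defn parse-port! [x]
  (try
    (Long/parseLong ^String x)
    (catch Exception _ ::s/invalid)))

;; First argument conforms (string -> long).
;; Second argument is the inverse that s/unform uses (long -> string).
(s/def :env/port (s/conformer parse-port! str))

(s/conform :env/port "8080")       ;; => 8080
(s/unform :env/port 8080)          ;; => "8080"
(s/valid? :env/port "not-a-port")  ;; => false
```

with the one-argument form of s/conformer, conform still works but s/unform on the spec throws, which is the failure mentioned above.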


@hiredman indeed the table/column analogy is interesting, but it’s not the whole story. i’ve been thinking about “refinement” (for lack of a better word) a bunch lately


specs (and for that matter: types) have an inherent problem when modeling time


for a simple example, consider before/after validation of some property


you can have :unvalidated/foo and :validated/foo, but it seems like a weird way to go about that if they are going to be identical? to each other in all cases


of course, you can always model the validation as coercion, such that you get the validated value or nil, so having two keys makes sense - and indeed that’s what i’d probably do with spec
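
one way the two-key approach could look, as a sketch only. the validation rule (a non-blank string) and the helper names check-foo and validate-step are assumptions made up for the example:

```clojure
(require '[clojure.spec.alpha :as s]
         '[clojure.string :as str])

;; Hypothetical rule: a validated foo is a non-blank string.
(s/def :unvalidated/foo string?)
(s/def :validated/foo (s/and string? (complement str/blank?)))

(defn check-foo
  "Returns the value if it passes validation, otherwise nil."
  [v]
  (when (s/valid? :validated/foo v) v))

(defn validate-step
  "Adds :validated/foo next to :unvalidated/foo when validation succeeds.
  The original key is never overwritten."
  [m]
  (if-some [v (check-foo (:unvalidated/foo m))]
    (assoc m :validated/foo v)
    m))

(validate-step {:unvalidated/foo "bar"})
;; => {:unvalidated/foo "bar", :validated/foo "bar"}
(validate-step {:unvalidated/foo "  "})
;; => {:unvalidated/foo "  "}
```

downstream code can then require :validated/foo in its s/keys spec, so the presence of the key is what records that validation happened.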


the real problem comes in when you talk about nested structure


you don’t want to do O(N) allocations to validate a structure


renaming all the keys, losing generality of functions


I don't think I follow


when I said sql column above, that is because I am more familiar with using sql, but obviously the best analogy is a datomic attribute


so stick all your maps in datomic so you can scrub forward and backwards in time on them 🙂


heh, sorry, let me try to explain again


so i’m talking about a more general problem, but i think this neatly specifies one specific instance of it:


the ast typing problem is double bad b/c they aren’t modeling things with extensible records


but the notion of having two recursive structures that are subtly distinct at different stages in a pipeline is a common theme


it sucks to have to do O(N) work (eg copy a whole AST) in order to make some small changes to a subset of nodes


isn't that the same argument against immutable datastructures?


not quite, it’s more related to the argument in favor of covariant arrays in java/c# 🙂


eg you have an array of integers and need an array of objects, so why should you have to copy every integer in order to satisfy the type checker? obviously you don’t have to do that with spec, but it’s still a problem if you have a recursive structure and use different keys for different specs


I mean, your options are to write novelty in place, or accrete it


sure, you’re totally right - this problem is usually easily avoidable in spec thanks to the dynamic instrumentation and extensible records


i just think it's interesting to think about refinement of specifications without having to introduce new names


there are definitely use cases for “changed” structures vs “extended” structures, but yeah - go with the latter whenever possible