This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-10-10
Channels
- # aleph (2)
- # arachne (1)
- # beginners (5)
- # boot (33)
- # cider (12)
- # cljs-dev (6)
- # cljsrn (26)
- # clojure (33)
- # clojure-austin (7)
- # clojure-belgium (6)
- # clojure-chicago (1)
- # clojure-dusseldorf (1)
- # clojure-fr (1)
- # clojure-hamburg (1)
- # clojure-nl (11)
- # clojure-portugal (3)
- # clojure-russia (14)
- # clojure-spec (35)
- # clojure-uk (28)
- # clojurescript (49)
- # component (7)
- # core-async (75)
- # cursive (13)
- # datomic (15)
- # dirac (57)
- # emacs (5)
- # events (1)
- # hoplon (34)
- # jobs (2)
- # jobs-discuss (8)
- # lambdaisland (1)
- # lein-figwheel (7)
- # leiningen (3)
- # om (5)
- # onyx (8)
- # re-frame (56)
- # reagent (13)
- # testing (7)
- # untangled (30)
- # vim (51)
- # yada (17)
I'd like to implement a multi-step data pipeline where each step's "spec" is slightly modified and based on the previous step's spec. For example, at step 1 an address has some number of fields, and at step 2 it has the same fields plus geo coordinates. Or at step 1 it can have optional string geo coordinates, and after step 2 it has to have numeric ones. It seems like spec requires/encourages you to duplicate the base address fields in two separate specs (rather than build on a base spec) and possibly have different names for the string geo coords and the double geo coords (since they are registered under the same name). This appears to be more straightforward in Schema, where I can modify the schema map as I go through the pipeline. Am I reading this right? Am I missing something? What's the best spec-based approach to this? How would you approach this? Thanks in advance.
if your changes are purely additive between pipeline steps, you can compose specs using and
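A minimal sketch of that additive composition; the :address/* and :geo/* keys here are made up for illustration, and s/merge is used as the map-oriented counterpart of s/and:

```clojure
(require '[clojure.spec.alpha :as s]) ;; was clojure.spec in 2016-era alphas

;; hypothetical base address spec for step 1
(s/def :address/street string?)
(s/def :address/city   string?)
(s/def :step1/address  (s/keys :req [:address/street :address/city]))

;; step 2 is purely additive: reuse step 1 and merge in the geo keys
(s/def :geo/lat double?)
(s/def :geo/lng double?)
(s/def :step2/address
  (s/merge :step1/address
           (s/keys :req [:geo/lat :geo/lng])))

(s/valid? :step2/address
          {:address/street "1 Main St" :address/city "Austin"
           :geo/lat 30.26 :geo/lng -97.74}) ;; => true
```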
my perspective on specs is you should treat them like defining columns in a sql database. if I had a column named date in a sql database, would I want the schema to allow strings or dates? most likely not; I would have different columns for strings and dates.
Thanks @hiredman the sql table analogy is a good way to think about it.
I think specs that coerce in a conformer are useful, but I've taken to naming them with a ! suffix
or well, naming the function that the conformer calls with a ! suffix, and typically only using them where absolutely necessary
@mattly don’t forget to pass the 2nd param to s/conformer; otherwise s/unform will fail
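A sketch of both conventions together, assuming a made-up ::port spec and a coercing ->long! helper; the second argument to s/conformer is the unform fn that makes s/unform work:

```clojure
(require '[clojure.spec.alpha :as s])

;; hypothetical coercing function, named with a ! suffix
(defn ->long! [x]
  (cond
    (integer? x) x
    (string? x)  (try (Long/parseLong x)
                      (catch NumberFormatException _ ::s/invalid))
    :else        ::s/invalid))

;; pass str as the 2nd (unform) arg so the conformed value can round-trip
(s/def ::port (s/conformer ->long! str))

(s/conform ::port "8080")                    ;; => 8080
(s/unform ::port (s/conform ::port "8080"))  ;; => "8080"
```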
@hiredman indeed the table/column analogy is interesting, but it’s not the whole story. i’ve been thinking about “refinement” (for lack of a better word) a bunch lately
you can have :unvalidated/foo and :validated/foo, but it seems like a weird way to go about that if they are going to be identical to each other in all cases?
of course, you can always model the validation as coercion, such that you get the validated value or nil, so having two keys makes sense - and indeed that’s what i’d probably do with spec
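One way to read "validation as coercion" with spec, using hypothetical :step1/lat (string) and :step2/lat (double) keys: the coercing step either produces the validated key or nil, so the two keys carry distinct meanings:

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :step1/lat string?) ;; unvalidated: string coordinate
(s/def :step2/lat double?) ;; validated: numeric coordinate

;; hypothetical coercion step: adds the validated key, or nil on failure
(defn coerce-lat [m]
  (let [lat (try (Double/parseDouble (:step1/lat m))
                 (catch Exception _ nil))]
    (assoc m :step2/lat lat)))

(coerce-lat {:step1/lat "30.26"})
;; => {:step1/lat "30.26", :step2/lat 30.26}
```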
when I said sql column above, that is because I am more familiar with using sql, but obviously the best analogy is a datomic attribute
so stick all your maps in datomic so you can scrub forward and backwards in time on them 🙂
so i’m talking about a more general problem, but i think this neatly specifies one specific instance of it: http://blog.ezyang.com/2013/05/the-ast-typing-problem/
the ast typing problem is doubly bad b/c they aren’t modeling things with extensible records
but the notion of having two recursive structures that are subtly distinct at different stages in a pipeline is a common theme
it sucks to have to do O(N) work (eg copy a whole AST) in order to make some small changes to a subset of nodes
not quite, it’s more related to the argument in favor of covariant arrays in java/c# 🙂
eg you have an array of integers and need an array of objects, so why should you have to copy every integer in order to satisfy the type checker? obviously you don’t have to do that with spec, but it’s still a problem if you have a recursive structure and use different keys for different specs
sure, you’re totally right - this problem is usually easily avoidable in spec thanks to the dynamic instrumentation and extensible records
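A sketch of why the copy usually isn't needed in spec: because maps are open and s/keys only checks the keys it names, the same value can satisfy both the "before" and "after" specs in place (all keys below are made up):

```clojure
(require '[clojure.spec.alpha :as s])

(s/def :node/op   keyword?)
(s/def :node/type keyword?)

;; hypothetical before/after specs for one pipeline stage
(s/def :pass1/node (s/keys :req [:node/op]))
(s/def :pass2/node (s/keys :req [:node/op :node/type]))

;; annotating a node in place upgrades it; no restructuring of
;; unrelated keys is needed to satisfy the richer spec
(def node {:node/op :invoke})
(s/valid? :pass1/node node)                          ;; => true
(s/valid? :pass2/node (assoc node :node/type :int))  ;; => true
```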