#flambo
2016-11-15
sorenmacbeth20:11:24

you should be able to define and use flambo in a REPL connected to a cluster

sorenmacbeth20:11:30

I do this all the time

sorenmacbeth20:11:32

one key is that if you define it in a REPL, you need to use anonymous f/fn's for your operations

sorenmacbeth20:11:43

those are serializable

sorenmacbeth20:11:39

if you def or defn in the user namespace, those won't exist on the cluster worker nodes when you try to execute them

sorenmacbeth20:11:55

so you can do stuff like:

sorenmacbeth20:11:56

(def whatever (f/text-file sc ""))
(def res (-> whatever
             (f/flat-map (f/fn [x] ...))
             (f/reduce (f/fn [x y] (merge x y))))) ; f/reduce is an action and already returns the merged value, so no f/collect
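
To make the contrast in the messages above concrete, here is a minimal sketch of the two cases. It assumes `sc` is a Spark context already bound in the REPL and that flambo is on the classpath; `line-length` is a hypothetical helper, not something from the conversation, and the exact failure mode in the second case may vary with the flambo version.

```clojure
(require '[flambo.api :as f])

;; Assumption: `sc` is an existing Spark context connected to the cluster.
(def words (f/parallelize sc ["a" "bb" "ccc"]))

;; Case 1: the operation is an anonymous f/fn defined inline.
;; It is serializable, so it can be shipped to the workers with the job.
(def lengths
  (-> words
      (f/map (f/fn [w] (count w)))
      f/collect))

;; Case 2: a hypothetical helper defined only in the REPL's user namespace.
(defn line-length [w] (count w))

;; Expected to fail on a real cluster: the f/fn body refers to
;; user/line-length, which exists in the driver REPL but not on the workers.
;; Wrapped in `comment` so that loading this snippet does not run it.
(comment
  (-> words
      (f/map (f/fn [w] (line-length w)))
      f/collect))
```

In a local-mode context both cases may appear to work, because the driver and the workers share one JVM; the difference only shows up against a real cluster.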

sorenmacbeth21:11:06

something else I do to make working in a REPL on a cluster easier: I put a namespace in my uberjar that has a bunch of commonly used functions, such as date/time stuff

sorenmacbeth21:11:24

in my REPL, I switch into that namespace and work there

sorenmacbeth21:11:46

that way, those common functions do exist on the worker nodes
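
A rough sketch of that helper-namespace approach follows. The namespace name `myproject.repl-helpers` and the functions `parse-date` and `day-of-week` are hypothetical, chosen only for illustration, and the use of `java.time` assumes Java 8 or later on both the driver and the workers. The file is assumed to be compiled into the same uberjar that is deployed to the cluster.

```clojure
;; Hypothetical file: src/myproject/repl_helpers.clj, built into the uberjar.
(ns myproject.repl-helpers
  (:require [flambo.api :as f])
  (:import [java.time LocalDate]))

(defn parse-date
  "Parses an ISO-8601 date string such as \"2016-11-15\" into a LocalDate."
  ^LocalDate [s]
  (LocalDate/parse s))

(defn day-of-week
  "Returns the day-of-week name for an ISO-8601 date string."
  [s]
  (str (.getDayOfWeek (parse-date s))))

;; REPL session sketch. Assumption: `sc` is an existing Spark context.
(comment
  ;; Switch into the helper namespace and work there.
  (in-ns 'myproject.repl-helpers)

  ;; day-of-week lives in a namespace that ships in the uberjar,
  ;; so the workers can resolve it when the f/fn runs.
  (-> (f/parallelize sc ["2016-11-15"])
      (f/map (f/fn [s] (day-of-week s)))
      f/collect))
```

Anything new that is defined interactively after switching namespaces still exists only in the driver, so one-off logic should stay inside inline f/fn's.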