#powderkeg
2017-03-30
cgrand 10:03:00

I just returned from the Spark codebase and I’m more hopeful than before about Encoders

viesti 10:03:24

so if I don't have time with my sql branch you can take over relevant code

viesti 10:03:47

that is neat :)

viesti 10:03:04

encoder relief :)

cgrand 10:03:54

But now we need to provide two arguments `serializer: Seq[Expression], deserializer: Expression,`

cgrand 10:03:11

You have to understand that these expressions are just builders for Java source code.

cgrand 10:03:21

(that will be compiled on the fly)

cgrand 10:03:56

So one has to think the way one does when doing Clojure interop from Java

cgrand 12:03:19

I think we can achieve an API like (df src spec & xforms-then-options) very close to what we have for rdd

viesti 12:03:07

that would be awesome

cgrand 12:03:43

the snippet above is totally untested (well it compiles and produces an expression)

viesti 19:03:27

while learning about CollReduce, what about parallel fold?

viesti 19:03:30

not managing a PR tonight, maybe later

cgrand 19:03:21

This code is there from the early times. I don't remember why it got commented out.

viesti 19:03:02

there is a related todo comment

viesti 19:03:49

on another note, where is the book on Clojure collection protocols, would need one :)

cgrand 19:03:35

I do remember. Transducers assume linear traversal. So you have to solve transducers+fold first.
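
To illustrate the point about linear traversal, here is a minimal sketch in plain Clojure (it is not powderkeg or xforms code). `chunked` is a hypothetical helper that mimics what a fold would do: it runs the transducer separately on each chunk, so a stateful transducer such as `partition-all` starts with fresh state per chunk and the result depends on where the input was split.

```clojure
;; Linear traversal: one pass, one piece of transducer state.
(def linear (into [] (partition-all 2) (range 6)))
;; linear => [[0 1] [2 3] [4 5]]

;; Hypothetical helper, for illustration only: apply the transducer
;; to each chunk independently, then concatenate the per-chunk results.
(defn chunked [xform chunks]
  (reduce into [] (map #(into [] xform %) chunks)))

;; The same six items, split into two chunks of three.
(def split-in-two (chunked (partition-all 2) [[0 1 2] [3 4 5]]))
;; split-in-two => [[0 1] [2] [3 4] [5]]
```

The two results differ only because of where the split fell, which is the kind of problem a transducers+fold design would have to address.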

viesti 19:03:03

saw a Stack Overflow question about that

viesti 19:03:00

Rich's Strange Loop talk on transducers mentioned parallelism on one slide, wonder where it is at now :)

cgrand 19:03:21

If I understand Alex's mention of kv, I believe I solved it my way in xforms.

viesti 19:03:40

have to grok that too, but now have to try sleeping :)

cgrand 19:03:11

With by-key I deal with deterministic partitioning. Fold is non-deterministic.
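
The contrast can be sketched in plain Clojure. `by-key-sketch` below is a hypothetical stand-in, not the xforms implementation: it routes each item to a per-key accumulator, so which items end up in which reduction is fixed by the key function alone. `r/fold` from `clojure.core.reducers`, by contrast, leaves the splitting to the collection and the partition size, so, as I read the remark, it only gives a stable answer when the combining function is associative, as `+` is here.

```clojure
(require '[clojure.core.reducers :as r])

;; Hypothetical helper, for illustration only: one accumulator per key.
;; (fnil rf init) starts a key's accumulator at init the first time it is seen.
(defn by-key-sketch [kfn rf init coll]
  (reduce (fn [acc x] (update acc (kfn x) (fnil rf init) x)) {} coll))

(def sums (by-key-sketch odd? + 0 (range 10)))
;; sums => {false 20, true 25}

;; fold with two different partition sizes: the input is split differently,
;; but + is associative (and (+) is 0), so the totals agree.
(def v (vec (range 100)))
(def fold-small (r/fold 4 + + v))
(def fold-default (r/fold + v))
;; fold-small => 4950, fold-default => 4950
```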