#portkey
2018-12-17
cgrand09:12:01

@viesti > data+compute locality is a good thing I think, avoids shuffling of data.
Don’t forget the fine print:
• when data is easily shardable (no broad join required)
• when you have a good enough idea of the data distribution to have balanced shards.

viesti09:12:24

this paper spawned discussion elsewhere too. I skimmed the Pywren paper (I have to learn to read papers instead of skimming :)) and was thinking that, in what they did, data placement in S3 was a good fit for Lambda autoscaling

viesti09:12:49

there are different problems, and different solutions work well for each of them

viesti09:12:09

I don’t know if it is good to label one approach “the best” in general

cgrand12:12:51

simple/definite answers to complex problems are always dubious

viesti20:12:43

Could this be done with a more data-centric approach: https://aws.amazon.com/blogs/aws/boost-your-infrastruc