Drew Verlee01:04:26

> We're missing first class iteration right

Could you explain what iteration means in this context?


Cyclical workflows


Being able to loop from an output back into a portion of the workflow


And accrue state locally for that loop.
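As a rough illustration of that description (not Onyx code), the sketch below feeds each output of a step back in as the next input and keeps a small piece of state local to the loop. The names `iterate`, `step`, and `halve_until_small` are hypothetical and chosen only for this example.

```python
def iterate(step, initial, max_rounds=100):
    """Feed each output of `step` back in as its next input.

    `state` is accrued locally for this loop only; nothing outside sees it
    until the loop finishes.
    """
    state = {"rounds": 0, "history": []}
    value = initial
    while state["rounds"] < max_rounds:
        value, done = step(value, state)
        state["history"].append(value)
        state["rounds"] += 1
        if done:
            break
    return value, state


def halve_until_small(value, state):
    # Hypothetical step: halve the input and stop once it drops below 1.
    next_value = value / 2
    return next_value, next_value < 1


result, state = iterate(halve_until_small, 10)
```

Here the loop runs until the step reports it is done, and `state["history"]` holds every intermediate output that was looped back in.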

Drew Verlee01:04:51

hmmm ok. I was almost correct. Thanks for the clarification

Drew Verlee01:04:30

is that the same meaning that spark uses in their docs?


It amounts to the same thing that they use their RDDs for; they both tackle the same kinds of problems

Drew Verlee01:04:21

ah ok. Thanks! I gave a talk on data processing to my company and sort of crammed for part of it. Spark and Iterative were the last on my list and I think I gave the wrong impression 😕


How did it go otherwise?

Drew Verlee01:04:55

really great! I think I showcased the progress that has been made in the data processing space in the last 15 years. The leadership had started constructing a lambda architecture and I’m hopeful they're now considering some of the newer solutions (Onyx, Flink, Dataflow, ...). I’m trying to do my due diligence in everything I bring to the table. At the next lunch and learn (that's where we present things) I’ll be teaching them some Clojure and showing them Onyx 🙂


Hm; that makes me wonder how big you can realistically make the catalog since you want to parametrize general fns


but some of those parameters (say, a specific classifier for e.g. a bunch of neural nets or even SVM or something) are bigger than others (say, a hostname to tell you where to do your geoip lookups)


@lvh Is the concern the size of the serialized catalog in ZooKeeper?


I guess so; I’m not convinced it’s actually a problem yet


I think the maximum size of a znode is 1 megabyte. So if the catalog, after Nippy compression, is bigger than that, it would be problematic. When someone hits that, I'll take the time to make each catalog entry its own znode, which should permanently alleviate the problem. Easy fix. 🙂
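A minimal sketch of that size check, under stated assumptions: `pickle` plus `zlib` stand in for Nippy (a Clojure library), the limit is taken as roughly 1 MB, and the catalog entries and their keys (`name`, `geoip_host`, `classifier`) are hypothetical.

```python
import os
import pickle
import zlib

# Assumed limit of roughly 1 MB per znode, as mentioned above.
ZNODE_LIMIT_BYTES = 1024 * 1024


def compressed_size(value):
    """Size in bytes after serializing and compressing (stand-in for Nippy)."""
    return len(zlib.compress(pickle.dumps(value)))


def fits_in_znode(catalog, limit=ZNODE_LIMIT_BYTES):
    """True if the whole catalog would fit in a single znode."""
    return compressed_size(catalog) <= limit


def oversized_entries(catalog, limit=ZNODE_LIMIT_BYTES):
    """Names of entries that would not fit even with one znode per entry."""
    return [e["name"] for e in catalog if compressed_size(e) > limit]


# Hypothetical catalog: one small parameter and one large one.
small_entry = {"name": "geoip-lookup", "geoip_host": "geoip.internal"}
# Random bytes barely compress, so this models a large serialized classifier.
big_entry = {"name": "classify", "classifier": os.urandom(2 * 1024 * 1024)}

small_catalog_fits = fits_in_znode([small_entry])
big_catalog_fits = fits_in_znode([small_entry, big_entry])
too_big = oversized_entries([small_entry, big_entry])
```

Splitting the catalog into one znode per entry helps when the total is too large but each entry is small. An entry that is over the limit on its own, like the hypothetical classifier here, would still need to be stored elsewhere and referenced from the catalog.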