#onyx
2017-09-16
Travis 18:09:04

Hey guys, I have a question about Onyx jobs in relation to Kafka topics. Originally we were thinking of having a Kafka topic per customer, partitioned by the devices that customer has. My issue with that is that if we had 100 customers, that would be 100 Onyx jobs to manage, which seems like it could get hard to manage. Would it be reasonable to have one topic for all customers, and therefore one job for that topic? Not sure of the best topic design here. Also, I suspect we will have other topics per customer for completely different types of data.

gklijs 19:09:59

@camechis From my experience with Kafka, I prefer to have one topic (or a couple, in some cases) for each event/data type. You can easily filter out messages for specific devices, and also easily read all the data that way. Managing dynamic topics in Kafka creates a lot of headaches. It might be needed if you really want a high level of security and put encryption on the topics, each with a different key, but otherwise I would try to limit the number of topics, especially since scaling is no problem as long as you create enough partitions.
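As an illustration of the single-shared-topic approach, here is a minimal sketch of an onyx-kafka input catalog entry reading one topic for all customers. The topic name "device-events", the deserializer :my.app/deserialize-message, the connection strings, and the 4-partition assumption are all hypothetical placeholders, not from the conversation.

```clojure
;; Sketch: one Onyx input task reading a single shared Kafka topic.
;; Assumes a hypothetical topic "device-events" with 4 partitions;
;; :my.app/deserialize-message is a placeholder deserializer fn.
{:onyx/name :read-device-events
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "device-events"
 :kafka/group-id "onyx-device-consumer"
 :kafka/zookeeper "zk1:2181"
 :kafka/offset-reset :earliest
 :kafka/deserializer-fn :my.app/deserialize-message
 ;; onyx-kafka generally wants the reader's peer count to match
 ;; the topic's partition count
 :onyx/min-peers 4
 :onyx/max-peers 4
 :onyx/batch-size 100
 :onyx/doc "Reads all customers' device events from one shared topic"}
```

Scaling then comes from the partition count rather than the topic count, which matches the advice above.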

Travis 19:09:32

That was my thought as well. The more I thought about it, the idea of possibly hundreds of jobs did not seem right. I think the main issue will be making sure customer data ends up in their own buckets and is not mixed.
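One common way to keep each customer's data separated within a shared topic is to key every message by customer ID when producing, so all of a customer's events land on the same partition. A rough interop sketch, assuming string-serialized messages and the same hypothetical "device-events" topic:

```clojure
(import '(org.apache.kafka.clients.producer KafkaProducer ProducerRecord))

;; A Clojure map is a java.util.Map, so it can configure the producer directly.
(def producer
  (KafkaProducer.
   {"bootstrap.servers" "localhost:9092"
    "key.serializer"    "org.apache.kafka.common.serialization.StringSerializer"
    "value.serializer"  "org.apache.kafka.common.serialization.StringSerializer"}))

;; Kafka's default partitioner hashes the message key, so keying by
;; customer id sends one customer's events to the same partition every time.
(defn send-event! [customer-id event-json]
  (.send producer (ProducerRecord. "device-events" customer-id event-json)))
```

Downstream, the customer id carried on each segment is what a group-by (see the sketch further down) can key on to keep aggregates separated.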

lucasbradstreet 19:09:34

I agree in general

Travis 20:09:33

Cool, just making sure I am thinking correctly

lucasbradstreet 20:09:39

You can use group-by with windows to do customer-specific aggregates. That’s typically the approach I would take.
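A minimal sketch of what that might look like: a grouped function task plus a windowed count maintained per customer. The key :customer-id, the task names, and :my.app/write-counts! are hypothetical placeholders, not from the conversation.

```clojure
;; Catalog entry: group segments by customer so window state is kept per key.
{:onyx/name :aggregate-per-customer
 :onyx/fn :clojure.core/identity
 :onyx/type :function
 :onyx/group-by-key :customer-id   ;; hypothetical segment key
 :onyx/flux-policy :kill
 :onyx/min-peers 1
 :onyx/max-peers 1
 :onyx/batch-size 20}

;; Window: hourly event counts, maintained separately for each group key.
{:window/id :events-per-customer
 :window/task :aggregate-per-customer
 :window/type :fixed
 :window/aggregation :onyx.windowing.aggregation/count
 :window/window-key :event-time
 :window/range [1 :hour]}

;; Trigger: periodically sync the per-customer counts out of Onyx.
{:trigger/window-id :events-per-customer
 :trigger/id :sync-counts
 :trigger/refinement :onyx.refinements/accumulating
 :trigger/on :onyx.triggers/timer
 :trigger/period [5 :seconds]
 :trigger/sync :my.app/write-counts!}  ;; placeholder sync fn
```

Because the task is grouped on :customer-id, each customer's window aggregate stays in its own bucket even though every customer flows through the same topic and the same job.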

Travis 21:09:53

Yep, that's what I was thinking