Fork me on GitHub
#aws
<
2021-08-09
>
practicalli-johnny10:08:58

Given a pipeline of services, all connected via kafka topics, would it be possible to programmatically spin up a particular service in the middle of the pipleline, based on a new kafka partition appearing? For example, we have ingestion, assembly, process, publish, export services. Assembly puts messages onto individual kafka partitions based on a unique id shared by 10,000 events Events are ingested and assembled sequentially. However, to scale the processing service, a new instance of the processing service would spin up when a new unique id was detected by one of the previous services (or some kind of monitoring service that had oversight over the whole process). Has anyone done anything like this before with Clojure and AWS (it doesnt have to be aws though).

dominicm14:08:05

ASGs run based on CloudWatch metrics. I'm not sure if the managed kafka service publishes anyway CloudWatch metrics, but you could try that.

practicalli-johnny14:08:35

Interesting. Ideally all events with the same id should be processed by the same instance (usually a stream of several thousand events), as data contained in previous events may affect the processing of newer events. I guess that is what makes it more than just scaling, its more like spinning up a parallel processing service with the data segregated by a specific data value. The data value to segregate the data is usually known a little in advance, up to a day, so there would be time to spin up and configure each specific process to use a specific id.

dominicm15:08:11

I know Kafka Streams does that when it scales out, it has a whole system around state preservation and leader election to manage that.

practicalli-johnny15:08:14

Yes, Kafka Streams seems to be the driver and it fit into what we are doing with events... Thanks