#onyx
2019-03-03
Ho0man14:03:25

Hi everyone, I am new to Onyx. I wanted to know what the appropriate way is to synchronize multiple streams in Onyx. (I have multiple streams of data segments whose ids may be similar across streams, and I want to emit a segment that is the merge of the latest received version of all related segments from the different streams, rather than the original segments themselves.)

lmergen14:03:33

that's actually easier asked than done 🙂

lmergen14:03:45

what you need is basically multi-stream deduplication, right?

lmergen14:03:10

what I found works best in those situations is to use a "staging area" that both streams write to (e.g. S3)

lmergen14:03:21

then periodically read from this staging area and apply de-duplication
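
A rough sketch of the staging-area idea in plain Python, for illustration only. It does not use any Onyx API. `StagingArea` is a hypothetical in-memory stand-in for a shared store such as S3, and the segment shape (a dict with an `"id"` key) is an assumption. Each stream writes into the staging area, and a periodic job drains it and keeps only the latest segment per id and stream before merging:

```python
from collections import defaultdict


class StagingArea:
    """Hypothetical in-memory stand-in for a shared store such as S3."""

    def __init__(self):
        self._pending = []

    def write(self, stream, segment):
        # Every stream appends its segments here, tagged with the stream name.
        self._pending.append((stream, segment))

    def drain(self):
        # Hand back everything written so far and start a fresh batch.
        batch, self._pending = self._pending, []
        return batch


def merge_latest(batch):
    """De-duplicate a drained batch and merge related segments by id."""
    latest = defaultdict(dict)  # id -> {stream name: latest segment}
    for stream, segment in batch:
        # A later write from the same stream replaces the earlier one.
        latest[segment["id"]][stream] = segment

    merged = []
    for seg_id, by_stream in latest.items():
        out = {"id": seg_id}
        # Assumption: on key clashes, the stream that sorts last wins.
        for stream in sorted(by_stream):
            for key, value in by_stream[stream].items():
                if key != "id":
                    out[key] = value
        merged.append(out)
    return merged
```

Calling `merge_latest(staging.drain())` on a timer would then emit one merged segment per id seen in that period. How often to drain, and what to do with ids that only one stream has written so far, depends on the use case.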

lmergen15:03:28

alternatively, an approach that might work for you is to keep track of "seen" ids
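
One way the "seen" ids idea could look, again as a plain Python sketch rather than Onyx code. `SeenTracker`, `observe` and `max_ids` are hypothetical names, and the eviction policy (forget the least recently seen id once a size limit is hit) is an assumption added to keep memory bounded:

```python
from collections import OrderedDict


class SeenTracker:
    """Remembers the latest segment per id and stream, with bounded memory."""

    def __init__(self, max_ids=10000):
        self._seen = OrderedDict()  # id -> {stream name: latest segment}
        self._max_ids = max_ids

    def observe(self, stream, segment):
        seg_id = segment["id"]
        # Pop and re-insert so this id becomes the most recently seen one.
        by_stream = self._seen.pop(seg_id, {})
        by_stream[stream] = segment
        self._seen[seg_id] = by_stream
        if len(self._seen) > self._max_ids:
            # Evict the least recently seen id.
            self._seen.popitem(last=False)

        # Merge the latest segment from each stream into one output segment.
        merged = {}
        for name in sorted(by_stream):
            merged.update(by_stream[name])
        return merged
```

Each incoming segment immediately yields a merged view of everything known for its id, so there is no periodic batch step. The trade-off is that the state has to live somewhere that survives restarts if the merge must be reliable.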