#onyx
2016-06-16
Drew Verlee01:06:02

How would you (anyone willing to chime in) clean out a db that had bad data? Essentially something like...

[
 {id: 1, time: 1}, #bad data
 {id: 1, time: 2} #good data is newer
]
Let's say it's about 1TB with an unknown amount of "bad data". The data is stored in HDFS, which might affect which option is most performant. But I'm more curious what the most elegant solution is. I was thinking this would be a good use case for a Document Store. Is there a sensible way to approach this problem using Onyx? Is there a distributed systems Slack channel I can use so I don't start dumping my ideas here 😕

gardnervickers01:06:46

1TB? I would probably just load it onto a beefy node and clean it with some clojure

Drew Verlee02:06:15

Right. Something just clicked. I had this mental block like I had to keep all the data in memory somehow for this to work.

gardnervickers02:06:06

Yea sequential reads with InputStream are crazy quick
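
A minimal sketch of the "clean it with some Clojure" idea above, keeping only the newest record per id while reading the file sequentially. It assumes the data has been copied off HDFS to a local file with one EDN map per line, each with `:id` and `:time` keys; the file layout and the names `newest-by-id` and `clean-file` are hypothetical, not something stated in the chat.

```clojure
(require '[clojure.java.io :as io]
         '[clojure.edn :as edn])

(defn newest-by-id
  "Folds a sequence of records into a map of id -> record,
   keeping the record with the largest :time for each id."
  [records]
  (reduce (fn [acc rec]
            (let [seen (get acc (:id rec))]
              (if (or (nil? seen) (> (:time rec) (:time seen)))
                (assoc acc (:id rec) rec)
                acc)))
          {}
          records))

(defn clean-file
  "Streams `path` line by line (assumed format: one EDN map per line)
   and returns the newest record per id. reduce is eager, so the
   reader is fully consumed before with-open closes it."
  [path]
  (with-open [rdr (io/reader path)]
    (newest-by-id (map edn/read-string (line-seq rdr)))))

;; Usage with the two records from the question:
;; (newest-by-id [{:id 1 :time 1} {:id 1 :time 2}])
;; => {1 {:id 1, :time 2}}
```

Only one record per distinct id is held in memory at a time, so the footprint grows with the number of ids rather than with the size of the file.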

luiseugenio14:06:14

Hi. Is there any simple way to put all segments inside a map {:message segment} to get the segments output to a Kafka topic? I've tried setting :kafka/wrap-with-metadata? true in all Kafka output items, but it didn't work. I've also tried using a lifecycle event before writing to Kafka, but without success. Thanks.

lucasbradstreet14:06:16

@luiseugenio: sorry, I don't really understand the problem. Do you have multiple segments in one map, which should each be a Kafka message, or are you trying to put multiple segments in a single Kafka message?

luiseugenio14:06:10

@lucasbradstreet: I have multiple outputs to Kafka topics in a big workflow. Outputs to Kafka topics in Onyx have to be wrapped in a {:message segment} map, right? I've got it working by putting a function node before each output to Kafka, and this function inserts the segment in a {:message xx} map. But I have too many topics to do it this way. I'd like to simply send topic messages without needing to insert the segment inside a :message map, or to have a single solution (like a lifecycle event) that does it in one place. Sorry for my English. If it's not clear yet, let me know. 😅

michaeldrogalis15:06:24

@luiseugenio: Need to check whether this is true, but I'm 95% sure you can modify the shape of the segment in an output plugin with :onyx/fn

michaeldrogalis15:06:38

You can do that on an input plugin too. Haven't tried it with an output plugin, but I don't see anything that precludes that from working
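
A sketch of what the :onyx/fn suggestion above might look like when applied in one place. It assumes the catalog is a plain vector of task maps and that Kafka output tasks are marked with `:onyx/type :output` and `:onyx/medium :kafka`; the function names and the `:my.app/wrap-message` keyword are hypothetical. As noted in the chat, whether :onyx/fn takes effect on an output plugin had not been verified at the time.

```clojure
(defn wrap-message
  "Wraps a segment in the {:message segment} shape the Kafka output expects."
  [segment]
  {:message segment})

(defn add-wrap-fn
  "Returns the catalog with :onyx/fn set on every Kafka output task,
   leaving all other tasks untouched. The keyword below is a placeholder
   and must name the namespace where wrap-message actually lives."
  [catalog]
  (mapv (fn [entry]
          (if (and (= :output (:onyx/type entry))
                   (= :kafka (:onyx/medium entry)))
            (assoc entry :onyx/fn :my.app/wrap-message)
            entry))
        catalog))
```

Applying `add-wrap-fn` to the catalog once, before submitting the job, would avoid adding a separate function task in front of each Kafka output.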

michaeldrogalis17:06:06

Sorry that the updates about the pooled ZooKeeper connection patch tapered off. The changes are in master, and it's undergoing final testing. We found some problems initially during Jepsen testing and went back to redesign. Should be out soon, plus a new feature that's been requested for a long time - idempotent job submission.