This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-04-14
Channels
- # admin-announcements (5)
- # aws (3)
- # beginners (35)
- # boot (96)
- # cider (1)
- # clara (6)
- # cljs-dev (12)
- # cljsrn (34)
- # clojure (151)
- # clojure-boston (3)
- # clojure-brasil (4)
- # clojure-canada (1)
- # clojure-czech (8)
- # clojure-dusseldorf (11)
- # clojure-japan (5)
- # clojure-russia (120)
- # clojure-taiwan (1)
- # clojure-uk (3)
- # clojurescript (7)
- # component (27)
- # cursive (13)
- # data-science (45)
- # datomic (1)
- # devcards (5)
- # emacs (3)
- # funcool (65)
- # hoplon (103)
- # instaparse (3)
- # jobs (14)
- # jobs-discuss (1)
- # juxt (2)
- # lein-figwheel (2)
- # off-topic (16)
- # om (20)
- # onyx (49)
- # parinfer (17)
- # perun (1)
- # planck (5)
- # proton (4)
- # re-frame (14)
- # ring-swagger (4)
- # spacemacs (4)
- # untangled (110)
- # yada (14)
@jeroenvandijk: cool 😄
I would like to batch messages by criteria other than batch-size. For instance, I would like to track the accumulated file size and write when a threshold is reached
@aspra: I'll have to think about that a little bit. I can't think of many good options off the top of my head
@lucasbradstreet thanks!
@aspra: Is the idea to control the batch size for outgoing segments from an output plugin?
@michaeldrogalis yes exactly
@aspra: Interesting use case. Just out of curiosity, is there a problem with incrementally writing to an open file handle rather than going at it all at once?
I have created an output plugin that writes a new file per batch, based on batch-size, but I would like to do it based on byte size
The best I’ve got so far is to either do that, or to manually ack from an output plugin and accumulate the segments until you hit your criteria, at which point you write out and ack.
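The accumulate-then-ack idea could be sketched roughly as below. This is a minimal sketch, not the Onyx plugin API: `write-out!` and `ack-all!` are hypothetical placeholders for however the plugin writes its batch and acks its segments, and it assumes the plugin processes segments from a single thread.

```clojure
;; Hypothetical sketch: hold pending segments in an atom until a byte
;; threshold is reached, then write the batch and ack the segments.
(def pending (atom {:segments [] :bytes 0}))

(def byte-threshold (* 512 1024)) ; flush once ~0.5 MB has accumulated

(defn handle-segment [segment segment-size write-out! ack-all!]
  (let [{:keys [segments bytes]}
        (swap! pending
               (fn [{:keys [segments bytes]}]
                 {:segments (conj segments segment)
                  :bytes    (+ bytes segment-size)}))]
    (when (>= bytes byte-threshold)
      (write-out! segments) ; write the accumulated batch to its destination
      (ack-all! segments)   ; only ack once the write has succeeded
      (reset! pending {:segments [] :bytes 0}))))
```

The key property is that segments are only acked after the write succeeds, so a crash before the flush just means they are replayed.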
@lucasbradstreet: Can you see any problems with using a leaf function task for that with a global window and a trigger that writes to a file when the criteria is met?
Sorry, need to brb for a bit.
That would work, but it would be achieving fault tolerance by journalling the whole file to BookKeeper, which is probably undesirable
@aspra: Can you point me to another tool that has a similar feature? Just curious to see how it works elsewhere.
@michaeldrogalis: no idea if there is such a tool I am afraid
I think @michaeldrogalis’s suggestion of incrementally writing to the file would work. You could track the size of the file and switch to a new handle if the file would grow past the size limit
@aspra: Okay, no worries.
I think windows/triggers are probably the wrong solution because you will end up journalling all your files to BookKeeper too
Instead of creating a file per batch in your output plugin, you can keep your file handle in an atom in your plugin, and switch to a new one each time the file becomes too big
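That rolling-file idea could look something like the sketch below. It assumes the plugin writes from a single thread; `new-file-name` is a placeholder for however the plugin names its output files.

```clojure
;; Hypothetical sketch: keep the current writer and its byte count in an
;; atom, and switch to a fresh file once the size limit would be exceeded.
(require '[clojure.java.io :as io])

(def size-limit (* 512 1024)) ; ~0.5 MB per file

(defn new-file-name []
  (str "out-" (System/currentTimeMillis) ".log"))

(def state (atom {:writer (io/writer (new-file-name)) :bytes 0}))

(defn write-segment! [^String line]
  (let [n (alength (.getBytes line "UTF-8"))
        {:keys [writer bytes]} @state]
    (when (> (+ bytes n) size-limit)
      ;; roll over to a new file before exceeding the limit
      (.close ^java.io.Writer writer)
      (reset! state {:writer (io/writer (new-file-name)) :bytes 0}))
    (let [{:keys [writer]} @state]
      (.write ^java.io.Writer writer line)
      (swap! state update :bytes + n))))
```

Since the file handle lives in the plugin rather than in a window, nothing needs to be journalled to BookKeeper for this to work.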
Yeah, was just a quick suggestion. An async ack is probably the best shot for performance.
Or that, yeah
@michaeldrogalis: one example of a tool that does this is the Pail consolidator for Hadoop https://github.com/nathanmarz/dfs-datastores/blob/develop/dfs-datastores/src/main/java/com/backtype/hadoop/Consolidator.java
FYI, this is the use case we want to implement http://metamx.github.io/docs.metamarkets.com/docs/latest/send-data.html#file-formats-names-and-compression . Metamarkets uses Druid as its data-crunching solution
Ah. Will uploading to a HTTPS endpoint be part of the Onyx job, or will that happen outside of Onyx?
The idea was to do it inside, but we are noobs
@jeroenvandijk: Ah okay, I understand the motivation now.
OK, so in that case you would be both writing to a file and also making the http call?
You might not need to write to a file first, I’m guessing?
Alright I gotta run for real now, catch ya'll.
Catch you
@lucasbradstreet there are two use cases
Right, it makes sense to start with the second one
Let me know if you'd like any pointers splitting up the file writing in the plugin then
What is batch-file doing?
It collects messages, creates a file of around 0.5 MB, and then gzips it (that’s the goal, at least)
I’m also not sure whether it is OK if the file needs to be transported from one node to another via Aeron
Yeah, that’s the main thing I wanted to determine
Making sure that file is on the same node as the send-to-api and write-to-s3 tasks is a little tricky.
Is it good practice to serialize the file and send it from one node to the other?
Maybe reusing the batching step isn’t a particularly good idea if it’s not
I think it’s OK to send the file contents in a segment to each, based on it being around 300-600KB. However, it might not be worth the pain. It might be better to just re-use the batching code in both send-to-api and write-to-s3
@lucasbradstreet ok. So more what we were discussing before, the batching happens on the output plugins
Ah, I must have misunderstood the purpose of metamarkets/batch-file?
Yeah, I guess it's a case of premature optimization :#
my bad