
When using plugins such as onyx-http or onyx-kafka, the segment is expected to contain certain keys, like :message when using the Kafka output. We have been creating additional tasks that come before the plugin tasks to transform the segment into the structure the plugin expects. Is that the expected convention, or is there a better way?


I would almost expect to be able to do segment transformation as part of a lifecycle before the output task executes, but I don't think lifecycles can manipulate segment data, if I understand correctly.


If you use an output plugin that expects segments in that shape, you can always wrap them via an :onyx/fn on the output task that has the plugin.
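For example, a minimal sketch of what that can look like — the task name, topic, serializer, and the wrapping function here are all illustrative, not taken from any particular project:

```clojure
;; Hypothetical transform: wrap each segment into the {:message ...}
;; shape that the onyx-kafka output plugin expects.
(defn wrap-message [segment]
  {:message segment})

;; Catalog entry for the Kafka output task. The :onyx/fn is applied to
;; every segment before the plugin writes it, so no separate upstream
;; "reshaping" task is needed.
{:onyx/name :write-messages
 :onyx/plugin :onyx.plugin.kafka/write-messages
 :onyx/type :output
 :onyx/medium :kafka
 :onyx/fn ::wrap-message
 :kafka/topic "events"
 :onyx/batch-size 50}
```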


It’s really up to you whether you wrap them in a prior task or on the final task.


Lifecycles can manipulate segment data, but that gets into the internals more, so :onyx/fn is the better fit.


So having :onyx/fn defined on the output task will execute that function on the segment before the rest of the output task runs?


That’s great. I don’t think I noticed that explained anywhere in the User Guide. That’s much better than what we’ve been doing so far.


Yeah, I can see how you could miss it. I just updated the description in


If you see somewhere in the user guide you could add it, I would love to merge a PR about it.


I’ll certainly think that over and open a pull request for that, thanks.


I guess there’s a lot of information in the cheat sheet that isn’t necessarily explicitly described as part of the user guide. I think I made the mistake of assuming the cheat sheet was going to be a subset of information in the guide.


They’re far more complementary than I thought.


Yeah, I need to change the name from cheat sheet


It’s really the number one documentation source at this point.


This is what generates our validation, error messages, and the cheat sheet now


Hi! I’m running Onyx, saving checkpoints on S3, and today I got this exception. Any tips?

{:status :success, :result #error {
 :cause "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
 :data {:original-exception}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
   :data {:original-exception}
   :at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "" 1545]}]
 :trace
 [[com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "" 1545]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeOneRequest "" 1183]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeHelper "" 964]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor doExecute "" 676]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeWithTimer "" 650]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor execute "" 633]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor access$300 "" 601]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl execute "" 583]
  [com.amazonaws.http.AmazonHttpClient execute "" 447]
  [ invoke "" 4031]
  [ putObject "" 1585]
  [ uploadInOneChunk "" 131]
  [ call "" 123]
  [ call "" 139]
  [ call "" 47]
  [java.util.concurrent.FutureTask run "" 266]
  [java.util.concurrent.ThreadPoolExecutor runWorker "" 1149]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "" 624]]}}

congrats on 0.12!


I take it reduce is meant to replace or improve the current way of doing windows, so you don't have to both emit downstream and trigger at the same time?


@lellis Hm, not sure at a first glance


I’ll dig in a little later today and get an answer for you.


@lellis Are you seeing that with only one job?


I’m wondering if your S3 endpoint is misconfigured? Just a guess though.


I have only one datomic-input type job.


Has that endpoint ever worked for you? We use that endpoint regularly


It was working fine, and still is after resubmitting the job.


That's really strange.


I read something about a wrong content-length causing S3 to wait for more data and throw a timeout when no more data arrives. But that’s just from a superficial look at this exception.


I have checkpointing working in all three of my environments.


@lellis do you have any idea how big the checkpoints are? Which version of Onyx?


Hi @lucasbradstreet, Onyx "0.10.0", and I have no idea how big they are. How can I check this?


@lellis if you use onyx-peer-http-query you can query /metrics and view checkpoint_size_Value
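For instance, something like the following — the host and port are assumptions here, since they depend on how your query server is configured:

```shell
# Query the metrics endpoint exposed by onyx-peer-http-query and
# filter for the checkpoint size metric. Replace localhost:8091 with
# your configured query server address and port.
curl -s http://localhost:8091/metrics | grep checkpoint_size
```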


We recently changed checkpoint recovery to load the checkpoint more asynchronously, which will mean that it no longer times out. You may have a better experience with 0.12