onyx 2017-12-06 | Slack Archive

niamu00:12:17

When using plugins such as onyx-http or onyx-kafka, the segment is expected to have certain values in it, like :message ... when using the kafka output. We have been creating additional tasks to come before the plugin tasks to transform the segment to the expected structure for the plugin. Is that the expected convention or is there a better way?

niamu00:12:59

I almost expect to be able to do segment transformation as part of a lifecycle before the output task execution to transform the segment into the expected structure, but I don’t think lifecycles can manipulate segment data if I understand correctly.

lucasbradstreet00:12:19

If you use an output plugin that expects segments shaped like that, you can always wrap them via an :onyx/fn on the output task that has the plugin.

lucasbradstreet00:12:32

It’s really up to you whether you wrap them in the task before or on the final task

lucasbradstreet00:12:52

Lifecycles can manipulate segment data, but that gets further into the internals, so :onyx/fn is the more appropriate tool.

niamu00:12:47

So having :onyx/fn on the output task defined will execute that function on the segment before the rest of the output task is called?

lucasbradstreet00:12:49

Yes
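
A minimal sketch of the approach confirmed above, assuming an onyx-kafka output task. The namespace, the function name wrap-for-kafka, and the task name are hypothetical, and the plugin keyword should be checked against the onyx-kafka README for your version. Plugin-specific :kafka/... keys are omitted.

```clojure
(ns example.tasks)

;; Hypothetical wrapper: reshapes an arbitrary segment into the map
;; shape the output plugin expects, with the payload under :message.
(defn wrap-for-kafka [segment]
  {:message segment})

;; Catalog entry for the output task. With :onyx/fn set, the function
;; is applied to each segment before the plugin writes it, so no
;; separate upstream transformation task is needed.
(def write-messages-task
  {:onyx/name :write-messages                      ; assumed task name
   :onyx/plugin :onyx.plugin.kafka/write-messages  ; verify for your plugin version
   :onyx/type :output
   :onyx/medium :kafka
   :onyx/fn ::wrap-for-kafka                       ; resolves to :example.tasks/wrap-for-kafka
   :onyx/batch-size 20})
```

The same function could equally sit on a task before the output task; as noted in the conversation, which one you choose is up to you.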

niamu00:12:03

That’s great. I don’t think I noticed that explained anywhere in the User Guide. That’s much better than what we’ve been doing so far.

lucasbradstreet00:12:34

Yeah, I can see how you could miss it. I just updated the description in http://www.onyxplatform.org/docs/cheat-sheet/latest/

lucasbradstreet00:12:47

If you see somewhere in the user guide you could add it, I would love to merge a PR about it.

niamu00:12:55

I’ll certainly think that over and open a pull request for that, thanks.

lucasbradstreet00:12:58

The batch processing phases are described here: http://www.onyxplatform.org/docs/cheat-sheet/latest/#task-states/:process-batch

niamu01:12:16

I guess there’s a lot of information in the cheat sheet that isn’t necessarily explicitly described as part of the user guide. I think I made the mistake of assuming the cheat sheet was going to be a subset of information in the guide.

niamu01:12:46

They’re far more complementary than I thought.

lucasbradstreet01:12:15

Yeah, I need to change the name from cheat sheet

lucasbradstreet01:12:22

It’s really the number one documentation source at this point

lucasbradstreet01:12:51

This is what generates our validation, error messages, and the cheat sheet now https://github.com/onyx-platform/onyx/blob/0.12.x/src/onyx/information_model.cljc

lellis11:12:24

Hi! I'm running Onyx with checkpoints saved to S3, and today I got this exception. Any tips?

curl 
{:status :success, :result #error {
 :cause "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
 :data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
 :via
 [{:type clojure.lang.ExceptionInfo
   :message "Your socket connection to the server was not read from or written to within the timeout period. Idle connections will be closed. (Service: Amazon S3; Status Code: 400; Error Code: RequestTimeout; Request ID: 67A4CAB11516672E)"
   :data {:original-exception :com.amazonaws.services.s3.model.AmazonS3Exception}
   :at [com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]}]
 :trace
 [[com.amazonaws.http.AmazonHttpClient$RequestExecutor handleErrorResponse "AmazonHttpClient.java" 1545]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeOneRequest "AmazonHttpClient.java" 1183]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeHelper "AmazonHttpClient.java" 964]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor doExecute "AmazonHttpClient.java" 676]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor executeWithTimer "AmazonHttpClient.java" 650]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor execute "AmazonHttpClient.java" 633]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutor access$300 "AmazonHttpClient.java" 601]
  [com.amazonaws.http.AmazonHttpClient$RequestExecutionBuilderImpl execute "AmazonHttpClient.java" 583]
  [com.amazonaws.http.AmazonHttpClient execute "AmazonHttpClient.java" 447]
  [com.amazonaws.services.s3.AmazonS3Client invoke "AmazonS3Client.java" 4031]
  [com.amazonaws.services.s3.AmazonS3Client putObject "AmazonS3Client.java" 1585]
  [com.amazonaws.services.s3.transfer.internal.UploadCallable uploadInOneChunk "UploadCallable.java" 131]
  [com.amazonaws.services.s3.transfer.internal.UploadCallable call "UploadCallable.java" 123]
  [com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 139]
  [com.amazonaws.services.s3.transfer.internal.UploadMonitor call "UploadMonitor.java" 47]
  [java.util.concurrent.FutureTask run "FutureTask.java" 266]
  [java.util.concurrent.ThreadPoolExecutor runWorker "ThreadPoolExecutor.java" 1149]
  [java.util.concurrent.ThreadPoolExecutor$Worker run "ThreadPoolExecutor.java" 624]

lmergen12:12:47

congrats on 0.12!

lmergen12:12:26

I take it that reduce is meant to replace / improve the current way of doing windows, so you don't have to both emit downstream and trigger at the same time?

michaeldrogalis16:12:25

@lmergen Correct 🙂

michaeldrogalis16:12:47

@lellis Hm, not sure at first glance

michaeldrogalis16:12:58

I'll dig in a little later today and get an answer for you.

lellis16:12:09

Ty! @michaeldrogalis

michaeldrogalis16:12:49

@lellis Are you seeing that with only one job?

michaeldrogalis16:12:00

I'm wondering if your S3 endpoint is misconfigured? Just a guess, though

lellis16:12:17

I have only one datomic-input type job.

michaeldrogalis16:12:38

Has that endpoint ever worked for you? We use that endpoint regularly

lellis16:12:38

It has been working fine, and it still is after resubmitting the job

michaeldrogalis16:12:56

That's really strange.

lellis16:12:11

I read something about a wrong content-length: S3 keeps waiting for more data and then throws a timeout because no more data arrives. But that's just from a superficial look at this exception.

lellis16:12:09

I have checkpointing working in all 3 of my environments.

lucasbradstreet17:12:35

@lellis do you have any idea how big the checkpoints are? Which version of Onyx?

lellis17:12:45

Hi @lucasbradstreet, Onyx "0.10.0", and I have no idea how big they are. How can I check this?

lucasbradstreet18:12:11

@lellis if you use onyx-peer-http-query you can query /metrics and view checkpoint_size_Value
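
A rough sketch of checking that value from a REPL, assuming the peer's HTTP query server is reachable over plain HTTP. The host, port, and helper name are hypothetical, and the code only assumes the /metrics response is line-oriented text.

```clojure
(ns example.checkpoint-size
  (:require [clojure.string :as str]))

(defn checkpoint-size-lines
  "Returns the lines of a metrics response body that mention checkpoint_size."
  [metrics-body]
  (->> (str/split-lines metrics-body)
       (filter #(str/includes? % "checkpoint_size"))))

(comment
  ;; Assumed address: substitute the host and port your peer's
  ;; HTTP query server actually listens on.
  (checkpoint-size-lines (slurp "http://localhost:8080/metrics")))
```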

lucasbradstreet18:12:50

We recently changed checkpoint recovery to load the checkpoint more asynchronously, which will mean that it no longer times out. You may have a better experience with 0.12
