This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-12-20
Channels
- # admin-announcements (1)
- # bangalore-clj (4)
- # beginners (176)
- # boot (38)
- # cider (9)
- # clara (1)
- # clojars (9)
- # clojure (290)
- # clojure-belgium (25)
- # clojure-berlin (2)
- # clojure-dusseldorf (10)
- # clojure-italy (1)
- # clojure-russia (141)
- # clojure-sg (1)
- # clojure-spec (40)
- # clojure-uk (38)
- # clojurebridge (19)
- # clojurescript (148)
- # code-reviews (37)
- # community-development (7)
- # cursive (27)
- # datomic (71)
- # editors-rus (3)
- # events (1)
- # heroku (1)
- # hoplon (16)
- # jobs (5)
- # lambdaisland (3)
- # lein-figwheel (211)
- # luminus (3)
- # off-topic (52)
- # om (18)
- # onyx (49)
- # overtone (3)
- # pedestal (48)
- # protorepl (7)
- # rdf (2)
- # re-frame (61)
- # reagent (3)
- # timbre (2)
- # untangled (69)
It’s really not that bad. Docker is acceptably quick.
Got another error on cheshire: :kafka/deserializer-fn :onyx.tasks.kafka/deserialize-message-json
I mean, when I use that deserializer. cheshire is on onyx-kafka :dev dependencies. Am I not supposed to use that deserializer?
Did you require the namespace that the function is in? (:require [onyx.tasks.kafka])
(I assume the error was that the symbol couldn’t be resolved.)
At first I didn't but then I did and it still didn't work until I added cheshire to my own dependencies
Does it matter where I require onyx.tasks.kafka? I required it in the namespace where I define the job map
@yonatanel yeah, you might be submitting your job from a namespace that is different to the one that the peers boot up
It needs to be required on the classpath of the JVM that is running the actual peer.
Sure, I required onyx.plugin.kafka in my core namespace instead of dev namespace and I was using the REPL. Probably that was the problem. Thanks!
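A minimal sketch of the fix discussed above, assuming a hypothetical namespace that the peers boot from: requiring onyx.tasks.kafka there puts the deserializer on the peer JVM's classpath, and cheshire has to be added to your own dependencies because it is only a :dev dependency of onyx-kafka.

```clojure
;; project.clj — add cheshire to your own :dependencies, since it is
;; only a :dev dependency of onyx-kafka (version is illustrative):
;; [cheshire "5.6.3"]

(ns my.peer.core  ; hypothetical namespace the peer JVM boots from
  (:require [onyx.tasks.kafka]    ; loads :onyx.tasks.kafka/deserialize-message-json
            [onyx.plugin.kafka])) ; loads the kafka plugin itself
```

The key point from the thread: the require must happen on the classpath of the JVM running the peers, not just in the namespace that submits the job.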
The onyx-kafka reader task has :kafka/offset-reset values of :earliest or :latest according to its README, but the user guide says "and you’ve also set :kafka/offset-reset to :largest".
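For reference, a hedged catalog-entry fragment using the values from the onyx-kafka README (:earliest / :latest); the topic name, batch size, and task name here are illustrative, not from the original job:

```clojure
{:onyx/name :read-commands
 :onyx/plugin :onyx.plugin.kafka/read-messages
 :onyx/type :input
 :onyx/medium :kafka
 :kafka/topic "commands"             ; hypothetical topic name
 :kafka/zookeeper "127.0.0.1:2188"
 :kafka/offset-reset :latest         ; README values: :earliest or :latest
 :kafka/deserializer-fn :onyx.tasks.kafka/deserialize-message-json
 :onyx/batch-size 10
 :onyx/max-peers 1}
```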
I'm getting a rocksdb error that the tmp file is not found. I didn't see examples configuring rocksdb so I wonder if I should set it up somehow or maybe my task name confuses it: org.rocksdb.RocksDBException: IO error: /tmp/rocksdb_filter/e268acd9-2230-48b6-b79c-2dfd74c86bed_:my.task/process-commands: No such file or directory
Yep, I changed the task names to be non-qualified keywords like :process-commands instead of :my.ns/process-commands and now it works. Is it something you want to fix? I like using qualified keywords :(
A fix for that would be very welcome.
I believe we’d just have to ensure that task-id
was something acceptable as a file path https://github.com/onyx-platform/onyx/blob/75a48ce99f7d9cc829ea27922d987617abc7c5ac/src/onyx/state/filter/rocksdb.clj#L61
@gardnervickers Don't you think task-id should be any keyword, and implement escaping for file name?
I believe it’s just used for keying Rocksdb buckets to each leaf aggregation in a job, so it’s probably suitable to just drop anything that would make it an invalid path, or even take the hash of the key and use that.
The former would be preferable in case you ever wanted to go inspect the buckets yourself
I vote for hashing task-id 😄
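A sketch of the two options discussed above: stripping path-hostile characters from the task-id versus hashing it. These function names are illustrative, not Onyx's actual implementation.

```clojure
(ns sketch.task-id
  (:require [clojure.string :as str]))

(defn sanitize-task-id
  "Drop characters that are invalid in a file path, e.g.
   :my.ns/process-commands -> \"_my.ns_process-commands\".
   Readable, so you can still inspect the buckets on disk."
  [task-id]
  (str/replace (str task-id) #"[^\w.\-]" "_"))

(defn hash-task-id
  "Alternatively, hash the task-id so any keyword is path-safe,
   at the cost of human-readable bucket names."
  [task-id]
  (str (hash task-id)))
```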
The RocksDB DB file is deleted when the filter is closed
When using embedded zookeeper and kafka I'm getting this error in onyx.log. Any ideas as to why? Configuration is pasted after the error:
clojure.lang.ExceptionInfo: Could not locate any Kafka brokers to connect to.
recoverable?: true
zk-addr: "127.0.0.1:2188"
clojure.lang.ExceptionInfo: Caught exception inside task lifecycle. Rebooting the task. -> Exception type: clojure.lang.ExceptionInfo. Exception message: Could not locate any Kafka brokers to connect to.
job-id: #uuid "31d71252-1104-4718-b337-e530cfa138db"
metadata: {:job-id #uuid "31d71252-1104-4718-b337-e530cfa138db", :job-hash "3c3ec58f7e276b81cbbea96a8992741bb92a9409a83e4e8424276e275e147"}
peer-id: #uuid "d66319a4-8624-40a3-b3e9-d15867c55ca3"
recoverable?: true
task-name: :read-commands
zk-addr: "127.0.0.1:2188"
(def tenancy-id #uuid"58582e9e-673e-4646-a07c-9fdd26d3d2eb")
(def peer-config
{:zookeeper/address "127.0.0.1:2188"
:onyx/tenancy-id tenancy-id
:onyx.peer/job-scheduler :onyx.job-scheduler/balanced
:onyx.messaging/impl :aeron
:onyx.messaging/peer-port 40200
:onyx.messaging/bind-addr "localhost"})
;; And later in component system map:
:onyx-env
(onyx-env
{
:zookeeper/server? true
:zookeeper/address "127.0.0.1"
:zookeeper.server/port 2188
:onyx.bookkeeper/server? true
:onyx.bookkeeper/delete-server-data? true
:onyx.bookkeeper/local-quorum? true
:onyx.bookkeeper/local-quorum-ports [3196 3197 3198]
:onyx/tenancy-id tenancy-id})
:embedded-kafka
(component/using
(embedded-kafka
{:hostname "127.0.0.1"
:port 90922
:broker-id 0
:num-partitions 1
:zookeeper-addr "127.0.0.1:2188"})
[:onyx-env])
:onyx-peer-group
(component/using
(onyx-peer-group peer-config)
[:onyx-env])
:onyx-virtual-peers
(component/using
(onyx-virtual-peers 5)
[:onyx-peer-group])
:onyx-job
(component/using
(onyx-job peer-config (em-job))
[:onyx-virtual-peers :embedded-kafka])
Can you connect to it from some other tool like kafkacat?
I saw an error that the port was illegal, so I tried 9094. I still can't connect to the embedded Kafka. I tried using Kafka's CLI.
Yeah, sorry, I don’t know much about embedded Kafka; we’ve found it definitely runs better in a container
Right now I can't use Docker. Do you know how I can make the real ZooKeeper installed on my machine forget its Onyx state?
For some reason I changed the job map, restarted, and it still runs the old one even though I don't submit an ID.
Onyx namespaces itself by tenancy id, so you can either delete that in ZK or change your tenancy id
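A sketch of the two reset options just mentioned, assuming Onyx's default ZooKeeper prefix of /onyx/<tenancy-id> and the ZK address from the config above:

```clojure
;; Option 1: delete the old state in ZooKeeper, e.g. with its CLI:
;;   bin/zkCli.sh -server 127.0.0.1:2188 rmr /onyx/<tenancy-id>
;; (`rmr` is renamed `deleteall` in newer ZooKeeper releases)

;; Option 2: simply start over with a fresh tenancy id, so the peers
;; see a clean namespace and never find the old job:
(def tenancy-id (java.util.UUID/randomUUID))
```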
Yea if you don’t cancel the job it’ll keep trying to run
Or use the with-test-env macro which clears out in-mem ZK for you
Just by restarting the in-mem ZK component
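The with-test-env macro mentioned above lives in onyx.test-helper; a hedged usage sketch (the peer count, configs, and job are illustrative):

```clojure
(require '[onyx.test-helper :refer [with-test-env]]
         '[onyx.api])

;; Boots an in-memory env and peer group with 3 virtual peers, then
;; tears everything down (clearing the in-mem ZK) when the body exits.
(with-test-env [test-env [3 env-config peer-config]]
  (onyx.api/submit-job peer-config job)
  ;; run and await the job here
  )
```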
I'll try in-memory zk for onyx and real kafka for plugin, and using my own env component to do the shutdown.
Hmm I’m actually not sure if in-mem ZK allows external connections.
If you can’t use Docker then maybe try to replicate its functionality by having a set of setup/teardown scripts for local ZK and Kafka?
Also, the kafka plugin uses zookeeper to find brokers, and my real kafka uses real zk
Yes, you need to use the same ZK
I would just add a hook to kill the job
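The kill hook can use the public API: submit-job returns a map containing the :job-id, which kill-job accepts. A sketch, assuming peer-config and a job map as defined earlier:

```clojure
;; submit-job returns a map that includes :job-id
(let [{:keys [job-id]} (onyx.api/submit-job peer-config job)]
  ;; e.g. in a JVM shutdown hook or a component's stop fn:
  (onyx.api/kill-job peer-config job-id))
```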
All good now. Thanks @gardnervickers
You can supply it with a log4j file
@yonatanel It buckets the segment into the nil group.