#onyx
2016-06-12
devth00:06:03

Has anyone used onyx with GCP Pub/Sub? I'm not seeing any open source plugins.

greywolve12:06:34

"New feature: Jobs now support metadata by including a metadata map with job submission data e.g. {:workflow ... :catalog ... :job-metadata {:name "myjobname"}}. This makes it easier to correlate information about jobs, query the cluster for jobs, etc." That's from the change log. Shouldn't the key name for metadata just be :metadata, like it is in the metrics README?

lucasbradstreet12:06:41

Indeed it should be. Did you find that in changes.md?

lucasbradstreet12:06:53

@devth: no GCP pub/sub users yet, as far as I know

devth15:06:39

Ok cool. Might be fun to write one myself 🙂

devth15:06:06

I'm just getting ready to run a peer in prod. I've looked through the prod checklist and tested a workflow locally. I'm a little confused on the generated project startup options:

"  start-peers [npeers]    Start Onyx peers."
"  submit-job  [job-name]  Submit a registered job to an Onyx cluster."
By default the Dockerfile is starting the peer with:
/opt/jdk/bin/java -cp /opt/peer.jar lab79.idx.core start-peers "$NPEERS" -p :docker
What's the intended usage here? Do I let Docker start it up, then attach a shell to my running container and submit a job? (Btw, my job is meant to be long-running and always running as long as my Onyx cluster is up).

devth15:06:03

Where is the correct place to configure things like Datomic uri and ElasticSearch uri? :env-config?

lucasbradstreet16:06:07

We’ve been putting a lot of our job config in the config.edn, and then reading it out via aero (which can handle env variables)
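
A rough sketch of that setup, for readers following along. The key names, variable names and URIs below are placeholders, not values from this conversation; `#env` and `#or` are aero's reader tags for environment variables and fallbacks. The helper function name is hypothetical.

```shell
# Hypothetical settings; the variable names and URIs are placeholders.
export DATOMIC_URI="datomic:dev://localhost:4334/example"
export ELASTICSEARCH_URI="http://localhost:9200"

# Write a sample config.edn. aero's #env tag reads an environment
# variable when the config is loaded, and #or supplies a fallback
# when the variable is unset.
write_sample_config() {
  cat > "${1:-config.edn}" <<'EOF'
{:datomic-uri       #env DATOMIC_URI
 :elasticsearch-uri #or [#env ELASTICSEARCH_URI "http://localhost:9200"]}
EOF
}

write_sample_config config.edn
```

On the Clojure side the file would then be loaded with something like `(aero.core/read-config "config.edn")`, and the resulting map passed along to the job registration code.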

devth16:06:55

Cool. I noticed register-job gets passed the config so I think that'll work.

devth16:06:10

Why does the onyx template use FROM anapsix/alpine-java:jre8 in the Dockerfile as opposed to the official Java image FROM java:8-jre-alpine?

devth16:06:00

☝️ possibly a question for @gardnervickers ?

lucasbradstreet16:06:58

Yep, that’s a question for Gardner

gardnervickers16:06:40

@devth: no particular reason, I didn't know there was an official alpine Java container. I'm out and about now but make an issue and I'll swap over

devth16:06:26

@gardnervickers: will do! Main reason I ask is because the official image keeps up with Alpine better than anapsix does, and alpine:3.4 is required to work properly with Kubernetes internal DNS

devth16:06:50

That PR is against the 0.9.x branch. Should I PR against master instead?

lucasbradstreet17:06:09

master is better

devth17:06:17

Trying to run in prod, hitting:

Starting peer-group
16-Jun-12 16:57:07 idx-2696581386-dc3rt INFO [onyx.static.logging-configuration] - Starting Logging Configuration
16-Jun-12 16:57:07 idx-2696581386-dc3rt INFO [onyx.messaging.aeron] - Starting Aeron Peer Group

***
*** Failed to connect to the Media Driver - is it currently running?
***
16-Jun-12 16:57:08 idx-2696581386-dc3rt FATAL [onyx.system] -
                           lab79.idx.core.main
                                           ...
                          lab79.idx.core/-main        core.clj:   66
                          lab79.idx.core/-main        core.clj:   76
                      lib-onyx.peer/start-peer        peer.clj:    7

devth17:06:27

@lucasbradstreet: ok I'll redo the PR

devth17:06:44

This is using the generated Dockerfile from onyx-template

lucasbradstreet17:06:51

the docker container should be starting an aeron media driver as a service

devth17:06:54

Keeps retrying until it crashes

devth17:06:51

I'll just switch to embedded aeron for now

devth17:06:47

Weird:

java.io.IOException: No space left on device
clojure.lang.ExceptionInfo: Error in component :messaging-group in system onyx.system.OnyxPeerGroup calling #'com.stuartsierra.component/start
     component: #<Aeron Peer Group>
      function: #'com.stuartsierra.component/start
        reason: :com.stuartsierra.component/component-function-threw-exception
        system: onyx.system.OnyxPeerGroup{:config {:onyx/tenancy-id "idx", :onyx.messaging/allow-short-circuit? false, :onyx.messaging/impl :aeron, :onyx.log/config {:level :info}, :onyx.messaging/peer-port 40200, :zookeeper/address "zookeeper:2181", :onyx.peer/job-scheduler :onyx.job-scheduler/greedy, :onyx.messaging.aeron/embedded-driver? true, :onyx.peer/zookeeper-timeout 60000, :onyx.messaging/bind-addr "localhost"}, :logging-config #<Logging Configuration>, :messaging-group #<Aeron Peer Group>}
    system-key: :messaging-group

Exception in thread "main" java.io.IOException: No space left on device

devth17:06:10

I've got plenty of space

devth17:06:13

running on Kubernetes

lucasbradstreet17:06:19

try starting docker run with a bigger --shm-size
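
For reference, a minimal sketch of what that could look like with plain Docker. The image name `my-onyx-peer`, the `512m` size and the helper function are placeholders assumed for illustration; `--shm-size` itself is a standard `docker run` flag. The function only prints the command so it can be reviewed before running.

```shell
# Placeholder values; adjust for your own deployment.
IMAGE="my-onyx-peer"   # hypothetical image name
SHM_SIZE="512m"        # assumed size, chosen only for illustration

# Print the docker command instead of executing it.
peer_run_cmd() {
  echo "docker run --shm-size=$SHM_SIZE $IMAGE"
}

peer_run_cmd
```

Inside the running container, `df -h /dev/shm` shows whether the larger size took effect.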

devth17:06:32

Not sure if K8S exposes that but I'll check

devth17:06:52

I don't think it's possible

gardnervickers17:06:39

It is possible with Kubernetes; mounting the host's /dev/shm inside the container worked for me in the past

devth17:06:40

I mean setting an explicit --shm-size doesn't appear to be possible

devth17:06:58

It looks like it is mounting but immediately running out of space. Can I tell aeron to use a different location?

devth17:06:14

Also wondering if giving more resources to the pod would help

devth17:06:23

I think it's defaulting to 64m. Inside my pod, if I run df -h I see:

shm                      64.0M     64.0M         0 100% /dev/shm

devth17:06:23

gardnervickers: oh you're saying I should mount /dev/shm as a volume?

devth17:06:07

That would probably work. I just ssh'd to the underlying VM and it has:

Filesystem      Size  Used Avail Use% Mounted on
tmpfs           1.5G  648K  1.5G   1% /run/shm

gardnervickers17:06:05

Yeah, mount it as an emptyDir

devth17:06:22

oh I just mounted it as a hostPath

devth17:06:26

to use the underlying VM's

devth17:06:35

not a good idea?

devth17:06:47

Cool. It started up. Thanks!

devth18:06:43

New exception (after removing old pods and bringing up new ones):

java.lang.IllegalStateException: aeron cnc file version not understood: version=0
java.lang.IllegalStateException: Could not initialise communication buffers
     clojure.lang.ExceptionInfo: Error in component :messaging-group in system onyx.system.OnyxPeerGroup calling #'com.stuartsierra.component/start
     component: #<Aeron Peer Group>
      function: #'com.stuartsierra.component/start
        reason: :com.stuartsierra.component/component-function-threw-exception
        system: onyx.system.OnyxPeerGroup{:config {:onyx/tenancy-id "idx", :onyx.messaging/allow-short-circuit? false, :onyx.messaging/impl :aeron, :onyx.log/config {:level :info}, :onyx.messaging/peer-port 40200, :zookeeper/address "zookeeper:2181", :onyx.peer/job-scheduler :onyx.job-scheduler/greedy, :onyx.messaging.aeron/embedded-driver? true, :onyx.peer/zookeeper-timeout 60000, :onyx.messaging/bind-addr "localhost"}, :logging-config #<Logging Configuration>, :messaging-group #<Aeron Peer Group>}
    system-key: :messaging-group
Could this be Aeron not cleaning up after itself?

devth18:06:01

Switched /dev/shm to emptyDir so it wasn't leaving state around on the host's shm. Seems to have fixed it.
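
The emptyDir approach might look roughly like the manifest below. The pod name, container name, image and file name are placeholders, and `medium: Memory` (a tmpfs-backed emptyDir) is an assumption; the conversation doesn't say whether it was set. The helper function is hypothetical and only writes the file.

```shell
# Write a hypothetical pod manifest; names and image are placeholders.
write_peer_manifest() {
  cat > "${1:-onyx-peer-pod.yaml}" <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: onyx-peer
spec:
  containers:
    - name: peer
      image: my-onyx-peer
      volumeMounts:
        - name: shm
          mountPath: /dev/shm   # replaces the container's default shm mount
  volumes:
    - name: shm
      emptyDir:
        medium: Memory          # assumed: back the volume with tmpfs
EOF
}

write_peer_manifest onyx-peer-pod.yaml
# Then apply it with: kubectl apply -f onyx-peer-pod.yaml
```

Because an emptyDir is created fresh for each pod and removed with it, no Aeron files are left behind for the next pod, unlike with a hostPath mount.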