2016-02-01
thank you, @michaeldrogalis - that’s very helpful
@michaeldrogalis: we have a busy time ahead, so we're probably going to go with balanced for now, but i'll see what i can do to get you a reproducible case when i can for sure.
@greywolve: the problem was explicitly with colocated?
And it never starts the job?
Ok cool. I'll see if I can quickly reproduce it. Definitely an alpha feature for now I guess
Cool, that helps
I’m not able to easily reproduce it, so we’ll probably need more details later
interesting set of exceptions we got recently
system continues to run
Those look a lot like a deploy that was bad and was fixed, or a jar that was incomplete.
The only time I've seen class-not-found issues like those has been when I lein installed over a running jar
Maybe the daemon that starts the uberjar tried to start it up before it was fully uploaded and then tried again and didn't crash
ok, Lucas, i think we’re very close to a multi-node cluster. there are many variables, but i’ve got it all the way to the point where it submits the job and then doesn’t process anything
on both servers, the onyx log shows many Starting ZK connections in a row, followed by one Stopping ZK connection message
on both servers, the zk log has many of these: Got user-level KeeperException when processing sessionid:0x1c529c38d6770028 type:create cxid:0x13 zxid:0x900000770 txntype:-1 reqpath:n/a Error Path:/onyx/highstorm/origin/origin Error:KeeperErrorCode = NodeExists for /onyx/highstorm/origin/origin
i’ve already confirmed that the ZK cluster is up - i can see one is Leader and one is Follower, and watching A’s log while cycling B has an effect, ditto B while watching A
i’m a little unsure as to where to look next for clues
I think that error is ok. It doesn't look like you're versioning your onyx/id though.
I don't recommend using the same onyx/id between deploys
i’m very sure there are sufficient peers - each node starts 40, which is the total for all the tasks in the system
ah, that’s good information, thank you
should be ok if we cleanly stop and start all instances, right?
The reason for not sharing the onyx/id is that it needs to play back the log which may not be compatible with your onyx version, plus you can end up with different versions of your jar communicating with each other
Correct
One approach is to use a git sha or uberjar md5
You could prepend highstorm to it
yep, we’ll amend config in circleci script so that we have a history to refer back to
You could also prepend the circle build number so everything is sorted nicely
ok. ZK running. aeron running. jobs start. no exceptions anywhere. onyx and zk logs clear. plenty of peers started. it just doesn’t do anything.
what else can i check?
true, although the git sha is better, i think, because it relates to the code and not the build
yeah, you can do both
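A minimal sketch of what combining the two might look like, assuming CircleCI’s standard CIRCLE_BUILD_NUM / CIRCLE_SHA1 env vars and this app’s “highstorm” prefix (neither is anything Onyx prescribes):
```clojure
;; Hedged sketch: build a versioned onyx/id from the CI build number and git
;; sha so each deploy plays back its own log in ZooKeeper. CIRCLE_BUILD_NUM
;; and CIRCLE_SHA1 are CircleCI's standard env vars; the "highstorm" prefix
;; and the idea of assoc'ing this into the config are this app's convention.
(defn versioned-onyx-id []
  (let [build (or (System/getenv "CIRCLE_BUILD_NUM") "dev")
        sha   (or (System/getenv "CIRCLE_SHA1") "local")]
    (str "highstorm-" build "-" (subs sha 0 (min 7 (count sha))))))

;; e.g. (assoc env-config :onyx/id (versioned-onyx-id))
;;      (assoc peer-config :onyx/id (versioned-onyx-id))
```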
Are any metrics being output?
no metrics
I assume it says some log messages like this: "16-Jan-13 23:16:22 mbr INFO [onyx.peer.task-lifecycle] - [cf9ef6b1-bf1c-46aa-ad7a-9b25f8cb07be] Enough peers are active, starting the task"
None at all huh
nope, i don’t see those messages
OK, it sounds like the job hasn’t been started
Did the submit job get submitted to the right onyx/id?
yes, the onyx/id is currently hard-coded in the config file to "highstorm"
used throughout
it starts lots and then one ZK conn closes right at the end
is that significant?
`16-Feb-01 04:48:49 http://hs1.cognician.com INFO [onyx.log.zookeeper] - Starting ZooKeeper client connection. If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper.`
`16-Feb-01 04:48:53 http://hs1.cognician.com INFO [onyx.log.zookeeper] - Stopping ZooKeeper client connection`
Sounds like lots of peers starting up and a final submit job
Weren’t you doing the submit job as part of the startup process before?
it stops the ZK for me locally, and then the "Enough peers..." messages flood the log
yes; we still are
If so, are you doing a single submit job on one node now?
Otherwise you might be starting up three jobs
that’s why
I don’t know why one doesn’t get started though
each node starts 40 peers, the job requires 40 peers, and we start it twice
(because both start the job on startup)
so i would think that they’d still work
Yeah, I think it should still work
It’s my best guess though. Hrm.
ok. i’m going to down one onyx instance and get the other working first
one onyx + a 2 node zk cluster should work just fine, right
so, while i do that, how do you control which server submits the jobs
you have whatever is doing the deploy do the submit
we’d have to provide alternate config to one
in a separate process
sounds like a custom AWS CodeDeploy script
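One way to make “whatever is doing the deploy does the submit” concrete - a hedged sketch where only a node started with a flag submits; SUBMIT_JOBS and build-job are hypothetical names, onyx.api/submit-job is the real call:
```clojure
;; Sketch: guard job submission behind an env var so that when several peer
;; nodes start from the same uberjar, only the designated one submits.
(require '[onyx.api])

(defn maybe-submit-job!
  "Submits the job only if SUBMIT_JOBS=true in this process's environment.
   `job` is the job map built elsewhere (hypothetical build-job fn)."
  [peer-config job]
  (when (= "true" (System/getenv "SUBMIT_JOBS"))
    (onyx.api/submit-job peer-config job)))
```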
Are you running it with a new onyx/id just in case?
got a couple uk.co.real_logic.aeron.exceptions.DriverTimeoutException: Driver has been inactive for over 10000ms
on stopping the service, and then it hung for over a minute on kill-job
Otherwise all your jobs will still be scheduled
eventually kill -9’d it
i’ll hup ZK too
ZooKeeper is probably safe
i restarted it to discard scheduled jobs
ok. so maybe running ZK, Onyx, Aeron on a 2 core box aint such a good idea.
Since you discarded your ZK, make sure you set the start-tx in the log reader
This may just be testing anyway.
yeah i’ve coded it to go back 2 days in the datomic tx log
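A rough sketch of what “go back 2 days in the datomic tx log” could look like, using Datomic’s real Log API (datomic.api/log and tx-range); how the resulting t gets fed into the Onyx input task’s start point is app-specific here and not shown:
```clojure
;; Find the t of the first transaction at or after (now - 2 days); tx-range
;; accepts a java.util.Date as the start of the range.
(require '[datomic.api :as d])
(import '(java.util Date))

(defn start-t-two-days-back [conn]
  (let [two-days-ago (Date. (- (System/currentTimeMillis)
                               (* 2 24 60 60 1000)))]
    (-> (d/tx-range (d/log conn) two-days-ago nil)
        first
        :t)))
```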
@lucasbradstreet: another bit of info
although the job isn’t running (no task logging output), the system is super busy
these CPU numbers are representative of the last 20 mins at least
what might it be doing, if not processing :input?
@robert-stuttaford: is it possible your metrics are broken?
i.e. pointing at the wrong server?
Or something like that
that is possible
gosh. now i feel a fool. it is wrong. i’ll fix that and let you know how i go
Heh, it was starting to be the only explanation
used to be on the same node, then moved it
I'm surprised it didn't throw / log an exception on connect after a while
i can show you a nice big screenshot of all the tails if you like
Might as well. There might be a fix needed in onyx-metrics
You're still using the riemann sender, right?
slight variant for DataDog’s impl
This is a sender your team built? Figured you never ended up building it since we were going to put it in onyx-metrics at some point
yes, our own sender
K, my guess is a future without exception handling in it then
In the sender
well, this is why we have a test setup - to figure out all the grizzlies
Always a little bit satisfying when it's not our fault :p
it very often isn't
Futures are a good way to find out you don't handle your exceptions very well :). Been there many times
haha funny i wrote a note to myself to always consider that, but somehow still forgot to catch other exceptions there and log them, doh
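A minimal sketch of the failure mode and the fix, assuming the sender runs in a future and that timbre (already used by the template) is available for logging; send-fn and metrics-ch are hypothetical names:
```clojure
;; A bare (future (send-fn ...)) swallows any exception until the future is
;; deref'd, so the sender dies silently. Wrapping the body in try/catch and
;; logging the throwable makes the failure visible.
(require '[taoensso.timbre :as timbre])

(defn start-sender! [send-fn metrics-ch]
  (future
    (try
      (send-fn metrics-ch)
      (catch Throwable t
        (timbre/error t "metrics sender died")))))
```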
@lucasbradstreet: do you guys want a dogstatsd specific sender? otherwise we'll probably just open source it separately
Ah, if it's just datadog flavored statsd you guys feel free to open source it separately
I guess it'd be handy to have it on call if needed
@lucasbradstreet, the onyx/start-job fn returns nil
nevermind
i’m a dork
@robert-stuttaford: Over the hurdle?
not yet
quite stuck, actually
going to take a step back and make a fresh branch from known-good and apply changes and test bit by bit
Sounds good
New template is here! https://twitter.com/MichaelDrogalis/status/694202368095727616
There's a couple more steps but it makes for good copy ;)
Heh, indeed. Read the instructions in README.md after docker-compose is up.
I wonder how easy it would be to make some of that behavior not live in templates. I mean, I like templates and I get why that’s a great starting point for people, but now it’s also not clear if I want to backport changes to the thing I already created with the template or backport changes from the new template to the thing I already have 😄
there have been 46 commits in the last week, even if they’re small, that’s hard to keep track of
@lvh: There's not much in the way of feature-level behavior in the template. It's mostly an example of idioms that we find most helpful, and a suggestion about how to structure the project. I'd say give the new template a read and decide if you like the way it's set up.
As much as possible, we moved sharable behavior into https://github.com/onyx-platform/lib-onyx.
michaeldrogalis: Sure. I’m looking at changes like:
https://github.com/onyx-platform/onyx-template/commit/40c9f84d86d3ec711f4cd3a30c8eda28603f57b9
(which feels like a lein plugin waiting to happen)
or:
https://github.com/onyx-platform/onyx-template/commit/5a17f083c30d595965b892b8a95f4502e2dc52ac
(where it’s not clear why I want that and what other changes I need to backport for that to work)
or changes to:
https://github.com/onyx-platform/onyx-template/blob/0.8.x/src/leiningen/new/onyx_app/script/run_peers.sh
Where I still had the old version that runs two `exec java`s and it’s not clear to me which one I might want and why
I’m trying to backport things as they make sense since that’s probably more sustainable than just restarting every time the template updates
Tbh I'm trying to remain relatively hands off - time to see where other people run with the operational aspects. But he and @lucasbradstreet can give some insight
I agree with the general sentiment though, it's the drawback of templating. I would backport as well once I had a handle on it.
@lvh: The changes to run_peers are to allow docker to cleanup properly on shutdown https://blog.phusion.nl/2015/01/20/docker-and-the-pid-1-zombie-reaping-problem/
@lvh: I totally agree with you. It’s going to take a bit more time to find the set of conventions that are universal across Onyx apps, and developing tooling to handle that will come.
I believe that the idioms put forward with this new template are general enough as to allow us flexibility while writing tools to manipulate your onyx jobs.
The eventual goal would be to get to something ruby on rails like.
onyx new job --input sql --output kafka
or similar
Sure! I wanna make clear that I’m not criticizing these efforts, and I’m super glad and thankful that you’re working on these universal conventions
Thanks for testing it out! I welcome harsh criticism, it’s important to get this right for a wide range of use cases.
Is https://github.com/onyx-platform/onyx-template/commit/5a17f083c30d595965b892b8a95f4502e2dc52ac literally all I need to port for Bookkeeper to work? If so, why do I want bookkeeper?
You can use Onyx w/out BookKeeper, Bookkeeper is used for persistence with state aggregation (windowing)
Essentially if you want to use the stateful windowing stuff, you need bookkeeper
I’d be pretty happy if we can make the dev env for my project just always be docker compose, and have that expose a REPL or something for local development; that feels like having fewer different things around
Not sure how familiar with docker-compose you are (i’m brand new, so I think this is cool) but you can do docker-compose scale peer=3 to create multiple containers, start a job, and kill off containers to test Onyx’s failover behavior
another reason I care about compose is that I work for rackspace, and we have a thing called carina
which means that if you can get your thing to run on docker compose, we can probably get you a bare metal machine that it works on, too
Is it similar to Kubernetes/Marathon?
It implements the docker api?
Ahh, that's so cool!
Gotta run for real now, back later
https://getcarina.com/
Sweet i’ll read up on it. My plan was to add options for +kubernetes etc. to generate the templates for you. +carina would be great to have too
the good news is if it just uses docker compose, you almost certainly need to write 0 code for that to happen
Yea thats interesting. Does it handle the networking for you like docker-compose does?
it’s docker-swarm under the hood, except it’s smart about multi-phys-host and multi-segment
Ok that makes sense
Wow thats really interesting
and when I say “rack” that sometimes means “physical rack, you know, with computers” and sometimes that means “rackspace” since, well, we’re operating it
they also figured out how to make it work hypervisorlessly, which is a nice performance boost
I’ve not used any of the docker/cluster management tools in a year or so but how easy is it to request volumes?
I remember that being a bit wonky when I was using Kubernetes, sometimes it would not work and fail silently on GCE
it continues to be a huge weak spot for docker, and it’s clear that the Thing You Should Use(TM) is “external services for storage”, obviously that doesn’t work all of the time
for better or worse, carina has very clearly chosen to interop directly with what docker provides, and not, unless absolutely necessary, write a proprietary alternative
unfortunately that means the tools are what docker gives you, and I’d be lying if I said those were perfect
gotcha
carina was working on (and I think this is done now?) giving you cloud block storage that you can attach as a volume, which essentially solves that problem
Yea that’s what Kubernetes does
but last I checked it was only possible on the GCE platform
Ohhh nice
No, but a client of mine that’s currently on RackSpace private cloud is looking at running a bunch of their services as docker containers. I saw that you guys have OpenStack as an option instead of ESXi for the private cloud stuff.
Carina+OpenStack might be a good option for them
I was going to say “you don’t want to run your own openstack, but we’ll totally run that for you” but that sentence sounded like I was being an awful corporate shill
but yes, we will totally run an openstack for you and that’s probably a good deal, turns out running clouds is p hard
and openstack is not optimized for the “I just want to mess around with this right now” audience
it’s good!
herrwolfe, sirsean and reaperhulk (not here yet, I don’t think) are folks on my team, FWIW
Hello! Thanks for the introduction. Makes it much easier when I know which people are sharing the same problems.
I’m currently trying to get the docker-compose thing running to get data through Kafka (and in this demo into MySQL). Can’t really tell where I’m hung up … submitting jobs seems to work but nothing appears in the database. Is there an obvious way to turn on more logging?
@sirsean: I assume you're looking at onyx.log?
@gardnervickers made this helpful walkthrough, also. http://recordit.co/OxM66e0kG8
Hey folks!
@sirsean: Hey, did you setup the db table?
gardnervickers: yep, I thought that was my problem (since I didn’t do it the first time) but now that it’s there it remains empty.
michaeldrogalis: the onyx.log file doesn’t seem to be getting updated, which makes sense since I’m running inside Docker (I believe it’s only there because I once ran the dev mode outside Docker).
onyx.log is replicated to stdout when using docker-compose
@sirsean: What do you see in your docker-compose logs?
Really? odd.. is the container for the onyx peer running if you do docker ps
You should be seeing something like
peer_1 | Attempting to connect to to Zookeeper: zk:2181
peer_1 | Started peers. Blocking forever.
in the docker-compose logs
Launched the Media Driver. Blocking forever...
16-Feb-01 21:28:44 f32b7bc67204 INFO [onyx.static.logging-configuration] - Starting Logging Configuration
16-Feb-01 21:28:44 f32b7bc67204 INFO [onyx.messaging.aeron] - Starting Aeron Peer Group
Attempting to connect to to Zookeeper: zk:2181
Started peers. Blocking forever.
Yea that’s fine
that means the cluster is up and waiting for a job
If you go and create the DB table
Then you can submit a job to ZK with
ZOOKEEPER=$(echo $DOCKER_HOST|cut -d ':' -f 2|sed "s/\/\///g") lein run -m app-name.jobs.sample-submit-job
Then you should get segments flowing into your DB
There will be some chatter from ZK if that’s what you mean
That’s how I had been submitting the job. I see ZK things happen and then no logs from the peer.
But do you see results accumulating in your DB?
Perhaps relevant, the kafkacat logs:
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
It could totally be kafkacat, it’s just piping the results of curl to stdin on kafkacat and sometimes it breaks.
I would docker-compose rm to delete the containers and re-make them. Need to make a proper service that restarts on failure for kafkacat
Got this error in the middle of building kafkacat:
== running CMake in build directory
./configure: 41: ./configure: cmake: not found
The "cmake" program is required to configure yajl.
It's available from most ports/packaging systems and
Build of libyajl FAILED!
Failed to build libyajl: JSON support will probably be disabled
Building kafkacat
./bootstrap.sh: line 65: pkg-config: command not found
Using -lpthread -lz -lrt -lpthread -lz -lrt for rdkafka
./bootstrap.sh: line 65: pkg-config: command not found
grep: tmp-bootstrap/usr/local/lib/pkgconfig/yajl.pc: No such file or directory
Using for yajl
Yea I think so, i’ll look into it.
Oh sorry that’s not an issue
Can you make a gist with the output from docker-compose logs
after you have deleted your old containers (`docker-compose rm`)
You mean do a docker-compose rm, then docker-compose up, then docker-compose logs, and show the output?
Yea if you dont mind
just to eliminate possibilities here 😕
What OS are you on @sirsean?
I'm on OS X 10.9.5. Just a datapoint
95% of the problems I have with this setup is the kafkacat container.
looks good
create the table and submit the job, then update the logs please? Thanks for the help
$ ZOOKEEPER=$(echo $DOCKER_HOST|cut -d ':' -f 2|sed "s/\/\///g") lein run -m desdemona.jobs.sample-submit-job
16-Feb-01 16:28:43 lips.local INFO [onyx.log.zookeeper] - Starting ZooKeeper client connection. If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper.
16-Feb-01 16:28:43 lips.local INFO [onyx.log.zookeeper] - Stopping ZooKeeper client connection
Submitted job: #uuid "d953c935-6d27-4d52-8bc5-5e9e2bd4f018"
Not ignoring, trying to recreate
I’m wondering if I should try to hook up a different source for Kafka that doesn’t use kafkacat. (It’s not like my actual app is going to use this, haha.)
@sirsean: I did find a bug with our logger. For some reason the default timbre logger was not logging our user namespaces
(defn standard-out-logger
  "Logger to output on std-out, for use with docker-compose"
  [data]
  (let [{:keys [output-fn]} data]
    (println (output-fn data))))

(defn -main [n & args]
  (let [n-peers (Integer/parseInt n)
        ;; assumes clojure.java.io is required with the alias io
        config (read-config (io/resource "config.edn") {:profile :default})
        peer-config (-> (:peer-config config)
                        (assoc :onyx.log/config {:appenders {:standard-out
                                                             {:enabled? true
                                                              :async? false
                                                              :output-fn t/default-output-fn
                                                              :fn standard-out-logger}}}))
        peer-group (onyx.api/start-peer-group peer-config)
        env (onyx.api/start-env (:env-config config))
        peers (onyx.api/start-peers n-peers peer-group)]
    (println "Attempting to connect to to Zookeeper: " (:zookeeper/address peer-config))
    (.addShutdownHook (Runtime/getRuntime)
                      (Thread.
                        (fn []
                          (doseq [v-peer peers]
                            (onyx.api/shutdown-peer v-peer))
                          (onyx.api/shutdown-peer-group peer-group)
                          (shutdown-agents))))
    (println "Started peers. Blocking forever.")
    ;; Block forever.
    (<!! (chan))))
Can you change your launch-prod-peers to look like that
Then, you will see if any segments are actually flowing through onyx after you docker-compose rm
, ./script/build.sh
, docker-compose up
You rebuilt the containers, correct?
(Had to go make the DB table again, but I don’t think that would’ve been a problem since nothing gets to that point.)
@gardnervickers: Could this be that obscure bug that popped up where the containers cant talk to the internet?
When the DNS settings are not transferred to the container by the docker environment
Not really sure, Onyx is starting up fine it seems.
Er, I meant maybe that’s why kafkacat isn’t pumping in messages
I’m rebuilding the kafkacat image with dig in the script.sh to see if it’s able to even connect.
Good call
@michaeldrogalis: sorry yea I got that, thinking out loud.
My last resort is to usually delete the docker images associated with the docker-thing im doing 😕
It's okay, hard to escape Docker being finicky on everyone's machine
kafkacat_1 | ;; ANSWER SECTION:
kafkacat_1 | . 299 IN A 104.16.49.168
kafkacat_1 | . 299 IN A 104.16.50.168
kafkacat_1 | . 299 IN A 104.16.52.168
kafkacat_1 | . 299 IN A 104.16.53.168
kafkacat_1 | . 299 IN A 104.16.51.168
kafkacat_1 |
kafkacat_1 | ;; Query time: 34 msec
kafkacat_1 | ;; SERVER: 8.8.8.8#53(8.8.8.8)
Can you delete your peer, kafkacat, kafka and zk images?
Sorry to go nuclear but I have no idea what else it could be
I’m more interested in figuring out why even after you made the code changes and rebuilt, you were not seeing any logs from the peer
I’m going to revert that commit that was preventing proper logging, though. Maybe @michaeldrogalis can push the update?
I have to run for a few, @sirsean I pushed the changes to https://github.com/onyx-platform/onyx-template but I cannot do a deploy. If you want you can clone, lein install, and lein new onyx-app my-app-name +docker.
Or wait for @michaeldrogalis to deploy
That should get you output looking like this
zookeeper_1 | 2016-02-01 23:14:38,737 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.17.0.6:47432 which had sessionid 0x1529f108fe10083
peer_1 | 16-Feb-01 23:14:41 f45c733dc20b INFO [onyxapp.lifecycles.logging] - :write-lines logging segment: {:rows [{"groupId" 19094838, "groupCity" "Chatsworth", "category" "health/wellbeing"}]}
peer_1 | 16-Feb-01 23:14:41 f45c733dc20b INFO [onyxapp.lifecycles.logging] - :write-lines logging segment: {:rows [{"groupId" 15173842, "groupCity" "Renton", "category" "socializing"}]}
zookeeper_1 | 2016-02-01 23:14:42,242 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.17.0.6:47433
zookeeper_1 | 2016-02-01 23:14:42,244 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.17.0.6:47433
zookeeper_1 | 2016-02-01 23:14:42,245 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x1529f108fe10084 with negotiated timeout 5000 for client /172.17.0.6:47433
zookeeper_1 | 2016-02-01 23:14:42,248 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1529f108fe10084
zookeeper_1 | 2016-02-01 23:14:42,250 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.17.0.6:47433 which had sessionid 0x1529f108fe10084
peer_1 | 16-Feb-01 23:14:44 f45c733dc20b INFO [onyxapp.lifecycles.logging] - :write-lines logging segment: {:rows [{"groupId" 5073632, "groupCity" "Yelm", "category" "paranormal"}]}
peer_1 | 16-Feb-01 23:14:44 f45c733dc20b INFO [onyxapp.lifecycles.logging] - :write-lines logging segment: {:rows [{"groupId" 3054822, "groupCity" "Jackson Heights", "category" "movements/politics"}]}
zookeeper_1 | 2016-02-01 23:14:44,755 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@197] - Accepted socket connection from /172.17.0.6:47434
zookeeper_1 | 2016-02-01 23:14:44,757 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:ZooKeeperServer@868] - Client attempting to establish new session at /172.17.0.6:47434
zookeeper_1 | 2016-02-01 23:14:44,758 [myid:] - INFO [SyncThread:0:ZooKeeperServer@617] - Established session 0x1529f108fe10085 with negotiated timeout 5000 for client /172.17.0.6:47434
zookeeper_1 | 2016-02-01 23:14:44,763 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@494] - Processed session termination for sessionid: 0x1529f108fe10085
zookeeper_1 | 2016-02-01 23:14:44,765 [myid:] - INFO [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxn@1007] - Closed socket connection for client /172.17.0.6:47434 which had sessionid 0x1529f108fe10085
I just deleted all my docker images, generated a fresh template and followed the steps to get that working. I’ll be back in a few hours if you’re still having problems just write them here and I’ll get to it. Thanks for helping me debug this