#onyx
2016-08-26
jeroenvandijk08:08:04

Hi all, we try to get the onyx-cheatsheet to run locally, but no luck so far. Clojurescript errors etc. Is the README up to date?

lucasbradstreet09:08:53

Hi @jeroenvandijk. It should work, though I remember there being some problems viewing the page if you accidentally use the Jekyll index.html meant for our website. I'll give it a go when I'm on a computer shortly. Is ClojureScript throwing an error on compile, or is it an error loading the page?

jeroenvandijk09:08:47

yeah, it starts with an assertion error on :source-map "resources/public/js/app.js.map". After setting it to :source-map true it does compile
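
For context, the ClojureScript compiler expects `:source-map` to be a boolean when `:optimizations` is `:none`, and a file path at other optimization levels, which would explain the assertion described above. A minimal sketch of a `:cljsbuild` fragment showing both cases follows; the build ids and paths are illustrative assumptions, not the cheat sheet's actual project.clj.

```clojure
;; Fragment of a project.clj :cljsbuild section (ids and paths are assumptions).
{:builds
 [{:id "dev"
   :source-paths ["src/cljs"]
   :compiler {:output-to     "resources/public/js/app.js"
              :output-dir    "resources/public/js/out"
              ;; Relative URL the page uses to load files from :output-dir.
              :asset-path    "js/out"
              :optimizations :none
              ;; With :optimizations :none, :source-map must be true or false.
              :source-map    true}}
  {:id "min"
   :source-paths ["src/cljs"]
   :compiler {:output-to     "resources/public/js/app.js"
              :output-dir    "target/cljs-min"
              :optimizations :advanced
              ;; With any other optimization level, :source-map is a file path.
              :source-map    "resources/public/js/app.js.map"}}]}
```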

jeroenvandijk09:08:13

the (browser-repl) command gives IllegalArgumentException No value supplied for key: weasel.repl.websocket.WebsocketEnv@49ac47e0 clojure.lang.PersistentHashMap.create (PersistentHashMap.java:77)

jeroenvandijk09:08:53

and when I open localhost:10555 I see in the console something about :asset-path

lucasbradstreet09:08:56

Browser repl is likely broken. I haven't used it in a long time. I mostly just use figwheel when I'm working on the cheat sheet

jeroenvandijk09:08:20

When I “fix” that, `onyx_cheatsheet.main` cannot be found 🙂

jeroenvandijk09:08:34

Is there another way to generate the documentation?

lucasbradstreet09:08:22

All sounds pretty broken. The way we do it is to pull the http://onyx-platform.io repo, and then run build-site.sh which will build the latest cheat sheet

lucasbradstreet09:08:32

You may have noticed while looking at the project.clj that advanced compile is currently not used either

jeroenvandijk09:08:08

yeah I’m not up to date with the latest cljsbuild settings so I wasn’t sure what is good and bad

jeroenvandijk09:08:23

Is the build-site.sh in one of your repos?

jeroenvandijk09:08:24

ah ok found it, thanks 🙂

lucasbradstreet09:08:15

The whole thing could do with some more love but I guess we're doing OK overall

jeroenvandijk09:08:55

yeah well this is good enough for us I think. If we find fixes we’ll let you know

robert-stuttaford09:08:40

@lucasbradstreet just confirming that :zookeeper/address needs to include the port for each server in the csv? s1:2181,s2:2181 vs s1,s2

lucasbradstreet09:08:21

If I remember correctly, it’ll default to 2181 if you don’t provide them. I’ve always just tried to be safe and include the ports too
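
A minimal sketch of the explicit form discussed above, with a port on every server in the comma-separated list. The host names and tenancy id are placeholders, and a real peer configuration needs further keys that are omitted here.

```clojure
;; Fragment of an Onyx peer configuration map (values are placeholders).
{;; Comma-separated host:port pairs, one per ZooKeeper server.
 :zookeeper/address "s1:2181,s2:2181,s3:2181"
 :onyx/tenancy-id   "my-tenancy-id"}
```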

robert-stuttaford09:08:25

INFO [onyx.log.zookeeper] - Starting ZooKeeper client connection. If Onyx hangs here it may indicate a difficulty connecting to ZooKeeper.
INFO [onyx.log.zookeeper] - Stopping ZooKeeper client connection

robert-stuttaford09:08:35

so, unlikely to be the lack of ports

lucasbradstreet09:08:46

yep, looks ok then

robert-stuttaford09:08:30

mmm. i can telnet from an Onyx node to a ZK node address that Onyx is being given on 2181, so it's open. is this a fair test to ensure O can reach ZK?

lucasbradstreet09:08:46

Yep, that’s usually my standard goto

robert-stuttaford09:08:06

so, if that works, but i get the above failure, what else could it be?

lucasbradstreet09:08:16

you can also write “ruok” into telnet

lucasbradstreet09:08:22

which always cracks me up
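
The same check can be scripted rather than typed into telnet. The sketch below sends a ZooKeeper four-letter command over a plain socket and returns the reply. The function name and timeout handling are assumptions for illustration, and newer ZooKeeper releases may require the command to be whitelisted on the server.

```clojure
(import '(java.net Socket InetSocketAddress))

(defn zk-four-letter
  "Sends a four-letter command such as \"ruok\" to a ZooKeeper server and
  returns the reply as a string. Throws if the connection or read times out."
  [host port cmd timeout-ms]
  (with-open [sock (Socket.)]
    (.connect sock (InetSocketAddress. ^String host (int port)) (int timeout-ms))
    (.setSoTimeout sock (int timeout-ms))
    (let [out (.getOutputStream sock)]
      (.write out (.getBytes ^String cmd "US-ASCII"))
      (.flush out))
    ;; The server closes the connection after replying, so read until EOF.
    (slurp (.getInputStream sock) :encoding "US-ASCII")))
```

Calling `(zk-four-letter "s1" 2181 "ruok" 2000)` from an Onyx node exercises the same network path as the telnet test; a healthy server is expected to answer `imok`.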

lucasbradstreet09:08:55

Where’s the above failure?

robert-stuttaford09:08:56

i don't think it's a case of too little ZK; it's a 3 x c4.large cluster

robert-stuttaford09:08:01

when i submit jobs

lucasbradstreet09:08:39

Are you seeing any exceptions? The only logging / failure I can see is those two info lines which look OK

robert-stuttaford09:08:05

just busy running the jar directly now to see if i get anything

robert-stuttaford09:08:52

tailing zk's log i only see logging for conns from 127.0.0.1

lucasbradstreet09:08:31

So you submit a job, and nothing happens, is basically the summary?

robert-stuttaford10:08:00

and then eventually the ZK timeout messages

robert-stuttaford10:08:10

i don't have any peers running yet

lucasbradstreet10:08:45

Which timeout messages? I don’t think you pasted any of those

robert-stuttaford10:08:28

hold on. maybe i'm being impatient.

robert-stuttaford10:08:01

if i see 'stopping zookeeper' after 'starting zookeeper. if hang, conn issues', and then i get a job-id back, then that's success, right?

robert-stuttaford10:08:14

takes 3 minutes to do though

lucasbradstreet10:08:23

That is kinda long. Is the ZK conn remote?

lucasbradstreet10:08:32

Normally we’d see on the order of a second or two

lucasbradstreet10:08:42

Are you submitting a lot of data as part of your job definition?

robert-stuttaford10:08:07

it's a fairly large catalog / workflow / flow-cond, yes

robert-stuttaford10:08:34

we're also reading the datomic log to determine a start position for the read-log catalog entry which takes some time

robert-stuttaford10:08:48

ok. make it right, make it fast, make it pretty. i'm happy that i've got the first one done!

lucasbradstreet10:08:25

Yeah, you can start to move some of that data out of the job data and load it via before-task-start. @michaeldrogalis is keen to get some of these chunks in S3 soon, which should help there a bit
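
A minimal sketch of that suggestion: instead of embedding a large data structure in the submitted job, each peer loads it when the task starts. The task name, the `:example/...` keys, and `load-reference-data` are hypothetical; only the lifecycle shape (`:lifecycle/before-task-start`, `:lifecycle/task`, `:lifecycle/calls`) comes from Onyx.

```clojure
(defn load-reference-data
  "Hypothetical loader standing in for a read from a file, S3, or a database."
  [source]
  {:source source
   :rows   {1 "alpha" 2 "beta"}})

(defn inject-reference-data
  "Runs when the task starts; the returned map is merged into the event map."
  [event lifecycle]
  {:example/reference-data (load-reference-data (:example/source lifecycle))})

;; Lifecycle calls map, referenced by keyword from the lifecycle entry below.
(def reference-data-calls
  {:lifecycle/before-task-start inject-reference-data})

(def lifecycles
  [{:lifecycle/task   :process-events          ; hypothetical task name
    :lifecycle/calls  ::reference-data-calls
    :example/source   "reference-data.edn"}])  ; small pointer, not the data
```

The job definition then carries only the location of the data, which keeps the payload written to ZooKeeper small.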

robert-stuttaford10:08:23

looks like all that time is in building the job - which probably means slow Datomic start up. i'll dig

robert-stuttaford10:08:00

another 3 minutes to boot the peers up, which is the same op - getting that start-t for the catalog

robert-stuttaford10:08:26

2016-08-26 09:57:52.934 INFO - Starting Jobs for :onyx/tenancy-id highstorm-prod-be112dc3b065dba1065d795a4776c6ef41c73e5e
2016-08-26 10:00:36.373 INFO - :read-log Start t 1000 ( 26831 behind ) tx 13194139534312 #inst "2013-01-15T00:00:00.000-00:00"

robert-stuttaford10:08:23

-checks DDB read provisioning-

lucasbradstreet10:08:44

weird that it takes just as long to load the data as startup

robert-stuttaford10:08:53

i'm starting to understand this stuff

robert-stuttaford10:08:32

the memcached cluster should have covered that second one, but perhaps the operation we're doing doesn't benefit from the cache

robert-stuttaford10:08:59

ok. so there's the answer. throttled DDB reads

lucasbradstreet10:08:53

@jeroenvandijk I got interactive dev with the cheat sheet going again. You should be able to pull and follow the README to make it work now (after doing a lein clean just to be safe)

lucasbradstreet10:08:11

possibly from the original building of the job?

lucasbradstreet10:08:27

which causes it to be throttled when you start the job

robert-stuttaford10:08:40

it's the code we use to find the right starting t

robert-stuttaford10:08:08

which means looking for the newest :highstorm/processed in the :vaet index

robert-stuttaford10:08:58

we're going to have to rewrite that code to scan d/log backwards from the present moment
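
One way to picture a backwards scan is as a windowed search from the newest t towards 0, stopping at the first window that contains a match. The sketch below is not the code discussed in the channel: `range-fn` is a hypothetical function taking an inclusive start and exclusive end t and returning transactions oldest first as maps with a `:t` key, which is roughly the shape `datomic.api/tx-range` returns.

```clojure
(defn newest-matching-t
  "Scans [0, latest-t] backwards in windows of `window` t-values and returns
  the highest :t whose transaction satisfies `pred`, or nil if none does."
  [range-fn pred latest-t window]
  (loop [end (inc latest-t)]               ; end is exclusive
    (when (pos? end)
      (let [start (max 0 (- end window))
            ;; Transactions arrive oldest first, so the last hit is the newest.
            hit   (last (filter pred (range-fn start end)))]
        (if hit
          (:t hit)
          (recur start))))))
```

Because the search stops at the first window with a hit, the cost depends on how far behind the newest marker is rather than on the size of the whole log.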

robert-stuttaford10:08:29

we're taking that hit 4 times - 3 peers and 1 job submit

robert-stuttaford10:08:05

@lucasbradstreet forgive me, we may have had this conversation before. is there a top-level metric where we can graph active peers over time?

lucasbradstreet10:08:04

There are metrics for when peers come online and go offline, but you can't just sum each and get a final figure, because sometimes peers will go offline by crashing and won't write their metric. This is something that'll be easier to do once the new query server can be built into the peer group. Then you'll be able to query the nodes to see how many peers each thinks it has. It still won't be easy to push this data to metrics since it'll be more pull based though. Short answer is no, not without grabbing from a peer query server that is in libonyx / 0.9.10

jeroenvandijk12:08:13

@lucasbradstreet Wow thanks, that’s quick

eelke12:08:24

Yes, thank you!

robert-stuttaford13:08:48

your answer shows the great forethought and planning you guys put into this, as always 🙂

lucasbradstreet14:08:42

One downside of being masterless is that you don't have a single place to report this stuff from. We've got solutions in the works which will be even better though :)

aaelony20:08:20

Suppose you have several event-filtering jobs and several more windowed-aggregation jobs, do you put them in the same repo? What are people's thoughts around what constitutes an organized code structure for Onyx jobs? Any thoughts welcome

michaeldrogalis20:08:16

For what it's worth, internally we're essentially building a very large Onyx system, and we're satisfied with how task bundles are scaling to high numbers of tasks, organization-wise.

aaelony20:08:22

So, one big repo with many different types of jobs (in various states of development) ?

michaeldrogalis20:08:02

Probably heavily depends on your use case -- for us it makes sense.

michaeldrogalis20:08:22

We're also big users of lein-voom to make multi-repo work feel single-repo.

aaelony20:08:25

yeah, collecting thoughts at this point

aaelony20:08:41

haven't used lein-voom, will google it

aaelony21:08:41

I'm noticing that a lot of my flow-condition predicates are similar.. e.g.

(defn event-a? [event old-segment new-segment all-new-segments]
  (= :a (:event-type new-segment)))
  
(defn event-b? [event old-segment new-segment all-new-segments]
  (= :b (:event-type new-segment)))  
I could write a macro, but kinda wish there was a way to use partial to reduce the boilerplate instead... maybe there's a more elegant way?

michaeldrogalis23:08:24

@aaelony Flow condition predicates can take args, see the docs
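
A minimal sketch of what that could look like for the predicates above: a single predicate takes the event type as a fifth argument, and each flow condition names the key that holds the value. The task names and the `:example/event-type` key are assumptions for illustration.

```clojure
(defn event-type?
  "Flow-condition predicate: the four standard arguments come first,
  followed by the parameter named in :flow/predicate."
  [event old-segment new-segment all-new-segments event-type]
  (= event-type (:event-type new-segment)))

(def flow-conditions
  [{:flow/from          :read-events             ; hypothetical task names
    :flow/to            [:handle-a]
    :example/event-type :a
    ;; Vector form: predicate keyword, then keys whose values become extra args.
    :flow/predicate     [::event-type? :example/event-type]}
   {:flow/from          :read-events
    :flow/to            [:handle-b]
    :example/event-type :b
    :flow/predicate     [::event-type? :example/event-type]}])
```

This replaces `event-a?` and `event-b?` with one function and moves the varying value into the flow-condition data, with no macro needed.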