This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-05-10
Channels
- # admin-announcements (4)
- # aleph (1)
- # beginners (29)
- # boot (112)
- # braveandtrue (1)
- # cider (44)
- # cljs-site (1)
- # cljsjs (2)
- # cljsrn (1)
- # clojure (46)
- # clojure-gamedev (3)
- # clojure-germany (1)
- # clojure-nl (1)
- # clojure-norway (1)
- # clojure-russia (20)
- # clojure-sg (2)
- # clojure-uk (14)
- # clojurescript (228)
- # cursive (41)
- # datascript (5)
- # datomic (17)
- # editors-rus (48)
- # emacs (3)
- # flambo (1)
- # hoplon (9)
- # jobs (2)
- # kekkonen (1)
- # lein-figwheel (1)
- # luminus (5)
- # mount (11)
- # nrepl (3)
- # off-topic (7)
- # om (12)
- # onyx (139)
- # other-languages (54)
- # planck (1)
- # proton (17)
- # re-frame (37)
- # remote-jobs (1)
- # rethinkdb (9)
- # ring (2)
- # ring-swagger (6)
- # test-check (1)
- # uncomplicate (8)
- # untangled (2)
I am wondering about the static analysis module. Reading the blog post and some of the code, it feels like it is just implemented as Schema with better errors. I might copy this approach somewhere else if it can be that simple. Would you consider this to be a solution applicable to all schemas (recursive, conditional)? Any pitfalls?
@jeroenvandijk: yes, that's pretty much right. I've used onyx.static.analyzer/analyze-error in a non-Onyx project to give better schema errors, though it didn't have all the pretty printing. I had to provide my own error implementation for recursive schemas there
Ah very nice
We would like to extract this work out for use by other projects, including the pretty printing. I'm not sure when we'll get around to it yet
Sounds like I would want to use it. Thanks for explaining 👍
Any time
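For anyone curious, the core of the "Schema with better errors" idea is small. A minimal sketch using plain prismatic/schema follows; it is illustrative only, and Onyx's onyx.static.analyzer does much more work to classify and pretty-print the error data:

(require '[schema.core :as s])

(s/defschema TaskMap
  {:onyx/name s/Keyword
   :onyx/type (s/enum :input :function :output)})

;; s/check returns nil on success, or a data structure describing the mismatch.
;; Onyx walks a value like this to decide which friendly message to print.
(when-let [err (s/check TaskMap {:onyx/name "not-a-keyword"
                                 :onyx/type :input})]
  (println "Task map failed validation:" (pr-str err)))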
Anyone ever hit 'No space left on device' errors when running the onyx-starter Docker image?
java.io.IOException: No space left on device
clojure.lang.ExceptionInfo: Error in component :messaging-group in system onyx.system.OnyxPeerGroup calling #'com.stuartsierra.component/start
component: #<Aeron Peer Group>
function: #'com.stuartsierra.component/start
reason: :com.stuartsierra.component/component-function-threw-exception
system: onyx.system.OnyxPeerGroup{:config {:zookeeper/address "127.0.0.1:2188", :onyx/tenancy-id #uuid "1ca95f9d-70d4-4275-b369-386bda88f7dc", :onyx.peer/job-scheduler :onyx.job-scheduler/balanced, :onyx.messaging/impl :aeron, :onyx.messaging/peer-port 40200, :onyx.messaging/bind-addr "localhost"}, :logging-config #<Logging Configuration>, :messaging-group #<Aeron Peer Group>}
system-key: :messaging-group
@acron From onyx-template? This can usually be solved by starting the container with a bigger --shm-size
@lucasbradstreet: I'm not familiar with that, what's the switch?
Ah, ok, I am not using this one: https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/Dockerfile
When you docker run, you can supply --shm-size to increase the amount of shared memory space, which is what Aeron uses to maintain the messaging logs
I'm using this one: https://github.com/onyx-platform/onyx-starter/blob/0.8.x/Dockerfile
Ah is there an onyx starter docker image?
Right ok, you can use --shm-size there too. I didn't realise onyx-starter had a Dockerfile
Yeah, onyx-starter is mostly for spinning it up and playing. onyx-template is the recommendation if you're building a project
Ok, so the --shm-size switch seemed to fix it, but I will move over to using the onyx-template version anyhow. Thanks!
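For reference, the fix is just a flag on docker run (the size and image name here are placeholders; Docker caps /dev/shm at 64 MB by default, which Aeron's term buffers can exhaust quickly):

docker run --shm-size=512m <your-onyx-image>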
Aside from onyx-template, are there any examples of projects for "dumb peers"? I.e. none of the job-starting code, or would that just be a case of start-peer-group and start-peers?
Yep, the scripts and start_prod_peers.clj is where you want to look
Starts the Aeron media driver, which is essentially a user-land TCP stack
Not quite TCP, but you get the idea
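A bare-bones "dumb peer" launcher, in the spirit of start_prod_peers.clj, really is just those two calls. A sketch, where peer-config is your usual map of :zookeeper/address, :onyx.messaging/* settings, etc. (elided here), and the Aeron media driver is assumed to be running already:

(require '[onyx.api])

(defn start-dumb-peers!
  "Start a peer group and n virtual peers. No job is submitted --
   the peers simply wait for work to be scheduled onto them."
  [peer-config n-peers]
  (let [peer-group (onyx.api/start-peer-group peer-config)
        peers (onyx.api/start-peers n-peers peer-group)]
    {:peer-group peer-group :peers peers}))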
I hate dumping stack traces in Slack, but I'm getting this error when trying the onyx-template with docker-compose up
org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /brokers/topics/meetup
peer_1 | code: -101
peer_1 | path: "/brokers/topics/meetup"
peer_1 | clojure.lang.ExceptionInfo: Caught exception inside task lifecycle.
Lots of Zookeeper errors too:
zookeeper_1 | 2016-05-10 13:11:24,117 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x2 zxid:0x1e4 txntype:-1 reqpath:n/a Error Path:/onyx Error:KeeperErrorCode = NodeExists for /onyx
zookeeper_1 | 2016-05-10 13:11:24,127 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x3 zxid:0x1e5 txntype:-1 reqpath:n/a Error Path:/onyx/1 Error:KeeperErrorCode = NodeExists for /onyx/1
zookeeper_1 | 2016-05-10 13:11:24,138 [myid:] - INFO [ProcessThread(sid:0 cport:-1)::PrepRequestProcessor@645] - Got user-level KeeperException when processing sessionid:0x1549ac8bbe7001c type:create cxid:0x4 zxid:0x1e6 txntype:-1 reqpath:n/a Error Path:/onyx/1/pulse Error:KeeperErrorCode = NodeExists for /onyx/1/pulse
@acron: I wonder if that example is broken if you didn't create the template with +docker +metrics
How did you create from the template?
I did use those switches, plus a colleague of mine followed the same instructions and it works for him 😕
Can you verify that the container that curls to kafka is getting stuff onto the kafka topic?
Occasionally it sits in a loop with this:
16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task 24c2fba9-bab6-48f0-a4be-bcddc9d83c1c :write-lines - Peer e17f371a-8fa5-448d-81be-392d200eedf0 - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1 | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task ed15a04d-bd7d-4f11-a2ac-138e1c5bb2f8 :prepare-rows - Peer 8afa0c4d-c09b-4cd4-9cdc-0fdd9ca2b16c - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1 | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task 0f0133df-6bc4-46bb-bf54-6d2c3ecfbb62 :extract-meetup-info - Peer 854c5c22-a43c-4fe9-85b1-69c96c98fb8f - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
peer_1 | 16-May-10 13:38:08 0f6b3fbefc7c INFO [onyx.peer.task-lifecycle] - Job 50e3666f-cbd4-4eac-8e3c-e38765976b18 {} - Task ed15a04d-bd7d-4f11-a2ac-138e1c5bb2f8 :prepare-rows - Peer 0ba6f5c3-a35e-44e7-ad51-b4fb9ee39b8f - Not enough virtual peers have warmed up to start the task yet, backing off and trying again...
@gardnervickers: Not sure how I'd do that. I'm going to nuke everything and start again
There’s a tool called kafkacat that’s a command-line kafka consumer
should be something like kafkacat -C -b <docker-host>:9092 -t meetup -o beginning
They will use docker compose dns linking
Yea, that’s just a string; submit-job won’t resolve anything for you
I thought not, but the onyx-template has it here: https://github.com/onyx-platform/onyx-template/blob/0.9.x/src/leiningen/new/onyx_app/src/onyx_app/launcher/launch_prod_peers.clj#L35
The env there is mostly used to start up BookKeeper, which should really be through a separate entrypoint
You can take it out if you’re not using BookKeeper, or separate it into its own launch ns
It’s an option if you don’t use windowing/state
If you don’t use state / windowing then you can just take that line out and you’ll be fine
Having real trouble with kafkacat: http://pastebin.com/MzK8ntTH
restart your docker machine
DNS gets screwed up sometimes
the whole VM if you’re on mac
yea, then your docker daemon
I know that when jumping between networks, sometimes my docker VM will not correctly resolve DNS and I have to restart it
Nice, that was probably why you were not getting messages into kafka
couldn’t resolve
There’s pretty much no error reporting around the curl->kafka portion of the tutorial; it’s a very janky setup.
I’m working on improving that
Can you see with kafkacat that json is getting put onto your kafka topic?
kafkacat_1 | % Total % Received % Xferd Average Speed Time Time Time Current
kafkacat_1 | Dload Upload Total Spent Left Speed
0 0 0 0 0 0 0 0 --:--:-- 0:00:19 --:--:-- 0curl: (6) Could not resolve host:
Never used docker on linux, maybe restart your box?
can other containers resolve DNS?
@jeroenvandijk: The "static analyzer" bit refers to the fact that we're looking at the job at submission time and searching for semantic, not necessarily structural, errors.
It’s not nearly as complicated as I made it sound. Should have chosen my words better.
@michaeldrogalis: The end result looks fancy. Doesn’t matter if the implementation is not as complicated as one would expect, right? It sounds like a benefit to me
Hehe, yes 😄
I'd say in total it took about 2 focused weeks, 3 actual weeks. Hardest part was pretty printing the errors.
Hehe so not that simple
I’ll see if I can do something similar in another project
I started ripping it out into its own lib, but got bored after about 15 minutes. I need some time away from it. If you make any concrete progress, maybe I’ll get excited about it again. Hehe.
Yep, I’ll let you know!
@gardnervickers: It must be me, with network errors
Yea that happens to me, restarting my VM is the only way I can fix it
maybe a machine restart is in order?
select count(*) from recentMeetups;
+----------+
| count(*) |
+----------+
| 68 |
+----------+
1 row in set (0.00 sec)
👼vim vim haha
Hoorah!
Curiously, scaling up using docker-compose scale peer=2 seems to displease it, but scaling back to 1 starts it again
Hey folks, working on a new template example, ditching the http://meetup.com API in favor of something a little more fun to play with.
+----+---------------+-------------+-------------+---------------+
| id | timespan | CountryCode | TotalTweets | AverageEmojis |
+----+---------------+-------------+-------------+---------------+
| 1 | 1462894900000 | US | 4 | 1 |
| 2 | 1462894910000 | PH | 12 | 12 |
| 3 | 1462894920000 | US | 12 | 4 |
| 4 | 1462894930000 | PH | 4 | 4 |
| 5 | 1462894940000 | US | 2 | 1 |
+----+---------------+-------------+-------------+---------------+
Average emojis by country 😆
Just verified it on my machine. Works like a charm, worth an early look before we add more docs
Meetup's stream has been bursty for me
Makes sense though - that's live data for people actually subscribing to meetups.
So I want to have 3-4 tasks that each take 1 input segment and output N results, then aggregate the results back into 1 segment. I can use a global window with "conj" aggregation and group by an internal ID. But I don't understand how to tell when it's "done." Does each task need to store how many segments it emitted, then count how many were received during the aggregation?
You would use a trigger on segment count
Yea, the trigger will just fire when N segments have come through the aggregation
Maybe I’m missing something
Currently you’re not able to do multiple sequential aggregations
You can’t have an aggregation output to another task
I probably don't understand, but when the job is created, I don't have a value for :trigger/threshold to set. The tasks may emit 1 segment or 20; I just want to catch it all.
Oh so your upstream is creating multiple segments from one segment?
Yes. 1 segment -> a layer with several tasks that each emit N segments -> 1 aggregation window
and you want to window by the root "1 segment"
Right, which I can handle with a group. The question is just how do I know when the group is done -- i.e., all upstream tasks have run for it.
Each upstream task could emit a special "done" marker, then I create a trigger with a predicate that it's seen 4 markers? (if there are 4 upstream tasks)
Do the N segments have something in common, telling you they are from the origin segment?
I believe that if, in your splitting task, you emit a segment as part of the fan-out segments that says that the root segment is done, and set up a predicate trigger on that to fire, that should work.
It’s important to have the 1 segment create the n-segments AND the done-signaling segment so that if any of them are dropped in flight, they are retried.
I would also set it up as a session window to group by job-id
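To make that concrete, a rough sketch of the window plus a punctuation-style trigger in catalog form. The exact keyword names vary between Onyx versions, and ::done-segment? / ::deliver! are hypothetical helpers you'd write yourself:

;; Global window that conj's segments together; group by the root segment's
;; id upstream so each origin segment accumulates into its own extent.
{:window/id :collect-results
 :window/task :aggregate
 :window/type :global
 :window/aggregation :onyx.windowing.aggregation/conj}

;; Fire when the special "done" marker emitted by the splitting task arrives.
{:trigger/window-id :collect-results
 :trigger/refinement :onyx.refinements/accumulating
 :trigger/on :onyx.triggers/punctuation
 :trigger/pred ::done-segment?   ;; e.g. a fn that checks (:done? segment)
 :trigger/sync ::deliver!}       ;; handed the window contents when fired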
Are there any more recent benchmarks for Onyx than this: https://michaeldrogalis.github.io/jekyll/update/2015/06/08/Onyx-0.7.0.html
No but the benchmark suite is available to run.
We don’t have anything public for the most recent version.
We don't post formal benchmark assessments as often because they take a while to run and write announcements for. Performance is somewhat higher than 0.7, but not by a lot. That's the current focus of our work with the next generation streaming engine.
Those numbers are already way better than we need. Ok, thanks for the update.
@michaeldrogalis: at a high level, why target a tutorial at k8s over Docker Swarm?
@drewverlee: The primary reason is that Kubernetes seems to be getting more mindshare.
As far as Onyx-specific things go, Kubernetes has interchangeable networking backends in the works which Onyx could utilize for lower-latency, not sure where Swarm is with this.
Also, the scheduler is extensible, which we might be able to take advantage of in the future.