This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2016-07-29
Morning.
ETL with Clojure, recommended approaches and tools? Go!
Hmm, the last ETL work I did in Clojure, we rolled our own. That was about 4 years ago though, so there may be better tools available now
@korny: I was looking at Onyx
but Spark and Flambo might work too
I'm not sure I need this level of scalability, as I haven't even won the contract with the client yet. I'm just looking for things to investigate around the creation of a 'data lake'...whatever that means to this particular org!
From what little I know atm, extraction is likely to use a mixture of calls to SOAP API endpoints and raw JDBC connections.
I'm thinking of simply dumping data to S3 for storage at rest, but it might be HDFS
@glenjamin: cool. Will investigate that stack. I've briefly looked at Lambda but tbh it feels like overkill from what little I know.
This feels more like a batch processing exercise tbh. However, I'm really open to anything as I'm investigating solutions to understand their advantages and constraints but the initial bid will be for a discovery consultancy piece to identify problems and solution spaces. I hope (if I win work) to follow that up with a bid for solving one or more of the problems.
@benedek: @glenjamin looks like I've got plenty of homework to do! Thanks guys.
Of course it could all be moot if I don't win the work!
Is Kinesis better than Kafka? I never feel comfortable using AWS products as I worry about hosting lock-in.
I guess, in what respects is Kinesis an improvement over Kafka? Features? Performance? I don't entirely know what I'm looking for.
@dominicm: One distinction is that Kinesis only keeps 24 hours of events. I don’t think Kafka has any limit other than disk storage
> Data records are accessible for a default of 24 hours from the time they are added to a stream. This time frame is called the retention period and is configurable in hourly increments from 24 to 168 hours (1 to 7 days). For more information about a stream’s retention period, see Changing the Data Retention Period.
Also at my last job we used Kinesis as part of our email sending pipeline. Twice during a deploy it decided to reset the marker to the previous day, resulting in 20k emails being sent to customers 😕
It was probably a bug in the amazonica consumer code we were using.. but still, that shouldn’t happen!
oh well, you can easily end up with something like this in the Kafka world too. we had something similar when the ping timeout (or a similarly named config property) was set too low for the ZooKeeper cluster we used for our Kafka installation
this is more like a characteristic of this architecture i think… (not meaning you are bound to have such ‘bugs’ but you have to prepare for this kind of situation…)
the voice of experience 🙂 definitely will factor in the events pipeline going nuts next time I work on batch/stream processing
yeah after such ‘hiccups’ we built in some replayability, and for customer-facing stuff like emails we basically send a warning 15 mins before the real email, which gives you an easy way to block the real emails from going out
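A rough sketch of that "warn first, send later" idea, in case it helps picture it. This is not the setup described above, just one way it could look: `DelayedSender`, `PendingBatch`, the batch ids and the in-memory dict are all made up for illustration, and a real pipeline would persist pending batches and send the warning to a human rather than appending it to a list.

```python
from dataclasses import dataclass
from typing import Dict, List

# Assumption: the 15-minute window mentioned in the chat, expressed in seconds.
WARNING_WINDOW_SECONDS = 15 * 60


@dataclass
class PendingBatch:
    batch_id: str
    recipients: List[str]
    queued_at: float
    blocked: bool = False


class DelayedSender:
    """Holds outgoing batches for a warning window so an operator can block them."""

    def __init__(self) -> None:
        self.pending: Dict[str, PendingBatch] = {}
        self.delivered: List[str] = []  # stand-in for actually sending email
        self.warnings: List[str] = []   # stand-in for alerting a human

    def queue(self, batch_id: str, recipients: List[str], now: float) -> None:
        # Record the batch and raise the warning; nothing is sent yet.
        self.pending[batch_id] = PendingBatch(batch_id, list(recipients), now)
        self.warnings.append(
            f"batch {batch_id}: {len(recipients)} emails go out in 15 minutes"
        )

    def block(self, batch_id: str) -> None:
        # An operator who spots a bad batch stops it here.
        self.pending[batch_id].blocked = True

    def flush(self, now: float) -> None:
        # Drop blocked batches; deliver the ones whose window has elapsed.
        for batch_id, batch in list(self.pending.items()):
            if batch.blocked:
                del self.pending[batch_id]
            elif now - batch.queued_at >= WARNING_WINDOW_SECONDS:
                self.delivered.extend(batch.recipients)
                del self.pending[batch_id]


sender = DelayedSender()
sender.queue("daily-digest", ["a@example.com"], now=0)
sender.queue("runaway-replay", ["b@example.com", "c@example.com"], now=0)
sender.block("runaway-replay")
sender.flush(now=WARNING_WINDOW_SECONDS)
print(sender.delivered)  # ['a@example.com']
```

Passing `now` in explicitly keeps the sketch testable without sleeping; a real version would presumably read the clock itself.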
Slightly different kettle of fish, but I’d always turn to SQS before reaching for a pub/sub like Kafka + Kinesis. Much more of a known quantity, and far more reliable.
the biggest gain is not having to mess around with the operational setup/running of kafka
re: idempotency - definitely - we’re finding all sorts of cases where JMS brokers decide to re-play messages when the network goes screwy. Idempotency can be tricky to handle though.
So, Datomic has a nice feature here: it lets you attach an event id to a transaction
Could a k/v store keyed by event id, in something like Cassandra or Dynamo, be useful for that too?
Kafka can work well for this too.
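For what it's worth, the event-id idea can be sketched roughly like this. It is only an illustration under assumptions: `IdempotentConsumer` is a made-up name, each event is assumed to carry a unique `"id"`, and the in-memory set stands in for whatever durable k/v store (Cassandra, Dynamo, ...) you would really use.

```python
from typing import Callable, Set


class IdempotentConsumer:
    """Runs the handler once per event id, even if the broker replays events."""

    def __init__(self, handler: Callable[[dict], None]) -> None:
        self._handler = handler
        # Assumption: in-memory stand-in for a durable k/v store of seen ids.
        self._seen: Set[str] = set()

    def handle(self, event: dict) -> bool:
        """Return True if the event was processed, False if it was a replay."""
        event_id = event["id"]
        if event_id in self._seen:
            return False
        self._handler(event)
        # Marked only after the handler succeeds, so a crash in between means
        # the event is retried. With a real store, the check and the mark
        # would need to be a single atomic conditional write.
        self._seen.add(event_id)
        return True


sent = []
consumer = IdempotentConsumer(lambda e: sent.append(e["to"]))
events = [
    {"id": "e1", "to": "a@example.com"},
    {"id": "e2", "to": "b@example.com"},
    {"id": "e1", "to": "a@example.com"},  # replayed by the broker
]
results = [consumer.handle(e) for e in events]
print(results)  # [True, True, False]
print(sent)     # ['a@example.com', 'b@example.com']
```

The tricky part mentioned above is still there: the side effect and the "seen" marker are two separate steps, so this narrows the duplicate window rather than closing it.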
However, I don't think the thing I'm going to be looking at is a 'real time' streaming problem
@korny: yeah, I was looking at it a couple of days ago but haven't used it yet.
I’m not sure I’d use it for infrastructure automation, there are better tools out there for getting Amazon to behave - my team are using Terraform and seem to think it’s pretty good. But for fiddling with infrastructure quickly, it’s neat. I’m looking at getting Jepsen to fiddle with security groups, and it looks like it’ll be nice and simple.
Train O'Clock
My girlfriend just came downstairs to tell me she's just had a nap, and is wide awake
Hello.
Anybody here working for the Daily Mail?
I applied at Mail Online, their interview process is weird, didn't do the technical test as I got another job. That's as far as I went
@pupeno: I also applied and accepted their offer, but I haven't started yet. @xlevus what did you find weird, if you don't mind sharing?
at one point it was just "Potato?" "I don't understand, can you elaborate" "Well, do you... potato?"
I found the rating email I had - definitely no potato here (and it was 1 to 5). But maybe different teams have different questions
@dominicm: what do you mean?
I was approached by them, so I’d like to know what it’s like to work there.
Ah… ok 🙂