Greetings from West Sussex


Looks like a beautiful start to the day


I know it doesn't help the folks outside London but good talks at SkillsMatter last night. SkillsCasts should be linked off this page soon if you missed it.


Off topic: I saw a tweet from a guy I follow who promotes tech in the north of England suggesting that tech startups should not abandon London for Frankfurt and Amsterdam but should head North instead. I live in the NE and I can't imagine why any startup that's leaving London because of #Brexit would possibly want to go north unless they keep going to Scotland!


Basically, I'm resigning myself to never working from home again!


That's really depressed me for the day! I've already resigned myself to never getting paid for writing Clojure, I've written off any idea of ever having a little startup in NE....someone give me some good news!


I suppose one good thing is if I did set up a startup there will be a mass of cheap real estate to use for offices...including a massive empty car manufacturing plant two miles from my house when Nissan closes. *desperately searching for silver lining*


Just want to announce that the Call for Papers for Clojure eXchange 2016 in London is open. If you have an idea for a talk, a library you've contributed to that you want to discuss, experiences around your use of Clojure/ClojureScript or community work you want to showcase, please apply here:


@agile_geek: 6 talk proposals like last time? 🙂


morning… and thank you for the links. Looks like it was a good session, shame I couldn’t come


and now I also need to think about a talk for ClojureEx….


@agile_geek: something needs to happen to regalvanise parts of the north. Dare I say the NE has a different problem to the NW? Manchester has a funky tech scene and some lovely buildings


agile_geek: at a personal level, I’m working remotely for a Chicago-based firm at the moment


the current £/$ rate probably makes that more desirable for them


country-wise, can’t see the North doing well if London suffers


I suppose it’s still cheaper up here


@jonpither: I absolutely agree. The needs of the NE and NW are completely different as are the resources available and skills.


The NW has a much healthier tech culture. The NE has a very healthy but small 'digital' community (dynamic websites and digital agencies) but no really innovative companies. There are two big government depts tho, HMRC and DWP


There’s not that many large companies in Manc are there?


Leeds does reasonably well, but there’s only a handful of large employers. Sheffield is similar-ish, but probably with slightly less of each type


IBM has an office in Manchester, part of the UK Labs


and Manchester Uni of course was very influential in the 50s with the first generation of computers


HP and Accenture have delivery centres in Newcastle to service DWP, HMRC and P & G


but it's body shopping consultancy.


Back on Clojure for a minute....yes I know, what are we discussing Clojure for in a Clojurians slack channel? Is anyone, apart from Mastodon C, using Onyx for ETL (see @michaeldrogalis's blog on 'Onyx: A new data bridge')? I think this has real potential, as just unlocking and cleansing data is a big obstacle to a lot of companies seeking insight.


extract transform load ?


@dominicm: Extract, Transform and Load


Wassat then


Shifting data from one source to another


data-warehousing type stuff traditionally


Oh. Data processing pipelines. So I haven't, yet.


Extract data from data source a - transform it and ship it to b
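
For anyone else asking "wassat", here is a minimal sketch of the three steps in plain Python. The CSV text standing in for source a, the dict standing in for destination b, and the cleansing rules are all made up for illustration; this is not how Onyx or any particular tool does it.

```python
import csv
import io

# Hypothetical "source a": a CSV export with messy values.
SOURCE_A = "name,amount\n alice ,10\nBOB,\ncarol,7\n"


def extract(text):
    """Extract: read raw rows from the source."""
    return list(csv.DictReader(io.StringIO(text)))


def transform(rows):
    """Transform: trim and lower-case names, drop rows with no amount."""
    cleaned = []
    for row in rows:
        if not row["amount"].strip():
            continue  # nothing usable in this row
        cleaned.append({"name": row["name"].strip().lower(),
                        "amount": int(row["amount"])})
    return cleaned


def load(rows, target):
    """Load: ship the cleaned rows to 'destination b' (here just a dict)."""
    for row in rows:
        target[row["name"]] = row["amount"]
    return target


destination_b = load(transform(extract(SOURCE_A)), {})
```

The row for BOB is dropped because it has no amount, which is the "cleansing" part of the job.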


But I've been experimenting a lot with CQRS/ES lately, and it seems like a perfect fit.


ETL has been a lot of my career for 28+ years. Before the internet I was doing this with COBOL batch programs


My 1 year of career has been a fair amount of ETL too. 😉 I think a lot of systems are based on it.


It gets a fancy term now...Data Engineering!


I only understood it as "Data transformation pipeline". I think of it as the threading macro but distributed. 😛


Personally I'm more of a Data Hod Carrier than a Data Engineer


@dominicm: conceptually yes that's a reasonable analogy. Although you can split and combine 'pipelines'
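
A rough sketch of the split-and-combine idea, assuming the workflow is described as plain data (a list of from/to edges) and run in a single process. The task names and the `run` helper are hypothetical, and nothing here is the Onyx API or distributed; it only shows how one step can fan out to two branches that merge again.

```python
from collections import defaultdict

# Workflow as plain data: (from_task, to_task) edges. Names are hypothetical.
WORKFLOW = [("in", "parse"), ("parse", "evens"), ("parse", "odds"),
            ("evens", "out"), ("odds", "out")]

# Each task takes one segment (a dict) and returns zero or more segments.
TASKS = {
    "parse": lambda seg: [{"n": int(seg["raw"])}],
    "evens": lambda seg: [dict(seg, kind="even")] if seg["n"] % 2 == 0 else [],
    "odds": lambda seg: [dict(seg, kind="odd")] if seg["n"] % 2 else [],
}


def run(workflow, tasks, segments):
    """Push segments through the graph; collect whatever reaches 'out'."""
    downstream = defaultdict(list)
    for src, dst in workflow:
        downstream[src].append(dst)
    output = []
    # Each queue entry is (task that emitted the segment, segment).
    queue = [("in", seg) for seg in segments]
    while queue:
        task, seg = queue.pop(0)
        for nxt in downstream[task]:
            if nxt == "out":
                output.append(seg)
            else:
                queue.extend((nxt, result) for result in tasks[nxt](seg))
    return output


results = run(WORKFLOW, TASKS, [{"raw": "1"}, {"raw": "2"}, {"raw": "3"}])
```

After `parse` the pipeline splits into `evens` and `odds`, and both feed the same `out`, which a straight-line threading macro can't express.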


Yeah, I saw that. It's really cool. Although, I am wondering about async pipelines (starting a thread and pushing out to a pipeline later.) Maybe that's just a new source though. I'm not sure.


You can write/read from/to Kafka or other messaging infrastructure or even just use core.async plugins to read/write channels


Is it okay for some things to be sync out of the same pipeline though? I guess you just get two inputs to the next step? One from kafka/core.async and one from the previous processor.


I do wonder if @michaeldrogalis and @lucasbradstreet are going to re-implement the Java API one day, because as soon as they put it back Onyx may be adopted in the way Storm was.

Yuppiechef are using Onyx btw for this.


@dominicm: If you split the workflow into two, one sync and one async, and then recombined in a later task. My understanding is the combiner task would expect the same shaped input that could come from either the sync or async tasks although I guess you could write that task to conditionally process either. I suspect if the two halves are doing different stuff it's two workflows that reuse tasks.


is onyx a good fit for (general) ETL ? it has stream-based aggregation capabilities, but it doesn't have general purpose join or aggregation


(i'm a very happy user of onyx, but for stream processing, not ETL)


@agile_geek: Yeah, they'd eventually create the same shaped data. Just sometimes I need to wait on an HTTP request, and there's no point in not processing the next command in the meantime.
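
A small sketch of that shape using Python's asyncio rather than Onyx or core.async: one branch answers immediately, the other waits on a simulated HTTP request (just a sleep, no network call), and both hand the same shaped dict to a single combiner. The function names and the `needs_http` flag are invented for the example.

```python
import asyncio


async def slow_branch(command):
    # Stand-in for waiting on an HTTP request (no network call is made).
    await asyncio.sleep(0.01)
    return {"id": command["id"], "status": "fetched"}


def fast_branch(command):
    return {"id": command["id"], "status": "local"}


def combine(segment):
    # The combiner relies only on the shared shape: an id and a status.
    return f'{segment["id"]}:{segment["status"]}'


async def process(commands):
    pending = []
    combined = []
    for command in commands:
        if command["needs_http"]:
            # Start the slow work and move straight on to the next command.
            pending.append(asyncio.create_task(slow_branch(command)))
        else:
            combined.append(combine(fast_branch(command)))
    # Pick up the slow results as they finish.
    for finished in asyncio.as_completed(pending):
        combined.append(combine(await finished))
    return combined


combined = asyncio.run(process([
    {"id": 1, "needs_http": True},
    {"id": 2, "needs_http": False},
    {"id": 3, "needs_http": False},
]))
```

Commands 2 and 3 are combined before command 1 comes back, and the combiner never needs to know which branch a segment came from.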


@mccraigmccraig: good point. Aggregating multiple heterogeneous data sources would be challenging I guess. I've only read about onyx and played with the learn-onyx tutorial so defer to your greater experience


afaik spark is the best fit for general ETL atm ... @otfrom is probably a lot more up to date on this stuff than me tho


@mccraigmccraig: so why use Onyx instead of Storm for stream processing?


onyx is all data+functions all the way down (no gnarly macros), and it's really easy to deploy a cluster, and there aren't any hard clojure-version dependencies... i tried out both onyx & storm at the beginning of this project and had a much nicer time with onyx


Cool. I've only used Storm in anger, and then only from Java using Trident library.


iirc storm has a bunch of cool stuff in it which isn't there in onyx yet (and perhaps vice-versa too)... trident has a cool message tracking algo for just-once doesn't it ?


spark, general etl, good. ugh


I had a hard time with Storm and the devops side of it


still recovering


onyx devops has been a breeze for me


it's ZK based though, so if you aren't already using ZK it may be more painful


I may not understand onyx correctly. The distributed nature, does it cause parts of a pipeline (say a fn processor) to be run in parallel in different places? Or does each part of the pipeline get distributed?


@dominicm: is a good explanation - onyx looks after running the tasks on virtual-peers and shipping data between tasks - virtual-peers may be co-located on a host or not, you mostly don't need to care


(i can't remember if there is any stuff to help when you do need to care)


Looks like it can run in parallel, but you can also do :onyx/max-peers 1


I wonder how it decides what to run and when.


I guess that's the magic


@dominicm: ask mike or lucas in #C051WKSP3 - that's another plus with onyx - the core team is fantastically responsive


That's really helpful, thanks 🙂.


Yeah. Mike and Lucas have both responded to my questions in minutes. I even found a bug for them.


@mccraigmccraig: missed your question about Trident earlier. Yes, it has a number of classes for state management and things like Kafka spouts that provide guarantees for at-least-once, once-only, etc. delivery. They depend on writing batch id info to Zookeeper (from memory). Oh, yes. Trident introduces a form of micro batching to Storm hence batch id.
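
To illustrate why a batch id helps with once-only processing, here is a toy sketch of the general idea: keep the id of the last applied batch next to the state, and skip any batch that is redelivered. The class is hypothetical, keeps everything in memory, and assumes batches arrive in order and a replayed batch has the same contents; it is not Trident's actual classes or its ZooKeeper storage.

```python
class TransactionalCounter:
    """Keeps a count plus the id of the last batch applied to it."""

    def __init__(self):
        self.count = 0
        self.last_batch_id = None

    def apply_batch(self, batch_id, events):
        # A replayed batch carries an id we have already applied: skip it.
        if self.last_batch_id is not None and batch_id <= self.last_batch_id:
            return False
        self.count += len(events)
        self.last_batch_id = batch_id
        return True


counter = TransactionalCounter()
counter.apply_batch(1, ["a", "b"])
counter.apply_batch(2, ["c"])
counter.apply_batch(2, ["c"])  # redelivered after a simulated failure
```

Delivery is still at-least-once underneath; storing the batch id with the state is what makes the update count each batch only once.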


Marceline wraps Trident for nicer Clojure interop. Although I find it a bit perverse that Clojure code (Marceline) wraps Java code (Trident) wraps Clojure code (Storm).


@agile_geek: clojure->java->clojure ... umm ok ... that sounds byzantine