#clojure-uk
2016-07-06
yogidevbear07:07:48

Greetings from West Sussex

yogidevbear07:07:07

Looks like a beautiful start to the day

agile_geek07:07:41

I know it doesn't help the folks outside London but there were good talks at SkillsMatter last night. SkillsCasts should be linked off this page soon if you missed them. https://skillsmatter.com/meetups/7970-london-clojurians-meetup#overview

agile_geek07:07:54

Off topic: I saw a tweet from a guy I follow who promotes tech in the north of England suggesting that tech startups should not abandon London for Frankfurt and Amsterdam but should head North instead. I live in the NE and I can't imagine why any startup who's leaving London because of #Brexit would possibly want to go north unless they keep going to Scotland!

agile_geek07:07:16

Basically, I'm resigning myself to never working from home again!

agile_geek07:07:14

That's really depressed me for the day! I've already resigned myself to never getting paid for writing Clojure, I've written off any idea of ever having a little startup in NE....someone give me some good news!

agile_geek07:07:38

I suppose one good thing is if I did set up a startup there will be a mass of cheap real estate to use for offices...including a massive empty car manufacturing plant two miles from my house when Nissan closes *desperately searching for silver lining*

agile_geek07:07:54

Just want to announce that the Call for Papers for Clojure eXchange 2016 in London is open. If you have an idea for a talk, a library you've contributed to that you want to discuss, experiences around your use of Clojure/Clojurescript or community work you want to showcase, please apply here: https://skillsmatter.com/conferences/7430-clojure-exchange-2016#get_involved

martintrojer07:07:29

@agile_geek: 6 talk proposals like last time? 🙂

thomas08:07:05

morning… and thank you for the links. Looks like it was a good session, shame I couldn’t come

thomas08:07:55

and now I also need to think about a talk for ClojureEx….

jonpither08:07:34

@agile_geek: something needs to happen to regalvanise parts of the north. Dare I say the NE has a different problem to the NW? Manchester has a funky tech scene and some lovely buildings

glenjamin09:07:16

agile_geek: at a personal level, I’m working remotely for a Chicago-based firm at the moment

glenjamin09:07:32

the current £/$ rate probably makes that more desirable for them

glenjamin09:07:00

country-wise, can’t see the North doing well if London suffers

glenjamin09:07:11

I suppose it’s still cheaper up here

agile_geek09:07:17

@jonpither: I absolutely agree. The needs of the NE and NW are completely different as are the resources available and skills.

agile_geek09:07:21

The NW has a much healthier tech culture. The NE has a very healthy but small 'digital' community (dynamic websites and digital agencies) but no really innovative companies. There are two big government depts tho, HMRC and DWP

glenjamin09:07:53

There’s not that many large companies in Manc are there?

glenjamin09:07:45

Leeds does reasonably well, but there’s only a handful of large employers. Sheffield is similar-ish, but probably with slightly less of each type

thomas09:07:15

IBM has an office in Manchester, part of the UK Labs

thomas09:07:30

and Manchester Uni of course was very influential in the 50’s with the first generation of computers

agile_geek10:07:24

HP and Accenture have delivery centres in Newcastle to service DWP, HMRC and P & G

agile_geek10:07:40

but it's body shopping consultancy.

agile_geek10:07:21

Back on Clojure for a minute....yes I know, what are we discussing Clojure for in a Clojurians slack channel? Is anyone, apart from Mastodon C, using Onyx for ETL (see @michaeldrogalis's blog on 'Onyx: A new data bridge' http://michaeldrogalis.tumblr.com/ )? I think this has real potential, as just unlocking and cleansing data is a big obstacle to a lot of companies seeking insight.

quentin10:07:37

extract transform load ?

agile_geek10:07:45

@dominicm: Extract, Transform and Load

dominicm10:07:57

Wassat then

agile_geek10:07:06

Shifting data from one source to another

glenjamin10:07:19

data-warehousing type stuff traditionally

dominicm10:07:23

Oh. Data processing pipelines. So I haven't, yet.

agile_geek10:07:26

Extract data from data source a - transform it and ship it to b

dominicm10:07:41

But I've been experimenting a lot with CQRS/ES lately, and it seems like a perfect fit.

agile_geek10:07:11

ETL has been a lot of my career for 28+ years. Before the internet I was doing this with COBOL batch programs

dominicm10:07:50

My 1 year of career has been a fair amount of ETL too. 😉 I think a lot of systems are based on it.

agile_geek10:07:04

It gets a fancy term now...Data Engineering!

dominicm10:07:52

I only understood it as "Data transformation pipeline". I think of it as the threading macro but distributed. 😛

agile_geek10:07:58

Personally I'm more of a Data Hod Carrier than a Data Engineer

agile_geek10:07:55

@dominicm: conceptually yes that's a reasonable analogy. Although you can split and combine 'pipelines'
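
The "threading macro" analogy above can be sketched on a single machine in plain Clojure. This is only an illustration of the extract, transform, load shape, not Onyx code: the sample rows, the `parse-amount` and `clean-name` helpers, and the atom used as the destination are all hypothetical.

```clojure
(require '[clojure.string :as str])

;; Hypothetical rows standing in for "data source a".
(def source-a
  [{:name " Alice " :amount "10"}
   {:name "bob"     :amount "oops"}
   {:name "Carol"   :amount "32"}])

(defn parse-amount
  "Returns the row with :amount parsed to a long, or nil if it is not numeric."
  [row]
  (when (re-matches #"\d+" (:amount row))
    (update row :amount #(Long/parseLong %))))

(defn clean-name
  "Trims whitespace from :name and capitalises it."
  [row]
  (update row :name (comp str/capitalize str/trim)))

(defn etl
  "Threads rows through the transform steps and stores the result in the atom `sink`."
  [rows sink]
  (->> rows                 ; extract
       (keep parse-amount)  ; transform: drop rows that fail to parse
       (map clean-name)     ; transform: tidy the names
       (into [])            ; realise the lazy sequence
       (reset! sink)))      ; load into "b"

(def sink-b (atom nil))
(etl source-a sink-b)
;; @sink-b => [{:name "Alice", :amount 10} {:name "Carol", :amount 32}]
```

A single `->>` chain is strictly linear, which is where the analogy stops: splitting and recombining pipelines needs a graph rather than a chain.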

dominicm10:07:24

Yeah, I saw that. It's really cool. Although, I am wondering about async pipelines (starting a thread and pushing out to a pipeline later.) Maybe that's just a new source though. I'm not sure.

agile_geek10:07:22

You can write to/read from Kafka or other messaging infrastructure or even just use core.async plugins to read/write channels

dominicm10:07:57

Is it okay for some things to be sync out of the same pipeline though? I guess you just get two inputs to the next step? One from kafka/core.async and one from the previous processor.

agile_geek10:07:39

I do wonder if @michaeldrogalis and @lucasbradstreet are going to re-implement the Java API one day, as once they put this back it may be adopted in the way Storm was.

dominicm10:07:21

https://yuppiechef.github.io/cqrs-server/ Yuppiechef are using Onyx btw for this.

agile_geek10:07:51

@dominicm: If you split the workflow into two, one sync and one async, and then recombined them in a later task, my understanding is the combiner task would expect the same shaped input, which could come from either the sync or async tasks, although I guess you could write that task to conditionally process either. I suspect if the two halves are doing different stuff it's two workflows that reuse tasks.
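
The split-and-recombine shape described above might look roughly like this as an Onyx workflow, which is plain data: a vector of `[upstream downstream]` task pairs. All task names here are hypothetical, and the small `upstream` helper is ordinary Clojure written for this sketch, not part of Onyx.

```clojure
;; Hypothetical task names. Each pair is one edge of the workflow graph.
(def workflow
  [[:read-commands :sync-step]
   [:read-commands :async-step]
   [:sync-step     :combine]
   [:async-step    :combine]
   [:combine       :write-events]])

(defn upstream
  "Returns the set of tasks that feed directly into `task`."
  [wf task]
  (set (for [[from to] wf :when (= to task)] from)))

(upstream workflow :combine)
;; => #{:sync-step :async-step}
```

Because `:combine` has two upstream tasks, it has to accept whatever shape either branch emits, which is the point made above. Sending each segment down only one branch would, as far as I know, need routing such as Onyx flow conditions, which this sketch leaves out.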

mccraigmccraig10:07:41

is onyx a good fit for (general) ETL ? it has stream-based aggregation capabilities, but it doesn't have general purpose join or aggregation

mccraigmccraig10:07:19

(i'm a very happy user of onyx, but for stream processing, not ETL)

dominicm10:07:19

@agile_geek: Yeah, they'd eventually create the same shaped data. Just sometimes I need to wait on a HTTP request, and there's no point in not processing the next command in the meantime.

agile_geek10:07:56

@mccraigmccraig: good point. Aggregating multiple heterogeneous data sources would be challenging I guess. I've only read about onyx and played with the learn-onyx tutorial so defer to your greater experience

mccraigmccraig10:07:39

afaik spark is the best fit for general ETL atm ... @otfrom is probably a lot more up to date on this stuff than me tho

agile_geek10:07:43

@mccraigmccraig: so why use Onyx instead of Storm for stream processing?

mccraigmccraig10:07:53

onyx is all data+functions all the way down (no gnarly macros), and it's really easy to deploy a cluster, and there aren't any hard clojure-version dependencies... i tried out both onyx & storm at the beginning of this project and had a much nicer time with onyx

agile_geek10:07:25

Cool. I've only used Storm in anger, and then only from Java using Trident library.

mccraigmccraig10:07:54

iirc storm has a bunch of cool stuff in it which isn't there in onyx yet (and perhaps vice-versa too)... trident has a cool message tracking algo for just-once doesn't it ?

otfrom11:07:55

spark, general etl, good. ugh

jonpither11:07:00

I had a hard time with Storm and the devops side of it

jonpither11:07:14

still recovering

mccraigmccraig11:07:48

onyx devops has been a breeze for me

mccraigmccraig11:07:35

it's ZK based though, so if you aren't already using ZK it may be more painful

dominicm11:07:23

I may not understand onyx correctly. The distributed nature, does it cause parts of a pipeline (say a fn processor) to be run in parallel in different places? Or does each part of the pipeline get distributed?

mccraigmccraig11:07:18

@dominicm: http://www.onyxplatform.org/docs/user-guide/latest/concepts.html is a good explanation - onyx looks after running the tasks on virtual-peers and shipping data between tasks - virtual-peers may be co-located on a host or not, you mostly don't need to care

mccraigmccraig11:07:36

(i can't remember if there is any stuff to help when you do need to care)

dominicm11:07:51

Looks like it can run in parallel, but you can also do :onyx/max-peers 1
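
For context, `:onyx/max-peers` is set per task in the catalog entry that describes it. A minimal sketch of such an entry follows; the task name, function name, and batch size are made up for illustration.

```clojure
;; Hypothetical catalog entry for a single function task.
(def catalog
  [{:onyx/name       :handle-command                ; hypothetical task name
    :onyx/fn         :my.app.core/handle-command    ; hypothetical function
    :onyx/type       :function
    :onyx/batch-size 20                             ; illustrative value
    :onyx/max-peers  1}])                           ; at most one virtual peer runs this task
```

Leaving `:onyx/max-peers` out lets more than one virtual peer be assigned to the task, which is where the parallelism comes from.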

dominicm11:07:05

I wonder how it decides what to run and when.

dominicm11:07:12

I guess that's the magic

mccraigmccraig11:07:32

@dominicm: ask mike or lucas in #C051WKSP3 - that's another plus with onyx - the core team is fantastically responsive

dominicm11:07:47

That's really helpful, thanks 🙂.

agile_geek12:07:42

Yeah. Mike and Lucas have both responded to my questions in minutes. I even found a bug for them.

agile_geek16:07:09

@mccraigmccraig: missed your question about Trident earlier. Yes, it has a number of classes for state management and things like Kafka spouts that provide guarantees for at-least-once, once-only, etc. delivery. They depend on writing batch id info to Zookeeper (from memory). Oh, yes. Trident introduces a form of micro batching to Storm hence batch id.

agile_geek16:07:24

Marceline wraps Trident for nicer Clojure interop: https://github.com/yieldbot/marceline Although I find it a bit perverse that Clojure code (Marceline) wraps Java code (Trident), which wraps Clojure code (Storm).

mccraigmccraig20:07:27

@agile_geek: clojure->java->clojure ... umm ok ... that sounds byzantine