clojure-uk 2018-05-10 | Slack Archive

My request to JFrog to mirror Clojars in JCenter repository seems to be done (I am on hols, so will test next week). This mirror should give Clojure developers behind corporate firewalls access to all the Clojure libraries that aren't pushed to Maven Central.

👍 12

dominicm10:05:52

neat

maleghast11:05:11

Hello All 🙂

danielneal11:05:25

hello!

maleghast12:05:15

How is everyone doin'?

danielneal12:05:25

not bad thanks how bout yourself

maleghast12:05:41

Pretty good thanks 🙂

danielneal12:05:52

what you working on today?

yogidevbear13:05:16

It must be top secret 😉

danielneal13:05:24

😂

danielneal13:05:56

the grand old duke of york he had 10000 databases he migrated them all to the origin and he migrated them down again 🎶

😂 12

alexlynham13:05:45

sorry to crash in as usual with offtopic questions

alexlynham13:05:47

but

alexlynham13:05:01

anybody used sparkling or flambo for spark stuff?

alexlynham13:05:28

I can't be fucked to learn scala and how to do this shit in azure in the same week, and I can probably win one argument but not both

thomas13:05:32

morning btw

alexlynham13:05:06

haha yes, morning

thomas13:05:45

joys of days off 😉

mccraigmccraig13:05:59

@alex.lynham you probably want @otfrom or @jasonbell to grumble at you

alexlynham13:05:10

any opinions or grumbling muchly appreciated ¯\_(ツ)_/¯

alexlynham13:05:36

I'm just a linked data boy in a big data world

jasonbell13:05:40

@alex.lynham I've used Sparkling a fair bit. Really like it.

jasonbell13:05:50

And I agree, not worth learning Scala for.

jasonbell13:05:50

If it's just for core Spark then it's perfect; if it's beyond that then you'll have to see if it's been updated, as the streaming wasn't implemented (Flambo had it, and it was usable but not perfect)

jasonbell13:05:19

@mccraigmccraig Grumble, it's like I have a reputation 🙂

alexlynham14:05:53

> as the streaming wasn't implemented

alexlynham14:05:14

assume that I haven't used spark before - why would this sway me one way or the other?

mccraigmccraig14:05:02

@alex.lynham if you only want to use spark for batch queries then you won't mind if streaming isn't supported in your clojure lib... if you want to use it for stream-processing (something like windowing over a conceptual endless stream of data) then you might want spark streaming support
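To make the batch versus streaming distinction above concrete, here is a minimal sketch in plain Python rather than Spark. The function names are hypothetical, and the window is count-based (every N records) as a simplifying assumption; real stream processors usually window by time.

```python
from itertools import count, islice
from typing import Iterable, Iterator, List


def batch_total(records: List[int]) -> int:
    # Batch: the whole dataset is already there, so one pass gives one answer.
    return sum(records)


def tumbling_window_totals(stream: Iterable[int], size: int) -> Iterator[int]:
    # Streaming: the input may never end, so emit one result per fixed-size
    # window instead of waiting for a final total that never arrives.
    it = iter(stream)
    while True:
        window = list(islice(it, size))
        if not window:
            return
        yield sum(window)


# An endless source: 1, 2, 3, ... A batch sum over it would never finish,
# but windowed results can be consumed as they are produced.
endless = count(1)
first_two = list(islice(tumbling_window_totals(endless, 3), 2))
```

A batch job can call `batch_total` once and stop; the windowed version has to keep running for as long as the source does, which is the extra machinery a streaming library provides.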

alexlynham14:05:46

riiiiiiiight okay

alexlynham14:05:03

well at the moment I think the only data we have will be static or batch anyway

alexlynham14:05:15

but we will want to support streaming eventually

mccraigmccraig14:05:24

but, depending on what you are wanting to do, you might also want to look at https://kafka.apache.org/documentation/streams/ for stream processing

mccraigmccraig14:05:41

or https://github.com/onyx-platform/onyx

alexlynham14:05:10

yeah

alexlynham14:05:34

so the thing I don't fully understand about spark is why I don't just use kafka and then use clojure to interact with the streaming API

alexlynham14:05:23

(I get that e.g. databricks or spark python API is more data scientist friendly)

alexlynham14:05:30

but what's the USP of spark?

mccraigmccraig14:05:50

spark-batch is a good fit for us for ad-hoc queries against cassandra. spark-streaming might look good to me if i wanted to avoid adding another component

mccraigmccraig14:05:37

but i'm currently more interested in the streaming-as-a-lib deployment model offered by kafka-streams... spark-streaming doesn't seem very shiny to me atm

alexlynham14:05:07

> streaming-as-a-lib deployment model

because rather than being standalone it's something you can interact with inside of your app in clj?

mccraigmccraig14:05:37

no, for straightforward deployments and monitoring. we're currently using onyx, and you run a bunch of onyx peers as processes, and then submit jobs... stuff like configuring your jobs is a bit painful and has to use a different mechanism to other components (like our api)

mccraigmccraig14:05:24

and because the peers are onyx processes, i can't just run a healthcheck listener in each process, and link that to dc/os health monitoring

mccraigmccraig14:05:47

streaming-as-a-lib makes life a lot simpler
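A rough sketch of what the streaming-as-a-lib model allows: the stream worker and a healthcheck listener live in the same process, so the orchestrator can probe it like any other service. This uses only the Python standard library; `Worker`, `make_health_server` and the `/health` path are hypothetical names, and the worker loop is a placeholder rather than any real Kafka Streams or Onyx API.

```python
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class Worker:
    """Hypothetical stand-in for an embedded stream-processing loop."""

    def __init__(self):
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._run, daemon=True)

    def start(self):
        self._thread.start()

    def stop(self):
        self._stop.set()
        self._thread.join()

    def healthy(self):
        # Healthy here just means the processing thread is still running.
        return self._thread.is_alive()

    def _run(self):
        # Placeholder loop: a real worker would poll and process records.
        while not self._stop.wait(0.05):
            pass


def make_health_server(worker, port=0):
    # Port 0 asks the OS for any free port (an assumption for this sketch).
    class Handler(BaseHTTPRequestHandler):
        def do_GET(self):
            ok = self.path == "/health" and worker.healthy()
            self.send_response(200 if ok else 503)
            self.end_headers()
            self.wfile.write(b"ok" if ok else b"unhealthy")

        def log_message(self, *args):
            pass  # keep the sketch quiet

    return HTTPServer(("127.0.0.1", port), Handler)
```

Because the listener shares the process with the worker, a 503 from `/health` reflects the state of the stream processing itself, not of a separate peer process.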

jasonbell14:05:18

@alex.lynham for batch then Sparkling would be fine.

jasonbell14:05:58

@alex.lynham I've blogged a fair bit about it over the years (yes I just said that) https://dataissexy.wordpress.com//?s=sparkling&search=Go

🔖 4

alexlynham14:05:29

so my situation is I'm one of maybe three engineers with a data background (engineering is circa 20 plus contractors), data science is mostly people who are used to using SQL and Power BI, and data engineering (a separate function) is almost entirely outsourced to a company that, so far as I can tell, isn't the most up-to-date

alexlynham14:05:16

but it means we're at once having to deal with engineering inexperience and lack of resource as well as needing v user friendly stuff at the other end... which I think is where the spark notion has come from

alexlynham14:05:35

@jasonbell oh shit, you're that jason bell? oh right yeah I've read your blog, it's really useful

🙂 4

😂 8

yogidevbear14:05:37

It's "Jade Bell", not "Jason" (that was aimed at @jasonbell 😉)

👍 4

mccraigmccraig14:05:09

ohhhh, that's the famous jade !

🙂 8

yogidevbear14:05:14

Yup

yogidevbear14:05:28

(ducks down behind a wall)

jasonbell14:05:31

THAT'S IT I'M CHANGING MY NAME TO TAYLOR SWIFT

jasonbell14:05:39

You lot are gold! 🙂

yogidevbear14:05:41

🎵

jasonbell14:05:20

@alex.lynham <<oh shit, you're that jason bell? oh right yeah I've read your blog, it's really useful>> Please can I put that in my ClojureX slides?

yogidevbear14:05:34

As an aside from the Jade question... I know someone that's hoping to put together a beginners' kafka study group (most likely a remote thing). Can I voluntell anyone for the role of group mentor? (cough @mccraigmccraig cough @jasonbell cough)

alexlynham14:05:39

@jasonbell 100%, a few of your blog posts have been really really good for helping me spot dumb shit

mccraigmccraig14:05:57

haha, my kafka experience is narrow... @minimal is doing lots with kafka atm as well tho

alexlynham14:05:23

in fact I just realised why my json parser was doing something unexpected - because it's reading one character at a time, so probably somewhere I needed to explicitly call .getBytes

alexlynham14:05:29

so chalk up another one sir

jasonbell14:05:26

@yogidevbear getting me in a good mood isn't going to help. I'm up to the eyeballs at the mo.

yogidevbear14:05:11

No worries, figured it was worth asking 😉

yogidevbear14:05:34

Craig has thrown @minimal under the bus for you both anyway 😆

yogidevbear14:05:18

Chris, feel like mentoring a group of people interested in learning kafka?

minimal14:05:34

@yogidevbear that makes the assumption that I know kafka 😬

yogidevbear14:05:30

Well... do you? 🙂

jasonbell14:05:58

@alex.lynham yes messages are byte arrays and need deserialising
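A small sketch of that round trip, assuming JSON values encoded as UTF-8. The `serialize` and `deserialize` names and the sample event are made up for illustration; no Kafka client is involved.

```python
import json


def serialize(value: dict) -> bytes:
    # Producer side: turn the value into a JSON string, then into bytes.
    return json.dumps(value).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    # Consumer side: decode the bytes back to text, then parse the JSON.
    return json.loads(payload.decode("utf-8"))


event = {"id": 1, "source": "sensor-a"}
payload = serialize(event)
```

The broker only ever sees `payload`; both ends have to agree on the encoding for `deserialize(payload)` to give back the original value.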

minimal14:05:29

I would say I know people that know

yogidevbear14:05:55

@otfrom that will leave the real Taylor Swift feeling rather jaded

jasonbell14:05:15

Oh BRAVO!

otfrom14:05:44

alexlynham14:05:28

yeah I've realised that I'm comp-ing together elements at the producer and consumer end, and in the one that uses nippy the getBytes call is partialed in, and in the JSON one it's not. d'oh
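A guess at the shape of that bug, sketched in Python rather than Clojure. This is not the actual code from the conversation: `compose` is a hypothetical helper mimicking `comp`, `functools.partial` stands in for `partial`, and the nippy pipeline is not modelled.

```python
import json
from functools import partial, reduce


def compose(*fns):
    # Right-to-left composition, like Clojure's comp:
    # compose(f, g)(x) == f(g(x)).
    return reduce(lambda f, g: lambda x: f(g(x)), fns)


# The bytes step, with the encoding fixed up front via partial application.
to_bytes = partial(str.encode, encoding="utf-8")

# Pipeline missing the bytes step: it hands a str downstream, and anything
# iterating over that str sees one character at a time.
broken_json_serializer = compose(json.dumps)

# Pipeline with the bytes step composed in.
fixed_json_serializer = compose(to_bytes, json.dumps)
```

With the missing step, nothing fails at the point of composition; the mismatch only shows up later, when a consumer expects bytes and gets a string.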

yogidevbear14:05:56

Lol, sorry, I couldn't resist that one 🙂

jasonbell15:05:03

🙂

cddr15:05:17

I find myself wishing Mach had an "install" verb. Does that mean I'm doing it wrong?
