This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2017-03-09
@mccraigmccraig nothing wrong with 'stealing' ideas... @weavejester admitted 'stealing' from @luke who probably 'stole' them from someone else!
yogidevbear who says you missed it? https://skillsmatter.com/skillscasts/9820-enter-integrant-a-micro-framework-for-data-driven-architecture-with-james-reeves
Morning 😄 Thanks for the reminder @otfrom 👍
The team involved in organising all of these meetups and recordings rock. You guys are awesome 🎉
the recordings and hosting for the talks thx to skillsmatter. The organisation is done by many members of the community (a number of whom are here)
@mccraigmccraig Just remember: good artists copy, great artists steal!!!
@otfrom dyu use sparkling at MC ?
have you used the spark-sql support ?
we've got some jobs in vanilla sparkling, and they are getting more complex and it would be a lot easier to write them in a sql dialect
tho if I was going to do sql on stuff in s3 I'd probably look at Amazon Athena https://aws.amazon.com/athena/
it's out of c* so we are kind of tied to spark without getting more complicated
DSE 5, which is c* 3.0 and spark 1.6 iirc
i'm not particularly wed to DSE except that the hive metastore impl gives us tableau interactivity
which has proven to be very convenient
mccraigmccraig things seem to be moving away from that a bit w/flambo, sparkling, parkour and powderkeg
powderkeg is the one that interests me the most atm. I like the idea of transducer like stuff rather than ->> like stuff
powderkeg looks nice
yeah, i don't want anything to do with deploying hadoop either... and cascalog had many shortcomings - the macros were pretty horrible
but the underlying set of cascading ops was pretty neat... and composition was fairly straightforward
I'm hoping onyx comes through. I'd like to have something clojure all the way down. I'm not into the idea of writing much java and the idea of fixing some scala fills me w/dread
cascading is cool, though even that seems to be stalling after the parent company got acquired (I think)
I really like using cascalog when all I need to do is unions and joins. I've found it can get messy beyond that
otfrom: what’s your impression of onyx? Every time I look at it I hear lots of talk of zookeeper / scheduling / cool distributed systems stuff etc… and almost no talk of data transformation.
rickmoynihan we're finding it good for streaming ETL, but it is very early days for us still (and we're using it a bit out of its comfort zone)
by streaming ETL - do you mean push data capture?
we're doing some small bits of analytics atm, but nothing exciting yet. More trying to get used to how to do the ops side of things on some simple flows (archiving from kafka and a bit of analysis)
@rickmoynihan Onyx is excellent, I really like it. I've got a few more blog posts in the pipeline to do especially on payload calculations and other interesting things we've learned along the way.
I have a Clojure related question. I'm doing something at work (non Clojure) and massaging some data into a format I'd like and this is a little tedious using our current language. So I created a gist to very loosely show a similar example of the initial data structure I have and what I'd like to convert it to. I've used JSON notation for the data structures in the gist. I'd be really interested to see how this manipulation could be achieved using Clojure (if anyone has some time and is interested in taking a look). https://gist.github.com/yogidevbear/4b386f10c63ba008d3f7b49524262cf0
@yogidevbear group-by
We have a lot of code in our system that does similar types of things where the data is retrieved relatively flat and then looped over in many differing levels so I figure it might be a good case study for the powers that be in the company to see how much easier / more efficiently the code could be written to achieve the same end result
Plus I get to learn some Clojure in the process 🙂
Yeah something like (reduce (fn [acc elem] …) [] (group-by :judges)) is how I’d go about it
So for the option_2.txt in that gist, it would be a reduce with a group-by on :comments that is encapsulated within an outer reduce with a group-by on the :judges (or something along those lines)?
I'm guessing it might be a little more complicated than that
Sounds about right! You could also experiment with creating some intermediate maps to help with building the final structure
for opt1 you can group-by with (juxt :entry :judge), then for opt2 you can take the output of opt1 and group-by with :entry
If you had to guesstimate, how quick/efficient would these types of solutions be with datasets of e.g. tens of thousands of rows?
I realise that question might be similar to "How long is a piece of string?"
dunno... benchmark it... where are your rows coming from ? fewer than millions of rows is all going to fit into not too much memory though, so should be pretty fast... probably considerably less than the time it takes to read from the disk or network
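A rough way to act on the "benchmark it" advice from a REPL is sketched below. The synthetic rows, the 50,000 row count, and the names `synthetic-rows` and `grouped` are assumptions made for illustration; real rows from a database will differ in shape and size.

```clojure
;; Hypothetical synthetic data: 50,000 rows shaped loosely like
;; entry/judge/comment records. Each entry gets 10 rows across 5 judges.
(def synthetic-rows
  (vec (for [i (range 50000)]
         {:entry   (str "E" (quot i 10))
          :judge   (str "J" (mod i 5))
          :comment (str "C" i)})))

;; `time` prints the elapsed wall-clock time and returns the value
;; of the expression it wraps.
(def grouped
  (time (group-by (juxt :entry :judge) synthetic-rows)))

;; Number of distinct [entry judge] groups produced.
(count grouped)
```

A single `time` call includes JIT warm-up, so running it a few times gives a more representative figure than the first run.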
Rows would be coming from a database (in this particular case, MS SQL Server)
I'm a strong believer in trying to do a lot of the grunt work in the initial SQL commands and letting SQL do what it's best at, but sometimes these queries get particularly complex and hard for all members on the team to maintain so would be good to assess the alternatives from the perspective of Clojure functions 🙂
@yogidevbear I agree, although writing the transformations in pure clojure functions makes for much easier testing
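As a small illustration of the testing point, a transformation written as a pure function can be checked with `clojure.test` without touching a database. The function `comments-by-judge` and its output shape are hypothetical, chosen only for this sketch.

```clojure
(require '[clojure.test :refer [deftest is]])

(defn comments-by-judge
  "Hypothetical helper: maps each judge to a vector of that judge's
   non-blank comments, in the order the rows appear."
  [rows]
  (reduce (fn [acc {:keys [judge comment]}]
            ;; Blank comments add nothing, but the judge key is still created.
            (update acc judge (fnil into []) (when (seq comment) [comment])))
          {}
          rows))

(deftest comments-by-judge-test
  (is (= {"J1" ["a" "b"] "J2" []}
         (comments-by-judge [{:judge "J1" :comment "a"}
                             {:judge "J2" :comment ""}
                             {:judge "J1" :comment "b"}]))))
```

Calling `(clojure.test/run-tests)` in the same namespace runs the test; the input is plain data, so no fixtures or mocks are needed.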
So if I specify:
(def initial_data [
{ :entry "E1", :judge "J1", :comment "E1J1C1" },
{ :entry "E1", :judge "J2", :comment "E1J2C1" },
{ :entry "E1", :judge "J1", :comment "E1J1C2" },
{ :entry "E2", :judge "J1", :comment "" },
{ :entry "E2", :judge "J2", :comment "" },
{ :entry "E3", :judge "J1", :comment "E3J1C1" },
{ :entry "E3", :judge "J1", :comment "E3J1C2" },
{ :entry "E3", :judge "J2", :comment "" }
])
I have initial_data in my repl now
How would I specify this, for example, using this approach: https://clojurians.slack.com/archives/clojure-uk/p1489071432499317
Like so? (group-by (juxt :entry :judge) initial_data)
So that creates a vector (?) with the paired key/index of [:entry :judge] values and the corresponding data that matches that. Is that correct?
@mccraigmccraig with that example, is this what I'm aiming at? (group-by :entry (group-by (juxt :entry :judge) initial_data))
it creates a map where the keys are [:entry :judge] vectors and the values are vectors of the matching records
you will then need to process that map of vectors to get your option-1 sequence, and then a couple more steps required to get to your option-2 sequence
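To make the shape of that map concrete, here is a minimal sketch using the first three rows of the example data. The names `rows` and `by-entry-and-judge` are introduced only for this illustration.

```clojure
;; First three rows of the example data from this log.
(def rows
  [{:entry "E1" :judge "J1" :comment "E1J1C1"}
   {:entry "E1" :judge "J2" :comment "E1J2C1"}
   {:entry "E1" :judge "J1" :comment "E1J1C2"}])

;; (juxt :entry :judge) returns a function that maps a record m to
;; the vector [(:entry m) (:judge m)], which group-by uses as the key.
(def by-entry-and-judge
  (group-by (juxt :entry :judge) rows))

;; Looking up one key returns the vector of matching records,
;; in their original order.
(get by-entry-and-judge ["E1" "J1"])
;; => [{:entry "E1", :judge "J1", :comment "E1J1C1"}
;;     {:entry "E1", :judge "J1", :comment "E1J1C2"}]
```

The result is a map, not a vector, so the order of its keys should not be relied on.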
ah okay
Still feels like a lot less effort than the hoops I'm currently jumping through
you will probably want to use ->, ->> and maybe as-> to make your code flow nicely @yogidevbear
I've got my homework cut out for me 🙂
e.g. (->> initial_data (group-by (juxt :entry :judge)) (make-option-1) (group-by :entry) (make-option-2))
or
(->> initial_data
     (group-by (juxt :entry :judge))
     (make-option-1)
     (group-by :entry)
     (make-option-2))
With make-option-1 and make-option-2, I'm guessing this is placeholder text for some functionality that I'd still need to define?
something that takes a {group-by-key [record...]} map and outputs one of your options
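One possible way to fill in those two placeholders is sketched below. The output shapes are assumptions, since the gist's option 1 and option 2 formats are not reproduced in this log: here option 1 is taken to be one map per entry/judge pair with a `:comments` vector, and option 2 one map per entry with a nested `:judges` vector. Blank comments are dropped, which is also an assumption.

```clojure
;; The example rows from this log, under a hypothetical name.
(def rows
  [{:entry "E1" :judge "J1" :comment "E1J1C1"}
   {:entry "E1" :judge "J2" :comment "E1J2C1"}
   {:entry "E1" :judge "J1" :comment "E1J1C2"}
   {:entry "E2" :judge "J1" :comment ""}
   {:entry "E2" :judge "J2" :comment ""}
   {:entry "E3" :judge "J1" :comment "E3J1C1"}
   {:entry "E3" :judge "J1" :comment "E3J1C2"}
   {:entry "E3" :judge "J2" :comment ""}])

(defn make-option-1
  "Assumed shape: takes {[entry judge] [record ...]} and returns a seq of
   {:entry .. :judge .. :comments [..]} maps, dropping blank comments."
  [grouped]
  (for [[[entry judge] records] grouped]
    {:entry    entry
     :judge    judge
     :comments (into [] (comp (map :comment) (remove empty?)) records)}))

(defn make-option-2
  "Assumed shape: takes {entry [option-1-map ...]} and returns a seq of
   {:entry .. :judges [{:judge .. :comments [..]} ...]} maps."
  [grouped]
  (for [[entry judge-maps] grouped]
    {:entry  entry
     :judges (mapv #(dissoc % :entry) judge-maps)}))

(def option-2
  (->> rows
       (group-by (juxt :entry :judge))
       (make-option-1)
       (group-by :entry)
       (make-option-2)))
```

Because `group-by` returns a map, the order of entries and judges in `option-2` is not guaranteed; add a `sort-by` step if the output order matters.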
terraform users, you may like this if you haven’t heard of it already: https://github.com/coinbase/terraform-landscape
@glenjamin we use ClojureScript to manipulate and derive Terraform... https://github.com/juxt/roll, but it's a work in progress 🙂