2016-04-26
G'day
Anyone here using Cassandra?
Seems to be a very in-demand skill to have
And Apache Spark?
@yogidevbear: i'm using cassandra
i use alia+hayt plus our own lib on top of them for higher-level stuff https://github.com/employeerepublic/er-cassandra
Is it quite different from traditional rdbms?
yes - data modelling is very different
Is the data modeling similar across other NoSQL dbs?
In relation to other NoSQL dbs
i've only really used elasticsearch and hadoop/cascalog to any great extent - and the modelling is quite different to those
you mostly model by the queries you want to do, rather than attempting to discover a suitable natural structure in the data
e.g. you might have a table of users, with an id primary key... if you want to be able to retrieve users by their email too then you will need to denormalize to another table, users_by_email or something, with an email primary key
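A rough sketch of the query-first modelling described above, using plain Python dicts as stand-ins for the two tables. It is not Cassandra client code: the `name` field and the function names `save_user`, `get_user_by_id` and `get_user_by_email` are made up for illustration. The point is only that every write goes to both tables so that each lookup has a table keyed for it.

```python
# Toy in-memory stand-ins for two tables; not a real Cassandra client.
users = {}           # "users" table, primary key: id
users_by_email = {}  # "users_by_email" table, primary key: email


def save_user(user_id, email, name):
    """Write the same record to both tables (denormalization)."""
    row = {"id": user_id, "email": email, "name": name}
    users[user_id] = row
    users_by_email[email] = dict(row)  # separate copy, as a second table would hold


def get_user_by_id(user_id):
    """Lookup served by the table keyed on id."""
    return users.get(user_id)


def get_user_by_email(email):
    """Lookup served by the table keyed on email."""
    return users_by_email.get(email)


save_user(1, "ann@example.com", "Ann")
```

The cost of this approach is that the application has to keep both copies in step on every write and update.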
That is quite different
Thanks for the example
also you are very limited on sorting and filtering... primary keys in cassandra are divided into two parts - called partition and clustering keys - the partition key is the columns used to determine which partition(s) a record will live on, and can't be used for sorting or filtering (beyond an IN query) while the clustering key columns can be used for sorting and filtering (and maps to the wide-row concept which is kinda sorta hidden beneath the CQL table concept these days)
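To make the partition/clustering split a bit more concrete, here is a toy model in Python. It is an assumption-laden sketch, not how Cassandra stores data: the class `ToyTable`, its method names and the sample event data are all hypothetical. It only mirrors the rule above: a read must name one partition key exactly, and range filtering and ordering happen on the clustering key inside that partition.

```python
class ToyTable:
    """Toy model: partition key picks a bucket, clustering key orders rows in it."""

    def __init__(self):
        # partition key -> {clustering key: value}
        self.partitions = {}

    def insert(self, partition_key, clustering_key, value):
        self.partitions.setdefault(partition_key, {})[clustering_key] = value

    def select(self, partition_key, ck_from=None, ck_to=None, descending=False):
        """Read one partition; filter and sort only by clustering key."""
        rows = self.partitions.get(partition_key, {})
        keys = sorted(
            k for k in rows
            if (ck_from is None or k >= ck_from) and (ck_to is None or k <= ck_to)
        )
        if descending:
            keys.reverse()
        return [(k, rows[k]) for k in keys]


# Hypothetical events table: partition key = user id, clustering key = timestamp.
events = ToyTable()
events.insert("u1", 3, "logout")
events.insert("u1", 1, "login")
events.insert("u1", 2, "click")
events.insert("u2", 1, "login")

recent = events.select("u1", ck_from=2)
newest_first = events.select("u1", descending=True)
```

Note there is deliberately no way to ask for "all events after timestamp 2 across every user": that query has no partition key, which is the kind of restriction described above.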
i enjoyed reading http://www.amazon.com/NoSQL-Distilled-Emerging-Polyglot-Persistence-ebook/dp/B0090J3SYW/ref=mt_kindle?_encoding=UTF8&me= quite a few years back as an intro to nosql. it could be quite dated now tho (not aware that there is an updated edition)
@yogidevbear: if your requirements don't include one of ["must be nukeproof" "must scale a long long way" "i wanna understand this thing"] then you may well have an easier time with postgresql. if your requirements do include one of those things, then go for it - i've found it relatively straightforward so far, though it took me a little while to get a good feel for different modelling approaches
Cool, thanks again. I'm definitely going to be investing time in getting up and running properly with postgresql as I'm very comfortable working with rdbms, but it's always good to know about alternative options like NoSQL and where/when/how to use them
having said that, psql scales pretty ridiculously nowadays.
and you have JSON columns, BRIN indices etc
also, I avoid Cass* like the plague
Totally unrelated, but there are blue skies, snow and hail going on around my house right now
@martintrojer: What are your reservations around Cassandra?
Lots of ops issues, easy to lose data, frequent downtime, doesn’t really work on a dynamic infrastructure (without lots of blood, sweat and tears)
I've heard loss of data mentioned about a few different NoSQL db options
That's what makes me a little hesitant to use them
if you think about using cass, set up a large (i.e. expensive) cluster with lots and lots of redundancy
that’s the way to do it.
If you’re on AWS, just use Dynamo. Scales with your needs and 0 ops issues (and stop worrying about downtime and/or dataloss)
@yogidevbear: we have had just a little bit of snow… weird
@martintrojer: were you doing anything in particular to c* to cause it to bork so ? what size instances were you running it on ?
m4.large
I had some Cass-dudes look at it, they couldn’t find anything wrong.
My current looking-back-conclusion is that the cluster was way way too small.
only 5 nodes.
should be >20
what was happening to it ? did nodes fall over, or start losing data or performing badly ?
@mccraigmccraig: When EC2 decided to kill some of the nodes, and new ones rejoined, the entire cluster went down
also, running Cass* on dynamic IPs is a mess, you need a discovery service on the side, and when provisioning update the config file before starting Cass*
@otfrom: Dynamo FTW
ah, well i haven't encountered either of those two situations yet, though at some point i will doubtless be encountering the dead node problem...
why did you run on dynamic ips tho ? it makes lots of things painful, surely ?
@mccraigmccraig: I want to automate everything. No human hands should ever touch the VMs.
I want to scale the cluster by just changing a number in the auto scaling group
This works perfectly with for instance Elasticsearch (without any service discovery thing)
ah, i see - i agree about no-human-hands - though i have taken a differing approach - my config mgmt tool distributes the ips of created instances to config files, so it effectively pre-empts discovery at converge time and ips are static (until an instance dies and needs to be replaced)
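Both approaches in this exchange come down to writing node IPs into the Cassandra config before the process starts. A minimal Python sketch of that step follows, assuming the `seed_provider` layout of the stock `cassandra.yaml`. The function name `render_seeds`, the template text and the IP addresses are hypothetical, and how the IPs are obtained (discovery service or config management tool) is left out.

```python
import re

# Assumed fragment of cassandra.yaml, following the layout of the stock file.
TEMPLATE = """seed_provider:
    - class_name: org.apache.cassandra.locator.SimpleSeedProvider
      parameters:
          - seeds: "127.0.0.1"
"""


def render_seeds(config_text, ips):
    """Return config_text with the seeds value replaced by the given IPs."""
    seeds = ",".join(sorted(ips))
    # Rewrite only the first "- seeds:" line, keeping its indentation.
    return re.sub(
        r'(?m)^([ \t]*-[ \t]*seeds:[ \t]*).*$',
        lambda m: m.group(1) + '"' + seeds + '"',
        config_text,
        count=1,
    )


# Hypothetical IPs handed over by whatever tool created the instances.
rendered = render_seeds(TEMPLATE, ["10.0.0.2", "10.0.0.1"])
```

Sorting the IPs keeps the rendered file stable between runs, so a converge only changes the config when the set of nodes actually changes.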