#datomic
2016-01-19
yenda12:01:12

does this mean using datomic as a component with system is a bad idea ?

stuartsierra13:01:14

@yenda: No, it's not a bad idea. You can use a component to handle the Datomic connection if you want.

yenda13:01:32

cool, I'm using it in danielsz/system

robert-stuttaford15:01:59

we use it with trapperkeeper. it uses http://github.com/rkneufeld/conformity on start to update schema

dexter16:01:21

quick question, I think I'm missing something with Datomic. I see people everywhere saying it is horizontally read-scalable, especially when used with something like Dynamo or Infinispan, because all the querying is done on the client (peer). But this seems contradictory. With large datasets, sending say 80GB of data over the wire from say 50 machines to a single client, for that one machine to process down to whatever tiny subset is needed, is going to be very slow even if you could manage to make it not OOM. This gives a relatively small absolute limit on dataset size, whereas if the query was pushed down to the servers it could be distributed across the nodes, allowing it to actually scale as nodes are added. Scale this up to multi-terabyte datasets and it very quickly becomes obvious this is impossible. What am I missing here? Can Datomic actually handle large datasets; how much is actually done on the servers? Because other than scale concerns it sounds perfect for us.

curtosis16:01:47

@dexter: I'm not entirely sure what you mean by "sending 80GB of data from 50 machines to a single client" in the context of read scaling.

curtosis16:01:12

sending data from 50 machines sounds like putting data in, which is the job of the transactor (and so single-threaded), but that addresses write scalability.

dexter16:01:41

lets say I have 50 machines, with my dataset evenly distributed

dexter16:01:46

if I create a query and push it to the servers, à la HBase server-side filters, SQL stored procedures etc., then the task is subdivided across all 50 machines and each one conducts data-local ops where possible, only sending the minimal amount needed over the wire to the client

dexter16:01:22

this means as your data scales you simply divide it more and the overall data needing to be returned to the client as the answer remains constant

dexter16:01:03

if however the client has a snapshot of all relevant data so it can perform the query client side, that is fine at small scale, but with large datasets you will be limited by a) network bandwidth: a cluster of 50 machines throwing data at you to process will surely saturate the network and cause a serious bottleneck, and moving 80GB of data to a single machine takes a loooong time; and b) CPU: once you finally get all the data you then have to perform the query, but one machine will easily hit CPU limits long before it can efficiently process that much data

dexter16:01:15

most horizontally read-scalable systems rely on pushing as much of the work as possible out for divide and conquer, so logically a system that does the inverse cannot scale to large datasets, as it will take hours just to get all the source data before you even run the query

curtosis16:01:05

ok, I think I see... A couple of thoughts: 1) Peers need to see the transactor and storage. If that's not local, that's a huge bottleneck. (80GB is of course a lot on anything other than very fast networks.)

curtosis16:01:23

(on the other hand, once you have the data it's there)

dexter16:01:02

but you are still limited processing-wise; even just streaming over 80GB on one machine will take hours

dexter16:01:16

scale that to terabytes and it is really impractical

dexter16:01:34

I'm looking at a dataset of around 30T

dexter16:01:20

if you co-locate your peer with part of the dataset that bit can be fast but you still need to get data from the rest of the cluster and process it

dexter16:01:38

to use the example from flyingmachinestudios:

dexter16:01:42

1) Peer: Peer Library! Find me all redheads who know how to play the ukelele!
2) Peer Library: Yes yes! Right away sir! Database - give me your data!
3) (Peer Library retrieves the database and performs query on it)
4) Peer Library: Here are all ukelele-playing redheads!
5) Peer: Hooraaaaaaaay!

dexter17:01:54

step 3 is a massive bottleneck that limits any datomic dataset to whatever a single machine can practically get its hands on in time

curtosis17:01:58

I think the Peer is generally smarter than that, and will only load the chunks it needs to get those datoms

dexter17:01:24

true, but that's still the source, so if I read from my 30T dataset: 80G datoms -> some filter -> 20G datoms -> some transform -> 20G datoms -> some filter -> 1G datoms -> some join -> 100 datoms

dexter17:01:24

I only needed the 1G for the join, but I have to send all 80G and do all that parallelisable work on one very beefy machine

curtosis17:01:02

based on my understanding, most of what you would do with HBase filters is handled by the indexes in datomic.

dexter17:01:44

so you pre-calculate the filters as indexes?

dexter17:01:00

doesn't that heavily limit flexibility in that you need to know every query in advance

curtosis17:01:51

not really.... that's a relational way of thinking of it.

curtosis17:01:35

datoms are much more granular, kind of like a triplestore.

curtosis17:01:34

so to find "all the redheads" it just needs to look at the AVET index, to get all the Entities with the Attribute :hair/color and the value :hair.color/red (hypothetically)
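
A minimal sketch of why a sorted attribute-value index makes that lookup cheap. It uses plain Clojure only and is a toy model, not Datomic's implementation: the `avet` set, the entity ids, the transaction ids, and the `entities-with` helper are all made up for illustration, and `:hair/color` / `:hair.color/red` are the hypothetical names from the message above.

```clojure
;; Toy AVET index: a sorted set of [attribute value entity tx] tuples.
;; Vectors of equal length sort element by element, so all datoms for one
;; attribute/value pair sit next to each other.
(def avet
  (sorted-set
    [:hair/color  :hair.color/black 101 1000]
    [:hair/color  :hair.color/red   102 1000]
    [:hair/color  :hair.color/red   105 1001]
    [:person/name "Ann"             102 1000]
    [:person/name "Bob"             105 1001]))

(defn entities-with
  "Range-scan the toy index for every entity whose attribute `attr` has value `v`.
  Seeks to the first matching tuple, then reads until the pair changes."
  [index attr v]
  (->> (subseq index >= [attr v Long/MIN_VALUE Long/MIN_VALUE])
       (take-while (fn [[a x]] (and (= a attr) (= x v))))
       (map (fn [[_ _ e _]] e))))

(entities-with avet :hair/color :hair.color/red)
;; => (102 105)
```

Only the two matching tuples are read; the rest of the set is never touched. In Datomic itself the Peer fetches index segments from storage rather than scanning an in-memory set, so this sketch shows the shape of the lookup, not its cost.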

curtosis17:01:29

(disclaimer: I'm relatively new to datomic myself, so I'm only speaking from my understanding)

dexter17:01:44

no worries there, every hint helps 😉

dexter17:01:01

basically we have a large transaction dataset that we could never fit on a single box, and we want very fast reads for complex matrix manipulation / modelling purposes. People submit queries asking for models to be generated over large swathes of data and they want them sub-second. The problem with things like HBase is that they are too slow with their clunky HDFS backing stores. Datomic seems ideal but we really need to do the work data-local on the servers because no one machine could process all the data fast enough, more to the point you couldn't even send the source data to one box in-time. Fortunately the problem can be sub-divided and thus as long as each section completes fast enough we should be fine.

dexter17:01:02

if I'm understanding right, to use datomic in this context we may want a proxy process to conduct the queries data-local on subsets of data, then manually collect the results from the delegates?

dexter17:01:21

these queries also are heavily time-series oriented hence why something like datomic is better than just say postgres

dexter17:01:15

so you might collect all the transactions for one transactor using an index and shard by transactor

dexter17:01:39

but sadly we need to join across transactors

curtosis17:01:52

no. that breaks the core datomic model.

curtosis17:01:20

there is one transactor, and it runs single-threaded. period. (well, HA failover excepted)

dexter17:01:45

that's for writes though right?

curtosis17:01:31

well, yes.... but "sharding by transactor" doesn't make sense in this context.

dexter17:01:00

transactor in my parlance meaning a legal entity which conducts transactions

dexter17:01:06

not the datomic transactor

curtosis17:01:22

ah. a very bad namespace collision in here! 🙂

dexter17:01:41

indeed, new enough to this thing I chose bad wording

dexter17:01:47

lets say legal entity

dexter17:01:58

and err trades

curtosis17:01:10

so in this context, you could definitely partition the data set by legal entity. partitions are all about index locality

dexter17:01:02

so in a datomic 'cluster' I could partition by legal entity while retaining the ability to join across partitions (after filtering to reduce the cost of course)

curtosis17:01:15

that said, the peers need to be sized for their working set, so if they really do need 80GB to answer a query, then they need that much space.

dexter17:01:15

the thing is it's divisible, so ideally the part that works on 80G would be farmed out to many machines; it pares down to small sizes for actual result-sets

dexter17:01:29

very little needs all the data at once

curtosis17:01:56

I don't think that model really maps to datomic that well.

dexter17:01:21

yeah, we are currently evaluating a whole bunch of things, if it could handle the scale datomic is the best fit by far

dexter17:01:40

but we don't want to invest months of time into something that just won't scale

curtosis17:01:45

I mean, you could do it manually and do a join across results from a bunch of Peers, but that seems like re-inventing the wheel.
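
A rough sketch of what "doing it manually" could look like, in plain Clojure with no Datomic calls. Everything here is an assumption made for illustration: the `trades-by-entity` data, the `local-query` and `scatter-gather` helpers, and the use of `pmap` as a stand-in for sending work to separate Peers.

```clojure
;; Hypothetical data, already split by legal entity. Each vector stands in
;; for the working set that one Peer would hold.
(def trades-by-entity
  {:google [{:entity :google :amount 120} {:entity :google :amount 15}]
   :amazon [{:entity :amazon :amount 300} {:entity :amazon :amount 40}]
   :ebay   [{:entity :ebay   :amount 75}]})

(defn local-query
  "Runs against one partition: keep the large trades and reduce them to a
  small summary, so only the summary has to travel back to the caller."
  [[entity trades]]
  (let [large (filter #(>= (:amount %) 50) trades)]
    {:entity entity
     :count  (count large)
     :total  (reduce + (map :amount large))}))

(defn scatter-gather
  "Fans local-query out over every partition, then gathers the summaries.
  pmap runs the work on local threads; real Peers would need an RPC layer."
  [partitions]
  (->> partitions
       (pmap local-query)
       (sort-by :entity)
       vec))

(scatter-gather trades-by-entity)
;; => [{:entity :amazon, :count 1, :total 300}
;;     {:entity :ebay, :count 1, :total 75}
;;     {:entity :google, :count 1, :total 120}]
```

The filter and the reduction happen per partition and only the summaries are combined, which is the divide-and-conquer shape dexter is asking for. Scheduling, failure handling, and cross-partition joins are exactly the parts this sketch leaves out.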

curtosis17:01:08

without knowing more about what the data & access patterns look like, it's hard to say.

curtosis17:01:48

I'd recommend talking with the Datomic staff directly.

dexter17:01:00

my confusion here is just because people say reads scale horizontally, but it seems like they very definitely don't. The number of clients is irrelevant to the datomic structure, but the scale is limited by how much one machine can do

jonahbenton17:01:00

@dexter: yeah, what the Datomic folk mean is that in the traditional centralized OLTP database world, your read and write scalability is tightly bound.

jonahbenton17:01:54

if you have heavy read traffic- again, OLTP-like traffic- you can marry your load to your capacity very easily

jonahbenton17:01:37

for your problem, have you looked at Onyx? http://www.onyxplatform.org/

dexter17:01:01

really it's basically OLAP queries with OLTP access patterns.

curtosis17:01:50

read scalability is always limited to what a single node can do... in a cluster/map-reduce database, there are mechanisms for breaking up small bits of work and putting them together. Datomic is not a map-reduce system, and using it that way (on its own) you're gonna have a bad time.

dexter17:01:51

yeah, Onyx looks interesting, it's actually what reminded me of datomic. The question there is that you still need to store and get access to the data somehow

curtosis17:01:42

you might be able to use Onyx to work across a number of datomic peers.

dexter17:01:27

indeed, our feeling is that without an MR option our load cannot be handled but all the MR options are slow or 100% in memory which would cost a lot. VoltDB looks promising too

jonahbenton17:01:01

@dexter at what rate is your dataset changing, or is it fixed?

dexter17:01:56

it varies, small data (100s MB) changes slowly but constantly, maybe 10M updates at a time. Big Data updates say minimum 30G /day come in overnight via our batch process (grab giant files from clients and normalise). Then people can submit arbitrary queries across almost any field which does e.g. montecarlo simulations on the streams of transactional data

dexter17:01:42

really tiny data changes via users inputting on the site, fairly constantly, which is tiny data but causes many changes

dexter17:01:54

very little is pre-calculatable

dexter17:01:32

apparently competitors manage thousands of these sorts of things in massive oracle deployments

dexter17:01:57

we want something that would scale to ~100 times that load

dexter17:01:20

as we are carefully only pulling in basically samples atm

dexter17:01:43

@curtosis: that seems about right 🙂

dexter17:01:19

we only have ~30T atm in total

dexter17:01:49

but the lack of ability to pre-calc just due to the variance of the queries is a real pain

dexter17:01:02

you could think of us like an analytics database with a web ui

jonahbenton17:01:40

yes. a number of tricky problems in that.

dexter17:01:48

I know right 😛

dexter17:01:11

at present we have basically bunged it all in Redis, but well, it's creaking, we've hit the limit

dexter17:01:22

and it's a pain to join

jonahbenton17:01:30

yes. you're definitely at a scale in terms of data and potential licenses that you should be talking to cognitect directly but a couple of other things come to mind

jonahbenton17:01:51

datomic has datom count limits within a single database

jonahbenton17:01:04

and likely peer count limits within a "cluster"

dexter17:01:53

as long as you can join cross db that's not necessarily an issue, in that there are a lot of partitions, at least a few hundreds

dexter17:01:30

but if there are absolute limits that may be an issue worth looking into

dexter17:01:19

like I say atm this is exploration work, we have a few options to consider, none are exactly perfect so we need to find what's best or maybe even divide the issue and have two solutions

dexter17:01:48

paring down what's needed for the OLTP style stuff knocks you down to under 1TB

dexter17:01:02

but if we can solve the problem once rather than twice that would be nice

jonahbenton17:01:07

are you able to partition your model executions into groups based on the amount or type of data they'd need to churn through?

dexter17:01:37

amount no, type not quite, we can partition via which top level entity the data is about

dexter17:01:43

so by key basically

dexter17:01:13

so e.g. one for Google / Amazon / Ebay

dexter17:01:29

though then the results may need joining

jonahbenton17:01:16

right. cpu consumption per model may vary greatly as well, no?

lucasbradstreet18:01:37

#onyx can certainly be used to scale out the reads. We've just shipped a new scheduler implementation. One feature we're hoping to achieve soon is better scheduling constraints to prefer Datomic peers be collocated on nodes. http://www.onyxplatform.org/jekyll/update/2016/01/16/Onyx-0.8.4-Colocation.html is a decent discussion of how the scheduler can be used for performance optimisation purposes

lucasbradstreet18:01:30

@dexter: looking at your requirements above, onyx may be a good fit. Sounds like you have some interesting issues at scale though. @michaeldrogalis and I would be happy to have a chat with you about it if you're interested.

lucasbradstreet18:01:28

@dexter I think you could likely scale out your reads OK, but it's worth mentioning that all writes must go through a single transactor so you may have write scalability issues, depending on your needs

dexter18:01:59

@lucasbradstreet: onyx does indeed sound like a good fit

dexter18:01:31

especially with the new co-location scheduler

dexter18:01:27

though I'm concerned about the likely stability of onyx, both operationally and API-wise, given its sub v1 state

dexter18:01:37

I don't think writes should be too much of an issue though we'd need to ensure the batch insert did not affect read performance

dexter18:01:24

I think data being a bit behind at 2AM is less of an issue given that it just came out of FTP -> Hadoop

dexter18:01:48

during the day when reads are heaviest there will be little write traffic

dexter18:01:10

if datomic/onyx stays in our top options I'll take you up on that chat with some concrete examples of actual queries

lucasbradstreet18:01:32

Yup. I understand your position on the sub v1 state given where you’re at and what you’re looking for. We’re quickly stabilising with recent work, including jepsen testing, but it sounds like you need something very solid now.

lucasbradstreet18:01:07

batch datomic writes don’t affect read performance as far as I know, because the peers don’t interact with the transactor at all

lucasbradstreet18:01:59

I guess it could affect index creation, but I don’t understand the subtleties there

dexter18:01:26

really sounds like a good option still if it's stable enough. What's the roadmap for stability like?

dexter18:01:44

are we talking months or years for a stable release?

dexter18:01:12

I'm sure I'm not the only one that remembers storm 😛

curtosis18:01:08

IIRC batch transactions might need some tuning (and back pressure). There's also at least a theoretical upper bound on bandwidth to Peers for the index updates. But again, the performance-at-scale questions are better answered by Cognitecticians. 😉

michaeldrogalis18:01:14

@dexter: It's in a pretty stable state right now. Datomic itself is < 1.0, for a comparison.

michaeldrogalis18:01:28

Anyhow - I don't want to encroach on the Datomic conversation. Over to #C051WKSP3 if you have more questions.

firstclassfunc21:01:47

Anyone have any pointers for the best way to deal with transact errors?

stuartsierra21:01:11

@dexter Datomic does not try to position itself as a "big data" system. It's in a completely different category from something like HBase. "Horizontally-scalable reads" is to contrast to traditional relational databases where both queries and transactions are executed on a single machine.

stuartsierra21:01:51

Datomic centralizes the transaction load on one machine to get efficient ACID transactions, but each Peer can execute its own queries without affecting transaction throughput or queries on other Peers.

stuartsierra21:01:35

Each Peer may be interested in different subset of the whole database, but a single query is limited in scope to what a single Peer can hold in RAM.

timgilbert22:01:24

Say, if I'm creating my own partition for some related entities, is it good style to use a :db.part/my-partition identifier for it, or should I just use :my-partition and then pass that keyword to (d/tempid)? Trying to decide between this:

{:db/id #db/id[:db.part/db]
 :db/ident :db.part/notifications
 :db.install/_partition :db.part/db}
...and this:
{:db/id #db/id[:db.part/db]
 :db/ident :notifications
 :db.install/_partition :db.part/db}

timgilbert22:01:35

Parenthetically, I wish the datomic docs around this had a few more examples, specifically ones where new data is asserted in the created partitions
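
A hedged sketch of the kind of example asked for, written as transaction data in the same style as the snippets above. It uses the `:notifications` ident from the second variant; the first variant would be used the same way with its own ident. The `:notification/message` attribute is hypothetical and assumed to be installed already, and the two vectors are assumed to be submitted as separate transactions, since as far as I know a partition has to exist before a tempid can refer to it.

```clojure
;; Transaction 1: install the partition.
[{:db/id #db/id[:db.part/db]
  :db/ident :notifications
  :db.install/_partition :db.part/db}]

;; Transaction 2 (submitted afterwards): assert a new entity in that partition.
;; #db/id[:notifications] is the literal form of (d/tempid :notifications).
;; :notification/message is a made-up attribute for illustration.
[{:db/id #db/id[:notifications]
  :notification/message "Your report is ready"}]
```

When building the transaction in code rather than in an EDN file, the keyword passed to `(d/tempid)` is whatever `:db/ident` the partition was given.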