#onyx
2018-05-16
eoliphant13:05:00

Hi, I have a kind of general architecture question. I’m looking to use onyx for a commander-ish pattern implementation. One of the things I’m working through is the best way to manage/maintain the state that a command processor needs in order to do its thing and issue the appropriate event(s). I’ve done stuff with ES/CQRS frameworks like Axon, which have things like an explicit ‘event sourcing repository’, such that my, say, order command processor would ask the repo for order 27, and it would return the state as a function of all its stored events. I’d been considering using datomic as this ‘aggregate repo’ or whatever, but I initially had some heartburn, as it could possibly violate the principle that the events are the source of record for everything. Now I’m thinking that as long as the datomic state is a function of applying events and can be rebuilt as needed, it actually is ok, and potentially makes for a better ‘repository’ implementation, as it’s essentially an ongoing snapshot, as opposed to other libs/approaches where you maintain the snapshot and still read N events to get to the current state. Sorry for the ramble lol, but just wanted to see what you folks thought about this
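
A minimal sketch of that “state as a function of events” repository idea, in Clojure; the event types and `fetch-events` here are hypothetical, not from any particular framework:

```
;; Current aggregate state is a left fold over its stored events.
(defmulti apply-event (fn [_state event] (:event/type event)))

(defmethod apply-event :order/placed
  [state {:keys [order/items]}]
  (assoc state :status :placed :items items))

(defmethod apply-event :order/shipped
  [state _event]
  (assoc state :status :shipped))

(defn load-aggregate
  "Rebuild e.g. order 27 by folding all of its events, oldest first.
  `fetch-events` stands in for whatever the event store exposes."
  [fetch-events order-id]
  (reduce apply-event {} (fetch-events order-id)))
```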

Travis13:05:32

@eoliphant Been doing a little thinking about this myself. I'm currently implementing this with Kafka Streams since I don't have onyx available to me in this case. If you're using onyx and datomic, you can probably use datomic as the state store and have an onyx job read the datomic log for the events

eoliphant14:05:29

yeah @camechis I’ve been looking at Kafka Streams also, and trying to decide how that might fit in, pros/cons, etc. It’s the Tyranny of Good Choices lol

eoliphant14:05:15

And yeah, pulling stuff from the datomic log is yet another dilemma lol. Because in that case, strictly speaking, the datomic log(s) are the SOR and the event stream/store is derived, so it’s not event ‘sourcing’ per se. I know the Nubank guys did that (microservice datomic log -> kafka), but I saw a talk by their CTO recently in which he indicated that if he had to do it again, he’d have flipped it around

lmergen14:05:17

i think datomic could be fine for this, but i don't think it's that good of an event store

lmergen14:05:27

it's more of an aggregate store than an event store imho

eoliphant14:05:33

yeah @lmergen, that’s my take also

eoliphant14:05:02

for smaller projects I’ve just done reified transactions and tagged them
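
As a sketch, tagging a reified transaction in Datomic means asserting facts on the transaction entity itself via the reserved "datomic.tx" tempid. Here `conn` is an existing connection, and the :audit/* attribute is hypothetical schema that would need to be installed:

```
(require '[datomic.api :as d])

;; The first map tags the transaction entity; the second is the
;; ordinary domain fact being transacted.
@(d/transact conn
   [{:db/id            "datomic.tx"
     :audit/event-type :order/placed}
    {:order/id     27
     :order/status :placed}])
```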

eoliphant14:05:07

but this is a bigger deal

eoliphant14:05:14

so i need to break stuff out

lmergen14:05:14

@eoliphant have you seen the latest project by the onyx guys? http://pyrostore.io/

eoliphant14:05:25

looks pretty cool

lmergen14:05:31

it's perfect as an event store

lmergen14:05:51

but it depends a bit upon your use case / requirements

lmergen14:05:24

you could also just stream things to S3

lmergen14:05:42

so then you have s3 next to datomic

lmergen14:05:58

at least then you can always easily go back to the raw data

eoliphant14:05:05

yep, especially with athena, etc

eoliphant14:05:09

again, too many choices lol

lmergen14:05:29

so whenever i face too many choices, i usually opt to keep things really simple

lmergen14:05:38

which would be s3 in this case

eoliphant14:05:39

yeah that’s what I’m trying to get to

eoliphant14:05:51

for me the main decision point

eoliphant14:05:55

is what’s authoritative

lmergen14:05:06

event store is always authoritative

eoliphant14:05:07

and I’m trying to make that the events

eoliphant14:05:39

yeah, but in some of these scenarios like datomic log -> event store

eoliphant14:05:04

it doesn’t, well, ‘feel right’ lol, as strictly speaking it would be datomic

lmergen14:05:05

i would think that's overcomplicating things

eoliphant14:05:09

yeah exactly

lmergen14:05:15

i would do kafka -> s3 and in parallel, kafka -> datomic

lmergen14:05:24

s3 for events

lmergen14:05:27

datomic for aggregates
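
As a sketch, that split maps naturally onto an Onyx workflow: one Kafka input feeding two output paths. Task names here are hypothetical; each would be backed by the corresponding plugin (onyx-kafka, onyx-amazon-s3, onyx-datomic) in the catalog:

```
(def workflow
  [[:read-events :write-s3]         ;; raw events to S3 (the event store)
   [:read-events :apply-event]      ;; same stream feeds the aggregate path
   [:apply-event :write-datomic]])  ;; materialized aggregates in Datomic
```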

lmergen14:05:41

you could even use kafka as the event store

eoliphant14:05:44

yeah I’m going to take another look at s3

eoliphant14:05:53

that’s what I’d been planning to do

lmergen14:05:56

but it's a terrible event store in my experience

eoliphant14:05:59

just store forever in kafka

lmergen14:05:12

again, it depends upon what you want to do with it

lmergen14:05:26

if you only want to use it as backup, it's fine

eoliphant14:05:37

so what issues have you had with it from the event storage perspective? I’ve seen some rumblings along these lines lol

lmergen14:05:39

if you want to allow your data scientists to query the event store directly, it sucks

lmergen14:05:10

if you use kafka as the event store, imho it's not a great tool for ad-hoc querying and data exploration

eoliphant14:05:15

well right, but I thought typically, those guys would build their own views etc

eoliphant14:05:22

ah but I see what you’re saying

lmergen14:05:22

if you put it on s3, you get a ton of extra tools like athena for free

eoliphant14:05:29

querying ‘into’ the store itself

eoliphant14:05:39

as opposed to just reading it in order into something more suitable

lmergen14:05:48

also, more tools integrate with s3 than kafka

eoliphant14:05:57

yeah interestingly

eoliphant14:05:03

this opens up some other possibilities

eoliphant14:05:14

i’d been trying to push as much mgmt overhead as possible to aws

eoliphant14:05:22

so i’d been looking at kinesis etc

lmergen14:05:38

i've used kinesis firehose for years, it's solid

eoliphant14:05:39

but the fact that there’s no ‘store’

eoliphant14:05:49

was pushing me back to kafka

lmergen14:05:03

kinesis firehose can easily stream everything to s3 as well

lmergen14:05:11

so then s3 becomes your store, again

eoliphant14:05:36

yeah and now i’m having some ideas, which is always dangerous lol

lmergen14:05:00

as long as the ideas are good... 🙂

eoliphant14:05:19

because I can get my in-order semantics, I guess, out of athena

eoliphant14:05:39

and again, datomic would actually make for an awesome aggregate store

lmergen14:05:52

you can make order semantics explicit
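
One way to make it explicit, as a sketch: stamp each event with a per-aggregate, monotonically increasing sequence number, so order lives in the data rather than in partition/arrival order (all names hypothetical):

```
(defn next-event
  "Build the next event for `aggregate`, carrying its own order."
  [aggregate event-type payload]
  {:event/type      event-type
   :event/aggregate (:id aggregate)
   :event/seq       (inc (:last-seq aggregate 0))
   :event/payload   payload})
```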

eoliphant14:05:09

none of that ‘look at the snapshot, then grab the last few events’ stuff

lmergen14:05:56

the guy never actually implemented it

eoliphant14:05:57

yeah I’ve played with it actually

eoliphant14:05:04

the stuff that was there lol

lmergen14:05:05

as in, never ran in production

eoliphant14:05:14

and to your point

eoliphant14:05:19

we’ve got some more options now

eoliphant14:05:00

so to your point kinesis/kafka could give us the required serialization/ordering

eoliphant14:05:17

so that stuff shows up in s3 correctly

eoliphant14:05:24

ah that’s another thing

lmergen14:05:47

who decides what the correct ordering is?

eoliphant14:05:50

are you in your case just using the ‘put time’ for order?

lmergen14:05:55

when you have multiple kafka partitions / brokers

lmergen14:05:06

how would you manage ordering?

eoliphant14:05:12

right that’s another thing i was working through lol

eoliphant14:05:32

that makes it less than suitable in some scenarios

eoliphant14:05:57

since this is business stuff as opposed to just streams of data from IoT or something

lmergen14:05:02

so what i do is not depend upon ordering

lmergen14:05:20

i use the onyx epoch id

lmergen14:05:23

i tag that

lmergen14:05:27

so i can deduplicate
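
A rough sketch of that epoch-tagged deduplication; `seen-epochs` is a hypothetical in-memory set here, and a real implementation would persist applied epoch ids durably alongside the write:

```
(defn apply-once!
  "Run `write!` only if this epoch hasn't been applied before."
  [seen-epochs epoch-id write!]
  (when-not (contains? @seen-epochs epoch-id)
    (write!)
    (swap! seen-epochs conj epoch-id)))

;; usage, given (def seen-epochs (atom #{})):
;; (apply-once! seen-epochs epoch-id #(write-to-store! segment))
```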

eoliphant14:05:37

because I have to have some notion of it

lmergen14:05:42

but, for explicit ordering of multiple commands / retries that would conflict

eoliphant14:05:54

because i’ve got financial transactions, etc going on

lmergen14:05:00

i came to the conclusion that the only reliable way to deal with it is eventual consistency

eoliphant14:05:28

yeah and de-duping/idempotency have to be in the mix as well

lmergen14:05:30

if your aggregate processors detect a conflicting operation (e.g. deleting the same user twice), they do conflict resolution at that point
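
For the double-delete example, resolution can be as simple as making the handler idempotent against current state (a sketch; names hypothetical):

```
(defn handle-delete-user
  [state {:keys [user/id]}]
  (if (get-in state [:users id :deleted?])
    state                                        ;; duplicate delete: no-op
    (assoc-in state [:users id :deleted?] true)))
```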

eoliphant14:05:02

well that’s where datomic could be super useful

lmergen14:05:03

imho the cqrs / event sourcing pattern demands those kinds of conflict resolution. you cannot achieve strong consistency like an rdbms anymore.

lmergen14:05:09

yes that is true

eoliphant14:05:19

tag the transaction

lmergen14:05:33

yes, but that's actually similar to onyx's epoch

eoliphant14:05:43

right so it could be done there as well

lmergen14:05:07

i would explore it, because if it's possible, you give yourself more freedom in choice of database

eoliphant14:05:22

yeah i’m going to give that a whirl

lmergen14:05:30

anyway this is all my opinionated advice, take it with a grain of salt 🙂

eoliphant14:05:46

Yeah I’m dealing with 97 things on this project lol

eoliphant14:05:00

nah man this is super helpful

eoliphant14:05:08

so I’m getting this worked out

lmergen14:05:16

i learned one thing: one does not simply implement cqrs

eoliphant14:05:27

but I’m also pushing for a ‘clojure all the way down’ approach

eoliphant14:05:49

Clojurescript in the browser, this backend stuff we’re discussing, EDN/Transit all over, but basically, throughout the system, :application/id means the same thing, can be validated the same way, etc. Then some translators, automated to the extent possible, for typical REST/GraphQL for clients who aren’t fortunate enough to be using this cool stuff lol
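
A tiny sketch of that shared-semantics idea: define the spec once in a .cljc namespace so the browser and the backend validate :application/id identically (`myapp.domain` is a hypothetical namespace):

```
(ns myapp.domain
  (:require [clojure.spec.alpha :as s]))

(s/def :application/id uuid?)

;; both ClojureScript and Clojure can now call:
;; (s/valid? :application/id some-id)
```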

eoliphant14:05:17

yeah I’ve been down this road a few times myself lol

eoliphant14:05:23

I used Axon on a couple projects

eoliphant14:05:36

as well as Lightbend’s Lagom

eoliphant14:05:00

both have a lot of nice batteries-included stuff

eoliphant14:05:11

but they bring simple vs easy to mind lol

lmergen14:05:07

yes, so now you need to make a lot of choices yourself

👍 4
dbernal15:05:25

@lmergen I'm using the SQL plugin to get some initial values, and then a downstream task uses those to call out to SQL. I'm thinking now that the downstream task is actually the issue here. I'm still struggling to get it to consistently call out to SQL from within a function task. I'm not sure how the SQL plugin is able to do it so consistently with the PooledDataSource, but for me, even with an input sequence, it's not able to get consistent results back from a SQL call