#architecture
2022-02-21
Ferdinand Beyer08:02:25

I am a believer in treating the database as an implementation detail and not tying my logic to database details. The idea of the “repository” pattern in DDD is not to build a “pass-through” interface to SQL, but to provide a collection-like interface over your data. I think this fits nicely into Clojure’s mindset, when done right. In Clojure, collections are immutable. So how about you design a repository protocol to resemble an immutable collection?

(defprotocol ArticleRepository
  (add [repo article])
  (remove [repo article])
  (find-article-by-id [repo id])
  ...)
In the beginning, you don’t even need a database. Just implement this with a backing hashmap. `add` will `assoc` and `remove` will `dissoc`. Write your logic with pure functions operating on this “pure” repository and returning an updated version. This also fits nicely with “database as a value” implementations such as Datomic or XTDB. You can accumulate changes in `add`/`remove` and transact them at the end of a “command processing” function. In DDD, your domain model will not do that. Your application layer will create repositories, pass them to your domain code, and transact the result. This approach gives you a lot of flexibility. You can write blazingly fast tests. You can split your application into microservices and have a repo implementation that talks to other services. You can switch your database. All of that without changing the domain/business logic code. Of course there is no free lunch. This might be overkill for really small apps. Architecture is all about trade-offs.
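
For illustration, a minimal sketch of the hashmap-backed implementation described here; it assumes articles carry an :id key and uses the protocol above. This is only a sketch, not a prescribed implementation:

(extend-protocol ArticleRepository
  clojure.lang.IPersistentMap
  ;; back the repository with a plain persistent map keyed by article id
  ;; (note: the protocol's `remove` shadows clojure.core/remove in its namespace)
  (add [repo article] (assoc repo (:id article) article))
  (remove [repo article] (dissoc repo (:id article)))
  (find-article-by-id [repo id] (get repo id)))

;; Usage: plain data in, plain data out.
;; (-> {} (add {:id 1 :title "Hello"}) (find-article-by-id 1))
;; => {:id 1 :title "Hello"}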

Drew Verlee15:02:08

That couples your "ArticleRepository" to one that can only assoc and dissoc. Which means it's more of an ArticleHashMap, right? And what prevents me from assoc'ing things that aren't articles? I earnestly don't see how this is different than just a regular hashmap, but with more steps.

Ferdinand Beyer15:02:55

No. In DDD, you model a repository after a collection with set semantics. So it behaves like a map, yes. So does every SQL table with a unique index.

Drew Verlee15:02:22

How do you do a join across hashmaps such that we can keep things "fast and in memory"?

Ferdinand Beyer15:02:05

You don’t. You abstract away the database bit. You create one repository per “aggregate” in DDD, and you don’t join across aggregates.

Drew Verlee15:02:47

you have lost me, how are you eventually persisting your data?

Drew Verlee15:02:01

do you plan on having everything pre-aggregated?

Ferdinand Beyer15:02:14

By providing an implementation of your repository protocol that talks to your DB and does not use a hashmap
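
A rough sketch of what such a database-backed implementation could look like, using next.jdbc for illustration (the table and column names are assumptions, and a real implementation would likely accumulate changes and commit them in one transaction):

(require '[next.jdbc.sql :as sql])

(defrecord SqlArticleRepository [datasource]
  ArticleRepository
  (add [this article]
    (sql/insert! datasource :article article)
    this)
  (remove [this article]
    (sql/delete! datasource :article {:id (:id article)})
    this)
  (find-article-by-id [_ id]
    (sql/get-by-id datasource :article id)))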

Drew Verlee15:02:51

hashmaps talk in key value semantics, how would that translate to a join in postgres?

Drew Verlee15:02:18

i mean, this plan would work better on a more key value database obviously, but i'm trying to understand how we have removed coupling to a specific database or even a set of them.

Ferdinand Beyer15:02:19

Think of an aggregate as a cluster of “objects”. An article could also contain comments. In a SQL database, you might want to join across tables to get the full article. But this is all SQL implementation detail that you might want to hide from your application. There you just want to fetch an article. This could be getting one document from a NoSQL DB, or involve joins on a SQL DB, or fetch something from a REST API from another service — the app does not care.

Ferdinand Beyer15:02:57

Your app just uses the protocol

kraf15:02:47

@U0DJ4T5U1 are you familiar with DDD aggregates? in this example within the article repository you are doing joins but you only think in articles. you get the whole article with everything that belongs to it joined in. what does not belong in your aggregate you leave as a foreign key and you use different repositories to resolve those (basically doing application level joins)

👍 1
Drew Verlee15:02:37

Databases are a function of their query semantics; you can't abstract that away without a huge trade-off. Dynamo is fast because it's a key-value store that requires developers to know their query patterns. Postgres allows for runtime joins so that queries can be built at runtime or devtime (without significant investment). Datomic allows for even more flexibility, at a cost.

kraf15:02:56

the driver here is that you model your entities by what you believe is the best conceptual model in your domain and how this is stored is a different concern.

kraf15:02:41

you are right about the trade off. the implementation of the repository can get pretty wild

Drew Verlee16:02:17

I'm curious what query semantics that looks like though? I feel like datalog is as close to a high level query language across structured data as i have seen.

Ferdinand Beyer16:02:11

Question back at you. How would you design a larger system? Would you scatter database query logic all over the place?

kraf16:02:22

hm let's say article is an aggregate and it has comments that are part of the aggregate. articles are also written by a user but we assume it doesn't make sense to have them be part of article. in this setup you have an article repository, a user repository but no comment repository. when you load an article it's a hashmap with the comments joined in and just an id to a user. you can then manipulate this map (e.g. change some typo and add a comment) and then you store it. the repository figures out how to persist your changes
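
To make the shape concrete, a loaded article aggregate as described here might look something like this (keys are purely illustrative):

{:article/id 42
 :article/title "Repositories in Clojure"
 :article/author-id 7   ; just a foreign key, resolved via the user repository if needed
 :article/comments [{:comment/id 1 :comment/author-id 9 :comment/text "Nice!"}
                    {:comment/id 2 :comment/author-id 7 :comment/text "Thanks"}]}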

kraf16:02:14

if you persist in pg, you have a comment table and normalize, in mongo you might have one document for the whole thing

Ferdinand Beyer16:02:12

Queries: You would define a simple set of domain-specific queries, and add functions to your repository to implement them. E.g. query articles by author, date, etc. You could pass a clojure map with criteria to a function.
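
As a sketch of the criteria-map idea (the query protocol and names here are made up for illustration; with next.jdbc, find-by-keys already accepts exactly this kind of map):

(defprotocol ArticleQueries
  (find-articles [repo criteria]))

;; In-memory implementation: criteria is a map of attribute -> required value.
(extend-protocol ArticleQueries
  clojure.lang.IPersistentMap
  (find-articles [repo criteria]
    (filter #(= criteria (select-keys % (keys criteria)))
            (vals repo))))

;; A SQL implementation could delegate to something like
;; (next.jdbc.sql/find-by-keys datasource :article {:author "fbeyer" :year 2022})

;; Usage:
;; (find-articles repo {:author "fbeyer"})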

kraf16:02:58

while the repository code is so-so, the business logic is very clean and domain focused. you have just pure functions operating on aggregates. load aggregates, do something, store aggregates

Drew Verlee16:02:16

That makes sense Kraf, the question is, why have two databases? I'm not being flippant, the reason this abstraction might be useful is to handle this polymorphic aspect, but you have to state a business reason why that's the case. It can be for speed, that's fine, that's usually the case.

Ferdinand Beyer16:02:57

Maybe you just have one database. Still it does not make sense to tie your whole application to that one database

Ferdinand Beyer16:02:21

The protocol is just first and foremost an architectural boundary

Drew Verlee16:02:48

It doesn't make sense to prematurely abstract before knowing what you can safely ignore in that abstraction.

💯 2
Drew Verlee16:02:25

The difference between the outlooks is why it's an art imo.

1
Ferdinand Beyer16:02:30

What is the alternative? Writing SQL in your Ring handler?

Ferdinand Beyer16:02:14

Plus how do you even decide that you need a database? Or what kind of database? Could Postgres be already a premature choice?

Drew Verlee16:02:45

Maybe we don't even need an app 🙂 don't tell the business though.

🙂 2
kraf16:02:48

i understand now. definitely a thing i struggle with. hexagonal and uncle bob style clean architecture definitely is extremely verbose and at the same time you will likely never swap out an implementation for something else. it's a nice idea on how to make the database and even frameworks into implementation details. i'm usually a believer in YAGNI. i see a lot of value in the concept of aggregates in any case.

Ferdinand Beyer16:02:38

I don’t quite understand the argument “don’t use interfaces/protocols when you only have one implementation” — that’s beside the point. Why don’t you just solder your laundry machine to the wires in the wall instead of using a plug and an outlet?

😀 1
kraf16:02:39

the overhead of creating a protocol for repositories seems not so big, especially for clojure. i think you can get very far with building out your business logic without having to decide on a db. there is definitely value in being able to defer a decision like this

1
Drew Verlee16:02:47

to be clear, are you talking about Clojure protocols?

Drew Verlee16:02:35

Why not just use a hashmap as the presumed mock database?

Drew Verlee16:02:07

i guess it's irrelevant for this discussion, i don't see how protocols matter is all.

Ferdinand Beyer16:02:27

Let’s say we have some business logic. I want to add a comment to an article. So it’s like (defn add-comment [article-id commenter text]). This should somehow translate to something being persisted, right? So somehow this function will have to interact with a database of some sort. You can either call your SQL adapter directly or delegate to SQL-handling functions.

Ferdinand Beyer16:02:30

The repository is just a set of functions encapsulating the SQL here. Plus it allows you to write that function without even caring about SQL. When you do it right, you could even make this function pure.

Ferdinand Beyer16:02:16

Without a protocol, how would you inject your hashmap as a mock database?

Drew Verlee16:02:44

err inject into or do you mean reference?

Drew Verlee16:02:16

you pass the hashmap/source in as an argument to the function. Like usual.

Ferdinand Beyer16:02:29

So:

(defn add-comment [db article-id text]
  (update-in db [article-id :comments] conj text))

Drew Verlee16:02:34

I mean, I would only write that function if it did more than add-comment, otherwise I would just (update-in db [article-id :comments] conj text)

Ferdinand Beyer16:02:57

Of course, there could be more business logic, invariants, etc.

Drew Verlee16:02:59

Or if the fn was re-used across multiple places where it did more and they needed to be kept in sync

Ferdinand Beyer16:02:07

Let’s assume the function has a good reason to exist. We want to check that the user is allowed to comment. We want to also update counters or whatnot.

Ferdinand Beyer16:02:36

And the function should work on both a hashmap-based DB for testing and the real DB

Ferdinand Beyer16:02:41

This is what protocols can do.
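
As a sketch of what that could look like against the repository protocol (the argument list and the invariant are made up for illustration):

(defn add-comment
  "Business logic written against the repository protocol. Works the same
   whether `repo` is a plain hashmap (tests) or a database-backed implementation."
  [repo article-id commenter text]
  (let [article (find-article-by-id repo article-id)]
    (when-not (:comments-open? article)   ; hypothetical invariant
      (throw (ex-info "Comments are closed" {:article-id article-id})))
    (add repo (update article :comments conj {:by commenter :text text}))))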

Drew Verlee16:02:47

or a defmulti* but I understand your point. I certainly see how necessity might drive towards doing this, but it might also go other directions as well. And Protocol just provides a template for how to handle the dispatch on types, not guidance on how to do that well. Trying to maintain a Protocol over a hashmap and Kafka when in production only Kafka is used would be challenging and someone would need to motivate that cost. Especially when docker means you can simulate the database in a fairly isolated way.

Ferdinand Beyer16:02:30

Absolutely true. As said — it’s a trade-off.

Ferdinand Beyer16:02:34

TBH your argument has little to do with protocols, but more with whether you want to have a test double for your DB.

kraf16:02:55

for me it's hard to say when to do what. i totally get what you're saying about overengineering stuff and i know plenty of cases where I wouldn't even have that abstraction layer. on the other hand if a team starts building something more monolithic and multiple people work on it from the start this makes a lot of sense to me even if you're just starting out

Ferdinand Beyer16:02:37

Agree. It’s also hard to get the right boundaries in after the fact.

Drew Verlee16:02:24

Yep. I think I'm pushing back on the idea of trying to "hide" the database; that can easily become an ORM, which doesn't allow for decoupling and obscures key functionality from the developers. As to the specifics of how to abstract, Datomic has always allowed for a data-first interface as well as an in-memory client (of sorts), which would seem to satisfy most of the pain points, and it doesn't require setting up protocols. I would default to using Datalog as the abstraction language, then translate that to my database (or additional databases) if that was my goal.

Casey17:02:44

Nice discussion here. One observation and a question. Observation: It's important to keep in mind that DDD is intended to be used in situations where the business domain is very complex. Somewhere in the first chapter or two of the book Evans specifically says that for smaller apps DDD need not be applied (paraphrasing). Furthermore, DDD assumes that if you have a complex domain, you spend a lot of up-front effort modeling that domain. Not architecting code mind you, but deeply understanding the domain. It's not about premature abstraction, but the right amount from the get-go. DDD is definitely not one-size-fits-all. But DDD is also sort of orthogonal to ports+adapters/hexagonal. Question: @U031CHTGX1T So you have your ArticleRepository, your UserRepository, et al. nicely abstracting the database (and Entities and VOs inside, to use more DDD-speak). What happens when you have an action that crosses the aggregate boundary and succeeds or fails based on IO/side-effects of multiple repos? I can't think of an example using Articles and Users. But I think this is a relatively common case when you take user input -> side-effect it against System A, get a result, then feed the result into System B and get a second result. If the second result is BAD then we have to unwind the first operation against System A. Assuming A and B are both sql-y things against the same db (as is often the case), then a simple (jdbc/with-transaction [tx ds] ...) has you covered.

Ferdinand Beyer17:02:15

First of all: Agree fully and want to point back to my initial post: YMMV, trade-offs, DDD might be overkill, etc. To your question: Aggregates in DDD are defined around transactional consistency. You should not attempt to change multiple aggregates in the same transaction, but accept eventual consistency. Implementation wise, though, there is actually nothing preventing you from doing that. Transactions would be handled behind the scenes, just like with SQL. You can add and remove aggregates to/from repos all day, you still would need to commit or rollback eventually.

kraf17:02:07

you are right and it's good you point out that DDD is not about architecture. i think you do what you would do if it all were services. as you said above, if you're lucky and you are working in a monolith and have one db you wrap in a transaction. if not you will have to get clever and build distributed transactions or sagas right?

kraf17:02:49

are you saying even if the aggregates are part of the same db it would be better to not depend on multi-aggregate transactions being available? an assumption like this is definitely hard to get rid of if you ever need to

Ferdinand Beyer17:02:40

Well, this is at least what the textbooks say 🙂 You design your aggregates around consistency requirements.

seancorfield21:02:14

"abstracting the database (and Entities and VOs inside to use more DDD-speak)" -- that all sounds very OOP to me and with a protocol you're forced to have both the protocol itself for every domain object or aggregate and a record for every domain object or aggregate and then your code needs to make available all of the repository "objects" that any given handler needs in order to do its job. We have over 200 tables at work and that would add a vast amount of boilerplate code that would feel like I'm writing my own ORM (and I detest ORMs!).

Lennart Buit22:02:48

Yeah; what I always wonder in these kinds of arrangements is how you keep your ‘middle layer’ from exploding without losing the power of the database your protocol is backed by. Like let's say that I implement my ArticleRepository protocol with a Datomic-based implementation, or a Postgres implementation. There is so much power in both query languages, but they're definitely not the same strengths. So it seems to me that I can either lose the powers of the query language, or create a leaky (or large!) abstraction.

1
👍 1
💯 1
Lennart Buit22:02:08

Maybe losing that power is fine, right; I believe Datomic does this. It has a storage protocol, and implementations for Dynamo, and JDBC and … that treat these storages as k/v (so, not leaning into the power of its backing storage). That's fine because it only requires it to be that simple — it implements querying on that data ‘the Datomic way’. But in complex business stuff, …, it feels like giving up Datalog or SQL for some abstraction either runs the risk of that abstraction becoming SQL- or Datalog-light, or growing in all directions for each added feature. (Prove me wrong, tho!)

Ferdinand Beyer06:02:08

Wouldn’t you have some sort of “constraint” on database knowledge anyway? Some sort of layer that exposes simple functions and executes queries? Some mechanism to keep your domain logic pure, pushing database I/O to the edge?

Ferdinand Beyer06:02:49

> that all sounds very OOP to me and with a protocol you’re forced to have both the protocol itself for every domain object or aggregate and a record for every domain object or aggregate @U04V70XH6 - why would you need a protocol for every “domain object”? A repository could just as well work on maps and other values. When needed, you could use Spec to describe the data, etc.

seancorfield06:02:33

@U031CHTGX1T I keep hearing you say this in general/generic terms but I have not seen real code that shows how what you're suggesting avoids the pitfalls I'm talking about.

seancorfield06:02:52

What would the protocol for a repository for hash maps look like?

seancorfield06:02:39

As I said, we have over 200 database tables in our system (in fact, we have over 200 database tables just in one schema -- we have four schemas), with basic CRUD on nearly all of them and increasingly complex aggregate operations -- both read and write -- on many of them. How would you represent that in your DDD world with records and protocols?

seancorfield06:02:37

We've tried a number of approaches -- including one that very explicitly separated out sources and sinks so that business functions were "pure" (to the extent that they could only perform readonly queries on their sources) and produced descriptions of changes to be persisted. That created very monadic style code that was difficult to maintain and frustrating to work with.

seancorfield07:02:06

I think the bottom line is that a lot of operations in our system are inherently very database-heavy with not a great deal of "pure logic" applied. Many operations just read an entity, perform some validation, and then update a few columns in that entity and also write records to one or more other tables. Other operations perform complex queries with a lot of conditional pieces in them. Neither of those types of operations lend themselves to the model you're advocating.

seancorfield07:02:42

In addition, for pure CRUD operations, if you've committed to a relational database, next.jdbc.sql provides get-by-id, find-by-keys, insert!, update!, and delete! functions that operate on a table name and a hash map (and some sort of "connectable").
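
For reference, those next.jdbc.sql helpers are used roughly like this (connection details and table/column names are just examples):

(require '[next.jdbc :as jdbc]
         '[next.jdbc.sql :as sql])

(def ds (jdbc/get-datasource {:dbtype "mysql" :dbname "app"
                              :user "app" :password "secret"}))

(sql/insert!      ds :article {:title "Hello" :author_id 7})   ; INSERT
(sql/get-by-id    ds :article 42)                              ; SELECT by primary key
(sql/find-by-keys ds :article {:author_id 7})                  ; SELECT by column values
(sql/update!      ds :article {:title "Hello again"} {:id 42}) ; UPDATE ... WHERE
(sql/delete!      ds :article {:id 42})                        ; DELETE ... WHERE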

seancorfield07:02:33

We also have a business team that are used to SQL and often want to be able to specify rules and query fragments in SQL which then feeds into operations in our system.

seancorfield07:02:41

I find the DDD stuff interesting -- I've read a bunch of stuff about it and watched several conference talks -- but it just doesn't seem very applicable to the systems I'm currently working with.

Ferdinand Beyer07:02:52

Don’t get me wrong, I’m not trying to advocate DDD-style repositories as the one and only way. I am merely curious how others design systems.

Ferdinand Beyer07:02:37

And in particular if and how people avoid having SQL flying around all over the place, coupling everything to the database and making testing without a database impossible

seancorfield07:02:09

For a lot of our operations, there's nothing but database ops and SQL. Having a database for testing is unavoidable.

Ferdinand Beyer07:02:19

> What would the protocol for a repository for hash maps look like? As simple as

(extend-protocol ArticleRepository
  clojure.lang.IPersistentMap
  (find-by-id [m id] (get m id))
  ; ...
  )

seancorfield07:02:47

☝️ But that's not useful in terms of actual persistence.

Ferdinand Beyer07:02:08

It models the repo as a persistent collection

seancorfield07:02:44

For articles. Now imagine 200 different types of things. Do you have 200 protocols? 200 records?

Ferdinand Beyer07:02:21

I would pass repos to pure functions to manipulate maps and (add) them back to the repository. Then, when I want to commit the transaction, I can produce SQL / datomic transactions / whatnot
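
One way to read that, sketched with made-up names: the repository value records pending changes, and the application layer translates them into database operations when it commits:

(require '[next.jdbc.sql :as sql])

(defrecord PendingArticleRepo [by-id changes]
  ArticleRepository
  (add [_ article]
    (->PendingArticleRepo (assoc by-id (:id article) article)
                          (conj changes [:upsert article])))
  (remove [_ article]
    (->PendingArticleRepo (dissoc by-id (:id article))
                          (conj changes [:delete (:id article)])))
  (find-article-by-id [_ id] (get by-id id)))

(defn commit!
  "At the edge, turn accumulated changes into SQL (or Datomic tx-data, etc.)."
  [datasource {:keys [changes]}]
  (doseq [[op arg] changes]
    (case op
      :upsert (sql/insert! datasource :article arg)   ; naive: insert-only
      :delete (sql/delete! datasource :article {:id arg}))))

;; (def repo (->PendingArticleRepo {} []))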

Ferdinand Beyer07:02:44

Well I can’t answer that for your domain now, can I

seancorfield07:02:51

Again, that doesn't address "200 different types of things".

Ferdinand Beyer07:02:55

Not sure what you want me to say. You might have 200 tables, do you also have 200 aggregates? If not, how many? 50?

☝️ 1
Ferdinand Beyer07:02:27

And I’m totally happy to agree with you that this does not fit well into your domain 🤷

seancorfield07:02:52

We have a very large number of domain entities. I'm asking how XyzRepository scales when you have lots of Xyz things.

Ferdinand Beyer07:02:02

I assume when you have 200 entities you will have 200 or more functions that insert into the database, 400 functions that query stuff, etc. Could just as well have 200 protocols.

seancorfield07:02:12

Even if it's "only" 50 Xyz things, that's 50 protocols and then 50 implementations when basic CRUD can already be done with next.jdbc and five or six functions in that library.

Ferdinand Beyer07:02:43

So you have ring handlers that use next.jdbc to produce SQL?

seancorfield07:02:55

next.jdbc doesn't "produce SQL".

Ferdinand Beyer07:02:15

Oh man, it’s early here. You know what I meant 🙂

Ferdinand Beyer07:02:41

Look, I’m not looking to fight with you. Let’s stop this.

seancorfield07:02:58

No, I don't. I genuinely don't understand what you are advocating because it just doesn't make sense to me -- so I'm hoping you can explain it in the context of a large system.

Ferdinand Beyer07:02:18

I was asking if you have any kind of separation of your database code or if you expose SQL all the way, e.g. writing Ring handlers that call next.jdbc functions to query/insert using SQL

seancorfield07:02:49

Our Ring handlers do a lot of direct next.jdbc calls, yes. Well, they might call down into a "model" layer for the core work, after doing parameter validation/error handling. But a lot of what happens in our app is get one or more things from the database, compare a few things, write one or more things back to the database.

seancorfield07:02:14

If we had to write a protocol and an implementation for each of our primary entities, only to have those implementations just call back down to the five or six functions in next.jdbc.sql, that would be a colossal amount of boilerplate and a complete waste of everyone's time, as far as I can see. I'm trying to find the benefits you're suggesting in the protocol-based approach but I'm struggling so far...

Ferdinand Beyer07:02:24

> I find the DDD stuff interesting -- I’ve read a bunch of stuff about it and watched several conference talks -- but it just doesn’t seem very applicable to the systems I’m currently working with. That’s probably it. I would not dare telling you how to build your system. If the way you are doing it works for you - great. Not sure what is unclear about what I wrote. I am still saying I see value in keeping business logic separate from the database, and I would try to find a way to test my logic without one. If this does not work or doesn’t seem worth the effort — then don’t. There are always trade-offs. I would not infer from that that any kind of database abstraction was pointless or “too OOP” or “not idiomatic in Clojure”.

kraf10:02:15

> feel like I'm writing my own ORM
@U04V70XH6 I share your feelings about ORMs 100% and I know what you mean by it feeling OOP. I do not believe what @U031CHTGX1T is talking about is really comparable to ORMs though. Using an ORM leads to very database-driven designs, and the implementation detail (e.g. using a particular SQL database) more or less drives the entire design from the beginning. This is the exact opposite: not having to decide which database to even use. Another thing about ORMs is that they aim to be generic, while the suggested repositories are tailored specifically to your domain.

Of course it's very hard to relate any of this stuff to your particular domain. Without knowing it, though, I'm certain that you would not have 200 repositories and I'd even expect it to be less than 50. It would be absolute madness to have a repository per table (which ORMs do, because they are mad indeed). It's usually possible to group tables into partitions of strongly related tables, and you'd have a repository per partition.

Let's forget about the protocols, I don't think they are important to this discussion or even needed. Consider what you'd do if you decided to switch databases from some SQL one to Datomic. As a first step I would look for all database access code and put it into functions that take the connection/transaction as the first argument and collect them in some abstraction layer (this assuming that I do call the db directly from handlers in some places). When finished with this refactoring and I know that everything works, I basically just need to swap out the implementations of those functions to use Datomic now. I would not have this layer as one big file but split up into multiple namespaces roughly mirroring higher-order concepts of my domain. And those namespaces you could then call repositories. So I think what @U031CHTGX1T is talking about is in essence two things: this very layer, and orthogonal to that, DDD aggregates. Protocols in particular, which make this feel OOP, are one way to implement this but can also be skipped. Please correct me if I'm wrong.

I think the repository approach (with aggregates) discussed here is inherently inefficient in a few places. You will likely most often overfetch, and your code to apply changes to a database will most often be pretty complicated because the interface is so abstract that the implementation needs to figure out what even changed to be able to attempt a more or less optimal set of inserts and updates. Basically you want to say (save article), so pretty deep interface. Side note: I wonder if something like what the React.js reconciler is doing could help here.

I'm not trying to advocate this design, I merely find it very interesting and I wanted to make a point about the ideas here not being inherently OOP or similar to ORMs.

👍 1
🙌 1
Ferdinand Beyer10:02:23

> Protocols in particular, which make this feel OOP, are one way to implement this but can also be skipped. Please correct me if I’m wrong. You are right. What you want is probably “polymorphism a la carte”. Protocols are one way, but there are others of course. I might want to add that calling protocols “OOP”, and as a consequence not idiomatic, is kind of weird. All of Clojure is defined on top of interfaces/protocols.

Ferdinand Beyer10:02:02

> I think the repository approach (with aggregates) discussed here is inherently inefficient in a few places. It depends. When using a document DB, you will likely fetch and store whole aggregates anyway. You could optimise with lazy loading, or even go for a pull/EQL-style API.

1
Ferdinand Beyer10:02:00

> I’m not trying to advocate this design, I merely find it very interesting and I wanted to make a point about the ideas here not being inherently OOP or similar to ORMs. This is actually the same how I feel about it. I want to have this design in my toolbox, and I think it could produce some really elegant solutions when you are dealing with a complex domain. In Sean’s case, it seems that his needs are mostly CRUD with little domain logic, and this could be overkill.

Drew Verlee11:02:12

I think the conversation is challenging because it's still unclear to me what's being advocated. Maybe this can be distilled into a library with an example over two databases? The question I have to be wary of is: is this an abstraction or just indirection? The former simplifies a problem, the latter just moves it downstream. When we talk in terms of just separation, it seems to lean strongly towards the latter. Different db access patterns imply different speed and memory considerations. And those can be business concerns. Trying to abstract over different storage options is challenging to do well. What I'm not seeing here is any guidance on how to do that. For example, earlier we spoke of how we would handle runtime SQL joins and it was suggested that the dev repository would have the items pre-joined. This leads to a huge trade-off (syncing schema changes across repositories) and if you're willing to go that far, then you might as well use a document store and have your customers benefit from the speed improvements over runtime joins.

Drew Verlee11:02:38

My emotional reaction comes from having wrestled with these ideas for quite a bit on both sides, to the point where I feel obligated to defer the choice till it seems more obvious why it pays off. Having worked in code bases where that choice was dogmatically taken from me always left me feeling drained. As if I was burdened with all the time-consuming cons and reaped none of the abstract rewards.

💯 1
Ferdinand Beyer13:02:57

In the end this all boils down to trade-offs. Without abstracting the database, you are probably quicker in the beginning. You will need less code. Your system might be easier to understand (although that is subjective). You can tweak your SQL queries individually for every single use case. On the other hand you might find it hard to test without a database + setting up tables, test data, etc. It might be harder to run tests in parallel. It might be harder to refactor / synchronise schema changes with your code because it’s all over the place. It will be harder to switch database technology, wholly or partially. It might be harder to move a part of your application to a different server (e.g. microservices / share nothing). You can’t eat your cake and have it, too.

seancorfield16:02:38

FWIW, we actually have changed databases in the past. Twice. We went from MySQL to MongoDB and back to MySQL. And we did it almost entirely through a thin wrapper around CRUD operations because almost every entity in our system is pretty much a flat hash map so interacting with tables in MySQL or collections in MongoDB looks very similar. Our wrapper had a lookup hash map of which entity type lived where so we'd say (store/get-by-id db :entity id) and (store/insert! db :entity hash-map) and it would lookup :entity and route to either the JDBC routine or MongoDB routine based on that. And db was a generic Database Component that included both JDBC and MongoDB connection setups.
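
A very rough sketch of that kind of routing wrapper; the entity names and the Mongo helpers below are hypothetical, shown only to illustrate the lookup-and-route idea:

(require '[next.jdbc.sql :as sql])

(declare mongo-get-by-id mongo-insert!)   ; hypothetical MongoDB helpers, not shown

(def entity->storage
  ;; which backend each entity type lives in
  {:user    :jdbc
   :invoice :jdbc
   :event   :mongo})

(defn get-by-id [db entity id]
  (case (entity->storage entity)
    :jdbc  (sql/get-by-id (:jdbc db) entity id)
    :mongo (mongo-get-by-id (:mongo db) entity id)))

(defn insert! [db entity m]
  (case (entity->storage entity)
    :jdbc  (sql/insert! (:jdbc db) entity m)
    :mongo (mongo-insert! (:mongo db) entity m)))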

apbleonard20:03:33

I must say I personally don't think it's that controversial an idea to abstract away database access. If your business thinks in terms of a large SQL database schema - so be it - but that needn't skew another's gut feel that it's often a good approach. Again - need to bear in mind the scale/complexity/longevity anticipated. There's a big difference between choices for a startup vs a lumbering government giant (like I work with!). CQRS can allow one to have a fairly simple (immutable) eventing repo API hidden behind a defprotocol or (my favourite) a defmulti fairly easily. Then the complex querying side can be tied closely to a particular read model implementation somewhere else, where the events are streamed into a graph database or a data lake Azure thing if really needed. Of course there are tradeoffs with that. But yes, I would encourage abstractions where it makes sense, because for systems with any longevity Y-often-are-GNI, and it can help to call out how closely you are getting tied to a particular data layer implementation if that interface starts growing too large.

didibus06:03:52

The discussions on DDD I think mix up ways to implement the ideas in OO versus what the ideas of DDD are. First and foremost, DDD is about modeling your domain. That starts with understanding what your data is, and how you want to operate over that data and around that data (side effects). Then it says to speak in terms close to the domain: call fields and composites of fields the same things the people whose domain you are creating software for call them. That's kind of the basics of DDD. Then design your app around that domain model; the model, aka the data and how it is structured, is the hardest thing to change in an application, because everything else will be implemented in terms of how it assumes the data is modeled and structured. So spend more time getting that right. I think that's good advice. Data is hard to change. And I feel Clojure and data-driven and data-oriented principles actually fit well with DDD personally.

When it comes to Clojure, what I would say is that abstracting the DB or not doesn't matter. What matters for a DDD approach is that you model your application data using Clojure's tools, so you'll probably use maps and vectors and sets to model it. Then decide how exactly your domain is best modeled using those data structures that Clojure offers. Finally you'll just store and retrieve this data in those forms. So your map might get stored to a relational table; that doesn't mean you want it as a vector of vectors inside your app. That means you'll have a translation layer somewhere between the data structures your app uses, and the way that it's persisted to a database. That's where Repository comes into play in DDD.

Where Repository gets mixed up sometimes is with the OO problem of object-relational mapping. The idea that you have an ArticleRepository, that's very OO. All you need for your repository is really just that mapping layer. You don't have to abstract the database, the repository could be plain SQL; what you want to abstract is your data structures. Like I said, your app might want a map where your DB is a relational table.

didibus06:03:44

Another cool thing about DDD that people sometimes mix up, I feel, is the concept of entities, value objects and aggregates. And that plays with Repository and Finder. An entity is data which has an identity and whose equality is based on that. Two users are the same because they are the same userID, but the data about that userID, such as the profile name, could change over time. A value object is data which is what it is based on its value, and has value equality. A good example is "5$". Five US dollars is always five US dollars, it doesn't change; if you have 6 US dollars it's not that the dollar went from 5 to 6, it's you who swapped out a 5 for a 6. And an aggregate is the set of data for which invariants apply at write time. If you have a car it probably shouldn't be driving = true and doors = open at the same time; maybe that's a corrupted state for a car. So you'd want to set doors = closed and driving = true atomically together so that you can't be in a state with one but not the other. Repository is meant to store and retrieve entities by ID, which means it always returns one and only one. Finder, meanwhile, is meant to search for entities by any possible search criteria, and return a list of them. In Clojure what I like to do is model all this with Spec. Your entities and value objects can be Clojure maps, or vectors or sets. And you have a Spec for each one as well as for aggregate roots. When you want to store something to the DB, you'd validate the Spec for what you're storing, which validates your invariants, and then if that passes you translate it to your persistence model and make the write. When you read, you can use a whole other data structure. The point is that when you write, you might not want to allow someone to just do a DB update of the driving field from false to true. You might say that driving and doors always must be updated together as an aggregate, so you can validate that you never have doors = open and driving = true.
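
A small sketch of validating such an aggregate invariant with spec, using the car example (all names are illustrative):

(require '[clojure.spec.alpha :as s])

(s/def :car/driving boolean?)
(s/def :car/doors #{:open :closed})

(s/def :car/car
  (s/and (s/keys :req [:car/driving :car/doors])
         ;; aggregate invariant: never persist a car driving with open doors
         #(not (and (:car/driving %) (= :open (:car/doors %))))))

(s/valid? :car/car {:car/driving true :car/doors :closed}) ;=> true
(s/valid? :car/car {:car/driving true :car/doors :open})   ;=> false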

didibus06:03:03

And basically, I think unlike in OO, there's really no barrier to this in Clojure. You can easily start with maps, eventually grow a spec, then use whatever you want to store and retrieve the maps, etc. I don't think you need protocols or anything like that either. All you need to do DDD in Clojure is to do it, which just means to think about how you want your domain modeled, think about the data first, and how to structure it first. Think about the various contexts this data can exist in, think about how the same name in different contexts might mean slightly different things, etc.

Drew Verlee08:03:11

Interesting thoughts didibus as always, I feel like some concrete examples and a clear motivation would go a long way to helping me understand the finer points which you're getting at. My impression is that trying to make Clojure's data structures function like a relational model is a questionable use of time because the translation layer between them can't support meaningful change. E.g. you can't swap your relational database for a graph one. And because it's unclear why changing databases itself is something we should spend time on vs other objectives.

didibus17:03:10

I was trying to say that the point isn't to change database, but to have the data in a structure most conducive to your application code and your domain, and not forced into a structure best for your database.

didibus19:03:22

A simple example, say you have user accounts. If you store that in MySQL, you'll have basically a set of vectors:

#{[1234 "username" "email" "hashedPassword" "us-en"]}

And even the types: MySQL won't support keywords, for example, so the language is a string inside it, "us-en". Do you really want to get this data structure back as-is? And have your application make changes to it in that form? The answer can be yes here. For simple apps, or things where your application adds little business logic, maybe it's fine. But often you'd rather have the application represent users as a map of maps:

{1234 {:id 1234 :username "username" :email "email" :hashed-password "pass" :language :us-en}}

This is the same data, but structured differently. If you want your application to have it in that form, you need a layer of translation now. I feel it's almost common sense; the difference is that DDD tries to tell you to think about this part more thoroughly. And not just the structure, but the data as well: are you sure you want IDs? Should they be globally unique? Is a User always a User? Does it always have a password? Can it exist in other contexts without a password? Are the email and language a part of the User, or should there be a Profile entity that gets attached to a User? Finally, it gives guidance for thinking about "units" of data. For example, why should email not be on User? Why extract it out into a profile? One way to help reason about it is to think about which pieces of data are needed together to validate invariants. Basically, what is safe to update or write independently and in parallel. So that's one way to reason about an "aggregate" of data: this info has to be taken together to update or write.

👍 1
Drew Verlee19:03:02

Thanks a lot for the example and further explanation, it helps a lot. Translating a table to a hashmap means bringing the columns into each item as keys or having callers know the column order. Having the ID as the key to the item is situationally useful, but often the right choice. The rest of the questions are situational too, but in general the most developer-ergonomic option is to have callers know the schema and the narrative behind the story. E.g. there is no need to request the user password if I need to display just their name; I as the developer know this because I understand a user name isn't their password. This example is contrived of course, but in more complex situations it's very easy for the story to get lost as it travels through the system. As I believe Rich Hickey has spoken on before, tables create slots, and by and large what we seem to be discussing is how those slots make it easy to over-bundle information (a user email appearing where it's not needed, or vice versa). A problem that Datomic was designed specifically to address (at the cost of other things). Similarly, in my mind, the original problem seemed to be conflated with the limitations of SQL APIs which take strings as input and so are hard to compose using features of the programming language. It's possible I just haven't read some of the same literature on DDD that's being presented and so I'm interpreting the information differently. Most of what you say above resonates with me as being for the most part fine. Though I would prefer that the default be a solution where the query and transaction language spanned the full stack like hyperfiddle or https://github.com/oakes/odoyle-rum-todo.

didibus21:03:54

I find Spec pretty good for all that. You can literally model contexts explicitly, say:

{:user/context :signing
 :user/username "foobar"
 :user/language :us-en}

{:user/context :signedup
 :user/username "foobar"
 :user/language :us-en
 :user/email ""}

And validate semantics and existence for the various contexts using a multi-spec; you see the user's journey through various contexts over time.
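
A sketch of how a multi-spec can express those contexts (keywords follow the example above; the spec names are made up):

(require '[clojure.spec.alpha :as s])

(s/def :user/context keyword?)
(s/def :user/username string?)
(s/def :user/language keyword?)
(s/def :user/email string?)

(defmulti user-context :user/context)
(defmethod user-context :signing [_]
  (s/keys :req [:user/context :user/username :user/language]))
(defmethod user-context :signedup [_]
  (s/keys :req [:user/context :user/username :user/language :user/email]))

(s/def :user/user (s/multi-spec user-context :user/context))

(s/valid? :user/user {:user/context :signing
                      :user/username "foobar"
                      :user/language :us-en}) ;=> true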

Ferdinand Beyer06:03:44

@U0K064KQV — thanks for sharing your thoughts. I feel a lot of this is what I tried to express as well, but I probably focused too much on the “repository” bit. As you said: When the structure of your data does not fit your DB naturally, you will need some kind of mapping. The repository pattern (from tactical DDD) is one option to achieve that.

Drew Verlee12:03:06

Can you explain what you mean by "when the structure of your data doesn't fit your db naturally"? I feel like what I'm reading are problems that occur with tables but wouldn't happen using a graph db like Datomic. Other issues like keyword support aren't a matter of shape but encoding. We suffer in this field from a lack of a shared set of concepts and notation by which to discuss issues; it leads to a lot of stress and miscommunication. The only material that comes close I can think of is Martin Kleppmann's. I'll read the link on DDD, ty for sharing.

Ferdinand Beyer13:03:17

I mean it in a very general sense: Table normalisation, complex / nested fields, types that are not supported in the DB, … Ideally, I want to express my domain in Clojure data without having to compromise to fit the DB technology

Drew Verlee14:03:49

Aren't those limitations specific to certain databases?

didibus04:04:52

I thought I'd write an example of Domain Driven Design using Clojure, so I did here: https://github.com/didibus/clj-ddd-example I also basically explained most of DDD and how I'm implementing it in Clojure as part of the doc-strings throughout the code base, so it's a bit of a literate programming guide to learning about DDD and doing DDD in Clojure.

👀 2
🙏 3
👍 2
💯 2
❤️ 2
Drew Verlee05:04:30

This is great, I'll give it a read tomorrow.

msolli13:04:38

That is a great resource, @U0K064KQV. The explanations of DDD concepts together with the code examples are spot on. Thanks!

woohoo 1
phronmophobic17:04:25

Is this code thread-safe? It seems like the reads and writes all happen independently which could lead to double spending.

didibus17:04:06

No, I was thinking I should probably show that as well. I started mostly caring about showing the structure of the code base, but I do feel a bit uneasy about showing an example with that flaw. I'm thinking how I want to demonstrate it, but it depends a bit on the details of the storage. I might pretend like you can acquire a transaction from the repo, but just make it a lock underneath to my mocked datastore.

didibus17:04:16

That said... It might be a good opportunity to talk about how the repository isn't really meant to abstract your database, because the concerns of consistency need to consider the datastore's characteristics, and how you design your domain events, your application service and your repository together matters. The purpose of the repository is simply to structure the code base neatly so you can reuse the same state management functions from multiple places. And that's often a misunderstanding of DDD, where people think the Repository is meant to let you swap storage at will. You could to some extent also try to do that with it if it mattered to your use case, but it's non-trivial and no abstraction can fit all datastores. It's easier if you only want to swap one SQL engine for another, say support for MySQL and PostgreSQL. But if you want to swap between an atom, MySQL, Cassandra, Kafka, Mongo, etc., you'll most likely not find an interface that you can correctly implement for all.

didibus17:04:35

@U7RJTCH6J If you look at the blog links that I'm using as my example, their requirements allowed double spending. Their argument is that the bank business can actually handle that edge case without a problem, so eventual consistency is all you need. What would happen is your balance would go negative and you'd owe money to the bank. I think that's an interesting argument in this case as well. And that's what I implemented here. The transfer is eventually consistent, in that it's not possible to send someone money and still have that money in your account after, or for the money to be removed from your account but not added to the other account. That's because the transfer-money service returns all three changes together to be eventually committed to the datastore. What can happen though is you can transfer money you don't have during a short time window. The linked blog says that's fine, you'd now just owe the bank money. But there's still an issue in my case, which is that my account balances can override each other, and won't reconcile with the transfer ledger. I'm not sure if I want to go eventually consistent and show how your event would change to be a diff instead of the new account, or if I want to show a fake two-phase commit using a transaction.

phronmophobic18:04:07

I think implementing transfers without consistency is maybe not such a great example. Consistency is the hard part of the problem and a strict requirement for many real-world problems. One of the arguments against this sort of design is that it adds indirection without much benefit.

1
👍 1
didibus07:04:38

Ya, that's fair. I'll update the example. I'm just worried if I solve it with an atom as the datastore, it might confuse people, since in a real-world app you'd use either MySQL, Mongo or some event sourcing, etc. And the solution for those will depend on the datastore's mechanisms for it. So what I might do with an atom wouldn't translate to what you'd do with MySQL... But I'll probably just try and make it look like I was using MySQL and explain some of it in comments. Though I also like the eventually consistent solution... Hum, maybe I need two branches haha

gklijs19:04:56

It's interesting. I've done similar things with Clojure + Kafka, and Java + Axon Framework. It's on my list to try Clojure with Axon Framework still.

didibus23:04:28

With an Event Store type of thing, you would just have it that when you modify the Account entity, instead of returning the new modified Account, you'd return a domain event that describes the change. And eventually the event will get processed and will apply the change to the state. So you get eventual consistency. The downside is you don't have read-after-write guarantees. So it means that eventually the balance of each account will always be showing the correct amount as based on the ledger of transfers, but it is possible to make a change that is "illegal", such as allowing a transfer even when there's technically not enough money in the account, resulting in a negative balance. That's because, in the time it takes to reflect the state change, if another transfer is initiated, it won't see that the balance has already been deducted and will generate another domain event to change the state further.
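
For example, such a domain event might look like a description of the change rather than the new state (field names are illustrative):

{:event/type        :money-debited
 :event/account-id  "account-1"
 :event/amount      100.00M
 :event/occurred-at #inst "2022-02-21T10:00:00.000-00:00"}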

didibus23:04:32

What the blog I was reproducing the example from claims is that we engineers often fear that kind of behavior. But they were arguing that we should embrace it, because these kinds of problems happen to businesses and people in real life all the time, and they have mechanisms to handle them. Such as in the example here, they said most banks have the concept of overdraft, or force a fee if someone transfers money they don't have, allowing the bank to cover the transfer.

gklijs05:04:49

That should not happen, as the aggregate would indicate there is not enough on the balance, so the money won't be transferred.

didibus06:04:04

Updated the example with event sourcing and an eventually consistent approach. It's common to pair DDD with that, so I showed that first. I'm thinking I'll make two other branches, one that shows strongly consistent pessimistic locking, most common with, say, a SQL database and transactions. And one that shows a strongly consistent optimistic locking approach, more common with NoSQL databases.

gklijs06:04:13

If another command changed the aggregate in the meantime, sending the event fails, and can be tried again with the updated aggregate. You do need to prevent concurrent writes to the same aggregate for this.

didibus13:04:13

@U26FJ5FDM In your example, say that two handle-debit commands for the same account are sent at the same time, and they both call get-account at the same time. They'd both see that there's still money in the balance, and will both allow the debit, and both produce an event to allow it, and eventually both events will update the DB, resulting in a negative balance, no?

didibus13:04:43

@U26FJ5FDM Similarly, say there's already an event to debit in the Kafka stream, but it hasn't reflected the change to the DB yet, it's queued up. And another debit command happens, then your code would see a stale account state from the DB no? And it would possibly allow another debit event to publish to Kafka again no? Or do you have something else going on to prevent these?

gklijs13:04:25

No, the second update will fail in this case, that's because of bkes. There are better ways to prevent concurrent updates. Since it's based on an aggregate state you can see beyond it. Which is why a saga is used: https://github.com/gklijs/bank-axon-graphql/blob/master/command-handler/src/main/kotlin/tech/gklijs/commandhandler/BankTransferManagementSaga.kt

didibus13:04:34

Sorry, that doesn't really tell me what the mechanism is. Does your Saga take a read/write lock of some sort on your DB?

gklijs13:04:39

Yes, but that's using Axon Framework. In the other case it's bkes, which is an event store on top of Kafka, which needs a key + serial number to write.

didibus13:04:30

I see, so with Axon, it's using a distributed lock over the saga, thus pessimistic locking, and in your other case it is using an optimistic lock checking versions at write or something like that using bkes?

didibus14:04:38

In the bkes scenario, how do you revert the change? You just drop the message? How would you, say, alert the user?

didibus14:04:58

Does bkes take a read lock as well?

gklijs15:04:46

So with bkes, it does two 'writes' via gRPC that have something like key 'foo' and order '5'. Only the first one will be turned into a message, the second will fail. I don't think I implemented an auto retry on the Clojure side, but that should work. In that case the aggregate is updated by the message, and the retry will use order '6'. Note that bkes in its current form is far from production ready. You need to make sure only one instance is running for each Kafka topic. But it was nice to get a better idea about the complexities involved with event sourcing.

didibus16:04:33

So you mean that when you publish to Kafka, it actually publishes to bkes synchronously and that would throw an error if it sees your message is for a stale "order" version? And the command handler would get that error synchronously and could therefore retry the whole handling from the read again?

gklijs17:04:23

Almost, bkes is between Kafka and the application. Exactly because with Kafka alone you can't prevent concurrent writes, or query by key.

didibus22:04:41

Is it asynchronous though? I'm picturing now something like:
1. Get the account
2. Check if it has enough balance
3. Publish a message to bkes about the change, but if bkes sees that a change for that same key at the same version already exists, then the publishing to bkes throws an exception
4. Catch the exception thrown by bkes and go back to step 1
Where bkes publishing is synchronous, at least for its verification of whether a possibly conflicting change exists. If it's async for that, I'm confused what would happen; how would you retry the operation?

gklijs06:04:28

No, bkes is not async, retry would need to be initiated from Clojure, after async reading the new event.

didibus19:04:20

I see, so the publishing to bkes would fail, and that gives a chance to retry the operation. I guess now it's all a matter of performance for bkes. If it can be faster than just using the DB directly, it still makes sense, but if it isn't, it would seem to defeat the purpose of going eventually consistent and using Kafka in the first place no?

gklijs19:04:54

It should be fast, but the biggest problem is there should be just one instance. So you would need some Kubernetes setup, where you know where all the leader partitions are. Also, as the data grows it will take more time for each instance to read all existing data. So you want some sort of snapshotting. All very well possible, but then, given my current position, I'd rather advise to use Axon Server 😉.

gklijs19:04:04

Eventually consistent is fine, but you want to have something in place to prevent 'invalid' events, since that's your source of truth.

didibus20:04:42

Till now, I still don't know of any way to do that without just making things strongly consistent again. Both Axon and bkes seem to just add locking (pessimistic or optimistic) back on top. It can be an okay strategy if you only have a few entities you need to protect: they can be strongly consistent, while the rest is eventually consistent. But if you find yourself having to lock around everything all the time, the whole point of going with async writes and Kafka seems lost to me. You'd be better off just writing straight to your DB at that point; the whole architecture would be simpler, with easier write backoff handling.

didibus20:04:45

Though the bkes idea is interesting. Potentially you could just track which entity has a pending change, and block on that. This tracking could be all in-memory, so arguably I think it would have the potential to be faster than just using the DB directly. And if your writes are mostly independent, it would never contend either. Also, you can use optimistic locking, which tends to also be faster unless you have really contentious entities. Did I get it right that this is the idea behind bkes?

gklijs22:04:09

Yes, that's about right. It should be faster than a database, and scale better, given you solve a few additional problems. Plus you can leverage Kafka tiered storage when available, like on RedPanda or Confluent Cloud, to have all the history.

apbleonard16:04:49

@U0K064KQV Thanks so much for your example. I would love to see more "reference implementations" of the same problem in different styles (and languages!) 🙂 I wondered if your code would "treat the database as an implementation detail" even more if the "repository" was made to be pluggable. You could do this e.g. with multimethods or protocols exposing the "get-account" and "commit-transfered-money-event" and providing an "atom" and (perhaps later) "mysql" implementations for each. Saying that is easy; the knock-on effect of trying either is surprisingly hard and can feel quite non-idiomatic for Clojure, but it fulfils this need better than simply calling repository functions directly with no indirection, I think?

gklijs19:04:26

@U3TSNPRT9 it's interesting you mention this. Axon Framework is pluggable to different kind of databases. Some of the parts can also be implemented for Kafka. The kinds of abstractions involved could be interesting.

didibus22:04:23

@U3TSNPRT9 I think you could, to some extent, but the issue is that the way you model the events will probably depend on how your DB handles atomicity and coordinated changes. For example, if you use MySQL, you don't really need to model changes as a diff. You can just return the new entities with the change applied already, and do a transaction with a read lock in MySQL where you simply update each entity one after the other in one transaction. And the second issue isn't the model, but the application service itself. For example, with MySQL, if you wanted to use a transaction for update with a read lock, when you call get-account, you need to do so within the context of an existing transaction on the DB, and you need to execute the SQL to take a FOR UPDATE lock. So now your protocol for your repository needs to support a way to pass that MySQL transaction context and to specify that on this get-account call you need a "for update" read lock to be taken as well. Finding the right protocol abstraction here so that it can then be implemented for MySQL, Kafka + Mongo, DynamoDB, etc. can be tricky, and it might just end up as a kind of union of all of the ones you support.
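
To illustrate that second issue, a sketch of threading a JDBC transaction into the repository so the read can take a FOR UPDATE lock (protocol, table, and column names are made up; this is not from the linked example repo):

(require '[next.jdbc :as jdbc])

(defprotocol AccountRepository
  (get-account [repo tx id opts])
  (put-account! [repo tx account]))

(defrecord MySqlAccountRepository []
  AccountRepository
  (get-account [_ tx id opts]
    (first (jdbc/execute! tx [(cond-> "SELECT * FROM account WHERE id = ?"
                                (:for-update? opts) (str " FOR UPDATE"))
                              id])))
  (put-account! [_ tx account]
    (jdbc/execute! tx ["UPDATE account SET balance = ? WHERE id = ?"
                       (:account/balance account) (:account/id account)])))

;; The application service owns the transaction boundary:
;; (jdbc/with-transaction [tx datasource]
;;   (let [repo (->MySqlAccountRepository)
;;         from (get-account repo tx from-id {:for-update? true})]
;;     (put-account! repo tx (update from :account/balance - amount))))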

didibus22:04:15

If you only wanted to mock your real DB, I think you can do it, model the protocol for the repository in the way your DB best works with it, have your application service use it that way. And then I'm sure you can find a way with atoms or other things to mock that specifically. But if you wanted to easily swap between very different DBs, I think that is harder, because of the challenges I described

gklijs03:04:51

Swapping is not important, although if you have an abstraction to get all data as events, which you can read from the beginning, it should not be that hard either. It's more important that people can use it with a db they already have or know how to manage.

didibus04:04:52
replied to a thread:I am a believer in treating the database as an implementation detail and not tying my logic to database details. The idea of the “repository” pattern in DDD is not to build a “pass-through” interface to SQL, but to provide a collection-like interface over your data. I think this fits nicely into Clojure’s mindset, when done right. In Clojure, collections are immutable. So how about you design a repository protocol to resemble an immutable collection? (defprotocol ArticleRepository (add [repo article]) (remove [repo article]) (find-article-by-id [repo article]) ...) In the beginning, you don’t even need a database. Just implement this with a backing hashmap. `add` will `assoc` and `remove` will `dissoc`. Write your logic with pure functions operating on this “pure” repository and returning an updated version. This also fits nicely with “database as a value” implementations such as Datomic or XTDB. You can accumulate changes in `add` /`remove` and transact them at the end of a “command processing” function. In DDD, your domain model will not do that. Your application layer will create repositories, pass them to your domain code, and transact the result. This approach gives you a lot of flexibility. You can write blazingly fast tests. You can split your application into microservices and have a repo implementation that talks to other services. You can switch your database. All of that without changing the domain/business logic code. Of course there is no free lunch. This might be overkill for really small apps. Architecture is all about trade-offs.

I thought I'd write an example of Domain Driven Design using Clojure, so I did here: https://github.com/didibus/clj-ddd-example I also basically explained most of DDD and how I'm implementing it in Clojure as part of the doc-strings throughout the code base, so it's a bit of a literate programming guide to learning about DDD and doing DDD in Clojure.

👀 2
🙏 3
👍 2
💯 2
❤️ 2