#datomic
2020-04-25
marciol16:04:59

Hi all. I'm thinking about increasing our usage of Datomic, but I have some doubts about patterns of usage in a distributed microservices setting. It's common in the wild to see Datomic as the source of truth and the final place where all our data should live. There is a set of good practices related to the persistence layer in the microservices approach, and one of them is to have a database per bounded context to avoid coupling, but that doesn't seem to apply when using Datomic, given that Datomic allows distributed peers. Can anyone shed more light on this subject? Blog posts and articles are very welcome.

bhurlow20:04:08

FWIW I know that nubank deploys a datomic instance per microservice

eraad00:04:10

I believe Datomic Cloud is optimized to work with one database. There is no need to shard or divide your application into multiple databases. Per my understanding, microservices architectures with physically separated databases are needed because of technological constraints related to scalability. With Datomic, you should not worry about that, because it is already optimized for all kinds of data access patterns. Check this for further technical recommendations about those patterns: https://docs.datomic.com/cloud/best.html

Regarding domain bounded contexts, I believe these should be enforced at the code level. If you have different traffic patterns for your applications, you can use query groups, for example. This style of architecture is a bit different from the “common knowledge” out there that couples the domain modeling of bounded contexts with the technology/scalability constraints of specific database technologies. Anyway, I recommend you stick to one database and enforce your bounded contexts at the code level. If you need more, check out these planning strategies: https://docs.datomic.com/cloud/operation/planning.html
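As a hypothetical sketch of what “bounded contexts at the code level” could look like in a single Datomic database: attribute namespaces can play the role that separate databases play elsewhere. All idents below are invented for illustration.

```clojure
;; One Datomic database, bounded contexts kept separate by attribute
;; namespaces rather than by physical databases.
;; All idents here are invented for illustration.
(def billing-context
  [{:db/ident       :billing.invoice/number
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   {:db/ident       :billing.invoice/amount
    :db/valueType   :db.type/bigdec
    :db/cardinality :db.cardinality/one}])

(def accounts-context
  [{:db/ident       :accounts.user/email
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}])

;; Each service only transacts and queries attributes from its own
;; namespace; the boundary is a code convention, not a separate db.
```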

marciol14:04:00

@U0FHWANJK I have talked to a person who works there and he said that they deploy one Datomic instance per bounded context

👍 8
bhurlow14:04:28

@U28A9C90Q yea I recall the same. I believe they have a "template" for starting a microservice which installs a Datomic on-prem instance and an S3 bucket per service

marciol14:04:08

Here at PayGo, a company whose main responsibility is handling payments in the ecosystem of C6 Bank (somewhat of a Nubank competitor), we are using Clojure and Datomic in some services, and we are trying to build something like a template as well; it is on my list right now.

🙏 4
marciol14:04:59

Thank you @eraad. It is increasingly apparent that with Datomic other kinds of patterns are needed. In order not to lose the advantage of having the database in our application process, as we have today using Datomic on-prem where each application is a peer, we need to plan how to make the application run using the Datomic Ions strategy.

marciol14:04:23

We are still thinking about the pros and cons of this approach.

eraad14:04:24

Nice, good way of thinking about it.

marciol14:04:50

Regarding the usage patterns, I wonder if it is possible for an application that depends on multiple databases to run a query joining across them, as described by @dustingetz in this article: http://www.dustingetz.com/:datomic-myth-of-slow-writes
> The problem with place-oriented stores is that sharding writes forces you to shard your reads to those writes.

marciol14:04:54

Nubank uses Pathom, so they do almost the same, but relying on each service to fetch its specific part of the data from its own database, then aggregating all of it afterwards.

dustingetz14:04:33

Does Ions have multi-db queries? I thought Cognitect quietly turned that off shortly after Datomic Cloud release, not sure if they ever turned it back on with Ions

marciol14:04:26

Yes, but with Datomic on-prem we can use multi-db queries. I’m just wondering how the application will behave regarding memory usage, latency, etc.
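For reference, a minimal sketch of an on-prem multi-db query with the peer API. The database URIs, attributes, and join key are all invented for illustration; the real point is that the query names multiple data sources in `:in` and prefixes each clause with the source it reads from.

```clojure
(require '[datomic.api :as d])

;; Two peer connections, one per database (URIs invented for illustration)
(def users-db  (d/db (d/connect "datomic:dev://localhost:4334/users")))
(def orders-db (d/db (d/connect "datomic:dev://localhost:4334/orders")))

;; A single Datalog query joining across both databases: data source
;; names (starting with $) are bound in :in and prefix each :where clause.
(d/q '[:find ?email ?total
       :in $users $orders
       :where
       [$users  ?u :user/id      ?uid]
       [$users  ?u :user/email   ?email]
       [$orders ?o :order/user-id ?uid]
       [$orders ?o :order/total   ?total]]
     users-db
     orders-db)
```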

marciol14:04:09

Or just use Pathom to obtain the same result

dustingetz14:04:28

For on-prem, the databases will compete for object cache in the peer

marciol14:04:52

Yes, that is what I thought. This can happen even with one database, depending on the amount of data and the usage patterns, as we can read in this awesome post from Nubank: https://medium.com/building-nubank/the-evergreen-cache-d0e6d9df2e4b

🙏 4
marciol15:04:22

So I’ll avoid future problems by giving up what would be a fantastic feature 😅

marciol15:04:41

unless someone change my mind 😄

marciol15:04:24

Based on what @eraad said, I think the smart move is to avoid multiple databases, and only split if: 1. you hit the write throughput limit of one transactor, or 2. the amount of data is so huge that you start to experience issues related to object cache space. Can you confirm this usage pattern? cc: @eraad @dustingetz @stuarthalloway @marshall @val_waeselynck

marshall15:04:42

on-prem or cloud?

marciol15:04:19

on-prem at first @marshall but we are evaluating cloud as well

marciol15:04:23

But one additional question, @marshall: is it possible to avoid sharding in Datomic Cloud? What is the strategy when data grows really big?

marshall15:04:50

In on-prem you should run a single primary logical DB per transactor. However, in Cloud multiple DBs per system is fine.

marshall15:04:18

can you define “really big”?

marciol16:04:16

Thinking about the practical limit of one Datomic database instance being around 11 billion datoms, which corresponds to roughly 353 datoms per second sustained for a year (353 × 86,400 × 365 ≈ 11.1 billion), we are planning for one of our transaction systems to reach approximately 10% of that number.

marciol16:04:49

So what I consider “really big” is not that big by Datomic standards.

marshall16:04:44

There is no hard limit on db size. The 10B number is a guideline around when you need to consider options for handling volume, shards, etc. If you’re unlikely to hit 10B datoms in 3 to 5 years, then I wouldn’t worry about it.

marciol16:04:23

Seems to be the case, @marshall, but I have another question regarding the architectural aspect: is it possible to use a single primary logical DB to handle all my data in a unified way, even within a distributed services setting? According to the “common knowledge” pointed out by @eraad, multiple databases are sometimes the way to go, but it seems to me this can be different when using Datomic. It would be really awesome to concentrate all your data in one place.

marshall16:04:10

it depends a lot on your particular system needs, architecture, etc

marshall16:04:17

there is no right or wrong answer

marshall16:04:50

there are definitely advantages to a central single db

marshall16:04:04

likewise, there are lifecycle advantages to individual services having their own dbs

marshall16:04:52

i would assess the tradeoffs to the different options and determine which fits your particular system needs best

marciol16:04:08

I need to set aside my individual bias towards monolithic applications, or “modular monoliths” as some call them, in order to make the best assessment.

marciol16:04:40

But it is really fantastic that Datomic offers such a large range of options.

marciol18:04:50

Btw, very good article @U0C4ECS1K

Aleed18:04:11

Hey y’all, I’m working on two libs as I build my Datomic API. One manages AWS infrastructure: https://github.com/rejure/infra.aws. The other manages schema accretions: https://github.com/rejure/dation. Both are intended for Datomic Cloud, try to make it easier to create configurations using EDN, and over time will (hopefully) provide more utilities for managing AWS infrastructure and database attributes/migrations, respectively. Feedback is welcome 🙂 feel free to open an issue or discuss in the #rejure channel I just created.

Aleed18:04:01

On a related note, I have a question about Datomic accretions; I'm unsure which approach to take in the above lib. From my understanding, Datomic transactions are idempotent, so you could reinstall attributes every time on startup, but sometimes you also need to migrate data, so it helps to have some control over the process. Currently I’m keeping a version number for the schema that can be manually changed whenever a schema/migration change is desired. Another approach, which I just read in one of @val_waeselynck’s posts about Datomic, is to reinstall the schema on startup but track the migrations that have been run (so that, unlike the schema, they are not rerun). I prefer this latter approach over version numbers, but I’m curious: the `ensure-schemas` example in the day-of-datomic-cloud repo checks if a given schema attribute exists before reinstalling. Is there a reason that approach was taken instead? Are there considerations I’m not taking into account?
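For context, a rough sketch of the “check before install” style (this is not the actual day-of-datomic-cloud code; the helper names and overall shape are invented, assuming the Cloud client API):

```clojure
;; Rough sketch of "check attribute existence before installing",
;; assuming the Datomic Cloud client API. Helper names are invented.
(require '[datomic.client.api :as d])

(defn has-attribute?
  "True if ident is already installed as an attribute in db."
  [db ident]
  (boolean
   (seq (d/q '[:find ?e
               :in $ ?ident
               :where
               [?e :db/ident ?ident]
               [?e :db/valueType]]
             db ident))))

(defn ensure-attributes!
  "Transacts only the attributes whose idents are not yet installed."
  [conn attrs]
  (let [db      (d/db conn)
        missing (remove #(has-attribute? db (:db/ident %)) attrs)]
    (when (seq missing)
      (d/transact conn {:tx-data (vec missing)}))))
```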

val_waeselynck21:04:09

Note that Datomic transactions are not idempotent in general (e.g [[:db/add "my-tempid" :person/age 42]] will always create a new entity, for lack of an identity attribute to induce upsert behaviour).

val_waeselynck21:04:20

I only meant that schema installation transactions tend to be idempotent (e.g. creating a new attribute). So if you're a bit careful, you can usually just re-run your schema installation transaction, but it does require vigilance.
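To illustrate the distinction with the peer API (attribute names invented; this assumes `conn` is an existing connection and `:person/age` is already installed for the second transaction):

```clojure
(require '[datomic.api :as d])

;; Effectively idempotent: :db/ident is an identity attribute, so it
;; upserts. Transacting this map twice installs the attribute once;
;; the second run asserts nothing new.
@(d/transact conn
   [{:db/ident       :person/name
     :db/valueType   :db.type/string
     :db/cardinality :db.cardinality/one}])

;; NOT idempotent: "age-tempid" has no identity attribute to upsert on,
;; so every run of this transaction creates a brand-new entity.
@(d/transact conn
   [[:db/add "age-tempid" :person/age 42]])
```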

val_waeselynck21:04:17

I don't know if that's what you read, but you might take inspiration from this: https://github.com/vvvvalvalval/datofu#managing-data-schema-evolutions (won't work for Datomic Cloud, but shouldn't be too hard to port)

Aleed22:04:38

Thanks for clarifying that. I was reading the “Using Datomic in your App” article; the implementation in the linked repo seems to be similar, will take a look. As is, datofu only works with on-prem, right?