This page is not created by, affiliated with, or supported by Slack Technologies, Inc.
2020-04-25
Hi all. I'm thinking of increasing our usage of Datomic, but I have some doubts about usage patterns in a distributed microservices setting. It's common to see Datomic in the wild as the source of truth and the final place where all our data should live. There is a set of good practices for the persistence layer in the microservices approach, one of which is a database per bounded context to avoid coupling, but that doesn't seem to apply when using Datomic, given that Datomic allows distributed peers. Can anyone shed more light on this subject? Blog posts and articles are very welcome.
I found this great article from @U7JK67T3N https://theconsultingcto.com/posts/datomic-with-terraform/
I believe Datomic Cloud is optimized to work with one database. There is no need to shard or divide your application into multiple databases. Per my understanding, microservices architectures with physically separated databases are needed because of technological constraints related to scalability. With Datomic, you should not worry about that because it is already optimized for all kinds of data access patterns. Check these for further technical recommendations about those patterns: https://docs.datomic.com/cloud/best.html Regarding domain bounded contexts, I believe these should be enforced at the code level. If you have different traffic patterns for your applications, you can use query groups, for example. This style of architecture is a bit different from the “common knowledge” out there that couples domain modeling of bounded contexts with the technology/scalability constraints of specific database technologies. Anyway, I recommend you stick to one database and enforce your bounded contexts at code level. If you need more, check out these planning strategies: https://docs.datomic.com/cloud/operation/planning.html
@U0FHWANJK I have talked to a person that works there and he said that they deploy one datomic per bounded context
@U28A9C90Q yea I recall the same. I believe they have a "template" for starting a microservice which installs a Datomic on-prem instance and an S3 bucket per service
Here at PayGo, a company whose main responsibility is handling payments in the ecosystem of C6 Bank (a Nubank competitor of sorts), we are using Clojure and Datomic in some services, and we are trying to build something like a template as well; it is on my list right now.
Thank you @eraad, it is increasingly apparent that other kinds of patterns are needed with Datomic. In order not to lose the advantage of having the database in our application process, as we have today using Datomic on-prem where each application is a peer, we need to plan how to make the application run using the Datomic Ions strategy.
Regarding the usage patterns, I wonder if it is possible for an application that depends on multiple databases to make a query joining those databases, as described by @dustingetz in this article: http://www.dustingetz.com/:datomic-myth-of-slow-writes > The problem with place-oriented stores is that sharding writes forces you to shard your reads to those writes.
Nubank uses Pathom, so they do almost the same, but relying on each service to get a specific part of the data from its own database, then aggregating all this data afterwards.
Does Ions have multi-db queries? I thought Cognitect quietly turned that off shortly after Datomic Cloud release, not sure if they ever turned it back on with Ions
Yes, but with Datomic on-prem we can use multi-db queries. I'm just wondering how the application will behave regarding memory usage, latency, etc.
For on-prem, the databases will compete for object cache in the peer
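For reference, a multi-db join with an on-prem peer looks roughly like this. The database names and attributes are purely illustrative; the point is that both database values are passed as extra `:in` sources to the same query, and both are served from the same peer's object cache, which is why they compete for it:

```clojure
(require '[datomic.api :as d])

;; Two database values attached to one query via named :in sources.
(let [orders-db    (d/db (d/connect "datomic:dev://localhost:4334/orders"))
      customers-db (d/db (d/connect "datomic:dev://localhost:4334/customers"))]
  (d/q '[:find ?order-id ?email
         :in $orders $customers
         :where
         [$orders ?o :order/id ?order-id]
         [$orders ?o :order/customer-id ?cid]
         [$customers ?c :customer/id ?cid]
         [$customers ?c :customer/email ?email]]
       orders-db customers-db))
```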
Yes, that's what I thought; this can happen even with one database, depending on the amount of data and the usage patterns, as we can read in this awesome post from Nubank: https://medium.com/building-nubank/the-evergreen-cache-d0e6d9df2e4b
Based on what @eraad said, I think the smart move is to avoid multiple databases, and only break things up if: 1. you hit the write throughput limit of one transactor, or 2. the amount of data is so huge that you start to experience issues related to object cache space. Can you confirm this usage pattern? cc: @eraad @dustingetz @stuarthalloway @marshall @val_waeselynck
But one additional question @marshall: is it possible to avoid sharding in Datomic Cloud? What is the strategy when data grows really big?
In on-prem you should run a single primary logical DB per transactor. However, in Cloud multiple DBs per system is fine.
Thinking about the limit of one Datomic database being around 11 billion datoms, which corresponds to 353 datoms per second sustained over a year, we are planning for approximately 10% of that number on one of our transaction systems.
There is no hard limit on db size. The 10B number is a guideline for when you need to consider options for handling volume, shards, etc. If you're unlikely to hit 10B datoms in 3 to 5 years, then I wouldn't worry about it.
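As a back-of-the-envelope check of those numbers (just illustrative arithmetic, not a Datomic limit):

```clojure
;; Sustained write rate needed to reach a given datom budget over a horizon.
(defn datoms-per-second [total-datoms years]
  (/ total-datoms (* years 365 24 60 60.0)))

(datoms-per-second 11e9 1)  ;; ≈ 349 datoms/s to write 11B datoms in one year
(datoms-per-second 10e9 5)  ;; ≈ 63 datoms/s to reach the 10B guideline in 5 years
```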
Seems to be the case @marshall, but I have another question regarding the architectural aspect: is it possible to use a single primary logical DB to handle all my data in a unified way, even within a distributed services setting? According to the “common knowledge” pointed out by @eraad, multiple databases are the way to go, but it seems to me this can be different when using Datomic. It would be really awesome to concentrate all your data in one place.
likewise, there are lifecycle advantages to individual services having their own dbs
i would assess the tradeoffs to the different options and determine which fits your particular system needs best
I need to set aside my individual bias towards monolithic applications, or “modular monoliths” as some name them, in order to do the best assessment
Btw, very good article @U0C4ECS1K
On a related note, I have a question about Datomic accretions; I'm unsure of what approach to take in the above lib. From my understanding, Datomic transactions are idempotent, so you could reinstall attributes every time on startup, but sometimes you also need to migrate data, so it helps to have some control over the process. Currently I'm keeping a version number for the schema that can be manually changed whenever a schema/migration change is desired. Another approach I just read in one of @val_waeselynck's posts about Datomic is to reinstall the schema on startup but track the migrations that have been run (so that, unlike the schema, they are not rerun). I prefer this latter approach over version numbers, but I'm curious: the `ensure-schemas` example in the day-of-datomic-cloud repo checks whether a given schema attribute exists before reinstalling. Is there a reason this approach was taken instead? Are there considerations I'm not taking into account?
Note that Datomic transactions are not idempotent in general (e.g. `[[:db/add "my-tempid" :person/age 42]]` will always create a new entity, for lack of an identity attribute to induce upsert behaviour).
I only meant that schema installation transactions tend to be idempotent (e.g., creating a new attribute). So if you're a bit careful, you can usually just re-run your schema installation transaction, but it does require vigilance.
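A minimal on-prem sketch of the split being discussed (re-run the schema every startup, but track which migrations have run). The `:migration/name` attribute and the function names are illustrative, not part of Datomic:

```clojure
(require '[datomic.api :as d])

;; Schema installation is idempotent: transacting the same attribute
;; definitions again is a no-op, so this can run on every startup.
(def schema
  [{:db/ident       :person/email
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}
   ;; Tracking attribute for migrations (unique so it works as a lookup ref).
   {:db/ident       :migration/name
    :db/valueType   :db.type/string
    :db/cardinality :db.cardinality/one
    :db/unique      :db.unique/identity}])

;; Migrations are NOT idempotent in general, so record each one as it
;; runs and skip it on subsequent startups.
(defn ran? [db migration-name]
  (some? (d/entity db [:migration/name migration-name])))

(defn run-migration! [conn {:keys [name tx-data]}]
  (when-not (ran? (d/db conn) name)
    @(d/transact conn (conj tx-data {:migration/name name}))))
```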
I don't know if that's what you read, but you might take inspiration from this: https://github.com/vvvvalvalval/datofu#managing-data-schema-evolutions (won't work for Datomic Cloud, but shouldn't be too hard to port)
Thanks for clarifying that. I was reading the “Using Datomic in your App” article; the implementation in the linked repo seems to be similar, will take a look. As is, datofu only works with on-prem, right?