Drew Verlee 14:06:58

People who are using a monorepo, why, and what advice do you have on that choice? I'm defining a monorepo as any one repo which builds at least two different services, e.g. two HTTP servers.


My team started with a many-repo approach, but eventually we merged ~50 repos into one. The productivity boost and the simplification of our release process were 100% worth it. I'm starting a new product right now and we're going with a monorepo from day 0. Happy to answer more specific questions!

Drew Verlee 15:06:34

I feel like the only difference is how you interact with the git repo. For me, that would be GitHub. For instance, if I used submodules, the lines get blurred, right? What was the productivity boost from?


Not using submodules :D

Drew Verlee 15:06:26

I guess maybe I need to make sure I understand what you're doing, Lukasz. Are you building two artifacts from the same repo? Is there code that's used in one build but not the other?


Yes, many different artifacts - there's a bunch of private libraries that the CI builds and pushes to an internal Maven repository, which are then used by a dozen Clojure services


which in turn are packaged as container images


More on submodules: we relied on them for sharing static configuration, Avro schemas, and a bunch of other stuff, and it was a royal pain in the butt. Other benefits that many people have written about and that I have first-hand experience with:
• a single commit points you at the state of the whole system
• you can make sweeping changes across the whole codebase and manage them as a single review/deployment/whatever unit
• mass upgrades to dependencies are one babashka script away
• my original team used Maven for distributing internal Clojure libraries; in my new project I just use deps.edn and local dependency paths
• Terraform configuration is in sync with the application code
There are disadvantages:
• long-lived branches hurt so much more
• everybody has to stay on top of ensuring that the main branch is regularly merged in
• we had to "stack" pull requests for bigger changes to allow for easier reviews while not being able to ship in-progress work to production


> long-lived branches hurt so much more
IMO, long-lived branches indicate work isn't being broken down into small enough chunks. You can use feature flags and configuration to allow small pieces of larger features to be merged and released often, to avoid long-lived branches.


100%, and we used all of these, but there were cases where it was unavoidable (e.g. a complete rewrite of one of the core pieces of the whole app)

Drew Verlee 15:06:58

Are your build targets all building from the same commit?


We also have a monorepo at work -- it used to be just a big bucket containing lots of independent projects, but now we have a lot of reusable code and a unified dev/test/build system overlaid (partly via Polylith). Being able to centrally manage the process/infrastructure -- and have a dev REPL containing all the code -- has really streamlined our work. Polylith can automatically figure out which projects and which components need to be retested after any change is made, so our CI only tests, builds, and deploys what is needed for each PR & merge. Release management is simpler, since the repo as a whole has a tag for each set of released artifacts (and we can still choose what goes to production independently, so we can -- and do -- still have mini-releases of just one or two artifacts, as needed). Tooling is streamlined and simpler. The unified REPL really improves productivity, since you can have one edit/eval session up and running and work across any part of the codebase, in any component or project.

👀 6

> Are your build targets all building from the same commit?
Not by default - libraries have their own versions, same goes for services. Although we do have tooling to "build the world" from a single commit if there's a need.


> Are your build targets all building from the same commit?
Each CI pipeline run targets a single commit, and tests and builds whatever is needed based on changes made since the last CI pipeline run (on the main branch). Each set of artifacts is deployed automatically to our staging environment. For some pipelines, no artifacts are produced (changes to infrastructure/tooling), but most pipelines produce artifacts... sometimes just one, sometimes all of them (if a low-level reusable component changed). We have 21 projects (artifacts) built from 141 components, of which three have "swappable" implementations (at build time).


(I just checked our deployment page and it looks like our most recent CI pipeline built and deployed 20 of our 21 projects!)

Drew Verlee 15:06:18

In our case, we have two servers building on their own branches, from one repo. And sometimes I run into a merge conflict that contains code for the other server. This is a problem because that server, by definition, isn't going to use this code. The only way I can see to avoid this, using one repo, is to use two separate branches for the apps and keep the shared code in a third.


So you actually maintain two different versions of the "shared" code? That seems like a recipe for disaster...


Why not break it in more pieces, or use some sort of (runtime) configuration so you can have a single, common reusable component?


If that "shared" code has a common API, in Polylith you could have one interface with two implementations, and each server ("project") could select the appropriate implementation ("component") at test/build time.


I think your life could be so much simpler 🙂

Drew Verlee 15:06:48

The reason it wasn't a disaster, and only maybe a mild inconvenience, was that it was just one developer. I made it two, and so I increased the amount of coordination. My confusion is that my solution to this issue would be to move each server into its own repo, and similarly, have one repo per library. Then use submodules to create bundles of repos. This means coordination through something which can span submodules, but with deps just being data, that seems within reach. For instance, each bundle could have a parent deps that they all inherit from to get the same versions. (The details on this are fuzzy because I haven't gotten to it yet.) But since you're not using submodules, and you have that isolation, I think there is another way. And I want to have it in mind as we consider our options. If we add more team members, the coordination will go up and we will have to do something.


In that case, jumping all the way to Polylith might be overkill: a setup using local deps + deps.edn might be a good start


> My confusion is that my solution to this issue would be to move each server into its own repo. And similarly, have one repo per library.
Having worked in multi-repo and mono-repo environments, I would say that would make things worse, not better 🙂

Drew Verlee 15:06:50

To simplify: if you have one repo, and two branches, one for each app, and by definition an "app" is a bundle of code that has and needs code the other app doesn't have or need - then how do you honor that definition using a monorepo?


We moved to Polylith incrementally, and we'd tried a bunch of different approaches before we started down that path -- all documented on my blog -- and the first step of moving from multiple projects and disjoint releases to a single, unified "mono-project" with lock-step releases was the biggest benefit early on.


Don't have two branches. Stop hurting yourself! 🙂

Drew Verlee 15:06:16

But it also hurts to have an app contain code it doesn't use, that can only weigh it down, right?


Monorepo has "components" A, B, C, and D -- and "projects" P1 and P2. P1 uses A, B, and C. P2 uses A, C, and D. Some reusable code, some disjoint code. Test/build everything off each commit/merge to the main branch. Done.

👀 2
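To picture that layout, a hypothetical directory structure for such a monorepo (the names A-D, P1, P2 are illustrative, not from an actual codebase) could look like:

```
monorepo/
├── components/
│   ├── a/        # reusable code, used by P1 and P2
│   ├── b/        # used by P1 only
│   ├── c/        # used by P1 and P2
│   └── d/        # used by P2 only
└── projects/
    ├── p1/       # build config pulling in a, b, c
    └── p2/       # build config pulling in a, c, d
```

Each project's build then tests and packages only the components it declares, so P2's artifact never contains B's code even though both live in the same repo.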

The build process chooses what code goes into an app.


The key is to separate the build/artifact process from the app code itself.


In Polylith terms, apps are "bases" and all the "reusable" parts are "components". The "projects" just contain the build stuff for each artifact. That has proven key for us.

Drew Verlee 15:06:08

As an implementation of this, you mean that your build selects the namespaces it needs? E.g. the build for app 1 looks at the code in directory /app1 and app 2's build looks in the directory /app2. I see how that separates the build targets, but the git history is still intertwined, right?


It's all local deps. So each project specifies the components it needs as local deps.
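As a sketch of how that wiring can look with plain deps.edn (the paths and component names here are hypothetical, assuming a components/ directory at the repo root):

```clojure
;; projects/p1/deps.edn - the project declares exactly the components it needs
{:deps {org.example/component-a {:local/root "../../components/a"}
        org.example/component-b {:local/root "../../components/b"}
        org.example/component-c {:local/root "../../components/c"}}}
```

With `:local/root` there is no publish or versioning step: any change to a component's source is picked up the next time a dependent project starts a REPL or builds.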


I don't understand your concern about git history -- if you have separate repos, your git history is intertwined across multiple repos which seems far worse?

👀 2
Drew Verlee 16:06:28

Maybe 🙂. I'll think on it a bit. Have to run to a meeting atm.


I'd be happy to get on a screen share with you, and show you around our Polylith monorepo and answer Qs that way, if it would help...

Drew Verlee 19:06:20

@U04V70XH6 Tell me when and where and I'll be there! I really appreciate the opportunity, and you'll have to let me know how I can repay the favor. I ask that you give me maybe a week to prepare by reflecting on today's discussion and gathering more information (your blog, Polylith, git concepts, etc.), so that when I get the tour I'll see more.


+1 for polylith. It has been a game changer. Sean has already ironed out the details, but I just wanted to add another voice to how nice it is.

👍 6

@U0DJ4T5U1 yeah, no rush. We can pick any time that's convenient for both of us (I'm Pacific time).

👀 2
Rupert (All Street) 19:06:45

> People who are using a monorepo, why and what advice do you have on that choice?
We do a hybrid repo, which is a monorepo but with submodules. The most important thing about a monorepo is that it flips responsibility for downstream usage onto the upstream projects, i.e. if you release a breaking change to an upstream library, you need to fix all downstream usages with your commit. Compare that with multi-repo, where you would commit your change to a library and other projects would decide whether or not to upgrade to it. The monorepo way makes sense in most companies: the team changing the upstream library should only go ahead if they have the resources and budget to fix the usages - otherwise there is no point making the change in the first place. The upstream project knows about downstream impacts because all downstream code is compiled and tested with the change. Suddenly unit tests become beneficial to a team, as they protect you from breaking changes from an upstream project - and upstream projects benefit from all the downstream unit tests. Hybrid is the same as a monorepo but with fine-grained access control to each repo (thanks to submodules). Highly recommend hybrid or mono repos.


I'd advise against submodules - with a bigger team and many concurrent changes, things get out of sync very, very fast.
> if you release a breaking change to an upstream library you need to fix all downstream usages with your commit.
This can be mitigated by requiring every piece of your stack to produce a deployable artifact (a jar, container image, etc.) so that you can update only what you have to - you get the benefits of a monorepo and less churn in terms of making sure that all library consumers are up to date

Rupert (All Street) 16:06:45

I like that it forces downstream to be impacted - I see this as the main advantage of a mono/hybrid repo. If your project doesn't have the budget to upgrade the downstream usages, then your project should probably not go ahead (because code no one uses isn't valuable). Often you have downstream production systems that are critical but have no team or budget to upgrade them, so it's best to budget for these as part of the upstream project change. Given the choice of upgrading or not upgrading, many downstream teams will just not upgrade - so the upstream team then has multiple versions to maintain.

Rupert (All Street) 16:06:26

> with a bigger team and many concurrent changes things get out of sync very very fast.
There shouldn't be any synchronisation necessary for a mono/hybrid repo. There's just one current version of everything. This is enforced by running compilation & unit tests against all downstream impacted projects.

👀 1

I guess we never got to that point because our CI was never wired up that way - then again, given the size of the system, it wasn't feasible for us to wait x hours for all unit tests to run - we preferred to test in production 😉

👀 1
Rupert (All Street) 19:06:06

That's fair enough. It certainly takes time to optimise builds - time which could be spent on other things. Good starting points for us were: (A) use beefy boxes (i.e. not default GitHub Actions/CircleCI boxes, which are slower than laptops), (B) use parallelisation (possibly even distributed across multiple servers), and (C) caching (so you only build what needs to be built).


Trust boundaries are a reason to cleave source code into different repos. In git, submodules are an answer to this problem - where it is critical to have entirely separate commit histories for governance reasons. E.g. you have an open-core product with a bunch of proprietary addons: you want the commit history of the open part public, but not any of the other parts. Similarly, access control within a large enterprise would dictate this trust boundary. You want different teams to share libraries and executables, but not original sources, for <various reasons>.
Within a circle of trust there exist these mutually orthogonal problems:
• build pipeline requirements
◦ run the test suite only against the relevant subset of source for any given code change
◦ likewise, linters and other kinds of checkers
◦ build only exactly what we need per deployable artifact
• the version control mechanism
◦ how commits are made and shared
• source code organisation method/rationale, e.g.
◦ a directory per service (foo-service, bar-service)
◦ a directory per service per team (foo-iOS, foo-Android, foo-backend, foo-web)
◦ a matrix / building-block system such as Polylith (organise code such that you can choose to compose it any which way after the fact - a sort of late binding)
• shared productivity
◦ throughput (laser-sharp tests, builds, reviews for fast cycles)
◦ visibility + trust (all-hands view of features crossing service/library boundaries)
◦ complexity control (clear service semantics via naming convention, composability of source, duplicate prevention, "diamond dependency" hell avoidance)
Polylith and NASA's approach are two different solutions to the question of managing all of an organisation's code under a single repo. Google's famous monorepo is yet another. Basically, if you can build a sophisticated enough toolchain against a standardised organising principle, you can have one repo to rule them all.

Rupert (All Street) 08:06:55

Yeah - we try to get the best of both worlds with our hybrid repo approach. It's just like a monorepo, but we have fine-grained control of who gets access to each submodule. To set it up, just build a monorepo but use submodules for each project. You can use a simple lein plugin for the builds: not much to learn or set up, just have the build server run:

lein modules -p <threads> <test/build-command>

👍 2
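For reference, the submodule wiring behind such a hybrid setup lives in a .gitmodules file at the superproject's root - a hypothetical example (the project names and URLs are invented):

```
# .gitmodules - one entry per project submodule (hypothetical names)
[submodule "projects/service-a"]
	path = projects/service-a
	url = git@github.com:example-org/service-a.git
[submodule "projects/service-b"]
	path = projects/service-b
	url = git@github.com:example-org/service-b.git
```

Access control then happens per underlying repo (service-a, service-b), while the superproject pins the exact commit of each submodule, giving the monorepo-style "one current version of everything".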

I tend to prefer monorepos, because it usually makes my job easier at first. Then I "refactor" - i.e. move to multiple repos - when I really need to. I tend, however, to avoid having multiple teams working under the same monorepo, as it makes it more complicated for multiple teams to interact, there being "no clear owner". IMO team interactions should be managed in another layer


I understand that "layer" as management policy and associated tooling that must deliver the same outcome (clarity, coherence, ease of management) irrespective of the method of organising source (mono/hybrid/multi repo). The policy + tooling implementation will of course change for whatever choice is made... and there is no silver bullet.


The biggest benefit of a monorepo is that your dependencies are always the latest versions. For example, if you change lib B, then when you build lib A it picks up the new B. So if you make breaking changes, it forces you to get all dependent libraries working together. And commits can roll back across libraries, so if A and B both needed to change and some issue is found, you can roll back both at the same time. But at the same time, I feel it can make boundaries feel more monolithic, where you don't treat libraries like they need to carefully maintain backwards compatibility the way that open-source libraries do.

👍 2
Rupert (All Street) 06:06:44

> The biggest benefit of a monorepo is that your dependencies are always the latest versions.
> So if you make breaking changes it forces you to get all dependent libraries working together.
Exactly. This is similar to my observation above. So many times I've seen multi-repo companies using multiple versions of the same dependencies, which: (A) can cause conflicts, (B) is confusing for developers because they need to know/memorise different versions of the same thing, and (C) means that multiple versions need to be maintained - possibly forever, if a legacy system cannot upgrade its dependencies.