├── README.png ├── cspell.json ├── CITATION.cff ├── CODE_OF_CONDUCT.md └── README.md /README.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/joelparkerhenderson/monorepo-vs-polyrepo/HEAD/README.png -------------------------------------------------------------------------------- /cspell.json: -------------------------------------------------------------------------------- 1 | { 2 | "version": "0.2", 3 | "language": "en", 4 | "words": [ 5 | "Bazel", 6 | "buckconfig", 7 | "CODEOWNERS", 8 | "DVCS", 9 | "Phabricator", 10 | "PTSD", 11 | "subproject", 12 | "subprojects" 13 | ], 14 | "flagWords": [] 15 | } 16 | -------------------------------------------------------------------------------- /CITATION.cff: -------------------------------------------------------------------------------- 1 | cff-version: 1.2.0 2 | title: Monorepo vs. polyrepo 3 | message: >- 4 | If you use this work and you want to cite it, 5 | then you can use the metadata from this file. 6 | type: software 7 | authors: 8 | - given-names: Joel Parker 9 | family-names: Henderson 10 | email: joel@joelparkerhenderson.com 11 | affiliation: joelparkerhenderson.com 12 | orcid: 'https://orcid.org/0009-0000-4681-282X' 13 | identifiers: 14 | - type: url 15 | value: 'https://github.com/joelparkerhenderson/monorepo-vs-polyrepo/' 16 | description: Monorepo vs. polyrepo 17 | repository-code: 'https://github.com/joelparkerhenderson/monorepo-vs-polyrepo/' 18 | abstract: >- 19 | Monorepo vs. 
polyrepo 20 | license: See license file 21 | -------------------------------------------------------------------------------- /CODE_OF_CONDUCT.md: -------------------------------------------------------------------------------- 1 | 2 | # Contributor Covenant Code of Conduct 3 | 4 | ## Our Pledge 5 | 6 | We as members, contributors, and leaders pledge to make participation in our 7 | community a harassment-free experience for everyone, regardless of age, body 8 | size, visible or invisible disability, ethnicity, sex characteristics, gender 9 | identity and expression, level of experience, education, socio-economic status, 10 | nationality, personal appearance, race, caste, color, religion, or sexual 11 | identity and orientation. 12 | 13 | We pledge to act and interact in ways that contribute to an open, welcoming, 14 | diverse, inclusive, and healthy community. 15 | 16 | ## Our Standards 17 | 18 | Examples of behavior that contributes to a positive environment for our 19 | community include: 20 | 21 | * Demonstrating empathy and kindness toward other people 22 | * Being respectful of differing opinions, viewpoints, and experiences 23 | * Giving and gracefully accepting constructive feedback 24 | * Accepting responsibility and apologizing to those affected by our mistakes, 25 | and learning from the experience 26 | * Focusing on what is best not just for us as individuals, but for the overall 27 | community 28 | 29 | Examples of unacceptable behavior include: 30 | 31 | * The use of sexualized language or imagery, and sexual attention or advances of 32 | any kind 33 | * Trolling, insulting or derogatory comments, and personal or political attacks 34 | * Public or private harassment 35 | * Publishing others' private information, such as a physical or email address, 36 | without their explicit permission 37 | * Other conduct which could reasonably be considered inappropriate in a 38 | professional setting 39 | 40 | ## Enforcement Responsibilities 41 | 42 | Community 
leaders are responsible for clarifying and enforcing our standards of 43 | acceptable behavior and will take appropriate and fair corrective action in 44 | response to any behavior that they deem inappropriate, threatening, offensive, 45 | or harmful. 46 | 47 | Community leaders have the right and responsibility to remove, edit, or reject 48 | comments, commits, code, wiki edits, issues, and other contributions that are 49 | not aligned to this Code of Conduct, and will communicate reasons for moderation 50 | decisions when appropriate. 51 | 52 | ## Scope 53 | 54 | This Code of Conduct applies within all community spaces, and also applies when 55 | an individual is officially representing the community in public spaces. 56 | Examples of representing our community include using an official e-mail address, 57 | posting via an official social media account, or acting as an appointed 58 | representative at an online or offline event. 59 | 60 | ## Enforcement 61 | 62 | Instances of abusive, harassing, or otherwise unacceptable behavior may be 63 | reported to the community leaders responsible for enforcement at 64 | [INSERT CONTACT METHOD]. 65 | All complaints will be reviewed and investigated promptly and fairly. 66 | 67 | All community leaders are obligated to respect the privacy and security of the 68 | reporter of any incident. 69 | 70 | ## Enforcement Guidelines 71 | 72 | Community leaders will follow these Community Impact Guidelines in determining 73 | the consequences for any action they deem in violation of this Code of Conduct: 74 | 75 | ### 1. Correction 76 | 77 | **Community Impact**: Use of inappropriate language or other behavior deemed 78 | unprofessional or unwelcome in the community. 79 | 80 | **Consequence**: A private, written warning from community leaders, providing 81 | clarity around the nature of the violation and an explanation of why the 82 | behavior was inappropriate. A public apology may be requested. 83 | 84 | ### 2. 
Warning 85 | 86 | **Community Impact**: A violation through a single incident or series of 87 | actions. 88 | 89 | **Consequence**: A warning with consequences for continued behavior. No 90 | interaction with the people involved, including unsolicited interaction with 91 | those enforcing the Code of Conduct, for a specified period of time. This 92 | includes avoiding interactions in community spaces as well as external channels 93 | like social media. Violating these terms may lead to a temporary or permanent 94 | ban. 95 | 96 | ### 3. Temporary Ban 97 | 98 | **Community Impact**: A serious violation of community standards, including 99 | sustained inappropriate behavior. 100 | 101 | **Consequence**: A temporary ban from any sort of interaction or public 102 | communication with the community for a specified period of time. No public or 103 | private interaction with the people involved, including unsolicited interaction 104 | with those enforcing the Code of Conduct, is allowed during this period. 105 | Violating these terms may lead to a permanent ban. 106 | 107 | ### 4. Permanent Ban 108 | 109 | **Community Impact**: Demonstrating a pattern of violation of community 110 | standards, including sustained inappropriate behavior, harassment of an 111 | individual, or aggression toward or disparagement of classes of individuals. 112 | 113 | **Consequence**: A permanent ban from any sort of public interaction within the 114 | community. 115 | 116 | ## Attribution 117 | 118 | This Code of Conduct is adapted from the [Contributor Covenant][homepage], 119 | version 2.1, available at 120 | [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1]. 121 | 122 | Community Impact Guidelines were inspired by 123 | [Mozilla's code of conduct enforcement ladder][Mozilla CoC]. 124 | 125 | For answers to common questions about this code of conduct, see the FAQ at 126 | [https://www.contributor-covenant.org/faq][FAQ]. 
Translations are available at 127 | [https://www.contributor-covenant.org/translations][translations]. 128 | 129 | [homepage]: https://www.contributor-covenant.org 130 | [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html 131 | [Mozilla CoC]: https://github.com/mozilla/diversity 132 | [FAQ]: https://www.contributor-covenant.org/faq 133 | [translations]: https://www.contributor-covenant.org/translations 134 | 135 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Monorepo vs. polyrepo 2 | 3 | Objective 4 | 5 | Monorepo means using one repository that contains many projects, and polyrepo means using a repository per project. This page discusses the similarities and differences, and has advice and opinions on both. 6 | 7 | Contents: 8 | 9 | * [Introduction](#introduction) 10 | * [What is monorepo?](#what-is-monorepo) 11 | * [What is polyrepo?](#what-is-polyrepo) 12 | * [Comparisons](#comparisons) 13 | * [Key similarities](#key-similarities) 14 | * [Key differences](#key-differences) 15 | * [Tooling](#tooling) 16 | * [Bazel](#bazel) 17 | * [moon](#moon) 18 | * [Lerna](#lerna) 19 | * [OctoLinker](#octolinker) 20 | * [Nx](#nx) 21 | * [Monorepo scaling](#monorepo-scaling) 22 | * [Monorepo scaling problem](#monorepo-scaling-problem) 23 | * [Monorepo scaling mitigations](#monorepo-scaling-mitigations) 24 | * [Monorepo scaling metrics](#monorepo-scaling-metrics) 25 | * [Proponents of monorepo](#proponents-of-monorepo) 26 | * [If components need to release together, then use a monorepo](#if-components-need-to-release-together-then-use-a-monorepo) 27 | * [If components need to share common code, then use a monorepo](#if-components-need-to-share-common-code-then-use-a-monorepo) 28 | * [I’ve found monorepos to be extremely valuable in an less-mature, high-churn 
codebase](#ive-found-monorepos-to-be-extremely-valuable-in-an-less-mature-high-churn-codebase) 29 | * [A common mission](#a-common-mission) 30 | * [Intricacies](#intricacies) 31 | * [Use servers](#use-servers) 32 | * [Clear vs. tribal](#clear-vs-tribal) 33 | * [Proponents of polyrepo](#proponents-of-polyrepo) 34 | * [If tech's biggest names use a monorepo, should we do the same?](#if-techs-biggest-names-use-a-monorepo-should-we-do-the-same) 35 | * [Coupling between unrelated projects](#coupling-between-unrelated-projects) 36 | * [Visible organization](#visible-organization) 37 | * [Opinions about splitting](#opinions-about-splitting) 38 | * [Splitting one repo is easier than combining multiple repos](#splitting-one-repo-is-easier-than-combining-multiple-repos) 39 | * [Splitting may be too fine](#splitting-may-be-too-fine) 40 | * [Opinions about balances](#opinions-about-balances) 41 | * [It's a social problem in how you manage boundaries](#its-a-social-problem-in-how-you-manage-boundaries) 42 | * [Challenges of monorepo and polyrepo](#challenges-of-monorepo-and-polyrepo) 43 | * [On-premise applications or desktop applications](#on-premise-applications-or-desktop-applications) 44 | * [Opinions about alternatives](#opinions-about-alternatives) 45 | * [Could you get the best of both worlds by having a monorepo of submodules?](#could-you-get-the-best-of-both-worlds-by-having-a-monorepo-of-submodules) 46 | * [Hybrid of "many repos"](#hybrid-of-many-repos) 47 | * [Prediction of a new type of VCS](#prediction-of-a-new-type-of-vcs) 48 | 49 | See: 50 | 51 | * [SCM at Facebook](https://github.com/joelparkerhenderson/source_code_management/scm_at_facebook.md) 52 | * [SCM at Google](https://github.com/joelparkerhenderson/source_code_management/scm_at_google.md) 53 | * [Why Google stores billions of lines of code in a single repository (2016) (acm.org)](https://dl.acm.org/citation.cfm?id=2854146) 54 | * [Scaling Mercurial at 
Facebook](https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/) 55 | * [Cthulhu: Organizing Go Code in a Scalable Repo](https://blog.digitalocean.com/cthulhu-organizing-go-code-in-a-scalable-repo/) 56 | * [Why use a monorepo build tool?](https://mill-build.org/blog/2-monorepo-build-tool.html) 57 | 58 | Posts with Hacker News discussions: 59 | 60 | * [The Ingredients of a Productive Monorepo](https://blog.swgillespie.me/posts/monorepo-ingredients/) & [discussion](https://news.ycombinator.com/item?id=44086917) 61 | * [Monorepos: Please don’t! - By Matt Klein](https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b) & [discussion](https://news.ycombinator.com/item?id=18808909) 62 | * [Monorepos and the Fallacy of Scale - By Paulus Esterhazy](https://presumably.de/monorepos-and-the-fallacy-of-scale.html) & [discussion](https://news.ycombinator.com/item?id=18855660) 63 | 64 | Credits: 65 | 66 | * Opinions and comments on this page come from many people on various discussion websites, such as Hacker News, and are lightly edited for clarity. 67 | 68 | * If you're the author of an opinion here, and would like to attribute it, or explain more, please let us know and we'll give you commit access. 69 | 70 | ## Introduction 71 | 72 | ### What is monorepo? 73 | 74 | Monorepo is a nickname that means "using one repository for the source code management version control system". 75 | 76 | * A monorepo architecture means using one repository, rather than multiple repositories. 77 | 78 | * For example, a monorepo can use one repo that contains a directory for a web app project, a directory for a mobile app project, and a directory for a server app project. 79 | 80 | * Monorepo is also known as one-repo or uni-repo. 81 | 82 | ### What is polyrepo? 83 | 84 | Polyrepo is a nickname that means "using multiple repositories for the source code management version control system". 
85 | 86 | * A polyrepo architecture means using multiple repositories, rather than one repository. 87 | 88 | * For example, a polyrepo can use a repo for a web app project, a repo for a mobile app project, and a repo for a server app project. 89 | 90 | * Polyrepo is also known as many-repo or multi-repo. 91 | 92 | ## Comparisons 93 | 94 | 95 | ### Key similarities 96 | 97 | Key similarities between monorepo and polyrepo: 98 | 99 | * Both architectures ultimately track the same source code files, and do it by using source code management (SCM) version control systems (VCS) such as git or mercurial. 100 | 101 | * Both architectures are proven successful for projects of all sizes. 102 | 103 | * Both architectures are straightforward to implement using any typical SCM VCS, up to a scaling limit. 104 | 105 | 106 | ### Key differences 107 | 108 | Key differences between monorepo and polyrepo, summarized from many proponents, and intending to highlight typical differences between a typical monorepo and a typical polyrepo. 109 | 110 | 111 | 112 | 113 | 114 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 131 | 132 | 133 | 134 | 135 | 136 | 137 | 138 | 139 | 140 | 141 | 142 | 143 | 144 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 155 | 156 | 157 | 158 | 159 | 160 | 161 | 162 | 163 | 164 | 165 | 166 | 167 | 168 | 169 | 170 | 171 | 172 | 173 | 174 | 175 | 176 | 177 | 178 | 179 | 180 | 181 | 182 | 183 | 184 | 185 | 186 |
| | Monorepo | Polyrepo |
|---|---|---|
| Contents | Typically a repo contains multiple projects, programming languages, packaging processes, etc. | Typically a repo contains one project, programming language, packaging process, etc. |
| Projects | Manages projects in one repository, together, holistically. | Manages projects in multiple repositories, separately, independently. |
| Workflows | Enables workflows in all projects simultaneously, all within the monorepo. | Enables workflows in each project one at a time, each in its own repo. |
| Changes | Ensures changes affect all the projects, can be tracked together, tested together, and released together. | Ensures changes affect only one project, can be tracked separately, tested separately, and released separately. |
| Collaboration | Encourages collaboration and code sharing within an organization. | Encourages collaboration and code sharing across organizations. |
| Testing | Good white box testing because all projects are testable together, and verifiable holistically. | Good black box testing because each project is testable separately, and verifiable independently. |
| Releases | Coordinated releases are inherent, yet must use a polyglot of tooling. | Coordinated releases must be programmed, yet can use vanilla tooling. |
| State | The current state of everything is one commit in one repo. | The current state of everything is a commit per repo. |
| Coupling | Tight coupling of projects. | No coupling of projects. |
| Thinking | Encourages thinking about conjoins among projects. | Encourages thinking about contracts between projects. |
| Access | Access control defaults to all projects.<br><br>Some teams use tools for finer-grained access control.<br><br>GitLab and GitHub offer ownership control where you can say who owns what directories for things like approving merge requests that affect those directories (using CODEOWNERS). Google Piper has finer-grained access control. Phabricator offers herald rules that stop a merge from happening if a file has changed in a specific subdirectory. Some teams use service owners, so when a change spans multiple services, they are all added automatically as blocking reviewers. | Access control defaults to per project.<br><br>Some teams use tools for broader-grained access control.<br><br>GitHub offers teams, where you can say one team owns many projects, for things like approving requests that affect multiple repos. |
| Scaling | Scaling needs specialized tooling. It is currently not practical to use vanilla git with very large repos, or very large files, without any extra tooling. For monorepo scaling, teams invest in writing custom tooling and providing custom training. | Scaling needs specialized coordination. It is currently not practical to use vanilla git with many projects across many repos, where a team wants to coordinate code changes, testing, packaging, and releasing. For polyrepo scaling, teams invest in writing coordination scripts and careful cross-version compatibility. |
| Tooling | Google wrote the tool “bazel”, which tracks internal dependencies by using directed acyclic graphs. | Lyft wrote the tool “refactorator”, which automates making changes in multiple repos, including opening PRs, tracking status, etc. |
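
The idea of tracking internal dependencies with a directed acyclic graph can be illustrated with a small sketch. This is not how Bazel itself is implemented; it only shows the general technique of walking the graph to find which projects are affected by a change. The project names and the `affected_projects` function are hypothetical, chosen to match the web app, mobile app, and server app example used earlier.

```python
from collections import deque

# Hypothetical monorepo: each project maps to the projects it depends on.
DEPENDENCIES = {
    "web_app": ["ui_lib", "api_client"],
    "mobile_app": ["api_client"],
    "server_app": ["core"],
    "api_client": ["core"],
    "ui_lib": ["core"],
    "core": [],
}


def affected_projects(dependencies, changed):
    """Return the changed projects plus everything that depends on them."""
    # Invert the edges so we can walk from a project to its dependents.
    dependents = {name: [] for name in dependencies}
    for name, deps in dependencies.items():
        for dep in deps:
            dependents[dep].append(name)

    # Breadth-first walk over the inverted graph.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for dependent in dependents[current]:
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)
    return affected


# A change to api_client affects the two apps that use it, but not server_app.
print(sorted(affected_projects(DEPENDENCIES, ["api_client"])))
```

In this sketch, a change to `core` would mark every project as affected, which is the kind of information a monorepo build tool can use to rebuild and retest only what is necessary.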
187 | 188 | ## Tooling 189 | 190 | ### Bazel 191 | 192 | [Bazel](https://github.com/bazelbuild/bazel) is a fast, scalable, multi-language 193 | and extensible build system. Bazel only rebuilds what is necessary. With 194 | advanced local and distributed caching, optimized dependency analysis and 195 | parallel execution, you get fast and incremental builds. 196 | 197 | Bazel requires you to explicitly declare your dependencies for each 'target' you 198 | want to build. These dependencies can be within the same Bazel workspace, or 199 | imported at build time via, say, git, so there's no need to have all the files 200 | directly in your repo. 201 | 202 | The nice thing is you can declare the commit id or file hash for the dependency 203 | you're importing to make sure you're getting what you expect, and keep Bazel's 204 | reproducibility properties. 205 | 206 | ### moon 207 | 208 | [moon](https://moonrepo.dev/moon) is a multi-language task runner and monorepo 209 | management tool. Like Bazel, it only rebuilds what is necessary, with support 210 | for local and remote caching, dependency analysis, parallel execution, 211 | incremental builds, and even a robust language toolchain. 212 | 213 | moon is not a build system, but is a powerful task runner and organization tool, 214 | so if you're looking for a tool that sits somewhere between Bazel (full 215 | commitment) and Make scripts (no commitment), moon is a great choice. 216 | 217 | ### Lerna 218 | 219 | [Lerna](https://github.com/lerna/lerna) is a tool that optimizes the workflow 220 | around managing multi-package repositories with git and npm. It's primarily for 221 | Node.js-based repositories. 222 | 223 | ### OctoLinker 224 | 225 | [OctoLinker](https://github.com/OctoLinker/OctoLinker) really helps when 226 | browsing a polyrepo on GitHub. You can just click the import [project] name and 227 | it will switch to the repo. 
228 | 229 | ### Nx 230 | 231 | [Nx](https://nx.dev/) is a build system with monorepo support and powerful 232 | integrations. 233 | 234 | ## Monorepo scaling 235 | 236 | ### Monorepo scaling problem 237 | 238 | Monorepo scaling becomes a problem when a typical developer can't work well with 239 | the code by using typical tools such as vanilla git. 240 | 241 | * Monorepo scaling eventually becomes impractical in terms of space: when a 242 | monorepo grows to have more data than fits on a developer's laptop, then the 243 | developer cannot fetch the monorepo, and it may be impractical to obtain 244 | more storage space. 245 | 246 | * Monorepo scaling eventually becomes impractical in terms of time: when a 247 | monorepo grows, then a complete file transfer takes more time, and in 248 | practice, there are other operations that also take more time, such as git 249 | pruning and git repacking. 250 | 251 | * A monorepo may grow so large and contain so many projects that it takes too much 252 | mental effort to work across projects, such as for searching, editing, and 253 | isolating changes. 254 | 255 | ### Monorepo scaling mitigations 256 | 257 | Monorepo scaling can be improved by: 258 | 259 | * Some type of virtual file system (VFS) that allows a portion of the code to be 260 | present locally. This might be accomplished via a proprietary VCS like 261 | Perforce which natively operates this way, or via Google’s “G3” internal 262 | tooling, or via Microsoft’s GVFS. 263 | 264 | * Sophisticated source code indexing/searching/discovery capabilities as a 265 | service. This is because a typical developer is not going to have all the 266 | source code locally, in a searchable state, using vanilla tooling. 267 | 268 | ### Monorepo scaling metrics 269 | 270 | Monorepo scaling seems to become an issue, in practice, at approximately these 271 | kinds of metrics: 272 | 273 | * 10-100 developers writing code full time. 274 | 275 | * 10-100 projects in progress at the same time. 
276 | 277 | * 10-100 packaging processes during the same time period, such as a daily release. 278 | 279 | * 1K-10K versioned dependencies, such as Node modules, Python packages, Ruby gems, etc. 280 | 281 | * 1M-10M lines of code 282 | 283 | ## Proponents of monorepo 284 | 285 | ### If components need to release together, then use a monorepo 286 | 287 | If you think components might need to release together then they should go in 288 | the same repo, because you can in fact pretty easily manage projects with 289 | different release schedules from the same repo if you really need to. 290 | 291 | On the other hand if you've got a whole bunch of components in different repos 292 | which need to release together it suddenly becomes a real pain. 293 | 294 | ### If components need to share common code, then use a monorepo 295 | 296 | If you have components that will never need to release together, then of course 297 | you can stick them in different repositories-- but if you do this and you want 298 | to share common code among the repositories, then you will need to manage that 299 | code with some sort of robust versioning system, and robust versioning systems 300 | are hard. Only do something like that when the value is high enough to justify 301 | the overhead. If you're in a startup, chances are very good that the value is 302 | not high enough. 303 | 304 | ### I’ve found monorepos to be extremely valuable in an less-mature, high-churn codebase 305 | 306 | Need to change a function signature or interface? Cool, global find & replace. 307 | 308 | At some point a monorepo outgrows its usefulness. The sheer amount of files in 309 | something that’s 10K+ LOC (not that large, I know) warrants breaking apart the 310 | codebase into packages. 311 | 312 | Still, I almost err on the side of monorepos because of the convenience that 313 | editors like vscode offer: autocomplete, auto-updating imports, etc. 
314 | 315 | ### A common mission 316 | 317 | I find it helpful to think of a company as a group of people engaged in a common 318 | mission. The company pursues its mission through multiple subprojects, and every 319 | decision taken and every code change introduced is a step towards its primary 320 | goal. The code base is a chunk of the company's institutional knowledge about 321 | its overarching goal and means to that end. 322 | 323 | Looking at it from this perspective, a monorepo can be seen as the most natural 324 | expression of the fact that all team members are engaged in a single, if 325 | multi-faceted, enterprise. 326 | 327 | ### Intricacies 328 | 329 | Monorepos are great if set up correctly, but there are a lot of intricacies that 330 | often go on behind the scenes to make a monorepo successful, and they're easy to 331 | overlook since usually some "other" team (devops team, devtools team, etc.) is 332 | shouldering all that burden. Still worth it, but must be approached with caution. 333 | 334 | ### Use servers 335 | 336 | For our monorepo, most development was done either on your development server 337 | running in a datacenter (think ~50-100 cores), or on an "on demand" machine that 338 | was like a short-lived container that generally stayed up to date with known 339 | good commits every few hours. The IDE was integrated with devservers / machines, and 340 | generally language servers and other services were prewarmed or automatically set up 341 | via Chef/Ansible, etc. Rarely would you want to run the larger monorepos on your 342 | laptop client (exception would generally be mobile apps, Mac OS apps, etc.). 343 | 344 | ### Clear vs. tribal 345 | 346 | As a former IC at a large monorepo company, I preferred monorepos over 347 | polyrepos. It was "THE" monorepo, and it made understanding the company's 348 | service graph, call graph, ownership graph, etc. incredibly clear. Polyrepos 349 | are tribal knowledge. 
You don't know where anything lives and you can't look it up or 350 | discover it. Every team does their own thing. Inheriting new code is a curse. 351 | Code archeology feels like an adventure in root cause analysis in a library of 352 | hidden and cryptic tomes. 353 | 354 | ## Proponents of polyrepo 355 | 356 | ### If tech's biggest names use a monorepo, should we do the same? 357 | 358 | Some of tech’s biggest names use a monorepo, including Google, Facebook, 359 | Twitter, and others. Surely if these companies all use a monorepo, the benefits 360 | must be tremendous, and we should all do the same, right? Wrong! 361 | 362 | Why? Because, at scale, a monorepo must solve every problem that a polyrepo must 363 | solve, with the downside of encouraging tight coupling, and the additional 364 | herculean effort of tackling VCS scalability. 365 | 366 | Thus, in the medium to long term, a monorepo provides zero organizational 367 | benefits, while inevitably leaving some of an organization’s best engineers with 368 | a wicked case of PTSD (manifested via drooling and incoherent mumbling about git 369 | performance internals). 370 | 371 | ### Coupling between unrelated projects 372 | 373 | I worry about the monorepo coupling between unrelated products. I admit 374 | part of this probably comes from my more libertarian world view, but I have seen 375 | something as basic as a server upgrade schedule that is tailored for one product 376 | severely hurt the development of another product, to the point of almost halting 377 | development for months. I can't imagine needing a new feature or a bug fix from 378 | a dependency and being stuck because the whole company isn't ready to upgrade. 379 | 380 | I've read of at least one less serious case of this from Google with JUnit: "In 381 | 2007, Google tried to upgrade their JUnit from 3.8.x to 4.x and struggled as 382 | there was a subtle backward incompatibility in a small percentage of their 383 | usages of it. 
The change-set became very large, and struggled to keep up with 384 | the rate developers were adding tests." 385 | 386 | ### Visible organization 387 | 388 | I argue that a visible organization of a codebase into repositories makes it 389 | easier to reuse code in the same way that interface/implementation splits do: it 390 | makes it clearer which parts felt domain-specific and which felt like reusable 391 | libraries. 392 | 393 | Being able to represent "not directly involved, but versioned together" and 394 | "separate enough to be versioned separately" is a very valuable distinction to 395 | have in your toolbox. 396 | 397 | Once your team is large enough that your developers are not all attending the 398 | same standup, then you should be working in multiple repositories. You need to 399 | have a release cycle with semVer etc. so that developers who aren't in close 400 | communication with you can understand the impact of changes to your code area. 401 | Since tags are repository-global, the repository should be the unit of 402 | versioning/releasing. 403 | 404 | ## Opinions about splitting 405 | 406 | ### Splitting one repo is easier than combining multiple repos 407 | 408 | You can split big repositories into smaller ones quite easily (in Git anyway). 409 | If you only need to do this once, then subtree will do the job, even retaining 410 | all your history if you want. As another way to split, you can duplicate the 411 | repo and pull trees out of each dupe in normal commits. 412 | 413 | But combining small repositories together into a bigger repo is a lot harder. 414 | 415 | So start out with a monorepo. Only split a monorepo into multiple smaller 416 | repositories when you're clear that it really makes sense. 
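The subtree approach mentioned above can be sketched as a short sequence of git commands. The helper below only builds and prints the commands, rather than running them, so it can be read without a repository at hand. The function name, directory paths, and branch name are hypothetical; the git commands themselves (`git subtree split`, `git init`, `git pull`) are standard, though `git subtree` ships as an optional component in some git installations.

```python
import shlex


def subtree_split_commands(prefix, branch, new_repo_dir, source_repo_dir):
    """Build the git commands that extract one directory into its own repo."""
    return [
        # In the monorepo: create a branch whose history contains only
        # the commits that touched the `prefix` directory.
        ["git", "-C", source_repo_dir, "subtree", "split",
         f"--prefix={prefix}", "-b", branch],
        # Create the new, empty repository.
        ["git", "init", new_repo_dir],
        # Pull the split-out branch, with its history, into the new repo.
        ["git", "-C", new_repo_dir, "pull", source_repo_dir, branch],
    ]


# Hypothetical paths: split the web_app directory out of a monorepo.
for argv in subtree_split_commands(
    prefix="web_app",
    branch="web_app_only",
    new_repo_dir="/tmp/web_app_repo",
    source_repo_dir="/path/to/monorepo",
):
    print(shlex.join(argv))
```

Going the other way, combining several repos into one while keeping a sensible history, has no comparably short recipe, which is the asymmetry this opinion relies on.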
417 | 418 | ### Splitting may be too fine 419 | 420 | My problem with polyrepo is that often organizations end up splitting things too 421 | finely, and now I'm unable to make a single commit to introduce a feature 422 | because my changes have to live across several repositories. 423 | 424 | This makes code review more annoying because you have to tab back and forth to 425 | see all the context. 426 | 427 | This makes it worse to make changes to fundamental (internal) libraries used by 428 | every project. It's too much hassle to track down all the uses of a particular 429 | function, so I end up putting that change elsewhere, which means someone else 430 | will do it a little different in their corner of the world, which utterly 431 | confuses the first person who's unlucky enough to work in both code bases (at 432 | the same time, or after moving teams). 433 | 434 | ## Opinions about balances 435 | 436 | ### It's a social problem in how you manage boundaries 437 | 438 | Among many of the major monorepos, boundaries still exist, they just become far 439 | more opaque because no one has to track them. You find the weird gatekeepers in 440 | the dark that spring out only when you get late in your code review process 441 | because you touched "their" file and they got an automated notice from a hidden 442 | rules engine in your CI process you didn't even realize existed. 443 | 444 | In the polyrepo case those boundaries have to be made explicit (otherwise no one 445 | gets anything done) and those owners should be easily visible. You may not like 446 | the friction they sometimes bring to the table, but at least it won't be a 447 | surprise. 448 | 449 | ### Challenges of monorepo and polyrepo 450 | 451 | My last 2 jobs have been working on developer productivity for 100+ developer 452 | organizations. One organization uses monorepo, one organization uses polyrepo. 453 | 454 | Neither really seems to result in less work, or a better experience. 
But I've 455 | found that your choice just dictates what type of problems you have to solve. 456 | 457 | Monorepo involves mostly challenges around scaling the organization in a single 458 | repo. 459 | 460 | Polyrepo involves mostly challenges with coordination. 461 | 462 | ### On-premise applications or desktop applications 463 | 464 | If you're creating on-premise applications or desktop applications (things 465 | requiring long lived release branches), the discussion is totally different. 466 | 467 | I find that shipped software actually operates better in a normal, branched 468 | monorepo. You just branch the whole thing. The alternative is several repos and 469 | using a package manager, which minimizes merging in many branches, as you can 470 | just point the package manager at the updated module version, but brings its own 471 | hassles. 472 | 473 | I've worked on projects where there were 6-7 major branches active at the same 474 | time and several smaller ones, besides the master branch. Then you'd have to 475 | merge everywhere applicable, etc. This is a totally different approach from the 476 | Google monorepo approach of "master only", basically. And probably one of the 477 | main reasons why Golang is having a ton of difficulties in the outside world by 478 | not having a proper versioning story. 479 | 480 | Comment: Once you are shipping software off prem you need to patch it between 481 | major and minor releases. Typically one way to do that is to branch when you do 482 | a release to a branch named for the release. Say 1.2. Then when issues pop up 483 | you fix it in the branch then see if it applies to the trunk or other branches 484 | after that. 485 | 486 | ## Opinions about alternatives 487 | 488 | ### Could you get the best of both worlds by having a monorepo of submodules? 489 | 490 | Code would live in separate repos, but references would be declared in the 491 | monorepo. Check-ins and rollbacks to the monorepo would trigger CI. 

Answer: There's not much good to either world. You need fairly extensive tooling
to make working with a repo of submodules comfortable at any scale. At large
scale, that tooling can be simpler than the equivalent monorepo tooling,
assuming that your individual repos remain "small" but also appropriately
granular (not a given--organizing is hard, especially if you leave it to
individual project teams). However, in the process of getting there, a monorepo
requires no particular bespoke tooling at small or even medium scale (it's just
"a repo"), and the performance intolerability pretty much scales smoothly from
there. And those can be treated as technical problems if you don't want to
approach social problems.

Answer: We actually did this. When I started at Uber ATG one of our devs made a
submodule called `uber_monorepo` that was linked from the root of our git repo.
In our repo's `.buckconfig` file we had access to everything that the mobile
developers at Uber had access to by prefixing our targets with
`//uber_monorepo/`. We did however run into the standard dependency resolution
issue that comes with any loosely coupled dependency. Updating our submodule
usually required a 1-2 day effort because we were out of sync for a month or
two.

### Hybrid of "many repos"

We consider our org to be "many repos". We have several thousand. However,
hundreds of them contain 5, 10, or 20+ packages/projects/services. It's funny
because we'll talk about creating "monorepos" (plural) for certain parts of our
product, and it confuses people.

There are a few thousand libraries, and those obviously refer to each other.
Some have readme files and that's enough, some have full documentation "books",
some have comments in the code and that's enough.

We don't mandate a company-wide development process, so each team and group can
choose their own process and how they track their stuff.

We do have automation and tooling to keep track of things though.

### Prediction of a new type of VCS

What we all really want is a VCS where repos can be combined and separated
easily, or where one repo can gain the benefits of a monorepo without the
drawbacks of one.

Prediction: just as DVCS killed off pre-DVCS practically overnight, the thing
that will quickly kill off DVCS is a new type of VCS where you can trivially
combine/separate repos and sections of repos as needed. You can assign, at the
repo level, sub-repos to include in this one and get an atomic commit hash for
the state of the whole thing. My VCS client doesn't need to actually download
every linked repo, but tools are available to act as if I had.

In a sense, we already have all of these features, in folders. You can combine
and separate them, you can make a local folder mimic a folder on a remote
system, and access its content without needing to download it all ahead of time.
They just don't have any VCS features baked in. We've got {file systems, network
file systems, and VCS}, and each of the three has some features the others would
like.
--------------------------------------------------------------------------------