├── README.png
├── cspell.json
├── CITATION.cff
├── CODE_OF_CONDUCT.md
└── README.md
/README.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/joelparkerhenderson/monorepo-vs-polyrepo/HEAD/README.png
--------------------------------------------------------------------------------
/cspell.json:
--------------------------------------------------------------------------------
1 | {
2 | "version": "0.2",
3 | "language": "en",
4 | "words": [
5 | "Bazel",
6 | "buckconfig",
7 | "CODEOWNERS",
8 | "DVCS",
9 | "Phabricator",
10 | "PTSD",
11 | "subproject",
12 | "subprojects"
13 | ],
14 | "flagWords": []
15 | }
16 |
--------------------------------------------------------------------------------
/CITATION.cff:
--------------------------------------------------------------------------------
1 | cff-version: 1.2.0
2 | title: Monorepo vs. polyrepo
3 | message: >-
4 | If you use this work and you want to cite it,
5 | then you can use the metadata from this file.
6 | type: software
7 | authors:
8 | - given-names: Joel Parker
9 | family-names: Henderson
10 | email: joel@joelparkerhenderson.com
11 | affiliation: joelparkerhenderson.com
12 | orcid: 'https://orcid.org/0009-0000-4681-282X'
13 | identifiers:
14 | - type: url
15 | value: 'https://github.com/joelparkerhenderson/monorepo-vs-polyrepo/'
16 | description: Monorepo vs. polyrepo
17 | repository-code: 'https://github.com/joelparkerhenderson/monorepo-vs-polyrepo/'
18 | abstract: >-
19 | Monorepo vs. polyrepo
20 | license: See license file
21 |
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
1 |
2 | # Contributor Covenant Code of Conduct
3 |
4 | ## Our Pledge
5 |
6 | We as members, contributors, and leaders pledge to make participation in our
7 | community a harassment-free experience for everyone, regardless of age, body
8 | size, visible or invisible disability, ethnicity, sex characteristics, gender
9 | identity and expression, level of experience, education, socio-economic status,
10 | nationality, personal appearance, race, caste, color, religion, or sexual
11 | identity and orientation.
12 |
13 | We pledge to act and interact in ways that contribute to an open, welcoming,
14 | diverse, inclusive, and healthy community.
15 |
16 | ## Our Standards
17 |
18 | Examples of behavior that contributes to a positive environment for our
19 | community include:
20 |
21 | * Demonstrating empathy and kindness toward other people
22 | * Being respectful of differing opinions, viewpoints, and experiences
23 | * Giving and gracefully accepting constructive feedback
24 | * Accepting responsibility and apologizing to those affected by our mistakes,
25 | and learning from the experience
26 | * Focusing on what is best not just for us as individuals, but for the overall
27 | community
28 |
29 | Examples of unacceptable behavior include:
30 |
31 | * The use of sexualized language or imagery, and sexual attention or advances of
32 | any kind
33 | * Trolling, insulting or derogatory comments, and personal or political attacks
34 | * Public or private harassment
35 | * Publishing others' private information, such as a physical or email address,
36 | without their explicit permission
37 | * Other conduct which could reasonably be considered inappropriate in a
38 | professional setting
39 |
40 | ## Enforcement Responsibilities
41 |
42 | Community leaders are responsible for clarifying and enforcing our standards of
43 | acceptable behavior and will take appropriate and fair corrective action in
44 | response to any behavior that they deem inappropriate, threatening, offensive,
45 | or harmful.
46 |
47 | Community leaders have the right and responsibility to remove, edit, or reject
48 | comments, commits, code, wiki edits, issues, and other contributions that are
49 | not aligned to this Code of Conduct, and will communicate reasons for moderation
50 | decisions when appropriate.
51 |
52 | ## Scope
53 |
54 | This Code of Conduct applies within all community spaces, and also applies when
55 | an individual is officially representing the community in public spaces.
56 | Examples of representing our community include using an official e-mail address,
57 | posting via an official social media account, or acting as an appointed
58 | representative at an online or offline event.
59 |
60 | ## Enforcement
61 |
62 | Instances of abusive, harassing, or otherwise unacceptable behavior may be
63 | reported to the community leaders responsible for enforcement at
64 | [INSERT CONTACT METHOD].
65 | All complaints will be reviewed and investigated promptly and fairly.
66 |
67 | All community leaders are obligated to respect the privacy and security of the
68 | reporter of any incident.
69 |
70 | ## Enforcement Guidelines
71 |
72 | Community leaders will follow these Community Impact Guidelines in determining
73 | the consequences for any action they deem in violation of this Code of Conduct:
74 |
75 | ### 1. Correction
76 |
77 | **Community Impact**: Use of inappropriate language or other behavior deemed
78 | unprofessional or unwelcome in the community.
79 |
80 | **Consequence**: A private, written warning from community leaders, providing
81 | clarity around the nature of the violation and an explanation of why the
82 | behavior was inappropriate. A public apology may be requested.
83 |
84 | ### 2. Warning
85 |
86 | **Community Impact**: A violation through a single incident or series of
87 | actions.
88 |
89 | **Consequence**: A warning with consequences for continued behavior. No
90 | interaction with the people involved, including unsolicited interaction with
91 | those enforcing the Code of Conduct, for a specified period of time. This
92 | includes avoiding interactions in community spaces as well as external channels
93 | like social media. Violating these terms may lead to a temporary or permanent
94 | ban.
95 |
96 | ### 3. Temporary Ban
97 |
98 | **Community Impact**: A serious violation of community standards, including
99 | sustained inappropriate behavior.
100 |
101 | **Consequence**: A temporary ban from any sort of interaction or public
102 | communication with the community for a specified period of time. No public or
103 | private interaction with the people involved, including unsolicited interaction
104 | with those enforcing the Code of Conduct, is allowed during this period.
105 | Violating these terms may lead to a permanent ban.
106 |
107 | ### 4. Permanent Ban
108 |
109 | **Community Impact**: Demonstrating a pattern of violation of community
110 | standards, including sustained inappropriate behavior, harassment of an
111 | individual, or aggression toward or disparagement of classes of individuals.
112 |
113 | **Consequence**: A permanent ban from any sort of public interaction within the
114 | community.
115 |
116 | ## Attribution
117 |
118 | This Code of Conduct is adapted from the [Contributor Covenant][homepage],
119 | version 2.1, available at
120 | [https://www.contributor-covenant.org/version/2/1/code_of_conduct.html][v2.1].
121 |
122 | Community Impact Guidelines were inspired by
123 | [Mozilla's code of conduct enforcement ladder][Mozilla CoC].
124 |
125 | For answers to common questions about this code of conduct, see the FAQ at
126 | [https://www.contributor-covenant.org/faq][FAQ]. Translations are available at
127 | [https://www.contributor-covenant.org/translations][translations].
128 |
129 | [homepage]: https://www.contributor-covenant.org
130 | [v2.1]: https://www.contributor-covenant.org/version/2/1/code_of_conduct.html
131 | [Mozilla CoC]: https://github.com/mozilla/diversity
132 | [FAQ]: https://www.contributor-covenant.org/faq
133 | [translations]: https://www.contributor-covenant.org/translations
134 |
135 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Monorepo vs. polyrepo
2 |
3 |
4 |
5 | Monorepo means using one repository that contains many projects, and polyrepo means using a repository per project. This page discusses the similarities and differences, and has advice and opinions on both.
6 |
7 | Contents:
8 |
9 | * [Introduction](#introduction)
10 | * [What is monorepo?](#what-is-monorepo)
11 | * [What is polyrepo?](#what-is-polyrepo)
12 | * [Comparisons](#comparisons)
13 | * [Key similarities](#key-similarities)
14 | * [Key differences](#key-differences)
15 | * [Tooling](#tooling)
16 | * [Bazel](#bazel)
17 | * [moon](#moon)
18 | * [Lerna](#lerna)
19 | * [OctoLinker](#octolinker)
20 | * [Nx](#nx)
21 | * [Monorepo scaling](#monorepo-scaling)
22 | * [Monorepo scaling problem](#monorepo-scaling-problem)
23 | * [Monorepo scaling mitigations](#monorepo-scaling-mitigations)
24 | * [Monorepo scaling metrics](#monorepo-scaling-metrics)
25 | * [Proponents of monorepo](#proponents-of-monorepo)
26 | * [If components need to release together, then use a monorepo](#if-components-need-to-release-together-then-use-a-monorepo)
27 | * [If components need to share common code, then use a monorepo](#if-components-need-to-share-common-code-then-use-a-monorepo)
28 | * [I’ve found monorepos to be extremely valuable in an less-mature, high-churn codebase](#ive-found-monorepos-to-be-extremely-valuable-in-an-less-mature-high-churn-codebase)
29 | * [A common mission](#a-common-mission)
30 | * [Intricacies](#intricacies)
31 | * [Use servers](#use-servers)
32 | * [Clear vs. tribal](#clear-vs-tribal)
33 | * [Proponents of polyrepo](#proponents-of-polyrepo)
34 | * [If tech's biggest names use a monorepo, should we do the same?](#if-techs-biggest-names-use-a-monorepo-should-we-do-the-same)
35 | * [Coupling between unrelated projects](#coupling-between-unrelated-projects)
36 | * [Visible organization](#visible-organization)
37 | * [Opinions about splitting](#opinions-about-splitting)
38 | * [Splitting one repo is easier than combining multiple repos](#splitting-one-repo-is-easier-than-combining-multiple-repos)
39 | * [Splitting may be too fine](#splitting-may-be-too-fine)
40 | * [Opinions about balances](#opinions-about-balances)
41 | * [It's a social problem in how you manage boundaries](#its-a-social-problem-in-how-you-manage-boundaries)
42 | * [Challenges of monorepo and polyrepo](#challenges-of-monorepo-and-polyrepo)
43 | * [On-premise applications or desktop applications](#on-premise-applications-or-desktop-applications)
44 | * [Opinions about alternatives](#opinions-about-alternatives)
45 | * [Could you get the best of both worlds by having a monorepo of submodules?](#could-you-get-the-best-of-both-worlds-by-having-a-monorepo-of-submodules)
46 | * [Hybrid of "many repos"](#hybrid-of-many-repos)
47 | * [Prediction of a new type of VCS](#prediction-of-a-new-type-of-vcs)
48 |
49 | See:
50 |
51 | * [SCM at Facebook](https://github.com/joelparkerhenderson/source_code_management/scm_at_facebook.md)
52 | * [SCM at Google](https://github.com/joelparkerhenderson/source_code_management/scm_at_google.md)
53 | * [Why Google stores billions of lines of code in a single repository (2016) (acm.org)](https://dl.acm.org/citation.cfm?id=2854146)
54 | * [Scaling Mercurial at Facebook](https://code.facebook.com/posts/218678814984400/scaling-mercurial-at-facebook/)
55 | * [Cthulhu: Organizing Go Code in a Scalable Repo](https://blog.digitalocean.com/cthulhu-organizing-go-code-in-a-scalable-repo/)
56 | * [Why use a monorepo build tool?](https://mill-build.org/blog/2-monorepo-build-tool.html)
57 |
58 | Posts with Hacker News discussions:
59 |
60 | * [The Ingredients of a Productive Monorepo](https://blog.swgillespie.me/posts/monorepo-ingredients/) & [discussion](https://news.ycombinator.com/item?id=44086917)
61 | * [Monorepos: Please don’t! - By Matt Klein](https://medium.com/@mattklein123/monorepos-please-dont-e9a279be011b) & [discussion](https://news.ycombinator.com/item?id=18808909)
62 | * [Monorepos and the Fallacy of Scale - By Paulus Esterhazy](https://presumably.de/monorepos-and-the-fallacy-of-scale.html) & [discussion](https://news.ycombinator.com/item?id=18855660)
63 |
64 | Credits:
65 |
66 | * Opinions and comments on this page are thanks to many people on various discussion websites, such as Hacker News, and lightly edited for clarity.
67 |
68 | * If you're the author of an opinion here, and would like to attribute it, or explain more, please let us know and we'll give you commit access.
69 |
70 | ## Introduction
71 |
72 | ### What is monorepo?
73 |
74 | Monorepo is a nickname that means "using one repository for the source code management version control system".
75 |
76 | * A monorepo architecture means using one repository, rather than multiple repositories.
77 |
78 | * For example, a monorepo can use one repo that contains a directory for a web app project, a directory for a mobile app project, and a directory for a server app project.
79 |
80 | * Monorepo is also known as one-repo or uni-repo.
81 |
82 | ### What is polyrepo?
83 |
84 | Polyrepo is a nickname that means "using multiple repositories for the source code management version control system".
85 |
86 | * A polyrepo architecture means using multiple repositories, rather than one repository.
87 |
88 | * For example, a polyrepo can use a repo for a web app project, a repo for a mobile app project, and a repo for a server app project.
89 |
90 | * Polyrepo is also known as many-repo or multi-repo.
91 |
92 | ## Comparisons
93 |
94 |
95 | ### Key similarities
96 |
97 | Key similarities between monorepo and polyrepo:
98 |
99 | * Both architectures ultimately track the same source code files, and do it by using source code management (SCM) version control systems (VCS) such as git or mercurial.
100 |
101 | * Both architectures are proven successful for projects of all sizes.
102 |
103 | * Both architectures are straightforward to implement using any typical SCM VCS, up to a scaling limit.
104 |
105 |
106 | ### Key differences
107 |
108 | Key differences between monorepo and polyrepo, summarized from many proponents, and intending to highlight typical differences between a typical monorepo and a typical polyrepo.
109 |
110 | | | Monorepo | Polyrepo |
111 | | --- | --- | --- |
112 | | Contents | Typically a repo contains multiple projects, programming languages, packaging processes, etc. | Typically a repo contains one project, programming language, packaging process, etc. |
113 | | Projects | Manages projects in one repository, together, holistically. | Manages projects in multiple repositories, separately, independently. |
114 | | Workflows | Enables workflows in all projects simultaneously, all within the monorepo. | Enables workflows in each project one at a time, each in its own repo. |
115 | | Changes | Ensures changes affect all the projects, can be tracked together, tested together, and released together. | Ensures changes affect only one project, can be tracked separately, tested separately, and released separately. |
116 | | Collaboration | Encourages collaboration and code sharing within an organization. | Encourages collaboration and code sharing across organizations. |
117 | | Testing | Good white box testing because all projects are testable together, and verifiable holistically. | Good black box testing because each project is testable separately, and verifiable independently. |
118 | | Releases | Coordinated releases are inherent, yet must use polyglot tooling. | Coordinated releases must be programmed, yet can use vanilla tooling. |
119 | | State | The current state of everything is one commit in one repo. | The current state of everything is a commit per repo. |
120 | | Coupling | Tight coupling of projects. | No coupling of projects. |
121 | | Thinking | Encourages thinking about conjoins among projects. | Encourages thinking about contracts between projects. |
122 | | Access | Access control defaults to all projects. Some teams use tools for finer-grained access control. GitLab and GitHub offer ownership control where you can say who owns what directories for things like approving merge requests that affect those directories (using CODEOWNERS). Google Piper has finer-grained access control. Phabricator offers herald rules that stop a merge from happening if a file has changed in a specific subdirectory. Some teams use service owners, so when a change spans multiple services, they are all added automatically as blocking reviewers. | Access control defaults to per project. Some teams use tools for broader-grained access control. GitHub offers teams where you can say one team owns many projects and for things like approving requests that affect multiple repos. |
123 | | Scaling | Scaling needs specialized tooling. It is currently not practical to use vanilla git with very large repos, or very large files, without any extra tooling. For monorepo scaling, teams invest in writing custom tooling and providing custom training. | Scaling needs specialized coordination. It is currently not practical to use vanilla git with many projects across many repos, where a team wants to coordinate code changes, testing, packaging, and releasing. For polyrepo scaling, teams invest in writing coordination scripts and careful cross-version compatibility. |
124 | | Tooling | Google wrote the tool “bazel”, which tracks internal dependencies by using directed acyclic graphs. | Lyft wrote the tool “refactorator”, which automates making changes in multiple repos, including opening PRs, tracking status, etc. |
187 |
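
The directory ownership idea from the Access row can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the rule list, the team names, and the functions `owner_of` and `required_reviewers` are hypothetical, and the matching is plain prefix matching rather than the real CODEOWNERS pattern syntax. The only behavior borrowed from CODEOWNERS is that a later matching rule overrides an earlier one.

```python
# Hypothetical ownership rules: (directory prefix, owning team).
# A later matching rule overrides an earlier one.
RULES = [
    ("", "@org/everyone"),                    # fallback owner for any path
    ("web/", "@org/web-team"),
    ("server/", "@org/server-team"),
    ("server/billing/", "@org/billing-team"),
]


def owner_of(path, rules):
    """Return the owner named by the last rule whose prefix matches the path."""
    owner = None
    for prefix, team in rules:
        if path.startswith(prefix):
            owner = team
    return owner


def required_reviewers(changed_paths, rules):
    """Collect the owners of every changed path, as blocking reviewers."""
    return {owner_of(path, rules) for path in changed_paths}


# A change that spans two services pulls in both owning teams.
change = ["server/billing/invoice.py", "web/index.html"]
print(sorted(required_reviewers(change, RULES)))
```

In a monorepo, one change can span several directories, so a rule set like this is what turns "access defaults to all projects" into per-directory review requirements.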
188 | ## Tooling
189 |
190 | ### Bazel
191 |
192 | [Bazel](https://github.com/bazelbuild/bazel) is a fast, scalable, multi-language
193 | and extensible build system. Bazel only rebuilds what is necessary. With
194 | advanced local and distributed caching, optimized dependency analysis and
195 | parallel execution, you get fast and incremental builds.
196 |
197 | Bazel requires you to explicitly declare your dependencies for each 'target' you
198 | want to build. These dependencies can be within the same Bazel workspace, or
199 | imported at build time via say git - there's no need to have all the files
200 | directly in your repo.
201 |
202 | The nice thing is you can declare the commit id or file hash for the dependency
203 | you're importing to make sure you're getting what you expect, and keep Bazel's
204 | reproducibility properties.
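
To make "only rebuilds what is necessary" concrete, here is a minimal Python sketch of the idea behind explicit dependency declarations. It is not Bazel's implementation or API: the target names, the `DEPS` mapping, and the `affected_targets` function are hypothetical, and the sketch only shows how a declared dependency graph lets a tool work out which targets a change can affect.

```python
from collections import defaultdict, deque

# Hypothetical targets and their explicitly declared direct dependencies.
DEPS = {
    "//web:app": ["//lib:auth", "//lib:ui"],
    "//mobile:app": ["//lib:auth"],
    "//server:api": ["//lib:auth", "//lib:db"],
    "//lib:auth": ["//lib:crypto"],
    "//lib:ui": [],
    "//lib:db": [],
    "//lib:crypto": [],
}


def affected_targets(deps, changed):
    """Return the changed targets plus everything that depends on them."""
    # Invert the graph: dependency -> targets that depend on it directly.
    dependents = defaultdict(set)
    for target, direct in deps.items():
        for dep in direct:
            dependents[dep].add(target)

    # Walk the reverse edges breadth-first, starting from the changed targets.
    affected = set(changed)
    queue = deque(changed)
    while queue:
        current = queue.popleft()
        for parent in dependents[current]:
            if parent not in affected:
                affected.add(parent)
                queue.append(parent)
    return affected


# Changing the database library only requires rebuilding the server.
print(sorted(affected_targets(DEPS, ["//lib:db"])))
```

Everything outside the returned set can be served from cache, which is why explicit declarations matter: without them, a tool cannot prove that an unrelated project is unaffected.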
205 |
206 | ### moon
207 |
208 | [moon](https://moonrepo.dev/moon) is a multi-language task runner and monorepo
209 | management tool. Like Bazel, it only rebuilds what is necessary, with support
210 | for local and remote caching, dependency analysis, parallel execution,
211 | incremental builds, and even a robust language toolchain.
212 |
213 | moon is not a build system, but is a powerful task runner and organization tool,
214 | so if you're looking for a tool that sits somewhere between Bazel (full
215 | commitment) and Make scripts (no commitment), moon is a great choice.
216 |
217 | ### Lerna
218 |
219 | [Lerna](https://github.com/lerna/lerna) is a tool that optimizes the workflow
220 | around managing multi-package repositories with git and npm. It's primarily for
221 | Node.js based repositories.
222 |
223 | ### OctoLinker
224 |
225 | [OctoLinker](https://github.com/OctoLinker/OctoLinker) really helps when
226 | browsing a polyrepo on GitHub. You can just click the import [project] name and
227 | it will switch to the repo.
228 |
229 | ### Nx
230 |
231 | [Nx](https://nx.dev/) is a build system with monorepo support and powerful
232 | integrations.
233 |
234 | ## Monorepo scaling
235 |
236 | ### Monorepo scaling problem
237 |
238 | Monorepo scaling becomes a problem when a typical developer can't work well with
239 | the code by using typical tools such as vanilla git.
240 |
241 | * Monorepo scaling eventually becomes impractical in terms of space: when a
242 | monorepo grows to have more data than fits on a developer's laptop, then the
243 | developer cannot fetch the monorepo, and it may be impractical to obtain
244 | more storage space.
245 |
246 | * Monorepo scaling eventually becomes impractical in terms of time: when a
247 | monorepo grows, then a complete file transfer takes more time, and in
248 | practice, there are other operations that also take more time, such as git
249 | pruning and git repacking.
250 |
251 | * A monorepo may grow so large and contain so many projects that it takes too much
252 | mental effort to work across projects, such as for searching, editing, and
253 | isolating changes.
254 |
255 | ### Monorepo scaling mitigations
256 |
257 | Monorepo scaling can be improved by:
258 |
259 | * Some type of virtual file system (VFS) that allows a portion of the code to be
260 | present locally. This might be accomplished via a proprietary VCS like
261 | Perforce which natively operates this way, or via Google’s “G3” internal
262 | tooling, or via Microsoft’s GVFS.
263 |
264 | * Sophisticated source code indexing/searching/discovery capabilities as a
265 | service. This is because a typical developer is not going to have all the
266 | source code locally, in a searchable state, using vanilla tooling.
267 |
268 | ### Monorepo scaling metrics
269 |
270 | Monorepo scaling seems to become an issue, in practice, at approximately these
271 | kinds of metrics:
272 |
273 | * 10-100 developers writing code full time.
274 |
275 | * 10-100 projects in progress at the same time.
276 |
277 | * 10-100 packaging processes during the same time period, such as a daily release.
278 |
279 | * 1K-10K versioned dependencies, such as Node modules, Python packages, Ruby gems, etc.
280 |
281 | * 1M-10M lines of code
282 |
283 | ## Proponents of monorepo
284 |
285 | ### If components need to release together, then use a monorepo
286 |
287 | If you think components might need to release together then they should go in
288 | the same repo, because you can in fact pretty easily manage projects with
289 | different release schedules from the same repo if you really need to.
290 |
291 | On the other hand if you've got a whole bunch of components in different repos
292 | which need to release together it suddenly becomes a real pain.
293 |
294 | ### If components need to share common code, then use a monorepo
295 |
296 | If you have components that will never need to release together, then of course
297 | you can stick them in different repositories-- but if you do this and you want
298 | to share common code among the repositories, then you will need to manage that
299 | code with some sort of robust versioning system, and robust versioning systems
300 | are hard. Only do something like that when the value is high enough to justify
301 | the overhead. If you're in a startup, chances are very good that the value is
302 | not high enough.
303 |
304 | ### I’ve found monorepos to be extremely valuable in an less-mature, high-churn codebase
305 |
306 | Need to change a function signature or interface? Cool, global find & replace.
307 |
308 | At some point a monorepo outgrows its usefulness. The sheer number of files in
309 | something that’s 10K+ LOC (not that large, I know) warrants breaking apart the
310 | codebase into packages.
311 |
312 | Still, I almost err on the side of monorepos because of the convenience that
313 | editors like vscode offer: autocomplete, auto-updating imports, etc.
314 |
315 | ### A common mission
316 |
317 | I find it helpful to think of a company as a group of people engaged in a common
318 | mission. The company pursues its mission through multiple subprojects, and every
319 | decision taken and every code change introduced is a step towards its primary
320 | goal. The code base is a chunk of the company's institutional knowledge about
321 | its overarching goal and means to that end.
322 |
323 | Looking at it from this perspective, a monorepo can be seen as the most natural
324 | expression of the fact that all team members are engaged in a single, if
325 | multi-faceted, enterprise.
326 |
327 | ### Intricacies
328 |
329 | Monorepos are great if set up correctly, but there are a lot of intricacies that
330 | often go on behind the scenes to make a monorepo successful, and they're easy to
331 | overlook since usually some "other" team (devops team, devtools team, etc.) is
332 | shouldering all that burden. Still worth it, but must be approached with caution.
333 |
334 | ### Use servers
335 |
336 | For our monorepo, most development was done either on your development server
337 | running in a datacenter (think ~50-100 cores) - or on an "on demand" machine that
338 | was like a short lived container that generally stayed up to date with known
339 | good commits every few hours. IDE was integrated with devservers / machines &
340 | generally language servers, other services were prewarmed or automatically setup
341 | via chef/ansible, etc. Rarely would you want to run the larger monorepos on your
342 | laptop client (exception would generally be mobile apps, Mac OS apps, etc.).
343 |
344 | ### Clear vs. tribal
345 |
346 | As a former IC at a large monorepo company, I preferred monorepos over
347 | polyrepos. It was "THE" monorepo, and it made understanding the company's
348 | service graph, call graph, ownership graph, etc. incredibly clear. Polyrepos
349 | are tribal knowledge. You don't know where anything lives and you can't look or
350 | discover it. Every team does their own thing. Inheriting new code is a curse.
351 | Code archeology feels like an adventure in root cause analysis in a library of
352 | hidden and cryptic tomes.
353 |
354 | ## Proponents of polyrepo
355 |
356 | ### If tech's biggest names use a monorepo, should we do the same?
357 |
358 | Some of tech’s biggest names use a monorepo, including Google, Facebook,
359 | Twitter, and others. Surely if these companies all use a monorepo, the benefits
360 | must be tremendous, and we should all do the same, right? Wrong!
361 |
362 | Why? Because, at scale, a monorepo must solve every problem that a polyrepo must
363 | solve, with the downside of encouraging tight coupling, and the additional
364 | herculean effort of tackling VCS scalability.
365 |
366 | Thus, in the medium to long term, a monorepo provides zero organizational
367 | benefits, while inevitably leaving some of an organization’s best engineers with
368 | a wicked case of PTSD (manifested via drooling and incoherent mumbling about git
369 | performance internals).
370 |
371 | ### Coupling between unrelated projects
372 |
373 | I worry about the monorepo coupling between unrelated products. I admit
374 | part of this probably comes from my more libertarian world view, but I have seen
375 | something as basic as a server upgrade schedule that is tailored for one product
376 | severely hurt the development of another product, to the point of almost halting
377 | development for months. I can't imagine needing a new feature or a bug fix from
378 | a dependency but being stuck because the whole company isn't ready to upgrade.
379 |
380 | I've read of at least one less serious case of this from Google with JUnit: "In
381 | 2007, Google tried to upgrade their JUnit from 3.8.x to 4.x and struggled as
382 | there was a subtle backward incompatibility in a small percentage of their
383 | usages of it. The change-set became very large, and struggled to keep up with
384 | the rate developers were adding tests."
385 |
386 | ### Visible organization
387 |
388 | I argue that a visible organization of a codebase into repositories makes it
389 | easier to reuse code in the same way that interface/implementation splits do: it
390 | makes it clearer which parts felt domain-specific and which felt like reusable
391 | libraries.
392 |
393 | Being able to represent "not directly involved, but versioned together" and
394 | "separate enough to be versioned separately" is a very valuable distinction to
395 | have in your toolbox.
396 |
397 | Once your team is large enough that your developers are not all attending the
398 | same standup, then you should be working in multiple repositories. You need to
399 | have a release cycle with SemVer etc. so that developers who aren't in close
400 | communication with you can understand the impact of changes to your code area.
401 | Since tags are repository-global, the repository should be the unit of
402 | versioning/releasing.
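
As a rough illustration of how a version number communicates impact to developers outside your team, here is a minimal Python sketch. It assumes plain `MAJOR.MINOR.PATCH` strings and ignores pre-release tags, build metadata, and the looser rules for `0.x` versions. The function names `parse` and `change_impact` are hypothetical.

```python
def parse(version):
    """Split a plain 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def change_impact(old, new):
    """Classify an upgrade by the most significant version part that changed."""
    old_v, new_v = parse(old), parse(new)
    if new_v[0] != old_v[0]:
        return "breaking"   # major bump: consumers may need code changes
    if new_v[1] != old_v[1]:
        return "feature"    # minor bump: backward-compatible additions
    if new_v[2] != old_v[2]:
        return "fix"        # patch bump: backward-compatible bug fixes
    return "none"


print(change_impact("1.4.2", "2.0.0"))
print(change_impact("1.4.2", "1.5.0"))
print(change_impact("1.4.2", "1.4.3"))
```

A consumer in another repository can apply this kind of check to a tag name alone, without reading the diff, which is the point of making the repository the unit of versioning.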
403 |
404 | ## Opinions about splitting
405 |
406 | ### Splitting one repo is easier than combining multiple repos
407 |
408 | You can split big repositories into smaller ones quite easily (in Git anyway).
409 | If you only need to do this once, then subtree will do the job, even retaining
410 | all your history if you want. As another way to split, you can duplicate the
411 | repo and pull trees out of each dupe in normal commits.
412 |
413 | But combining small repositories together into a bigger repo is a lot harder.
414 |
415 | So start out with a monorepo. Only split a monorepo into multiple smaller
416 | repositories when you're clear that it really makes sense.
417 |
418 | ### Splitting may be too fine
419 |
420 | My problem with polyrepo is that often organizations end up splitting things too
421 | finely, and now I'm unable to make a single commit to introduce a feature
422 | because my changes have to live across several repositories.
423 |
424 | This makes code review more annoying because you have to tab back and forth to
425 | see all the context.
426 |
427 | This makes it worse to make changes to fundamental (internal) libraries used by
428 | every project. It's too much hassle to track down all the uses of a particular
429 | function, so I end up putting that change elsewhere, which means someone else
430 | will do it a little differently in their corner of the world, which utterly
431 | confuses the first person who's unlucky enough to work in both code bases (at
432 | the same time, or after moving teams).
433 |
434 | ## Opinions about balances
435 |
436 | ### It's a social problem in how you manage boundaries
437 |
438 | Among many of the major monorepos, boundaries still exist, they just become far
439 | more opaque because no one has to track them. You find the weird gatekeepers in
440 | the dark that spring out only when you get late in your code review process
441 | because you touched "their" file and they got an automated notice from a hidden
442 | rules engine in your CI process you didn't even realize existed.
443 |
444 | In the polyrepo case those boundaries have to be made explicit (otherwise no one
445 | gets anything done) and those owners should be easily visible. You may not like
446 | the friction they sometimes bring to the table, but at least it won't be a
447 | surprise.
448 |
449 | ### Challenges of monorepo and polyrepo
450 |
451 | My last 2 jobs have been working on developer productivity for 100+ developer
452 | organizations. One organization uses monorepo, one organization uses polyrepo.
453 |
454 | Neither really seems to result in less work, or a better experience. But I've
455 | found that your choice just dictates what type of problems you have to solve.
456 |
457 | Monorepo involves mostly challenges around scaling the organization in a single
458 | repo.
459 |
460 | Polyrepo involves mostly challenges with coordination.
461 |
462 | ### On-premise applications or desktop applications
463 |
464 | If you're creating on-premise applications or desktop applications (things
465 | requiring long lived release branches), the discussion is totally different.
466 |
467 | I find that shipped software actually operates better in a normal, branched
468 | monorepo. You just branch the whole thing. The alternative is several repos and
469 | using a package manager, which minimizes merging in many branches, as you can
470 | just point the package manager at the updated module version, but brings its own
471 | hassles.
472 |
473 | I've worked on projects where there were 6-7 major branches active at the same
474 | time and several smaller ones, besides the master branch. Then you'd have to
475 | merge everywhere applicable, etc. This is a totally different approach from the
476 | Google monorepo approach of "master only", basically. And probably one of the
477 | main reasons why Golang is having a ton of difficulties in the outside world by
478 | not having a proper versioning story.
479 |
480 | Comment: Once you are shipping software off-prem, you need to patch it between
481 | major and minor releases. Typically, one way to do that is to branch at release
482 | time to a branch named for the release, say 1.2. Then when issues pop up, you
483 | fix them in that branch, then see whether the fix applies to the trunk or other
484 | branches after that.
485 |
486 | ## Opinions about alternatives
487 |
488 | ### Could you get the best of both worlds by having a monorepo of submodules?
489 |
490 | Code would live in separate repos, but references would be declared in the
491 | monorepo. Check-ins and rollbacks to the monorepo would trigger CI.
492 |
493 | Answer: There's not much good to either world. You need fairly extensive tooling
494 | to make working with a repo of submodules comfortable at any scale. At large
495 | scale, that tooling can be simpler than the equivalent monorepo tooling,
496 | assuming that your individual repos remain "small" but also appropriately
497 | granular (not a given--organizing is hard, especially if you leave it to
498 | individual project teams). However, in the process of getting there, a monorepo
499 | requires no particular bespoke tooling at small or even medium scale (it's just
500 | "a repo"), and the performance problems pretty much scale up smoothly from
501 | there. And those can be treated as technical problems if you don't want to
502 | approach social problems.
503 |
504 | Answer: We actually did this. When I started at Uber ATG one of our devs made a
505 | submodule called `uber_monorepo` that was linked from the root of our git repo.
506 | In our repo's `.buckconfig` file we had access to everything that the mobile
507 | developers at Uber had access to by prefixing our targets with
508 | `//uber_monorepo/`. We did however run into the standard dependency resolution
509 | issue when you have any loosely coupled dependency. Updating our submodule
510 | usually required a 1-2 day effort because we were out of sync for a month or
511 | two.
512 |
513 | ### Hybrid of "many repos"
514 |
515 | We consider our org to be "many repos". We have several thousands. However,
516 | hundreds of them contain 5, 10, or 20+ packages/projects/services. It's funny
517 | because we'll talk about creating "monorepos" (plural) for certain parts of our
518 | product, and it confuses people.
519 |
520 | There are a few thousand libraries, which obviously refer to each other. Some
521 | have readme files and that's enough, some have full documentation "books", some
522 | have comments in the code and that's enough.
523 |
524 | We don't mandate a company wide development process, so each team and group can
525 | choose their own process and how they track their stuff.
526 |
527 | We do have automation and tooling to keep track of things though.
528 |
529 | ### Prediction of a new type of VCS
530 |
531 | What we all really want is a VCS where repos can be combined and separated
532 | easily, or where one repo can gain the benefits of a monorepo without the
533 | drawbacks of one.
534 |
535 | Prediction: just as DVCS killed off pre-DVCS practically overnight, the thing
536 | that will quickly kill off DVCS is a new type of VCS where you can trivially
537 | combine/separate repos and sections of repos as needed. You can assign, at the
538 | repo level, sub-repos to include in this one and get an atomic commit hash for
539 | the state of the whole thing. Your VCS client doesn't need to actually download
540 | every linked repo, yet tools are available that act as if it had.
541 |
542 | In a sense, we already have all of these features, in folders. You can combine
543 | and separate them, you can make a local folder mimic a folder on a remote
544 | system, and access its content without needing to download it all ahead of time.
545 | They just don't have any VCS features baked in. We've got {file systems, network
546 | file systems, and VCS}, and each of the three has some features the others would
547 | like.
548 |
--------------------------------------------------------------------------------