├── .nojekyll ├── .gitignore ├── assets └── images │ └── swe_at_google.2.cover.jpg ├── zh-cn ├── Chapter-17_Code_Search │ └── images │ │ ├── Figure 17-1.png │ │ ├── Figure 17-2.png │ │ ├── Figure 17-3.png │ │ └── Figure 17-4.png ├── Chapter-14_Larger_Testing │ └── images │ │ ├── Figure 14-1.png │ │ ├── Figure 14-2.png │ │ ├── Figure 14-3.png │ │ ├── Figure 14-4.png │ │ ├── Figure 14-5.png │ │ └── Figure 14-6.png ├── Chapter-20_Static_Analysis │ ├── images │ │ ├── Figure 20-1.png │ │ └── Figure 20-2.png │ └── Chapter-20_Static_Analysis.md ├── Chapter-6_Leading_at_Scale │ └── images │ │ ├── Figure 6-1.png │ │ └── Figure 6-2.png ├── Chapter-21_Dependency_Management │ └── images │ │ └── Figure 21-1.png ├── Chapter-23_Continuous_Integration │ └── images │ │ ├── Figure 23-1.png │ │ ├── Figure 23-2.png │ │ ├── Figure 23-3.png │ │ ├── Figure 23-4.png │ │ └── Figure 23-5.png ├── Chapter-1_What_Is_Software_Engineering │ └── images │ │ ├── figure 1-1.png │ │ └── figure 1-2.png ├── Chapter-11_Testing_Overview │ └── images │ │ ├── image-20220407195517053.png │ │ ├── image-20220407195824423.png │ │ ├── image-20220407200232089.png │ │ ├── image-20220407200917862.png │ │ └── image-20220407201117705.png ├── Chapter-19_Critique_Googles_Code_Review_Tool │ └── images │ │ ├── Figure 19-1.png │ │ ├── Figure 19-2.png │ │ ├── Figure 19-3.png │ │ ├── Figure 19-4.png │ │ ├── Figure 19-5.png │ │ ├── Figure 19-6.png │ │ ├── Figure 19-7.png │ │ └── Figure 19-8.png ├── Chapter-18_Build_Systems_and_Build_Philosophy │ └── images │ │ ├── Figure 18-1.jpg │ │ ├── Figure 18-2.jpg │ │ ├── Figure 18-3.png │ │ ├── Figure 18-4.png │ │ └── Figure 18-5.png ├── Chapter-16_Version_Control_and_Branch_Management │ └── images │ │ └── Figure 16-1.png ├── Afterword.md ├── Foreword.md ├── Preface.md ├── Chapter-24_Continuous_Delivery │ └── Chapter-24_Continuous_Delivery.md ├── Chapter-4_Engineering_for_Equity │ └── Chapter-4_Engineering_for_Equity.md └── Chapter-15_Deprecation │ └── Chapter-15_Deprecation.md ├── 
_coverpage.md ├── _sidebar.md ├── index.html └── README.md /.nojekyll: -------------------------------------------------------------------------------- 1 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | node_modules 2 | .temp 3 | .cache 4 | .idea 5 | .DS_Store 6 | -------------------------------------------------------------------------------- /assets/images/swe_at_google.2.cover.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/assets/images/swe_at_google.2.cover.jpg -------------------------------------------------------------------------------- /zh-cn/Chapter-17_Code_Search/images/Figure 17-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-17_Code_Search/images/Figure 17-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-17_Code_Search/images/Figure 17-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-17_Code_Search/images/Figure 17-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-17_Code_Search/images/Figure 17-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-17_Code_Search/images/Figure 17-3.png -------------------------------------------------------------------------------- /zh-cn/Chapter-17_Code_Search/images/Figure 17-4.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-17_Code_Search/images/Figure 17-4.png -------------------------------------------------------------------------------- /zh-cn/Chapter-14_Larger_Testing/images/Figure 14-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-14_Larger_Testing/images/Figure 14-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-14_Larger_Testing/images/Figure 14-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-3.png -------------------------------------------------------------------------------- /zh-cn/Chapter-14_Larger_Testing/images/Figure 14-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-4.png -------------------------------------------------------------------------------- /zh-cn/Chapter-14_Larger_Testing/images/Figure 14-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-5.png -------------------------------------------------------------------------------- 
/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-14_Larger_Testing/images/Figure 14-6.png -------------------------------------------------------------------------------- /zh-cn/Chapter-20_Static_Analysis/images/Figure 20-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-20_Static_Analysis/images/Figure 20-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-20_Static_Analysis/images/Figure 20-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-20_Static_Analysis/images/Figure 20-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-6_Leading_at_Scale/images/Figure 6-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-6_Leading_at_Scale/images/Figure 6-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-6_Leading_at_Scale/images/Figure 6-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-6_Leading_at_Scale/images/Figure 6-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-21_Dependency_Management/images/Figure 21-1.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-21_Dependency_Management/images/Figure 21-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-3.png -------------------------------------------------------------------------------- /zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-4.png -------------------------------------------------------------------------------- /zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-23_Continuous_Integration/images/Figure 23-5.png 
-------------------------------------------------------------------------------- /zh-cn/Chapter-1_What_Is_Software_Engineering/images/figure 1-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-1_What_Is_Software_Engineering/images/figure 1-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-1_What_Is_Software_Engineering/images/figure 1-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-1_What_Is_Software_Engineering/images/figure 1-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-11_Testing_Overview/images/image-20220407195517053.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-11_Testing_Overview/images/image-20220407195517053.png -------------------------------------------------------------------------------- /zh-cn/Chapter-11_Testing_Overview/images/image-20220407195824423.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-11_Testing_Overview/images/image-20220407195824423.png -------------------------------------------------------------------------------- /zh-cn/Chapter-11_Testing_Overview/images/image-20220407200232089.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-11_Testing_Overview/images/image-20220407200232089.png -------------------------------------------------------------------------------- 
/zh-cn/Chapter-11_Testing_Overview/images/image-20220407200917862.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-11_Testing_Overview/images/image-20220407200917862.png -------------------------------------------------------------------------------- /zh-cn/Chapter-11_Testing_Overview/images/image-20220407201117705.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-11_Testing_Overview/images/image-20220407201117705.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-1.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-2.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-3.png -------------------------------------------------------------------------------- 
/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-4.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-5.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-6.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-6.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-7.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-7.png -------------------------------------------------------------------------------- /zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-8.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/images/Figure 19-8.png -------------------------------------------------------------------------------- 
/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-1.jpg -------------------------------------------------------------------------------- /zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-2.jpg -------------------------------------------------------------------------------- /zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-3.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-3.png -------------------------------------------------------------------------------- /zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-4.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-4.png -------------------------------------------------------------------------------- /zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-5.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/images/Figure 18-5.png -------------------------------------------------------------------------------- 
/zh-cn/Chapter-16_Version_Control_and_Branch_Management/images/Figure 16-1.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/qiangmzsx/Software-Engineering-at-Google/HEAD/zh-cn/Chapter-16_Version_Control_and_Branch_Management/images/Figure 16-1.png -------------------------------------------------------------------------------- /_coverpage.md: -------------------------------------------------------------------------------- 1 | ![logo](assets/images/swe_at_google.2.cover.jpg ':size=20%') 2 | 3 | 4 | # Software Engineering at Google 中文版 5 | 6 | > Lessons Learned from Programming Over Time 7 | 8 | 9 | [GitHub](https://github.com/qiangmzsx/Software-Engineering-at-Google) 10 | [Get Started](#software-engineering-at-google) 11 | -------------------------------------------------------------------------------- /_sidebar.md: -------------------------------------------------------------------------------- 1 | - [前言](zh-cn/Foreword.md) 2 | - [序言](zh-cn/Preface.md) 3 | - [第一章 软件工程是什么?](zh-cn/Chapter-1_What_Is_Software_Engineering/Chapter-1_What_Is_Software_Engineering.md) 4 | - [第二章 如何融入团队](zh-cn/Chapter-2_How_to_Work_Well_on_Teams/Chapter-2_How_to_Work_Well_on_Teams.md) 5 | - [第三章 知识共享](zh-cn/Chapter-3_Knowledge_Sharing/Chapter-3_Knowledge_Sharing.md) 6 | - [第四章 公平工程](zh-cn/Chapter-4_Engineering_for_Equity/Chapter-4_Engineering_for_Equity.md) 7 | - [第五章 如何领导团队](zh-cn/Chapter-5_How_to_Lead_a_Team/Chapter-5_How_to_Lead_a_Team.md) 8 | - [第六章 规模优先](zh-cn/Chapter-6_Leading_at_Scale/Chapter-6_Leading_at_Scale.md) 9 | - [第七章 测量工程效率](zh-cn/Chapter-7_Measuring_Engineering_Productivity/Chapter-7_Measuring_Engineering_Productivity.md) 10 | - [第八章 风格指南和规则](zh-cn/Chapter-8_Style_Guides_and_Rules/Chapter-8_Style_Guides_and_Rules.md) 11 | - [第九章 代码审查](zh-cn/Chapter-9_Code_Review/Chapter-9_Code_Review.md) 12 | - [第十章 文档](zh-cn/Chapter-10_Documentation/Chapter-10_Documentatio.md) 13 | - [第十一章 
测试概述](zh-cn/Chapter-11_Testing_Overview/Chapter-11_Testing_Overview.md) 14 | - [第十二章 单元测试](zh-cn/Chapter-12_Unit_Testing/Chapter-12_Unit_Testing.md) 15 | - [第十三章 测试替代](zh-cn/Chapter-13_Test_Doubles/Chapter-13_Test_Doubles.md) 16 | - [第十四章 大型测试](zh-cn/Chapter-14_Larger_Testing/Chapter-14_Larger_Testing.md) 17 | - [第十五章 废弃](zh-cn/Chapter-15_Deprecation/Chapter-15_Deprecation.md) 18 | - [第十六章 版本控制和分支管理](zh-cn/Chapter-16_Version_Control_and_Branch_Management/Chapter-16_Version_Control_and_Branch_Management.md) 19 | - [第十七章 代码搜索](zh-cn/Chapter-17_Code_Search/Chapter-17_Code_Search.md) 20 | - [第十八章 构建系统,构建理念](zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/Chapter-18_Build_Systems_and_Build_Philosophy.md) 21 | - [第十九章 体验:谷歌的代码审查工具](zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/Chapter-19_Critique_Googles_Code_Review_Tool.md) 22 | - [第二十章 静态分析](zh-cn/Chapter-20_Static_Analysis/Chapter-20_Static_Analysis.md) 23 | - [第二十一章 依赖管理](zh-cn/Chapter-21_Dependency_Management/Chapter-21_Dependency_Management.md) 24 | - [第二十二章 大规模变更](zh-cn/Chapter-22_Large-Scale_Changes/Chapter-22_Large-Scale_Changes.md) 25 | - [第二十三章 持续集成](zh-cn/Chapter-23_Continuous_Integration/Chapter-23_Continuous_Integration.md) 26 | - [第二十四章 持续交付](zh-cn/Chapter-24_Continuous_Delivery/Chapter-24_Continuous_Delivery.md) 27 | - [第二十五章 计算即服务](zh-cn/Chapter-25_Compute_as_a_Service/Chapter-25_Compute_as_a_Service.md) 28 | - [后记](zh-cn/Afterword.md) 29 | 30 | -------------------------------------------------------------------------------- /zh-cn/Afterword.md: -------------------------------------------------------------------------------- 1 | # 后记 2 | Software engineering at Google has been an extraordinary experiment in how to develop and maintain a large and evolving codebase. I’ve seen engineering teams break ground on this front during my time here, moving Google forward both as a company that touches billions of users and as a leader in the tech industry. 
This wouldn’t have been possible without the principles outlined in this book, so I’m very excited to see these pages come to life. 3 | 4 | 谷歌的软件工程是一个非凡的实验,探索如何开发和维护一个庞大且不断演进的代码库。在我在这里的时间里,我见证了工程团队在这方面取得的突破,将谷歌推向了一个既触及数十亿用户又是科技行业领导者的公司。如果没有本书中概述的原则,这将是不可能的,因此我非常兴奋地看到这些内容变为现实。 5 | 6 | If the past 50 years (or the preceding pages here) have proven anything, it’s that software engineering is far from stagnant. In an environment in which technology is steadily changing, the software engineering function holds a particularly important role within a given organization. Today, software engineering principles aren’t simply about how to effectively run an organization; they’re about how to be a more responsible company for users and the world at large. 7 | 8 | 如果过去的50年(或前面的内容)证明了什么,那就是软件工程远没有停滞不前。在技术不断变化的环境中,软件工程功能在组织中扮演着特别重要的角色。今天,软件工程原则不仅仅是关于如何有效地运行一个组织;它们关注的是如何成为一家对用户和整个世界更负责任的公司。 9 | 10 | Solutions to common software engineering problems are not always hidden in plain sight—most require a certain level of resolute agility to identify solutions that will work for current-day problems and also withstand inevitable changes to technical systems. This agility is a common quality of the software engineering teams I’ve had the privilege to work with and learn from since joining Google back in 2008. 11 | 12 | 解决常见的软件工程问题并非总是显而易见的事情--大多数问题需要一定程度的果断敏捷性,以确定能够解决当前问题的解决方案,同时也能承受技术系统不可避免的变化。这种敏捷性是我自2008年加入谷歌以来,有幸与之合作和学习的软件工程团队普遍具备的一种品质。 13 | 14 | The idea of sustainability is also central to software engineering. Over a codebase’s expected lifespan, we must be able to react and adapt to changes, be that in product direction, technology platforms, underlying libraries, operating systems, and more. Today, we rely on the principles outlined in this book to achieve crucial flexibility in changing pieces of our software ecosystem. 
15 | 16 | 可持续性的概念对软件工程也至关重要。在代码库的预期生命周期内,我们必须能够对变化做出反应和适应,无论是在产品方向、技术平台、底层库、操作系统还是其他方面。今天,我们依靠本书中概述的原则,在改变软件生态系统的各个部分时实现至关重要的灵活性。 17 | 18 | We certainly can’t prove that the ways we’ve found to attain sustainability will work for every organization, but I think it’s important to share these key learnings. Software engineering is a new discipline, so very few organizations have had the chance to achieve both sustainability and scale. By providing this overview of what we’ve seen, as well as the bumps along the way, our hope is to demonstrate the value and feasibility of long-term planning for code health. The passage of time and the importance of change cannot be ignored. 19 | 20 | 我们当然不能证明我们找到的实现可持续性的途径对每个组织都有效,但我认为分享这些关键的经验很重要。软件工程是一门新的学科,所以很少有组织有机会同时实现可持续性和规模化。通过概述我们所看到的,以及沿途的坎坷,我们希望展示对代码健康进行长期规划的价值和可行性。时间的推移和变化的重要性是不容忽视的。 21 | 22 | This book outlines some of our key guiding principles as they relate to software engineering. At a high level, it also illuminates the influence of technology on society. As software engineers, it’s our responsibility to ensure that our code is designed with inclusion, equity, and accessibility for everyone. Building for the sole purpose of innovation is no longer acceptable; technology that helps only a set of users isn’t innovative at all. 23 | 24 | 本书概述了我们与软件工程相关的一些关键指导原则。从宏观层面上看,它还阐明了技术对社会的影响。作为软件工程师,我们有责任确保我们的代码设计具有包容性、公平性和可访问性。以创新为唯一目的的构建不再被接受;只帮助一个群体的技术根本就不是创新。 25 | 26 | Our responsibility at Google has always been to provide developers, internally and externally, with a well-lit path. With the rise of new technologies like artificial intelligence, quantum computing, and ambient computing, there’s still plenty for us to learn as a company. I’m particularly excited to see where the industry takes software engineering in the coming years, and I’m confident that this book will help shape that path. 
27 | 28 | 我们在谷歌的责任一直是为内部和外部的开发者提供一条光明的道路。随着人工智能、量子计算和环境计算等新技术的兴起,作为一家公司,我们仍有很多东西需要学习。我特别期待看到软件行业在未来几年的发展方向,我相信这本书将有助于塑造这条路。 29 | 30 | —*Asim Husain* 31 | 32 | *Vice President of Engineering, Google* 33 | 34 | 35 | -------------------------------------------------------------------------------- /zh-cn/Foreword.md: -------------------------------------------------------------------------------- 1 | ## Foreword 前言 2 | 3 | I have always been endlessly fascinated with the details of how Google does things. I have grilled my Googler friends for information about the way things really work inside of the company. How do they manage such a massive, monolithic code repository without falling over? How do tens of thousands of engineers successfully collaborate on thousands of projects? How do they maintain the quality of their systems? 4 | 5 | 我对谷歌做事的细节着迷不已。我也曾向在谷歌工作的朋友问询谷歌内部如何运作。他们是如何管理如此庞大的单体代码库而不出错的?数以万计的工程师是如何在数千个项目上成功协作的?他们是如何保持系统的质量的? 6 | 7 | Working with former Googlers has only increased my curiosity. If you’ve ever worked with a former Google engineer (or “Xoogler,” as they’re sometimes called), you’ve no doubt heard the phrase “at Google we…” Coming out of Google into other companies seems to be a shocking experience, at least from the engineering side of things. As far as this outsider can tell, the systems and processes for writing code at Google must be among the best in the world, given both the scale of the company and how often people sing their praises. 8 | 9 | 与前谷歌员工一起共事,只会增加我的好奇心。如果你曾经与前谷歌工程师(或他们有时称之为“Xoogler”)一起工作,你无疑听到过这样一句话:"在谷歌我们......" 从谷歌出来进入其他公司似乎是一种令人震惊的经历,至少从工程方面来说是这样。就我这个局外人而言,考虑到公司的规模和人们对其的赞誉程度,谷歌公司编写代码的系统和流程一定是世界上最好的之一。 10 | 11 | In *Software Engineering at Google*, a set of Googlers (and some Xooglers) gives us a lengthy blueprint for many of the practices, tools, and even cultural elements that underlie software engineering at Google. 
It’s easy to overfocus on the amazing tools that Google has built to support writing code, and this book provides a lot of details about those tools. But it also goes beyond simply describing the tooling to give us the philosophy and processes that the teams at Google follow. These can be adapted to fit a variety of circumstances, whether or not you have the scale and tooling. To my delight, there are several chapters that go deep on various aspects of automated testing, a topic that continues to meet with too much resistance in our industry. 12 | 13 | 在*《Google的软件工程》*中,一组Googlers(和一些Xooglers)为我们提供了谷歌软件工程的许多实践、工具甚至文化元素的详细蓝图。我们很容易过度关注谷歌为支持编写代码而构建的神奇工具,本书提供了很多关于这些工具的细节。但本书不仅仅是简单地描述工具,还为我们提供了谷歌团队遵循的理念和流程。这些都可以适应各种情况,无论你是否有这样的规模和工具。令我兴奋的是,有几个章节深入探讨了自动化测试的各个方面,这个话题在我们的行业中仍然遇到太多的阻力。 14 | 15 | The great thing about tech is that there is never only one way to do something. Instead, there is a series of trade-offs we all must make depending on the circumstances of our team and situation. What can we cheaply take from open source? What can our team build? What makes sense to support for our scale? When I was grilling my Googler friends, I wanted to hear about the world at the extreme end of scale: resource rich, in both talent and money, with high demands on the software being built. This anecdotal information gave me ideas on some options that I might not otherwise have considered. 16 | 17 | 技术的伟大之处在于,做一件事永远不会只有一种方法。相反,有一系列的权衡,我们都必须根据我们的团队和现状来选择。我们可以从开放源码中低成本地获取什么?我们的团队可以创建什么?就我们的规模而言,支持什么才有意义?当我在询问我的Googler朋友时,我想听听处于规模之巅的世界:要钱有钱,要人有人,对正在构建的软件要求很高。这些信息给了我一些想法,这些想法可能是我没有思考过的。 18 | 19 | With this book, we’ve written down those options for everyone to read. Of course, Google is a unique company, and it would be foolish to assume that the right way to run your software engineering organization is to precisely copy their formula. 
Applied practically, this book will give you ideas on how things could be done, and a lot of information that you can use to bolster your arguments for adopting best practices like testing, knowledge sharing, and building collaborative teams. 20 | 21 | 通过这本书,我们把这些选择写下来供大家阅读。当然,谷歌是一家独一无二的公司,如果认为运行你的软件工程组织的正确方法是精确地复制他们的模式,那就太愚蠢了。在实际应用中,这本书会给你提供关于如何做事情的想法,以及很多信息,你可以用这些信息来支持你采用最佳实践的论据,如测试、知识共享和建立协作团队。 22 | 23 | You may never need to build Google yourself, and you may not even want to reach for the same techniques they apply in your organization. But if you aren’t familiar with the practices Google has developed, you’re missing a perspective on software engineering that comes from tens of thousands of engineers working collaboratively on software over the course of more than two decades. That knowledge is far too valuable to ignore. 24 | 25 | 你可能永远不需要自己创建谷歌,你甚至可能不想在你的组织中使用他们所应用的技术。但是,如果你不熟悉谷歌开发的实践,你就会错过一个关于软件工程的视角,这个视角来自于二十多年来数万名工程师在软件上的协作。这些知识太有价值了,不能忽视。 26 | 27 | 28 | 29 | *— Camille Fournier* *Author,* The Manager’s Path -------------------------------------------------------------------------------- /index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | Software Engineering at Google 6 | 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
20 | 21 | 115 | 116 | 117 | 118 | 119 | 120 | 121 | 122 | 123 | 124 | 125 | 126 | 127 | 128 | 129 | 130 | 151 | 152 | 153 | 154 | -------------------------------------------------------------------------------- /zh-cn/Preface.md: -------------------------------------------------------------------------------- 1 | ## Preface 序言 2 | 3 | This book is titled *Software Engineering at Google*. What precisely do we mean by software engineering? What distinguishes “software engineering” from “programming” or “computer science”? And why would Google have a unique perspective to add to the corpus of previous software engineering literature written over the past 50 years? 4 | 5 | 本书的标题是*《谷歌的软件工程》*。我们对软件工程的确切定义是什么?"软件工程"与"编程"或"计算机科学"的区别是什么?为什么谷歌会有独特的视角,可以为过去50年积累的软件工程文献增添内容? 6 | 7 | The terms “programming” and “software engineering” have been used interchangeably for quite some time in our industry, although each term has a different emphasis and different implications. University students tend to study computer science and get jobs writing code as “programmers.” 8 | 9 | 在我们的业界,"编程"和"软件工程"这两个术语已经被交替使用了相当长的时间,尽管每个术语都有不同的重点和不同的含义。大学生倾向于学习计算机科学,并以"程序员"的身份从事编写代码的工作。 10 | 11 | “Software engineering,” however, sounds more serious, as if it implies the application of some theoretical knowledge to build something real and precise. Mechanical engineers, civil engineers, aeronautical engineers, and those in other engineering disciplines all practice engineering. They all work in the real world and use the application of their theoretical knowledge to create something real. Software engineers also create “something real,” though it is less tangible than the things other engineers create. 12 | 13 | 然而,"软件工程"听起来更加严肃,似乎它意味着应用理论知识来建立真实和精确的东西。机械工程师、土木工程师、航空工程师和其他工程学科的人都在进行工程实践。他们都在现实世界中工作,运用他们的理论知识来创造一些真实的东西。软件工程师也创造"真实的东西",尽管它没有像其他工程师创造的东西那么有形。 14 | 15 | Unlike those more established engineering professions, current software engineering theory or practice is not nearly as rigorous. 
Aeronautical engineers must follow rigid guidelines and practices, because errors in their calculations can cause real damage; programming, on the whole, has traditionally not followed such rigorous practices. But, as software becomes more integrated into our lives, we must adopt and rely on more rigorous engineering methods. We hope this book helps others see a path toward more reliable software practices. 16 | 17 | 与那些更成熟的工程专业不同,目前的软件工程理论或实践方法还没有那么严格。航空工程师必须遵循严格的准则和实践,因为他们的计算错误会造成真正的损失;而编程,总体来说,传统上没有遵循这样严格的实践。但是,随着软件越来越多地融入我们的生活,我们必须采用并依赖更严格的工程方法。我们希望这本书能帮助其他人看到一条通往更可靠的软件实践的道路。 18 | 19 | ### Programming Over Time 随时间变化的编程 20 | 21 | We propose that “software engineering” encompasses not just the act of writing code, but all of the tools and processes an organization uses to build and maintain that code over time. What practices can a software organization introduce that will best keep its code valuable over the long term? How can engineers make a codebase more sustainable and the software engineering discipline itself more rigorous? We don’t have fundamental answers to these questions, but we hope that Google’s collective experience over the past two decades illuminates possible paths toward finding those answers. 22 | 23 | 我们建议,"软件工程 "不仅包括编写代码的行为,还包括一个组织用来长期构建和随时间维护代码的所有工具和流程。一个软件组织可以采用哪些做法来使其代码长期保持最佳价值?工程师们如何才能使代码库更具有可持续性,并使软件工程学科本身更加严格?我们没有这些问题的最终答案,但我们希望谷歌在过去20年的集体经验能够为寻找这些答案的提供可能。 24 | 25 | One key insight we share in this book is that software engineering can be thought of as “programming integrated over time.” What practices can we introduce to our code to make it *sustainable*—able to react to necessary change—over its life cycle, from conception to introduction to maintenance to deprecation? 26 | 27 | 我们在本书中分享的一个关键观点是,软件工程可以被认为是 "随着时间推移而整合的编程"。我们可以在我们的代码中引入哪些实践,使其*可持续*——能够对必要的变化做出反应——在其生命周期中,从设计到引入到维护到废弃? 
28 | 29 | The book emphasizes three fundamental principles that we feel software organizations should keep in mind when designing, architecting, and writing their code: 30 | 31 | *Time and Change* 32 | &nbsp;&nbsp;&nbsp;&nbsp; How code will need to adapt over the length of its life 33 | 34 | *Scale and Growth* 35 | &nbsp;&nbsp;&nbsp;&nbsp; How an organization will need to adapt as it evolves 36 | 37 | *Trade-offs and Costs* 38 | &nbsp;&nbsp;&nbsp;&nbsp; How an organization makes decisions, based on the lessons of Time and Change and Scale and Growth 39 | 40 | 本书强调了三个基本原则,我们认为软件组织在设计、架构和编写代码时应该牢记这些原则: 41 | 42 | *时间和变化* 43 | &nbsp;&nbsp;&nbsp;&nbsp; 代码如何在其生命周期内进行适配。 44 | 45 | *规模和增长* 46 | &nbsp;&nbsp;&nbsp;&nbsp; 一个组织如何适应它的发展过程。 47 | 48 | *权衡和成本* 49 | &nbsp;&nbsp;&nbsp;&nbsp; 一个组织如何根据时间和变化以及规模和增长的经验教训做出决策。 50 | 51 | Throughout the chapters, we have tried to tie back to these themes and point out ways in which such principles affect engineering practices and allow them to be sustainable. (See [Chapter 1](#_bookmark3) for a full discussion.) 52 | 53 | 在整个章节中,我们都尝试与这些主题联系起来,并指出这些原则如何影响工程实践并使其可持续。(见[第1章](#_bookmark3)的全面讨论)。 54 | 55 | ### Google’s Perspective 谷歌的视角 56 | 57 | Google has a unique perspective on the growth and evolution of a sustainable software ecosystem, stemming from our scale and longevity. We hope that the lessons we have learned will be useful as your organization evolves and embraces more sustainable practices. 58 | 59 | 谷歌对可持续软件生态系统的发展和演变有着独特的视角,这源于我们的规模和寿命。我们希望在你的组织发展和采用更多的可持续发展的做法时,我们学到的经验将能对你有帮助。 60 | 61 | We’ve divided the topics in this book into three main aspects of Google’s software engineering landscape: 62 | - Culture 63 | - Processes 64 | - Tools 65 | 66 | 我们将本书的主题分为谷歌软件工程领域的三个主要方面: 67 | - 文化 68 | - 流程 69 | - 工具 70 | 71 | Google’s culture is unique, but the lessons we have learned in developing our engineering culture are widely applicable.
Our chapters on Culture ([Part II](#_bookmark100)) emphasize the collective nature of a software development enterprise, that the development of software is a team effort, and that proper cultural principles are essential for an organization to grow and remain healthy. 72 | 73 | 谷歌的文化是独一无二的,但我们在发展工程文化中所获得的经验是广泛适用的。我们关于文化的章节([第二部分](#_bookmark100))强调了软件开发企业的集体性,软件开发是一项团队工作,正确的文化原则对于一个组织的成长和保持健康至关重要。 74 | 75 | The techniques outlined in our Processes chapters ([Part III](#_bookmark579)) are familiar to most software engineers, but Google’s large size and long-lived codebase provide a more complete stress test for developing best practices. Within those chapters, we have tried to emphasize what we have found to work over time and at scale as well as identify areas where we don’t yet have satisfying answers. 76 | 77 | 在我们的流程章节([第三部分](#_bookmark579))中概述的技术是大多数软件工程师所熟悉的,但谷歌的庞大规模和长期的代码库为开发最佳实践提供了一个更完整的压力测试。在这些章节中,我们强调我们发现随着时间的推移和规模的扩大,什么是有效的,以及确定我们还没有满意的答案的领域。 78 | 79 | Finally, our Tools chapters ([Part IV](#_bookmark1363)) illustrate how we leverage our investments in tooling infrastructure to provide benefits to our codebase as it both grows and ages. In some cases, these tools are specific to Google, though we point out open source or third-party alternatives where applicable. We expect that these basic insights apply to most engineering organizations. 80 | 81 | 最后,我们的工具章节([第四部分](#_bookmark1363))说明了我们如何利用对工具基础设施的投入来优化代码库,因为它既增长又腐化。在某些情况下,这些工具是谷歌特有的,尽管我们在适当的地方指出了开源或第三方的替代品。我们希望这些基本的见解适用于大多数工程组织。 82 | 83 | The culture, processes, and tools outlined in this book describe the lessons that a typical software engineer hopefully learns on the job. Google certainly doesn’t have a monopoly on good advice, and our experiences presented here are not intended to dictate what your organization should do.
This book is our perspective, but we hope you will find it useful, either by adopting these lessons directly or by using them as a starting point when considering your own practices, specialized for your own problem domain. 84 | 85 | 本书中描写的文化、流程和工具是大多数的软件工程师希望在工作中使用的内容。谷歌当然不会独断好建议,我们在这里介绍的经验并不是要规定你的组织应当这么做。本书是我们的观点,但我们希望你会发现它是有用的,可以直接采用这些经验,也可以在考虑自己的实践时把它们作为一个起点,专门用于解决自己的领域问题。 86 | 87 | Neither is this book intended to be a sermon. Google itself still imperfectly applies many of the concepts within these pages. The lessons that we have learned, we learned through our failures: we still make mistakes, implement imperfect solutions, and need to iterate toward improvement. Yet the sheer size of Google’s engineering organization ensures that there is a diversity of solutions for every problem. We hope that this book contains the best of that group. 88 | 89 | 本书也不打算成为一本布道书。谷歌自身仍在不完善地应用这些书中的许多理念。我们从失败中吸收了教训:我们仍然会犯错误,采用不完美的解决方案,还需要迭代改进。然而,谷歌工程组织的庞大规模确定了每个问题都有多样化的解决方案。我们希望这本书包含了这群人中最好的方案。 90 | 91 | ### What This Book Isn’t 本书不适用于哪些 92 | 93 | This book is not meant to cover software design, a discipline that requires its own book (and for which much content already exists). Although there is some code in this book for illustrative purposes, the principles are language neutral, and there is little actual “programming” advice within these chapters. As a result, this text doesn’t cover many important issues in software development: project management, API design, security hardening, internationalization, user interface frameworks, or other language-specific concerns. Their omission in this book does not imply their lack of importance. Instead, we choose not to cover them here knowing that we could not provide the treatment they deserve. We have tried to make the discussions in this book more about engineering and less about programming. 
94 | 95 | 本书并不是要涵盖软件设计,这门学科有自己的书(而且已经有很多类型的书)。虽然书中有一些代码用于说明问题,但原则是语言无关的,而且这些章节中几乎没有实际的 "编程 "建议。因此,本书没有涉及软件开发中的许多重要问题:项目管理、API设计、安全加固、国际化、用户界面框架或其他特定编程语言问题。本书对这些问题的忽略并不意味着它们不重要。相反,我们选择不在这里涉及它们,因为我们知道我们无法提供它们应有的内容。我们试图使本书的讨论更多的关于工程领域,而不是关于编程领域。 96 | 97 | ### Parting Remarks 临别赠言 98 | 99 | This text has been a labor of love on behalf of all who have contributed, and we hope that you receive it as it is given: as a window into how a large software engineering organization builds its products. We also hope that it is one of many voices that helps move our industry to adopt more forward-thinking and sustainable practices. Most important, we further hope that you enjoy reading it and can adopt some of its lessons to your own concerns. 100 | 101 | 这篇文章是所有贡献者的心血结晶,我们希望你能虚心地接受它:作为了解一个大型软件工程组织如何构建其产品的窗口。我们还希望它是有助于推动我们的行业采用更具前瞻性和可持续实践的众多声音之一。最重要的是,我们更希望你喜欢它,并能将其中的一些经验用于你的工作。 102 | 103 | 104 | 105 | *— Tom Manshreck* 106 | 107 | 108 | 109 | 110 | 111 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Software-Engineering-at-Google 2 | 3 | 《Software Engineering at Google》的中英文对译版本。 4 | 5 | ## 书籍介绍 6 | 7 | ![Software Engineering at Google](./assets/images/swe_at_google.2.cover.jpg) 8 | by **Titus Winters, Tom Manshreck, and Hyrum Wright** 9 | 10 | 当前国内已经有中文版书籍出版了,大家需要更加准确,推荐大家去购买。 11 | 12 | ## 为什么翻译 13 | 14 | 目前 GitHub 上并没有对《Software Engineering at Google》的中文翻译。加之本人的英语也不好,好不容易看了一遍,似有所有领悟,想要再看一遍的时候,发现满眼都是英文,实在是痛苦! 
15 | 16 | 为了让自己也让更多的中文读者有更好的阅读体验,学习到当前人类最为复杂系统是如何开发和维护的知识。 17 | 18 | 本人边看边记录翻译和学习笔记。 19 | 20 | ## 在线阅读 21 | 22 | https://qiangmzsx.github.io/Software-Engineering-at-Google/#/ 23 | 24 | ## 当前状态 25 | 26 | | PART | 部分 | 章节 | 名称 | 状态 | 翻译人员 | 计划完成时间 | 备注 | 27 | | ----------------- | ------------- | -------------------------------------------------- | ------------------------------------------------------------ | ---------- | ---------------------------------------- | -------------- | ------------------------------------------------------------ | 28 | | Foreword | 序言 | | [序言](./zh-cn/Foreword.md) | 完成校验 | @qiangmzsx | | | 29 | | | 前言 | | [前言](./zh-cn/Preface.md) | 完成校验 | @qiangmzsx | | | 30 | | PART 1 Thesis | 第一部分 理论 | chapter 1 : What Is Software Engineering? | [第一章:软件工程是什么?](./zh-cn/Chapter-1_What_Is_Software_Engineering/Chapter-1_What_Is_Software_Engineering.md) | 完成校验 | @qiangmzsx | 2021-12-02 | 2021-11-30 完成。
统一将政策改为策略 | 31 | | PART 2 Culture | 第二部分 文化 | chapter 2 : How to Work Well on Teams | [第二章:如何融入团队](./zh-cn/Chapter-2_How_to_Work_Well_on_Teams/Chapter-2_How_to_Work_Well_on_Teams.md) | 完成校验 | @qiangmzsx | 2021-12-20 | 2021-12-02 完成 | 32 | | | | chapter 3 : Knowledge Sharing | [第三章:知识共享](./zh-cn/Chapter-3_Knowledge_Sharing/Chapter-3_Knowledge_Sharing.md) | 完成校验 | @qiangmzsx | 2021-12-20 | 2021-12-05 完成 | 33 | | | | chapter 4 : Engineering for Equity | [第四章:公平工程](./zh-cn/Chapter-4_Engineering_for_Equity/Chapter-4_Engineering_for_Equity.md) | 完成校验 | @qiangmzsx | 2021-12-20 | 我现在还很难区分『平等』和『公平』。
平等和公平的区别:性质不同。本义不同。平等:指人们在社会、政治、经济、法律等方面享有相等待遇。公平:处理任何事情合情合理,不会偏袒哪一方面。
扩展资料: 性质不同:公平”是一种手段,“平等”是一个结果。 本义不同:平等强调的是无差别,而公平则强调公道、公正、不偏袒。或者说,是否承认存在差别,构成了公平与平等最主要的区别。平等否认存在差别,而公平则不然,它承认存在差别,并在此基础上得以实现。 平等意思:它是人和人之间的一种关系、人对人的一种态度,它是人类的终极理想之一。由于人之差异绝对的公平不存在,只有相对的平等,现代社会的进步就是人和人之间从不平等走向平等过程是平等逐渐实现的过程,遇到不道德之处一定要坚决消灭。 公平指公正,不偏不倚。平是指所有的参与者(人或者团体)的各项属性(包括投入、获得等)平均。公为公正、合理,能获得广泛的支持;平指平等、平均。 由于人之差异而没有绝对的公平,只有相对的公平。 | 34 | | | | chapter 5 : How to Lead a Team | [第五章:如何领导团队](./zh-cn/Chapter-5_How_to_Lead_a_Team/Chapter-5_How_to_Lead_a_Team.md) | 完成校验 | @qiangmzsx | 2022-05-02 | | 35 | | | | chapter 6 : Leading at Scale | [第六章:规模优先](./zh-cn/Chapter-6_Leading_at_Scale/Chapter-6_Leading_at_Scale.md) | 完成校验 | @FingerLiu | 2022-04-06 | | 36 | | | | chapter 7 : Measuring Engineering Productivity | [第七章:测量工程效率](./zh-cn/Chapter-7_Measuring_Engineering_Productivity/Chapter-7_Measuring_Engineering_Productivity.md) | 完成校验 | @qiangmzsx | 2022-05-23 | | 37 | | PART 3 Processes | 第三部分 流程 | chapter 8 : Style Guides and Rules | [第八章:风格指南和规则](./zh-cn/Chapter-8_Style_Guides_and_Rules/Chapter-8_Style_Guides_and_Rules.md) | 完成校验 | @ll13 | 2022-04-18 | | 38 | | | | chapter 9 : Code Review | [第九章:代码审查](./zh-cn/Chapter-9_Code_Review/Chapter-9_Code_Review.md) | 完成校验 | @qiangmzsx | 2022-04-04 | | 39 | | | | chapter 10 : Documentation | [第十章:文档](./zh-cn/Chapter-10_Documentation/Chapter-10_Documentatio.md) | 完成校验 | @qiangmzsx | 2022-01-15 | 1、概述,大略地叙述,对文章或事物进行概括表达。在百度百科里,特指词条概述,对已有信息进行简明归纳。
2、概念:人类在认识过程中,从感性认识上升到理性认识,把所感知的事物的共同本质特点抽象出来,加以概括,是自我认知意识的一种表达,形成概念式思维惯性。在人类所认知的思维体系中最基本的构筑单位。 | 40 | | | | chapter 11 : Testing Overview | [第十一章:测试概述](./zh-cn/Chapter-11_Testing_Overview/Chapter-11_Testing_Overview.md) | 完成校验 | @qiangmzsx | 2022-04-15 | | 41 | | | | chapter 12 : Unit Testing | [第十二章:单元测试](./zh-cn/Chapter-12_Unit_Testing/Chapter-12_Unit_Testing.md) | 完成校验 | @qiangmzsx | 2022-03-19 | | 42 | | | | chapter 13 : Test Doubles | [第十三章:测试替代](./zh-cn/Chapter-13_Test_Doubles/Chapter-13_Test_Doubles.md) | 完成校验 | @qiangmzsx | 2022-02-11完成 | | 43 | | | | chapter 14 : Larger Testing | [第十四章:大型测试](./zh-cn/Chapter-14_Larger_Testing/Chapter-14_Larger_Testing.md) | 完成校验 | @qiangmzsx | | | 44 | | | | chapter 15 : Deprecation | [第十五章:废弃](./zh-cn/Chapter-15_Deprecation/Chapter-15_Deprecation.md) | 完成校验 | [jixiufeng](https://github.com/jixiuf) | | | 45 | | PART 4 Tools | 第四部分 工具 | chapter 16 : Version Control and Branch Management | [第十六章:版本控制和分支管理](./zh-cn/Chapter-16_Version_Control_and_Branch_Management/Chapter-16_Version_Control_and_Branch_Management.md) | 完成校验 | @qiangmzsx | 2022-04-07 | | 46 | | | | chapter 17 : Code Search | [第十七章:代码搜索](zh-cn/Chapter-17_Code_Search/Chapter-17_Code_Search.md) | 完成校验 | [caili](https://github.com/transfercai) | 2022-02-16 | | 47 | | | | chapter 18 : Build Systems and Build Philosophy | [第十八章:构建系统,构建理念](./zh-cn/Chapter-18_Build_Systems_and_Build_Philosophy/Chapter-18_Build_Systems_and_Build_Philosophy.md) | 完成校验 | @qiangmzsx | 2022-01-26 | 构件(功能模块化,前提是接口标准化);组件对数据和方法的简单封装。 | 48 | | | | chapter 19 : Critique : Google’s Code Review Tool | [第十九章:体验:谷歌的代码审查工具](./zh-cn/Chapter-19_Critique_Googles_Code_Review_Tool/Chapter-19_Critique_Googles_Code_Review_Tool.md) | 完成校验 | @qiangmzsx | 2022-01-30 | | 49 | | | | chapter 20 : Static Analysis | [第二十章:静态分析](./zh-cn/Chapter-20_Static_Analysis/Chapter-20_Static_Analysis.md) | 完成校验 | @yangjun | 2022-04-24 | | 50 | | | | chapter 21 : Dependency Management | 
[第二十一章:依赖管理](./zh-cn/Chapter-21_Dependency_Management/Chapter-21_Dependency_Management.md) | 完成校验 | @qiangmzsx | 2022-03-013 | | 51 | | | | chapter 22 : Large-Scale Changes | [第二十二章:大规模变更](./zh-cn/Chapter-22_Large-Scale_Changes/Chapter-22_Large-Scale_Changes.md) | 完成校验 | @qiangmzsx | 2022-03-10 | 宠物类比的是传统服务器或虚拟机,每一个都被照顾得很好,不可或缺,不会自动变得有序。牛比的是云端服务器实例,量多、可替代、短生命周期,可以自动变得有序。 | 52 | | | | chapter 23 : Continuous Integration | [第二十三章:持续集成](./zh-cn/Chapter-23_Continuous_Integration/Chapter-23_Continuous_Integration.md) | 完成校验 | @qiangmzsx | 2022-03-06 | | 53 | | | | chapter 24 : Continuous Delivery | [第二十四章:持续交付](./zh-cn/Chapter-24_Continuous_Delivery/Chapter-24_Continuous_Delivery.md) | 完成校验 | @qiangmzsx | 2022-03-08 | | 54 | | | | chapter 25 : Compute as a Service | [第二十五章:计算即服务](./zh-cn/Chapter-25_Compute_as_a_Service/Chapter-25_Compute_as_a_Service.md) | 完成校验 | @qiangmzsx | 2022-02-23 | | 55 | | PART 5 Conclusion | 第五部分 总结 | Afterword | [后记](./zh-cn/Afterword.md) | 完成校验 | @qiangmzsx | 2022-01-26完成 | | 56 | 57 | ## Star 58 | [![Stargazers over time](https://starchart.cc/qiangmzsx/Software-Engineering-at-Google.svg)](https://starchart.cc/qiangmzsx/Software-Engineering-at-Google) 59 | 60 | ## 授权许可 61 | 62 | 除特别声明外,本书中的内容使用 [CC BY-SA 3.0 License](http://creativecommons.org/licenses/by-sa/3.0/)(创作共用 署名-相同方式共享3.0 许可协议)授权,代码遵循 [BSD 3-Clause License](https://github.com/astaxie/build-web-application-with-golang/blob/master/LICENSE.md)(3 项条款的 BSD 许可协议)。 63 | -------------------------------------------------------------------------------- /zh-cn/Chapter-24_Continuous_Delivery/Chapter-24_Continuous_Delivery.md: -------------------------------------------------------------------------------- 1 | 2 | **CHAPTER 24** 3 | 4 | # Continuous Delivery 5 | 6 | # 第二十四章 持续交付 7 | 8 | **Written by Radha Narayan, Bobbi Jones, Sheri Shipe, and David Owens** 9 | 10 | **Edited by Lisa Carey** 11 | 12 | Given how quickly and unpredictably the technology landscape shifts, the competitive advantage
for any product lies in its ability to quickly go to market. An organization’s velocity is a critical factor in its ability to compete with other players, maintain product and service quality, or adapt to new regulation. This velocity is bottlenecked by the time to deployment. Deployment doesn’t just happen once at initial launch. There is a saying among educators that no lesson plan survives its first contact with the student body. In much the same way, no software is perfect at first launch, and the only guarantee is that you’ll have to update it. Quickly. 13 | 14 | 鉴于技术领域的变化是如此之快且不可预测,任何产品的竞争优势都在于其快速进入市场的能力。一个组织的速度是其与其他参与者竞争、保持产品和服务质量或适应新法规能力的关键因素。这种速度受到部署时间的瓶颈制约。部署不会在初始启动时只发生一次。教育工作者们有一种说法,没有一个教案能在第一次与学生接触后幸存下来。同样,没有软件在第一次发布时就是完美的,唯一的保证是你需要快速更新它。 15 | 16 | The long-term life cycle of a software product involves rapid exploration of new ideas, rapid responses to landscape shifts or user issues, and enabling developer velocity at scale. From Eric Raymond’s The Cathedral and the Bazaar to Eric Ries’s The Lean Startup, the key to any organization’s long-term success has always been in its ability to get ideas executed and into users’ hands as quickly as possible and to react quickly to their feedback. Martin Fowler, in his book Continuous Delivery (aka CD), points out that “The biggest risk to any software effort is that you end up building something that isn’t useful.
The earlier and more frequently you get working software in front of real users, the quicker you get feedback to find out how valuable it really is.” 17 | 18 | 软件产品的长期生命周期包括快速探索新想法、快速响应环境(行业)变化或用户问题,以及实现大规模开发速度。从埃里克·雷蒙德(Eric Raymond)的《大教堂与集市》(The Cathedral and The Bazaar)到埃里克·赖斯(Eric Ries)的《精益创业》(The Lean Startup),任何组织长期成功的关键始终在于其能够尽快将想法付诸实施并交到用户手中,并对他们的反馈做出快速反应。马丁·福勒(Martin Fowler)在其著作《持续交付》(Continuous Delivery,又名CD)中指出,“任何软件工作的最大风险是,你最终建立的东西并不实用。你越早、越频繁地将工作中的软件展现在真正的用户面前,你就能越快地得到反馈,发现它到底有多大价值。” 19 | 20 | Work that stays in progress for a long time before delivering user value is high risk and high cost, and can even be a drain on morale. At Google, we strive to release early and often, or “launch and iterate,” to enable teams to see the impact of their work quickly and to adapt faster to a shifting market. The value of code is not realized at the time of submission but when features are available to your users. Reducing the time between “code complete” and user feedback minimizes the cost of work that is in progress. 21 | 22 | 在交付用户价值之前进行很长时间的工作是高风险和高成本的,甚至可能会消耗士气。在谷歌,我们努力做到早期和经常发布,或者说 "发布和迭代",以使团队能够迅速看到他们工作的影响,并更快地适应不断变化的市场。代码的价值不是在提交时实现的,而是在你的用户可以使用的功能时实现的。缩短 "代码完成"和用户反馈之间的时间,可以将正在进行中的工作的成本降到最低。 23 | 24 | You get extraordinary outcomes by realizing that the launch *never lands* but that it begins a learning cycle where you then fix the next most important thing, measure how it went, fix the next thing, etc.—and it is *never complete*. 25 | —David Weekly, Former Google product manager 26 | 27 | 当你意识到发射从未着陆,但它开始了一个学习周期,然后你修复下一个最重要的事情,衡量它如何进行,修复下一个事情,等等——而且它永远不会完成。 28 | -David Weekly,前谷歌产品经理 29 | 30 | At Google, the practices we describe in this book allow hundreds (or in some cases thousands) of engineers to quickly troubleshoot problems, to independently work on new features without worrying about the release, and to understand the effectiveness of new features through A/B experimentation.
This chapter focuses on the key levers of rapid innovation, including managing risk, enabling developer velocity at scale, and understanding the cost and value trade-off of each feature you launch. 31 | 32 | 在谷歌,我们在本书中描述的做法使数百名(或在某些情况下数千名)工程师能够快速解决问题,独立完成新功能而不必担心发布问题,并通过A/B实验了解新功能的有效性。本章重点关注快速创新的关键措施,包括管理风险、实现大规模的开发者速度,以及了解你推出的每个功能的成本和价值权衡。 33 | 34 | ## Idioms of Continuous Delivery at Google 谷歌持续交付的习惯用法 35 | 36 | A core tenet of Continuous Delivery (CD) as well as of Agile methodology is that over time, smaller batches of changes result in higher quality; in other words, *faster is safer*. This can seem deeply controversial to teams at first glance, especially if the prerequisites for setting up CD—for example, Continuous Integration (CI) and testing—are not yet in place. Because it might take a while for all teams to realize the ideal of CD, we focus on developing various aspects that deliver value independently en route to the end goal. Here are some of these: 37 | 38 | - *Agility* 39 | Release frequently and in small batches 40 | 41 | - *Automation* 42 | Reduce or remove repetitive overhead of frequent releases 43 | 44 | - *Isolation* 45 | Strive for modular architecture to isolate changes and make troubleshooting easier 46 | 47 | - *Reliability* 48 | Measure key health indicators like crashes or latency and keep improving them 49 | 50 | - *Data-driven decision making* 51 | Use A/B testing on health metrics to ensure quality 52 | 53 | - *Phased rollout* 54 | Roll out changes to a few users before shipping to everyone 55 | 56 | 持续交付(CD)以及敏捷方法论的一个核心原则是,随着时间的推移,较小的变更批次能够产生更高的质量;换句话说,越快越安全。乍一看,这似乎对团队有很大的争议,尤其是当建立CD的前提条件--例如,持续集成(CI)和测试--还没有到位的时候。因为所有团队可能需要一段时间才能实现CD的理想,所以我们将重点放在开发能够在实现最终目标的过程中独立交付价值的各个方面。下面是其中的一些: 57 | 58 | - *敏捷性* 59 | 频繁地、较小的变更批次地发布。 60 | 61 | - *自动化* 62 | 减少或消除频繁发布的重复性开销。 63 | 64 | - *隔离性* 65 | 努力实现模块化体系结构,以隔离更改并使故障排除更加容易。 66 | 67 | - *可靠性* 68 | 衡量关键的健康指标,如崩溃或延迟,并不断改善它们。 69 | 70 | - *数据驱动的决策* 71 | 在健康指标上使用A/B测试以确保质量。
72 | 73 | - *分阶段推出* 74 | ​ 在向所有人发送之前,先在少数用户中推广变更。 75 | 76 | At first, releasing new versions of software frequently might seem risky. As your userbase grows, you might fear the backlash from angry users if there are any bugs that you didn’t catch in testing, and you might quite simply have too much new code in your product to test exhaustively. But this is precisely where CD can help. Ideally, there are so few changes between one release and the next that troubleshooting issues is trivial. In the limit, with CD, every change goes through the QA pipeline and is automatically deployed into production. This is often not a practical reality for many teams, and so there is often work of culture change toward CD as an intermediate step, during which teams can build their readiness to deploy at any time without actually doing so, building up their confidence to release more frequently in the future. 77 | 78 | 起初,频繁发布新版本的软件可能看起来很冒险。随着用户群的增长,如果在测试中发现任何错误,你可能会担心用户的反弹,而且你的产品中可能有太多新代码,无法彻底测试。但这恰恰是CD可以帮助的地方。理想情况下,一个版本和下一个版本之间的变化非常少,排除问题是非常简单的。在极限情况下,有了CD,每个变化都会通过QA管道,并自动部署到生产中。对于许多团队来说,这通常不是一个实际的现实,因此往往需要进行向CD文化的转变工作作为中间步骤,团队可以在不实际部署的情况下建立部署准备性,增强未来更频繁发布的信心。 79 | 80 | ## Velocity Is a Team Sport: How to Break Up a Deployment into Manageable Pieces 速度是一项团队运动:如何将部署工作分解成可管理的部分 81 | 82 | When a team is small, changes come into a codebase at a certain rate. We’ve seen an antipattern emerge as a team grows over time or splits into subteams: a subteam branches off its code to avoid stepping on anyone’s feet, but then struggles, later, with integration and culprit finding. At Google, we prefer that teams continue to develop at head in the shared codebase and set up CI testing, automatic rollbacks, and culprit finding to identify issues quickly. This is discussed at length in Chapter 23. 
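As a rough illustration of the culprit finding mentioned above, the following Python sketch bisects an ordered list of submitted changes to locate the first one that breaks the tests. It is an assumption about how such a search could work, not a description of Google's tooling: `find_culprit` and `passes_at` are hypothetical names, and the sketch assumes that once a breaking change lands, every later state of the codebase also fails.

```python
from typing import Callable, Sequence


def find_culprit(changes: Sequence[str], passes_at: Callable[[int], bool]) -> str:
    """Return the first change whose inclusion makes the tests fail.

    passes_at(i) is a hypothetical callback that reports whether the tests
    pass with changes[0..i] applied. The search assumes failures are
    monotonic: after the culprit lands, every later state fails too.
    """
    if not changes:
        raise ValueError("no changes to search")
    low, high = 0, len(changes) - 1
    if passes_at(high):
        raise ValueError("tests pass at head; there is no culprit to find")
    # Invariant: the culprit index lies somewhere in [low, high].
    while low < high:
        mid = (low + high) // 2
        if passes_at(mid):
            low = mid + 1   # still green at mid, so the culprit came later
        else:
            high = mid      # already red at mid, so the culprit is mid or earlier
    return changes[low]
```

With eight changes this needs at most four test runs (one at head, three while bisecting) instead of eight, which is part of why keeping everyone at head with good CI can stay cheap even as the rate of submissions grows.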
83 | 84 | 当一个团队较小的时候,代码变化以一定的速度进入一个代码库。我们看到,随着时间的推移,一个团队的成长或分裂成子团队,会出现一种反模式:一个子团队将其代码分支,以避免踩到其他团队的脚,但之后却会遇到集成和故障排查的问题。在谷歌,我们更倾向于团队继续在共享代码库中进行开发,并设置CI测试、自动回滚和故障查找,以快速识别问题。这在第23章中有详细的讨论。 85 | 86 | One of our codebases, YouTube, is a large, monolithic Python application. The release process is laborious, with Build Cops, release managers, and other volunteers. Almost every release has multiple cherry-picked changes and respins. There is also a 50-hour manual regression testing cycle run by a remote QA team on every release. When the operational cost of a release is this high, a cycle begins to develop in which you wait to push out your release until you’re able to test it a bit more. Meanwhile, someone wants to add just one more feature that’s almost ready, and pretty soon you have yourself a release process that’s laborious, error prone, and slow. Worst of all, the experts who did the release last time are burned out and have left the team, and now nobody even knows how to troubleshoot those strange crashes that happen when you try to release an update, leaving you panicky at the very thought of pushing that button. 87 | 88 | 我们的一个代码库,YouTube,是一个大型的、单体的Python应用程序。发布过程很费劲,有Build Cops、发布经理和其他志愿者。每个发布版本都需要多次 cherry-pick(选择性合并)代码变更和重新打包等过程。此外,还有需要由远程QA团队运行的50小时手工回归测试周期。当一个发布的操作成本如此之高时,就会形成一个循环,在这个循环中,你要等待测试更多,才能推出发布版本。与此同时,有人想再增加一个几乎已经准备好的功能,很快你就有了一个费力、容易出错和缓慢的发布过程。最糟糕的是,上次做发布工作的专家已经精疲力尽,离开了团队,现在甚至没有人知道如何解决那些当你试图发布更新时发生的奇怪崩溃,让你一想到要按下那个按钮就感到恐慌。 89 | 90 | If your releases are costly and sometimes risky, the *instinct* is to slow down your release cadence and increase your stability period. However, this only provides short-term stability gains, and over time it slows velocity and frustrates teams and users. The *answer* is to reduce cost, increase discipline, and make the risks more incremental, but it is critical to resist the obvious operational fixes and invest in long-term architectural changes.
The obvious operational fixes to this problem lead to a few traditional approaches: reverting to a traditional planning model that leaves little room for learning or iteration, adding more governance and oversight to the development process, and implementing risk reviews or rewarding low-risk (and often low-value) features. 91 | 92 | 如果你的发布是昂贵的,有时是有风险的,那么*本能*的反应是放慢你的发布节奏,增加你的稳定期。然而,这只能提供短期的稳定性收益,随着时间的推移,它会减慢速度,使团队和用户感到沮丧。答案是降低成本,增加纪律,使风险更加渐进式,但关键是要抵制明显的操作修复,投资于长期的架构变化。对这个问题的明显的操作性修正导致了一些传统的方法:恢复到传统的计划模式,为学习或迭代留下很少的空间,为开发过程增加更多的治理和监督,以及实施风险审查或奖励低风险(通常是低价值)的功能。 93 | 94 | The investment with the best return, though, is migrating to a microservice architecture, which can empower a large product team with the ability to remain scrappy and innovative while simultaneously reducing risk. In some cases, at Google, the answer has been to rewrite an application from scratch rather than simply migrating it, establishing the desired modularity into the new architecture. Although either of these options can take months and is likely painful in the short term, the value gained in terms of operational cost and cognitive simplicity will pay off over an application’s lifespan of years. 95 | 96 | 不过,回报率最高的投资是迁移到微服务架构,这可以使一个大型产品团队有能力保持活力和创新,同时降低风险。在某些情况下,在谷歌,答案是从头开始重写一个应用程序,而不是简单地迁移它,在新的架构中建立所需的模块化。尽管这两种选择都需要几个月的时间,而且在短期内可能是痛苦的,但在运营成本和认知的简单性方面获得的价值将在应用程序多年的生命周期中得到回报。 97 | 98 | ## Evaluating Changes in Isolation: Flag-Guarding Features 评估隔离中的更改:标志保护功能 99 | 100 | A key to reliable continuous releases is to make sure engineers “flag guard” *all changes*. As a product grows, there will be multiple features under various stages of development coexisting in a binary. Flag guarding can be used to control the inclusion or expression of feature code in the product on a feature-by-feature basis and can be expressed differently for release and development builds. A feature flag disabled for a build should allow build tools to strip the feature from the build if the language permits it. 
For instance, a stable feature that has already shipped to customers might be enabled for both development and release builds. A feature under development might be enabled only for development, protecting users from an unfinished feature. New feature code lives in the binary alongside the old codepath—both can run, but the new code is guarded by a flag. If the new code works, you can remove the old codepath and launch the feature fully in a subsequent release. If there’s a problem, the flag value can be updated independently from the binary release via a dynamic config update. 101 | 102 | 可靠的连续发布的关键是确保工程师“通过标志保护”所有更改。随着产品的发展,在二进制文件中,将有处于不同开发阶段的多种功能共存。以单个功能为单位,标志保护位决定了功能的代码在产品中是否被包含或如何呈现,并可在发布和开发版本中以不同方式表达。如果编程语言允许,打上”禁用”标志的功能标志位会使得构建工具从对应的版本构建中剥离该功能。例如,一个已经提供给客户的稳定特性可能会在开发版本和发布版本中启用。正在开发的功能可能仅为开发而启用,从而保护用户不受未完成功能的影响。新的特性代码与旧的代码路径一起存在于二进制文件中,两者都可以运行,但新代码由一个标志保护。如果新代码有效,你可以删除旧代码路径,并在后续版本中完全启动该功能。如果出现问题,可以通过动态配置更新独立于二进制版本更新标志值。 103 | 104 | In the old world of binary releases, we had to time press releases closely with our binary rollouts. We had to have a successful rollout before a press release about new functionality or a new feature could be issued. This meant that the feature would be out in the wild before it was announced, and the risk of it being discovered ahead of time was very real. 105 | 106 | 在过去的二进制发布的世界里,我们必须将新闻发布时间与二进制发布时间紧密协调。在发布关于新功能或新功能的新闻稿之前,我们必须进行成功的发布。这意味着该功能将在发布之前就已经公开,提前被发现的风险是非常现实的。 107 | 108 | This is where the beauty of the flag guard comes to play. If the new code has a flag, the flag can be updated to turn your feature on immediately before the press release, thus minimizing the risk of leaking a feature. Note that flag-guarded code is not a *perfect* safety net for truly sensitive features. Code can still be scraped and analyzed if it’s not well obfuscated, and not all features can be hidden behind flags without adding a lot of complexity. Moreover, even flag configuration changes must be rolled out with care. 
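To make the flag-guarding idea concrete, here is a minimal Python sketch of an old and a new codepath living side by side, with the choice between them driven by a flag whose value can change without a new binary. Everything named here is hypothetical and chosen only for illustration: `FlagConfig` is an in-memory stand-in for a dynamic configuration service, and `rich_results` is an invented feature. The percentage-based bucketing is one common way to roll a flag out gradually, not necessarily what Google uses.

```python
import hashlib


class FlagConfig:
    """Hypothetical in-memory stand-in for a dynamic configuration service."""

    def __init__(self):
        self._rollout_percent = {}  # flag name -> percentage of users enabled

    def set_rollout(self, flag, percent):
        """Update a flag value; in a real system this would be a config push."""
        if not 0 <= percent <= 100:
            raise ValueError("percent must be between 0 and 100")
        self._rollout_percent[flag] = percent

    def is_enabled(self, flag, user_id):
        """Deterministically place each user in a bucket from 0 to 99."""
        percent = self._rollout_percent.get(flag, 0)  # unknown flags are off
        digest = hashlib.sha256(f"{flag}:{user_id}".encode("utf-8")).digest()
        bucket = int.from_bytes(digest[:8], "big") % 100
        return bucket < percent


def render_results_old(query):
    return f"results for {query}"


def render_results_new(query):
    return f"rich results for {query}"


def render_results(config, user_id, query):
    # Both codepaths ship in the same binary; the flag picks one at runtime.
    if config.is_enabled("rich_results", user_id):
        return render_results_new(query)
    return render_results_old(query)
```

Because a user's bucket depends only on the flag name and user ID, raising the percentage from 10 to 50 keeps the original 10% enabled and adds more users, and setting it back to 0 turns the feature off for everyone without a new release.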
Turning on a flag for 100% of your users all at once is not a great idea, so a configuration service that manages safe configuration rollouts is a good investment. Nevertheless, the level of control and the ability to decouple the destiny of a particular feature from the overall product release are powerful levers for long-term sustainability of the application. 109 | 110 | 这就是标志守卫的优势所在。如果新的代码有一个标志,标在发布新功能之前可以立即更新该标志,从而最大限度地减少泄露功能的风险。请注意,对于真正敏感的功能,有标志的代码并不是一个完美的安全网。如果代码没有被很好地混淆,它仍然可以被抓取和分析,而且不是所有的功能都可以隐藏在标志后面而不增加太多复杂性。此外,即使是标志配置的改变,也必须谨慎地推出。一次性为100%的用户打开一个标志并不是一个好主意,所以一个能管理安全配置推出的配置服务是一个很好的投资。尽管如此,对于长期可持续性的应用程序而言,控制的水平和将特定功能的命运与整体产品发布分离的能力是强大的杠杆作用。 111 | 112 | ## Striving for Agility: Setting Up a Release Train 为敏捷而奋斗:建立一个发布序列 113 | 114 | Google’s Search binary is its first and oldest. Large and complicated, its codebase can be tied back to Google’s origin—a search through our codebase can still find code written at least as far back as 2003, often earlier. When smartphones began to take off, feature after mobile feature was shoehorned into a hairball of code written primarily for server deployment. Even though the Search experience was becoming more vibrant and interactive, deploying a viable build became more and more difficult. At one point, we were releasing the Search binary into production only once per week, and even hitting that target was rare and often based on luck. 115 | 116 | 谷歌搜索是其最早也是最古老的二进制文件。它的代码库庞大而复杂,可以追溯到谷歌的起源--在我们的代码库中搜索,仍然可以找到至少早在2003年编写的代码,通常更早。当智能手机开始使用时,一个接一个的移动功能被塞进了一大堆主要为服务器部署而编写的代码中。尽管搜索体验变得更加生动和互动,部署一个可行的构建变得越来越困难。有一次,我们每周只发布一次搜索二进制文件到生产中,而即使达到这个目标也是很难得的,而且往往要靠运气。 117 | 118 | When one of our contributing authors, Sheri Shipe, took on the project of increasing our release velocity in Search, each release cycle was taking a group of engineers days to complete. They built the binary, integrated data, and then began testing. 
Each bug had to be manually triaged to make sure it wouldn’t impact Search quality, the user experience (UX), and/or revenue. This process was grueling and time consuming and did not scale to the volume or rate of change. As a result, a developer could never know when their feature was going to be released into production. This made timing press releases and public launches challenging. 119 | 120 | 当我们的贡献作者之一Sheri Shipe承担了提高搜索发布速度的项目时,每个发布周期都需要一组工程师几天才能完成。他们构建了二进制集成数据,然后开始测试。每一个bug都必须进行手动分类,以确保它不会影响搜索质量、用户体验(UX)和/或收入。这一过程既费时又费力,而且不能适应变化的数量和速度。因此,开发人员永远不可能知道他们的特性将在何时发布到生产环境中。这使得新闻发布和公开发布的时间安排具有挑战性。 121 | 122 | Releases don’t happen in a vacuum, and having reliable releases makes the dependent factors easier to synchronize. Over the course of several years, a dedicated group of engineers implemented a continuous release process, which streamlined everything about sending a Search binary into the world. We automated what we could, set deadlines for submitting features, and simplified the integration of plug-ins and data into the binary. We could now consistently release a new Search binary into production every other day. 123 | 124 | 发布不是在真空中发生的,拥有可靠的发布使得依赖性因素更容易同步。在几年的时间里,一个专门的工程师小组实施了一个持续的发布过程,它简化了关于向世界发送搜索二进制文件的所有工作。我们把我们能做的事情自动化,为提交功能设定最后期限,并简化插件和数据到二进制文件的集成。我们现在可以每隔一天发布一个新的搜索二进制文件。 125 | 126 | What were the trade-offs we made to get predictability in our release cycle? They narrow down to two main ideas we baked into the system. 127 | 128 | 为了在发布周期中获得可预测性,我们做了哪些权衡?他们把我们融入系统的两个主要想法归纳了下来。 129 | 130 | ### No Binary Is Perfect 没有完美的二进制包 131 | 132 | The first is that *no binary is perfect*, especially for builds that are incorporating the work of tens or hundreds of developers independently developing dozens of major features. Even though it’s impossible to fix every bug, we constantly need to weigh questions such as: If a line has been moved two pixels to the left, will it affect an ad display and potential revenue? 
What if the shade of a box has been altered slightly? Will it make it difficult for visually impaired users to read the text? The rest of this book is arguably about minimizing the set of unintended outcomes for a release, but in the end we must admit that software is fundamentally complex. There is no perfect binary—decisions and trade-offs have to be made every time a new change is released into production. Key performance indicator metrics with clear thresholds allow features to launch even if they aren’t perfect[^1] and can also create clarity in otherwise contentious launch decisions. 133 | 134 | 首先,*没有一个二进制包是完美的*,,尤其是对于包含数十个或数百个独立开发几十个主要功能的开发人员的工作的构建。尽管不可能修复每个bug,但我们需要不断权衡这样的问题:如果一条线向左移动了两个像素,它会影响广告显示和潜在收入吗?如果盒子的颜色稍微改变了怎么办?这是否会让视障用户难以阅读文本?本书的其余部分可以说是关于最小化发布的一系列意外结果,但最终我们必须承认软件从根本上来说是复杂的。没有完美的二进制包--每当有新的变化发布到生产中时,就必须做出决定和权衡。具有明确阈值的关键性能指标允许功能在不完美的情况下推出,也可以在其他有争议的发布决策中创造清晰的思路。 135 | 136 | One bug involved a rare dialect spoken on only one island in the Philippines. If a user asked a search question in this dialect, instead of an answer to their question, they would get a blank web page. We had to determine whether the cost of fixing this bug was worth delaying the release of a major new feature. 137 | 138 | 其中一个bug涉及一种罕见的方言,这种方言只在菲律宾的一个岛屿上使用。如果用户用这种方言问搜索问题,而不是回答他们的问题,他们会得到一个空白网页。我们必须确定修复这个bug的成本是否值得推迟发布一个重要的新特性。 139 | 140 | We ran from office to office trying to determine how many people actually spoke this language, if it happened every time a user searched in this language, and whether these folks even used Google on a regular basis. Every quality engineer we spoke with deferred us to a more senior person. Finally, data in hand, we put the question to Search’s senior vice president. Should we delay a critical release to fix a bug that affected only a very small Philippine island? It turns out that no matter how small your island, you should get reliable and accurate search results: we delayed the release and fixed the bug. 
141 | 142 | 我们奔走于各个办公室,试图确定究竟有多少人讲这种语言,是否每次用户用这种语言搜索时都会出现这种情况,以及这些人是否经常使用谷歌。每个与我们交谈的质量工程师都把我们推给更高级别的人。最后,数据在手,我们把问题交给了搜索部的高级副总裁。我们是否应该推迟一个重要的版本来修复一个只影响到菲律宾一个很小的岛屿的错误?事实证明,无论你的岛有多小,你都应该得到可靠和准确的搜索结果:我们推迟了发布,并修复了这个错误。 143 | 144 | > [^1]: Remember the SRE “error-budget” formulation: perfection is rarely the best goal. Understand how much room for error is acceptable and how much of that budget has been spent recently and use that to adjust the trade-off between velocity and stability. 145 | > 146 | > 1 记住SRE的 "错误预算 "表述:完美很少是最佳目标。了解多少误差空间是可以接受的,以及该预算最近花了多少,并利用这一点来调整速度和稳定性之间的权衡。 147 | 148 | ### Meet Your Release Deadline 满足你的发布期限 149 | 150 | The second idea is that *if you’re late for the release train, it will leave without you*. There’s something to be said for the adage, “deadlines are certain, life is not.” At some point in the release timeline, you must put a stake in the ground and turn away developers and their new features. Generally speaking, no amount of pleading or begging will get a feature into today’s release after the deadline has passed. 151 | 152 | 第二个想法是,*如果你赶不上发布列车,它就会丢下你离开*。有一句格言值得一提,“最后期限是确定的,而生活不是。”在发布时间表的某个时刻,你必须立木取信,拒绝开发人员及其新功能。一般来说,在截止日期过后,无论多少恳求或乞求都不会在今天的版本中出现。 153 | 154 | There is the *rare* exception. The situation usually goes like this. It’s late Friday evening and six software engineers come storming into the release manager’s cube in a panic. They have a contract with the NBA and finished the feature moments ago. But it must go live before the big game tomorrow. The release must stop and we must cherry-pick the feature into the binary or we’ll be in breach of contract! A bleary-eyed release engineer shakes their head and says it will take four hours to cut and test a new binary. It’s their kid’s birthday and they still need to pick up the balloons.
155 | 156 | 有一个*罕见的例外*。这种情况通常是这样的。周五晚间,六名软件工程师惊慌失措地冲进发布经理的办公室。他们与NBA签订了合同,并在不久前完成了这个功能。但它必须在明天的大比赛之前上线。发布必须停止,我们必须将该特性插入二进制包,否则我们将违反合同!"。一个目光呆滞的发布工程师摇摇头,说切割和测试一个新的二进制文件需要四个小时。今天是他们孩子的生日,他们还需要带着气球回家。 157 | 158 | A world of regular releases means that if a developer misses the release train, they’ll be able to catch the next train in a matter of hours rather than days. This limits developer panic and greatly improves work–life balance for release engineers. 159 | 160 | 定期发布的世界意味着,如果开发人员错过了发版班车,他们将能够在几个小时而不是几天内赶上下一班班车。这限制了开发人员的恐慌,并大大改善了发布工程师的工作与生活平衡。 161 | 162 | ## Quality and User-Focus: Ship Only What Gets Used 质量和用户关注点:只提供使用的产品 163 | 164 | Bloat is an unfortunate side effect of most software development life cycles, and the more successful a product becomes, the more bloated its code base typically becomes. One downside of a speedy, efficient release train is that this bloat is often magnified and can manifest in challenges to the product team and even to the users. Especially if the software is delivered to the client, as in the case of mobile apps, this can mean the user’s device pays the cost in terms of space, download, and data costs, even for features they never use, whereas developers pay the cost of slower builds, complex deployments, and rare bugs. In this section, we’ll talk about how dynamic deployments allow you to ship only what is used, forcing necessary trade-offs between user value and feature cost. At Google, this often means staffing dedicated teams to improve the efficiency of the product on an ongoing basis. 165 | 166 | 臃肿是大多数软件开发生命周期的一个不幸的副作用,产品越成功,其代码库通常就越臃肿。快速、高效的发布系列的一个缺点是,这种臃肿经常被放大,并可能表现为对产品团队甚至用户的挑战。特别是如果软件交付给客户端(如移动应用程序),这可能意味着用户的设备要支付空间、下载和数据成本,即使是他们从未使用过的功能,而开发人员要支付构建速度较慢、部署复杂和罕见bug的成本。在本节中,我们将讨论动态部署如何允许你仅发布所使用的内容,从而在用户价值和功能成本之间进行必要的权衡。在谷歌,这通常意味着配备专门的团队,以不断提高产品的效率。 167 | 168 | Whereas some products are web-based and run on the cloud, many are client applications that use shared resources on a user’s device—a phone or tablet. 
This choice in itself showcases a trade-off between native apps that can be more performant and resilient to spotty connectivity, but also more difficult to update and more susceptible to platform-level issues. A common argument against frequent, continuous deployment for native apps is that users dislike frequent updates and must pay for the data cost and the disruption. There might be other limiting factors such as access to a network or a limit to the reboots required to percolate an update. 169 | 170 | 有些产品是基于网络并在云上运行的,而许多产品是客户端应用程序,使用用户设备上的共享资源--手机或平板电脑。这种选择本身就展示了原生应用之间的权衡,原生应用可以有更高的性能,对不稳定的连接有弹性,但也更难更新,更容易受到平台问题的影响。反对原生应用频繁、持续部署的一个常见论点是,用户不喜欢频繁的更新,而且必须为数据成本和中断付费。可能还有其他限制因素,如访问网络或限制渗透更新所需的重新启动。 171 | 172 | Even though there is a trade-off to be made in terms of how frequently to update a product, the goal is to *have these choices be intentional*. With a smooth, well-running CD process, how often a viable release is *created* can be separated from how often a user *receives* it. You might achieve the goal of being able to deploy weekly, daily, or hourly, without actually doing so, and you should intentionally choose release processes in the context of your users’ specific needs and the larger organizational goals, and determine the staffing and tooling model that will best support the long-term sustainability of your product. 173 | 174 | 即使在更新产品的频率方面需要做出权衡,但目标是*让这些选择是有意的决策*。有了一个平滑、运行良好的CD流程,创建一个可行版本的频率可以与用户收到它的频率分开。你可能会实现每周、每天或每小时部署一次的目标,但实际上并没有这样做。你应该根据用户的具体需求和更大的组织目标有意识地选择发布流程,并确定最能支持产品长期可持续性的人员配置和工具模型。 175 | 176 | Earlier in the chapter, we talked about keeping your code modular. This allows for dynamic, configurable deployments that allow better utilization of constrained resources, such as the space on a user’s device. In the absence of this practice, every user must receive code they will never use to support translations they don’t need or architectures that were meant for other kinds of devices. 
Dynamic deployments allow apps to maintain small sizes while only shipping code to a device that brings its users value, and A/B experiments allow for intentional trade-offs between a feature’s cost and its value to users and your business. 177 | 178 | 在本章的前面部分,我们讨论了保持代码模块化。这允许动态、可配置的部署,以便更好地利用有限资源,例如用户设备上的空间。在没有这种实践的情况下,每个用户都必须收到他们永远不会使用的代码,以支持他们不需要的翻译或用于其他类型设备的架构。动态部署允许应用程序保持较小的尺寸,同时只将代码发送给能为用户带来价值的设备,而A/B实验允许在功能的成本及其对用户和企业的价值之间进行有意义的权衡。 179 | 180 | There is an upfront cost to setting up these processes, and identifying and removing frictions that keep the frequency of releases lower than is desirable is a painstaking process. But the long-term wins in terms of risk management, developer velocity, and enabling rapid innovation are so high that these initial costs become worthwhile. 181 | 182 | 建立这些流程是有前期成本的,识别和消除使发布频率低于理想水平的阻力是一个艰苦的工作。但是,在风险管理、开发者速度和实现快速创新方面的长期胜利是如此之高,以至于这些初始成本是值得的。 183 | 184 | ## Shifting Left: Making Data-Driven Decisions Earlier 左移:提前做出数据驱动的决策 185 | 186 | If you’re building for all users, you might have clients on smart screens, speakers, or Android and iOS phones and tablets, and your software may be flexible enough to allow users to customize their experience. Even if you’re building for only Android devices, the sheer diversity of the more than two billion Android devices can make the prospect of qualifying a release overwhelming. And with the pace of innovation, by the time someone reads this chapter, whole new categories of devices might have bloomed. 187 | 188 | 如果你是为所有用户建立应用程序,你可能在智能屏幕、扬声器或Android和iOS手机和平板电脑上有客户,你的软件可能足够灵活,允许用户定制他们的体验。即使你只为安卓设备构建,超过20亿的安卓设备的多样性也会使一个版本的场景变得不堪重负。随着创新的步伐,当有人读到这一章时,全新的设备类别可能已经出现。 189 | 190 | One of our release managers shared a piece of wisdom that turned the situation around when he said that the diversity of our client market was not a *problem*, but a *fact*. 
After we accepted that, we could switch our release qualification model in the following ways: 191 | 192 | - If *comprehensive* testing is practically infeasible, aim for *representative* testing instead. 193 | - Staged rollouts to slowly increasing percentages of the userbase allow for fast fixes. 194 | - Automated A/B releases allow for statistically significant results proving a release’s quality, without tired humans needing to look at dashboards and make decisions. 195 | 196 | 我们的一位发布经理分享了一条智慧,他说我们客户市场的多样性不是问题,而是事实,这扭转了局面。在我们接受后,我们可以通过以下方式切换我们的发布资格模型: 197 | 198 | - 如果*全面*的测试实际上是不可行的,就以*代表性*的测试为目标。 199 | - 分阶段向用户群中慢慢增加的百分比进行发布,可以快速修复问题。 200 | - 自动的A/B发布允许统计学上有意义的结果来证明一个版本的质量,而无需疲惫的人去看仪表盘和做决定。 201 | 202 | When it comes to developing for Android clients, Google apps use specialized testing tracks and staged rollouts to an increasing percentage of user traffic, carefully monitoring for issues in these channels. Because the Play Store offers unlimited testing tracks, we can also set up a QA team in each country in which we plan to launch, allowing for a global overnight turnaround in testing key features. 203 | 204 | 在为Android客户端开发时,谷歌应用程序使用专门的测试轨道和分阶段推出,以增加用户流量的百分比,仔细监控这些渠道中的问题。由于Play Store提供无限的测试轨道,我们还可以在我们计划推出的每个国家/地区建立一个QA团队,允许在全球范围内一夜之间完成关键功能的测试。 205 | 206 | One issue we noticed when doing deployments to Android was that we could expect a statistically significant change in user metrics *simply from pushing an update*. This meant that even if we made no changes to our product, pushing an update could affect device and user behavior in ways that were difficult to predict. As a result, although canarying the update to a small percentage of user traffic could give us good information about crashes or stability problems, it told us very little about whether the newer version of our app was in fact better than the older one. 
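The staged rollouts described above depend on giving each user a stable place in the traffic, so that raising the percentage only ever adds users. A minimal Python sketch of one way to do that, assuming hash-based bucketing; the function names, the salt, and the basis-point granularity are illustrative assumptions, not the mechanism Google or the Play Store actually uses:

```python
import hashlib


def rollout_bucket(user_id: str, salt: str) -> int:
    """Map a user to a stable bucket in [0, 10000), i.e. basis points.

    The salt (hypothetical, e.g. one per release) keeps the same users
    from always landing in the earliest stage of every rollout.
    """
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).digest()
    # Interpret the first 8 bytes as an unsigned integer and reduce it
    # to 10,000 buckets so percentages like 0.5% are representable.
    return int.from_bytes(digest[:8], "big") % 10_000


def in_rollout(user_id: str, salt: str, percent: float) -> bool:
    """Return True if this user should receive the release at `percent`."""
    return rollout_bucket(user_id, salt) < percent * 100


if __name__ == "__main__":
    users = [f"user-{i}" for i in range(10_000)]
    for stage in (1, 5, 25, 100):
        reached = sum(in_rollout(u, "release-42", stage) for u in users)
        print(f"stage {stage:>3}%: {reached} of {len(users)} users")
```

Because a user's bucket never changes for a given salt, anyone who received the release at 5% still has it at 25%, which keeps crash and stability metrics comparable from one stage to the next.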
207 | 208 | 我们在Android部署时注意到的一个问题是,仅仅通过推送更新,我们就可以预期用户指标会发生统计上的显著变化。这意味着,即使我们没有对产品进行任何更改,推动更新也可能以难以预测的方式影响设备和用户行为。因此,尽管对一小部分用户流量进行更新可以为我们提供关于崩溃或稳定性问题的良好信息,但它几乎没有告诉我们更新版本的应用程序是否比旧版本更好。 209 | 210 | Dan Siroker and Pete Koomen have already discussed the value of A/B testing[^2] your features, but at Google, some of our larger apps also A/B test their *deployments*. This means sending out two versions of the product: one that is the desired update, with the baseline being a placebo (your old version just gets shipped again). As the two versions roll out simultaneously to a large enough base of similar users, you can compare one release against the other to see whether the latest version of your software is in fact an improvement over the previous one. With a large enough userbase, you should be able to get statistically significant results within days, or even hours. An automated metrics pipeline can enable the fastest possible release by pushing forward a release to more traffic as soon as there is enough data to know that the guardrail metrics will not be affected. 211 | 212 | Dan Siroker和Pete Koomen已经讨论了A/B测试的价值,但在Google,我们的一些大型应用也对其*部署*进行A/B测试。这意味着发送两个版本的产品:一个是所需的更新,基线是一个安慰剂(你的旧版本只是被再次发送)。当这两个版本同时向足够多的类似用户推出时,你可以将一个版本与另一个版本进行比较,看看你的软件的最新版本是否真的比以前的版本有所改进。有了足够大的用户群,你应该能够在几天内,甚至几小时内得到统计学上的显著结果。一个自动化的指标管道可以实现最快的发布,只要有足够的数据知道护栏指标不会受到影响,就可以将一个版本推到更多的流量。 213 | 214 | Obviously, this method does not apply to every app and can be a lot of overhead when you don’t have a large enough userbase. In these cases, the recommended best practice is to aim for change-neutral releases. All new features are flag guarded so that the only change being tested during a rollout is the stability of the deployment itself. 215 | 216 | 显然,这种方法并不适用于每个应用程序,当你没有足够大的用户群时,可能会有很多开销。在这种情况下,推荐的最佳做法是以变化中立的发布为目标。所有的新功能都有标志保护,这样在发布过程中测试的唯一变化就是部署本身的稳定性。 217 | 218 | > [^2]: Dan Siroker and Pete Koomen, *A/B Testing: The Most Powerful Way to Turn Clicks Into Customers* (Hoboken: Wiley, 2013). 
219 | > 220 | > 2 Dan Siroker和Pete Koomen,《A/B测试:将点击转化为客户的最有效方式》(Hoboken:Wiley,2013)。 221 | 222 | ## Changing Team Culture: Building Discipline into Deployment 改变团队文化:在部署中建立规则 223 | 224 | Although “Always Be Deploying” helps address several issues affecting developer velocity, there are also certain practices that address issues of scale. The initial team launching a product can be fewer than 10 people, each taking turns at deployment and production-monitoring responsibilities. Over time, your team might grow to hundreds of people, with subteams responsible for specific features. As this happens and the organization scales up, the number of changes in each deployment and the amount of risk in each release attempt is increasing superlinearly. Each release contains months of sweat and tears. Making the release successful becomes a high-touch and labor-intensive effort. Developers can often be caught trying to decide which is worse: abandoning a release that contains a quarter’s worth of new features and bug fixes, or pushing out a release without confidence in its quality. 225 | 226 | 尽管“始终进行部署”有助于解决影响开发人员速度的几个问题,但也有一些实践可以解决规模问题。启动产品的初始团队可以少于10人,每个人轮流负责部署和生产监控。随着时间的推移,你的团队可能会发展到数百人,其中的子团队负责特定的功能。随着这种情况的发生和组织规模的扩大,每次部署中的更改数量和每次发布尝试中的风险量都呈超线性增长。每次发布都包含数月的汗水和泪水。使发布成功成为一项高度沟通和劳动密集型的工作。开发人员经常会被发现试图决定哪一个更糟糕:放弃一个包含四分之一新特性和错误修复的版本,或者在对其质量没有信心的情况下推出一个版本。 227 | 228 | At scale, increased complexity usually manifests as increased release latency. Even if you release every day, a release can take a week or longer to fully roll out safely, leaving you a week behind when trying to debug any issues. This is where “Always Be Deploying” can return a development project to effective form. Frequent release trains allow for minimal divergence from a known good position, with the recency of changes aiding in resolving issues. But how can a team ensure that the complexity inherent with a large and quickly expanding codebase doesn’t weigh down progress? 
229 | 230 | 在规模扩大时,复杂性的增加通常表现为发布延迟的增加。即使你每天都发布,一个版本也可能需要一周或更长的时间才能完全安全地发布,当你试图调试任何问题时,就会落后一周。这就是 "始终进行部署 "可以使开发项目恢复到有效状态的地方。频繁的发版班车允许从一个已知的良好状态中产生最小的偏差,变化的频率有助于解决问题。但是,一个团队如何才能确保一个庞大而快速扩展的代码库所固有的复杂性不会拖累进度呢? 231 | 232 | On Google Maps, we take the perspective that features are very important, but only very seldom is any feature so important that a release should be held for it. If releases are frequent, the pain a feature feels for missing a release is small in comparison to the pain all the new features in a release feel for a delay, and especially the pain users can feel if a not-quite-ready feature is rushed to be included. 233 | 234 | 在谷歌地图上,我们的观点是:功能是非常重要的,但只有在非常少的情况下,才会有如此重要的功能需要发布。如果发布的频率很高,那么一个功能因为错过了一个版本而感到的痛苦与一个版本中所有的新功能因为延迟而感到的痛苦相比是很小的,特别是如果一个还没有准备好的功能被匆忙纳入,用户会感到痛苦。 235 | 236 | One release responsibility is to protect the product from the developers. 237 | 238 | 一个发布责任是保护产品不受开发人员的影响。 239 | 240 | When making trade-offs, the passion and urgency a developer feels about launching a new feature can never trump the user experience with an existing product. This means that new features must be isolated from other components via interfaces with strong contracts, separation of concerns, rigorous testing, communication early and often, and conventions for new feature acceptance. 241 | 242 | 在进行权衡时,开发人员对推出新功能的热情和紧迫感永远不能超过对现有产品的用户体验。这意味着,新功能必须通过具有强大契约的接口、关注点分离、严格测试、早期和经常的沟通以及新功能验收的惯例,与其他组件隔离。 243 | 244 | ## Conclusion 总结 245 | 246 | Over the years and across all of our software products, we’ve found that, counterintuitively, faster is safer. The health of your product and the speed of development are not actually in opposition to each other, and products that release more frequently and in small batches have better quality outcomes. They adapt faster to bugs encountered in the wild and to unexpected market shifts. 
Not only that, faster is *cheaper*, because having a predictable, frequent release train forces you to drive down the cost of each release and makes the cost of any abandoned release very low. 247 | 248 | 多年来,在我们所有的软件产品中,我们发现,相反,更快更安全。你的产品的健康状况和开发速度实际上并不是相互对立的,更频繁和小批量发布的产品具有更好的质量结果。它们能更快地适应在野外遇到的错误和意外的市场变化。不仅如此,更快就是*便宜*,因为有一个可预测的、频繁的发版班车,迫使你降低每个版本的成本,并使任何被放弃的发布的成本非常低。 249 | 250 | Simply having the structures in place that *enable* continuous deployment generates the majority of the value, *even if you don’t actually push those releases out to users*. What do we mean? We don’t actually release a wildly different version of Search, Maps, or YouTube every day, but to be able to do so requires a robust, well-documented continuous deployment process, accurate and real-time metrics on user satisfaction and product health, and a coordinated team with clear policies on what makes it in or out and why. In practice, getting this right often also requires binaries that can be configured in production, configuration managed like code (in version control), and a toolchain that allows safety measures like dry-run verification, rollback/rollforward mechanisms, and reliable patching. 251 | 252 | 仅仅拥有*能够*持续部署的结构,就能产生大部分的价值,*即使你没有真正把这些版本推送给用户*。我们的意思是什么呢?我们实际上并不是每天都发布一个完全不同的搜索、地图或YouTube的版本,但要做到这一点,就需要一个健壮的、有良好文档记录的连续部署过程、关于用户满意度和产品健康状况的准确实时指标,以及一个协调一致的团队,该团队拥有明确的策略,以确定成功与否以及原因。在实践中,要做到这一点,往往还需要可以在生产中配置的二进制包,像代码一样管理的配置(在版本控制中),以及一个可以采取安全措施的工具链,如干运行验证、回滚/前滚机制和可靠的补丁。 253 | 254 | ## TL;DRs 内容提要 255 | 256 | - *Velocity is a team sport*: The optimal workflow for a large team that develops code collaboratively requires modularity of architecture and near-continuous integration. 257 | - Evaluate changes in isolation: Flag guard any features to be able to isolate problems early. 258 | - Make reality your benchmark: Use a staged rollout to address device diversity and the breadth of the userbase.
Release qualification in a synthetic environment that isn’t similar to the production environment can lead to late surprises. 259 | - Ship only what gets used: Monitor the cost and value of any feature in the wild to know whether it’s still relevant and delivering sufficient user value. 260 | - Shift left: Enable faster, more data-driven decision making earlier on all changes through CI and continuous deployment. 261 | - Faster is safer: Ship early and often and in small batches to reduce the risk of each release and to minimize time to market. 262 | 263 | - *速度是一项团队运动*。协作开发代码的大型团队的最佳工作流程需要架构的模块化和近乎连续的集成。 264 | - 孤立地评估变化。对任何功能进行标记,以便能够尽早隔离问题。 265 | - 让现实成为你的基准。使用分阶段推出的方式来解决设备的多样性和用户群的广泛性。在一个与生产环境不相似的合成环境中进行发布鉴定,会导致后期的意外。 266 | - 只发布被使用的东西。监控任何功能的成本和价值,以了解它是否仍有意义,是否能提供足够的用户价值。 267 | - 向左移动。通过CI和持续部署,使所有的变化更快,更多的数据驱动的决策更早。 268 | - 更快是更安全的。尽早地、经常地、小批量地发布,以减少每次发布的风险,并尽量缩短上市时间。 269 | -------------------------------------------------------------------------------- /zh-cn/Chapter-4_Engineering_for_Equity/Chapter-4_Engineering_for_Equity.md: -------------------------------------------------------------------------------- 1 | 2 | **CHAPTER 4** 3 | 4 | # Engineering for Equity 5 | 6 | # 第四章 公平工程 7 | 8 | **Written by Demma Rodriguez** 9 | 10 | **Edited by Riona MacNamara** 11 | 12 | In earlier chapters, we’ve explored the contrast between programming as the production of code that addresses the problem of the moment, and software engineering as the broader application of code, tools, policies, and processes to a dynamic and ambiguous problem that can span decades or even lifetimes. In this chapter, we’ll discuss the unique responsibilities of an engineer when designing products for a broad base of users. Further, we evaluate how an organization, by embracing diversity, can design systems that work for everyone, and avoid perpetuating harm against our users. 
13 | 14 | 在前几章中,我们已经探讨了编程与软件工程之间的对比,前者是解决当下问题的代码生产,后者则是对代码、工具、策略和流程的更广泛的应用,以解决可能跨越几十年甚至一生的动态且不确定的问题。在本章中,我们将讨论工程师在为众多用户设计产品时的独特责任。此外,我们还将评估一个组织如何通过拥抱多样性来设计适合每个人的系统,并避免对我们的用户造成永久性的伤害。 15 | 16 | As new as the field of software engineering is, we’re newer still at understanding the impact it has on underrepresented people and diverse societies. We did not write this chapter because we know all the answers. We do not. In fact, understanding how to engineer products that empower and respect all our users is still something Google is learning to do. We have had many public failures in protecting our most vulnerable users, and so we are writing this chapter because the path forward to more equitable products begins with evaluating our own failures and encouraging growth. 17 | 18 | 尽管软件工程领域是个全新领域,但我们在了解它对代表性不足的群体和多元化社会的影响方面还比较浅。我们写这一章并不是因为我们知道所有的答案。我们不知道。事实上,了解如何设计出能够赋予所有用户权益并尊重所有用户的产品仍然是谷歌正在学习做的事情。在保护我们最弱势的用户方面,我们有很多公开的失败产品,所以我们写这一章是因为通往更平等的产品的道路始于评估我们自己的失败和鼓励成长。 19 | 20 | We are also writing this chapter because of the increasing imbalance of power between those who make development decisions that impact the world and those who simply must accept and live with those decisions that sometimes disadvantage already marginalized communities globally. It is important to share and reflect on what we’ve learned so far with the next generation of software engineers. It is even more important that we help influence the next generation of engineers to be better than we are today. 21 | 22 | 我们之所以要写这一章,也是因为在那些做出影响世界发展的人和那些只能选择接受并忍受这些决定的人之间,力量越来越不平衡,这些决定有时使全球已经边缘化的社区处于不利地位。与下一代软件工程师分享和反思我们迄今所学到的知识是很重要的。更重要的是,我们帮助影响下一代工程师,使他们比我们今天做得更好。 23 | 24 | Just picking up this book means that you likely aspire to be an exceptional engineer. You want to solve problems. You aspire to build products that drive positive outcomes for the broadest base of people, including people who are the most difficult to reach. 
To do this, you will need to consider how the tools you build will be leveraged to change the trajectory of humanity, hopefully for the better. 25 | 26 | 只要拿起这本书,就意味着你可能渴望成为一名卓越的工程师。你想解决问题。你渴望建造产品,为最广泛的人群,包括最难接触的人,打造一个能带来积极成果的产品。要做到这一点,你需要考虑如何利用你建造的工具来改变人类的轨迹,希望是为了获得更好的发展。 27 | 28 | ## Bias Is the Default 偏见是默认的 29 | 30 | When engineers do not focus on users of different nationalities, ethnicities, races, genders, ages, socioeconomic statuses, abilities, and belief systems, even the most talented staff will inadvertently fail their users. Such failures are often unintentional; all people have certain biases, and social scientists have recognized over the past several decades that most people exhibit unconscious bias, enforcing and promulgating existing stereotypes. Unconscious bias is insidious and often more difficult to mitigate than intentional acts of exclusion. Even when we want to do the right thing, we might not recognize our own biases. By the same token, our organizations must also recognize that such bias exists and work to address it in their workforces, product development, and user outreach. 31 | 32 | 当工程师不关注不同国籍、民族、种族、性别、年龄、社会经济地位、能力和信仰体系的用户时,即使是最优秀的工程师也会无意中让他们的用户失望。这种失败往往是无意的;所有的人都存在一定的偏见,社会科学家在过去几十年中已经认识到,大多数人都表现出无意识的偏见,强迫和传播存在的刻板印象。无意识的偏见是隐藏的,往往比有意的排斥行为更难改正。即使我们想做正确的事,我们也可能意识不到自己的偏见。同样,我们的组织也必须认识到这种偏见的存在,并努力在员工队伍、产品开发和用户推广中解决这一问题。 33 | 34 | Because of bias, Google has at times failed to represent users equitably within their products, with launches over the past several years that did not focus enough on underrepresented groups. Many users attribute our lack of awareness in these cases to the fact that our engineering population is mostly male, mostly White or Asian, and certainly not representative of all the communities that use our products. 
The lack of representation of such users in our workforce[^1] means that we often do not have the requisite diversity to understand how the use of our products can affect underrepresented or vulnerable users. 35 | 36 | 由于偏见,谷歌有时未能在其产品中公平地代表用户,在过去几年中推出的产品没有足够关注代表性不足的群体。许多用户将我们在这些情况下缺乏意识归咎于这样一个事实,即我们的工程人员大多数是男性,大多数是白人或亚洲人,当然不能代表所有使用我们产品的人群。这类用户在我们的员工队伍中缺乏代表性,这意味着我们往往不具备必要的多样性,无法理解使用我们的产品会如何影响代表性不足或弱势的用户。 37 | 38 | ------ 39 | 40 | #### Case Study: Google Misses the Mark on Racial Inclusion 案例研究:谷歌在种族包容方面的失误 41 | 42 | In 2015, software engineer Jacky Alciné pointed out[^2] that the image recognition algorithms in Google Photos were classifying his black friends as “gorillas.” Google was slow to respond to these mistakes and incomplete in addressing them. 43 | 44 | 2015年,软件工程师Jacky Alciné指出,谷歌照片中的图像识别算法将其黑人朋友错误地分类为‘大猩猩’。谷歌对这些错误的反应很慢,解决起来也不彻底。 45 | 46 | What caused such a monumental failure? Several things: 47 | - Image recognition algorithms depend on being supplied a “proper” (often meaning “complete”) dataset. The photo data fed into Google’s image recognition algorithm was clearly incomplete. In short, the data did not represent the population. 48 | - Google itself (and the tech industry in general) did not (and does not) have much black representation,[^3] and that affects subjective decisions in the design of such algorithms and the collection of such datasets. The unconscious bias of the organization itself likely led to a more representative product being left on the table. 49 | - Google’s target market for image recognition did not adequately include such underrepresented groups. Google’s tests did not catch these mistakes; as a result, our users did, which both embarrassed Google and harmed our users.
50 | 51 | 是什么导致了这样一个巨大的失误?有几件事: 52 | - 图像识别算法取决于是否提供了一个 "适当的"(通常意味着 "完整的")数据集。送入谷歌图像识别算法的照片数据显然是不完整的。简而言之,这些数据并不代表所有人口。 53 | - 谷歌本身(以及整个科技行业)过去没有(现在也没有)很多黑人代表,这影响了设计这种算法和收集这种数据集的主观决定。组织本身无意识的偏见很可能导致更具代表性的产品被搁置。 54 | - 谷歌的图像识别目标市场并没有充分包括这种代表性不足的群体。谷歌的测试没有发现这些错误;结果是我们的用户发现了,这既让谷歌感到尴尬,也伤害了我们的用户。 55 | 56 | As late as 2018, Google still had not adequately addressed the underlying problem.[^4] 57 | 58 | 直到2018年,谷歌仍然没有彻底地解决这些潜在的问题。 59 | 60 | ------ 61 | 62 | In this example, our product was inadequately designed and executed, failing to properly consider all racial groups, and as a result, failed our users and caused Google bad press. Other technology suffers from similar failures: autocomplete can return offensive or racist results. Google’s Ad system could be manipulated to show racist or offensive ads. YouTube might not catch hate speech, though it is technically outlawed on that platform. 63 | 64 | 在这个例子中,我们的产品设计和执行不充分,未能适当考虑到所有的种族群体,结果是辜负了我们的用户,给谷歌带来了恶劣的影响。其他技术也有类似的失误:自动完成补全可以返回攻击性或种族主义的结果。谷歌的广告系统可以被操纵来显示种族主义或攻击性广告。YouTube可能没有识别到仇恨言论,尽管从技术上讲,它在该平台上是非法的。 65 | 66 | In all of these cases, the technology itself is not really to blame. Autocomplete, for example, was not designed to target users or to discriminate. But it was also not resilient enough in its design to exclude discriminatory language that is considered hate speech. As a result, the algorithm returned results that caused harm to our users. The harm to Google itself should also be obvious: reduced user trust and engagement with the company. For example, Black, Latinx, and Jewish applicants could lose faith in Google as a platform or even as an inclusive environment itself, therefore undermining Google’s goal of improving representation in hiring. 67 | 68 | 在所有这些情况下,技术本身并不是真正的罪魁祸首。例如,自动完成补全的设计目的不是为了针对用户或进行歧视。但它的设计也没有足够的灵活来排除被认为是仇恨言论的歧视性语言。结果,该算法返回的结果对我们的用户造成了伤害。对谷歌本身的损害也应该是显而易见的:用户对该公司的信任和参与度降低。例如,黑人、拉美人和犹太人的申请者可能会对谷歌这个平台甚至其本身的包容性环境失去信心,因此破坏了谷歌在招聘中改善代表性的目标。 69 | 70 | How could this happen? 
After all, Google hires technologists with impeccable education and/or professional experience—exceptional programmers who write the best code and test their work. “Build for everyone” is a Google brand statement, but the truth is that we still have a long way to go before we can claim that we do. One way to address these problems is to help the software engineering organization itself look like the populations for whom we build products. 71 | 72 | 这怎么会发生呢?毕竟,谷歌雇用的技术专家拥有无可挑剔的教育和/或专业经验——卓越的程序员,他们编写最好的代码并测试他们的功能。"为每个人而建 "是谷歌的品牌宣言,但事实是,在宣称我们做到这一点之前,我们仍有很长的路要走。解决这些问题的方法之一是帮助软件工程组织本身变得像我们为其建造产品的人群。 73 | 74 | > [^1]: Google’s 2019 Diversity Report. 75 | > 1 谷歌的2019年多样性报告。 76 | > 77 | > [^2]: @jackyalcine. 2015. “Google Photos, Y’all Fucked up. My Friend’s Not a Gorilla.” Twitter, June 29, 2015. https://twitter.com/jackyalcine/status/615329515909156865. 78 | > 2 @jackyalcine. 2015. "谷歌照片,你们都搞砸了。我的朋友不是大猩猩"。Twitter,2015年6月29日。https://twitter.com/jackyalcine/status/615329515909156865 79 | > 80 | > [^3]: Many reports in 2018–2019 pointed to a lack of diversity across tech. Some notables include the National Center for Women & Information Technology, and Diversity in Tech. 81 | > 3 2018-2019年的许多报告指出,整个科技界缺乏多样性。一些著名的报告包括国家妇女和信息技术中心,以及科技领域的多样性。 82 | > 83 | > [^4]: Tom Simonite, “When It Comes to Gorillas, Google Photos Remains Blind,” Wired, January 11, 2018. 84 | > 4 Tom Simonite,"当涉及到大猩猩时,谷歌照片仍然是盲目的,"《连线》,2018年1月11日。 85 | 86 | ## Understanding the Need for Diversity 了解多样性的必要性 87 | 88 | At Google, we believe that being an exceptional engineer requires that you also focus on bringing diverse perspectives into product design and implementation. It also means that Googlers responsible for hiring or interviewing other engineers must contribute to building a more representative workforce. For example, if you interview other engineers for positions at your company, it is important to learn how biased outcomes happen in hiring.
There are significant prerequisites for understanding how to anticipate harm and prevent it. To get to the point where we can build for everyone, we first must understand our representative populations. We need to encourage engineers to have a wider scope of educational training. 89 | 90 | 在谷歌,我们相信,作为一名出色的工程师,你还需要专注于将不同的视角引入到产品设计和实施中。这也意味着,负责招聘或面试其他工程师的谷歌人必须致力于打造更具代表性的团队。例如,如果你为公司的职位面试其他工程师,了解招聘过程中的偏差结果是如何发生,这是很重要的。了解如何预测和预防伤害有重要的先决条件。为了达到我们能够为每个人而建的目的,我们首先必须了解我们的代表人群。了解招聘过程中的偏差结果是如何发生的是很重要的。 91 | 92 | The first order of business is to disrupt the notion that as a person with a computer science degree and/or work experience, you have all the skills you need to become an exceptional engineer. A computer science degree is often a necessary foundation. However, the degree alone (even when coupled with work experience) will not make you an engineer. It is also important to disrupt the idea that only people with computer science degrees can design and build products. Today, [most programmers do have a computer science degree](https://oreil.ly/2Bu0H); they are successful at building code, establishing theories of change, and applying methodologies for problem solving. However, as the aforementioned examples demonstrate, *this approach is insufficient for inclusive and* *equitable engineering*. 93 | 94 | 首要的任务是打破这样的观念:作为一个拥有计算机科学学位或且工作经验的人,你拥有成为一名出色工程师所需的所有技能。计算机科学学位通常是一个必要的基础。然而,单凭学位(即使再加上工作经验)并不能使你成为一名工程师。打破只有拥有计算机科学学位的人才能设计和建造产品的想法也很重要。今天,大多数程序员确实拥有计算机科学学位;他们在构建代码、建立变化理论和应用解决问题的方法方面都很成功。然而,正如上述例子所表明的,*这种方法不足以实现包容性和公平工程*。 95 | 96 | Engineers should begin by focusing all work within the framing of the complete ecosystem they seek to influence. At minimum, they need to understand the population demographics of their users. Engineers should focus on people who are different than themselves, especially people who might attempt to use their products to cause harm. 
The most difficult users to consider are those who are disenfranchised by the processes and the environment in which they access technology. To address this challenge, engineering teams need to be representative of their existing and future users. In the absence of diverse representation on engineering teams, individual engineers need to learn how to build for all users. 97 | 98 | 工程师应首先关注他们试图影响的完整生态系统框架内的所有工作。至少,他们需要了解用户的人群统计数据。工程师应该关注与自己不同的人,特别是那些试图使用他们的产品而受伤的人。最难考虑的用户是那些被他们获取技术的过程和环境所剥夺了权益的人。为了应对这一挑战,工程团队需要代表其现有和未来的用户。在工程团队缺乏多元化代表的情况下,每个工程师需要学习如何为所有用户构建。 99 | 100 | ## Building Multicultural Capacity 构建多元化能力 101 | 102 | One mark of an exceptional engineer is the ability to understand how products can advantage and disadvantage different groups of human beings. Engineers are expected to have technical aptitude, but they should also have the *discernment* to know when to build something and when not to. Discernment includes building the capacity to identify and reject features or products that drive adverse outcomes. This is a lofty and difficult goal, because there is an enormous amount of individualism that goes into being a high-performing engineer. Yet to succeed, we must extend our focus beyond our own communities to the next billion users or to current users who might be disenfranchised or left behind by our products. 103 | 104 | 卓越的工程师的一个标志是能够理解产品对不同的人群的好处和坏处。工程师应该有技术能力,但他们也应该有*敏锐的判断力*,知道什么时候该造什么,什么时候不该造。判断力包括建立识别和拒绝那些导致不良结果的功能或产品的能力。这是一个崇高而艰难的目标,因为要成为一名出色的工程师,需要有大量的个人主义。然而,想要成功,我们必须扩大我们的关注范围,关注我们当前用户之外的未来十亿的用户,哪怕是可能被我们的产品剥夺权利或遗弃的现有用户。 105 | 106 | Over time, you might build tools that billions of people use daily—tools that influence how people think about the value of human lives, tools that monitor human activity, and tools that capture and persist sensitive data, such as images of their children and loved ones, as well as other types of sensitive data. As an engineer, you might wield more power than you realize: the power to literally change society. 
It’s critical that on your journey to becoming an exceptional engineer, you understand the innate responsibility needed to exercise power without causing harm. The first step is to recognize the default state of your bias caused by many societal and educational factors. After you recognize this, you’ll be able to consider the often-forgotten use cases or users who can benefit or be harmed by the products you build. 107 | 108 | 随着时间的推移,你可能会建立数十亿人每天使用的工具——影响人们思考人类生命价值的工具,监测人类活动的工具,以及捕获和永久保存敏感数据的工具,如他们的孩子和亲人的图像,以及其他类型的敏感数据。作为一名工程师,你可能掌握着比你意识到的更多的权力:真正改变社会的权力。至关重要的是,在你成为一名杰出的工程师的过程中,你必须理解在不造成伤害的情况下行使权力所需的内在责任,这一点至关重要。第一步是要认识到由许多社会和教育因素造成的你的偏见的默认状态。在你认识到这一点之后,你就能考虑那些经常被遗忘的用例或用户,他们可以从你制造的产品中获益或受到伤害。 109 | 110 | The industry continues to move forward, building new use cases for artificial intelligence (AI) and machine learning at an ever-increasing speed. To stay competitive, we drive toward scale and efficacy in building a high-talent engineering and technology workforce. Yet we need to pause and consider the fact that today, some people have the ability to design the future of technology and others do not. We need to understand whether the software systems we build will eliminate the potential for entire populations to experience shared prosperity and provide equal access to technology. 111 | 112 | 软件行业持续发展,以不断提高的速度为人工智能(AI)和机器学习建立新的用例。为了保持竞争力,我们在建设高素质的工程和技术人才队伍方面,朝着规模和效率的方向努力。然而,我们需要暂停并考虑这样一个事实:今天,有些人有能力设计技术的未来,其他人却没有。我们需要了解我们建立的软件系统是否会消除整个人口体验共同繁荣的潜力,并提供平等获得技术的机会。 113 | 114 | Historically, companies faced with a decision between completing a strategic objective that drives market dominance and revenue and one that potentially slows momentum toward that goal have opted for speed and shareholder value. This tendency is exacerbated by the fact that many companies value individual performance and excellence, yet often fail to effectively drive accountability on product equity across all areas. Focusing on underrepresented users is a clear opportunity to promote equity. 
To continue to be competitive in the technology sector, we need to learn to engineer for global equity. 115 | 116 | 从历史上看,公司在完成推动市场主导地位和收入的战略目标和可能减缓实现这一目标势头的战略目标之间,都选择了速度和股东价值。许多公司重视个人的绩效和卓越,但往往不能有效地推动各领域的产品公平的问责机制,这加剧了这种倾向。关注代表性不足的用户显然是促进公平的机会。为了在技术领域继续保持竞争力,我们需要学习如何设计全球公平。 117 | 118 | Today, we worry when companies design technology to scan, capture, and identify people walking down the street. We worry about privacy and how governments might use this information now and in the future. Yet most technologists do not have the requisite perspective of underrepresented groups to understand the impact of racial variance in facial recognition or to understand how applying AI can drive harmful and inaccurate results. 119 | 120 | 如今,当公司设计扫描、捕获和识别街上行人的技术时,我们感到担忧。我们担心隐私问题以及政府现在和将来如何使用这些信息。然而,大多数技术专家并不具备代表性不足群体的必要视角,无法理解种族差异对面部识别的影响,也无法理解应用人工智能如何导致有害和不准确的结果。 121 | 122 | Currently, AI-driven facial-recognition software continues to disadvantage people of color or ethnic minorities. Our research is not comprehensive enough and does not include a wide enough range of different skin tones. We cannot expect the output to be valid if both the training data and those creating the software represent only a small subsection of people. In those cases, we should be willing to delay development in favor of trying to get more complete and accurate data, and a more comprehensive and inclusive product. 123 | 124 | 目前,人工智能驱动的面部识别软件仍然对有色人种或少数族裔不利。我们的研究还不够全面,没有包括足够多的肤色。如果训练数据和创建软件的人都只代表一小部分人,我们就不能指望输出是有效的。在这种情况下,我们应该愿意推迟开发,以获得更完整、更准确的数据,以及更全面、更包容的产品。 125 | 126 | Data science itself is challenging for humans to evaluate, however. Even when we do have representation, a training set can still be biased and produce invalid results. 
A study completed in 2016 found that more than 117 million American adults are in a law enforcement facial recognition database.[^5] Due to the disproportionate policing of Black communities and disparate outcomes in arrests, there could be racially biased error rates in utilizing such a database in facial recognition. Although the software is being developed and deployed at ever-increasing rates, the independent testing is not. To correct for this egregious misstep, we need to have the integrity to slow down and ensure that our inputs contain as little bias as possible. Google now offers statistical training within the context of AI to help ensure that datasets are not intrinsically biased. 127 | 128 | 然而,数据科学本身对人类的评估是具有挑战性的。即使我们有表示,训练集仍然可能有偏见,产生无效的结果。2016年完成的一项研究发现,执法部门的面部识别数据库中有1.17亿以上的美国成年人。由于黑人社区的警察比例过高,逮捕的结果也不尽相同,因此在面部识别中使用该数据库可能存在种族偏见错误率。尽管该软件的开发和部署速度不断提高,但独立测试却并非如此。为了纠正这一令人震惊的错误,我们需要有诚信,放慢脚步,确保我们的输入尽可能不包含偏见。谷歌现在在人工智能的范围内提供统计培训,以帮助确保数据集没有内在的偏见。 129 | 130 | Therefore, shifting the focus of your industry experience to include more comprehensive, multicultural, race and gender studies education is not only your responsibility, but also the responsibility of your employer. Technology companies must ensure that their employees are continually receiving professional development and that this development is comprehensive and multidisciplinary. The requirement is not that one individual take it upon themselves to learn about other cultures or other demographics alone. Change requires that each of us, individually or as leaders of teams, invest in continuous professional development that builds not just our software development and leadership skills, but also our capacity to understand the diverse experiences throughout humanity. 
131 | 132 | 因此,将你的行业经验的重点转移到更全面的、多文化的、种族和性别研究的教育,不仅是你的责任,也是你雇主的责任。科技公司必须确保他们的员工不断接受专业发展,而且这种发展是全面和多学科的。要求不是个体独自承担起学习其他文化或其他人口统计学的任务。变革要求我们每个人,无论是个人还是团队的领导者,都要投资于持续的专业发展,不仅要培养我们的软件开发和领导技能,还要培养我们理解全人类不同经验的能力。 133 | 134 | > [^5]: Stephen Gaines and Sara Williams. “The Perpetual Lineup: Unregulated Police Face Recognition in America.” Center on Privacy & Technology at Georgetown Law, October 18, 2016. 135 | > 136 | > 5 斯蒂芬·盖恩斯和莎拉·威廉姆斯。“永远的阵容:美国不受监管的警察面孔识别。” 137 | 乔治敦法律学院隐私与技术中心,2016年10月18日。 138 | 139 | ## Making Diversity Actionable 让多样性成为现实 140 | 141 | Systemic equity and fairness are attainable if we are willing to accept that we are all accountable for the systemic discrimination we see in the technology sector. We are accountable for the failures in the system. Deferring or abstracting away personal accountability is ineffective, and depending on your role, it could be irresponsible. It is also irresponsible to fully attribute dynamics at your specific company or within your team to the larger societal issues that contribute to inequity. A favorite line among diversity proponents and detractors alike goes something like this: “We are working hard to fix (insert systemic discrimination topic), but accountability is hard. How do we combat (insert hundreds of years) of historical discrimination?” This line of inquiry is a detour to a more philosophical or academic conversation and away from focused efforts to improve work conditions or outcomes. Part of building multicultural capacity requires a more comprehensive understanding of how systems of inequality in society impact the workplace, especially in the technology sector. 142 | 143 | 如果我们愿意接受我们需要对我们在技术部门看到的系统歧视负责,那么系统的公平和公正是可以实现的。我们要对系统的故障负责。推迟或抽离个人责任是无效的,而且根据你的角色,这可能是不负责任的。将特定公司或团队的动态完全归因于导致不平等的更大社会问题也是不负责任的。多样性支持者和反对者中最喜欢的一句话是这样的:"我们正在努力解决(加入系统歧视的话题),但问责是很难的。我们如何打击(加入几百年来的)历史歧视?" 
这条调查路线是一条通往哲学或学术对话的迂回之路,与改善工作条件或成果的专注努力相去甚远。建设多元文化能力的一部分需要更全面地了解社会中的不平等制度如何影响工作场所,特别是在技术部门。 144 | 145 | If you are an engineering manager working on hiring more people from underrepresented groups, deferring to the historical impact of discrimination in the world is a useful academic exercise. However, it is critical to move beyond the academic conversation to a focus on quantifiable and actionable steps that you can take to drive equity and fairness. For example, as a hiring software engineer manager, you’re accountable for ensuring that your candidate slates are balanced. Are there women or other underrepresented groups in the pool of candidates’ reviews? After you hire someone, what opportunities for growth have you provided, and is the distribution of opportunities equitable? Every technology lead or software engineering manager has the means to augment equity on their teams. It is important that we acknowledge that, although there are significant systemic challenges, we are all part of the system. It is our problem to fix. 146 | 147 | 如果你是一名工程经理,致力于雇用更多来自代表性不足的群体的人,推崇世界上歧视的历史影响是一项有益的学术活动。然而,关键是要超越学术交流,把重点放在可量化和可操作的步骤上,以推动公平和公正。例如,作为招聘软件工程师经理,你有责任确保你的候选人名单是均衡的。在候选人的审查中是否有女性或其他代表性不足的群体?雇佣员工后,你提供了哪些成长机会,机会分配是否公平?每个技术领导或软件工程经理都有办法在他们的团队中增加平等。重要的是,我们要承认,尽管存在着重大的系统性挑战,但我们都是这个系统的一部分。这是我们要解决的问题。 148 | 149 | ## Reject Singular Approaches 摒弃单一方法 150 | 151 | We cannot perpetuate solutions that present a single philosophy or methodology for fixing inequity in the technology sector. Our problems are complex and multifactorial. Therefore, we must disrupt singular approaches to advancing representation in the workplace, even if they are promoted by people we admire or who have institutional power. 
152 | 153 | 我们不能让那些提出单一理念或方法来解决技术部门不公平问题的延续解决方案。我们的问题是复杂和多因素的。因此,我们必须打破推进工作场所代表性的单一方法,即使这些方法是由我们敬佩的人或拥有机构权力的人推动的。 154 | 155 | One singular narrative held dear in the technology industry is that lack of representation in the workforce can be addressed solely by fixing the hiring pipelines. Yes, that is a fundamental step, but that is not the immediate issue we need to fix. We need to recognize systemic inequity in progression and retention while simultaneously focusing on more representative hiring and educational disparities across lines of race, gender, and socioeconomic and immigration status, for example. 156 | 157 | 在科技行业中,有一种单一的说法是,劳动力中缺乏代表性的问题可以只通过修复招聘通道来解决。是的,这是一个基本步骤,但这并不是我们需要解决的紧迫问题。我们需要认识到在晋升和留任方面的系统不平等,同时关注更具代表性的招聘和教育差异,例如种族、性别、社会经济和移民状况。 158 | 159 | In the technology industry, many people from underrepresented groups are passed over daily for opportunities and advancement. Attrition among Black+ Google employees outpaces attrition from all other groups and confounds progress on representation goals. If we want to drive change and increase representation, we need to evaluate whether we’re creating an ecosystem in which all aspiring engineers and other technology professionals can thrive. 160 | 161 | 在科技行业,许多来自代表性不足的群体的人每天都被排除在机会和晋升之外。谷歌黑人员工的流失率超过了所有其他群体的流失率,并影响了代表目标的实现。如果我们想推动变革并提高代表性,我们需要评估我们是否正在创造一个所有有抱负的工程师和其他技术专业人员都能茁壮成长的生态系统。 162 | 163 | Fully understanding an entire problem space is critical to determining how to fix it. This holds true for everything from a critical data migration to the hiring of a representative workforce. For example, if you are an engineering manager who wants to hire more women, don’t just focus on building a pipeline. Focus on other aspects of the hiring, retention, and progression ecosystem and how inclusive it might or might not be to women. Consider whether your recruiters are demonstrating the ability to identify strong candidates who are women as well as men. 
If you manage a diverse engineering team, focus on psychological safety and invest in increasing multicultural capacity on the team so that new team members feel welcome. 164 | 165 | 充分了解整个问题空间对于确定如何解决它至关重要。这适用于从关键数据迁移到雇佣代表性员工的所有方面。例如,如果你是一个想雇用更多女性的工程经理,不要只关注单个方面建设。关注招聘、保留和晋升生态系统的其他方面,以及它对女性的包容性。考虑一下你的招聘人员是否展示了识别女性和男性候选人的能力。如果你管理一个多元化的工程团队,请关注心理安全,并投入于增加团队的多元文化能力,使新的团队成员感到受欢迎。 166 | 167 | A common methodology today is to build for the majority use case first, leaving improvements and features that address edge cases for later. But this approach is flawed; it gives users who are already advantaged in access to technology a head start, which increases inequity. Relegating the consideration of all user groups to the point when design has been nearly completed is to lower the bar of what it means to be an excellent engineer. Instead, by building in inclusive design from the start and raising development standards to make tools delightful and accessible for people who struggle to access technology, we enhance the experience for all users. 168 | 169 | 如今,一种常见的方法是首先为大多数用例构建,将解决边缘用例的改进和特性留待以后使用。但这种方法是有缺陷的;它让那些在获取技术方面已经有优势的用户抢先一步,这增加了不平等。把对所有用户群体的考虑放在设计即将完成的时候,就是降低成为一名优秀工程师的标准。相反,通过从一开始就采用包容性设计,提高开发标准,让那些难以获得技术的人能够轻松地使用工具,我们增强了所有用户的体验。 170 | 171 | Designing for the user who is least like you is not just wise, it’s a best practice. There are pragmatic and immediate next steps that all technologists, regardless of domain, should consider when developing products that avoid disadvantaging or underrepresenting users. It begins with more comprehensive user-experience research. This research should be done with user groups that are multilingual and multicultural and that span multiple countries, socioeconomic classes, abilities, and age ranges. Focus on the most difficult or least represented use case first. 
172 | 173 | 为与你最不同的用户设计,而且是最佳实践。所有的技术专家,无论在哪个领域,在开发产品时都应该考虑一些实用的和直接的步骤,以避免对用户造成不利影响或代表不足。它从更全面的用户体验研究开始。这项研究应该针对多语言、多文化、跨多个国家、社会经济阶层、能力和年龄范围的用户群体进行。首先关注最困难或最不典型的用例。 174 | 175 | ## Challenge Established Processes 挑战既定流程 176 | 177 | Challenging yourself to build more equitable systems goes beyond designing more inclusive product specifications. Building equitable systems sometimes means challenging established processes that drive invalid results. 178 | 179 | 挑战自己以建立更公平的系统,不仅仅是设计更具包容性的产品规格。建立公平系统有时意味着挑战那些推动无效结果的既定流程。 180 | 181 | Consider a recent case evaluated for equity implications. At Google, several engineering teams worked to build a global hiring requisition system. The system supports both external hiring and internal mobility. The engineers and product managers involved did a great job of listening to the requests of what they considered to be their core user group: recruiters. The recruiters were focused on minimizing wasted time for hiring managers and applicants, and they presented the development team with use cases focused on scale and efficiency for those people. To drive efficiency, the recruiters asked the engineering team to include a feature that would highlight performance ratings—specifically lower ratings—to the hiring manager and recruiter as soon as an internal transfer expressed interest in a job. 182 | 183 | 考虑一下最近一个被评估为对公平有影响的案例。在谷歌,几个工程团队致力于建立一个全球招聘申请系统。该系统同时支持外部招聘和内部流动。参与的工程师和产品经理在倾听他们认为是他们的核心用户群体的请求方面做得很好:招聘人员。招聘人员专注于最大限度地减少招聘经理和申请人的时间浪费,他们向开发团队提出了专注于这些人的规模和效率的案例。为了提高效率,招聘人员要求工程团队加入一项功能,在内部调动人员表示对某项工作感兴趣时,该功能将突出绩效评级,特别是向招聘经理和招聘人员提供较低的评级。 184 | 185 | On its face, expediting the evaluation process and helping job seekers save time is a great goal. So where is the potential equity concern? The following equity questions were raised: 186 | 187 | - Are developmental assessments a predictive measure of performance? 188 | - Are the performance assessments being presented to prospective managers free of individual bias? 
189 | - Are performance assessment scores standardized across organizations? 190 | 191 | 从表面上看,加快评估过程和帮助求职者节省时间是一个伟大的目标。那么,潜在的公平问题在哪里?以下是提出的公平问题: 192 | 193 | - 发展评估是否是绩效的预测指标? 194 | - 向潜在经理提交的绩效评估是否没有个人偏见? 195 | - 绩效评估的分数在不同的组织中是标准化的吗? 196 | 197 | If the answer to any of these questions is “no,” presenting performance ratings could still drive inequitable, and therefore invalid, results. 198 | 199 | 如果这些问题的答案都是 "否",呈现绩效评级仍然可能导致不公平,因此是无效的结果。 200 | 201 | When an exceptional engineer questioned whether past performance was in fact predictive of future performance, the reviewing team decided to conduct a thorough review. In the end, it was determined that candidates who had received a poor performance rating were likely to overcome the poor rating if they found a new team. In fact, they were just as likely to receive a satisfactory or exemplary performance rating as candidates who had never received a poor rating. In short, performance ratings are indicative only of how a person is performing in their given role at the time they are being evaluated. Ratings, although an important way to measure performance during a specific period, are not predictive of future performance and should not be used to gauge readiness for a future role or qualify an internal candidate for a different team. (They can, however, be used to evaluate whether an employee is properly or improperly slotted on their current team; therefore, they can provide an opportunity to evaluate how to better support an internal candidate moving forward.) 202 | 203 | 当一位杰出的工程师质疑过去的业绩是否真的能预测未来的业绩时,审查小组决定进行一次彻底的审查。最后确定,曾经获得不良业绩评级的候选人如果找到一个新的团队,就有可能克服较差的评级。事实上,他们获得满意或堪称楷模绩效评级的可能性与从未获得过差评的候选人一样。简而言之,绩效评级仅表示一个人在担任指定角色时的表现。评级虽然是衡量特定时期绩效的一种重要方式,但不能预测未来绩效,不应用于衡量未来角色的准备情况或确定不同团队的内部候选人。(然而,它们可以被用来评估一个员工在其当前团队中的位置是否合适;因此,它们可以提供一个机会来评估如何更好地支持内部候选人发展。) 204 | 205 | This analysis definitely took up significant project time, but the positive trade-off was a more equitable internal mobility process. 
206 | 207 | 这一分析无疑占用了大量的项目时间,但积极的权衡是一个更公平的内部流动过程。 208 | 209 | ## Values Versus Outcomes 价值观与成果 210 | 211 | Google has a strong track record of investing in hiring. As the previous example illustrates, we also continually evaluate our processes in order to improve equity and inclusion. More broadly, our core values are based on respect and an unwavering commitment to a diverse and inclusive workforce. Yet, year after year, we have also missed our mark on hiring a representative workforce that reflects our users around the globe. The struggle to improve our equitable outcomes persists despite the policies and programs in place to help support inclusion initiatives and promote excellence in hiring and progression. The failure point is not in the values, intentions, or investments of the company, but rather in the application of those policies at the implementation level. 212 | 213 | 谷歌在招聘方面有着良好的投入记录。正如前面的例子所示,我们也在不断评估我们的流程,以提高公平和包容。更广泛地说,我们的核心价值观是基于尊重、对多元化和包容性劳动力的坚定承诺。然而,一年又一年,我们在雇用一支反映我们全球用户的代表性员工队伍方面却没有达到目标。尽管制定了策略和计划,以帮助支持包容倡议并促进招聘和晋升的卓越性,但改善公平结果的斗争依然存在。失败点不在于公司的价值观、意图或投入,而在于这些策略在执行层面的应用。 214 | 215 | Old habits are hard to break. The users you might be used to designing for today— the ones you are used to getting feedback from—might not be representative of all the users you need to reach. We see this play out frequently across all kinds of products, from wearables that do not work for women’s bodies to video-conferencing software that does not work well for people with darker skin tones. 216 | 217 | 旧习惯很难改掉。你今天可能习惯于为之设计的用户——你习惯于从他们那里获得反馈——可能并不代表你需要接触的所有用户。我们看到这种情况经常发生在各种产品上,从不适合女性身体的可穿戴设备到不适合深肤色人的视频会议软件。 218 | 219 | So, what’s the way out? 220 | 221 | 1. Take a hard look in the mirror. At Google, we have the brand slogan, “Build For Everyone.” How can we build for everyone when we do not have a representative workforce or engagement model that centralizes community feedback first? We can’t. 
The truth is that we have at times very publicly failed to protect our most vulnerable users from racist, antisemitic, and homophobic content. 222 | 2. Don’t build for everyone. Build with everyone. We are not building for everyone yet. That work does not happen in a vacuum, and it certainly doesn’t happen when the technology is still not representative of the population as a whole. That said, we can’t pack up and go home. So how do we build for everyone? We build with our users. We need to engage our users across the spectrum of humanity and be intentional about putting the most vulnerable communities at the center of our design. They should not be an afterthought. 223 | 3. Design for the user who will have the most difficulty using your product. Building for those with additional challenges will make the product better for everyone. Another way of thinking about this is: don’t trade equity for short-term velocity. 224 | 4. Don’t assume equity; measure equity throughout your systems. Recognize that decision makers are also subject to bias and might be undereducated about the causes of inequity. You might not have the expertise to identify or measure the scope of an equity issue. Catering to a single userbase might mean disenfranchising another; these trade-offs can be difficult to spot and impossible to reverse. Partner with individuals or teams that are subject matter experts in diversity, equity, and inclusion. 225 | 5. Change is possible. The problems we’re facing with technology today, from surveillance to disinformation to online harassment, are genuinely overwhelming. We can’t solve these with the failed approaches of the past or with just the skills we already have. We need to change. 226 | 227 | 那么,出路是什么? 228 | 229 | 1. 认真照照镜子。在谷歌,我们有一个品牌口号,"为每个人而建"。当我们没有一个代表性的员工队伍或首先集中社区反馈的参与模式时,我们如何为每个人建设?我们不能。事实是,我们有时在公开场合未能保护我们最脆弱的用户免受种族主义、反犹太主义和恐同内容的侵害。 230 | 2. 
不要为每个人而建。要与所有人一起共建。我们还没有为每个人建设的能力。这项工作不会凭空实现,当技术仍然不能代表整个人口时,这项工作肯定不会发生。话虽如此,我们也不能打包回家。那么,我们如何为每个人建立?我们与我们的用户一起建设。我们需要让全人类的用户参与进来,并有意将最脆弱的群体置于我们设计的中心。他们不应该是事后的考虑对象。 231 | 3. 为那些在使用你的产品时遇到最大困难的用户设计。为那些有额外挑战的人设计将使产品对所有人都更好。另一种思考方式是:不要用公平来换取短期的速度。 232 | 4. 不要假设公平;**衡量整个系统的公平性**。认识到决策者也会有偏见,而且可能对不平等的原因认识不足。你可能不具备识别或衡量公平问题的范围的专业知识。迎合单个用户群可能意味着剥夺另一个用户群的权利;这些权衡可能很难发现,也不可能逆转。与作为多元化主题专家的个人或团队合作,公平、平等和包容。 233 | 5. 改变是可能的。我们今天所面临的技术问题,从监视到虚假信息再到在线骚扰,确实是令人难以承受的。我们不能用过去失败的方法或只用我们已有的技能来解决这些问题。我们需要改变。 234 | 235 | ## Stay Curious, Push Forward 保持好奇心,勇往直前 236 | 237 | The path to equity is long and complex. However, we can and should transition from simply building tools and services to growing our understanding of how the products we engineer impact humanity. Challenging our education, influencing our teams and managers, and doing more comprehensive user research are all ways to make progress. Although change is uncomfortable and the path to high performance can be painful, it is possible through collaboration and creativity. 238 | 239 | 通往公平的道路是道阻且长。然而,我们可以也应该从简单地构建工具和服务过渡到加深我们对我们设计的产品如何影响人类的理解。挑战我们的教育,影响我们的团队和管理者,以及做更全面的用户研究,都是取得进展的方法。虽然改变是痛苦的,而且通向高绩效的道路可能是痛苦的,但通过合作和创新,变革是可能的。 240 | 241 | Lastly, as future exceptional engineers, we should focus first on the users most impacted by bias and discrimination. Together, we can work to accelerate progress by focusing on Continuous Improvement and owning our failures. Becoming an engineer is an involved and continual process. The goal is to make changes that push humanity forward without further disenfranchising the disadvantaged. As future exceptional engineers, we have faith that we can prevent future failures in the system. 242 | 243 | 最后,作为未来的杰出工程师,我们应该首先关注受偏见和歧视影响最大的用户。通过共同努力,我们可以通过专注于持续改进和承认失败来加速进步。成为一名工程师是一个复杂而持续的过程。目标是在不进一步剥夺弱势群体权利的情况下,做出推动人类前进的变革。作为未来杰出的工程师,我们有信心能够防止未来系统的失败。 244 | 245 | ## Conclusion 总结 246 | 247 | Developing software, and developing a software organization, is a team effort. 
As a software organization scales, it must respond and adequately design for its user base, which in the interconnected world of computing today involves everyone, locally and around the world. More effort must be made to make both the development teams that design software and the products that they produce reflect the values of such a diverse and encompassing set of users. And, if an engineering organization wants to scale, it cannot ignore underrepresented groups; not only do such engineers from these groups augment the organization itself, they provide unique and necessary perspectives for the design and implementation of software that is truly useful to the world at large. 248 | 249 | 开发软件和开发软件组织是一项团队工作。随着软件组织规模的扩大,它必须对其用户群做出响应并进行充分设计,在当今互联的计算世界中,用户群涉及到本地和世界各地的每个人。必须做出更多的努力,使设计软件的开发团队和他们生产的产品都能反映出这样一个多样化的、包含了所有用户的价值观。而且,如果一个工程组织想要扩大规模,它不能忽视代表性不足的群体;这些来自这些群体的工程师不仅能增强组织本身,还能为设计和实施对整个世界真正有用的软件提供独特而必要的视角。 250 | 251 | ## TL;DRs 内容提要 252 | 253 | - Bias is the default. 254 | - Diversity is necessary to design properly for a comprehensive user base. 255 | - Inclusivity is critical not just to improving the hiring pipeline for underrepresented groups, but to providing a truly supportive work environment for all people. 256 | - Product velocity must be evaluated against providing a product that is truly useful to all users. It’s better to slow down than to release a product that might cause harm to some users. 
257 | 258 | - 偏见是默认的。 259 | - 多样性是正确设计综合用户群所必需的。 260 | - 包容性不仅对于改善代表不足的群体的招聘渠道至关重要,而且对于为所有人提供一个真正支持性的工作环境也至关重要。 261 | - 产品速度必须根据提供对所有用户真正有用的产品来评估。与其发布一个可能对某些用户造成伤害的产品,还不如放慢速度。 262 | -------------------------------------------------------------------------------- /zh-cn/Chapter-20_Static_Analysis/Chapter-20_Static_Analysis.md: -------------------------------------------------------------------------------- 1 | 2 | **CHAPTER 20** 3 | 4 | # Static Analysis 5 | # 第二十章 静态分析 6 | 7 | **Written by Caitlin Sadowski** 8 | 9 | **Edited by Lisa Carey** 10 | 11 | Static analysis refers to programs analyzing source code to find potential issues such as bugs, antipatterns, and other issues that can be diagnosed *without executing the program*. The “static” part specifically refers to analyzing the source code instead of a running program (referred to as “dynamic” analysis). Static analysis can find bugs in programs early, before they are checked in as production code. For example, static analysis can identify constant expressions that overflow, tests that are never run, or invalid format strings in logging statements that would crash when executed.[^1] However, static analysis is useful for more than just finding bugs. Through static analysis at Google, we codify best practices, help keep code current to modern API versions, and prevent or reduce technical debt. Examples of these analyses include verifying that naming conventions are upheld, flagging the use of deprecated APIs, or pointing out simpler but equivalent expressions that make code easier to read. Static analysis is also an integral tool in the API deprecation process, where it can prevent backsliding during migration of the codebase to a new API (see Chapter 22). 
We have also found evidence that static analysis checks can educate developers and actually prevent antipatterns from entering the codebase.[^2] 12 | 13 | 静态分析是指通过程序分析源代码来发现潜在的问题,例如bug、反模式和其他无需执行程序即可诊断的问题。“静态”具体是指分析源代码,而不是运行中的程序(即“动态”分析)。它可以在代码被合入生产环境前发现bug,例如,可以识别溢出的常量表达式、永远不会运行的测试用例或日志字符串的无效格式化导致运行崩溃的问题。但静态分析的作用不只是查找bug。通过对Google代码的静态分析,我们编写了最佳实践,帮助推进代码使用最新接口和减少技术债,这些分析的例子包括:校验是否遵循命名规范;标记使用已弃用的API;简化表达式以提高代码可读性。静态分析也是弃用某个接口时不可或缺的工具,它可以防止将代码库迁移到新接口时出现“倒退”现象(参见第22章,指被调用系统不断迁移旧接口到新接口,而其他系统不断的调用弃用接口而不调用新接口)。我们还发现静态分析检查可以对开发人员起到启发和约束作用,可以防止开发人员写出反模式的代码。 14 | 15 | In this chapter, we’ll look at what makes effective static analysis, some of the lessons we at Google have learned about making static analysis work, and how we implemented these best practices in our static analysis tooling and processes.[^3] 16 | 17 | 本章我们将介绍如何进行高效的静态分析,包含我们在 Google 了解到的一些关于静态分析工作的经验和我们在静态分析工具和流程中的最佳实践。 18 | 19 | > [^1]: See `http://errorprone.info/bugpatterns`. 20 | > 21 | > 1 查阅 `http://errorprone.info/bugpatterns`。 22 | > 23 | > [^2]: Caitlin Sadowski et al. Tricorder: Building a Program Analysis Ecosystem, International Conference on Software Engineering (ICSE), May 2015. 24 | > 25 | > 2 Caitlin Sadowski等人,Tricorder:构建一个程序分析生态系统,国际软体工程会议(ICSE),2015年5月。 26 | > 27 | > [^3]: A good academic reference for static analysis theory is: Flemming Nielson et al. Principles of Program Analysis (Germany: Springer, 2004). 28 | > 29 | > 3 关于静态分析理论,一个很好的学术参考资料是:Flemming Nielson等人,《程序分析原理》(Germany: Springer, 2004) 30 | 31 | ## Characteristics of Effective Static Analysis 高效静态分析的特点 32 | 33 | Although there have been decades of static analysis research focused on developing new analysis techniques and specific analyses, a focus on approaches for improving *scalability* and *usability* of static analysis tools has been a relatively recent development. 
34 | 35 | 尽管几十年来,静态分析研究一直专注于开发新的分析技术和具体分析,但提高静态分析工具的可扩展性和可用性的方法最近才开始发展。 36 | 37 | ### Scalability 可扩展性 38 | 39 | Because modern software has become larger, analysis tools must explicitly address scaling in order to produce results in a timely manner, without slowing down the software development process. Static analysis tools at Google must scale to the size of Google’s multibillion-line codebase. To do this, analysis tools are shardable and incremental. Instead of analyzing entire large projects, we focus analyses on files affected by a pending code change, and typically show analysis results only for edited files or lines. Scaling also has benefits: because our codebase is so large, there is a lot of low-hanging fruit in terms of bugs to find. In addition to making sure analysis tools can run on a large codebase, we also must scale up the number and variety of analyses available. Analysis contributions are solicited from throughout the company. Another component of static analysis scalability is ensuring the *process* is scalable. To do this, Google static analysis infrastructure avoids bottlenecking analysis results by showing them directly to relevant engineers. 40 | 41 | 现代软件变得越来越大,为了使分析工具在不减慢软件开发过程的情况下及时生效,必须高效地解决扩展性问题。对 Google 来说,分析工具需要满足 Google 数十亿行代码库的规模。为此,分析工具是分片和增量分析的,即不是分析整个大型项目,而是将分析重点放在受待处理代码变更影响的文件上,并且通常仅显示已编辑文件或行的分析结果。扩展也有好处:因为我们的代码库非常大,这样做在寻找bug时容易的多。除了确保分析工具可以在大型代码库上运行之外,还必须扩大可分析的数量和种类。分析器是在整个公司范围内征求的。静态分析可扩展性的另一个组成部分是确保过程是可扩展的,为此,Google静态分析基础架构通过直接向相关工程师展示分析结果来避免造成分析瓶颈。 42 | 43 | ### Usability 可用性 44 | 45 | When thinking about analysis usability, it is important to consider the cost-benefit trade-off for static analysis tool users. This “cost” could either be in terms of developer time or code quality. Fixing a static analysis warning could introduce a bug. For code that is not being frequently modified, why “fix” code that is running fine in production? 
For example, fixing a dead code warning by adding a call to the previously dead code could result in untested (possibly buggy) code suddenly running. There is unclear benefit and potentially high cost. For this reason, we generally focus on newly introduced warnings; existing issues in otherwise working code are typically only worth highlighting (and fixing) if they are particularly important (security issues, significant bug fixes, etc.). Focusing on newly introduced warnings (or warnings on modified lines) also means that the developers viewing the warnings have the most relevant context on them. 46 | 47 | 考虑可用性时,重要的是要考虑静态分析工具用户的成本效益权衡。这种”成本”可能是开发时间或代码质量。修复静态分析警告可能会引入错误的。对于不经常被修改的代码,为什么要 "修复 "在生产中运行良好的代码?例如,通过添加对死代码(从未被运行过的代码)的调用来修复硬编码警告,可能会导致未经测试(可能有错误)的代码突然运行。这种做法收益不明确,但是成本可能很高。出于这个原因,我们通常只关注新引入的警告,代码中的现有问题通常只在特别重要(安全问题、重大错误修复等)时才值得修复。关注新引入的警告(或修改行上的警告)也意味着查看警告的开发人员具有最相关的上下文和背景。 48 | 49 | Also, developer time is valuable! Time spent triaging analysis reports or fixing highlighted issues is weighed against the benefit provided by a particular analysis. If the analysis author can save time (e.g., by providing a fix that can be automatically applied to the code in question), the cost in the trade-off goes down. Anything that can be fixed automatically should be fixed automatically. We also try to show developers reports about issues that actually have a negative impact on code quality so that they do not waste time slogging through irrelevant results. 50 | 51 | 此外,开发人员的时间很宝贵,要对分析报告进行分类或修复突出问题所花费的时间与特定分析提供的收益进行权衡。如果分析可以节省时间(例如,通过提供可以自动应用于相关代码的修复),则成本就会下降。任何可以自动修复的东西都应该自动修复。我们还尝试向开发人员展示实际上对代码质量有负面影响的问题的报告,这样他们就不会浪费时间费力地处理不相关的分析结果。 52 | 53 | To further reduce the cost of reviewing static analysis results, we focus on smooth developer workflow integration. A further strength of homogenizing everything in one workflow is that a dedicated tools team can update tools along with workflow and code, allowing analysis tools to evolve with the source code in tandem. 
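
The idea of surfacing only newly introduced warnings, or warnings on modified lines, can be sketched as a simple filter over analyzer output. This is a minimal illustration and not Google's actual tooling: the `Finding` record, the `findings_on_changed_lines` helper, the file paths, and the shape of the `pending_change` map are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    """One analyzer result (hypothetical shape, for illustration only)."""
    path: str
    line: int
    message: str


def findings_on_changed_lines(findings, changed_lines):
    """Keep only findings that fall on lines touched by the pending change.

    changed_lines maps a file path to the set of line numbers the change
    added or modified; files missing from the map count as untouched.
    """
    return [
        f for f in findings
        if f.line in changed_lines.get(f.path, set())
    ]


# Hypothetical analyzer output for the whole codebase.
all_findings = [
    Finding("server/auth.py", 12, "deprecated API call"),
    Finding("server/auth.py", 87, "unused variable"),
    Finding("legacy/report.py", 5, "dead code"),
]

# Hypothetical pending change: only lines 10-12 of one file were edited.
pending_change = {"server/auth.py": {10, 11, 12}}

shown = findings_on_changed_lines(all_findings, pending_change)
```

Under these assumptions only the finding on line 12 of `server/auth.py` is shown. The dead-code warning in the untouched legacy file stays hidden, which matches the cost-benefit reasoning above: the developer sees only warnings for code they already have context on.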
54 | 55 | 为了进一步降低查看静态分析结果的成本,我们将重点放在平滑的开发人员工作流程集成上。在一个工作流中同质化所有内容的另一个优势是,一个专门的工具团队可以随着工作流和代码一起更新工具,从而允许分析工具与源代码同步发展。 56 | 57 | We believe these choices and trade-offs that we have made in making static analyses scalable and usable arise organically from our focus on three core principles, which we formulate as lessons in the next section. 58 | 59 | 我们在使静态分析具有可扩展性和可用性方面所做的这些选择和权衡,是从我们对三个核心原则的关注中产生的,我们将在下一节中阐述这三个原则作为经验教训。 60 | 61 | ## Key Lessons in Making Static Analysis Work 静态分析工作中的关键工作 62 | 63 | There are three key lessons that we have learned at Google about what makes static analysis tools work well. Let’s take a look at them in the following subsections. 64 | 65 | 我们在 Google 了解到了如何用好静态分析工具的三个关键点。让我们在下面的小节中看看它们。 66 | 67 | ### Focus on Developer Happiness 关注开发者的幸福感 68 | 69 | We mentioned some of the ways in which we try to save developer time and reduce the cost of interacting with the aforementioned static analysis tools; we also keep track of how well analysis tools are performing. If you don’t measure this, you can’t fix problems. We only deploy analysis tools with low false-positive rates (more on that in a minute). We also *actively solicit and act on feedback* from developers consuming static analysis results, in real time. Nurturing this feedback loop between static analysis tool users and tool developers creates a virtuous cycle that has built up user trust and improved our tools. User trust is extremely important for the success of static analysis tools. 70 | 71 | 我们提到了一些尝试节省开发人员时间并降低与静态分析工具交互成本的方法,我们还跟踪分析工具的性能。如果不对此进行衡量,你就无法解决问题。我们只部署误报率较低的分析工具(稍后将详细介绍)。我们还*积极征求开发人员对静态分析结果的实时反馈并采取行动*,在静态分析工具用户和开发人员之间形成反馈闭环,创造一个良性循环,建立了用户信任,借此改进我们的工具。用户信任对于静态分析工具的成功至关重要。 72 | 73 | For static analysis, a “false negative” is when a piece of code contains an issue that the analysis tool was designed to find, but the tool misses it. A “false positive” occurs when a tool incorrectly flags code as having the issue. 
Research about static analysis tools traditionally focused on reducing false negatives; in practice, low false-positive rates are often critical for developers to actually want to use a tool—who wants to wade through hundreds of false reports in search of a few true ones?[^4] 74 | 75 | 对于静态分析,“漏报(false negative)”是指一段代码包含分析工具找到的问题,但工具未能发现,“误报(false positive)”是指工具错误地将代码标记为存在问题。一般来说,静态分析工具的研究侧重于减少漏报;实践中,开发者是否真正想要使用工具取决于误报率是否很低——谁愿意在数百个虚假报告中费力寻找一些真实的报告? 76 | 77 | Furthermore, perception is a key aspect of the false-positive rate. If a static analysis tool is producing warnings that are technically correct but misinterpreted by users as false positives (e.g., due to confusing messages), users will react the same as if those warnings were in fact false positives. Similarly, warnings that are technically correct but unimportant in the grand scheme of things provoke the same reaction. We call the user-perceived false-positive rate the “effective false positive” rate. An issue is an “effective false positive” if developers did not take some positive action after seeing the issue. This means that if an analysis incorrectly reports an issue, yet the developer happily makes the fix anyway to improve code readability or maintainability, that is not an effective false positive. For example, we have a Java analysis that flags cases in which a developer calls the contains method on a hash table (which is equivalent to containsValue) when they actually meant to call containsKey—even if the developer correctly meant to check for the value, calling containsValue instead is clearer. Similarly, if an analysis reports an actual fault, yet the developer did not understand the fault and therefore took no action, that is an effective false positive. 
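The `contains` example above can be made concrete with a minimal Java sketch. It only illustrates why the call is easy to misread; the class name, helper method, and map contents are hypothetical, and only the `java.util.Hashtable` methods themselves are real.

```java
import java.util.Hashtable;

public class HashtableContainsDemo {
    // Hypothetical data: maps a component name (key) to its owner (value).
    static Hashtable<String, String> owners() {
        Hashtable<String, String> table = new Hashtable<>();
        table.put("parser", "alice");
        return table;
    }

    public static void main(String[] args) {
        Hashtable<String, String> owners = owners();

        // Hashtable.contains tests VALUES, not keys, so a key lookup silently fails.
        System.out.println(owners.contains("parser"));      // false
        System.out.println(owners.contains("alice"));       // true

        // The explicit methods state the intent unambiguously.
        System.out.println(owners.containsKey("parser"));   // true
        System.out.println(owners.containsValue("alice"));  // true
    }
}
```

Even when a developer really did mean to check values, switching to `containsValue` removes the ambiguity for the next reader, which is why acting on such a report does not count as an effective false positive.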
78 | 79 | 此外,用户感知是误报率的一个关键因素。如果静态分析工具产生的警告在技术上是正确的,但被用户误解为误报(例如,由于告警消息混乱),用户的反应将与这些警告实际上是误报一样。类似地,技术上正确但在大局中不重要的警告也会引发同样的反应。我们将用户感知的误报率称为“有效误报率”。如果开发者在看到问题后没有采取积极的行动,那么该问题就是“有效的误报(effective false positive)”,这意味着,如果一个分析错误地报告了一个问题,但开发人员仍然乐于进行修复,以提高代码的可读性或可维护性,那么这就不是一个有效的误报。例如,我们有一个Java分析,它标记了这样一种情况:当开发者实际上想要调用`containsKey`方法时,却错误地调用了哈希表的`contains`方法,它会标记这种情况,即使开发人员正确地打算检查值,调用containsValue反而更清晰。同样,如果分析报告了一个实际的故障,但开发人员不了解故障,因此没有采取任何行动,这就是一个有效的误报。 80 | 81 | > [^4]: Note that there are some specific analyses for which reviewers might be willing to tolerate a much higher false-positive rate: one example is security analyses that identify critical problems. 82 | > 83 | > 4 请注意,有一些特定的分析,审查员可能愿意容忍更高的误报率:一个例子是识别关键问题的安全分析。 84 | 85 | ### Make Static Analysis a Part of the Core Developer Workflow 使静态分析成为核心开发工作流程的一部分 86 | 87 | At Google, we integrate static analysis into the core workflow via integration with code review tooling. Essentially all code committed at Google is reviewed before being committed; because developers are already in a change mindset when they send code for review, improvements suggested by static analysis tools can be made without too much disruption. There are other benefits to code review integration. Developers typically context switch after sending code for review, and are blocked on reviewers— there is time for analyses to run, even if they take several minutes to do so. There is also peer pressure from reviewers to address static analysis warnings. Furthermore, static analysis can save reviewer time by highlighting common issues automatically; static analysis tools help the code review process (and the reviewers) scale. 
Code review is a sweet spot for analysis results.[^5] 88 | 89 | 在 Google ,我们通过与代码审查工具集成,将静态分析集成到核心工作流中。基本上 Google 提交的所有代码在提交之前都会经过审查,因为开发人员在发送代码供审查时已经改变了心态,所以静态分析工具建议的改进可以在没有太多干扰的情况下进行。代码审查集成还有其他好处,开发人员通常在发送代码进行审查后切换上下文,并且在审查员面前被阻止——即使需要几分钟的时间来运行分析。来自审查员的同行压力也要求解决静态分析警告问题,此外,静态分析可以自动突出常见问题,从而节省审阅者的时间,这有助于代码评审过程(以及审查员)的规模化。代码评审是分析结果的最佳选择。 90 | 91 | > [^5]: See later in this chapter for more information on additional integration points when editing and browsing code. 92 | > 93 | > 5 关于编辑和浏览代码时的额外集成点的更多信息,请参见本章后面的内容。 94 | 95 | ### Empower Users to Contribute 允许用户做出贡献 96 | 97 | There are many domain experts at Google whose knowledge could improve code produced. Static analysis is an opportunity to leverage expertise and apply it at scale by having domain experts write new analysis tools or individual checks within a tool. 98 | 99 | Google有许多领域专家,他们的知识可以改进生成的代码。静态分析创造了一个利用他们的专业知识并大规模应用的机会,即利用领域专家编写新的分析工具或在工具中进行单独检查。 100 | 101 | For example, experts who know the context for a particular kind of configuration file can write an analyzer that checks properties of those files. In addition to domain experts, analyses are contributed by developers who discover a bug and would like to prevent the same kind of bug from reappearing anywhere else in the codebase. We focus on building a static analysis ecosystem that is easy to plug into instead of integrating a small set of existing tools. We have focused on developing simple APIs that can be used by engineers throughout Google—not just analysis or language experts— to create analyses; for example, Refaster[^6] enables writing an analyzer by specifying pre- and post-code snippets demonstrating what transformations are expected by that analyzer. 
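To give a feel for what "specifying pre- and post-code snippets" means, here is a deliberately simplified Java sketch. It is not Refaster and does not use its API: real Refaster templates are Java methods annotated with `@BeforeTemplate` and `@AfterTemplate` and are matched against the syntax tree, whereas this toy works on plain text with a regular expression. The class name and the chosen rewrite (`x.length() == 0` to `x.isEmpty()`) are assumptions made purely for illustration.

```java
import java.util.regex.Pattern;

public class BeforeAfterRewrite {
    // "Before" snippet: <identifier>.length() == 0
    private static final Pattern BEFORE =
            Pattern.compile("\\b(\\w+)\\.length\\(\\)\\s*==\\s*0\\b");

    // "After" snippet: <identifier>.isEmpty(), reusing the captured identifier.
    private static final String AFTER = "$1.isEmpty()";

    /** Rewrites every occurrence of the "before" snippet into the "after" snippet. */
    static String rewrite(String source) {
        return BEFORE.matcher(source).replaceAll(AFTER);
    }

    public static void main(String[] args) {
        // prints: if (name.isEmpty()) { return; }
        System.out.println(rewrite("if (name.length() == 0) { return; }"));
    }
}
```

The appeal of the before/after style is that the author of a check only has to write ordinary code in both shapes; a text-based toy like this one would misfire on comments and string literals, which is one reason production tools match on the AST instead.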
102 | 103 | 例如,了解特定类型配置文件上下文的专家可以编写一个分析器来检查这些文件的属性。除了领域专家之外,发现bug并希望防止同类bug在代码库中的任何其他地方再次出现的开发人员也可以提供贡献。我们专注于构建一个易于插入的静态分析生态系统,而不是集成一小部分现有工具。我们专注于开发简单的API,可供整个 Google 的工程师(不仅仅是分析或语言专家)用来创建分析;例如,重构可以通过指定前后代码片段来编写分析器,来达到该分析器期望的效果。 104 | 105 | > [^6]: Louis Wasserman, “Scalable, Example-Based Refactorings with Refaster.” Workshop on Refactoring Tools, 2013. 106 | > 107 | > 6 Louis Wasserman,"用Refaster进行可扩展的、基于实例的重构"。重构工具研讨会,2013年。 108 | 109 | ## Tricorder: Google’s Static Analysis Platform Tricorder: Google 的静态分析平台 110 | 111 | Tricorder, our static analysis platform, is a core part of static analysis at Google.[^7] Tricorder came out of several failed attempts to integrate static analysis with the developer workflow at Google;[^8] the key difference between Tricorder and previous attempts was our relentless focus on having Tricorder deliver only valuable results to its users. Tricorder is integrated with the main code review tool at Google, Critique. Tricorder warnings show up on Critique’s diff viewer as gray comment boxes, as demonstrated in Figure 20-1. 112 | 113 | 我们的静态分析平台 Tricorder是Google静态分析的核心部分。Tricorder是在Google多次尝试将静态分析与开发人员工作流集成的失败尝试中诞生的,与之前尝试的主要区别在于我们坚持不懈地致力于让Tricorder只为用户提供有价值的结果。Tricorder与 Google 的主要代码审查工具Critique集成在一起。 Tricorder警告在Critique的差异查看器上显示为灰色的注释框,如图 20-1 所示。 114 | 115 | ![Figure 20-1](./images/Figure%2020-1.png) 116 | 117 | *Figure 20-1. Critique’s diff viewing, showing a static analysis warning from Tricorder in* *gray* 图20-1. Critique的diff查看,灰色显示了Tricorder的静态分析警告 118 | 119 | To scale, Tricorder uses a microservices architecture. The Tricorder system sends analyze requests to analysis servers along with metadata about a code change. These servers can use that metadata to read the versions of the source code files in the change via a FUSE-based filesystem and can access cached build inputs and outputs. 
The analysis server then starts running each individual analyzer and writes the output to a storage layer; the most recent results for each category are then displayed in Critique. Because analyses sometimes take a few minutes to run, analysis servers also post status updates to let change authors and reviewers know that analyzers are running and post a completed status when they have finished. Tricorder analyzes more than 50,000 code review changes per day and is often running several analyses per second. 120 | 121 | 为了方便扩展,Tricorder使用微服务架构。 Tricorder系统将分析请求连同有关代码更改的元数据发送到分析服务器。这些服务器可以使用该元数据通过基于FUSE的文件系统读取更改中源代码文件的版本,并且可以访问缓存的构建输入和输出。然后分析服务器开始运行每个单独的分析器并将输出写入存储层。每个类别的最新结果随后会显示在Critique中。因为分析有时需要等几分钟,分析服务器也会发布状态更新,让代码作者和审查员知道分析器正在运行,并在完成后发布完成状态。Tricorder每天分析超过50,000次代码审查更改,并且通常每秒运行多次分析。整个Google的开发人员编写Tricorder分析(称为“分析器”)或为现有分析贡献单独的“检查”。 122 | 123 | Developers throughout Google write Tricorder analyses (called “analyzers”) or contribute individual “checks” to existing analyses. There are four criteria for new Tricorder checks: 124 | 125 | - *Be understandable* 126 | Be easy for any engineer to understand the output. 127 | - *Be actionable and easy to fix* 128 | The fix might require more time, thought, or effort than a compiler check, and the result should include guidance as to how the issue might indeed be fixed. 129 | - *Produce less than 10% effective false positives* 130 | Developers should feel the check is pointing out an actual issue [at least 90% of the time](https://oreil.ly/ARSzt). 131 | - *Have the potential for significant impact on code quality* 132 | The issues might not affect correctness, but developers should take them seriously and deliberately choose to fix them.
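The third criterion in the list above is the only numeric one, so it can be sketched as a small calculation. The source does not describe how Tricorder actually measures it; the class, method names, and counts below are hypothetical, and "acted on" stands in for whatever positive action a developer takes after seeing a result.

```java
public class EffectiveFalsePositiveCheck {
    /** Fraction of shown results that led to no positive action by the developer. */
    static double effectiveFalsePositiveRate(long shown, long actedOn) {
        validate(shown, actedOn);
        return (double) (shown - actedOn) / shown;
    }

    /** True when fewer than 10% of shown results were effective false positives. */
    static boolean meetsThreshold(long shown, long actedOn) {
        validate(shown, actedOn);
        // Integer arithmetic avoids floating-point rounding right at the 10% boundary.
        return (shown - actedOn) * 10 < shown;
    }

    private static void validate(long shown, long actedOn) {
        if (shown <= 0 || actedOn < 0 || actedOn > shown) {
            throw new IllegalArgumentException("need 0 <= actedOn <= shown and shown > 0");
        }
    }

    public static void main(String[] args) {
        // Hypothetical counts: 200 results shown, 185 acted on, so 15 were ignored (7.5%).
        System.out.println(meetsThreshold(200, 185)); // true
        // 20 of 200 ignored is exactly 10%, which is not "less than 10%".
        System.out.println(meetsThreshold(200, 180)); // false
    }
}
```

Note that the rate is defined by developer behavior rather than by whether the report was technically correct, which matches the definition of an effective false positive given earlier in the chapter.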
133 | 134 | Tricorder 检查有四个标准: 135 | 136 | - *易于理解* 137 | 便于任何工程师理解输出结果。 138 | - *可操作且易于修复* 139 | ​ 与编译器检查相比,修复可能需要更多的时间、思考或尝试,结果应包括有关如何真正修复问题的指导。 140 | - *少于10%的有效误报* 141 | ​ 开发人员应该觉得检查至少在90%的时间里指出了实际问题。 142 | - *有可能对代码质量产生重大影响* 143 | ​ 这些问题可能不会影响正确性,但开发人员应该认真对待它们并有意识地选择修复它们。 144 | 145 | Tricorder analyzers report results for more than 30 languages and support a variety of analysis types. Tricorder includes more than 100 analyzers, with most being contributed from outside the Tricorder team. Seven of these analyzers are themselves plug-in systems that have hundreds of additional checks, again contributed from developers across Google. The overall effective false-positive rate is just below 5%. 146 | 147 | Tricorder分析仪报告支持30种语言,并支持多种分析类型。Tricorder包括100多个分析器,其中大部分来自Tricorder团队外部。 其中七个分析器本身就是插件系统,拥有数百个额外的检查项,由 Google 的开发者们贡献,总体有效的误报率略低于 5%。 148 | 149 | > [^7]: Caitlin Sadowski, Jeffrey van Gogh, Ciera Jaspan, Emma Söderberg, and Collin Winter, Tricorder: Building a Program Analysis Ecosystem, International Conference on Software Engineering (ICSE), May 2015. 150 | > 151 | > 7 Caitlin Sadowski, Jeffrey van Gogh, Ciera Jaspan, Emma Söderberg, and Collin Winter, Tricorder: 构建一个程序分析生态系统,国际软件工程会议(ICSE),2015年5月。 152 | > 153 | > [^8]: Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan, “Lessons from Building Static Analysis Tools at Google”, Communications of the ACM, 61 No. 4 (April 2018): 58–66, `https://cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext`. 154 | > 155 | > 8 Caitlin Sadowski, Edward Aftandilian, Alex Eagle, Liam Miller-Cushon, and Ciera Jaspan, “Lessons from Building Static Analysis Tools at Google”, ACM通讯期刊, 61 No. 4 (April 2018): 58–66, `https:// cacm.acm.org/magazines/2018/4/226371-lessons-from-building-static-analysis-tools-at-google/fulltext`. 
156 | 157 | ### Integrated Tools 集成工具 158 | 159 | There are many different types of static analysis tools integrated with Tricorder. 160 | 161 | Tricorder 集成了许多不同类型的静态分析工具。 162 | 163 | [Error Prone](http://errorprone.info/) and [clang-tidy](https://oreil.ly/DAMiv) extend the compiler to identify AST antipatterns for Java and C++, respectively. These antipatterns could represent real bugs. For example, consider the following Java snippet hashing a field `f` of type `long`: 164 | 165 | ```java 166 | result = 31 * result + (int) (f ^ (f >>> 32)); 167 | ``` 168 | 169 | Error Prone 和 clang-tidy 扩展了编译器以分别识别 Java 和 C++ 的 AST 反模式。 这些反模式可能代表真正的错误。例如,考虑以下代码片段散列 long 类型的字段 f: 170 | 171 | ```java 172 | result = 31 * result + (int) (f ^ (f >>> 32)); 173 | ``` 174 | 175 | Now consider the case in which the type of `f` is `int`. The code will still compile, but because Java uses only the low five bits of the shift distance for an `int` operand, the right shift by 32 is a no-op, so `f` is XORed with itself and no longer affects the value produced. We fixed 31 occurrences of this bug in Google’s codebase while enabling the check as a compiler error in Error Prone. There are [many more such examples](https://errorprone.info/bugpatterns). AST antipatterns can also result in code readability improvements, such as removing a redundant call to `.get()` on a smart pointer. 176 | 177 | 现在考虑f的类型是int的情况,代码仍然可以编译,但是右移32是空操作,因此 f 与自身进行异或,不再影响最终产生的值。我们修复了 Google 代码库中出现的 31 次该错误,同时在 Error Prone 中将检查作为编译器错误启用。这样的例子还有很多。 AST 反模式还可以提高代码的可读性,例如删除对智能指针的 `.get()` 的冗余调用。 178 | 179 | Other analyzers showcase relationships between disparate files in a corpus. The Deleted Artifact Analyzer warns when a deleted source file is still referenced from non-code places in the codebase (such as inside checked-in documentation). IfThisThenThat allows developers to specify that portions of two different files must be changed in tandem (and warns if they are not).
Chrome’s Finch analyzer runs on configuration files for A/B experiments in Chrome, highlighting common problems including not having the right approvals to launch an experiment or crosstalk with other currently running experiments that affect the same population. The Finch analyzer makes Remote Procedure Calls (RPCs) to other services in order to provide this information. 180 | 181 | 其他分析器展示了语料库中不同文件之间的关系。如果删除了代码库中其他非代码位置(例如签入文档中)引用的源文件,Deleted Artifact Analyzer 会发出警告。 IfThis-ThenThat 允许开发人员指定两个不同文件的部分必须同时更改(如果不是,则发出警告)。 Chrome 的 Finch 分析器在 Chrome 中的 A/B 实验的配置文件上运行,突出显示常见问题,包括未获得启动实验的正确批准或与影响同一人群的其他当前正在运行的实验串扰。 Finch分析器通过远程过程调用(RPCs)向其他服务请求以提供这些信息。 182 | 183 | In addition to the source code itself, some analyzers run on other artifacts produced by that source code; many projects have enabled a binary size checker that warns when changes significantly affect a binary size. 184 | 185 | 除了源代码本身之外,一些分析器还可以在该源代码生成的其他构件上运行;许多项目启用了二进制大小检查器,当更改显着影响二进制大小时会发出警告。 186 | 187 | Almost all analyzers are intraprocedural, meaning that the analysis results are based on code within a procedure (function). Compositional or incremental interprocedural analysis techniques are technically feasible but would require additional infrastructure investment (e.g., analyzing and storing method summaries as analyzers run). 188 | 189 | 几乎所有分析器都是面向过程内的,这意味着分析结果基于过程(函数)内的代码。组合或增量过程间分析技术在技术上是可行的,但需要额外的基础设施投资(例如,在分析器运行时分析和存储方法摘要)。 190 | 191 | ### Integrated Feedback Channels 集成反馈渠道 192 | 193 | As mentioned earlier, establishing a feedback loop between analysis consumers and analysis writers is critical to track and maintain developer happiness. With Tricorder, we display the option to click a “Not useful” button on an analysis result; this click provides the option to file a bug *directly against the analyzer writer* about why the result is not useful with information about analysis result prepopulated. 
Code reviewers can also ask change authors to address analysis results by clicking a “Please fix” button. The Tricorder team tracks analyzers with high “Not useful” click rates, particularly relative to how often reviewers ask to fix analysis results, and will disable analyzers if they don’t work to address problems and improve the “not useful” rate. Establishing and tuning this feedback loop took a lot of work, but has paid dividends many times over in improved analysis results and a better user experience (UX)— before we established clear feedback channels, many developers would just ignore analysis results they did not understand. 194 | 195 | 如上所述,建立分析者和作者之间反馈闭环对于跟踪和维护开发人员的成幸福感很重要。Tricorder会在分析结果上显示单击“无用”按钮的选项;此按钮提供了*直接针对分析器作者*提交错误的选项,说明了为什么分析结果信息无用,代码审查员还可以通过单击“请修复”按钮要求变更作者处理分析结果。 Tricorder团队跟踪“无用”按钮点击率高的分析器,特别是与审阅者要求修复分析结果的频率有关,如果分析器不能解决问题并改进“无用”,则会禁用分析器。建立和调整这个反馈闭环需要大量工作,但在改进分析结果和更好的用户体验 (UX) 方面已经获得了巨大的回报——在我们建立清晰的反馈渠道之前,许多开发人员会忽略他们不理解的分析结果. 196 | 197 | And sometimes the fix is pretty simple—such as updating the text of the message an analyzer outputs! For example, we once rolled out an Error Prone check that flagged when too many arguments were being passed to a printf-like function in Guava that accepted only %s (and no other printf specifiers). The Error Prone team received weekly “Not useful” bug reports claiming the analysis was incorrect because the number of format specifiers matched the number of arguments—all due to users trying to pass specifiers other than %s. After the team changed the diagnostic text to state directly that the function accepts only the %s placeholder, the influx of bug reports stopped. Improving the message produced by an analysis provides an explanation of what is wrong, why, and how to fix it exactly at the point where that is most relevant and can make the difference for developers learning something when they read the message. 
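The confusion described above is easier to see with a small Java sketch of a formatter that understands only `%s`. This is a hypothetical stand-in written for illustration, not Guava's implementation; the class and method names are assumptions, and the behavior of appending unused arguments in brackets is a choice made for this sketch.

```java
public class PercentSOnlyFormat {
    /** Counts the only placeholder this formatter understands: "%s". */
    static int countPlaceholders(String template) {
        int count = 0;
        for (int at = template.indexOf("%s"); at != -1; at = template.indexOf("%s", at + 2)) {
            count++;
        }
        return count;
    }

    /** Substitutes arguments for "%s" only; leftover arguments are appended in brackets. */
    static String format(String template, Object... args) {
        StringBuilder out = new StringBuilder();
        int start = 0;
        int used = 0;
        while (used < args.length) {
            int at = template.indexOf("%s", start);
            if (at == -1) {
                break; // no placeholder left for the remaining arguments
            }
            out.append(template, start, at).append(args[used++]);
            start = at + 2;
        }
        out.append(template, start, template.length());
        if (used < args.length) {
            out.append(" [").append(args[used++]);
            while (used < args.length) {
                out.append(", ").append(args[used++]);
            }
            out.append(']');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String template = "expected %d items, got %s";
        // The caller sees two specifiers and two arguments, but only one is a "%s".
        System.out.println(countPlaceholders(template)); // 1
        // prints: expected %d items, got 3 [5]
        System.out.println(format(template, 3, 5));
    }
}
```

From the caller's point of view the specifier count matches the argument count, so a message that says only "too many arguments" looks wrong; a message that states the function accepts only `%s` explains the mismatch immediately.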
198 | 199 | 有时修复非常简单,例如更新分析器输出的消息文本。 我们曾经推出了一个容易出错的检查,当太多参数被传递给Guava中的类似printf的函数时,该检查只接受%s(并且不接受其他printf说明符)。Error Prone团队每周都会收到“无用”的错误报告,声称分析不正确,因为格式说明符的数量与参数的数量相匹配——所有这些都是由于用户试图传递除 %s 之外的说明符。在团队将诊断文本更改为直接声明该函数仅接受 %s 占位符后,错误报告的大量涌入便停止了。 改进分析产生的信息可以提供错误点的解释、原因以及如何精确修复,这在最相关的时刻可以为开发者提供学习的机会。 200 | 201 | ### Suggested Fixes 建议的修复 202 | 203 | Tricorder checks also, when possible, *provide fixes*, as shown in Figure 20-2. 204 | 205 | Tricorder 检查也会在可能的情况下提供修复方案,如图 20-2 所示。 206 | 207 | ![Figure 20-2](./images/Figure%2020-2.png) 208 | 209 | *Figure 20-2. View of an example static analysis fix in Critique* 图20-2. Critique中静态分析修复的例子视图 210 | 211 | Automated fixes serve as an additional documentation source when the message is unclear and, as mentioned earlier, reduce the cost to addressing static analysis issues. Fixes can be applied directly from within Critique, or over an entire code change via a command-line tool. Although not all analyzers provide fixes, many do. We take the approach that *style* issues in particular should be fixed automatically; for example, by formatters that automatically reformat source code files. Google has style guides for each language that specify formatting issues; pointing out formatting errors is not a good use of a human reviewer’s time. Reviewers click “Please Fix” thousands of times per day, and authors apply the automated fixes approximately 3,000 times per day. And Tricorder analyzers received “Not useful” clicks 250 times per day. 
212 | 213 | 当反馈消息不清晰时,自动修复可作为额外的文档来源,并且可以降低解决静态分析问题的成本。 修复可以直接应用Critique中,也可以通过命令行工具应用于整个代码更改。并非所有分析器都提供修复,但很多都有。 我们的做法是,优先自动修复样式问题, 例如,通过自动重新格式化源代码文件的格式化程序。 Google 有每种语言的风格指南,规定了各种语言的格式,但指出格式错误并不能很好地利用审阅者的时间。审核者每天点击数千次“请修复”,作者每天应用自动修复大约3,000次,Tricorder分析器每天收到250次“无用”点击 214 | 215 | ### Per-Project Customization 按项目定制 216 | 217 | After we had built up a foundation of user trust by showing only high-confidence analysis results, we added the ability to run additional “optional” analyzers to specific projects in addition to the on-by-default ones. The *Proto Best Practices* analyzer is an example of an optional analyzer. This analyzer highlights potentially breaking data format changes to [protocol buffers](https://developers.google.com/protocol-buffers)—Google’s language-independent data serialization format. These changes are only breaking when serialized data is stored somewhere (e.g., in server logs); protocol buffers for projects that do not have stored serialized data do not need to enable the check. We have also added the ability to customize existing analyzers, although typically this customization is limited, and many checks are applied by default uniformly across the codebase. 218 | 219 | 在通过仅显示高置信度分析结果建立用户信任基础后,除了默认启用的分析器之外,我们还添加了对特定项目运行其他“可选”分析器的能力。 比如Proto Best Practices 分析器,此分析器突出显示潜在的破坏性数据协议缓冲区的格式更改——Google 的独立于语言的数据序列化格式。只有当序列化的数据存储在某个地方(例如,在服务器日志中)时,这些更改才会中断;没有存储序列化数据的项目的协议缓冲区不需要启用检查。我们还添加了自定义现有分析器的功能,尽管这种自定义功能很有限,并且默认情况下,许多检查在代码库中统一应用。 220 | 221 | Some analyzers have even started as optional, improved based on user feedback, built up a large userbase, and then graduated into on-by-default status as soon as we could capitalize on the user trust we had built up. For example, we have an analyzer that suggests Java code readability improvements that typically do not actually change code behavior. Tricorder users initially worried about this analysis being too “noisy,” but eventually wanted more analysis results available. 
222 | 223 | 一些分析器甚至一开始是可选的,根据用户反馈进行改进,建立了庞大的用户群,然后一旦我们可以利用我们建立的用户信任,就进入默认状态。例如,我们有一个分析器,它建议 Java 代码可读性改进,这些改进通常不会真正改变代码行为。Tricorder用户最初担心这种分析过于“嘈杂”,但最终希望获得更多的分析结果。 224 | 225 | The key insight to making this customization successful was to focus on *project-level* *customization, not user-level customization*. Project-level customization ensures that all team members have a consistent view of analysis results for their project and prevents situations in which one developer is trying to fix an issue while another developer introduces it. 226 | 227 | 这种定制成功的关键是专注于项目定制,而不是用户级定制。项目级定制确保所有团队成员对其项目的分析结果有一致的看法,并减少一个开发人员试图解决问题而需要另一位开发人员介绍的情况。 228 | 229 | Early on in the development of Tricorder, a set of relatively straightforward style checkers (“linters”) displayed results in Critique, and Critique provided user settings to choose the confidence level of results to display and suppress results from specific analyses. We removed all of this user customizability from Critique and immediately started getting complaints from users about annoying analysis results. Instead of reenabling customizability, we asked users why they were annoyed and found all kinds of bugs and false positives with the linters. For example, the C++ linter also ran on Objective-C files but produced incorrect, useless results. We fixed the linting infrastructure so that this would no longer happen. The HTML linter had an extremely high false-positive rate with very little useful signal and was typically suppressed from view by developers writing HTML. Because the linter was so rarely helpful, we just disabled this linter. In short, user customization resulted in hidden bugs and suppressing feedback. 
230 | 231 | Tricorder开发的早期,Critique展示了一组相对简单的样式检查器(“linter”),Critique提供了用户设置来选择结果的置信度以显示和抑制来自特定分析的结果。我们从 Critique 中删除了所有这些用户可定制性,并立即开始收到用户对烦人的分析结果的投诉。我们没有重新启用可定制性,而是询问用户为什么他们感到恼火,并发现 linter 存在各种错误和误报。例如,C++ linter 也在 Objective-C 文件上运行,但产生了不正确、无用的结果。我们修复了 linting 基础设施,这样就不会再发生这种情况了。 HTML linter 的误报率非常高,有用的信号很少,并且通常被编写 HTML 的开发人员禁止查看。因为 linter 很少有帮助,所以我们只是禁用了这个 linter。简而言之,用户定制导致隐藏的错误和抑制反馈。 232 | 233 | ### Presubmits 预提交 234 | 235 | In addition to code review, there are also other workflow integration points for static analysis at Google. Because developers can choose to ignore static analysis warnings displayed in code review, Google additionally has the ability to add an analysis that blocks committing a pending code change, which we call a *presubmit check*. Presubmit checks include very simple customizable built-in checks on the contents or metadata of a change, such as ensuring that the commit message does not say “DO NOT SUBMIT” or that test files are always included with corresponding code files. Teams can also specify a suite of tests that must pass or verify that there are no Tricorder issues for a particular category. Presubmits also check that code is well formatted. Presubmit checks are typically run when a developer mails out a change for review and again during the commit process, but they can be triggered on an ad hoc basis in between those points. See Chapter 23 for more details on presubmits at Google. 236 | 237 | 除了代码审查之外, Google 还有其他用于静态分析的工作流集成点。由于开发人员可以选择忽略代码审查中显示的静态分析警告, Google 还可以添加一个分析来阻止提交待处理的代码更改,我们称之为*预提交检查*。预提交检查包括对更改的内容或元数据的非常简单的可定制的内置检查,例如确保提交消息不包含“不要提交”或测试文件始终包含在相应的代码文件中。团队还可以指定一组测试,这些测试必须通过或验证特定类别没有 Tricorder 问题。预提交还会检查代码是否格式正确。预提交检查通常在开发者提交变更以供审查时执行,并且在提交过程中再次执行,但它们也可以在这些时间点之间按需触发。有关 Google 预提交的更多详细信息,请参阅第 23 章。 238 | 239 | Some teams have written their own custom presubmits. 
These are additional checks on top of the base presubmit set that add the ability to enforce higher best-practice standards than the company as a whole and add project-specific analysis. This enables new projects to have stricter best-practice guidelines than projects with large amounts of legacy code (for example). Team-specific presubmits can make the large- scale change (LSC) process (see Chapter 22) more difficult, so some are skipped for changes with “CLEANUP=” in the change description. 240 | 241 | 一些团队已经编写了自己的定制预提交。这些是在基础预提交检查之上的额外检查,它们能够强化比公司整体更高的最佳实践标准,并增加项目特定的分析。这使得新项目比拥有大量遗留代码的项目(例如)拥有更严格的最佳实践指南。团队特定的预提交会使大规模变更 (LSC) 过程(参见第 22 章)更加复杂,因此在变更描述中带有“CLEANUP=”的变更会被跳过。 242 | 243 | ### Compiler Integration 编译器集成 244 | 245 | Although blocking commits with static analysis is great, it is even better to notify developers of problems even earlier in the workflow. When possible, we try to push static analysis into the compiler. Breaking the build is a warning that is not possible to ignore, but is infeasible in many cases. However, some analyses are highly mechanical and have no effective false positives. An example is [Error Prone “ERROR” checks](https://errorprone.info/bugpatterns). These checks are all enabled in Google’s Java compiler, preventing instances of the error from ever being introduced again into our codebase. Compiler checks need to be fast so that they don’t slow down the build. 
In addition, we enforce these three criteria (similar criteria exist for the C++ compiler): 246 | 247 | - Actionable and easy to fix (whenever possible, the error should include a suggested fix that can be applied mechanically) 248 | - Produce no effective false positives (the analysis should never stop the build for correct code) 249 | - Report issues affecting only correctness rather than style or best practices 250 | 251 | 尽管使用静态分析来阻止提交是很好的做法,但最好能在工作流程的更早阶段通知开发者问题。 如果可以的话,我们会尝试将静态分析推送到编译器中。 构建失败是一个不可忽视的警告,但在许多情况下是不可行的。然而,有些分析非常机械死板,没有有效的误报。 一个例子是Error Prone的“ERROR”检查, 这些检查都在 Google 的 Java 编译器中启用,防止错误实例再次被引入我们的代码库, 编译器检查需要快速,以免拖慢构建速度。此外,我们强制执行这三个标准(C++ 编译器也存在类似的标准): 252 | 253 | - 可操作且易于修复(只要可能,错误应包括可自动应用的建议修复) 254 | - 不产生有效的误报(分析永远不应因正确的代码而中断构建) 255 | - 报告仅影响正确性而非风格或最佳实践的问题 256 | 257 | To enable a new check, we first need to clean up all instances of that problem in the codebase so that we don’t break the build for existing projects just because the compiler has evolved. This also implies that the value in deploying a new compiler-based check must be high enough to warrant fixing all existing instances of it. Google has infrastructure in place for running various compilers (such as clang and javac) over the entire codebase in parallel via a cluster—as a MapReduce operation. When compilers are run in this MapReduce fashion, the static analysis checks run must produce fixes in order to automate the cleanup. After a pending code change is prepared and tested that applies the fixes across the entire codebase, we commit that change and remove all existing instances of the problem. We then turn the check on in the compiler so that no new instances of the problem can be committed without breaking the build. Build breakages are caught after commit by our Continuous Integration (CI) system, or before commit by presubmit checks (see the earlier discussion). 
258 | 259 | 要启用新的检查,我们首先需要清理代码库中该问题的所有实例,以确保我们不会仅因为编译器的更新而破坏现有项目的构建。这也意味着部署新的基于编译器的检查必须有足够的价值,以证明修复所有现有实例是合理的。Google 有基础设施,可以通过集群在整个代码库上并行运行各种编译器(例如 clang 和 javac)——作为 MapReduce 操作。当编译器以这种 MapReduce 方式运行时,运行的静态分析检查必须产生修复以自动进行清理。在准备好并测试了在整个代码库中应用修复的待处理代码更改后,我们提交该更改并删除所有现有的问题实例。然后我们在编译器中打开检查,这样就不会在不破坏构建的情况下提交问题的新实例。在我们的持续集成 (CI) 系统提交之后,或者在提交之前通过预提交检查(参见前面的讨论)捕获构建损坏。 260 | 261 | We also aim to never issue compiler warnings. We have found repeatedly that developers ignore compiler warnings. We either enable a compiler check as an error (and break the build) or don’t show it in compiler output. Because the same compiler flags are used throughout the codebase, this decision is made globally. Checks that can’t be made to break the build are either suppressed or shown in code review (e.g., through Tricorder). Although not every language at Google has this policy, the most frequently used ones do. Both the Java and C++ compilers have been configured to avoid displaying compiler warnings. The Go compiler takes this to the extreme; some things that other languages would consider warnings (such as unused variables or package imports) are errors in Go. 262 | 263 | 我们也旨在绝不发出编译器警告,但是我们不断的发现开发人员会忽略编译器警告,要么启用编译器检查作为错误(并中断构建),要么不在编译器输出中显示它。因为在整个代码库中使用相同的编译器标志,因此这一决定是全局性的。无法破坏构建的检查要么被抑制,要么在代码审查中显示(例如,通过 Tricorder)。尽管并非 Google 的所有语言都有此策略,但最常用的语言都有。Java 和 C++ 编译器都已配置为避免显示编译器警告,Go 编译器将这一点做的很好,因为在其他语言中会考虑警告的一些事情(例如未使用的变量或包导入),在 Go 中是错误的。 264 | 265 | ### Analysis While Editing and Browsing Code 编辑和浏览代码时分析 266 | 267 | Another potential integration point for static analysis is in an integrated development environment (IDE). However, IDE analyses require quick analysis times (typically less than 1 second and ideally less than 100 ms), so some tools are not suitable for integration here. In addition, there is the problem of making sure the same analysis runs identically in multiple IDEs.
We also note that IDEs can rise and fall in popularity (we don’t mandate a single IDE); hence IDE integration tends to be messier than plugging into the review process. Code review also has specific benefits for displaying analysis results. Analyses can take into account the entire context of the change; some analyses can be inaccurate on partial code (such as a dead code analysis when a function is implemented before adding callsites). Showing analysis results in code review also means that code authors have to convince reviewers as well if they want to ignore analysis results. That said, IDE integration for suitable analyses is another great place to display static analysis results. 268 | 269 | 静态分析的另一个集成点是集成开发环境 (IDE)。但是,IDE 分析需要快速的分析时间(通常小于 1 秒,理想情况下小于 100 毫秒),因此某些工具不适合在这里集成,此外,还存在确保相同分析在多个 IDE 中以相同方式运行的问题。我们还发现 IDE 的受欢迎程度可能会上升或下降(我们不强制要求单一的 IDE),因此 IDE 集成往往比插入审查过程更混乱。代码审查还具有显示分析结果的特定好处。分析可以考虑变更的整个背景,某些对部分代码点分析可能不准确(例如,在添加调用点之前实现函数时的死代码分析)。在代码审查中显示分析结果也意味着如果代码作者想忽略分析结果,他们也需要说服审查者。也就是说,IDE集成进行适当的分析是显示静态分析结果的一个不错的集成点。 270 | 271 | Although we mostly focus on showing newly introduced static analysis warnings, or warnings on edited code, for some analyses, developers actually do want the ability to view analysis results over the entire codebase during code browsing. An example of this are some security analyses. Specific security teams at Google want to see a holistic view of all instances of a problem. Developers also like viewing analysis results over the codebase when planning a cleanup. In other words, there are times when showing results when code browsing is the right choice. 
272 | 273 | 尽管我们主要关注显示新引入的静态分析警告或编辑代码的警告,但对于某些分析,开发人员实际上确实希望能够在代码浏览期间查看整个代码库的分析结果。这方面的例子是一些安全分析。 Google 的特定安全团队希望查看所有问题实例的整体视图。开发人员还喜欢在计划清理时通过代码库查看分析结果。换句话说,在浏览代码的时候显示结果是正确的选择。 274 | 275 | ## Conclusion 总结 276 | 277 | Static analysis can be a great tool to improve a codebase, find bugs early, and allow more expensive processes (such as human review and testing) to focus on issues that are not mechanically verifiable. By improving the scalability and usability of our static analysis infrastructure, we have made static analysis an effective component of software development at Google. 278 | 279 | 静态分析是一个很好的工具,可以改进代码库,尽早发现错误,并允许成本更高的过程(如人工审查和测试)聚焦在无法通过机器方式验证的问题。通过提高静态分析基础设施的可扩展性和可用性,我们使静态分析成为 Google 软件开发的有效组成部分。 280 | 281 | ## 内容提要 282 | 283 | - *Focus on developer happiness*. We have invested considerable effort in building feedback channels between analysis users and analysis writers in our tools, and aggressively tune analyses to reduce the number of false positives. 284 | - *Make static analysis part of the core developer workflow*. The main integration point for static analysis at Google is through code review, where analysis tools provide fixes and involve reviewers. However, we also integrate analyses at additional points (via compiler checks, gating code commits, in IDEs, and when browsing code). 285 | - *Empower users to contribute*. We can scale the work we do building and maintaining analysis tools and platforms by leveraging the expertise of domain experts. Developers are continuously adding new analyses and checks that make their lives easier and our codebase better. 
286 | 287 | - **关注开发者的幸福感**。我们投入了大量精力,在我们的工具中建立分析用户和作者之间的反馈渠道,并积极调整分析以减少误报的数量。 288 | - **将静态分析作为核心开发人员工作流程的一部分**。 Google 静态分析的主要集成点是通过代码评审,在这里,分析工具提供修复并让评审人员参与。然而,我们也在其他方面(通过编译器检查、选通代码提交、在IDE中以及在浏览代码时)集成分析。 289 | - **授权用户做出贡献**。通过利用领域专家的专业知识,我们可以扩展构建和维护分析工具和平台的工作。开发人员不断添加新的分析和检查,使他们的生活更轻松,使我们的代码库更好。 290 | -------------------------------------------------------------------------------- /zh-cn/Chapter-15_Deprecation/Chapter-15_Deprecation.md: -------------------------------------------------------------------------------- 1 | **CHAPTER 15** 2 | 3 | # Deprecation 4 | 5 | # 第十五章 弃用 6 | 7 | **Written by Hyrum Wright** 8 | 9 | **Edited by Tom Manshreck** 10 | 11 | I love deadlines. I like the whooshing sound they make as they fly by. —Douglas Adams 12 | 13 | 我喜欢万事都有一个截止日期。我喜欢它们飞过时发出的嗖嗖声。 14 | 15 | ——道格拉斯·亚当斯 如是说。 16 | 17 | All systems age. Even though software is a digital asset and the physical bits themselves don’t degrade, new technologies, libraries, techniques, languages, and other environmental changes over time render existing systems obsolete. Old systems require continued maintenance, esoteric expertise, and generally more work as they diverge from the surrounding ecosystem. It’s often better to invest effort in turning off obsolete systems, rather than letting them lumber along indefinitely alongside the systems that replace them. But the number of obsolete systems still running suggests that, in practice, doing so is not trivial. We refer to the process of orderly migration away from and eventual removal of obsolete systems as deprecation. 
18 | 19 | 所有系统都会老化。虽说软件是一种数字资产,它的字节位本身不会有任何退化。但随着时间的推移,新技术、库、 技术、语言和其他环境变化,都有可能使现有的系统过时。旧系统需要持续维护、深奥的专业知识,通常需要花费 更多的精力,因为它们与周遭的生态略有不同。投入些精力弃用掉过时的系统通常是个不错的选项,让它们无限 期地与它的替代者共存通常不是明智的选择。从实践的角度出发,那些仍在运行的大量过时系统无不表明,弃用掉过时系统所带来的收益并非微不足道。我们将有序迁移并最终移除过时系统的过程称为弃废。 20 | 21 | Deprecation is yet another topic that more accurately belongs to the discipline of software engineering than programming because it requires thinking about how to manage a system over time. For long-running software ecosystems, planning for and executing deprecation correctly reduces resource costs and improves velocity by removing the redundancy and complexity that builds up in a system over time. On the other hand, poorly deprecated systems may cost more than leaving them alone. While deprecating systems requires additional effort, it’s possible to plan for deprecation during the design of the system so that it’s easier to eventually decommission and remove it. Deprecations can affect systems ranging from individual function calls to entire software stacks. For concreteness, much of what follows focuses on code-level deprecations. 22 | 23 | “弃用”,它更严格地属于软件工程领域,而非编程。因为它需要考虑如何随着时间的推移来管理系统。对于长期运行的软件生态系统,正确规划执行“弃用”,可以通过消除系统中随时间累积产生的冗 余、复杂性等,来降低资源成本并提高速度。另一方面,不推荐使用的系统可能比不理会它们的成本更高。虽然 “弃用”系统需要额外花费精力,但可以考虑在系统设计期间有计划地“弃用”,可以更容易地实现彻底停用并删除弃用的系统。“弃用”的影响范围可大可小,小到单个函数,大到整个软件生态。接下来的大部分内容我们都将集中在代码层级“弃用”上。 24 | 25 | Unlike with most of the other topics we have discussed in this book, Google is still learning how best to deprecate and remove software systems. This chapter describes the lessons we’ve learned as we’ve deprecated large and heavily used internal systems. Sometimes, it works as expected, and sometimes it doesn’t, but the general problem of removing obsolete systems remains a difficult and evolving concern in the industry. 
26 | 27 | 与我们在本书中讨论的大多数章节不同,Google 仍在学习如何最好地“弃用”和删除软件系统。本章主要介绍我们在 “弃用”大型和大量使用的内部系统时学到的经验教训。有时,它能符合预期,有时则不会。毕竟移除过时系统的普 遍问题,仍然是行业中一个困难且须不断探索的问题。 28 | 29 | This chapter primarily deals with deprecating technical systems, not end-user products. The distinction is somewhat arbitrary given that an external-facing API is just another sort of product, and an internal API may have consumers that consider themselves end users. Although many of the principles apply to turning down a public product, we concern ourselves here with the technical and policy aspects of deprecating and removing obsolete systems where the system owner has visibility into its use. 30 | 31 | 本章主要从技术层面讲“弃用”,而不是从产品层面。考虑到面向外部的 API 也算另一种产品,而内部 API 通常是自产自销,因此这种区别有些武断。尽管许多原则也适用于对外产品,但我们在这里关注的是“弃用”和删除过时 的内部系统的技术和策略方面的问题。 32 | 33 | ## Why Deprecate? 为什么要“弃用” ? 34 | 35 | Our discussion of deprecation begins from the fundamental premise that code is a liability, not an asset. After all, if code were an asset, why should we even bother spending time trying to turn down and remove obsolete systems? Code has costs, some of which are borne in the process of creating a system, but many other costs are borne as a system is maintained across its lifetime. These ongoing costs, such as the operational resources required to keep a system running or the effort to continually update its codebase as surrounding ecosystems evolve, mean that it’s worth evaluating the trade-offs between keeping an aging system running or working to turn it down. 36 | 37 | 我们对“弃用”的讨论始于这样一个基本前提,即代码是一种负债,而不是一种资产。毕竟,如果代码是一种资产, 我们为什么还要费心去尝试“弃用”它呢? 代码有成本,其中有开发成本,但更多的是维护成本。这些持续的成本, 例如保持系统运行所需的运营资源或紧跟周围生态而不断更新迭代花费的精力,意味着你需要在继续维护老化的系统运行和将其下线之间做一个权衡。 38 | 39 | The age of a system alone doesn’t justify its deprecation. A system could be finely crafted over several years to be the epitome of software form and function. 
Some software systems, such as the LaTeX typesetting system, have been improved over the course of decades, and even though changes still happen, they are few and far between. Just because something is old, it does not follow that it is obsolete. 40 | 41 | “弃用”并不能简单地用项目的年限来定夺。一个系统可以经过数年精心打造,才能稳定成熟。一些软件系统,比如 LaTeX 排版系统,经过几十年的改进,虽然它仍在发生变化,但已经趋向稳定了。系统老旧并不意味着它过时了。 42 | 43 | Deprecation is best suited for systems that are demonstrably obsolete and for which a replacement exists that provides comparable functionality. The new system might use resources more efficiently, have better security properties, be built in a more sustainable fashion, or just fix bugs. Having two systems to accomplish the same thing might not seem like a pressing problem, but over time, the costs of maintaining them both can grow substantially. Users may need to use the new system, but still have dependencies that use the obsolete one. 44 | 45 | “弃用”最适合那些明显过时的系统,并且存在提供类似功能的替代品。新系统可能更有效地使用资源,具有更好的安全属性,以更可持续的方式构建,或者只是修复错误。拥有两个系统来完成同一件事似乎不是一个紧迫的问题, 但随着时间的推移,维护它们的成本会大幅增加。用户可能需要使用新系统,但仍然依赖于使用过时的系统。 46 | 47 | The two systems might need to interface with each other, requiring complicated transformation code. As both systems evolve, they may come to depend on each other, making eventual removal of either more difficult. In the long run, we’ve discovered that having multiple systems performing the same function also impedes the evolution of the newer system because it is still expected to maintain compatibility with the old one. Spending the effort to remove the old system can pay off as the replacement system can now evolve more quickly.
48 | 49 | 这两个系统可能需要相互连接,需要复杂的转换代码。随着这两个系统的发展,它们可能会相互依赖,从而使最终消除其中任何一个变得更加困难。从长远来看,我们发现让多个系统执行相同的功能也会阻碍新系统的发展, 因为它仍然需要与旧的保持兼容性。由于替换系统现在可以更快地发展,因此花费精力移除旧系统会有相关的收益。 50 | 51 | ----- 52 | Earlier we made the assertion that “code is a liability, not an asset.” If that is true, why have we spent most of this book discussing the most efficient way to build software systems that can live for decades? Why put all that effort into creating more code when it’s simply going to end up on the liability side of the balance sheet? Code itself doesn’t bring value: it is the functionality that it provides that brings value. That functionality is an asset if it meets a user need: the code that implements this functionality is simply a means to that end. If we could get the same functionality from a single line of maintainable, understandable code as 10,000 lines of convoluted spaghetti code, we would prefer the former. Code itself carries a cost—the simpler the code is, while maintaining the same amount of functionality, the better. Instead of focusing on how much code we can produce, or how large is our codebase, we should instead focus on how much functionality it can deliver per unit of code and try to maximize that metric. One of the easiest ways to do so isn’t writing more code and hoping to get more functionality; it’s removing excess code and systems that are no longer needed. Deprecation policies and procedures make this possible. 
53 | 54 | 前面,我们断言“代码是一种负债,而不是一种资产”。如果这是真的,为什么我们用本书的大部分时间来讨论构建 可以存活数十年的软件系统的最有效方法?当它最终会出现在资产负债表的负债方时,为什么还要付出所有努力来 创建更多代码呢?代码本身不会带来价值:它提供的功能带来了价值。如果该功能满足用户需求,那么它就是一种 资产:实现此功能的代码只是实现该目的的一种手段。如果我们可以从一行可维护、可理解的代码中获得与 10,000 行错综复杂的意大利面条式代码相同的功能,我们会更喜欢前者。代码本身是有成本的——代码越简单,同 时保持相同数量的功能越好。与其关注我们可以生产多少代码,或者我们的代码库有多大,我们应该关注每单位代 码可以提供多少功能,并尝试最大化该指标。最简单的方法之一就是不要编写更多代码并希望获得更多功能;而是 删除不再需要的多余代码和系统。“弃用”策略的存在就是为了解决这个问题。 55 | 56 | ----- 57 | 58 | Even though deprecation is useful, we’ve learned at Google that organizations have a limit on the amount of deprecation work that is reasonable to undergo simultaneously, from the aspect of the teams doing the deprecation as well as the customers of those teams. For example, although everybody appreciates having freshly paved roads, if the public works department decided to close down every road for paving simultaneously, nobody would go anywhere. By focusing their efforts, paving crews can get specific jobs done faster while also allowing other traffic to make progress. Likewise, it’s important to choose deprecation projects with care and then commit to following through on finishing them. 59 | 60 | 尽管“弃用”很有用,但我们在 Google 了解到,从执行“弃用”的团队以及这些团队的客户的角度来看,对同时进行的‘弃用’工作量是有限制的。例如,虽然每个人都喜欢新铺设的道路,但如果政府部门决定同时关闭所有道路并进行铺设,那么将会导致大家无路可走。通过集中精力,铺设人员可以更快地完成特定工作,但同时不应该影响其他道路的通行。故同样重要的是要谨慎选择“弃用”项目并付诸实施。 61 | 62 | ## 为什么“弃用”这么难(Why Is Deprecation So Hard?) 63 | 64 | We’ve mentioned Hyrum’s Law elsewhere in this book, but it’s worth repeating its applicability here: the more users of a system, the higher the probability that users are using it in unexpected and unforeseen ways, and the harder it will be to deprecate and remove such a system. Their usage just “happens to work” instead of being “guaranteed to work.” In this context, removing a system can be thought of as the ultimate change: we aren’t just changing behavior, we are removing that behavior completely! This kind of radical alteration will shake loose a number of unexpected dependents. 
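To make the difference between “happens to work” and “guaranteed to work” concrete, here is a minimal Python sketch. Every name in it (`old_list_replicas`, `new_list_replicas`, `pick_primary`, and the replica names) is hypothetical and invented for illustration; it does not describe any real system. The old API only promises to return the replica names, but its implementation happens to return them in alphabetical order, and one caller has quietly come to rely on that.

```python
def old_list_replicas():
    """Old API. Contract: returns all replica names, in unspecified order."""
    # Implementation detail: the names happen to come back alphabetically.
    return ["alpha", "beta", "gamma"]


def new_list_replicas():
    """Replacement API. Same contract, different incidental ordering."""
    return ["gamma", "alpha", "beta"]


def pick_primary(list_replicas):
    """Caller that depends on undocumented behavior (first == smallest name)."""
    return list_replicas()[0]


def pick_primary_safely(list_replicas):
    """Caller that depends only on the documented contract."""
    return min(list_replicas())


# The fragile caller changes behavior when the old system is swapped out,
# even though the replacement honors the documented contract.
print(pick_primary(old_list_replicas), pick_primary(new_list_replicas))
print(pick_primary_safely(old_list_replicas), pick_primary_safely(new_list_replicas))
```

Neither implementation is wrong here. The fragile caller is exactly the kind of unexpected dependent that a removal shakes loose, and it has to be found and fixed before the old system can go away.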
65 | 66 | 我们在本书的其他地方提到了海勒姆定律,但值得在这里重申一下它的适用性:一个系统的用户越多,用户以意外和不可预见的方式使用它的可能性就越大,并且越难“弃用”和删除这样的系统。它们有可能只是“碰巧可用”而不是 “绝对可用”。在这种情况下,“弃用”不是简单的行为变更,而是一次大变革-彻底的“弃用”! 这种激进的改变可能会对这样的系统造成意想不到的影响。 67 | 68 | To further complicate matters, deprecation usually isn’t an option until a newer system is available that provides the same (or better!) functionality. The new system might be better, but it is also different: after all, if it were exactly the same as the obsolete system, it wouldn’t provide any benefit to users who migrate to it (though it might benefit the team operating it). This functional difference means a one-to-one match between the old system and the new system is rare, and every use of the old system must be evaluated in the context of the new one. 69 | 70 | 更复杂的是,在提供相同(或更好)功能的新系统可用之前,“弃用”通常不是一种选择。新系统可能更好,但也有不同:毕竟,如果它和过时的系统完全一样,它不会为迁移到它的用户提供任何好处(尽管它可能使运行它的团队受益)。这种功能差异意味着旧系统和新系统之间的一对一匹配很少见,新老系统的切换通常需要进行评估。 71 | 72 | Another surprising reluctance to deprecate is emotional attachment to old systems, particularly those that the deprecator had a hand in helping to create. An example of this change aversion happens when systematically removing old code at Google: we’ve occasionally encountered resistance of the form “I like this code!” It can be difficult to convince engineers to tear down something they’ve spent years building. This is an understandable response, but ultimately self-defeating: if a system is obsolete, it has a net cost on the organization and should be removed. One of the ways we’ve addressed concerns about keeping old code within Google is by ensuring that the source code repository isn’t just searchable at trunk, but also historically. Even code that has been removed can be found again (see Chapter 17). 
73 | 74 | 另一个令人惊讶的不愿“弃用”的现象是对旧系统的情感依恋,尤其是那些“弃用”者帮助创建的系统。在 Google 系统地删除旧代码时,就会发生这种厌恶更改的一个例子:我们偶尔会遇到“我喜欢这段代码!”这种形式的抵制。说服工程师删除他们花了多年时间建造的东西可能很困难。这是一种可以理解的反应,但最终会弄巧成拙:如果一个系统已经过时,它会给组织带来净成本,应该将其删除。我们解决了将旧代码保留在 Google 中的问题的方法之一是确保源代码存储库不仅可以在主干上搜索,而且可以在历史上搜索。甚至被删除的代码也能再次找到 (见 17 章) 75 | 76 | ----- 77 | 78 | There’s an old joke within Google that there are two ways of doing things: the one that’s deprecated, and the one that’s not-yet-ready. This is usually the result of a new solution being “almost” done and is the unfortunate reality of working in a technological environment that is complex and fast-paced. Google engineers have become used to working in this environment, but it can still be disconcerting. Good documentation, plenty of signposts, and teams of experts helping with the deprecation and migration process all make it easier to know whether you should be using the old thing, with all its warts, or the new one, with all its uncertainties. 79 | 80 | 谷歌内部有一个古老的笑话,说有两种做事方式:一种已被“弃用”,另一种尚未准备就绪。这通常发生新解决方案“几乎”完成的时候,并且是在复杂且快节奏的技术环境中工作的不幸现实。谷歌工程师已经习惯了在这种环境中工作,但它仍然令人不安。良好的文档、大量的指引以及帮助“弃用”和迁移过程的专家团队,都能帮助你更容易地判断是使用旧的,有缺点,还是新的,有不确定性的。 81 | 82 | ----- 83 | 84 | Finally, funding and executing deprecation efforts can be difficult politically; staffing a team and spending time removing obsolete systems costs real money, whereas the costs of doing nothing and letting the system lumber along unattended are not readily observable. It can be difficult to convince the relevant stakeholders that deprecation efforts are worthwhile, particularly if they negatively impact new feature development. Research techniques, such as those described in Chapter 7, can provide concrete evidence that a deprecation is worthwhile. 
85 | 86 | 最后,资助和执行“弃用”工作在政治上可能很困难;为团队配备人员并花时间移除过时的系统会花费大量金钱,而无所作为和让系统在无人看管的情况下缓慢运行的成本不易观察到。很难让相关利益相关者相信“弃用”工作是值得 的,尤其是当它们(弃用工作)对新功能开发产生负面影响时。研究技术,例如第七章中描述的那些,可以提供具体的证据证明“弃用”是值得的。 87 | 88 | Given the difficulty in deprecating and removing obsolete software systems, it is often easier for users to evolve a system in situ, rather than completely replacing it. Incrementality doesn’t avoid the deprecation process altogether, but it does break it down into smaller, more manageable chunks that can yield incremental benefits. Within Google, we’ve observed that migrating to entirely new systems is extremely expensive, and the costs are frequently underestimated. Incremental deprecation efforts accomplished by in-place refactoring can keep existing systems running while making it easier to deliver value to users. 89 | 90 | 鉴于“弃用”和删除过时软件系统的难度,用户通常更容易就地改进系统,而不是完全替换它。增量并没有完全避免“弃用”过程,但它确实将其分解为更小、更易于管理的块,这些块可以产生增量收益。在 Google 内部,我们观察到迁移到全新系统的成本非常高,而且成本经常被低估。增量“弃用”工作通过就地重构实现的功能可以保持现有系统运行,同时更容易向用户交付价值。 91 | 92 | ### Deprecation During Design 设计之初便考虑“弃用” 93 | 94 | Like many engineering activities, deprecation of a software system can be planned as those systems are first built. Choices of programming language, software architecture, team composition, and even company policy and culture all impact how easy it will be to eventually remove a system after it has reached the end of its useful life. 95 | 96 | 与许多工程活动一样,软件系统的“弃用”可以在这些系统首次设计时便进行规划。编程语言、软件架构、团队组成, 甚至公司策略和文化的选择都会影响系统在使用寿命结束后最终将其“弃用”的难易程度。 97 | 98 | The concept of designing systems so that they can eventually be deprecated might be radical in software engineering, but it is common in other engineering disciplines. Consider the example of a nuclear power plant, which is an extremely complex piece of engineering. 
As part of the design of a nuclear power station, its eventual decommissioning after a lifetime of productive service must be taken into account, even going so far as to allocate funds for this purpose.[^1] Many of the design choices in building a nuclear power plant are affected when engineers know that it will eventually need to be decommissioned. 99 | 100 | 设计系统以使其最终可以被“弃用”的概念在软件工程中可能是激进的,但它在其他工程学科中很常见。以核电站为例,这是一项极其复杂的工程。作为核电站设计的一部分,必须考虑到其在服务寿命到期后最终退役,甚至为此分配资金。当工程师知道它最终需要退役时,核电站建设中的许多设计,将会随之改变。 101 | 102 | Unfortunately, software systems are rarely so thoughtfully designed. Many software engineers are attracted to the task of building and launching new systems, not maintaining existing ones. The corporate culture of many companies, including Google, emphasizes building and shipping new products quickly, which often provides a disincentive for designing with deprecation in mind from the beginning. And in spite of the popular notion of software engineers as data-driven automata, it can be psychologically difficult to plan for the eventual demise of the creations we are working so hard to build. 103 | 104 | 不幸的是,软件系统很少经过精心设计。许多软件工程师更热心于构建和启动新系统,而不是维护现有系统。包括 Google 在内的许多公司的企业文化都强调快速构建和交付新产品,这通常会阻碍从一开始就考虑“弃用”的设计。尽管普遍认为软件工程师是数据驱动的自动机,但在心理上很难为我们辛勤工作的创造物的最终消亡做计划。 105 | 106 | So, what kinds of considerations should we think about when designing systems that we can more easily deprecate in the future? Here are a couple of the questions we encourage engineering teams at Google to ask: 107 | 108 | - How easy will it be for my consumers to migrate from my product to a potential replacement? 109 | - How can I replace parts of my system incrementally? 110 | 111 | 那么,在设计我们将来更容易“弃用”的系统时,我们应该考虑哪些因素?以下是我们鼓励 Google 的工程团队提出的几个问题: 112 | 113 | - 我的使用者从我的产品迁移到潜在替代品的难易程度如何? 114 | - 如何逐步更换系统部件? 115 | 116 | Many of these questions relate to how a system provides and consumes dependencies. For a more thorough discussion of how we manage these dependencies, see Chapter 16. 
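One possible way to approach both questions in code is to have consumers depend on a narrow interface rather than on a concrete system, so that a replacement can be introduced behind it one slice at a time. The Python sketch below is only an illustration under that assumption: `BlobStore`, `OldStore`, `NewStore`, and `MigratingStore` are hypothetical names, and a real migration would also have to move or dual-write the underlying data, which is omitted here.

```python
import zlib
from typing import Protocol


class BlobStore(Protocol):
    """The only surface that consumers are allowed to depend on."""

    def get(self, key: str) -> bytes: ...


class OldStore:
    def get(self, key: str) -> bytes:
        return b"old:" + key.encode()


class NewStore:
    def get(self, key: str) -> bytes:
        return b"new:" + key.encode()


class MigratingStore:
    """Sends a fixed percentage of keys to the new store, the rest to the old."""

    def __init__(self, old: BlobStore, new: BlobStore, percent_new: int):
        if not 0 <= percent_new <= 100:
            raise ValueError("percent_new must be between 0 and 100")
        self._old = old
        self._new = new
        self._percent_new = percent_new

    def get(self, key: str) -> bytes:
        # A stable hash keeps each key on the same side between calls.
        bucket = zlib.crc32(key.encode()) % 100
        store = self._new if bucket < self._percent_new else self._old
        return store.get(key)


store = MigratingStore(OldStore(), NewStore(), percent_new=100)
print(store.get("report"))
```

Because consumers see only `BlobStore`, raising `percent_new` from 0 to 100 replaces the old system incrementally without any change on the consumer side, and the old implementation can be deleted once it receives no traffic.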
117 | 118 | 其中许多问题与系统如何提供和使用依赖项有关。有关我们如何管理这些依赖项的更深入讨论,请参阅第 16 章。 119 | 120 | Finally, we should point out that the decision as to whether to support a project long term is made when an organization first decides to build the project. After a software system exists, the only remaining options are support it, carefully deprecate it, or let it stop functioning when some external event causes it to break. These are all valid options, and the trade-offs between them will be organization specific. A new startup with a single project will unceremoniously kill it when the company goes bankrupt, but a large company will need to think more closely about the impact across its portfolio and reputation as they consider removing old projects. As mentioned earlier, Google is still learning how best to make these trade-offs with our own internal and external products. 121 | 122 | 最后,我们应该指出,是否长期支持项目的决定,是在组织最初决定建立项目时做出的。软件系统存在后,剩下的唯一选择是支持它,小心地“弃用”它,或者在某些外部事件导致它崩溃时让它停止运行。这些都是有效的选项,它们之间的权衡将是特定于组织的。当公司破产时,一个只有一个项目的新创业公司会毫不客气地杀死它,但一家大公司在考虑删除旧项目时需要更仔细地考虑对其投资组合和声誉的影响。如前所述,谷歌仍在学习如何最好地利用我们自己的内部和外部产品进行这些权衡。 123 | 124 | In short, don’t start projects that your organization isn’t committed to support for the expected lifespan of the organization. Even if the organization chooses to deprecate and remove the project, there will still be costs, but they can be mitigated through planning and investments in tools and policy. 125 | 126 | 简而言之,如果你的公司不打算长期支持某个项目,那么轻易不要启动这个项目。即使公司选择“弃用”项目,仍然会有成本,但可以通过规划和投资工具和策略来降低成本。 127 | 128 | > [^1]: “Design and Construction of Nuclear Power Plants to Facilitate Decommissioning,” Technical Reports Series No. 382, IAEA, Vienna (1997). 
129 | > 130 | > 1 "设计和建造核电站便捷退役",技术报告系列第382号,IAEA,维也纳(1997年)。 131 | 132 | ## Types of Deprecation “弃用”的类型 133 | 134 | Deprecation isn’t a single kind of process, but a continuum of them, ranging from “we’ll turn this off someday, we hope” to “this system is going away tomorrow, customers better be ready for that.” Broadly speaking, we divide this continuum into two separate areas: advisory and compulsory. 135 | 136 | “弃用”不是一种单一的过程,而是一个连续的过程,从“希望有一天我们能够关闭它”到“这个系统明天就会消失,客户最好为此做好准备。” 从广义上讲,我们将这个连续统一体分为两个独立的领域:建议和强制。 137 | 138 | ### 建议性“弃用” (Advisory Deprecation) 139 | 140 | Advisory deprecations are those that don’t have a deadline and aren’t high priority for the organization (and for which the company isn’t willing to dedicate resources). These could also be labeled aspirational deprecations: the team knows the system has been replaced, and although they hope clients will eventually migrate to the new system, they don’t have imminent plans to either provide support to help move clients or delete the old system. This kind of deprecation often lacks enforcement: we hope that clients move, but can’t force them to. As our friends in SRE will readily tell you: “Hope is not a strategy.” 141 | 142 | 建议性“弃用”是那些没有截止日期并且对组织来说不是高优先级的(并且公司不愿意为此投入资源)。这些也可能被标记为期望“弃用”:团队知道系统已被替换,尽管他们希望客户最终迁移到新系统,但他们没有近期的计划来提供支持以帮助客户迁移或删除旧系统。这种“弃用”往往缺乏执行力:我们希望客户迁移,但不强迫他们做。正如我们在 SRE 的朋友会很容易告诉你的那样:“希望不是策略。” 143 | 144 | Advisory deprecations are a good tool for advertising the existence of a new system and encouraging early adopting users to start trying it out. Such a new system should not be considered in a beta period: it should be ready for production uses and loads and should be prepared to support new users indefinitely. Of course, any new system is going to experience growing pains, but after the old system has been deprecated in any way, the new system will become a critical piece of the organization’s infrastructure. 
145 | 146 | 建议性“弃用”是宣传新系统存在并鼓励早期采用的用户开始尝试的好工具。这样的新系统不应该在测试阶段被考虑:它应该准备好用于生产用途和负载,并且应该准备好无限期地支持新用户。当然,任何新系统都会经历成长的痛苦,但是在旧系统以任何方式被“弃用”之后,新系统将成为组织基础设施的关键部分。 147 | 148 | One scenario we’ve seen at Google in which advisory deprecations have strong benefits is when the new system offers compelling benefits to its users. In these cases, simply notifying users of this new system and providing them self-service tools to migrate to it often encourages adoption. However, the benefits cannot be simply incremental: they must be transformative. Users will be hesitant to migrate on their own for marginal benefits, and even new systems with vast improvements will not gain full adoption using only advisory deprecation efforts. 149 | 150 | 我们在谷歌看到的一种情况是,当新系统为其用户提供重大优势时,建议性“弃用”具有强大的好处。在这些情况下,简单地通知用户这个新系统并为他们提供自助服务工具以迁移到它,通常会鼓励采用。然而,改进不能简单地增量:它们必须具有变革性。否则用户将不愿为了这一点点边际收益而自行迁移,不过对于“建议性“弃用””,即使具有巨大改进的新系统也通常不会被完全采纳。 151 | 152 | Advisory deprecation allows system authors to nudge users in the desired direction, but they should not be counted on to do the majority of migration work. It is often tempting to simply put a deprecation warning on an old system and walk away without any further effort. Our experience at Google has been that this can lead to (slightly) fewer new uses of an obsolete system, but it rarely leads to teams actively migrating away from it. Existing uses of the old system exert a sort of conceptual (or technical) pull toward it: comparatively many uses of the old system will tend to pick up a large share of new uses, no matter how much we say, “Please use the new system.” The old system will continue to require maintenance and other resources unless its users are more actively encouraged to migrate. 
153 | 154 | 建议性“弃用”允许系统作者将用户推向所需的方向,但不应指望他们完成大部分迁移工作。通常只需要在旧系统上简单地发出“弃用”警告,然后弃之不顾即可。我们在 Google 的经验是,这可能会导致(略微)减少对过时系统的使用, 但很少会导致团队积极迁移。旧系统的现有功能会有一种吸引力,吸引更多的系统使用它,无论我们说多少次,“请使用新的系统。” 除非更积极地鼓励其用户迁移,否则旧系统将需要继续维护。 155 | 156 | ### Compulsory Deprecation 强制性“弃用” 157 | 158 | This active encouragement comes in the form of compulsory deprecation. This kind of deprecation usually comes with a deadline for removal of the obsolete system: if users continue to depend on it beyond that date, they will find their own systems no longer work. 159 | 160 | 这种“弃用”通常伴随着删除过时系统的最后期限:如果用户在该日期之后继续依赖它,他们将发现自己的系统不再正常工作。 161 | 162 | Counterintuitively, the best way for compulsory deprecation efforts to scale is by localizing the expertise of migrating users to within a single team of experts—usually the team responsible for removing the old system entirely. This team has incentives to help others migrate from the obsolete system and can develop experience and tools that can then be used across the organization. Many of these migrations can be effected using the same tools discussed in Chapter 22. 163 | 164 | 与直觉相反,推广强制性“弃用”工作的最佳方法是将迁移用户的工作交给一个专家团队——通常是负责完全删除旧系统的团队。该团队有动力帮助其他人从过时的系统迁移,并可以开发可在整个组织中使用的经验和工具。许多这些迁移可以使用第 22 章中讨论的相同工具来实现。 165 | 166 | For compulsory deprecation to actually work, its schedule needs to have an enforcement mechanism. This does not imply that the schedule can’t change, but the team running the deprecation process must be empowered to break noncompliant users after they have been sufficiently warned through efforts to migrate them. Without this power, it becomes easy for customer teams to ignore deprecation work in favor of features or other more pressing work. 167 | 168 | 为了让强制性“弃用”真正起作用,需要有一个强制执行的时间表。并以警告的形式通知到需要执行迁移的客户团队。没有这种能力,客户团队很容易忽略“弃用”工作,而转而支持其他更紧迫的工作。 169 | 170 | At the same time, compulsory deprecations without staffing to do the work can come across to customer teams as mean spirited, which usually impedes completing the deprecation.
Customers simply see such deprecation work as an unfunded mandate, requiring them to push aside their own priorities to do work just to keep their services running. This feels much like the “running to stay in place” phenomenon and creates friction between infrastructure maintainers and their customers. It’s for this reason that we strongly advocate that compulsory deprecations are actively staffed by a specialized team through completion. 171 | 172 | 同时,若没有安排人员协助,可能会给客户团队带来刻薄的印象,这通常会影响迁移的进度。客户只是将它视为一项没有资金的任务,要求他们搁置自己的优先事项,只为保持服务运行而迁移。这会在两个团队间产生摩擦,故此,我们建议安排人员进行协助迁移。 173 | 174 | It’s also worth noting that even with the force of policy behind them, compulsory deprecations can still face political hurdles. Imagine trying to enforce a compulsory deprecation effort when the last remaining user of the old system is a critical piece of infrastructure your entire organization depends on. How willing would you be to break that infrastructure—and, transitively, everybody that depends on it—just for the sake of making an arbitrary deadline? It is hard to believe the deprecation is really compulsory if that team can veto its progress. 175 | 176 | 还值得注意的是,即使有策略支持,强制性“弃用”仍可能面临政治阻力。想象一下,当旧系统的最后一个剩余用户是整个组织所依赖的关键基础架构时, 你会愿意为了在截止日期前完成迁移而破坏那个基础设施及所有依赖它的系统吗? 如果该团队可以否决其进展,就很难相信这种弃用真的是强制性的。 177 | 178 | Google’s monolithic repository and dependency graph gives us tremendous insight into how systems are used across our ecosystem. Even so, some teams might not even know they have a dependency on an obsolete system, and it can be difficult to discover these dependencies analytically. It’s also possible to find them dynamically through tests of increasing frequency and duration during which the old system is turned off temporarily. These intentional changes provide a mechanism for discovering unintended dependencies by seeing what breaks, thus alerting teams to a need to prepare for the upcoming deadline. 
Within Google, we occasionally change the name of implementation-only symbols to see which users are depending on them unaware. 179 | 180 | Google 的统一代码仓库和依赖关系图让我们深入了解系统如何在我们的生态系统中使用。即便如此,一些团队甚至可能不知道他们依赖于一个过时的系统,并且很难通过分析发现这些依赖关系。也可以通过增加频率和持续时间的测试来动态找到它们,在此期间旧系统暂时关闭。这些有意的更改提供了一种机制,通过观察哪些部分出现了故障来发现意外的依赖关系,从而提醒团队需要为即将到来的截止日期做好准备。在 Google 内部,我们偶尔会仅更改变量的名称,来查看哪些用户不知道依赖了它们。 181 | 182 | Frequently at Google, when a system is slated for deprecation and removal, the team will announce planned outages of increasing duration in the months and weeks prior to the turndown. Similar to Google’s Disaster Recovery Testing (DiRT) exercises, these events often discover unknown dependencies between running systems. This incremental approach allows those dependent teams to discover and then plan for the system’s eventual removal, or even work with the deprecating team to adjust their timeline. (The same principles also apply for static code dependencies, but the semantic information provided by static analysis tools is often sufficient to detect all the dependencies of the obsolete system.) 183 | 184 | 在谷歌,当系统计划“弃用”时,团队经常会在关闭前的几个月和几周内宣布计划停服,持续时间会增加。与 Google 的灾难恢复测试 (DiRT) 类似,这些事件通常会发现正在运行的系统之间的未知依赖关系。这种渐进式方法允许那些依赖的团队发现依赖,然后为系统的最终移除做计划,甚至与“弃用”团队合作调整他们的时间表。(同样的原则也适用于静态代码依赖,但静态分析工具提供的语义信息通常足以检测过时系统的所有依赖。) 185 | 186 | ### Deprecation Warnings 弃用警告 187 | 188 | For both advisory and compulsory deprecations, it is often useful to have a programmatic way of marking systems as deprecated so that users are warned about their use and encouraged to move away. It’s often tempting to just mark something as deprecated and hope its uses eventually disappear, but remember: “hope is not a strategy.” Deprecation warnings can help prevent new uses, but rarely lead to migration of existing systems. 
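As a minimal sketch of such a programmatic marker, the Python example below wraps a function so that each call emits a `DeprecationWarning` through the standard `warnings` module. The `deprecated` decorator and the `fetch` / `fetch_v2` functions are hypothetical names chosen for illustration, not an existing API. The warning text names the replacement so that the reader knows what to do next.

```python
import functools
import warnings


def deprecated(replacement: str):
    """Mark a function as deprecated and name what callers should use instead."""

    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            warnings.warn(
                f"{func.__name__} is deprecated; use {replacement} instead.",
                category=DeprecationWarning,
                stacklevel=2,  # Attribute the warning to the caller's line.
            )
            return func(*args, **kwargs)

        return wrapper

    return decorator


def fetch_v2(key: str) -> str:
    """The supported replacement."""
    return f"value-for-{key}"


@deprecated(replacement="fetch_v2")
def fetch(key: str) -> str:
    """Old entry point, kept only until callers have migrated."""
    return fetch_v2(key)


with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    fetch("a")
print(caught[0].message)
```

Note that a marker like this only announces the deprecation. As the paragraph above says, it can discourage new uses, but existing callers still need an owner and a migration plan.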
189 | 190 | 对于建议性和强制“弃用”,以程序化的方式将系统标记为“弃用”通常很有用,以便用户得到警告,并被鼓励停止使用。将某些东西标记为已“弃用”并希望它的使用最终消失通常很诱人,但请记住:“希望不是一种策略。” “弃用”警告可以减少它的新增用户,但很少导致现有系统的迁移。 191 | 192 | What usually happens in practice is that these warnings accumulate over time. If they are used in a transitive context (for example, library A depends on library B, which depends on library C, and C issues a warning, which shows up when A is built), these warnings can soon overwhelm users of a system to the point where they ignore them altogether. In health care, this phenomenon is known as “alert fatigue.” 193 | 194 | 在实践中通常会发生这些警告随着时间的推移而累积。如果它们在传递上下文中使用(例如,库 A 依赖于库 B, 而库 B 又依赖于库 C,而 C 发出警告,并在构建 A 时显示),则这些警告很快就会使系统用户不知所措他们完全忽略这些警告。在医疗保健领域,这种现象被称为“警觉疲劳”。 195 | 196 | Any deprecation warning issued to a user needs to have two properties: actionability and relevance. A warning is actionable if the user can use the warning to actually perform some relevant action, not just in theory, but in practical terms, given the expertise in that problem area that we expect for an average engineer. For example, a tool might warn that a call to a given function should be replaced with a call to its updated counterpart, or an email might outline the steps required to move data from an old system to a new one. In each case, the warning provided the next steps that an engineer can perform to no longer depend on the deprecated system.[^2] 197 | 198 | 向用户发出的任何“弃用”警告都需要具有两个属性:可操作性和相关性。如果用户可以使用警告来实际执行某些相关操作,则警告是可操作的,不仅在理论上,而且在实践中,即要提供可操作的迁移步骤,而不仅仅是一个警告。 199 | 200 | A warning can be actionable, but still be annoying. To be useful, a deprecation warning should also be relevant. A warning is relevant if it surfaces at a time when a user actually performs the indicated action. Warning about the use of a deprecated function is best done while the engineer is writing code that uses that function, not after it has been checked into the repository for several weeks. 
Likewise, an email for data migration is best sent several months before the old system is removed rather than as an afterthought a weekend before the removal occurs. 201 | 202 | 警告可能是可行的,但仍然很烦人。为了有用,“弃用”警告也应该是相关的。如果警告在用户实际执行指示的操作时出现,则该警告是相关的。关于使用已“弃用”函数的警告最好在工程师编写使用该函数的代码时完成,而不是在将其签入存储库数周后。同样,最好在删除旧系统前几个月发送数据迁移电子邮件,而不是在删除前的一个周末之后才发送。 203 | 204 | It’s important to resist the urge to put deprecation warnings on everything possible. Warnings themselves are not bad, but naive tooling often produces a quantity of warning messages that can overwhelm the unsuspecting engineer. Within Google, we are very liberal with marking old functions as deprecated but leverage tooling such as ErrorProne or clang-tidy to ensure that warnings are surfaced in targeted ways. As discussed in Chapter 20, we limit these warnings to newly changed lines as a way to warn people about new uses of the deprecated symbol. Much more intrusive warnings, such as for deprecated targets in the dependency graph, are added only for compulsory deprecations, and the team is actively moving users away. In either case, tooling plays an important role in surfacing the appropriate information to the appropriate people at the proper time, allowing more warnings to be added without fatiguing the user. 205 | 206 | 重要的是要抵制在一切可能的地方放置弃用警告的冲动。警告本身并不坏,但不成熟的工具通常会产生大量警告消息,这些消息可能会让工程师不知所措。在 Google 内部,我们会将旧功能标记为已“弃用”,但会利用 ErrorProne 或 clang-tidy 等工具来确保以有针对性的方式显示警告。正如第 20 章中所讨论的,我们将这些警告限制在新更改的行中,以警告人们有关已 “弃用”符号的新用法。更具侵入性的警告,例如依赖图中已“弃用”的警告,仅针对强制“弃用”添加,并且团队正在积极地将用户移走。在任何一种情况下,工具都在适当的时间向适当的人提供适当的信息方面发挥着重要作用,允许添加更多警告而不会使用户感到疲倦。 207 | 208 | > [^2] See `https://abseil.io/docs/cpp/tools/api-upgrades` for an example. 
209 | > 210 | > 2 查阅`https://abseil.io/docs/cpp/tools/api-upgrades` 例子。 211 | 212 | ## Managing the Deprecation Process 管理“弃用”的流程 213 | 214 | Although they can feel like different kinds of projects because we’re deconstructing a system rather than building it, deprecation projects are similar to other software engineering projects in the way they are managed and run. We won’t spend too much effort going over similarities between those management efforts, but it’s worth pointing out the ways in which they differ. 215 | 216 | “弃用”项目尽管与上线一个项目给你的感觉不同,但它们的管理和运行方式却是类似的。我们不会花太多精力去讨论他们有何共同点,但有必要指出他们有何不同。 217 | 218 | ### Process Owners 确定负责“弃用”的负责人 219 | 220 | We’ve learned at Google that without explicit owners, a deprecation process is unlikely to make meaningful progress, no matter how many warnings and alerts a system might generate. Having explicit project owners who are tasked with managing and running the deprecation process might seem like a poor use of resources, but the alternatives are even worse: don’t ever deprecate anything, or delegate deprecation efforts to the users of the system. The second case becomes simply an advisory deprecation, which will never organically finish, and the first is a commitment to maintain every old system ad infinitum. Centralizing deprecation efforts helps better assure that expertise actually reduces costs by making them more transparent. 221 | 222 | 我们在 Google 了解到,如果没有明确的Owner,无论系统产生了多少警报,“弃用”过程恐怕都不会太乐观。为了弃用专门指定一个负责人似乎是对资源的浪费,永不“弃用”,或将“弃用”工作完全交给系统的使用者,恐怕会是一个更糟的方案。交给使用者来执行的方案,最多只能应对建议性“弃用”,恐怕它很难做到彻底地“弃用”,而永不“弃用”则相当于无限期地维护着旧系统。集中性的执行“弃用”则更专业更透明,从而真正达到降低成本的目的。 223 | 224 | Abandoned projects often present a problem when establishing ownership and aligning incentives. Every organization of reasonable size has projects that are still actively used but that nobody clearly owns or maintains, and Google is no exception. 
Projects sometimes enter this state because they are deprecated: the original owners have moved on to a successor project, leaving the obsolete one chugging along in the basement, still a dependency of a critical project, and hoping it just fades away eventually. 225 | 226 | 弃用的项目通常会在确定归属权上存在扯皮的情形。每个小组都存在大量仍在使用却无明确维护人的项目,谷歌也不例外。当一个项目存在这种情形时,通常说明它已被抛弃:即原维护人已参与到新项目开发维护中,老项目则被弃之不顾,但却仍然被某些关键项目所依赖,只寄希望于它慢慢消失在众人视线中。 227 | 228 | Such projects are unlikely to fade away on their own. In spite of our best hopes, we’ve found that these projects still require deprecation experts to remove them and prevent their failure at inopportune times. These teams should have removal as their primary goal, not just a side project of some other work. In the case of competing priorities, deprecation work will almost always be perceived as having a lower priority and rarely receive the attention it needs. These sorts of important-not-urgent cleanup tasks are a great use of 20% time and provide engineers exposure to other parts of the codebase. 229 | 230 | 但此类项目不太可能自行消失。尽管我们对它满怀希望,但我们发现,“弃用”这些项目仍然需要专人负责,否则恐怕会造成意外的损失。这些团队应将移除作为他们的主要目标,而不仅仅是其他工作的附带项目。在确认优先级时,“弃用”通常会有较低的优先级, 且少有人关注。但实际上,这些重要但不紧急的清理工作,是工程师利用20%时间进行的很好的工作,这让工程师有机会接触代码库的其他部分。 231 | 232 | ### Milestones 制定里程碑 233 | 234 | When building a new system, project milestones are generally pretty clear: “Launch the frobnazzer features by next quarter.” Following incremental development practices, teams build and deliver functionality incrementally to users, who get a win whenever they take advantage of a new feature. The end goal might be to launch the entire system, but incremental milestones help give the team a sense of progress and ensure they don’t need to wait until the end of the process to generate value for the organization. 
235 | 236 | 在构建新系统时,项目里程碑通常非常明确:如“在下个季度推出某项功能。” 遵循迭代式开发流程的团队,通常以积小成大的方式构建系统,并最终交付给用户,只要他们使用了新功能,他们的目的便算得到了。最终目标当然是启用整个系统,但增量迭代式的开发,则能让团队成员更有成就感,因他们无需等到项目结束就可体验项目。 237 | 238 | In contrast, it can often feel that the only milestone of a deprecation process is removing the obsolete system entirely. The team can feel they haven’t made any progress until they’ve turned out the lights and gone home. Although this might be the most meaningful step for the team, if it has done its job correctly, it’s often the least noticed by anyone external to the team, because by that point, the obsolete system no longer has any users. Deprecation project managers should resist the temptation to make this the only measurable milestone, particularly given that it might not even happen in all deprecation projects. 239 | 240 | 相反,对于“弃用”,它常会给人一种只有一个里程碑的错觉,即完全干掉老旧的项目。下班时,团队成员通常会有 没取得任何进展的感觉。干掉一个老旧的项目对团队成员来说虽是颇有意义,但对团队之外的人来说却是完全无感, 因老旧的系统已不再被任何服务调用。因此项目经理不应将完全根除旧项目当作唯一的里程碑。 241 | 242 | Similar to building a new system, managing a team working on deprecation should involve concrete incremental milestones, which are measurable and deliver value to users. The metrics used to evaluate the progress of the deprecation will be different, but it is still good for morale to celebrate incremental achievements in the deprecation process. We have found it useful to recognize appropriate incremental milestones, such as deleting a key subcomponent, just as we’d recognize accomplishments in building a new product. 243 | 244 | 与新建项目一样,“弃用”一个项目也该渐进的设置多个可量化的里程碑,用于评估“弃用”进度的指标会有差异,但阶段性的庆祝有助提升士气。 245 | 246 | ### Deprecation Tooling 工具加持 247 | 248 | Much of the tooling used to manage the deprecation process is discussed in depth elsewhere in this book, such as the large-scale change (LSC) process (Chapter 22) or our code review tools (Chapter 19). Rather than talk about the specifics of the tools, we’ll briefly outline how those tools are useful when managing the deprecation of an obsolete system. 
These tools can be categorized as discovery, migration, and backsliding prevention tooling. 249 | 250 | 许多用于管理“弃用”过程的工具在本书的其他地方进行了深入讨论,例如大规模变更 (LSC) 过程(第 22 章)或我们的代码审查工具(第 19 章)。我们不讨论这些工具的细节,而是简要概述这些工具如何在管理过时系统的“弃用”时发挥作用。这些工具可以归类为发现、迁移和防止倒退工具。 251 | 252 | #### Discovery 发现使用者 253 | 254 | During the early stages of a deprecation process, and in fact during the entire process, it is useful to know how and by whom an obsolete system is being used. Much of the initial work of deprecation is determining who is using the old system, and in which unanticipated ways. Depending on the kinds of use, this process may require revisiting the deprecation decision once new information is learned. We also use these tools throughout the deprecation process to understand how the effort is progressing. 255 | 256 | 在早期阶段,实际上在整个过程中,确认谁在使用及怎样使用我们的弃用项目很有必要。初始工作大多用于确认谁在用,以及以哪些意料之外的方式使用。根据使用方式的不同,在获得新信息后,可能需要重新审视“弃用”的决定。我们还在整个弃用过程中使用这些工具来了解工作进展情况。 257 | 258 | Within Google, we use tools like Code Search (see Chapter 17) and Kythe (see Chapter 23) to statically determine which customers use a given library, and often to sample existing usage to see what sorts of behaviors customers are unexpectedly depending on. Because runtime dependencies generally require some static library or thin client use, this technique yields much of the information needed to start and run a deprecation process. Logging and runtime sampling in production help discover issues with dynamic dependencies. 259 | 260 | 在谷歌内部,我们使用代码搜索(见第 17 章)和 Kythe(见第 23 章)等工具来静态地确定哪些客户使用给定的库,并经常对现有使用情况进行抽样,以了解客户意外依赖了哪些行为。由于运行时依赖项通常需要使用一些静态库或瘦客户端,因此该技术能提供启动和推进“弃用”过程所需的大部分信息。而生产中的日志记录和运行时采样有助于发现动态依赖项的问题。 261 | 262 | Finally, we treat our global test suite as an oracle to determine whether all references to an old symbol have been removed. As discussed in Chapter 11, tests are a mechanism for preventing unwanted behavioral changes to a system as the ecosystem evolves. 
Deprecation is a large part of that evolution, and customers are responsible for having sufficient testing to ensure that the removal of an obsolete system will not harm them. 263 | 264 | 最后,我们将集成测试套件视为预言机,以确定是否已删除对旧变量、函数的所有引用。正如第 11 章所讨论的,测试是一种防止系统随着生态系统发展而发生不必要的行为变化的机制。“弃用”是这种演变的重要组成部分,客户有责任进行足够的测试,以确保删除过时的系统不会对他们造成危害。 265 | 266 | #### Migration 迁移 267 | 268 | Much of the work of doing deprecation efforts at Google is achieved by using the same set of code generation and review tooling we mentioned earlier. The LSC process and tooling are particularly useful in managing the large effort of actually updating the codebase to refer to new libraries or runtime services. 269 | 270 | 在 Google “弃用”的大部分工作是通过使用我们之前提到的同一组代码生成和审查工具来完成的,即LSC工具集。它在代码仓库在引入新库或运行时服务时会很有用。 271 | 272 | #### Preventing backsliding 避免“弃用”项目被重新启用 273 | 274 | Finally, an often overlooked piece of deprecation infrastructure is tooling for preventing the addition of new uses of the very thing being actively removed. Even for advisory deprecations, it is useful to warn users to shy away from a deprecated system in favor of a new one when they are writing new code. Without backsliding prevention, deprecation can become a game of whack-a-mole in which users constantly add new uses of a system with which they are familiar (or find examples of elsewhere in the codebase), and the deprecation team constantly migrates these new uses. This process is both counterproductive and demoralizing. 275 | 276 | 最后,一个经常被忽视的问题是新增功能重新使用了弃用的项目。即使对于建议性“弃用”,警告用户在编写新代码时避免使用已“弃用”的系统而支持新系统也是很有用的。如果没有预防倒退机制,“弃用”可能会变成一场打地鼠游戏。按下葫芦浮起瓢是很影响士气的。 277 | 278 | To prevent deprecation backsliding on a micro level, we use the Tricorder static analysis framework to notify users that they are adding calls into a deprecated system and give them feedback on the appropriate replacement. 
Owners of deprecated systems can add compiler annotations to deprecated symbols (such as the `@Deprecated` Java annotation), and Tricorder surfaces new uses of these symbols at review time. These annotations give control over messaging to the teams that own the deprecated system, while at the same time automatically alerting the change author. In limited cases, the tooling also suggests a push-button fix to migrate to the suggested replacement. 279 | 280 | 为了在微观层面防止倒退,我们使用 Tricorder 静态分析框架来通知用户他们正在新增对已“弃用”系统的调用,并就合适的替代方案给出反馈。已“弃用”系统的所有者可以为已“弃用”的符号添加编译器注解(例如 Java 的 `@Deprecated` 注解),Tricorder 会在代码审查时显示这些符号的新用法。这些注解让拥有已“弃用”系统的团队可以控制提示信息,同时自动提醒变更的作者。在少数情况下,该工具还会建议一键修复,以迁移到推荐的替代方案。 281 | 282 | On a macro level, we use visibility whitelists in our build system to ensure that new dependencies are not introduced to the deprecated system. Automated tooling periodically examines these whitelists and prunes them as dependent systems are migrated away from the obsolete system. 283 | 284 | 在宏观层面上,我们在构建系统中使用可见性白名单来确保不会对已“弃用”的系统引入新的依赖项。自动化工具会定期检查这些白名单,并在依赖系统从过时系统迁移时对其进行删减。 285 | 286 | ## Conclusion 结论 287 | 288 | Deprecation can feel like the dirty work of cleaning up the street after the circus parade has just passed through town, yet these efforts improve the overall software ecosystem by reducing maintenance overhead and the cognitive burden on engineers. Scalably maintaining complex software systems over time is more than just building and running software: we must also be able to remove systems that are obsolete or otherwise unused. 289 | 290 | A complete deprecation process involves successfully managing social and technical challenges through policy and tooling. Deprecating in an organized and well-managed fashion is often overlooked as a source of benefit to an organization, but is essential for its long-term sustainability. 
291 | 292 | “弃用”感觉就像马戏团刚刚穿过城镇后,清理街道的肮脏工作,但它能通过减少维护开销和工程师的认知负担来改善整个软件生态系统。随着时间的推移,复杂系统的维护,不仅仅是包含构建和运行那么简单,还应包含清理过时的老旧系统。 293 | 294 | 完整的“弃用”过程涉及到管理和技术两个层面的挑战。有效地管理“弃用”通常因不会带来盈利而被忽视,但它对其长期可持续性维护却至关重要。 295 | 296 | ## TL;DRs 内容提要 297 | 298 | - Software systems have continuing maintenance costs that should be weighed against the costs of removing them. 299 | - Removing things is often more difficult than building them to begin with because existing users are often using the system beyond its original design. 300 | - Evolving a system in place is usually cheaper than replacing it with a new one, when turndown costs are included. 301 | - It is difficult to honestly evaluate the costs involved in deciding whether to deprecate: aside from the direct maintenance costs involved in keeping the old system around, there are ecosystem costs involved in having multiple similar systems to choose between and that might need to interoperate. The old system might implicitly be a drag on feature development for the new. These ecosystem costs are diffuse and difficult to measure. Deprecation and removal costs are often similarly diffuse. 302 | 303 | - 软件系统具有持续的维护成本,应与删除它们的成本进行权衡。 304 | - 删除东西通常比开始构建它们更困难,因为现有用户经常使用超出其原始设计意图的系统。 305 | - 如果将关闭成本包括在内,就地改进系统通常比更换新系统便宜。 306 | - 很难如实地评估 “弃用”所涉及的成本:除了保留旧系统所涉及的直接维护成本外,还有多个相似系统可供选择 所涉及的生态成本,互有交互。旧系统可能会暗中拖累新系统的功能开发。这些不同的生态所带来的成本则分散且难以衡量。“弃用”成本通常同样分散。 307 | --------------------------------------------------------------------------------