├── LICENSE ├── README.md ├── articles ├── 一篇文章让你了解chaosblade-niaoshuai.pdf └── 混沌工程介绍与实践.md └── slides ├── 使用 ChaosBlade 构建高可用的分布式系统-穹谷.pdf ├── 2019-GIAC-分布式服务架构下混沌工程实践-肖长军.pdf ├── 2021-信通院-混沌工程技术沙龙-金融行业专场.pdf ├── Chaosblade-云原生架构下的混沌工程实践-肖长军.pdf ├── chaosblade_introduction_and_practice_CN.pdf ├── 云原生架构下的混沌工程实践-周洋.pdf ├── 混沌工程落地与实践-肖长军-TOP100.pdf └── 通过混沌工程构建高可用的分布式服务-肖长军.pdf /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 
34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. 
Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# awesome-chaosblade
Awesome materials for ChaosBlade

## Documentation
* [User guide (Chinese)](https://chaosblade-io.gitbook.io/chaosblade-help-zh-cn/)

## Articles
### China
* Introduction to ChaosBlade scenarios such as MySQL: [Juejin article](https://juejin.im/post/5d1cab7ef265da1ba77cc018)
* Introduction to Chaos Engineering and Its Practice: [article](articles/混沌工程介绍与实践.md)
* InfoQ, The Power of Chaos Engineering: Alibaba's Zhou Yang on the Story Behind the Technology: [article](https://www.infoq.cn/article/fQDVS*rh6NWbFcMzk12F)
* ChaosBlade: Exploring and Practicing Chaos Engineering in Cloud-Native Architectures: [article](https://mp.weixin.qq.com/s/Ym7NhhvyUyat4e_uvc8q2w)
* ChaosBlade, a Chaos Engineering Tool for Cloud Native: [article](https://mp.weixin.qq.com/s/sdAcqwqf2bFki4QbvOHuUg)
* Understanding Chaos Engineering Practice in Distributed Service Architectures in One Article: [article](https://mp.weixin.qq.com/s/j00qD2_FBPb_ZqCu76fqZg)
* Guo Xudong, ChaosBlade: Chaos Engineering from Scratch ([Part 1](https://xie.infoq.cn/article/a2a70caf74fe3c314020f178d)) ([Part 2](https://xie.infoq.cn/article/30b66541344905a1f9bac079d)) ([Part 3](https://xie.infoq.cn/article/053151fbbc830d3baa53d33e4)) ([Part 4](https://xie.infoq.cn/article/9f8601e2092242a638813fb29)) ([Part 5](https://xie.infoq.cn/article/ae2e7258a442df625a7787b7f))
* A Comprehensive Summary: Fault Governance and Fault Drill Practice in Alibaba E-commerce: [article](http://aliyundian.com/post/189.html)
* Manbang's Stability Assurance Technology System in Practice: [article](https://mp.weixin.qq.com/s/Va64_U7mB4Hz8QRGxweKkg)
* Chaos Engineering, Whose Value You May Doubt, Has Been Practiced at Alibaba for 9 Years: [article](https://mp.weixin.qq.com/s/jHopgbHmWCuF0JHv7Z7Erg)
* Six Years in the Making: Alibaba Open-Sources the Chaos Engineering Tool ChaosBlade: [article](https://mp.weixin.qq.com/s/QLlCeYq_j0EwVzEMHHTwPg)
* Major New Release of the Chaos Engineering Platform ChaosBlade-Box: [article](https://mp.weixin.qq.com/s/r5KtsG1Qaw0hfz0u0VqfdQ)
* Performance Optimization of ChaosBlade Java Scenarios: Things You May Not Know: [article](https://mp.weixin.qq.com/s/11DhVzwYGGcyXRai7tucNQ)

### International
* ChaosBlade-Box, a New Version of the Chaos Engineering Platform Has Released: [article](https://www.alibabacloud.com/blog/chaosblade-box-a-new-version-of-the-chaos-engineering-platform-has-released_599069)

## Videos & PDFs
* Chaos Engineering Adoption and Practice, Xiao Changjun (TOP100.Beijing): [PDF](slides/混沌工程落地与实践-肖长军-TOP100.pdf)
* Chaosblade: Chaos Engineering Practice in Cloud-Native Architectures, Xiao Changjun (QConf.Shanghai): [PDF](slides/Chaosblade-云原生架构下的混沌工程实践-肖长军.pdf)
* ChaosBlade Introduction and Practice (Yunqi Community): [video](https://yq.aliyun.com/live/989) & [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/slides/chaosblade_introduction_and_practice_CN.pdf)
* Chaos Engineering Practice in Cloud-Native Architectures, Zhou Yang (QCon.2019.Beijing): [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/slides/%E4%BA%91%E5%8E%9F%E7%94%9F%E6%9E%B6%E6%9E%84%E4%B8%8B%E7%9A%84%E6%B7%B7%E6%B2%8C%E5%B7%A5%E7%A8%8B%E5%AE%9E%E8%B7%B5-%E5%91%A8%E6%B4%8B.pdf)
* Building Highly Available Distributed Systems with ChaosBlade: [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/slides/%20%E4%BD%BF%E7%94%A8%20ChaosBlade%20%E6%9E%84%E5%BB%BA%E9%AB%98%E5%8F%AF%E7%94%A8%E7%9A%84%E5%88%86%E5%B8%83%E5%BC%8F%E7%B3%BB%E7%BB%9F-%E7%A9%B9%E8%B0%B7.pdf)
* Chaos Engineering Practice in Distributed Service Architectures, Alibaba, Xiao Changjun (GIAC.2019.Shenzhen): [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/slides/2019-GIAC-%E5%88%86%E5%B8%83%E5%BC%8F%E6%9C%8D%E5%8A%A1%E6%9E%B6%E6%9E%84%E4%B8%8B%E6%B7%B7%E6%B2%8C%E5%B7%A5%E7%A8%8B%E5%AE%9E%E8%B7%B5-%E8%82%96%E9%95%BF%E5%86%9B.pdf)
* Building Highly Available Distributed Services Through Chaos Engineering, Xiao Changjun (DTED.20190720.Shenzhen): [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/slides/%E9%80%9A%E8%BF%87%E6%B7%B7%E6%B2%8C%E5%B7%A5%E7%A8%8B%E6%9E%84%E5%BB%BA%E9%AB%98%E5%8F%AF%E7%94%A8%E7%9A%84%E5%88%86%E5%B8%83%E5%BC%8F%E6%9C%8D%E5%8A%A1-%E8%82%96%E9%95%BF%E5%86%9B.pdf)
* The Past, Present, and Future of Chaos Engineering (20210331): [video](https://www.bilibili.com/video/BV1yi4y1P7E9?spm_id_from=333.999.0.0)
* Trusted Cloud Conference, Chaos Engineering: A Compass for Enterprise Cloud Native (20210726): [video](https://www.bilibili.com/video/BV1Fv411J7aw?spm_id_from=333.999.0.0)
* CAICT Chaos Engineering Technical Salon, Financial Industry Session, 2021 (202112): [PDF](slides/2021-信通院-混沌工程技术沙龙-金融行业专场.pdf)

## Enterprise Practice
* Understand chaosblade in One Article: [PDF](https://github.com/chaosblade-io/awesome-chaosblade/blob/master/articles/%E4%B8%80%E7%AF%87%E6%96%87%E7%AB%A0%E8%AE%A9%E4%BD%A0%E4%BA%86%E8%A7%A3chaosblade-niaoshuai.pdf), author: [niaoshuai](https://github.com/niaoshuai)
* TELD (特来电), Chaos Engineering Practice at TELD: [article](https://www.cnblogs.com/tianqing/p/10499611.html)
* TELD (特来电), Chaos Engineering Practice at TELD: Chaos Event Injection: [article](https://www.cnblogs.com/tianqing/p/10628751.html)
* Kujiale (酷家乐), Chaos Engineering Practice in a Startup: [article](https://mp.weixin.qq.com/s/CG6Ig3BIyzKSRO1a5n5Ilg)
* Hangyin Consumer Finance (杭银消费金融), ChaosBlade: Advanced Java with Dynamic Scripts: [article](https://www.cnblogs.com/emars/p/12221887.html)
* Zhuanzhuan (转转), A Plan for Building an Exception Testing Platform: [article](https://mp.weixin.qq.com/s/ma7htEFwTONh4NU9XQ4uCg)
* 164 Rehearsals of "Failure": [article](https://mp.weixin.qq.com/s/J-HMh_qeqk6md-l39J6gjg)
* A "Surprise Attack" on Alibaba: [article](https://mp.weixin.qq.com/s/Z5wlQ6ac3XZAW_45Rzydew)
* Qunar (去哪儿网), Chaos Engineering Practice Based on ChaosBlade: [article](https://mp.weixin.qq.com/s/b8bTRosBjQ1rtW_oX954Ng)

## Wiki
* [chaosblade documentation](https://github.com/chaosblade-io/chaosblade/wiki)
* [chaosblade-exec-jvm 
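documentation](https://github.com/chaosblade-io/chaosblade-exec-jvm/wiki)
--------------------------------------------------------------------------------
/articles/一篇文章让你了解chaosblade-niaoshuai.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/articles/一篇文章让你了解chaosblade-niaoshuai.pdf
--------------------------------------------------------------------------------
/articles/混沌工程介绍与实践.md:
--------------------------------------------------------------------------------
# Introduction to Chaos Engineering and Its Practice
> Article source: https://github.com/StabilityMan/StabilityGuide
> Author: Xiao Changjun (肖长军, alias 穹谷, [@xcaspar](https://github.com/xcaspar))

In a distributed architecture, dependencies between services grow ever more complex, which makes it hard to assess how the failure of a single service affects the whole system. Request chains are long, and incomplete monitoring and alerting make problems harder to detect and locate. Business and technology also iterate quickly, so continuously guaranteeing stability and high availability is a major challenge. You do not get to choose the moment a failure occurs; the moment chooses you, and all you can do is be prepared for it. Chaos engineering is therefore an important part of building a stable system: it injects faults within a controlled scope or environment to continuously improve the system's stability and availability.
This article explains what chaos engineering is, why it is needed, and the related tools and practice. Additions and corrections are welcome.

## Contents

- [What Is Chaos Engineering](#what-is-chaos-engineering)
- [Why Chaos Engineering Is Needed](#why-chaos-engineering-is-needed)
- [Principles of Chaos Engineering](#principles-of-chaos-engineering)
- [Steps for Carrying Out Chaos Engineering](#steps-for-carrying-out-chaos-engineering)
- [Recommended Tools and Products](#recommended-tools-and-products)
- [Chaos Engineering Case Studies](#chaos-engineering-case-studies)
- [Related Articles and Discussion Groups](#related-articles-and-discussion-groups)
- [Join Us](#join-us)

## What Is Chaos Engineering
The term chaos engineering was put forward in [Principles of Chaos Engineering](https://principlesofchaos.org/), but its prototype dates back to 2010: while Netflix was migrating from physical-machine infrastructure to AWS, its team built a tool that kills EC2 instances to make sure an instance failure would not affect the business. After the community published Principles of Chaos Engineering in 2015, the field developed rapidly.
Chaos engineering is the discipline of experimenting on a distributed system in order to improve its fault tolerance and build confidence in its ability to withstand unpredictable problems in production. Nietzsche's "What does not kill me makes me stronger" captures the antifragile idea behind chaos engineering well.

## Why Chaos Engineering Is Needed
Distributed systems are increasingly complex, and as systems move to the cloud their stability is severely challenged. The importance of chaos engineering is described here from the perspective of four roles.
- For architects, it verifies the fault tolerance of the system architecture, for example systems built with the design-for-failure approach that is advocated today.
- For developers and operations engineers, it improves emergency response efficiency, making fault alerting, localization, and recovery effective and efficient.
- For testers, it fills the gaps left by traditional testing. Earlier testing methods mostly take the user's point of view, whereas chaos engineering tests from the system's point of view and lowers the recurrence rate of faults.
- For product and design teams, observing how the product behaves under chaos events helps improve the customer experience. Chaos engineering is therefore not only for developers and testers; delivering the best customer experience is everyone's goal.

Practicing chaos engineering therefore surfaces production problems early, lets teams build up strength through real drills, improves emergency response efficiency and the customer experience, and gradually produces a highly available, resilient system.

## Principles of Chaos Engineering
![chaos-eng-rules](https://user-images.githubusercontent.com/3992234/63409822-858d2f80-c424-11e9-9aac-58f34a0f5c6d.png)

- First: "Build a hypothesis around steady-state behavior." This has two meanings. One is to define monitoring metrics that directly reflect the business service. Note that these are not system resource metrics such as CPU or memory; they are business metrics that directly measure the quality of service. For example, a call-latency fault lengthens the request response time (RT) and causes the upstream transaction volume to drop, so transaction volume can serve as a monitoring metric. The other meaning is to state, for the moment the fault is triggered, a hypothesis about the system's behavior and the expected change in the monitoring metrics.
- Second: simulate fault scenarios that are real in production or have a theoretical basis, such as latency, timeouts, and exceptions in calls to dependent services.
- Third: run experiments in production where possible. This does not mean experiments must run in production; the point is that the more realistic the environment, the more valuable chaos engineering becomes. However, if you already know that the system cannot tolerate a particular fault scenario, do not run that experiment, so as to avoid financial loss.
- Fourth: only continuous execution continuously lowers the recurrence rate of faults and exposes faults early, so experiments need to run continuously and automatically.
- Finally: a key point of chaos engineering is to control the blast radius, that is, the scope affected by an experiment, to prevent unexpected financial loss. This can be done through environment isolation or through the configuration granularity offered by the fault injection tool.

## Steps for Carrying Out Chaos Engineering
- Draw up a chaos experiment plan
- Define the system's steady-state metrics
- State a hypothesis about the system's fault-tolerant behavior
- Run the chaos experiment
- Check the steady-state metrics
- Record the experiment and recover from it
- Fix the problems found
- Automate and verify continuously

## Recommended Tools and Products
![awesome-chaos-engineering.png](https://user-images.githubusercontent.com/3992234/63409859-9473e200-c424-11e9-89bc-09eff69dd390.jpg)
You can choose a suitable tool by considering the richness of its scenarios, its type, and its ease of use. The awesome-chaos-engineering GitHub project collects a number of open-source chaos engineering tools. In the CNCF Landscape, chaos engineering exists as a separate category that lists several mainstream tools, including ChaosBlade, open-sourced by Alibaba, and the Alibaba Cloud product AHAS.
![cncf-landscape.png](https://user-images.githubusercontent.com/3992234/63409944-b705fb00-c424-11e9-887f-5e057b31536a.jpg)
The rest of this article focuses on ChaosBlade and its practice.

### ChaosBlade

ChaosBlade (Chinese name 混沌之刃, "Blade of Chaos") is a tool for carrying out chaos experiments. It supports a rich set of experiment scenarios, such as applications, containers, and basic resources. It is simple to use and easy to extend, and it follows the chaos experiment model proposed by the community. GitHub: https://github.com/chaosblade-io/chaosblade

#### Features
**Rich scenarios**
ChaosBlade's experiment scenarios cover basic resources, such as full CPU load, high disk I/O, and network latency; application scenarios running on the JVM, such as Dubbo call timeouts and call exceptions, delaying a specified method, making it throw an exception, or making it return a specific value; and container-related experiments, such as killing containers and killing Pods. More scenarios will be added over time.

**Simple to use and easy to understand**
ChaosBlade is run through a CLI with helpful command prompts, so it is quick to get started. Commands follow the fault injection model abstracted from years of fault testing and drills within Alibaba Group. The model is clearly layered and easy to read and understand, which lowers the barrier to practicing chaos engineering.

**Dynamic loading, non-intrusive**
ChaosBlade injects faults dynamically. Running a chaos experiment requires no changes to, or redeployment of, the user's system; it works out of the box.

**Easy to extend with new scenarios**
All ChaosBlade experiment executors follow the same fault injection model, which keeps the scenario model uniform and easy to develop and maintain. The model itself is easy to understand and cheap to learn, so more chaos experiment scenarios can be added quickly on top of it.

#### Usage
Download the latest package from the ChaosBlade Release page and unpack it; it is ready to use. For example, to create a full-CPU-load experiment, run:
```
blade create cpu fullload
```
For details, see the [ChaosBlade Getting Started guide](https://github.com/chaosblade-io/chaosblade/wiki/%E6%96%B0%E6%89%8B%E6%8C%87%E5%8D%97)

Chinese documentation: [help documentation](https://chaosblade-io.gitbook.io/chaosblade-help-zh-cn/)

#### Chaos Experiment Model
![](https://user-images.githubusercontent.com/3992234/63409808-80c87b80-c424-11e9-9fa8-26b52e1fef73.jpg)
The model has four layers, each building on the previous one. It clearly expresses which component the experiment targets, what the scope of the experiment is, which matching rules trigger it, and what experiment is executed. The model is concise and general, independent of language and domain, and easy to implement. Within Alibaba Group, the experiment scenarios for C++, NodeJS, and Dart applications as well as for the container platform are all built on it. The model matters because it makes experiment scenarios easier to describe precisely, to understand, to accumulate, and to discover. Tools built on it are more standardized and concise. For details, see [Introduction to the chaos experiment model](https://github.com/chaosblade-io/chaosblade/wiki/%E6%B7%B7%E6%B2%8C%E5%AE%9E%E9%AA%8C%E6%A8%A1%E5%9E%8B).

## Chaos Engineering Case Studies
![Screen Shot 2019-08-21 at 2.44.42 P](https://user-images.githubusercontent.com/3992234/63409672-35ae6880-c424-11e9-8a93-f4b10bdf6afb.png)
This topology diagram comes from the architecture awareness feature of the Alibaba Cloud product AHAS, which automatically detects the architecture topology and can display process, network, and node data. The distributed service demo has three levels of calls: consumer calls provider, and provider calls base. provider also calls the mk-demo database. The provider and base services each have two instances. Clicking an instance node in the AHAS topology diagram shows its call relationships very clearly. The practice below is based on this demo.

### Case 1: Verifying Monitoring and Alerting
![Screen Shot 2019-08-21 at 2.43.36 P](https://user-images.githubusercontent.com/3992234/63409252-63df7880-c423-11e9-9b39-13e9e5dca075.png)
![Screen Shot 2019-08-21 at 2.43.58 P](https://user-images.githubusercontent.com/3992234/63409276-6e017700-c423-11e9-945d-4312005ba27e.png)
In the first case we verify that the system's monitoring and alerting are effective. Following the steps described earlier, the experiment scenario is database call latency. We first define the monitoring metrics: the number of slow SQL statements and the alert messages. The hypothesis is that the number of slow SQL statements increases and the DingTalk group receives a slow SQL alert. Then we run the experiment directly with the ChaosBlade tool. As shown in the lower left corner of the screenshot, we inject a fault into demo-provider's MySQL queries: if the database is demo and the table name is d_discount, 50% of the query operations are delayed by 600 milliseconds. We use the Alibaba Cloud product ARMS for monitoring and alerting. Soon after the experiment ran, the DingTalk group received the alert, so the result matches the metrics defined beforehand. Note that meeting expectations this time does not guarantee the same in the future, which is why continuous verification through chaos engineering is needed. When slow SQL appears, ARMS [tracing](https://help.aliyun.com/document_detail/63796.html) can be used to investigate and pinpoint which statement is slow.

### Case 2: Verifying Isolation of an Abnormal Instance
![Screen Shot 2019-08-21 at 2.44.07 P](https://user-images.githubusercontent.com/3992234/63409297-778adf00-c423-11e9-9179-d991eab7b6db.png)
The previous case met expectations; now let us look at one that did not. This case verifies the system's ability to isolate an abnormal instance. In the demo, consumer calls the provider service, which has two instances. We inject a latency fault into one of them. The monitoring metric is the consumer's QPS, whose steady state is around 510. Our fault-tolerance hypothesis is that the system automatically isolates or takes offline the faulty service instance to prevent requests from being routed to it, so QPS dips briefly but recovers quickly. We ran this case on the Alibaba Cloud AHAS chaos experiment platform, which makes running chaos experiments convenient, and injected the latency fault into demo-provider-1. After the experiment started, QPS dropped to around 40 and did not recover automatically for a long time, so the result did not meet expectations. We then took the abnormal instance offline manually, and the consumer's QPS quickly returned to normal. Chaos engineering thus revealed a problem in the system. What remains is to record the problem, push for a fix, and keep verifying afterwards.

## Related Articles and Discussion Groups
- ChaosBlade DingTalk discussion group number: 23177705
- Related materials: [awesome-chaosblade project](https://github.com/chaosblade-io/awesome-chaosblade)

Future sharing and discussion will take place in the DingTalk group above; you are welcome to join. We also send souvenirs to ChaosBlade community contributors from time to time. You can join the ChaosBlade community by starring the project, opening issues, submitting PRs, and so on.

## Join Us

Stability Above All (稳定大于一切): building a knowledge base for the stability field in China, **so that there are a few fewer unsolvable problems and a little more certainty in the world**.

* [GitHub](https://github.com/StabilityMan/StabilityGuide)
* DingTalk group number: 23179349
* If you found this article useful, please share it with friends. We look forward to more people joining!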
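
As a closing illustration of the second case study, the steady-state hypothesis (QPS dips briefly, then recovers) can be written down as a small check. This is a minimal Python sketch and not part of ChaosBlade or AHAS: the `SteadyStateHypothesis` class, its parameters, and both QPS sample series are hypothetical and chosen only for illustration. Only the steady-state value of about 510 and the drop to about 40 come from the case study.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class SteadyStateHypothesis:
    """Hypothetical helper describing the expected behaviour of a business metric."""

    baseline: float        # steady-state value, e.g. consumer QPS of about 510
    tolerance: float       # allowed relative deviation after recovery (0.1 = 10%)
    recovery_samples: int  # number of samples the system is allowed to take to recover

    def within_steady_state(self, value: float) -> bool:
        """True if the value is within the tolerated band around the baseline."""
        return abs(value - self.baseline) <= self.tolerance * self.baseline

    def holds(self, samples: Sequence[float]) -> bool:
        """True if every sample after the recovery window is back in the steady state."""
        after_recovery = samples[self.recovery_samples:]
        return bool(after_recovery) and all(
            self.within_steady_state(v) for v in after_recovery
        )


hypothesis = SteadyStateHypothesis(baseline=510, tolerance=0.1, recovery_samples=3)

# Illustrative series for the hypothesised behaviour: a short dip, then recovery.
recovered = [510, 120, 300, 495, 505, 512]
# Illustrative series for the observed behaviour: QPS falls to about 40 and stays there.
stuck = [510, 40, 42, 41, 40, 39]

print(hypothesis.holds(recovered))  # True
print(hypothesis.holds(stuck))      # False
```

Checking only the samples after a recovery window mirrors the hypothesis in the text: a short dip is acceptable, while a metric that stays low means the experiment has found a problem to record and fix.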
-------------------------------------------------------------------------------- /slides/ 使用 ChaosBlade 构建高可用的分布式系统-穹谷.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/ 使用 ChaosBlade 构建高可用的分布式系统-穹谷.pdf -------------------------------------------------------------------------------- /slides/2019-GIAC-分布式服务架构下混沌工程实践-肖长军.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/2019-GIAC-分布式服务架构下混沌工程实践-肖长军.pdf -------------------------------------------------------------------------------- /slides/2021-信通院-混沌工程技术沙龙-金融行业专场.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/2021-信通院-混沌工程技术沙龙-金融行业专场.pdf -------------------------------------------------------------------------------- /slides/Chaosblade-云原生架构下的混沌工程实践-肖长军.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/Chaosblade-云原生架构下的混沌工程实践-肖长军.pdf -------------------------------------------------------------------------------- /slides/chaosblade_introduction_and_practice_CN.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/chaosblade_introduction_and_practice_CN.pdf -------------------------------------------------------------------------------- /slides/云原生架构下的混沌工程实践-周洋.pdf: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/云原生架构下的混沌工程实践-周洋.pdf -------------------------------------------------------------------------------- /slides/混沌工程落地与实践-肖长军-TOP100.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/混沌工程落地与实践-肖长军-TOP100.pdf -------------------------------------------------------------------------------- /slides/通过混沌工程构建高可用的分布式服务-肖长军.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/chaosblade-io/awesome-chaosblade/b293cb4ee141b2b8d120353b54f1574962552f1e/slides/通过混沌工程构建高可用的分布式服务-肖长军.pdf --------------------------------------------------------------------------------