├── LICENSE
├── README-cn.md
├── README.md
├── _config.yml
└── images
    ├── STAM-no-client-instrumentation.png
    ├── STAM-no-server-instrumentation.png
    ├── STAM-span-analysis.png
    ├── STAM-topo-in-apache-skywalking.png
    ├── STAM-uninstrumentation-proxy.png
    └── dapper-span.png
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README-cn.md:
--------------------------------------------------------------------------------
1 | STAM: A Topology Auto-Detection Method for Large-Scale Distributed Application Systems
2 |
3 | [](https://wu-sheng.github.io/STAM/)
4 |
5 | - Sheng Wu 吴 晟
6 | - wusheng@apache.org
7 | - Translation: BZFYS (327568824@qq.com), Yanlong He 何延龙 (heyanlong@apache.org)
8 |
9 |
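10 | # Abstract
11 | Monitoring, visualizing, and troubleshooting a large-scale distributed system is a major challenge. A common tool today is the distributed tracing system (e.g., Google Dapper) [1], which detects topology and metrics from tracing data. A major limitation of today's topology detection is that service dependencies can only be inferred from the tracing data of a given time window. This causes more latency and memory use, because in a highly distributed system the client-side and server-side tracing data of every RPC must be matched among millions of randomly occurring requests. More importantly, matching may fail if the duration of an RPC between client and server is longer than the configured time window, or if the RPC spans two windows.
12 |
13 | In this paper, we present STAM (Streaming Topology Analysis Method). In STAM, we can use auto-instrumentation or manual instrumentation to intercept and manipulate RPCs at both the client side and the server side. With auto-instrumentation, STAM manipulates application code at runtime, for example through a Java agent. As a result, the monitoring system requires no source code changes from the application development team or the RPC framework development team. STAM injects the RPC network address used at the client side, the service name, and the service instance name into the RPC context, and binds the server-side service name and service instance name as aliases of the network address used by the client. With dependency analysis freed from the mechanisms that cause blocking and delay, the analysis core can process monitoring data in stream mode and generate an accurate topology.
14 |
15 | STAM has been implemented in Apache SkyWalking [2], an open source APM (application performance monitoring system) project of the Apache Software Foundation. SkyWalking is widely used by many large enterprises [3], including Alibaba, Huawei, Tencent, Didi, Xiaomi, China Mobile, and others (airlines, financial institutions, and so on), to support their large-scale distributed systems in production. STAM scales out better horizontally and significantly reduces load and memory cost.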
16 |
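17 | # Introduction
18 | Monitoring a highly distributed system, especially one built on a microservice architecture, is very complex. Many RPCs, including HTTP, gRPC, MQ, cache, and database accesses, sit behind a single client request. Letting the IT team understand the dependencies among thousands of services is a key feature of, and the first step toward, observability of the whole distributed system. A distributed tracing system can collect traces that include all distributed request paths, so the dependency relationships are logically contained in the trace data. Distributed tracing systems such as Zipkin [4] or Jaeger Tracing [10] provide built-in dependency analysis, and many analysis features build on it. This kind of analysis has at least two limitations: timeliness and consistent accuracy.
19 |
20 | Dependency analysis at the service level and the service instance level requires strong timeliness, because the dependency relationships of a distributed application system are mutable.
21 |
22 | A service is a logical group of instances that share the same functions or code.
23 |
24 | A service instance is usually an OS-level process, such as a JVM process. The relationships between services and instances are mutable, depending on configuration, code, and network status. Dependencies may change over time.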
25 |
26 |
27 |
28 |
29 | Figure 1. Spans generated in a traditional Dapper-based tracing system.
30 |
31 |
32 |
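33 | The span model in the Dapper paper and in existing tracing systems (such as the Zipkin instrumenting mode [9]) propagates only the span id to the server side. Because of this model, dependency analysis requires a certain time window. Relationship analysis depends on tracing data collected from both the client and the server, so the analysis process must wait for the client and server spans to match in the same time window before it can output the result that Service A depends on Service B. The RPC duration must therefore fit within the time window; otherwise, the relationship cannot be derived. In production, the window sometimes has to be set to 3-5 minutes, which prevents topology analysis from reacting within seconds. In addition, because of the window-based design, consistent accuracy is hard to achieve if one side involves a long-running task. To keep the analysis as fast as possible, the analysis period is kept under 5 minutes, but some spans cannot be matched with their parents or children if the analysis is incomplete or crosses two time windows. Even if we add a mechanism to process the spans left over from the previous stage, some still have to be abandoned to keep the dataset size and memory usage reasonable.
34 |
35 | In STAM, we introduce new span and context propagation models together with a new analysis method. The new models add the peer network address (IP or hostname) used by the client, along with the client's service instance name and service name, to the context propagation model. These values travel with the RPC call from client to server, just like the trace ID and span ID in existing tracing systems, and are collected in the server-side span. The new analysis method can generate the client-server relationship directly, without waiting for the client span. It also sets the peer network address used by the client as an alias of the server-side service. After the data is synchronized across cluster nodes, client-side span analysis can use this alias metadata to generate the client-server relationship directly as well. By using these new models and methods in Apache SkyWalking, we no longer rely on time-window-based analysis, and we analyze the topology accurately with less than 5 seconds of latency.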
36 |
37 |
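38 | # New Span Model and Context Model
39 |
40 | The traditional span of a tracing system includes the following fields [1] [6] [10].
41 | * A Trace ID, representing the whole trace.
42 | * A Span ID, representing the current span.
43 | * An operation name, describing what operation this span performed.
44 | * A start timestamp.
45 | * A finish timestamp.
46 | * The Service and Service Instance names of the current span.
47 | * A set of zero or more key:value span tags.
48 | * A set of zero or more span logs, each of which is itself a key:value map paired with a timestamp.
49 | * References to zero or more causally related spans. A reference includes the parent span id and the trace id.
50 |
51 | In the new span model of STAM, we add the following fields to the span.
52 |
53 | **Span type**: an enumeration of Exit, Local, and Entry. Entry and Exit spans are used in networking-related libraries. Entry spans represent a server-side networking library, such as Apache Tomcat [7]. Exit spans represent a client-side networking library, such as Apache HttpComponents [8].
54 |
55 | **Peer Network Address**: the remote address, suitable for use in exit and entry spans. In exit spans, the peer network address is the address the client library uses to access the server. These fields are optional in many tracing systems, but STAM requires them in all RPC cases.
56 |
57 | **Context Model**: used to propagate client-side information to the server side, carried by the original RPC call, usually in a header such as an HTTP header or MQ header. In the old design, it carries the Trace ID and Span ID of the client-side span. In STAM, we enhance this model by adding the parent service name, the parent service instance name, and the peer of the exit span. The names can be literal strings. These extra fields help remove the obstacles to streaming analysis. Compared with the existing context model, this uses more bandwidth, but it can be optimized. In Apache SkyWalking, we designed a register mechanism that exchanges unique IDs representing these names. As a result, only 3 integers are added to the RPC context, so the bandwidth increase is less than 1% in production.
58 |
59 | The changes to the two models eliminate the time window from the analysis process. Server-side span analysis gains context awareness.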
60 |
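61 | # New Topology Analysis Method
62 |
63 | The new topology analysis method at the core of STAM processes spans in stream mode. The analysis of a server-side span (also called an entry span) covers the parent service name, the parent service instance name, and the peer of the exit span. The analysis can therefore produce the following results.
64 |
65 | 1. Use the peer of the exit span as an alias of the current service and instance. This creates the `Peer network address <-> service name` and `peer network address <-> Service instance name` aliases. Both are synchronized to all analysis nodes and persisted in storage, so that more analysis processors have this alias information.
66 |
67 | 2. Generate the relationships `parent service name -> current service name` and `parent service instance name -> current service instance name`, unless a different `Peer network address <-> Service Instance Name` mapping already exists. In that case, generate only the relationships `peer network address <-> service name` and `peer network address <-> Service instance name`.
68 |
69 | For the analysis of a client-side span (exit span), there are three possibilities.
70 |
71 | 1. The peer in the exit span already has the aliases established by server-side span analysis in step (1). Use the aliases in place of the peer, and generate traffic for `current service name -> alias service name` and `current service instance name -> alias service instance name`.
72 |
73 | 2. If no alias can be found, simply generate traffic for `current service name -> peer` and `current service instance name -> peer`.
74 |
75 | 3. If multiple aliases can be found for `peer network address <-> Service Instance Name`, keep generating traffic for `current service name -> peer network address` and `current service instance name -> peer network address`.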
76 |
77 |
78 |
79 |
80 | Figure 2. Apache SkyWalking uses STAM to detect and visualize the topology of a distributed system.
81 |
82 |
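83 | # Evaluation
84 |
85 | In this section, we evaluate the new models and analysis method in several typical cases in which the old method loses timeliness and consistent accuracy.
86 |
87 |
88 | - 1. **New Service Online or Auto Scale Out**
89 |
90 | New services may be added to the topology at any time by the development team, or automatically by a container orchestration platform such as Kubernetes [5] following a scale-out policy. In neither case can the monitoring system be notified manually. With STAM, new nodes are detected automatically, and the analysis process stays unblocked and consistent with the detected nodes. In this case, a new service and a new network address (an IP, a port, or both) come into use. Because the `peer network address <-> service` mapping does not exist yet, `client service -> peer network address` traffic is generated and persisted in storage first. Once the mapping has been generated, further client-service-to-server-service traffic can be identified, generated, and aggregated in the analysis platform. To fill the gap left by the small amount of traffic recorded before the mapping was generated, the `peer network address <-> service` mapping is applied again at the query stage, merging `client service->peer network address` into client-service to server-service. In production, the whole SkyWalking analysis platform is deployed on fewer than 100 VMs; synchronization among them completes in less than 10 seconds, and in most cases takes only 3-5 seconds. At the query stage, the data has already been aggregated at minute or at least second granularity. The cost of the query merge does not depend on how much traffic occurred before the mapping was generated; it depends only on the sync duration, here about 3 seconds. Consequently, in a minute-level aggregated topology, it adds only 1 or 2 relationship records to the whole topology relationship dataset. Considering that a topology of more than 100 services has more than 500 relationship records per minute, the extra payload of this query merge is very limited and affordable. This property matters in a large, high-load distributed system, because we do not need to worry about its scaling capability. Some forks choose to update existing `client service->peer network address` records to client-service to server-service once the new peer mapping is detected, in order to remove the extra load at the query stage permanently.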
91 |
92 |
93 |
94 |
95 | Figure 3. Span analysis using the new topology analysis method
96 |
97 |
98 |
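99 | - 2. **Existing Uninstrumented Nodes**
100 |
101 | Every topology detection method has to work in this case. In many cases, a production environment contains nodes that cannot be instrumented. Possible reasons include: (1) Technical restrictions. Applications written in Go or C++ have no easy agent-based auto-instrumentation like that available for Java or .NET, so the code may not be instrumented automatically. (2) Middleware such as MQ or database servers has not adopted the tracing system, and instrumenting such middleware is time-consuming or difficult. (3) A third-party service or cloud service does not support the current tracing system. (4) Lack of resources: for example, the development or operations team lacks the time to install agents or modify code.
102 |
103 | STAM works even when the client side or the server side has no agent, and it still keeps the topology as accurate as possible.
104 |
105 | If the client side is not instrumented, the server-side span receives no reference through the RPC context, so it simply uses the peer to generate traffic, as shown in Figure 4.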
106 |
107 |
108 |
109 |
110 | Figure 4. STAM traffic generation when the client side is not instrumented
111 |
112 |
113 |
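114 | As shown in Figure 5, in the opposite case there is no server-side span, so client span analysis needs no special handling. The STAM analysis core simply keeps generating `client service->peer network address` traffic. Because no mapping is ever generated for the peer network address, nothing is merged.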
115 |
116 |
117 |
118 |
119 | Figure 5. STAM traffic generation when the server side is not instrumented
120 |
121 |
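122 | - 3. **Uninstrumented Node with Header Forwarding Capability**
123 |
124 | Besides the cases evaluated in (2) Existing Uninstrumented Nodes, there is a more complex special case: the node can propagate headers from downstream to upstream, as proxies typically do, for example Envoy [11], Nginx [12], and Spring Cloud Gateway [13]. As a proxy, it forwards all headers from downstream to upstream so that information carried in the headers, including the tracing context, authentication information, browser information, and routing information, remains accessible to the business services behind the proxy; see, for example, the Envoy route configuration [14]. When a proxy cannot be instrumented, for whatever reason, topology detection should not be affected.
125 |
126 | In this case, the proxy address is used at the client side and carried in the RPC context as the peer network address, and the proxy forwards it to different upstream services. STAM can detect this situation and generate the proxy as a conjectural node. In STAM, more than one alias will be generated for this network address. Once these aliases are detected and synchronized to the analysis nodes, the analysis core knows that there is at least one uninstrumented service between the client and the servers. It therefore generates the relationships `client service->peer network address`, `peer->server service B`, and `peer network address->server service C`, as shown in Figure 6.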
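
The proxy inference described above can be sketched in a few lines of Python. This is a simplified batch illustration, not SkyWalking's implementation: the function name `infer_topology` and the tuple layout are hypothetical, and aliases are tracked at the service level only, whereas the text also tracks service instances.

```python
from collections import defaultdict


def infer_topology(entry_contexts):
    """Derive service relationships from server-side (entry) span contexts.

    entry_contexts is a list of (parent_service, peer, current_service)
    tuples, where peer is the address the client used (hypothetical layout).
    """
    # Collect every service that claims a given peer address as its alias.
    aliases = defaultdict(set)
    for _, peer, current in entry_contexts:
        aliases[peer].add(current)

    edges = set()
    for parent, peer, current in entry_contexts:
        if len(aliases[peer]) == 1:
            # The peer address belongs to exactly one service: direct edge.
            edges.add((parent, current))
        else:
            # Several services share the peer address, so an uninstrumented
            # node (for example a proxy) must sit in between.
            edges.add((parent, peer))
            edges.add((peer, current))
    return edges
```

When services B and C both report the same peer address, the address itself becomes the conjectural proxy node between client A and the two servers.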
127 |
128 |
129 |
130 |
131 | Figure 6. STAM traffic generation when the proxy is not instrumented
132 |
133 |
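134 | # Conclusion
135 | This paper presented STAM, which is, to the best of our knowledge, the best topology detection method for distributed tracing systems. It replaces time-window-based topology analysis in tracing-based monitoring systems. It permanently and completely removes the disk and memory costs of time-window-based analysis, and removes the barriers to horizontal scaling. Apache SkyWalking, one implementation of STAM, is widely used to monitor hundreds of applications in production. Some of them generate more than 100 TB of tracing data per day and produce topologies for more than 200 services in real time.
136 |
137 | # Acknowledgments
138 | We thank all contributors of the Apache SkyWalking project for their suggestions on this analysis model, their code contributions implementing STAM, and their feedback from using STAM and SkyWalking in production.
139 |
140 | # License
141 | This paper and STAM are licensed under the Apache License 2.0.
142 |
143 | # References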
144 | 1. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, https://research.google.com/pubs/pub36356.html?spm=5176.100239.blogcont60165.11.OXME9Z
145 | 1. Apache SkyWalking, http://skywalking.apache.org/
146 | 1. Apache Open Users, https://github.com/apache/skywalking/blob/master/docs/powered-by.md
147 | 1. Zipkin, https://zipkin.io/
148 | 1. Kubernetes, Production-Grade Container Orchestration. Automated container deployment, scaling, and management. https://kubernetes.io/
149 | 1. OpenTracing Specification https://github.com/opentracing/specification/blob/master/specification.md
150 | 1. Apache Tomcat, http://tomcat.apache.org/
151 | 1. Apache HttpComponents, https://hc.apache.org/
152 | 1. Zipkin doc, ‘Instrumenting a library’ section, ‘Communicating trace information’ paragraph. https://zipkin.io/pages/instrumenting
153 | 1. Jaeger Tracing, https://jaegertracing.io/
154 | 1. Envoy Proxy, http://envoyproxy.io/
155 | 1. Nginx, http://nginx.org/
156 | 1. Spring Cloud Gateway, https://spring.io/projects/spring-cloud-gateway
157 | 1. Envoy Route Configuration, https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/rds.proto.html?highlight=request_headers_to_
158 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # STAM: Enhancing Topology Auto Detection For A Highly Distributed and Large-Scale Application System
2 |
3 | - Sheng Wu 吴 晟
4 | - wusheng@apache.org
5 |
6 | [](https://wu-sheng.github.io/STAM/README-cn)
7 |
8 | # Abstract
9 | Monitoring, visualizing and troubleshooting a large-scale distributed system is a major challenge. One common tool used today is the distributed tracing system (e.g., Google Dapper)[1], which detects topology and metrics based on the tracing data. One big limitation of today’s topology detection is that the analysis depends on aggregating the client-side and server-side tracing spans in a given time window to generate the dependency of services. This causes more latency and memory use, because the client and server spans of every RPC must be matched among millions of randomly occurring requests in a highly distributed system. More importantly, matching could fail if the duration of an RPC between client and server is longer than the configured time window, or if the RPC spans two windows.
10 |
11 | In this paper, we present STAM, the Streaming Topology Analysis Method. In STAM, we can use auto instrumentation or a manual instrumentation mechanism to intercept and manipulate RPCs at both the client side and the server side. In the case of auto instrumentation, STAM manipulates application code at runtime, for example through a Java agent. As such, this monitoring system doesn’t require any source code changes from the application development team or RPC framework development team. STAM injects the RPC network address used at the client side, a service name and a service instance name into the RPC context, and binds the server-side service name and service instance name as the alias names for this network address used at the client side. With the dependency analysis freed from the mechanisms that cause blocking and delay, the analysis core can process the monitoring data in stream mode and generate an accurate topology.
12 |
13 | The STAM has been implemented in the Apache SkyWalking[2], an open source APM (application performance monitoring system) project of the Apache Software Foundation, which is widely used in many big enterprises[3] including Alibaba, Huawei, Tencent, Didi, Xiaomi, China Mobile and other enterprises (airlines, financial institutions and others) to support their large-scale distributed systems in the production environment. It reduces the load and memory cost significantly, with better horizontal scale capability.
14 |
15 | # Introduction
16 | Monitoring a highly distributed system, especially one with a micro-service architecture, is very complex. Many RPCs, including HTTP, gRPC, MQ, Cache, and Database accesses, are behind a single client-side request. Allowing the IT team to understand the dependency relationships among thousands of services is the key feature and first step for observability of a whole distributed system. A distributed tracing system is capable of collecting traces, including all distributed request paths, so dependency relationships are logically included in the trace data. A distributed tracing system, such as Zipkin [4] or Jaeger Tracing [10], provides built-in dependency analysis features, and many analysis features build on top of that. This kind of analysis has at least two fundamental limitations: timeliness and consistent accuracy.
17 |
18 | Strong timeliness is required to keep up with the mutability of the dependency relationships of a distributed application system, at both the service level and the service instance level.
19 |
20 | A Service is a logical group of instances which have the same functions or code.
21 |
22 | A Service Instance is usually an OS level process, such as a JVM process. The relationships between services and instances are mutable, depending on the configuration, code and network status. The dependency could change over time.
23 |
24 |
25 |
26 |
27 | Figure 1, Generated spans in traditional Dapper based tracing system.
28 |
29 |
30 | The span model in the Dapper paper and in existing tracing systems, such as the Zipkin instrumenting mode [9], propagates only the span id to the server side.
31 | Due to this model, dependency analysis requires a certain time window. The relationship analysis relies on tracing spans collected at both the client side and the server side. The analysis process therefore has to wait for the client and server spans to match in the same time window before it can output the result that Service A depends on Service B. So the time window must be longer than the duration of the RPC request; otherwise, the conclusion is lost. In production, the window sometimes has to be set to 3-5 minutes, so the analysis cannot react to dependency changes within seconds.
32 | Also, because of the window-based design, consistent accuracy is hard to achieve if one side involves a long-running task. To keep the analysis as fast as possible, the analysis period is kept under 5 minutes, but some spans can’t be matched with their parents or children if the analysis is incomplete or crosses two time windows. Even if we add a mechanism to process the spans left over from previous stages, some still have to be abandoned to keep the dataset size and memory usage reasonable.
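
To make the window limitation concrete, here is a minimal Python sketch of a window-based matcher. It is an illustration only: the window length, the function names, and the `(span_id, service, report_time)` tuple layout are assumptions, not taken from Dapper, Zipkin, or SkyWalking.

```python
from collections import defaultdict

WINDOW_SECONDS = 300  # assumed 5-minute analysis window


def window_of(timestamp):
    """Return the index of the time window a span report falls into."""
    return int(timestamp // WINDOW_SECONDS)


def match_in_windows(client_spans, server_spans):
    """Join client and server spans by span id, one window at a time.

    Each span is a (span_id, service, report_time) tuple. A dependency is
    emitted only when both sides are reported in the same window.
    """
    clients = defaultdict(dict)
    for span_id, service, reported_at in client_spans:
        clients[window_of(reported_at)][span_id] = service

    dependencies, unmatched = [], []
    for span_id, service, reported_at in server_spans:
        client_service = clients[window_of(reported_at)].get(span_id)
        if client_service is None:
            unmatched.append(span_id)  # the client span is in another window
        else:
            dependencies.append((client_service, service))
    return dependencies, unmatched
```

With a server span reported at second 295 and its client span at second 305, the two land in different windows and the dependency is lost, even though the RPC itself was short.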
33 |
34 | In STAM, we introduce new span and context propagation models, together with a new analysis method. These new models add the peer network address (IP or hostname) used at the client side, the client service instance name and the client service name to the context propagation model. These values travel with the RPC call from client to server, just like the trace id and span id in existing tracing systems, and are collected in the server-side span. The new analysis method can generate the client-server relationship directly, without waiting for the client span. It also sets the peer network address as an alias of the server service. After the data is synchronized across cluster nodes, the client-side span analysis can use this alias metadata to generate the client-server relationship directly too. By using these new models and this method in Apache SkyWalking, we remove the time-window-based analysis permanently and fully use the streaming analysis mode, with less than 5 seconds of latency and consistent accuracy.
35 |
36 | # New Span Model and Context Model
37 | The traditional span of a tracing system includes the following fields [1][6][10].
38 | - A trace id to represent the whole trace.
39 | - A span id to represent the current span.
40 | - An operation name to describe what operation this span did.
41 | - A start timestamp.
42 | - A finish timestamp.
43 | - Service and Service Instance names of current span.
44 | - A set of zero or more key:value Span Tags.
45 | - A set of zero or more Span Logs, each of which is itself a key:value map paired with a timestamp.
46 | - References to zero or more causally related Spans. Reference includes the parent span id and trace id.
47 |
48 | In the new span model of STAM we add the following fields in the span.
49 |
50 | **Span type**. Enumeration, including exit, local and entry. Entry and Exit spans are used in a networking related library. Entry spans represent a server-side networking library, such as Apache Tomcat[7]. Exit spans represent the client-side networking library, such as Apache HttpComponents [8].
51 |
52 | **Peer Network Address**. The remote address, suitable for use in exit and entry spans. In Exit spans, the peer network address is the address used by the client library to access the server.
53 |
54 | These fields are optional in many tracing systems, but in STAM we require them in all RPC cases.
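
As an illustration, the extended span model could be written as the following Python data class. This is a sketch under assumptions: the field names are hypothetical, span logs and references are omitted for brevity, and the peer check is applied to exit spans only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class SpanType(Enum):
    ENTRY = "entry"  # server-side networking library
    LOCAL = "local"  # in-process work, no networking involved
    EXIT = "exit"    # client-side networking library


@dataclass
class Span:
    # Traditional fields.
    trace_id: str
    span_id: str
    operation_name: str
    start_time: int
    end_time: int
    service: str
    service_instance: str
    # Fields added by STAM.
    span_type: SpanType
    peer: Optional[str] = None  # peer network address, e.g. "10.0.0.2:8080"
    tags: Dict[str, str] = field(default_factory=dict)

    def __post_init__(self):
        # In this sketch an exit span without a peer address is rejected.
        if self.span_type is SpanType.EXIT and not self.peer:
            raise ValueError("an exit span needs a peer network address")
```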
55 |
56 | **Context Model** is used to propagate the client-side information to the server side, carried by the original RPC call, usually in a header such as an HTTP header or MQ header. In the old design, it carries the trace id and span id of the client-side span. In STAM, we enhance this model by adding the parent service name, the parent service instance name and the peer of the exit span. The names could be literal strings. All these extra fields help to remove the obstacles to streaming analysis. Compared to the existing context model, this uses a little more bandwidth, but it can be optimized. In Apache SkyWalking, we designed a register mechanism to exchange unique IDs that represent these names. As a result, only 3 integers are added to the RPC context, so the increase in bandwidth is less than 1% in the production environment.
57 |
58 | The changes to the two models eliminate the time windows from the analysis process. Server-side span analysis gains context awareness.
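
The context model can be sketched as a small carrier that the client injects into the RPC headers and the server extracts. The header name `x-stam-context` and the wire format (Base64 fields joined by `-`) are assumptions made for this example; they are not the header format used by Apache SkyWalking, and the sketch uses literal strings rather than registered integer IDs.

```python
import base64
from dataclasses import astuple, dataclass

HEADER_NAME = "x-stam-context"  # hypothetical header name


@dataclass(frozen=True)
class ContextCarrier:
    trace_id: str
    parent_span_id: str
    parent_service: str           # added by STAM
    parent_service_instance: str  # added by STAM
    peer: str                     # added by STAM: address the client used

    def inject(self, headers):
        """Client side: write the context into the outgoing RPC headers."""
        parts = (base64.b64encode(v.encode()).decode() for v in astuple(self))
        headers[HEADER_NAME] = "-".join(parts)

    @classmethod
    def extract(cls, headers):
        """Server side: read the context, or None if the client sent none."""
        raw = headers.get(HEADER_NAME)
        if raw is None:
            return None  # the client side is not instrumented
        values = [base64.b64decode(p).decode() for p in raw.split("-")]
        if len(values) != 5:
            return None
        return cls(*values)
```

Standard Base64 never emits `-`, so it is safe as a field separator here. A server that finds no header falls back to the uninstrumented-client behavior.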
59 |
60 | # New Topology Analysis Method
61 | The new topology analysis method at the core of STAM processes spans in stream mode.
62 | The analysis of the server-side span, also named the entry span, covers the parent service name, the parent service instance name and the peer of the exit span. So the analysis process can establish the following results.
63 | 1. Use the peer of the exit span, that is, the address the client used, as an alias name of the current service and instance. `Peer network address <-> service name` and `peer network address <-> Service instance name` aliases are created. These two are synced to all analysis nodes and persisted in the storage, allowing more analysis processors to have this alias information.
64 | 2. Generate relationships of `parent service name -> current service name` and `parent service instance name -> current service instance name`, unless there is another different `Peer network address <-> Service Instance Name` mapping found. In that case, only generate relationships of `peer network address <-> service name` and `peer network address <-> Service instance name`.
65 |
66 | For analysis of the client-side span (exit span), there are three possibilities.
67 | 1. The peer in the exit span already has the alias names established by server-side span analysis from step (1). Then use alias names to replace the peer, and generate traffic of `current service name -> alias service name` and `current service instance name -> alias service instance name`.
68 | 2. If the alias could not be found, then just simply generate traffic for `current service name -> peer` and `current service instance name -> peer`.
69 | 3. If multiple alias names of `peer network address <-> Service Instance Name` could be found, then keep generating traffic for `current service name -> peer network address` and `current service instance name -> peer network address`.
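
The entry-span and exit-span rules above can be sketched as a small in-memory analyzer. This is a single-node illustration, not SkyWalking's code: the class and method names are hypothetical, alias synchronization across analysis nodes and persistence are left out, and the multi-alias branch of entry-span analysis is read as directed traffic from the peer to the current service.

```python
from collections import defaultdict


class StamAnalyzer:
    """Analyzes each span on arrival, with no time window."""

    def __init__(self):
        # peer network address -> set of (service, instance) aliases
        self.aliases = defaultdict(set)
        self.service_traffic = set()   # (source, target) at service level
        self.instance_traffic = set()  # (source, target) at instance level

    def analyze_entry_span(self, service, instance,
                           parent_service, parent_instance, peer):
        """Server side: the parent fields and peer come from the RPC context."""
        # Result 1: bind the address the client used as an alias.
        self.aliases[peer].add((service, instance))
        if len(self.aliases[peer]) == 1:
            # Result 2: the peer maps to exactly this instance.
            self.service_traffic.add((parent_service, service))
            self.instance_traffic.add((parent_instance, instance))
        else:
            # A different instance already claimed this peer, so only
            # peer-level relationships are generated.
            self.service_traffic.add((peer, service))
            self.instance_traffic.add((peer, instance))

    def analyze_exit_span(self, service, instance, peer):
        """Client side: resolve the peer through the alias table if possible."""
        known = self.aliases.get(peer, set())
        if len(known) == 1:
            # Possibility 1: exactly one alias, replace the peer with it.
            alias_service, alias_instance = next(iter(known))
            self.service_traffic.add((service, alias_service))
            self.instance_traffic.add((instance, alias_instance))
        else:
            # Possibility 2 (no alias) or 3 (multiple aliases): keep the peer.
            self.service_traffic.add((service, peer))
            self.instance_traffic.add((instance, peer))
```

An exit span that arrives before any entry span yields `A -> peer`; after the server's entry span registers the alias, later exit spans yield `A -> B` directly, without waiting for a matching span.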
70 |
71 |
72 |
73 |
74 | Figure 2, Apache SkyWalking uses STAM to detect and visualize the topology of distributed systems.
75 |
76 |
77 | # Evaluation
78 | In this section, we evaluate the new models and analysis method in the context of several typical cases in which the old method loses timeliness and consistent accuracy.
79 |
80 | - 1.**New Service Online or Auto Scale Out**
81 |
82 | New services can be added to the topology at any time, either by a development team or automatically by a container orchestration platform such as Kubernetes [5] applying a scale-out policy. In neither case can the monitoring system rely on being notified manually. By using STAM, we can detect the new node automatically while keeping the analysis process unblocked and consistent with the detected nodes.
83 | In this case, a new service and a new network address (IP, port, or both) come into use. Because the `peer network address <-> service` mapping does not exist yet, traffic of `client service -> peer network address` is generated and persisted in storage first. Once the mapping has been generated, further traffic of `client service -> server service` can be identified, generated, and aggregated in the analysis platform. To fill the gap left by the small amount of traffic recorded before the mapping was generated, the `peer network address <-> service` mapping translation has to be applied again at the query stage, merging `client service -> peer network address` into `client service -> server service`. In production, a whole SkyWalking analysis platform deployment uses fewer than 100 VMs; syncing among them finishes in less than 10 seconds, and in most cases takes only 3-5 seconds. At the query stage, the data has already been aggregated at least to the second or minute level. The cost of the query merge therefore does not depend on how much traffic occurred before the mapping was generated; it depends only on the sync duration, here about 3 seconds. As a result, in a minute-level aggregated topology, the merge adds only 1 or 2 relationship records to the whole topology relationship dataset. Considering that a topology of over 100 services has over 500 relationship records per minute, the extra payload of this query merge is very limited and affordable. This feature is significant in a large and high-load distributed system, as we don’t need to be concerned about its scaling capability. Some fork versions instead update the existing `client service -> peer network address` records to `client service -> server service` once the new mapping for the peer is detected, in order to remove the extra load at the query stage permanently.
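
The query-stage merge described above can be pictured as a re-translation of relationship targets followed by a re-aggregation. The following is a minimal sketch under assumptions: `merge_at_query` is a hypothetical name, each aggregated record is assumed to be a `(source, target, calls)` tuple, and `peer_to_service` is assumed to hold only unambiguous `peer network address <-> service` mappings.

```python
from collections import Counter


def merge_at_query(records, peer_to_service):
    """Merge `client -> peer address` records into `client -> server` records.

    `records` is an iterable of (source, target, calls) tuples that were
    already aggregated per time bucket; `peer_to_service` maps a peer
    network address to the service name it is an alias of.
    """
    merged = Counter()
    for source, target, calls in records:
        # Translate the target if a mapping is known now, even though it
        # was not known when the record was written.
        resolved = peer_to_service.get(target, target)
        merged[(source, resolved)] += calls
    return dict(merged)
```

Because the input is already aggregated, the loop handles one extra record per relationship that was written before the mapping was synced, regardless of how many calls that record represents.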
84 |
85 |
86 |
87 |
88 | Figure 3, Span analysis by using the new topology analysis method
89 |
90 |
91 | - 2.**Existing Uninstrumented Nodes**
92 |
93 | Every topology detection method has to work in this case. In many cases, there are nodes in the production environment that can’t be instrumented. Causes for this might include: (1) Restriction of the technology. Applications written in Go or C++ have no easy way to be auto-instrumented by an agent, as applications in Java or .NET do. So, the code may not be instrumented automatically. (2) The middleware, such as an MQ or database server, has not adopted the tracing system, which makes it difficult or time consuming to implement the middleware instrumentation. (3) A 3rd party service or cloud service doesn’t support working with the current tracing system. (4) Lack of resources: e.g., the developer or operation team lacks time to make the instrumentation ready.
94 |
95 | The STAM works well even if the client or server side has no instrumentation. It still keeps the topology as accurate as possible.
96 |
97 | If the client side isn’t instrumented, the server-side span won’t get any reference through the RPC context, so it simply uses the peer to generate traffic, as shown in Figure 4.
98 |
99 |
100 |
101 |
102 | Figure 4, STAM traffic generation when no client-side instrumentation
103 |
104 |
105 | In the other case, with no server-side instrumentation, the client span analysis needs no special handling, as shown in Figure 5. The STAM analysis core simply keeps generating `client service -> peer network address` traffic. As no mapping is ever generated for the peer network address, there is no merging.
106 |
107 |
108 |
109 |
110 | Figure 5, STAM traffic generation when no server-side instrumentation
111 |
112 |
113 | - 3.**Uninstrumented Node Having Header Forward Capability**
114 |
115 | Besides the cases we evaluated in (2) Existing Uninstrumented Nodes, there is one complex and special case: the uninstrumented node has the capability to propagate headers from downstream to upstream. This is typical of proxies, such as Envoy [11], Nginx [12], and Spring Cloud Gateway [13]. A proxy can forward all headers from downstream to upstream so that information in the headers, including the tracing context, authentication, browser information, and routing information, stays accessible to the business services behind the proxy, as in the Envoy route configuration [14]. When a proxy can’t be instrumented, whatever the reason, topology detection should not be affected.
116 |
117 | In this case, the proxy address is used at the client side and propagated through the RPC context as the peer network address, and the proxy forwards it to different upstream services. STAM can then detect this case and generate the proxy as a conjectural node. In STAM, more than one alias name is generated for this network address. After those aliases are detected and synchronized to the analysis nodes, the analysis core knows that there is at least one uninstrumented service standing between the client and the servers. So, it generates the relationships `client service -> peer network address`, `peer network address -> server service B`, and `peer network address -> server service C`, as shown in Figure 6.
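
A service-level sketch of this proxy case is given below. It is a simplification rather than the streaming analysis itself: `build_topology` is a hypothetical name, spans are reduced to tuples, and all aliases are collected before any relationship is generated, which corresponds to the state after the aliases have been synchronized.

```python
def build_topology(exit_spans, entry_spans):
    """Return the set of service-level relationships as (source, target).

    `exit_spans` holds (client service, peer address) tuples and
    `entry_spans` holds (peer address, parent service, server service)
    tuples, where the peer address arrived through the RPC context.
    """
    # Collect every server service that registered itself under a peer.
    aliases = {}
    for peer, _parent, server in entry_spans:
        aliases.setdefault(peer, set()).add(server)

    edges = set()
    for peer, parent, server in entry_spans:
        if len(aliases[peer]) > 1:
            # Several aliases: the peer is a conjectural node (the proxy).
            edges.add((peer, server))
        else:
            edges.add((parent, server))
    for client, peer in exit_spans:
        targets = aliases.get(peer, set())
        if len(targets) == 1:
            edges.add((client, next(iter(targets))))
        else:
            # No alias or several aliases: keep the peer address as target.
            edges.add((client, peer))
    return edges
```

With one client calling a proxy address that forwards to services B and C, this sketch produces the same three relationships listed above, with the proxy address acting as the conjectural node.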
118 |
119 |
120 |
121 |
122 | Figure 6, STAM traffic generation when the proxy is uninstrumented
123 |
124 |
125 | # Conclusion
126 |
127 | This paper described STAM, which is, to the best of our knowledge, the best topology detection method for distributed tracing systems. It replaces the time-window based topology analysis method for tracing-based monitoring systems. It permanently and completely removes the disk and memory cost of time-window based analysis, as well as the barriers to horizontal scaling. One STAM implementation, Apache SkyWalking, is widely used for monitoring hundreds of applications in production. Some of these deployments generate over 100 TB of tracing data per day and a real-time topology of over 200 services.
128 |
129 | # Acknowledgments
130 | We thank all contributors of the Apache SkyWalking project for their suggestions, their code contributions to implement STAM, and their feedback from using STAM and SkyWalking in their production environments.
131 |
132 | # License
133 | This paper and the STAM are licensed under the [Apache 2.0](LICENSE) license.
134 |
135 | # References
136 |
137 | 1. Dapper, a Large-Scale Distributed Systems Tracing Infrastructure, https://research.google.com/pubs/pub36356.html?spm=5176.100239.blogcont60165.11.OXME9Z
138 | 1. Apache SkyWalking, http://skywalking.apache.org/
139 | 1. Apache Open Users, https://github.com/apache/skywalking/blob/master/docs/powered-by.md
140 | 1. Zipkin, https://zipkin.io/
141 | 1. Kubernetes, Production-Grade Container Orchestration. Automated container deployment, scaling, and management. https://kubernetes.io/
142 | 1. OpenTracing Specification, https://github.com/opentracing/specification/blob/master/specification.md
143 | 1. Apache Tomcat, http://tomcat.apache.org/
144 | 1. Apache HttpComponents, https://hc.apache.org/
145 | 1. Zipkin doc, ‘Instrumenting a library’ section, ‘Communicating trace information’ paragraph. https://zipkin.io/pages/instrumenting
146 | 1. Jaeger Tracing, https://jaegertracing.io/
147 | 1. Envoy Proxy, http://envoyproxy.io/
148 | 1. Nginx, http://nginx.org/
149 | 1. Spring Cloud Gateway, https://spring.io/projects/spring-cloud-gateway
150 | 1. Envoy Route Configuration, https://www.envoyproxy.io/docs/envoy/latest/api-v2/api/v2/rds.proto.html?highlight=request_headers_to_
151 |
--------------------------------------------------------------------------------
/_config.yml:
--------------------------------------------------------------------------------
1 | theme: jekyll-theme-hacker
--------------------------------------------------------------------------------
/images/STAM-no-client-instrumentation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/STAM-no-client-instrumentation.png
--------------------------------------------------------------------------------
/images/STAM-no-server-instrumentation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/STAM-no-server-instrumentation.png
--------------------------------------------------------------------------------
/images/STAM-span-analysis.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/STAM-span-analysis.png
--------------------------------------------------------------------------------
/images/STAM-topo-in-apache-skywalking.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/STAM-topo-in-apache-skywalking.png
--------------------------------------------------------------------------------
/images/STAM-uninstrumentation-proxy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/STAM-uninstrumentation-proxy.png
--------------------------------------------------------------------------------
/images/dapper-span.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/wu-sheng/STAM/24529f25b35d314ca6005b37c7988da2444e8a46/images/dapper-span.png
--------------------------------------------------------------------------------