├── .gitignore
├── .gitmodules
├── LICENSE
├── README.md
├── dist
│   ├── arthas-control-plugin-9.1.0-SNAPSHOT.jar
│   ├── arthas-controller-9.7.0-SNAPSHOT.jar
│   └── skywalking-webapp.jar
└── docs
    ├── img
    │   ├── connect-sequence.png
    │   ├── disconnect-sequence.png
    │   └── skywalking-x-arthas-ui.png
    └── skywalking-x-arthas.md
/.gitignore:
--------------------------------------------------------------------------------
1 | /build/
2 | target/
3 | .idea/
4 | *.iml
5 | .classpath
6 | .project
7 | .settings/
8 | .DS_Store
9 |
--------------------------------------------------------------------------------
/.gitmodules:
--------------------------------------------------------------------------------
1 | [submodule "skywalking"]
2 | path = skywalking
3 | url = https://github.com/weixiang1862/skywalking
4 | [submodule "skywalking-java"]
5 | path = skywalking-java
6 | url = https://github.com/weixiang1862/skywalking-java
7 | [submodule "skywalking-booster-ui"]
8 | path = skywalking-booster-ui
9 | url = https://github.com/weixiang1862/skywalking-booster-ui
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # skywalking-x-arthas
2 | Start an Arthas diagnostic session from [skywalking-ui](https://github.com/apache/skywalking-booster-ui), without having to ask your boss for server access.
3 |
4 | Package Arthas into [skywalking-agent](https://github.com/apache/skywalking-agent), so there is no need to copy an installation package to the VM or container when troubleshooting.
5 |
6 | 
7 | ## Quick Start
8 | ### 1. git clone skywalking-x-arthas
9 | ```
10 | git clone https://github.com/weixiang1862/skywalking-x-arthas
11 | ```
12 | ### 2. copy arthas-control-plugin to your agent `plugins` folder
13 | ```
14 | cd ${path_to_skywalking-x-arthas}
15 | cp dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar ${your_sw_agent_home}/plugins/
16 |
17 | cat << 'EOF' >> ${your_sw_agent_home}/config/agent.config
18 | # arthas home, default is {$AGENT_HOME}/arthas
19 | plugin.arthas.arthas_home=${SW_ARTHAS_HOME:}
20 | # arthas tunnel server address, (e.g. ws://127.0.0.1:7777/ws)
21 | plugin.arthas.tunnel_server=${SW_ARTHAS_TUNNEL_SERVER:ws://127.0.0.1:7777/ws}
22 | plugin.arthas.session_timeout=${SW_ARTHAS_SESSION_TIMEOUT:}
23 | plugin.arthas.disabled_commands=${SW_ARTHAS_DISABLED_COMMAND:}
24 | EOF
25 | ```
26 | ### 3. copy arthas-controller to your oap-server `oap-libs` folder
27 | ```
28 | cd ${path_to_skywalking-x-arthas}
29 | cp dist/arthas-controller-9.7.0-SNAPSHOT.jar ${your_oap_server_home}/oap-libs/
30 |
31 | cat << EOF >>${your_oap_server_home}/config/application.yml
32 | arthas-controller:
33 |   selector: default
34 |   default:
35 | EOF
36 | ```
37 | ### 4. copy skywalking-webapp.jar to your oap-server `webapp` folder
38 | ```
39 | cd ${path_to_skywalking-x-arthas}
40 | cp dist/skywalking-webapp.jar ${your_oap_server_home}/webapp/
41 | ```
42 | ### 5. download arthas & start an arthas tunnel server
43 | ```
44 | wget https://arthas.aliyun.com/download/latest_version?mirror=aliyun -O arthas.zip
45 | mkdir ${your_sw_agent_home}/arthas
46 | unzip arthas.zip -d ${your_sw_agent_home}/arthas
47 |
48 | wget https://arthas.aliyun.com/download/arthas-tunnel-server/latest_version?mirror=aliyun -O arthas-tunnel-server.jar
49 | java -jar arthas-tunnel-server.jar
50 | ```
51 | ### 6. restart your service and oap-server, then run a few tests
52 |
53 | ## How It Works
54 | ### 1. connect
55 | ```mermaid
56 | sequenceDiagram
57 | participant skywalking-ui
58 | participant skywalking-oap-server
59 | participant your-service-with-sw-agent
60 | participant arthas-tunnel-server
61 | skywalking-ui->>skywalking-oap-server: start instance arthas process
62 | loop CommandCheck
63 | your-service-with-sw-agent->>skywalking-oap-server: should start arthas?
64 | end
65 | your-service-with-sw-agent-->>arthas-tunnel-server: connect when get start command
66 | skywalking-ui-->>arthas-tunnel-server: execute arthas command over websocket
67 | arthas-tunnel-server-->>skywalking-ui: get arthas command result over websocket
68 | ```
69 |
70 | ### 2. disconnect
71 | ```mermaid
72 | sequenceDiagram
73 | participant skywalking-ui
74 | participant skywalking-oap-server
75 | participant your-service-with-sw-agent
76 | participant arthas-tunnel-server
77 | skywalking-ui->>skywalking-oap-server: stop instance arthas process
78 | skywalking-ui-->>arthas-tunnel-server: disconnect with tunnel
79 | loop CommandCheck
80 | your-service-with-sw-agent->>skywalking-oap-server: should stop arthas?
81 | end
82 | your-service-with-sw-agent-->>arthas-tunnel-server: disconnect when get stop command
83 | ```
84 |
85 |
86 |
87 | ## Building From Source
88 | ### 1. build skywalking
89 | Follow the official [SkyWalking build docs](https://skywalking.apache.org/docs/main/next/en/guides/how-to-build/).
90 | ### 2. build skywalking-agent
91 | Follow the official [skywalking-java build docs](https://skywalking.apache.org/docs/skywalking-java/next/en/contribution/compiling/).
92 |
93 | ## Documentation
94 | To learn more, read the [documentation](./docs/skywalking-x-arthas.md).
--------------------------------------------------------------------------------
/dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar
--------------------------------------------------------------------------------
/dist/arthas-controller-9.7.0-SNAPSHOT.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/arthas-controller-9.7.0-SNAPSHOT.jar
--------------------------------------------------------------------------------
/dist/skywalking-webapp.jar:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/skywalking-webapp.jar
--------------------------------------------------------------------------------
/docs/img/connect-sequence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/connect-sequence.png
--------------------------------------------------------------------------------
/docs/img/disconnect-sequence.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/disconnect-sequence.png
--------------------------------------------------------------------------------
/docs/img/skywalking-x-arthas-ui.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/skywalking-x-arthas-ui.png
--------------------------------------------------------------------------------
/docs/skywalking-x-arthas.md:
--------------------------------------------------------------------------------
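1 | ## Background
2 | Arthas is a widely used Java diagnostic tool. Once SkyWalking has flagged a service anomaly, we can use Arthas to analyze and diagnose further and pinpoint the problem quickly.
3 |
4 | In practice, developers usually copy or download the Arthas package onto the VM or container that hosts the service, then attach it to the target Java process to troubleshoot. This inevitably spreads sensitive server operations information,
5 | and when every second of troubleshooting counts, these tedious steps waste a great deal of time.
6 |
7 | The SkyWalking Java Agent starts together with the Java service and periodically reports service and instance information to the OAP Server. We can use the agent's plugin mechanism to build an Arthas control plugin
8 | that manages the Arthas lifecycle, so that Arthas can be started and stopped from a web page. The figure below shows the end result: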
9 |
10 | 
11 |
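12 | To build this, we need the following key pieces:
13 | 1. An agent plugin, arthas-control-plugin, that executes the Arthas start and stop commands
14 | 2. An OAP module, arthas-controller-module, that delivers control commands to the agent's Arthas plugin
15 | 3. A customized skywalking-ui that connects to arthas-tunnel-server, sends Arthas commands, and receives the results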
16 |
17 | The figures below show how these modules interact:
18 |
19 | ### connect
20 | 
21 |
22 | ### disconnect
23 | 
24 |
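25 | All the code covered in this article is published on GitHub at [skywalking-x-arthas](https://github.com/weixiang1862/skywalking-x-arthas); feel free to download it and try it out.
26 | The rest of the article walks through the code and the SkyWalking extension points it relies on.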
27 |
28 | ## agent arthas-control-plugin
29 | First, create an arthas-control-plugin module under skywalking-java/apm-sniffer/apm-sdk-plugin.
30 | Once packaged, the module becomes a plugin under skywalking-agent/plugins. Its directory layout is:
31 | ```
32 | arthas-control-plugin/
33 | ├── pom.xml
34 | └── src
35 |     └── main
36 |         ├── java
37 |         │   └── org
38 |         │       └── apache
39 |         │           └── skywalking
40 |         │               └── apm
41 |         │                   └── plugin
42 |         │                       └── arthas
43 |         │                           ├── config
44 |         │                           │   └── ArthasConfig.java  # module configuration
45 |         │                           ├── service
46 |         │                           │   └── CommandListener.java  # boot service that listens for OAP commands
47 |         │                           └── util
48 |         │                               ├── ArthasCtl.java  # starts and stops Arthas
49 |         │                               └── ProcessUtils.java
50 |         ├── proto
51 |         │   └── ArthasCommandService.proto  # gRPC protocol definition for talking to the OAP server
52 |         └── resources
53 |             └── META-INF
54 |                 └── services  # SPI registration for the boot service
55 |                     └── org.apache.skywalking.apm.agent.core.boot.BootService
56 |
57 | 16 directories, 7 files
58 | ```
59 | ArthasConfig.java defines the following settings, which are passed to Arthas when it starts.
60 |
61 | They can be set through the agent.config file, system properties, or environment variables.
62 | For the details of how skywalking-agent initializes its configuration, see [SnifferConfigInitializer](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/conf/SnifferConfigInitializer.java).
63 | ```java
64 | public class ArthasConfig {
65 |     public static class Plugin {
66 |         @PluginConfig(root = ArthasConfig.class)
67 |         public static class Arthas {
68 |             // Arthas home directory
69 |             public static String ARTHAS_HOME;
70 |             // tunnel server that Arthas connects to on startup
71 |             public static String TUNNEL_SERVER;
72 |             // Arthas session timeout
73 |             public static Long SESSION_TIMEOUT;
74 |             // disabled Arthas commands
75 |             public static String DISABLED_COMMANDS;
76 |         }
77 |     }
78 | }
79 | ```
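The `${ENV_NAME:default}` placeholders used in agent.config can be read as "take the environment variable if it is set, otherwise the text after the colon". The sketch below illustrates that reading only. `PlaceholderSketch` and its `resolve` method are hypothetical names, and this is not SkyWalking's own resolver, which also considers system properties.

```java
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper for illustration; not part of skywalking-agent. */
final class PlaceholderSketch {
    // Matches ${NAME} or ${NAME:default}; the default may be empty.
    private static final Pattern PLACEHOLDER =
        Pattern.compile("\\$\\{([^:}]+)(?::([^}]*))?\\}");

    /** Replaces every placeholder with the env value, or with its default when unset. */
    static String resolve(String raw, Map<String, String> env) {
        Matcher m = PLACEHOLDER.matcher(raw);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String fallback = m.group(2) == null ? "" : m.group(2);
            String value = env.getOrDefault(m.group(1), fallback);
            // quoteReplacement keeps '$' and '\' in the value from being treated as group references
            m.appendReplacement(out, Matcher.quoteReplacement(value));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

Under this assumption, `resolve("${SW_ARTHAS_TUNNEL_SERVER:ws://127.0.0.1:7777/ws}", System.getenv())` yields the default address unless the variable is exported, and `${SW_ARTHAS_HOME:}` resolves to an empty string.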
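80 | Next, let's look at CommandListener.java. CommandListener implements the BootService interface
81 | and is exposed to ServiceLoader through the file under resources/META-INF/services.
82 |
83 | BootService is defined below. Its prepare(), boot(), onComplete(), and shutdown() methods correspond to successive stages of the plugin lifecycle.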
84 | ```java
85 | public interface BootService {
86 |     void prepare() throws Throwable;
87 |
88 |     void boot() throws Throwable;
89 |
90 |     void onComplete() throws Throwable;
91 |
92 |     void shutdown() throws Throwable;
93 |
94 |     default int priority() {
95 |         return 0;
96 |     }
97 | }
98 | ```
99 | The boot() method of the [ServiceManager](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/boot/ServiceManager.java) class
100 | defines how BootService implementations are loaded and started. It is called from the premain of SkyWalkingAgent, so initialization and startup finish before the main program runs:
101 | ```java
102 | public enum ServiceManager {
103 |     INSTANCE;
104 |     ...
105 |     ...
106 |     public void boot() {
107 |         bootedServices = loadAllServices();
108 |
109 |         prepare();
110 |         startup();
111 |         onComplete();
112 |     }
113 |     ...
114 |     ...
115 | }
116 | ```
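The phase ordering in boot() can be illustrated with a small stand-alone sketch. `LifecycleStep` is a hypothetical stand-in for BootService (without checked exceptions, to keep the example short), and running lower `priority()` values first is an assumption made here, not something the excerpt above shows.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical stand-in for BootService; for illustration only. */
interface LifecycleStep {
    void prepare();
    void boot();
    void onComplete();
    default int priority() { return 0; }
}

final class LifecycleSketch {
    /** Runs one phase across all services before moving on to the next phase. */
    static void bootAll(List<LifecycleStep> services) {
        List<LifecycleStep> ordered = new ArrayList<>(services);
        // Assumption: a lower priority value runs earlier.
        ordered.sort(Comparator.comparingInt(LifecycleStep::priority));
        for (LifecycleStep s : ordered) s.prepare();
        for (LifecycleStep s : ordered) s.boot();
        for (LifecycleStep s : ordered) s.onComplete();
    }

    /** Builds a step that appends every call it receives to the given log. */
    static LifecycleStep recording(String name, int priority, List<String> log) {
        return new LifecycleStep() {
            @Override public void prepare() { log.add(name + ".prepare"); }
            @Override public void boot() { log.add(name + ".boot"); }
            @Override public void onComplete() { log.add(name + ".onComplete"); }
            @Override public int priority() { return priority; }
        };
    }
}
```

The point of the phase-by-phase loop is that every service has finished prepare() before any service boots, which lets boot() rely on state that other services set up earlier.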
117 | Back to CommandListener: its boot method schedules a task as soon as the agent starts. The task polls the OAP server to ask whether Arthas should be started or stopped:
118 | ```java
119 | public class CommandListener implements BootService, GRPCChannelListener {
120 |     ...
121 |     ...
122 |     @Override
123 |     public void boot() throws Throwable {
124 |         getCommandFuture = Executors.newSingleThreadScheduledExecutor(
125 |             new DefaultNamedThreadFactory("CommandListener")
126 |         ).scheduleWithFixedDelay(
127 |             new RunnableWithExceptionProtection(
128 |                 this::getCommand,
129 |                 t -> LOGGER.error("get arthas command error.", t)
130 |             ), 0, 2, TimeUnit.SECONDS
131 |         );
132 |     }
133 |     ...
134 |     ...
135 | }
136 | ```
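The exception-protecting wrapper matters because a `ScheduledExecutorService` suppresses all further runs of a periodic task once one run throws. The sketch below shows the idea with hypothetical names (`GuardedTask`, `PollingSketch`); it is not the agent's RunnableWithExceptionProtection, only a minimal equivalent under that assumption.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Consumer;

/** Hypothetical wrapper: reports failures instead of letting them escape run(). */
final class GuardedTask implements Runnable {
    private final Runnable task;
    private final Consumer<Throwable> onError;

    GuardedTask(Runnable task, Consumer<Throwable> onError) {
        this.task = task;
        this.onError = onError;
    }

    @Override
    public void run() {
        try {
            task.run();
        } catch (Throwable t) {
            // Report and swallow, so the scheduler keeps the periodic task alive.
            onError.accept(t);
        }
    }
}

final class PollingSketch {
    /** Polls every 2 seconds; the caller is responsible for shutting the pool down. */
    static ScheduledExecutorService start(Runnable poll, Consumer<Throwable> onError) {
        ScheduledExecutorService pool = Executors.newSingleThreadScheduledExecutor();
        pool.scheduleWithFixedDelay(new GuardedTask(poll, onError), 0, 2, TimeUnit.SECONDS);
        return pool;
    }
}
```

Without the wrapper, a single transient gRPC failure inside the polling method would silently end the polling for the rest of the JVM's life.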
137 | The getCommand method handles the start and stop commands, which correspond to the connect and disconnect actions in the UI.
138 | The two commands are forwarded to ArthasCtl's startArthas and stopArthas methods, which start and stop Arthas.
139 |
140 | startArthas launches arthas-core.jar and registers with the arthas-tunnel-server named in the configuration file, using the skywalking-agent's serviceName and instanceName.
141 |
142 | The ArthasCtl logic is adapted from Arthas's [Bootstrap.java](https://github.com/alibaba/arthas/blob/master/boot/src/main/java/com/taobao/arthas/boot/Bootstrap.java). Since it is not the focus of this article, it is not covered here; interested readers can read the source.
143 | ```java
144 | switch (commandResponse.getCommand()) {
145 |     case START:
146 |         if (alreadyAttached()) {
147 |             LOGGER.warn("arthas already attached, no need start again");
148 |             return;
149 |         }
150 |         try {
151 |             arthasTelnetPort = SocketUtils.findAvailableTcpPort();
152 |             ArthasCtl.startArthas(PidUtils.currentLongPid(), arthasTelnetPort);
153 |         } catch (Exception e) {
154 |             LOGGER.info("error when start arthas", e);
155 |         }
156 |         break;
157 |     case STOP:
158 |         if (!alreadyAttached()) {
159 |             LOGGER.warn("no arthas attached, no need to stop");
160 |             return;
161 |         }
162 |         try {
163 |             ArthasCtl.stopArthas(arthasTelnetPort);
164 |             arthasTelnetPort = null;
165 |         } catch (Exception e) {
166 |             LOGGER.info("error when stop arthas", e);
167 |         }
168 |         break;
169 | }
170 | ```
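The guard clauses above make START and STOP idempotent: a repeated command is ignored rather than executed twice. The sketch below isolates just that state handling. `AttachStateSketch` is a hypothetical class that tracks the telnet port only; it does not launch or stop Arthas, and it assumes "attached" simply means "a port is recorded".

```java
/** Hypothetical state holder; the real plugin delegates to ArthasCtl instead. */
final class AttachStateSketch {
    enum Command { START, STOP }

    // null means no Arthas session is attached (an assumption of this sketch)
    private Integer telnetPort;

    /** Returns true when the command changed state, false when it was a no-op. */
    synchronized boolean handle(Command command, int candidatePort) {
        switch (command) {
            case START:
                if (telnetPort != null) {
                    return false; // already attached, nothing to start
                }
                telnetPort = candidatePort;
                return true;
            case STOP:
                if (telnetPort == null) {
                    return false; // nothing attached, nothing to stop
                }
                telnetPort = null;
                return true;
            default:
                return false;
        }
    }

    synchronized boolean attached() {
        return telnetPort != null;
    }
}
```

Idempotency is useful here because the UI may send connect more than once, and the agent only sees whichever command is pending at its next poll.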
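171 | With the start and stop logic covered, let's return to CommandListener's statusChanged method.
172 | Because the plugin talks to the OAP server, it follows the usual convention of listening to the gRPC channel status; the getCommand polling above runs only while the channel is healthy.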
173 | ```java
174 | public class CommandListener implements BootService, GRPCChannelListener {
175 |     ...
176 |     ...
177 |     @Override
178 |     public void statusChanged(final GRPCChannelStatus status) {
179 |         if (GRPCChannelStatus.CONNECTED.equals(status)) {
180 |             Object channel = ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel();
181 |             // DO NOT REMOVE Channel CAST, or it will throw `incompatible types: org.apache.skywalking.apm.dependencies.io.grpc.Channel
182 |             // cannot be converted to io.grpc.Channel` exception when compile due to agent core's shade of grpc dependencies.
183 |             commandServiceBlockingStub = ArthasCommandServiceGrpc.newBlockingStub((Channel) channel);
184 |         } else {
185 |             commandServiceBlockingStub = null;
186 |         }
187 |         this.status = status;
188 |     }
189 |     ...
190 |     ...
191 | }
192 | ```
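193 | Attentive readers may notice that the return value of getChannel() is upcast to Object and then cast back to Channel in the newBlockingStub call below it.
194 |
195 | It looks redundant, but it is not. Remove the cast and the build fails with the following error: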
196 | ```
197 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:compile (default-compile) on project arthas-control-plugin: Compilation failure
198 | [ERROR] .../CommandListener.java:[59,103] incompatible types: org.apache.skywalking.apm.dependencies.io.grpc.Channel cannot be converted to io.grpc.Channel
199 | ```
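200 | The error says that ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel() returns org.apache.skywalking.apm.dependencies.io.grpc.Channel, which cannot be assigned to an io.grpc.Channel reference.
201 |
202 | Yet the source of GRPCChannelManager.getChannel() clearly declares its return type as io.grpc.Channel. So why does the compiler report this error?
203 |
204 | This is a bit of skywalking-agent magic. agent-core is ultimately packaged into skywalking-agent.jar and loaded at startup directly by the system class loader (or another parent class loader).
205 | To keep its dependencies from clashing with the versions used by the monitored service, the agent core is packaged with maven-shade-plugin, which rewrites the package name of the gRPC dependency during the maven package phase.
206 | What appears as io.grpc.Channel in the source has in fact become org.apache.skywalking.apm.dependencies.io.grpc.Channel in the packaged agent-core, which explains the compile error above.
207 |
208 | Besides gRPC, several other well-known dependencies are shaded too. For details, see the [apm-agent-core pom.xml](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/pom.xml):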
209 | ```xml
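210 | <plugin>
211 |     <artifactId>maven-shade-plugin</artifactId>
212 |     <executions>
213 |         <execution>
214 |             <phase>package</phase>
215 |             <goals>
216 |                 <goal>shade</goal>
217 |             </goals>
218 |             <configuration>
219 |                 ...
220 |                 ...
221 |                 <relocations>
222 |                     <relocation>
223 |                         <pattern>${shade.com.google.source}</pattern>
224 |                         <shadedPattern>${shade.com.google.target}</shadedPattern>
225 |                     </relocation>
226 |                     <relocation>
227 |                         <pattern>${shade.io.grpc.source}</pattern>
228 |                         <shadedPattern>${shade.io.grpc.target}</shadedPattern>
229 |                     </relocation>
230 |                     <relocation>
231 |                         <pattern>${shade.io.netty.source}</pattern>
232 |                         <shadedPattern>${shade.io.netty.target}</shadedPattern>
233 |                     </relocation>
234 |                     <relocation>
235 |                         <pattern>${shade.io.opencensus.source}</pattern>
236 |                         <shadedPattern>${shade.io.opencensus.target}</shadedPattern>
237 |                     </relocation>
238 |                     <relocation>
239 |                         <pattern>${shade.io.perfmark.source}</pattern>
240 |                         <shadedPattern>${shade.io.perfmark.target}</shadedPattern>
241 |                     </relocation>
242 |                     <relocation>
243 |                         <pattern>${shade.org.slf4j.source}</pattern>
244 |                         <shadedPattern>${shade.org.slf4j.target}</shadedPattern>
245 |                     </relocation>
246 |                 </relocations>
247 |                 ...
248 |                 ...
249 |             </configuration>
250 |         </execution>
251 |     </executions>
252 | </plugin>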
253 | ```
254 | Beyond the caveat above, consider another scenario: what happens if an interceptor in an agent plugin needs to use a BootService defined in that plugin?
255 |
256 | Go back to how BootService is loaded. To pick up BootService implementations defined in plugins, ServiceLoader is given AgentClassLoader.getDefault() as its class loader
257 | (this line has a long history, dating back to 2018: [Allow use SkyWalking plugin to override service in Agent core. #1111](https://github.com/apache/skywalking/pull/1111)).
258 | So the class loader of a BootService defined in a plugin is AgentClassLoader.getDefault():
259 | ```java
260 | void load(List<BootService> allServices) {
261 |     for (final BootService bootService : ServiceLoader.load(BootService.class, AgentClassLoader.getDefault())) {
262 |         allServices.add(bootService);
263 |     }
264 | }
265 | ```
266 | Now look at how interceptors are loaded. The load method of [InterceptorInstanceLoader.java](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/plugin/loader/InterceptorInstanceLoader.java)
267 | creates a new AgentClassLoader for each distinct parent class loader and reuses it for every plugin interceptor that shares that parent (in most simple scenarios, all plugin interceptors are loaded by the same AgentClassLoader):
268 | ```java
269 | public static <T> T load(String className,
270 |     ClassLoader targetClassLoader) throws IllegalAccessException, InstantiationException, ClassNotFoundException, AgentPackageNotFoundException {
271 |     ...
272 |     ...
273 |     pluginLoader = EXTEND_PLUGIN_CLASSLOADERS.get(targetClassLoader);
274 |     if (pluginLoader == null) {
275 |         pluginLoader = new AgentClassLoader(targetClassLoader);
276 |         EXTEND_PLUGIN_CLASSLOADERS.put(targetClassLoader, pluginLoader);
277 |     }
278 |     ...
279 |     ...
280 | }
281 | ```
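282 | Under the class loader delegation model, a BootService referenced from an interceptor is loaded through the class loader of the interceptor class itself.
283 | So the BootService loaded by ServiceManager and the one loaded for the interceptor are not the same class (one class file is loaded twice, by two different class loaders), and calling BootService methods from the interceptor also fails with a cast exception.
284 | The current implementation therefore does not support calling a plugin's BootService directly from an interceptor. If you need that, the BootService has to live in agent-core, where a class loader higher in the hierarchy loads it first.
285 |
286 | This is not really a flaw in skywalking-agent. Agent plugins focus on their own use cases and only need to deal with traces, meters, and overriding the default BootService implementations.
287 | But if you need to extend skywalking-agent, you should understand its class loading mechanism well, or you may run into unexpected problems.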
288 |
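The "one class file, two class loaders" effect can be reproduced outside the agent. In the sketch below, `SharedService` stands in for a BootService shipped in a plugin jar, and `IsolatingLoader` is a hypothetical loader that defines that one class itself instead of delegating to its parent, which roughly mimics two AgentClassLoader instances reading the same jar. It assumes the compiled `.class` file is reachable as a classpath resource.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

/** Plays the role of a BootService class shipped inside a plugin jar. */
class SharedService {
}

/** Hypothetical loader that defines one named class itself and delegates everything else. */
final class IsolatingLoader extends ClassLoader {
    private final String isolated;

    IsolatingLoader(String isolated, ClassLoader parent) {
        super(parent);
        this.isolated = isolated;
    }

    @Override
    protected Class<?> loadClass(String name, boolean resolve) throws ClassNotFoundException {
        if (!name.equals(isolated)) {
            return super.loadClass(name, resolve); // normal parent-first delegation
        }
        synchronized (getClassLoadingLock(name)) {
            Class<?> loaded = findLoadedClass(name);
            if (loaded == null) {
                byte[] bytes = readClassBytes(name);
                loaded = defineClass(name, bytes, 0, bytes.length);
            }
            if (resolve) {
                resolveClass(loaded);
            }
            return loaded;
        }
    }

    private byte[] readClassBytes(String name) throws ClassNotFoundException {
        String path = name.replace('.', '/') + ".class";
        try (InputStream in = getParent().getResourceAsStream(path)) {
            if (in == null) {
                throw new ClassNotFoundException(name);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[4096];
            for (int n; (n = in.read(buf)) != -1; ) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new ClassNotFoundException(name, e);
        }
    }
}

final class LoaderIdentitySketch {
    /** True when the same class file yields distinct, mutually incompatible Class objects. */
    static boolean sameBytesDifferentClass() {
        try {
            ClassLoader app = SharedService.class.getClassLoader();
            String name = SharedService.class.getName();
            Class<?> a = new IsolatingLoader(name, app).loadClass(name);
            Class<?> b = new IsolatingLoader(name, app).loadClass(name);
            return a != b
                && a != SharedService.class
                && a.getName().equals(b.getName())
                && !SharedService.class.isAssignableFrom(a);
        } catch (ClassNotFoundException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

Because `SharedService.class.isAssignableFrom(a)` is false, casting an instance created from `a` to `SharedService` would throw a ClassCastException, which is the failure mode described above for interceptors that touch a plugin-defined BootService.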
289 | ## oap arthas-controller-module
290 | With the agent plugin covered, let's turn to the changes on the OAP side. OAP is also modular, so adding a new module is easy. Create an arthas-controller submodule under /oap-server/:
291 | ```
292 | arthas-controller/
293 | ├── pom.xml
294 | └── src
295 |     └── main
296 |         ├── java
297 |         │   └── org
298 |         │       └── apache
299 |         │           └── skywalking
300 |         │               └── oap
301 |         │                   └── arthas
302 |         │                       ├── ArthasControllerModule.java  # module definition
303 |         │                       ├── ArthasControllerProvider.java  # module provider (implementation)
304 |         │                       ├── CommandQueue.java
305 |         │                       └── handler
306 |         │                           ├── CommandGrpcHandler.java  # gRPC handler used by the plugin
307 |         │                           └── CommandRestHandler.java  # HTTP handler used by skywalking-ui
308 |         ├── proto
309 |         │   └── ArthasCommandService.proto
310 |         └── resources
311 |             └── META-INF
312 |                 └── services  # SPI registration for the module and its provider
313 |                     ├── org.apache.skywalking.oap.server.library.module.ModuleDefine
314 |                     └── org.apache.skywalking.oap.server.library.module.ModuleProvider
315 | ```
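316 | The module definition is very simple and contains only the module name. Because the new module does not expose any services to other modules, services() returns an empty array: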
317 | ```java
318 | public class ArthasControllerModule extends ModuleDefine {
319 |
320 |     public static final String NAME = "arthas-controller";
321 |
322 |     public ArthasControllerModule() {
323 |         super(NAME);
324 |     }
325 |
326 |     @Override
327 |     public Class<?>[] services() {
328 |         return new Class[0];
329 |     }
330 | }
331 | ```
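332 | Next comes the module provider. It is named default, and module() declares which module it belongs to. Since the module has no custom configuration, newConfigCreator simply returns null.
333 | The start method registers one handler with CoreModule's gRPC service and one with its HTTP service, which listen on the familiar ports 11800 and 12800: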
334 | ```java
335 | public class ArthasControllerProvider extends ModuleProvider {
336 |
337 |     @Override
338 |     public String name() {
339 |         return "default";
340 |     }
341 |
342 |     @Override
343 |     public Class<? extends ModuleDefine> module() {
344 |         return ArthasControllerModule.class;
345 |     }
346 |
347 |     @Override
348 |     public ConfigCreator<? extends ModuleConfig> newConfigCreator() {
349 |         return null;
350 |     }
351 |
352 |     @Override
353 |     public void prepare() throws ServiceNotProvidedException {
354 |
355 |     }
356 |
357 |     @Override
358 |     public void start() throws ServiceNotProvidedException, ModuleStartException {
359 |         // grpc service for agent
360 |         GRPCHandlerRegister grpcService = getManager().find(CoreModule.NAME)
361 |             .provider()
362 |             .getService(GRPCHandlerRegister.class);
363 |         grpcService.addHandler(
364 |             new CommandGrpcHandler()
365 |         );
366 |
367 |         // rest service for ui
368 |         HTTPHandlerRegister restService = getManager().find(CoreModule.NAME)
369 |             .provider()
370 |             .getService(HTTPHandlerRegister.class);
371 |         restService.addHandler(
372 |             new CommandRestHandler(),
373 |             Collections.singletonList(HttpMethod.POST)
374 |         );
375 |     }
376 |
377 |     @Override
378 |     public void notifyAfterCompleted() throws ServiceNotProvidedException {
379 |
380 |     }
381 |
382 |     @Override
383 |     public String[] requiredModules() {
384 |         return new String[0];
385 |     }
386 | }
387 | ```
388 | Finally, register the module and its provider in the configuration file. The configuration below says that the arthas-controller module is implemented by the default provider:
389 | ```yaml
390 | arthas-controller:
391 |   selector: default
392 |   default:
393 | ```
394 | CommandGrpcHandler and CommandRestHandler are both very simple. CommandRestHandler exposes the connect and disconnect endpoints;
395 | each request it receives is put into a queue for CommandGrpcHandler to consume. The queue is implemented as follows:
396 | ```java
397 | public class CommandQueue {
398 |
399 |     private static final Map<String, Command> COMMANDS = new ConcurrentHashMap<>();
400 |
401 |     // produced by connect and disconnect
402 |     public static void produceCommand(String serviceName, String instanceName, Command command) {
403 |         COMMANDS.put(serviceName + instanceName, command);
404 |     }
405 |
406 |     // consumed by the agent's getCommand task
407 |     public static Optional<Command> consumeCommand(String serviceName, String instanceName) {
408 |         return Optional.ofNullable(COMMANDS.remove(serviceName + instanceName));
409 |     }
410 | }
411 | ```
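The queue behaves like a one-slot mailbox per instance: a newer command overwrites an older one, and the first poll that finds a command also removes it. The self-contained sketch below shows those semantics. `CommandMailboxSketch` and its `Command` enum are hypothetical, and the list-based key is a design choice of this sketch (it keeps the pair `("ab", "c")` distinct from `("a", "bc")`), not what the repository does.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical variant of the command queue, for illustration only. */
final class CommandMailboxSketch {
    enum Command { START, STOP }

    // One pending command per (service, instance) pair; the latest write wins.
    private final Map<List<String>, Command> pending = new ConcurrentHashMap<>();

    /** Called by the HTTP side when the UI requests connect or disconnect. */
    void produce(String serviceName, String instanceName, Command command) {
        pending.put(Arrays.asList(serviceName, instanceName), command);
    }

    /** Called by the gRPC side on each agent poll; remove() makes the hand-off atomic. */
    Optional<Command> consume(String serviceName, String instanceName) {
        return Optional.ofNullable(pending.remove(Arrays.asList(serviceName, instanceName)));
    }
}
```

Using `remove` rather than `get` means each command is delivered at most once, so an agent that polls every two seconds does not act on the same connect request repeatedly.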
412 |
413 | ## skywalking-ui arthas console
414 | With the agent and OAP parts done, let's look at the UI:
415 | 1. connect: call the OAP server's connect endpoint and connect to arthas-tunnel-server
416 | 2. disconnect: call the OAP server's disconnect endpoint and disconnect from arthas-tunnel-server
417 | 3. Arthas command interaction: this code is based mainly on Arthas; see its [web-ui console](https://github.com/alibaba/arthas/blob/master/web-ui/arthasWebConsole/all/share/component/Console.vue) implementation
418 |
419 | After modifying the skywalking-ui code, you can test it directly with `npm run dev`.
420 |
421 | If you build it through the main project, remember to add an arthas route to the [ApplicationStartUp.java](https://github.com/apache/skywalking/blob/master/apm-webapp/src/main/java/org/apache/skywalking/oap/server/webapp/ApplicationStartUp.java) class in apm-webapp:
422 | ```java
423 | Server
424 |     .builder()
425 |     .port(port, SessionProtocol.HTTP)
426 |     .service("/graphql", oap)
427 |     .service("/internal/l7check", HealthCheckService.of())
428 |     .service("/zipkin/config.json", zipkin)
429 |     .serviceUnder("/arthas", oap)
430 |     .serviceUnder("/zipkin/api", zipkin)
431 |     .serviceUnder("/zipkin",
432 |         FileService.of(
433 |             ApplicationStartUp.class.getClassLoader(),
434 |             "/zipkin-lens")
435 |         .orElse(zipkinIndexPage))
436 |     .serviceUnder("/",
437 |         FileService.of(
438 |             ApplicationStartUp.class.getClassLoader(),
439 |             "/public")
440 |         .orElse(indexPage))
441 |     .build()
442 |     .start()
443 |     .join();
444 | ```
445 |
446 | ## Summary
447 | 1. The BootService startup and shutdown flow
448 | 2. How to implement custom logic with BootService
449 | 3. The class loading mechanism of agent plugins
450 | 4. Using maven-shade-plugin, and its pitfalls
451 | 5. How to define a new module with ModuleDefine and ModuleProvider
452 | 6. How to add new handlers to the gRPC and HTTP services
453 |
454 | If you have any questions, [feel free to reach out](https://github.com/weixiang1862/skywalking-x-arthas/issues).
--------------------------------------------------------------------------------