├── .gitignore ├── .gitmodules ├── LICENSE ├── README.md ├── dist ├── arthas-control-plugin-9.1.0-SNAPSHOT.jar ├── arthas-controller-9.7.0-SNAPSHOT.jar └── skywalking-webapp.jar └── docs ├── img ├── connect-sequence.png ├── disconnect-sequence.png └── skywalking-x-arthas-ui.png └── skywalking-x-arthas.md /.gitignore: -------------------------------------------------------------------------------- 1 | /build/ 2 | target/ 3 | .idea/ 4 | *.iml 5 | .classpath 6 | .project 7 | .settings/ 8 | .DS_Store 9 | -------------------------------------------------------------------------------- /.gitmodules: -------------------------------------------------------------------------------- 1 | [submodule "skywalking"] 2 | path = skywalking 3 | url = https://github.com/weixiang1862/skywalking 4 | [submodule "skywalking-java"] 5 | path = skywalking-java 6 | url = https://github.com/weixiang1862/skywalking-java 7 | [submodule "skywalking-booster-ui"] 8 | path = skywalking-booster-ui 9 | url = https://github.com/weixiang1862/skywalking-booster-ui -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. 
For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 
47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. 
Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 
122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. 
In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. 
We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # skywalking-x-arthas
2 | Start an Arthas diagnostic session from [skywalking-ui](https://github.com/apache/skywalking-booster-ui), without having to ask your boss for server access.
3 |
4 | Package Arthas into [skywalking-agent](https://github.com/apache/skywalking-agent), so you don't need to copy an installation package to the VM or container when troubleshooting.
5 |
6 | ![Arthas Console](./docs/img/skywalking-x-arthas-ui.png)
7 | ## Quick Start
8 | ### 1. git clone skywalking-x-arthas
9 | ```
10 | git clone https://github.com/weixiang1862/skywalking-x-arthas
11 | ```
12 | ### 2.
copy arthas-control-plugin to your agent `plugins` folder
13 | ```
14 | cd ${path_to_skywalking-x-arthas}
15 | cp dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar ${your_sw_agent_home}/plugins/
16 |
17 | cat << 'EOF' >>${your_sw_agent_home}/config/agent.config
18 | # arthas home, default is {$AGENT_HOME}/arthas
19 | plugin.arthas.arthas_home=${SW_ARTHAS_HOME:}
20 | # arthas tunnel server address, (e.g. ws://127.0.0.1:7777/ws)
21 | plugin.arthas.tunnel_server=${SW_ARTHAS_TUNNEL_SERVER:ws://127.0.0.1:7777/ws}
22 | plugin.arthas.session_timeout=${SW_ARTHAS_SESSION_TIMEOUT:}
23 | plugin.arthas.disabled_commands=${SW_ARTHAS_DISABLED_COMMAND:}
24 | EOF
25 | ```
26 | ### 3. copy arthas-controller to your oap-server `oap-libs` folder
27 | ```
28 | cd ${path_to_skywalking-x-arthas}
29 | cp dist/arthas-controller-9.7.0-SNAPSHOT.jar ${your_oap_server_home}/oap-libs/
30 |
31 | cat << EOF >>${your_oap_server_home}/config/application.yml
32 | arthas-controller:
33 |   selector: default
34 |   default:
35 | EOF
36 | ```
37 | ### 4. copy skywalking-webapp.jar to your oap-server `webapp` folder
38 | ```
39 | cd ${path_to_skywalking-x-arthas}
40 | cp dist/skywalking-webapp.jar ${your_oap_server_home}/webapp/
41 | ```
42 | ### 5. download arthas & start an arthas tunnel server
43 | ```
44 | wget https://arthas.aliyun.com/download/latest_version?mirror=aliyun -O arthas.zip
45 | mkdir ${your_sw_agent_home}/arthas
46 | unzip arthas.zip -d ${your_sw_agent_home}/arthas
47 |
48 | wget https://arthas.aliyun.com/download/arthas-tunnel-server/latest_version?mirror=aliyun -O arthas-tunnel-server.jar
49 | java -jar arthas-tunnel-server.jar
50 | ```
51 | ### 6. restart your service and oap-server, then run a quick test
52 |
53 | ## How It Works
54 | ### 1.
connect
55 | ```mermaid
56 | sequenceDiagram
57 |     participant skywalking-ui
58 |     participant skywalking-oap-server
59 |     participant your-service-with-sw-agent
60 |     participant arthas-tunnel-server
61 |     skywalking-ui->>skywalking-oap-server: start instance arthas process
62 |     loop CommandCheck
63 |         your-service-with-sw-agent->>skywalking-oap-server: should start arthas?
64 |     end
65 |     your-service-with-sw-agent-->>arthas-tunnel-server: connect on receiving start command
66 |     skywalking-ui-->>arthas-tunnel-server: execute arthas command over websocket
67 |     arthas-tunnel-server-->>skywalking-ui: return arthas command result over websocket
68 | ```
69 |
70 | ### 2. disconnect
71 | ```mermaid
72 | sequenceDiagram
73 |     participant skywalking-ui
74 |     participant skywalking-oap-server
75 |     participant your-service-with-sw-agent
76 |     participant arthas-tunnel-server
77 |     skywalking-ui->>skywalking-oap-server: stop instance arthas process
78 |     skywalking-ui-->>arthas-tunnel-server: disconnect from tunnel
79 |     loop CommandCheck
80 |         your-service-with-sw-agent->>skywalking-oap-server: should stop arthas?
81 |     end
82 |     your-service-with-sw-agent-->>arthas-tunnel-server: disconnect on receiving stop command
83 | ```
84 |
85 |
86 |
87 | ## Building From Source
88 | ### 1. build skywalking
89 | Follow the official [skywalking build docs](https://skywalking.apache.org/docs/main/next/en/guides/how-to-build/).
90 | ### 2. build skywalking-agent
91 | Follow the official [skywalking-java build docs](https://skywalking.apache.org/docs/skywalking-java/next/en/contribution/compiling/).
92 |
93 | ## Documentation
94 | To learn more, read the [documentation](./docs/skywalking-x-arthas.md).
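
The connect and disconnect sequences under How It Works can be modelled in a few lines. What follows is a minimal in-memory sketch under stated assumptions, not code from this repository: the class `ArthasHandshakeSketch`, its `request` and `agentTick` methods, and the `service@instance` key are hypothetical names chosen for illustration, and the tunnel connection is reduced to a boolean flag.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical in-memory model of the connect/disconnect handshake; not the project's code.
class ArthasHandshakeSketch {
    enum Command { START, STOP }

    // Stands in for the OAP server: the UI stores a command, the agent's poll removes it.
    private static final Map<String, Command> PENDING = new ConcurrentHashMap<>();

    // Stands in for "arthas is attached and connected to the tunnel server".
    private static boolean attached = false;

    private static String key(String service, String instance) {
        return service + "@" + instance;
    }

    // UI side: ask OAP to start or stop arthas on one instance.
    static void request(String service, String instance, Command command) {
        PENDING.put(key(service, instance), command);
    }

    // Agent side: one CommandCheck iteration. Returns a description of what the agent did.
    static String agentTick(String service, String instance) {
        Optional<Command> command = Optional.ofNullable(PENDING.remove(key(service, instance)));
        if (!command.isPresent()) {
            return "idle";
        }
        if (command.get() == Command.START) {
            if (attached) {
                return "already attached";
            }
            attached = true;
            return "connected to tunnel";
        }
        if (!attached) {
            return "not attached";
        }
        attached = false;
        return "disconnected from tunnel";
    }

    public static void main(String[] args) {
        System.out.println(agentTick("demo-service", "instance-1")); // nothing requested yet
        request("demo-service", "instance-1", Command.START);
        System.out.println(agentTick("demo-service", "instance-1")); // start is consumed once
        System.out.println(agentTick("demo-service", "instance-1")); // no pending command left
        request("demo-service", "instance-1", Command.STOP);
        System.out.println(agentTick("demo-service", "instance-1"));
    }
}
```

Because a command is removed when it is read, each start or stop request is acted on by at most one poll, which is why repeated CommandCheck iterations are harmless in this model.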
-------------------------------------------------------------------------------- /dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/arthas-control-plugin-9.1.0-SNAPSHOT.jar -------------------------------------------------------------------------------- /dist/arthas-controller-9.7.0-SNAPSHOT.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/arthas-controller-9.7.0-SNAPSHOT.jar -------------------------------------------------------------------------------- /dist/skywalking-webapp.jar: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/dist/skywalking-webapp.jar -------------------------------------------------------------------------------- /docs/img/connect-sequence.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/connect-sequence.png -------------------------------------------------------------------------------- /docs/img/disconnect-sequence.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/disconnect-sequence.png -------------------------------------------------------------------------------- /docs/img/skywalking-x-arthas-ui.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/weixiang1862/skywalking-x-arthas/51a8c93140ce1b1c116bd4ec3506b496ac38caf6/docs/img/skywalking-x-arthas-ui.png
--------------------------------------------------------------------------------
/docs/skywalking-x-arthas.md:
--------------------------------------------------------------------------------
1 | ## Background
2 | Arthas is a widely used Java diagnostic tool. Once SkyWalking detects a service anomaly, we can use Arthas to analyze and diagnose it further and locate the problem quickly.
3 |
4 | In practice, developers usually copy or download the Arthas package to the VM or container where the service runs, then attach to the Java process to troubleshoot. This inevitably spreads sensitive server operations information,
5 | and when every second counts during troubleshooting, these tedious steps waste a lot of time.
6 |
7 | The SkyWalking Java Agent starts together with the Java service and periodically reports service and instance information to the OAP Server. Using the agent's plugin mechanism, we can develop an Arthas control plugin
8 | that manages the Arthas lifecycle, so that Arthas can be started and stopped from a web page. The end result looks like this:
9 |
10 | ![skywalking-x-arthas](img/skywalking-x-arthas-ui.png)
11 |
12 | To implement this, we need the following key pieces:
13 | 1. Develop the agent arthas-control-plugin, which executes the arthas start and stop commands
14 | 2. Develop the oap arthas-controller-module, which dispatches control commands to the arthas agent plugin
15 | 3. Customize skywalking-ui to connect to arthas-tunnel-server, send arthas commands, and fetch the results
16 |
17 | The interaction between these modules is shown below:
18 |
19 | ### connect
20 | ![skywalking-x-arthas](img/connect-sequence.png)
21 |
22 | ### disconnect
23 | ![skywalking-x-arthas](img/disconnect-sequence.png)
24 |
25 | All code discussed in this article is published on GitHub at [skywalking-x-arthas](https://github.com/weixiang1862/skywalking-x-arthas); feel free to download and test it.
26 | The second half of the article walks through the code logic and the SkyWalking extension points it uses.
27 |
28 | ## agent arthas-control-plugin
29 | First, create an arthas-control-plugin under skywalking-java/apm-sniffer/apm-sdk-plugin.
30 | Once packaged, this module becomes a plugin under skywalking-agent/plugins. Its directory structure is as follows:
31 | ```
32 | arthas-control-plugin/
33 | ├── pom.xml
34 | └── src
35 |     └── main
36 |         ├── java
37 |         │   └── org
38 |         │       └── apache
39 |         │           └── skywalking
40 |         │               └── apm
41 |         │                   └── plugin
42 |         │                       └── arthas
43 |         │                           ├── config
44 |         │                           │   └── ArthasConfig.java # module configuration
45 |         │                           ├── service
46 |         │                           │   └── CommandListener.java # boot service, listens for oap commands
47 |         │                           └── util
48 |         │                               ├── ArthasCtl.java # starts and stops arthas
49 |         │
└── ProcessUtils.java
50 |         ├── proto
51 |         │   └── ArthasCommandService.proto # gRPC protocol definition for talking to the oap server
52 |         └── resources
53 |             └── META-INF
54 |                 └── services # boot service spi service
55 |                     └── org.apache.skywalking.apm.agent.core.boot.BootService
56 |
57 | 16 directories, 7 files
58 | ```
59 | ArthasConfig.java defines the following settings, which are passed to arthas at startup.
60 |
61 | These settings can be specified through the agent.config file, system properties, or environment variables.
62 | For details on how skywalking-agent initializes its configuration, see [SnifferConfigInitializer](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/conf/SnifferConfigInitializer.java).
63 | ```java
64 | public class ArthasConfig {
65 |     public static class Plugin {
66 |         @PluginConfig(root = ArthasConfig.class)
67 |         public static class Arthas {
68 |             // arthas home directory
69 |             public static String ARTHAS_HOME;
70 |             // tunnel server that arthas connects to at startup
71 |             public static String TUNNEL_SERVER;
72 |             // arthas session timeout
73 |             public static Long SESSION_TIMEOUT;
74 |             // disabled arthas commands
75 |             public static String DISABLED_COMMANDS;
76 |         }
77 |     }
78 | }
79 | ```
80 | Next, let's look at CommandListener.java. CommandListener implements the BootService interface
81 | and is exposed to ServiceLoader through the file under resources/META-INF/services.
82 |
83 | BootService is defined as follows. It has the methods prepare(), boot(), onComplete(), and shutdown(), which correspond to different stages of the plugin lifecycle.
84 | ```java
85 | public interface BootService {
86 |     void prepare() throws Throwable;
87 |
88 |     void boot() throws Throwable;
89 |
90 |     void onComplete() throws Throwable;
91 |
92 |     void shutdown() throws Throwable;
93 |
94 |     default int priority() {
95 |         return 0;
96 |     }
97 | }
98 | ```
99 | The boot() method of the [ServiceManager](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/boot/ServiceManager.java) class
100 | defines how BootService implementations are loaded and started. It is called from SkyWalkingAgent's premain, so initialization and startup complete before the main program runs:
101 | ```java
102 | public enum ServiceManager {
103 |     INSTANCE;
104 |     ...
105 |     ...
106 |     public void boot() {
107 |         bootedServices = loadAllServices();
108 |
109 |         prepare();
110 |         startup();
111 |         onComplete();
112 |     }
113 |     ...
114 |     ...
115 | }
116 | ```
117 | Back to the boot method of our CommandListener: when the agent starts, it schedules a periodic task that polls the oap server to ask whether arthas should be started or stopped:
118 | ```java
119 | public class CommandListener implements BootService, GRPCChannelListener {
120 |     ...
121 |     ...
122 |     @Override
123 |     public void boot() throws Throwable {
124 |         getCommandFuture = Executors.newSingleThreadScheduledExecutor(
125 |             new DefaultNamedThreadFactory("CommandListener")
126 |         ).scheduleWithFixedDelay(
127 |             new RunnableWithExceptionProtection(
128 |                 this::getCommand,
129 |                 t -> LOGGER.error("get arthas command error.", t)
130 |             ), 0, 2, TimeUnit.SECONDS
131 |         );
132 |     }
133 |     ...
134 |     ...
135 | }
136 | ```
137 | The getCommand method handles start and stop, which correspond to the connect and disconnect actions on the page.
138 | These two commands are forwarded to the startArthas and stopArthas methods of ArthasCtl, which start and stop arthas.
139 |
140 | startArthas launches arthas-core.jar and registers with the arthas-tunnel-server specified in the configuration file, using the skywalking-agent's serviceName and instanceName.
141 |
142 | The ArthasCtl logic is adapted from Arthas's [Bootstrap.java](https://github.com/alibaba/arthas/blob/master/boot/src/main/java/com/taobao/arthas/boot/Bootstrap.java). Since it is not the focus of this article, it is not covered here; interested readers can look at the source.
143 | ```java
144 | switch (commandResponse.getCommand()) {
145 |     case START:
146 |         if (alreadyAttached()) {
147 |             LOGGER.warn("arthas already attached, no need start again");
148 |             return;
149 |         }
150 |         try {
151 |             arthasTelnetPort = SocketUtils.findAvailableTcpPort();
152 |             ArthasCtl.startArthas(PidUtils.currentLongPid(), arthasTelnetPort);
153 |         } catch (Exception e) {
154 |             LOGGER.info("error when start arthas", e);
155 |         }
156 |         break;
157 |     case STOP:
158 |         if (!alreadyAttached()) {
159 |             LOGGER.warn("no arthas attached, no need to stop");
160 |             return;
161 |         }
162 |         try {
163 |             ArthasCtl.stopArthas(arthasTelnetPort);
164 |             arthasTelnetPort = null;
165 |         } catch (Exception e) {
166 |
LOGGER.info("error when stop arthas", e);
167 |         }
168 |         break;
169 | }
170 | ```
171 | Now that we have seen how arthas is started and stopped, let's return to CommandListener's statusChanged method.
172 | Because we need to talk to the oap server, we follow convention and listen to the gRPC channel status; the getCommand polling above only runs when the status is healthy.
173 | ```java
174 | public class CommandListener implements BootService, GRPCChannelListener {
175 |     ...
176 |     ...
177 |     @Override
178 |     public void statusChanged(final GRPCChannelStatus status) {
179 |         if (GRPCChannelStatus.CONNECTED.equals(status)) {
180 |             Object channel = ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel();
181 |             // DO NOT REMOVE Channel CAST, or it will throw `incompatible types: org.apache.skywalking.apm.dependencies.io.grpc.Channel
182 |             // cannot be converted to io.grpc.Channel` exception when compile due to agent core's shade of grpc dependencies.
183 |             commandServiceBlockingStub = ArthasCommandServiceGrpc.newBlockingStub((Channel) channel);
184 |         } else {
185 |             commandServiceBlockingStub = null;
186 |         }
187 |         this.status = status;
188 |     }
189 |     ...
190 |     ...
191 | }
192 | ```
193 | Attentive readers may notice that the return value of getChannel() is upcast to Object and then cast back to Channel in the newBlockingStub call below.
194 |
195 | This may look redundant, but it is not. If we remove the casts and try to compile, we get the following error:
196 | ```
197 | [ERROR] Failed to execute goal org.apache.maven.plugins:maven-compiler-plugin:3.10.1:compile (default-compile) on project arthas-control-plugin: Compilation failure
198 | [ERROR] .../CommandListener.java:[59,103] incompatible types: org.apache.skywalking.apm.dependencies.io.grpc.Channel cannot be converted to io.grpc.Channel
199 | ```
200 | The error says that the return type of ServiceManager.INSTANCE.findService(GRPCChannelManager.class).getChannel() is org.apache.skywalking.apm.dependencies.io.grpc.Channel, which cannot be assigned to an io.grpc.Channel reference.
201 |
202 | Yet if we look at the source of GRPCChannelManager's getChannel() method, its declared return type is clearly io.grpc.Channel. So why does the compiler report the error above?
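203 |
204 | This is a small piece of skywalking-agent magic. agent-core is eventually packaged into skywalking-agent.jar and loaded directly by the system class loader (or another parent class loader) at startup.
205 | To keep its dependencies from clashing with the versions used by the monitored service, the agent core is packaged with maven-shade-plugin, which rewrites the package names of the grpc dependencies during the maven package phase.
206 | What we see in the source as io.grpc.Channel has in fact been renamed to org.apache.skywalking.apm.dependencies.io.grpc.Channel in the packaged jar that actually runs, which explains the compile error above.
207 |
208 | Besides grpc, several other well-known dependencies are shaded as well. For details, see the [apm-agent-core pom.xml](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/pom.xml):
209 | ```xml
210 | <plugin>
211 |     <artifactId>maven-shade-plugin</artifactId>
212 |     <executions>
213 |         <execution>
214 |             <phase>package</phase>
215 |             <goals>
216 |                 <goal>shade</goal>
217 |             </goals>
218 |             <configuration>
219 |                 ...
220 |                 ...
221 |                 <relocations>
222 |                     <relocation>
223 |                         <pattern>${shade.com.google.source}</pattern>
224 |                         <shadedPattern>${shade.com.google.target}</shadedPattern>
225 |                     </relocation>
226 |                     <relocation>
227 |                         <pattern>${shade.io.grpc.source}</pattern>
228 |                         <shadedPattern>${shade.io.grpc.target}</shadedPattern>
229 |                     </relocation>
230 |                     <relocation>
231 |                         <pattern>${shade.io.netty.source}</pattern>
232 |                         <shadedPattern>${shade.io.netty.target}</shadedPattern>
233 |                     </relocation>
234 |                     <relocation>
235 |                         <pattern>${shade.io.opencensus.source}</pattern>
236 |                         <shadedPattern>${shade.io.opencensus.target}</shadedPattern>
237 |                     </relocation>
238 |                     <relocation>
239 |                         <pattern>${shade.io.perfmark.source}</pattern>
240 |                         <shadedPattern>${shade.io.perfmark.target}</shadedPattern>
241 |                     </relocation>
242 |                     <relocation>
243 |                         <pattern>${shade.org.slf4j.source}</pattern>
244 |                         <shadedPattern>${shade.org.slf4j.target}</shadedPattern>
245 |                     </relocation>
246 |                 </relocations>
247 |                 ...
248 |                 ...
249 |             </configuration>
250 |         </execution>
251 |     </executions>
252 | </plugin>
253 | ```
254 | Beyond the caveat above, consider another scenario: what happens if an interceptor in an agent plugin needs to use a BootService defined in that plugin?
255 |
256 | Let's go back to how BootService is loaded. To pick up BootService implementations defined in plugins, ServiceLoader is given AgentClassLoader.getDefault() as its class loader
257 | (this line of code has a long history, dating back to 2018: [Allow use SkyWalking plugin to override service in Agent core.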
 #1111](https://github.com/apache/skywalking/pull/1111)),
258 | so the class loader of a BootService defined in a plugin is AgentClassLoader.getDefault():
259 | ```java
260 | void load(List<BootService> allServices) {
261 |     for (final BootService bootService : ServiceLoader.load(BootService.class, AgentClassLoader.getDefault())) {
262 |         allServices.add(bootService);
263 |     }
264 | }
265 | ```
266 | Now look at how interceptors are loaded. The load method of [InterceptorInstanceLoader.java](https://github.com/apache/skywalking-java/blob/main/apm-sniffer/apm-agent-core/src/main/java/org/apache/skywalking/apm/agent/core/plugin/loader/InterceptorInstanceLoader.java)
267 | creates a new AgentClassLoader for each distinct parent class loader and reuses it for every interceptor that shares that parent (in most simple scenarios, all plugin interceptors are loaded by the same AgentClassLoader):
268 | ```java
269 | public static <T> T load(String className,
270 |     ClassLoader targetClassLoader) throws IllegalAccessException, InstantiationException, ClassNotFoundException, AgentPackageNotFoundException {
271 |     ...
272 |     ...
273 |     pluginLoader = EXTEND_PLUGIN_CLASSLOADERS.get(targetClassLoader);
274 |     if (pluginLoader == null) {
275 |         pluginLoader = new AgentClassLoader(targetClassLoader);
276 |         EXTEND_PLUGIN_CLASSLOADERS.put(targetClassLoader, pluginLoader);
277 |     }
278 |     ...
279 |     ...
280 | }
281 | ```
282 | By the class loader delegation model, if an interceptor uses a BootService, that class is also loaded by the class loader of the interceptor itself.
283 | So the BootService loaded by ServiceManager and the one loaded for the interceptor are not the same class (one class file is loaded twice, by different class loaders). Calling BootService methods from an interceptor therefore also fails with a cast exception.
284 | In other words, the current implementation does not support calling a plugin-defined BootService directly from an interceptor. If you need to, the BootService has to live in agent-core so that a higher-level class loader loads it first.
285 |
286 | This is not really a flaw in skywalking-agent. Agent plugins focus on their own use cases and only need to care about traces, meters, and overriding the default BootServices.
287 | But if we want to extend skywalking-agent, we need a clear understanding of its class loading mechanism, or we may run into unexpected problems.
288 |
289 | ## oap arthas-controller-module
290 | Having covered the agent plugin, let's look at the oap changes. oap is also modular, so adding a new module is easy. Create an arthas-controller submodule under the /oap-server/ directory:
291 | ```
292 | arthas-controller/
293 | ├── pom.xml
294 | └── src
295 |     └── main
296 |         ├── java
297 |         │   └── org
298 |         │       └── apache
299 |         │           └── skywalking
300 |         │               └── oap
301 |         │                   └── arthas
302 |         │                       ├── ArthasControllerModule.java # module definition
303 |         │                       ├── ArthasControllerProvider.java # module provider (implementation)
304 |         │                       ├── CommandQueue.java
305 |         │                       └── handler
306 |         │                           ├── CommandGrpcHandler.java # grpc handler, used by the plugin
307 |         │                           └── CommandRestHandler.java # http handler, used by skywalking-ui
308 |         ├── proto
309 |         │   └── ArthasCommandService.proto
310 |         └── resources
311 |             └── META-INF
312 |                 └── services # spi service files for the module and its provider
313 |                     ├── org.apache.skywalking.oap.server.library.module.ModuleDefine
314 |                     └── org.apache.skywalking.oap.server.library.module.ModuleProvider
315 | ```
316 | The module definition is very simple and contains only a module name. Because the new module does not need to expose services to other modules, services() returns an empty array.
317 | ```java
318 | public class ArthasControllerModule extends ModuleDefine {
319 |
320 |     public static final String NAME = "arthas-controller";
321 |
322 |     public ArthasControllerModule() {
323 |         super(NAME);
324 |     }
325 |
326 |     @Override
327 |     public Class[] services() {
328 |         return new Class[0];
329 |     }
330 | }
331 | ```
332 | Next comes the module provider. It is named default, and module() specifies which module the provider belongs to. Since the module has no custom configuration, newConfigCreator
can simply return null.
333 | The start method registers two handlers, one with CoreModule's grpc service and one with its http service. These are the services on the familiar ports 11800 and 12800:
334 | ```java
335 | public class ArthasControllerProvider extends ModuleProvider {
336 |
337 |     @Override
338 |     public String name() {
339 |         return "default";
340 |     }
341 |
342 |     @Override
343 |     public Class<? extends ModuleDefine> module() {
344 |         return ArthasControllerModule.class;
345 |     }
346 |
347 |     @Override
348 |     public ConfigCreator<? extends ModuleConfig> newConfigCreator() {
349 |         return null;
350 |     }
351 |
352 |     @Override
353 |     public void prepare() throws ServiceNotProvidedException {
354 |
355 |     }
356 |
357 |     @Override
358 |     public void start() throws ServiceNotProvidedException, ModuleStartException {
359 |         // grpc service for agent
360 |         GRPCHandlerRegister grpcService = getManager().find(CoreModule.NAME)
361 |             .provider()
362 |             .getService(GRPCHandlerRegister.class);
363 |         grpcService.addHandler(
364 |             new CommandGrpcHandler()
365 |         );
366 |
367 |         // rest service for ui
368 |         HTTPHandlerRegister restService = getManager().find(CoreModule.NAME)
369 |             .provider()
370 |             .getService(HTTPHandlerRegister.class);
371 |         restService.addHandler(
372 |             new CommandRestHandler(),
373 |             Collections.singletonList(HttpMethod.POST)
374 |         );
375 |     }
376 |
377 |     @Override
378 |     public void notifyAfterCompleted() throws ServiceNotProvidedException {
379 |
380 |     }
381 |
382 |     @Override
383 |     public String[] requiredModules() {
384 |         return new String[0];
385 |     }
386 | }
387 | ```
388 | Finally, register the module and its provider in the configuration file. The configuration below says that the arthas-controller module is implemented by the default provider:
389 | ```yaml
390 | arthas-controller:
391 |   selector: default
392 |   default:
393 | ```
394 | CommandGrpcHandler and CommandRestHandler are very simple. CommandRestHandler defines the connect and disconnect endpoints;
395 | incoming requests are placed in a queue for CommandGrpcHandler to consume. The queue is implemented as follows and needs no further explanation:
396 | ```java
397 | public class CommandQueue {
398 |
399 |     private static final Map<String, Command> COMMANDS = new ConcurrentHashMap<>();
400 |
401 |     // produced by connect and disconnect
402 |     public static void
produceCommand(String serviceName, String instanceName, Command command) {
403 |         COMMANDS.put(serviceName + instanceName, command);
404 |     }
405 |
406 |     // consumed by the agent getCommand task
407 |     public static Optional<Command> consumeCommand(String serviceName, String instanceName) {
408 |         return Optional.ofNullable(COMMANDS.remove(serviceName + instanceName));
409 |     }
410 | }
411 | ```
412 |
413 | ## skywalking-ui arthas console
414 | With the agent and oap work done, let's look at the ui:
415 | 1. connect: call the oap server connect endpoint and connect to arthas-tunnel-server
416 | 2. disconnect: call the oap server disconnect endpoint and disconnect from arthas-tunnel-server
417 | 3. arthas command interaction: this code is mostly adapted from arthas; see the [web-ui console](https://github.com/alibaba/arthas/blob/master/web-ui/arthasWebConsole/all/share/component/Console.vue) implementation
418 |
419 | After modifying the skywalking-ui code, we can test it directly with `npm run dev`.
420 |
421 | If you build through the main project, don't forget to add an arthas route in the [ApplicationStartUp.java](https://github.com/apache/skywalking/blob/master/apm-webapp/src/main/java/org/apache/skywalking/oap/server/webapp/ApplicationStartUp.java) class of apm-webapp:
422 | ```java
423 | Server
424 |     .builder()
425 |     .port(port, SessionProtocol.HTTP)
426 |     .service("/graphql", oap)
427 |     .service("/internal/l7check", HealthCheckService.of())
428 |     .service("/zipkin/config.json", zipkin)
429 |     .serviceUnder("/arthas", oap)
430 |     .serviceUnder("/zipkin/api", zipkin)
431 |     .serviceUnder("/zipkin",
432 |         FileService.of(
433 |             ApplicationStartUp.class.getClassLoader(),
434 |             "/zipkin-lens")
435 |         .orElse(zipkinIndexPage))
436 |     .serviceUnder("/",
437 |         FileService.of(
438 |             ApplicationStartUp.class.getClassLoader(),
439 |             "/public")
440 |         .orElse(indexPage))
441 |     .build()
442 |     .start()
443 |     .join();
444 | ```
445 |
446 | ## Summary
447 | 1. The BootService startup and shutdown flow
448 | 2. How to use BootService to implement custom logic
449 | 3. The class loading mechanism of agent plugins
450 | 4. How to use maven-shade-plugin, and what to watch out for
451 | 5. How to define a new module with ModuleDefine and ModuleProvider
452 | 6.
How to add new handlers to the GRPC and HTTP services
453 |
454 | If you still have any questions, [feel free to get in touch](https://github.com/weixiang1862/skywalking-x-arthas/issues).
--------------------------------------------------------------------------------
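
As a recap of the first summary item, the sketch below models the order in which lifecycle phases run across several services. It is a simplified stand-in under stated assumptions, not the real `ServiceManager`: the `LifecycleService` interface, the `RecordingService` class, and the `bootAll` and `run` helpers are hypothetical names, and details such as priorities, error handling, and shutdown are left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for the agent's BootService lifecycle; not SkyWalking code.
class LifecycleSketch {
    interface LifecycleService {
        void prepare();
        void boot();
        void onComplete();
    }

    // Records every lifecycle call into a shared log so the order can be inspected.
    static class RecordingService implements LifecycleService {
        private final String name;
        private final List<String> log;

        RecordingService(String name, List<String> log) {
            this.name = name;
            this.log = log;
        }

        @Override public void prepare() { log.add(name + ".prepare"); }
        @Override public void boot() { log.add(name + ".boot"); }
        @Override public void onComplete() { log.add(name + ".onComplete"); }
    }

    // Runs each phase for every service before moving on to the next phase.
    static void bootAll(List<LifecycleService> services) {
        services.forEach(LifecycleService::prepare);
        services.forEach(LifecycleService::boot);
        services.forEach(LifecycleService::onComplete);
    }

    // Convenience helper: boots one service per name and returns the call log.
    static List<String> run(String... names) {
        List<String> log = new ArrayList<>();
        List<LifecycleService> services = new ArrayList<>();
        for (String name : names) {
            services.add(new RecordingService(name, log));
        }
        bootAll(services);
        return log;
    }

    public static void main(String[] args) {
        run("CommandListener", "OtherService").forEach(System.out::println);
    }
}
```

In this model a service such as CommandListener would schedule its polling task in the boot phase, after every service has finished prepare.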