├── .gitignore ├── LICENSE ├── README.md ├── README_zh.md ├── attach ├── benchmark.png ├── cli.jpg └── python.jpg ├── en ├── benchmark.md ├── s1.md ├── s2.md ├── s3.md ├── s4.md ├── s5.md ├── s6.md └── s7.md └── zh ├── benchmark.md ├── s1.md ├── s2.md ├── s3.md ├── s4.md ├── s5.md ├── s6.md └── s7.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LMDeploy-Jetson Community
2 | 
3 | ***Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.***
4 | 
5 | [[中文]](./README_zh.md) | [[English]](./README.md)
6 | 
7 | This project focuses on adapting [LMDeploy](https://github.com/InternLM/lmdeploy) for use with NVIDIA Jetson series edge computing cards, facilitating the implementation of [InternLM](https://github.com/InternLM/InternLM) series LLMs for **Offline Embodied Intelligence (OEI)**.
8 | 
9 | ## Latest News🎉
10 | 
11 | * [2024/3/15] Updated support for [LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5).
12 | * [2024/2/26] This project has been included in the [LMDeploy](https://github.com/InternLM/lmdeploy) community.
13 | 
14 | ## Community Recruitment
15 | 
16 | * Recruiting community managers (Contact: an.hongjun@foxmail.com)
17 | * Recruiting benchmark results for more Jetson board models (please submit a PR directly), such as:
18 | * Jetson Nano
19 | * Jetson TX2
20 | * Jetson AGX Xavier
21 | * Jetson Orin Nano
22 | * Jetson AGX Orin
23 | * Recruiting developers to create Jetson-specific whl distributions
24 | * README optimization, etc.
25 | 
26 | ## Verified model/platform
27 | 
28 | * ✅:Verified and runnable
29 | * ❌:Verified but not runnable
30 | * ⭕️:Pending verification
31 | 
32 | |Models|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B|
33 | |:-:|:-:|:-:|:-:|:-:|:-:|
34 | |Orin AGX(32G)&lt;br&gt;
Jetpack 5.1|✅
Mem:??/??
*14.68 token/s*|✅
Mem:??/??
*5.82 token/s*|✅
Mem:??/??
*56.57 token/s*|✅
Mem:??/??
*14.56 token/s*|✅
Mem:??/??
*6.16 token/s*| 35 | |Orin NX(16G)
Jetpack 5.1|✅
Mem:8.6G/16G
*7.39 token/s*|✅
Mem:14.7G/16G
*3.08 token/s*|✅
Mem:5.6G/16G
*22.96 token/s*|✅
Mem:9.2G/16G
*7.48 token/s*|✅
Mem:14.8G/16G
*3.19 token/s*| 36 | |Xavier NX(8G)
Jetpack 5.1|❌|❌|✅
Mem:4.35G/8G
*28.36 token/s*|❌|❌| 37 | 38 | 39 | **If you have more Jetson series boards, feel free to run benchmarks and submit the results via `Pull Requests` (PR) to become one of the community contributors!** 40 | 41 | 42 | ## Future Work 43 | 44 | * Updating benchmark testing data for more models of Jetson boards. 45 | * Creating Jetson-specific whl distributions. 46 | * Following up on updates to the LMDeploy version. 47 | 48 | ## Tutorial 49 | 50 | [S1.Quantize on server by W4A16](./en/s1.md) 51 | 52 | [S2.Install Miniconda on Jetson](./en/s2.md) 53 | 54 | [S3.Install CMake-3.29.0 on Jetson](./en/s3.md) 55 | 56 | [S4.Install RapidJson on Jetson](./en/s4.md) 57 | 58 | [S5.Install Pytorch-2.1.0 on Jetson](./en/s5.md) 59 | 60 | [S6.Port LMDeploy-0.2.5 to Jetson](./en/s6.md) 61 | 62 | [S7.Run InternLM offline on Jetson](./en/s7.md) 63 | 64 | ## Appendix 65 | 66 | * [Reinstall Jetpack for Jetson](https://www.anhongjun.top/blogs.php?id=1) 67 | * [Test Benchmark of LMDeploy-Jetson](./en/benchmark.md) 68 | 69 | ## Community Projects 70 | 71 | * InternDog: Offline embodied intelligent guide dog based on the InternLM2. [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm) 72 | 73 | ## Citation 74 | 75 | If this project is helpful to your work, please cite it using the following format: 76 | 77 | ```bibtex 78 | @misc{2024lmdeployjetson, 79 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence}, 80 | author={LMDeploy-Jetson Community}, 81 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson}, 82 | year={2024} 83 | } 84 | ``` 85 | 86 | ## Acknowledgements 87 | 88 | * [InternLM Practical Camp](https://github.com/InternLM/tutorial/) 89 | * [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/) 90 | -------------------------------------------------------------------------------- /README_zh.md: -------------------------------------------------------------------------------- 1 | # LMDeploy-Jetson社区 2 | 3 | ***在NVIDIA Jetson平台离线部署大模型,开启离线具身智能新纪元。*** 4 | 5 | [[中文]](./README_zh.md) | [[English]](./README.md) 6 | 7 | 本项目提供一种将[LMDeploy](https://github.com/InternLM/lmdeploy)移植到NVIDIA Jetson系列边缘计算卡的方法,并在Jetson计算卡上运行[InternLM](https://github.com/InternLM/InternLM)系列大模型,为**离线具身智能**提供可能。 8 | 9 | ## 最新新闻🎉 10 | 11 | * [2024/3/15] 更新了对[LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5)。 12 | * [2024/2/26] 本项目被[LMDeploy](https://github.com/InternLM/lmdeploy)官方社区收录。 13 | 14 | ## 社区招募 15 | 16 | * 招募社区管理员(联系方式,an.hongjun@foxmail.com) 17 | * 招募更多型号Jetson板卡的Benchmark测试数据,可直接PR,如: 18 | * Jetson Nano 19 | * Jetson TX2 20 | * Jetson AGX Xavier 21 | * Jetson Orin Nano 22 | * Jetson AGX Orin 23 | * 招募开发者制作Jetson专用whl发行版 24 | * README优化等 25 | 26 | ## 已验证模型/平台 27 | 28 | * ✅:已验证可运行 29 | * ❌:已验证不可运行 30 | * ⭕️:待验证 31 | 32 | |Models|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B| 33 | |:-:|:-:|:-:|:-:|:-:|:-:| 34 | |Orin AGX(32G)
Jetpack 5.1|✅
Mem:??/??
*14.68 token/s*|✅
Mem:??/??
*5.82 token/s*|✅
Mem:??/??
*56.57 token/s*|✅
Mem:??/??
*14.56 token/s*|✅
Mem:??/??
*6.16 token/s*| 35 | |Orin NX(16G)
Jetpack 5.1|✅
Mem:8.6G/16G
*7.39 token/s*|✅
Mem:14.7G/16G
*3.08 token/s*|✅
Mem:5.6G/16G
*22.96 token/s*|✅
Mem:9.2G/16G
*7.48 token/s*|✅
Mem:14.8G/16G
*3.19 token/s*| 36 | |Xavier NX(8G)
Jetpack 5.1|❌|❌|✅
Mem:4.35G/8G
*28.36 token/s*|❌|❌| 37 | 38 | **如果您有更多Jetson系列板卡,欢迎运行Benchmark并通过`Pull requests`(PR)提交结果,成为社区贡献者之一!** 39 | 40 | ## 未来工作 41 | * 更新更多型号Jetson板卡的Benchmark测试数据 42 | * 制作Jetson专用whl发行版 43 | * 跟进更新版本的LMDeploy 44 | 45 | ## 部署教程 46 | 47 | [S1.服务器端模型W4A16量化](./zh/s1.md) 48 | 49 | [S2.Jetson端安装Miniconda](./zh/s2.md) 50 | 51 | [S3.Jetson端安装CMake-3.29.0](./zh/s3.md) 52 | 53 | [S4.Jetson端安装RapidJson](./zh/s4.md) 54 | 55 | [S5.Jetson端安装Pytorch-2.1.0](./zh/s5.md) 56 | 57 | [S6.Jetson端移植LMDeploy-0.2.5](./zh/s6.md) 58 | 59 | [S7.Jetson端离线运行InternLM大模型](./zh/s7.md) 60 | 61 | ## 附录 62 | 63 | * [为Jetson重装Jetpack](https://www.anhongjun.top/blogs.php?id=1) 64 | * [LMDeploy-Jetson基准测试](./zh/benchmark.md) 65 | 66 | ## 社区项目 67 | 68 | * InternDog: 基于InternLM2大模型的离线具身智能导盲犬 [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm) 69 | 70 | ## 引用 71 | 72 | 如果本项目对您的工作有所帮助,请使用以下格式引用: 73 | 74 | ```bibtex 75 | @misc{2024lmdeployjetson, 76 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence}, 77 | author={LMDeploy-Jetson Community}, 78 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson}, 79 | year={2024} 80 | } 81 | ``` 82 | 83 | ## 致谢 84 | 85 | * [书生·浦语大模型实战营](https://github.com/InternLM/tutorial/) 86 | * [上海人工智能实验室](https://www.shlab.org.cn/) 87 | -------------------------------------------------------------------------------- /attach/benchmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/benchmark.png -------------------------------------------------------------------------------- /attach/cli.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/cli.jpg -------------------------------------------------------------------------------- /attach/python.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/python.jpg -------------------------------------------------------------------------------- /en/benchmark.md: -------------------------------------------------------------------------------- 1 | # Test Benchmark of LMDeploy-Jetson 2 | 3 | Please first refer to S2-S7 for deploying LMDeploy in Jetson. 4 | 5 | Activate your conda environment. 6 | 7 | ```sh 8 | conda activate lmdeploy 9 | ``` 10 | 11 | Enter the `lmdeploy/benchmark` directory. 12 | 13 | ```sh 14 | cd ~/lmdeploy/benchmark 15 | ``` 16 | 17 | Run Benchmark. 18 | 19 | ```sh 20 | python profile_generation.py \ 21 | /internlm2-chat-1_8b-turbomind \ 22 | --concurrency 1 \ 23 | --prompt-tokens 128 \ 24 | --completion-tokens 128 25 | ``` 26 | 27 | Replace `internlm2 chat-1_8b turbomind` with your model path. 28 | 29 | Record the speed benchmark. 30 | 31 | ![](../attach/benchmark.png) 32 | 33 | During the inference process, the unified memory usage can be viewed through the `htop` command. 34 | 35 | ```sh 36 | # Install htop (if already installed, please ignore) 37 | apt-get install htop 38 | 39 | # Run htop to check the usage of Mem. 
40 | htop
41 | ```
42 | 
--------------------------------------------------------------------------------
/en/s1.md:
--------------------------------------------------------------------------------
1 | # S1.Quantize on server by W4A16
2 | 
3 | LLMs occupy a large amount of GPU memory during inference. We can use the LMDeploy tool to quantize the model to [W4A16](https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w4a16.md) format and convert it into a [TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/turbomind.md) model. This significantly reduces GPU memory usage, enabling the deployment of LLMs on the Jetson edge computing platform.
4 | 
5 | ### 1.Setup conda environment
6 | 
7 | The installation method for Anaconda is omitted.
8 | 
9 | Set up the conda environment:
10 | 
11 | ```sh
12 | conda create -n lmdeploy python=3.10
13 | ```
14 | 
15 | Activate the conda environment:
16 | 
17 | ```sh
18 | conda activate lmdeploy
19 | ```
20 | 
21 | ### 2.Install LMDeploy
22 | 
23 | Install lmdeploy with pip.
24 | 
25 | ```sh
26 | # ref:https://github.com/InternLM/lmdeploy/issues/1169
27 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
28 | pip install lmdeploy[all]==0.2.3
29 | ```
30 | 
31 | ### 3.Download HF model
32 | 
33 | ```sh
34 | mkdir -p ~/models && cd ~/models
35 | ```
36 | 
37 | Install dependencies.
38 | 
39 | ```sh
40 | pip install modelscope
41 | ```
42 | 
43 | Create file `download_models.py`:
44 | 
45 | ```py
46 | from modelscope import snapshot_download
47 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b')
48 | print(model_dir)
49 | ```
50 | 
51 | > `internlm-chat-7b` can be replaced with different models, such as `internlm-chat-20b`, `internlm2-chat-1_8b`, `internlm2-chat-7b`, `internlm2-chat-20b`.
52 | 
53 | Run the download script.
54 | 
55 | ```sh
56 | python download_models.py
57 | ```
58 | 
59 | The path printed at the end is where the model is saved. Please make a note of it.
60 | 
61 | ```sh
62 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
63 | ```
64 | 
65 | ### 4.Quantize model by W4A16
66 | 
67 | ```sh
68 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
69 | export WORK_DIR=./internlm-chat-7b-4bit
70 | 
71 | lmdeploy lite auto_awq \
72 | $HF_MODEL \
73 | --calib-dataset 'ptb' \
74 | --calib-samples 128 \
75 | --calib-seqlen 2048 \
76 | --w-bits 4 \
77 | --w-group-size 128 \
78 | --work-dir $WORK_DIR
79 | ```
80 | 
81 | Convert model format.
82 | 
83 | ```sh
84 | export TM_DIR=./internlm-chat-7b-turbomind
85 | 
86 | lmdeploy convert internlm-chat-7b \
87 | $WORK_DIR \
88 | --model-format awq \
89 | --group-size 128 \
90 | --dst-path $TM_DIR
91 | ```
92 | 
93 | Open the configuration file `internlm-chat-7b-turbomind/triton_models/weights/config.ini` and modify the following three lines.
94 | 
95 | ```ini
96 | cache_max_entry_count = 0.5
97 | cache_block_seq_len = 128
98 | cache_chunk_size = 1
99 | ```
100 | 
101 | Compress the TurboMind model.
102 | 
103 | ```sh
104 | apt-get install pigz # Multi-threaded compression for speed
105 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind
106 | ```
107 | 
108 | Keep `internlm-chat-7b-turbomind.tgz`; it will be used later.
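If you prefer to script the `config.ini` changes above instead of editing the file by hand, the following is a minimal sketch (an example only, assuming the three keys already exist in the file and the paths used in this section):

```sh
# Sketch: apply the three cache settings from this section to config.ini in place.
# Adjust CFG if your TurboMind output directory differs.
CFG=./internlm-chat-7b-turbomind/triton_models/weights/config.ini

sed -i 's/^cache_max_entry_count.*/cache_max_entry_count = 0.5/' "$CFG"
sed -i 's/^cache_block_seq_len.*/cache_block_seq_len = 128/' "$CFG"
sed -i 's/^cache_chunk_size.*/cache_chunk_size = 1/' "$CFG"

# Verify the edits before compressing the model.
grep -E '^cache_(max_entry_count|block_seq_len|chunk_size)' "$CFG"
```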
--------------------------------------------------------------------------------
/en/s2.md:
--------------------------------------------------------------------------------
1 | # S2.Install Miniconda on Jetson
2 | 
3 | To manage Python packages more conveniently, we use a Conda virtual environment. Because Anaconda is quite large, we opt for Miniconda instead.
4 | 
5 | Create the Miniconda installation directory.
6 | 
7 | ```sh
8 | mkdir -p ~/miniconda3
9 | ```
10 | 
11 | Download the Miniconda installation package:
12 | 
13 | [[Download by Browser]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh)
14 | 
15 | ```sh
16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
17 | ```
18 | 
19 | Install Miniconda:
20 | 
21 | ```sh
22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
23 | ```
24 | 
25 | Delete the installation package:
26 | 
27 | ```sh
28 | rm -rf ~/miniconda3/miniconda.sh
29 | ```
30 | 
31 | Initialize the bash configuration:
32 | 
33 | ```sh
34 | ~/miniconda3/bin/conda init bash
35 | source ~/.bashrc
36 | ```
37 | 
38 | Create the Conda environment:
39 | 
40 | ```sh
41 | conda create -n lmdeploy python=3.8 # Use Python 3.8; otherwise the PyTorch wheel installed later will be incompatible.
42 | ```
43 | 
--------------------------------------------------------------------------------
/en/s3.md:
--------------------------------------------------------------------------------
1 | # S3.Install CMake-3.29.0 on Jetson
2 | 
3 | The pre-installed CMake version in Jetpack is too low. We need to upgrade CMake in order to compile and install LMDeploy.
4 | 
5 | To avoid the "butterfly effect" caused by changing the system CMake version, this tutorial does not install the newer CMake into the system directory. Instead, it temporarily selects the newer CMake through the `export` environment variable when needed.
6 | 
7 | Download cmake-3.29.0-rc1:
8 | 
9 | [[Download by Browser]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz)
10 | 
11 | ```sh
12 | cd ~
13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz
14 | ```
15 | 
16 | Extract the archive:
17 | 
18 | ```sh
19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz
20 | ```
21 | 
22 | Delete the archive:
23 | 
24 | ```sh
25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz
26 | ```
27 | 
28 | Rename the folder:
29 | 
30 | ```sh
31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0
32 | cd cmake-3.29.0
33 | ```
34 | 
35 | Verify that it runs correctly:
36 | 
37 | ```sh
38 | ./bin/cmake --version
39 | ```
40 | 
41 | Temporarily select CMake 3.29.0:
42 | 
43 | ```sh
44 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH
45 | ```
46 | 
47 | **Note**: Please replace `nvidia` in the above command with your username.
48 | 
49 | Check that the version has changed to 3.29.0:
50 | 
51 | ```sh
52 | cmake --version
53 | ```
54 | 
55 | **Note**: The newer CMake only takes effect in the current terminal session. You need to re-run the `export` command in each new terminal to use it.
56 | 
--------------------------------------------------------------------------------
/en/s4.md:
--------------------------------------------------------------------------------
1 | # S4.Install RapidJson on Jetson
2 | 
3 | Clone the RapidJson repository.
4 | 
5 | ```sh
6 | cd ~
7 | git clone https://github.com/Tencent/rapidjson.git
8 | 
9 | # For Chinese users
10 | # git clone https://gitee.com/Tencent/RapidJSON.git
11 | ```
12 | 
13 | Enter the repository directory.
14 | 
15 | ```sh
16 | cd rapidjson
17 | 
18 | # For Chinese users
19 | # cd RapidJSON
20 | ```
21 | 
22 | Compile RapidJson.
23 | 
24 | ```sh
25 | mkdir build && cd build
26 | cmake .. \
27 | -DRAPIDJSON_BUILD_DOC=OFF \
28 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \
29 | -DRAPIDJSON_BUILD_TESTS=OFF
30 | make -j4
31 | ```
32 | 
33 | Install RapidJson to the system.
34 | 
35 | ```sh
36 | sudo make install
37 | ```
--------------------------------------------------------------------------------
/en/s5.md:
--------------------------------------------------------------------------------
1 | # S5.Install Pytorch-2.1.0 on Jetson
2 | 
3 | Download PyTorch v2.1.0.
4 | 
5 | ```sh
6 | cd ~
7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38
8 | # For JetPack 5, execute the following command. For JetPack 6, please download the corresponding PyTorch 2.1 from the official website.
9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
10 | ```
11 | 
12 | [[Download Pytorch by Browser]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl)
13 | 
14 | Install dependencies.
15 | 
16 | ```sh
17 | sudo apt-get install libopenblas-dev
18 | ```
19 | 
20 | > **For Chinese users**: \
21 | > Switch to the Tsinghua apt mirror: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \
22 | > Please choose `Ubuntu 20.04 LTS (focal)`.
23 | 
24 | Activate the Conda environment.
25 | 
26 | ```sh
27 | conda activate lmdeploy
28 | ```
29 | 
30 | Install PyTorch v2.1.0.
31 | 
32 | ```sh
33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
34 | ```
35 | 
36 | > **For Chinese users**: \
37 | > Switch to the Tsinghua PyPI mirror:
38 | 
39 | ```sh
40 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
41 | ```
42 | 
43 | Verify the installation in the Python interpreter; the second statement should print `True`.
44 | 
45 | ```py
46 | import torch
47 | torch.cuda.is_available()
48 | ```
--------------------------------------------------------------------------------
/en/s6.md:
--------------------------------------------------------------------------------
1 | # S6.Port LMDeploy-0.2.5 to Jetson
2 | 
3 | Download lmdeploy-v0.2.5:
4 | 
5 | [[Download by Browser]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip)
6 | 
7 | ```sh
8 | cd ~/
9 | git clone https://github.com/InternLM/lmdeploy.git
10 | cd lmdeploy
11 | git checkout c5f4014 # make sure this is the v0.2.5 release
12 | ```
13 | 
14 | Activate the conda environment:
15 | 
16 | ```sh
17 | conda activate lmdeploy
18 | ```
19 | 
20 | Install the build dependencies.
21 | 
22 | ```sh
23 | cd ~/lmdeploy
24 | pip install -r requirements/build.txt
25 | ```
26 | 
27 | Create a new file named `generate_jetson.sh` under `~/lmdeploy`, and fill in the following content:
28 | 
29 | ```sh
30 | #!/bin/sh
31 | 
32 | builder="-G Ninja"
33 | 
34 | if [ "$1" = "make" ]; then
35 | builder=""
36 | fi
37 | 
38 | cmake ${builder} .. \
39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \
40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
41 | -DCMAKE_INSTALL_PREFIX=./install \
42 | -DBUILD_PY_FFI=ON \
43 | -DBUILD_MULTI_GPU=OFF \
44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \
45 | -DUSE_NVTX=ON
46 | 
47 | ```
48 | 
49 | Make the script executable.
50 | 
51 | ```sh
52 | chmod +x generate_jetson.sh
53 | ```
54 | 
55 | Install Ninja.
56 | 
57 | ```sh
58 | sudo apt-get install ninja-build
59 | ```
60 | 
61 | Create a new build folder.
62 | 
63 | ```sh
64 | cd ~/lmdeploy
65 | mkdir build && cd build
66 | ```
67 | 
68 | Compile LMDeploy.
69 | 
70 | ```sh
71 | ../generate_jetson.sh
72 | ninja install
73 | ```
74 | 
75 | During compilation, the system may run out of memory and the build may be `Killed`. You can expand the swap space as follows and then run `ninja install` again.
76 | 
77 | ```sh
78 | # Create a 6 GB swap file; adjust the size to fit your disk capacity
79 | sudo fallocate -l 6G /var/swapfile
80 | # Restrict file permissions
81 | sudo chmod 600 /var/swapfile
82 | # Format the swap file
83 | sudo mkswap /var/swapfile
84 | # Enable the swap file
85 | sudo swapon /var/swapfile
86 | # Enable the swap file automatically at boot
87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
88 | ```
89 | 
90 | **Attention**: Use vim to edit `requirements/runtime.txt`, and delete the lines containing `torch<=2.1.2,>=2.0.0` and `triton>=2.1.0,<2.2.0`.
91 | 
92 | **Note**: To simplify dependencies, we have removed `triton`. This also means that when deploying models with lmdeploy, they can only be invoked through the turbomind method, not through the API method.
93 | 
94 | Install lmdeploy-v0.2.5 locally.
95 | 
96 | ```sh
97 | cd ~/lmdeploy
98 | pip install -e .[serve]
99 | ```
--------------------------------------------------------------------------------
/en/s7.md:
--------------------------------------------------------------------------------
1 | # S7.Run InternLM offline on Jetson
2 | 
3 | Create a directory to save models.
4 | 
5 | ```sh
6 | mkdir -p ~/models
7 | ```
8 | 
9 | Upload the `internlm-chat-7b-turbomind.tgz` obtained from [S1.Quantize on server by W4A16](./s1.md) to the `models` directory.
10 | 
11 | Extract the model.
12 | 
13 | ```sh
14 | tar zxvf internlm-chat-7b-turbomind.tgz -C .
15 | ```
16 | 
17 | ### 0.Bug fix: Modify the MMEngine module.
18 | 
19 | The PyTorch version on Jetson does not support distributed reduce operations, which may cause errors in the distributed parts of the MMEngine module.
20 | 
21 | The error is:
22 | 
23 | ```sh
24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
25 | ```
26 | 
27 | Activate the conda environment:
28 | 
29 | ```sh
30 | conda activate lmdeploy
31 | ```
32 | 
33 | Run Python in interpreter mode:
34 | 
35 | ```sh
36 | python
37 | ```
38 | 
39 | Enter the following:
40 | 
41 | ```py
42 | import mmengine
43 | print(mmengine.__file__)
44 | ```
45 | 
46 | This prints the installation location of the MMEngine module. On the author's machine it is `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`, so the module directory is `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`. We refer to this directory as the MMEngine directory below.
47 | 
48 | Modify line 208 of `logging/logger.py` under the MMEngine directory.
49 | 
50 | ```git
51 | - global_rank = _get_rank()
52 | + global_rank = 0
53 | ```
54 | 
55 | After this change, the error no longer occurs at runtime.
56 | 
57 | **Attention**: This method is crude and is only suitable for inference deployment on the Jetson platform. It will affect distributed functionality on the server side!
58 | 
59 | ### 1.CLI Mode
60 | 
61 | Activate the conda environment:
62 | 
63 | ```sh
64 | conda activate lmdeploy
65 | ```
66 | 
67 | Run the model.
68 | 69 | ```sh 70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind 71 | ``` 72 | 73 | ![](../attach/cli.jpg) 74 | 75 | ### 2.Python Mode 76 | 77 | Write a running script `run_model.py` with the following content: 78 | 79 | ```py 80 | from lmdeploy import turbomind as tm 81 | 82 | 83 | if __name__ == "__main__": 84 | model_path = "./internlm-chat-7b-turbomind" # 修改成你的路径 85 | 86 | tm_model = tm.TurboMind.from_pretrained(model_path) 87 | generator = tm_model.create_instance() 88 | 89 | while True: 90 | inp = input("[User] >>> ") 91 | if inp == "exit": 92 | break 93 | prompt = tm_model.model.get_prompt(inp) 94 | input_ids = tm_model.tokenizer.encode(prompt) 95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]): 96 | res = outputs[1] 97 | response = tm_model.tokenizer.decode(res) 98 | print("[Bot] <<< {}".format(response)) 99 | 100 | ``` 101 | 102 | Activate conda environment: 103 | 104 | ```sh 105 | conda activate lmdeploy 106 | ``` 107 | 108 | Run the script: 109 | 110 | ```sh 111 | python run_model.py 112 | ``` 113 | 114 | ![](../attach/python.jpg) 115 | -------------------------------------------------------------------------------- /zh/benchmark.md: -------------------------------------------------------------------------------- 1 | # LMDeploy-Jetson基准测试 2 | 3 | 请首先参考S2-S7在Jetson部署LMDeploy。 4 | 5 | 激活conda环境。 6 | 7 | ```sh 8 | conda activate lmdeploy 9 | ``` 10 | 11 | 进入`lmdeploy/benchmark`目录。 12 | 13 | ```sh 14 | cd ~/lmdeploy/benchmark 15 | ``` 16 | 17 | 运行Benchmark。 18 | 19 | ```sh 20 | python profile_generation.py \ 21 | /internlm2-chat-1_8b-turbomind \ 22 | --concurrency 1 \ 23 | --prompt-tokens 128 \ 24 | --completion-tokens 128 25 | ``` 26 | 27 | 其中`internlm2-chat-1_8b-turbomind`更换为你的模型路径。 28 | 29 | 记录推理速度benchmark。 30 | 31 | ![](../attach/benchmark.png) 32 | 33 | 推理过程中,可通过`htop`命令查看统一内存占用情况。 34 | 35 | ```sh 36 | # htop安装方法(如已安装,请忽略) 37 | apt-get install htop 38 | 39 | # 运行htop,查看Mem 40 | htop 41 | ``` -------------------------------------------------------------------------------- /zh/s1.md: -------------------------------------------------------------------------------- 1 | # S1.服务器端模型W4A16量化 2 | 3 | 大模型推理时占用显存巨大,我们可以借助LMDeploy工具对模型进行[W4A16量化](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/quantization/w4a16.md),转换为[TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/turbomind.md)模型,这样在推理时可以极大减少显存占用,使得在Jetson边缘计算平台部署大模型成为可能。 4 | 5 | ### 1.创建conda环境 6 | 7 | 安装Anaconda方法略。 8 | 9 | 创建conda环境: 10 | 11 | ```sh 12 | conda create -n lmdeploy python=3.10 13 | ``` 14 | 15 | 激活conda环境: 16 | 17 | ```sh 18 | conda activate lmdeploy 19 | ``` 20 | 21 | ### 2.安装LMDeploy 22 | 23 | 使用pip方法安装lmdeploy。 24 | 25 | ```sh 26 | # 直接用pip装lmdeploy时安装的pytorch的cuda可能是12版本的,运行时会引发链接错误 27 | # ref:https://github.com/InternLM/lmdeploy/issues/1169 28 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118 29 | # 安装好torch-2.1.2-cu118后再安装lmdeploy 30 | pip install lmdeploy[all]==0.2.3 31 | ``` 32 | 33 | ### 3.下载HF模型权重文件 34 | 35 | 创建目录 36 | 37 | ```sh 38 | mkdir -p ~/models && cd ~/models 39 | ``` 40 | 41 | 安装依赖项。 42 | 43 | ```sh 44 | pip install modelscope 45 | ``` 46 | 47 | 创建Python文件`download_models.py`: 48 | 49 | ```py 50 | #模型下载 51 | from modelscope import snapshot_download 52 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b') 53 | print(model_dir) 54 | ``` 55 | 56 | > 
其中,internlm-chat-7b可替换为不同的模型,比如`internlm-chat-20b`,`internlm2-chat-1_8b`,`internlm2-chat-7b`,`internlm2-chat-20b`,下同。 57 | 58 | 运行下载脚本: 59 | 60 | ```sh 61 | python download_models.py 62 | ``` 63 | 64 | 最后打印输出的路径就是模型保存的路径,请记录下。笔者为: 65 | 66 | ```sh 67 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b 68 | ``` 69 | 70 | ### 4.模型W4A16量化 71 | 72 | ```sh 73 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b 74 | export WORK_DIR=./internlm-chat-7b-4bit 75 | 76 | lmdeploy lite auto_awq \ 77 | $HF_MODEL \ 78 | --calib-dataset 'ptb' \ 79 | --calib-samples 128 \ 80 | --calib-seqlen 2048 \ 81 | --w-bits 4 \ 82 | --w-group-size 128 \ 83 | --work-dir $WORK_DIR 84 | ``` 85 | 86 | 转换模型格式。 87 | 88 | ```sh 89 | export TM_DIR=./internlm-chat-7b-turbomind 90 | 91 | lmdeploy convert internlm-chat-7b \ 92 | $WORK_DIR \ 93 | --model-format awq \ 94 | --group-size 128 \ 95 | --dst-path $TM_DIR 96 | ``` 97 | 98 | 修改配置文件`internlm-chat-7b-turbomind/triton_models/weights/config.ini`,修改如下三行: 99 | 100 | ```ini 101 | cache_max_entry_count = 0.5 102 | cache_block_seq_len = 128 103 | cache_chunk_size = 1 104 | ``` 105 | 106 | 压缩turbomind模型。 107 | 108 | ```sh 109 | apt-get install pigz # 多线程加速压缩 110 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind 111 | ``` 112 | 113 | 将`internlm-chat-7b-turbomind.tgz`保留备用。 -------------------------------------------------------------------------------- /zh/s2.md: -------------------------------------------------------------------------------- 1 | # S2.Jetson端安装Miniconda 2 | 3 | 在Jetson上安装conda虚拟环境方便管理python包。由于Anaconda过于庞大,因此选择安装Miniconda。 4 | 5 | 创建Miniconda安装目录。 6 | 7 | ```sh 8 | mkdir -p ~/miniconda3 9 | ``` 10 | 11 | 下载Miniconda安装包: 12 | 13 | [[网页下载]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) 14 | 15 | ```sh 16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh 17 | ``` 18 | 19 | 安装Miniconda: 20 | 21 | ```sh 22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 23 | ``` 24 | 25 | 删除安装包: 26 | 27 | ```sh 28 | rm -rf ~/miniconda3/miniconda.sh 29 | ``` 30 | 31 | 初始化bash配置: 32 | 33 | ```sh 34 | ~/miniconda3/bin/conda init bash 35 | source ~/.bashrc 36 | ``` 37 | 38 | 创建conda环境: 39 | 40 | ```sh 41 | conda create -n lmdeploy python=3.8 # 请安装py38,否则安装pytorch会出现不兼容的问题。 42 | ``` 43 | -------------------------------------------------------------------------------- /zh/s3.md: -------------------------------------------------------------------------------- 1 | # S3.Jetson端安装CMake-3.29.0 2 | 3 | Jetpack预装的CMake版本太低了,我们需要升级一下CMake版本才能编译安装LMDeploy。 4 | 5 | 为了避免CMake版本变化引起“蝴蝶效应”,本教程采取的方法不会将高版本的CMake安装到系统目录,而是在使用时临时通过`export`环境变量的方式选用高版本的CMake。 6 | 7 | 下载cmake-3.29.0-rc1: 8 | 9 | [[网页下载]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz) 10 | 11 | ```sh 12 | cd ~ 13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz 14 | ``` 15 | 16 | 解压压缩包: 17 | 18 | ```sh 19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz 20 | ``` 21 | 22 | 删除压缩包: 23 | 24 | ```sh 25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz 26 | ``` 27 | 28 | 重命名文件夹: 29 | 30 | ```sh 31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0 32 | cd cmake-3.29.0 33 | mv cmake-3.29.0-rc1-linux-aarch64/* . 
34 | rm -rf cmake-3.29.0-rc1-linux-aarch64 35 | ``` 36 | 37 | 检验能否正常运行: 38 | 39 | ```sh 40 | ./bin/cmake --version 41 | ``` 42 | 43 | 临时指定cmake版本为3.29.0: 44 | 45 | ```sh 46 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH 47 | ``` 48 | 49 | **注意**:请将上述命令中`nvidia`替换为你的用户名。 50 | 51 | 查看版本是否变成了3.29.0: 52 | 53 | ```sh 54 | cmake --version 55 | ``` 56 | 57 | **注意**:高版本CMake只对当前终端生效。当你打开新的终端时,需要重新运行`export`才能使用高版本的CMake。 -------------------------------------------------------------------------------- /zh/s4.md: -------------------------------------------------------------------------------- 1 | # S4.Jetson端安装RapidJson 2 | 3 | 克隆RapidJson仓库。 4 | 5 | ```sh 6 | cd ~ 7 | git clone https://github.com/Tencent/rapidjson.git 8 | 9 | # 对于中国用户: 10 | # git clone https://gitee.com/Tencent/RapidJSON.git 11 | ``` 12 | 13 | 初始化子模块: 14 | 15 | ```sh 16 | cd rapidjson 17 | 18 | # 对于中国用户: 19 | # cd RapidJSON 20 | ``` 21 | 22 | 编译RapidJson: 23 | ```sh 24 | mkdir build && cd build 25 | cmake .. \ 26 | -DRAPIDJSON_BUILD_DOC=OFF \ 27 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \ 28 | -DRAPIDJSON_BUILD_TESTS=OFF 29 | make -j4 30 | ``` 31 | 32 | 将RapidJson安装到系统: 33 | 34 | ```sh 35 | sudo make install # 安装到系统 36 | ``` -------------------------------------------------------------------------------- /zh/s5.md: -------------------------------------------------------------------------------- 1 | # S5.Jetson端安装Pytorch-2.1.0 2 | 3 | 下载pytorch v2.1.0 4 | 5 | ```sh 6 | cd ~ 7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38 8 | # Jetpack 5执行如下指令,JetPack 6请到官方网站下载对应的pytorch2.1 9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl 10 | ``` 11 | 12 | [[网页下载Pytorch]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl) 13 | 14 | 安装依赖项 15 | 16 | ```sh 17 | sudo apt-get install libopenblas-dev 18 | ``` 19 | 20 | > **对于中国用户**: \ 21 | > 更换清华源: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \ 22 | > 请选择 `Ubuntu 20.04 LTS (focal)`。 23 | 24 | 激活conda环境 25 | 26 | ```sh 27 | conda activate lmdeploy 28 | ``` 29 | 30 | 安装pytorch-v2.1.0 31 | 32 | ```sh 33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl 34 | ``` 35 | 36 | > **对于中国用户**: \ 37 | > 更换清华源: 38 | ```sh 39 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple 40 | ``` 41 | 42 | 使用python解释器模式验证是否安装成功,正常显示未True 43 | 44 | ```py 45 | import torch 46 | torch.cuda.is_available() 47 | ``` -------------------------------------------------------------------------------- /zh/s6.md: -------------------------------------------------------------------------------- 1 | # S6.Jetson端移植LMDeploy-0.2.5 2 | 3 | 下载lmdeploy-v0.2.5: 4 | 5 | [[网页下载]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip) 6 | 7 | ```sh 8 | cd ~/ 9 | git clone https://github.com/InternLM/lmdeploy.git 10 | cd lmdeploy 11 | git checkout c5f4014 # 确保为0.2.5版本 12 | ``` 13 | 14 | 进入conda环境: 15 | 16 | ```sh 17 | conda activate lmdeploy 18 | ``` 19 | 20 | 安装编译依赖项: 21 | 22 | ```sh 23 | cd ~/lmdeploy 24 | pip install -r requirements/build.txt 25 | ``` 26 | 27 | 在`~/lmdeploy`下新建`generate_jetson.sh`,填入以下内容: 28 | 29 | ```sh 30 | #!/bin/sh 31 | 32 | builder="-G Ninja" 33 | 34 | if [ "$1" == "make" ]; then 35 | builder="" 36 | fi 37 | 38 | cmake ${builder} .. 
\ 39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \ 40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \ 41 | -DCMAKE_INSTALL_PREFIX=./install \ 42 | -DBUILD_PY_FFI=ON \ 43 | -DBUILD_MULTI_GPU=OFF \ 44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \ 45 | -DUSE_NVTX=ON 46 | 47 | ``` 48 | 49 | 赋予权限。 50 | 51 | ```sh 52 | chmod +x generate_jetson.sh 53 | ``` 54 | 55 | 安装Ninja。 56 | 57 | ```sh 58 | sudo apt-get install ninja-build 59 | ``` 60 | 61 | 新建编译文件夹: 62 | 63 | ```sh 64 | cd ~/lmdeploy 65 | mkdir build && cd build 66 | ``` 67 | 68 | 编译LMDeploy: 69 | 70 | ```sh 71 | ../generate_jetson.sh 72 | ninja install 73 | ``` 74 | 75 | 编译过程中可能出现内存不足而导致`Killed`的现象。可以通过如下方式扩大交换区容量,再重新执行`ninja install`。 76 | 77 | ```sh 78 | # 新建6G大小的交换区,大小可自定义,结合硬盘容量 79 | sudo fallocate -l 6G /var/swapfile 80 | # 赋予权限 81 | sudo chmod 600 /var/swapfile 82 | # 建立交换区 83 | sudo mkswap /var/swapfile 84 | # 启用交换区 85 | sudo swapon /var/swapfile 86 | # 设置自动启用交换区 87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab' 88 | ``` 89 | 90 | **注意**:使用vim修改`requirements/runtime.txt`,将`torch<=2.1.2,>=2.0.0`和`triton>=2.1.0,<2.2.0`两行行删除。 91 | 92 | **提示**:为了简化依赖,我们去除了`triton`,这也同时意味着使用lmdeploy部署模型时,只能通过turbomind方式调用,而不能通过api方式。 93 | 94 | 本地安装lmdeploy-v0.2.5 95 | 96 | ```sh 97 | cd ~/lmdeploy 98 | pip install -e .[serve] 99 | ``` -------------------------------------------------------------------------------- /zh/s7.md: -------------------------------------------------------------------------------- 1 | # S7.Jetson端离线运行InternLM大模型 2 | 3 | 创建模型保存目录: 4 | 5 | ```sh 6 | mkdir -p ~/models 7 | ``` 8 | 9 | 将[S1.服务器端模型W4A16量化](./s1.md)得到的`internlm-chat-7b-turbomind.tgz`上传到`models`目录下。 10 | 11 | 解压模型文件: 12 | 13 | ```sh 14 | tar zxvf internlm-chat-7b-turbomind.tgz -C . 15 | ``` 16 | 17 | ### 0.Bug解决:修改MMEngine库 18 | 19 | Jetson端的pytorch不支持分布式的reduce算子,这会导致MMEngine库中与分布式有关的部分出现错误。 20 | 21 | 错误为: 22 | 23 | ```sh 24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp' 25 | ``` 26 | 27 | 激活conda环境: 28 | 29 | ```sh 30 | conda activate lmdeploy 31 | ``` 32 | 33 | 用解释器方式运行python: 34 | 35 | ```sh 36 | python 37 | ``` 38 | 39 | 输入如下内容: 40 | 41 | ```py 42 | import mmengine 43 | print(mmengine.__file__) 44 | ``` 45 | 46 | 这就输出了MMEngine库的安装位置,笔者的是`/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`,那么相应位置就是`home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`,咱们用``代替。 47 | 48 | 修改`/logging/logger.py`第208行: 49 | 50 | ```git 51 | - global_rank = _get_rank() 52 | + global_rank = 0 53 | ``` 54 | 55 | 在运行就不会报错了。 56 | 57 | **注意**:该方式过于粗暴,仅适用于Jetson平台部署推理,在服务器端会影响分布式功能! 
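如果不想手动查找并编辑该文件,也可以参考下面的示意脚本自动完成上述修改(仅为示例:假设已激活 `lmdeploy` 环境,且 `logger.py` 中确实存在上文所示的 `global_rank = _get_rank()` 一行):

```sh
# 示意:自动定位 MMEngine 安装目录,备份并修改 logger.py
MMENGINE_DIR=$(python -c "import mmengine, os; print(os.path.dirname(mmengine.__file__))")
LOGGER="$MMENGINE_DIR/logging/logger.py"

# 先备份,便于之后在服务器端恢复分布式功能
cp "$LOGGER" "$LOGGER.bak"

# 将 global_rank = _get_rank() 替换为 global_rank = 0
sed -i 's/global_rank = _get_rank()/global_rank = 0/' "$LOGGER"
```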
58 | 59 | ### 1.终端运行 60 | 61 | 激活conda环境: 62 | 63 | ```sh 64 | conda activate lmdeploy 65 | ``` 66 | 67 | 运行模型: 68 | 69 | ```sh 70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind 71 | ``` 72 | 73 | ![](../attach/cli.jpg) 74 | 75 | ### 2.Python集成运行 76 | 77 | 编写运行脚本`run_model.py`,内容如下: 78 | 79 | ```py 80 | from lmdeploy import turbomind as tm 81 | 82 | 83 | if __name__ == "__main__": 84 | model_path = "./internlm-chat-7b-turbomind" # 修改成你的路径 85 | 86 | tm_model = tm.TurboMind.from_pretrained(model_path) 87 | generator = tm_model.create_instance() 88 | 89 | while True: 90 | inp = input("[User] >>> ") 91 | if inp == "exit": 92 | break 93 | prompt = tm_model.model.get_prompt(inp) 94 | input_ids = tm_model.tokenizer.encode(prompt) 95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]): 96 | res = outputs[1] 97 | response = tm_model.tokenizer.decode(res) 98 | print("[Bot] <<< {}".format(response)) 99 | 100 | ``` 101 | 102 | 激活conda环境: 103 | 104 | ```sh 105 | conda activate lmdeploy 106 | ``` 107 | 108 | 运行脚本: 109 | 110 | ```sh 111 | python run_model.py 112 | ``` 113 | 114 | ![](../attach/python.jpg) 115 | --------------------------------------------------------------------------------