├── .gitignore ├── LICENSE ├── README.md ├── README_zh.md ├── attach ├── benchmark.png ├── cli.jpg └── python.jpg ├── en ├── benchmark.md ├── s1.md ├── s2.md ├── s3.md ├── s4.md ├── s5.md ├── s6.md └── s7.md └── zh ├── benchmark.md ├── s1.md ├── s2.md ├── s3.md ├── s4.md ├── s5.md ├── s6.md └── s7.md /.gitignore: -------------------------------------------------------------------------------- 1 | .DS_Store 2 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | Apache License 2 | Version 2.0, January 2004 3 | http://www.apache.org/licenses/ 4 | 5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION 6 | 7 | 1. Definitions. 8 | 9 | "License" shall mean the terms and conditions for use, reproduction, 10 | and distribution as defined by Sections 1 through 9 of this document. 11 | 12 | "Licensor" shall mean the copyright owner or entity authorized by 13 | the copyright owner that is granting the License. 14 | 15 | "Legal Entity" shall mean the union of the acting entity and all 16 | other entities that control, are controlled by, or are under common 17 | control with that entity. For the purposes of this definition, 18 | "control" means (i) the power, direct or indirect, to cause the 19 | direction or management of such entity, whether by contract or 20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the 21 | outstanding shares, or (iii) beneficial ownership of such entity. 22 | 23 | "You" (or "Your") shall mean an individual or Legal Entity 24 | exercising permissions granted by this License. 25 | 26 | "Source" form shall mean the preferred form for making modifications, 27 | including but not limited to software source code, documentation 28 | source, and configuration files. 29 | 30 | "Object" form shall mean any form resulting from mechanical 31 | transformation or translation of a Source form, including but 32 | not limited to compiled object code, generated documentation, 33 | and conversions to other media types. 34 | 35 | "Work" shall mean the work of authorship, whether in Source or 36 | Object form, made available under the License, as indicated by a 37 | copyright notice that is included in or attached to the work 38 | (an example is provided in the Appendix below). 39 | 40 | "Derivative Works" shall mean any work, whether in Source or Object 41 | form, that is based on (or derived from) the Work and for which the 42 | editorial revisions, annotations, elaborations, or other modifications 43 | represent, as a whole, an original work of authorship. For the purposes 44 | of this License, Derivative Works shall not include works that remain 45 | separable from, or merely link (or bind by name) to the interfaces of, 46 | the Work and Derivative Works thereof. 47 | 48 | "Contribution" shall mean any work of authorship, including 49 | the original version of the Work and any modifications or additions 50 | to that Work or Derivative Works thereof, that is intentionally 51 | submitted to Licensor for inclusion in the Work by the copyright owner 52 | or by an individual or Legal Entity authorized to submit on behalf of 53 | the copyright owner. 
For the purposes of this definition, "submitted" 54 | means any form of electronic, verbal, or written communication sent 55 | to the Licensor or its representatives, including but not limited to 56 | communication on electronic mailing lists, source code control systems, 57 | and issue tracking systems that are managed by, or on behalf of, the 58 | Licensor for the purpose of discussing and improving the Work, but 59 | excluding communication that is conspicuously marked or otherwise 60 | designated in writing by the copyright owner as "Not a Contribution." 61 | 62 | "Contributor" shall mean Licensor and any individual or Legal Entity 63 | on behalf of whom a Contribution has been received by Licensor and 64 | subsequently incorporated within the Work. 65 | 66 | 2. Grant of Copyright License. Subject to the terms and conditions of 67 | this License, each Contributor hereby grants to You a perpetual, 68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 69 | copyright license to reproduce, prepare Derivative Works of, 70 | publicly display, publicly perform, sublicense, and distribute the 71 | Work and such Derivative Works in Source or Object form. 72 | 73 | 3. Grant of Patent License. Subject to the terms and conditions of 74 | this License, each Contributor hereby grants to You a perpetual, 75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable 76 | (except as stated in this section) patent license to make, have made, 77 | use, offer to sell, sell, import, and otherwise transfer the Work, 78 | where such license applies only to those patent claims licensable 79 | by such Contributor that are necessarily infringed by their 80 | Contribution(s) alone or by combination of their Contribution(s) 81 | with the Work to which such Contribution(s) was submitted. If You 82 | institute patent litigation against any entity (including a 83 | cross-claim or counterclaim in a lawsuit) alleging that the Work 84 | or a Contribution incorporated within the Work constitutes direct 85 | or contributory patent infringement, then any patent licenses 86 | granted to You under this License for that Work shall terminate 87 | as of the date such litigation is filed. 88 | 89 | 4. Redistribution. 
You may reproduce and distribute copies of the 90 | Work or Derivative Works thereof in any medium, with or without 91 | modifications, and in Source or Object form, provided that You 92 | meet the following conditions: 93 | 94 | (a) You must give any other recipients of the Work or 95 | Derivative Works a copy of this License; and 96 | 97 | (b) You must cause any modified files to carry prominent notices 98 | stating that You changed the files; and 99 | 100 | (c) You must retain, in the Source form of any Derivative Works 101 | that You distribute, all copyright, patent, trademark, and 102 | attribution notices from the Source form of the Work, 103 | excluding those notices that do not pertain to any part of 104 | the Derivative Works; and 105 | 106 | (d) If the Work includes a "NOTICE" text file as part of its 107 | distribution, then any Derivative Works that You distribute must 108 | include a readable copy of the attribution notices contained 109 | within such NOTICE file, excluding those notices that do not 110 | pertain to any part of the Derivative Works, in at least one 111 | of the following places: within a NOTICE text file distributed 112 | as part of the Derivative Works; within the Source form or 113 | documentation, if provided along with the Derivative Works; or, 114 | within a display generated by the Derivative Works, if and 115 | wherever such third-party notices normally appear. The contents 116 | of the NOTICE file are for informational purposes only and 117 | do not modify the License. You may add Your own attribution 118 | notices within Derivative Works that You distribute, alongside 119 | or as an addendum to the NOTICE text from the Work, provided 120 | that such additional attribution notices cannot be construed 121 | as modifying the License. 122 | 123 | You may add Your own copyright statement to Your modifications and 124 | may provide additional or different license terms and conditions 125 | for use, reproduction, or distribution of Your modifications, or 126 | for any such Derivative Works as a whole, provided Your use, 127 | reproduction, and distribution of the Work otherwise complies with 128 | the conditions stated in this License. 129 | 130 | 5. Submission of Contributions. Unless You explicitly state otherwise, 131 | any Contribution intentionally submitted for inclusion in the Work 132 | by You to the Licensor shall be under the terms and conditions of 133 | this License, without any additional terms or conditions. 134 | Notwithstanding the above, nothing herein shall supersede or modify 135 | the terms of any separate license agreement you may have executed 136 | with Licensor regarding such Contributions. 137 | 138 | 6. Trademarks. This License does not grant permission to use the trade 139 | names, trademarks, service marks, or product names of the Licensor, 140 | except as required for reasonable and customary use in describing the 141 | origin of the Work and reproducing the content of the NOTICE file. 142 | 143 | 7. Disclaimer of Warranty. Unless required by applicable law or 144 | agreed to in writing, Licensor provides the Work (and each 145 | Contributor provides its Contributions) on an "AS IS" BASIS, 146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or 147 | implied, including, without limitation, any warranties or conditions 148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A 149 | PARTICULAR PURPOSE. 
You are solely responsible for determining the 150 | appropriateness of using or redistributing the Work and assume any 151 | risks associated with Your exercise of permissions under this License. 152 | 153 | 8. Limitation of Liability. In no event and under no legal theory, 154 | whether in tort (including negligence), contract, or otherwise, 155 | unless required by applicable law (such as deliberate and grossly 156 | negligent acts) or agreed to in writing, shall any Contributor be 157 | liable to You for damages, including any direct, indirect, special, 158 | incidental, or consequential damages of any character arising as a 159 | result of this License or out of the use or inability to use the 160 | Work (including but not limited to damages for loss of goodwill, 161 | work stoppage, computer failure or malfunction, or any and all 162 | other commercial damages or losses), even if such Contributor 163 | has been advised of the possibility of such damages. 164 | 165 | 9. Accepting Warranty or Additional Liability. While redistributing 166 | the Work or Derivative Works thereof, You may choose to offer, 167 | and charge a fee for, acceptance of support, warranty, indemnity, 168 | or other liability obligations and/or rights consistent with this 169 | License. However, in accepting such obligations, You may act only 170 | on Your own behalf and on Your sole responsibility, not on behalf 171 | of any other Contributor, and only if You agree to indemnify, 172 | defend, and hold each Contributor harmless for any liability 173 | incurred by, or claims asserted against, such Contributor by reason 174 | of your accepting any such warranty or additional liability. 175 | 176 | END OF TERMS AND CONDITIONS 177 | 178 | APPENDIX: How to apply the Apache License to your work. 179 | 180 | To apply the Apache License to your work, attach the following 181 | boilerplate notice, with the fields enclosed by brackets "[]" 182 | replaced with your own identifying information. (Don't include 183 | the brackets!) The text should be enclosed in the appropriate 184 | comment syntax for the file format. We also recommend that a 185 | file or class name and description of purpose be included on the 186 | same "printed page" as the copyright notice for easier 187 | identification within third-party archives. 188 | 189 | Copyright [yyyy] [name of copyright owner] 190 | 191 | Licensed under the Apache License, Version 2.0 (the "License"); 192 | you may not use this file except in compliance with the License. 193 | You may obtain a copy of the License at 194 | 195 | http://www.apache.org/licenses/LICENSE-2.0 196 | 197 | Unless required by applicable law or agreed to in writing, software 198 | distributed under the License is distributed on an "AS IS" BASIS, 199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 200 | See the License for the specific language governing permissions and 201 | limitations under the License. 
202 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LMDeploy-Jetson Community
2 | 
3 | ***Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.***
4 | 
5 | [[中文]](./README_zh.md) | [[English]](./README.md)
6 | 
7 | This project focuses on adapting [LMDeploy](https://github.com/InternLM/lmdeploy) for use with NVIDIA Jetson series edge computing cards, facilitating the implementation of [InternLM](https://github.com/InternLM/InternLM) series LLMs for **Offline Embodied Intelligence (OEI)**.
8 | 
9 | ## Latest News🎉
10 | 
11 | * [2024/3/15] Updated support for [LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5).
12 | * [2024/2/26] This project has been included in the [LMDeploy](https://github.com/InternLM/lmdeploy) community.
13 | 
14 | ## Community Recruitment
15 | 
16 | * Recruiting community managers (Contact: an.hongjun@foxmail.com)
17 | * Recruiting benchmark results for more Jetson board models (please submit a PR directly), such as:
18 | * Jetson Nano
19 | * Jetson TX2
20 | * Jetson AGX Xavier
21 | * Jetson Orin Nano
22 | * Jetson AGX Orin
23 | * Recruiting developers to create Jetson-specific whl distributions
24 | * README optimization, etc.
25 | 
26 | ## Verified model/platform
27 | 
28 | * ✅:Verified and runnable
29 | * ❌:Verified but not runnable
30 | * ⭕️:Pending verification
31 | 
32 | |Models|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B|
33 | |:-:|:-:|:-:|:-:|:-:|:-:|
34 | |Orin AGX(32G)&lt;br&gt;
Jetpack 5.1|✅
Mem:??/??
*14.68 token/s*|✅
Mem:??/??
*5.82 token/s*|✅
Mem:??/??
*56.57 token/s*|✅
Mem:??/??
*14.56 token/s*|✅
Mem:??/??
*6.16 token/s*| 35 | |Orin NX(16G)
Jetpack 5.1|✅
Mem:8.6G/16G
*7.39 token/s*|✅
Mem:14.7G/16G
*3.08 token/s*|✅
Mem:5.6G/16G
*22.96 token/s*|✅
Mem:9.2G/16G
*7.48 token/s*|✅
Mem:14.8G/16G
*3.19 token/s*| 36 | |Xavier NX(8G)
Jetpack 5.1|❌|❌|✅
Mem:4.35G/8G
*28.36 token/s*|❌|❌| 37 | 38 | 39 | **If you have more Jetson series boards, feel free to run benchmarks and submit the results via `Pull Requests` (PR) to become one of the community contributors!** 40 | 41 | 42 | ## Future Work 43 | 44 | * Updating benchmark testing data for more models of Jetson boards. 45 | * Creating Jetson-specific whl distributions. 46 | * Following up on updates to the LMDeploy version. 47 | 48 | ## Tutorial 49 | 50 | [S1.Quantize on server by W4A16](./en/s1.md) 51 | 52 | [S2.Install Miniconda on Jetson](./en/s2.md) 53 | 54 | [S3.Install CMake-3.29.0 on Jetson](./en/s3.md) 55 | 56 | [S4.Install RapidJson on Jetson](./en/s4.md) 57 | 58 | [S5.Install Pytorch-2.1.0 on Jetson](./en/s5.md) 59 | 60 | [S6.Port LMDeploy-0.2.5 to Jetson](./en/s6.md) 61 | 62 | [S7.Run InternLM offline on Jetson](./en/s7.md) 63 | 64 | ## Appendix 65 | 66 | * [Reinstall Jetpack for Jetson](https://www.anhongjun.top/blogs.php?id=1) 67 | * [Test Benchmark of LMDeploy-Jetson](./en/benchmark.md) 68 | 69 | ## Community Projects 70 | 71 | * InternDog: Offline embodied intelligent guide dog based on the InternLM2. [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm) 72 | 73 | ## Citation 74 | 75 | If this project is helpful to your work, please cite it using the following format: 76 | 77 | ```bibtex 78 | @misc{2024lmdeployjetson, 79 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence}, 80 | author={LMDeploy-Jetson Community}, 81 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson}, 82 | year={2024} 83 | } 84 | ``` 85 | 86 | ## Acknowledgements 87 | 88 | * [InternLM Practical Camp](https://github.com/InternLM/tutorial/) 89 | * [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/) 90 | -------------------------------------------------------------------------------- /README_zh.md: -------------------------------------------------------------------------------- 1 | # LMDeploy-Jetson社区 2 | 3 | ***在NVIDIA Jetson平台离线部署大模型,开启离线具身智能新纪元。*** 4 | 5 | [[中文]](./README_zh.md) | [[English]](./README.md) 6 | 7 | 本项目提供一种将[LMDeploy](https://github.com/InternLM/lmdeploy)移植到NVIDIA Jetson系列边缘计算卡的方法,并在Jetson计算卡上运行[InternLM](https://github.com/InternLM/InternLM)系列大模型,为**离线具身智能**提供可能。 8 | 9 | ## 最新新闻🎉 10 | 11 | * [2024/3/15] 更新了对[LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5)。 12 | * [2024/2/26] 本项目被[LMDeploy](https://github.com/InternLM/lmdeploy)官方社区收录。 13 | 14 | ## 社区招募 15 | 16 | * 招募社区管理员(联系方式,an.hongjun@foxmail.com) 17 | * 招募更多型号Jetson板卡的Benchmark测试数据,可直接PR,如: 18 | * Jetson Nano 19 | * Jetson TX2 20 | * Jetson AGX Xavier 21 | * Jetson Orin Nano 22 | * Jetson AGX Orin 23 | * 招募开发者制作Jetson专用whl发行版 24 | * README优化等 25 | 26 | ## 已验证模型/平台 27 | 28 | * ✅:已验证可运行 29 | * ❌:已验证不可运行 30 | * ⭕️:待验证 31 | 32 | |Models|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B| 33 | |:-:|:-:|:-:|:-:|:-:|:-:| 34 | |Orin AGX(32G)
Jetpack 5.1|✅
Mem:??/??
*14.68 token/s*|✅
Mem:??/??
*5.82 token/s*|✅
Mem:??/??
*56.57 token/s*|✅
Mem:??/??
*14.56 token/s*|✅
Mem:??/??
*6.16 token/s*| 35 | |Orin NX(16G)
Jetpack 5.1|✅
Mem:8.6G/16G
*7.39 token/s*|✅
Mem:14.7G/16G
*3.08 token/s*|✅
Mem:5.6G/16G
*22.96 token/s*|✅
Mem:9.2G/16G
*7.48 token/s*|✅
Mem:14.8G/16G
*3.19 token/s*| 36 | |Xavier NX(8G)
Jetpack 5.1|❌|❌|✅
Mem:4.35G/8G
*28.36 token/s*|❌|❌| 37 | 38 | **如果您有更多Jetson系列板卡,欢迎运行Benchmark并通过`Pull requests`(PR)提交结果,成为社区贡献者之一!** 39 | 40 | ## 未来工作 41 | * 更新更多型号Jetson板卡的Benchmark测试数据 42 | * 制作Jetson专用whl发行版 43 | * 跟进更新版本的LMDeploy 44 | 45 | ## 部署教程 46 | 47 | [S1.服务器端模型W4A16量化](./zh/s1.md) 48 | 49 | [S2.Jetson端安装Miniconda](./zh/s2.md) 50 | 51 | [S3.Jetson端安装CMake-3.29.0](./zh/s3.md) 52 | 53 | [S4.Jetson端安装RapidJson](./zh/s4.md) 54 | 55 | [S5.Jetson端安装Pytorch-2.1.0](./zh/s5.md) 56 | 57 | [S6.Jetson端移植LMDeploy-0.2.5](./zh/s6.md) 58 | 59 | [S7.Jetson端离线运行InternLM大模型](./zh/s7.md) 60 | 61 | ## 附录 62 | 63 | * [为Jetson重装Jetpack](https://www.anhongjun.top/blogs.php?id=1) 64 | * [LMDeploy-Jetson基准测试](./zh/benchmark.md) 65 | 66 | ## 社区项目 67 | 68 | * InternDog: 基于InternLM2大模型的离线具身智能导盲犬 [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm) 69 | 70 | ## 引用 71 | 72 | 如果本项目对您的工作有所帮助,请使用以下格式引用: 73 | 74 | ```bibtex 75 | @misc{2024lmdeployjetson, 76 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence}, 77 | author={LMDeploy-Jetson Community}, 78 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson}, 79 | year={2024} 80 | } 81 | ``` 82 | 83 | ## 致谢 84 | 85 | * [书生·浦语大模型实战营](https://github.com/InternLM/tutorial/) 86 | * [上海人工智能实验室](https://www.shlab.org.cn/) 87 | -------------------------------------------------------------------------------- /attach/benchmark.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/benchmark.png -------------------------------------------------------------------------------- /attach/cli.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/cli.jpg -------------------------------------------------------------------------------- /attach/python.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/python.jpg -------------------------------------------------------------------------------- /en/benchmark.md: -------------------------------------------------------------------------------- 1 | # Test Benchmark of LMDeploy-Jetson 2 | 3 | Please first refer to S2-S7 for deploying LMDeploy in Jetson. 4 | 5 | Activate your conda environment. 6 | 7 | ```sh 8 | conda activate lmdeploy 9 | ``` 10 | 11 | Enter the `lmdeploy/benchmark` directory. 12 | 13 | ```sh 14 | cd ~/lmdeploy/benchmark 15 | ``` 16 | 17 | Run Benchmark. 18 | 19 | ```sh 20 | python profile_generation.py \ 21 | /internlm2-chat-1_8b-turbomind \ 22 | --concurrency 1 \ 23 | --prompt-tokens 128 \ 24 | --completion-tokens 128 25 | ``` 26 | 27 | Replace `internlm2 chat-1_8b turbomind` with your model path. 28 | 29 | Record the speed benchmark. 30 | 31 | ![](../attach/benchmark.png) 32 | 33 | During the inference process, the unified memory usage can be viewed through the `htop` command. 34 | 35 | ```sh 36 | # Install htop (if already installed, please ignore) 37 | apt-get install htop 38 | 39 | # Run htop to check the usage of Mem. 
40 | htop
41 | ```
42 | 
--------------------------------------------------------------------------------
/en/s1.md:
--------------------------------------------------------------------------------
1 | # S1.Quantize on server by W4A16
2 | 
3 | LLMs occupy a large amount of GPU memory during inference. We can use the LMDeploy tool to quantize the model to [W4A16](https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w4a16.md) format and convert it into a [TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/turbomind.md) model. This significantly reduces GPU memory usage, enabling the deployment of LLMs on the Jetson edge computing platform.
4 | 
5 | ### 1.Setup conda environment
6 | 
7 | The installation method for Anaconda is omitted.
8 | 
9 | Set up the conda environment:
10 | 
11 | ```sh
12 | conda create -n lmdeploy python=3.10
13 | ```
14 | 
15 | Activate the conda environment:
16 | 
17 | ```sh
18 | conda activate lmdeploy
19 | ```
20 | 
21 | ### 2.Install LMDeploy
22 | 
23 | Install lmdeploy with pip.
24 | 
25 | ```sh
26 | # ref:https://github.com/InternLM/lmdeploy/issues/1169
27 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
28 | pip install lmdeploy[all]==0.2.3
29 | ```
30 | 
31 | ### 3.Download HF model
32 | 
33 | ```sh
34 | mkdir -p ~/models && cd ~/models
35 | ```
36 | 
37 | Install dependencies.
38 | 
39 | ```sh
40 | pip install modelscope
41 | ```
42 | 
43 | Create file `download_models.py`:
44 | 
45 | ```py
46 | from modelscope import snapshot_download
47 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b')
48 | print(model_dir)
49 | ```
50 | 
51 | > `internlm-chat-7b` can be replaced with different models, such as `internlm-chat-20b`, `internlm2-chat-1_8b`, `internlm2-chat-7b`, `internlm2-chat-20b`.
52 | 
53 | Run the download script.
54 | 
55 | ```sh
56 | python download_models.py
57 | ```
58 | 
59 | The path printed at the end is where the model is saved. Please make a note of it.
60 | 
61 | ```sh
62 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
63 | ```
64 | 
65 | ### 4.Quantize model by W4A16
66 | 
67 | ```sh
68 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
69 | export WORK_DIR=./internlm-chat-7b-4bit
70 | 
71 | lmdeploy lite auto_awq \
72 | $HF_MODEL \
73 | --calib-dataset 'ptb' \
74 | --calib-samples 128 \
75 | --calib-seqlen 2048 \
76 | --w-bits 4 \
77 | --w-group-size 128 \
78 | --work-dir $WORK_DIR
79 | ```
80 | 
81 | Convert model format.
82 | 
83 | ```sh
84 | export TM_DIR=./internlm-chat-7b-turbomind
85 | 
86 | lmdeploy convert internlm-chat-7b \
87 | $WORK_DIR \
88 | --model-format awq \
89 | --group-size 128 \
90 | --dst-path $TM_DIR
91 | ```
92 | 
93 | Open the configuration file `internlm-chat-7b-turbomind/triton_models/weights/config.ini` and modify the following three lines.
94 | 
95 | ```ini
96 | cache_max_entry_count = 0.5
97 | cache_block_seq_len = 128
98 | cache_chunk_size = 1
99 | ```
100 | 
101 | Compress the TurboMind model.
102 | 
103 | ```sh
104 | apt-get install pigz # Multi-threaded compression for speed
105 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind
106 | ```
107 | 
108 | Keep `internlm-chat-7b-turbomind.tgz`; it will be used later.
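If you prefer to script the `config.ini` changes above instead of editing the file by hand, the following is a minimal sketch (an example only, assuming the three keys already exist in the file and the paths used in this section):

```sh
# Sketch: apply the three cache settings from this section to config.ini in place.
# Adjust CFG if your TurboMind output directory differs.
CFG=./internlm-chat-7b-turbomind/triton_models/weights/config.ini

sed -i 's/^cache_max_entry_count.*/cache_max_entry_count = 0.5/' "$CFG"
sed -i 's/^cache_block_seq_len.*/cache_block_seq_len = 128/' "$CFG"
sed -i 's/^cache_chunk_size.*/cache_chunk_size = 1/' "$CFG"

# Verify the edits before compressing the model.
grep -E '^cache_(max_entry_count|block_seq_len|chunk_size)' "$CFG"
```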
--------------------------------------------------------------------------------
/en/s2.md:
--------------------------------------------------------------------------------
1 | # S2.Install Miniconda on Jetson
2 | 
3 | To manage Python packages more conveniently, we use a Conda virtual environment. Because Anaconda is quite large, we opt for Miniconda instead.
4 | 
5 | Create the Miniconda installation directory.
6 | 
7 | ```sh
8 | mkdir -p ~/miniconda3
9 | ```
10 | 
11 | Download the Miniconda installation package:
12 | 
13 | [[Download by Browser]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh)
14 | 
15 | ```sh
16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
17 | ```
18 | 
19 | Install Miniconda:
20 | 
21 | ```sh
22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
23 | ```
24 | 
25 | Delete the installation package:
26 | 
27 | ```sh
28 | rm -rf ~/miniconda3/miniconda.sh
29 | ```
30 | 
31 | Initialize the bash configuration:
32 | 
33 | ```sh
34 | ~/miniconda3/bin/conda init bash
35 | source ~/.bashrc
36 | ```
37 | 
38 | Create the Conda environment:
39 | 
40 | ```sh
41 | conda create -n lmdeploy python=3.8 # Use Python 3.8; otherwise the PyTorch wheel installed later will be incompatible.
42 | ```
43 | 
--------------------------------------------------------------------------------
/en/s3.md:
--------------------------------------------------------------------------------
1 | # S3.Install CMake-3.29.0 on Jetson
2 | 
3 | The pre-installed CMake version in Jetpack is too low. We need to upgrade CMake in order to compile and install LMDeploy.
4 | 
5 | To avoid the "butterfly effect" caused by changing the system CMake version, this tutorial does not install the newer CMake into the system directory. Instead, it temporarily selects the newer CMake through the `export` environment variable when needed.
6 | 
7 | Download cmake-3.29.0-rc1:
8 | 
9 | [[Download by Browser]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz)
10 | 
11 | ```sh
12 | cd ~
13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz
14 | ```
15 | 
16 | Extract the archive:
17 | 
18 | ```sh
19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz
20 | ```
21 | 
22 | Delete the archive:
23 | 
24 | ```sh
25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz
26 | ```
27 | 
28 | Rename the folder:
29 | 
30 | ```sh
31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0
32 | cd cmake-3.29.0
33 | ```
34 | 
35 | Verify that it runs correctly:
36 | 
37 | ```sh
38 | ./bin/cmake --version
39 | ```
40 | 
41 | Temporarily select CMake 3.29.0:
42 | 
43 | ```sh
44 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH
45 | ```
46 | 
47 | **Note**: Please replace `nvidia` in the above command with your username.
48 | 
49 | Check that the version has changed to 3.29.0:
50 | 
51 | ```sh
52 | cmake --version
53 | ```
54 | 
55 | **Note**: The newer CMake only takes effect in the current terminal session. You need to re-run the `export` command in each new terminal to use it.
56 | 
--------------------------------------------------------------------------------
/en/s4.md:
--------------------------------------------------------------------------------
1 | # S4.Install RapidJson on Jetson
2 | 
3 | Clone the RapidJson repository.
4 | 
5 | ```sh
6 | cd ~
7 | git clone https://github.com/Tencent/rapidjson.git
8 | 
9 | # For Chinese users
10 | # git clone https://gitee.com/Tencent/RapidJSON.git
11 | ```
12 | 
13 | Enter the repository directory.
14 | 
15 | ```sh
16 | cd rapidjson
17 | 
18 | # For Chinese users
19 | # cd RapidJSON
20 | ```
21 | 
22 | Compile RapidJson.
23 | 
24 | ```sh
25 | mkdir build && cd build
26 | cmake .. \
27 | -DRAPIDJSON_BUILD_DOC=OFF \
28 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \
29 | -DRAPIDJSON_BUILD_TESTS=OFF
30 | make -j4
31 | ```
32 | 
33 | Install RapidJson to the system.
34 | 
35 | ```sh
36 | sudo make install
37 | ```
--------------------------------------------------------------------------------
/en/s5.md:
--------------------------------------------------------------------------------
1 | # S5.Install Pytorch-2.1.0 on Jetson
2 | 
3 | Download PyTorch v2.1.0.
4 | 
5 | ```sh
6 | cd ~
7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38
8 | # For JetPack 5, execute the following command. For JetPack 6, please download the corresponding PyTorch 2.1 from the official website.
9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
10 | ```
11 | 
12 | [[Download Pytorch by Browser]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl)
13 | 
14 | Install dependencies.
15 | 
16 | ```sh
17 | sudo apt-get install libopenblas-dev
18 | ```
19 | 
20 | > **For Chinese users**: \
21 | > Switch to the Tsinghua apt mirror: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \
22 | > Please choose `Ubuntu 20.04 LTS (focal)`.
23 | 
24 | Activate the Conda environment.
25 | 
26 | ```sh
27 | conda activate lmdeploy
28 | ```
29 | 
30 | Install PyTorch v2.1.0.
31 | 
32 | ```sh
33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
34 | ```
35 | 
36 | > **For Chinese users**: \
37 | > Switch to the Tsinghua PyPI mirror:
38 | 
39 | ```sh
40 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
41 | ```
42 | 
43 | Verify the installation in the Python interpreter; the second statement should print `True`.
44 | 
45 | ```py
46 | import torch
47 | torch.cuda.is_available()
48 | ```
--------------------------------------------------------------------------------
/en/s6.md:
--------------------------------------------------------------------------------
1 | # S6.Port LMDeploy-0.2.5 to Jetson
2 | 
3 | Download lmdeploy-v0.2.5:
4 | 
5 | [[Download by Browser]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip)
6 | 
7 | ```sh
8 | cd ~/
9 | git clone https://github.com/InternLM/lmdeploy.git
10 | cd lmdeploy
11 | git checkout c5f4014 # make sure this is the v0.2.5 release
12 | ```
13 | 
14 | Activate the conda environment:
15 | 
16 | ```sh
17 | conda activate lmdeploy
18 | ```
19 | 
20 | Install the build dependencies.
21 | 
22 | ```sh
23 | cd ~/lmdeploy
24 | pip install -r requirements/build.txt
25 | ```
26 | 
27 | Create a new file named `generate_jetson.sh` under `~/lmdeploy`, and fill in the following content:
28 | 
29 | ```sh
30 | #!/bin/sh
31 | 
32 | builder="-G Ninja"
33 | 
34 | if [ "$1" = "make" ]; then
35 | builder=""
36 | fi
37 | 
38 | cmake ${builder} .. \
39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \
40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
41 | -DCMAKE_INSTALL_PREFIX=./install \
42 | -DBUILD_PY_FFI=ON \
43 | -DBUILD_MULTI_GPU=OFF \
44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \
45 | -DUSE_NVTX=ON
46 | 
47 | ```
48 | 
49 | Make the script executable.
50 | 
51 | ```sh
52 | chmod +x generate_jetson.sh
53 | ```
54 | 
55 | Install Ninja.
56 | 
57 | ```sh
58 | sudo apt-get install ninja-build
59 | ```
60 | 
61 | Create a new build folder.
62 | 
63 | ```sh
64 | cd ~/lmdeploy
65 | mkdir build && cd build
66 | ```
67 | 
68 | Compile LMDeploy.
69 | 
70 | ```sh
71 | ../generate_jetson.sh
72 | ninja install
73 | ```
74 | 
75 | During compilation, the system may run out of memory and the build may be `Killed`. You can expand the swap space as follows and then run `ninja install` again.
76 | 
77 | ```sh
78 | # Create a 6 GB swap file; adjust the size to fit your disk capacity
79 | sudo fallocate -l 6G /var/swapfile
80 | # Restrict file permissions
81 | sudo chmod 600 /var/swapfile
82 | # Format the swap file
83 | sudo mkswap /var/swapfile
84 | # Enable the swap file
85 | sudo swapon /var/swapfile
86 | # Enable the swap file automatically at boot
87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
88 | ```
89 | 
90 | **Attention**: Use vim to edit `requirements/runtime.txt`, and delete the lines containing `torch<=2.1.2,>=2.0.0` and `triton>=2.1.0,<2.2.0`.
91 | 
92 | **Note**: To simplify dependencies, we have removed `triton`. This also means that when deploying models with lmdeploy, they can only be invoked through the turbomind method, not through the API method.
93 | 
94 | Install lmdeploy-v0.2.5 locally.
95 | 
96 | ```sh
97 | cd ~/lmdeploy
98 | pip install -e .[serve]
99 | ```
--------------------------------------------------------------------------------
/en/s7.md:
--------------------------------------------------------------------------------
1 | # S7.Run InternLM offline on Jetson
2 | 
3 | Create a directory to save models.
4 | 
5 | ```sh
6 | mkdir -p ~/models
7 | ```
8 | 
9 | Upload the `internlm-chat-7b-turbomind.tgz` obtained from [S1.Quantize on server by W4A16](./s1.md) to the `models` directory.
10 | 
11 | Extract the model.
12 | 
13 | ```sh
14 | tar zxvf internlm-chat-7b-turbomind.tgz -C .
15 | ```
16 | 
17 | ### 0.Bug fix: Modify the MMEngine module.
18 | 
19 | The PyTorch version on Jetson does not support distributed reduce operations, which may cause errors in the distributed parts of the MMEngine module.
20 | 
21 | The error is:
22 | 
23 | ```sh
24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
25 | ```
26 | 
27 | Activate the conda environment:
28 | 
29 | ```sh
30 | conda activate lmdeploy
31 | ```
32 | 
33 | Run Python in interpreter mode:
34 | 
35 | ```sh
36 | python
37 | ```
38 | 
39 | Enter the following:
40 | 
41 | ```py
42 | import mmengine
43 | print(mmengine.__file__)
44 | ```
45 | 
46 | This prints the installation location of the MMEngine module. On the author's machine it is `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`, so the module directory is `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`. We refer to this directory as the MMEngine directory below.
47 | 
48 | Modify line 208 of `logging/logger.py` under the MMEngine directory.
49 | 
50 | ```git
51 | - global_rank = _get_rank()
52 | + global_rank = 0
53 | ```
54 | 
55 | After this change, the error no longer occurs at runtime.
56 | 
57 | **Attention**: This method is crude and is only suitable for inference deployment on the Jetson platform. It will affect distributed functionality on the server side!
58 | 
59 | ### 1.CLI Mode
60 | 
61 | Activate the conda environment:
62 | 
63 | ```sh
64 | conda activate lmdeploy
65 | ```
66 | 
67 | Run the model.
68 | 69 | ```sh 70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind 71 | ``` 72 | 73 | ![](../attach/cli.jpg) 74 | 75 | ### 2.Python Mode 76 | 77 | Write a running script `run_model.py` with the following content: 78 | 79 | ```py 80 | from lmdeploy import turbomind as tm 81 | 82 | 83 | if __name__ == "__main__": 84 | model_path = "./internlm-chat-7b-turbomind" # 修改成你的路径 85 | 86 | tm_model = tm.TurboMind.from_pretrained(model_path) 87 | generator = tm_model.create_instance() 88 | 89 | while True: 90 | inp = input("[User] >>> ") 91 | if inp == "exit": 92 | break 93 | prompt = tm_model.model.get_prompt(inp) 94 | input_ids = tm_model.tokenizer.encode(prompt) 95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]): 96 | res = outputs[1] 97 | response = tm_model.tokenizer.decode(res) 98 | print("[Bot] <<< {}".format(response)) 99 | 100 | ``` 101 | 102 | Activate conda environment: 103 | 104 | ```sh 105 | conda activate lmdeploy 106 | ``` 107 | 108 | Run the script: 109 | 110 | ```sh 111 | python run_model.py 112 | ``` 113 | 114 | ![](../attach/python.jpg) 115 | -------------------------------------------------------------------------------- /zh/benchmark.md: -------------------------------------------------------------------------------- 1 | # LMDeploy-Jetson基准测试 2 | 3 | 请首先参考S2-S7在Jetson部署LMDeploy。 4 | 5 | 激活conda环境。 6 | 7 | ```sh 8 | conda activate lmdeploy 9 | ``` 10 | 11 | 进入`lmdeploy/benchmark`目录。 12 | 13 | ```sh 14 | cd ~/lmdeploy/benchmark 15 | ``` 16 | 17 | 运行Benchmark。 18 | 19 | ```sh 20 | python profile_generation.py \ 21 | /internlm2-chat-1_8b-turbomind \ 22 | --concurrency 1 \ 23 | --prompt-tokens 128 \ 24 | --completion-tokens 128 25 | ``` 26 | 27 | 其中`internlm2-chat-1_8b-turbomind`更换为你的模型路径。 28 | 29 | 记录推理速度benchmark。 30 | 31 | ![](../attach/benchmark.png) 32 | 33 | 推理过程中,可通过`htop`命令查看统一内存占用情况。 34 | 35 | ```sh 36 | # htop安装方法(如已安装,请忽略) 37 | apt-get install htop 38 | 39 | # 运行htop,查看Mem 40 | htop 41 | ``` -------------------------------------------------------------------------------- /zh/s1.md: -------------------------------------------------------------------------------- 1 | # S1.服务器端模型W4A16量化 2 | 3 | 大模型推理时占用显存巨大,我们可以借助LMDeploy工具对模型进行[W4A16量化](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/quantization/w4a16.md),转换为[TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/turbomind.md)模型,这样在推理时可以极大减少显存占用,使得在Jetson边缘计算平台部署大模型成为可能。 4 | 5 | ### 1.创建conda环境 6 | 7 | 安装Anaconda方法略。 8 | 9 | 创建conda环境: 10 | 11 | ```sh 12 | conda create -n lmdeploy python=3.10 13 | ``` 14 | 15 | 激活conda环境: 16 | 17 | ```sh 18 | conda activate lmdeploy 19 | ``` 20 | 21 | ### 2.安装LMDeploy 22 | 23 | 使用pip方法安装lmdeploy。 24 | 25 | ```sh 26 | # 直接用pip装lmdeploy时安装的pytorch的cuda可能是12版本的,运行时会引发链接错误 27 | # ref:https://github.com/InternLM/lmdeploy/issues/1169 28 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118 29 | # 安装好torch-2.1.2-cu118后再安装lmdeploy 30 | pip install lmdeploy[all]==0.2.3 31 | ``` 32 | 33 | ### 3.下载HF模型权重文件 34 | 35 | 创建目录 36 | 37 | ```sh 38 | mkdir -p ~/models && cd ~/models 39 | ``` 40 | 41 | 安装依赖项。 42 | 43 | ```sh 44 | pip install modelscope 45 | ``` 46 | 47 | 创建Python文件`download_models.py`: 48 | 49 | ```py 50 | #模型下载 51 | from modelscope import snapshot_download 52 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b') 53 | print(model_dir) 54 | ``` 55 | 56 | > 
其中,internlm-chat-7b可替换为不同的模型,比如`internlm-chat-20b`,`internlm2-chat-1_8b`,`internlm2-chat-7b`,`internlm2-chat-20b`,下同。 57 | 58 | 运行下载脚本: 59 | 60 | ```sh 61 | python download_models.py 62 | ``` 63 | 64 | 最后打印输出的路径就是模型保存的路径,请记录下。笔者为: 65 | 66 | ```sh 67 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b 68 | ``` 69 | 70 | ### 4.模型W4A16量化 71 | 72 | ```sh 73 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b 74 | export WORK_DIR=./internlm-chat-7b-4bit 75 | 76 | lmdeploy lite auto_awq \ 77 | $HF_MODEL \ 78 | --calib-dataset 'ptb' \ 79 | --calib-samples 128 \ 80 | --calib-seqlen 2048 \ 81 | --w-bits 4 \ 82 | --w-group-size 128 \ 83 | --work-dir $WORK_DIR 84 | ``` 85 | 86 | 转换模型格式。 87 | 88 | ```sh 89 | export TM_DIR=./internlm-chat-7b-turbomind 90 | 91 | lmdeploy convert internlm-chat-7b \ 92 | $WORK_DIR \ 93 | --model-format awq \ 94 | --group-size 128 \ 95 | --dst-path $TM_DIR 96 | ``` 97 | 98 | 修改配置文件`internlm-chat-7b-turbomind/triton_models/weights/config.ini`,修改如下三行: 99 | 100 | ```ini 101 | cache_max_entry_count = 0.5 102 | cache_block_seq_len = 128 103 | cache_chunk_size = 1 104 | ``` 105 | 106 | 压缩turbomind模型。 107 | 108 | ```sh 109 | apt-get install pigz # 多线程加速压缩 110 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind 111 | ``` 112 | 113 | 将`internlm-chat-7b-turbomind.tgz`保留备用。 -------------------------------------------------------------------------------- /zh/s2.md: -------------------------------------------------------------------------------- 1 | # S2.Jetson端安装Miniconda 2 | 3 | 在Jetson上安装conda虚拟环境方便管理python包。由于Anaconda过于庞大,因此选择安装Miniconda。 4 | 5 | 创建Miniconda安装目录。 6 | 7 | ```sh 8 | mkdir -p ~/miniconda3 9 | ``` 10 | 11 | 下载Miniconda安装包: 12 | 13 | [[网页下载]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh) 14 | 15 | ```sh 16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh 17 | ``` 18 | 19 | 安装Miniconda: 20 | 21 | ```sh 22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3 23 | ``` 24 | 25 | 删除安装包: 26 | 27 | ```sh 28 | rm -rf ~/miniconda3/miniconda.sh 29 | ``` 30 | 31 | 初始化bash配置: 32 | 33 | ```sh 34 | ~/miniconda3/bin/conda init bash 35 | source ~/.bashrc 36 | ``` 37 | 38 | 创建conda环境: 39 | 40 | ```sh 41 | conda create -n lmdeploy python=3.8 # 请安装py38,否则安装pytorch会出现不兼容的问题。 42 | ``` 43 | -------------------------------------------------------------------------------- /zh/s3.md: -------------------------------------------------------------------------------- 1 | # S3.Jetson端安装CMake-3.29.0 2 | 3 | Jetpack预装的CMake版本太低了,我们需要升级一下CMake版本才能编译安装LMDeploy。 4 | 5 | 为了避免CMake版本变化引起“蝴蝶效应”,本教程采取的方法不会将高版本的CMake安装到系统目录,而是在使用时临时通过`export`环境变量的方式选用高版本的CMake。 6 | 7 | 下载cmake-3.29.0-rc1: 8 | 9 | [[网页下载]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz) 10 | 11 | ```sh 12 | cd ~ 13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz 14 | ``` 15 | 16 | 解压压缩包: 17 | 18 | ```sh 19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz 20 | ``` 21 | 22 | 删除压缩包: 23 | 24 | ```sh 25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz 26 | ``` 27 | 28 | 重命名文件夹: 29 | 30 | ```sh 31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0 32 | cd cmake-3.29.0 33 | mv cmake-3.29.0-rc1-linux-aarch64/* . 
34 | rm -rf cmake-3.29.0-rc1-linux-aarch64 35 | ``` 36 | 37 | 检验能否正常运行: 38 | 39 | ```sh 40 | ./bin/cmake --version 41 | ``` 42 | 43 | 临时指定cmake版本为3.29.0: 44 | 45 | ```sh 46 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH 47 | ``` 48 | 49 | **注意**:请将上述命令中`nvidia`替换为你的用户名。 50 | 51 | 查看版本是否变成了3.29.0: 52 | 53 | ```sh 54 | cmake --version 55 | ``` 56 | 57 | **注意**:高版本CMake只对当前终端生效。当你打开新的终端时,需要重新运行`export`才能使用高版本的CMake。 -------------------------------------------------------------------------------- /zh/s4.md: -------------------------------------------------------------------------------- 1 | # S4.Jetson端安装RapidJson 2 | 3 | 克隆RapidJson仓库。 4 | 5 | ```sh 6 | cd ~ 7 | git clone https://github.com/Tencent/rapidjson.git 8 | 9 | # 对于中国用户: 10 | # git clone https://gitee.com/Tencent/RapidJSON.git 11 | ``` 12 | 13 | 初始化子模块: 14 | 15 | ```sh 16 | cd rapidjson 17 | 18 | # 对于中国用户: 19 | # cd RapidJSON 20 | ``` 21 | 22 | 编译RapidJson: 23 | ```sh 24 | mkdir build && cd build 25 | cmake .. \ 26 | -DRAPIDJSON_BUILD_DOC=OFF \ 27 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \ 28 | -DRAPIDJSON_BUILD_TESTS=OFF 29 | make -j4 30 | ``` 31 | 32 | 将RapidJson安装到系统: 33 | 34 | ```sh 35 | sudo make install # 安装到系统 36 | ``` -------------------------------------------------------------------------------- /zh/s5.md: -------------------------------------------------------------------------------- 1 | # S5.Jetson端安装Pytorch-2.1.0 2 | 3 | 下载pytorch v2.1.0 4 | 5 | ```sh 6 | cd ~ 7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38 8 | # Jetpack 5执行如下指令,JetPack 6请到官方网站下载对应的pytorch2.1 9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl 10 | ``` 11 | 12 | [[网页下载Pytorch]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl) 13 | 14 | 安装依赖项 15 | 16 | ```sh 17 | sudo apt-get install libopenblas-dev 18 | ``` 19 | 20 | > **对于中国用户**: \ 21 | > 更换清华源: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \ 22 | > 请选择 `Ubuntu 20.04 LTS (focal)`。 23 | 24 | 激活conda环境 25 | 26 | ```sh 27 | conda activate lmdeploy 28 | ``` 29 | 30 | 安装pytorch-v2.1.0 31 | 32 | ```sh 33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl 34 | ``` 35 | 36 | > **对于中国用户**: \ 37 | > 更换清华源: 38 | ```sh 39 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple 40 | ``` 41 | 42 | 使用python解释器模式验证是否安装成功,正常显示未True 43 | 44 | ```py 45 | import torch 46 | torch.cuda.is_available() 47 | ``` -------------------------------------------------------------------------------- /zh/s6.md: -------------------------------------------------------------------------------- 1 | # S6.Jetson端移植LMDeploy-0.2.5 2 | 3 | 下载lmdeploy-v0.2.5: 4 | 5 | [[网页下载]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip) 6 | 7 | ```sh 8 | cd ~/ 9 | git clone https://github.com/InternLM/lmdeploy.git 10 | cd lmdeploy 11 | git checkout c5f4014 # 确保为0.2.5版本 12 | ``` 13 | 14 | 进入conda环境: 15 | 16 | ```sh 17 | conda activate lmdeploy 18 | ``` 19 | 20 | 安装编译依赖项: 21 | 22 | ```sh 23 | cd ~/lmdeploy 24 | pip install -r requirements/build.txt 25 | ``` 26 | 27 | 在`~/lmdeploy`下新建`generate_jetson.sh`,填入以下内容: 28 | 29 | ```sh 30 | #!/bin/sh 31 | 32 | builder="-G Ninja" 33 | 34 | if [ "$1" == "make" ]; then 35 | builder="" 36 | fi 37 | 38 | cmake ${builder} .. 
\ 39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \ 40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \ 41 | -DCMAKE_INSTALL_PREFIX=./install \ 42 | -DBUILD_PY_FFI=ON \ 43 | -DBUILD_MULTI_GPU=OFF \ 44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \ 45 | -DUSE_NVTX=ON 46 | 47 | ``` 48 | 49 | 赋予权限。 50 | 51 | ```sh 52 | chmod +x generate_jetson.sh 53 | ``` 54 | 55 | 安装Ninja。 56 | 57 | ```sh 58 | sudo apt-get install ninja-build 59 | ``` 60 | 61 | 新建编译文件夹: 62 | 63 | ```sh 64 | cd ~/lmdeploy 65 | mkdir build && cd build 66 | ``` 67 | 68 | 编译LMDeploy: 69 | 70 | ```sh 71 | ../generate_jetson.sh 72 | ninja install 73 | ``` 74 | 75 | 编译过程中可能出现内存不足而导致`Killed`的现象。可以通过如下方式扩大交换区容量,再重新执行`ninja install`。 76 | 77 | ```sh 78 | # 新建6G大小的交换区,大小可自定义,结合硬盘容量 79 | sudo fallocate -l 6G /var/swapfile 80 | # 赋予权限 81 | sudo chmod 600 /var/swapfile 82 | # 建立交换区 83 | sudo mkswap /var/swapfile 84 | # 启用交换区 85 | sudo swapon /var/swapfile 86 | # 设置自动启用交换区 87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab' 88 | ``` 89 | 90 | **注意**:使用vim修改`requirements/runtime.txt`,将`torch<=2.1.2,>=2.0.0`和`triton>=2.1.0,<2.2.0`两行行删除。 91 | 92 | **提示**:为了简化依赖,我们去除了`triton`,这也同时意味着使用lmdeploy部署模型时,只能通过turbomind方式调用,而不能通过api方式。 93 | 94 | 本地安装lmdeploy-v0.2.5 95 | 96 | ```sh 97 | cd ~/lmdeploy 98 | pip install -e .[serve] 99 | ``` -------------------------------------------------------------------------------- /zh/s7.md: -------------------------------------------------------------------------------- 1 | # S7.Jetson端离线运行InternLM大模型 2 | 3 | 创建模型保存目录: 4 | 5 | ```sh 6 | mkdir -p ~/models 7 | ``` 8 | 9 | 将[S1.服务器端模型W4A16量化](./s1.md)得到的`internlm-chat-7b-turbomind.tgz`上传到`models`目录下。 10 | 11 | 解压模型文件: 12 | 13 | ```sh 14 | tar zxvf internlm-chat-7b-turbomind.tgz -C . 15 | ``` 16 | 17 | ### 0.Bug解决:修改MMEngine库 18 | 19 | Jetson端的pytorch不支持分布式的reduce算子,这会导致MMEngine库中与分布式有关的部分出现错误。 20 | 21 | 错误为: 22 | 23 | ```sh 24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp' 25 | ``` 26 | 27 | 激活conda环境: 28 | 29 | ```sh 30 | conda activate lmdeploy 31 | ``` 32 | 33 | 用解释器方式运行python: 34 | 35 | ```sh 36 | python 37 | ``` 38 | 39 | 输入如下内容: 40 | 41 | ```py 42 | import mmengine 43 | print(mmengine.__file__) 44 | ``` 45 | 46 | 这就输出了MMEngine库的安装位置,笔者的是`/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`,那么相应位置就是`home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`,咱们用``代替。 47 | 48 | 修改`/logging/logger.py`第208行: 49 | 50 | ```git 51 | - global_rank = _get_rank() 52 | + global_rank = 0 53 | ``` 54 | 55 | 在运行就不会报错了。 56 | 57 | **注意**:该方式过于粗暴,仅适用于Jetson平台部署推理,在服务器端会影响分布式功能! 
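如果不想手动查找并编辑该文件,也可以参考下面的示意脚本自动完成上述修改(仅为示例:假设已激活 `lmdeploy` 环境,且 `logger.py` 中确实存在上文所示的 `global_rank = _get_rank()` 一行):

```sh
# 示意:自动定位 MMEngine 安装目录,备份并修改 logger.py
MMENGINE_DIR=$(python -c "import mmengine, os; print(os.path.dirname(mmengine.__file__))")
LOGGER="$MMENGINE_DIR/logging/logger.py"

# 先备份,便于之后在服务器端恢复分布式功能
cp "$LOGGER" "$LOGGER.bak"

# 将 global_rank = _get_rank() 替换为 global_rank = 0
sed -i 's/global_rank = _get_rank()/global_rank = 0/' "$LOGGER"
```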
58 | 59 | ### 1.终端运行 60 | 61 | 激活conda环境: 62 | 63 | ```sh 64 | conda activate lmdeploy 65 | ``` 66 | 67 | 运行模型: 68 | 69 | ```sh 70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind 71 | ``` 72 | 73 | ![](../attach/cli.jpg) 74 | 75 | ### 2.Python集成运行 76 | 77 | 编写运行脚本`run_model.py`,内容如下: 78 | 79 | ```py 80 | from lmdeploy import turbomind as tm 81 | 82 | 83 | if __name__ == "__main__": 84 | model_path = "./internlm-chat-7b-turbomind" # 修改成你的路径 85 | 86 | tm_model = tm.TurboMind.from_pretrained(model_path) 87 | generator = tm_model.create_instance() 88 | 89 | while True: 90 | inp = input("[User] >>> ") 91 | if inp == "exit": 92 | break 93 | prompt = tm_model.model.get_prompt(inp) 94 | input_ids = tm_model.tokenizer.encode(prompt) 95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]): 96 | res = outputs[1] 97 | response = tm_model.tokenizer.decode(res) 98 | print("[Bot] <<< {}".format(response)) 99 | 100 | ``` 101 | 102 | 激活conda环境: 103 | 104 | ```sh 105 | conda activate lmdeploy 106 | ``` 107 | 108 | 运行脚本: 109 | 110 | ```sh 111 | python run_model.py 112 | ``` 113 | 114 | ![](../attach/python.jpg) 115 | --------------------------------------------------------------------------------