├── .gitignore
├── LICENSE
├── README.md
├── README_zh.md
├── attach
│   ├── benchmark.png
│   ├── cli.jpg
│   └── python.jpg
├── en
│   ├── benchmark.md
│   ├── s1.md
│   ├── s2.md
│   ├── s3.md
│   ├── s4.md
│   ├── s5.md
│   ├── s6.md
│   └── s7.md
└── zh
    ├── benchmark.md
    ├── s1.md
    ├── s2.md
    ├── s3.md
    ├── s4.md
    ├── s5.md
    ├── s6.md
    └── s7.md
/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store
2 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Apache License
2 | Version 2.0, January 2004
3 | http://www.apache.org/licenses/
4 |
5 | TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION
6 |
7 | 1. Definitions.
8 |
9 | "License" shall mean the terms and conditions for use, reproduction,
10 | and distribution as defined by Sections 1 through 9 of this document.
11 |
12 | "Licensor" shall mean the copyright owner or entity authorized by
13 | the copyright owner that is granting the License.
14 |
15 | "Legal Entity" shall mean the union of the acting entity and all
16 | other entities that control, are controlled by, or are under common
17 | control with that entity. For the purposes of this definition,
18 | "control" means (i) the power, direct or indirect, to cause the
19 | direction or management of such entity, whether by contract or
20 | otherwise, or (ii) ownership of fifty percent (50%) or more of the
21 | outstanding shares, or (iii) beneficial ownership of such entity.
22 |
23 | "You" (or "Your") shall mean an individual or Legal Entity
24 | exercising permissions granted by this License.
25 |
26 | "Source" form shall mean the preferred form for making modifications,
27 | including but not limited to software source code, documentation
28 | source, and configuration files.
29 |
30 | "Object" form shall mean any form resulting from mechanical
31 | transformation or translation of a Source form, including but
32 | not limited to compiled object code, generated documentation,
33 | and conversions to other media types.
34 |
35 | "Work" shall mean the work of authorship, whether in Source or
36 | Object form, made available under the License, as indicated by a
37 | copyright notice that is included in or attached to the work
38 | (an example is provided in the Appendix below).
39 |
40 | "Derivative Works" shall mean any work, whether in Source or Object
41 | form, that is based on (or derived from) the Work and for which the
42 | editorial revisions, annotations, elaborations, or other modifications
43 | represent, as a whole, an original work of authorship. For the purposes
44 | of this License, Derivative Works shall not include works that remain
45 | separable from, or merely link (or bind by name) to the interfaces of,
46 | the Work and Derivative Works thereof.
47 |
48 | "Contribution" shall mean any work of authorship, including
49 | the original version of the Work and any modifications or additions
50 | to that Work or Derivative Works thereof, that is intentionally
51 | submitted to Licensor for inclusion in the Work by the copyright owner
52 | or by an individual or Legal Entity authorized to submit on behalf of
53 | the copyright owner. For the purposes of this definition, "submitted"
54 | means any form of electronic, verbal, or written communication sent
55 | to the Licensor or its representatives, including but not limited to
56 | communication on electronic mailing lists, source code control systems,
57 | and issue tracking systems that are managed by, or on behalf of, the
58 | Licensor for the purpose of discussing and improving the Work, but
59 | excluding communication that is conspicuously marked or otherwise
60 | designated in writing by the copyright owner as "Not a Contribution."
61 |
62 | "Contributor" shall mean Licensor and any individual or Legal Entity
63 | on behalf of whom a Contribution has been received by Licensor and
64 | subsequently incorporated within the Work.
65 |
66 | 2. Grant of Copyright License. Subject to the terms and conditions of
67 | this License, each Contributor hereby grants to You a perpetual,
68 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
69 | copyright license to reproduce, prepare Derivative Works of,
70 | publicly display, publicly perform, sublicense, and distribute the
71 | Work and such Derivative Works in Source or Object form.
72 |
73 | 3. Grant of Patent License. Subject to the terms and conditions of
74 | this License, each Contributor hereby grants to You a perpetual,
75 | worldwide, non-exclusive, no-charge, royalty-free, irrevocable
76 | (except as stated in this section) patent license to make, have made,
77 | use, offer to sell, sell, import, and otherwise transfer the Work,
78 | where such license applies only to those patent claims licensable
79 | by such Contributor that are necessarily infringed by their
80 | Contribution(s) alone or by combination of their Contribution(s)
81 | with the Work to which such Contribution(s) was submitted. If You
82 | institute patent litigation against any entity (including a
83 | cross-claim or counterclaim in a lawsuit) alleging that the Work
84 | or a Contribution incorporated within the Work constitutes direct
85 | or contributory patent infringement, then any patent licenses
86 | granted to You under this License for that Work shall terminate
87 | as of the date such litigation is filed.
88 |
89 | 4. Redistribution. You may reproduce and distribute copies of the
90 | Work or Derivative Works thereof in any medium, with or without
91 | modifications, and in Source or Object form, provided that You
92 | meet the following conditions:
93 |
94 | (a) You must give any other recipients of the Work or
95 | Derivative Works a copy of this License; and
96 |
97 | (b) You must cause any modified files to carry prominent notices
98 | stating that You changed the files; and
99 |
100 | (c) You must retain, in the Source form of any Derivative Works
101 | that You distribute, all copyright, patent, trademark, and
102 | attribution notices from the Source form of the Work,
103 | excluding those notices that do not pertain to any part of
104 | the Derivative Works; and
105 |
106 | (d) If the Work includes a "NOTICE" text file as part of its
107 | distribution, then any Derivative Works that You distribute must
108 | include a readable copy of the attribution notices contained
109 | within such NOTICE file, excluding those notices that do not
110 | pertain to any part of the Derivative Works, in at least one
111 | of the following places: within a NOTICE text file distributed
112 | as part of the Derivative Works; within the Source form or
113 | documentation, if provided along with the Derivative Works; or,
114 | within a display generated by the Derivative Works, if and
115 | wherever such third-party notices normally appear. The contents
116 | of the NOTICE file are for informational purposes only and
117 | do not modify the License. You may add Your own attribution
118 | notices within Derivative Works that You distribute, alongside
119 | or as an addendum to the NOTICE text from the Work, provided
120 | that such additional attribution notices cannot be construed
121 | as modifying the License.
122 |
123 | You may add Your own copyright statement to Your modifications and
124 | may provide additional or different license terms and conditions
125 | for use, reproduction, or distribution of Your modifications, or
126 | for any such Derivative Works as a whole, provided Your use,
127 | reproduction, and distribution of the Work otherwise complies with
128 | the conditions stated in this License.
129 |
130 | 5. Submission of Contributions. Unless You explicitly state otherwise,
131 | any Contribution intentionally submitted for inclusion in the Work
132 | by You to the Licensor shall be under the terms and conditions of
133 | this License, without any additional terms or conditions.
134 | Notwithstanding the above, nothing herein shall supersede or modify
135 | the terms of any separate license agreement you may have executed
136 | with Licensor regarding such Contributions.
137 |
138 | 6. Trademarks. This License does not grant permission to use the trade
139 | names, trademarks, service marks, or product names of the Licensor,
140 | except as required for reasonable and customary use in describing the
141 | origin of the Work and reproducing the content of the NOTICE file.
142 |
143 | 7. Disclaimer of Warranty. Unless required by applicable law or
144 | agreed to in writing, Licensor provides the Work (and each
145 | Contributor provides its Contributions) on an "AS IS" BASIS,
146 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or
147 | implied, including, without limitation, any warranties or conditions
148 | of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A
149 | PARTICULAR PURPOSE. You are solely responsible for determining the
150 | appropriateness of using or redistributing the Work and assume any
151 | risks associated with Your exercise of permissions under this License.
152 |
153 | 8. Limitation of Liability. In no event and under no legal theory,
154 | whether in tort (including negligence), contract, or otherwise,
155 | unless required by applicable law (such as deliberate and grossly
156 | negligent acts) or agreed to in writing, shall any Contributor be
157 | liable to You for damages, including any direct, indirect, special,
158 | incidental, or consequential damages of any character arising as a
159 | result of this License or out of the use or inability to use the
160 | Work (including but not limited to damages for loss of goodwill,
161 | work stoppage, computer failure or malfunction, or any and all
162 | other commercial damages or losses), even if such Contributor
163 | has been advised of the possibility of such damages.
164 |
165 | 9. Accepting Warranty or Additional Liability. While redistributing
166 | the Work or Derivative Works thereof, You may choose to offer,
167 | and charge a fee for, acceptance of support, warranty, indemnity,
168 | or other liability obligations and/or rights consistent with this
169 | License. However, in accepting such obligations, You may act only
170 | on Your own behalf and on Your sole responsibility, not on behalf
171 | of any other Contributor, and only if You agree to indemnify,
172 | defend, and hold each Contributor harmless for any liability
173 | incurred by, or claims asserted against, such Contributor by reason
174 | of your accepting any such warranty or additional liability.
175 |
176 | END OF TERMS AND CONDITIONS
177 |
178 | APPENDIX: How to apply the Apache License to your work.
179 |
180 | To apply the Apache License to your work, attach the following
181 | boilerplate notice, with the fields enclosed by brackets "[]"
182 | replaced with your own identifying information. (Don't include
183 | the brackets!) The text should be enclosed in the appropriate
184 | comment syntax for the file format. We also recommend that a
185 | file or class name and description of purpose be included on the
186 | same "printed page" as the copyright notice for easier
187 | identification within third-party archives.
188 |
189 | Copyright [yyyy] [name of copyright owner]
190 |
191 | Licensed under the Apache License, Version 2.0 (the "License");
192 | you may not use this file except in compliance with the License.
193 | You may obtain a copy of the License at
194 |
195 | http://www.apache.org/licenses/LICENSE-2.0
196 |
197 | Unless required by applicable law or agreed to in writing, software
198 | distributed under the License is distributed on an "AS IS" BASIS,
199 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
200 | See the License for the specific language governing permissions and
201 | limitations under the License.
202 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LMDeploy-Jetson Community
2 |
3 | ***Deploying LLMs offline on the NVIDIA Jetson platform marks the dawn of a new era in embodied intelligence, where devices can function independently without continuous internet access.***
4 |
5 | [[中文]](./README_zh.md) | [[English]](./README.md)
6 |
7 | This project focuses on adapting [LMDeploy](https://github.com/InternLM/lmdeploy) for use with NVIDIA Jetson series edge computing cards, facilitating the implementation of [InternLM](https://github.com/InternLM/InternLM) series LLMs for **Offline Embodied Intelligence (OEI)**.
8 |
9 | ## Latest News🎉
10 |
11 | * [2024/3/15] Updated support for [LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5).
12 | * [2024/2/26] This project has been included in the [LMDeploy](https://github.com/InternLM/lmdeploy) community.
13 |
14 | ## Community Recruitment
15 |
16 | * Recruiting community managers (Contact: an.hongjun@foxmail.com)
17 | * Collecting benchmark data for more Jetson board models (feel free to submit a PR directly), such as:
18 | * Jetson Nano
19 | * Jetson TX2
20 | * Jetson AGX Xavier
21 | * Jetson Orin Nano
22 | * Jetson AGX Orin
23 | * Recruiting developers to create Jetson-specific whl distributions
24 | * README optimization, etc.
25 |
26 | ## Verified model/platform
27 |
28 | * ✅: Verified and runnable
29 | * ❌: Verified but not runnable
30 | * ⭕️: Pending verification
31 |
32 | |Platform \ Model|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B|
33 | |:-:|:-:|:-:|:-:|:-:|:-:|
34 | |Orin AGX(32G)<br>Jetpack 5.1|✅<br>Mem:??/??<br>*14.68 token/s*|✅<br>Mem:??/??<br>*5.82 token/s*|✅<br>Mem:??/??<br>*56.57 token/s*|✅<br>Mem:??/??<br>*14.56 token/s*|✅<br>Mem:??/??<br>*6.16 token/s*|
35 | |Orin NX(16G)<br>Jetpack 5.1|✅<br>Mem:8.6G/16G<br>*7.39 token/s*|✅<br>Mem:14.7G/16G<br>*3.08 token/s*|✅<br>Mem:5.6G/16G<br>*22.96 token/s*|✅<br>Mem:9.2G/16G<br>*7.48 token/s*|✅<br>Mem:14.8G/16G<br>*3.19 token/s*|
36 | |Xavier NX(8G)<br>Jetpack 5.1|❌|❌|✅<br>Mem:4.35G/8G<br>*28.36 token/s*|❌|❌|
37 |
38 |
39 | **If you have more Jetson series boards, feel free to run benchmarks and submit the results via `Pull Requests` (PR) to become one of the community contributors!**
40 |
41 |
42 | ## Future Work
43 |
44 | * Adding benchmark data for more Jetson board models.
45 | * Creating Jetson-specific whl distributions.
46 | * Following up on updates to the LMDeploy version.
47 |
48 | ## Tutorial
49 |
50 | [S1.Quantize on server by W4A16](./en/s1.md)
51 |
52 | [S2.Install Miniconda on Jetson](./en/s2.md)
53 |
54 | [S3.Install CMake-3.29.0 on Jetson](./en/s3.md)
55 |
56 | [S4.Install RapidJson on Jetson](./en/s4.md)
57 |
58 | [S5.Install Pytorch-2.1.0 on Jetson](./en/s5.md)
59 |
60 | [S6.Port LMDeploy-0.2.5 to Jetson](./en/s6.md)
61 |
62 | [S7.Run InternLM offline on Jetson](./en/s7.md)
63 |
64 | ## Appendix
65 |
66 | * [Reinstall Jetpack for Jetson](https://www.anhongjun.top/blogs.php?id=1)
67 | * [Test Benchmark of LMDeploy-Jetson](./en/benchmark.md)
68 |
69 | ## Community Projects
70 |
71 | * InternDog: an offline embodied-intelligence guide dog based on InternLM2. [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm)
72 |
73 | ## Citation
74 |
75 | If this project is helpful to your work, please cite it using the following format:
76 |
77 | ```bibtex
78 | @misc{2024lmdeployjetson,
79 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence},
80 | author={LMDeploy-Jetson Community},
81 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson},
82 | year={2024}
83 | }
84 | ```
85 |
86 | ## Acknowledgements
87 |
88 | * [InternLM Practical Camp](https://github.com/InternLM/tutorial/)
89 | * [Shanghai Artificial Intelligence Laboratory](https://www.shlab.org.cn/)
90 |
--------------------------------------------------------------------------------
/README_zh.md:
--------------------------------------------------------------------------------
1 | # LMDeploy-Jetson社区
2 |
3 | ***在NVIDIA Jetson平台离线部署大模型,开启离线具身智能新纪元。***
4 |
5 | [[中文]](./README_zh.md) | [[English]](./README.md)
6 |
7 | 本项目提供一种将[LMDeploy](https://github.com/InternLM/lmdeploy)移植到NVIDIA Jetson系列边缘计算卡的方法,并在Jetson计算卡上运行[InternLM](https://github.com/InternLM/InternLM)系列大模型,为**离线具身智能**提供可能。
8 |
9 | ## 最新新闻🎉
10 |
11 | * [2024/3/15] 更新了对[LMDeploy-v0.2.5](https://github.com/InternLM/lmdeploy/releases/tag/v0.2.5)的支持。
12 | * [2024/2/26] 本项目被[LMDeploy](https://github.com/InternLM/lmdeploy)官方社区收录。
13 |
14 | ## 社区招募
15 |
16 | * 招募社区管理员(联系方式,an.hongjun@foxmail.com)
17 | * 招募更多型号Jetson板卡的Benchmark测试数据,可直接PR,如:
18 | * Jetson Nano
19 | * Jetson TX2
20 | * Jetson AGX Xavier
21 | * Jetson Orin Nano
22 | * Jetson AGX Orin
23 | * 招募开发者制作Jetson专用whl发行版
24 | * README优化等
25 |
26 | ## 已验证模型/平台
27 |
28 | * ✅:已验证可运行
29 | * ❌:已验证不可运行
30 | * ⭕️:待验证
31 |
32 | |Platform \ Model|InternLM-7B|InternLM-20B|InternLM2-1.8B|InternLM2-7B|InternLM2-20B|
33 | |:-:|:-:|:-:|:-:|:-:|:-:|
34 | |Orin AGX(32G)<br>Jetpack 5.1|✅<br>Mem:??/??<br>*14.68 token/s*|✅<br>Mem:??/??<br>*5.82 token/s*|✅<br>Mem:??/??<br>*56.57 token/s*|✅<br>Mem:??/??<br>*14.56 token/s*|✅<br>Mem:??/??<br>*6.16 token/s*|
35 | |Orin NX(16G)<br>Jetpack 5.1|✅<br>Mem:8.6G/16G<br>*7.39 token/s*|✅<br>Mem:14.7G/16G<br>*3.08 token/s*|✅<br>Mem:5.6G/16G<br>*22.96 token/s*|✅<br>Mem:9.2G/16G<br>*7.48 token/s*|✅<br>Mem:14.8G/16G<br>*3.19 token/s*|
36 | |Xavier NX(8G)<br>Jetpack 5.1|❌|❌|✅<br>Mem:4.35G/8G<br>*28.36 token/s*|❌|❌|
37 |
38 | **如果您有更多Jetson系列板卡,欢迎运行Benchmark并通过`Pull requests`(PR)提交结果,成为社区贡献者之一!**
39 |
40 | ## 未来工作
41 | * 更新更多型号Jetson板卡的Benchmark测试数据
42 | * 制作Jetson专用whl发行版
43 | * 跟进更新版本的LMDeploy
44 |
45 | ## 部署教程
46 |
47 | [S1.服务器端模型W4A16量化](./zh/s1.md)
48 |
49 | [S2.Jetson端安装Miniconda](./zh/s2.md)
50 |
51 | [S3.Jetson端安装CMake-3.29.0](./zh/s3.md)
52 |
53 | [S4.Jetson端安装RapidJson](./zh/s4.md)
54 |
55 | [S5.Jetson端安装Pytorch-2.1.0](./zh/s5.md)
56 |
57 | [S6.Jetson端移植LMDeploy-0.2.5](./zh/s6.md)
58 |
59 | [S7.Jetson端离线运行InternLM大模型](./zh/s7.md)
60 |
61 | ## 附录
62 |
63 | * [为Jetson重装Jetpack](https://www.anhongjun.top/blogs.php?id=1)
64 | * [LMDeploy-Jetson基准测试](./zh/benchmark.md)
65 |
66 | ## 社区项目
67 |
68 | * InternDog: 基于InternLM2大模型的离线具身智能导盲犬 [[Github]](https://github.com/BestAnHongjun/InternDog) [[Bilibili]](https://www.bilibili.com/video/BV1RK421s7dm)
69 |
70 | ## 引用
71 |
72 | 如果本项目对您的工作有所帮助,请使用以下格式引用:
73 |
74 | ```bibtex
75 | @misc{2024lmdeployjetson,
76 | title={LMDeploy-Jetson:Opening a new era of Offline Embodied Intelligence},
77 | author={LMDeploy-Jetson Community},
78 | url={https://github.com/BestAnHongjun/LMDeploy-Jetson},
79 | year={2024}
80 | }
81 | ```
82 |
83 | ## 致谢
84 |
85 | * [书生·浦语大模型实战营](https://github.com/InternLM/tutorial/)
86 | * [上海人工智能实验室](https://www.shlab.org.cn/)
87 |
--------------------------------------------------------------------------------
/attach/benchmark.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/benchmark.png
--------------------------------------------------------------------------------
/attach/cli.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/cli.jpg
--------------------------------------------------------------------------------
/attach/python.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/BestAnHongjun/LMDeploy-Jetson/89ba96f07433a247dece3ededfeb137fd9ab3758/attach/python.jpg
--------------------------------------------------------------------------------
/en/benchmark.md:
--------------------------------------------------------------------------------
1 | # Test Benchmark of LMDeploy-Jetson
2 |
3 | Please first follow S2-S7 to deploy LMDeploy on Jetson.
4 |
5 | Activate your conda environment.
6 |
7 | ```sh
8 | conda activate lmdeploy
9 | ```
10 |
11 | Enter the `lmdeploy/benchmark` directory.
12 |
13 | ```sh
14 | cd ~/lmdeploy/benchmark
15 | ```
16 |
17 | Run Benchmark.
18 |
19 | ```sh
20 | python profile_generation.py \
21 | /internlm2-chat-1_8b-turbomind \
22 | --concurrency 1 \
23 | --prompt-tokens 128 \
24 | --completion-tokens 128
25 | ```
26 |
27 | Replace `/internlm2-chat-1_8b-turbomind` with the path to your model.
28 |
29 | Record the speed benchmark.
30 |
31 | 
32 |
33 | During inference, the unified memory usage can be monitored with the `htop` command.
34 |
35 | ```sh
36 | # Install htop (if already installed, please ignore)
37 | sudo apt-get install htop
38 |
39 | # Run htop to check the usage of Mem.
40 | htop
41 | ```
42 |
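Alternatively, JetPack ships NVIDIA's `tegrastats` utility, which reports unified memory (RAM) usage alongside GPU load. A minimal sketch, assuming `tegrastats` is available on your JetPack image:

```sh
# Print one memory/GPU usage sample per second while the benchmark runs; stop with Ctrl+C
sudo tegrastats --interval 1000
```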
43 |
--------------------------------------------------------------------------------
/en/s1.md:
--------------------------------------------------------------------------------
1 | # S1.Quantize on server by W4A16
2 |
3 | LLMs occupy a large amount of GPU memory during inference. We can use the LMDeploy tool to quantize the model to [W4A16](https://github.com/InternLM/lmdeploy/blob/main/docs/en/quantization/w4a16.md) format and convert it into a [TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/en/inference/turbomind.md) model. This significantly reduces GPU memory usage, making it possible to deploy LLMs on the Jetson edge computing platform.
4 |
5 | ### 1.Setup conda environment
6 |
7 | The installation method for Anaconda is omitted.
8 |
9 | Setup conda environment:
10 |
11 | ```sh
12 | conda create -n lmdeploy python=3.10
13 | ```
14 |
15 | Activate conda environment:
16 |
17 | ```sh
18 | conda activate lmdeploy
19 | ```
20 |
21 | ### 2.Install LMDeploy
22 |
23 | Install lmdeploy with pip.
24 |
25 | ```sh
26 | # ref:https://github.com/InternLM/lmdeploy/issues/1169
27 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
28 | pip install lmdeploy[all]==0.2.3
29 | ```
30 |
31 | ### 3.Download HF model
32 |
33 | ```sh
34 | mkdir -p ~/models && cd ~/models
35 | ```
36 |
37 | Install dependencies.
38 |
39 | ```sh
40 | pip install modelscope
41 | ```
42 |
43 | Create file `download_models.py`:
44 |
45 | ```py
46 | from modelscope import snapshot_download
47 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b')
48 | print(model_dir)
49 | ```
50 |
51 | > `internlm-chat-7b` can be replaced with other models, such as `internlm-chat-20b`, `internlm2-chat-1_8b`, `internlm2-chat-7b`, or `internlm2-chat-20b`; the same applies to the steps below.
52 |
53 | Run the download script.
54 |
55 | ```sh
56 | python download_models.py
57 | ```
58 |
59 | The path printed at the end is where the model is saved; make a note of it. In the author's case it was:
60 |
61 | ```sh
62 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
63 | ```
64 |
65 | ### 4.Quantize model by W4A16
66 |
67 | ```sh
68 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
69 | export WORK_DIR=./internlm-chat-7b-4bit
70 |
71 | lmdeploy lite auto_awq \
72 | $HF_MODEL \
73 | --calib-dataset 'ptb' \
74 | --calib-samples 128 \
75 | --calib-seqlen 2048 \
76 | --w-bits 4 \
77 | --w-group-size 128 \
78 | --work-dir $WORK_DIR
79 | ```
80 |
81 | Convert model format.
82 |
83 | ```sh
84 | export TM_DIR=./internlm-chat-7b-turbomind
85 |
86 | lmdeploy convert internlm-chat-7b \
87 | $WORK_DIR \
88 | --model-format awq \
89 | --group-size 128 \
90 | --dst-path $TM_DIR
91 | ```
92 |
93 | Open the configuration file `internlm-chat-7b-turbomind/triton_models/weights/config.ini` and modify the following three lines.
94 |
95 | ```ini
96 | cache_max_entry_count = 0.5
97 | cache_block_seq_len = 128
98 | cache_chunk_size = 1
99 | ```
100 |
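If you prefer to script this change rather than edit the file by hand, a minimal `sed` sketch (assuming the three keys already exist in `config.ini` in the usual `key = value` form; check the file afterwards):

```sh
CFG=./internlm-chat-7b-turbomind/triton_models/weights/config.ini
sed -i 's/^cache_max_entry_count.*/cache_max_entry_count = 0.5/' $CFG
sed -i 's/^cache_block_seq_len.*/cache_block_seq_len = 128/' $CFG
sed -i 's/^cache_chunk_size.*/cache_chunk_size = 1/' $CFG
grep cache_ $CFG   # confirm the three values
```
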
101 | Compress the TurboMind model.
102 |
103 | ```sh
104 | apt-get install pigz # multi-threaded compression to speed up archiving
105 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind
106 | ```
107 |
108 | Keep `internlm-chat-7b-turbomind.tgz`; it will be needed later.
--------------------------------------------------------------------------------
/en/s2.md:
--------------------------------------------------------------------------------
1 | # S2.Install Miniconda on Jetson
2 |
3 | To manage Python packages more conveniently, we use a Conda virtual environment. Since Anaconda is rather large, we opt for Miniconda instead.
4 |
5 | Create Miniconda installation directory.
6 |
7 | ```sh
8 | mkdir -p ~/miniconda3
9 | ```
10 |
11 | Download the Miniconda installation package:
12 |
13 | [[Download by Browser]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh)
14 |
15 | ```sh
16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
17 | ```
18 |
19 | Install Miniconda:
20 |
21 | ```sh
22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
23 | ```
24 |
25 | Delete the installation package:
26 |
27 | ```sh
28 | rm -rf ~/miniconda3/miniconda.sh
29 | ```
30 |
31 | Initialize bash configuration:
32 |
33 | ```sh
34 | ~/miniconda3/bin/conda init bash
35 | source ~/.bashrc
36 | ```
37 |
38 | Create Conda environment:
39 |
40 | ```sh
41 | conda create -n lmdeploy python=3.8 # Use Python 3.8; otherwise the PyTorch wheel installed later will be incompatible.
42 | ```
43 |
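To double-check that the environment was created with the right interpreter (it is activated again in later steps), a quick sanity check:

```sh
conda env list            # 'lmdeploy' should appear in the list
conda activate lmdeploy
python --version          # expected: Python 3.8.x
```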
--------------------------------------------------------------------------------
/en/s3.md:
--------------------------------------------------------------------------------
1 | # S3.Install CMake-3.29.0 on Jetson
2 |
3 | The CMake version pre-installed with Jetpack is too old; we need a newer CMake to compile and install LMDeploy.
4 |
5 | To avoid a "butterfly effect" from changing the system CMake version, this tutorial does not install the newer CMake into the system directories; instead, the newer CMake is selected temporarily through an `export` of the `PATH` environment variable whenever it is needed.
6 |
7 | Download cmake-3.29.0-rc1:
8 |
9 | [[Download by Browser]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz)
10 |
11 | ```sh
12 | cd ~
13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz
14 | ```
15 |
16 | Extract the archive:
17 |
18 | ```sh
19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz
20 | ```
21 |
22 | Delete the install package:
23 |
24 | ```sh
25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz
26 | ```
27 |
28 | Rename the folder:
29 |
30 | ```sh
31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0
32 | cd cmake-3.29.0
35 | ```
36 |
37 | Verify if it can run normally:
38 |
39 | ```sh
40 | ./bin/cmake --version
41 | ```
42 |
43 | Temporarily specify CMake version as 3.29.0:
44 |
45 | ```sh
46 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH
47 | ```
48 |
49 | **Note**: Please replace `nvidia` in the above command with your username.
50 |
51 | Check if the version has changed to 3.29.0:
52 |
53 | ```sh
54 | cmake --version
55 | ```
56 |
57 | **Note**: The higher version of CMake only takes effect in the current terminal session. You need to re-run `export` when opening a new terminal to use the higher version of CMake.
58 |
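If you would rather have the newer CMake on the `PATH` in every terminal, one option is to append the `export` line to `~/.bashrc` (again assuming the install path `/home/nvidia/cmake-3.29.0`; replace `nvidia` with your username):

```sh
echo 'export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH' >> ~/.bashrc
source ~/.bashrc
cmake --version   # should now report 3.29.0 in new terminals as well
```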
--------------------------------------------------------------------------------
/en/s4.md:
--------------------------------------------------------------------------------
1 | # S4.Install RapidJson on Jetson
2 |
3 | Clone the RapidJson repository.
4 |
5 | ```sh
6 | cd ~
7 | git clone https://github.com/Tencent/rapidjson.git
8 |
9 | # For Chinese User
10 | # git clone https://gitee.com/Tencent/RapidJSON.git
11 | ```
12 |
13 | Enter the repository directory.
14 |
15 | ```sh
16 | cd rapidjson
17 |
18 | # For Chinese User
19 | # cd RapidJSON
20 | ```
21 |
22 | Compile RapidJson.
23 |
24 | ```sh
25 | mkdir build && cd build
26 | cmake .. \
27 | -DRAPIDJSON_BUILD_DOC=OFF \
28 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \
29 | -DRAPIDJSON_BUILD_TESTS=OFF
30 | make -j4
31 | ```
32 |
33 | Install RapidJson to the system.
34 |
35 | ```sh
36 | sudo make install
37 | ```
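
RapidJson is a header-only library, so `make install` mainly copies the headers. A quick way to confirm they landed under the default prefix (`/usr/local`, assuming `CMAKE_INSTALL_PREFIX` was not changed):

```sh
ls /usr/local/include/rapidjson/rapidjson.h   # the header should be present after install
```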
--------------------------------------------------------------------------------
/en/s5.md:
--------------------------------------------------------------------------------
1 | # S5.Install Pytorch-2.1.0 on Jetson
2 |
3 | Download pytorch v2.1.0
4 |
5 | ```sh
6 | cd ~
7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38
8 | # For JetPack 5, execute the following command. For JetPack 6, please download the corresponding PyTorch 2.1 from the official website.
9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
10 | ```
11 |
12 | [[Download Pytorch by Browser]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl)
13 |
14 | Install dependencies.
15 |
16 | ```sh
17 | sudo apt-get install libopenblas-dev
18 | ```
19 |
20 | > **For Chinese users**: \
21 | > Switch APT to the Tsinghua mirror: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \
22 | > Please choose `Ubuntu 20.04 LTS (focal)`.
23 |
24 | Activate Conda environment.
25 |
26 | ```sh
27 | conda activate lmdeploy
28 | ```
29 |
30 | Install pytorch-v2.1.0
31 |
32 | ```sh
33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
34 | ```
35 |
36 | > **For Chinese users**: \
37 | > Switch pip to the Tsinghua mirror:
38 |
39 | ```sh
40 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
41 | ```
42 |
43 | Verify the installation in the Python interpreter; the second line below should print `True`.
44 |
45 | ```py
46 | import torch
47 | torch.cuda.is_available()
48 | ```
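
Equivalently, a one-line check from the shell (run inside the activated `lmdeploy` environment):

```sh
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
```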
--------------------------------------------------------------------------------
/en/s6.md:
--------------------------------------------------------------------------------
1 | # S6.Port LMDeploy-0.2.5 to Jetson
2 |
3 | Download lmdeploy-v0.2.5:
4 |
5 | [[Download by Browser]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip)
6 |
7 | ```sh
8 | cd ~/
9 | git clone https://github.com/InternLM/lmdeploy.git
10 | cd lmdeploy
11 | git checkout c5f4014 # make sure the checkout is the v0.2.5 code
12 | ```
13 |
14 | Activate conda environment:
15 |
16 | ```sh
17 | conda activate lmdeploy
18 | ```
19 |
20 | Install dependencies.
21 |
22 | ```sh
23 | cd ~/lmdeploy
24 | pip install -r requirements/build.txt
25 | ```
26 |
27 | Create a new file named `generate_jetson.sh` under `~/lmdeploy`, and fill in the following content:
28 |
29 | ```sh
30 | #!/bin/sh
31 |
32 | builder="-G Ninja"
33 |
34 | if [ "$1" = "make" ]; then
35 | builder=""
36 | fi
37 |
38 | cmake ${builder} .. \
39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \
40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
41 | -DCMAKE_INSTALL_PREFIX=./install \
42 | -DBUILD_PY_FFI=ON \
43 | -DBUILD_MULTI_GPU=OFF \
44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \
45 | -DUSE_NVTX=ON
46 |
47 | ```
48 |
49 | Make the script executable.
50 |
51 | ```sh
52 | chmod +x generate_jetson.sh
53 | ```
54 |
55 | Install Ninja.
56 |
57 | ```sh
58 | sudo apt-get install ninja-build
59 | ```
60 |
61 | Create a new build folder.
62 |
63 | ```sh
64 | cd ~/lmdeploy
65 | mkdir build && cd build
66 | ```
67 |
68 | Compile LMDeploy.
69 |
70 | ```sh
71 | ../generate_jetson.sh
72 | ninja install
73 | ```
74 |
75 | During compilation, the system may run out of memory and the build may be `Killed`. You can enlarge the swap space as follows and then run `ninja install` again.
76 |
77 | ```sh
78 | # Create a 6 GB swap file (adjust the size to your available disk space)
79 | sudo fallocate -l 6G /var/swapfile
80 | # Restrict the file permissions
81 | sudo chmod 600 /var/swapfile
82 | # Format it as swap
83 | sudo mkswap /var/swapfile
84 | # Enable the swap file
85 | sudo swapon /var/swapfile
86 | # Enable it automatically at boot
87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
88 | ```
89 |
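To confirm the extra swap space is active before retrying the build:

```sh
swapon --show   # /var/swapfile should be listed
free -h         # the 'Swap' line should now show roughly 6 GB
```
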
90 | **Attention**: Use vim to edit `requirements/runtime.txt`, and delete the lines containing `torch<=2.1.2,>=2.0.0` and `triton>=2.1.0,<2.2.0`.
91 |
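If you prefer to script that edit, a sketch using `sed` (the patterns assume the exact version strings quoted above; inspect the file afterwards to make sure only those two lines were removed):

```sh
cd ~/lmdeploy
sed -i '/^torch<=2.1.2,>=2.0.0/d' requirements/runtime.txt
sed -i '/^triton>=2.1.0,<2.2.0/d' requirements/runtime.txt
cat requirements/runtime.txt
```
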
92 | **Note**: To simplify dependencies, we have removed `triton`. This also means that when deploying models using lmdeploy, they can only be invoked through the turbomind method, and not through the API method.
93 |
94 | Install lmdeploy-v0.2.5 locally.
95 |
96 | ```sh
97 | cd ~/lmdeploy
98 | pip install -e .[serve]
99 | ```
--------------------------------------------------------------------------------
/en/s7.md:
--------------------------------------------------------------------------------
1 | # S7.Run InternLM offline on Jetson
2 |
3 | Create a directory to store the models.
4 |
5 | ```sh
6 | mkdir -p ~/models
7 | ```
8 |
9 | Upload the `internlm-chat-7b-turbomind.tgz` obtained from [S1.Quantize on server by W4A16](./s1.md) to the `models` directory.
10 |
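For example, the archive can be copied over from the server with `scp` (the login `nvidia` and host `jetson.local` below are placeholders; substitute your Jetson's username and address):

```sh
scp internlm-chat-7b-turbomind.tgz nvidia@jetson.local:~/models/
```
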
11 | Unzip the model.
12 |
13 | ```sh
14 | tar zxvf internlm-chat-7b-turbomind.tgz -C .
15 | ```
16 |
17 | ### 0.Bug fix: Modify the MMEngine module.
18 |
19 | The PyTorch version on Jetson does not support distributed reduce operations, which may cause errors in the distributed parts of the MMEngine module.
20 |
21 | The error looks like this:
22 |
23 | ```sh
24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
25 | ```
26 |
27 | Activate conda environment:
28 |
29 | ```sh
30 | conda activate lmdeploy
31 | ```
32 |
33 | Run Python in interpreter mode:
34 |
35 | ```sh
36 | python
37 | ```
38 |
39 | Enter the following content:
40 |
41 | ```py
42 | import mmengine
43 | print(mmengine.__file__)
44 | ```
45 |
46 | This prints the install location of the MMEngine module. In the author's case it was `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`, so the MMEngine directory is `/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`. The path below is relative to that directory.
47 |
48 | Modify line 208 of `logging/logger.py` under that directory:
49 |
50 | ```git
51 | - global_rank = _get_rank()
52 | + global_rank = 0
53 | ```
54 |
55 | After this change, the error no longer occurs at runtime.
56 |
57 | **Attention**: This fix is crude and only suitable for inference deployments on the Jetson platform; on a server it would break distributed functionality!
58 |
59 | ### 1.CLI Mode
60 |
61 | Activate conda environment:
62 |
63 | ```sh
64 | conda activate lmdeploy
65 | ```
66 |
67 | Run the model.
68 |
69 | ```sh
70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind
71 | ```
72 |
73 | 
74 |
75 | ### 2.Python Mode
76 |
77 | Write a running script `run_model.py` with the following content:
78 |
79 | ```py
80 | from lmdeploy import turbomind as tm
81 |
82 |
83 | if __name__ == "__main__":
84 |     model_path = "./internlm-chat-7b-turbomind"  # change this to your model path
85 |
86 | tm_model = tm.TurboMind.from_pretrained(model_path)
87 | generator = tm_model.create_instance()
88 |
89 | while True:
90 | inp = input("[User] >>> ")
91 | if inp == "exit":
92 | break
93 | prompt = tm_model.model.get_prompt(inp)
94 | input_ids = tm_model.tokenizer.encode(prompt)
95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]):
96 | res = outputs[1]
97 | response = tm_model.tokenizer.decode(res)
98 | print("[Bot] <<< {}".format(response))
99 |
100 | ```
101 |
102 | Activate conda environment:
103 |
104 | ```sh
105 | conda activate lmdeploy
106 | ```
107 |
108 | Run the script:
109 |
110 | ```sh
111 | python run_model.py
112 | ```
113 |
114 | 
115 |
--------------------------------------------------------------------------------
/zh/benchmark.md:
--------------------------------------------------------------------------------
1 | # LMDeploy-Jetson基准测试
2 |
3 | 请首先参考S2-S7在Jetson部署LMDeploy。
4 |
5 | 激活conda环境。
6 |
7 | ```sh
8 | conda activate lmdeploy
9 | ```
10 |
11 | 进入`lmdeploy/benchmark`目录。
12 |
13 | ```sh
14 | cd ~/lmdeploy/benchmark
15 | ```
16 |
17 | 运行Benchmark。
18 |
19 | ```sh
20 | python profile_generation.py \
21 | /internlm2-chat-1_8b-turbomind \
22 | --concurrency 1 \
23 | --prompt-tokens 128 \
24 | --completion-tokens 128
25 | ```
26 |
27 | 其中`internlm2-chat-1_8b-turbomind`更换为你的模型路径。
28 |
29 | 记录推理速度benchmark。
30 |
31 | 
32 |
33 | 推理过程中,可通过`htop`命令查看统一内存占用情况。
34 |
35 | ```sh
36 | # htop安装方法(如已安装,请忽略)
37 | sudo apt-get install htop
38 |
39 | # 运行htop,查看Mem
40 | htop
41 | ```
--------------------------------------------------------------------------------
/zh/s1.md:
--------------------------------------------------------------------------------
1 | # S1.服务器端模型W4A16量化
2 |
3 | 大模型推理时占用显存巨大,我们可以借助LMDeploy工具对模型进行[W4A16量化](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/quantization/w4a16.md),转换为[TurboMind](https://github.com/InternLM/lmdeploy/blob/main/docs/zh_cn/inference/turbomind.md)模型,这样在推理时可以极大减少显存占用,使得在Jetson边缘计算平台部署大模型成为可能。
4 |
5 | ### 1.创建conda环境
6 |
7 | 安装Anaconda方法略。
8 |
9 | 创建conda环境:
10 |
11 | ```sh
12 | conda create -n lmdeploy python=3.10
13 | ```
14 |
15 | 激活conda环境:
16 |
17 | ```sh
18 | conda activate lmdeploy
19 | ```
20 |
21 | ### 2.安装LMDeploy
22 |
23 | 使用pip方法安装lmdeploy。
24 |
25 | ```sh
26 | # 直接用pip装lmdeploy时安装的pytorch的cuda可能是12版本的,运行时会引发链接错误
27 | # ref:https://github.com/InternLM/lmdeploy/issues/1169
28 | pip install torch==2.1.2 torchvision==0.16.2 torchaudio==2.1.2 --index-url https://download.pytorch.org/whl/cu118
29 | # 安装好torch-2.1.2-cu118后再安装lmdeploy
30 | pip install lmdeploy[all]==0.2.3
31 | ```
32 |
33 | ### 3.下载HF模型权重文件
34 |
35 | 创建目录
36 |
37 | ```sh
38 | mkdir -p ~/models && cd ~/models
39 | ```
40 |
41 | 安装依赖项。
42 |
43 | ```sh
44 | pip install modelscope
45 | ```
46 |
47 | 创建Python文件`download_models.py`:
48 |
49 | ```py
50 | #模型下载
51 | from modelscope import snapshot_download
52 | model_dir = snapshot_download('Shanghai_AI_Laboratory/internlm-chat-7b', cache_dir='internlm-chat-7b')
53 | print(model_dir)
54 | ```
55 |
56 | > 其中,internlm-chat-7b可替换为不同的模型,比如`internlm-chat-20b`,`internlm2-chat-1_8b`,`internlm2-chat-7b`,`internlm2-chat-20b`,下同。
57 |
58 | 运行下载脚本:
59 |
60 | ```sh
61 | python download_models.py
62 | ```
63 |
64 | 最后打印输出的路径就是模型保存的路径,请记录下。笔者为:
65 |
66 | ```sh
67 | internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
68 | ```
69 |
70 | ### 4.模型W4A16量化
71 |
72 | ```sh
73 | export HF_MODEL=./internlm-chat-7b/Shanghai_AI_Laboratory/internlm-chat-7b
74 | export WORK_DIR=./internlm-chat-7b-4bit
75 |
76 | lmdeploy lite auto_awq \
77 | $HF_MODEL \
78 | --calib-dataset 'ptb' \
79 | --calib-samples 128 \
80 | --calib-seqlen 2048 \
81 | --w-bits 4 \
82 | --w-group-size 128 \
83 | --work-dir $WORK_DIR
84 | ```
85 |
86 | 转换模型格式。
87 |
88 | ```sh
89 | export TM_DIR=./internlm-chat-7b-turbomind
90 |
91 | lmdeploy convert internlm-chat-7b \
92 | $WORK_DIR \
93 | --model-format awq \
94 | --group-size 128 \
95 | --dst-path $TM_DIR
96 | ```
97 |
98 | 修改配置文件`internlm-chat-7b-turbomind/triton_models/weights/config.ini`,修改如下三行:
99 |
100 | ```ini
101 | cache_max_entry_count = 0.5
102 | cache_block_seq_len = 128
103 | cache_chunk_size = 1
104 | ```
105 |
106 | 压缩turbomind模型。
107 |
108 | ```sh
109 | apt-get install pigz # 多线程加速压缩
110 | tar --use-compress-program=pigz -cvf internlm-chat-7b-turbomind.tgz ./internlm-chat-7b-turbomind
111 | ```
112 |
113 | 将`internlm-chat-7b-turbomind.tgz`保留备用。
--------------------------------------------------------------------------------
/zh/s2.md:
--------------------------------------------------------------------------------
1 | # S2.Jetson端安装Miniconda
2 |
3 | 在Jetson上安装conda虚拟环境方便管理python包。由于Anaconda过于庞大,因此选择安装Miniconda。
4 |
5 | 创建Miniconda安装目录。
6 |
7 | ```sh
8 | mkdir -p ~/miniconda3
9 | ```
10 |
11 | 下载Miniconda安装包:
12 |
13 | [[网页下载]](https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh)
14 |
15 | ```sh
16 | wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-aarch64.sh -O ~/miniconda3/miniconda.sh
17 | ```
18 |
19 | 安装Miniconda:
20 |
21 | ```sh
22 | bash ~/miniconda3/miniconda.sh -b -u -p ~/miniconda3
23 | ```
24 |
25 | 删除安装包:
26 |
27 | ```sh
28 | rm -rf ~/miniconda3/miniconda.sh
29 | ```
30 |
31 | 初始化bash配置:
32 |
33 | ```sh
34 | ~/miniconda3/bin/conda init bash
35 | source ~/.bashrc
36 | ```
37 |
38 | 创建conda环境:
39 |
40 | ```sh
41 | conda create -n lmdeploy python=3.8 # 请安装py38,否则安装pytorch会出现不兼容的问题。
42 | ```
43 |
--------------------------------------------------------------------------------
/zh/s3.md:
--------------------------------------------------------------------------------
1 | # S3.Jetson端安装CMake-3.29.0
2 |
3 | Jetpack预装的CMake版本太低了,我们需要升级一下CMake版本才能编译安装LMDeploy。
4 |
5 | 为了避免CMake版本变化引起“蝴蝶效应”,本教程采取的方法不会将高版本的CMake安装到系统目录,而是在使用时临时通过`export`环境变量的方式选用高版本的CMake。
6 |
7 | 下载cmake-3.29.0-rc1:
8 |
9 | [[网页下载]](https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz)
10 |
11 | ```sh
12 | cd ~
13 | wget https://github.com/Kitware/CMake/releases/download/v3.29.0-rc1/cmake-3.29.0-rc1-linux-aarch64.tar.gz
14 | ```
15 |
16 | 解压压缩包:
17 |
18 | ```sh
19 | tar xf cmake-3.29.0-rc1-linux-aarch64.tar.gz
20 | ```
21 |
22 | 删除压缩包:
23 |
24 | ```sh
25 | rm cmake-3.29.0-rc1-linux-aarch64.tar.gz
26 | ```
27 |
28 | 重命名文件夹:
29 |
30 | ```sh
31 | mv cmake-3.29.0-rc1-linux-aarch64 cmake-3.29.0
32 | cd cmake-3.29.0
35 | ```
36 |
37 | 检验能否正常运行:
38 |
39 | ```sh
40 | ./bin/cmake --version
41 | ```
42 |
43 | 临时指定cmake版本为3.29.0:
44 |
45 | ```sh
46 | export PATH=/home/nvidia/cmake-3.29.0/bin:$PATH
47 | ```
48 |
49 | **注意**:请将上述命令中`nvidia`替换为你的用户名。
50 |
51 | 查看版本是否变成了3.29.0:
52 |
53 | ```sh
54 | cmake --version
55 | ```
56 |
57 | **注意**:高版本CMake只对当前终端生效。当你打开新的终端时,需要重新运行`export`才能使用高版本的CMake。
--------------------------------------------------------------------------------
/zh/s4.md:
--------------------------------------------------------------------------------
1 | # S4.Jetson端安装RapidJson
2 |
3 | 克隆RapidJson仓库。
4 |
5 | ```sh
6 | cd ~
7 | git clone https://github.com/Tencent/rapidjson.git
8 |
9 | # 对于中国用户:
10 | # git clone https://gitee.com/Tencent/RapidJSON.git
11 | ```
12 |
13 | 进入仓库目录:
14 |
15 | ```sh
16 | cd rapidjson
17 |
18 | # 对于中国用户:
19 | # cd RapidJSON
20 | ```
21 |
22 | 编译RapidJson:
23 | ```sh
24 | mkdir build && cd build
25 | cmake .. \
26 | -DRAPIDJSON_BUILD_DOC=OFF \
27 | -DRAPIDJSON_BUILD_EXAMPLES=OFF \
28 | -DRAPIDJSON_BUILD_TESTS=OFF
29 | make -j4
30 | ```
31 |
32 | 将RapidJson安装到系统:
33 |
34 | ```sh
35 | sudo make install # 安装到系统
36 | ```
--------------------------------------------------------------------------------
/zh/s5.md:
--------------------------------------------------------------------------------
1 | # S5.Jetson端安装Pytorch-2.1.0
2 |
3 | 下载pytorch v2.1.0
4 |
5 | ```sh
6 | cd ~
7 | mkdir pytorch-2.1.0-cp38 && cd pytorch-2.1.0-cp38
8 | # Jetpack 5执行如下指令,JetPack 6请到官方网站下载对应的pytorch2.1
9 | wget https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
10 | ```
11 |
12 | [[网页下载Pytorch]](https://developer.download.nvidia.cn/compute/redist/jp/v512/pytorch/torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl)
13 |
14 | 安装依赖项
15 |
16 | ```sh
17 | sudo apt-get install libopenblas-dev
18 | ```
19 |
20 | > **对于中国用户**: \
21 | > 更换清华源: https://mirrors.tuna.tsinghua.edu.cn/help/ubuntu-ports/ \
22 | > 请选择 `Ubuntu 20.04 LTS (focal)`。
23 |
24 | 激活conda环境
25 |
26 | ```sh
27 | conda activate lmdeploy
28 | ```
29 |
30 | 安装pytorch-v2.1.0
31 |
32 | ```sh
33 | pip install torch-2.1.0a0+41361538.nv23.06-cp38-cp38-linux_aarch64.whl
34 | ```
35 |
36 | > **对于中国用户**: \
37 | > 更换清华源:
38 | ```sh
39 | pip config set global.index-url https://pypi.tuna.tsinghua.edu.cn/simple
40 | ```
41 |
42 | 使用python解释器模式验证是否安装成功,正常显示为`True`。
43 |
44 | ```py
45 | import torch
46 | torch.cuda.is_available()
47 | ```
--------------------------------------------------------------------------------
/zh/s6.md:
--------------------------------------------------------------------------------
1 | # S6.Jetson端移植LMDeploy-0.2.5
2 |
3 | 下载lmdeploy-v0.2.5:
4 |
5 | [[网页下载]](https://github.com/InternLM/lmdeploy/archive/refs/tags/v0.2.5.zip)
6 |
7 | ```sh
8 | cd ~/
9 | git clone https://github.com/InternLM/lmdeploy.git
10 | cd lmdeploy
11 | git checkout c5f4014 # 确保为0.2.5版本
12 | ```
13 |
14 | 进入conda环境:
15 |
16 | ```sh
17 | conda activate lmdeploy
18 | ```
19 |
20 | 安装编译依赖项:
21 |
22 | ```sh
23 | cd ~/lmdeploy
24 | pip install -r requirements/build.txt
25 | ```
26 |
27 | 在`~/lmdeploy`下新建`generate_jetson.sh`,填入以下内容:
28 |
29 | ```sh
30 | #!/bin/sh
31 |
32 | builder="-G Ninja"
33 |
34 | if [ "$1" = "make" ]; then
35 | builder=""
36 | fi
37 |
38 | cmake ${builder} .. \
39 | -DCMAKE_BUILD_TYPE=RelWithDebInfo \
40 | -DCMAKE_EXPORT_COMPILE_COMMANDS=1 \
41 | -DCMAKE_INSTALL_PREFIX=./install \
42 | -DBUILD_PY_FFI=ON \
43 | -DBUILD_MULTI_GPU=OFF \
44 | -DCMAKE_CUDA_FLAGS="-lineinfo" \
45 | -DUSE_NVTX=ON
46 |
47 | ```
48 |
49 | 赋予权限。
50 |
51 | ```sh
52 | chmod +x generate_jetson.sh
53 | ```
54 |
55 | 安装Ninja。
56 |
57 | ```sh
58 | sudo apt-get install ninja-build
59 | ```
60 |
61 | 新建编译文件夹:
62 |
63 | ```sh
64 | cd ~/lmdeploy
65 | mkdir build && cd build
66 | ```
67 |
68 | 编译LMDeploy:
69 |
70 | ```sh
71 | ../generate_jetson.sh
72 | ninja install
73 | ```
74 |
75 | 编译过程中可能出现内存不足而导致`Killed`的现象。可以通过如下方式扩大交换区容量,再重新执行`ninja install`。
76 |
77 | ```sh
78 | # 新建6G大小的交换区,大小可自定义,结合硬盘容量
79 | sudo fallocate -l 6G /var/swapfile
80 | # 赋予权限
81 | sudo chmod 600 /var/swapfile
82 | # 建立交换区
83 | sudo mkswap /var/swapfile
84 | # 启用交换区
85 | sudo swapon /var/swapfile
86 | # 设置自动启用交换区
87 | sudo bash -c 'echo "/var/swapfile swap swap defaults 0 0" >> /etc/fstab'
88 | ```
89 |
90 | **注意**:使用vim修改`requirements/runtime.txt`,将`torch<=2.1.2,>=2.0.0`和`triton>=2.1.0,<2.2.0`两行删除。
91 |
92 | **提示**:为了简化依赖,我们去除了`triton`,这也同时意味着使用lmdeploy部署模型时,只能通过turbomind方式调用,而不能通过api方式。
93 |
94 | 本地安装lmdeploy-v0.2.5
95 |
96 | ```sh
97 | cd ~/lmdeploy
98 | pip install -e .[serve]
99 | ```
--------------------------------------------------------------------------------
/zh/s7.md:
--------------------------------------------------------------------------------
1 | # S7.Jetson端离线运行InternLM大模型
2 |
3 | 创建模型保存目录:
4 |
5 | ```sh
6 | mkdir -p ~/models
7 | ```
8 |
9 | 将[S1.服务器端模型W4A16量化](./s1.md)得到的`internlm-chat-7b-turbomind.tgz`上传到`models`目录下。
10 |
11 | 解压模型文件:
12 |
13 | ```sh
14 | tar zxvf internlm-chat-7b-turbomind.tgz -C .
15 | ```
16 |
17 | ### 0.Bug解决:修改MMEngine库
18 |
19 | Jetson端的pytorch不支持分布式的reduce算子,这会导致MMEngine库中与分布式有关的部分出现错误。
20 |
21 | 错误为:
22 |
23 | ```sh
24 | AttributeError: module 'torch.distributed' has no attribute 'ReduceOp'
25 | ```
26 |
27 | 激活conda环境:
28 |
29 | ```sh
30 | conda activate lmdeploy
31 | ```
32 |
33 | 用解释器方式运行python:
34 |
35 | ```sh
36 | python
37 | ```
38 |
39 | 输入如下内容:
40 |
41 | ```py
42 | import mmengine
43 | print(mmengine.__file__)
44 | ```
45 |
46 | 这就输出了MMEngine库的安装位置,笔者的是`/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/__init__.py`,那么MMEngine目录就是`/home/nvidia/miniconda3/envs/lmdeploy/lib/python3.8/site-packages/mmengine/`,下文路径均相对于该目录。
47 |
48 | 修改该目录下`logging/logger.py`第208行:
49 |
50 | ```git
51 | - global_rank = _get_rank()
52 | + global_rank = 0
53 | ```
54 |
55 | 再运行就不会报错了。
56 |
57 | **注意**:该方式过于粗暴,仅适用于Jetson平台部署推理,在服务器端会影响分布式功能!
58 |
59 | ### 1.终端运行
60 |
61 | 激活conda环境:
62 |
63 | ```sh
64 | conda activate lmdeploy
65 | ```
66 |
67 | 运行模型:
68 |
69 | ```sh
70 | lmdeploy chat turbomind ./internlm-chat-7b-turbomind
71 | ```
72 |
73 | 
74 |
75 | ### 2.Python集成运行
76 |
77 | 编写运行脚本`run_model.py`,内容如下:
78 |
79 | ```py
80 | from lmdeploy import turbomind as tm
81 |
82 |
83 | if __name__ == "__main__":
84 | model_path = "./internlm-chat-7b-turbomind" # 修改成你的路径
85 |
86 | tm_model = tm.TurboMind.from_pretrained(model_path)
87 | generator = tm_model.create_instance()
88 |
89 | while True:
90 | inp = input("[User] >>> ")
91 | if inp == "exit":
92 | break
93 | prompt = tm_model.model.get_prompt(inp)
94 | input_ids = tm_model.tokenizer.encode(prompt)
95 | for outputs in generator.stream_infer(session_id=0, input_ids=[input_ids]):
96 | res = outputs[1]
97 | response = tm_model.tokenizer.decode(res)
98 | print("[Bot] <<< {}".format(response))
99 |
100 | ```
101 |
102 | 激活conda环境:
103 |
104 | ```sh
105 | conda activate lmdeploy
106 | ```
107 |
108 | 运行脚本:
109 |
110 | ```sh
111 | python run_model.py
112 | ```
113 |
114 | 
115 |
--------------------------------------------------------------------------------