├── LICENSE
├── README.md
├── git-pull-push.sh
├── llm-app.md
├── llm-arch.md
├── llm-compression.md
├── llm-inference.md
├── llm-train.md
├── llmops.md
└── xxx.md

/LICENSE:
--------------------------------------------------------------------------------
Apache License
Version 2.0, January 2004
http://www.apache.org/licenses/

TERMS AND CONDITIONS FOR USE, REPRODUCTION, AND DISTRIBUTION

1. Definitions.

"License" shall mean the terms and conditions for use, reproduction, and distribution as defined by Sections 1 through 9 of this document.

"Licensor" shall mean the copyright owner or entity authorized by the copyright owner that is granting the License.

"Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity.

"You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License.

"Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files.

"Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types.

"Work" shall mean the work of authorship, whether in Source or Object form, made available under the License, as indicated by a copyright notice that is included in or attached to the work (an example is provided in the Appendix below).

"Derivative Works" shall mean any work, whether in Source or Object form, that is based on (or derived from) the Work and for which the editorial revisions, annotations, elaborations, or other modifications represent, as a whole, an original work of authorship. For the purposes of this License, Derivative Works shall not include works that remain separable from, or merely link (or bind by name) to the interfaces of, the Work and Derivative Works thereof.

"Contribution" shall mean any work of authorship, including the original version of the Work and any modifications or additions to that Work or Derivative Works thereof, that is intentionally submitted to Licensor for inclusion in the Work by the copyright owner or by an individual or Legal Entity authorized to submit on behalf of the copyright owner. For the purposes of this definition, "submitted" means any form of electronic, verbal, or written communication sent to the Licensor or its representatives, including but not limited to communication on electronic mailing lists, source code control systems, and issue tracking systems that are managed by, or on behalf of, the Licensor for the purpose of discussing and improving the Work, but excluding communication that is conspicuously marked or otherwise designated in writing by the copyright owner as "Not a Contribution."

"Contributor" shall mean Licensor and any individual or Legal Entity on behalf of whom a Contribution has been received by Licensor and subsequently incorporated within the Work.

2. Grant of Copyright License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable copyright license to reproduce, prepare Derivative Works of, publicly display, publicly perform, sublicense, and distribute the Work and such Derivative Works in Source or Object form.

3. Grant of Patent License. Subject to the terms and conditions of this License, each Contributor hereby grants to You a perpetual, worldwide, non-exclusive, no-charge, royalty-free, irrevocable (except as stated in this section) patent license to make, have made, use, offer to sell, sell, import, and otherwise transfer the Work, where such license applies only to those patent claims licensable by such Contributor that are necessarily infringed by their Contribution(s) alone or by combination of their Contribution(s) with the Work to which such Contribution(s) was submitted. If You institute patent litigation against any entity (including a cross-claim or counterclaim in a lawsuit) alleging that the Work or a Contribution incorporated within the Work constitutes direct or contributory patent infringement, then any patent licenses granted to You under this License for that Work shall terminate as of the date such litigation is filed.

4. Redistribution. You may reproduce and distribute copies of the Work or Derivative Works thereof in any medium, with or without modifications, and in Source or Object form, provided that You meet the following conditions:

(a) You must give any other recipients of the Work or Derivative Works a copy of this License; and

(b) You must cause any modified files to carry prominent notices stating that You changed the files; and

(c) You must retain, in the Source form of any Derivative Works that You distribute, all copyright, patent, trademark, and attribution notices from the Source form of the Work, excluding those notices that do not pertain to any part of the Derivative Works; and

(d) If the Work includes a "NOTICE" text file as part of its distribution, then any Derivative Works that You distribute must include a readable copy of the attribution notices contained within such NOTICE file, excluding those notices that do not pertain to any part of the Derivative Works, in at least one of the following places: within a NOTICE text file distributed as part of the Derivative Works; within the Source form or documentation, if provided along with the Derivative Works; or, within a display generated by the Derivative Works, if and wherever such third-party notices normally appear. The contents of the NOTICE file are for informational purposes only and do not modify the License. You may add Your own attribution notices within Derivative Works that You distribute, alongside or as an addendum to the NOTICE text from the Work, provided that such additional attribution notices cannot be construed as modifying the License.

You may add Your own copyright statement to Your modifications and may provide additional or different license terms and conditions for use, reproduction, or distribution of Your modifications, or for any such Derivative Works as a whole, provided Your use, reproduction, and distribution of the Work otherwise complies with the conditions stated in this License.

5. Submission of Contributions. Unless You explicitly state otherwise, any Contribution intentionally submitted for inclusion in the Work by You to the Licensor shall be under the terms and conditions of this License, without any additional terms or conditions. Notwithstanding the above, nothing herein shall supersede or modify the terms of any separate license agreement you may have executed with Licensor regarding such Contributions.

6. Trademarks. This License does not grant permission to use the trade names, trademarks, service marks, or product names of the Licensor, except as required for reasonable and customary use in describing the origin of the Work and reproducing the content of the NOTICE file.

7. Disclaimer of Warranty. Unless required by applicable law or agreed to in writing, Licensor provides the Work (and each Contributor provides its Contributions) on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied, including, without limitation, any warranties or conditions of TITLE, NON-INFRINGEMENT, MERCHANTABILITY, or FITNESS FOR A PARTICULAR PURPOSE. You are solely responsible for determining the appropriateness of using or redistributing the Work and assume any risks associated with Your exercise of permissions under this License.

8. Limitation of Liability. In no event and under no legal theory, whether in tort (including negligence), contract, or otherwise, unless required by applicable law (such as deliberate and grossly negligent acts) or agreed to in writing, shall any Contributor be liable to You for damages, including any direct, indirect, special, incidental, or consequential damages of any character arising as a result of this License or out of the use or inability to use the Work (including but not limited to damages for loss of goodwill, work stoppage, computer failure or malfunction, or any and all other commercial damages or losses), even if such Contributor has been advised of the possibility of such damages.

9. Accepting Warranty or Additional Liability. While redistributing the Work or Derivative Works thereof, You may choose to offer, and charge a fee for, acceptance of support, warranty, indemnity, or other liability obligations and/or rights consistent with this License. However, in accepting such obligations, You may act only on Your own behalf and on Your sole responsibility, not on behalf of any other Contributor, and only if You agree to indemnify, defend, and hold each Contributor harmless for any liability incurred by, or claims asserted against, such Contributor by reason of your accepting any such warranty or additional liability.

END OF TERMS AND CONDITIONS

APPENDIX: How to apply the Apache License to your work.

To apply the Apache License to your work, attach the following boilerplate notice, with the fields enclosed by brackets "[]" replaced with your own identifying information. (Don't include the brackets!) The text should be enclosed in the appropriate comment syntax for the file format. We also recommend that a file or class name and description of purpose be included on the same "printed page" as the copyright notice for easier identification within third-party archives.

Copyright [yyyy] [name of copyright owner]

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# llm-resource(LLM 百宝箱)

A curated collection of high-quality, full-stack LLM resources.

> Contributions are very welcome: help us collect more high-quality LLM-related resources.

## 目录

- 🐼 [LLM算法](#llm算法)
- 🐘 [LLM训练](#llm训练)
- 🐘 [LLM微调](#llm微调)
- 🐼 [LLM对齐](#llm对齐)
- 🔥 [LLM推理](#llm推理)
- :palm_tree: [LLM数据工程(Data Engineering)](#llm数据工程)
- 📡 [LLM压缩](#llm压缩)
- 🐰 [LLM测评](#llm测评)
- 🐘 [AI基础知识](#ai基础知识)
- 📡 [AI基础设施](#ai基础设施)
- :palm_tree: [AI芯片](#ai芯片)
- 🐰 [CUDA](#cuda)
- 🐘 [AI编译器](#ai编译器)
- 🐰 [AI框架](#ai框架)
- 📡 [LLM应用开发](#llm应用开发)
- 🐘 [LLMOps](#llmops)
- 📡 [LLM实践](#llm实践)
- 📡 [微信公众号文章集锦](#微信公众号文章集锦)

## LLM算法

### Transformer

原理:

- [Transformer模型详解(图解最完整版)](https://zhuanlan.zhihu.com/p/338817680)
- [OpenAI ChatGPT(一):十分钟读懂 Transformer](https://zhuanlan.zhihu.com/p/600773858)
- [Transformer的结构是什么样的?各个子模块各有什么作用?](https://blog.csdn.net/m0_54929869/article/details/118881804)
- [以Transformer结构为基础的大模型参数量、计算量、中间激活以及KV cache剖析](https://mp.weixin.qq.com/s/3JYz6yrLeBr5ujip3LZe6w)
- [Transformer 一起动手编码学原理](https://mp.weixin.qq.com/s/NgUNuWhvp2SqG-XWYv2PGQ)
- [为什么transformer(Bert)的多头注意力要对每一个head进行降维?](http://www.sniper97.cn/index.php/note/deep-learning/note-deep-learning/4002/)

源码:

- [OpenAI ChatGPT(一):Tensorflow实现Transformer](https://zhuanlan.zhihu.com/p/603243890)
- [OpenAI ChatGPT(一):十分钟读懂 Transformer](https://zhuanlan.zhihu.com/p/600773858)
- [GPT (一)transformer原理和代码详解](https://zhuanlan.zhihu.com/p/632880248)
- [Transformer源码详解(Pytorch版本)](https://zhuanlan.zhihu.com/p/398039366)
- [搞懂Transformer结构,看这篇PyTorch实现就够了](https://zhuanlan.zhihu.com/p/339207092)

### GPT1

### GPT2

- GPT2 源码:https://github.com/huggingface/transformers/blob/main/src/transformers/models/gpt2/modeling_gpt2.py
- GPT2 源码解析:https://zhuanlan.zhihu.com/p/630970209
- nanoGPT:https://github.com/karpathy/nanoGPT/blob/master/model.py
- 7.3 GPT2模型深度解析:http://121.199.45.168:13013/7_3.html
- GPT(三)GPT2原理和代码详解: https://zhuanlan.zhihu.com/p/637782385
- GPT2参数量剖析: https://zhuanlan.zhihu.com/p/640501114 (see the sketch below)
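As a companion to the GPT2 parameter-count article linked above, the headline ~124M figure for GPT-2 small can be reproduced from the architecture hyper-parameters alone. A quick sketch (assumes the standard GPT-2 small config with tied token-embedding/LM-head weights and biases included):

```python
# Reproduce GPT-2 small's parameter count from its hyper-parameters.
vocab, n_ctx, d, n_layer = 50257, 1024, 768, 12

wte = vocab * d                               # token embeddings (tied with LM head)
wpe = n_ctx * d                               # learned position embeddings
attn = (3 * d * d + 3 * d) + (d * d + d)      # fused QKV projection + output projection
mlp = (d * 4 * d + 4 * d) + (4 * d * d + d)   # up-projection + down-projection
ln = 2 * (2 * d)                              # two LayerNorms per block (scale + bias)
block = attn + mlp + ln
total = wte + wpe + n_layer * block + 2 * d   # plus the final LayerNorm
print(f"{total:,} parameters")                 # 124,439,808  (~124.4M)
```

Per block this is roughly 12·d² parameters, which is why the per-layer cost dominates once the vocabulary embedding is amortized.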
### ChatGPT

- [State of GPT:大神Andrej揭秘OpenAI大模型原理和训练过程](https://mp.weixin.qq.com/s/zmEGzm1cdXupNoqZ65h7yg)
- [OpenAI联合创始人亲自上场科普GPT,让技术小白也能理解最强AI](https://mp.weixin.qq.com/s/MD4WwwJLXm8rEm-sniX8Gw)

### GLM

- [预训练语言模型:GLM](https://zhuanlan.zhihu.com/p/641499380)

### LLaMA

### MoE 大模型

- [Mixtral-8x7B MoE大模型微调实践,超越Llama2-65B](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486657&idx=1&sn=c5a5e55b01243f477d063c9194d24f42&chksm=fd3be592ca4c6c84bf5eefff23dcc38eeb83624e9f53bbd9a72afba71e235dddf814549322ba&token=499509118&lang=zh_CN#rd)
- [大模型分布式训练并行技术(八)-MOE并行](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486145&idx=1&sn=299c28153b286465be26e18153c6db5d&chksm=fd3be392ca4c6a84be283dad80f584443302ea29fc95744f83727e7d9d68952d3a0f8b1b66d5&token=499509118&lang=zh_CN#rd)
- [MoE架构模型爆发或将带飞国产AI芯片](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247488422&idx=1&sn=eeb18ec0f5b9e972df31d65e7db13f8f&chksm=fd3bfaf5ca4c73e38a696fe7b6f33a30af962fdddfabd92d74b1d06190442759aabe7b560f22&token=499509118&lang=zh_CN#rd)
- [大模型的模型融合方法概述](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247487652&idx=1&sn=1bbf692b6e1dc6bae719c8e0a10293a0&chksm=fd3bf9f7ca4c70e16473a98d5408f6daea5e8c116a88cb3f41dfb00ffb7f6016874ee092224c&token=499509118&lang=zh_CN#rd)
- [混合专家模型 (MoE) 详解](https://zhuanlan.zhihu.com/p/674698482)
- [群魔乱舞:MoE大模型详解](https://zhuanlan.zhihu.com/p/677638939)
- [大模型LLM之混合专家模型MoE(上-基础篇)](https://zhuanlan.zhihu.com/p/672712751)
- [大模型LLM之混合专家模型MoE(下-实现篇)](https://zhuanlan.zhihu.com/p/673048264)

### 下一代大模型

- https://github.com/NExT-GPT/NExT-GPT
- https://next-gpt.github.io/
- [Introduction to NExT-GPT: Any-to-Any Multimodal Large Language Model](https://www.kdnuggets.com/introduction-to-nextgpt-anytoany-multimodal-large-language-model)

### 多模态大模型

- A Survey on Multimodal Large Language Models:https://arxiv.org/pdf/2306.13549
- Efficient-Multimodal-LLMs-Survey:https://github.com/lijiannuist/Efficient-Multimodal-LLMs-Survey

## LLM训练

- [OPT-175B是如何炼成的](https://zhuanlan.zhihu.com/p/622061951)
- [全网最全-混合精度训练原理](https://zhuanlan.zhihu.com/p/441591808)
- [飞桨分布式训练4D混合并行可训千亿级AI模型](https://ai.baidu.com/forum/topic/show/987996)
- [Transformer Math 101](https://blog.eleuther.ai/transformer-math/) - 如何计算显存消耗?
- [Megatron-LM 第三篇Paper总结——Sequence Parallelism & Selective Checkpointing](https://zhuanlan.zhihu.com/p/522198082)
- [大模型训练踩坑](https://zhuanlan.zhihu.com/p/660759033)
- 学习率(warmup, decay):
  - [模型调优,学习率设置(Warm Up、loss自适应衰减等),batch size调优技巧,基于方差放缩初始化方法](https://blog.csdn.net/sinat_39620217/article/details/130236886)
  - [深度学习模型训练小技巧](https://blog.csdn.net/sgyuanshi/article/details/108394444)

### LLM微调

- [Adapting P-Tuning to Solve Non-English Downstream Tasks](https://developer.nvidia.com/blog/adapting-p-tuning-to-solve-non-english-downstream-tasks/)

### LLM对齐

- [MOSS-RLHF](https://github.com/OpenLMLab/MOSS-RLHF)
- [模型调优(RLHF/DPO/ORPO)- 终极指南](https://zhuanlan.zhihu.com/p/692594519)
- [DPO: Direct Preference Optimization 论文解读及代码实践](https://zhuanlan.zhihu.com/p/642569664)
- [强化学习入门:基本思想和经典算法](https://imzhanghao.com/2022/02/10/reinforcement-learning/)
- [人人都能看懂的PPO原理与源码解读](https://zhuanlan.zhihu.com/p/677607581)
- [关于Instruct GPT复现的一些细节与想法](https://zhuanlan.zhihu.com/p/609078527)

paper:

- [LLM对齐综述](https://arxiv.org/pdf/2407.16216)
- [RLHF-PPO](https://arxiv.org/pdf/2203.02155)
- [DPO](https://arxiv.org/pdf/2305.18290)
- [ORPO](https://arxiv.org/pdf/2403.07691)

## LLM推理

- [使用HuggingFace的Accelerate库加载和运行超大模型](https://zhuanlan.zhihu.com/p/605640431):device_map、no_split_module_classes、offload_folder、offload_state_dict
- [借助 PyTorch,Accelerate 如何运行超大模型](https://huggingface.co/blog/accelerate-large-models)
- [使用 DeepSpeed 和 Accelerate 进行超快 BLOOM 模型推理](https://huggingface.co/blog/zh/bloom-inference-pytorch-scripts)
- [LLM七种推理服务框架总结](https://zhuanlan.zhihu.com/p/653352979)
- [LLM投机采样(Speculative Sampling)为何能加速模型推理](https://zhuanlan.zhihu.com/p/653734659)
- [大模型推理妙招—投机采样(Speculative Decoding)](https://zhuanlan.zhihu.com/p/651359908)
- https://github.com/flexflow/FlexFlow/tree/inference
- [TensorRT-LLM(3)--架构](https://zhuanlan.zhihu.com/p/665595557)
- NLP(十八):LLM 的推理优化技术纵览:https://zhuanlan.zhihu.com/p/642412124
- 揭秘NVIDIA大模型推理框架:TensorRT-LLM:https://zhuanlan.zhihu.com/p/680808866
- [如何生成文本: 通过 Transformers 用不同的解码方法生成文本](https://huggingface.co/blog/zh/how-to-generate) | [How to generate text: using different decoding methods for language generation with Transformers](https://huggingface.co/blog/how-to-generate)

### 大模型推理优化技术

KV Cache (a minimal sketch follows below):

- [图解大模型推理优化:KV Cache](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486956&idx=1&sn=cd5e36857bbd8ebd750d2c172550d2bd&chksm=fd3be4bfca4c6da9f2276310995c7d60a42c0d01a960a42a38226cf954bab0d2d2a5772905df&token=1409805983&lang=zh_CN#rd)
- [大模型推理百倍加速之KV cache篇](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247487886&idx=1&sn=38d3cd36c6c5acb2fe5c80ceffcba2cf&chksm=fd3bf8ddca4c71cb243566b593dfa095926b003a4a06442cc96e8ce3f2c64171b34f0bca8428&token=1409805983&lang=zh_CN#rd)
- [大模型推理加速:看图学KV Cache](https://zhuanlan.zhihu.com/p/662498827)
- [大模型推理性能优化之KV Cache解读](https://zhuanlan.zhihu.com/p/630832593)

解码优化:

- [大模型推理妙招—投机采样(Speculative Decoding)](https://zhuanlan.zhihu.com/p/651359908)
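The KV Cache links above all build on the same observation: at decode time only the newest token's key/value need to be computed, while the prefix's K/V can be cached and reused. A minimal single-head NumPy sketch of that bookkeeping (toy dimensions, no batching; purely illustrative, not any framework's API):

```python
import numpy as np

d = 64  # toy head dimension
rng = np.random.default_rng(0)
Wq, Wk, Wv = (rng.standard_normal((d, d)) * d**-0.5 for _ in range(3))

k_cache, v_cache = [], []  # grows by one entry per decoded token

def decode_step(x):
    """x: (d,) hidden state of the newest token only.
    Reuses cached K/V instead of recomputing them for the whole prefix."""
    q = x @ Wq
    k_cache.append(x @ Wk)
    v_cache.append(x @ Wv)
    K = np.stack(k_cache)          # (t, d)
    V = np.stack(v_cache)          # (t, d)
    att = (K @ q) / np.sqrt(d)     # (t,) attention scores over the cached prefix
    w = np.exp(att - att.max()); w /= w.sum()
    return w @ V                   # (d,) attention output for the new token

for _ in range(5):
    out = decode_step(rng.standard_normal(d))
print(out.shape, len(k_cache))     # (64,) 5
```

The trade this makes is the one the articles quantify: per-step compute drops from O(t·d²) to O(t·d), at the cost of keeping the cached K/V resident in GPU memory.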
### vLLM

- [vLLM(六)源码解读下 @HelloWorld](https://zhuanlan.zhihu.com/p/694442998)
- [猛猿:图解大模型计算加速系列:vLLM源码解析1,整体架构](https://zhuanlan.zhihu.com/p/691045737)
- [LLM推理2:vLLM源码学习 @akaihaoshuai](https://zhuanlan.zhihu.com/p/643336063)
- [大模型推理框架 vLLM 源码解析(一):框架概览](https://zhuanlan.zhihu.com/p/681402162)

## LLM数据工程

- [An Initial Exploration of Theoretical Support for Language Model Data Engineering. Part 1: Pretraining @ 符尧](https://yaofu.notion.site/An-Initial-Exploration-of-Theoretical-Support-for-Language-Model-Data-Engineering-Part-1-Pretraini-dc480d9bf7ff4659afd8c9fb738086eb)

## LLM压缩

- [Awesome Model Quantization](https://github.com/htqin/awesome-model-quantization)
- [Efficient-LLMs-Survey](https://github.com/AIoT-MLSys-Lab/Efficient-LLMs-Survey)
- [Awesome LLM Compression](https://github.com/HuangOwen/Awesome-LLM-Compression)
- [模型转换、模型压缩、模型加速工具汇总](https://blog.csdn.net/WZZ18191171661/article/details/99700992)
- [AI 框架部署方案之模型转换](https://zhuanlan.zhihu.com/p/396781295)
- [Pytorch 模型转 TensorRT (torch2trt 教程)](https://zhuanlan.zhihu.com/p/570822430)

## LLM测评

- [CLiB中文大模型能力评测榜单](https://github.com/jeinlee1991/chinese-llm-benchmark)
- [Hugging Face Open LLM Leaderboard](https://huggingface.co/spaces/HuggingFaceH4/open_llm_leaderboard)
- HELM:https://github.com/stanford-crfm/helm
- HELM:https://crfm.stanford.edu/helm/latest/
- lm-evaluation-harness:https://github.com/EleutherAI/lm-evaluation-harness/
- CLEVA:http://www.lavicleva.com/#/homepage/overview
- CLEVA:https://github.com/LaVi-Lab/CLEVA/blob/main/README_zh-CN.md

## 提示工程

- [做数据关键步骤:怎么写好prompt?](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486771&idx=1&sn=359c029b010d7ad96fff33952ad634a8&chksm=fd3be460ca4c6d76b4996f971ff21080ca0a83f3042893bb6827752ad8af812b4afeb1151af1&token=1288418017&lang=zh_CN#rd)
- [从1000+模板中总结出的10大提示工程方法助你成为提示词大师!](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486174&idx=1&sn=97ddcd5fb44eb4e3143fa746b7d617c8&chksm=fd3be38dca4c6a9b94fb88bd3f7a5009dee53812412e6f62f9f0a5955d165dd0d5f6ce698208&scene=21#wechat_redirect)
- [一文搞懂提示工程的原理及前世今生](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247485231&idx=1&sn=acfa77264da611983a49297ab8376e8f&chksm=fd3bee7cca4c676a3ccbc459e70a9e9920b08369a4d618c4ed550c96e9acd09b594cc04b21a6&scene=21#wechat_redirect)
- [Effective Prompt: 编写高质量Prompt的14个有效方法](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486087&idx=1&sn=118b82abd4b22975e9aeb9f23ed0c9c5&chksm=fd3be3d4ca4c6ac2b41f1c3e908b845d4497a84dc9741034d1e1a830cba93515439b60a835e5&scene=21#wechat_redirect)
- [提示工程和提示构造技巧](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247487107&idx=1&sn=337325ee6a9a4d4c56821b1e759f1555&chksm=fd3be7d0ca4c6ec60b6394bf76282ee3eef6beccfe2c31885cbb111a5bdc32022ba346509681&token=1288418017&lang=zh_CN#rd)
- [一文带你了解提示攻击!](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247485936&idx=1&sn=0bcc72e5bfeb50c437253626d763f67d&chksm=fd3be0a3ca4c69b52bba0e0f22730b497c56fad99444b23d437cf49262cd5e52489fb141d338&token=1288418017&lang=zh_CN#rd)

## 综合

- [通向AGI之路:大型语言模型(LLM)技术精要](https://zhuanlan.zhihu.com/p/597586623)
- [大语言模型的涌现能力:现象与解释](https://zhuanlan.zhihu.com/p/621438653)
- [NLP(十八):LLM 的推理优化技术纵览](https://zhuanlan.zhihu.com/p/642412124)
- [并行计算3:并行计算模型](https://zhuanlan.zhihu.com/p/568947162)
- [大模型"幻觉",看这一篇就够了 | 哈工大华为出品](https://www.thepaper.cn/newsDetail_forward_25344873)

**safetensors**:

- [bin和safetensors区别是什么?](https://www.zhihu.com/question/629624037/answer/3307818120)
- [Safetensors:保存模型权重的新格式](https://zhuanlan.zhihu.com/p/691446249)
- [github: safetensors](https://github.com/huggingface/safetensors)
- [huggingface: safetensors](https://huggingface.co/docs/safetensors/index)
- [Safetensors: a simple, safe and faster way to store and distribute tensors.](https://medium.com/@mandalsouvik/safetensors-a-simple-and-safe-way-to-store-and-distribute-tensors-d9ba1931ba04)
- https://huggingface.co/docs/safetensors/index
- https://github.com/huggingface/safetensors/tree/v0.3.3
- [手把手教你:LLama2原始权重转HF模型](https://zhuanlan.zhihu.com/p/669158180)

## AI框架

### PyTorch

- [PyTorch 源码解读系列](https://zhuanlan.zhihu.com/p/328674159) @ OpenMMLab 团队
- [[源码解析] PyTorch 分布式](https://juejin.cn/post/7026144707591815175) @ 罗西的思考
- [PyTorch 分布式(18) --- 使用 RPC 的分布式流水线并行](https://juejin.cn/post/7043601075307282462) @ 罗西的思考
- [【Pytorch】model.train() 和 model.eval() 原理与用法](https://blog.csdn.net/weixin_44211968/article/details/123774649)

### DeepSpeed

- [DeepSpeed使用指南(简略版)](https://blog.csdn.net/weixin_43301333/article/details/127237122)
- [关于Deepspeed的一些总结与心得](https://zhuanlan.zhihu.com/p/650824387)

### Megatron-LM

- [Megatron-LM 近期的改动](https://zhuanlan.zhihu.com/p/651192295)
- [深入理解 Megatron-LM(1)基础知识](https://zhuanlan.zhihu.com/p/650234985) @ 简枫
- [深入理解 Megatron-LM(2)原理介绍](https://zhuanlan.zhihu.com/p/650383289)
- [[源码解析] 模型并行分布式训练Megatron (1) --- 论文 & 基础](https://juejin.cn/post/7057837676430360584) @ 罗西的思考
- [[源码解析] 模型并行分布式训练Megatron (2) --- 整体架构](https://juejin.cn/post/7061942798957674504)
- [[细读经典]Megatron论文和代码详细分析(1)](https://zhuanlan.zhihu.com/p/366906920) @迷途小书僮
- [[细读经典]Megatron论文和代码详细分析(2)](https://zhuanlan.zhihu.com/p/388830967)

### Megatron-DeepSpeed

### Huggingface Transformers

## [AI基础知识](./ai-base.md)

## AI基础设施

### AI芯片

- [业界AI加速芯片浅析(一)百度昆仑芯](https://zhuanlan.zhihu.com/p/593143821)
- NVIDIA CUDA-X AI:https://www.nvidia.cn/technologies/cuda-x/
- [Intel,Nvidia,AMD三大巨头火拼GPU与CPU](https://zhuanlan.zhihu.com/p/629024100)
- 处理器与AI芯片-Google-TPU:https://zhuanlan.zhihu.com/p/646793355
- [一文看懂国产AI芯片玩家](https://www.xckfsq.com/news/show.html?id=29187)
- [深度 | 国产AI芯片,玩家几何](https://mp.weixin.qq.com/s?__biz=MzIwMzgzNTQ1Nw==&mid=2247599349&idx=1&sn=12459cbc418d3831d0c28e87ddb71b2f&scene=21#wechat_redirect)

### CUDA

- [CUDA编程入门(一)CUDA编程模型](https://zhuanlan.zhihu.com/p/97044592)
- [GPU编程(CUDA)](https://face2ai.com/program-blog/)
- [CUDA编程入门极简教程](https://zhuanlan.zhihu.com/p/34587739)

## AI编译器

- [TVM资料](https://github.com/BBuf/tvm_mlir_learn)
- [AI编译器原理](https://www.bilibili.com/read/cv21242696/?spm_id_from=333.999.0.0) @ZIMO酱

## LLM应用开发

- [动手学大模型应用开发](https://github.com/datawhalechina/llm-universe)
- [langchain java](https://github.com/HamaWhiteGG/langchain-java)
- [大模型主流应用RAG的介绍——从架构到技术细节](https://luxiangdong.com/2023/09/25/ragone/#/%E5%86%99%E5%9C%A8%E5%89%8D%E9%9D%A2)
- [基于检索的大语言模型和应用(陈丹琦)](https://acl2023-retrieval-lm.github.io/)
- [大模型bad case修复方案思考](https://mp.weixin.qq.com/s/xqFkfzHVnePf1ub_sCk9iw)
- [《综述:全新大语言模型驱动的Agent》——4.5万字详细解读复旦NLP和米哈游最新Agent Survey](https://zhuanlan.zhihu.com/p/656676717)

## LLMOps

- [MLOps Landscape in 2023: Top Tools and Platforms](https://neptune.ai/blog/mlops-tools-platforms-landscape)
- [What Constitutes A Large Language Model Application?](https://cobusgreyling.medium.com/what-constitutes-a-large-language-model-application-bacf81103475):LLM Functionality Landscape
- [AI System @吃果冻不吐果冻皮](https://github.com/liguodongiot/ai-system)

## RAG

- https://github.com/hymie122/RAG-Survey

## 书籍

- 大语言模型原理与工程 @杨青
- [大语言模型从理论到实践](https://intro-llm.github.io/chapter/LLM-TAP.pdf) @张奇:https://intro-llm.github.io/
- [动手学大模型](https://github.com/Lordog/dive-into-llms?tab=readme-ov-file)

## LLM实践

- [minGPT @karpathy](https://github.com/karpathy/minGPT)
- [llm.c @karpathy](https://github.com/karpathy/llm.c): LLM training in simple, raw C/CUDA
- [LLM101n](https://github.com/karpathy/LLM101n)
- [llama2.c](https://github.com/karpathy/llama2.c): Inference Llama 2 in one file of pure C
- [nanoGPT](https://github.com/karpathy/nanoGPT)
- [Baby-Llama2-Chinese](https://github.com/DLLXW/baby-llama2-chinese)
- [从0到1构建一个MiniLLM](https://github.com/Tongjilibo/build_MiniLLM_from_scratch)
- [gpt-fast](https://github.com/pytorch-labs/gpt-fast)、[blog](https://pytorch.org/blog/accelerating-generative-ai-2/)

## 大模型汇总资料

- [Awesome-Chinese-LLM](https://github.com/HqWu-HITCS/Awesome-Chinese-LLM)
- [Awesome-LLM-Survey](https://github.com/HqWu-HITCS/Awesome-LLM-Survey)
- [Large Language Model Course](https://github.com/mlabonne/llm-course)
- [Awesome-Quantization-Papers](https://github.com/Zhen-Dong/Awesome-Quantization-Papers)
- [Awesome Model Quantization (GitHub)](https://github.com/htqin/awesome-model-quantization)
- [Awesome Transformer Attention (GitHub)](https://github.com/cmhungsteve/Awesome-Transformer-Attention)
- [语言模型数据选择综述](https://github.com/alon-albalak/data-selection-survey)
- [Awesome Knowledge Distillation of LLM Papers](https://github.com/Tebmer/Awesome-Knowledge-Distillation-of-LLMs)
- [Awasome-Pruning @ghimiredhikura](https://github.com/ghimiredhikura/Awasome-Pruning)
- [Awesome-Pruning @he-y](https://github.com/he-y/Awesome-Pruning)
- [awesome-pruning @hrcheng1066](https://github.com/hrcheng1066/awesome-pruning)
- [Awesome-LLM-Inference](https://github.com/DefTruth/Awesome-LLM-Inference)

## 微信公众号文章集锦

- [2024年2月大模型文章集锦](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247487320&idx=2&sn=522fdf838d4ec03f24dbc7a11a3a5a65&chksm=fd3be60bca4c6f1d0c9b0643db0d7334940fb592dac3b5fbf286c7232f6bb08b968fbd237a20&scene=21#wechat_redirect)
- [2024年1月大模型文章集锦](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247487067&idx=2&sn=33594e6a82cf79a7580272c064635d75&chksm=fd3be708ca4c6e1ece0e1f6cc22bfd286bf3e9073350b91369b1d0e7fb52b50fac8113288e43&scene=21#wechat_redirect)
- [2023年12月大模型文章集锦](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486824&idx=2&sn=4faaac42f983af46cce44b35dd416c5f&chksm=fd3be43bca4c6d2d6f5fd1cf3004c37782d0b829111ad5ecd155d6cd3adedd40655653271ba1&scene=21#wechat_redirect)
- [2023年6-11月大模型文章集锦](https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486480&idx=2&sn=b6b504f9d67a3cdad5ba0eb68eee647b&chksm=fd3be543ca4c6c55e0c2fd335de92103a1aee4e5631be34f06d7557463bc7e339fb63680ad54&scene=21&poc_token=HCwA9WWjTC-CNeedW8iQ1lZwSAwg4fwWFAVcUnai)
## 其他

- [Hugging Face 博客](https://github.com/huggingface/blog/tree/main)
--------------------------------------------------------------------------------
/git-pull-push.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Pull the latest main, stage all local changes, and push them back
# with a timestamped commit message.

git pull origin main
git add .

#time=$(date -Iminutes)
time=$(date +"%Y-%m-%d_%H:%M:%S")

echo "$time"

commit_info="fix-${time}"

# Quote the message so it survives even if the format ever contains spaces.
git commit -m "$commit_info"

git push origin main
--------------------------------------------------------------------------------
/llm-app.md:
--------------------------------------------------------------------------------
## knowledge-graph

- [一文带你入门知识图谱多跳问答](https://blog.csdn.net/qq_27590277/article/details/115804930)
--------------------------------------------------------------------------------
/llm-arch.md:
--------------------------------------------------------------------------------
## 通用大模型

- [Baichuan-7B](https://github.com/baichuan-inc/baichuan-7B)
- [Baichuan-13B](https://github.com/baichuan-inc/Baichuan-13B)
- [XVERSE-13B](https://github.com/xverse-ai/XVERSE-13B)

## 领域大模型

### 法律大模型

- [智海-录问](https://github.com/zhihaiLLM/wisdomInterrogatory)
--------------------------------------------------------------------------------
/llm-compression.md:
--------------------------------------------------------------------------------
## 模型压缩

- 模型压缩及移动端部署:https://github.com/scutan90/DeepLearning-500-questions/blob/master/ch17_%E6%A8%A1%E5%9E%8B%E5%8E%8B%E7%BC%A9%E3%80%81%E5%8A%A0%E9%80%9F%E5%8F%8A%E7%A7%BB%E5%8A%A8%E7%AB%AF%E9%83%A8%E7%BD%B2/%E7%AC%AC%E5%8D%81%E4%B8%83%E7%AB%A0_%E6%A8%A1%E5%9E%8B%E5%8E%8B%E7%BC%A9%E3%80%81%E5%8A%A0%E9%80%9F%E5%8F%8A%E7%A7%BB%E5%8A%A8%E7%AB%AF%E9%83%A8%E7%BD%B2.md#1751-%E6%A8%A1%E5%9E%8B%E4%BC%98%E5%8C%96%E5%8A%A0%E9%80%9F%E6%96%B9%E6%B3%95
- 模型压缩:https://paddlepedia.readthedocs.io/en/latest/tutorials/model_compress/index.html
- 模型自动化压缩工具(ACT):https://www.paddlepaddle.org.cn/documentation/docs/zh/guides/infer/paddleslim/paddle_slim_cn.html
- 机器学习系统-模型压缩:https://openmlsys.github.io/chapter_model_deployment/model_compression.html
- 模型压缩部署概述:https://bbs.huaweicloud.com/blogs/390416

### 知识蒸馏

- 知识蒸馏(Knowledge Distillation) 经典之作:https://zhuanlan.zhihu.com/p/102038521
- 知识蒸馏(Knowledge Distillation)简述(一):https://zhuanlan.zhihu.com/p/81467832
- 知识蒸馏(Knowledge Distillation)简述(二):https://zhuanlan.zhihu.com/p/82129871
- 知识蒸馏综述万字长文:https://github.com/DA-southampton/NLP_ability/blob/master/%E6%B7%B1%E5%BA%A6%E5%AD%A6%E4%B9%A0%E8%87%AA%E7%84%B6%E8%AF%AD%E8%A8%80%E5%A4%84%E7%90%86/%E6%A8%A1%E5%9E%8B%E8%92%B8%E9%A6%8F/%E4%BB%80%E4%B9%88%E6%98%AF%E7%9F%A5%E8%AF%86%E8%92%B8%E9%A6%8F.md
- What is Knowledge Distillation?:https://blog.roboflow.com/what-is-knowledge-distillation/
- Knowledge Distillation: Principles, Algorithms, Applications:https://neptune.ai/blog/knowledge-distillation
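To make the idea in the links above concrete, here is a minimal sketch of the classic soft-label distillation loss (Hinton-style: temperature-scaled KL divergence against the teacher's logits, blended with the usual hard-label cross-entropy). PyTorch; the shapes and hyperparameters are illustrative only:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels,
                      T: float = 2.0, alpha: float = 0.5):
    """alpha * soft-target KL (at temperature T) + (1 - alpha) * hard CE."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)  # rescale so the soft-target gradient magnitude is independent of T
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy check with random logits: batch of 8, 100 classes.
s = torch.randn(8, 100, requires_grad=True)
t = torch.randn(8, 100)
y = torch.randint(0, 100, (8,))
loss = distillation_loss(s, t, y)
loss.backward()
print(float(loss))
```

The temperature softens both distributions so the student also learns from the teacher's relative ranking of wrong classes, which is where most of the "dark knowledge" lives.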
### 模型剪枝

- [Neural Network Pruning 101](https://towardsdatascience.com/neural-network-pruning-101-af816aaea61)
- [模型剪枝技术原理及其发展现状和展望](https://zhuanlan.zhihu.com/p/134642289)
- [模型剪枝(Model Pruning)](https://zhuanlan.zhihu.com/p/525071928)
- [我总结了70篇论文的方法,帮你透彻理解神经网络的剪枝算法](https://zhuanlan.zhihu.com/p/408899259)
- [深度模型剪枝](https://zhuanlan.zhihu.com/p/547203195)
- [NLP(八):大语言模型的稀疏化技术](https://zhuanlan.zhihu.com/p/615399255)
- [LLM-Pruner: 大语言模型的结构化剪枝](https://zhuanlan.zhihu.com/p/630902012)

### 模型量化

- [量化感知训练(Quantization-aware-training)探索-从原理到实践](https://zhuanlan.zhihu.com/p/548174416)-TF
- [Pytorch实现卷积神经网络训练量化(QAT)](https://zhuanlan.zhihu.com/p/164901397)
- [LLM 量化技术小结](https://zhuanlan.zhihu.com/p/651874446)
- [NLP(十一):大语言模型的模型量化(INT8/INT4)技术](https://zhuanlan.zhihu.com/p/627436535)
- [LLM大语言模型量化方法介绍(一)](https://www.birentech.com/Research_nstitute_details/13.html) @壁仞
- [Transformers 中原生支持的量化方案概述](https://zhuanlan.zhihu.com/p/666655711)
- [大模型量化:什么是模型量化,如何进行模型量化](https://www.wehelpwin.com/article/4109)
- [模型量化了解一下?](https://zhuanlan.zhihu.com/p/132561405)
--------------------------------------------------------------------------------
/llm-inference.md:
--------------------------------------------------------------------------------
## LLM推理优化

- LLM Inference Performance Engineering: Best Practices:https://www.databricks.com/blog/llm-inference-performance-engineering-best-practices
- 大语言模型推理性能工程优化最佳实践:https://mp.weixin.qq.com/s?__biz=MzU3Mzg5ODgxMg==&mid=2247486293&idx=1&sn=2b47cbbe189953599e254158fd78a18d&chksm=fd3be206ca4c6b109667e8813623db42a53b7ac0cd628a6cfd57f36334cb53e9ee33d49dd2dc&scene=21#wechat_redirect
- 语言大模型推理性能工程:最佳实践:https://zhuanlan.zhihu.com/p/663282469
- Reproducible Performance Metrics for LLM inference:https://www.anyscale.com/blog/reproducible-performance-metrics-for-llm-inference
- 可复现的语言大模型推理性能指标:https://zhuanlan.zhihu.com/p/667612787
- [从零实现AI推理引擎](https://www.zhihu.com/column/c_1760127081235820544)

### llama.cpp

- [Understanding how LLM inference works with llama.cpp](https://www.omrimallis.com/posts/understanding-how-llm-inference-works-with-llama-cpp/)
- [Llama.cpp 教程:高效 LLM 推理和实现的完整指南](https://blog.csdn.net/weixin_41863029/article/details/139456502)
- [LLM如何通过llama.cpp进行推理](https://blog.csdn.net/Blaze_bxh/article/details/137054444)
- [llama.cpp源码解析](https://mp.weixin.qq.com/s?__biz=MzA4MjY4NTk0NQ==&mid=2247519554&idx=1&sn=c619e9907fb515e88b6265cf7017b726&chksm=9f8337d4a8f4bec27c755148f904668a46c85322e6ba31bc6b0286887bbb30f8a24a49d2d918&mpshare=1&scene=23&srcid=1107jzAJDvx0GcmHJO92iIcA&sharer_shareinfo=3dc56c7e8fdbf855398ab67e34dfb5c7&sharer_shareinfo_first=3dc56c7e8fdbf855398ab67e34dfb5c7#rd)

### vLLM

- [Explaining the Code of the vLLM Inference Engine](https://medium.com/@crclq2018/explaining-the-source-code-behind-the-vllm-fast-inference-engine-91429f54d1f7)
- [vLLM框架原理——PagedAttention](https://zhuanlan.zhihu.com/p/649537608) (see the block-table sketch below)
- [大模型推理框架 vLLM 源码解析(一):框架概览](https://zhuanlan.zhihu.com/p/681402162)
- [LLM推理2:vLLM源码学习](https://zhuanlan.zhihu.com/p/643336063)
- [Deploying Multiple Large Language Models with NVIDIA Triton Server and vLLM](https://awslabs.github.io/data-on-eks/docs/gen-ai/inference/vLLM-NVIDIATritonServer-Llama2)
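The PagedAttention article above centers on one data structure: a per-sequence block table that maps logical KV-cache positions to fixed-size physical blocks, so cache memory is allocated on demand and returned block by block. A stripped-down sketch of just that bookkeeping (the block size, class, and method names here are invented for illustration and are not vLLM's real interfaces):

```python
BLOCK = 16  # tokens per physical KV block (illustrative value)

class PagedKVAllocator:
    """Toy block-table bookkeeping in the spirit of PagedAttention."""

    def __init__(self, num_blocks: int):
        self.free = list(range(num_blocks))   # pool of physical block IDs
        self.tables = {}                      # seq_id -> list of block IDs
        self.lengths = {}                     # seq_id -> tokens written so far

    def append_token(self, seq_id: int) -> tuple[int, int]:
        """Reserve cache space for one new token; returns (block_id, offset)."""
        n = self.lengths.get(seq_id, 0)
        table = self.tables.setdefault(seq_id, [])
        if n % BLOCK == 0:                    # current block full: grab a new one
            if not self.free:
                raise MemoryError("KV cache exhausted; a sequence must be preempted")
            table.append(self.free.pop())
        self.lengths[seq_id] = n + 1
        return table[n // BLOCK], n % BLOCK

    def free_seq(self, seq_id: int) -> None:
        """Return all of a finished sequence's blocks to the pool."""
        self.free.extend(self.tables.pop(seq_id, []))
        self.lengths.pop(seq_id, None)

alloc = PagedKVAllocator(num_blocks=8)
for _ in range(20):
    slot = alloc.append_token(seq_id=0)   # 20 tokens use 2 of the 8 blocks
print(alloc.tables[0], slot)
alloc.free_seq(0)
```

Because a sequence's worst-case waste is less than one block instead of a whole pre-reserved maximum-length buffer, far more sequences fit in the same GPU memory, which is the source of vLLM's throughput gains.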
--------------------------------------------------------------------------------
/llm-train.md:
--------------------------------------------------------------------------------
- Megatron-LM源码系列(一):模型并行初始化: https://www.mltalks.com/posts/2207923116/
- Megatron-LM源码系列(六):Distributed-Optimizer分布式优化器实现Part1: https://www.mltalks.com/posts/3749485165/
- 深入理解 Megatron-LM(1)基础知识:https://zhuanlan.zhihu.com/p/650234985
--------------------------------------------------------------------------------
/llmops.md:
--------------------------------------------------------------------------------
# LLMOps

LLMOps (Large Language Model Operations) is the set of practices and processes that covers developing, deploying, maintaining, and optimizing large language models such as the GPT series. Its goal is to build and run real applications on top of these powerful AI models efficiently, scalably, and securely, spanning model training, deployment, monitoring, updating, security, and compliance.

- [tensorchord/Awesome-LLMOps](https://github.com/tensorchord/Awesome-LLMOps)
- [MicroSoft LMOps](https://github.com/microsoft/LMOps)
--------------------------------------------------------------------------------
/xxx.md:
--------------------------------------------------------------------------------
GPU server market share, descending: Inspur, H3C, Nettrix...

LLM training form factors:
- OAM modules
- PCIe

---

R5330-G7 -- Hygon 4th-generation CPU

---

AI computing center: heterogeneous compute, massive storage (distributed parallel file system), lossless networking, proactive security

---

GPFS-NFS

Unified management of heterogeneous accelerators -- Iluvatar CoreX (天数智芯) + NVIDIA mixed training -- requires substantial changes

Iluvatar's software stack tracks CUDA closely

UCCL: a unified communication library across heterogeneous GPUs

Compute power and GPU memory differ between the cards; the network is kept identical

The solution required custom changes to NCCL, PyTorch, and Megatron.

So far this has only been done with H800 (PCIe 5.0) + 天垓150 (PCIe 4.0) over RoCE v2, and it is still quite immature. Compute and memory capacities don't match, the intra-server card-to-card interconnects differ by a generation, and the model-parallel partitioning has to be done entirely by hand. It therefore leans heavily on the algorithm engineer's familiarity with each accelerator's compute, memory, and network bandwidth, and on their ability to modify the distributed training framework. Only a PCIe variant exists today; at best it supports TP=4, beyond which tensor-parallel traffic quickly becomes the network bottleneck, so it is not recommended for models larger than ~50B parameters.

---

Bank AI compute buildout:

- NVIDIA GPU cards:
- domestic accelerator cards:
- NVIDIA and domestic cards side by side:

---

Compute resource pool planning

Zones: production, dev/test, staging, ops management, internet access, WAN access, external partner

---

In descending order: AI-compute business zone, general-compute business zone

- parameter network
- high-speed storage network
- AI-compute business network
- general-compute business network

---

Network planning

Parameter network requirements: with PCIe 4.0, mainly 100G/200G; with PCIe 5.0, mainly 200G/400G

Storage network requirements: mainly 100G/200G

IB / RoCE v2

---

Storage planning

- read path: OPS-bound
- write path: bandwidth-bound
- capacity footprint

Overall: a 100:1 all-flash high-performance tier plus a 100:3 high-capacity hybrid-flash tier is recommended (a rough write-bandwidth estimate follows below)

Huawei, GPFS / DDM -- universities?
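On "write path: bandwidth-bound": periodic checkpointing usually dominates write traffic during training. A back-of-the-envelope sizing sketch; every number here is a hypothetical example (the 14-bytes-per-parameter figure assumes a mixed-precision Adam checkpoint, and the model size and flush window are made up for illustration):

```python
# Rough checkpoint write-bandwidth sizing; all inputs are hypothetical examples.
n_params = 50e9        # 50B-parameter model (the upper bound mentioned above)
bytes_per_param = 14   # fp16 weights (2) + fp32 master weights (4) + Adam m/v (4+4)
ckpt_bytes = n_params * bytes_per_param          # = 700 GB per full checkpoint

window_s = 300         # target: flush each checkpoint within 5 minutes
gb_per_s = ckpt_bytes / window_s / 1e9
print(f"checkpoint ~= {ckpt_bytes/1e9:.0f} GB -> sustained write ~= {gb_per_s:.1f} GB/s")
# ~2.3 GB/s sustained, before parallel-write striping or redundancy overhead.
```

Estimates like this are what motivate putting a small all-flash tier in front of a larger hybrid-capacity tier: the flash tier absorbs the bursty checkpoint writes, the capacity tier holds datasets and retained checkpoints.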
-------

DCGM (NVIDIA Data Center GPU Manager)

----

RoCE monitoring

IB NICs + RoCE switches + Iluvatar CoreX GPUs

Single node: 23 GB/s

Multi-node: 19 GB/s

---

Mixed-training summary

H800 PCIe + 天垓150

Code changes: PyTorch, communication library, Megatron

----

RoCE fabric design: network load balancing

Parameter network / storage network / business network / out-of-band management network

-

Single-rail fabric (Ascend) vs. multi-rail fabric (NVIDIA)

-

H3C recommended fabrics: single-chassis / multi-chassis

- box-box (fixed-form switches): preferred
- chassis-box
- DDC

--

Two-tier box-box

Three-tier box-box: 1:3 oversubscription or 1:1 oversubscription

Two-tier chassis-box

DDC

--

Network load balancing

Static per-flow scheduling -> enhanced per-flow scheduling -> per-packet scheduling -> cell-based scheduling

Load balancing can be addressed at the endpoints or in the network (per flow), coordinated by a controller

Static ECMP

Path navigation

Dynamic NSLB

LBN + Ascend, single-rail, 512 cards (ceiling not high, but floor not low)

DDC: ceiling and floor both stay high; up to 3,000 cards; high cost

FGLB

DLB: Eligible Flow-Set sub-flows

Dynamic per-packet SprayLink

Cell-based schemes

----

H3C data-center switch silicon status

- merchant silicon
- domestic silicon (Centec 盛科)

Parameter-network chips vs. storage-network chips

H3C 200G/400G fixed-form switches (bidirectional bps)

H3C 200G/400G chassis switches
--------------------------------------------------------------------------------