├── 20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf
├── DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf
└── README.md

/20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chenweiphd/DeepSeek-MoE-ResourceMap/9d8245615fb49843c516e3fa3367f214b90f67d4/20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf
--------------------------------------------------------------------------------

/DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chenweiphd/DeepSeek-MoE-ResourceMap/9d8245615fb49843c516e3fa3367f214b90f67d4/DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# DeepSeek-MoE-ResourceMap

![image](https://github.com/user-attachments/assets/63e68241-e4c9-49a8-a608-52dd05324581)

## 1 Introduction

This is a repo about DeepSeek model architecture, training, deployment, and distillation.

## 2 How to Use DeepSeek

## 3 DeepSeek and MoE Models Explained

### 3.1 Architecture

| Classification | Title | Link |
| -------------- | ----- | ---- |
| Summary | DeepSeek V3-R1 Deep Explain (1), v1.31 | https://zhuanlan.zhihu.com/p/21208287743 |
| Summary | DeepSeek V3-R1 Deep Explain (2), v1.31 | https://zhuanlan.zhihu.com/p/21755758234 |
| Summary Slides | DeepSeek V3-R1 Deep Analysis and Deep Think-1.31 | https://github.com/chenweiphd/DeepSeek-MoE-ResourceMap/blob/main/20250211-WeiChen-DeepSeek%20V3-R1%20Architecture%20Deep%20Analysis%20and%20Deep%20Think-1.31-v1.31.pdf |
| MoE | MoE Explained | https://zhuanlan.zhihu.com/p/674698482 |
| DeepSeekMoE | DeepSeek MoE: From Switch Transformers to DeepSeek v1/v2/v3 | https://zhuanlan.zhihu.com/p/21584562624 |
| MLA | DeepSeek-V2 MLA KV Cache | https://zhuanlan.zhihu.com/p/714761319 |
| MLA | DeepSeek-V2 Multi-head Latent Attention | https://zhuanlan.zhihu.com/p/714686419 |
| MTP | DeepSeek V3: Multi-token Prediction | https://zhuanlan.zhihu.com/p/16540775238 |
| MTP | Multi-Token Prediction | https://zhuanlan.zhihu.com/p/15823898951 |

### 3.2 Training

| Classification | Title | Link |
| -------------- | ----- | ---- |
| GRPO | GRPO: Group Relative Policy Optimization | https://zhuanlan.zhihu.com/p/20021693569 |
| Rejection Sampling | LLM Rejection Sampling | https://zhuanlan.zhihu.com/p/4547529049 |
| Rejection Sampling | LLM Training Trick (1): Rejection Sampling | https://zhuanlan.zhihu.com/p/649731916 |
| Fill in the Middle | Efficient Training of Language Models to Fill in the Middle | https://huggingface.co/papers/2207.14255 |
| Fill in the Middle | OpenAI Presents a Simple and Efficient Training Strategy to Boost Language Models' Text-Infilling Capabilities | https://syncedreview.com/2022/08/04/openai-presents-a-simple-and-efficient-training-strategy-to-boost-language-models-text-infilling-capabilities/ |
| Fill in the Middle | "Where's the Beef", Codestral's Fill-In-the-Middle Magic | https://auro-227.medium.com/wheres-the-beef-codestral-s-fill-in-the-middle-magic-c0804094a424 |

### 3.3 DeepSeek Related Papers

| Item | Title | Link |
| ---- | ----- | ---- |
| R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | https://arxiv.org/abs/2501.12948 |
| V3 | DeepSeek-V3 Technical Report | https://arxiv.org/abs/2412.19437 |
| V2 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | https://arxiv.org/abs/2405.04434 |
| Coder-V2 | DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | https://arxiv.org/abs/2406.11931 |
| HAI Platform (not a model) | Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning | https://arxiv.org/html/2408.14158v2 |
| Fill in the Middle | Efficient Training of Language Models to Fill in the Middle | https://arxiv.org/pdf/2207.14255 |

## 4 DeepSeek Deploy

### 4.1 Memory Requirements (from Xinference)

| Model | # Parameters | Data Type | Memory Requirement |
| ----- | ------------ | --------- | ------------------ |
| R1 | 685B | FP8 | ≥890 GB |
| R1 | 685B | INT4 | ≥450 GB |
| V3 | 671B | FP8 | ≥870 GB |
| V3 | 671B | INT4 | ≥440 GB |
| R1-Distill-Llama-70B | 70B | BF16 | ≥180 GB |
| R1-Distill-Qwen-32B | 32B | BF16 | ≥80 GB |
| R1-Distill-Qwen-14B | 14B | BF16 | ≥40 GB |
| R1-Distill-Llama-8B | 8B | BF16 | ≥22 GB |
| R1-Distill-Qwen-7B | 7B | BF16 | ≥20 GB |
| R1-Distill-Qwen-1.5B | 1.5B | BF16 | ≥5 GB |

### 4.2 Deploy

| Item | Title | Link |
| ---- | ----- | ---- |
| Ollama | DeepSeek Simple Local Deploy | https://zhuanlan.zhihu.com/p/21311556084 |
| SiliconFlow | DeepSeek Simple Local API | https://zhuanlan.zhihu.com/p/21170425824 |
| Ollama + Dynamic Quantization | DeepSeek-R1 671B Simple Local Deploy | https://zhuanlan.zhihu.com/p/21856074485 |

## 5 Distill

| Item | Title | Link |
| ---- | ----- | ---- |
| Custom Distill | Train your own R1 reasoning model with Unsloth | https://unsloth.ai/blog/r1-reasoning |

## 6 Comments about DeepSeek

| Author | Title | Link |
| ------ | ----- | ---- |
| Dario Amodei | On DeepSeek and Export Controls | https://darioamodei.com/on-deepseek-and-export-controls |

## 7 DeepSeek Model Links

| Item | Link |
| ---- | ---- |
| R1 | https://github.com/deepseek-ai/DeepSeek-R1 |
| V3 | https://github.com/deepseek-ai/DeepSeek-V3 |
| V2 | https://github.com/deepseek-ai/DeepSeek-V2 |
| Coder-V2 | https://github.com/deepseek-ai/DeepSeek-Coder-V2 |
| MoE | https://github.com/deepseek-ai/DeepSeek-MoE |
| HAI Platform (not a model) | https://github.com/HFAiLab/hai-platform |

## Discussion Group

| DeepSeek and MoE Discussion Group | ![MoEgroup](https://github.com/user-attachments/assets/78a2fe0b-6a9b-4389-842f-24fd4e1d5119) |
| --------------------------------- | -------------------------------------------------------------------------------------------- |
| If the group is full or the QR code has expired, please add the group assistant to join the corresponding discussion group. When adding, please note: name - affiliation - group to join. | ![image](https://github.com/user-attachments/assets/6b757bea-a656-4aac-8e5d-83525130db20) |
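The memory figures in Section 4.1 roughly track parameter count × bytes per parameter, plus runtime headroom for the KV cache, activations, and fragmentation. A minimal back-of-envelope sketch of that relationship, assuming a ~1.3× overhead factor (my approximation for illustration, not Xinference's actual formula):

```python
# Rough serving-memory estimate: weights = params * bytes-per-param,
# then a headroom multiplier for KV cache / activations / fragmentation.
# The 1.3x overhead factor is an assumed illustrative value.

BYTES_PER_PARAM = {"FP16": 2.0, "BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

def estimate_memory_gb(num_params_billion: float, dtype: str,
                       overhead: float = 1.3) -> float:
    """Back-of-envelope serving-memory estimate in GB.

    Assumes all weights (for MoE, all experts) are resident in memory;
    1B params at 1 byte/param is approximately 1 GB of weights.
    """
    weight_gb = num_params_billion * BYTES_PER_PARAM[dtype]
    return weight_gb * overhead

if __name__ == "__main__":
    for name, params, dtype in [
        ("V3", 671, "FP8"),
        ("V3", 671, "INT4"),
        ("R1-Distill-Llama-70B", 70, "BF16"),
    ]:
        print(f"{name} ({dtype}): ~{estimate_memory_gb(params, dtype):.0f} GB")
```

With these assumptions the estimates land near the table's values (e.g. ~872 GB for V3 in FP8 vs. the listed ≥870 GB); the published figures are minimums, so real deployments need at least that much plus room for longer contexts.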
--------------------------------------------------------------------------------