├── 20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf
├── DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf
└── README.md

/20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chenweiphd/DeepSeek-MoE-ResourceMap/9d8245615fb49843c516e3fa3367f214b90f67d4/20250211-WeiChen-DeepSeek V3-R1 Architecture Deep Analysis and Deep Think-1.31-v1.31.pdf
--------------------------------------------------------------------------------

/DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/chenweiphd/DeepSeek-MoE-ResourceMap/9d8245615fb49843c516e3fa3367f214b90f67d4/DeepSeek V3-R1 Deep Explain-v1.31-20250207.pdf
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# DeepSeek-MoE-ResourceMap

![image](https://github.com/user-attachments/assets/63e68241-e4c9-49a8-a608-52dd05324581)

## 1 Introduction

This is a repo about DeepSeek model architecture, training, deployment, and distillation.

## 2 How to Use DeepSeek

## 3 DeepSeek and MoE Models Explained

### 3.1 Architecture

| Classification | Title | Link |
| -------------- | ----- | ---- |
| Summary | DeepSeek V3-R1 Deep Explain (1), v1.31 | https://zhuanlan.zhihu.com/p/21208287743 |
| Summary | DeepSeek V3-R1 Deep Explain (2), v1.31 | https://zhuanlan.zhihu.com/p/21755758234 |
| Summary Slides | DeepSeek V3-R1 Deep Analysis and Deep Think-1.31 | https://github.com/chenweiphd/DeepSeek-MoE-ResourceMap/blob/main/20250211-WeiChen-DeepSeek%20V3-R1%20Architecture%20Deep%20Analysis%20and%20Deep%20Think-1.31-v1.31.pdf |
| MoE | MoE Explained | https://zhuanlan.zhihu.com/p/674698482 |
| DeepSeekMoE | DeepSeek MoE: From Switch Transformers to DeepSeek v1/v2/v3 | https://zhuanlan.zhihu.com/p/21584562624 |
| MLA | DeepSeek-V2 MLA KV Cache | https://zhuanlan.zhihu.com/p/714761319 |
| MLA | DeepSeek-V2 Multi-head Latent Attention | https://zhuanlan.zhihu.com/p/714686419 |
| MTP | DeepSeek V3: Multi-token Prediction | https://zhuanlan.zhihu.com/p/16540775238 |
| MTP | Multi-Token Prediction | https://zhuanlan.zhihu.com/p/15823898951 |

### 3.2 Training

| Classification | Title | Link |
| -------------- | ----- | ---- |
| GRPO | GRPO: Group Relative Policy Optimization | https://zhuanlan.zhihu.com/p/20021693569 |
| Rejection Sampling | LLM Rejection Sampling | https://zhuanlan.zhihu.com/p/4547529049 |
| Rejection Sampling | LLM Training Trick (1): Rejection Sampling | https://zhuanlan.zhihu.com/p/649731916 |
| Fill in the Middle | Efficient Training of Language Models to Fill in the Middle | https://huggingface.co/papers/2207.14255 |
| Fill in the Middle | OpenAI Presents a Simple and Efficient Training Strategy to Boost Language Models' Text-Infilling Capabilities | https://syncedreview.com/2022/08/04/openai-presents-a-simple-and-efficient-training-strategy-to-boost-language-models-text-infilling-capabilities/ |
| Fill in the Middle | "Where's the Beef", Codestral's Fill-In-the-Middle Magic | https://auro-227.medium.com/wheres-the-beef-codestral-s-fill-in-the-middle-magic-c0804094a424 |

### 3.3 DeepSeek Related Papers

| Item | Title | Link |
| ---- | ----- | ---- |
| R1 | DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning | https://arxiv.org/abs/2501.12948 |
| V3 | DeepSeek-V3 Technical Report | https://arxiv.org/abs/2412.19437 |
| V2 | DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model | https://arxiv.org/abs/2405.04434 |
| Coder-V2 | DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | https://arxiv.org/abs/2406.11931 |
| HAI Platform (not a model) | Fire-Flyer AI-HPC: A Cost-Effective Software-Hardware Co-Design for Deep Learning | https://arxiv.org/html/2408.14158v2 |
| Fill in the Middle | Efficient Training of Language Models to Fill in the Middle | https://arxiv.org/pdf/2207.14255 |

## 4 DeepSeek Deploy

### 4.1 Memory Requirements (from Xinference)

| Model | # Parameters | Data Type | Memory Requirement |
| ----- | ------------ | --------- | ------------------ |
| R1 | 685B | FP8 | ≥890 GB |
| R1 | 685B | INT4 | ≥450 GB |
| V3 | 671B | FP8 | ≥870 GB |
| V3 | 671B | INT4 | ≥440 GB |
| R1-Distill-Llama-70B | 70B | BF16 | ≥180 GB |
| R1-Distill-Qwen-32B | 32B | BF16 | ≥80 GB |
| R1-Distill-Qwen-14B | 14B | BF16 | ≥40 GB |
| R1-Distill-Llama-8B | 8B | BF16 | ≥22 GB |
| R1-Distill-Qwen-7B | 7B | BF16 | ≥20 GB |
| R1-Distill-Qwen-1.5B | 1.5B | BF16 | ≥5 GB |

### 4.2 Deploy

| Item | Title | Link |
| ---- | ----- | ---- |
| Ollama | DeepSeek Simple Local Deploy | https://zhuanlan.zhihu.com/p/21311556084 |
| SiliconFlow | DeepSeek Simple Local API | https://zhuanlan.zhihu.com/p/21170425824 |
| Ollama + Dynamic Quantization | DeepSeek-R1 671B Simple Local Deploy | https://zhuanlan.zhihu.com/p/21856074485 |

## 5 Distill

| Item | Title | Link |
| ---- | ----- | ---- |
| Custom Distill | Train your own R1 reasoning model with Unsloth | https://unsloth.ai/blog/r1-reasoning |

## 6 Comments about DeepSeek

| Author | Title | Link |
| ------ | ----- | ---- |
| Dario Amodei | On DeepSeek and Export Controls | https://darioamodei.com/on-deepseek-and-export-controls |

## 7 DeepSeek Model Links

| Item | Link |
| ---- | ---- |
| R1 | https://github.com/deepseek-ai/DeepSeek-R1 |
| V3 | https://github.com/deepseek-ai/DeepSeek-V3 |
| V2 | https://github.com/deepseek-ai/DeepSeek-V2 |
| Coder-V2 | https://github.com/deepseek-ai/DeepSeek-Coder-V2 |
| MoE | https://github.com/deepseek-ai/DeepSeek-MoE |
| HAI Platform (not a model) | https://github.com/HFAiLab/hai-platform |

## Discussion Group

| DeepSeek and MoE Discussion Group | ![MoEgroup](https://github.com/user-attachments/assets/78a2fe0b-6a9b-4389-842f-24fd4e1d5119) |
| --------------------------------- | -------------------------------------------------------------------------------------------- |
| If the group is full or the QR code has expired, please add the group assistant to join the corresponding discussion group. When adding, please note: name - affiliation - group to join. | ![image](https://github.com/user-attachments/assets/6b757bea-a656-4aac-8e5d-83525130db20) |
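The memory figures in Section 4.1 roughly track parameter count × bytes per parameter, plus runtime headroom for the KV cache, activations, and fragmentation. A minimal back-of-envelope sketch of that relationship, assuming a ~1.3× overhead factor (my approximation for illustration, not Xinference's actual formula):

```python
# Rough serving-memory estimate: weights = params * bytes-per-param,
# then a headroom multiplier for KV cache / activations / fragmentation.
# The 1.3x overhead factor is an assumed illustrative value.

BYTES_PER_PARAM = {"FP16": 2.0, "BF16": 2.0, "FP8": 1.0, "INT4": 0.5}

def estimate_memory_gb(num_params_billion: float, dtype: str,
                       overhead: float = 1.3) -> float:
    """Back-of-envelope serving-memory estimate in GB.

    Assumes all weights (for MoE, all experts) are resident in memory;
    1B params at 1 byte/param is approximately 1 GB of weights.
    """
    weight_gb = num_params_billion * BYTES_PER_PARAM[dtype]
    return weight_gb * overhead

if __name__ == "__main__":
    for name, params, dtype in [
        ("V3", 671, "FP8"),
        ("V3", 671, "INT4"),
        ("R1-Distill-Llama-70B", 70, "BF16"),
    ]:
        print(f"{name} ({dtype}): ~{estimate_memory_gb(params, dtype):.0f} GB")
```

With these assumptions the estimates land near the table's values (e.g. ~872 GB for V3 in FP8 vs. the listed ≥870 GB); the published figures are minimums, so real deployments need at least that much plus room for longer contexts.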
--------------------------------------------------------------------------------