├── 1_data
│   ├── deduplicate.md
│   ├── deepseek_codev2_math.md
│   ├── download.md
│   └── preprocess.md
├── 2_training
│   ├── algo
│   │   ├── 100b_moe_hyper_param.md
│   │   ├── long_context
│   │   │   ├── attention_formula.png
│   │   │   ├── theta_formula.png
│   │   │   ├── theta_trends.png
│   │   │   ├── 大模型长文训练(一)位置编码基础理论.md
│   │   │   ├── 大模型长文训练(三)YaRN代码详解.md
│   │   │   └── 大模型长文训练(二)长度外推.md
│   │   ├── moe_algo.md
│   │   ├── moe_pruning.md
│   │   ├── nsa
│   │   │   ├── Native_Sparse_Attention(一)图解.md
│   │   │   ├── compress_attention.png
│   │   │   ├── normal_attention.png
│   │   │   ├── nsa_total.png
│   │   │   ├── sliding_window_attention.png
│   │   │   └── top_n_attention.png
│   │   ├── post_train.md
│   │   ├── ppo.md
│   │   └── reward_rule.md
│   └── infra
│       ├── 1f1b.png
│       ├── flash_attn
│       │   ├── Flash Attention v2(一).md
│       │   ├── Flash Attention v3(一) .md
│       │   ├── flash_attn_v1_pic0.png
│       │   ├── flash_attn_v1_pic1.png
│       │   ├── flash_attn_v1_pic2.png
│       │   ├── flash_attn_v1_pic3.png
│       │   ├── flash_attn_v1_pic4.png
│       │   └── 五张图片看懂Flash Attention v1(一).md
│       ├── megatron
│       │   ├── Megatron-LM(一)代码结构分析.md
│       │   ├── Megatron-LM(三)代码调试指南.md
│       │   └── Megatron-LM(二)代码运行流程.md
│       ├── megatron_detail.md
│       └── ring_attn
│           ├── ring_attn_pic0.png
│           ├── ring_attn_pic1.png
│           ├── ring_attn_pic2.png
│           ├── ring_attn_pic3.png
│           └── ring_attn(一).md
├── 4_evaluation
│   └── README.md
├── 5_application
│   └── README.md
├── LICENSE
└── README.md

/1_data/deduplicate.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/1_data/deduplicate.md
--------------------------------------------------------------------------------
/1_data/deepseek_codev2_math.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/1_data/deepseek_codev2_math.md
--------------------------------------------------------------------------------
/1_data/download.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/1_data/download.md
--------------------------------------------------------------------------------
/1_data/preprocess.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/1_data/preprocess.md
--------------------------------------------------------------------------------
/2_training/algo/100b_moe_hyper_param.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/100b_moe_hyper_param.md
--------------------------------------------------------------------------------
/2_training/algo/long_context/attention_formula.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/attention_formula.png
--------------------------------------------------------------------------------
/2_training/algo/long_context/theta_formula.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/theta_formula.png
--------------------------------------------------------------------------------
/2_training/algo/long_context/theta_trends.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/theta_trends.png
--------------------------------------------------------------------------------
/2_training/algo/long_context/大模型长文训练(一)位置编码基础理论.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/大模型长文训练(一)位置编码基础理论.md
--------------------------------------------------------------------------------
/2_training/algo/long_context/大模型长文训练(三)YaRN代码详解.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/大模型长文训练(三)YaRN代码详解.md
--------------------------------------------------------------------------------
/2_training/algo/long_context/大模型长文训练(二)长度外推.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/long_context/大模型长文训练(二)长度外推.md
--------------------------------------------------------------------------------
/2_training/algo/moe_algo.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/moe_algo.md
--------------------------------------------------------------------------------
/2_training/algo/moe_pruning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/moe_pruning.md
--------------------------------------------------------------------------------
/2_training/algo/nsa/Native_Sparse_Attention(一)图解.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/Native_Sparse_Attention(一)图解.md
--------------------------------------------------------------------------------
/2_training/algo/nsa/compress_attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/compress_attention.png
--------------------------------------------------------------------------------
/2_training/algo/nsa/normal_attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/normal_attention.png
--------------------------------------------------------------------------------
/2_training/algo/nsa/nsa_total.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/nsa_total.png
--------------------------------------------------------------------------------
/2_training/algo/nsa/sliding_window_attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/sliding_window_attention.png
--------------------------------------------------------------------------------
/2_training/algo/nsa/top_n_attention.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/nsa/top_n_attention.png
--------------------------------------------------------------------------------
/2_training/algo/post_train.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/post_train.md
--------------------------------------------------------------------------------
/2_training/algo/ppo.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/ppo.md
--------------------------------------------------------------------------------
/2_training/algo/reward_rule.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/algo/reward_rule.md
--------------------------------------------------------------------------------
/2_training/infra/1f1b.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/1f1b.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/Flash Attention v2(一).md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/Flash Attention v2(一).md
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/Flash Attention v3(一) .md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/Flash Attention v3(一) .md
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/flash_attn_v1_pic0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/flash_attn_v1_pic0.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/flash_attn_v1_pic1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/flash_attn_v1_pic1.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/flash_attn_v1_pic2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/flash_attn_v1_pic2.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/flash_attn_v1_pic3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/flash_attn_v1_pic3.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/flash_attn_v1_pic4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/flash_attn_v1_pic4.png
--------------------------------------------------------------------------------
/2_training/infra/flash_attn/五张图片看懂Flash Attention v1(一).md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/flash_attn/五张图片看懂Flash Attention v1(一).md
--------------------------------------------------------------------------------
/2_training/infra/megatron/Megatron-LM(一)代码结构分析.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/megatron/Megatron-LM(一)代码结构分析.md
--------------------------------------------------------------------------------
/2_training/infra/megatron/Megatron-LM(三)代码调试指南.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/megatron/Megatron-LM(三)代码调试指南.md
--------------------------------------------------------------------------------
/2_training/infra/megatron/Megatron-LM(二)代码运行流程.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/megatron/Megatron-LM(二)代码运行流程.md
--------------------------------------------------------------------------------
/2_training/infra/megatron_detail.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/megatron_detail.md
--------------------------------------------------------------------------------
/2_training/infra/ring_attn/ring_attn_pic0.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/ring_attn/ring_attn_pic0.png
--------------------------------------------------------------------------------
/2_training/infra/ring_attn/ring_attn_pic1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/ring_attn/ring_attn_pic1.png
--------------------------------------------------------------------------------
/2_training/infra/ring_attn/ring_attn_pic2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/ring_attn/ring_attn_pic2.png
--------------------------------------------------------------------------------
/2_training/infra/ring_attn/ring_attn_pic3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/ring_attn/ring_attn_pic3.png
--------------------------------------------------------------------------------
/2_training/infra/ring_attn/ring_attn(一).md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/2_training/infra/ring_attn/ring_attn(一).md
--------------------------------------------------------------------------------
/4_evaluation/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/4_evaluation/README.md
--------------------------------------------------------------------------------
/5_application/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/5_application/README.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/monster119120/blog_github/HEAD/README.md
--------------------------------------------------------------------------------