├── .gitignore
├── AGENTS.md
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE
├── README.md
├── assets
    ├── datawhale.png
    ├── figure.png
    ├── logo.png
    └── nanochat.png
├── code
    └── .gitkeep
└── docs
    ├── .nojekyll
    ├── Appendix
        ├── README.md
        └── report-RTX5090.md
    ├── Chapter01
        └── Introduction.md
    ├── Chapter02
        ├── Environment.md
        └── scripts
        │   ├── Linux-Tier-1.sh
        │   ├── Linux-Tier-2.sh
        │   ├── Linux-Tier-3.sh
        │   ├── Linux-Tier-4.sh
        │   ├── Linux-Tier-Free.sh
        │   ├── Windows-Tier-1.bat
        │   ├── Windows-Tier-Free.bat
        │   └── macOS-Tier-Free.sh
    ├── Chapter03
        ├── Data.md
        ├── DataProcessingIpynb.ipynb
        └── data_check
        │   ├── __init__.py
        │   ├── check_char_distribution.py
        │   ├── check_content_quality.py
        │   ├── check_data.py
        │   ├── check_length_distribution.py
        │   └── convert_custom_data.py
    ├── Chapter04
        └── Tokenization.md
    ├── Chapter05
        └── Architecture.md
    ├── Chapter06
        └── Pretrain.md
    ├── Chapter07
        └── Midtrain.md
    ├── Chapter08
        ├── SFT.md
        └── images
        │   ├── 1-显存.png
        │   ├── 2-sft-train.png
        │   ├── 3-midtrain-eval.png
        │   ├── 4-sft-eval.png
        │   ├── 5-pretrain-eval.png
        │   ├── 6-tokenizer-train.png
        │   └── 7-tokenizer-eval.png
    ├── Chapter09
        ├── RL.md
        └── images
        │   ├── image_1.png
        │   ├── image_2.png
        │   ├── image_3.png
        │   └── image_4.png
    ├── Chapter10
        └── Inference.md
    ├── Chapter11
        └── Evaluation.md
    ├── Chapter12
        └── Safety.md
    ├── README.md
    ├── _sidebar.md
    ├── index.html
    └── logo.png

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/.gitignore
--------------------------------------------------------------------------------
/AGENTS.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/AGENTS.md
--------------------------------------------------------------------------------
/CODE_OF_CONDUCT.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/CODE_OF_CONDUCT.md
--------------------------------------------------------------------------------
/CONTRIBUTING.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/CONTRIBUTING.md
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/README.md
--------------------------------------------------------------------------------
/assets/datawhale.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/assets/datawhale.png
--------------------------------------------------------------------------------
/assets/figure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/assets/figure.png
--------------------------------------------------------------------------------
/assets/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/assets/logo.png
--------------------------------------------------------------------------------
/assets/nanochat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/assets/nanochat.png
--------------------------------------------------------------------------------
/code/.gitkeep:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/.nojekyll:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Appendix/README.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Appendix/report-RTX5090.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Appendix/report-RTX5090.md
--------------------------------------------------------------------------------
/docs/Chapter01/Introduction.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter01/Introduction.md
--------------------------------------------------------------------------------
/docs/Chapter02/Environment.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/Environment.md
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Linux-Tier-1.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Linux-Tier-1.sh
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Linux-Tier-2.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Linux-Tier-2.sh
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Linux-Tier-3.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Linux-Tier-3.sh
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Linux-Tier-4.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Linux-Tier-4.sh
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Linux-Tier-Free.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Linux-Tier-Free.sh
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Windows-Tier-1.bat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Windows-Tier-1.bat
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/Windows-Tier-Free.bat:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/Windows-Tier-Free.bat
--------------------------------------------------------------------------------
/docs/Chapter02/scripts/macOS-Tier-Free.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter02/scripts/macOS-Tier-Free.sh
--------------------------------------------------------------------------------
/docs/Chapter03/Data.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/Data.md
--------------------------------------------------------------------------------
/docs/Chapter03/DataProcessingIpynb.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/DataProcessingIpynb.ipynb
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/__init__.py
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/check_char_distribution.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/check_char_distribution.py
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/check_content_quality.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/check_content_quality.py
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/check_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/check_data.py
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/check_length_distribution.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/check_length_distribution.py
--------------------------------------------------------------------------------
/docs/Chapter03/data_check/convert_custom_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter03/data_check/convert_custom_data.py
--------------------------------------------------------------------------------
/docs/Chapter04/Tokenization.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Chapter05/Architecture.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter05/Architecture.md
--------------------------------------------------------------------------------
/docs/Chapter06/Pretrain.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Chapter07/Midtrain.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Chapter08/SFT.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/SFT.md
--------------------------------------------------------------------------------
/docs/Chapter08/images/1-显存.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/1-显存.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/2-sft-train.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/2-sft-train.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/3-midtrain-eval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/3-midtrain-eval.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/4-sft-eval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/4-sft-eval.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/5-pretrain-eval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/5-pretrain-eval.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/6-tokenizer-train.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/6-tokenizer-train.png
--------------------------------------------------------------------------------
/docs/Chapter08/images/7-tokenizer-eval.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter08/images/7-tokenizer-eval.png
--------------------------------------------------------------------------------
/docs/Chapter09/RL.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter09/RL.md
--------------------------------------------------------------------------------
/docs/Chapter09/images/image_1.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter09/images/image_1.png
--------------------------------------------------------------------------------
/docs/Chapter09/images/image_2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter09/images/image_2.png
--------------------------------------------------------------------------------
/docs/Chapter09/images/image_3.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter09/images/image_3.png
--------------------------------------------------------------------------------
/docs/Chapter09/images/image_4.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/Chapter09/images/image_4.png
--------------------------------------------------------------------------------
/docs/Chapter10/Inference.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Chapter11/Evaluation.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/Chapter12/Safety.md:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/docs/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/README.md
--------------------------------------------------------------------------------
/docs/_sidebar.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/_sidebar.md
--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/index.html
--------------------------------------------------------------------------------
/docs/logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/datawhalechina/code-your-own-llm/HEAD/docs/logo.png
--------------------------------------------------------------------------------