├── .gitignore
├── 0_tokenizer.ipynb
├── 1_model.ipynb
├── 2_dataset.ipynb
├── 3_pretrain.ipynb
├── 4_sft.ipynb
├── 5_dpo.ipynb
├── 6_lora.ipynb
├── 7_distill.ipynb
├── 8_reason.ipynb
├── LICENSE
├── README.md
├── images
    ├── LLM-structure-moe.png
    ├── LLM-structure.png
    ├── gqa.png
    └── lora.png
├── model
    ├── LMConfig.py
    ├── dataset.py
    ├── minimind_tokenizer
    │   ├── merges.txt
    │   ├── tokenizer.json
    │   ├── tokenizer_config.json
    │   └── vocab.json
    └── model.py
├── requirements.txt
└── toydata
    ├── dpo_data.jsonl
    ├── lora_data.jsonl
    ├── pretrain_data.jsonl
    ├── r1_data.jsonl
    ├── sft_data.jsonl
    └── tokenizer_data.jsonl


/.gitignore:
--------------------------------------------------------------------------------
**/.ipynb_checkpoints

**/__pycache__

--------------------------------------------------------------------------------
/0_tokenizer.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/0_tokenizer.ipynb

--------------------------------------------------------------------------------
/1_model.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/1_model.ipynb

--------------------------------------------------------------------------------
/2_dataset.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/2_dataset.ipynb

--------------------------------------------------------------------------------
/3_pretrain.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/3_pretrain.ipynb

--------------------------------------------------------------------------------
/4_sft.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/4_sft.ipynb

--------------------------------------------------------------------------------
/5_dpo.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/5_dpo.ipynb

--------------------------------------------------------------------------------
/6_lora.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/6_lora.ipynb

--------------------------------------------------------------------------------
/7_distill.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/7_distill.ipynb

--------------------------------------------------------------------------------
/8_reason.ipynb:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/8_reason.ipynb

--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/LICENSE

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/README.md

--------------------------------------------------------------------------------
/images/LLM-structure-moe.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/images/LLM-structure-moe.png

--------------------------------------------------------------------------------
/images/LLM-structure.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/images/LLM-structure.png

--------------------------------------------------------------------------------
/images/gqa.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/images/gqa.png

--------------------------------------------------------------------------------
/images/lora.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/images/lora.png

--------------------------------------------------------------------------------
/model/LMConfig.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/LMConfig.py

--------------------------------------------------------------------------------
/model/dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/dataset.py

--------------------------------------------------------------------------------
/model/minimind_tokenizer/merges.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/minimind_tokenizer/merges.txt

--------------------------------------------------------------------------------
/model/minimind_tokenizer/tokenizer.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/minimind_tokenizer/tokenizer.json

--------------------------------------------------------------------------------
/model/minimind_tokenizer/tokenizer_config.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/minimind_tokenizer/tokenizer_config.json

--------------------------------------------------------------------------------
/model/minimind_tokenizer/vocab.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/minimind_tokenizer/vocab.json

--------------------------------------------------------------------------------
/model/model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/model/model.py

--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/requirements.txt

--------------------------------------------------------------------------------
/toydata/dpo_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/dpo_data.jsonl

--------------------------------------------------------------------------------
/toydata/lora_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/lora_data.jsonl

--------------------------------------------------------------------------------
/toydata/pretrain_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/pretrain_data.jsonl

--------------------------------------------------------------------------------
/toydata/r1_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/r1_data.jsonl

--------------------------------------------------------------------------------
/toydata/sft_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/sft_data.jsonl

--------------------------------------------------------------------------------
/toydata/tokenizer_data.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Nijikadesu/breakdown-minimind/HEAD/toydata/tokenizer_data.jsonl

--------------------------------------------------------------------------------