├── .gitignore
├── LICENSE
├── README.md
├── README_en.md
├── assets
    └── loss.png
├── configs
    ├── 10w_vocab_wudao5_pile10.model
    ├── 6w_vocab_wudao5_pile10.model
    ├── default_config.yaml
    ├── llama_tokenizer.model
    └── train_config.py
├── data
    ├── download_the_pile.sh
    ├── download_wudao.sh
    ├── preprocess_the_pile.py
    └── preprocess_wudao.py
├── dataset
    ├── data_iter.py
    ├── pretrain_dataset.py
    ├── tokenizer.py
    ├── train_tokenizer.py
    └── validation.py
├── models
    └── llama.py
├── pretrain_llama.py
└── requirements.txt

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/.gitignore
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/README.md
--------------------------------------------------------------------------------
/README_en.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/README_en.md
--------------------------------------------------------------------------------
/assets/loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/assets/loss.png
--------------------------------------------------------------------------------
/configs/10w_vocab_wudao5_pile10.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/configs/10w_vocab_wudao5_pile10.model
--------------------------------------------------------------------------------
/configs/6w_vocab_wudao5_pile10.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/configs/6w_vocab_wudao5_pile10.model
--------------------------------------------------------------------------------
/configs/default_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/configs/default_config.yaml
--------------------------------------------------------------------------------
/configs/llama_tokenizer.model:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/configs/llama_tokenizer.model
--------------------------------------------------------------------------------
/configs/train_config.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/configs/train_config.py
--------------------------------------------------------------------------------
/data/download_the_pile.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/data/download_the_pile.sh
--------------------------------------------------------------------------------
/data/download_wudao.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/data/download_wudao.sh
--------------------------------------------------------------------------------
/data/preprocess_the_pile.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/data/preprocess_the_pile.py
--------------------------------------------------------------------------------
/data/preprocess_wudao.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/data/preprocess_wudao.py
--------------------------------------------------------------------------------
/dataset/data_iter.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/dataset/data_iter.py
--------------------------------------------------------------------------------
/dataset/pretrain_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/dataset/pretrain_dataset.py
--------------------------------------------------------------------------------
/dataset/tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/dataset/tokenizer.py
--------------------------------------------------------------------------------
/dataset/train_tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/dataset/train_tokenizer.py
--------------------------------------------------------------------------------
/dataset/validation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/dataset/validation.py
--------------------------------------------------------------------------------
/models/llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/models/llama.py
--------------------------------------------------------------------------------
/pretrain_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/pretrain_llama.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/beichao1314/Open-Llama/HEAD/requirements.txt
--------------------------------------------------------------------------------