├── LICENSE
├── LMtrainer
    ├── data_utils
    │   ├── __init__.py
    │   └── traverse_utils.py
    ├── model
    │   ├── config.json
    │   ├── special_tokens_map.json
    │   ├── tokenizer.json
    │   └── tokenizer_config.json
    ├── mpu
    │   ├── __init__.py
    │   ├── initialize.py
    │   └── utils.py
    └── pretrain.py
├── README.md
├── compress_pythia.py
├── datalist
    ├── datalist1.txt
    └── datalist2.txt
├── fig
    ├── framework.png
    └── loss.png
├── generate_subdataset.py
├── requirements.txt
└── scripts
    ├── pretrain_phase1.sh
    ├── pretrain_phase2.sh
    └── recon_ckpts.sh

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LICENSE
--------------------------------------------------------------------------------
/LMtrainer/data_utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/data_utils/__init__.py
--------------------------------------------------------------------------------
/LMtrainer/data_utils/traverse_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/data_utils/traverse_utils.py
--------------------------------------------------------------------------------
/LMtrainer/model/config.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/model/config.json
--------------------------------------------------------------------------------
/LMtrainer/model/special_tokens_map.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/model/special_tokens_map.json
--------------------------------------------------------------------------------
/LMtrainer/model/tokenizer.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/model/tokenizer.json
--------------------------------------------------------------------------------
/LMtrainer/model/tokenizer_config.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/model/tokenizer_config.json
--------------------------------------------------------------------------------
/LMtrainer/mpu/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/mpu/__init__.py
--------------------------------------------------------------------------------
/LMtrainer/mpu/initialize.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/mpu/initialize.py
--------------------------------------------------------------------------------
/LMtrainer/mpu/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/mpu/utils.py
--------------------------------------------------------------------------------
/LMtrainer/pretrain.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/LMtrainer/pretrain.py
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/README.md
--------------------------------------------------------------------------------
/compress_pythia.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/compress_pythia.py
--------------------------------------------------------------------------------
/datalist/datalist1.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/datalist/datalist1.txt
--------------------------------------------------------------------------------
/datalist/datalist2.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/datalist/datalist2.txt
--------------------------------------------------------------------------------
/fig/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/fig/framework.png
--------------------------------------------------------------------------------
/fig/loss.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/fig/loss.png
--------------------------------------------------------------------------------
/generate_subdataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/generate_subdataset.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/requirements.txt
--------------------------------------------------------------------------------
/scripts/pretrain_phase1.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/scripts/pretrain_phase1.sh
--------------------------------------------------------------------------------
/scripts/pretrain_phase2.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/scripts/pretrain_phase2.sh
--------------------------------------------------------------------------------
/scripts/recon_ckpts.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Gaffey/ExCP/HEAD/scripts/recon_ckpts.sh
--------------------------------------------------------------------------------