├── LICENSE
├── README.md
├── README_zh.md
├── assets
    ├── llama-3-syne-logo.png
    └── pipeline.png
└── src
    ├── cal_data_volumn.py
    ├── config
        └── ds2_config.json
    ├── fetch_data.py
    ├── save_hf_dataset.py
    ├── scripts
        ├── launch.sh
        ├── run_cal_data_volumn.sh
        ├── run_fetch_data.sh
        ├── run_save_hf_dataset.sh
        ├── run_tokenize_split.sh
        └── run_train_multi.sh
    ├── split_data.py
    ├── tokenize_text.py
    ├── train.py
    └── train_utils.py

/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/README.md
--------------------------------------------------------------------------------
/README_zh.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/README_zh.md
--------------------------------------------------------------------------------
/assets/llama-3-syne-logo.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/assets/llama-3-syne-logo.png
--------------------------------------------------------------------------------
/assets/pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/assets/pipeline.png
--------------------------------------------------------------------------------
/src/cal_data_volumn.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/cal_data_volumn.py
--------------------------------------------------------------------------------
/src/config/ds2_config.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/config/ds2_config.json
--------------------------------------------------------------------------------
/src/fetch_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/fetch_data.py
--------------------------------------------------------------------------------
/src/save_hf_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/save_hf_dataset.py
--------------------------------------------------------------------------------
/src/scripts/launch.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/scripts/launch.sh
--------------------------------------------------------------------------------
/src/scripts/run_cal_data_volumn.sh:
--------------------------------------------------------------------------------
#!/bin/bash

python cal_data_volumn.py
--------------------------------------------------------------------------------
/src/scripts/run_fetch_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/scripts/run_fetch_data.sh
--------------------------------------------------------------------------------
/src/scripts/run_save_hf_dataset.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/scripts/run_save_hf_dataset.sh
--------------------------------------------------------------------------------
/src/scripts/run_tokenize_split.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/scripts/run_tokenize_split.sh
--------------------------------------------------------------------------------
/src/scripts/run_train_multi.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/scripts/run_train_multi.sh
--------------------------------------------------------------------------------
/src/split_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/split_data.py
--------------------------------------------------------------------------------
/src/tokenize_text.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/tokenize_text.py
--------------------------------------------------------------------------------
/src/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/train.py
--------------------------------------------------------------------------------
/src/train_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/RUC-GSAI/Llama-3-SynE/HEAD/src/train_utils.py
--------------------------------------------------------------------------------