├── .idea
│   ├── .gitignore
│   ├── LLM-RLHF-Tuning-main.iml
│   ├── deployment.xml
│   ├── inspectionProfiles
│   │   ├── Project_Default.xml
│   │   └── profiles_settings.xml
│   └── modules.xml
├── README.md
├── config
│   └── qwen_config.yaml
├── data
│   ├── pt_data
│   │   └── pt_sample_data.txt
│   ├── rm_data
│   │   └── comparison_gpt4_data_zh.json
│   └── sft_data
│       └── alpaca_data_zh_51k.json
├── dpo
│   ├── default_config.yaml
│   ├── run_dpo.sh
│   └── run_dpo_with_peft.py
├── model
│   └── qwen.py
├── ppo
│   ├── default_config.yaml
│   ├── ds_config.yaml
│   ├── run_ppo.sh
│   ├── run_ppo_co.sh
│   ├── run_ppo_co_multi_adapters.sh
│   └── run_ppo_with_peft.py
├── requirements.txt
├── rm
│   ├── run_rm.sh
│   └── run_rm_with_peft.py
├── sft
│   ├── run_sft.sh
│   └── run_sft_with_peft.py
└── utils
    ├── .DS_Store
    ├── __init__.py
    ├── data_collator.py
    ├── metrics.py
    ├── parser_args.py
    ├── ppo_models.py
    ├── ppo_trainer_with_peft.py
    ├── trainer.py
    └── utils.py

/.idea/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/.gitignore
--------------------------------------------------------------------------------

/.idea/LLM-RLHF-Tuning-main.iml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/LLM-RLHF-Tuning-main.iml
--------------------------------------------------------------------------------

/.idea/deployment.xml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/deployment.xml
--------------------------------------------------------------------------------

/.idea/inspectionProfiles/Project_Default.xml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/inspectionProfiles/Project_Default.xml
--------------------------------------------------------------------------------

/.idea/inspectionProfiles/profiles_settings.xml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/inspectionProfiles/profiles_settings.xml
--------------------------------------------------------------------------------

/.idea/modules.xml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/.idea/modules.xml
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/README.md
--------------------------------------------------------------------------------

/config/qwen_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/config/qwen_config.yaml
--------------------------------------------------------------------------------

/data/pt_data/pt_sample_data.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/data/pt_data/pt_sample_data.txt
--------------------------------------------------------------------------------

/data/rm_data/comparison_gpt4_data_zh.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/data/rm_data/comparison_gpt4_data_zh.json
--------------------------------------------------------------------------------

/data/sft_data/alpaca_data_zh_51k.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/data/sft_data/alpaca_data_zh_51k.json
--------------------------------------------------------------------------------

/dpo/default_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/dpo/default_config.yaml
--------------------------------------------------------------------------------

/dpo/run_dpo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/dpo/run_dpo.sh
--------------------------------------------------------------------------------

/dpo/run_dpo_with_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/dpo/run_dpo_with_peft.py
--------------------------------------------------------------------------------

/model/qwen.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/model/qwen.py
--------------------------------------------------------------------------------

/ppo/default_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/default_config.yaml
--------------------------------------------------------------------------------

/ppo/ds_config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/ds_config.yaml
--------------------------------------------------------------------------------

/ppo/run_ppo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/run_ppo.sh
--------------------------------------------------------------------------------

/ppo/run_ppo_co.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/run_ppo_co.sh
--------------------------------------------------------------------------------

/ppo/run_ppo_co_multi_adapters.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/run_ppo_co_multi_adapters.sh
--------------------------------------------------------------------------------

/ppo/run_ppo_with_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/ppo/run_ppo_with_peft.py
--------------------------------------------------------------------------------

/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/requirements.txt
--------------------------------------------------------------------------------

/rm/run_rm.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/rm/run_rm.sh
--------------------------------------------------------------------------------

/rm/run_rm_with_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/rm/run_rm_with_peft.py
--------------------------------------------------------------------------------

/sft/run_sft.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/sft/run_sft.sh
--------------------------------------------------------------------------------

/sft/run_sft_with_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/sft/run_sft_with_peft.py
--------------------------------------------------------------------------------

/utils/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/.DS_Store
--------------------------------------------------------------------------------

/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/__init__.py
--------------------------------------------------------------------------------

/utils/data_collator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/data_collator.py
--------------------------------------------------------------------------------

/utils/metrics.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/metrics.py
--------------------------------------------------------------------------------

/utils/parser_args.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/parser_args.py
--------------------------------------------------------------------------------

/utils/ppo_models.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/ppo_models.py
--------------------------------------------------------------------------------

/utils/ppo_trainer_with_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/ppo_trainer_with_peft.py
--------------------------------------------------------------------------------

/utils/trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/trainer.py
--------------------------------------------------------------------------------

/utils/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/garlic-byte/RL-LLM/HEAD/utils/utils.py
--------------------------------------------------------------------------------