├── .gitignore
├── DATA_LICENSE
├── LICENSE
├── README.md
├── assets
│   └── images
│       ├── Dromedary-2.png
│       ├── salmon_comparison.png
│       ├── salmon_logo_with_text.jpeg
│       └── salmon_pipeline.png
├── inference
│   ├── README.md
│   └── demo.py
├── prompts
│   ├── dromedary_inference_prompt.txt
│   ├── pmp_reward_model_prompt.txt
│   ├── principles
│   │   ├── principle_collection_harmless.json
│   │   ├── principle_collection_honest.json
│   │   ├── principle_collection_non_evasive.json
│   │   ├── principle_collection_ppo.json
│   │   └── principle_collection_rm.json
│   ├── salmon_reward_model_prompt_v0.txt
│   ├── salmon_reward_model_prompt_v1.txt
│   └── synthetic_preference_prompt.txt
├── requirements.txt
└── training
    ├── README.md
    ├── data_utils
    │   ├── common_utils.py
    │   ├── data_utils_ppo.py
    │   ├── data_utils_rm.py
    │   └── data_utils_sft.py
    ├── models
    │   ├── configuration_llama.py
    │   ├── distributed_utils.py
    │   ├── llama_with_flash_attn.py
    │   ├── ppo_trainer.py
    │   ├── qlora_model.py
    │   ├── reward_model.py
    │   ├── rl_models.py
    │   ├── rl_trainer.py
    │   └── trainer_utils.py
    ├── qlora_utils.py
    ├── step1_synthetic_preference_collection
    │   ├── batch_generation.py
    │   ├── clean_oasst1_prompts.py
    │   ├── scripts
    │   │   ├── generate_oasst1_response0.sh
    │   │   ├── generate_oasst1_response1.sh
    │   │   └── generate_synthetic_preference.sh
    │   └── synthetic_preference.py
    ├── step2_rm_training
    │   ├── aggregate_synthetic_preference.py
    │   ├── clean_pmp_data.py
    │   └── scripts
    │       ├── train_reward_model_70b_qlora_ft.sh
    │       └── train_reward_model_70b_qlora_pmp.sh
    ├── step3_ppo_training
    │   ├── aggregate_sharegpt_prompts.py
    │   ├── clean_and_merge_prompts.py
    │   ├── scripts
    │   │   └── train_ppo_model_70b_qlora_salmon.sh
    │   └── subsample_openorca_prompts.py
    ├── train_qlora_ppo.py
    ├── train_qlora_rm.py
    └── train_qlora_sft.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/.gitignore
--------------------------------------------------------------------------------
/DATA_LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/DATA_LICENSE
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/README.md
--------------------------------------------------------------------------------
/assets/images/Dromedary-2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/assets/images/Dromedary-2.png
--------------------------------------------------------------------------------
/assets/images/salmon_comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/assets/images/salmon_comparison.png
--------------------------------------------------------------------------------
/assets/images/salmon_logo_with_text.jpeg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/assets/images/salmon_logo_with_text.jpeg
--------------------------------------------------------------------------------
/assets/images/salmon_pipeline.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/assets/images/salmon_pipeline.png
--------------------------------------------------------------------------------
/inference/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/inference/README.md
--------------------------------------------------------------------------------
/inference/demo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/inference/demo.py
--------------------------------------------------------------------------------
/prompts/dromedary_inference_prompt.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/dromedary_inference_prompt.txt
--------------------------------------------------------------------------------
/prompts/pmp_reward_model_prompt.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/pmp_reward_model_prompt.txt
--------------------------------------------------------------------------------
/prompts/principles/principle_collection_harmless.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/principles/principle_collection_harmless.json
--------------------------------------------------------------------------------
/prompts/principles/principle_collection_honest.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/principles/principle_collection_honest.json
--------------------------------------------------------------------------------
/prompts/principles/principle_collection_non_evasive.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/principles/principle_collection_non_evasive.json
--------------------------------------------------------------------------------
/prompts/principles/principle_collection_ppo.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/principles/principle_collection_ppo.json
--------------------------------------------------------------------------------
/prompts/principles/principle_collection_rm.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/principles/principle_collection_rm.json
--------------------------------------------------------------------------------
/prompts/salmon_reward_model_prompt_v0.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/salmon_reward_model_prompt_v0.txt
--------------------------------------------------------------------------------
/prompts/salmon_reward_model_prompt_v1.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/salmon_reward_model_prompt_v1.txt
--------------------------------------------------------------------------------
/prompts/synthetic_preference_prompt.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/prompts/synthetic_preference_prompt.txt
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/requirements.txt
--------------------------------------------------------------------------------
/training/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/README.md
--------------------------------------------------------------------------------
/training/data_utils/common_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/data_utils/common_utils.py
--------------------------------------------------------------------------------
/training/data_utils/data_utils_ppo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/data_utils/data_utils_ppo.py
--------------------------------------------------------------------------------
/training/data_utils/data_utils_rm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/data_utils/data_utils_rm.py
--------------------------------------------------------------------------------
/training/data_utils/data_utils_sft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/data_utils/data_utils_sft.py
--------------------------------------------------------------------------------
/training/models/configuration_llama.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/configuration_llama.py
--------------------------------------------------------------------------------
/training/models/distributed_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/distributed_utils.py
--------------------------------------------------------------------------------
/training/models/llama_with_flash_attn.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/llama_with_flash_attn.py
--------------------------------------------------------------------------------
/training/models/ppo_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/ppo_trainer.py
--------------------------------------------------------------------------------
/training/models/qlora_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/qlora_model.py
--------------------------------------------------------------------------------
/training/models/reward_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/reward_model.py
--------------------------------------------------------------------------------
/training/models/rl_models.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/rl_models.py
--------------------------------------------------------------------------------
/training/models/rl_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/rl_trainer.py
--------------------------------------------------------------------------------
/training/models/trainer_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/models/trainer_utils.py
--------------------------------------------------------------------------------
/training/qlora_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/qlora_utils.py
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/batch_generation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/batch_generation.py
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/clean_oasst1_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/clean_oasst1_prompts.py
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/scripts/generate_oasst1_response0.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/scripts/generate_oasst1_response0.sh
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/scripts/generate_oasst1_response1.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/scripts/generate_oasst1_response1.sh
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/scripts/generate_synthetic_preference.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/scripts/generate_synthetic_preference.sh
--------------------------------------------------------------------------------
/training/step1_synthetic_preference_collection/synthetic_preference.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step1_synthetic_preference_collection/synthetic_preference.py
--------------------------------------------------------------------------------
/training/step2_rm_training/aggregate_synthetic_preference.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step2_rm_training/aggregate_synthetic_preference.py
--------------------------------------------------------------------------------
/training/step2_rm_training/clean_pmp_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step2_rm_training/clean_pmp_data.py
--------------------------------------------------------------------------------
/training/step2_rm_training/scripts/train_reward_model_70b_qlora_ft.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step2_rm_training/scripts/train_reward_model_70b_qlora_ft.sh
--------------------------------------------------------------------------------
/training/step2_rm_training/scripts/train_reward_model_70b_qlora_pmp.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step2_rm_training/scripts/train_reward_model_70b_qlora_pmp.sh
--------------------------------------------------------------------------------
/training/step3_ppo_training/aggregate_sharegpt_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step3_ppo_training/aggregate_sharegpt_prompts.py
--------------------------------------------------------------------------------
/training/step3_ppo_training/clean_and_merge_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step3_ppo_training/clean_and_merge_prompts.py
--------------------------------------------------------------------------------
/training/step3_ppo_training/scripts/train_ppo_model_70b_qlora_salmon.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step3_ppo_training/scripts/train_ppo_model_70b_qlora_salmon.sh
--------------------------------------------------------------------------------
/training/step3_ppo_training/subsample_openorca_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/step3_ppo_training/subsample_openorca_prompts.py
--------------------------------------------------------------------------------
/training/train_qlora_ppo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/train_qlora_ppo.py
--------------------------------------------------------------------------------
/training/train_qlora_rm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/train_qlora_rm.py
--------------------------------------------------------------------------------
/training/train_qlora_sft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/IBM/SALMON/HEAD/training/train_qlora_sft.py
--------------------------------------------------------------------------------