├── .gitignore
├── README.md
├── analyse
    ├── analyse_alpacaeval_sample.py
    ├── analyse_gsm_sample.py
    ├── analyse_humaneval_sample.py
    ├── analyse_mixeval_sample.py
    └── analyse_mmlu_sample.py
├── assets
    ├── align.png
    ├── len.png
    ├── main.png
    ├── param.png
    ├── poster.png
    ├── reward.png
    ├── scale.png
    └── wildbench.png
├── best_of_n_eval
    ├── __init__.py
    ├── alapacaeval_utils.py
    ├── eval_alpacaeval_sample_baseline.py
    ├── eval_alpacaeval_sample_reward.py
    ├── eval_gsm_sample_baseline.py
    ├── eval_gsm_sample_reward.py
    ├── eval_humaneval_sample_baseline.py
    ├── eval_humaneval_sample_reward.py
    ├── eval_mmlu_sample_baseline.py
    └── eval_mmlu_sample_reward.py
├── get_benchmark_rewards
    ├── get_alpacaeval_sample_rewards.py
    ├── get_gsm_sample_rewards.py
    ├── get_humaneval_sample_rewards.py
    └── get_mmlu_sample_rewards.py
├── models
    ├── __init__.py
    ├── modeling_eurus_rm.py
    ├── modeling_starling.py
    └── modeling_starling_alpha.py
├── requirements.txt
└── scripts
    ├── eval_alpacaeval_sample_baseline.sh
    ├── eval_alpacaeval_sample_reward.sh
    ├── eval_gsm_sample_baseline.sh
    ├── eval_gsm_sample_reward.sh
    ├── eval_humaneval_sample_baseline.sh
    ├── eval_humaneval_sample_reward.sh
    ├── eval_mmlu_sample_baseline.sh
    ├── eval_mmlu_sample_reward.sh
    ├── get_alpacaeval_sample_rewards.sh
    ├── get_gsm_sample_rewards.sh
    ├── get_humaneval_sample_rewards.sh
    └── get_mmlu_sample_rewards.sh

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/README.md
--------------------------------------------------------------------------------
/analyse/analyse_alpacaeval_sample.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/analyse/analyse_alpacaeval_sample.py
--------------------------------------------------------------------------------
/analyse/analyse_gsm_sample.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/analyse/analyse_gsm_sample.py
--------------------------------------------------------------------------------
/analyse/analyse_humaneval_sample.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/analyse/analyse_humaneval_sample.py
--------------------------------------------------------------------------------
/analyse/analyse_mixeval_sample.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/analyse/analyse_mixeval_sample.py
--------------------------------------------------------------------------------
/analyse/analyse_mmlu_sample.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/analyse/analyse_mmlu_sample.py
--------------------------------------------------------------------------------
/assets/align.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/align.png
--------------------------------------------------------------------------------
/assets/len.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/len.png
--------------------------------------------------------------------------------
/assets/main.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/main.png
--------------------------------------------------------------------------------
/assets/param.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/param.png
--------------------------------------------------------------------------------
/assets/poster.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/poster.png
--------------------------------------------------------------------------------
/assets/reward.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/reward.png
--------------------------------------------------------------------------------
/assets/scale.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/scale.png
--------------------------------------------------------------------------------
/assets/wildbench.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/assets/wildbench.png
--------------------------------------------------------------------------------
/best_of_n_eval/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/__init__.py
--------------------------------------------------------------------------------
/best_of_n_eval/alapacaeval_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/alapacaeval_utils.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_alpacaeval_sample_baseline.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_alpacaeval_sample_baseline.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_alpacaeval_sample_reward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_alpacaeval_sample_reward.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_gsm_sample_baseline.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_gsm_sample_baseline.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_gsm_sample_reward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_gsm_sample_reward.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_humaneval_sample_baseline.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_humaneval_sample_baseline.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_humaneval_sample_reward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_humaneval_sample_reward.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_mmlu_sample_baseline.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_mmlu_sample_baseline.py
--------------------------------------------------------------------------------
/best_of_n_eval/eval_mmlu_sample_reward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/best_of_n_eval/eval_mmlu_sample_reward.py
--------------------------------------------------------------------------------
/get_benchmark_rewards/get_alpacaeval_sample_rewards.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/get_benchmark_rewards/get_alpacaeval_sample_rewards.py
--------------------------------------------------------------------------------
/get_benchmark_rewards/get_gsm_sample_rewards.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/get_benchmark_rewards/get_gsm_sample_rewards.py
--------------------------------------------------------------------------------
/get_benchmark_rewards/get_humaneval_sample_rewards.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/get_benchmark_rewards/get_humaneval_sample_rewards.py
--------------------------------------------------------------------------------
/get_benchmark_rewards/get_mmlu_sample_rewards.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/get_benchmark_rewards/get_mmlu_sample_rewards.py
--------------------------------------------------------------------------------
/models/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/models/__init__.py
--------------------------------------------------------------------------------
/models/modeling_eurus_rm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/models/modeling_eurus_rm.py
--------------------------------------------------------------------------------
/models/modeling_starling.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/models/modeling_starling.py
--------------------------------------------------------------------------------
/models/modeling_starling_alpha.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/models/modeling_starling_alpha.py
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/requirements.txt
--------------------------------------------------------------------------------
/scripts/eval_alpacaeval_sample_baseline.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_alpacaeval_sample_baseline.sh
--------------------------------------------------------------------------------
/scripts/eval_alpacaeval_sample_reward.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_alpacaeval_sample_reward.sh
--------------------------------------------------------------------------------
/scripts/eval_gsm_sample_baseline.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_gsm_sample_baseline.sh
--------------------------------------------------------------------------------
/scripts/eval_gsm_sample_reward.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_gsm_sample_reward.sh
--------------------------------------------------------------------------------
/scripts/eval_humaneval_sample_baseline.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_humaneval_sample_baseline.sh
--------------------------------------------------------------------------------
/scripts/eval_humaneval_sample_reward.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_humaneval_sample_reward.sh
--------------------------------------------------------------------------------
/scripts/eval_mmlu_sample_baseline.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_mmlu_sample_baseline.sh
--------------------------------------------------------------------------------
/scripts/eval_mmlu_sample_reward.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/eval_mmlu_sample_reward.sh
--------------------------------------------------------------------------------
/scripts/get_alpacaeval_sample_rewards.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/get_alpacaeval_sample_rewards.sh
--------------------------------------------------------------------------------
/scripts/get_gsm_sample_rewards.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/get_gsm_sample_rewards.sh
--------------------------------------------------------------------------------
/scripts/get_humaneval_sample_rewards.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/get_humaneval_sample_rewards.sh
--------------------------------------------------------------------------------
/scripts/get_mmlu_sample_rewards.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Yifan-Song793/GoodBadGreedy/HEAD/scripts/get_mmlu_sample_rewards.sh
--------------------------------------------------------------------------------