├── .DS_Store
├── README.md
├── SFT
    ├── README.md
    ├── code-gemma-7B-it.yaml
    ├── gemma-2-9b-it.yaml
    ├── gemma-7b-it.yaml
    ├── mistral-v0.3.yaml
    └── sft_configs
    │   ├── sft_ds2.json
    │   └── sft_ds3.json
├── alignment_algorithms
    ├── README.md
    ├── dpo.py
    ├── dpo_trainer.py
    ├── kto_trainer.py
    ├── run_dpo.py
    ├── run_dpo.sh
    ├── run_kto.py
    ├── run_kto.sh
    └── training_configs
    │   ├── zero2_pf.yaml
    │   └── zero3_pf.yaml
├── assets
    └── main_result.png
├── inference
    ├── .DS_Store
    ├── README.md
    ├── data
    │   ├── gsm8k
    │   │   ├── test.jsonl
    │   │   └── train.jsonl
    │   └── math
    │   │   ├── test.jsonl
    │   │   └── train.jsonl
    ├── eval
    │   ├── evaluate.py
    │   └── grader.py
    ├── infer_data
    │   ├── annotate_data.py
    │   ├── get_dpo_dataset.py
    │   └── infer_eval.py
    ├── scripts
    │   ├── eval.sh
    │   ├── infer.sh
    │   ├── iter_infer_to_collect_data.sh
    │   └── register_server.sh
    └── utils
    │   ├── annotate_data.py
    │   ├── data_loader.py
    │   ├── filter_data.py
    │   ├── parser.py
    │   ├── python_executor.py
    │   └── utils.py
└── useful_codes
    ├── annotate_data.py
    ├── interpolate_model.py
    ├── merge.py
    ├── push_model.py
    └── set_padding_token.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/.DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/README.md
--------------------------------------------------------------------------------
/SFT/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/README.md
--------------------------------------------------------------------------------
/SFT/code-gemma-7B-it.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/code-gemma-7B-it.yaml
--------------------------------------------------------------------------------
/SFT/gemma-2-9b-it.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/gemma-2-9b-it.yaml
--------------------------------------------------------------------------------
/SFT/gemma-7b-it.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/gemma-7b-it.yaml
--------------------------------------------------------------------------------
/SFT/mistral-v0.3.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/mistral-v0.3.yaml
--------------------------------------------------------------------------------
/SFT/sft_configs/sft_ds2.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/sft_configs/sft_ds2.json
--------------------------------------------------------------------------------
/SFT/sft_configs/sft_ds3.json:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/SFT/sft_configs/sft_ds3.json
--------------------------------------------------------------------------------
/alignment_algorithms/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/README.md
--------------------------------------------------------------------------------
/alignment_algorithms/dpo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/dpo.py
--------------------------------------------------------------------------------
/alignment_algorithms/dpo_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/dpo_trainer.py
--------------------------------------------------------------------------------
/alignment_algorithms/kto_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/kto_trainer.py
--------------------------------------------------------------------------------
/alignment_algorithms/run_dpo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/run_dpo.py
--------------------------------------------------------------------------------
/alignment_algorithms/run_dpo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/run_dpo.sh
--------------------------------------------------------------------------------
/alignment_algorithms/run_kto.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/run_kto.py
--------------------------------------------------------------------------------
/alignment_algorithms/run_kto.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/run_kto.sh
--------------------------------------------------------------------------------
/alignment_algorithms/training_configs/zero2_pf.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/training_configs/zero2_pf.yaml
--------------------------------------------------------------------------------
/alignment_algorithms/training_configs/zero3_pf.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/alignment_algorithms/training_configs/zero3_pf.yaml
--------------------------------------------------------------------------------
/assets/main_result.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/assets/main_result.png
--------------------------------------------------------------------------------
/inference/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/.DS_Store
--------------------------------------------------------------------------------
/inference/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/README.md
--------------------------------------------------------------------------------
/inference/data/gsm8k/test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/data/gsm8k/test.jsonl
--------------------------------------------------------------------------------
/inference/data/gsm8k/train.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/data/gsm8k/train.jsonl
--------------------------------------------------------------------------------
/inference/data/math/test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/data/math/test.jsonl
--------------------------------------------------------------------------------
/inference/data/math/train.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/data/math/train.jsonl
--------------------------------------------------------------------------------
/inference/eval/evaluate.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/eval/evaluate.py
--------------------------------------------------------------------------------
/inference/eval/grader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/eval/grader.py
--------------------------------------------------------------------------------
/inference/infer_data/annotate_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/infer_data/annotate_data.py
--------------------------------------------------------------------------------
/inference/infer_data/get_dpo_dataset.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/infer_data/get_dpo_dataset.py
--------------------------------------------------------------------------------
/inference/infer_data/infer_eval.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/infer_data/infer_eval.py
--------------------------------------------------------------------------------
/inference/scripts/eval.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/scripts/eval.sh
--------------------------------------------------------------------------------
/inference/scripts/infer.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/scripts/infer.sh
--------------------------------------------------------------------------------
/inference/scripts/iter_infer_to_collect_data.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/scripts/iter_infer_to_collect_data.sh
--------------------------------------------------------------------------------
/inference/scripts/register_server.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/scripts/register_server.sh
--------------------------------------------------------------------------------
/inference/utils/annotate_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/annotate_data.py
--------------------------------------------------------------------------------
/inference/utils/data_loader.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/data_loader.py
--------------------------------------------------------------------------------
/inference/utils/filter_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/filter_data.py
--------------------------------------------------------------------------------
/inference/utils/parser.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/parser.py
--------------------------------------------------------------------------------
/inference/utils/python_executor.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/python_executor.py
--------------------------------------------------------------------------------
/inference/utils/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/inference/utils/utils.py
--------------------------------------------------------------------------------
/useful_codes/annotate_data.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/useful_codes/annotate_data.py
--------------------------------------------------------------------------------
/useful_codes/interpolate_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/useful_codes/interpolate_model.py
--------------------------------------------------------------------------------
/useful_codes/merge.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/useful_codes/merge.py
--------------------------------------------------------------------------------
/useful_codes/push_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/useful_codes/push_model.py
--------------------------------------------------------------------------------
/useful_codes/set_padding_token.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/WeiXiongUST/Building-Math-Agents-with-Multi-Turn-Iterative-Preference-Learning/HEAD/useful_codes/set_padding_token.py
--------------------------------------------------------------------------------