├── .gitignore
├── README.md
├── assets
    └── rl_agent.png
├── configs
    └── zero3.yaml
├── pyproject.toml
├── quick_start_install.sh
├── scripts
    ├── run_grpo_mr.sh
    ├── run_grpo_or.sh
    └── run_mt_grpo.sh
└── verifiers
    ├── __init__.py
    ├── envs
        ├── __init__.py
        ├── code_env.py
        ├── doublecheck_env.py
        ├── environment.py
        ├── math_env.py
        ├── multiturn_env.py
        ├── simple_env.py
        ├── textarena_env.py
        └── tool_env.py
    ├── examples
        ├── __init__.py
        ├── gsm8k_calculator.py
        ├── gsm8k_code.py
        ├── gsm8k_doublecheck.py
        ├── gsm8k_peft.py
        ├── gsm8k_simple.py
        ├── math_code.py
        ├── math_doublecheck.py
        ├── math_simple.py
        ├── openbookqa_search.py
        └── triviaqa_search.py
    ├── imports.py
    ├── mock_vllm.py
    ├── parsers
        ├── __init__.py
        └── xml_parser.py
    ├── prompts
        ├── __init__.py
        ├── few_shots.py
        ├── system_prompts.py
        └── templates.py
    ├── rubrics
        ├── __init__.py
        ├── code_rubric.py
        ├── math_rubric.py
        ├── rubric.py
        ├── tool_rubric.py
        └── triviaqa_rubric.py
    ├── tools
        ├── __init__.py
        ├── calculator.py
        ├── commonsense_tools.py
        ├── local_wiki_search.py
        └── search.py
    ├── trainers
        ├── __init__.py
        ├── grpo_env_trainer.py
        └── mt_grpo_env_trainer.py
    └── utils
        ├── __init__.py
        ├── config_utils.py
        ├── data_utils.py
        ├── logging_utils.py
        └── model_utils.py

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/.gitignore
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/README.md
--------------------------------------------------------------------------------
/assets/rl_agent.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/assets/rl_agent.png
--------------------------------------------------------------------------------
/configs/zero3.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/configs/zero3.yaml
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/pyproject.toml
--------------------------------------------------------------------------------
/quick_start_install.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/quick_start_install.sh
--------------------------------------------------------------------------------
/scripts/run_grpo_mr.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/scripts/run_grpo_mr.sh
--------------------------------------------------------------------------------
/scripts/run_grpo_or.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/scripts/run_grpo_or.sh
--------------------------------------------------------------------------------
/scripts/run_mt_grpo.sh:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/scripts/run_mt_grpo.sh
--------------------------------------------------------------------------------
/verifiers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/__init__.py
--------------------------------------------------------------------------------
/verifiers/envs/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/__init__.py
--------------------------------------------------------------------------------
/verifiers/envs/code_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/code_env.py
--------------------------------------------------------------------------------
/verifiers/envs/doublecheck_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/doublecheck_env.py
--------------------------------------------------------------------------------
/verifiers/envs/environment.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/environment.py
--------------------------------------------------------------------------------
/verifiers/envs/math_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/math_env.py
--------------------------------------------------------------------------------
/verifiers/envs/multiturn_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/multiturn_env.py
--------------------------------------------------------------------------------
/verifiers/envs/simple_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/simple_env.py
--------------------------------------------------------------------------------
/verifiers/envs/textarena_env.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/verifiers/envs/tool_env.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/envs/tool_env.py
--------------------------------------------------------------------------------
/verifiers/examples/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/__init__.py
--------------------------------------------------------------------------------
/verifiers/examples/gsm8k_calculator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/gsm8k_calculator.py
--------------------------------------------------------------------------------
/verifiers/examples/gsm8k_code.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/gsm8k_code.py
--------------------------------------------------------------------------------
/verifiers/examples/gsm8k_doublecheck.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/gsm8k_doublecheck.py
--------------------------------------------------------------------------------
/verifiers/examples/gsm8k_peft.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/gsm8k_peft.py
--------------------------------------------------------------------------------
/verifiers/examples/gsm8k_simple.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/gsm8k_simple.py
--------------------------------------------------------------------------------
/verifiers/examples/math_code.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/math_code.py
--------------------------------------------------------------------------------
/verifiers/examples/math_doublecheck.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/math_doublecheck.py
--------------------------------------------------------------------------------
/verifiers/examples/math_simple.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/math_simple.py
--------------------------------------------------------------------------------
/verifiers/examples/openbookqa_search.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/openbookqa_search.py
--------------------------------------------------------------------------------
/verifiers/examples/triviaqa_search.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/examples/triviaqa_search.py
--------------------------------------------------------------------------------
/verifiers/imports.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/imports.py
--------------------------------------------------------------------------------
/verifiers/mock_vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/mock_vllm.py
--------------------------------------------------------------------------------
/verifiers/parsers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/parsers/__init__.py
--------------------------------------------------------------------------------
/verifiers/parsers/xml_parser.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/parsers/xml_parser.py
--------------------------------------------------------------------------------
/verifiers/prompts/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/prompts/__init__.py
--------------------------------------------------------------------------------
/verifiers/prompts/few_shots.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/prompts/few_shots.py
--------------------------------------------------------------------------------
/verifiers/prompts/system_prompts.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/prompts/system_prompts.py
--------------------------------------------------------------------------------
/verifiers/prompts/templates.py:
--------------------------------------------------------------------------------
1 |
--------------------------------------------------------------------------------
/verifiers/rubrics/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/__init__.py
--------------------------------------------------------------------------------
/verifiers/rubrics/code_rubric.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/code_rubric.py
--------------------------------------------------------------------------------
/verifiers/rubrics/math_rubric.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/math_rubric.py
--------------------------------------------------------------------------------
/verifiers/rubrics/rubric.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/rubric.py
--------------------------------------------------------------------------------
/verifiers/rubrics/tool_rubric.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/tool_rubric.py
--------------------------------------------------------------------------------
/verifiers/rubrics/triviaqa_rubric.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/rubrics/triviaqa_rubric.py
--------------------------------------------------------------------------------
/verifiers/tools/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/tools/__init__.py
--------------------------------------------------------------------------------
/verifiers/tools/calculator.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/tools/calculator.py
--------------------------------------------------------------------------------
/verifiers/tools/commonsense_tools.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/tools/commonsense_tools.py
--------------------------------------------------------------------------------
/verifiers/tools/local_wiki_search.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/tools/local_wiki_search.py
--------------------------------------------------------------------------------
/verifiers/tools/search.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/tools/search.py
--------------------------------------------------------------------------------
/verifiers/trainers/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/trainers/__init__.py
--------------------------------------------------------------------------------
/verifiers/trainers/grpo_env_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/trainers/grpo_env_trainer.py
--------------------------------------------------------------------------------
/verifiers/trainers/mt_grpo_env_trainer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/trainers/mt_grpo_env_trainer.py
--------------------------------------------------------------------------------
/verifiers/utils/__init__.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/utils/__init__.py
--------------------------------------------------------------------------------
/verifiers/utils/config_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/utils/config_utils.py
--------------------------------------------------------------------------------
/verifiers/utils/data_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/utils/data_utils.py
--------------------------------------------------------------------------------
/verifiers/utils/logging_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/utils/logging_utils.py
--------------------------------------------------------------------------------
/verifiers/utils/model_utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/SiliangZeng/Multi-Turn-RL-Agent/HEAD/verifiers/utils/model_utils.py
--------------------------------------------------------------------------------