├── Gsm8k
    ├── test-00000-of-00001.parquet
    ├── test.jsonl
    ├── train-00000-of-00001.parquet
    └── train.jsonl
├── Qwen
    └── README.md
├── README.md
├── grpo_train.py
├── infer_vllm.py
├── inference.py
├── main.py
├── outputs
    └── README.md
├── requirements.txt
├── reward.py
├── sft_train.py
├── test_infer.py
├── test_infer_vllm.py
└── utils.py


/Gsm8k/test-00000-of-00001.parquet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/Gsm8k/test-00000-of-00001.parquet
--------------------------------------------------------------------------------
/Gsm8k/test.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/Gsm8k/test.jsonl
--------------------------------------------------------------------------------
/Gsm8k/train-00000-of-00001.parquet:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/Gsm8k/train-00000-of-00001.parquet
--------------------------------------------------------------------------------
/Gsm8k/train.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/Gsm8k/train.jsonl
--------------------------------------------------------------------------------
/Qwen/README.md:
--------------------------------------------------------------------------------
Store the model weights here.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/README.md
--------------------------------------------------------------------------------
/grpo_train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/grpo_train.py
--------------------------------------------------------------------------------
/infer_vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/infer_vllm.py
--------------------------------------------------------------------------------
/inference.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/inference.py
--------------------------------------------------------------------------------
/main.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/main.py
--------------------------------------------------------------------------------
/outputs/README.md:
--------------------------------------------------------------------------------
Store the trained models here.
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/requirements.txt
--------------------------------------------------------------------------------
/reward.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/reward.py
--------------------------------------------------------------------------------
/sft_train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/sft_train.py
--------------------------------------------------------------------------------
/test_infer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/test_infer.py
--------------------------------------------------------------------------------
/test_infer_vllm.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/test_infer_vllm.py
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/liuchen6667/qwen_grpo_gsm8k/HEAD/utils.py
--------------------------------------------------------------------------------