├── .gitignore
├── .python-version
├── LICENSE
├── README.md
├── config.yaml
├── config_24GB.yaml
├── countdown_task.py
├── data_types.py
├── grpo.py
├── optimizer.py
├── pyproject.toml
├── qwen2_model.py
├── tokenizer.py
├── train.py
└── uv.lock

/.gitignore:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/.gitignore
--------------------------------------------------------------------------------
/.python-version:
--------------------------------------------------------------------------------
3.11
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/LICENSE
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/README.md
--------------------------------------------------------------------------------
/config.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/config.yaml
--------------------------------------------------------------------------------
/config_24GB.yaml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/config_24GB.yaml
--------------------------------------------------------------------------------
/countdown_task.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/countdown_task.py
--------------------------------------------------------------------------------
/data_types.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/data_types.py
--------------------------------------------------------------------------------
/grpo.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/grpo.py
--------------------------------------------------------------------------------
/optimizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/optimizer.py
--------------------------------------------------------------------------------
/pyproject.toml:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/pyproject.toml
--------------------------------------------------------------------------------
/qwen2_model.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/qwen2_model.py
--------------------------------------------------------------------------------
/tokenizer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/tokenizer.py
--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/train.py
--------------------------------------------------------------------------------
/uv.lock:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/policy-gradient/GRPO-Zero/HEAD/uv.lock
--------------------------------------------------------------------------------