├── LICENSE
├── README.md
├── data
│   └── math_tasks.jsonl
├── loss.py
├── replay_buffer.py
├── requirements.txt
└── train.py


/LICENSE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/LICENSE


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/README.md


--------------------------------------------------------------------------------
/data/math_tasks.jsonl:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/data/math_tasks.jsonl


--------------------------------------------------------------------------------
/loss.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/loss.py


--------------------------------------------------------------------------------
/replay_buffer.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/replay_buffer.py


--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
torch==2.5.1
transformers==4.48.1
accelerate==1.3.0
wandb==0.19.4


--------------------------------------------------------------------------------
/train.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/open-thought/tiny-grpo/HEAD/train.py


--------------------------------------------------------------------------------