├── .DS_Store
├── README.md
├── dpo_evaluation.py
├── dpo_train_from_scratch.py
├── grpo_evaluation.py
└── grpo_train_from_scratch.py

/.DS_Store:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/.DS_Store
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/README.md
--------------------------------------------------------------------------------

/dpo_evaluation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/dpo_evaluation.py
--------------------------------------------------------------------------------

/dpo_train_from_scratch.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/dpo_train_from_scratch.py
--------------------------------------------------------------------------------

/grpo_evaluation.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/grpo_evaluation.py
--------------------------------------------------------------------------------

/grpo_train_from_scratch.py:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/mingyin0312/RLFromScratch/HEAD/grpo_train_from_scratch.py
--------------------------------------------------------------------------------