├── Notes
│   ├── Intro_RLHF.md
│   ├── R1_reasoning.md
│   └── Reward_Hacking.md
├── README.md
├── Slides
│   └── Intro_RLHF_Reading_Group.pdf
└── images
    └── bt_model_rlhf_workflow.png


/Notes/Intro_RLHF.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/Notes/Intro_RLHF.md
--------------------------------------------------------------------------------

/Notes/R1_reasoning.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/Notes/R1_reasoning.md
--------------------------------------------------------------------------------

/Notes/Reward_Hacking.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/Notes/Reward_Hacking.md
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/README.md
--------------------------------------------------------------------------------

/Slides/Intro_RLHF_Reading_Group.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/Slides/Intro_RLHF_Reading_Group.pdf
--------------------------------------------------------------------------------

/images/bt_model_rlhf_workflow.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/yihedeng9/rlhf-summary-notes/HEAD/images/bt_model_rlhf_workflow.png
--------------------------------------------------------------------------------