├── .gitignore
├── README.md
├── images
│   ├── comparison.png
│   ├── framework.png
│   └── taxonomy.png
└── paper
    └── ve.pdf


/.gitignore:
--------------------------------------------------------------------------------
*/.DS_Store
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering


![Overview](images/framework.png)


This repository collects papers and other resources on verifier engineering and accompanies our paper [Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering](paper/ve.pdf). We will update both the paper and this repository regularly, and we welcome suggestions of any kind.

> [!NOTE]
> 🌟 **Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering. Your contributions are always welcome!**


## Overview of Common Verifiers

| **Verifier Type** | **Verification Form** | **Verification Granularity** | **Verifier Source** | **Extra Training** |
|-------------------|------------------------|------------------------------------|---------------------|--------------------|
| Golden Annotation | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Rule-based | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
| Code Interpreter | Binary/Score/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| ORM | Binary/Score/Rank/Text | Full Trajectory | Model Based | Yes |
| Language Model | Binary/Score/Rank/Text | Thought Step/Full Trajectory | Model Based | Yes |
| Tool | Binary/Score/Rank/Text | Token/Thought Step/Full Trajectory | Program Based | No |
| Search Engine | Text | Thought Step/Full Trajectory | Program Based | No |
| PRM | Score | Token/Thought Step | Model Based | Yes |
| Knowledge Graph | Text | Thought Step/Full Trajectory | Program Based | No |


## A Verifier Engineering Perspective on Post-training Methods

| Method | Search | Verify | Feedback | Task |
|--------|--------|--------|----------|------|
| [STaR](https://arxiv.org/abs/2203.14465)<br>[RFT](https://arxiv.org/abs/2308.01825)<br>[WizardMath](https://arxiv.org/abs/2308.09583) | Linear | Golden Annotation | Imitation Learning | Math |
| [CAG](https://arxiv.org/abs/2404.06809) | Linear | Golden Annotation | Imitation Learning | RAG |
| [Self-Instruct](https://arxiv.org/abs/2212.10560) | Linear | Rule-based | Imitation Learning | General |
| [Code Alpaca](https://github.com/sahil280114/codealpaca)<br>[WizardCoder](https://arxiv.org/abs/2306.08568) | Linear | Rule-based | Imitation Learning | Code |
| [ILF-Code](https://arxiv.org/abs/2303.16749) | Linear | Code Interpreter<br>Human | Imitation Learning | Code |
| [RAFT](https://arxiv.org/abs/2304.06767)<br>[RRHF](https://arxiv.org/abs/2304.05302) | Linear | ORM | Imitation Learning | General |
| [SSO](https://arxiv.org/abs/2410.17131) | Linear | Rule-based | Preference Learning | Alignment |
| [CodeUltraFeedback](https://arxiv.org/abs/2403.09032) | Linear | Language Model | Preference Learning | Code |
| [Self-Rewarding](https://arxiv.org/abs/2401.10020) | Linear | Language Model | Preference Learning | Alignment |
| [StructRAG](https://arxiv.org/abs/2410.08815) | Linear | Language Model | Preference Learning | RAG |
| [MCTS-DPO](https://arxiv.org/html/2405.00451v2) | Tree | Language Model | Preference Learning | Math |
| [Chain of Preference Optimization](https://arxiv.org/abs/2406.09136) | Tree | Language Model | Preference Learning | Reasoning |
| [LLaMA-Berry](https://arxiv.org/abs/2410.02884) | Tree | ORM | Preference Learning | Reasoning |
| [Math-Shepherd](https://arxiv.org/abs/2312.08935) | Linear | Golden Annotation<br>Rule-based | Reinforcement Learning | Math |
| [RLTF](https://arxiv.org/abs/2307.04349)<br>[PPOCoder](https://arxiv.org/abs/2301.13816) | Linear | Code Interpreter | Reinforcement Learning | Code |
| [RLAIF](https://openreview.net/forum?id=AAxIs3D2ZZ) | Linear | Language Model | Reinforcement Learning | General |
| [SIRLC](https://arxiv.org/abs/2305.14483) | Linear | Language Model | Reinforcement Learning | Reasoning |
| [RLFH](https://arxiv.org/abs/2406.12221) | Linear | Language Model | Reinforcement Learning | Knowledge |
| [RLHF](https://arxiv.org/abs/2203.02155) | Linear | ORM | Reinforcement Learning | Alignment |
| [Quark](https://arxiv.org/abs/2205.13636) | Linear | Tool | Reinforcement Learning | Alignment |
| [RLVR](https://arxiv.org/pdf/2411.15124) | Linear | Binary Verifier | Reinforcement Learning | General |
| [ReST-MCTS](https://arxiv.org/abs/2406.03816) | Tree | Language Model | Reinforcement Learning | Math |
| [CRITIC](https://arxiv.org/abs/2305.11738) | Linear | Code Interpreter<br>Tool<br>Search Engine | Verifier-Aware | Math<br>Code<br>Knowledge<br>General |
| [Self-Debug](https://arxiv.org/abs/2304.05128) | Linear | Code Interpreter | Verifier-Aware | Code |
| [Self-Refine](https://arxiv.org/abs/2303.17651) | Linear | Language Model | Verifier-Aware | Alignment |
| [ReAct](https://arxiv.org/abs/2210.03629) | Linear | Search Engine | Verifier-Aware | Knowledge |
| [Contrastive Decoding](https://arxiv.org/abs/2210.15097) | Linear | Language Model | Verifier-Guided | General |
| [Chain-of-Verification](https://arxiv.org/abs/2309.11495) | Linear | Language Model | Verifier-Guided | Knowledge |
| [Inverse Value Learning](https://arxiv.org/abs/2410.21027) | Linear | Language Model | Verifier-Guided | General |
| [PRM](https://arxiv.org/abs/2305.20050) | Linear | PRM | Verifier-Guided | Math |
| [KGR](https://arxiv.org/abs/2311.13314) | Linear | Knowledge Graph | Verifier-Guided | Knowledge |
| [UoT](https://arxiv.org/abs/2402.03271) | Tree | Language Model | Verifier-Guided | General |
| [ToT](https://arxiv.org/abs/2305.10601) | Tree | Language Model | Verifier-Guided | Reasoning |


## Citation

If you find our repo useful in your research, please consider citing:

```
@article{VerifierEngineering,
  title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
  author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
  journal={arXiv preprint arXiv:2411.11504},
  url={https://arxiv.org/abs/2411.11504},
  year={2024}
}
```


## Star History

[![Star History Chart](https://api.star-history.com/svg?repos=icip-cas/Verifier-Engineering&type=Date)](https://star-history.com/#icip-cas/Verifier-Engineering&Date)
--------------------------------------------------------------------------------

/images/comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/comparison.png
--------------------------------------------------------------------------------

/images/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/framework.png
--------------------------------------------------------------------------------

/images/taxonomy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/taxonomy.png
--------------------------------------------------------------------------------

/paper/ve.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/paper/ve.pdf
--------------------------------------------------------------------------------