├── .gitignore
├── README.md
├── images
│   ├── comparison.png
│   ├── framework.png
│   └── taxonomy.png
└── paper
    └── ve.pdf
/.gitignore:
--------------------------------------------------------------------------------
1 | */.DS_Store
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
2 |
3 |
4 | 
5 |
6 |
7 | This repository collects papers and other resources on verifier engineering. It accompanies the paper [Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering](paper/ve.pdf). We update the paper and this repository regularly, and we welcome suggestions of any kind.
8 |
9 | > [!NOTE]
10 | > 🌟 **Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering - your contributions are always welcome!**
11 |
12 |
13 |
14 | ## Overview of Common Verifiers
15 |
16 | | **Verifier Type** | **Verification Form** | **Verify Granularity** | **Verifier Source** | **Extra Training** |
17 | |-----------------------|---------------------------------|---------------------------------------|-----------------------|--------------------|
18 | | Golden Annotation | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
19 | | Rule-based | Binary/Text | Thought Step/Full Trajectory | Program Based | No |
20 | | Code Interpreter | Binary/Score/Text | Token/Thought Step/Full Trajectory | Program Based | No |
21 | | ORM | Binary/Score/Rank/Text | Full Trajectory | Model Based | Yes |
22 | | Language Model | Binary/Score/Rank/Text | Thought Step/Full Trajectory | Model Based | Yes |
23 | | Tool | Binary/Score/Rank/Text | Token/Thought Step/Full Trajectory | Program Based | No |
24 | | Search Engine | Text | Thought Step/Full Trajectory | Program Based | No |
25 | | PRM | Score | Token/Thought Step | Model Based | Yes |
26 | | Knowledge Graph | Text | Thought Step/Full Trajectory | Program Based | No |
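
To make the table's columns concrete, here is a minimal Python sketch of a rule-based verifier that returns a binary verdict over a full trajectory. The names `Verdict` and `rule_based_verify` are hypothetical and do not come from the paper. The "last number equals the expected answer" rule is only an assumed example of what a rule could look like.

```python
import re
from dataclasses import dataclass


@dataclass
class Verdict:
    """One verification result, described with the table's columns."""
    verifier_type: str  # e.g. "Rule-based"
    form: str           # "Binary", "Score", "Rank", or "Text"
    granularity: str    # "Token", "Thought Step", or "Full Trajectory"
    value: object       # the verification result itself


def rule_based_verify(trajectory: str, expected: str) -> Verdict:
    """Assumed rule: the last number in the trajectory must equal `expected`."""
    numbers = re.findall(r"-?\d+(?:\.\d+)?", trajectory)
    passed = bool(numbers) and numbers[-1] == expected
    return Verdict("Rule-based", "Binary", "Full Trajectory", passed)
```

A program-based verifier like this needs no extra training, which matches the "No" entry in the last column for the rule-based row.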
27 |
28 |
29 | ## A Verifier Engineering Perspective on Post-training Methods
30 |
31 |
32 |
33 |
34 |
35 |
36 | | | Search | Verify | Feedback | Task |
37 | |---------------------------|-----------------------------------|----------------------------------|-----------------------------------|------------------------|
38 | | [STaR](https://arxiv.org/abs/2203.14465)<br>[RFT](https://arxiv.org/abs/2308.01825)<br>[WizardMath](https://arxiv.org/abs/2308.09583) | Linear | Golden Annotation | Imitation Learning | Math |
39 | | [CAG](https://arxiv.org/abs/2404.06809) | Linear | Golden Annotation | Imitation Learning | RAG |
40 | | [Self-Instruct](https://arxiv.org/abs/2212.10560) | Linear | Rule-based | Imitation Learning | General |
41 | | [Code Alpaca](https://github.com/sahil280114/codealpaca)<br>[WizardCoder](https://arxiv.org/abs/2306.08568) | Linear | Rule-based | Imitation Learning | Code |
42 | | [ILF-Code](https://arxiv.org/abs/2303.16749) | Linear | Code Interpreter<br>Human | Imitation Learning | Code |
43 | | [RAFT](https://arxiv.org/abs/2403.10131)<br>[RRHF](https://arxiv.org/abs/2304.05302) | Linear | ORM | Imitation Learning | General |
44 | | [SSO](https://arxiv.org/abs/2410.17131) | Linear | Rule-based | Preference Learning | Alignment |
45 | | [CodeUltraFeedback](https://arxiv.org/abs/2403.09032) | Linear | Language Model | Preference Learning | Code |
46 | | [Self-Rewarding](https://arxiv.org/abs/2401.10020) | Linear | Language Model | Preference Learning | Alignment |
47 | | [StructRAG](https://arxiv.org/abs/2410.08815) | Linear | Language Model | Preference Learning | RAG |
48 | | [MCTS-DPO](https://arxiv.org/html/2405.00451v2) | Tree | Language Model | Preference Learning | Math |
49 | | [Chain of Preference Optimization](https://arxiv.org/abs/2406.09136) | Tree | Language Model | Preference Learning | Reasoning |
50 | | [LLAMA-BERRY](https://arxiv.org/abs/2410.02884) | Tree | ORM | Preference Learning | Reasoning |
51 | | [Math-Shepherd](https://arxiv.org/abs/2312.08935) | Linear | Golden Annotation<br>Rule-based | Reinforcement Learning | Math |
52 | | [RLTF](https://arxiv.org/abs/2307.04349)<br>[PPOCoder](https://arxiv.org/abs/2301.13816) | Linear | Code Interpreter | Reinforcement Learning | Code |
53 | | [RLAIF](https://openreview.net/forum?id=AAxIs3D2ZZ) | Linear | Language Model | Reinforcement Learning | General |
54 | | [SIRLC](https://arxiv.org/abs/2305.14483) | Linear | Language Model | Reinforcement Learning | Reasoning |
55 | | [RLFH](https://arxiv.org/abs/2406.12221) | Linear | Language Model | Reinforcement Learning | Knowledge |
56 | | [RLHF](https://arxiv.org/abs/2203.02155) | Linear | ORM | Reinforcement Learning | Alignment |
57 | | [Quark](https://arxiv.org/abs/2205.13636) | Linear | Tool | Reinforcement Learning | Alignment |
58 | | [RLVR](https://arxiv.org/pdf/2411.15124) | Linear | Binary Verifier | Reinforcement Learning | General |
59 | | [ReST-MCTS](https://arxiv.org/abs/2406.03816) | Tree | Language Model | Reinforcement Learning | Math |
60 | | [CRITIC](https://arxiv.org/abs/2305.11738) | Linear | Code Interpreter<br>Tool<br>Search Engine | Verifier-Aware | Math<br>Code<br>Knowledge<br>General |
61 | | [Self-Debug](https://arxiv.org/abs/2304.05128) | Linear | Code Interpreter | Verifier-Aware | Code |
62 | | [Self-Refine](https://arxiv.org/abs/2303.17651) | Linear | Language Model | Verifier-Aware | Alignment |
63 | | [ReAct](https://arxiv.org/abs/2210.03629) | Linear | Search Engine | Verifier-Aware | Knowledge |
64 | | [Contrastive Decoding](https://arxiv.org/abs/2210.15097) | Linear | Language Model | Verifier-Guided | General |
65 | | [Chain-of-Verification](https://arxiv.org/abs/2309.11495) | Linear | Language Model | Verifier-Guided | Knowledge |
66 | | [Inverse Value Learning](https://arxiv.org/abs/2410.21027) | Linear | Language Model | Verifier-Guided | General |
67 | | [PRM](https://arxiv.org/abs/2305.20050) | Linear | PRM | Verifier-Guided | Math |
68 | | [KGR](https://arxiv.org/abs/2311.13314) | Linear | Knowledge Graph | Verifier-Guided | Knowledge |
69 | | [UoT](https://arxiv.org/abs/2402.03271) | Tree | Language Model | Verifier-Guided | General |
70 | | [ToT](https://arxiv.org/abs/2305.10601) | Tree | Language Model | Verifier-Guided | Reasoning |
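
The Search / Verify / Feedback columns can be read as three stages of one loop. The sketch below illustrates a "Linear search, Golden Annotation, Imitation Learning" row under simplifying assumptions: candidates are sampled independently, checked against a gold answer, and the verified ones are kept as fine-tuning pairs. The function `search_verify_feedback`, its parameters, and the "response ends with the gold answer" check are all hypothetical. This is not the implementation of any method listed above.

```python
from typing import Callable, List, Tuple


def search_verify_feedback(
    prompt: str,
    gold: str,
    sample: Callable[[str], str],
    n_candidates: int = 4,
) -> List[Tuple[str, str]]:
    """Return (prompt, response) pairs that pass verification."""
    # Search (linear): draw independent candidate responses.
    candidates = [sample(prompt) for _ in range(n_candidates)]
    # Verify (golden annotation): assumed binary check against the gold answer.
    verified = [c for c in candidates if c.strip().endswith(gold)]
    # Feedback (imitation learning): keep verified responses as training pairs.
    return [(prompt, c) for c in verified]


if __name__ == "__main__":
    # Stand-in for a model: replays a fixed list of responses.
    outputs = iter(["2 + 3 = 5", "2 + 3 = 6", "so 5", "7"])
    pairs = search_verify_feedback("What is 2 + 3?", "5", lambda p: next(outputs))
    print(pairs)
```

Swapping the verify step for a learned scorer and the feedback step for a preference or reinforcement learning update would correspond to other rows of the table. The loop structure stays the same.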
71 |
72 |
73 |
74 | ## Citation
75 |
76 | If you find our repo useful in your research, please consider citing:
77 |
78 | ```
79 | @article{VerifierEngineering,
80 | title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
81 | author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
82 | journal={arXiv preprint arXiv:2411.11504},
83 | url={https://arxiv.org/abs/2411.11504},
84 | year={2024}
85 | }
86 |
87 | ```
88 |
89 |
90 | ## Star History
91 |
92 | [Star History Chart](https://star-history.com/#icip-cas/Verifier-Engineering&Date)
93 |
--------------------------------------------------------------------------------
/images/comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/comparison.png
--------------------------------------------------------------------------------
/images/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/framework.png
--------------------------------------------------------------------------------
/images/taxonomy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/taxonomy.png
--------------------------------------------------------------------------------
/paper/ve.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/paper/ve.pdf
--------------------------------------------------------------------------------