├── .gitignore
├── README.md
├── images
│   ├── comparison.png
│   ├── framework.png
│   └── taxonomy.png
└── paper
    └── ve.pdf


/.gitignore:
--------------------------------------------------------------------------------
1 | .DS_Store


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering
 2 | 
 3 | 
 4 | ![Overview](images/framework.png)
 5 | 
 6 | 
 7 | This is a collection of papers and other resources on verifier engineering, accompanying the paper [Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering](paper/ve.pdf). We will update both the paper and this repository regularly, and we welcome suggestions of any kind.
 8 | 
 9 | > [!NOTE]  
10 | > 🌟 **Feel free to submit pull requests to share your work and insights from the perspective of verifier engineering. Your contributions are always welcome!**
11 | 
12 | 
13 | 
14 | ## Overview of Common Verifiers
15 | 
16 | | **Verifier Type**    | **Verification Form**           | **Verification Granularity**         | **Verifier Source**   | **Extra Training** |
17 | |-----------------------|---------------------------------|---------------------------------------|-----------------------|--------------------|
18 | | Golden Annotation     | Binary/Text                   | Thought Step/Full Trajectory         | Program Based         | No                 |
19 | | Rule-based            | Binary/Text                   | Thought Step/Full Trajectory         | Program Based         | No                 |
20 | | Code Interpreter      | Binary/Score/Text             | Token/Thought Step/Full Trajectory   | Program Based         | No                 |
21 | | ORM (Outcome Reward Model) | Binary/Score/Rank/Text   | Full Trajectory                      | Model Based           | Yes                |
22 | | Language Model        | Binary/Score/Rank/Text        | Thought Step/Full Trajectory         | Model Based           | Yes                |
23 | | Tool                  | Binary/Score/Rank/Text        | Token/Thought Step/Full Trajectory   | Program Based         | No                 |
24 | | Search Engine         | Text                          | Thought Step/Full Trajectory         | Program Based         | No                 |
25 | | PRM (Process Reward Model) | Score                    | Token/Thought Step                   | Model Based           | Yes                |
26 | | Knowledge Graph       | Text                          | Thought Step/Full Trajectory         | Program Based         | No                 |
27 | 
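As a rough illustration of the verification form and granularity columns, the sketch below implements two program-based verifiers (Golden Annotation and Rule-based) that return binary judgments over the full trajectory, plus two helpers: one that collapses thought-step scores (such as a PRM would produce) into a single trajectory score, and one that turns scores into a rank. It is a minimal sketch under stated assumptions, not the paper's implementation: the `Answer: <value>` output convention, all function names, and the choice of `min` for aggregation are hypothetical.

```python
import re
from typing import List, Optional

# Hypothetical output convention (an assumption for this sketch): a trajectory is a
# list of thought steps, and the last step ends with "Answer: <value>".
ANSWER_PATTERN = re.compile(r"Answer:\s*(\S+)\s*$")


def extract_answer(steps: List[str]) -> Optional[str]:
    """Return the final answer string, or None if the format is not followed."""
    if not steps:
        return None
    match = ANSWER_PATTERN.search(steps[-1])
    return match.group(1) if match else None


def rule_based_verifier(steps: List[str]) -> bool:
    """Rule-based, binary, full-trajectory: checks only the output format."""
    return extract_answer(steps) is not None


def golden_annotation_verifier(steps: List[str], reference: str) -> bool:
    """Golden annotation, binary, full-trajectory: compares with a reference answer."""
    return extract_answer(steps) == reference


def aggregate_step_scores(step_scores: List[float]) -> float:
    """Collapse thought-step scores into one trajectory score.
    The minimum is one choice; the product or the last-step score are alternatives."""
    return min(step_scores)


def rank_trajectories(scores: List[float]) -> List[int]:
    """Rank form: indices of trajectories ordered from best to worst score."""
    return sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)


if __name__ == "__main__":
    good = ["2 + 3 = 5", "Answer: 5"]
    bad = ["2 + 3 = 6", "Answer: 6"]
    unformatted = ["the sum is 5"]
    print(rule_based_verifier(good), rule_based_verifier(unformatted))  # True False
    print(golden_annotation_verifier(good, "5"), golden_annotation_verifier(bad, "5"))  # True False
    print(aggregate_step_scores([0.9, 0.4, 0.8]))  # 0.4
    print(rank_trajectories([0.2, 0.9, 0.5]))  # [1, 2, 0]
```

The program-based verifiers need no extra training, matching their rows in the table; a model-based verifier would replace these hand-written checks with a learned scorer.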
28 | 
29 | ## A Verifier Engineering Perspective on Post-training Methods
30 | 
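31 | 
32 | Each method below is characterized by how candidate responses are searched (Linear or Tree), which verifier checks them, how the verification results are fed back to the model (Imitation Learning, Preference Learning, Reinforcement Learning, Verifier-Aware, or Verifier-Guided), and the task it targets.
33 | 
34 | 
35 | 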
36 | | Method | Search         | Verify                          | Feedback                   | Task                   |
37 | |---------------------------|-----------------------------------|----------------------------------|-----------------------------------|------------------------|
38 | | [STaR](https://arxiv.org/abs/2203.14465) <br>  [RFT](https://arxiv.org/abs/2308.01825) <br>  [WizardMath](https://arxiv.org/abs/2308.09583)    | Linear                            | Golden Annotation               | Imitation Learning     | Math       |
39 | | [CAG](https://arxiv.org/abs/2404.06809)                      | Linear                            | Golden Annotation               | Imitation Learning     | RAG        |
40 | | [Self-Instruct](https://arxiv.org/abs/2212.10560)            | Linear                     | Rule-based                      | Imitation Learning     | General    |
41 | | [Code Alpaca](https://github.com/sahil280114/codealpaca) <br>  [WizardCoder](https://arxiv.org/abs/2306.08568) | Linear                            | Rule-based                      | Imitation Learning     | Code       |
42 | | [ILF-Code](https://arxiv.org/abs/2303.16749)                 | Linear                      | Code Interpreter <br>  Human        | Imitation Learning     | Code       |
43 | | [RAFT](https://arxiv.org/abs/2304.06767) <br>  [RRHF](https://arxiv.org/abs/2304.05302)                | Linear                            | ORM                             | Imitation Learning     | General    |
44 | | [SSO](https://arxiv.org/abs/2410.17131)                      | Linear                            | Rule-based                      | Preference Learning    | Alignment  |
45 | | [CodeUltraFeedback](https://arxiv.org/abs/2403.09032)        | Linear                            | Language Model                  | Preference Learning    | Code       |
46 | | [Self-Rewarding](https://arxiv.org/abs/2401.10020)           | Linear                            | Language Model                  | Preference Learning    | Alignment  |
47 | | [StructRAG](https://arxiv.org/abs/2410.08815)                | Linear                            | Language Model                  | Preference Learning    | RAG        |
48 | | [MCTS-DPO](https://arxiv.org/abs/2405.00451)                 | Tree                              | Language Model                  | Preference Learning    | Math       |
49 | | [Chain of Preference Optimization](https://arxiv.org/abs/2406.09136) | Tree                     | Language Model                  | Preference Learning    | Reasoning  |
50 | | [LLaMA-Berry](https://arxiv.org/abs/2410.02884)              | Tree                              | ORM                             | Preference Learning    | Reasoning  |
51 | | [Math-Shepherd](https://arxiv.org/abs/2312.08935)            | Linear                            | Golden Annotation <br>  Rule-based  | Reinforcement Learning | Math       |
52 | | [RLTF](https://arxiv.org/abs/2307.04349) <br>  [PPOCoder](https://arxiv.org/abs/2301.13816)           | Linear                            | Code Interpreter                | Reinforcement Learning | Code       |
53 | | [RLAIF](https://openreview.net/forum?id=AAxIs3D2ZZ)                    | Linear                            | Language Model                  | Reinforcement Learning | General    |
54 | | [SIRLC](https://arxiv.org/abs/2305.14483)                    | Linear                            | Language Model                  | Reinforcement Learning | Reasoning  |
55 | | [RLFH](https://arxiv.org/abs/2406.12221)                     | Linear                            | Language Model                  | Reinforcement Learning | Knowledge  |
56 | | [RLHF](https://arxiv.org/abs/2203.02155)                     | Linear                            | ORM                             | Reinforcement Learning | Alignment  |
57 | | [Quark](https://arxiv.org/abs/2205.13636)                   | Linear                            | Tool                            | Reinforcement Learning | Alignment  |
58 | | [RLVR](https://arxiv.org/abs/2411.15124)                | Linear                            | Binary Verifier                       | Reinforcement Learning | General    |
59 | | [ReST-MCTS](https://arxiv.org/abs/2406.03816)               | Tree                              | Language Model                  | Reinforcement Learning | Math       |
60 | | [CRITIC](https://arxiv.org/abs/2305.11738)                   | Linear                            | Code Interpreter <br>  Tool <br>  Search Engine | Verifier-Aware  | Math <br>  Code <br>  Knowledge <br>  General |
61 | | [Self-Debug](https://arxiv.org/abs/2304.05128)           | Linear                            | Code Interpreter                | Verifier-Aware         | Code       |
62 | | [Self-Refine](https://arxiv.org/abs/2303.17651)              | Linear                            | Language Model                  | Verifier-Aware         | Alignment  |
63 | | [ReAct](https://arxiv.org/abs/2210.03629)                    | Linear                            | Search Engine                   | Verifier-Aware         | Knowledge  |
64 | | [Contrastive Decoding](https://arxiv.org/abs/2210.15097)     | Linear                            | Language Model                  | Verifier-Guided        | General    |
65 | | [Chain-of-Verification](https://arxiv.org/abs/2309.11495)    | Linear                            | Language Model                  | Verifier-Guided        | Knowledge  |
66 | | [Inverse Value Learning](https://arxiv.org/abs/2410.21027)   | Linear                            | Language Model                  | Verifier-Guided        | General    |
67 | | [PRM](https://arxiv.org/abs/2305.20050)                  | Linear                            | PRM                             | Verifier-Guided        | Math       |
68 | | [KGR](https://arxiv.org/abs/2311.13314)                      | Linear                            | Knowledge Graph                 | Verifier-Guided        | Knowledge  |
69 | | [UoT](https://arxiv.org/abs/2402.03271)                      | Tree                              | Language Model                  | Verifier-Guided        | General    |
70 | | [ToT](https://arxiv.org/abs/2305.10601)                      | Tree                              | Language Model                  | Verifier-Guided        | Reasoning  |
71 | 
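Each row in the table above combines one search strategy, one verifier, and one feedback method. The sketch below wires up a toy version of the Linear search plus Golden Annotation combination with two feedback styles: imitation learning (keep verified samples as fine-tuning data, in the spirit of rejection-sampling methods such as RFT) and preference learning (pair the best- and worst-scored samples as chosen/rejected). Everything here is hypothetical: `toy_policy` stands in for a language model, the exact-match `golden_verifier` stands in for a golden annotation check, and the actual fine-tuning or preference-optimization step is not shown. Most preference-learning rows use a graded verifier such as a language model or ORM; any function returning a float could replace `golden_verifier`.

```python
import random
from typing import Callable, List, Tuple

# Hypothetical toy setup: a "policy" maps a prompt and an RNG to a final-answer string.
Policy = Callable[[str, random.Random], str]
Dataset = List[Tuple[str, str]]  # (prompt, reference answer) pairs


def linear_search(policy: Policy, prompt: str, n: int, rng: random.Random) -> List[str]:
    """Linear search: sample n complete responses independently."""
    return [policy(prompt, rng) for _ in range(n)]


def golden_verifier(response: str, reference: str) -> float:
    """Golden-annotation verifier: 1.0 on exact match with the reference, else 0.0."""
    return 1.0 if response.strip() == reference.strip() else 0.0


def imitation_feedback(data: Dataset, policy: Policy, n: int,
                       rng: random.Random) -> List[Tuple[str, str]]:
    """Imitation learning: keep verified responses as (prompt, response) fine-tuning pairs."""
    sft_pairs = []
    for prompt, reference in data:
        for response in linear_search(policy, prompt, n, rng):
            if golden_verifier(response, reference) == 1.0:
                sft_pairs.append((prompt, response))
    return sft_pairs


def preference_feedback(data: Dataset, policy: Policy, n: int,
                        rng: random.Random) -> List[Tuple[str, str, str]]:
    """Preference learning: (prompt, chosen, rejected) from best- and worst-scored samples."""
    triples = []
    for prompt, reference in data:
        responses = linear_search(policy, prompt, n, rng)
        if not responses:
            continue
        # sorted() is stable, so responses with tied scores keep their sampling order.
        ranked = sorted(responses, key=lambda r: golden_verifier(r, reference), reverse=True)
        chosen, rejected = ranked[0], ranked[-1]
        # Only emit a pair when the verifier actually prefers one response over the other.
        if golden_verifier(chosen, reference) > golden_verifier(rejected, reference):
            triples.append((prompt, chosen, rejected))
    return triples


def toy_policy(prompt: str, rng: random.Random) -> str:
    """Hypothetical stand-in for a model: answers "a+b" correctly about half the time."""
    a, b = (int(x) for x in prompt.split("+"))
    return str(a + b) if rng.random() < 0.5 else str(a + b + rng.randint(1, 3))


if __name__ == "__main__":
    data = [("2+3", "5"), ("10+7", "17")]
    rng = random.Random(0)
    print(imitation_feedback(data, toy_policy, n=4, rng=rng))
    print(preference_feedback(data, toy_policy, n=4, rng=rng))
```

With a binary verifier, a preference pair exists only when a prompt yields at least one correct and one incorrect sample, which is why `preference_feedback` skips prompts whose best and worst scores tie.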
72 | 
73 | 
74 | ## Citation
75 | 
76 | If you find our repo useful in your research, please consider citing:
77 | 
78 | ```bibtex
79 | @article{VerifierEngineering,
80 |     title={Search, Verify and Feedback: Towards Next Generation Post-training Paradigm of Foundation Models via Verifier Engineering},
81 |     author={Xinyan Guan and Yanjiang Liu and Xinyu Lu and Boxi Cao and Ben He and Xianpei Han and Le Sun and Jie Lou and Bowen Yu and Yaojie Lu and Hongyu Lin},
82 |     journal={arXiv preprint arXiv:2411.11504},
83 |     url={https://arxiv.org/abs/2411.11504},
84 |     year={2024}
85 | }
86 | 
87 | ```
88 | 
89 | 
90 | ## Star History
91 | 
92 | [![Star History Chart](https://api.star-history.com/svg?repos=icip-cas/Verifier-Engineering&type=Date)](https://star-history.com/#icip-cas/Verifier-Engineering&Date)
93 | 


--------------------------------------------------------------------------------
/images/comparison.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/comparison.png


--------------------------------------------------------------------------------
/images/framework.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/framework.png


--------------------------------------------------------------------------------
/images/taxonomy.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/images/taxonomy.png


--------------------------------------------------------------------------------
/paper/ve.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/icip-cas/Verifier-Engineering/50cb4bc140915a69995d36f74fd2220ecd47f973/paper/ve.pdf


--------------------------------------------------------------------------------