# Annotation Reading List
A reading list of relevant papers and projects on foundation model annotation.


## Constitutional AI and Self-Alignment
- [Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision](https://arxiv.org/abs/2305.03047) (May 2023)
- [Constitutional AI: Harmlessness from AI Feedback](https://arxiv.org/abs/2212.08073) (Dec 2022)

## Critique
- [LLM Critics Help Catch LLM Bugs](https://arxiv.org/abs/2407.00215) (Jun 2024)
- [CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation](https://arxiv.org/abs/2311.18702) (Nov 2023)
- [Enabling Scalable Oversight via Self-Evolving Critic](https://arxiv.org/abs/2501.05727) (Jan 2025)
- [Self-critiquing models for assisting human evaluators](https://arxiv.org/abs/2206.05802) (Jun 2022)
- [Critique-out-Loud Reward Models](https://arxiv.org/abs/2408.11791) (Aug 2024)
- [Self-Generated Critiques Boost Reward Modeling for Language Models](https://arxiv.org/abs/2411.16646) (Nov 2024)

## Debate
- [AI Safety via Debate](https://arxiv.org/abs/1805.00899) (May 2018)
- [Scalable AI Safety via Doubly-Efficient Debate](https://arxiv.org/abs/2311.14125) (Nov 2023)
- [Debating with More Persuasive LLMs Leads to More Truthful Answers](https://arxiv.org/abs/2402.06782) (Jul 2024)
- [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325) (May 2023)
- [Debate Helps Weak-to-Strong Generalization](https://arxiv.org/abs/2501.13124) (Jan 2025)
- [Debate helps supervise unreliable experts](https://arxiv.org/pdf/2311.08702) (Nov 2023)

## Iterated Amplification and Weak-to-Strong Generalization
- [Supervising Strong Learners by Amplifying Weak Experts](https://arxiv.org/abs/1810.08575) (Oct 2018)
- [Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning](https://arxiv.org/abs/2402.00667) (Feb 2024)
- [Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision](https://arxiv.org/abs/2403.09472) (Mar 2024)
- [Eliciting Strong Capabilities With Weak Supervision](https://arxiv.org/abs/2312.09390) (Dec 2023)

## LLM-as-judge
- [A Survey on LLM-as-a-Judge](https://arxiv.org/abs/2411.15594) (Nov 2024)
- [Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges](https://arxiv.org/abs/2406.12624) (Jun 2024)
- [JudgeLM: Fine-tuned Large Language Models are Scalable Judges](https://arxiv.org/abs/2310.17631) (Oct 2023)
- [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://arxiv.org/abs/2306.05685) (Jun 2023)

## Misc. annotation techniques
- [Scalable Oversight by Accounting for Unreliable Feedback](https://openreview.net/forum?id=Noy5wbyiCS) (Jun 2024)
- [Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback](https://arxiv.org/abs/2410.19133) (Jan 2025)
- [ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks](https://arxiv.org/abs/2303.15056) (Mar 2023)
- [AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators](https://arxiv.org/abs/2303.16854) (Mar 2023)
- [LLMaAA: Making Large Language Models as Active Annotators](https://arxiv.org/abs/2310.19596) (Oct 2023)
- [Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models](https://arxiv.org/abs/2312.06585) (Dec 2023)