# Annotation Reading List
A reading list of relevant papers and projects on foundation model annotation.


## Constitutional AI and Self-Alignment
- [Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision](https://arxiv.org/abs/2305.03047) (May 2023)
- [Constitutional AI: Harmlessness from AI Feedback](https://arxiv.org/abs/2212.08073) (Dec 2022)

## Critique
- [LLM Critics Help Catch LLM Bugs](https://arxiv.org/abs/2407.00215) (Jun 2024)
- [CritiqueLLM: Scaling LLM-as-Critic for Effective and Explainable Evaluation of Large Language Model Generation](https://arxiv.org/abs/2311.18702) (Nov 2023)
- [Enabling Scalable Oversight via Self-Evolving Critic](https://arxiv.org/abs/2501.05727) (Jan 2025)
- [Self-critiquing models for assisting human evaluators](https://arxiv.org/abs/2206.05802) (Jun 2022)
- [Critique-out-Loud Reward Models](https://arxiv.org/abs/2408.11791) (Aug 2024)
- [Self-Generated Critiques Boost Reward Modeling for Language Models](https://arxiv.org/abs/2411.16646) (Nov 2024)

## Debate
- [AI Safety via Debate](https://arxiv.org/abs/1805.00899) (May 2018)
- [Scalable AI Safety via Doubly-Efficient Debate](https://arxiv.org/abs/2311.14125) (Nov 2023)
- [Debating with More Persuasive LLMs Leads to More Truthful Answers](https://arxiv.org/abs/2402.06782) (Jul 2024)
- [Improving Factuality and Reasoning in Language Models through Multiagent Debate](https://arxiv.org/abs/2305.14325) (May 2023)
- [Debate Helps Weak-to-Strong Generalization](https://arxiv.org/abs/2501.13124) (Jan 2025)
- [Debate helps supervise unreliable experts](https://arxiv.org/pdf/2311.08702) (Nov 2023)

## Iterated Amplification and Weak-to-Strong Generalization
- [Supervising Strong Learners by Amplifying Weak Experts](https://arxiv.org/abs/1810.08575) (Oct 2018)
- [Improving Weak-to-Strong Generalization with Scalable Oversight and Ensemble Learning](https://arxiv.org/abs/2402.00667) (Feb 2024)
- [Easy-to-Hard Generalization: Scalable Alignment Beyond Human Supervision](https://arxiv.org/abs/2403.09472) (Mar 2024)
- [Eliciting Strong Capabilities With Weak Supervision](https://arxiv.org/abs/2312.09390) (Dec 2023)

## LLM-as-judge
- [A Survey on LLM-as-a-Judge](https://arxiv.org/abs/2411.15594) (Nov 2024)
- [Judging the Judges: Evaluating Alignment and Vulnerabilities in LLMs-as-Judges](https://arxiv.org/abs/2406.12624) (Jun 2024)
- [JudgeLM: Fine-tuned Large Language Models are Scalable Judges](https://arxiv.org/abs/2310.17631) (Oct 2023)
- [Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena](https://arxiv.org/abs/2306.05685) (Jun 2023)

## Misc. annotation techniques
- [Scalable Oversight by Accounting for Unreliable Feedback](https://openreview.net/forum?id=Noy5wbyiCS) (Jun 2024)
- [Hybrid Preferences: Learning to Route Instances for Human vs. AI Feedback](https://arxiv.org/abs/2410.19133) (Jan 2025)
- [ChatGPT Outperforms Crowd-Workers for Text-Annotation Tasks](https://arxiv.org/abs/2303.15056) (Mar 2023)
- [AnnoLLM: Making Large Language Models to Be Better Crowdsourced Annotators](https://arxiv.org/abs/2303.16854) (Mar 2023)
- [LLMaAA: Making Large Language Models as Active Annotators](https://arxiv.org/abs/2310.19596) (Oct 2023)
- [Beyond Human Data: Scaling Self-Training for Problem-Solving with Language Models](https://arxiv.org/abs/2312.06585) (Dec 2023)