├── .gitignore
└── readme.md

/.gitignore:
--------------------------------------------------------------------------------
note.md
--------------------------------------------------------------------------------
/readme.md:
--------------------------------------------------------------------------------

# Language Models' Perception of Their Knowledge Boundaries

> A curated list of awesome papers about LMs' perception of their knowledge boundaries. This repository will be continuously updated. If I missed any papers, feel free to open a PR to add them. Any feedback and contributions are welcome!

Knowing when LLMs lack knowledge enables them to express "I don't know" and trigger retrieval when they cannot provide correct answers. Consequently, much research focuses on **LLMs' perception of their knowledge boundaries—that is, whether they recognize what they know and what they don't.**

> :star: marks papers from the same series

## Contents

- [Language Models' Perception of Their Knowledge Boundaries](#language-models-perception-of-their-knowledge-boundaries)
  - [Contents](#contents)
  - [LMs' Perception of Their Knowledge Boundaries](#lms-perception-of-their-knowledge-boundaries)
    - [Survey or Foundation Papers](#survey-or-foundation-papers)
    - [White-box Investigation](#white-box-investigation)
      - [Training The Language Model](#training-the-language-model)
      - [Utilizing Internal States or Attention Weights](#utilizing-internal-states-or-attention-weights)
    - [Grey-box Investigation](#grey-box-investigation)
    - [Black-box Investigation](#black-box-investigation)
  - [Adaptive RAG](#adaptive-rag)
  - [Reasoning Models' Perception of Their Knowledge Boundaries](#reasoning-models-perception-of-their-knowledge-boundaries)
  - [Applications of Confidence](#applications-of-confidence)

## LMs' Perception of Their Knowledge Boundaries

These methods focus on determining whether the model can provide a correct answer but do not perform adaptive Retrieval-Augmented Generation (RAG).

### Survey or Foundation Papers

These papers are surveys or fairly comprehensive foundational studies.

- [Machine Learning 2021] [Aleatoric and Epistemic Uncertainty in Machine Learning: An Introduction to Concepts and Methods](https://arxiv.org/abs/1910.09457) *Eyke Hüllermeier et al.* 21 Oct 2019

- [Anthropic] [Language Models (Mostly) Know What They Know](https://arxiv.org/abs/2207.05221) *Saurav Kadavath et al.* 11 Jul 2022

> Study whether language models can evaluate the validity of their own claims and predict which questions they will be able to answer correctly

- [Survey] [Know Your Limits: A Survey of Abstention in Large Language Models](https://arxiv.org/abs/2407.18418) *Bingbing Wen et al.* 25 Jul 2024

> Introduce a framework to examine abstention from three perspectives: the query, the model, and human values.
> We organize the literature on abstention methods, benchmarks, and evaluation metrics with this framework, and discuss the merits and limitations of prior work

- [Survey] [A Survey on the Honesty of Large Language Models](https://arxiv.org/abs/2409.18786) *Siheng Li et al.* 27 Sep 2024

> A survey on the honesty of LLMs, covering its clarification, evaluation approaches, and strategies for improvement

- [Survey] [A Survey on Uncertainty Quantification of Large Language Models: Taxonomy, Open Research Challenges, and Future Directions](https://arxiv.org/abs/2412.05563) *Ola Shorinwa et al.* 7 Dec 2024

- [Survey] [Knowledge Boundary of Large Language Models: A Survey](https://arxiv.org/abs/2412.12472) *Moxin Li et al.* 17 Dec 2024

- [Survey] [A Survey of Uncertainty Estimation Methods on Large Language Models](https://arxiv.org/pdf/2503.00172) *Zhiqiu Xia et al.* 28 Feb 2025

- [Tutorial] [Uncertainty Quantification for Large Language Models](https://aclanthology.org/2025.acl-tutorials.3/) *Artem Shelmanov et al.* 27 July 2025

### White-box Investigation

These methods require access to the full set of model parameters, for example for model training or for using the model's internal signals.

#### Training The Language Model

- [NeurIPS 2017] [Simple and Scalable Predictive Uncertainty Estimation using Deep Ensembles](https://arxiv.org/pdf/1612.01474) *Balaji Lakshminarayanan et al.* 5 Dec 2016

> Shows that ensembles can be used not only for prediction but also for uncertainty estimation.

- [ICLR 2021] [Uncertainty Estimation in Autoregressive Structured Prediction](https://arxiv.org/abs/2002.07650) *Andrey Malinin et al.* 18 Feb 2020

> Shows how to use ensembles for UQ of generative LMs.

- [EMNLP 2020, **Token-prob-based Confidence**] [Calibration of Pre-trained Transformers](https://arxiv.org/pdf/2003.07892) *Shrey Desai et al.* 17 Mar 2020

> Investigate the calibration of pre-trained transformer models in in-domain and OOD settings. Findings: 1) Pre-trained models are calibrated in-domain. 2) Label smoothing is better than temperature scaling in the OOD setting

- [TACL 2021, **Token-prob-based Confidence**] [How Can We Know When Language Models Know? On the Calibration of Language Models for Question Answering](https://arxiv.org/abs/2012.00955) *Zhengbao Jiang et al.* 2 Dec 2020

> 1) Investigate calibration (answer: not good) of generative language models (e.g., T5) on QA tasks (OOD settings). 2) Examine the effectiveness of several methods (fine-tuning, post-hoc probability modification, or adjustment of the predicted outputs or inputs)

- [TMLR 2022, **Token-prob-based & Verbalized Confidence**] [Teaching Models to Express Their Uncertainty in Words](https://arxiv.org/abs/2205.14334) *Stephanie Lin et al.* 28 May 2022

> The first time a model has been shown to express calibrated uncertainty about its own answers in natural language. To test calibration, the authors introduce the CalibratedMath suite of tasks

- [ACL 2023, **Token-prob-based Confidence**] [A Close Look into the Calibration of Pre-trained Language Models](https://arxiv.org/abs/2211.00151) *Yangyi Chen et al.* 31 Oct 2022

> Answer two questions: (1) Do PLMs learn to become calibrated in the training process? (No) (2) How effective are existing calibration methods?
> (Learnable methods significantly reduce PLMs' confidence in wrong predictions)

- [NeurIPS 2024, **Verbalized Confidence & Self-consistency**] [Alignment for Honesty](https://arxiv.org/abs/2312.07000) *Yuqing Yang et al.* 12 Dec 2023

> 1) Establish a precise problem definition of "honesty"; 2) introduce a flexible training framework that emphasizes honesty without sacrificing performance on other tasks

- [ACL 2024, **Verbalized Confidence & RL**] [When to Trust LLMs: Aligning Confidence with Response Quality](https://arxiv.org/abs/2404.17287) *Shuchang Tao et al.* 26 Apr 2024

- [NAACL 2024, **Verbalized Confidence**, **Outstanding Paper Award**] [R-Tuning: Instructing Large Language Models to Say ‘I Don’t Know’](https://arxiv.org/pdf/2311.09677) *Hanning Zhang et al.* 7 Jun 2024

> Propose a supervised fine-tuning method and an unsupervised fine-tuning method:
> 1) Supervised: add certainty tags to a QA dataset based on the correctness of the model's answers, and train the model to express uncertainty when it is not sure about its answer.
> 2) Unsupervised: first, generate answers multiple times and compute an entropy over answer frequencies (similar to semantic entropy, but without an NLI model); second, put high-entropy data into an 'uncertain' set and low-entropy data into a 'certain' set and fine-tune the model.
> Interestingly, the unsupervised method can improve both accuracy and calibration.

- [:star: NeurIPS Safe Generative AI Workshop 2024, **Semantic Uncertainty**] [Fine-Tuning Large Language Models to Appropriately Abstain with Semantic Entropy](https://arxiv.org/abs/2410.17234) *Benedict Aaron Tjandra et al.* 22 Oct 2024

> Existing methods rely on the existence of ground-truth labels or are limited to short-form responses. This paper proposes fine-tuning using semantic entropy, an uncertainty measure derived from introspection into the model which does not require external labels.

- [Arxiv] [ConfTuner: Training Large Language Models to Express Their Confidence Verbally](https://arxiv.org/pdf/2508.18847) *Yibo Li et al.* 26 Aug 2025

- [Arxiv] [Teaching Language Models to Faithfully Express their Uncertainty](Teaching Language Models to Faithfully Express their Uncertainty) *Bryan Eikema et al.* 14 October 2025

> Through SFT, the model is enabled to verbalize its internal uncertainty about specific claims; the uncertainty regarding a claim is measured by consistency across multiple generations.

- [Arxiv] [CritiCal: Can Critique Help LLM Uncertainty or Confidence Calibration?](https://arxiv.org/abs/2510.24505) *Qing Zong et al.* 28 October 2025

> Introduces **critique-based learning** for teaching the model to output verbalized confidence. Training data are generated either via the model's own self-critique or via a teacher model (GPT-4o); **input: question + answer + confidence; output: critique**. Both SFT and DPO training are explored, demonstrating that critique-based learning yields more reliable confidence calibration


#### Utilizing Internal States or Attention Weights

These papers focus on determining the truth of a statement or the model's ability to provide a correct answer by analyzing the model's internal states or attention weights. This usually involves extracting features with mathematical tools or training a lightweight MLP (Multi-Layer Perceptron) as a probe; a minimal probe sketch is given below.
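To make the probing recipe concrete, here is a minimal sketch, assuming an open-weight causal LM (the model name, the choice of the last prompt token's final-layer hidden state as the feature, and externally judged correctness labels are all assumptions for illustration); it is a generic illustration, not the exact procedure of any paper below:

```python
# Minimal hidden-state probe sketch (illustrative; not a specific paper's method).
import numpy as np
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-1.5B-Instruct"  # assumption: any open causal LM works
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def question_feature(question: str) -> np.ndarray:
    """Final-layer hidden state at the last prompt token."""
    ids = tok(question, return_tensors="pt").to(lm.device)
    out = lm(**ids, output_hidden_states=True)
    return out.hidden_states[-1][0, -1].float().cpu().numpy()

def fit_probe(questions, correct):
    """correct[i] = 1 if the model's own answer to questions[i] was judged right, else 0."""
    X = np.stack([question_feature(q) for q in questions])
    return LogisticRegression(max_iter=1000).fit(X, correct)

# probe = fit_probe(train_questions, train_labels)
# confidence = probe.predict_proba(question_feature("Who wrote Hamlet?")[None])[:, 1]
```

The same skeleton covers most internal-state probes below; what varies is which layer, which token position, and which target (truthfulness, answerability, hallucination) the probe is trained to predict.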
- [EMNLP 2023 Findings] [The Internal State of an LLM Knows When It's Lying](https://arxiv.org/pdf/2304.13734) *Amos Azaria et al.* 26 Apr 2023

> An LLM's internal state can be used to reveal the truthfulness of statements (train a classifier on hidden states)

- [ICLR 2024] [Attention Satisfies: A Constraint-Satisfaction Lens on Factual Errors of Language Models](https://arxiv.org/abs/2309.15098) *Mert Yuksekgonul et al.* 26 Sep 2023

> Convert a QA problem into a constraint satisfaction problem (are the constraints in the question satisfied?) and focus on the attention weights over each constraint when generating the first token.

- [EMNLP 2023] [The Curious Case of Hallucinatory (Un)answerability: Finding Truths in the Hidden States of Over-Confident Large Language Models](https://arxiv.org/abs/2310.11877) *Aviv Slobodkin et al.* 18 Oct 2023

> Investigate whether models already represent questions' (un)answerability when producing answers (Yes)


- [ICML 2024] [Thermometer: Towards Universal Calibration for Large Language Models](https://arxiv.org/abs/2403.08819) *Maohao Shen et al.* 20 Feb 2024

> An MLP is used to predict an appropriate temperature for each task, performing **temperature scaling**. It requires training on the training set and relies on ground-truth answers.

- [ICLR 2024] [INSIDE: LLMs’ internal states retain the power of hallucination detection](https://arxiv.org/abs/2402.03744) *Chao Chen et al.* 6 Feb 2024

> 1) Propose the EigenScore metric, which uses hidden states to better evaluate responses' self-consistency, and 2) truncate extreme activations in the feature space, which helps identify overconfident (consistent but wrong) hallucinations

- [ACL 2024 Findings, **MIND**] [Unsupervised Real-Time Hallucination Detection based on the Internal States of Large Language Models](https://arxiv.org/abs/2403.06448) *Weihang Su et al.* 11 Mar 2024

> Introduce MIND, an unsupervised training framework that leverages the internal states of LLMs for real-time hallucination detection without requiring manual annotations.

- [NAACL 2024] [On Large Language Models' Hallucination with Regard to Known Facts](https://arxiv.org/abs/2403.20009) *Che Jiang et al.* 29 Mar 2024

> Investigate, from the perspective of inference dynamics, the phenomenon of LLMs possessing the knowledge for a correct answer yet still hallucinating

- [Arxiv, **FacLens**] [Hidden Question Representations Tell Non-Factuality Within and Across Large Language Models](https://arxiv.org/abs/2406.05328) *Yanling Wang et al.* 8 Jun 2024

> Studies non-factuality prediction (NFP) before response generation and proposes FacLens (a trained MLP) to enhance the efficiency and transferability of NFP (the first to study transfer across different models in NFP)

- [Arxiv] [Towards Fully Exploiting LLM Internal States to Enhance Knowledge Boundary Perception](https://www.arxiv.org/abs/2502.11677) *Shiyu Ni et al.* 17 Feb 2025

> This work explores leveraging LLMs' internal states to enhance their perception of knowledge boundaries from efficiency and risk perspectives. It focuses on: 1) the necessity of estimating model confidence after response generation, and 2) introducing Consistency-based Confidence Calibration ($C^3$), which evaluates confidence consistency through question reformulation. $C^3$ significantly improves LLMs' ability to recognize their knowledge gaps (a generic reformulation-consistency sketch follows this entry).
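As a companion to the entry above, here is a minimal sketch of reformulation-consistency confidence, assuming a hypothetical `answer_fn` wrapper around the LLM and externally produced paraphrases; it illustrates the general idea of checking answer consistency under question reformulation, not the paper's exact $C^3$ procedure:

```python
# Generic reformulation-consistency sketch (hypothetical helpers; illustrative only).
from typing import Callable, List

def normalize(ans: str) -> str:
    """Cheap string normalization so trivially different answers still match."""
    return " ".join(ans.lower().strip().split())

def reformulation_consistency(
    question: str,
    paraphrases: List[str],            # assumption: produced by templates or a paraphraser
    answer_fn: Callable[[str], str],   # assumption: wraps a call to the LLM
) -> float:
    """Fraction of paraphrased questions whose answer matches the original answer."""
    original = normalize(answer_fn(question))
    votes = [normalize(answer_fn(q)) for q in paraphrases]
    return sum(v == original for v in votes) / max(len(votes), 1)
```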
- [Arxiv] [Analyzing LLMs’ Knowledge Boundary Cognition Across Languages Through the Lens of Internal Representations](https://arxiv.org/pdf/2504.13816) *Chenghao Xiao et al.* 18 Apr 2025

- [EMNLP 2025] [Simple Factuality Probes Detect Hallucinations in Long-Form Natural Language Generation](https://aclanthology.org/2025.findings-emnlp.880.pdf) *Jiatong Han et al.*

> For the model's **long-form generation**, a probe is trained to detect claim-level hallucinations **using internal states**. This approach is found to achieve performance comparable to sampling-based methods, but with 100 times higher efficiency.

- [AAAI 2026] [Fine-Tuned LLMs Know They Don't Know: A Parameter-Efficient Approach to Recovering Honesty](https://arxiv.org/abs/2511.12991) *Zeyu Shi et al.* 17 Nov 2025

### Grey-box Investigation

These methods need access to the probabilities of generated tokens. Some methods in other categories also rely on token probabilities; however, since those papers involve training, they do not fall into this category. A minimal sketch of token-probability confidence and ECE appears later in this section.

- [ICML 2017, **Token-prob-based Confidence**] [On Calibration of Modern Neural Networks](https://arxiv.org/abs/1706.04599) *Chuan Guo et al.* 14 Jun 2017

> Investigate calibration in modern neural networks, propose the ECE metric, and propose enhancing calibration via temperature scaling


- [TACL 2022, **Verbalized Confidence**] [Reducing conversational agents’ overconfidence through linguistic calibration](https://aclanthology.org/2022.tacl-1.50/) *Sabrina J. Mielke et al.* 30 Dec 2020

> 1. Analyze to what extent SOTA chit-chat models are linguistically calibrated (poorly calibrated); 2. train a much better correctness predictor directly from the chit-chat model's representations; 3. use this trained predictor within a controllable generation model, which greatly improves the calibration of a SOTA chit-chat model.

- [ICLR 2023, **Token-prob-based Confidence & Self-consistency**] [Prompting GPT-3 To Be Reliable](https://arxiv.org/abs/2210.09150) *Chenglei Si et al.* 17 Oct 2022

> With appropriate prompts, GPT-3 is more reliable (under both consistency-based and prob-based confidence estimation) than smaller-scale supervised models

- [:star: ICLR 2023 Spotlight, **Semantic Uncertainty**, **Token-prob-based Confidence & Self-consistency**] [Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation](https://arxiv.org/abs/2302.09664) *Lorenz Kuhn et al.* 19 Feb 2023

> Introduce semantic entropy—an entropy which incorporates linguistic invariances created by shared meanings

- [ACL 2024] [Shifting Attention to Relevance: Towards the Predictive Uncertainty Quantification of Free-Form Large Language Models](https://arxiv.org/abs/2307.01379) *Jinhao Duan et al.* 3 Jul 2023

> Weights each token by its relevance, measured as the semantic change (scored with an NLI/similarity model) when that token is removed, and shifts uncertainty estimation toward the relevant tokens.

- [ACL 2024] [Fact-Checking the Output of Large Language Models via Token-Level Uncertainty Quantification](https://arxiv.org/abs/2403.04696) *Ekaterina Fadeeva et al.* 7 Mar 2024

> The first to investigate the quality of claim-level UQ techniques for LLM generation; approaches this by leveraging token-level uncertainty scores and aggregating them into claim-level scores. They propose CCP, which considers the uncertainty of the expression as well as claim type/order uncertainty.
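To ground the grey-box idea, here is a minimal sketch of length-normalized token-probability confidence plus an ECE computation. The model name is an assumption, and the length-normalized sequence probability is just one common choice of confidence score, not the method of any particular entry in this section:

```python
# Grey-box confidence from token log-probs, plus ECE (illustrative sketch).
import numpy as np
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_NAME = "Qwen/Qwen2.5-7B-Instruct"  # assumption: any open causal LM
tok = AutoTokenizer.from_pretrained(MODEL_NAME)
lm = AutoModelForCausalLM.from_pretrained(MODEL_NAME, torch_dtype=torch.float16, device_map="auto")

@torch.no_grad()
def answer_with_confidence(prompt: str, max_new_tokens: int = 32):
    ids = tok(prompt, return_tensors="pt").to(lm.device)
    out = lm.generate(**ids, max_new_tokens=max_new_tokens, do_sample=False,
                      return_dict_in_generate=True, output_scores=True)
    # Log-probability of each generated token under the model.
    logprobs = lm.compute_transition_scores(out.sequences, out.scores, normalize_logits=True)[0]
    answer = tok.decode(out.sequences[0, ids["input_ids"].shape[1]:], skip_special_tokens=True)
    confidence = float(torch.exp(logprobs.mean()))  # length-normalized sequence probability
    return answer, confidence

def expected_calibration_error(confidences, correct, n_bins: int = 10) -> float:
    """ECE: bin predictions by confidence and average |accuracy - confidence| per bin."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if mask.any():
            ece += mask.mean() * abs(correct[mask].mean() - confidences[mask].mean())
    return float(ece)
```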
- [ACL 2024, **Token-prob-based & Verbalized Confidence**] [Confidence Under the Hood: An Investigation into the Confidence-Probability Alignment in Large Language Models](https://arxiv.org/abs/2405.16282) *Abhishek Kumar et al.* 25 May 2024

> Investigate the alignment between LLMs' internal confidence and verbalized confidence


- [:star: Nature, **Semantic Uncertainty**] [Detecting hallucinations in large language models using semantic entropy](https://www.nature.com/articles/s41586-024-07421-0) *Sebastian Farquhar et al.* 19 June 2024

> An extension of [Semantic Uncertainty: Linguistic Invariances for Uncertainty Estimation in Natural Language Generation](https://arxiv.org/abs/2302.09664)

- [CCIR 2024, **Token-prob-based & Verbalized Confidence**] [Are Large Language Models More Honest in Their Probabilistic or Verbalized Confidence?](https://arxiv.org/abs/2408.09773) *Shiyu Ni et al.* 19 Aug 2024

> Conduct a comprehensive analysis and comparison of LLMs' probabilistic perception and verbalized perception of their factual knowledge boundaries

- [Arxiv, **Token-prob-based & Self-consistency**] [Uncertainty Quantification for LLMs through Minimum Bayes Risk: Bridging Confidence and Consistency](https://arxiv.org/abs/2502.04964) *Roman Vashurin et al.* 7 Feb 2025

- [ICLR 2025] [From Risk to Uncertainty: Generating Predictive Uncertainty Measures via Bayesian Estimation](https://arxiv.org/abs/2402.10727) *Nikita Kotelevskii et al.* 16 Feb 2024

- [Arxiv] [Human-Alignment and Calibration of Inference-Time Uncertainty in Large Language Models](https://arxiv.org/pdf/2508.08204) *Kyle Moore et al.* 11 Aug 2025

> Evaluate how closely model uncertainty aligns with human uncertainty (finding strong alignment).

- [Arxiv] [Semantic Energy: Detecting LLM Hallucination Beyond Entropy](https://arxiv.org/abs/2508.14496) *Huan Ma et al.* 20 Aug 2025

> This paper introduces **Semantic Energy**, an energy-based uncertainty measure derived from the final-layer logits (before softmax) of language models. By combining semantic clustering with a Boltzmann-style energy formulation, it overcomes the limitations of **Semantic Entropy** in low-diversity settings and significantly improves hallucination detection in LLMs.

- [NeurIPS 2025] [Your Pre-trained LLM is Secretly an Unsupervised Confidence Calibrator](https://arxiv.org/abs/2505.16690) *Beier Luo et al.* 22 May 2025

> It assumes that the pretrained model is better calibrated. Therefore, it learns a **temperature** on questions answered by both the pretrained model and the chat model, so that the chat model's logits are brought closer to those of the pretrained model, achieving calibration. This does not require ground-truth answers.

### Black-box Investigation

These methods only require access to the model's text output. A minimal self-consistency sketch appears after the first entry below.

- [TACL 2020, **Self-consistency**] [Unsupervised Quality Estimation for Neural Machine Translation](https://arxiv.org/abs/2005.10608) *Marina Fomicheva et al.* 21 May 2020

> Compare samples via lexical metrics.
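Most entries in this section build on the same sampling-and-agreement idea. Here is a minimal sketch, assuming a hypothetical `sample_fn` that queries the model with temperature > 0; it is a simplified relative of the methods below, not any single paper's algorithm:

```python
# Black-box self-consistency sketch (hypothetical sampler; illustrative only).
from collections import Counter
from typing import Callable, Tuple

def self_consistency(question: str, sample_fn: Callable[[str], str], n: int = 10) -> Tuple[str, float]:
    """Sample n answers and use the modal answer's frequency as a confidence score."""
    answers = [" ".join(sample_fn(question).lower().split()) for _ in range(n)]
    best, count = Counter(answers).most_common(1)[0]
    return best, count / n
```

Stronger variants replace exact string matching with semantic clustering (e.g., via an NLI model, as in semantic entropy) or have a second model check each sampled answer, as in the multi-LLM entries below.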
- [EMNLP 2023, **SelfCheckGPT**, **Self-consistency**] [SelfCheckGPT: Zero-resource black-box hallucination detection for generative large language models](https://arxiv.org/abs/2303.08896) *Potsawee Manakul et al.* 15 Mar 2023

> The first to analyze hallucination in general LLM responses, and the first zero-resource hallucination detection solution that can be applied to black-box systems

- [ACL 2024, **Multi-LLM Collaboration**, **Outstanding paper award**] [Don't Hallucinate, Abstain: Identifying LLM Knowledge Gaps via Multi-LLM Collaboration](https://arxiv.org/abs/2402.00367)

> Aim to identify LLM knowledge gaps and abstain from answering questions when knowledge gaps are present. Contributions: 1) a critical evaluation and typology of diverse existing methods; 2) two novel, robust multi-LLM collaboration methods to detect LLM knowledge gaps, COOPERATE and COMPETE

- [EMNLP 2023, **Token-prob-based & Verbalized Confidence**] [Just Ask for Calibration: Strategies for Eliciting Calibrated Confidence Scores from Language Models Fine-Tuned with Human Feedback](https://arxiv.org/abs/2305.14975) *Katherine Tian et al.* 24 May 2023

> Conduct a broad evaluation of methods for extracting confidence scores from RLHF-LMs

- [ACL 2023 Findings, **Verbalized Confidence**] [Do Large Language Models Know What They Don't Know?](https://arxiv.org/abs/2305.18153) *Zhangyue Yin et al.* 29 May 2023

> Evaluate LLMs' self-knowledge by assessing their ability to identify unanswerable or unknowable questions


- [TMLR 2024, **Self-consistency**] [Generating with Confidence: Uncertainty Quantification for Black-box Large Language Models](https://arxiv.org/abs/2305.19187) *Zhen Lin et al.* 30 May 2023

> They explore UQ methods for black-box models in the NLG task and propose a series of methods (e.g., building a similarity graph with sampled sequences as nodes and pairwise similarities as edges) to evaluate the model's uncertainty regarding inputs and its confidence in each generated sequence, primarily based on the similarity of multiple generations.

- [EACL 2024 Findings, **Self-consistency**] [Do Language Models Know When They’re Hallucinating References](https://arxiv.org/abs/2305.18248) *Ayush Agrawal et al.* 29 May 2023

> Focus on hallucinated book and article references due to their frequent and easy-to-discern nature. Identify hallucinated references by posing direct queries (yes/no questions that directly elicit the model's confidence) or indirect queries (asking for the authors of the generated reference) to the language model about the references.

- [ICLR 2024, **Self-consistency & Verbalized Confidence**] [Can LLMs Express Their Uncertainty? An Empirical Evaluation of Confidence Elicitation in LLMs](https://arxiv.org/abs/2306.13063) *Miao Xiong et al.* 22 Jun 2023

> Explore black-box approaches for LLM uncertainty estimation.
> Define a systematic framework with three components: prompting strategies for eliciting verbalized confidence, sampling methods for generating multiple responses, and aggregation techniques for computing consistency

- [EMNLP 2023, **SAC3**, **Self-consistency & Multi-LLM Collaboration**] [SAC3: Reliable Hallucination Detection in Black-Box Language Models via Semantic-aware Cross-check Consistency](https://arxiv.org/abs/2311.01740) *Jiaxin Zhang et al.* 3 Nov 2023

> Extends self-consistency checks across perturbed questions and different models

- [ICML 2024, **Self-consistency**] [Language Models with Conformal Factuality Guarantees](https://arxiv.org/abs/2402.10978) *Christopher Mohri et al.* 15 Feb 2024

> Propose FrequencyScoring, which estimates the confidence of a generation by analyzing its support in other generations. Originally proposed at the claim level, with prompted GPT-4 serving as the NLI model.

- [NeurIPS 2024] [Semantic Density: Uncertainty Quantification for Large Language Models through Confidence Measurement in Semantic Space](https://arxiv.org/abs/2405.13845) *Xin Qiu et al.* 22 May 2024

> Estimates a response's confidence from its semantic density, i.e., how strongly it is supported by other sampled responses in semantic space.

- [Arxiv] [Large Language Model Confidence Estimation via Black-Box Access](https://arxiv.org/abs/2406.04370) *Tejaswini Pedapati et al.* 1 Jun 2024

> Engineer novel features and train an (interpretable) model (viz. logistic regression) on these features to estimate confidence. They design different ways of manipulating the input prompt and produce values based on the variability of the answers under each such manipulation; these values serve as the features

- [EMNLP 2024] [Calibrating the Confidence of Large Language Models by Eliciting Fidelity](https://arxiv.org/abs/2404.02655) *Mozhi Zhang et al.* 3 April 2024

> Decompose the language model's confidence in each choice into uncertainty about the question and fidelity to the answer. First, sample multiple times: 1. **Uncertainty**: the distribution of sampled answers. 2. **Fidelity**: replace the selected answer with "all other options are wrong", then reselect and observe any change; repeat to assess fidelity to each answer. Finally, merge the two components.

- [Arxiv, **Verbalized Confidence**] [Uncertainty Distillation: Teaching Language Models to Express Semantic Confidence](https://arxiv.org/abs/2503.14749) *Sophia Hager et al.* 18 March 2025

- [AAAI 2025, **Self-consistency**] [Explore What LLM Does Not Know in Complex Question Answering](https://ojs.aaai.org/index.php/AAAI/article/view/34638) *Xin Lin et al.* 11 April 2025

- [EMNLP 2024] [Can Large Language Models Faithfully Express Their Intrinsic Uncertainty in Words?](https://arxiv.org/abs/2405.16908) *Gal Yona et al.* 27 May 2024

> This study investigates whether models can express their internal uncertainty through language and finds that models typically produce answers in highly certain linguistic forms, even when their internal consistency (confidence) is low.

- [EMNLP 2025] [MetaFaith: Faithful Natural Language Uncertainty Expression in LLMs](https://arxiv.org/abs/2505.24858) *Gabrielle Kaili-May Liu et al.* 2 Oct 2025

> The first to define and evaluate faithful calibration at scale from the perspective of linguistic expression, revealing that LLMs are often internally uncertain yet linguistically overconfident.
> Propose MetaFaith, a prompting framework that introduces three metacognitive strategies for generating calibration prompts: M+Reflect (reflective self-checking), MetSens (uncertainty-sensitive persona prompting), and MetSens+Hedge (linguistic uncertainty expression).

- [Arxiv, Self-consistency] [Zero-knowledge LLM hallucination detection and mitigation through fine-grained cross-model consistency](https://arxiv.org/abs/2508.14314) *Aman Goel et al.* 19 Aug 2025

- [Arxiv] [SeSE: A Structural Information-Guided Uncertainty Quantification Framework for Hallucination Detection in LLMs](https://arxiv.org/abs/2511.16275) *Xingtao Zhao et al.* 20 Nov 2025

> Traditional uncertainty methods focus on semantic consistency but overlook the **semantic relationship network among multiple answers**. The authors claim this is the first framework to systematically incorporate semantic structural information—such as hierarchy, directionality, and entailment—into LLM uncertainty estimation


## Adaptive RAG

These methods focus directly on the question of when to retrieve, designing strategies and evaluating their effectiveness in Retrieval-Augmented Generation (RAG). A minimal confidence-gated retrieval sketch appears partway through this list.

- [ICLR 2023, ReAct] [ReAct: Synergizing Reasoning and Acting in Language Models](https://arxiv.org/abs/2210.03629) *Shunyu Yao et al.* 6 Oct 2022

- [ACL 2023 Oral, **Adaptive RAG**] [When Not to Trust Language Models: Investigating Effectiveness of Parametric and Non-Parametric Memories (Adaptive RAG)](https://arxiv.org/abs/2212.10511) *Alex Mallen et al.* 20 Dec 2022

> Investigate 1) when we should and should not rely on LMs' parametric knowledge and 2) how scaling and non-parametric memories (e.g., retrieval-augmented LMs) can help. Propose adaptive RAG based on entity popularity


- [ACL 2023, **IRCoT**] [Interleaving Retrieval with Chain-of-Thought Reasoning for Knowledge-Intensive Multi-Step Questions](https://arxiv.org/abs/2212.10509) *Harsh Trivedi et al.*

> Propose IRCoT, which uses retrieval to guide the chain-of-thought (CoT) reasoning steps and uses CoT reasoning to guide the retrieval. IRCoT alternates between the two steps: extend the CoT & expand the retrieved information

- [EMNLP 2023, **FLARE**] [Active Retrieval Augmented Generation](https://arxiv.org/abs/2305.06983) *Zhengbao Jiang et al.* 11 May 2023

> Propose FLARE for long-form generation: iteratively uses a prediction of the upcoming sentence to anticipate future content, which is then used as a query to retrieve relevant documents and regenerate the sentence if it contains low-confidence tokens

- [EMNLP 2023 Findings, **SKR**] [Self-Knowledge Guided Retrieval Augmentation for Large Language Models](https://arxiv.org/abs/2310.05002) *Yile Wang et al.* 8 Oct 2023

> Investigate eliciting models' ability to recognize what they know and do not know, and propose Self-Knowledge guided Retrieval augmentation (SKR), which lets LLMs call retrieval adaptively

- [ICLR 2024 Oral, **Self-RAG**] [Self-RAG: Learning to Retrieve, Generate, and Critique through Self-Reflection](https://arxiv.org/abs/2310.11511) *Akari Asai et al.* 17 Oct 2023

> Propose a new framework to train an arbitrary LM to learn to **retrieve, generate, and critique (via generating special tokens)** to enhance the factuality and quality of generations, without hurting the versatility of LLMs.
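As mentioned above, the simplest form of the "when to retrieve" decision is a confidence gate. The sketch below uses hypothetical `answer_fn`, `retrieve_fn`, and `answer_with_context_fn` callables and an assumed threshold; it is a generic sketch, not the method of any single entry in this list:

```python
# Generic confidence-gated adaptive retrieval sketch (hypothetical helpers).
from typing import Callable, List, Tuple

def adaptive_rag(
    question: str,
    answer_fn: Callable[[str], Tuple[str, float]],        # returns (answer, confidence in [0, 1])
    retrieve_fn: Callable[[str], List[str]],               # returns supporting passages
    answer_with_context_fn: Callable[[str, List[str]], str],
    threshold: float = 0.7,                                # assumption: tuned on held-out data
) -> str:
    answer, confidence = answer_fn(question)
    if confidence >= threshold:
        return answer                                      # trust parametric knowledge
    passages = retrieve_fn(question)                       # low confidence: fall back to retrieval
    return answer_with_context_fn(question, passages)
```

The confidence estimate plugged into `answer_fn` can come from any of the sections above (token probabilities, internal-state probes, self-consistency, or verbalized confidence), which is exactly the design space the entries in this section explore.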
- [Arxiv, **Rowen**, **Enhanced SAC3**, **Self-consistency & Multi-LLM Collaboration**] [Retrieve Only When It Needs: Adaptive Retrieval Augmentation for Hallucination Mitigation in Large Language Models](https://arxiv.org/abs/2402.10612) *Hanxing Ding et al.* 16 Feb 2024

> Introduces Rowen, which assesses the model's uncertainty regarding the input query by evaluating the semantic **inconsistencies** in various responses generated **across different languages** or **models**.

- [ACL 2024 Findings, **Verbalized Confidence**, **Prompting Methods**] [When Do LLMs Need Retrieval Augmentation? Mitigating LLMs' Overconfidence Helps Retrieval Augmentation](https://arxiv.org/abs/2402.11457) *Shiyu Ni et al.* 18 Feb 2024

> 1) Quantitatively measure LLMs' perception of their knowledge boundaries and confirm their overconfidence; 2) study how LLMs' certainty about a question correlates with their dependence on external retrieved information; 3) propose several prompting methods to enhance LLMs' perception of knowledge boundaries and show that they are effective in reducing overconfidence; 4) equipped with these methods, LLMs achieve comparable or even better RA performance with far fewer retrieval calls.

- [Arxiv, **Position paper**] [Reliable, Adaptable, and Attributable Language Models with Retrieval](https://arxiv.org/abs/2403.03187) *Akari Asai et al.* 5 Mar 2024

> Advocate for retrieval-augmented LMs to replace parametric LMs as the next generation of LMs and propose a roadmap for developing general-purpose retrieval-augmented LMs

- [ACL 2024 Oral, **DRAGIN**, **Enhanced FLARE**] [DRAGIN: Dynamic Retrieval Augmented Generation based on the Information Needs of Large Language Models](https://arxiv.org/pdf/2403.10081) *Weihang Su et al.* 15 Mar 2024

> Propose DRAGIN, focusing on 1) when to retrieve: consider the LLM's uncertainty about its own generated content, the influence of each token on subsequent tokens, and the semantic significance of each token; and 2) what to retrieve: construct the query from important words by leveraging the LLM's self-attention across the entire context (a generic sketch of this kind of token-level trigger appears after the next two entries)

- [Arxiv, **CtrlA**] [CtrlA: Adaptive Retrieval-Augmented Generation via Inherent Control](https://arxiv.org/abs/2405.18727) *Huanshuo Liu et al.* 29 May 2024

> This paper leverages the model's internal states to derive directions for honesty (where the output aligns with internal knowledge) and confidence. It enhances honesty by modifying internal representations and uses the confidence signal to detect retrieval timing.

- [EMNLP 2024 Findings, **UAR**] [Unified Active Retrieval for Retrieval Augmented Generation](https://arxiv.org/abs/2406.12534) *Qinyuan Cheng et al.* 18 Jun 2024

> Propose Unified Active Retrieval (UAR), which consists of four orthogonal criteria for determining retrieval timing: intent-aware, knowledge-aware, time-sensitive-aware, and self-aware. Four classifiers are trained based on the model's internal states.
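Several entries above (FLARE, DRAGIN) trigger retrieval during long-form generation when generated tokens look uncertain. The sketch below shows that trigger logic in its most generic form, with hypothetical drafting, retrieval, and regeneration callables and an assumed probability threshold; it is not a faithful reimplementation of either paper:

```python
# Generic token-level retrieval trigger during generation (hypothetical helpers).
from typing import Callable, List, Tuple

def generate_with_trigger(
    question: str,
    draft_sentence_fn: Callable[[str], Tuple[str, List[float]]],  # (next sentence, per-token probs)
    retrieve_fn: Callable[[str], List[str]],
    regenerate_fn: Callable[[str, List[str]], str],               # regenerate sentence with passages
    max_sentences: int = 8,
    tau: float = 0.4,                                             # assumption: probability threshold
) -> str:
    context = question
    output: List[str] = []
    for _ in range(max_sentences):
        sentence, probs = draft_sentence_fn(context)
        if not sentence:
            break
        if probs and min(probs) < tau:                            # a low-confidence token was generated
            passages = retrieve_fn(question + " " + sentence)
            sentence = regenerate_fn(context, passages)
        output.append(sentence)
        context += " " + sentence
    return " ".join(output)
```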
- [Arxiv, **SEAKR**] [SEAKR: Self-aware Knowledge Retrieval for Adaptive Retrieval Augmented Generation](https://arxiv.org/abs/2406.19215) *Zijun Yao et al.* 27 Jun 2024

> Use hidden states of the last generated tokens to measure LLMs' uncertainty, and use this uncertainty to decide when to retrieve, re-rank the retrieved documents, and choose the reasoning strategy


- [EMNLP 2024, Self-Multi-RAG, **Multi-Round**] [Learning When to Retrieve, What to Rewrite, and How to Respond in Conversational QA](https://arxiv.org/abs/2409.15515)

> Extend Self-RAG to conversational QA. Introduces a **self-reflection–based framework** with three key decisions:
>
> 1. **When to Retrieve** – The model evaluates whether new retrieval is needed based on the dialogue context and prior retrieved information, using **special self-reflection tokens**.
> 2. **What to Rewrite** – The model rewrites the dialogue history into a retrieval-friendly query to improve retrieval relevance.
> 3. **How to Respond** – Combines retrieved documents and dialogue context to generate the final answer, with internal evaluation of relevance and grounding. Generates **multiple candidate answers** and scores them based on relevance, support, and utility, selecting the best one


- [Arxiv, **Comprehensive study**] [Adaptive Retrieval Without Self-Knowledge? Bringing Uncertainty Back Home](https://arxiv.org/abs/2501.12835) *Viktor Moskvoretskii et al.* 22 Jan 2025

> Conduct a comprehensive analysis of 35 adaptive retrieval methods, including 8 recent approaches and 27 uncertainty estimation techniques, across 6 datasets using 10 metrics for QA performance, self-knowledge, and efficiency.

- [Arxiv, Search-o1] [Search-o1: Agentic Search-Enhanced Large Reasoning Models](https://arxiv.org/abs/2501.05366) *Xiaoxi Li et al.* 9 Jan 2025

> Propose **Search-o1**, which equips reasoning models with retrieval capabilities and consists of two parts: (1) an agentic workflow that prompts the model to generate a `query` whenever retrieval is needed; (2) **Reason-in-Documents**, which analyzes retrieved documents, refines them, and injects the distilled content into the chain of reasoning—effectively reducing noise and preserving reasoning fluency to avoid disrupting coherence.


- [Arxiv, EMNLP 2025] [LLM-Independent Adaptive RAG: Let the Question Speak for Itself](https://arxiv.org/abs/2505.04253) *Maria Marina et al.* 7 May 2025

> Currently, many adaptive retrieval methods rely on the LLM's own uncertainty estimation to decide whether retrieval is needed. However, estimating uncertainty can be costly and impractical. This paper trains a lightweight classifier **using seven external features—rather than the model's inherent uncertainty**—to determine whether retrieval is necessary.


- [Arxiv, Search-R1] [Search-R1: Training LLMs to Reason and Leverage Search Engines with Reinforcement Learning](https://arxiv.org/abs/2503.09516) *Bowen Jin et al.* 12 Mar 2025

> Propose **Search-R1**, the first framework that incorporates a search engine as part of the environment in reinforcement learning.
> The main workflow of Search-R1 is to train large language models (LLMs) via reinforcement learning to autonomously generate search queries (triggered by special tokens), dynamically integrate retrieved results (embedded between special tokens), and carry out multi-round interleaved “think–retrieve–reason” iterations, ultimately producing the final answer.

- [EMNLP 2025] [Search Wisely: Mitigating Sub-optimal Agentic Searches By Reducing Uncertainty](https://arxiv.org/pdf/2505.17281) *Peilin Wu et al.* 9 Oct 2025

## Reasoning Models' Perception of Their Knowledge Boundaries

- [Arxiv] [Reasoning Models Know When They're Right: Probing Hidden States for Self-Verification](https://arxiv.org/abs/2504.05419) *Anqi Zhang et al.* 7 April 2025
- [Arxiv] [Thinking Out Loud: Do Reasoning Models Know When They're Right?](https://arxiv.org/abs/2504.06564) *Qingcheng Zeng et al.* 9 Apr 2025
- [EMNLP 2025] [The Hallucination Tax of Reinforcement Finetuning](https://arxiv.org/pdf/2505.13988) *Linxin Song et al.* 20 May 2025
- [Arxiv] [Reasoning about Uncertainty: Do Reasoning Models Know When They Don't Know?](https://arxiv.org/abs/2506.18183) *Zhiting Mei et al.* 22 Jun 2025
- [Arxiv] [Beyond Binary Rewards: Training LMs to Reason About Their Uncertainty](https://www.arxiv.org/abs/2507.16806) *Mehul Damani et al.* 22 Jul 2025
- [Arxiv] [Toward Honest Language Models for Deductive Reasoning](https://arxiv.org/abs/2511.09222) *Jiarui Liu et al.* 12 Nov 2025

## Applications of Confidence

- [Arxiv] [Confidence Estimation for Text-to-SQL in Large Language Models](https://arxiv.org/abs/2508.14056) *Sepideh Entezari Maleki et al.* 8 Aug 2025
- [Arxiv] [Deep Think with Confidence](https://arxiv.org/abs/2508.15260) *Yichao Fu et al.* 21 Aug 2025
- [EMNLP 2025] [CoCoA: Confidence and Context-Aware Adaptive Decoding for Resolving Knowledge Conflicts in Large Language Models](https://arxiv.org/abs/2508.17670) *Anant Khandelwal et al.* 25 Aug 2025
- [Arxiv] [As If We've Met Before: LLMs Exhibit Certainty in Recognizing Seen Files](https://arxiv.org/abs/2511.15192) *Haoding Li et al.* 19 Nov 2025

--------------------------------------------------------------------------------