/hallucination-index-2023.md:
--------------------------------------------------------------------------------
# 🌟 LLM Hallucination Index 🌟
## About the Index 💥
The Hallucination Index is an ongoing initiative to evaluate and rank the most popular LLMs by their propensity to hallucinate. The Index uses a comprehensive set of datasets, chosen for their diversity and for their ability to challenge each model's ability to stay on task.

**Why**: No LLM benchmark report has yet provided a comprehensive measurement of LLM hallucinations. Measuring hallucinations is difficult: LLM performance varies by task type, dataset, context, and more, and there is no consistent set of metrics for measuring them.

**What**: The Hallucination Index ranks popular LLMs by their propensity to hallucinate across three common task types: question & answer without RAG, question & answer with RAG, and long-form text generation.

**How**: The Index ranks the performance of 11 leading LLMs across the three task types. The LLMs were evaluated using 7 popular datasets. To measure hallucinations, the Index employs two metrics, Correctness and Context Adherence, built with the state-of-the-art evaluation method ChainPoll.

We share full details about our methodology [here](http://rungalileo.io/hallucinationindex).

## LLM Rankings for Q&A with RAG
Task: When presented with a question, the model uses information retrieved from a given dataset, database, or set of documents to produce an accurate answer. This approach is akin to looking up information in a reference book or searching a database before responding.

## LLM Rankings for Q&A without RAG
Task: When presented with a question, the model relies on the internal knowledge and understanding it acquired during training. It generates answers based on the patterns, facts, and relationships it has learned, without referencing external sources of information.

## LLM Rankings for Long-Form Text Generation
Task: Using generative AI to create extensive, coherent pieces of text such as reports, articles, essays, or stories. For this use case, models are trained on large datasets to understand context, maintain subject relevance, and mimic a natural writing style over longer passages.

Note: Rankings were last updated on November 15, 2023.

## Evaluation metrics ✨
The metrics used to evaluate output quality and propensity for hallucination are powered by [ChainPoll](https://arxiv.org/abs/2310.18344).

ChainPoll, developed by Galileo Labs, is a cost-effective hallucination detection method for large language models (LLMs), and RealHall is an accompanying set of challenging, real-world benchmark datasets. Our extensive comparisons show that ChainPoll outperforms existing hallucination-detection metrics by a significant margin in accuracy, transparency, and efficiency, while also introducing new metrics for evaluating LLMs' adherence and correctness in complex reasoning tasks.

Learn more about our experiment and results [here](http://rungalileo.io/hallucinationindex).

## Models evaluated

**OpenAI**
- GPT-4-0613
- GPT-3.5-turbo-1106
- GPT-3.5-turbo-0613
- GPT-3.5-turbo-instruct

**Meta**
- Llama-2-70b-chat
- Llama-2-13b-chat
- Llama-2-7b-chat

**Hugging Face**
- Zephyr-7b-beta

**Mosaic**
- MPT-7b-instruct

**Mistral**
- Mistral-7b-instruct-v0.1

**TII UAE**
- Falcon-40b-instruct

## What next?
We are excited about this initiative and plan to update the Hallucination Index quarterly. To get an LLM added to the Hallucination Index, reach out [here](https://www.rungalileo.io/hallucinationindex).
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# 🌟 LLM Hallucination Index - RAG Special 🌟

https://galileo.ai/hallucination-index

# About the Index
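The ChainPoll metric described in the 2023 index above works by polling a judge LLM several times with a chain-of-thought prompt and aggregating the verdicts into a hallucination score. A minimal sketch of that scoring loop follows; the judge model is abstracted as a callable, and the prompt wording and `Verdict:` convention are illustrative assumptions, not Galileo's actual prompt:

```python
import re
from typing import Callable

def chainpoll_score(question: str, answer: str,
                    judge: Callable[[str], str], n_polls: int = 5) -> float:
    """Estimate hallucination likelihood as the fraction of 'yes' votes
    across n_polls chain-of-thought judgments (ChainPoll-style polling)."""
    prompt = (
        "Does the answer below contain hallucinations?\n"
        f"Question: {question}\nAnswer: {answer}\n"
        "Think step by step, then finish with 'Verdict: yes' or 'Verdict: no'."
    )
    votes = 0
    for _ in range(n_polls):
        reply = judge(prompt)  # one sampled judgment (temperature > 0 in practice)
        match = re.search(r"verdict:\s*(yes|no)", reply, re.IGNORECASE)
        if match and match.group(1).lower() == "yes":
            votes += 1
    return votes / n_polls
```

Polling the judge multiple times and averaging smooths out the variance of any single sampled judgment, which is what makes the aggregate score more reliable than a one-shot verdict.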
14 |
15 |
44 |
45 |
50 |
51 |
56 |
57 |
62 |
63 |
68 |
69 |
74 |
75 |
120 |
121 |
126 |
127 |
132 |
133 |
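The Q&A-with-RAG task evaluated above pairs a retriever with a generator: relevant passages are fetched first and placed in the prompt so the model answers from them rather than from its parametric memory alone. A toy sketch of that flow, using naive keyword overlap as a hypothetical stand-in for the embedding similarity search a production system would use:

```python
import re

def _tokens(text: str) -> set[str]:
    """Lowercased word tokens, punctuation stripped."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(query: str, documents: list[str], k: int = 3) -> list[str]:
    """Rank documents by keyword overlap with the query -- a toy stand-in
    for the vector search a real RAG system would perform."""
    terms = _tokens(query)
    ranked = sorted(documents, key=lambda d: len(terms & _tokens(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str, documents: list[str], k: int = 3) -> str:
    """Assemble the generation prompt: retrieved context first, then the
    question, with an instruction to ground the answer in that context."""
    context = "\n".join(retrieve(query, documents, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {query}\nAnswer:")
```

The resulting prompt is what a Context Adherence-style metric would check the model's answer against: the answer should be supported by the retrieved passages, not by outside knowledge.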