├── LICENSE ├── README.md ├── main.pdf └── taxonomy.png /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 HKUST-KnowComp 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome LLM Scientific Discovery [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 2 | 3 | A curated list of pioneering research papers, tools, and resources at the intersection of Large Language Models (LLMs) and Scientific Discovery. 4 | 5 | Survey: ***From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery.*** ([arXiv:2505.13259](https://arxiv.org/abs/2505.13259)) 6 | 7 | The survey delineates the evolving role of LLMs in science through a three-level autonomy framework: 8 | * **Level 1: LLM as Tool:** LLMs augmenting human researchers for specific, well-defined tasks. 9 | * **Level 2: LLM as Analyst:** LLMs exhibiting greater autonomy in processing complex information and offering insights. 10 | * **Level 3: LLM as Scientist:** LLM-based systems autonomously conducting major research stages. 11 | 12 | Below is a visual representation of this taxonomy: 13 | 14 | ![Taxonomy of LLM in Scientific Discovery](taxonomy.png) 15 | 16 | We aim to provide a comprehensive overview for researchers, developers, and enthusiasts interested in this rapidly advancing field.
17 | 18 | ## Contents 19 | 20 | * [Level 1: LLM as Tool](#level-1-llm-as-tool) 21 | * [Literature Review and Information Gathering](#literature-review-and-information-gathering) 22 | * [Idea Generation and Hypothesis Formulation](#idea-generation-and-hypothesis-formulation) 23 | * [Experiment Planning and Execution](#experiment-planning-and-execution) 24 | * [Data Analysis and Organization](#data-analysis-and-organization) 25 | * [Conclusion and Hypothesis Validation](#conclusion-and-hypothesis-validation) 26 | * [Iteration and Refinement](#iteration-and-refinement) 27 | * [Level 2: LLM as Analyst](#level-2-llm-as-analyst) 28 | * [Machine Learning Research](#machine-learning-research) 29 | * [Data Modeling and Analysis](#data-modeling-and-analysis) 30 | * [Function Discovery](#function-discovery) 31 | * [Natural Science Research](#natural-science-research) 32 | * [General Research](#general-research) 33 | * [Level 3: LLM as Scientist](#level-3-llm-as-scientist) 34 | * [Contributing](#contributing) 35 | 36 | --- 37 | 38 | ## Level 1: LLM as Tool 39 | 40 | At this foundational level, LLMs function as tailored tools under direct human supervision, designed to execute specific, well-defined tasks within a single stage of the scientific method. Their primary goal is to enhance researcher efficiency. 41 | 42 | ### Literature Review and Information Gathering 43 | 44 | Automating literature search, retrieval, synthesis, structuring, and organization. 45 | 46 | * **SCIMON : Scientific Inspiration Machines Optimized for Novelty** [![arXiv](https://img.shields.io/badge/arXiv-2305.14259-B31B1B.svg)](https://arxiv.org/pdf/2305.14259) - *Wang et al. (2023.05)* 47 | * **ResearchAgent: Iterative research idea generation over scientific literature with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.07738-B31B1B.svg)](https://arxiv.org/pdf/2404.07738) - *Baek et al. (2024.04)* 48 | * **Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction** [![arXiv](https://img.shields.io/badge/arXiv-2404.14215-B31B1B.svg)](https://arxiv.org/pdf/2404.14215) - *Deng et al. (2024.04)* 49 | * **TKGT: Redefinition and A New Way of text-to-table tasks based on real world demands and knowledge graphs augmented LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2410.emnlp--main.901-B31B1B.svg)](https://aclanthology.org/2024.emnlp-main.901.pdf) - *Jiang et al. (2024.10)* 50 | * **ArxivDIGESTables: Synthesizing scientific literature into tables using language models** [![arXiv](https://img.shields.io/badge/arXiv-2410.22360-B31B1B.svg)](https://arxiv.org/pdf/2410.22360) - *Newman et al. (2024.10)* 51 | * **Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol** [![arXiv](https://img.shields.io/badge/arXiv-2504.10284-B31B1B.svg)](https://arxiv.org/pdf/2504.10284) - *Wang et al. (2025.04)* 52 | * **LitLLM: A Toolkit for Scientific Literature Review** [![arXiv](https://img.shields.io/badge/arXiv-2402.01788v1-B31B1B.svg)](https://arxiv.org/pdf/2402.01788v1) - *Agarwal et al. (2024.02)* 53 | * **Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain** [![DOI](https://img.shields.io/badge/DOI-10.1186/s13643--024--02575--4-blue.svg)](https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-024-02575-4) - *Dennstädt et al. 
(2024.06)* 54 | * **Science Hierarchography: Hierarchical Organization of Science Literature** [![arXiv](https://img.shields.io/badge/arXiv-2504.13834-B31B1B.svg)](https://arxiv.org/pdf/2504.13834) - *Gao et al. (2025.04)* 55 | 56 | ### Idea Generation and Hypothesis Formulation 57 | 58 | Automated generation of novel research ideas, conceptual insights, and testable scientific hypotheses. 59 | 60 | * **SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2409.05556-B31B1B.svg)](https://arxiv.org/pdf/2409.05556) - *Ghafarollahi et al. (2024.09)* 61 | * **Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2403.11996-B31B1B.svg)](https://arxiv.org/pdf/2403.11996) - *Buehler (2024.03)* 62 | * **MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses** [![arXiv](https://img.shields.io/badge/arXiv-2410.07076-B31B1B.svg)](https://arxiv.org/pdf/2410.07076) - *Yang et al. (2024.10)* 63 | * **Large Language Models for Automated Open-domain Scientific Hypotheses Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2309.02726-B31B1B.svg)](https://arxiv.org/pdf/2309.02726) - *Yang et al. (2023.09)* 64 | * **Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2411.02382-B31B1B.svg)](https://arxiv.org/pdf/2411.02382) - *Xiong et al. (2024.11)* 65 | * **ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition** [![arXiv](https://img.shields.io/badge/arXiv-2503.21248-B31B1B.svg)](https://arxiv.org/pdf/2503.21248) - *Liu et al. (2025.03)* 66 | * **AI Idea Bench 2025: AI Research Idea Generation Benchmark** [![arXiv](https://img.shields.io/badge/arXiv-2504.14191-B31B1B.svg)](https://arxiv.org/pdf/2504.14191) - *Qiu et al. (2025.04)* 67 | * **IdeaBench: Benchmarking Large Language Models for Research Idea Generation** [![arXiv](https://img.shields.io/badge/arXiv-2411.02429-B31B1B.svg)](https://arxiv.org/pdf/2411.02429) - *Guo et al. (2024.11)* 68 | * **Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers** [![arXiv](https://img.shields.io/badge/arXiv-2409.04109-B31B1B.svg)](https://arxiv.org/pdf/2409.04109) - *Si et al. (2024.09)* 69 | * **Learning to Generate Research Idea with Dynamic Control** [![arXiv](https://img.shields.io/badge/arXiv-2412.14626-B31B1B.svg)](https://arxiv.org/pdf/2412.14626) - *Li et al. (2024.12)* 70 | * **LiveIdeaBench: Evaluating LLMs' Divergent Thinking for Scientific Idea Generation with Minimal Context** [![arXiv](https://img.shields.io/badge/arXiv-2412.17596-B31B1B.svg)](https://arxiv.org/pdf/2412.17596) - *Ruan et al. (2024.12)* 71 | * **Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas** [![arXiv](https://img.shields.io/badge/arXiv-2410.14255-B31B1B.svg)](https://arxiv.org/pdf/2410.14255) - *Hu et al. (2024.10)* 72 | * **GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation** [![arXiv](https://img.shields.io/badge/arXiv-2503.12600-B31B1B.svg)](https://arxiv.org/pdf/2503.12600) - *Feng et al. (2025.03)* 73 | * **Hypothesis Generation with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.04326-B31B1B.svg)](https://arxiv.org/pdf/2404.04326) - *Zhou et al. 
(2024.04)* 74 | * **Harnessing the Power of Adversarial Prompting and Large Language Models for Robust Hypothesis Generation in Astronomy** [![arXiv](https://img.shields.io/badge/arXiv-2306.11648-B31B1B.svg)](https://arxiv.org/pdf/2306.11648) - *Ciuca et al. (2023.06)* 75 | * **Large Language Models are Zero Shot Hypothesis Proposers** [![arXiv](https://img.shields.io/badge/arXiv-2311.05965-B31B1B.svg)](https://arxiv.org/pdf/2311.05965) - *Qi et al. (2023.11)* 76 | * **Machine learning for hypothesis generation in biology and medicine: exploring the latent space of neuroscience and developmental bioelectricity** [![DOI](https://img.shields.io/badge/DOI-10.1039/D3DD00185G-blue.svg)](https://pubs.rsc.org/en/content/articlelanding/2024/dd/d3dd00185g) - *O’Brien et al. (2023.07)* 77 | * **Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation** [![arXiv](https://img.shields.io/badge/arXiv-2407.08940-B31B1B.svg)](https://arxiv.org/pdf/2407.08940) - *Qi et al. (2024.07)* 78 | * **LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs -- Evaluation through Synthetic Data Generation** [![arXiv](https://img.shields.io/badge/arXiv-2410.15828-B31B1B.svg)](https://arxiv.org/pdf/2410.15828) - *Afonja et al. (2024.10)* 79 | * **Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination** [![arXiv](https://img.shields.io/badge/arXiv-2409.14634-B31B1B.svg)](https://arxiv.org/pdf/2409.14634) - *Radensky et al. (2024.09)* 80 | 81 | ### Experiment Planning and Execution 82 | 83 | LLMs assisting in experimental protocol planning, workflow design, and scientific code generation. 84 | 85 | * **BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology** [![arXiv](https://img.shields.io/badge/arXiv-2310.10632-B31B1B.svg)](https://arxiv.org/pdf/2310.10632) - *O'Donoghue et al. (2023.10)* 86 | * **Can Large Language Models Help Experimental Design for Causal Discovery?** [![arXiv](https://img.shields.io/badge/arXiv-2503.01139-B31B1B.svg)](https://arxiv.org/pdf/2503.01139) - *Li et al. (2025.03)* 87 | * **Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs** [![arXiv](https://img.shields.io/badge/arXiv-2504.03810-B31B1B.svg)](https://arxiv.org/pdf/2504.03810) - *Shi et al. (2025.04)* 88 | * **SciCode: A Research Coding Benchmark Curated by Scientists** [![arXiv](https://img.shields.io/badge/arXiv-2407.13168-B31B1B.svg)](https://arxiv.org/pdf/2407.13168) - *Tian et al. (2024.07)* 89 | * **Natural Language to Code Generation in Interactive Data Science Notebooks** [![arXiv](https://img.shields.io/badge/arXiv-2212.09248-B31B1B.svg)](https://arxiv.org/pdf/2212.09248) - *Yin et al. (2022.12)* 90 | * **DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation** [![arXiv](https://img.shields.io/badge/arXiv-2211.11501-B31B1B.svg)](https://arxiv.org/pdf/2211.11501) - *Lai et al. (2022.11)* 91 | * **MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering** [![arXiv](https://img.shields.io/badge/arXiv-2410.07095-B31B1B.svg)](https://arxiv.org/pdf/2410.07095) - *Chan et al. (2024.10)* 92 | * **AIDE: AI-Driven Exploration in the Space of Code** [![arXiv](https://img.shields.io/badge/arXiv-2502.13138-B31B1B.svg)](https://arxiv.org/pdf/2502.13138) - *Jiang et al.
(2025.02)* 93 | * **Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents**, [![arXiv](https://img.shields.io/badge/arXiv-2502.16069-B31B1B.svg)](https://arxiv.org/pdf/2502.16069) - *Kon et al. (2025.02)* 94 | 95 | ### Data Analysis and Organization 96 | 97 | LLMs assisting in data-driven analysis, tabular/chart reasoning, statistical reasoning, and model discovery. 98 | 99 | * **AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ** [![arXiv](https://img.shields.io/badge/arXiv-2310.00367-B31B1B.svg)](https://arxiv.org/pdf/2310.00367) - *Belouadi et al. (2023.10)* 100 | * **Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback** [![arXiv](https://img.shields.io/badge/arXiv-2410.04064-B31B1B.svg)](https://arxiv.org/pdf/2410.04064) - *Zadeh et al. (2024.10)* 101 | * **ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2203.10244-B31B1B.svg)](https://arxiv.org/pdf/2203.10244) - *Masry et al. (2022.03)* 102 | * **CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2406.18521-B31B1B.svg)](https://arxiv.org/pdf/2406.18521) - *Wang et al. (2024.06)* 103 | * **ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2402.12185-B31B1B.svg)](https://arxiv.org/pdf/2402.12185) - *Xia et al. (2024.02)* 104 | * **Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding** [![arXiv](https://img.shields.io/badge/arXiv-2401.04398-B31B1B.svg)](https://arxiv.org/pdf/2401.04398) - *Wang et al. (2024.01)* 105 | * **TableBench: A Comprehensive and Complex Benchmark for Table Question Answering** [![arXiv](https://img.shields.io/badge/arXiv-2408.09174-B31B1B.svg)](https://arxiv.org/pdf/2408.09174) - *Wu et al. (2024.08)* 106 | * **Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs** [![arXiv](https://img.shields.io/badge/arXiv-2402.12424-B31B1B.svg)](https://arxiv.org/pdf/2402.12424) - *Deng et al. (2024.02)* 107 | 108 | ### Conclusion and Hypothesis Validation 109 | 110 | LLMs providing feedback, verifying claims, replicating results, and generating reviews. 111 | 112 | * **CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?** [![arXiv](https://img.shields.io/badge/arXiv-2503.21717-B31B1B.svg)](https://arxiv.org/pdf/2503.21717) - *Ou et al. (2025.03)* 113 | * **LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing** [![arXiv](https://img.shields.io/badge/arXiv-2406.16253-B31B1B.svg)](https://arxiv.org/pdf/2406.16253) - *Du et al. (2024.06)* 114 | * **AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews** [![arXiv](https://img.shields.io/badge/arXiv-2408.10365-B31B1B.svg)](https://arxiv.org/pdf/2408.10365) - *Tyser et al. (2024.08)* 115 | * **Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks** [![Link](https://img.shields.io/badge/Link-LREC--COLING_2024-blue.svg)](https://aclanthology.org/2024.lrec-main.816.pdf) - *Zhou et al. (2024.05)* 116 | * **ReviewerGPT? 
An Exploratory Study on Using Large Language Models for Paper Reviewing** [![arXiv](https://img.shields.io/badge/arXiv-2306.00622-B31B1B.svg)](https://arxiv.org/pdf/2306.00622) - *Liu and Shah (2023.06)* 117 | * **Towards Autonomous Hypothesis Verification via Language Models with Minimal Guidance** [![arXiv](https://img.shields.io/badge/arXiv-2311.09706-B31B1B.svg)](https://arxiv.org/pdf/2311.09706) - *Takagi et al. (2023.11)* 118 | * **CycleResearcher: Improving Automated Research via Automated Review** [![arXiv](https://img.shields.io/badge/arXiv-2411.00816-B31B1B.svg)](https://arxiv.org/pdf/2411.00816) - *Weng et al. (2024.11)* 119 | * **PaperBench: Evaluating AI’s Ability to Replicate AI Research** [![arXiv](https://img.shields.io/badge/arXiv-2504.01848-B31B1B.svg)](https://arxiv.org/pdf/2504.01848) - *Starace et al. (2025.04)* 120 | * **SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers** [![arXiv](https://img.shields.io/badge/arXiv-2504.00255-B31B1B.svg)](https://arxiv.org/pdf/2504.00255) - *Xiang et al. (2025.04)* 121 | * **Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2504.01911-B31B1B.svg)](https://arxiv.org/pdf/2504.01911) - *Xu et al. (2025.04)* 122 | 123 | ### Iteration and Refinement 124 | 125 | LLMs involved in iterative refinement of research hypotheses and strategic exploration. 126 | 127 | * **Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving** [![arXiv](https://img.shields.io/badge/arXiv-2405.01379-B31B1B.svg)](https://arxiv.org/pdf/2405.01379) - *Quan et al. (2024.05)* 128 | * **Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2410.13185-B31B1B.svg)](https://arxiv.org/pdf/2410.13185) - *Li et al. (2024.10)* 129 | * **Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees** [![arXiv](https://img.shields.io/badge/arXiv-2503.19309-B31B1B.svg)](https://arxiv.org/pdf/2503.19309) - *Rabby et al. (2025.03)* 130 | * **XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision** [![arXiv](https://img.shields.io/badge/arXiv-2505.11336-B31B1B.svg)](https://arxiv.org/pdf/2505.11336) - *Chen et al. (2025.05)* 131 | 132 | --- 133 | 134 | ## Level 2: LLM as Analyst 135 | 136 | LLMs exhibiting a greater degree of autonomy, functioning as passive agents capable of complex information processing, data modeling, and analytical reasoning with reduced human intervention. 137 | 138 | ### Machine Learning Research 139 | 140 | Automated modeling of machine learning tasks, experiment design, and execution. 141 | 142 | * **MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation** [![arXiv](https://img.shields.io/badge/arXiv-2310.03302-B31B1B.svg)](https://arxiv.org/pdf/2310.03302) - *Huang et al. (2023.10)* 143 | * **MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents** [![arXiv](https://img.shields.io/badge/arXiv-2408.14033-B31B1B.svg)](https://arxiv.org/pdf/2408.14033) - *Li et al. (2024.08)* 144 | * **MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering** [![arXiv](https://img.shields.io/badge/arXiv-2410.07095-B31B1B.svg)](https://arxiv.org/pdf/2410.07095) - *Chan et al. 
(2024.10)* 145 | * **IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2502.18530v1-B31B1B.svg)](https://arxiv.org/pdf/2502.18530v1) - *Xue et al. (2025.02)* 146 | * **CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation** [![arXiv](https://img.shields.io/badge/arXiv-2503.22708-B31B1B.svg)](https://arxiv.org/pdf/2503.22708) - *Jansen et al. (2025.03)* 147 | * **MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?** [![arXiv](https://img.shields.io/badge/arXiv-2504.09702-B31B1B.svg)](https://arxiv.org/pdf/2504.09702) - *Zhang et al. (2025.04)* 148 | 149 | ### Data Modeling and Analysis 150 | 151 | Automated data-driven analysis, statistical data modeling, and hypothesis validation. 152 | 153 | * **Automated Statistical Model Discovery with Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2402.17879-B31B1B.svg)](https://arxiv.org/pdf/2402.17879) - *Li et al. (2024.02)* 154 | * **InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks** [![arXiv](https://img.shields.io/badge/arXiv-2401.05507-B31B1B.svg)](https://arxiv.org/pdf/2401.05507) - *Hu et al. (2024.01)* 155 | * **DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2402.17453-B31B1B.svg)](https://arxiv.org/pdf/2402.17453) - *Guo et al. (2024.02)* 156 | * **BLADE: Benchmarking Language Model Agents for Data-Driven Science** [![arXiv](https://img.shields.io/badge/arXiv-2408.09667-B31B1B.svg)](https://arxiv.org/pdf/2408.09667) - *Gu et al. (2024.08)* 157 | * **DAgent: A Relational Database-Driven Data Analysis Report Generation Agent** [![arXiv](https://img.shields.io/badge/arXiv-2503.13269-B31B1B.svg)](https://arxiv.org/pdf/2503.13269) - *Xu et al. (2025.03)* 158 | * **DiscoveryBench: Towards Data-Driven Discovery with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2407.01725-B31B1B.svg)](https://arxiv.org/pdf/2407.01725) - *Majumder et al. (2024.07)* 159 | * **Large Language Models for Scientific Synthesis, Inference and Explanation** [![arXiv](https://img.shields.io/badge/arXiv-2310.07984-B31B1B.svg)](https://arxiv.org/pdf/2310.07984) - *Zheng et al. (2023.10)* 160 | * **MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem** [![arXiv](https://img.shields.io/badge/arXiv-2505.14148-B31B1B.svg)](https://arxiv.org/pdf/2505.14148) - *Liu et al. (2025.05)* 161 | 162 | ### Function Discovery 163 | 164 | Identifying underlying equations from observational data (AI-driven symbolic regression). 165 | 166 | * **LLM-SR: Scientific Equation Discovery via Programming with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.18400-B31B1B.svg)](https://arxiv.org/pdf/2404.18400) - *Shojaee et al. (2024.04)* 167 | * **LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2504.10415-B31B1B.svg)](https://arxiv.org/pdf/2504.10415) - *Shojaee et al. (2025.04)* 168 | * **Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents** [![arXiv](https://img.shields.io/badge/arXiv-2501.18411-B31B1B.svg)](https://arxiv.org/pdf/2501.18411) - *Koblischke et al. (2025.01)* 169 | 170 | ### Natural Science Research 171 | 172 | Autonomous research workflows for natural science discovery (e.g., chemistry, biology, biomedicine). 
173 | 174 | * **Coscientist: Autonomous Chemical Research with Large Language Models** [![DOI](https://img.shields.io/badge/DOI-10.1038/s41586--023--06792--0-blue.svg)](https://www.nature.com/articles/s41586-023-06792-0) - *Boiko et al. (2023.10)* 175 | * **Empowering biomedical discovery with AI agents** [![DOI](https://img.shields.io/badge/DOI-10.1016/j.cell.2024.08.026-blue.svg)](https://www.cell.com/action/showPdf?pii=S0092-8674%2824%2901070-5) - *Gao et al. (2024.09)* 176 | * **From Intention To Implementation: Automating Biomedical Research via LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2412.09429-B31B1B.svg)](https://arxiv.org/pdf/2412.09429) - *Luo et al. (2024.12)* 177 | * **DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration** [![arXiv](https://img.shields.io/badge/arXiv-2411.15692-B31B1B.svg)](https://arxiv.org/pdf/2411.15692) - *Liu et al. (2024.11)* 178 | * **ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2410.05080-B31B1B.svg)](https://arxiv.org/pdf/2410.05080) - *Chen et al. (2024.10)* 179 | * **ProtAgents: Protein discovery by combining physics and machine learning** [![arXiv](https://img.shields.io/badge/arXiv-2402.04268-B31B1B.svg)](https://arxiv.org/pdf/2402.04268) - *Ghafarollahi and Buehler (2024.02)* 180 | * **Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2502.15224-B31B1B.svg)](https://arxiv.org/pdf/2502.15224) - *Chen et al. (2025.02)* 181 | * **Towards an AI co-scientist** [![arXiv](https://img.shields.io/badge/arXiv-2502.18864-B31B1B.svg)](https://arxiv.org/pdf/2502.18864) - *Gottweis et al. (2025.02)* 182 | 183 | ### General Research 184 | 185 | Benchmarks and frameworks evaluating diverse tasks from different stages of scientific discovery. 186 | 187 | * **DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents** [![arXiv](https://img.shields.io/badge/arXiv-2406.06769-B31B1B.svg)](https://arxiv.org/pdf/2406.06769) - *Jansen et al. (2024.06)* 188 | * **A Vision for Auto Research with LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2504.18765-B31B1B.svg)](https://arxiv.org/pdf/2504.18765) - *Liu et al. (2025.04)* 189 | * **CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2503.13517-B31B1B.svg)](https://arxiv.org/pdf/2503.13517) - *Cui et al. (2025.03)* 190 | * **EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants** [![arXiv](https://img.shields.io/badge/arXiv-2502.20309-B31B1B.svg)](https://arxiv.org/pdf/2502.20309) - *Cappello et al. (2025.02)* 191 | 192 | ### Survey Generation 193 | 194 | * **AutoSurvey: Large Language Models Can Automatically Write Surveys** [![arXiv](https://img.shields.io/badge/arXiv-2406.10252-B31B1B.svg)](https://arxiv.org/pdf/2406.10252) - *Wang et al. (2024.06)* 195 | --- 196 | 197 | ## Level 3: LLM as Scientist 198 | 199 | LLM-based systems operating as active agents capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence, often culminating in draft research papers. 200 | 201 | * **Agent Laboratory: Using LLM Agents as Research Assistants** [![arXiv](https://img.shields.io/badge/arXiv-2501.04227-B31B1B.svg)](https://arxiv.org/pdf/2501.04227) - *Schmidgall et al. 
(2025.01)* 202 | * **The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2408.06292-B31B1B.svg)](https://arxiv.org/pdf/2408.06292) - *Lu et al. (2024.08)* 203 | * **The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search** [![arXiv](https://img.shields.io/badge/arXiv-2504.08066-B31B1B.svg)](https://arxiv.org/pdf/2504.08066) - *Yamada et al. (2025.04)* 204 | * **AI-Researcher: Fully-Automated Scientific Discovery with LLM Agents** [![GitHub](https://img.shields.io/badge/GitHub-HKUDS/AI--Researcher-blue.svg)](https://github.com/HKUDS/AI-Researcher) - *Data Intelligence Lab (2025.03)* 205 | * **Zochi Technical Report** [![Link](https://img.shields.io/badge/Link-Intology.AI-blue.svg)](https://www.intology.ai/blog/zochi-tech-report) - *Intology AI (2025.03)* 206 | * **Meet Carl: The First AI System To Produce Academically Peer-Reviewed Research** [![Link](https://img.shields.io/badge/Link-AutoScience.AI-blue.svg)](https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research) - *Autoscience Institute (2025.03)* 207 | 208 | --- 209 | 210 | ## Contributing 211 | 212 | Contributions are welcome! If you have a paper, tool, or resource that fits into this taxonomy, please submit a **pull request**. 213 | 214 | --- 215 | 216 | ## Citation 217 | 218 | Please cite our paper if you find our survey helpful: 219 | ```bibtex 220 | @misc{zheng2025automationautonomysurveylarge, 221 | title={From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery}, 222 | author={Tianshi Zheng and Zheye Deng and Hong Ting Tsang and Weiqi Wang and Jiaxin Bai and Zihao Wang and Yangqiu Song}, 223 | year={2025}, 224 | eprint={2505.13259}, 225 | archivePrefix={arXiv}, 226 | primaryClass={cs.CL}, 227 | url={https://arxiv.org/abs/2505.13259}, 228 | } 229 | ``` -------------------------------------------------------------------------------- /main.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery/d94f7356ab6e62ed61ff23c10b2592d285f700ee/main.pdf -------------------------------------------------------------------------------- /taxonomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery/d94f7356ab6e62ed61ff23c10b2592d285f700ee/taxonomy.png --------------------------------------------------------------------------------