├── LICENSE ├── README.md ├── main.pdf └── taxonomy.png /LICENSE: -------------------------------------------------------------------------------- 1 | MIT License 2 | 3 | Copyright (c) 2025 HKUST-KnowComp 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # Awesome LLM Scientific Discovery [![Awesome](https://awesome.re/badge.svg)](https://awesome.re) 2 | 3 | A curated list of pioneering research papers, tools, and resources at the intersection of Large Language Models (LLMs) and Scientific Discovery. 4 | 5 | Survey: ***From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery.*** ([arXiv:2505.13259](https://arxiv.org/abs/2505.13259)) 6 | 7 | The survey delineates the evolving role of LLMs in science through a three-level autonomy framework: 8 | * **Level 1: LLM as Tool:** LLMs augmenting human researchers for specific, well-defined tasks. 9 | * **Level 2: LLM as Analyst:** LLMs exhibiting greater autonomy in processing complex information and offering insights. 10 | * **Level 3: LLM as Scientist:** LLM-based systems autonomously conducting major research stages. 11 | 12 | Below is a visual representation of this taxonomy: 13 | 14 | ![Taxonomy of LLM in Scientific Discovery](taxonomy.png) 15 | 16 | We aim to provide a comprehensive overview for researchers, developers, and enthusiasts interested in this rapidly advancing field.
17 | 18 | ## Contents 19 | 20 | * [Level 1: LLM as Tool](#level-1-llm-as-tool) 21 | * [Literature Review and Information Gathering](#literature-review-and-information-gathering) 22 | * [Idea Generation and Hypothesis Formulation](#idea-generation-and-hypothesis-formulation) 23 | * [Experiment Planning and Execution](#experiment-planning-and-execution) 24 | * [Data Analysis and Organization](#data-analysis-and-organization) 25 | * [Conclusion and Hypothesis Validation](#conclusion-and-hypothesis-validation) 26 | * [Iteration and Refinement](#iteration-and-refinement) 27 | * [Level 2: LLM as Analyst](#level-2-llm-as-analyst) 28 | * [Machine Learning Research](#machine-learning-research) 29 | * [Data Modeling and Analysis](#data-modeling-and-analysis) 30 | * [Function Discovery](#function-discovery) 31 | * [Natural Science Research](#natural-science-research) 32 | * [General Research](#general-research) 33 | * [Level 3: LLM as Scientist](#level-3-llm-as-scientist) 34 | * [Contributing](#contributing) 35 | 36 | --- 37 | 38 | ## Level 1: LLM as Tool 39 | 40 | At this foundational level, LLMs function as tailored tools under direct human supervision, designed to execute specific, well-defined tasks within a single stage of the scientific method. Their primary goal is to enhance researcher efficiency. 41 | 42 | ### Literature Review and Information Gathering 43 | 44 | Automating literature search, retrieval, synthesis, structuring, and organization. 45 | 46 | * **SCIMON : Scientific Inspiration Machines Optimized for Novelty** [![arXiv](https://img.shields.io/badge/arXiv-2305.14259-B31B1B.svg)](https://arxiv.org/pdf/2305.14259) - *Wang et al. (2023.05)* 47 | * **ResearchAgent: Iterative research idea generation over scientific literature with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.07738-B31B1B.svg)](https://arxiv.org/pdf/2404.07738) - *Baek et al. (2024.04)* 48 | * **Text-Tuple-Table: Towards Information Integration in Text-to-Table Generation via Global Tuple Extraction** [![arXiv](https://img.shields.io/badge/arXiv-2404.14215-B31B1B.svg)](https://arxiv.org/pdf/2404.14215) - *Deng et al. (2024.04)* 49 | * **TKGT: Redefinition and A New Way of text-to-table tasks based on real world demands and knowledge graphs augmented LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2410.emnlp--main.901-B31B1B.svg)](https://aclanthology.org/2024.emnlp-main.901.pdf) - *Jiang et al. (2024.10)* 50 | * **ArxivDIGESTables: Synthesizing scientific literature into tables using language models** [![arXiv](https://img.shields.io/badge/arXiv-2410.22360-B31B1B.svg)](https://arxiv.org/pdf/2410.22360) - *Newman et al. (2024.10)* 51 | * **Can LLMs Generate Tabular Summaries of Science Papers? Rethinking the Evaluation Protocol** [![arXiv](https://img.shields.io/badge/arXiv-2504.10284-B31B1B.svg)](https://arxiv.org/pdf/2504.10284) - *Wang et al. (2025.04)* 52 | * **LitLLM: A Toolkit for Scientific Literature Review** [![arXiv](https://img.shields.io/badge/arXiv-2402.01788v1-B31B1B.svg)](https://arxiv.org/pdf/2402.01788v1) - *Agarwal et al. (2024.02)* 53 | * **Title and abstract screening for literature reviews using large language models: an exploratory study in the biomedical domain** [![DOI](https://img.shields.io/badge/DOI-10.1186/s13643--024--02575--4-blue.svg)](https://systematicreviewsjournal.biomedcentral.com/articles/10.1186/s13643-024-02575-4) - *Dennstädt et al. 
(2024.06)* 54 | * **Science Hierarchography: Hierarchical Organization of Science Literature** [![arXiv](https://img.shields.io/badge/arXiv-2504.13834-B31B1B.svg)](https://arxiv.org/pdf/2504.13834) - *Gao et al. (2025.04)* 55 | 56 | ### Idea Generation and Hypothesis Formulation 57 | 58 | Automated generation of novel research ideas, conceptual insights, and testable scientific hypotheses. 59 | 60 | * **SciAgents: Automating scientific discovery through multi-agent intelligent graph reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2409.05556-B31B1B.svg)](https://arxiv.org/pdf/2409.05556) - *Ghafarollahi et al. (2024.09)* 61 | * **Accelerating scientific discovery with generative knowledge extraction, graph-based representation, and multimodal intelligent graph reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2403.11996-B31B1B.svg)](https://arxiv.org/pdf/2403.11996) - *Buehler (2024.03)* 62 | * **MOOSE-Chem: Large Language Models for Rediscovering Unseen Chemistry Scientific Hypotheses** [![arXiv](https://img.shields.io/badge/arXiv-2410.07076-B31B1B.svg)](https://arxiv.org/pdf/2410.07076) - *Yang et al. (2024.10)* 63 | * **Large Language Models for Automated Open-domain Scientific Hypotheses Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2309.02726-B31B1B.svg)](https://arxiv.org/pdf/2309.02726) - *Yang et al. (2023.09)* 64 | * **Improving Scientific Hypothesis Generation with Knowledge Grounded Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2411.02382-B31B1B.svg)](https://arxiv.org/pdf/2411.02382) - *Xiong et al. (2024.11)* 65 | * **ResearchBench: Benchmarking LLMs in Scientific Discovery via Inspiration-Based Task Decomposition** [![arXiv](https://img.shields.io/badge/arXiv-2503.21248-B31B1B.svg)](https://arxiv.org/pdf/2503.21248) - *Liu et al. (2025.03)* 66 | * **AI Idea Bench 2025: AI Research Idea Generation Benchmark** [![arXiv](https://img.shields.io/badge/arXiv-2504.14191-B31B1B.svg)](https://arxiv.org/pdf/2504.14191) - *Qiu et al. (2025.04)* 67 | * **IdeaBench: Benchmarking Large Language Models for Research Idea Generation** [![arXiv](https://img.shields.io/badge/arXiv-2411.02429-B31B1B.svg)](https://arxiv.org/pdf/2411.02429) - *Guo et al. (2024.11)* 68 | * **Can LLMs Generate Novel Research Ideas? A Large-Scale Human Study with 100+ NLP Researchers** [![arXiv](https://img.shields.io/badge/arXiv-2409.04109-B31B1B.svg)](https://arxiv.org/pdf/2409.04109) - *Si et al. (2024.09)* 69 | * **Learning to Generate Research Idea with Dynamic Control** [![arXiv](https://img.shields.io/badge/arXiv-2412.14626-B31B1B.svg)](https://arxiv.org/pdf/2412.14626) - *Li et al. (2024.12)* 70 | * **LiveIdeaBench: Evaluating LLMs' Divergent Thinking for Scientific Idea Generation with Minimal Context** [![arXiv](https://img.shields.io/badge/arXiv-2412.17596-B31B1B.svg)](https://arxiv.org/pdf/2412.17596) - *Ruan et al. (2024.12)* 71 | * **Nova: An Iterative Planning and Search Approach to Enhance Novelty and Diversity of LLM Generated Ideas** [![arXiv](https://img.shields.io/badge/arXiv-2410.14255-B31B1B.svg)](https://arxiv.org/pdf/2410.14255) - *Hu et al. (2024.10)* 72 | * **GraphEval: A Lightweight Graph-Based LLM Framework for Idea Evaluation** [![arXiv](https://img.shields.io/badge/arXiv-2503.12600-B31B1B.svg)](https://arxiv.org/pdf/2503.12600) - *Feng et al. (2025.03)* 73 | * **Hypothesis Generation with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.04326-B31B1B.svg)](https://arxiv.org/pdf/2404.04326) - *Zhou et al. 
(2024.04)* 74 | * **Harnessing the Power of Adversarial Prompting and Large Language Models for Robust Hypothesis Generation in Astronomy** [![arXiv](https://img.shields.io/badge/arXiv-2306.11648-B31B1B.svg)](https://arxiv.org/pdf/2306.11648) - *Ciuca et al. (2023.06)* 75 | * **Large Language Models are Zero Shot Hypothesis Proposers** [![arXiv](https://img.shields.io/badge/arXiv-2311.05965-B31B1B.svg)](https://arxiv.org/pdf/2311.05965) - *Qi et al. (2023.11)* 76 | * **Machine learning for hypothesis generation in biology and medicine: exploring the latent space of neuroscience and developmental bioelectricity** [![DOI](https://img.shields.io/badge/DOI-10.1039/D3DD00185G-blue.svg)](https://pubs.rsc.org/en/content/articlelanding/2024/dd/d3dd00185g) - *O’Brien et al. (2023.07)* 77 | * **Large Language Models as Biomedical Hypothesis Generators: A Comprehensive Evaluation** [![arXiv](https://img.shields.io/badge/arXiv-2407.08940-B31B1B.svg)](https://arxiv.org/pdf/2407.08940) - *Qi et al. (2024.07)* 78 | * **LLM4GRN: Discovering Causal Gene Regulatory Networks with LLMs -- Evaluation through Synthetic Data Generation** [![arXiv](https://img.shields.io/badge/arXiv-2410.15828-B31B1B.svg)](https://arxiv.org/pdf/2410.15828) - *Afonja et al. (2024.10)* 79 | * **Scideator: Human-LLM Scientific Idea Generation Grounded in Research-Paper Facet Recombination** [![arXiv](https://img.shields.io/badge/arXiv-2409.14634-B31B1B.svg)](https://arxiv.org/pdf/2409.14634) - *Radensky et al. (2024.09)* 80 | 81 | ### Experiment Planning and Execution 82 | 83 | LLMs assisting in experimental protocol planning, workflow design, and scientific code generation. 84 | 85 | * **BioPlanner: Automatic Evaluation of LLMs on Protocol Planning in Biology** [![arXiv](https://img.shields.io/badge/arXiv-2310.10632-B31B1B.svg)](https://arxiv.org/pdf/2310.10632) - *O'Donoghue et al. (2023.10)* 86 | * **Can Large Language Models Help Experimental Design for Causal Discovery?** [![arXiv](https://img.shields.io/badge/arXiv-2503.01139-B31B1B.svg)](https://arxiv.org/pdf/2503.01139) - *Li et al. (2025.03)* 87 | * **Hierarchically Encapsulated Representation for Protocol Design in Self-Driving Labs** [![arXiv](https://img.shields.io/badge/arXiv-2504.03810-B31B1B.svg)](https://arxiv.org/pdf/2504.03810) - *Shi et al. (2025.04)* 88 | * **SciCode: A Research Coding Benchmark Curated by Scientists** [![arXiv](https://img.shields.io/badge/arXiv-2407.13168-B31B1B.svg)](https://arxiv.org/pdf/2407.13168) - *Tian et al. (2024.07)* 89 | * **Natural Language to Code Generation in Interactive Data Science Notebooks** [![arXiv](https://img.shields.io/badge/arXiv-2212.09248-B31B1B.svg)](https://arxiv.org/pdf/2212.09248) - *Yin et al. (2022.12)* 90 | * **DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation** [![arXiv](https://img.shields.io/badge/arXiv-2211.11501-B31B1B.svg)](https://arxiv.org/pdf/2211.11501) - *Lai et al. (2022.11)* 91 | * **MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering** [![arXiv](https://img.shields.io/badge/arXiv-2410.07095-B31B1B.svg)](https://arxiv.org/pdf/2410.07095) - *Chan et al. (2024.10)* 92 | * **AIDE: AI-Driven Exploration in the Space of Code** [![arXiv](https://img.shields.io/badge/arXiv-2502.13138-B31B1B.svg)](https://arxiv.org/pdf/2502.13138) - *Jiang et al.
(2025.02)* 93 | * **Curie: Toward Rigorous and Automated Scientific Experimentation with AI Agents**, [![arXiv](https://img.shields.io/badge/arXiv-2502.16069-B31B1B.svg)](https://arxiv.org/pdf/2502.16069) - *Kon et al. (2025.02)* 94 | 95 | ### Data Analysis and Organization 96 | 97 | LLMs assisting in data-driven analysis, tabular/chart reasoning, statistical reasoning, and model discovery. 98 | 99 | * **AutomaTikZ: Text-Guided Synthesis of Scientific Vector Graphics with TikZ** [![arXiv](https://img.shields.io/badge/arXiv-2310.00367-B31B1B.svg)](https://arxiv.org/pdf/2310.00367) - *Belouadi et al. (2023.10)* 100 | * **Text2Chart31: Instruction Tuning for Chart Generation with Automatic Feedback** [![arXiv](https://img.shields.io/badge/arXiv-2410.04064-B31B1B.svg)](https://arxiv.org/pdf/2410.04064) - *Zadeh et al. (2024.10)* 101 | * **ChartQA: A Benchmark for Question Answering about Charts with Visual and Logical Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2203.10244-B31B1B.svg)](https://arxiv.org/pdf/2203.10244) - *Masry et al. (2022.03)* 102 | * **CharXiv: Charting Gaps in Realistic Chart Understanding in Multimodal LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2406.18521-B31B1B.svg)](https://arxiv.org/pdf/2406.18521) - *Wang et al. (2024.06)* 103 | * **ChartX & ChartVLM: A Versatile Benchmark and Foundation Model for Complicated Chart Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2402.12185-B31B1B.svg)](https://arxiv.org/pdf/2402.12185) - *Xia et al. (2024.02)* 104 | * **Chain-of-Table: Evolving Tables in the Reasoning Chain for Table Understanding** [![arXiv](https://img.shields.io/badge/arXiv-2401.04398-B31B1B.svg)](https://arxiv.org/pdf/2401.04398) - *Wang et al. (2024.01)* 105 | * **TableBench: A Comprehensive and Complex Benchmark for Table Question Answering** [![arXiv](https://img.shields.io/badge/arXiv-2408.09174-B31B1B.svg)](https://arxiv.org/pdf/2408.09174) - *Wu et al. (2024.08)* 106 | * **Tables as Texts or Images: Evaluating the Table Reasoning Ability of LLMs and MLLMs** [![arXiv](https://img.shields.io/badge/arXiv-2402.12424-B31B1B.svg)](https://arxiv.org/pdf/2402.12424) - *Deng et al. (2024.02)* 107 | 108 | ### Conclusion and Hypothesis Validation 109 | 110 | LLMs providing feedback, verifying claims, replicating results, and generating reviews. 111 | 112 | * **CLAIMCHECK: How Grounded are LLM Critiques of Scientific Papers?** [![arXiv](https://img.shields.io/badge/arXiv-2503.21717-B31B1B.svg)](https://arxiv.org/pdf/2503.21717) - *Ou et al. (2025.03)* 113 | * **LLMs Assist NLP Researchers: Critique Paper (Meta-)Reviewing** [![arXiv](https://img.shields.io/badge/arXiv-2406.16253-B31B1B.svg)](https://arxiv.org/pdf/2406.16253) - *Du et al. (2024.06)* 114 | * **AI-Driven Review Systems: Evaluating LLMs in Scalable and Bias-Aware Academic Reviews** [![arXiv](https://img.shields.io/badge/arXiv-2408.10365-B31B1B.svg)](https://arxiv.org/pdf/2408.10365) - *Tyser et al. (2024.08)* 115 | * **Is LLM a Reliable Reviewer? A Comprehensive Evaluation of LLM on Automatic Paper Reviewing Tasks** [![Link](https://img.shields.io/badge/Link-LREC--COLING_2024-blue.svg)](https://aclanthology.org/2024.lrec-main.816.pdf) - *Zhou et al. (2024.05)* 116 | * **ReviewerGPT? 
An Exploratory Study on Using Large Language Models for Paper Reviewing** [![arXiv](https://img.shields.io/badge/arXiv-2306.00622-B31B1B.svg)](https://arxiv.org/pdf/2306.00622) - *Liu and Shah (2023.06)* 117 | * **Towards Autonomous Hypothesis Verification via Language Models with Minimal Guidance** [![arXiv](https://img.shields.io/badge/arXiv-2311.09706-B31B1B.svg)](https://arxiv.org/pdf/2311.09706) - *Takagi et al. (2023.11)* 118 | * **CycleResearcher: Improving Automated Research via Automated Review** [![arXiv](https://img.shields.io/badge/arXiv-2411.00816-B31B1B.svg)](https://arxiv.org/pdf/2411.00816) - *Weng et al. (2024.11)* 119 | * **PaperBench: Evaluating AI’s Ability to Replicate AI Research** [![arXiv](https://img.shields.io/badge/arXiv-2504.01848-B31B1B.svg)](https://arxiv.org/pdf/2504.01848) - *Starace et al. (2025.04)* 120 | * **SciReplicate-Bench: Benchmarking LLMs in Agent-driven Algorithmic Reproduction from Research Papers** [![arXiv](https://img.shields.io/badge/arXiv-2504.00255-B31B1B.svg)](https://arxiv.org/pdf/2504.00255) - *Xiang et al. (2025.04)* 121 | * **Advancing AI-Scientist Understanding: Making LLM Think Like a Physicist with Interpretable Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2504.01911-B31B1B.svg)](https://arxiv.org/pdf/2504.01911) - *Xu et al. (2025.04)* 122 | 123 | ### Iteration and Refinement 124 | 125 | LLMs involved in iterative refinement of research hypotheses and strategic exploration. 126 | 127 | * **Verification and Refinement of Natural Language Explanations through LLM-Symbolic Theorem Proving** [![arXiv](https://img.shields.io/badge/arXiv-2405.01379-B31B1B.svg)](https://arxiv.org/pdf/2405.01379) - *Quan et al. (2024.05)* 128 | * **Chain of Ideas: Revolutionizing Research Via Novel Idea Development with LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2410.13185-B31B1B.svg)](https://arxiv.org/pdf/2410.13185) - *Li et al. (2024.10)* 129 | * **Iterative Hypothesis Generation for Scientific Discovery with Monte Carlo Nash Equilibrium Self-Refining Trees** [![arXiv](https://img.shields.io/badge/arXiv-2503.19309-B31B1B.svg)](https://arxiv.org/pdf/2503.19309) - *Rabby et al. (2025.03)* 130 | * **XtraGPT: LLMs for Human-AI Collaboration on Controllable Academic Paper Revision** [![arXiv](https://img.shields.io/badge/arXiv-2505.11336-B31B1B.svg)](https://arxiv.org/pdf/2505.11336) - *Chen et al. (2025.05)* 131 | 132 | --- 133 | 134 | ## Level 2: LLM as Analyst 135 | 136 | LLMs exhibiting a greater degree of autonomy, functioning as passive agents capable of complex information processing, data modeling, and analytical reasoning with reduced human intervention. 137 | 138 | ### Machine Learning Research 139 | 140 | Automated modeling of machine learning tasks, experiment design, and execution. 141 | 142 | * **MLAgentBench: Evaluating Language Agents on Machine Learning Experimentation** [![arXiv](https://img.shields.io/badge/arXiv-2310.03302-B31B1B.svg)](https://arxiv.org/pdf/2310.03302) - *Huang et al. (2023.10)* 143 | * **MLR-Copilot: Autonomous Machine Learning Research based on Large Language Models Agents** [![arXiv](https://img.shields.io/badge/arXiv-2408.14033-B31B1B.svg)](https://arxiv.org/pdf/2408.14033) - *Li et al. (2024.08)* 144 | * **MLE-bench: Evaluating Machine Learning Agents on Machine Learning Engineering** [![arXiv](https://img.shields.io/badge/arXiv-2410.07095-B31B1B.svg)](https://arxiv.org/pdf/2410.07095) - *Chan et al. 
(2024.10)* 145 | * **IMPROVE: Iterative Model Pipeline Refinement and Optimization Leveraging LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2502.18530v1-B31B1B.svg)](https://arxiv.org/pdf/2502.18530v1) - *Xue et al. (2025.02)* 146 | * **CodeScientist: End-to-End Semi-Automated Scientific Discovery with Code-based Experimentation** [![arXiv](https://img.shields.io/badge/arXiv-2503.22708-B31B1B.svg)](https://arxiv.org/pdf/2503.22708) - *Jansen et al. (2025.03)* 147 | * **MLRC-Bench: Can Language Agents Solve Machine Learning Research Challenges?** [![arXiv](https://img.shields.io/badge/arXiv-2504.09702-B31B1B.svg)](https://arxiv.org/pdf/2504.09702) - *Zhang et al. (2025.04)* 148 | 149 | ### Data Modeling and Analysis 150 | 151 | Automated data-driven analysis, statistical data modeling, and hypothesis validation. 152 | 153 | * **Automated Statistical Model Discovery with Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2402.17879-B31B1B.svg)](https://arxiv.org/pdf/2402.17879) - *Li et al. (2024.02)* 154 | * **InfiAgent-DABench: Evaluating Agents on Data Analysis Tasks** [![arXiv](https://img.shields.io/badge/arXiv-2401.05507-B31B1B.svg)](https://arxiv.org/pdf/2401.05507) - *Hu et al. (2024.01)* 155 | * **DS-Agent: Automated Data Science by Empowering Large Language Models with Case-Based Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2402.17453-B31B1B.svg)](https://arxiv.org/pdf/2402.17453) - *Guo et al. (2024.02)* 156 | * **BLADE: Benchmarking Language Model Agents for Data-Driven Science** [![arXiv](https://img.shields.io/badge/arXiv-2408.09667-B31B1B.svg)](https://arxiv.org/pdf/2408.09667) - *Gu et al. (2024.08)* 157 | * **DAgent: A Relational Database-Driven Data Analysis Report Generation Agent** [![arXiv](https://img.shields.io/badge/arXiv-2503.13269-B31B1B.svg)](https://arxiv.org/pdf/2503.13269) - *Xu et al. (2025.03)* 158 | * **DiscoveryBench: Towards Data-Driven Discovery with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2407.01725-B31B1B.svg)](https://arxiv.org/pdf/2407.01725) - *Majumder et al. (2024.07)* 159 | * **Large Language Models for Scientific Synthesis, Inference and Explanation** [![arXiv](https://img.shields.io/badge/arXiv-2310.07984-B31B1B.svg)](https://arxiv.org/pdf/2310.07984) - *Zheng et al. (2023.10)* 160 | * **MM-Agent: LLM as Agents for Real-world Mathematical Modeling Problem** [![arXiv](https://img.shields.io/badge/arXiv-2505.14148-B31B1B.svg)](https://arxiv.org/pdf/2505.14148) - *Liu et al. (2025.05)* 161 | 162 | ### Function Discovery 163 | 164 | Identifying underlying equations from observational data (AI-driven symbolic regression). 165 | 166 | * **LLM-SR: Scientific Equation Discovery via Programming with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2404.18400-B31B1B.svg)](https://arxiv.org/pdf/2404.18400) - *Shojaee et al. (2024.04)* 167 | * **LLM-SRBench: A New Benchmark for Scientific Equation Discovery with Large Language Models** [![arXiv](https://img.shields.io/badge/arXiv-2504.10415-B31B1B.svg)](https://arxiv.org/pdf/2504.10415) - *Shojaee et al. (2025.04)* 168 | * **Gravity-Bench-v1: A Benchmark on Gravitational Physics Discovery for Agents** [![arXiv](https://img.shields.io/badge/arXiv-2501.18411-B31B1B.svg)](https://arxiv.org/pdf/2501.18411) - *Koblischke et al. (2025.01)* 169 | 170 | ### Natural Science Research 171 | 172 | Autonomous research workflows for natural science discovery (e.g., chemistry, biology, biomedicine). 
173 | 174 | * **Coscientist: Autonomous Chemical Research with Large Language Models** [![DOI](https://img.shields.io/badge/DOI-10.1038/s41586--023--06792--0-blue.svg)](https://www.nature.com/articles/s41586-023-06792-0) - *Boiko et al. (2023.10)* 175 | * **Empowering biomedical discovery with AI agents** [![DOI](https://img.shields.io/badge/DOI-10.1016/j.cell.2024.08.026-blue.svg)](https://www.cell.com/action/showPdf?pii=S0092-8674%2824%2901070-5) - *Gao et al. (2024.09)* 176 | * **From Intention To Implementation: Automating Biomedical Research via LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2412.09429-B31B1B.svg)](https://arxiv.org/pdf/2412.09429) - *Luo et al. (2024.12)* 177 | * **DrugAgent: Automating AI-aided Drug Discovery Programming through LLM Multi-Agent Collaboration** [![arXiv](https://img.shields.io/badge/arXiv-2411.15692-B31B1B.svg)](https://arxiv.org/pdf/2411.15692) - *Liu et al. (2024.11)* 178 | * **ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2410.05080-B31B1B.svg)](https://arxiv.org/pdf/2410.05080) - *Chen et al. (2024.10)* 179 | * **ProtAgents: Protein discovery by combining physics and machine learning** [![arXiv](https://img.shields.io/badge/arXiv-2402.04268-B31B1B.svg)](https://arxiv.org/pdf/2402.04268) - *Ghafarollahi and Buehler (2024.02)* 180 | * **Auto-Bench: An Automated Benchmark for Scientific Discovery in LLMs** [![arXiv](https://img.shields.io/badge/arXiv-2502.15224-B31B1B.svg)](https://arxiv.org/pdf/2502.15224) - *Chen et al. (2025.02)* 181 | * **Towards an AI co-scientist** [![arXiv](https://img.shields.io/badge/arXiv-2502.18864-B31B1B.svg)](https://arxiv.org/pdf/2502.18864) - *Gottweis et al. (2025.02)* 182 | 183 | ### General Research 184 | 185 | Benchmarks and frameworks evaluating diverse tasks from different stages of scientific discovery. 186 | 187 | * **DISCOVERYWORLD: A Virtual Environment for Developing and Evaluating Automated Scientific Discovery Agents** [![arXiv](https://img.shields.io/badge/arXiv-2406.06769-B31B1B.svg)](https://arxiv.org/pdf/2406.06769) - *Jansen et al. (2024.06)* 188 | * **A Vision for Auto Research with LLM Agents** [![arXiv](https://img.shields.io/badge/arXiv-2504.18765-B31B1B.svg)](https://arxiv.org/pdf/2504.18765) - *Liu et al. (2025.04)* 189 | * **CURIE: Evaluating LLMs On Multitask Scientific Long Context Understanding and Reasoning** [![arXiv](https://img.shields.io/badge/arXiv-2503.13517-B31B1B.svg)](https://arxiv.org/pdf/2503.13517) - *Cui et al. (2025.03)* 190 | * **EAIRA: Establishing a Methodology for Evaluating AI Models as Scientific Research Assistants** [![arXiv](https://img.shields.io/badge/arXiv-2502.20309-B31B1B.svg)](https://arxiv.org/pdf/2502.20309) - *Cappello et al. (2025.02)* 191 | 192 | ### Survey Generation 193 | 194 | * **AutoSurvey: Large Language Models Can Automatically Write Surveys** [![arXiv](https://img.shields.io/badge/arXiv-2406.10252-B31B1B.svg)](https://arxiv.org/pdf/2406.10252) - *Wang et al. (2024.06)* 195 | --- 196 | 197 | ## Level 3: LLM as Scientist 198 | 199 | LLM-based systems operating as active agents capable of orchestrating and navigating multiple stages of the scientific discovery process with considerable independence, often culminating in draft research papers. 200 | 201 | * **Agent Laboratory: Using LLM Agents as Research Assistants** [![arXiv](https://img.shields.io/badge/arXiv-2501.04227-B31B1B.svg)](https://arxiv.org/pdf/2501.04227) - *Schmidgall et al. 
(2025.01)* 202 | * **The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery** [![arXiv](https://img.shields.io/badge/arXiv-2408.06292-B31B1B.svg)](https://arxiv.org/pdf/2408.06292) - *Lu et al. (2024.08)* 203 | * **The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search** [![arXiv](https://img.shields.io/badge/arXiv-2504.08066-B31B1B.svg)](https://arxiv.org/pdf/2504.08066) - *Yamada et al. (2025.04)* 204 | * **AI-Researcher: Fully-Automated Scientific Discovery with LLM Agents** [![GitHub](https://img.shields.io/badge/GitHub-HKUDS/AI--Researcher-blue.svg)](https://github.com/HKUDS/AI-Researcher) - *Data Intelligence Lab (2025.03)* 205 | * **Zochi Technical Report** [![Link](https://img.shields.io/badge/Link-Intology.AI-blue.svg)](https://www.intology.ai/blog/zochi-tech-report) - *Intology AI (2025.03)* 206 | * **Meet Carl: The First AI System To Produce Academically Peer-Reviewed Research** [![Link](https://img.shields.io/badge/Link-AutoScience.AI-blue.svg)](https://www.autoscience.ai/blog/meet-carl-the-first-ai-system-to-produce-academically-peer-reviewed-research) - *Autoscience Institute (2025.03)* 207 | 208 | --- 209 | 210 | ## Contributing 211 | 212 | Contributions are welcome! If you have a paper, tool, or resource that fits into this taxonomy, please submit a **pull request**. 213 | 214 | --- 215 | 216 | ## Citation 217 | 218 | Please cite our paper if you find our survey helpful: 219 | ```bibtex 220 | @misc{zheng2025automationautonomysurveylarge, 221 | title={From Automation to Autonomy: A Survey on Large Language Models in Scientific Discovery}, 222 | author={Tianshi Zheng and Zheye Deng and Hong Ting Tsang and Weiqi Wang and Jiaxin Bai and Zihao Wang and Yangqiu Song}, 223 | year={2025}, 224 | eprint={2505.13259}, 225 | archivePrefix={arXiv}, 226 | primaryClass={cs.CL}, 227 | url={https://arxiv.org/abs/2505.13259}, 228 | } 229 | ``` -------------------------------------------------------------------------------- /main.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery/d94f7356ab6e62ed61ff23c10b2592d285f700ee/main.pdf -------------------------------------------------------------------------------- /taxonomy.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/HKUST-KnowComp/Awesome-LLM-Scientific-Discovery/d94f7356ab6e62ed61ff23c10b2592d285f700ee/taxonomy.png --------------------------------------------------------------------------------