├── LICENSE
└── README.md
/LICENSE:
--------------------------------------------------------------------------------
1 | MIT License
2 |
3 | Copyright (c) 2024 Yerba
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # 🤖✨ Awesome Repository-Level Code Generation ✨🤖
2 |
10 | 🌟 A curated list of awesome repository-level code generation research papers and resources. If you want to contribute to this list (please do), feel free to send a pull request. 🚀 If you have any questions, feel free to contact [Yuling Shi](https://yerbasite.github.io) or [Xiaodong Gu](https://guxd.github.io) (SJTU).
11 |
12 | ## 📚 Contents
13 |
14 | - [📚 Contents](#-contents)
15 | - [💥 Repo-Level Issue Resolution](#-repo-level-issue-resolution)
16 | - [🤖 Repo-Level Code Completion](#-repo-level-code-completion)
17 | - [🔄 Repo-Level Code Translation](#-repo-level-code-translation)
18 | - [🧪 Repo-Level Unit Test Generation](#-repo-level-unit-test-generation)
19 | - [🔍 Repo-Level Code QA](#-repo-level-code-qa)
20 | - [👩‍💻 Repo-Level Issue Task Synthesis](#-repo-level-issue-task-synthesis)
21 | - [📊 Datasets and Benchmarks](#-datasets-and-benchmarks)
22 |
23 | ## 💥 Repo-Level Issue Resolution
24 |
25 | - SWE-Exp: Experience-Driven Software Issue Resolution [2025-07-arXiv] [[📄 paper](http://arxiv.org/abs/2507.23361)] [[🔗 repo](https://github.com/YerbaPage/SWE-Exp)]
26 |
27 | - SWE-Debate: Competitive Multi-Agent Debate for Software Issue Resolution [2025-07-arXiv] [[📄 paper](http://arxiv.org/abs/2507.23348)] [[🔗 repo](https://github.com/YerbaPage/SWE-Debate)]
28 |
29 | - LIVE-SWE-AGENT: Can Software Engineering Agents Self-Evolve on the Fly? [2025-11-arXiv] [[📄 paper](https://arxiv.org/abs/2511.13646)]
30 |
31 | - Understanding Code Agent Behaviour: An Empirical Study of Success and Failure Trajectories [2025-10-arXiv] [[📄 paper](https://arxiv.org/abs/2511.00197)]
32 |
33 | - BugPilot: Complex Bug Generation for Efficient Learning of SWE Skills [2025-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2510.19898)]
34 |
35 | - Where LLM Agents Fail and How They Can Learn From Failures [2025-09-arXiv] [[📄 paper](https://www.arxiv.org/abs/2509.25370)] [[🔗 repo](https://github.com/ulab-uiuc/AgentDebug)]
36 |
37 | - SWE-Effi: Re-Evaluating Software AI Agent System Effectiveness Under Resource Constraints [2025-09-arXiv] [[📄 paper](https://arxiv.org/abs/2509.09853)]
38 |
39 | - Diffusion is a code repair operator and generator [2025-08-arXiv] [[📄 paper](https://arxiv.org/abs/2508.11110)]
40 |
41 | - The SWE-Bench Illusion: When State-of-the-Art LLMs Remember Instead of Reason [2025-06-arXiv] [[📄 paper](https://arxiv.org/abs/2506.12286)]
42 |
43 | - Agent-RLVR: Training Software Engineering Agents via Guidance and Environment Rewards [2025-06-arXiv] [[📄 paper](https://arxiv.org/pdf/2506.11425)]
44 |
45 | - EXPEREPAIR: Dual-Memory Enhanced LLM-based Repository-Level Program Repair [2025-06-arXiv] [[📄 paper](https://arxiv.org/pdf/2506.10484)]
46 |
47 | - Coding Agents with Multimodal Browsing are Generalist Problem Solvers [2025-06-arXiv] [[📄 paper](https://arxiv.org/pdf/2506.03011)] [[🔗 repo](https://github.com/adityasoni9998/OpenHands-Versa)]
48 |
49 | - CoRet: Improved Retriever for Code Editing [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.24715)]
50 |
51 | - Darwin Gödel Machine: Open-Ended Evolution of Self-Improving Agents [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.22954)] [[🔗 repo](https://github.com/jennyzzt/dgm)]
52 |
53 | - SWE-Dev: Evaluating and Training Autonomous Feature-Driven Software Development [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.16975)] [[🔗 repo](https://github.com/justLittleWhite/SWE-Dev)]
54 |
55 | - Putting It All into Context: Simplifying Agents with LCLMs [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.08120)]
56 |
57 | - SkyRL-v0: Train Real-World Long-Horizon Agents via Reinforcement Learning [2025-05-arXiv] [[📄 blog](https://novasky-ai.notion.site/skyrl-v0)] [[🔗 repo](https://github.com/novasky-ai/skyrl-v0)]
58 |
59 | - AEGIS: An Agent-based Framework for General Bug Reproduction from Issue Descriptions [2025-FSE] [[📄 paper](https://arxiv.org/pdf/2411.18015)]
60 |
61 | - Thinking Longer, Not Larger: Enhancing Software Engineering Agents via Scaling Test-Time Compute [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.23803)] [[🔗 repo](https://github.com/yingweima2022/SWE-Reasoner)]
62 |
63 | - Enhancing Repository-Level Software Repair via Repository-Aware Knowledge Graphs [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.21710)]
64 |
65 | - CoSIL: Software Issue Localization via LLM-Driven Code Repository Graph Searching [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.22424)]
66 |
67 | - SEAlign: Alignment Training for Software Engineering Agent [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.18455)]
68 |
69 | - DARS: Dynamic Action Re-Sampling to Enhance Coding Agent Performance by Adaptive Tree Traversal [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.14269)] [[🔗 repo](https://github.com/darsagent/DARS-Agent)]
70 |
71 | - LocAgent: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.09089)] [[🔗 repo](https://github.com/gersteinlab/LocAgent)]
72 |
73 | - SoRFT: Issue Resolving with Subtask-oriented Reinforced Fine-Tuning [2025-02-arXiv] [[📄 paper](https://arxiv.org/abs/2502.20127)]
74 |
75 | - SWE-RL: Advancing LLM Reasoning via Reinforcement Learning on Open Software Evolution [2025-02-arXiv] [[📄 paper](https://arxiv.org/pdf/2502.18449)] [[🔗 repo](https://github.com/facebookresearch/swe-rl)]
76 |
77 | - SWE-Fixer: Training Open-Source LLMs for Effective and Efficient GitHub Issue Resolution [2025-01-arXiv] [[📄 paper](https://arxiv.org/pdf/2501.05040)] [[🔗 repo](https://github.com/InternLM/SWE-Fixer)]
78 |
79 | - CodeMonkeys: Scaling Test-Time Compute for Software Engineering [2025-01-arXiv] [[📄 paper](https://arxiv.org/abs/2501.14723)] [[🔗 repo](https://github.com/ScalingIntelligence/codemonkeys)]
80 |
81 | - Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.21139)] [[🔗 repo](https://github.com/SWE-Gym/SWE-Gym)]
82 |
83 | - CODEV: Issue Resolving with Visual Data [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.17315)] [[🔗 repo](https://github.com/luolin101/CodeV)]
84 |
85 | - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [[📄 paper](https://arxiv.org/pdf/2411.13941)]
86 |
87 | - Globant Code Fixer Agent Whitepaper [2024-11] [[📄 paper](https://ai.globant.com/wp-content/uploads/2024/11/Whitepaper-Globant-Code-Fixer-Agent.pdf)]
88 |
89 | - MarsCode Agent: AI-native Automated Bug Fixing [2024-11-arXiv] [[📄 paper](https://arxiv.org/abs/2409.00899)]
90 |
91 | - Lingma SWE-GPT: An Open Development-Process-Centric Language Model for Automated Software Improvement [2024-11-arXiv] [[📄 paper](https://arxiv.org/html/2411.00622v1)] [[🔗 repo](https://github.com/LingmaTongyi/Lingma-SWE-GPT)]
92 |
93 | - SWE-Search: Enhancing Software Agents with Monte Carlo Tree Search and Iterative Refinement [2024-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2410.20285)] [[🔗 repo](https://github.com/aorwall/moatless-tree-search)]
94 |
95 | - AutoCodeRover: Autonomous Program Improvement [2024-09-ISSTA] [[📄 paper](https://dl.acm.org/doi/10.1145/3650212.3680384)] [[🔗 repo](https://github.com/nus-apr/auto-code-rover)]
96 |
97 | - SpecRover: Code Intent Extraction via LLMs [2024-08-arXiv] [[📄 paper](https://arxiv.org/abs/2408.02232)]
98 |
99 | - OpenHands: An Open Platform for AI Software Developers as Generalist Agents [2024-07-arXiv] [[📄 paper](https://arxiv.org/abs/2407.16741)] [[🔗 repo](https://github.com/All-Hands-AI/OpenHands)]
100 |
101 | - AGENTLESS: Demystifying LLM-based Software Engineering Agents [2024-07-arXiv] [[📄 paper](https://arxiv.org/abs/2407.01489)]
102 |
103 | - RepoGraph: Enhancing AI Software Engineering with Repository-level Code Graph [2024-10-arXiv] [[📄 paper](https://arxiv.org/abs/2410.14684)] [[🔗 repo](https://github.com/ozyyshr/RepoGraph)]
104 |
105 | - CodeR: Issue Resolving with Multi-Agent and Task Graphs [2024-06-arXiv] [[📄 paper](https://arxiv.org/pdf/2406.01304)] [[🔗 repo](https://github.com/NL2Code/CodeR)]
106 |
107 | - Alibaba LingmaAgent: Improving Automated Issue Resolution via Comprehensive Repository Exploration [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.01422v2)]
108 |
109 | - SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering [2024-NeurIPS] [[📄 paper](https://arxiv.org/abs/2405.15793)] [[🔗 repo](https://github.com/SWE-agent/SWE-agent)]
110 |
111 | ## 🤖 Repo-Level Code Completion
112 |
113 | - Enhancing Project-Specific Code Completion by Inferring Internal API Information [2025-07-TSE] [[📄 paper](https://ieeexplore.ieee.org/abstract/document/11096713)] [[🔗 repo](https://github.com/ZJU-CTAG/InferCom)]
114 |
115 | - CodeRAG: Supportive Code Retrieval on Bigraph for Real-World Code Generation [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.10046)]
116 |
117 | - CodexGraph: Bridging Large Language Models and Code Repositories via Code Graph Databases [2025-04-NAACL] [[📄 paper](https://aclanthology.org/2025.naacl-long.7/)]
118 |
119 | - RTLRepoCoder: Repository-Level RTL Code Completion through the Combination of Fine-Tuning and Retrieval Augmentation [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.08862)]
120 |
121 | - Hierarchical Context Pruning: Optimizing Real-World Code Completion with Repository-Level Pretrained Code LLMs [2025-04-AAAI] [[📄 paper](https://ojs.aaai.org/index.php/AAAI/article/view/34782)] [[🔗 repo](https://github.com/Hambaobao/HCP-Coder)]
122 |
123 | - What to Retrieve for Effective Retrieval-Augmented Code Generation? An Empirical Study and Beyond [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.20589)]
124 |
125 | - REPOFILTER: Adaptive Retrieval Context Trimming for Repository-Level Code Completion [2025-04-OpenReview] [[📄 paper](https://openreview.net/forum?id=oOSeOEXrFA)]
126 |
127 | - Improving FIM Code Completions via Context & Curriculum Based Learning [2024-12-arXiv] [[📄 paper](https://arxiv.org/abs/2412.16589)]
128 |
129 | - ContextModule: Improving Code Completion via Repository-level Contextual Information [2024-12-arXiv] [[📄 paper](https://arxiv.org/abs/2412.08063)]
130 |
131 | - A^3-CodGen: A Repository-Level Code Generation Framework for Code Reuse With Local-Aware, Global-Aware, and Third-Party-Library-Aware Information [2024-12-TSE] [[📄 paper](https://www.computer.org/csdl/journal/ts/2024/12/10734067/21iLh4j0oG4)]
132 |
133 | - RepoGenReflex: Enhancing Repository-Level Code Completion with Verbal Reinforcement and Retrieval-Augmented Generation [2024-09-arXiv] [[📄 paper](https://arxiv.org/abs/2409.13122)]
134 |
135 | - RAMBO: Enhancing RAG-based Repository-Level Method Body Completion [2024-09-arXiv] [[📄 paper](https://arxiv.org/abs/2409.15204)] [[🔗 repo](https://github.com/ise-uet-vnu/rambo)]
136 |
137 | - RLCoder: Reinforcement Learning for Repository-Level Code Completion [2024-07-arXiv] [[📄 paper](https://arxiv.org/abs/2407.19487)] [[🔗 repo](https://github.com/DeepSoftwareAnalytics/RLCoder)]
138 |
139 | - STALL+: Boosting LLM-based Repository-level Code Completion with Static Analysis [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.10018)]
140 |
141 | - GraphCoder: Enhancing Repository-Level Code Completion via Code Context Graph-based Retrieval and Language Model [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.07003)]
142 |
143 | - Enhancing Repository-Level Code Generation with Integrated Contextual Information [2024-06-arXiv] [[📄 paper](https://arxiv.org/pdf/2406.03283)]
144 |
145 | - R2C2-Coder: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.01359)]
146 |
147 | - Natural Language to Class-level Code Generation by Iterative Tool-augmented Reasoning over Repository [2024-05-arXiv] [[📄 paper](https://arxiv.org/abs/2405.01573)] [[🔗 repo](https://github.com/microsoft/repoclassbench)]
148 |
149 | - Iterative Refinement of Project-Level Code Context for Precise Code Generation with Compiler Feedback [2024-03-arXiv] [[📄 paper](https://arxiv.org/abs/2403.16792)] [[🔗 repo](https://github.com/CGCL-codes/naturalcc/tree/main/examples/cocogen)]
150 |
151 | - Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-03-arXiv] [[📄 paper](https://arxiv.org/abs/2403.10059)] [[🔗 repo](https://repoformer.github.io/)]
152 |
153 | - RepoHyper: Search-Expand-Refine on Semantic Graphs for Repository-Level Code Completion [2024-03-arXiv] [[📄 paper](https://arxiv.org/abs/2403.06095)] [[🔗 repo](https://github.com/FSoft-AI4Code/RepoHyper)]
154 |
155 | - RepoMinCoder: Improving Repository-Level Code Generation Based on Information Loss Screening [2024-07-Internetware] [[📄 paper](https://dl.acm.org/doi/10.1145/3671016.3674819)]
156 |
157 | - CodePlan: Repository-Level Coding using LLMs and Planning [2024-07-FSE] [[📄 paper](https://dl.acm.org/doi/abs/10.1145/3643757)] [[🔗 repo](https://github.com/microsoft/codeplan)]
158 |
159 | - DraCo: Dataflow-Guided Retrieval Augmentation for Repository-Level Code Completion [2024-05-ACL] [[📄 paper](https://aclanthology.org/2024.acl-long.431/)] [[🔗 repo](https://github.com/nju-websoft/DraCo)]
160 |
161 | - RepoCoder: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-10-EMNLP] [[📄 paper](https://aclanthology.org/2023.emnlp-main.151/)] [[🔗 repo](https://github.com/microsoft/CodeT/tree/main/RepoCoder)]
162 |
163 | - Monitor-Guided Decoding of Code LMs with Static Analysis of Repository Context [2023-09-NeurIPS] [[📄 paper](https://neurips.cc/virtual/2023/poster/70362)] [[🔗 repo](https://aka.ms/monitors4codegen)]
164 |
165 | - RepoFusion: Training Code Models to Understand Your Repository [2023-06-arXiv] [[📄 paper](https://arxiv.org/abs/2306.10998)] [[🔗 repo](https://github.com/ServiceNow/RepoFusion)]
166 |
167 | - Repository-Level Prompt Generation for Large Language Models of Code [2023-06-ICML] [[📄 paper](https://proceedings.mlr.press/v202/shrivastava23a.html)] [[🔗 repo](https://github.com/shrivastavadisha/repo_level_prompt_generation)]
168 |
169 | - Fully Autonomous Programming with Large Language Models [2023-06-GECCO] [[📄 paper](https://dl.acm.org/doi/pdf/10.1145/3583131.3590481)] [[🔗 repo](https://github.com/KoutchemeCharles/aied2023)]
170 |
171 | ## 🔄 Repo-Level Code Translation
172 |
173 | - EVOC2RUST: A Skeleton-guided Framework for Project-Level C-to-Rust Translation [2025-08-arXiv] [[📄 paper](https://arxiv.org/abs/2508.04295)]
174 |
175 | - Scalable, Validated Code Translation of Entire Projects using Large Language Models [2025-06-PLDI] [[📄 paper](https://dl.acm.org/doi/abs/10.1145/3729315)]
176 |
177 | - A Systematic Literature Review on Neural Code Translation [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.07425)]
178 |
179 | - Enhancing LLM-based Code Translation in Repository Context via Triple Knowledge-Augmented [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.18305)]
180 |
181 | - C2SaferRust: Transforming C Projects into Safer Rust with NeuroSymbolic Techniques [2025-01-arXiv] [[📄 paper](https://www.arxiv.org/pdf/2501.14257)] [[🔗 repo](https://github.com/vikramnitin9/c2saferrust)]
182 |
183 | - Syzygy: Dual Code-Test C to (safe) Rust Translation using LLMs and Dynamic Analysis [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.14234)] [[🕸️ website](https://syzygy-project.github.io/)]
184 |
185 | - RustRepoTrans: Repository-level Code Translation Benchmark Targeting Rust [2024-11-arXiv] [[📄 paper](https://arxiv.org/abs/2411.13990)] [[🔗 repo](https://github.com/SYSUSELab/RustRepoTrans)]
186 |
187 | - Lost in Translation: A Study of Bugs Introduced by Large Language Models while Translating Code [2024-04-ICSE] [[📄 paper](https://doi.org/10.1145/3597503.3639226)] [[🔗 repo](https://github.com/Intelligent-CAT-Lab/PLTranslationEmpirical)]
188 |
189 | ## 🧪 Repo-Level Unit Test Generation
190 | - Execution-Feedback Driven Test Generation from SWE Issues [2025-08-arXiv] [[📄 paper](https://www.arxiv.org/abs/2508.06365)]
191 |
192 | - AssertFlip: Reproducing Bugs via Inversion of LLM-Generated Passing Tests [2025-07-arXiv] [[📄 paper](https://arxiv.org/abs/2507.17542)]
193 |
194 | - Mystique: Automated Vulnerability Patch Porting with Semantic and Syntactic-Enhanced LLM [2025-06] [[📄 paper](https://dl.acm.org/doi/10.1145/3715718)]
195 |
196 | - Issue2Test: Generating Reproducing Test Cases from Issue Reports [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.16320)]
197 |
198 | - Agentic Bug Reproduction for Effective Automated Program Repair at Google [2025-02-arXiv] [[📄 paper](https://arxiv.org/abs/2502.01821)]
199 |
200 | - LLMs as Continuous Learners: Improving the Reproduction of Defective Code in Software Issues [2024-11-arXiv] [[📄 paper](https://arxiv.org/pdf/2411.13941)]
201 |
202 |
203 | ## 🔍 Repo-Level Code QA
204 |
205 | - SWE-QA: Can Language Models Answer Repository-level Code Questions? [2025-09-arXiv] [[📄 paper](https://arxiv.org/abs/2509.14635)] [[🔗 repo](https://github.com/peng-weihan/SWE-QA-Bench)]
206 |
207 | - Decompositional Reasoning for Graph Retrieval with Large Language Models [2025-06-arXiv] [[📄 paper](https://arxiv.org/abs/2506.13380)]
208 |
209 | - LongCodeBench: Evaluating Coding LLMs at 1M Context Windows [2025-05-arXiv] [[📄 paper](https://arxiv.org/pdf/2505.07897)]
210 |
211 | - LocAgent: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.09089)] [[🔗 repo](https://github.com/gersteinlab/LocAgent)]
212 |
213 | - CoReQA: Uncovering Potentials of Language Models in Code Repository Question Answering [2025-01-arXiv] [[📄 paper](https://arxiv.org/pdf/2501.03447)]
214 |
215 | - RepoChat Arena [2025-Blog] [[📄 blog](https://blog.lmarena.ai/blog/2025/repochat-arena/)]
216 |
217 | - RepoChat: An LLM-Powered Chatbot for GitHub Repository Question-Answering [MSR-2025] [[🕸️ website](https://2025.msrconf.org/details/msr-2025-data-and-tool-showcase-track/35/RepoChat-An-LLM-Powered-Chatbot-for-GitHub-Repository-Question-Answering)]
218 |
219 | - CodeQueries: A Dataset of Semantic Queries over Code [2022-09-arXiv] [[📄 paper](https://arxiv.org/abs/2209.08372)]
220 |
221 | ## 👩‍💻 Repo-Level Issue Task Synthesis
222 | - SWE-Mirror: Scaling Issue-Resolving Datasets by Mirroring Issues Across Repositories [2025-09-arXiv] [[📄 paper](https://arxiv.org/pdf/2509.08724)]
223 |
224 | - SWE-bench Goes Live! [2025-05-arXiv] [[📄 paper](https://www.arxiv.org/abs/2505.23419)] [[🔗 repo](https://github.com/microsoft/SWE-bench-Live)]
225 |
226 | - R2E-Gym: Procedural Environments and Hybrid Verifiers for Scaling Open-Weights SWE Agents [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.07164)] [[🔗 repo](https://r2e-gym.github.io/)]
227 |
228 | - Scaling Data for Software Engineering Agents [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.21798)] [[🔗 repo](https://swesmith.com/)]
229 |
230 | - Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.14757v1)] [[🔗 repo](https://github.com/FSoft-AI4Code/SWE-Synth)]
231 |
232 | - Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.21139)] [[🔗 repo](https://github.com/SWE-Gym/SWE-Gym)]
233 |
234 |
235 | ## 📊 Datasets and Benchmarks
236 | - **SWE-Bench++**: A Framework for the Scalable Generation of Software Engineering Benchmarks from Open-Source Repositories [2025-12-arXiv] [[📄 paper](https://arxiv.org/abs/2512.17419)]
237 |
238 | - **Multi-Docker-Eval**: A ‘Shovel of the Gold Rush’ Benchmark on Automatic Environment Building for Software Engineering? [2025-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2512.06915)]
239 |
240 | - **CodeClash**: Benchmarking Goal-Oriented Software Engineering [2025-11-arXiv] [[📄 paper](https://arxiv.org/abs/2511.00839)] [[🔗 repo](https://github.com/CodeClash-ai/CodeClash)]
241 |
242 | - **SWE-fficiency**: Can Language Models Optimize Real-World Repositories on Real Workloads? [2025-11-arXiv] [[📄 paper](https://arxiv.org/abs/2511.06090)]
243 |
244 | - **SWE-Compass**: Towards Unified Evaluation of Agentic Coding Abilities for Large Language Models [2025-11-arXiv] [[📄 paper](https://arxiv.org/abs/2511.05459)]
245 |
246 | - **SWE-Sharp-Bench**: A Reproducible Benchmark for C# Software Engineering Tasks [2025-11-arXiv] [[📄 paper](https://arxiv.org/abs/2511.02352)]
247 |
248 | - **ImpossibleBench**: Measuring LLMs’ Propensity of Exploiting Test Cases [2025-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2510.20270)]
249 |
250 | - **SWE-QA**: Can Language Models Answer Repository-level Code Questions? [2025-09-arXiv] [[📄 paper](https://arxiv.org/abs/2509.14635)] [[🔗 repo](https://github.com/peng-weihan/SWE-QA-Bench)]
251 |
252 | - **SR-Eval**: Evaluating LLMs on Code Generation under Stepwise Requirement Refinement [2025-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2509.18808)]
253 |
254 | - **RECODE-H**: A Benchmark for Research Code Development with Interactive Human Feedback [2025-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2510.06186v1)]
255 |
256 | - **BigCodeBench**: Benchmarking Code Generation with Diverse Function Calls and Complex Instructions [ICLR-2025 Oral] [[📄 paper](https://arxiv.org/abs/2406.15877)] [[🔗 repo](https://github.com/bigcode-project/bigcodebench)]
257 |
258 | - **Vibe Checker**: Aligning Code Evaluation with Human Preference [2025-10-arXiv] [[📄 paper](https://arxiv.org/abs/2510.07315)]
259 |
260 | - **MULocBench**: A Benchmark for Localizing Code and Non-Code Issues in Software Projects [2025-09-arXiv] [[📄 paper](https://www.arxiv.org/abs/2509.25242)] [[🕸️ website](https://huggingface.co/datasets/somethingone/MULocBench)]
261 |
262 | - **SecureAgentBench**: Benchmarking Secure Code Generation under Realistic Vulnerability Scenarios [2025-09-arXiv] [[📄 paper](https://arxiv.org/html/2509.22097v1)]
263 |
264 | - **SWE-bench Pro**: Can AI Agents Solve Long-Horizon Software Engineering Tasks? [2025-09] [[📄 paper](https://static.scale.com/uploads/654197dc94d34f66c0f5184e/SWEAP_Eval_Scale%20(9).pdf)] [[🔗 repo](https://github.com/scaleapi/SWE-bench_Pro-os/tree/main?tab=readme-ov-file)]
265 |
266 | - **AutoCodeBench**: Large Language Models are Automatic Code Benchmark Generators [2025-08-arXiv] [[📄 paper](https://arxiv.org/abs/2508.09101)] [[🔗 repo](https://autocodebench.github.io/)]
268 |
269 | - **LiveRepoReflection**: Turning the Tide: Repository-based Code Reflection [2025-07-arXiv] [[📄 paper](https://arxiv.org/abs/2507.09866)] [[🔗 repo](https://livereporeflection.github.io/)]
270 |
271 | - **SWE-Perf**: Can Language Models Optimize Code Performance on Real-World Repositories? [2025-07-arXiv] [[📄 paper](https://arxiv.org/abs/2507.12415)] [[🔗 repo](https://swe-perf.github.io/)]
272 |
273 | - **ResearchCodeBench**: Benchmarking LLMs on Implementing Novel Machine Learning Research Code [2025-06-arXiv] [[📄 paper](https://arxiv.org/abs/2506.02314)] [[🔗 repo](https://researchcodebench.github.io/)]
274 |
275 | - **SWE-Factory**: Your Automated Factory for Issue Resolution Training Data and Evaluation Benchmarks [2025-06-arXiv] [[📄 paper](https://arxiv.org/abs/2506.10954v1)] [[🔗 repo](https://github.com/DeepSoftwareAnalytics/swe-factory)]
276 |
277 | - **UTBoost**: Rigorous Evaluation of Coding Agents on SWE-Bench [ACL-2025] [[📄 paper](https://arxiv.org/abs/2506.09289)]
278 |
279 | - **SWE-Flow**: Synthesizing Software Engineering Data in a Test-Driven Manner [ICML-2025] [[📄 paper](https://arxiv.org/abs/2506.09003)] [[🔗 repo](https://github.com/Hambaobao/SWE-Flow)]
280 |
281 | - **AgentIssue-Bench**: Can Agents Fix Agent Issues? [2025-08-arXiv] [[📄 paper](https://arxiv.org/pdf/2505.20749)] [[🔗 repo](https://github.com/alfin06/AgentIssue-Bench)]
282 |
283 | - **CodeAssistBench (CAB)**: Dataset & Benchmarking for Multi-turn Chat-Based Code Assistance [2025-07-arXiv] [[📄 paper](https://arxiv.org/abs/2507.10646)]
284 |
285 | - **OmniGIRL**: A Multilingual and Multimodal Benchmark for GitHub Issue Resolution [2025-05-ISSTA] [[📄 paper](https://arxiv.org/abs/2505.04606)] [[🔗 repo](https://github.com/DeepSoftwareAnalytics/OmniGIRL)]
286 |
287 | - **CodeFlowBench**: A Multi-turn, Iterative Benchmark for Complex Code Generation [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.21751)] [[🔗 repo](https://github.com/Rise-1210/codeflow)]
288 |
289 | - **SWE-Smith**: Scaling Data for Software Engineering Agents [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.21798)] [[🔗 repo](https://swesmith.com/)]
290 |
291 | - **SWE-Synth**: Synthesizing Verifiable Bug-Fix Data to Enable Large Language Models in Resolving Real-World Bugs [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.14757v1)] [[🔗 repo](https://github.com/FSoft-AI4Code/SWE-Synth)]
292 |
293 | - Are "Solved Issues" in SWE-bench Really Solved Correctly? An Empirical Study [2025-03-arXiv] [[📄 paper](https://arxiv.org/abs/2503.15223)]
294 |
295 | - **Unveiling Pitfalls**: Understanding Why AI-driven Code Agents Fail at GitHub Issue Resolution [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.12374)]
296 |
297 | - **ConvCodeWorld**: Benchmarking Conversational Code Generation in Reproducible Feedback Environments [2025-02-arXiv] [[📄 paper](https://arxiv.org/abs/2502.19852)] [[🔗 repo](https://huggingface.co/spaces/ConvCodeWorld/ConvCodeWorld)]
298 |
301 | - Evaluating Agent-based Program Repair at Google [2025-01-arXiv] [[📄 paper](https://arxiv.org/pdf/2501.07531)]
302 |
303 | - **SWE-rebench**: An Automated Pipeline for Task Collection and Decontaminated Evaluation of Software Engineering Agents [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.20411)] [[🕸️ website](https://swe-rebench.com/leaderboard)]
304 |
305 | - **SWE-bench-Live**: A Live Benchmark for Repository-Level Issue Resolution [2025-05-arXiv] [[📄 paper](https://www.arxiv.org/abs/2505.23419)] [[🔗 repo](https://github.com/microsoft/SWE-bench-Live)]
306 |
307 | - **FEA-Bench**: A Benchmark for Evaluating Repository-Level Code Generation for Feature Implementation [2025-05-ACL] [[📄 paper](https://arxiv.org/abs/2503.06680)] [[🔗 repo](https://github.com/microsoft/FEA-Bench)]
308 |
311 | - **SWE-PolyBench**: A Multi-Language Benchmark for Repository-Level Evaluation of Coding Agents [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.08703)] [[🔗 repo](https://github.com/FSoft-AI4Code/SWE-PolyBench)]
312 |
313 | - **Multi-SWE-bench**: A Multilingual Benchmark for Issue Resolving [2025-04-arXiv] [[📄 paper](https://arxiv.org/abs/2504.02605)] [[🔗 repo](https://github.com/multi-swe-bench/multi-swe-bench)]
314 |
315 | - **LibEvolutionEval**: A Benchmark and Study for Version-Specific Code Generation [2025-04-NAACL] [[📄 paper](https://arxiv.org/abs/2412.04478)] [[🕸️ website](https://lib-evolution-eval.github.io/)]
316 |
317 | - **SWEE-Bench & SWA-Bench**: Automated Benchmark Generation for Repository-Level Coding Tasks [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.07701)]
318 |
319 | - **ProjectEval**: A Benchmark for Programming Agents Automated Evaluation on Project-Level Code Generation [2025-ACL-Findings] [[📄 paper](https://arxiv.org/pdf/2503.07010)] [[🔗 repo](https://github.com/RyanLoil/ProjectEval/)]
320 |
321 | - **REPOST-TRAIN**: Scalable Repository-Level Coding Environment Construction with Sandbox Testing [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.07358)] [[🔗 repo](https://github.com/yiqingxyq/RepoST)]
322 |
323 | - **Loc-Bench**: Graph-Guided LLM Agents for Code Localization [2025-03-arXiv] [[📄 paper](https://arxiv.org/pdf/2503.09089)] [[🔗 repo](https://github.com/gersteinlab/LocAgent)]
324 |
325 | - **SWE-Lancer**: Can Frontier LLMs Earn $1 Million from Real-World Freelance Software Engineering? [2025-02-arXiv] [[📄 paper](https://arxiv.org/pdf/2502.12115)] [[🔗 repo](https://github.com/openai/SWELancer-Benchmark)]
326 |
327 | - **SolEval**: Benchmarking Large Language Models for Repository-level Solidity Code Generation [2025-02-arXiv] [[📄 paper](https://arxiv.org/abs/2502.18793)] [[🔗 repo](https://anonymous.4open.science/r/SolEval-1C06/)]
328 |
329 | - **HumanEvo**: An Evolution-aware Benchmark for More Realistic Evaluation of Repository-level Code Generation [2025-ICSE] [[📄 paper](https://www.computer.org/csdl/proceedings-article/icse/2025/056900a764/251mHzzKizu)] [[🔗 repo](https://github.com/DeepSoftwareAnalytics/HumanEvo)]
330 |
331 | - **RepoExec**: On the Impacts of Contexts on Repository-Level Code Generation [2025-NAACL] [[📄 paper](https://arxiv.org/abs/2406.11927)] [[🔗 repo](https://github.com/FSoft-AI4Code/RepoExec)]
332 |
333 | - **SWE-Gym**: Training Software Engineering Agents and Verifiers with SWE-Gym [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.21139)] [[🔗 repo](https://github.com/SWE-Gym/SWE-Gym)]
334 |
335 | - **RepoTransBench**: A Real-World Benchmark for Repository-Level Code Translation [2024-12-arXiv] [[📄 paper](https://arxiv.org/abs/2412.17744)] [[🔗 repo](https://github.com/DeepSoftwareAnalytics/RepoTransBench)]
336 |
337 | - **Visual SWE-bench**: Issue Resolving with Visual Data [2024-12-arXiv] [[📄 paper](https://arxiv.org/pdf/2412.17315)] [[🔗 repo](https://github.com/luolin101/CodeV)]
338 |
339 | - **ExecRepoBench**: Multi-level Executable Code Completion Evaluation [2024-12-arXiv] [[📄 paper](https://arxiv.org/abs/2412.11990)] [[🔗 site](https://execrepobench.github.io/)]
340 |
341 | - **REPOCOD**: Can Language Models Replace Programmers? REPOCOD Says 'Not Yet' [2024-10-arXiv] [[📄 paper](https://arxiv.org/abs/2410.21647)] [[🔗 repo](https://github.com/lt-asset/REPOCOD)]
342 |
343 | - **M2RC-EVAL**: Massively Multilingual Repository-level Code Completion Evaluation [2024-10-arXiv] [[📄 paper](https://arxiv.org/abs/2410.21157)] [[🔗 repo](https://github.com/M2RC-Eval-Team/M2RC-Eval)]
344 |
345 | - **SWE-bench+**: Enhanced Coding Benchmark for LLMs [2024-10-arXiv] [[📄 paper](https://arxiv.org/pdf/2410.06992)]
346 |
347 | - **SWE-bench Multimodal**: Multimodal Software Engineering Benchmark [2024-10-arXiv] [[📄 paper](https://arxiv.org/abs/2410.03859)] [[🔗 site](https://swebench.com/multimodal)]
348 |
349 | - **Codev-Bench**: How Do LLMs Understand Developer-Centric Code Completion? [2024-10-arXiv] [[📄 paper](https://arxiv.org/abs/2410.01353)] [[🔗 repo](https://github.com/LingmaTongyi/Codev-Bench)]
350 |
351 | - **SWT-Bench**: Testing and Validating Real-World Bug-Fixes with Code Agents [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.12952)] [[🕸️ website](https://swtbench.com/?results=verified)]
353 |
354 | - **CodeRAG-Bench**: Can Retrieval Augment Code Generation? [2024-06-arXiv] [[📄 paper](http://arxiv.org/abs/2406.14497)] [[🔗 repo](https://github.com/code-rag-bench/code-rag-bench/tree/main)]
355 |
356 | - **R2C2-Bench**: Enhancing and Benchmarking Real-world Repository-level Code Completion Abilities of Code Large Language Models [2024-06-arXiv] [[📄 paper](https://arxiv.org/abs/2406.01359)]
357 |
358 | - **RepoClassBench**: Class-Level Code Generation from Natural Language Using Iterative, Tool-Enhanced Reasoning over Repository [2024-05-arXiv] [[📄 paper](https://arxiv.org/abs/2405.01573)] [[🔗 repo](https://github.com/microsoft/repoclassbench/tree/main)]
359 |
360 | - **DevEval**: Evaluating Code Generation in Practical Software Projects [2024-ACL-Findings] [[📄 paper](https://aclanthology.org/2024.findings-acl.214.pdf)] [[🔗 repo](https://github.com/seketeam/DevEval)]
361 |
362 | - **CodAgentBench**: Enhancing Code Generation with Tool-Integrated Agent Systems for Real-World Repo-level Coding Challenges [2024-ACL] [[📄 paper](https://aclanthology.org/2024.acl-long.737/)]
363 |
364 | - **RepoBench**: Benchmarking Repository-Level Code Auto-Completion Systems [2024-ICLR] [[📄 paper](https://openreview.net/forum?id=pPjZIOuQuF)] [[🔗 repo](https://github.com/Leolty/repobench)]
365 |
366 | - **SWE-bench**: Can Language Models Resolve Real-World GitHub Issues? [2024-ICLR] [[📄 paper](https://arxiv.org/pdf/2310.06770)] [[🔗 repo](https://github.com/princeton-nlp/SWE-bench)]
367 |
368 | - **CrossCodeLongEval**: Repoformer: Selective Retrieval for Repository-Level Code Completion [2024-ICML] [[📄 paper](https://arxiv.org/abs/2403.10059)] [[🔗 repo](https://repoformer.github.io/)]
369 |
370 | - **R2E-Eval**: Turning Any GitHub Repository into a Programming Agent Test Environment [2024-ICML] [[📄 paper](https://proceedings.mlr.press/v235/jain24c.html)] [[🔗 repo](https://r2e.dev/)]
371 |
372 | - **RepoEval**: Repository-Level Code Completion Through Iterative Retrieval and Generation [2023-EMNLP] [[📄 paper](https://aclanthology.org/2023.emnlp-main.151/)] [[🔗 repo](https://github.com/microsoft/CodeT/tree/main/RepoCoder)]
373 |
374 | - **CrossCodeEval**: A Diverse and Multilingual Benchmark for Cross-File Code Completion [2023-NeurIPS] [[📄 paper](https://proceedings.neurips.cc/paper_files/paper/2023/file/920f2dced7d32ab2ba2f1970bc306af6-Paper-Datasets_and_Benchmarks.pdf)] [[🔗 site](https://crosscodeeval.github.io/)]
375 |
376 | - **Skeleton-Guided-Translation**: A Benchmarking Framework for Code Repository Translation with Fine-Grained Quality Evaluation [2025-01-arXiv] [[📄 paper](https://arxiv.org/abs/2501.16050)] [[🔗 repo](https://github.com/microsoft/TransRepo)]
377 |
378 | - **SWE-Dev**: Evaluating and Training Autonomous Feature-Driven Software Development [2025-05-arXiv] [[📄 paper](https://arxiv.org/abs/2505.16975)] [[🔗 repo](https://github.com/justLittleWhite/SWE-Dev)]
379 |
380 | ## Star History
381 |
382 | [](https://www.star-history.com/#YerbaPage/Awesome-Repo-Level-Code-Generation&Date)
383 |
--------------------------------------------------------------------------------