## Awesome open coding-copilot and friends

A curated list of open coding-copilot tools and related resources: plugins, models, benchmarks, papers, and more.

## Table of Contents

- [Awesome open coding-copilot and friends](#awesome-open-coding-copilot-and-friends)
- [Table of Contents](#table-of-contents)
- [Plugins](#plugins)
- [Models](#models)
  - [Fine-tuning StarCoder](#fine-tuning-starcoder)
- [Benchmarks/Datasets](#benchmarksdatasets)
  - [Natural Language to Code](#natural-language-to-code)
  - [Code to Natural Language](#code-to-natural-language)
  - [Code Completion](#code-completion)
  - [Code Search](#code-search)
- [Papers](#papers)
- [Performance Comparisons](#performance-comparisons)
- [Related Open Source Projects](#related-open-source-projects)
- [Ethics and Challenges](#ethics-and-challenges)
- [Contributing Guidelines](#contributing-guidelines)

## Plugins

These plugins integrate AI-assisted coding capabilities into various development environments:

- [copilot-clone](https://github.com/hieunc229/copilot-clone) - An open-source alternative to GitHub Copilot
- [fauxpilot](https://github.com/fauxpilot/fauxpilot) - An open-source, self-hosted alternative to the GitHub Copilot server
- [twinny](https://github.com/twinnydotdev/twinny) - A no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code
- [claude-dev](https://github.com/saoudrizwan/claude-dev) - An autonomous coding agent in your IDE that can create/edit files and execute commands, with your permission at every step

## Models

These are some of the prominent models used for code generation and understanding:

- [CodeT5](https://github.com/salesforce/CodeT5) - Open-source model for code understanding and generation
- [CodeGen](https://github.com/salesforce/CodeGen) - Large language model for program synthesis
- [StarCoder](https://huggingface.co/blog/starcoder) - Large language model trained on source code
- [CodeBERT](https://github.com/microsoft/CodeBERT) - Pre-trained model for programming and natural languages
- [GPT-Neo](https://github.com/EleutherAI/gpt-neo) - Open-source alternative to GPT-3
- [CodeParrot](https://huggingface.co/codeparrot) - Large language model trained on code

### Fine-tuning StarCoder

- [StarCoder Fine-tuning Guide](https://github.com/bigcode-project/starcoder/tree/main?tab=readme-ov-file#fine-tuning) - Official instructions from the BigCode project; a minimal sketch follows below
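As a companion to the guide above, here is a minimal sketch of parameter-efficient (LoRA) fine-tuning for a StarCoder-family checkpoint using Hugging Face `transformers`, `datasets`, and `peft`. It is not the official BigCode recipe: the checkpoint name, the `train.jsonl` file and its `content` column, and all hyperparameters are illustrative assumptions you should replace with your own.

```python
# Minimal LoRA fine-tuning sketch for a StarCoder-family checkpoint.
# The model name, dataset file, column name, and hyperparameters below are
# illustrative assumptions, not the official BigCode recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bigcode/starcoderbase-1b"  # assumed checkpoint; swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA: freeze the base weights and train small adapter matrices instead.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPTBigCode models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Any code dataset with a text column works; "train.jsonl"/"content" are placeholders.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="starcoder-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("starcoder-lora")
```

LoRA keeps the base weights frozen and trains only small adapter matrices, which substantially reduces the memory needed compared to full fine-tuning; see the official guide above for the project's supported workflow and hardware guidance.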
## Benchmarks/Datasets

Resources for evaluating and training code models:

- [CodeSearchNet](https://github.com/github/CodeSearchNet) - Dataset and benchmark for code search
- [Stack Exchange Dataset](https://huggingface.co/datasets/ArmelR/stack-exchange-instruction) - Instruction dataset based on Stack Exchange
- [The Pile](https://pile.eleuther.ai/) - Large-scale dataset including programming language data
- [APPS](https://github.com/hendrycks/apps) - Benchmark for code generation
- [HumanEval](https://github.com/openai/human-eval) - Benchmark for evaluating language models on coding tasks (see the usage sketch after this list)
- [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) - Benchmark dataset for code intelligence
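For the HumanEval entry above, the harness expects a JSONL file of model completions, which it then executes against the reference tests. Below is a minimal sketch following the repository's documented usage; `generate_one_completion` is a hypothetical placeholder for whatever model you are evaluating.

```python
# Minimal HumanEval sketch: write one completion per task to JSONL, then score it.
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Hypothetical placeholder: replace with a call to the model under evaluation.
    return "    pass\n"

problems = read_problems()  # maps task_id -> problem dict with a "prompt" field

samples = [
    {"task_id": task_id, "completion": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)
```

Scoring is then done from the shell with the package's `evaluate_functional_correctness samples.jsonl` command, which runs the generated code against the reference tests and reports pass@k (the repository recommends executing it in a sandboxed environment, since it runs untrusted model output).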
### Natural Language to Code

- [CoNaLa](https://conala-corpus.github.io/) - Dataset for mapping natural language to code
- [NL2Code](https://github.com/neulab/nl2code) - A dataset for natural language to code generation

### Code to Natural Language

- [CodeNN](https://github.com/sriniiyer/codenn) - Dataset for code summarization
- [FunCom](https://github.com/LethargicLeprechaun/FunCom) - Dataset for method name generation

### Code Completion

- [PY150](https://www.sri.inf.ethz.ch/py150) - Dataset for Python code completion
- [CodeCompletionBenchmark](https://github.com/google-research/google-research/tree/master/CodeCompletionBenchmark) - Google's benchmark for code completion

### Code Search

- [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet#datasets) - Multiple datasets for code search tasks
- [AdvTest](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/AdvTest) - Adversarial code search test set from CodeXGLUE

## Papers

Seminal and recent research papers in the field:

- [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374) - OpenAI's paper on Codex
- [PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers](https://arxiv.org/abs/2010.03150)
- [InCoder: A Generative Model for Code Infilling and Synthesis](https://arxiv.org/abs/2204.05999)
- [Competition-Level Code Generation with AlphaCode](https://arxiv.org/abs/2203.07814)
- [CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation](https://arxiv.org/abs/2102.04664)
- [CodeSearchNet Challenge: Evaluating the State of Semantic Code Search](https://arxiv.org/abs/1909.09436)
- [CodeBERT: A Pre-Trained Model for Programming and Natural Languages](https://arxiv.org/abs/2002.08155)
- [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/abs/2109.00859)
- [StarCoder: May the Source Be With You!](https://arxiv.org/abs/2305.06161)
- [CodeParrot: A Large Language Model for Code Generation](https://arxiv.org/abs/2202.08397)
- [GPT-Neo: Large-Scale Language Models for Code Generation](https://arxiv.org/abs/2202.08397)
- [DeepCoder: Learning to Write Programs](https://arxiv.org/abs/1611.01989)
- [code2vec: Learning Distributed Representations of Code](https://arxiv.org/abs/1803.09473)

## Performance Comparisons

Studies and articles comparing the performance of different AI coding assistants:

- [Comparing Copilot, ChatGPT, and Human Developers](https://arxiv.org/abs/2307.08908) - Research study on code generation performance
- [Evaluating LLM Performance on Programming Tasks](https://arxiv.org/abs/2305.18323) - Comprehensive evaluation of various models

## Related Open Source Projects

Open-source projects that complement or enhance AI-assisted coding:

- [LSP](https://microsoft.github.io/language-server-protocol/) - Language Server Protocol for editor-agnostic tooling
- [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) - Parser generator tool and incremental parsing library
- [SonarQube](https://www.sonarqube.org/) - Static code analysis tool that can be enhanced with AI capabilities

## Ethics and Challenges

Discussions and resources on the ethical considerations and challenges in AI-assisted coding:

- [The Ethics of AI-Assisted Coding](https://dl.acm.org/doi/10.1145/3635715) - ACM article on ethical considerations
- [Challenges in AI-Assisted Coding](https://arxiv.org/abs/2402.04141) - Research paper on challenges and future directions
- [Copyright and AI-Generated Code](https://arxiv.org/abs/2402.02333) - Legal perspective on AI-generated code

## Contributing Guidelines

We welcome contributions to this awesome list! Here's how you can contribute:

1. Fork the repository
2. Create a new branch for your additions
3. Add your links and descriptions, following the existing format
4. Ensure your additions are in alphabetical order within their respective sections
5. Create a pull request with a clear description of your changes

Please make sure any resources you add are:

- Relevant to AI-assisted coding
- Of high quality and useful to the community
- Not duplicates of existing entries

If you prefer not to open a pull request, you can also open an issue to suggest additions or changes.