## Awesome open coding-copilot and friends

A curated list of open coding-copilot tools and related resources: plugins, models, benchmarks, papers, and more.

## Table of Contents

- [Awesome open coding-copilot and friends](#awesome-open-coding-copilot-and-friends)
- [Table of Contents](#table-of-contents)
- [Plugins](#plugins)
- [Models](#models)
  - [Fine-tuning StarCoder](#fine-tuning-starcoder)
- [Benchmarks/Datasets](#benchmarksdatasets)
  - [Natural Language to Code](#natural-language-to-code)
  - [Code to Natural Language](#code-to-natural-language)
  - [Code Completion](#code-completion)
  - [Code Search](#code-search)
- [Papers](#papers)
- [Performance Comparisons](#performance-comparisons)
- [Related Open Source Projects](#related-open-source-projects)
- [Ethics and Challenges](#ethics-and-challenges)
- [Contributing Guidelines](#contributing-guidelines)

## Plugins

These plugins integrate AI-assisted coding capabilities into various development environments:

- [copilot-clone](https://github.com/hieunc229/copilot-clone) - An open-source alternative to GitHub Copilot
- [fauxpilot](https://github.com/fauxpilot/fauxpilot) - An open-source, self-hosted alternative to the GitHub Copilot server
- [twinny](https://github.com/twinnydotdev/twinny) - A no-nonsense, locally or API-hosted AI code completion plugin for Visual Studio Code
- [claude-dev](https://github.com/saoudrizwan/claude-dev) - An autonomous coding agent in your IDE that can create/edit files and execute commands, with your permission at every step

## Models

These are some of the prominent models used for code generation and understanding:

- [CodeT5](https://github.com/salesforce/CodeT5) - Open-source model for code understanding and generation
- [CodeGen](https://github.com/salesforce/CodeGen) - Large language model for program synthesis
- [StarCoder](https://huggingface.co/blog/starcoder) - Large language model trained on source code
- [CodeBERT](https://github.com/microsoft/CodeBERT) - Pre-trained model for programming and natural languages
- [GPT-Neo](https://github.com/EleutherAI/gpt-neo) - Open-source alternative to GPT-3
- [CodeParrot](https://huggingface.co/codeparrot) - Large language model trained on code

### Fine-tuning StarCoder

- [StarCoder Fine-tuning Guide](https://github.com/bigcode-project/starcoder/tree/main?tab=readme-ov-file#fine-tuning) - Official instructions from the BigCode project; a minimal sketch follows below
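As a companion to the guide above, here is a minimal sketch of parameter-efficient (LoRA) fine-tuning for a StarCoder-family checkpoint using Hugging Face `transformers`, `datasets`, and `peft`. It is not the official BigCode recipe: the checkpoint name, the `train.jsonl` file and its `content` column, and all hyperparameters are illustrative assumptions you should replace with your own.

```python
# Minimal LoRA fine-tuning sketch for a StarCoder-family checkpoint.
# The model name, dataset file, column name, and hyperparameters below are
# illustrative assumptions, not the official BigCode recipe.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "bigcode/starcoderbase-1b"  # assumed checkpoint; swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_name)
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# LoRA: freeze the base weights and train small adapter matrices instead.
lora = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["c_attn"],  # attention projection in GPTBigCode models
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora)

# Any code dataset with a text column works; "train.jsonl"/"content" are placeholders.
dataset = load_dataset("json", data_files="train.jsonl", split="train")

def tokenize(batch):
    return tokenizer(batch["content"], truncation=True, max_length=1024)

tokenized = dataset.map(tokenize, batched=True, remove_columns=dataset.column_names)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="starcoder-lora",
        per_device_train_batch_size=1,
        gradient_accumulation_steps=8,
        learning_rate=2e-4,
        num_train_epochs=1,
        fp16=True,
    ),
    train_dataset=tokenized,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
trainer.save_model("starcoder-lora")
```

LoRA keeps the base weights frozen and trains only small adapter matrices, which substantially reduces the memory needed compared to full fine-tuning; see the official guide above for the project's supported workflow and hardware guidance.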
## Benchmarks/Datasets

Resources for evaluating and training code models:

- [CodeSearchNet](https://github.com/github/CodeSearchNet) - Dataset and benchmark for code search
- [Stack Exchange Dataset](https://huggingface.co/datasets/ArmelR/stack-exchange-instruction) - Instruction dataset based on Stack Exchange
- [The Pile](https://pile.eleuther.ai/) - Large-scale dataset including programming language data
- [APPS](https://github.com/hendrycks/apps) - Benchmark for code generation
- [HumanEval](https://github.com/openai/human-eval) - Benchmark for evaluating language models on coding tasks (see the usage sketch after this list)
- [CodeXGLUE](https://github.com/microsoft/CodeXGLUE) - Benchmark dataset for code intelligence
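For the HumanEval entry above, the harness expects a JSONL file of model completions, which it then executes against the reference tests. Below is a minimal sketch following the repository's documented usage; `generate_one_completion` is a hypothetical placeholder for whatever model you are evaluating.

```python
# Minimal HumanEval sketch: write one completion per task to JSONL, then score it.
from human_eval.data import read_problems, write_jsonl

def generate_one_completion(prompt: str) -> str:
    # Hypothetical placeholder: replace with a call to the model under evaluation.
    return "    pass\n"

problems = read_problems()  # maps task_id -> problem dict with a "prompt" field

samples = [
    {"task_id": task_id, "completion": generate_one_completion(problem["prompt"])}
    for task_id, problem in problems.items()
]
write_jsonl("samples.jsonl", samples)
```

Scoring is then done from the shell with the package's `evaluate_functional_correctness samples.jsonl` command, which runs the generated code against the reference tests and reports pass@k (the repository recommends executing it in a sandboxed environment, since it runs untrusted model output).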
### Natural Language to Code

- [CoNaLa](https://conala-corpus.github.io/) - Dataset for mapping natural language to code
- [NL2Code](https://github.com/neulab/nl2code) - A dataset for natural language to code generation

### Code to Natural Language

- [CodeNN](https://github.com/sriniiyer/codenn) - Dataset for code summarization
- [FunCom](https://github.com/LethargicLeprechaun/FunCom) - Dataset for method name generation

### Code Completion

- [PY150](https://www.sri.inf.ethz.ch/py150) - Dataset for Python code completion
- [CodeCompletionBenchmark](https://github.com/google-research/google-research/tree/master/CodeCompletionBenchmark) - Google's benchmark for code completion

### Code Search

- [CodeSearchNet Challenge](https://github.com/github/CodeSearchNet#datasets) - Multiple datasets for code search tasks
- [AdvTest](https://github.com/microsoft/CodeXGLUE/tree/main/Code-Code/AdvTest) - Adversarial code search test set from CodeXGLUE

## Papers

Seminal and recent research papers in the field:

- [Evaluating Large Language Models Trained on Code](https://arxiv.org/abs/2107.03374) - OpenAI's paper on Codex
- [PyMT5: Multi-mode Translation of Natural Language and Python Code with Transformers](https://arxiv.org/abs/2010.03150)
- [InCoder: A Generative Model for Code Infilling and Synthesis](https://arxiv.org/abs/2204.05999)
- [Competition-Level Code Generation with AlphaCode](https://arxiv.org/abs/2203.07814)
- [CodeXGLUE: A Machine Learning Benchmark Dataset for Code Understanding and Generation](https://arxiv.org/abs/2102.04664)
- [CodeSearchNet Challenge: Evaluating the State of Semantic Code Search](https://arxiv.org/abs/1909.09436)
- [CodeBERT: A Pre-Trained Model for Programming and Natural Languages](https://arxiv.org/abs/2002.08155)
- [CodeT5: Identifier-aware Unified Pre-trained Encoder-Decoder Models for Code Understanding and Generation](https://arxiv.org/abs/2109.00859)
- [StarCoder: May the Source Be With You!](https://arxiv.org/abs/2305.06161)
- [CodeParrot: A Large Language Model for Code Generation](https://arxiv.org/abs/2202.08397)
- [GPT-Neo: Large-Scale Language Models for Code Generation](https://arxiv.org/abs/2202.08397)
- [DeepCoder: Learning to Write Programs](https://arxiv.org/abs/1611.01989)
- [code2vec: Learning Distributed Representations of Code](https://arxiv.org/abs/1803.09473)

## Performance Comparisons

Studies and articles comparing the performance of different AI coding assistants:

- [Comparing Copilot, ChatGPT, and Human Developers](https://arxiv.org/abs/2307.08908) - Research study on code generation performance
- [Evaluating LLM Performance on Programming Tasks](https://arxiv.org/abs/2305.18323) - Comprehensive evaluation of various models

## Related Open Source Projects

Open-source projects that complement or enhance AI-assisted coding:

- [LSP](https://microsoft.github.io/language-server-protocol/) - Language Server Protocol for editor-agnostic tooling
- [Tree-sitter](https://tree-sitter.github.io/tree-sitter/) - Parser generator tool and incremental parsing library
- [SonarQube](https://www.sonarqube.org/) - Static code analysis tool that can be enhanced with AI capabilities

## Ethics and Challenges

Discussions and resources on the ethical considerations and challenges in AI-assisted coding:

- [The Ethics of AI-Assisted Coding](https://dl.acm.org/doi/10.1145/3635715) - ACM article on ethical considerations
- [Challenges in AI-Assisted Coding](https://arxiv.org/abs/2402.04141) - Research paper on challenges and future directions
- [Copyright and AI-Generated Code](https://arxiv.org/abs/2402.02333) - Legal perspective on AI-generated code

## Contributing Guidelines

We welcome contributions to this awesome list! Here's how you can contribute:

1. Fork the repository
2. Create a new branch for your additions
3. Add your links and descriptions, following the existing format
4. Ensure your additions are in alphabetical order within their respective sections
5. Create a pull request with a clear description of your changes

Please make sure any resources you add are:

- Relevant to AI-assisted coding
- Of high quality and useful to the community
- Not duplicates of existing entries

If you prefer not to open a pull request, you can also open an issue to suggest additions or changes.