├── data
│   ├── annotated
│   │   └── DeepSeek-Coder-V2.pdf
│   └── summary
│       └── DeepSeek-Coder-V2-summary.md
├── checklist
│   ├── image
│   │   └── reading_research_paper.png
│   └── README.md
└── README.md

/data/annotated/DeepSeek-Coder-V2.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iampukar/reading-research-paper/main/data/annotated/DeepSeek-Coder-V2.pdf
--------------------------------------------------------------------------------

/checklist/image/reading_research_paper.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/iampukar/reading-research-paper/main/checklist/image/reading_research_paper.png
--------------------------------------------------------------------------------

/README.md:
--------------------------------------------------------------------------------
# Reading Research Paper

This repository provides a comprehensive guide on how to read research papers effectively. Each paper entry includes a link to the original paper, an annotated version, and a summary to help researchers understand the content more deeply.

## Before You Begin

Before starting, please read through the [Reading Checklist](checklist/README.md) to learn how to work through a research paper more effectively and efficiently. The checklist provides essential tips and strategies for getting the most out of your reading.

### How to Use

1. **Read the Original Paper:** Start with the link to the original paper to get the full content.
2. **Annotated Version:** Use the annotated version to understand key sections and insights into the paper.
3. **Summary:** Refer to the summary for a quick overview of the paper's main contributions and findings.

## Papers

| Research Paper | Link to Paper | Annotated Version | Summary |
|----------------|---------------|-------------------|---------|
| DeepSeek-Coder-V2: Breaking the Barrier of Closed-Source Models in Code Intelligence | [Link to Paper](https://arxiv.org/pdf/2406.11931) | [Annotated Version](https://github.com/iampukar/reading-research-paper/blob/main/data/annotated/DeepSeek-Coder-V2.pdf) | [Summary](https://github.com/iampukar/reading-research-paper/blob/main/data/summary/DeepSeek-Coder-V2-summary.md) |

## Contribution

We welcome contributions to this repository. If you have an annotated version or summary of a research paper that you would like to share, please follow the pattern described in this README file. Ensure your contribution includes:

- A link to the original research paper.
- An annotated version of the research paper.
- A summary of the research paper.
--------------------------------------------------------------------------------

/checklist/README.md:
--------------------------------------------------------------------------------
# How to Read a Research Paper

**Keywords: Artificial Intelligence, Research Paper, Reading, Guide**

**Acronyms:**
*AI - Artificial Intelligence*

**Descriptor: Personal Documentation**

<br>

<br>

Artificial Intelligence is a fast-paced field, with new research happening in every corner of the world. To keep up with the advancements in the field, one has to read plenty of research papers. Learning how to go through a research paper is a vital skill, and one that most of us learn very late, or perhaps never. With a lot of online resources each suggesting a starting point, I was left confused about a plan that actually fit my interests. My motive behind this post is to help fellow researchers who wish to jumpstart their careers in AI research. The primary goal behind reading these papers is to identify their scientific contribution. For this reason, expect to go through a paper repeatedly. To get the most from my reading, I found the following approach to match my learning needs in understanding the bits and pieces of a paper.

**Overview of the Paper**

This approach involves a lot of critical and creative reasoning. Ideally, you are trying to build a framework that can help you decide whether working with the paper is worth the time. You will need anywhere between 10 and 30 minutes to scan the paper from a bird's-eye view and gain a general idea about it. It generally includes the following steps:

1. Read through the title, abstract, introduction, and conclusion.
2. Glance through the section and sub-section headings, but don't dive straight into their content.
3. Inspect the tables, figures, graphs, and any mathematical equations to understand the solution, but skip over the fine details.
4. Check the authors and the companies they work for, but don't be intimidated by the names involved.
5. Check the references and mark off the ones that you have already read.

After covering the above fronts, you should be in a comfortable position to jot down the details of the paper. This includes the following outcomes:

i) Categorize the paper
- Get an overview of the authors, their workplace, and the conference they have submitted the paper to.

Research Paradigm
- Is it a psychological experiment?
- Is it an improvement over an existing idea?
- Is it suggesting a novel prototype?
- Is there any implementation of a new discovery? Have the authors included a complete working solution to their approach?
- Does it combine multiple previously implemented approaches into one?

ii) Context
- Identify the problem statement and get to know the main purpose behind the paper. You will usually find it towards the end of the introduction.
- What are the good ideas behind this paper? What type of scientific contribution is the author making?

iii) Correctness
- Check whether it is solving the right problem.
- Are the authors' assumptions valid?
- Is their logic justified by the solution, or are there flaws in their ideas?
- Are there limitations of the solution, including ones the authors might have missed?

iv) Contribution
- What are the major contributions of the paper?
- Are there any other applications that the authors might have missed?
- Do the authors describe the work of other researchers in the same field? How does their approach differ from those?

v) Clarity
- Is the paper well written?
- Do the authors leave space for future research?
- Is the data correct, and was it collected the way it should be?
- Is the implementation correct, and can it be generalized further?
- Are there improvements that could make a substantial difference to the authors' hypothesis?

**References**
[Efficient Reading of Papers in Science and Technology](https://www.cs.columbia.edu/~hgs/netbib/efficientReading.pdf)

[How to Read a Paper](http://ccr.sigcomm.org/online/files/p83-keshavA.pdf)

[How to read a research paper](https://www.eecs.harvard.edu/~michaelm/postscripts/ReadPaper.pdf)

[Writing reviews for systems conferences](http://people.inf.ethz.ch/troscoe/pubs/review-writing.pdf)
--------------------------------------------------------------------------------

/data/summary/DeepSeek-Coder-V2-summary.md:
--------------------------------------------------------------------------------
# DeepSeek-Coder-V2 Summary

## Abstract

- DeepSeek-Coder-V2 is an open-source Mixture-of-Experts (MoE) code language model that achieves performance comparable to GPT-4 Turbo on code-specific tasks.
- The model is further pre-trained from an intermediate checkpoint of DeepSeek-V2 with an additional 6 trillion tokens.
- Substantially enhances the coding and mathematical reasoning capabilities of DeepSeek-V2 while maintaining comparable performance on general language tasks.
- Demonstrates significant advancements across code-related tasks, reasoning, and general capabilities.
- Expands support for programming languages from 86 to 338 and extends the context length from 16K to 128K tokens.
- Achieves superior performance compared to closed-source models such as GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro on coding and math benchmarks.

## Introduction

- The development of models like StarCoder, CodeLlama, DeepSeek-Coder, and Codestral has significantly advanced open-source code intelligence.
- There is still a notable performance gap between open-source models and state-of-the-art closed-source models like GPT-4 Turbo, Claude 3 Opus, and Gemini 1.5 Pro.
- DeepSeek-Coder-V2 aims to bridge this gap and advance the development of open-source code models.
- The model is released in 16B and 236B parameter versions, efficiently supporting diverse computational needs.
- This is the first attempt to develop an open-source, hundred-billion-parameter code model, advancing code intelligence. It is released under a permissive license allowing both research and commercial use.
- DeepSeek-Coder-V2 outperforms all open-source models and matches leading closed-source models in code generation, achieving 90.2% on HumanEval, 76.2% on MBPP, and 43.4% on LiveCodeBench.
- Exhibits strong mathematical reasoning, rivaling top closed-source models on both elementary and advanced benchmarks.
- Maintains strong general language performance, comparable to DeepSeek-V2.

## Data Collection

- The pre-training data for DeepSeek-Coder-V2 consists of roughly 60% source code, 10% math corpus, and 30% natural language corpus.
- The source code comprises 1,170 billion code-related tokens sourced from GitHub and CommonCrawl. The corpus supports 338 programming languages, a significant expansion from the 86 languages in previous models.
- 221 billion math-related tokens are collected from CommonCrawl, doubling the size of the previous DeepSeekMath corpus. The natural language corpus is sampled directly from the DeepSeek-V2 training corpus; in total, the model is trained on 10.2 trillion tokens.
- Files are filtered out if the average line length exceeds 100 characters or the maximum line length exceeds 1,000 characters. Files with fewer than 25% alphabetic characters are removed. Specific rules are applied for different file types (e.g., XML, HTML, JSON, and YAML) to ensure quality and relevance (a sketch of these filters appears after this list).
- Ablation studies were conducted to demonstrate the effectiveness of the new code corpus: a 1B parameter model showed accuracy improvements of 6.7% and 9.4% on the HumanEval and MBPP benchmarks, respectively.
- Uses the Byte Pair Encoding (BPE) tokenizer from DeepSeek-V2, which improves recall accuracy for languages like Chinese.
- Multiple iterations of data collection and validation ensure high-quality data, and comparative analysis experiments validate the quality and effectiveness of the collected data.
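
A minimal sketch of the rule-based filtering described in the list above, assuming a simple `keep_source_file` helper; the thresholds come from the summary, but the function name and implementation are illustrative only and do not reproduce the paper's actual pipeline (which also applies file-type-specific rules not shown here).

```python
def keep_source_file(text: str) -> bool:
    """Heuristic quality filter mirroring the thresholds described above (illustrative only)."""
    lines = text.splitlines()
    if not lines:
        return False
    avg_line_len = sum(len(line) for line in lines) / len(lines)
    max_line_len = max(len(line) for line in lines)
    alpha_ratio = sum(ch.isalpha() for ch in text) / len(text)
    if avg_line_len > 100 or max_line_len > 1000:
        return False  # likely minified or machine-generated content
    if alpha_ratio < 0.25:
        return False  # mostly symbols or numeric data, little natural code
    return True

# Example usage: a data-like blob is rejected, ordinary code is kept.
print(keep_source_file("0,1,2,3,4,5\n" * 50))              # False (too few alphabetic characters)
print(keep_source_file("def main():\n    print('hi')\n"))  # True
```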

## Training Policy

### Training Strategy
- Uses Next-Token-Prediction and Fill-In-the-Middle (FIM) training objectives for the 16B model.
- The 236B model uses only the Next-Token-Prediction objective.
- Adopts the Prefix, Suffix, Middle (PSM) mode for FIM, applied at a rate of 0.5, to enhance training efficacy and model performance (see the sketch after this list).
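
To make the PSM objective concrete, here is a minimal sketch of how a training document can be rearranged into prefix-suffix-middle order at a FIM rate of 0.5. The sentinel strings and the character-level split are placeholders for illustration; the actual DeepSeek-Coder-V2 pipeline defines its own special tokens and operates on tokenized data.

```python
import random

# Placeholder sentinel strings -- the real tokenizer defines its own special FIM tokens.
FIM_BEGIN, FIM_HOLE, FIM_END = "<fim_begin>", "<fim_hole>", "<fim_end>"

def to_psm_sample(document: str, fim_rate: float = 0.5) -> str:
    """With probability fim_rate, rearrange a document into Prefix-Suffix-Middle (PSM) order.

    The model then sees the prefix and the suffix and is trained to generate the middle
    span; otherwise the document is kept as a plain next-token-prediction sample.
    """
    if random.random() >= fim_rate or len(document) < 3:
        return document  # ordinary next-token prediction
    # Two random cut points split the document into prefix / middle / suffix.
    i, j = sorted(random.sample(range(1, len(document)), 2))
    prefix, middle, suffix = document[:i], document[i:j], document[j:]
    # PSM order: prefix, then suffix, then the middle to be filled in.
    return f"{FIM_BEGIN}{prefix}{FIM_HOLE}{suffix}{FIM_END}{middle}"

print(to_psm_sample("def add(a, b):\n    return a + b\n"))
```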

### Model Architecture
- Aligns with the DeepSeek-V2 architecture.
- Hyperparameter settings for the 16B and 236B models correspond to those used in DeepSeek-V2-Lite and DeepSeek-V2, respectively.
- Training instability was addressed by reverting to conventional normalization methods.

### Training Hyper-Parameters
- Uses the AdamW optimizer with β1 = 0.9, β2 = 0.95, and a weight decay of 0.1.
- Batch sizes and learning rates are adjusted according to the DeepSeek-V2 specifications.
- Learning rate scheduling employs a cosine decay strategy with 2,000 warm-up steps, reducing the learning rate to 10% of its initial value.

### Long Context Extension
- Extends the context length to 128K tokens using YaRN (Yet another RoPE extensioN method).
- Training involves two stages: first with a sequence length of 32K and a batch size of 1,152 sequences for 1,000 steps, and then with a sequence length of 128K and a batch size of 288 sequences for another 1,000 steps.
- Evaluations on "Needle In A Haystack" (NIAH) tests indicate effective performance across all context window lengths up to 128K.

### Alignment
- Supervised fine-tuning uses a mixed instruction-training dataset with code and math data.
- 20K code-related and 30K math-related instruction samples are collected from DeepSeek-Coder and DeepSeek-Math, with additional data sampled from DeepSeek-V2.
- Uses a cosine schedule with 100 warm-up steps and an initial learning rate of 5e-6, a batch size of 1M tokens, and a total of 1B tokens for training.
- For reinforcement learning, 40K prompts related to code and math are collected, each with corresponding test cases.
- A reward model is trained on compiler feedback for code and on mathematical preference data to guide policy-model training.
- The GRPO (Group Relative Policy Optimization) algorithm is used for alignment, shown to be effective and more cost-efficient than PPO (Proximal Policy Optimization).

## Conclusion

- DeepSeek-Coder-V2 improves coding and mathematical reasoning capabilities while maintaining general language performance.
- Supports 338 programming languages and extends the context length to 128K tokens, a significant increase over previous versions.
- Achieves competitive performance on code- and math-specific tasks, comparable to leading closed-source models like GPT-4 Turbo.
- Identifies a gap in instruction-following capabilities, highlighting an area for future enhancement.
- Future efforts will focus on improving instruction-following to better handle complex programming scenarios and enhance development productivity.
--------------------------------------------------------------------------------