# ML Fundamentals Reading Lists

## Decision Factors

This list includes papers that have significantly shaped the field of machine learning, particularly with the advent of deep learning techniques. We are fully aware that we might miss a paper or two, but in a rapidly changing industry, we believe these papers will be sufficient to serve as foundational works for each field. You are more than welcome to suggest changes; however, our goal is to keep each reading list limited to a maximum of 10 papers.

Just to clarify, this list does not feature the latest research papers for each topic.


## Table of Contents
- [NLP](#nlp)
- [Computer Vision](#computer-vision)
- [Generative Models](#generative-models)
- [Graph Neural Networks](#graph-neural-networks)
- [Fairness in Machine Learning](#fairness-in-machine-learning)
- [Explainability in Machine Learning](#explainability-in-machine-learning)

## NLP

### 1. **Word Embeddings: Word2Vec**

- **Title**: Efficient Estimation of Word Representations in Vector Space
- **Authors**: Tomas Mikolov, Kai Chen, Greg Corrado, Jeffrey Dean
- **Year**: 2013
- **Summary**: Introduced word2vec, revolutionizing word embeddings.
- **Link**: [arXiv](https://arxiv.org/abs/1301.3781)

### 2. **Sequence-to-Sequence Models**

- **Title**: Sequence to Sequence Learning with Neural Networks
- **Authors**: Ilya Sutskever, Oriol Vinyals, Quoc V. Le
- **Year**: 2014
- **Summary**: Introduced the Seq2Seq model, foundational for machine translation and other tasks.
- **Link**: [arXiv](https://arxiv.org/abs/1409.3215)

### 3. **Attention Mechanism**

- **Title**: Neural Machine Translation by Jointly Learning to Align and Translate
- **Authors**: Dzmitry Bahdanau, Kyunghyun Cho, Yoshua Bengio
- **Year**: 2014
- **Summary**: Introduced the attention mechanism, which has become crucial in NLP.
- **Link**: [arXiv](https://arxiv.org/abs/1409.0473)

### 4. **Transformer Models**

- **Title**: Attention Is All You Need
- **Authors**: Ashish Vaswani, Noam Shazeer, Niki Parmar, et al.
- **Year**: 2017
- **Summary**: Introduced the Transformer model, the foundation for many modern NLP models.
- **Link**: [arXiv](https://arxiv.org/abs/1706.03762)

### 5. **BERT**

- **Title**: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- **Authors**: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- **Year**: 2018
- **Summary**: Introduced BERT, which set new standards for several NLP tasks.
- **Link**: [arXiv](https://arxiv.org/abs/1810.04805)

### 6. **GPT (Generative Pre-trained Transformer)**

- **Title**: Improving Language Understanding by Generative Pre-Training
- **Authors**: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
- **Year**: 2018
- **Summary**: Introduced the GPT architecture, another milestone in language models.
- **Link**: [OpenAI](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)

### 7. **ELMo (Embeddings from Language Models)**

- **Title**: Deep contextualized word representations
- **Authors**: Matthew E. Peters, Mark Neumann, Mohit Iyyer, et al.
- **Year**: 2018
- **Summary**: Introduced ELMo, showing the importance of contextualized word embeddings.
- **Link**: [arXiv](https://arxiv.org/abs/1802.05365)

### 8. **XLNet**

- **Title**: XLNet: Generalized Autoregressive Pretraining for Language Understanding
- **Authors**: Zhilin Yang, Zihang Dai, Yiming Yang, et al.
- **Year**: 2019
- **Summary**: Introduced XLNet, which outperformed BERT on several benchmarks.
- **Link**: [arXiv](https://arxiv.org/abs/1906.08237)

### 9. **RoBERTa**

- **Title**: RoBERTa: A Robustly Optimized BERT Pretraining Approach
- **Authors**: Yinhan Liu, Myle Ott, Naman Goyal, et al.
- **Year**: 2019
- **Summary**: Introduced RoBERTa, an optimized version of BERT.
- **Link**: [arXiv](https://arxiv.org/abs/1907.11692)

### 10. **T5 (Text-to-Text Transfer Transformer)**

- **Title**: Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
- **Authors**: Colin Raffel, Noam Shazeer, Adam Roberts, et al.
- **Year**: 2019
- **Summary**: Introduced T5, which reframed all NLP tasks as text-to-text tasks.
- **Link**: [arXiv](https://arxiv.org/abs/1910.10683)

## Computer Vision

### 1. **Convolutional Neural Networks (LeNet)**

- **Title**: Gradient-Based Learning Applied to Document Recognition
- **Authors**: Yann LeCun, Léon Bottou, Yoshua Bengio, Patrick Haffner
- **Year**: 1998
- **Summary**: Introduced Convolutional Neural Networks (CNNs), setting the stage for deep learning in computer vision.
- **Link**: [Stanford](http://vision.stanford.edu/cs598_spring07/papers/Lecun98.pdf)

### 2. **ImageNet & AlexNet**

- **Title**: ImageNet Classification with Deep Convolutional Neural Networks
- **Authors**: Alex Krizhevsky, Ilya Sutskever, Geoffrey E. Hinton
- **Year**: 2012
- **Summary**: Described AlexNet, the CNN that significantly outperformed existing algorithms in the ImageNet competition.
- **Link**: [NIPS](https://papers.nips.cc/paper/2012/file/c399862d3b9d6b76c8436e924a68c45b-Paper.pdf)

### 3. **VGGNet**

- **Title**: Very Deep Convolutional Networks for Large-Scale Image Recognition
- **Authors**: Karen Simonyan, Andrew Zisserman
- **Year**: 2014
- **Summary**: Introduced VGGNet, emphasizing the importance of depth in convolutional neural networks.
- **Link**: [arXiv](https://arxiv.org/abs/1409.1556)

### 4. **GoogLeNet/Inception**

- **Title**: Going Deeper with Convolutions
- **Authors**: Christian Szegedy, Wei Liu, Yangqing Jia, et al.
- **Year**: 2015
- **Summary**: Introduced the Inception architecture, which used "network-in-network" convolutions to increase efficiency.
- **Link**: [arXiv](https://arxiv.org/abs/1409.4842)

### 5. **Residual Networks (ResNet)**

- **Title**: Deep Residual Learning for Image Recognition
- **Authors**: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun
- **Year**: 2015
- **Summary**: Introduced residual learning, enabling the training of very deep networks.
- **Link**: [arXiv](https://arxiv.org/abs/1512.03385)

### 6. **YOLO (You Only Look Once)**

- **Title**: You Only Look Once: Unified, Real-Time Object Detection
- **Authors**: Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi
- **Year**: 2016
- **Summary**: Introduced YOLO, a real-time object detection system.
- **Link**: [arXiv](https://arxiv.org/abs/1506.02640)

### 7. **U-Net: Image Segmentation**

- **Title**: U-Net: Convolutional Networks for Biomedical Image Segmentation
- **Authors**: Olaf Ronneberger, Philipp Fischer, Thomas Brox
- **Year**: 2015
- **Summary**: Introduced U-Net, a specialized network for semantic segmentation in biomedical image analysis.
- **Link**: [arXiv](https://arxiv.org/abs/1505.04597)

### 8. **Mask R-CNN**

- **Title**: Mask R-CNN
- **Authors**: Kaiming He, Georgia Gkioxari, Piotr Dollár, Ross Girshick
- **Year**: 2017
- **Summary**: Extended Faster R-CNN to provide pixel-level segmentation masks.
- **Link**: [arXiv](https://arxiv.org/abs/1703.06870)

### 9. **Capsule Networks**

- **Title**: Dynamic Routing Between Capsules
- **Authors**: Sara Sabour, Nicholas Frosst, Geoffrey E. Hinton
- **Year**: 2017
- **Summary**: Introduced capsule networks as an alternative to CNNs for hierarchical feature learning.
- **Link**: [arXiv](https://arxiv.org/abs/1710.09829)

### 10. **Neural Style Transfer**

- **Title**: A Neural Algorithm of Artistic Style
- **Authors**: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- **Year**: 2015
- **Summary**: Introduced the concept of neural style transfer, using deep learning to transfer artistic styles between images.
- **Link**: [arXiv](https://arxiv.org/abs/1508.06576)

## Generative Models

### 1. **Generative Adversarial Networks**

- **Title**: Generative Adversarial Nets
- **Authors**: Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, Yoshua Bengio
- **Year**: 2014
- **Summary**: Introduced GANs, a revolutionary framework for training generative models.
- **Link**: [arXiv](https://arxiv.org/abs/1406.2661)

### 2. **Variational Autoencoders (VAEs)**

- **Title**: Auto-Encoding Variational Bayes
- **Authors**: Diederik P. Kingma, Max Welling
- **Year**: 2013
- **Summary**: Introduced VAEs, offering a probabilistic approach to generating data.
- **Link**: [arXiv](https://arxiv.org/abs/1312.6114)

### 3. **Transformers for Text Generation (GPT)**

- **Title**: Improving Language Understanding by Generative Pre-Training
- **Authors**: Alec Radford, Karthik Narasimhan, Tim Salimans, Ilya Sutskever
- **Year**: 2018
- **Summary**: Introduced the GPT architecture, a milestone in text generation.
- **Link**: [OpenAI](https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf)

### 4. **Bidirectional Transformers for Language Understanding (BERT)**

- **Title**: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
- **Authors**: Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova
- **Year**: 2018
- **Summary**: Introduced BERT, which has been adapted for various generative tasks.
- **Link**: [arXiv](https://arxiv.org/abs/1810.04805)

### 5. **CycleGAN**

- **Title**: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- **Authors**: Jun-Yan Zhu, Taesung Park, Phillip Isola, Alexei A. Efros
- **Year**: 2017
- **Summary**: Introduced CycleGANs for image-to-image translation without paired data.
- **Link**: [arXiv](https://arxiv.org/abs/1703.10593)

### 6. **Style Transfer**

- **Title**: A Neural Algorithm of Artistic Style
- **Authors**: Leon A. Gatys, Alexander S. Ecker, Matthias Bethge
- **Year**: 2015
- **Summary**: Introduced the concept of neural style transfer, using deep learning to transfer artistic styles between images.
- **Link**: [arXiv](https://arxiv.org/abs/1508.06576)

### 7. **Normalizing Flows**

- **Title**: Variational Inference with Normalizing Flows
- **Authors**: Danilo Rezende, Shakir Mohamed
- **Year**: 2015
- **Summary**: Introduced Normalizing Flows for more flexible variational inference.
- **Link**: [arXiv](https://arxiv.org/abs/1505.05770)

### 8. **PixelRNN**

- **Title**: Pixel Recurrent Neural Networks
- **Authors**: Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu
- **Year**: 2016
- **Summary**: Introduced PixelRNNs, a model for generating images pixel by pixel.
- **Link**: [arXiv](https://arxiv.org/abs/1601.06759)

### 9. **Wasserstein GAN**

- **Title**: Wasserstein GAN
- **Authors**: Martin Arjovsky, Soumith Chintala, Léon Bottou
- **Year**: 2017
- **Summary**: Introduced the Wasserstein loss for more stable GAN training.
- **Link**: [arXiv](https://arxiv.org/abs/1701.07875)

### 10. **BigGAN**

- **Title**: Large Scale GAN Training for High Fidelity Natural Image Synthesis
- **Authors**: Andrew Brock, Jeff Donahue, Karen Simonyan
- **Year**: 2018
- **Summary**: Discussed scaling up GANs to generate high-quality images.
- **Link**: [arXiv](https://arxiv.org/abs/1809.11096)

## Graph Neural Networks

### 1. **Spectral Networks and Locally Connected Networks on Graphs**

- **Title**: Spectral Networks and Locally Connected Networks on Graphs
- **Authors**: Joan Bruna, Wojciech Zaremba, Arthur Szlam, Yann LeCun
- **Year**: 2013
- **Summary**: One of the earliest works on graph neural networks, introducing the concept of spectral networks.
- **Link**: [arXiv](https://arxiv.org/abs/1312.6203)

### 2. **Graph Convolutional Networks (GCNs)**

- **Title**: Semi-Supervised Classification with Graph Convolutional Networks
- **Authors**: Thomas N. Kipf, Max Welling
- **Year**: 2016
- **Summary**: Introduced Graph Convolutional Networks, a fundamental architecture for GNNs.
- **Link**: [arXiv](https://arxiv.org/abs/1609.02907)

### 3. **GraphSAGE**

- **Title**: Inductive Representation Learning on Large Graphs
- **Authors**: William L. Hamilton, Rex Ying, Jure Leskovec
- **Year**: 2017
- **Summary**: Introduced GraphSAGE, a method for inductive learning on graphs.
- **Link**: [arXiv](https://arxiv.org/abs/1706.02216)

### 4. **GAT (Graph Attention Networks)**

- **Title**: Graph Attention Networks
- **Authors**: Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, Yoshua Bengio
- **Year**: 2017
- **Summary**: Introduced Graph Attention Networks, integrating attention mechanisms into GNNs.
- **Link**: [arXiv](https://arxiv.org/abs/1710.10903)

### 5. **Graph Neural Networks with Differentiable Pooling**

- **Title**: Hierarchical Graph Representation Learning with Differentiable Pooling
- **Authors**: Rex Ying, Jiaxuan You, Christopher Morris, Xiang Ren, William L. Hamilton, Jure Leskovec
- **Year**: 2018
- **Summary**: Introduced differentiable pooling layers for learning hierarchical representations of graphs.
- **Link**: [arXiv](https://arxiv.org/abs/1806.08804)

### 6. **ChebNet**

- **Title**: Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering
- **Authors**: Michaël Defferrard, Xavier Bresson, Pierre Vandergheynst
- **Year**: 2016
- **Summary**: Introduced ChebNet, which uses Chebyshev polynomials for spectral graph convolutions.
- **Link**: [arXiv](https://arxiv.org/abs/1606.09375)

### 7. **Graph Isomorphism Networks (GIN)**

- **Title**: How Powerful are Graph Neural Networks?
- **Authors**: Keyulu Xu, Weihua Hu, Jure Leskovec, Stefanie Jegelka
- **Year**: 2018
- **Summary**: Investigated the expressive power of GNNs and introduced Graph Isomorphism Networks.
- **Link**: [arXiv](https://arxiv.org/abs/1810.00826)

### 8. **Message Passing Neural Network (MPNN)**

- **Title**: Neural Message Passing for Quantum Chemistry
- **Authors**: Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, George E. Dahl
- **Year**: 2017
- **Summary**: Introduced the Message Passing Neural Network, a framework for learning on graphs.
- **Link**: [arXiv](https://arxiv.org/abs/1704.01212)

### 9. **Dynamic Graph CNN for Learning on Point Clouds**

- **Title**: Dynamic Graph CNN for Learning on Point Clouds
- **Authors**: Yue Wang, Yongbin Sun, Ziwei Liu, Sanjay E. Sarma, Michael M. Bronstein, Justin M. Solomon
- **Year**: 2018
- **Summary**: Extended GNNs to unstructured point clouds, often used in 3D vision tasks.
- **Link**: [arXiv](https://arxiv.org/abs/1801.07829)

### 10. **Relational Graph Convolutional Networks**

- **Title**: Modeling Relational Data with Graph Convolutional Networks
- **Authors**: Michael Schlichtkrull, Thomas N. Kipf, Peter Bloem, Rianne van den Berg, Ivan Titov, Max Welling
- **Year**: 2017
- **Summary**: Extended GCNs to relational data, which is particularly useful for knowledge graphs.
- **Link**: [arXiv](https://arxiv.org/abs/1703.06103)

## Fairness in Machine Learning

### **1. Fairness Definitions Explained**

- **Title**: Fairness Definitions Explained
- **Authors**: Sahil Verma, Julia Rubin
- **Year**: 2018
- **Summary**: Provides a comprehensive overview of various fairness definitions in machine learning.
- **Link**: [Umass](https://fairware.cs.umass.edu/papers/Verma.pdf)

### **2. Fairness Through Awareness**

- **Title**: Fairness Through Awareness
- **Authors**: Cynthia Dwork, Moritz Hardt, Toniann Pitassi, Omer Reingold, Richard Zemel
- **Year**: 2011
- **Summary**: Introduced the concept of "individual fairness."
- **Link**: [arXiv](https://arxiv.org/abs/1104.3913)

### **3. Equality of Opportunity in Supervised Learning**

- **Title**: Equality of Opportunity in Supervised Learning
- **Authors**: Moritz Hardt, Eric Price, Nathan Srebro
- **Year**: 2016
- **Summary**: Introduces the notion of equality of opportunity in the context of classification.
- **Link**: [arXiv](https://arxiv.org/abs/1610.02413)

## Explainability in Machine Learning

### **1. Local Interpretable Model-agnostic Explanations (LIME)**

- **Title**: "Why Should I Trust You?": Explaining the Predictions of Any Classifier
- **Authors**: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
- **Year**: 2016
- **Summary**: Introduced LIME, a framework for explaining individual predictions.
- **Link**: [arXiv](https://arxiv.org/abs/1602.04938)

### **2. SHAP (SHapley Additive exPlanations)**

- **Title**: A Unified Approach to Interpreting Model Predictions
- **Authors**: Scott Lundberg, Su-In Lee
- **Year**: 2017
- **Summary**: Introduced SHAP values based on game theory for model explanation.
- **Link**: [arXiv](https://arxiv.org/abs/1705.07874)

### **3. Interpretable Decision Sets**

- **Title**: Interpretable Decision Sets: A Joint Framework for Description and Prediction
- **Authors**: Himabindu Lakkaraju, Stephen H. Bach, Jure Leskovec
- **Year**: 2016
- **Summary**: Focuses on generating interpretable decision sets for classification.
- **Link**: [Stanford](https://www-cs-faculty.stanford.edu/people/jure/pubs/interpretable-kdd16.pdf)

### **4. Anchors: High-Precision Model-Agnostic Explanations**

- **Title**: Anchors: High-Precision Model-Agnostic Explanations
- **Authors**: Marco Tulio Ribeiro, Sameer Singh, Carlos Guestrin
- **Year**: 2018
- **Summary**: Proposes a method for creating "anchor" explanations that are locally sufficient conditions for predictions.
- **Link**: [Washington](https://homes.cs.washington.edu/~marcotcr/aaai18.pdf)

### **5. Counterfactual Explanations**

- **Title**: Counterfactual Explanations without Opening the Black Box: Automated Decisions and the GDPR
- **Authors**: Sandra Wachter, Brent Mittelstadt, Chris Russell
- **Year**: 2017
- **Summary**: Discusses counterfactual explanations in the context of GDPR.
- **Link**: [arXiv](https://arxiv.org/abs/1711.00399)

### **6. Explainability for Neural Networks**

- **Title**: Towards A Rigorous Science of Interpretable Machine Learning
- **Authors**: Finale Doshi-Velez, Been Kim
- **Year**: 2017
- **Summary**: Discusses the challenges of defining and rigorously evaluating interpretability in machine learning.
- **Link**: [arXiv](https://arxiv.org/abs/1702.08608)

### **7. Towards Fairness in Visual Recognition**

- **Title**: Towards Fairness in Visual Recognition: Effective Strategies for Bias Mitigation
- **Authors**: Zeyu Wang, Klint Qinami, Ioannis Christos Karakozis, Kyle Genova, Prem Nair, Kenji Hata, Olga Russakovsky
- **Year**: 2019
- **Summary**: Discusses fairness issues in computer vision and proposes bias mitigation strategies.
- **Link**: [CVF](https://openaccess.thecvf.com/content_CVPR_2020/html/Wang_Towards_Fairness_in_Visual_Recognition_Effective_Strategies_for_Bias_Mitigation_CVPR_2020_paper.html)