├── LICENSE
└── README.md

/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2018 Marcel Edmund Franke

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Machine learning research papers

Collection of machine learning research paper references

## LLM (Large language model)

* [Self-Rewarding Language Models](https://arxiv.org/pdf/2401.10020.pdf)
* [Meta Large Language Model Compiler: Foundation Models of Compiler Optimization](https://ai.meta.com/research/publications/meta-large-language-model-compiler-foundation-models-of-compiler-optimization)

## Math

* [A Beginner's Guide to the Mathematics of Neural Networks](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.161.3556&rep=rep1&type=pdf&fbclid=IwAR3OWInStoLwXtfjglO2XeQj1X7NNHBKPzzEou4At4GeYVGpx_zDkUEliz4)
* [Mathematics of Deep Learning](https://arxiv.org/abs/1712.04741)
* [The Matrix Calculus You Need For Deep Learning](https://arxiv.org/abs/1802.01528)
* [A guide to convolution arithmetic for deep learning](https://arxiv.org/abs/1603.07285)
* [Deep Learning: An Introduction for Applied Mathematicians](https://arxiv.org/abs/1801.05894) - page 23

## Deep learning

* [Recent Advances in Deep Learning: An Overview](https://arxiv.org/abs/1807.08169)
* [Deep learning review](https://www.cs.toronto.edu/~hinton/absps/NatureDeepReview.pdf)
* [Understanding deep learning requires rethinking generalization](https://arxiv.org/abs/1611.03530)
* [Learning the Number of Neurons in Deep Networks](https://arxiv.org/abs/1611.06321)
* [Lifelong Learning with Dynamically Expandable Networks](https://arxiv.org/abs/1708.01547)
* [Dropout: a simple way to prevent neural networks from overfitting](http://jmlr.org/papers/volume15/srivastava14a/srivastava14a.pdf)
* [Self-Attentive Pooling for Efficient Deep Learning](https://arxiv.org/abs/2209.07659)

## GAN

* [StackGAN: Text to Photo-realistic Image Synthesis with Stacked Generative Adversarial Networks](https://arxiv.org/abs/1612.03242)
* [Self-Attention Generative Adversarial Networks](https://arxiv.org/abs/1805.08318)

## Neuro evolution

* [Neural Architecture Search with Reinforcement Learning](https://arxiv.org/abs/1611.01578)
* [Large-Scale Evolution of Image Classifiers](https://arxiv.org/pdf/1703.01041.pdf)
* [AutoAugment: Learning Augmentation Policies from Data](https://arxiv.org/abs/1805.09501)
* [Designing Neural Network Architectures using Reinforcement Learning](https://arxiv.org/abs/1611.02167)
* [Learning Transferable Architectures for Scalable Image Recognition](https://arxiv.org/abs/1707.07012)
* [Deep Neuroevolution: Genetic Algorithms are a Competitive Alternative for Training Deep Neural Networks for Reinforcement Learning](https://arxiv.org/abs/1712.06567)
* [MorphNet: Fast & Simple Resource-Constrained Structure Learning of Deep Networks](https://arxiv.org/abs/1711.06798)

## Gradient descent

* [An overview of gradient descent optimization algorithms](https://arxiv.org/abs/1609.04747)

## Word embedding

* [Distributed Representations of Words and Phrases and their Compositionality](https://arxiv.org/abs/1310.4546)
* [Linguistic Regularities in Continuous Space Word Representations](https://www.aclweb.org/anthology/N13-1090)
* [A Neural Probabilistic Language Model](http://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf)
* [GloVe: Global Vectors for Word Representation](https://nlp.stanford.edu/pubs/glove.pdf)
* [Efficient Estimation of Word Representations in Vector Space](https://arxiv.org/pdf/1301.3781.pdf)
* [Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings](https://arxiv.org/abs/1607.06520)
* [FastText.zip: Compressing text classification models](https://arxiv.org/abs/1612.03651)
* [BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)

## CNN

* [Siamese Neural Networks for One-shot Image Recognition](https://www.cs.cmu.edu/~rsalakhu/papers/oneshot1.pdf)
* [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf)
* [Multi-column Deep Neural Networks for Image Classification](https://arxiv.org/abs/1202.2745)
* [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/abs/1409.1556)
* [Rethinking the Inception Architecture for Computer Vision](https://arxiv.org/abs/1512.00567)
* [Deep residual learning for image recognition](https://arxiv.org/abs/1512.03385)
* [Network In Network](https://arxiv.org/pdf/1312.4400.pdf)
* [Going Deeper with Convolutions](https://arxiv.org/abs/1409.4842)
* [OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks](https://arxiv.org/pdf/1312.6229.pdf)
* [You Only Look Once: Unified, Real-Time Object Detection](https://arxiv.org/abs/1506.02640)
* [FaceNet: A Unified Embedding for Face Recognition and Clustering](https://arxiv.org/pdf/1503.03832.pdf)
* [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901)
* [A Neural Algorithm of Artistic Style](https://arxiv.org/abs/1508.06576)
* [Convolutional Sequence to Sequence Learning](https://arxiv.org/abs/1705.03122)
* [Deformable Convolutional Networks](https://arxiv.org/abs/1703.06211)
* [Deep Photo Style Transfer](https://arxiv.org/abs/1703.07511)
* [Wide Residual Networks](https://arxiv.org/abs/1605.07146)
* [WaveNet: A Generative Model for Raw Audio](https://arxiv.org/abs/1609.03499)
* [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993)
* [Resnet in Resnet: Generalizing Residual Architectures](https://arxiv.org/abs/1603.08029)

## RL

* [Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm](https://arxiv.org/pdf/1712.01815.pdf)
* [RL Overview](https://arxiv.org/abs/1701.07274)

## GRU

* [Gated Feedback Recurrent Neural Networks](https://arxiv.org/abs/1502.02367)

## RNN

* [DRAW: A Recurrent Neural Network For Image Generation](https://arxiv.org/abs/1502.04623)
* [Playing Atari with Deep Reinforcement Learning](https://arxiv.org/abs/1312.5602)
* [Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling](https://arxiv.org/pdf/1412.3555.pdf)
* [Sequence to Sequence Learning with Neural Networks](https://arxiv.org/abs/1409.3215)
* [Learning Phrase Representations using RNN Encoder-Decoder for Statistical Machine Translation](https://arxiv.org/abs/1406.1078)
* [Neural Machine Translation by Jointly Learning to Align and Translate](https://arxiv.org/abs/1409.0473)
* [SQLNet: Generating Structured Queries From Natural Language Without Reinforcement Learning](https://arxiv.org/abs/1711.04436)

## Graph & Neural networks

* [Relational inductive biases, deep learning, and graph networks](https://arxiv.org/abs/1806.01261)
* [Interaction Networks for Learning about Objects, Relations and Physics](https://arxiv.org/pdf/1612.00222.pdf)
* [Graph neural networks](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.1015.7227&rep=rep1&type=pdf) - Page 7
* [Recurrent Relational Networks](https://arxiv.org/abs/1711.08028)
* [Graph Capsule Convolutional Neural Networks](https://arxiv.org/abs/1805.08090)
* [Graph Neural Networks for Ranking Web Pages](https://www.researchgate.net/publication/221158677_Graph_Neural_Networks_for_Ranking_Web_Pages)
* [Graph Convolutional Neural Networks for Web-Scale Recommender Systems](https://arxiv.org/abs/1806.01973)

## Neural Module Networks

* [Neural Module Networks](https://arxiv.org/abs/1511.02799)
* [End-To-End Memory Networks](https://arxiv.org/pdf/1503.08895.pdf)
* [Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN)](https://arxiv.org/abs/1412.6632)
* [Show and Tell: A Neural Image Caption Generator](https://arxiv.org/abs/1411.4555)

## Memory Networks

* [Memory Networks](https://arxiv.org/pdf/1410.3916.pdf)

## General Models

* [One Model To Learn Them All](https://arxiv.org/abs/1706.05137)

## Neural Programmer-Interpreters

* [Neural Programmer-Interpreters](https://arxiv.org/abs/1511.06279)
* [Learning Simple Algorithms from Examples](https://arxiv.org/abs/1511.07275)
* [pix2code: Generating Code from a Graphical User Interface Screenshot](https://arxiv.org/abs/1705.07962)
* [DeepCoder: Learning to Write Programs](https://arxiv.org/abs/1611.01989)
* [A deep language model for software code](https://arxiv.org/abs/1608.02715v1)
* [Tree-to-tree Neural Networks for Program Translation](https://arxiv.org/abs/1802.03691)
* [Unsupervised Translation of Programming Languages](https://arxiv.org/abs/2006.03511)
* [TRANX: A Transition-based Neural Abstract Syntax Parser for Semantic Parsing and Code Generation](https://arxiv.org/abs/1810.02720)
* [TransCoder-IR: Code Translation with Compiler Representations](https://arxiv.org/abs/2207.03578)

## Database

* [SageDB: A Learned Database System](http://cidrdb.org/cidr2019/papers/p117-kraska-cidr19.pdf)

## Cache

* [Feedforward Neural Networks for Caching: Enough or Too Much?](https://arxiv.org/abs/1810.06930)

## Activations

* [Maxout networks](https://arxiv.org/pdf/1302.4389v4.pdf)

## Other

* [Event detection in Twitter: A keyword volume approach](https://arxiv.org/abs/1901.00570)
* [Bagging](https://www.stat.berkeley.edu/~breiman/bagging.pdf)
* [Stack Overflow Considered Harmful? The Impact of Copy&Paste on Android Application Security](https://www.researchgate.net/publication/317919491_Stack_Overflow_Considered_Harmful_The_Impact_of_CopyPaste_on_Android_Application_Security)
* [DEXTER: Large-Scale Discovery and Extraction of Product Specifications on the Web](http://www.vldb.org/pvldb/vol8/p2194-qiu.pdf)

## Robotics

* [End-to-End Learning of Semantic Grasping](https://arxiv.org/abs/1707.01932)

## Machine learning (Articles)

* [Understanding LSTM Networks](https://colah.github.io/posts/2015-08-Understanding-LSTMs/)
* [Conv Nets: A Modular Perspective](https://colah.github.io/posts/2014-07-Conv-Nets-Modular)
* [Understanding Convolutions](http://colah.github.io/posts/2014-07-Understanding-Convolutions/)

## Machine learning (Books)

* [Understanding Machine Learning: From Theory to Algorithms](https://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning/understanding-machine-learning-theory-algorithms.pdf)

--------------------------------------------------------------------------------