# Robust-and-Explainable-machine-learning
Related materials for robust and explainable machine learning.

## Contents

- [Robustness](#robustness)
- [Interpretability](#interpretability)

## Robustness
### Properties
* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199)
Individual units contain no more semantic information than random directions in feature space; adversarial examples generated by box-constrained L-BFGS (optimization-based; sketched below this list).
* [Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images](https://arxiv.org/abs/1412.1897)
Unrecognizable "fooling" images, produced by an evolutionary algorithm, are classified with high confidence.
* [Universal adversarial perturbations](https://arxiv.org/abs/1610.08401)
A single image-agnostic perturbation fools the network on most natural images.
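
The optimization-based attack above can be stated compactly: find a small perturbation r (weighted by a trade-off constant c) that drives the classifier to a chosen target label. Below is a minimal PyTorch sketch of that idea, assuming a classifier `model`, an input batch `x` scaled to [0, 1], and target labels `target`; the original paper additionally line-searches c, which is fixed here.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, target, c=0.1, steps=20):
    """Targeted, optimization-based attack in the spirit of Szegedy et al.:
    find a small perturbation r such that x + r is classified as `target`."""
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.LBFGS([r], max_iter=steps)

    def closure():
        optimizer.zero_grad()
        x_adv = (x + r).clamp(0, 1)  # keep pixels in a valid range
        loss = c * r.norm() + F.cross_entropy(model(x_adv), target)
        loss.backward()
        return loss

    optimizer.step(closure)  # L-BFGS requires a closure
    return (x + r).clamp(0, 1).detach()
```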

### Transferability
* [Delving into Transferable Adversarial Examples and Black-box Attacks](https://arxiv.org/abs/1611.02770)
Examines the transferability of adversarial examples on ImageNet and exploits it to attack black-box systems.

### Attack
* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572)
Fast gradient sign method (FGSM): a single gradient-sign step produces adversarial examples cheaply (sketched below this list).
* [Adversarial Examples In The Physical World](https://arxiv.org/abs/1607.02533)
Printed photos can also fool networks; introduces an iterative extension of FGSM (also sketched below this list).
* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528)
Builds adversarial saliency maps to find the input features most useful for crafting adversarial examples (JSMA).
* [Towards Evaluating the Robustness of Neural Networks](https://arxiv.org/abs/1608.04644)
Strong optimization-based attacks (Carlini & Wagner) under L0, L2, and L-infinity norms.
* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/pdf/1511.04599.pdf)
Generates non-targeted adversarial examples by iteratively stepping toward the closest (linearized) decision boundary using the gradient.
* [Good Word Attacks on Statistical Spam Filters](http://www.egov.ufsc.br/portal/sites/default/files/anexos/5867-5859-1-PB.pdf)
* [Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples](https://arxiv.org/abs/1602.02697)
Black-box attack: train a substitute network on the target model's outputs and transfer adversarial examples from it.
* [Simple Black-Box Adversarial Perturbations for Deep Networks](https://arxiv.org/abs/1612.06299)
Black-box attack using greedy local search.
* [Adversarial Manipulation of Deep Representations](https://arxiv.org/abs/1511.05122)
Finds an adversarial image whose internal representations are similar to those of a target image (trivial).
* [Adversarial Diversity and Hard Positive Generation](https://arxiv.org/abs/1605.01775)
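
FGSM and its iterative extension, referenced above, are simple enough to sketch directly. A minimal PyTorch version, assuming a classifier `model`, inputs `x` in [0, 1], true labels `y`, and step sizes `eps`/`alpha` chosen for that data range:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method (Goodfellow et al.): one step of size eps
    in the sign of the gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def iterative_fgsm(model, x, y, eps, alpha=0.01, steps=10):
    """Iterative extension (Kurakin et al.): repeated small FGSM steps,
    clipped back into an eps-ball around the original image."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv
```

The one-step version trades attack strength for speed; the iterative version is stronger but needs several forward/backward passes.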

### Generative Model
* [Adversarial examples for generative models](https://arxiv.org/abs/1702.06832)
Attacks on VAE and VAE-GAN models.
* [Adversarial Images for Variational Autoencoders](https://arxiv.org/abs/1612.00155)
Attacks a VAE through its latent representation: perturb the input so it encodes close to a chosen target (sketched below this list).
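
A rough sketch of that latent-space attack idea, assuming a hypothetical `encode` function that maps an image batch to its latent code (e.g. the posterior mean of the VAE encoder); the perturbation is optimized so the adversarial input encodes near a chosen target image and is therefore reconstructed to resemble it:

```python
import torch

def vae_latent_attack(encode, x, x_target, c=1.0, steps=100, lr=0.01):
    """Latent-space attack on a VAE: perturb x so its latent code
    approaches that of x_target while keeping the perturbation small."""
    with torch.no_grad():
        z_target = encode(x_target)  # latent code to imitate
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        z = encode((x + r).clamp(0, 1))
        loss = (z - z_target).pow(2).sum() + c * r.norm()  # match latents, keep r small
        loss.backward()
        optimizer.step()
    return (x + r).clamp(0, 1).detach()
```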

### Defense
* [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/abs/1511.04508)
Defensive distillation: train a second network on the first network's temperature-softened output probabilities (sketched below this list).
* [Robust Convolutional Neural Networks under Adversarial Noise](https://arxiv.org/abs/1511.06306)
Improves robustness by injecting noise during training.
* [Towards Deep Neural Network Architectures Robust to Adversarial Examples](https://arxiv.org/abs/1412.5068)
Uses an autoencoder to denoise adversarial inputs.
* [On Detecting Adversarial Perturbations](https://arxiv.org/abs/1702.04267)
Detects adversarial perturbations from intermediate-layer activations with a detector subnetwork, generating adversarial images dynamically during training; also proposes an L2-based extension of the iterative fast gradient method.
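
Defensive distillation, listed at the top of this section, amounts to retraining on temperature-softened probabilities. A minimal PyTorch sketch of one student-training step, assuming a `teacher` network already trained at the same temperature `T` and an `optimizer` over the student's parameters (at test time the student is used with T = 1):

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, x, optimizer, T=20.0):
    """One training step of defensive distillation (Papernot et al.):
    the student is trained on the teacher's temperature-softened probabilities."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)  # soft labels from the teacher
    log_probs = F.log_softmax(student(x) / T, dim=1)
    loss = -(soft_targets * log_probs).sum(dim=1).mean()  # cross-entropy with soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```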

### Theoretical Attack
* [Measuring Neural Net Robustness with Constraints](https://arxiv.org/pdf/1605.07262.pdf)
Proposes formal robustness metrics and estimates them by encoding the network as a constraint system.
* [A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples](https://arxiv.org/abs/1612.00334)
* [Blind Attacks on Machine Learners](https://papers.nips.cc/paper/6482-blind-attacks-on-machine-learners)
* [SoK: Towards the Science of Security and Privacy in Machine Learning](https://spqr.eecs.umich.edu/papers/rushanan-sok-oakland14.pdf)
* [Robustness of classifiers: from adversarial to random noise](https://arxiv.org/abs/1608.08967)

## Interpretability

* [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/pdf/1702.08608.pdf)
An overview of what interpretability means and how to evaluate it rigorously.
* [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901)
Visualizes features with a deconvolutional network (deconvnet).
* [Inverting Visual Representations with Convolutional Networks](https://arxiv.org/abs/1506.02753)
Code inversion by learning a decoder network.
* [Understanding Deep Image Representations by Inverting Them](https://arxiv.org/abs/1412.0035)
Code inversion with priors.
* [Synthesizing the preferred inputs for neurons in neural networks via deep generator networks](https://arxiv.org/abs/1605.09304)
Synthesizes preferred inputs for neurons from internal representations, using a GAN generator (deconvolutional) as a learned image prior (related to code inversion).
* [Visualizing Higher-Layer Features of a Deep Network](https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network)
Activation maximization: gradient ascent on the input to maximize a chosen neuron's activation.
* [Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks](https://arxiv.org/pdf/1602.03616.pdf)
Activation maximization that uncovers the multiple facets (feature types) each neuron responds to.
* [Towards Better Analysis of Deep Convolutional Neural Networks](https://arxiv.org/abs/1604.07043)
A useful visual-analytics tool; represents each neuron by the image patches that activate it most strongly.
* [Object Detectors Emerge in Deep Scene CNNs](https://arxiv.org/abs/1412.6856)
Visualizes neurons by their most strongly activating images and the corresponding receptive fields.
* [Visualizing Deep Neural Network Decisions: Prediction Difference Analysis](https://arxiv.org/abs/1702.04595)
A general method to visualize image regions that support or oppose a prediction (attention); it can also be used to visualize neurons.
* [Striving for Simplicity: The All Convolutional Net](https://arxiv.org/pdf/1412.6806.pdf)
Guided backpropagation for visualizing learned features.
* [Network Dissection: Quantifying Interpretability of Deep Visual Representations](https://arxiv.org/pdf/1704.05796.pdf)
A dataset with pixel-level concept annotations used to quantify the interpretability of individual neurons (via IoU with concept masks).
* [Do semantic parts emerge in Convolutional Neural Networks?](https://arxiv.org/pdf/1607.03738.pdf)
Studies whether semantic object parts emerge in CNN filters, using part-detection datasets.
* [Learning Deep Features for Discriminative Localization](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)
Class activation mapping (CAM) for weakly supervised localization (sketched below this list).
* [Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391)
Gradient-based extension of CAM, applied to captioning and VQA.
* [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://arxiv.org/pdf/1312.6034.pdf)
Visualizes class-specific representations in the input space (activation maximization) and uses gradient information to compute saliency maps; gradient magnitudes indicate input importance.
* [Towards Transparent AI Systems: Interpreting Visual Question Answering Models](http://icmlviz.github.io/assets/papers/22.pdf)
Interprets VQA answers by finding the important image regions and question words.
* [Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?](http://icmlviz.github.io/assets/papers/17.pdf)
Compares the attention regions produced by humans with those of attention models on the VQA task.
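
Class activation mapping, used by the two localization papers above, needs only the last convolutional feature maps and the weights of the final linear layer. A minimal PyTorch sketch, assuming a network that ends in global average pooling followed by a single linear classifier:

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """Class activation mapping (Zhou et al.).
    features:  conv feature maps for one image, shape (C, H, W)
    fc_weight: weight of the final linear layer, shape (num_classes, C)
    Returns an (H, W) map of each location's contribution to the class score."""
    weights = fc_weight[class_idx]                      # (C,)
    cam = torch.einsum("c,chw->hw", weights, features)  # weighted sum over channels
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)                     # normalize to [0, 1]; upsample for display
```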

### Justification
* [Generating Visual Explanations](https://arxiv.org/abs/1603.08507)
Generates textual explanations for fine-grained bird classification.
* [Attentive Explanations: Justifying Decisions and Pointing to the Evidence](https://arxiv.org/abs/1612.04757)
Justifies decisions by generating a natural-language sentence and pointing to the supporting image regions (attention) in the VQA task.

### Generative Models
* [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)
Learns interpretable latent variables in a VAE.
* [InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)
Learns interpretable, disentangled latent codes in a GAN by maximizing mutual information between the codes and the generated samples.