# Robust-and-Explainable-machine-learning
Related materials for robust and explainable machine learning.

## Contents

- [Robustness](#robustness)
- [Interpretability](#interpretability)

## Robustness
### Properties
* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199)
Individual units contain no more semantic information than random directions in feature space; adversarial examples generated by box-constrained L-BFGS (optimization-based; sketched below this list).
* [Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images](https://arxiv.org/abs/1412.1897)
Unrecognizable "fooling" images, produced by an evolutionary algorithm, are classified with high confidence.
* [Universal adversarial perturbations](https://arxiv.org/abs/1610.08401)
A single image-agnostic perturbation fools the network on most natural images.
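
The optimization-based attack above can be stated compactly: find a small perturbation r (weighted by a trade-off constant c) that drives the classifier to a chosen target label. Below is a minimal PyTorch sketch of that idea, assuming a classifier `model`, an input batch `x` scaled to [0, 1], and target labels `target`; the original paper additionally line-searches c, which is fixed here.

```python
import torch
import torch.nn.functional as F

def lbfgs_attack(model, x, target, c=0.1, steps=20):
    """Targeted, optimization-based attack in the spirit of Szegedy et al.:
    find a small perturbation r such that x + r is classified as `target`."""
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.LBFGS([r], max_iter=steps)

    def closure():
        optimizer.zero_grad()
        x_adv = (x + r).clamp(0, 1)  # keep pixels in a valid range
        loss = c * r.norm() + F.cross_entropy(model(x_adv), target)
        loss.backward()
        return loss

    optimizer.step(closure)  # L-BFGS requires a closure
    return (x + r).clamp(0, 1).detach()
```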

### Transferability
* [Delving into Transferable Adversarial Examples and Black-box Attacks](https://arxiv.org/abs/1611.02770)
Examines the transferability of adversarial examples on ImageNet and exploits it to attack black-box systems.

### Attack
* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572)
Fast gradient sign method (FGSM): a single gradient-sign step produces adversarial examples cheaply (sketched below this list).
* [Adversarial Examples In The Physical World](https://arxiv.org/abs/1607.02533)
Printed photos can also fool networks; introduces an iterative extension of FGSM (also sketched below this list).
* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528)
Builds adversarial saliency maps to find the input features most useful for crafting adversarial examples (JSMA).
* [Towards Evaluating the Robustness of Neural Networks](https://arxiv.org/abs/1608.04644)
Strong optimization-based attacks (Carlini & Wagner) under L0, L2, and L-infinity norms.
* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/pdf/1511.04599.pdf)
Generates non-targeted adversarial examples by iteratively stepping toward the closest (linearized) decision boundary using the gradient.
* [Good Word Attacks on Statistical Spam Filters](http://www.egov.ufsc.br/portal/sites/default/files/anexos/5867-5859-1-PB.pdf)
* [Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples](https://arxiv.org/abs/1602.02697)
Black-box attack: train a substitute network on the target model's outputs and transfer adversarial examples from it.
* [Simple Black-Box Adversarial Perturbations for Deep Networks](https://arxiv.org/abs/1612.06299)
Black-box attack using greedy local search.
* [Adversarial Manipulation of Deep Representations](https://arxiv.org/abs/1511.05122)
Finds an adversarial image whose internal representations are similar to those of a target image (trivial).
* [Adversarial Diversity and Hard Positive Generation](https://arxiv.org/abs/1605.01775)
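
FGSM and its iterative extension, referenced above, are simple enough to sketch directly. A minimal PyTorch version, assuming a classifier `model`, inputs `x` in [0, 1], true labels `y`, and step sizes `eps`/`alpha` chosen for that data range:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y, eps):
    """Fast gradient sign method (Goodfellow et al.): one step of size eps
    in the sign of the gradient of the loss w.r.t. the input."""
    x = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x), y)
    loss.backward()
    return (x + eps * x.grad.sign()).clamp(0, 1).detach()

def iterative_fgsm(model, x, y, eps, alpha=0.01, steps=10):
    """Iterative extension (Kurakin et al.): repeated small FGSM steps,
    clipped back into an eps-ball around the original image."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv = fgsm(model, x_adv, y, alpha)
        x_adv = torch.max(torch.min(x_adv, x + eps), x - eps).clamp(0, 1)
    return x_adv
```

The one-step version trades attack strength for speed; the iterative version is stronger but needs several forward/backward passes.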

### Generative Model
* [Adversarial examples for generative models](https://arxiv.org/abs/1702.06832)
Attacks on VAE and VAE-GAN models.
* [Adversarial Images for Variational Autoencoders](https://arxiv.org/abs/1612.00155)
Attacks a VAE through its latent representation: perturb the input so it encodes close to a chosen target (sketched below this list).
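
A rough sketch of that latent-space attack idea, assuming a hypothetical `encode` function that maps an image batch to its latent code (e.g. the posterior mean of the VAE encoder); the perturbation is optimized so the adversarial input encodes near a chosen target image and is therefore reconstructed to resemble it:

```python
import torch

def vae_latent_attack(encode, x, x_target, c=1.0, steps=100, lr=0.01):
    """Latent-space attack on a VAE: perturb x so its latent code
    approaches that of x_target while keeping the perturbation small."""
    with torch.no_grad():
        z_target = encode(x_target)  # latent code to imitate
    r = torch.zeros_like(x, requires_grad=True)
    optimizer = torch.optim.Adam([r], lr=lr)
    for _ in range(steps):
        optimizer.zero_grad()
        z = encode((x + r).clamp(0, 1))
        loss = (z - z_target).pow(2).sum() + c * r.norm()  # match latents, keep r small
        loss.backward()
        optimizer.step()
    return (x + r).clamp(0, 1).detach()
```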

### Defense
* [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/abs/1511.04508)
Defensive distillation: train a second network on the first network's temperature-softened output probabilities (sketched below this list).
* [Robust Convolutional Neural Networks under Adversarial Noise](https://arxiv.org/abs/1511.06306)
Improves robustness by injecting noise during training.
* [Towards Deep Neural Network Architectures Robust to Adversarial Examples](https://arxiv.org/abs/1412.5068)
Uses an autoencoder to denoise adversarial inputs.
* [On Detecting Adversarial Perturbations](https://arxiv.org/abs/1702.04267)
Detects adversarial perturbations from intermediate-layer activations with a detector subnetwork, generating adversarial images dynamically during training; also proposes an L2-based extension of the iterative fast gradient method.
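
Defensive distillation, listed at the top of this section, amounts to retraining on temperature-softened probabilities. A minimal PyTorch sketch of one student-training step, assuming a `teacher` network already trained at the same temperature `T` and an `optimizer` over the student's parameters (at test time the student is used with T = 1):

```python
import torch
import torch.nn.functional as F

def distillation_step(teacher, student, x, optimizer, T=20.0):
    """One training step of defensive distillation (Papernot et al.):
    the student is trained on the teacher's temperature-softened probabilities."""
    with torch.no_grad():
        soft_targets = F.softmax(teacher(x) / T, dim=1)  # soft labels from the teacher
    log_probs = F.log_softmax(student(x) / T, dim=1)
    loss = -(soft_targets * log_probs).sum(dim=1).mean()  # cross-entropy with soft labels
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```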

### Theoretical Attack
* [Measuring Neural Net Robustness with Constraints](https://arxiv.org/pdf/1605.07262.pdf)
Proposes formal robustness metrics and estimates them by encoding the network as a constraint system.
* [A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples](https://arxiv.org/abs/1612.00334)
* [Blind Attacks on Machine Learners](https://papers.nips.cc/paper/6482-blind-attacks-on-machine-learners)
* [SoK: Towards the Science of Security and Privacy in Machine Learning](https://spqr.eecs.umich.edu/papers/rushanan-sok-oakland14.pdf)
* [Robustness of classifiers: from adversarial to random noise](https://arxiv.org/abs/1608.08967)

## Interpretability

* [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/pdf/1702.08608.pdf)
An overview of what interpretability means and how to evaluate it rigorously.
* [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901)
Visualizes features with a deconvolutional network (deconvnet).
* [Inverting Visual Representations with Convolutional Networks](https://arxiv.org/abs/1506.02753)
Code inversion by learning a decoder network.
* [Understanding Deep Image Representations by Inverting Them](https://arxiv.org/abs/1412.0035)
Code inversion with priors.
* [Synthesizing the preferred inputs for neurons in neural networks via deep generator networks](https://arxiv.org/abs/1605.09304)
Synthesizes preferred inputs for neurons from internal representations, using a GAN generator (deconvolutional) as a learned image prior (related to code inversion).
* [Visualizing Higher-Layer Features of a Deep Network](https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network)
Activation maximization: gradient ascent on the input to maximize a chosen neuron's activation.
* [Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks](https://arxiv.org/pdf/1602.03616.pdf)
Activation maximization that uncovers the multiple facets (feature types) each neuron responds to.
* [Towards Better Analysis of Deep Convolutional Neural Networks](https://arxiv.org/abs/1604.07043)
A useful visual-analytics tool; represents each neuron by the image patches that activate it most strongly.
* [Object Detectors Emerge in Deep Scene CNNs](https://arxiv.org/abs/1412.6856)
Visualizes neurons by their most strongly activating images and the corresponding receptive fields.
* [Visualizing Deep Neural Network Decisions: Prediction Difference Analysis](https://arxiv.org/abs/1702.04595)
A general method to visualize image regions that support or oppose a prediction (attention); it can also be used to visualize neurons.
* [Striving for Simplicity: The All Convolutional Net](https://arxiv.org/pdf/1412.6806.pdf)
Guided backpropagation for visualizing learned features.
* [Network Dissection: Quantifying Interpretability of Deep Visual Representations](https://arxiv.org/pdf/1704.05796.pdf)
A dataset with pixel-level concept annotations used to quantify the interpretability of individual neurons (via IoU with concept masks).
* [Do semantic parts emerge in Convolutional Neural Networks?](https://arxiv.org/pdf/1607.03738.pdf)
Studies whether semantic object parts emerge in CNN filters, using part-detection datasets.
* [Learning Deep Features for Discriminative Localization](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)
Class activation mapping (CAM) for weakly supervised localization (sketched below this list).
* [Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391)
Gradient-based extension of CAM, applied to captioning and VQA.
* [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://arxiv.org/pdf/1312.6034.pdf)
Visualizes class-specific representations in the input space (activation maximization) and uses gradient information to compute saliency maps; gradient magnitudes indicate input importance.
* [Towards Transparent AI Systems: Interpreting Visual Question Answering Models](http://icmlviz.github.io/assets/papers/22.pdf)
Interprets VQA answers by finding the important image regions and question words.
* [Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?](http://icmlviz.github.io/assets/papers/17.pdf)
Compares the attention regions produced by humans with those of attention models on the VQA task.
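
Class activation mapping, used by the two localization papers above, needs only the last convolutional feature maps and the weights of the final linear layer. A minimal PyTorch sketch, assuming a network that ends in global average pooling followed by a single linear classifier:

```python
import torch

def class_activation_map(features, fc_weight, class_idx):
    """Class activation mapping (Zhou et al.).
    features:  conv feature maps for one image, shape (C, H, W)
    fc_weight: weight of the final linear layer, shape (num_classes, C)
    Returns an (H, W) map of each location's contribution to the class score."""
    weights = fc_weight[class_idx]                      # (C,)
    cam = torch.einsum("c,chw->hw", weights, features)  # weighted sum over channels
    cam = cam - cam.min()
    return cam / (cam.max() + 1e-8)                     # normalize to [0, 1]; upsample for display
```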

### Justification
* [Generating Visual Explanations](https://arxiv.org/abs/1603.08507)
Generates textual explanations for fine-grained bird classification.
* [Attentive Explanations: Justifying Decisions and Pointing to the Evidence](https://arxiv.org/abs/1612.04757)
Justifies decisions by generating a natural-language sentence and pointing to the supporting image regions (attention) in the VQA task.

### Generative Models
* [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)
Learns interpretable latent variables in a VAE.
* [InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)
Learns interpretable, disentangled latent codes in a GAN by maximizing mutual information between the codes and the generated samples.