# Robust-and-Explainable-machine-learning
Related materials for robust and explainable machine learning

## Contents

- [Robustness](#robustness)
- [Interpretability](#interpretability)

## Robustness
### Properties
* [Intriguing properties of neural networks](https://arxiv.org/abs/1312.6199)
Individual units carry no special semantic information; adversarial examples are generated by L-BFGS (optimization based); see the sketch after this list.
* [Deep Neural Networks are Easily Fooled: High Confidence Predictions for Unrecognizable Images](https://arxiv.org/abs/1412.1897)
Fooling images generated by an evolutionary algorithm.
* [Universal adversarial perturbations](https://arxiv.org/abs/1610.08401)
A single universal perturbation can fool the network on most images.
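
The optimization-based attack from the first paper can be summarized in a few lines. Below is a minimal PyTorch sketch of a simplified variant (the original uses box-constrained L-BFGS with a line search over the penalty weight `c`); the classifier `model`, the input batch `x` in [0, 1], and the target labels (a tensor of class indices) are assumed to be given.

```python
import torch
import torch.nn.functional as F

def targeted_lbfgs_attack(model, x, target, c=0.1, steps=50):
    # Minimize  c * ||r||_2^2 + cross_entropy(model(x + r), target)
    # over the perturbation r, keeping x + r inside the valid pixel range.
    r = torch.zeros_like(x, requires_grad=True)
    opt = torch.optim.LBFGS([r], max_iter=steps)

    def closure():
        opt.zero_grad()
        adv = (x + r).clamp(0.0, 1.0)
        loss = c * r.pow(2).sum() + F.cross_entropy(model(adv), target)
        loss.backward()
        return loss

    opt.step(closure)
    return (x + r).clamp(0.0, 1.0).detach()
```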

### Transferability
* [Delving into Transferable Adversarial Examples and Black-box Attacks](https://arxiv.org/abs/1611.02770)
Examines transferability on the ImageNet dataset and uses this property to attack black-box systems.

### Attack
* [Explaining and Harnessing Adversarial Examples](https://arxiv.org/abs/1412.6572)
Fast gradient sign method (FGSM); see the sketch after this list.
* [Adversarial Examples In The Physical World](https://arxiv.org/abs/1607.02533)
Printed photos can also fool networks; introduces an iterative method (an extension of FGSM).
* [The Limitations of Deep Learning in Adversarial Settings](https://arxiv.org/abs/1511.07528)
Finds the salient input features that are most useful for crafting adversarial examples.
* [Towards Evaluating the Robustness of Neural Networks](https://arxiv.org/abs/1608.04644)
Optimization-based attack.
* [DeepFool: a simple and accurate method to fool deep neural networks](https://arxiv.org/pdf/1511.04599.pdf)
A method for generating non-targeted adversarial examples: finds the closest decision boundary using gradient information.
* [Good Word Attacks on Statistical Spam Filters](http://www.egov.ufsc.br/portal/sites/default/files/anexos/5867-5859-1-PB.pdf)
* [Practical Black-Box Attacks against Deep Learning Systems using Adversarial Examples](https://arxiv.org/abs/1602.02697)
Black-box attack using a substitute network.
* [Simple Black-Box Adversarial Perturbations for Deep Networks](https://arxiv.org/abs/1612.06299)
Black-box attack using greedy search.
* [Adversarial Manipulation of Deep Representations](https://arxiv.org/abs/1511.05122)
Finds an adversarial image whose internal representations are similar to those of a target image (trivial).
* [Adversarial Diversity and Hard Positive Generation](https://arxiv.org/abs/1605.01775)
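
Most of the gradient-based attacks above share the same core step. A minimal PyTorch sketch of the fast gradient sign method is given below; a classifier `model` returning logits, an input batch `x` in [0, 1], and the true labels `y` are assumed to be provided. The iterative extension repeats this step with a smaller step size while projecting back into the epsilon-ball.

```python
import torch
import torch.nn.functional as F

def fgsm_attack(model, x, y, epsilon=0.03):
    # One-step fast gradient sign method: perturb each pixel by +/- epsilon
    # in the direction that increases the classification loss.
    x_adv = x.clone().detach().requires_grad_(True)
    loss = F.cross_entropy(model(x_adv), y)
    loss.backward()
    x_adv = x_adv + epsilon * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()
```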

### Generative Model
* [Adversarial examples for generative models](https://arxiv.org/abs/1702.06832)
Attacks VAEs and VAE-GANs.
* [Adversarial Images for Variational Autoencoders](https://arxiv.org/abs/1612.00155)
Attacks VAEs through their latent representations.

### Defense
* [Distillation as a Defense to Adversarial Perturbations against Deep Neural Networks](https://arxiv.org/abs/1511.04508)
Trains a second network on the soft target labels produced by a first network; see the sketch after this list.
* [Robust Convolutional Neural Networks under Adversarial Noise](https://arxiv.org/abs/1511.06306)
Improves robustness by injecting noise during training.
* [Towards Deep Neural Network Architectures Robust to Adversarial Examples](https://arxiv.org/abs/1412.5068)
Uses an autoencoder to denoise inputs.
* [On Detecting Adversarial Perturbations](https://arxiv.org/abs/1702.04267)
Detects adversarial perturbations in intermediate layers with a detector network and dynamically generates adversarial images during training. Also proposes a fast gradient method based on the L2 norm, extending the iterative method.
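
Defensive distillation, the first defense above, boils down to training a second network on softened labels. Below is a minimal sketch of that soft-label objective, assuming `teacher_logits` come from a first network trained at the same temperature `T`; the paper's exact training recipe differs in details.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, T=20.0):
    # Cross-entropy between the teacher's softened class probabilities
    # (softmax at temperature T) and the student's predictions.
    soft_targets = F.softmax(teacher_logits / T, dim=1)
    log_probs = F.log_softmax(student_logits / T, dim=1)
    return -(soft_targets * log_probs).sum(dim=1).mean()
```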

### Theoretical Attack
* [Measuring Neural Net Robustness with Constraints](https://arxiv.org/pdf/1605.07262.pdf)
Proposes a measure of robustness.
* [A Theoretical Framework for Robustness of (Deep) Classifiers against Adversarial Examples](https://arxiv.org/abs/1612.00334)
* [Blind Attacks on Machine Learners](https://papers.nips.cc/paper/6482-blind-attacks-on-machine-learners)
* [SoK: Towards the Science of Security and Privacy in Machine Learning](https://spqr.eecs.umich.edu/papers/rushanan-sok-oakland14.pdf)
* [Robustness of classifiers: from adversarial to random noise](https://arxiv.org/abs/1608.08967)

## Interpretability

* [Towards A Rigorous Science of Interpretable Machine Learning](https://arxiv.org/pdf/1702.08608.pdf)
An overview of interpretability.
* [Visualizing and Understanding Convolutional Networks](https://arxiv.org/abs/1311.2901)
Deconvolution-based visualization.
* [Inverting Visual Representations with Convolutional Networks](https://arxiv.org/abs/1506.02753)
Code inversion by learning a decoder network.
* [Understanding Deep Image Representations by Inverting Them](https://arxiv.org/abs/1412.0035)
Code inversion with image priors.
* [Synthesizing the preferred inputs for neurons in neural networks via deep generator networks](https://arxiv.org/abs/1605.09304)
Synthesizes an image from internal representations, using a GAN-trained generator (deconvolution) to provide image priors (similar to code inversion).
* [Visualizing Higher-Layer Features of a Deep Network](https://www.researchgate.net/publication/265022827_Visualizing_Higher-Layer_Features_of_a_Deep_Network)
Activation maximization.
* [Multifaceted Feature Visualization: Uncovering the Different Types of Features Learned By Each Neuron in Deep Neural Networks](https://arxiv.org/pdf/1602.03616.pdf)
Activation maximization for multifaceted features.
* [Towards Better Analysis of Deep Convolutional Neural Networks](https://arxiv.org/abs/1604.07043)
A useful tool; represents each neuron by the image patches with the highest activations.
* [Object Detectors Emerge in Deep Scene CNNs](https://arxiv.org/abs/1412.6856)
Visualizes neurons by their highest-activating images and the corresponding receptive fields.
* [Visualizing Deep Neural Network Decisions: Prediction Difference Analysis](https://arxiv.org/abs/1702.04595)
A general method for visualizing image regions that support or oppose a prediction (attention); it can also be used to visualize neurons.
* [Striving for Simplicity: The All Convolutional Net](https://arxiv.org/pdf/1412.6806.pdf)
Guided backpropagation.
* [Network Dissection: Quantifying Interpretability of Deep Visual Representations](https://arxiv.org/pdf/1704.05796.pdf)
A new dataset with pixel-level annotations for quantifying the interpretability of individual neurons (via IoU).
* [Do semantic parts emerge in Convolutional Neural Networks?](https://arxiv.org/pdf/1607.03738.pdf)
Studies whether semantic parts emerge in CNNs, using detection datasets.
* [Learning Deep Features for Discriminative Localization](http://cnnlocalization.csail.mit.edu/Zhou_Learning_Deep_Features_CVPR_2016_paper.pdf)
Class activation mapping (CAM) for weakly supervised localization.
* [Grad-CAM: Why did you say that? Visual Explanations from Deep Networks via Gradient-based Localization](https://arxiv.org/abs/1610.02391)
Extension of CAM to captioning and VQA.
* [Deep Inside Convolutional Networks: Visualising Image Classification Models and Saliency Maps](https://arxiv.org/pdf/1312.6034.pdf)
Visualizes class-specific representations in the input space (activation maximization) and uses gradient information to find saliency maps; gradients represent the importance of input pixels. See the sketch after this list.
* [Towards Transparent AI Systems: Interpreting Visual Question Answering Models](http://icmlviz.github.io/assets/papers/22.pdf)
Interprets VQA answers by finding important image regions and question words.
* [Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions?](http://icmlviz.github.io/assets/papers/17.pdf)
Compares the attention regions produced by humans and by attention models on the VQA task.
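
Several of the methods above (saliency maps, guided backpropagation, CAM) reduce to inspecting gradients of a class score with respect to the input or an intermediate layer. Below is a minimal PyTorch sketch of the vanilla gradient saliency map from "Deep Inside Convolutional Networks"; a pretrained `model` returning logits and a single preprocessed image tensor `x` of shape (1, C, H, W) are assumed.

```python
import torch

def gradient_saliency(model, x, target_class):
    # Saliency = per-pixel magnitude of d(class score)/d(input pixels),
    # highlighting the regions the prediction is most sensitive to.
    x = x.clone().detach().requires_grad_(True)
    score = model(x)[0, target_class]
    score.backward()
    return x.grad.abs().max(dim=1)[0]  # max over color channels
```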

### Justification
* [Generating Visual Explanations](https://arxiv.org/abs/1603.08507)
Generates textual explanations for bird classification.
* [Attentive Explanations: Justifying Decisions and Pointing to the Evidence](https://arxiv.org/abs/1612.04757)
Justifies decisions by generating a natural-language sentence and pointing to important image regions (attention) in the VQA task.

### Generative Models
* [Inducing Interpretable Representations with Variational Autoencoders](https://arxiv.org/abs/1611.07492)
Learns interpretable latent variables in a VAE.
* [InfoGAN: Interpretable Representation Learning by Information Maximizing Generative Adversarial Nets](https://arxiv.org/abs/1606.03657)
The GAN counterpart: learns interpretable latent codes by maximizing mutual information.