├── cheatsheets
│   ├── gans.tex
│   ├── rnns.tex
│   ├── build.sh
│   ├── bibliography.bib
│   ├── applications.tex
│   ├── deadline-miss-rate.tex
│   ├── autoencoders.tex
│   └── convolutions.tex
├── .mdlrc
├── .gitignore
├── README.md
└── graph.md
/cheatsheets/gans.tex:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/cheatsheets/rnns.tex:
--------------------------------------------------------------------------------

--------------------------------------------------------------------------------
/.mdlrc:
--------------------------------------------------------------------------------
rules "~MD013"
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
*.aux
*.log
*.pdf
*.bbl
*.blg
--------------------------------------------------------------------------------
/cheatsheets/build.sh:
--------------------------------------------------------------------------------
#!/bin/bash
# Build every non-empty cheatsheet and resolve its bibliography.

files=( convolutions autoencoders applications deadline-miss-rate )

for name in "${files[@]}"; do
  # pdflatex -> bibtex -> pdflatex (twice) so that citations and
  # cross-references are fully resolved in the final PDF.
  pdflatex "$name"
  bibtex "$name"
  pdflatex "$name"
  pdflatex "$name"
done
--------------------------------------------------------------------------------
/cheatsheets/bibliography.bib:
--------------------------------------------------------------------------------
@article{Bronstein2017,
  author={{Michael M.} Bronstein and Joan Bruna and Yann LeCun and Arthur Szlam and Pierre Vandergheynst},
  journal={IEEE Signal Processing Magazine},
  title={Geometric Deep Learning: Going beyond Euclidean data},
  year={2017},
  volume={34},
  number={4},
  pages={18--42},
}

@article{Monti2016,
  author={Federico Monti and Davide Boscaini and Jonathan Masci and Emanuele Rodol{\`{a}} and Jan Svoboda and {Michael M.} Bronstein},
  title={Geometric deep learning on graphs and manifolds using mixture model CNNs},
  journal={CoRR},
  year={2016},
}

@mastersthesis{Fey2017,
  title={Convolutional {N}eural {N}etworks auf {G}raphrepr{\"a}sentationen von {B}ildern},
  author={Matthias Fey},
  school={Technische Universit{\"a}t Dortmund},
  year={2017},
}
--------------------------------------------------------------------------------
/cheatsheets/applications.tex:
--------------------------------------------------------------------------------
\documentclass[pdftex,10pt,a4paper]{scrartcl}

\usepackage[a4paper,left=2.5cm,right=2.5cm,bottom=3cm,top=3cm]{geometry}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{url}
\usepackage{todonotes}
\usepackage[numbers,sort]{natbib}
\parindent=0cm

\title{Graph applications}
\date{\vspace{-5ex}}

\begin{document}

\maketitle

\section{Logistics}

Entities in the scenario:
\begin{itemize}
\item Delivery drones (unmanned aerial vehicles for transporting goods)
\item Robotic vehicles for transporting goods
\item Humans (represented as skeletons)
\item Charging stations for the drones
\item Position of the transported goods
\item Cameras?
\end{itemize}

Position and orientation of the robots and of the skeleton are available for analysis via camera.
Many further sensors can be used.

Connecting humans and machines.

\paragraph{(Classification of) critical graphs}

\end{document}

--------------------------------------------------------------------------------
/cheatsheets/deadline-miss-rate.tex:
--------------------------------------------------------------------------------
\documentclass[pdftex,10pt,a4paper]{scrartcl}

\usepackage[a4paper,left=2.5cm,right=2.5cm,bottom=3cm,top=3cm]{geometry}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{url}
\usepackage[numbers,sort]{natbib}
\parindent=0cm

\title{Deadline Miss Rate Analysis}
\date{\vspace{-5ex}}

\begin{document}

\maketitle

\section{Modelling}

The hardware architecture is modelled as a set of processors $P_1, \ldots, P_m$, a set of buses, and the corresponding interconnection topology.
The application is modelled as a set of tasks $\tau_1, \ldots, \tau_n$.
Each task $\tau_i$ is characterized by its period $\pi_i$, its deadline $\delta_i$, its priority, its mapping $m(\tau_i)$, i.e.\ the processor on which every job of task $\tau_i$ executes, and its execution time probability density function (ETPDF) $\epsilon_i$.
The execution times of any two jobs (of the same or of different tasks) are assumed to be statistically independent.
There may exist data dependencies among the tasks.
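
A purely illustrative Python sketch of this system model (the class and field names are invented for this example and are not taken from any existing analysis tool):
\begin{verbatim}
from dataclasses import dataclass
from typing import Callable, List, Tuple

@dataclass
class Task:
    period: float                     # pi_i
    deadline: float                   # delta_i
    priority: int
    mapping: int                      # m(tau_i): index of the processor running the task
    etpdf: Callable[[float], float]   # execution time probability density function

@dataclass
class Architecture:
    processors: List[str]             # P_1, ..., P_m
    buses: List[str]
    topology: List[Tuple[str, str]]   # (processor, bus) interconnections

@dataclass
class Application:
    tasks: List[Task]                 # tau_1, ..., tau_n
    dependencies: List[Tuple[int, int]]   # optional data dependencies among tasks
\end{verbatim}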

\section{Problem}

Given a hardware architecture and an application under the above assumptions, determine the expected deadline miss rate of each task, i.e.\ the expected fraction of jobs that complete after their deadline.

\end{document}

--------------------------------------------------------------------------------
/cheatsheets/autoencoders.tex:
--------------------------------------------------------------------------------
\documentclass[pdftex,10pt,a4paper]{scrartcl}

\usepackage[a4paper,left=2.5cm,right=2.5cm,bottom=3cm,top=3cm]{geometry}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{url}
\usepackage[numbers,sort]{natbib}
\parindent=0cm

\title{Graph Autoencoders}
\date{\vspace{-5ex}}

\begin{document}

\maketitle

\section{Introduction}

An autoencoder consists of three components: the \emph{encoder} $f(x)$, the \emph{code} $h$, and the \emph{decoder} $g(h)$.
The encoder compresses the input and produces the code $h = f(x)$; the decoder then reconstructs the input using only this code, so that $g(f(x)) \approx x$.
The learning process is described simply as minimizing a loss function
\begin{equation*}
L(x, g(f(x))),
\end{equation*}
where $L$ is a loss function penalizing $g(f(x))$ for being dissimilar from $x$ (e.g.\ the mean squared error).
\\\\
Autoencoders are mainly a dimensionality reduction (or compression) algorithm.

\begin{itemize}
\item \textbf{Data-specific:} Autoencoders are only able to meaningfully compress data similar to what they have been trained on.
\item \textbf{Lossy:} The output of the autoencoder will not be exactly the same as the input.
\item \textbf{Unsupervised:} Autoencoders are considered an \emph{unsupervised learning} technique since they do not need explicit labels to train on.
\end{itemize}
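
A minimal illustration in plain NumPy (a sketch only, assuming a single hidden layer and a mean squared error loss; the sizes and names are arbitrary and not tied to any particular framework):
\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n_in, n_code = 64, 8                          # the code h is an 8-dimensional bottleneck
W_enc = rng.normal(0.0, 0.1, (n_in, n_code))
W_dec = rng.normal(0.0, 0.1, (n_code, n_in))

def f(x):                                     # encoder: h = f(x)
    return np.tanh(x @ W_enc)

def g(h):                                     # decoder: reconstruction from the code
    return h @ W_dec

def L(x):                                     # loss L(x, g(f(x))), mean squared error
    return np.mean((x - g(f(x))) ** 2)

x = rng.normal(size=(10, n_in))
print(L(x))                                   # reconstruction error of the untrained model
\end{verbatim}
Training would then minimize $L$ with respect to the encoder and decoder weights, e.g.\ by gradient descent.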

\section{Why?}

\paragraph{Obtaining useful properties}

Copying the input to the output may sound useless, but we are typically not interested in the output of the decoder.
Instead, we hope that training the autoencoder to perform the input copying task will result in the code taking on useful properties.

\paragraph{Denoising autoencoders}

A \emph{denoising autoencoder} (DAE) instead minimizes
\begin{equation*}
L(x, g(f(\tilde{x}))),
\end{equation*}
where $\tilde{x}$ is a copy of $x$ that has been corrupted by some form of noise.
Denoising autoencoders must therefore undo this corruption rather than simply copying their input.

\section{Convolutional autoencoders}

Both the encoder and the decoder are usually fully-connected feedforward neural networks and therefore ignore the 2D image or graph structure.
\emph{Convolutional autoencoders} (CAE) differ from conventional autoencoders in that their weights are shared among all locations in the input, preserving spatial locality.
Therefore, a convolutional autoencoder needs a way to reverse the convolution and pooling operations (deconvolution and unpooling, respectively).
The definition of convolutional autoencoders is relatively straightforward on regular grids, but needs several adjustments and generalizations for non-Euclidean domains.

\end{document}

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Deep Learning Cheatsheet

General | [Graphs](graph.md)

## Introduction

* [Machine Learning Glossary](https://developers.google.com/machine-learning/glossary/)
* [Essential Machine Learning Cheatsheets](https://github.com/kailashahirwar/cheatsheets-ai)
* [Neural Networks and Deep Learning](http://www.neuralnetworksanddeeplearning.com) [Free Online Book]
* [Free Deep Learning Book](http://www.datasciencecentral.com/profiles/blogs/free-deep-learning-book-mit-press) [MIT Press]
* [Andrew Ng's machine learning course at Coursera](https://www.coursera.org/learn/machine-learning) [[Material](http://cs229.stanford.edu/materials.html)]
* [Deep Learning by Google](https://www.udacity.com/course/deep-learning--ud730#)
* [Deep Learning in Neural Networks: An Overview](https://arxiv.org/pdf/1404.7828.pdf)
* [How To Become A Machine Learning Engineer: Learning Path](https://hackernoon.com/learning-path-for-machine-learning-engineer-a7d5dc9de4a4)
* [Awesome Deep Vision](https://www.github.com/kjw0612/awesome-deep-vision)
* [A Guide to Deep Learning](http://yerevann.com/a-guide-to-deep-learning/?utm_campaign=Revue%20newsletter&utm_medium=Newsletter&utm_source=revue)
* [Deep Learning Weekly](http://www.deeplearningweekly.com) [Weekly Newsletter Subscription]
* [A Year of Artificial Intelligence](https://ayearofai.com/) [Blog]
* [Introduction to Convolutional Neural Networks](http://cs.nju.edu.cn/wujx/paper/CNN.pdf)
* [Applied Deep Learning](https://medium.com/towards-data-science/applied-deep-learning-part-1-artificial-neural-networks-d7834f67a4f6)
* [Machine Learning is Fun!](https://medium.com/@ageitgey/machine-learning-is-fun-80ea3ec3c471) [Medium Series]
* [Machine Learning for Humans](https://medium.com/machine-learning-for-humans) [Medium Series]
* [My Neural Network isn't working! What should I do?](http://theorangeduck.com/page/neural-network-not-working)
* [The math of neural networks](http://himarsh.org/the-math-neural-networks/)
* [Everything you need to know about Neural Networks](https://hackernoon.com/everything-you-need-to-know-about-neural-networks-8988c3ee4491)

## Python

* [Learn Python the hard way](https://learnpythonthehardway.org/book/)
* [Python Data Science Handbook](https://jakevdp.github.io/PythonDataScienceHandbook/)
* [Styleguide](https://www.python.org/dev/peps/pep-0008/) [PEP8]
* [TensorFlow](https://www.tensorflow.org) [Open Source Library]
* [PyCUDA](https://mathema.tician.de/software/pycuda/)
* [Cython](http://cython.org/)
* [Pure Python Mode](http://cython.readthedocs.io/en/latest/src/tutorial/pure.html)

### TensorFlow

* [Official](https://www.tensorflow.org/versions/r0.11/tutorials/index.html)
* [Adding a New Op](https://www.tensorflow.org/extend/adding_an_op) [Official]
* [TensorFlow in a Nutshell](http://camron.xyz)
* [Source Code Examples](https://github.com/aymericdamien/TensorFlow-Examples)
* [Effective TensorFlow](https://github.com/vahidk/EffectiveTensorflow) [TensorFlow tutorials and best practices]

### PyTorch

* [Official](http://pytorch.org/)
* [PyTorch Examples](https://github.com/jcjohnson/pytorch-examples)

## Conferences

* [ICLR 2018](https://chillee.github.io/OpenReviewExplorer/)

## Architectures

* [Very Deep Convolutional Networks for Large-Scale Image Recognition](https://arxiv.org/pdf/1409.1556v6.pdf) [VGG]
* [SqueezeNet: AlexNet-level accuracy with 50x fewer parameters and 0.5MB model size](https://arxiv.org/pdf/1602.07360.pdf)
* [Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition](https://arxiv.org/pdf/1406.4729.pdf)
* [Dynamically Expandable Neural Networks](https://buzzrobot.com/dynamically-expandable-neural-networks-ce75ff2b69cf)

### AutoEncoder and GANs

* [Generative Adversarial Networks (GANs): Engine and Applications](https://blog.statsbot.co/generative-adversarial-networks-gans-engine-and-applications-f96291965b47)
* [GANs are Broken in More than One Way: The Numerics of GANs](http://www.inference.vc/my-notes-on-the-numerics-of-gans/)
* [Towards data set augmentation with GANs](https://medium.com/towards-data-science/towards-data-set-augmentation-with-gans-9dd64e9628e6)
* [How does the unpooling and deconvolution work in DeConvNet](https://stackoverflow.com/questions/35049197/how-does-the-unpooling-and-deconvolution-work-in-deconvnet)
* [What are deconvolutional layers?](https://datascience.stackexchange.com/questions/6107/what-are-deconvolutional-layers)
* [Visualizing and Understanding Convolutional Networks](https://arxiv.org/pdf/1311.2901v3.pdf) [Original Unpooling Paper]

## Optimizer

* [A method for stochastic optimization](https://arxiv.org/pdf/1412.6980v8.pdf) [Adam Optimizer]

## Overfitting

* [Dropout: A Simple Way to Prevent Neural Networks from Overfitting](https://www.cs.toronto.edu/~hinton/absps/JMLRdropout.pdf)
* [A Simple Weight Decay Can Improve Generalization](https://papers.nips.cc/paper/563-a-simple-weight-decay-can-improve-generalization.pdf)
* [Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift](https://arxiv.org/pdf/1502.03167.pdf)
* [Batch Normalization - What the hey?](https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b) [Blog]

## Recurrent Neural Networks

* [The Unreasonable Effectiveness of Recurrent Neural Networks](http://karpathy.github.io/2015/05/21/rnn-effectiveness/)

## Datasets

* [Overview](http://deeplearning.net/datasets/)

### Images

* [MNIST](http://yann.lecun.com/exdb/mnist/)
* [Fashion MNIST](https://github.com/zalandoresearch/fashion-mnist)
* [CIFAR-10/100](http://www.cs.toronto.edu/%7Ekriz/cifar.html)
* [STL-10](https://cs.stanford.edu/~acoates/stl10/)
* [SVHN](http://ufldl.stanford.edu/housenumbers/)
* [ImageNet](http://image-net.org/)
* [ILSVRC2014](http://image-net.org/challenges/LSVRC/2014/download-images-5jj5.php)
* [Pascal VOC](http://host.robots.ox.ac.uk/pascal/VOC/)
* [Animals with Attributes](https://cvml.ist.ac.at/AwA2/)

### Meshes

* [MPI FAUST Dataset](http://faust.is.tue.mpg.de/)
* [Tosca](http://tosca.cs.technion.ac.il/book/resources_data.html)

## Classification

* [Image Classification datasets results](http://rodrigob.github.io/are_we_there_yet/build/classification_datasets_results.html)

--------------------------------------------------------------------------------
/graph.md:
--------------------------------------------------------------------------------
# Deep Learning on Graphs

[General](README.md) | Graphs

## Videos

* [New Deep Learning Techniques (Bresson, Monti, Bruna, Bronstein)](http://www.ipam.ucla.edu/programs/workshops/new-deep-learning-techniques/?tab=schedule)

## Introduction

* [How do I generalize convolution of neural networks to graphs?](https://www.quora.com/How-do-I-generalize-convolution-of-neural-networks-to-graphs) [Spatial vs. Spectral]
* [Deep Learning on Graphs](https://figshare.com/articles/Deep_Learning_on_Graphs/4491686) [Slides]
* [Gated Graph Sequence Neural Networks](https://arxiv.org/pdf/1511.05493.pdf)
* [SchNet: A continuous-filter convolutional neural network for modeling quantum interactions](https://arxiv.org/pdf/1706.08566.pdf) (NIPS 2017)

## Geometric Deep Learning

* [Geometric deep learning](http://geometricdeeplearning.com/) [Collection of links]
* [Geometric deep learning: going beyond Euclidean data](https://arxiv.org/pdf/1611.08097.pdf)
* [Geometric deep learning on graphs and manifolds using mixture model CNNs](https://arxiv.org/pdf/1611.08402.pdf)
* [Geodesic convolutional neural networks on Riemannian manifolds](https://arxiv.org/pdf/1501.06297.pdf)
* [Geometric deep learning on graphs](https://www.dropbox.com/s/4l6m32tg9yecvow/CVPR%20GDL.pdf?dl=0) [Slides (228MB)]
* [Robust Spatial Filtering with Graph Convolutional Neural Networks](https://arxiv.org/pdf/1703.00792.pdf) (Graph-CNNs + Graph Embed Pooling) [[Code](https://github.com/fps7806/Graph-CNN)]

## Spectral Approach

* [Spectral Graph Theory](http://www.math.ucsd.edu/~fan/research/revised.html)
* [Wavelets on Graphs via Spectral Graph Theory](https://arxiv.org/pdf/0912.3848.pdf)
* [The Emerging Field of Signal Processing on Graphs](https://arxiv.org/pdf/1211.0053.pdf)
* [Discrete Laplace-Beltrami Operators for Shape Analysis and Segmentation](https://reuter.mit.edu/papers/reuter-smi09.pdf)
* [What's the intuition behind a Laplacian matrix](https://www.quora.com/Graph-Theory-Whats-the-intuition-behind-a-Laplacian-matrix)

### Architectures

* [Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering](https://arxiv.org/pdf/1606.09375.pdf) [[TF Code](https://github.com/mdeff/cnn_graph)] [[PyTorch Code](https://github.com/xbresson/graph_convnets_pytorch)] [[Notebook](http://nbviewer.jupyter.org/github/mdeff/cnn_graph/blob/outputs/usage.ipynb)]
* [Semi-Supervised Classification with Graph Convolutional Networks](https://arxiv.org/pdf/1609.02907v3.pdf) [[Code](https://github.com/tkipf/gcn)] [[Explanation](http://tkipf.github.io/graph-convolutional-networks/)] [[Criticism](http://www.inference.vc/how-powerful-are-graph-convolutions-review-of-kipf-welling-2016-2/)] [[Review](https://openreview.net/forum?id=SJU4ayYgl)]
* [CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters](https://arxiv.org/pdf/1705.07664.pdf)
* [Spectral Networks and Deep Locally Connected Networks on Graphs](https://arxiv.org/pdf/1312.6203.pdf)
* [Convolutional Neural Networks auf Graphrepräsentationen von Bildern](https://github.com/rusty1s/deep-learning-on-graphs/tree/master/masterthesis) [[Code](https://www.github.com/rusty1s/embedded_gcnn)]
* [Deep Convolutional Networks on Graph-Structured Data](https://arxiv.org/pdf/1506.05163.pdf)
* [Convolutional Networks on Graphs for Learning Molecular Fingerprints](https://hips.seas.harvard.edu/files/duvenaud-graphs-nips-2015.pdf) [Node-degree specific weight matrices]
* [Graph image representation from convolutional neural networks](https://www.google.ch/patents/US9418458)
* [Dynamic Graph Convolutional Networks](https://arxiv.org/pdf/1704.06199.pdf)

#### Autoencoder

* [Automatic chemical design using a data-driven continuous representation of molecules](https://arxiv.org/pdf/1610.02415.pdf)
* [Variational Graph Auto-Encoders](https://arxiv.org/abs/1611.07308)
* [GraphVAE: Towards Generation of Small Graphs Using Variational Autoencoders](https://openreview.net/forum?id=SJlhPMWAW)
  * Graphs with a varying number of nodes are encoded and decoded to graphs with $K$ nodes; a graph matching algorithm is used for the reconstruction loss
* [GraphGAN: Generating Graphs via Random Walks](https://openreview.net/forum?id=H15RufWAW)
* [Learning Deep Generative Models of Graphs](https://openreview.net/forum?id=Hy1d-ebAb)
* [Learning Graphical State Transitions](http://www.hexahedria.com/files/2017learninggraphical.pdf)
* [GRASS: Generative Recursive Autoencoders for Shape Structure](https://arxiv.org/abs/1705.02090) [[Reddit](https://www.reddit.com/r/MachineLearning/comments/7j70n8/r_grass_generative_recursive_autoencoders_for/)]
* [Convolutional Mesh Autoencoders for 3D Face Representation](https://openreview.net/forum?id=HJGcNz-0W)
  * Reconstruction of 3D faces with an 8-dimensional latent space
  * Pooling operations are saved for reconstruction
  * Pooling operation based on [surface error approximations using quadric error metrics](https://people.eecs.berkeley.edu/~jrs/meshpapers/GarlandHeckbert2.pdf)

#### RNNs

* [Structural-RNN: Deep Learning on Spatio-Temporal Graphs](http://cvgl.stanford.edu/papers/jain_cvpr16.pdf)

### Propagation rules

* **SGCNN (Spectral Graph Convolutional Neural Network):**
  * $\tilde{L} := L - I$
  * $\overline{F}_0 := F\_{in} W_0$
  * $\overline{F}_1 := \tilde{L} F\_{in} W_1$
  * $\overline{F}_k := \left(2\tilde{L} \overline{F}\_{k-1} - \overline{F}\_{k-2} \right) W_k$
  * $W \in \mathbb{R}^{(K+1) \times M\_{in} \times M\_{out}}$

$$
F_{out} := \sum\_{k=0}^K \overline{F}\_k
$$

* **GCN (Graph Convolutional Network):** (see the NumPy sketch after this section)
  * $\tilde{A} := A + I$
  * $\tilde{D} := D + I$
  * $W \in \mathbb{R}^{M\_{in} \times M\_{out}}$

$$
F_{out} := \tilde{D}^{-\frac{1}{2}} \tilde{A} \tilde{D}^{-\frac{1}{2}} F\_{in} W
$$

* **AGNN (Attention-based Graph Neural Network):** [Link](https://openreview.net/forum?id=rJg4YGWRb)
  * introduces, in addition to GCN, a scalar parameter $\beta$ to allow for a dynamic (changes over the layers with differing $\beta$) and adaptive (learns to weight more relevant neighbors higher) propagation
  * $P_{ij}^{\beta^{(t)}} = \mathrm{softmax}(\beta^{(t)} \cos(H_i^{(t)}, H_j^{(t)}))$ for $j \in \mathcal{N}(i) \cup \lbrace i \rbrace$

$$
H^{(t+1)} = P^{\beta^{(t)}} H^{(t)}
$$

* **EGCNN (Embedded Graph Convolutional Neural Network):**
  * Number of partitions $P$
  * closed B-Spline function $\mathrm{N}^K\_p$ with degree $K$ and node vector $\tau := [\alpha_0, \ldots, \alpha_{P+K}]^{\top}$ with $\alpha_p := 2\pi p / P$
  * $W \in \mathbb{R}^{(P+1) \times M\_{in} \times M\_{out}}$

$$
F_{out} := \tilde{D}^{-1}\_{dist} F\_{in} W\_{P+1} + \sum\_{p=0}^{P-1} \left( \mathrm{N}^K_p(A\_{rad}) \odot \left( \tilde{D}\_{dist}^{-\frac{1}{2}} \tilde{A}\_{dist} \tilde{D}^{-\frac{1}{2}}\_{dist} \right) \right) F\_{in} W\_{p+1}
$$
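
As a sanity check of the GCN rule above, here is a minimal dense NumPy sketch (illustrative only, not any library's API; the variable names follow the formulas above):

```python
import numpy as np

def gcn_layer(A, F_in, W):
    """One GCN propagation step: D~^(-1/2) A~ D~^(-1/2) F_in W (dense, for illustration)."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                       # A~ = A + I (add self-loops)
    d_tilde = A_tilde.sum(axis=1)                 # diagonal of D~
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d_tilde))  # D~^(-1/2)
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt @ F_in @ W

# toy example: path graph with 3 nodes, 2 input and 4 output feature maps
A = np.array([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
F_in = np.random.randn(3, 2)
W = np.random.randn(2, 4)
print(gcn_layer(A, F_in, W).shape)  # (3, 4)
```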
## Spatial Approach

* [Learning Convolutional Neural Networks for Graphs](https://arxiv.org/pdf/1605.05273.pdf) [[Slides](http://www.matlog.net/icml2016_slides.pdf)]
* [Generalizing CNNs for data structured on locations irregularly spaced out](https://arxiv.org/pdf/1606.01166.pdf)

## Isomorphism

* [The Weisfeiler-Lehman Method and Graph Isomorphism Testing](https://arxiv.org/pdf/1101.5211v1.pdf)
* [Practical graph isomorphism, II](https://arxiv.org/pdf/1301.1493v1.pdf)
* [Canonical Labelings with Nauty](https://computationalcombinatorics.wordpress.com/2012/09/20/canonical-labelings-with-nauty/) [[Code](http://pallini.di.uniroma1.it)] [[Python Wrapper](https://web.cs.dal.ca/~peter/software/pynauty/html/index.html)]

## Graph Kernels

* [Graph Kernels](https://edoc.ub.uni-muenchen.de/7169/1/Borgwardt_KarstenMichael.pdf)
* [Image Classification with Segmentation Graph Kernels](http://www.di.ens.fr/~fbach/harchaoui_bach_cvpr07.pdf)
* [Deep Graph Kernels](http://dl.acm.org/citation.cfm?id=2783417)

## Coarsening

* [Weighted Graph Cuts without Eigenvectors: A Multilevel Approach](http://www.cs.utexas.edu/users/inderjit/public_papers/multilevel_pami.pdf)

--------------------------------------------------------------------------------
/cheatsheets/convolutions.tex:
--------------------------------------------------------------------------------
\documentclass[pdftex,10pt,a4paper]{scrartcl}

\usepackage[a4paper,left=2.5cm,right=2.5cm,bottom=3cm,top=3cm]{geometry}
\usepackage{amsmath,amssymb,amsthm}
\usepackage{url}
\usepackage{todonotes}
\usepackage[numbers,sort]{natbib}
\parindent=0cm

\title{Graph Convolutions}
\date{\vspace{-5ex}}

\begin{document}

\maketitle

\section{Preliminaries}

Let $\mathcal{G} = (\mathcal{V}, \mathcal{E}, \mathbf{W})$ be a \emph{weighted graph} with $\mathcal{V} = \{1, \ldots, n\}$, $\mathcal{E} \subseteq \mathcal{V} \times \mathcal{V}$ and \emph{adjacency matrix} $\mathbf{W} \in \mathbb{R}^{n \times n}$, where $W_{ij} > 0$ iff $(i, j) \in \mathcal{E}$ and $W_{ij} = 0$ iff $(i, j) \not\in \mathcal{E}$.
Note that $\mathbf{W}$ is usually \emph{sparse} with $|\mathcal{E}| \ll n^2$ non-zero entries.
$\mathcal{G}$ is called \emph{undirected} iff $W_{ij} = W_{ji}$ for all $i,j \in \mathcal{V}$.
We further assume that $\mathcal{G}$ contains no \emph{self-loops}, meaning $(i, i) \not\in \mathcal{E}$.
$\mathbf{W}$ implies the diagonal \emph{degree matrix} $\mathbf{D} \in \mathbb{R}^{n \times n}$, where $D_{ii} = \sum_{j \in \mathcal{V}} W_{ij}$.
For a node $i \in \mathcal{V}$, its \emph{neighborhood} set is denoted by $\mathcal{N}(i)$.
% \\\\

% The \emph{unnormalized Laplacian} of a weighted undirected graph without self-loops is the $n \times n$ symmetric positive-semidefinite matrix $\mathbf{L} = \mathbf{D} - \mathbf{W}$, where $\mathbf{D} = \mathrm{diag}\left( \sum_{j \in \mathcal{V}} W_{ij}\right)$.
% The \emph{normalized Laplacian}

% $\mathcal{N}(i)$

% A \emph{signal} $f \colon \mathcal{V} \to \mathbb{R}^m$ respectively $\mathbf{F} \in \mathbb{R}^{n \times m}$
% Let $f \colon \mathcal{V} \to \mathbb{R}$ respectively $\mathbf{f} \in \mathbb{R}^n$ be a \emph{feature} on the nodes of the graph.
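
As a concrete (purely illustrative) example of these definitions, the following NumPy/SciPy sketch builds $\mathbf{W}$, $\mathbf{D}$ and the neighborhood sets for a small undirected weighted graph (note that the code is $0$-indexed, while the text uses $\mathcal{V} = \{1, \ldots, n\}$):
\begin{verbatim}
import numpy as np
from scipy import sparse

# undirected triangle graph on three nodes with symmetric edge weights
rows = [0, 1, 1, 2, 0, 2]
cols = [1, 0, 2, 1, 2, 0]
vals = [1.0, 1.0, 2.0, 2.0, 0.5, 0.5]
W = sparse.coo_matrix((vals, (rows, cols)), shape=(3, 3)).tocsr()

degrees = np.asarray(W.sum(axis=1)).ravel()   # D_ii = sum_j W_ij
D = sparse.diags(degrees)

# neighborhood set N(i) of every node i
neighbors = {i: set(W.getrow(i).indices) for i in range(3)}
\end{verbatim}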

\section{Convolutions}

\paragraph{Spectral CNN}

Spectral networks and locally connected networks on graphs.
Bruna et al.\ 2014.

\paragraph{Smooth Spectral CNN}

Deep convolutional networks on graph-structured data.
Henaff 2015.

\paragraph{ChebNet}

Defferrard.

\paragraph{GCN}

Kipf.

\paragraph{CayleyNet}

Graph convolutional neural networks with complex rational spectral filters.
Spectral CNN with complex rational filters.
Levie et al.\ 2017.

\paragraph{Graph NN (GNN)}

First neural network on graphs with this name.
A new model for learning in graph domains.
Gori, Monfardini.

\paragraph{Diffusion CNN (DCNN)}

Atwood 2016.
Diffusion-convolutional neural networks.

\paragraph{Geodesic CNN (GCNN)}

Geodesic convolutional neural networks on Riemannian manifolds.
First spatial CNN on manifolds.
Masci 2015.

\paragraph{Anisotropic CNN (ACNN)}

Learning shape correspondence with anisotropic convolutional neural networks.
Boscaini 2015.

\paragraph{MoNet}

Let $\mathbf{u} \colon \mathcal{V} \times \mathcal{V} \to \mathbb{R}^d$ define a $d$-dimensional vector of \emph{pseudo-coordinates}.
\emph{MoNet}~\cite{Monti2016} then uses the generic \emph{patch operator}
\begin{equation*}
D_k(i)f = \sum_{j \in \mathcal{N}(i)} w_k(\mathbf{u}(i, j)) f(j), \quad k \in \{ 1, \ldots, K \},
\end{equation*}
for the spatial definition of a convolution of a graph signal $f \colon \mathcal{V} \to \mathbb{R}$ respectively $\mathbf{f} \in \mathbb{R}^n$ with a filter $\mathbf{g} = (g_1, \ldots, g_K)$
\begin{equation*}
(\mathbf{f} \star \mathbf{g})(i) = \sum_{k=1}^K g_k \, D_k(i)f,
\end{equation*}
where $K \in \mathbb{N}$ is the dimensionality of the extracted patch and $\mathbf{w}(\mathbf{u}) = (w_1(\mathbf{u}), \ldots, w_K(\mathbf{u}))$ is a continuous kernel parametrized by some finite set of learnable parameters.
MoNet~\cite{Monti2016} suggests the use of the \emph{Gaussian kernel}
\begin{equation*}
w_k(\mathbf{u}) = \exp \left(-\frac{1}{2} {(\mathbf{u} - \boldsymbol{\mu}_k)}^{\top} {\mathrm{diag}(\boldsymbol{\sigma}_k)}^{-1} (\mathbf{u} - \boldsymbol{\mu}_k) \right)
\end{equation*}
with the diagonal covariance matrix $\mathrm{diag}(\boldsymbol{\sigma}_k) \in \mathbb{R}^{d \times d}$ and mean vector $\boldsymbol{\mu}_k \in \mathbb{R}^d$, resulting in $2Kd$ learnable parameters.
For arbitrary graphs one can choose the pseudo-coordinates based on the degrees of the nodes, $\mathbf{u}(i,j) = \left( D_{ii}^{-1/2}, D_{jj}^{-1/2} \right)$, whereas for 2D or 3D graph embeddings like discrete manifolds one can take the polar respectively spherical coordinates $\mathbf{u} = (\rho, \varphi)$ or $\mathbf{u} = (\rho, \varphi, \theta)$.
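
The following NumPy sketch (illustrative only, not the reference implementation of~\cite{Monti2016}; all function and variable names are made up) spells out the patch operator and the resulting convolution for a single node:
\begin{verbatim}
import numpy as np

def gaussian_weight(u, mu_k, sigma_k):
    # w_k(u) = exp(-1/2 (u - mu_k)^T diag(sigma_k)^{-1} (u - mu_k))
    diff = u - mu_k
    return np.exp(-0.5 * np.sum(diff * diff / sigma_k))

def patch_operator(i, f, neighbors, pseudo, mu, sigma):
    # D_k(i) f = sum_{j in N(i)} w_k(u(i, j)) f(j)   for k = 1, ..., K
    K = mu.shape[0]
    D = np.zeros(K)
    for k in range(K):
        for j in neighbors[i]:
            D[k] += gaussian_weight(pseudo[(i, j)], mu[k], sigma[k]) * f[j]
    return D

def monet_conv(i, f, g, neighbors, pseudo, mu, sigma):
    # (f * g)(i) = sum_k g_k D_k(i) f
    return g @ patch_operator(i, f, neighbors, pseudo, mu, sigma)
\end{verbatim}
Here \texttt{pseudo} maps an edge $(i, j)$ to its pseudo-coordinate vector $\mathbf{u}(i, j)$, and \texttt{mu}, \texttt{sigma} hold the $K$ means and diagonal covariances.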

\paragraph{Localized SCNN}

Learning class-specific descriptors for deformable shapes using localized spectral convolutional networks.
Boscaini 2015.

\paragraph{Spherical CNN}

Deep learning 3D shape surfaces using geometry images.
CNN on a spherical authalic parametrization.
Sinha 2016.

\paragraph{Toric CNN}

Convolutional neural networks on surfaces via seamless toric covers.
CNN on a planar flat-torus parametrization.
Maron 2017.

\paragraph{SyncSpecCNN}

Spectral transformer networks.
CVPR.
Yi 2017.
SyncSpecCNN: Synchronized Spectral CNN for 3D shape segmentation.

\paragraph{SchNet}

NIPS 2017.
Schuett.
Modeling quantum interactions in molecules.
No edges, only points.

\subsection{SplineConv}

% Let $\xi = (t_0, \ldots, t_)$ be a node vector.
% An \emph{open b-spline-function} $N_k^m$ over $k \in \{ 1, \ldots, K \}$ is recursively defined by
% \begin{equation*}
% N_k^0(t) = \begin{cases}
% 1 & \text{if } t \in [t_k, t_{k+1})\\
% 0 & \text{else}
% \end{cases}
% \end{equation*}

% A \emph{closed b-spline-function}

% We choose a different weight function based on \emph{B-Splines} to make use of \emph{locally controllable} filters and to reduce computation significantly~\cite{Fey2017}:
% \begin{equation*}
% w_k(\rho, \varphi) = \mu_k \hat{\rho} \bar{N}_k^m(\varphi)
% \end{equation*}
% with learnable parameters $\boldsymbol{\mu} = (\mu_1, \ldots, \mu_K )$, where $\bar{N}_k^m$ is a closed b-spline-function of degree $m \in \mathbb{N}$ with uniform node vector over the interval $[0, 2\pi]$ and $\hat{\rho} = \exp(\frac{\rho^2}{2\sigma^2})$.
% \todo{emerging signal processing cite for gaussian}

% We can also use the tensor-product construction to also parametrize the distances via\todo{lol}
% \begin{equation*}
% w_{kl}(\rho, \varphi) = \mu_{kl} N_k^m(\hat{\rho}) \bar{N}_l^m(\varphi)
% \end{equation*}
% where we combine open and closed b-spline functions.

% To extend it to the 3-dimensional case, one can use the polar coordinates $(\rho, \varphi, \theta)$ to get
% \begin{equation*}
% w_{klq}(\rho, \varphi, \theta) = \mu_{klq} N_k^m(\hat{\rho}) \bar{N}_l^m(\varphi) \bar{N}_q^m(\theta)
% \end{equation*}

% The good thing is the runtime.
% Whereas MoNet has a runtime dependent on $K$, which is typically the dominant factor between $d$ and $K$, this method is mostly dependent on $M$, which is typically low ($1$ or $2$).

% Computing $N_k^m(\cdot)$ for all edges and all $k \in \{ 1, \ldots, K \}$ yields $K$ adjacency matrices with $(m + 1) |\mathcal{E}|$ entries combined.

% If we further restrict $m = 1$, which leads to linear interpolation, we can reach a runtime nearly independent of $K$ with $\mathcal{O}(K |\mathcal{E}| + M^{\mathrm{in}} M^{\mathrm{out}} |\mathcal{E}|)$, which is nearly as fast as the GCN introduced by Kipf et al.\todo{cite}

% \newpage

Notation: $n$ node count, $d$ pseudo-coordinate dimensionality, $k$ number of partitions, $m$ B-spline degree (which should be fixed to $1$), $y$ incoming features and $z$ outgoing features.
\\
$\mathbf{W} \in \mathbb{R}^{k \times y \times z}$,
$\mathbf{F}^{\mathrm{in}} \in \mathbb{R}^{n \times y}$,
$\mathbf{A} \in \mathbb{R}^{n \times n \times d}$ and
$\mathbf{F}^{\mathrm{out}} \in \mathbb{R}^{n \times z}$.
\\
Let $N_i^m \colon [0, 1] \to [0, 1]$ be an open and $\bar{N}_i^m \colon [0, 2\pi] \to [0, 1]$ be a closed B-spline function with uniform node vectors.
We can redefine the kernel $\mathbf{w}(\mathbf{u})$ of the patch operator $D_k(i)f$ used in MoNet using localized B-spline functions as
\begin{equation*}
w_i(\rho, \varphi) = \mu_i \hat{\rho} \bar{N}_i^m(\varphi)
\end{equation*}
respectively
\begin{equation*}
w_{ij}(\rho, \varphi, \theta) = \mu_{ij} \hat{\rho} \bar{N}_i^m(\varphi) \bar{N}_j^m(\theta),
\end{equation*}
where the radius $\rho$ is inverted by the \emph{Gaussian function} $\hat{\rho} = \exp(-\frac{\rho^2}{2\sigma^2})$.
For an additional parametrization of $\rho$ one can also replace $\hat{\rho}$ with the corresponding B-spline function, resulting in
\begin{equation*}
w_{ijk}(\rho, \varphi, \theta) = \mu_{ijk} N_i^m(\hat{\rho}) \bar{N}_j^m(\varphi) \bar{N}_k^m(\theta).
\end{equation*}
\\\\
\todo{The index notation is not yet consistent: $k$ is used here as a B-spline index although $i$ and $j$ are actually reserved for nodes, and $\mu$ should be written in uppercase once it carries two or more indices.}

\paragraph{Efficient parallel computing}

For each edge in $\mathbf{A} \in \mathbb{R}^{n \times n \times d}$ there exist $m + 1$ B-spline values $\bar{N}_i^m(\varphi) \neq 0$, $i \in \{ 1, \ldots, k \}$.
(Does this also hold for open B-spline functions?)
Hence, we can compute $\mathbf{B} \in {[0, 1]}^{n \times n \times d \times (m + 1)}$ as a tensor with the same sparsity as $\mathbf{A}$ containing the B-spline values $\neq 0$, and a corresponding index tensor $\mathbf{C} \in {\{ 1, \ldots, k \}}^{n \times n \times d \times (m + 1)}$ containing the indices of the B-spline functions $\neq 0$.
Each $N_i^m$ respectively $\bar{N}_i^m$ can be computed element-wise in $\mathcal{O}(1)$ (does this also hold for higher $m$?), resulting in a runtime of $\mathcal{O}(dm|\mathcal{E}|)$, where $d \in \{2,3\}$ and typically $m \in \{1, 2\}$ are quite small.

The convolution operator $(\mathbf{F} \star \mathbf{W}) \colon \mathbb{R}^{n \times y} \to \mathbb{R}^{n \times z}$ can therefore be computed as follows:
\begin{enumerate}
\item Compute $\mathbf{B} \in \mathbb{R}^{n \times n \times (m + 1)}$ respectively $\mathbf{C} \in \mathbb{R}^{n \times n \times (m + 1)}$ (restricted to $d = 1$ here for simplicity).
\item Allocate a zero-filled matrix $\mathbf{F}^{\mathrm{out}} \in \mathbb{R}^{n \times z}$.
\item For every edge $(i, j) \in \mathcal{E}$ do the following:
\begin{enumerate}
\item $F^{\mathrm{out}}_i \leftarrow F^{\mathrm{out}}_i + \sum_{a=1}^{m+1} B_{ija} W_{C_{ija}} F^{\mathrm{in}}_j$
\end{enumerate}
\end{enumerate}
Note that the convolution operator is in particular independent of the number of partitions $k$, resulting in a very fast approach that nearly matches the runtime of the convolution operator of Kipf et al.\ while being far more powerful on discrete manifolds.
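
A dense, single-pseudo-coordinate ($d = 1$) NumPy sketch of the edge-wise aggregation above (purely illustrative; the tensors $\mathbf{B}$, $\mathbf{C}$ and $\mathbf{W}$ follow the notation of the text and are assumed to be precomputed):
\begin{verbatim}
import numpy as np

def spline_conv(edges, B, C, W, F_in, n):
    # edges: list of (i, j) pairs, i.e. the edge set E
    # B[(i, j)]: the m+1 B-spline values != 0 of edge (i, j)
    # C[(i, j)]: the indices of those B-spline basis functions
    # W: (k, y, z) weight tensor, F_in: (n, y) input features
    z = W.shape[2]
    F_out = np.zeros((n, z))             # step 2: zero-filled output matrix
    for (i, j) in edges:                 # step 3: aggregate over all edges
        for b, c in zip(B[(i, j)], C[(i, j)]):
            F_out[i] += b * (F_in[j] @ W[c])
    return F_out
\end{verbatim}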

For batch-wise convolutions one can interpret the batch of adjacency matrices $\{ \mathbf{A}_1, \ldots, \mathbf{A}_b \}$ as one block-diagonal adjacency matrix
\begin{equation*}
\mathbf{A} = \begin{bmatrix}
\mathbf{A}_1 & & \\
& \ddots & \\
& & \mathbf{A}_b
\end{bmatrix}.
\end{equation*}

% For every $N_i^m$
% First, do some element-wise preprocessing on $\mathbf{A}$ resulting in $2(m + 1)$ new adjacency matrices $\mathbf{B} \in \mathbb{R}^{n \times n \times (m+1) \times 2}$.

\newpage

\todo{normalize by node degree}
\todo{important: don't forget root nodes}
\todo{write tensor syntax down}

\begin{itemize}
\item In particular, describe the gradients.
\end{itemize}

\section{Applications}

\subsection{Images}

Classification task on MNIST~\cite{mnist} and CIFAR-10~\cite{cifar10}.
MoNet uses a superpixel-based approach with a fixed number of 300, 150 and 75 vertices (with no word on feature selection), three convolution layers and max pooling by a factor of four.
Superpixel barycenters are used as coordinates, with 25 (!) Gaussian kernels initialized with random means and variances, trained for 63 (!) epochs.
No mention of runtime.
Each superpixel is connected to every other superpixel, i.e.\ the graph is fully connected.

\subsection{Manifolds}

Learning dense intrinsic correspondence between collections of 3D shapes represented as discrete manifolds, where one tries to label each vertex of a given query shape $\mathcal{X}$ with the index of a corresponding point on some reference shape $\mathcal{Y}$.
Let $n$ and $m$ denote the number of vertices in $\mathcal{X}$ and $\mathcal{Y}$, respectively.
For a point $x$ on a query shape, the last layer of the network is a soft-max, producing an $m$-dimensional output $f(x)$ interpreted as a probability distribution on $\mathcal{Y}$.

\subsubsection{Shape correspondence}

\paragraph{Meshes}

The FAUST humans dataset consists of 100 watertight meshes representing $10$ different poses for $10$ different subjects with exact ground-truth correspondence.
Each shape is represented as a mesh with $6890$ vertices.
The first subject in the first pose was used as the reference.
For all the shapes, point-wise 544-dimensional SHOT descriptors were used as input data.
MoNet architecture with 3 convolutional layers.
The first 8 subjects in all poses were used for training (80 shapes in total); the remaining 2 subjects were used for testing.
The output of the network was refined using the intrinsic Bayesian filter in order to remove some local outliers.
Correspondence quality was evaluated using the Princeton benchmark, plotting the percentage of matches that are at most $r$-geodesically distant from the ground-truth correspondence on the reference shape.

\paragraph{Range maps}

Repeated the shape correspondence experiment on range maps synthetically generated from FAUST meshes.
For each subject and pose, 10 range maps were produced at $100 \times 180$ resolution, covering shape rotations around the $z$-axis with increments of $36$ degrees (a total of $1000$ range maps).
For comparison, we show the performance of a standard Euclidean CNN applied on raw depth values and on SHOT descriptors.

\subsubsection{Shape retrieval}

In the shape retrieval application, we are interested in producing a global shape descriptor that discriminates between shape classes.

\subsubsection{Invariant descriptors}

Applying the network point-wise on some input feature vector, the output can be regarded as a dense local descriptor at point $x$.
Our goal is to make the output of the network as similar as possible at corresponding points (positives) across a collection of shapes, and as dissimilar as possible at non-corresponding points (negatives).
For this purpose, we use a siamese network configuration and minimize the siamese loss.

\paragraph{GCNN}

also used the TOSCA dataset, containing synthetic models of humans in a variety of near-isometric deformations.
The meshes in TOSCA were resampled to $10$K vertices.

\section{Multi graphs}

Something by Monti 2017: CNNs on multiple graphs with an application to matrix completion.

\bibliographystyle{plain}
\bibliography{bibliography}

\end{document}

--------------------------------------------------------------------------------