2 | {% include banners/banner_icon.html %}
3 |
4 | Looks like this page still needs to be completed!
5 | If you want to help, you can
6 | edit this page on Github.
8 |
9 |
2 | {% include banners/banner_icon.html %}
3 |
4 | The below definition has been marked for review.
5 | If you want to help, you can
6 | edit this page on Github.
8 |
9 |
10 |
--------------------------------------------------------------------------------
/terms/bagging.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bagging
3 | references:
4 | - link_title: Bootstrap aggregating - Wikipedia
5 | link_url: http://en.wikipedia.org/wiki/Bootstrap_aggregating
6 | ---
7 | Bagging, short for bootstrap aggregating, trains multiple base learners on different random subsets of the training set, where each subset is drawn from the original sample with replacement. The learners' predictions are then combined, typically by voting or averaging.
8 |
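Below is a minimal sketch of this procedure; the synthetic dataset, the choice of decision trees as base learners, and majority voting as the aggregation rule are illustrative assumptions rather than part of the definition above.

```python
# Bagging sketch: train each base learner on a bootstrap sample
# (drawn with replacement), then combine predictions by majority vote.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=500, random_state=0)
rng = np.random.default_rng(0)

learners = []
for _ in range(25):
    idx = rng.integers(0, len(X), size=len(X))  # bootstrap sample indices
    learners.append(DecisionTreeClassifier(max_depth=3).fit(X[idx], y[idx]))

votes = np.stack([tree.predict(X) for tree in learners])
bagged = (votes.mean(axis=0) > 0.5).astype(int)    # majority vote
print("training accuracy:", (bagged == y).mean())
```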
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_forms.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | FORMS
11 |
12 | */
13 |
14 | .input-reset {
15 | -webkit-appearance: none;
16 | -moz-appearance: none;
17 | }
18 |
19 | .button-reset::-moz-focus-inner,
20 | .input-reset::-moz-focus-inner {
21 | border: 0;
22 | padding: 0;
23 | }
24 |
--------------------------------------------------------------------------------
/terms/continuous-bag-of-words-cbow.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Continuous-Bag-of-Words (CBOW)
3 | related_terms:
4 | - word-embedding
5 | - bag-of-words
6 | - word2vec
7 | ---
8 | Continuous Bag of Words refers to an algorithm
9 | that predicts a target word from its
10 | surrounding context.
11 |
12 | CBOW is one of the algorithms used for training
13 | [word2vec](/terms/word2vec/) vectors.
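The toy sketch below shows a single CBOW prediction step under simplifying assumptions: the five-word vocabulary, the embedding dimension, and the random embedding matrices are all invented for illustration.

```python
# CBOW step: average the context words' input embeddings, score every
# vocabulary word against the output embeddings, and softmax the scores
# to get a distribution over the target (middle) word.
import numpy as np

vocab = ["the", "cat", "sat", "on", "mat"]
rng = np.random.default_rng(0)
W_in = rng.normal(size=(len(vocab), 8))    # input (context) embeddings
W_out = rng.normal(size=(len(vocab), 8))   # output (target) embeddings

context = ["the", "sat", "on", "mat"]      # predict the missing middle word
h = W_in[[vocab.index(w) for w in context]].mean(axis=0)

scores = W_out @ h
probs = np.exp(scores - scores.max())
probs /= probs.sum()
print(dict(zip(vocab, probs.round(3))))    # P(target word | context)
```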
--------------------------------------------------------------------------------
/terms/expectation-maximization-em-algorithm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Expectation-maximization algorithm
4 | link_url: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm
5 | related_terms:
6 | - maximum-a-posteriori-map-estimation
7 | - maximum-likelihood-estimation-mle
8 | - expectation
9 | title: Expectation-maximization (EM) algorithm
10 | ---
11 |
--------------------------------------------------------------------------------
/terms/passive-aggressive-algorithm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Online Passive-Aggressive Algorithms
4 | link_url: http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf
5 | - link_title: Passive Aggressive Algorithms - scikit-learn Documentation
6 | link_url: http://scikit-learn.org/stable/modules/linear_model.html#passive-aggressive
7 | title: Passive-Aggressive Algorithm
8 | ---
9 |
--------------------------------------------------------------------------------
/serve.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 | # This runs "jekyll serve" inside the container, which will build the website.
3 | exec docker run \
4 | --rm \
5 | -it \
6 | --init \
7 | --volume="${PWD}":/opt/buildhome/repo \
8 | --workdir=/opt/buildhome/repo \
9 | --entrypoint=/opt/build-bin/build \
10 | -p 4000:4000 \
11 | netlify/build:xenial \
12 | jekyll serve --host=0.0.0.0 --incremental
13 |
--------------------------------------------------------------------------------
/terms/decision-tree.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Decision tree
3 | references:
4 | - link_title: Decision tree - Wikipedia
5 | link_url: https://en.wikipedia.org/wiki/Decision_tree
6 | related_terms:
7 | - random-forest
8 | ---
9 | A decision tree is a supervised learning method that makes a prediction by asking a sequence of questions about the input features, splitting at each step on the feature most informative about the outcome and forming a 'tree' of question branches.
10 |
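As a brief, hedged illustration, the snippet below fits a small scikit-learn decision tree and prints the learned "questions"; the iris dataset and the depth limit are arbitrary choices for this example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

# Each branch is a question of the form "feature <= threshold?".
print(export_text(tree))
```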
--------------------------------------------------------------------------------
/terms/sequential-model-based-optimization-smbo.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Sequential Model-Based Optimization for General Algorithm Configuration
4 | (extended version)
5 | link_url: https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf
6 | related_terms:
7 | - bayesian-optimization
8 | - structured-bayesian-optimization-sbo
9 | title: Sequential Model-Based Optimization (SMBO)
10 | ---
11 |
--------------------------------------------------------------------------------
/terms/second-order-information.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Second-order information
3 | needs_review: true
4 | related_terms:
5 | - first-order-information
6 | - hessian-matrix
7 | - hessian-free-optimization
8 | ---
9 | The term *second-order information* refers to information
10 | about a function gained by computing its second derivative.
11 | The second derivative reveals information about the function's
12 | curvature.
--------------------------------------------------------------------------------
/terms/boosting.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Boosting
3 | references:
4 | - link_title: Boosting (machine learning) - Wikipedia
5 |   link_url: https://en.wikipedia.org/wiki/Boosting_(machine_learning)
6 | ---
7 | Boosting trains base learners serially, so that training instances on which the preceding learners were inaccurate receive more emphasis when training later learners. Unlike bagging, boosting actively tries to generate complementary learners instead of leaving this to chance.
8 |
--------------------------------------------------------------------------------
/terms/contextual-bandit.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Contextual Bandit - Multi-armed Bandit - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Multi-armed_bandit#Contextual_Bandit
5 | - link_title: An Introduction to Contextual Bandits - Stream
6 | link_url: https://getstream.io/blog/introduction-contextual-bandits/
7 | related_terms:
8 | - multi-armed-bandit
9 | title: Contextual Bandit
10 | ---
11 |
--------------------------------------------------------------------------------
/_includes/references.html:
--------------------------------------------------------------------------------
1 | {% if include.page.references %}
2 |
10 | {% endif %}
--------------------------------------------------------------------------------
/terms/perplexity.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Perplexity
3 | ---
4 | Wikipedia [defines perplexity][1] as the following:
5 |
6 | > In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample.
7 |
8 | [1]: https://en.wikipedia.org/wiki/Perplexity
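As a minimal sketch of the idea, perplexity can be computed as the exponential of the average negative log-probability a model assigns to a held-out sample; the probabilities below are invented for illustration.

```python
import math

# Probabilities an assumed model assigned to each observed token.
probs = [0.2, 0.1, 0.25, 0.05, 0.4]

avg_neg_log_prob = -sum(math.log(p) for p in probs) / len(probs)
perplexity = math.exp(avg_neg_log_prob)
print(perplexity)   # lower perplexity = the sample was better predicted
```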
--------------------------------------------------------------------------------
/terms/adversarial-variational-bayes.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Adversarial Variational Bayes: Unifying Variational Autoencoders and
4 | Generative Adversarial Networks'
5 | link_url: https://arxiv.org/abs/1701.04722
6 | related_terms:
7 | - autoencoder
8 | - adversarial-autoencoder
9 | - variational-autoencoder-vae
10 | - generative-adversarial-network-gan
11 | title: Adversarial Variational Bayes
12 | ---
13 |
--------------------------------------------------------------------------------
/terms/gradient-descent.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Gradient descent
3 | ---
4 | Gradient descent is an optimization algorithm designed
5 | to find the minimum of a function. Many machine learning
6 | algorithms use gradient descent or a variant.
7 |
8 | Common variants include:
9 | - [Stochastic Gradient Descent (SGD)](/terms/stochastic-gradient-descent-sgd/)
10 | - [Minibatch Gradient Descent](/terms/minibatch-gradient-descent/)
11 |
12 |
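A minimal sketch of plain gradient descent on the one-dimensional function $f(x) = (x - 3)^2$, whose gradient is $2(x - 3)$; the step size and iteration count are arbitrary illustrative choices.

```python
def grad(x):
    return 2.0 * (x - 3.0)      # gradient of f(x) = (x - 3)^2

x = 0.0                         # starting point
learning_rate = 0.1
for _ in range(100):
    x -= learning_rate * grad(x)   # step in the direction of steepest descent

print(x)   # converges toward the minimum at x = 3
```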
--------------------------------------------------------------------------------
/terms/nchw.md:
--------------------------------------------------------------------------------
1 | ---
2 | related_terms:
3 | - nhwc
4 | title: NCHW
5 | ---
6 |
7 | **NCHW** is an acronym describing the order of the axes in a tensor containing image data samples.
8 |
9 | * **N**: Number of data samples.
10 | * **C**: Image channels. A red-green-blue (RGB) image will have 3 channels.
11 | * **H**: Image height.
12 | * **W**: Image width.
13 |
14 | NCHW is sometimes referred to as a **channels-first** layout.
15 |
--------------------------------------------------------------------------------
/meta/needs-review.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: "Meta: Terms Needing Review"
4 | ---
5 |
6 | This page links to terms that have "complete" definitions,
7 | but are in need of review or further editing.
8 |
2 | {% include banners/banner_icon.html %}
3 |
4 | Want to improve this page?
5 | Edit this page or
7 | report an issue.
10 |
11 |
12 |
13 |
--------------------------------------------------------------------------------
/terms/recurrent-neural-network-language-model-rnnlm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Recurrent Neural Networks - Tutorials - TensorFlow Documentation
4 | link_url: https://www.tensorflow.org/tutorials/recurrent#language_modeling
5 | related_terms:
6 | - natural-language-processing
7 | - recurrent-neural-network
8 | - neural-network
9 | title: Recurrent Neural Network Language Model (RNNLM)
10 | ---
11 | A Recurrent Neural Network Language Model (RNNLM) is
12 | a recurrent neural network used for language modeling--that is,
13 | predicting the next word in a sequence given the preceding words.
--------------------------------------------------------------------------------
/terms/nhwc.md:
--------------------------------------------------------------------------------
1 | ---
2 | related_terms:
3 | - nchw
4 | title: NHWC
5 | ---
6 |
7 | **NHWC** is an acronym describing the order of the axes in a tensor containing image data samples.
8 |
9 | Software frameworks for training machine learning models--such as TensorFlow--use the acronym NHWC to describe tensors whose axes are ordered as follows:
10 |
11 | * **N**: Number of data samples.
12 | * **H**: Image height.
13 | * **W**: Image width.
14 | * **C**: Image channels. A red-green-blue (RGB) image will have 3 channels.
15 |
16 | NHWC is sometimes referred to as a **channels-last** layout.
17 |
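For example, here is a small NumPy sketch of a batch of images stored in NHWC order and converted to the channels-first [NCHW](/terms/nchw/) layout; the batch and image sizes are arbitrary.

```python
import numpy as np

batch_nhwc = np.zeros((32, 224, 224, 3))             # 32 RGB images, 224x224
batch_nchw = np.transpose(batch_nhwc, (0, 3, 1, 2))  # move channels next to N

print(batch_nhwc.shape)   # (32, 224, 224, 3)
print(batch_nchw.shape)   # (32, 3, 224, 224)
```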
--------------------------------------------------------------------------------
/terms/fasttext.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: fastText
3 | related_terms:
4 | - word2vec
5 | - glove-word-embeddings
6 | - word-embedding
7 | ---
8 | fastText is a project from Facebook Research for producing
9 | [word embeddings](/terms/word-embedding/) and sentence
10 | classification.
11 |
12 | The fastText project is [hosted on Github](https://github.com/facebookresearch/fastText/) and
13 | instructions for using their pre-trained word embeddings
14 | can be [found here](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md).
--------------------------------------------------------------------------------
/terms/hinge-loss.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Hinge loss
3 | needs_review: true
4 | related_terms:
5 | - loss-function
6 | ---
7 |
8 | From the scikit-learn documentation, we get [the following definition][1]:
9 |
10 | > The hinge_loss function computes the average distance between the model and the data using hinge loss, a one-sided metric that considers only prediction errors. (Hinge loss is used in maximal margin classifiers such as support vector machines.)
11 |
12 | [1]: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter
--------------------------------------------------------------------------------
/_build/install-git-lfs.sh:
--------------------------------------------------------------------------------
1 | #!/bin/bash
2 |
3 | # This script is for installing git-lfs on Travis CI Linux environments.
4 | # Don't use this script to install git-lfs on your personal computer.
5 | # This is necessary until https://github.com/travis-ci/packer-templates/issues/386
6 | # is resolved.
7 | mkdir -p $HOME/bin
8 | wget https://github.com/git-lfs/git-lfs/releases/download/v2.1.1/git-lfs-linux-amd64-2.1.1.tar.gz
9 | tar xvfz git-lfs-linux-amd64-2.1.1.tar.gz
10 | mv git-lfs-2.1.1/git-lfs $HOME/bin/git-lfs
11 | export PATH=$PATH:$HOME/bin/
12 |
--------------------------------------------------------------------------------
/terms/dimensionality-reduction.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Dimensionality reduction
3 | references:
4 | - link_title: Dimensionality reduction - Wikipedia
5 | link_url: https://en.wikipedia.org/wiki/Dimensionality_reduction
6 | related_terms:
7 | - unsupervised-learning
8 | ---
9 | Dimensionality reduction takes a dataset and reduces the number of features (dimensions) used to represent it, ideally keeping the remaining features informative and largely independent of one another. The goal is a smaller dataset that still retains most of the original information.
10 |
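As one brief, hedged example, principal component analysis (a common dimensionality reduction technique) can project scikit-learn's 64-dimensional digit images down to 2 dimensions; the dataset and target dimensionality are arbitrary choices here.

```python
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

X, _ = load_digits(return_X_y=True)     # shape (1797, 64)
X_reduced = PCA(n_components=2).fit_transform(X)
print(X.shape, "->", X_reduced.shape)   # (1797, 64) -> (1797, 2)
```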
--------------------------------------------------------------------------------
/terms/paragraph-vector.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Paragraph vector
3 | related_terms:
4 | - doc2vec
5 | - bag-of-words
6 | ---
7 | *Paragraph Vectors* is the name of the model proposed by [Le and Mikolov][1] to
8 | generate unsupervised representations of sentences, paragraphs, or entire documents
9 | without losing local word order.
10 |
11 | This is in contrast to [bag-of-words](/terms/bag-of-words/) representations, which
12 | can offer useful representations of documents but lose all word order information.
13 |
14 | [1]: https://arxiv.org/abs/1405.4053
--------------------------------------------------------------------------------
/terms/backpropagation.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Yes you should understand backprop - Andrej Karpathy
4 | link_url: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b
5 | - link_title: Backpropagation - Wikipedia
6 | link_url: https://en.wikipedia.org/wiki/Backpropagation
7 | related_terms:
8 | - gradient-descent
9 | title: Backpropagation
10 | ---
11 | A technique for computing how each weight in a neural network contributes to the prediction error, by propagating the error gradient backward through the network with the chain rule. The resulting gradients are then used--typically by [gradient descent](/terms/gradient-descent/)--to update the weights.
12 |
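A minimal sketch of the chain rule at work for a single sigmoid neuron with squared-error loss; the input, target, weights, and step size are invented for illustration, and this is not a full network.

```python
import numpy as np

x = np.array([0.5, -1.2, 0.3])   # input
w = np.array([0.1, 0.4, -0.2])   # weights
target = 1.0

# Forward pass.
z = w @ x
y = 1.0 / (1.0 + np.exp(-z))     # sigmoid activation
loss = 0.5 * (y - target) ** 2

# Backward pass: dloss/dw = dloss/dy * dy/dz * dz/dw (chain rule).
grad_w = (y - target) * y * (1.0 - y) * x

w -= 0.1 * grad_w                # one gradient-descent update
print(loss, grad_w)
```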
--------------------------------------------------------------------------------
/terms/hessian-free-optimization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Hessian Free Optimization - Andrew Gibiansky
4 | link_url: http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/
5 | - link_title: Deep learning via Hessian-free optimization
6 | link_url: http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf
7 | - link_title: Hessian-free Optimization for Learning Deep Multidimensional Recurrent
8 | Neural Networks
9 | link_url: https://arxiv.org/abs/1509.03475
10 | title: Hessian-free optimization
11 | ---
12 |
--------------------------------------------------------------------------------
/terms/meteor-machine-translation.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: METEOR - Automatic Machine Translation Evaluation System
4 | link_url: http://www.cs.cmu.edu/~alavie/METEOR/
5 | related_terms:
6 | - bilingual-evaluation-understudy-bleu
7 | title: METEOR
8 | ---
9 | METEOR is an automatic evaluation metric for machine translation,
10 | designed to mitigate perceived weaknesses in
11 | [BLEU](/terms/bilingual-evaluation-understudy-bleu/). METEOR scores
12 | machine translation *hypotheses* by aligning them to reference translations,
13 | much like BLEU does.
--------------------------------------------------------------------------------
/terms/search-based-software-engineering-sbse.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Search-based software engineering - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Search-based_software_engineering
5 | title: Search-based software engineering (SBSE)
6 | ---
7 | Search-based software engineering applies search and
8 | optimization techniques to software engineering problems.
9 |
10 | [LDADE](/terms/latent-dirichlet-allocation-differential-evolution-ldade/) is an example of a system that applies search-based software engineering to optimize topic modeling.
--------------------------------------------------------------------------------
/_layouts/redirect.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: compress
3 | ---
4 | {% assign destination = "/terms/" | append: page.destination | append: "/" %}
5 |
6 |
7 |
8 |
9 | {{ page.title }} - {{ site.title }}
10 |
11 |
12 |
{{ page.title }} is a synonym for another term.
13 |
Click here if your browser does not automatically redirect.
14 |
15 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_gradients.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | GRADIENTS
11 |
12 |
13 | */
14 |
15 | .gradient-blue {
16 | background-image: linear-gradient(#4570B0, #0081C2);
17 | }
18 |
19 | .gradient-blue-reversed {
20 | background-image: linear-gradient(#0081C2, #4570B0);
21 | }
22 |
23 | .gradient-light-blue {
24 | background-image: linear-gradient(#76D3FE, #008AE0);
25 | }
26 |
27 | .gradient-light-blue-reversed {
28 | background-image: linear-gradient(#008AE0, #76D3FE);
29 | }
30 |
--------------------------------------------------------------------------------
/terms/no-free-lunch-nfl-theorem.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Doesn't the NFL theorem show that black box optimization is flawed?
4 | - Black Box Optimization Competition
5 | link_url: https://bbcomp.ini.rub.de/faq.html#q20
6 | - link_title: No free lunch theorem - Wikipedia
7 | link_url: https://en.wikipedia.org/wiki/No_free_lunch_theorem
8 | related_terms:
9 | - optimization
10 | title: No Free Lunch (NFL) theorem
11 | ---
12 | The "No Free Lunch" theorem is the idea that all optimizers perform equally well
13 | when averaged across all possible optimization problems.
--------------------------------------------------------------------------------
/terms/softmax.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Softmax function - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Softmax_function
5 | related_terms:
6 | - hierarchical-softmax
7 | title: Softmax
8 | ---
9 | The softmax turns $n$ real numbers
10 | into a probability distribution whose entries are proportional
11 | to the exponentials of those numbers.
12 |
13 | Given an $n$-dimensional vector $\mathbf v$ with all components
14 | in $\mathbb R$, the softmax of $\mathbf v$ is:
15 | $$
16 | \mathrm{softmax}(\mathbf v)_i =
17 | \frac{\exp{(v_i)}}
18 | {\sum_{j=1}^{n} \exp{(v_j)}}
19 | $$
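The NumPy sketch below is a direct translation of the formula, with the usual max-subtraction trick for numerical stability (subtracting a constant from every component does not change the result).

```python
import numpy as np

def softmax(v):
    exps = np.exp(v - np.max(v))   # stabilize before exponentiating
    return exps / exps.sum()

print(softmax(np.array([1.0, 2.0, 3.0])))   # sums to 1; the largest input gets the largest probability
```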
--------------------------------------------------------------------------------
/terms/top-5-error-rate.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'ImageNet: what is top-1 and top-5 error rate?'
4 | link_url: https://stats.stackexchange.com/questions/156471/imagenet-what-is-top-1-and-top-5-error-rate
5 | related_terms:
6 | - top-1-error-rate
7 | title: Top-5 error rate
8 | ---
9 | The term *top-5 error rate* refers to a method of benchmarking
10 | machine learning models in the ImageNet
11 | Large Scale Visual Recognition Competition.
12 |
13 | The model is considered to have classified a given image correctly
14 | if the target label is one of the model's top 5 predictions.
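A minimal sketch of computing a top-5 error rate from raw class scores; the scores and labels below are randomly generated stand-ins rather than real model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(size=(100, 1000))      # 100 images, 1000 class scores each
labels = rng.integers(0, 1000, size=100)   # ground-truth class per image

top5 = np.argsort(scores, axis=1)[:, -5:]           # indices of the 5 highest scores
correct = (top5 == labels[:, None]).any(axis=1)     # target among the top 5?
print("top-5 error rate:", 1.0 - correct.mean())
```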
--------------------------------------------------------------------------------
/terms/inceptionism.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Inceptionism
3 | related_terms:
4 | - convolutional-neural-network-cnn
5 | - googlenet-neural-network
6 | - inception-neural-network
7 | ---
8 |
9 | *Inceptionism* refers to a visualization technique to understand what
10 | a neural network learned. The network is fed an image,
11 | asked what the network detected, and then that feature in the
12 | image is *amplified*. The full technique is described in the
13 | Google Research blog post titled [Inceptionism: Going Deeper into Neural Networks](https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html).
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_opacity.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | OPACITY
11 | Docs: http://tachyons.io/docs/themes/opacity/
12 |
13 | */
14 |
15 | .o-100 { opacity: 1; }
16 | .o-90 { opacity: .9; }
17 | .o-80 { opacity: .8; }
18 | .o-70 { opacity: .7; }
19 | .o-60 { opacity: .6; }
20 | .o-50 { opacity: .5; }
21 | .o-40 { opacity: .4; }
22 | .o-30 { opacity: .3; }
23 | .o-20 { opacity: .2; }
24 | .o-10 { opacity: .1; }
25 | .o-05 { opacity: .05; }
26 | .o-025 { opacity: .025; }
27 | .o-0 { opacity: 0; }
28 |
--------------------------------------------------------------------------------
/terms/reinforcement-learning.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Reinforcement Learning (RL)
3 | references:
4 | - link_title: Reinforcement learning - Wikipedia
5 | link_url: http://en.wikipedia.org/wiki/Reinforcement_learning
6 | related_terms:
7 | - supervised-learning
8 | - unsupervised-learning
9 | ---
10 | Reinforcement learning is learning from feedback (reinforcement) gathered 'on the job'--that is, learning by trying actions and observing their outcomes, rather than learning from labeled answer data.
11 | This is how robots may learn, but it is also used for playing games, where a tight feedback loop such as a score helps the algorithm discover what works well.
12 |
--------------------------------------------------------------------------------
/terms/hadamard-product.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Hadamard product
3 | ---
4 | The Hadamard product refers to component-wise multiplication of two matrices (or vectors) of the same dimensions.
5 | The $\odot$ symbol is commonly used as the Hadamard product operator.
6 |
7 | Here is an example for the Hadamard product for a pair of $3 \times 3$ matrices.
8 |
9 | $$
10 | \begin{bmatrix}
11 | a & b & c \\
12 | d & e & f \\
13 | g & h & i
14 | \end{bmatrix}
15 | \odot
16 | \begin{bmatrix}
17 | j & k & l \\
18 | m & n & o \\
19 | p & q & r
20 | \end{bmatrix}
21 | =
22 | \begin{bmatrix}
23 | aj & bk & cl \\
24 | dm & en & fo \\
25 | gp & hq & ir
26 | \end{bmatrix}
27 | $$
28 |
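In NumPy, the Hadamard product is simply the elementwise `*` operator (as opposed to `@`, which performs matrix multiplication):

```python
import numpy as np

A = np.array([[1, 2], [3, 4]])
B = np.array([[10, 20], [30, 40]])

print(A * B)   # [[ 10  40]
               #  [ 90 160]] -- componentwise product
print(A @ B)   # ordinary matrix product, for comparison
```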
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_links.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | LINKS
11 | Docs: http://tachyons.io/docs/elements/links/
12 |
13 | */
14 |
15 | .link {
16 | text-decoration: none;
17 | transition: color .15s ease-in;
18 | }
19 |
20 | .link:link,
21 | .link:visited {
22 | transition: color .15s ease-in;
23 | }
24 | .link:hover {
25 | transition: color .15s ease-in;
26 | }
27 | .link:active {
28 | transition: color .15s ease-in;
29 | }
30 | .link:focus {
31 | transition: color .15s ease-in;
32 | outline: 1px dotted currentColor;
33 | }
34 |
35 |
--------------------------------------------------------------------------------
/terms/filter-convolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Kernel (image processing) - Wikipedia
4 |   link_url: https://en.wikipedia.org/wiki/Kernel_(image_processing)
5 | title: Filter (convolution)
6 | ---
7 | A *filter* (also known as a *kernel*) is a small matrix
8 | used in convolution operations.
9 |
10 | Convolution filters are commonly used in image processing
11 | to modify images or extract features.
12 |
13 | The dimensions of a convolution filter are typically small,
14 | odd, and square. For example, convolution filters are typically
15 | $3 \times 3$ or $5 \times 5$ matrices. Odd dimensions are
16 | preferred to even dimensions.
--------------------------------------------------------------------------------
/terms/data-parallelism.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Data parallelism - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Data_parallelism
5 | - link_title: What is the difference between model parallelism and data parallelism?
6 | - Quora
7 | link_url: https://www.quora.com/What-is-the-difference-between-model-parallelism-and-data-parallelism
8 | title: Data parallelism
9 | ---
10 | Data parallelism is when data is distributed across multiple
11 | nodes in a distributed computing environment, and then
12 | each node acts on the data in parallel.
13 |
14 | On each node, the computation is the same, but the data is
15 | different.
--------------------------------------------------------------------------------
/terms/stochastic-convex-hull-sch.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: On the expected diameter, width, and complexity of a stochastic convex-hull
4 | link_url: http://arxiv.org/abs/1704.07028
5 | - link_title: On the Separability of Stochastic Geometric Objects, with Applications
6 | link_url: https://arxiv.org/abs/1603.07021
7 | - link_title: Convex Hulls under Uncertainty
8 | link_url: https://www.cs.ucsb.edu/~suri/psdir/esa14.pdf
9 | - link_title: Probabilistic Convex Hull Queries over Uncertain Data
10 | link_url: http://ieeexplore.ieee.org/document/6858080/
11 | related_terms:
12 | - convex-hull
13 | title: Stochastic convex hull (SCH)
14 | ---
15 |
--------------------------------------------------------------------------------
/admin/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
5 |
6 | Content Manager
7 |
8 |
9 |
10 |
11 |
12 |
13 |
14 |
15 |
16 |
17 |
--------------------------------------------------------------------------------
/terms/convolutional-neural-network-cnn.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Convolutional neural network - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Convolutional_neural_network
5 | - link_title: Stanford CS231n Convolutional Neural Networks for Visual Recognition
6 | link_url: http://cs231n.github.io/convolutional-networks/
7 | - link_title: Understanding convolutional neural networks for NLP - WildML
8 | link_url: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/
9 | related_terms:
10 | - convolution
11 | - attention-neural-networks
12 | - neural-network
13 | title: Convolutional Neural Networks (CNN)
14 | ---
15 |
--------------------------------------------------------------------------------
/terms/extractive-sentence-summarization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Extractive Summarization Using Supervised and Semi-supervised Learning
4 | link_url: http://anthology.aclweb.org/C/C08/C08-1124.pdf
5 | related_terms:
6 | - abstractive-sentence-summarization
7 | - textrank
8 | title: Extractive sentence summarization
9 | ---
10 | Extractive sentence summarization refers to programmatically
11 | creating a shorter version of a document by extracting
12 | the "important" parts.
13 |
14 | [TextRank][1] is an example of an algorithm that can
15 | rank sentences in a document for the purpose of extractive
16 | summarization.
17 |
18 | [1]: /terms/textrank
--------------------------------------------------------------------------------
/terms/first-order-information.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: First-order information
3 | needs_review: true
4 | ---
5 | First-order information is a term used to mean information obtained
6 | by computing the first derivative of a function. The first
7 | derivative of a function reveals the slope of a tangent
8 | line to the function. This gives a general idea of how the function
9 | is changing at that point, but does not give information
10 | about the *curvature* of the function--the second derivative
11 | is required for that.
12 |
13 | First-order information should not be confused with
14 | [first-order logic][1].
15 |
16 | [1]: https://en.wikipedia.org/wiki/First-order_logic
--------------------------------------------------------------------------------
/terms/named-entity-recognition-in-query-nerq.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Named Entity Recognition in Query
4 | link_url: https://soumen.cse.iitb.ac.in/~soumen/doc/www2013/QirWoo/GuoXCL2009nerq.pdf
5 | - link_title: Named entity recognition in query - Google Patents
6 | link_url: https://www.google.com/patents/US9009134
7 | related_terms:
8 | - named-entity-recognition-ner
9 | title: Named Entity Recognition in Query (NERQ)
10 | ---
11 | *Named Entity Recognition in Query* is a phrase used in a research paper and patent from
12 | Microsoft, referring to the [Named Entity Recognition](/terms/named-entity-recognition-ner/)
13 | problem in web search queries.
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_box-sizing.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | BOX SIZING
11 |
12 | */
13 |
14 | html,
15 | body,
16 | div,
17 | article,
18 | section,
19 | main,
20 | footer,
21 | header,
22 | form,
23 | fieldset,
24 | legend,
25 | pre,
26 | code,
27 | a,
28 | h1,h2,h3,h4,h5,h6,
29 | p,
30 | ul,
31 | ol,
32 | li,
33 | dl,
34 | dt,
35 | dd,
36 | textarea,
37 | table,
38 | td,
39 | th,
40 | tr,
41 | input[type="email"],
42 | input[type="number"],
43 | input[type="password"],
44 | input[type="tel"],
45 | input[type="text"],
46 | input[type="url"],
47 | .border-box {
48 | box-sizing: border-box;
49 | }
50 |
--------------------------------------------------------------------------------
/terms/parameter-budget.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Parameter budget
3 | needs_review: true
4 | related_terms:
5 | - optimization
6 | ---
7 | A *parameter budget* refers to the idea of constraining the number
8 | of learnable parameters for a machine learning model. Some types
9 | of parameters are more useful for improving a model
10 | than others, thus they should be prioritized in a model
11 | with a restricted parameter budget.
12 |
13 | In neural networks, deeper networks seem to work better when the parameter
14 | budget is constrained.
15 |
16 | A related idea is the *computational budget*, but the budget for overall computation is not strictly tied to the number of parameters in a model.
--------------------------------------------------------------------------------
/terms/word2phrase.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: word2phrase
3 | related_terms:
4 | - word2vec
5 | ---
6 | word2phrase refers to a program in the
7 | [word2vec](/terms/word2vec) toolkit that discovers
8 | multi-word phrases in a corpus of words.
9 |
10 | From the [original word2vec Google Code page](https://code.google.com/archive/p/word2vec/):
11 |
12 | > In certain applications, it is useful to have vector representation of larger pieces of text. For example, it is desirable to have only one vector for representing 'san francisco'. This can be achieved by pre-processing the training data set to form the phrases using the word2phrase tool, as is shown in the example script ./demo-phrases.sh.
13 |
--------------------------------------------------------------------------------
/terms/reparameterization-trick.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Auto-Encoding Variational Bayes
4 | link_url: https://arxiv.org/abs/1312.6114
5 | - link_title: How does the reparameterization trick for VAEs work and why is it important?
6 | - Cross Validated
7 | link_url: https://stats.stackexchange.com/questions/199605/how-does-the-reparameterization-trick-for-vaes-work-and-why-is-it-important
8 | - link_title: Variational Auto-Encoders and Extensions - NIPS 2015 workshop
9 | link_url: http://dpkingma.com/wordpress/wp-content/uploads/2015/12/talk_nips_workshop_2015.pdf
10 | related_terms:
11 | - variational-autoencoder-vae
12 | title: Reparameterization trick
13 | ---
14 |
--------------------------------------------------------------------------------
/terms/variance.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Variance - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Variance
5 | title: Variance
6 | ---
7 | Wikipedia describes variance as follows:
8 |
9 | > In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean, and it informally measures how far a set of (random) numbers are spread out from their mean.
10 |
11 | Variance is the square of the standard deviation, or equivalently, the [standard deviation][1] is the square root of the variance.
12 | Thus, sometimes variance is written as $\sigma^2$ where $\sigma$ stands for the standard deviation.
13 |
14 | [1]: /terms/standard-deviation/
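A quick NumPy check of this relationship on a small made-up sample:

```python
import numpy as np

data = np.array([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
print(np.var(data))        # 4.0
print(np.std(data) ** 2)   # 4.0 -- the variance equals the squared standard deviation
```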
--------------------------------------------------------------------------------
/meta/index.md:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: Meta
4 | ---
5 | This section is for *meta-pages*. Pages that don't contain machine
6 | learning terminology, but are useful for helping edit, navigate,
7 | and understand this website.
8 |
9 | - [Unfinished Terms](/meta/unfinished/): This page lists terms that currently
10 | have incomplete definitions.
11 | - [Needs Review](/meta/needs-review/): This page lists terms that have definitions, but they have been marked for review.
12 | - [No Related Terms](/meta/no-related/): This page lists terms that are not connected to any other term via *Related Terms*.
13 | - [No References](/meta/no-references/): This page lists terms that do not have any references.
--------------------------------------------------------------------------------
/terms/face-recognition.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Face recognition
3 | references:
4 | - link_title: What is face recognition? - Convolutional Neural Networks - deeplearning.ai
5 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/lUBYU/what-is-face-recognition
6 | ---
7 | Face recognition is the problem of identifying whether an input
8 | image contains the faces of any of $k$ people... or if the image has
9 | none of the $k$ faces.
10 |
11 | Face recognition is a harder problem than [face verification][1]
12 | because face verification only compares a single image to one person,
13 | whereas face recognition does this for $k$ people.
14 |
15 | [1]: /terms/face-verification/
16 |
--------------------------------------------------------------------------------
/terms/stationary-environment.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Stationary environment
3 | ---
4 | A stationary environment refers to data-generating distributions
5 | that do not change over time.
6 |
7 | A non-stationary environment, in contrast, refers to data-generating
8 | distributions that do change over time.
9 |
10 | It is a difficult problem to train machine learning algorithms
11 | to generalize well in non-stationary environments. See
12 | [Machine Learning in Non-Stationary Environments][1] for
13 | more information.
14 |
15 | [1]: https://mitpress.mit.edu/books/machine-learning-non-stationary-environments "Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation - Mit Press"
--------------------------------------------------------------------------------
/_layouts/page.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: compress
3 | ---
4 | {% include header.html %}
5 |
6 |
7 |
Search Results
8 |
9 |
10 |
11 |
{{ page.title }}
12 |
13 |
14 | {{ content }}
15 |
16 |
21 |
22 |
23 |
24 | {% include footer.html %}
25 |
--------------------------------------------------------------------------------
/terms/distributional-similarity.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Distributional semantics - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Distributional_semantics
5 | - link_title: word2vec - Stanford CS224N Lecture 2
6 | link_url: https://www.youtube.com/watch?v=ERibwqs9p38
7 | title: Distributional similarity
8 | ---
9 | Distributional similarity is the idea that the meaning of words can be understood
10 | from their context.
11 |
12 | This should not be confused with the term [distributed representation][1], which refers to the
13 | idea of representing information with relatively dense vectors as opposed to a one-hot
14 | representation.
15 |
16 | [1]: /terms/distributed-representation/
--------------------------------------------------------------------------------
/terms/latent-semantic-indexing-lsi.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Latent semantic indexing - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Latent_semantic_analysis#Latent_semantic_indexing
5 | - link_title: What is latent semantic indexing? - Search Engine Journal
6 | link_url: https://www.searchenginejournal.com/what-is-latent-semantic-indexing-seo-defined/21642/
7 | - link_title: Latent Semantic Indexing - Introduction to Information Retrieval
8 | link_url: https://nlp.stanford.edu/IR-book/html/htmledition/latent-semantic-indexing-1.html
9 | related_terms:
10 | - latent-dirichlet-allocation-lda
11 | - singular-value-decomposition-svd
12 | title: Latent Semantic Indexing (LSI)
13 | ---
14 |
--------------------------------------------------------------------------------
/terms/top-1-error-rate.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'ImageNet: what is top-1 and top-5 error rate?'
4 | link_url: https://stats.stackexchange.com/questions/156471/imagenet-what-is-top-1-and-top-5-error-rate
5 | title: Top-1 error rate
6 | ---
7 | The term *top-1 error rate* refers to a method of benchmarking
8 | machine learning models in the ImageNet
9 | Large Scale Visual Recognition Competition.
10 |
11 | The model is considered to have classified a given image correctly
12 | if the target label is the model's top prediction. This
13 | is in contrast to the [top-5 error rate](/terms/top-5-error-rate/)
14 | where the model only needs to identify the correct label in the
15 | model's top 5 predictions.
--------------------------------------------------------------------------------
/terms/sobel-filter-convolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Sobel operator - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Sobel_operator
5 | related_terms:
6 | - filter-convolution
7 | - convolution
8 | - convolutional-neural-network-cnn
9 | title: Sobel filter (convolution)
10 | ---
11 | The Sobel filter is a set of two convolution filters used to detect horizontal
12 | and vertical edges in images.
13 |
14 | The horizontal filter is
15 |
16 | $$
17 | \begin{bmatrix}
18 | 1 & 0 & -1 \\
19 | 2 & 0 & -2 \\
20 | 1 & 0 & -1
21 | \end{bmatrix}
22 | $$
23 |
24 | and the vertical filter is
25 |
26 | $$
27 | \begin{bmatrix}
28 | 1 & 2 & 1 \\
29 | 0 & 0 & 0 \\
30 | -1 & -2 & -1
31 | \end{bmatrix}
32 | $$
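The sketch below applies both filters to a toy image with SciPy's 2-D convolution; the image (a simple step edge) is invented for illustration.

```python
import numpy as np
from scipy.signal import convolve2d

horizontal = np.array([[1, 0, -1],
                       [2, 0, -2],
                       [1, 0, -1]])
vertical = horizontal.T

image = np.zeros((8, 8))
image[:, 4:] = 1.0                        # step edge in the middle of the image

gx = convolve2d(image, horizontal, mode="same")
gy = convolve2d(image, vertical, mode="same")
print(np.hypot(gx, gy).round(1))          # gradient magnitude highlights the edge
```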
--------------------------------------------------------------------------------
/terms/activation-function.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Activation function - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Activation_function
5 | - link_title: Commonly used activation functions - Stanford CS231n notes
6 | link_url: http://cs231n.github.io/neural-networks-1/#actfun
7 | related_terms:
8 | - neural-network
9 | title: Activation function
10 | ---
11 | In neural networks, an activation function defines
12 | the output of a neuron.
13 |
14 | The activation function is applied to the dot product of
15 | the input to the neuron ($\mathbf x$) and the weights ($\mathbf w$), plus an optional bias term.
16 |
17 | Typically activation functions are nonlinear, as that allows the
18 | network to approximate a wider variety of functions.
--------------------------------------------------------------------------------
/terms/gram-matrix.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Gramian matrix - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Gramian_matrix
5 | - link_title: Gram Matrix - Wolfram Mathworld
6 | link_url: http://mathworld.wolfram.com/GramMatrix.html
7 | - link_title: Gram matrix - Encyclopedia of Mathematics
8 | link_url: https://www.encyclopediaofmath.org/index.php/Gram_matrix
9 | title: Gram matrix
10 | ---
11 | [Wolfram Mathworld defines Gram matrix as][1]:
12 |
13 | > Given a set $V$ of $m$ vectors (points in $\mathcal R^n$), the Gram matrix $G$ is
14 | > the matrix of all possible inner products of $V$, i.e.,
15 | > $$ g_{ij} = \mathbf v_i^T \mathbf v_j $$
16 |
17 | [1]: http://mathworld.wolfram.com/GramMatrix.html
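A direct NumPy rendering of that definition, stacking the vectors as the rows of $V$ so that the Gram matrix is $V V^T$; the vectors are arbitrary.

```python
import numpy as np

V = np.array([[1.0, 0.0, 2.0],
              [0.0, 1.0, 1.0],
              [1.0, 1.0, 0.0]])    # m = 3 vectors in R^3, one per row

G = V @ V.T                        # G[i, j] is the inner product of v_i and v_j
print(G)
print(np.allclose(G, G.T))         # True: a Gram matrix is symmetric
```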
--------------------------------------------------------------------------------
/terms/one-shot-learning.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: One-shot learning
3 | related_terms:
4 | - representation-learning
5 | references:
6 | - link_title: One Shot Learning - Convolutional Neural Networks - deeplearning.ai
7 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/gjckG/one-shot-learning
8 | ---
9 | One-shot learning refers to the problem of
10 | training a statistical model (such as a
11 | classifier) with only a single example per class.
12 |
13 | One way to build a system capable of
14 | one-shot learning is to use [representation learning][1], to learn representations or features of data
15 | that can be used to accurately classify a single example.
16 |
17 | [1]: /terms/representation-learning/
--------------------------------------------------------------------------------
/terms/tree-lstm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term
4 | Memory Networks
5 | link_url: https://arxiv.org/abs/1503.00075
6 | related_terms:
7 | - long-short-term-memory-lstm
8 | title: Tree-LSTM
9 | ---
10 | Tree-LSTMs are a variant of Long Short-Term Memory (LSTM) neural networks.
11 |
12 | A traditional LSTM is structured as a linear chain, and displays
13 | strong performance on sequence modeling tasks--such as machine translation.
14 |
15 | However, some types of data (such as text) are better represented as
16 | tree structures instead of sequences. Thus, Tree-LSTMs were
17 | [introduced by Tai, et al][1] in 2015.
18 |
19 | [1]: https://arxiv.org/abs/1503.00075
--------------------------------------------------------------------------------
/terms/lenet.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks
4 | link_url: http://cs231n.github.io/convolutional-networks/#case
5 | - link_title: Gradient-Based Learning Applied to Document Recognition
6 | link_url: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf
7 | related_terms:
8 | - convolutional-neural-network-cnn
9 | title: LeNet
10 | ---
11 | LeNet was an early convolutional neural network proposed
12 | by LeCun et al. in the paper
13 | [Gradient-Based Learning Applied to Document Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf).
14 |
15 | LeNet was designed for handwriting recognition. Many modern
16 | convolutional neural network architectures are inspired by LeNet.
--------------------------------------------------------------------------------
/terms/face-verification.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Face verification
3 | related_terms:
4 | - face-recognition
5 | - face-detection
6 | references:
7 | - link_title: What is face recognition? - Convolutional Neural Networks - deeplearning.ai
8 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/lUBYU/what-is-face-recognition
9 | ---
10 | *Face verification* is the problem of identifying
11 | whether an image belongs to a person--given
12 | the one image and one person as the input.
13 |
14 | Face verification is an easier problem than
15 | [face recognition][1] because face verification only compares
16 | a single image to one person, whereas face recognition does this
17 | for $k$ people.
18 |
19 | [1]: /terms/face-recognition/
--------------------------------------------------------------------------------
/terms/alexnet.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: ImageNet Classification with Deep Convolutional Neural Networks
4 | link_url: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf
5 | related_terms:
6 | - convolutional-neural-network-cnn
7 | title: AlexNet
8 | ---
9 | AlexNet is a convolutional neural network architecture proposed by
10 | Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012.
11 |
12 | At the time, it achieved state-of-the-art performance on
13 | the test set for the 2010 ImageNet Large Scale Visual Recognition Challenge (ILSVRC). A variant of the model won the
14 | 2012 ILSVRC with a top-5 test error rate of
15 | 15.3%--more than ten percentage points ahead of the second-place entry.
--------------------------------------------------------------------------------
/terms/weight-sharing.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Simplifying Neural Networks by Soft Weight-Sharing
4 | link_url: http://www.cs.toronto.edu/~fritz/absps/sunspots.pdf
5 | - link_title: Shared Weights - Convolutional Neural Networks - Deep Learning Tutorial
6 | link_url: http://deeplearning.net/tutorial/lenet.html#shared-weights
7 | - link_title: Soft Weight-Sharing for Neural Network Compression
8 | link_url: https://arxiv.org/abs/1702.04008
9 | related_terms:
10 | - neural-network
11 | title: Weight sharing
12 | ---
13 | In neural networks, weight sharing is a way to reduce the number of parameters while allowing
14 | for more robust feature detection. Reducing the number of parameters can be
15 | considered a form of [model compression](/terms/model-compression/).
--------------------------------------------------------------------------------
/terms/co-adaptation.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What does "co-adaptation of neurons" in a Neural network mean? - Quora
4 | link_url: https://www.quora.com/What-does-co-adaptation-of-neurons-in-a-Neural-network-mean
5 | related_terms:
6 | - dropout
7 | - regularization
8 | title: Co-adaptation
9 | ---
10 | In neural networks, co-adaptation refers to when different hidden
11 | units in a neural network have highly correlated behavior.
12 |
13 | It is better for computational efficiency and the model's ability
14 | to learn a general representation if hidden units can detect
15 | features independently of each other.
16 |
17 | A few different regularization techniques aim at reducing
18 | co-adaptation--[dropout][1] being a notable one.
19 |
20 | [1]: /terms/dropout/
--------------------------------------------------------------------------------
/terms/hypergraph.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Hypergraph - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Hypergraph
5 | - link_title: 'Learning with Hypergraphs: Clustering, Classification, Embedding'
6 | link_url: https://papers.nips.cc/paper/3128-learning-with-hypergraphs-clustering-classification-and-embedding.pdf
7 | - link_title: What are the applications of hypergraphs? - Math Overflow
8 | link_url: https://mathoverflow.net/questions/13750/what-are-the-applications-of-hypergraphs
9 | related_terms:
10 | - graph
11 | title: Hypergraph
12 | ---
13 | A hypergraph is a generalization of the [graph][1]. A graph has edges that connect
14 | pairs of vertices, but a hypergraph has hyperedges that can connect any number
15 | of vertices.
16 |
17 | [1]: /terms/graph/
--------------------------------------------------------------------------------
/terms/adagrad.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization
4 | link_url: http://jmlr.org/papers/v12/duchi11a.html
5 | - link_title: An overview of gradient descent optimization algorithms - Sebastian
6 | Ruder
7 | link_url: http://sebastianruder.com/optimizing-gradient-descent/
8 | related_terms:
9 | - stochastic-gradient-descent-sgd
10 | - momentum-optimization
11 | - learning-rate
12 | title: AdaGrad
13 | ---
14 | AdaGrad is a gradient-descent based optimization algorithm. It automatically
15 | tunes the [learning rate][1] based on its observations of the data's geometry.
16 | AdaGrad is designed to perform well with datasets that have infrequently-occurring
17 | features.
18 |
19 | [1]: /terms/learning-rate/
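A minimal sketch of the AdaGrad update rule: each parameter's step is the learning rate divided by the square root of that parameter's accumulated squared gradients, so frequently updated parameters get smaller steps. The quadratic objective and constants are invented for illustration.

```python
import numpy as np

def grad(w):
    return 2.0 * (w - np.array([1.0, -2.0]))   # gradient of a simple quadratic

w = np.zeros(2)
accum = np.zeros(2)                 # running sum of squared gradients
lr, eps = 0.5, 1e-8

for _ in range(200):
    g = grad(w)
    accum += g ** 2
    w -= lr * g / (np.sqrt(accum) + eps)   # per-parameter adaptive step size

print(w)   # approaches the minimizer [1, -2]
```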
--------------------------------------------------------------------------------
/terms/facets-tool.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Facets home page at Google PAIR
4 | link_url: https://pair-code.github.io/facets/
5 | title: Facet (visualization tool)
6 | ---
7 | Facets is a plotting and visualization tool created by
8 | the People + AI Research (PAIR) initiative at Google.
9 |
10 | Facets is broken into two tools with the following goals:
11 |
12 | - **Facets Overview** -- summarize statistics for features collected from datasets
13 | - **Facets Dive** -- explore the relationship between different features in a dataset
14 |
15 | From the Facets homepage, they state that
16 |
17 | > Success stories of (Facets) Dive include the detection of classifier failure, identification of systematic errors, evaluating ground truth and potential new signals for ranking.
--------------------------------------------------------------------------------
/terms/he-initialization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Delving Deep into Rectifiers: Surpassing Human-Level Performance on
4 | ImageNet Classification'
5 | link_url: https://arxiv.org/abs/1502.01852
6 | related_terms:
7 | - symmetry-breaking
8 | - random-initialization
9 | title: He initialization
10 | ---
11 | The term *He initialization* is named after the first author of the paper
12 | "[Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](https://arxiv.org/abs/1502.01852)".
13 |
14 | He initialization initializes the bias vectors of a neural network
15 | to $0$ and the weights to random numbers drawn from a Gaussian
16 | distribution where the mean is $0$ and the standard deviation is
17 | $\sqrt{2/n_l}$, where $n_l$ is the dimension of the previous layer.
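A minimal NumPy sketch of initializing one fully connected layer this way; the layer sizes are arbitrary.

```python
import numpy as np

n_in, n_out = 512, 256                  # n_l = dimension of the previous layer
rng = np.random.default_rng(0)

W = rng.normal(0.0, np.sqrt(2.0 / n_in), size=(n_in, n_out))
b = np.zeros(n_out)                     # biases initialized to 0

print(W.std())                          # close to sqrt(2 / 512)
```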
--------------------------------------------------------------------------------
/terms/leaky-relu.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Leaky ReLU
3 | related_terms:
4 | - rectified-linear-unit-relu
5 | - activation-function
6 | ---
7 | Leaky ReLU is a type of [activation function][1] that tries
8 | to solve the [Dying ReLU problem][2].
9 |
10 | A traditional rectified linear unit $f(x)$ returns 0 when $x \leq 0$.
11 | The *Dying ReLU problem* refers to when the unit gets stuck this
12 | way--always returning 0 for any input.
13 |
14 | Leaky ReLU aims to fix this by returning a small negative value
15 | proportional to the input instead of 0, as such:
16 |
17 | $$
18 | f(x) =
19 | \begin{cases}
20 | \max(0,x) & x > 0 \\
21 | \alpha x & x \leq 0
22 | \end{cases}
23 | $$
24 | where $\alpha$ is typically a small value like $\alpha = 0.0001$.
25 |
26 | [1]: /terms/activation-function/
27 | [2]: /terms/dying-relu/
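A direct NumPy translation of the piecewise definition above:

```python
import numpy as np

def leaky_relu(x, alpha=0.0001):
    return np.where(x > 0, x, alpha * x)   # x if positive, alpha * x otherwise

print(leaky_relu(np.array([-2.0, 0.0, 3.0])))   # [-0.0002  0.      3.    ]
```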
--------------------------------------------------------------------------------
/terms/resnet.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks
4 | link_url: http://cs231n.github.io/convolutional-networks/#case
5 | - link_title: Deep Residual Learning for Image Recognition
6 | link_url: https://arxiv.org/abs/1512.03385
7 | - link_title: Identity Mappings in Deep Residual Networks
8 | link_url: https://arxiv.org/abs/1603.05027
9 | related_terms:
10 | - convolutional-neural-network-cnn
11 | title: ResNet
12 | ---
13 | ResNet stands for "Residual Network" and was introduced in the paper
14 | [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385).
15 | ResNet won the [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)][1] 2015
16 | competition.
17 |
18 | [1]: http://www.image-net.org/challenges/LSVRC/
--------------------------------------------------------------------------------
/terms/adaptive-learning-rate.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Adaptive learning rate
3 | related_terms:
4 | - learning-rate
5 | - learning-rate-decay
6 | - adagrad
7 | - adam-optimizer
8 | ---
9 | The term *adaptive learning rate* refers to variants
10 | of [stochastic gradient descent][1] with learning
11 | rates that change over the course of the algorithm's
12 | execution.
13 |
14 | Allowing the learning rate to change dynamically
15 | eliminates the need to pick a "good" static learning rate,
16 | and can lead to faster training and a trained model
17 | with better performance.
18 |
19 | Some adaptive learning rate algorithms are:
20 | - [Adagrad][2]
21 | - [ADADELTA][3]
22 | - [ADAM][4]
23 |
24 | [1]: /terms/stochastic-gradient-descent-sgd/
25 | [2]: /terms/adagrad/
26 | [3]: /terms/adadelta/
27 | [4]: /terms/adam-optimizer/
28 |
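29 | As an illustrative sketch (not a reference implementation of any particular library), the Adagrad-style update below shrinks each parameter's effective learning rate as its squared gradients accumulate:
30 | 
31 | ```python
32 | import numpy as np
33 | 
34 | def adagrad_step(params, grads, accum, base_lr=0.1, eps=1e-8):
35 |     """One Adagrad-style update: per-parameter learning rates shrink over time."""
36 |     accum += grads ** 2                            # running sum of squared gradients
37 |     effective_lr = base_lr / (np.sqrt(accum) + eps)
38 |     params -= effective_lr * grads
39 |     return params, accum
40 | 
41 | params = np.array([1.0, -2.0])
42 | accum = np.zeros_like(params)
43 | for _ in range(3):
44 |     grads = 2 * params                             # gradient of f(w) = sum(w^2)
45 |     params, accum = adagrad_step(params, grads, accum)
46 | print(params)
47 | ```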
--------------------------------------------------------------------------------
/terms/model-parallelism.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What is the difference between model parallelism and data parallelism?
4 | - Quora
5 | link_url: https://www.quora.com/What-is-the-difference-between-model-parallelism-and-data-parallelism
6 | - link_title: Training with Multiple GPUs Using Model Parallelism - MXNet documentation
7 | link_url: http://mxnet.io/how_to/model_parallel_lstm.html
8 | related_terms:
9 | - data-parallelism
10 | title: Model parallelism
11 | ---
12 | Model parallelism is when a single model is split across multiple
13 | computing nodes (for example, different GPUs), with each node storing
14 | and computing a different part of the model's parameters.
15 | 
16 | In contrast to model parallelism,
17 | [data parallelism](/terms/data-parallelism/)
18 | is where each computing node holds a full copy of the model's
19 | parameters but trains on a different subset of the data.
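20 | 
21 | As a minimal sketch of the idea (assuming a machine with two GPUs, `cuda:0` and `cuda:1`, and using PyTorch purely as an example), two halves of a model can live on different devices, with activations moved between them during the forward pass:
22 | 
23 | ```python
24 | import torch
25 | import torch.nn as nn
26 | 
27 | class TwoDeviceModel(nn.Module):
28 |     """Toy model parallelism: the first layer lives on one GPU, the second on another."""
29 |     def __init__(self):
30 |         super().__init__()
31 |         self.part1 = nn.Linear(128, 64).to("cuda:0")
32 |         self.part2 = nn.Linear(64, 10).to("cuda:1")
33 | 
34 |     def forward(self, x):
35 |         h = torch.relu(self.part1(x.to("cuda:0")))
36 |         return self.part2(h.to("cuda:1"))          # move activations to the second device
37 | 
38 | model = TwoDeviceModel()
39 | logits = model(torch.randn(32, 128))
40 | ```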
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_font-style.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | FONT STYLE
11 | Docs: http://tachyons.io/docs/typography/font-style/
12 |
13 | Media Query Extensions:
14 | -ns = not-small
15 | -m = medium
16 | -l = large
17 |
18 | */
19 |
20 | .i { font-style: italic; }
21 | .fs-normal { font-style: normal; }
22 |
23 | @media #{$breakpoint-not-small} {
24 | .i-ns { font-style: italic; }
25 | .fs-normal-ns { font-style: normal; }
26 | }
27 |
28 | @media #{$breakpoint-medium} {
29 | .i-m { font-style: italic; }
30 | .fs-normal-m { font-style: normal; }
31 | }
32 |
33 | @media #{$breakpoint-large} {
34 | .i-l { font-style: italic; }
35 | .fs-normal-l { font-style: normal; }
36 | }
37 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_tables.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | TABLES
11 | Docs: http://tachyons.io/docs/elements/tables/
12 |
13 | */
14 |
15 | .collapse {
16 | border-collapse: collapse;
17 | border-spacing: 0;
18 | }
19 |
20 | .striped--light-silver:nth-child(odd) {
21 | background-color: $light-silver;
22 | }
23 |
24 | .striped--moon-gray:nth-child(odd) {
25 | background-color: $moon-gray;
26 | }
27 |
28 | .striped--light-gray:nth-child(odd) {
29 | background-color: $light-gray;
30 | }
31 |
32 | .striped--near-white:nth-child(odd) {
33 | background-color: $near-white;
34 | }
35 |
36 | .stripe-light:nth-child(odd) {
37 | background-color: $white-10;
38 | }
39 |
40 | .stripe-dark:nth-child(odd) {
41 | background-color: $black-10;
42 | }
43 |
--------------------------------------------------------------------------------
/meta/no-related.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: "Meta: Terms Without Related Terms"
4 | ---
5 |
6 | This is a list of terms that have an empty list of outbound
7 | related terms, and also is not referenced by any other term as a
8 | related term.
9 |
10 |
11 | {% capture all_related_terms_str %}{% for page in site.pages %}{% for term in page.related_terms %}/terms/{{ term }}/ {% endfor %}{% endfor %}{% endcapture %}
12 | {% assign all_related_terms = all_related_terms_str| split: " " %}
13 | {% for page in site.pages %}
14 | {% if page.url contains 'terms/' and page.related_terms == nil and page.layout != 'redirect' %}
15 | {% unless all_related_terms contains page.url %}
16 |
--------------------------------------------------------------------------------
/terms/coreference-resolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Coreference resolution
3 | related_terms:
4 | - named-entity-recognition-ner
5 | - natural-language-processing
6 | ---
7 | The Stanford NLP group [defines coreference resolution][1] as:
8 |
9 | > Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction.
10 |
11 | Coreference resolution should not be confused with
12 | [Named Entity Recognition](/terms/named-entity-recognition-ner/), which is focused on labeling
13 | sequences of text that refer to entities--but not focused
14 | on linking those entities together.
15 |
16 | [1]: https://nlp.stanford.edu/projects/coref.shtml
--------------------------------------------------------------------------------
/terms/momentum-optimization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Why Momentum Really Works - Distill
4 | link_url: http://distill.pub/2017/momentum/
5 | related_terms:
6 | - stochastic-optimization
7 | - stochastic-gradient-descent-sgd
8 | title: Momentum
9 | ---
10 | Momentum is commonly understood as a variation of [stochastic gradient descent][1],
11 | but with one important difference: stochastic gradient descent can
12 | unnecessarily oscillate, and doesn't accelerate based on the shape of the
13 | curve.
14 |
15 | In contrast, momentum can dampen oscillations and accelerate convergence.
16 |
17 | Momentum was originally [proposed in 1964 by Boris T. Polyak][2].
18 |
19 | [1]: /terms/stochastic-gradient-descent-sgd/
20 | [2]: https://www.researchgate.net/publication/243648538_Some_methods_of_speeding_up_the_convergence_of_iteration_methods
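21 | 
22 | One common form of the momentum update (sometimes called the
23 | heavy-ball method), with learning rate $\alpha$ and momentum
24 | coefficient $\beta \in [0, 1)$, is:
25 | 
26 | $$
27 | v_{t+1} = \beta v_t - \alpha \nabla f(w_t), \qquad w_{t+1} = w_t + v_{t+1}
28 | $$
29 | 
30 | With $\beta = 0$ this reduces to plain stochastic gradient descent;
31 | larger $\beta$ averages recent gradients, damping oscillations and
32 | accelerating progress along directions where gradients agree.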
--------------------------------------------------------------------------------
/terms/convex-combination.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Convex combination - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Convex_combination
5 | related_terms:
6 | - convex-optimization
7 | title: Convex combination
8 | ---
9 | A convex combination is a linear combination where all
10 | the coefficients are greater than or equal to 0 and sum to 1.
11 | 
12 | The [Convex combination Wikipedia article][1] gives the following example:
13 | 
14 | Given a finite number of points $x_1, x_2, \ldots, x_n$ in a real vector
15 | space, a convex combination of these points is a point of the form
16 | 
17 | $$
18 | a_1 x_1 + a_2 x_2 + \ldots + a_n x_n
19 | $$
20 | where all real numbers $a_i \geq 0$ and
21 | $a_1 + a_2 + \ldots + a_n = 1$.
22 |
23 | [1]: https://en.wikipedia.org/wiki/Convex_combination "Convex combination - Wikipedia"
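24 | 
25 | A minimal NumPy sketch of computing a convex combination of some example points (the points and coefficients are arbitrary):
26 | 
27 | ```python
28 | import numpy as np
29 | 
30 | def convex_combination(points, coeffs):
31 |     """Return sum_i a_i * x_i, checking that the a_i are valid convex coefficients."""
32 |     coeffs = np.asarray(coeffs, dtype=float)
33 |     assert np.all(coeffs >= 0) and np.isclose(coeffs.sum(), 1.0)
34 |     return np.einsum("i,ij->j", coeffs, np.asarray(points, dtype=float))
35 | 
36 | print(convex_combination([[0, 0], [2, 0], [0, 2]], [0.5, 0.25, 0.25]))  # [0.5 0.5]
37 | ```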
--------------------------------------------------------------------------------
/admin/config.yml:
--------------------------------------------------------------------------------
1 | backend:
2 | name: git-gateway
3 | branch: master
4 | publish_mode: editorial_workflow
5 | media_folder: images
6 | collections:
7 | - name: terms
8 | label: Term
9 | folder: terms
10 | create: true
11 | slug: "{{slug}}"
12 | fields:
13 | - label: Title
14 | name: title
15 | widget: string
16 | required: true
17 | - label: Related Terms
18 | name: related_terms
19 | widget: list
20 | - label: Reference
21 | name: references
22 | widget: list
23 | fields:
24 | - label: Link Title
25 | name: link_title
26 | widget: string
27 | - label: Link URL
28 | name: link_url
29 | widget: string
30 | - label: Definition
31 | name: body
32 | widget: markdown
33 |
--------------------------------------------------------------------------------
/terms/one-hot-encoding.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: One-hot - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/One-hot
5 | - link_title: What is one hot encoding and when is it used in data science?
6 | link_url: https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science
7 | related_terms:
8 | - distributed-representation
9 | title: One-hot encoding
10 | ---
11 | *One-hot encoding* refers to a way of transforming data into vectors
12 | where all components are 0, except for one component with a value of 1,
13 | e.g.:
14 | $$
15 | 0 = [1, 0, 0, 0, 0]^T
16 | $$
17 | $$
18 | 1 = [0, 1, 0, 0, 0]^T
19 | $$
20 | $$
21 | \ldots
22 | $$
23 | $$
24 | 4 = [0, 0, 0, 0, 1]^T
25 | $$
26 | and so on.
27 |
28 | One-hot encoding can make it easier for machine learning algorithms to
29 | manipulate and learn categorical variables.
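30 | 
31 | A minimal NumPy sketch of one-hot encoding integer class labels (the labels and number of classes are arbitrary examples):
32 | 
33 | ```python
34 | import numpy as np
35 | 
36 | def one_hot(labels, num_classes):
37 |     """Map integer labels to one-hot row vectors."""
38 |     return np.eye(num_classes, dtype=int)[labels]
39 | 
40 | print(one_hot(np.array([0, 1, 4]), num_classes=5))
41 | # [[1 0 0 0 0]
42 | #  [0 1 0 0 0]
43 | #  [0 0 0 0 1]]
44 | ```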
--------------------------------------------------------------------------------
/terms/part-of-speech-pos-tagging.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Part-of-speech tagging
4 | link_url: https://en.wikipedia.org/wiki/Part-of-speech_tagging
5 | related_terms:
6 | - natural-language-processing
7 | title: Part-of-Speech (POS) Tagging
8 | ---
9 | Part-of-Speech tagging is the process of reading
10 | natural language text and assigning parts of speech to each
11 | token.
12 |
13 | One could imagine taking in a sentence like:
14 |
15 | > The dog ran away.
16 |
17 | and creating a data structure that had the following annotations:
18 |
19 | > The*[article]* dog*[noun]* ran*[verb]* away*[adverb]*.
20 |
21 | Words can have different parts-of-speech depending on their
22 | context. For example, the word *away* can be either an [adverb
23 | or an adjective, or part of a larger phrase][1].
24 |
25 | [1]: http://www.dictionary.com/browse/away
--------------------------------------------------------------------------------
/terms/termite.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Termite: Visualization Techniques for Assessing Textual Topic Models
4 | - Stanford Visualization Group'
5 | link_url: http://vis.stanford.edu/papers/termite
6 | - link_title: Online Termite Demo
7 | link_url: http://vis.stanford.edu/topic-diagnostics/model/silverStandards/
8 | - link_title: Termite Data Server - Github
9 | link_url: https://github.com/uwdata/termite-data-server
10 | related_terms:
11 | - latent-dirichlet-allocation-lda
12 | title: Termite
13 | ---
14 | Termite is a visual analysis tool to determine the quality of topic models
15 | like [latent Dirichlet allocation](/terms/latent-dirichlet-allocation-lda/).
16 |
17 | Termite lays out document terms as a table of circles where:
18 |
19 | - rows represent document terms
20 | - columns represent topics
21 | - circular areas represent term probabilities
--------------------------------------------------------------------------------
/terms/imputation.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 6 Different Ways to Compensate for Missing Values In a Dataset (Data Imputation with examples) - Will Badr
4 | link_url: https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779
5 | - link_title: Imputation (statistics)
6 | link_url: https://en.wikipedia.org/wiki/Imputation_(statistics)
7 | - link_title: Defining, Analysing, and Implementing Imputation Techniques - Shashank Singhal
8 | link_url: https://scikit-learn.org/stable/glossary.html#term-imputation
9 | related_terms:
10 | title: Imputation
11 | ---
12 | Imputation means replacing missing data values with substitute values.
13 | There are several ways to do this, such as drawing substitute values from a random distribution to avoid bias, or
14 | replacing the missing value with the mean or median of its column.
15 |
16 |
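17 | A minimal NumPy sketch of median imputation (the data is a made-up example; `NaN` marks the missing entries):
18 | 
19 | ```python
20 | import numpy as np
21 | 
22 | X = np.array([[1.0, 2.0],
23 |               [np.nan, 4.0],
24 |               [5.0, np.nan]])
25 | 
26 | col_medians = np.nanmedian(X, axis=0)          # per-column median, ignoring NaNs
27 | X_imputed = np.where(np.isnan(X), col_medians, X)
28 | print(X_imputed)                               # [[1. 2.] [3. 4.] [5. 3.]]
29 | ```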
--------------------------------------------------------------------------------
/terms/random-initialization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Why doesn't backpropagation work when you initialize the weights the
4 | same value? -- Cross Validated
5 | link_url: https://stats.stackexchange.com/questions/45087/why-doesnt-backpropagation-work-when-you-initialize-the-weights-the-same-value
6 | - link_title: Random Initialization - Coursera Machine Learning
7 | link_url: https://www.coursera.org/learn/machine-learning/lecture/ND5G5/random-initialization
8 | related_terms:
9 | - symmetry-breaking
10 | title: Random initialization
11 | ---
12 | Random initialization refers to the practice of using random numbers
13 | to initialize the weights of a machine learning model.
14 |
15 | Random initialization is one way of performing [symmetry breaking](/terms/symmetry-breaking), which is the act of preventing all of
16 | the weights in the machine learning model from being the same.
--------------------------------------------------------------------------------
/terms/bias-variance-tradeoff.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Bias-variance tradeoff - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff
5 | related_terms:
6 | - regularization
7 | - variance
8 | - bias
9 | - supervised-learning
10 | - underfitting
11 | - overfitting
12 | title: Bias-variance tradeoff
13 | ---
14 | The bias-variance tradeoff refers to the problem of minimizing two different sources of error
15 | when training a supervised learning model:
16 |
17 | 1. **Bias** - Bias is a consistent error, possibly from the algorithm having
18 | made an incorrect assumption about the training data. Bias is often related to underfitting.
19 |
20 | 2. **Variance** - Variance comes from a high sensitivity to differences in training data.
21 | Variance is often related to overfitting.
22 |
23 | It is typically difficult to simultaneously minimize bias and variance.
--------------------------------------------------------------------------------
/terms/syntaxnet.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: "Announcing SyntaxNet: The World\u2019s Most Accurate Parser Goes Open\
4 | \ Source - Google Research Blog"
5 | link_url: https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html
6 | - link_title: Globally Normalized Transition-Based Neural Networks
7 | link_url: https://arxiv.org/abs/1603.06042
8 | related_terms:
9 | - natural-language-processing
10 | title: SyntaxNet
11 | ---
12 | SyntaxNet is a framework for natural language syntactic
13 | parsers released by Google in 2016.
14 |
15 | SyntaxNet tags words in a sentence with their syntactic part-of-speech
16 | and creates a parse tree showing dependencies between words
17 | in a sentence.
18 |
19 | Parsey McParseface is a SyntaxNet model trained on the English
20 | language. At its time of release, Parsey McParseface was the
21 | world's most accurate model of its kind.
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_white-space.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | WHITE SPACE
11 |
12 | Media Query Extensions:
13 | -ns = not-small
14 | -m = medium
15 | -l = large
16 |
17 | */
18 |
19 |
20 | .ws-normal { white-space: normal; }
21 | .nowrap { white-space: nowrap; }
22 | .pre { white-space: pre; }
23 |
24 | @media #{$breakpoint-not-small} {
25 | .ws-normal-ns { white-space: normal; }
26 | .nowrap-ns { white-space: nowrap; }
27 | .pre-ns { white-space: pre; }
28 | }
29 |
30 | @media #{$breakpoint-medium} {
31 | .ws-normal-m { white-space: normal; }
32 | .nowrap-m { white-space: nowrap; }
33 | .pre-m { white-space: pre; }
34 | }
35 |
36 | @media #{$breakpoint-large} {
37 | .ws-normal-l { white-space: normal; }
38 | .nowrap-l { white-space: nowrap; }
39 | .pre-l { white-space: pre; }
40 | }
41 |
42 |
--------------------------------------------------------------------------------
/terms/codebook-collapse.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Codebook collapse
3 | related_terms:
4 | - vector-quantized-variational-autoencoder-vqvae
5 | - codebook
6 | - mode-collapse
7 | ---
8 |
9 | **Codebook collapse** is a problem that arises when training generative machine learning models that generate outputs using a fixed-length codebook, such as the [Vector-Quantized Variational Autoencoder (VQ-VAE)][2].
10 |
11 | In ideal scenarios, the model's fixed-size codebook is large enough to create a diverse set of output values. Codebook collapse happens when the model only learns to use a few of the values in the codebook--artificially limiting the diversity of outputs that the model can generate.
12 |
13 | Codebook collapse is analogous to [mode collapse][1], another problem commonly faced when training generative models.
14 |
15 | [1]: /terms/mode-collapse
16 | [2]: /terms/vector-quantized-variational-autoencoder-vqvae
17 |
--------------------------------------------------------------------------------
/terms/margin.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Margin
3 | related_terms:
4 | - support-vector-machine-svm
5 | references:
6 | - link_title: Hard Margin - Support Vector Machines
7 | link_url: https://en.wikipedia.org/wiki/Support_vector_machine#Hard-margin
8 | ---
9 | In machine learning, a *margin* often refers to the
10 | distance between the two hyperplanes that separate linearly-separable classes of data points.
11 |
12 | ![In this [image from Wikipedia][1], the dotted lines represent the two hyperplanes dividing the white and black data points. The region between the lines is the margin.](/images/margin.png)
13 |
14 | The term is most commonly used when discussing
15 | [support vector machines][2], but often appears in
16 | other literature discussing boundaries between points in a vector space.
17 |
18 | [1]: https://en.wikipedia.org/wiki/Support_vector_machine#Hard-margin
19 | [2]: /terms/support-vector-machine-svm/
20 |
--------------------------------------------------------------------------------
/terms/latent-dirichlet-allocation-differential-evolution-ldade.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What is Wrong with Topic Modeling? (and How to Fix it Using Search-based
4 | Software Engineering)
5 | link_url: https://arxiv.org/abs/1608.08176
6 | related_terms:
7 | - latent-dirichlet-allocation-lda
8 | - differential-evolution
9 | - clustering-stability
10 | - search-based-software-engineering-sbse
11 | title: Latent Dirichlet Allocation Differential Evolution (LDADE)
12 | ---
13 | LDADE is a tool proposed by Agrawal et al. in a paper
14 | titled [What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)](https://arxiv.org/abs/1608.08176).
15 |
16 | It tunes [LDA](/terms/latent-dirichlet-allocation-lda) parameters
17 | using
18 | [Differential Evolution](/terms/differential-evolution/) to increase
19 | the [clustering stability](/terms/clustering-stability/) of standard LDA.
--------------------------------------------------------------------------------
/terms/dying-relu.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: "What is the \u201Cdying ReLU\u201D problem in neural networks? - Data\
4 | \ Science StackExchange"
5 | link_url: https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks
6 | related_terms:
7 | - leaky-relu
8 | - rectified-linear-unit-relu
9 | title: Dying ReLU
10 | ---
11 | *Dying ReLU* refers to a problem when training neural
12 | networks with [rectified linear units (ReLU)][1].
13 | The unit dies when it only outputs 0 for any given input.
14 |
15 | When training with stochastic gradient descent, the unit
16 | is not likely to return to life, and the unit will no
17 | longer be useful during training.
18 |
19 | [Leaky ReLU][2] is a variant that solves the Dying ReLU problem
20 | by returning a small value when the input $x$ is less than 0.
21 |
22 | [1]: /terms/rectified-linear-unit-relu/
23 | [2]: /terms/leaky-relu/
--------------------------------------------------------------------------------
/terms/zero-shot-learning.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What is zero-shot learning?
4 | link_url: https://www.quora.com/What-is-zero-shot-learning
5 | - link_title: Representation Learning - Deep Learning Book
6 | link_url: http://www.deeplearningbook.org/contents/representation.html
7 | related_terms:
8 | - one-shot-learning
9 | - representation-learning
10 | title: Zero-shot learning
11 | ---
12 | Ian Goodfellow in [a Quora answer][1] defines zero-shot learning as the following:
13 |
14 | > Zero-shot learning is being able to solve a task despite not having received any training examples of that task. For a concrete example, imagine recognizing a category of object in photos without ever having seen a photo of that kind of object before. If you've read a very detailed description of a cat, you might be able to tell what a cat is in a photograph the first time you see it.
15 |
16 |
17 | [1]: https://www.quora.com/What-is-zero-shot-learning
--------------------------------------------------------------------------------
/_sass/tachyons/readme.md:
--------------------------------------------------------------------------------
1 | # tachyons-sass [](https://travis-ci.org/tachyons-css/tachyons-sass)
2 |
3 | Transpiled partials for Tachyons.
4 |
5 | ## Installation
6 |
7 | ```bash
8 | npm install --save tachyons-sass
9 | ```
10 |
11 | ## Usage
12 |
13 | ```scss
14 | @import "path/to/tachyons.scss";
15 | ```
16 |
17 | ## License
18 |
19 | MIT
20 |
21 | ## Contributing
22 |
23 | 1. Fork it
24 | 2. Create your feature branch (`git checkout -b my-new-feature`)
25 | 3. Commit your changes (`git commit -am 'Add some feature'`)
26 | 4. Push to the branch (`git push origin my-new-feature`)
27 | 5. Create new Pull Request
28 |
29 | Built by [@mrmrs_](https://twitter.com/mrmrs_) & [@4lpine](https://twitter.com/4lpine).
30 |
31 | ***
32 |
33 | > This package was initially generated with [yeoman](http://yeoman.io) and the [p generator](https://github.com/johnotander/generator-p.git).
34 |
--------------------------------------------------------------------------------
/terms/bit-transparency-audio.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bit transparency (audio)
3 | references:
4 | - link_title: Bit Transparency
5 | link_url: https://www.soundonsound.com/techniques/bit-transparency
6 | ---
7 |
8 | A digital audio system satisfies **bit transparency** if audio data can pass through the system without being changed.
9 |
10 | A system can fail to be bit-transparent if it performs any type of digital signal processing--such as changing the audio's sample rate. Some audio operations--like [converting audio samples from integer to float and back][1]--can either be bit-transparent or not depending on the implementation.
11 |
12 | An audio system [can be tested][2] for bit-transparency by giving a random sequence of bits as input and testing that the output is bit-for-bit identical to the input.
13 |
14 | [1]: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html
15 | [2]: https://benchmarkmedia.com/blogs/wiki/14949565-bit-transparency
16 |
--------------------------------------------------------------------------------
/terms/sense2vec.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: sense2vec - A Fast and Accurate Method for Word Sense Disambiguation
4 | In Neural Word Embeddings
5 | link_url: https://arxiv.org/abs/1511.06388
6 | - link_title: Sense2vec with spaCy and Gensim - Explosion AI
7 | link_url: https://explosion.ai/blog/sense2vec-with-spacy
8 | related_terms:
9 | - word2vec
10 | - word-embedding
11 | - glove-word-embeddings
12 | title: sense2vec
13 | ---
14 | sense2vec refers to a system in a paper titled
15 | [sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings](https://arxiv.org/abs/1511.06388).
16 | It solves a problem with previous word embeddings like [word2vec](/terms/word2vec) and [GloVe](/terms/glove-word-embeddings/)
17 | where words of different senses (e.g. "duck" as an animal and "duck" as a verb) are represented by the
18 | same embedding.
19 |
20 | sense2vec uses word sense information to train more accurate word embeddings.
--------------------------------------------------------------------------------
/terms/adam-optimizer.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Adam: A Method for Stochastic Optimization'
4 | link_url: https://arxiv.org/abs/1412.6980
5 | - link_title: ADAM - An overview of gradient descent optimization algorithms - Sebastian
6 | Ruder
7 | link_url: http://sebastianruder.com/optimizing-gradient-descent/index.html#adam
8 | related_terms:
9 | - adagrad
10 | - rmsprop
11 | - stochastic-optimization
12 | - stochastic-gradient-descent-sgd
13 | title: ADAM Optimizer
14 | ---
15 | ADAM, or **Ada**ptive **M**oment Estimation, is a stochastic optimization
16 | method [introduced by Diederik P. Kingma and Jimmy Lei Ba][5].
17 |
18 | They intended to combine the advantages of [Adagrad][1]'s
19 | handling of sparse [gradients][3] and [RMSProp][2]'s handling
20 | of [non-stationary environments][4].
21 |
22 | [1]: /terms/adagrad/
23 | [2]: /terms/rmsprop/
24 | [3]: /terms/gradient/
25 | [4]: /terms/stationary-environment/
26 | [5]: https://arxiv.org/abs/1412.6980
--------------------------------------------------------------------------------
/terms/symmetry-breaking.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Why doesn't backpropagation work when you initialize the weights the
4 | same value? -- Cross Validated
5 | link_url: https://stats.stackexchange.com/questions/45087/why-doesnt-backpropagation-work-when-you-initialize-the-weights-the-same-value
6 | - link_title: Random Initialization - Coursera Machine Learning
7 | link_url: https://www.coursera.org/learn/machine-learning/lecture/ND5G5/random-initialization
8 | title: Symmetry breaking
9 | ---
10 | Symmetry breaking refers to a requirement when initializing machine
11 | learning models like neural networks.
12 |
13 | When some machine learning models have weights all initialized
14 | to the same value, it can be difficult or impossible for the
15 | weights to differ as the model is trained. This is the "symmetry".
16 |
17 | Initializing the model to small random values breaks the symmetry
18 | and allows different weights to learn independently of each other.
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_outlines.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | OUTLINES
11 |
12 | Media Query Extensions:
13 | -ns = not-small
14 | -m = medium
15 | -l = large
16 |
17 | */
18 |
19 | .outline { outline: 1px solid; }
20 | .outline-transparent { outline: 1px solid transparent; }
21 | .outline-0 { outline: 0; }
22 |
23 | @media #{$breakpoint-not-small} {
24 | .outline-ns { outline: 1px solid; }
25 | .outline-transparent-ns { outline: 1px solid transparent; }
26 | .outline-0-ns { outline: 0; }
27 | }
28 |
29 | @media #{$breakpoint-medium} {
30 | .outline-m { outline: 1px solid; }
31 | .outline-transparent-m { outline: 1px solid transparent; }
32 | .outline-0-m { outline: 0; }
33 | }
34 |
35 | @media #{$breakpoint-large} {
36 | .outline-l { outline: 1px solid; }
37 | .outline-transparent-l { outline: 1px solid transparent; }
38 | .outline-0-l { outline: 0; }
39 | }
40 |
--------------------------------------------------------------------------------
/all.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: All Words
4 | ---
5 | {% assign terms = site.pages | where_exp:"item","item.url contains 'terms/'" %}
6 | {% for term in terms %}
7 | {% if term.title %}
8 | {% assign link_name = term.url | split: "/" | last %}
9 |
10 | {% if term.layout == "redirect" %}
11 | {% assign destination_url = "/terms/" | append: term.destination | append: "/" %}
12 | {% assign destination = terms | where_exp: "item", "item.url == destination_url" | first %}
13 | See {{ destination.title }}.
14 | {% else %}
15 | {{ term.content | markdownify | replace: 'href="/terms/', 'href="#' }}
16 | {% include synonyms.html page=term %}
17 | {% include related_terms.html page=term local=true %}
18 | {% include references.html page=term %}
19 | {% endif %}
20 |
21 | {% endif %}
22 | {% endfor %}
--------------------------------------------------------------------------------
/terms/inverted-dropout.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Analysis of Dropout
4 | link_url: https://pgaleone.eu/deep-learning/regularization/2017/01/10/anaysis-of-dropout/
5 | related_terms:
6 | - regularization
7 | - model-averaging
8 | - dropout
9 | title: Inverted dropout
10 | ---
11 | Inverted dropout is a variant of the original [dropout](/terms/dropout)
12 | technique developed by Hinton et al.
13 |
14 | Just like traditional dropout, inverted dropout randomly
15 | keeps each activation with a "keep probability" $p$ and sets
16 | the rest to zero.
17 | 
18 | The one difference is that, during the training of a neural
19 | network, inverted dropout scales the kept activations by the
20 | inverse of the keep probability, $1/p$.
21 | 
22 | This keeps the expected magnitude of the activations unchanged,
23 | and does not require any changes to the network during
24 | evaluation.
25 |
26 | In contrast, traditional dropout requires scaling to be implemented
27 | during the test phase.
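28 | 
29 | A minimal NumPy sketch of inverted dropout applied to a vector of activations during training (the keep probability is a made-up example):
30 | 
31 | ```python
32 | import numpy as np
33 | 
34 | def inverted_dropout(activations, keep_prob=0.8, rng=None):
35 |     """Zero out activations with probability 1 - keep_prob, scale survivors by 1 / keep_prob."""
36 |     rng = rng or np.random.default_rng(0)
37 |     mask = rng.random(activations.shape) < keep_prob
38 |     return activations * mask / keep_prob
39 | 
40 | a = np.ones(10)
41 | print(inverted_dropout(a))  # surviving entries equal 1 / 0.8 = 1.25, the rest are 0
42 | # At evaluation time the activations are used unchanged -- no rescaling is needed.
43 | ```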
--------------------------------------------------------------------------------
/terms/siamese-neural-network.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Siamese neural network
3 | related_terms:
4 | - neural-network
5 | - representation-learning
6 | - one-shot-learning
7 | - triplet-loss
8 | references:
9 | - link_title: Siamese network - Convolutional Neural Networks - deeplearning.ai
10 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/bjhmj/siamese-network
11 | ---
12 | A *Siamese neural network* is a neural network architecture that runs two pieces of data through identical neural networks, and then the outputs are fed to a loss function measuring similarity between outputs.
13 |
14 | Siamese neural networks are a common model architecture for [one-shot learning][1].
15 |
16 | For example, a Siamese neural network might be used to train a model to measure similarity between two different images, for the purpose of identifying whether the images are of the same object, but without training on many examples of that object.
17 |
18 | [1]: /terms/one-shot-learning
--------------------------------------------------------------------------------
/terms/unk.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: UNK
3 | ---
4 | UNK, unk, and `<unk>` are variants of a symbol used in natural language processing
5 | and machine translation to indicate an out-of-vocabulary word.
6 |
7 | Many language models do calculations upon representations of
8 | the $n$ most frequent words in the corpus. Words that are less
9 | frequent are replaced with the `<unk>` symbol.
10 |
11 | This is what such a transformation might look like. The below
12 | is an example of a source document in a corpus with
13 | common English words.
14 |
15 |
16 | > Today I'll bake; tomorrow I'll brew,
17 | > Then I'll fetch the queen's new child,
18 | > It is good that no one knows,
19 | > **Rumpelstiltskin** is my name.
20 |
21 | Every word in the above quote is common in English, except for
22 | Rumpelstiltskin, which is replaced as following:
23 |
24 | > Today I'll bake; tomorrow I'll brew,
25 | > Then I'll fetch the queen's new child,
26 | > It is good that no one knows,
27 | > **<unk>** is my name.
28 |
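29 | A minimal Python sketch of this replacement, keeping only the `vocab_size` most frequent tokens (the toy corpus and vocabulary size are made up):
30 | 
31 | ```python
32 | from collections import Counter
33 | 
34 | corpus = "today i will bake tomorrow i will brew".split()
35 | vocab_size = 2
36 | vocab = {word for word, _ in Counter(corpus).most_common(vocab_size)}
37 | 
38 | # Words outside the top-vocab_size vocabulary become "<unk>".
39 | replaced = [word if word in vocab else "<unk>" for word in corpus]
40 | print(replaced)  # ['<unk>', 'i', 'will', '<unk>', '<unk>', 'i', 'will', '<unk>']
41 | ```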
--------------------------------------------------------------------------------
/terms/pseudo-labeling.md:
--------------------------------------------------------------------------------
1 | ---
2 | related_terms:
3 | - semi-supervised-learning
4 | title: Pseudo-labeling
5 | ---
6 |
7 | **Pseudo-labeling** is when:
8 |
9 | 1. A machine learning model is trained on a labeled training set.
10 | 2. The model is used to compute predicted labels against unlabeled data.
11 | 3. The model is retrained from a new dataset that adds the data with predicted labels to the training set.
12 |
13 | Pseudo-labeling can sometimes be very effective in improving a machine learning model's accuracy. The underlying theory is that pseudo-labeling can make it easier for a classification model to learn more precise boundaries between classes. However, in order for pseudo-labeling to work, the original training set must be large enough--and representative of all classes--for the model's predicted labels to be reasonably accurate.
14 |
15 | On the other hand, if the training set is already very large relative to the number of parameters in the model, then pseudo-labeling may be unnecessary.
16 |
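17 | A minimal sketch of the three steps above, using scikit-learn's `LogisticRegression` as an example classifier (the arrays are placeholders for real data):
18 | 
19 | ```python
20 | import numpy as np
21 | from sklearn.linear_model import LogisticRegression
22 | 
23 | # 1. Train on the labeled data.
24 | X_labeled = np.array([[0.0], [0.2], [0.9], [1.1]])
25 | y_labeled = np.array([0, 0, 1, 1])
26 | model = LogisticRegression().fit(X_labeled, y_labeled)
27 | 
28 | # 2. Predict labels for the unlabeled data.
29 | X_unlabeled = np.array([[0.1], [1.0]])
30 | pseudo_labels = model.predict(X_unlabeled)
31 | 
32 | # 3. Retrain on the combined dataset.
33 | X_combined = np.vstack([X_labeled, X_unlabeled])
34 | y_combined = np.concatenate([y_labeled, pseudo_labels])
35 | model = LogisticRegression().fit(X_combined, y_combined)
36 | ```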
--------------------------------------------------------------------------------
/acronyms.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: "List of Acronyms"
4 | ---
5 |
6 | This page contains alphabetically-sorted links to all terms with well-known acronyms.
7 |
--------------------------------------------------------------------------------
/terms/convex-hull.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Convex hull - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Convex_hull
5 | - link_title: Convex set - Wikipedia
6 | link_url: https://en.wikipedia.org/wiki/Convex_set
7 | - link_title: Convex combination - Wikipedia
8 | link_url: https://en.wikipedia.org/wiki/Convex_combination
9 | related_terms:
10 | - convex-combination
11 | - affine-space
12 | title: Convex hull
13 | ---
14 | The convex hull of a set $X$ in an affine space over the reals is the smallest
15 | [convex set][1] that contains $X$. When the points are two dimensional,
16 | the convex hull can be thought of as the rubber band around the points of $X$.
17 |
18 | As per Wikipedia, a [convex set][1] is a set that is closed
19 | under convex combinations of its points.
20 |
21 | A [convex combination][2] is a linear combination where
22 | all the coefficients are greater than 0 and all sum to 1.
23 |
24 | [1]: https://en.wikipedia.org/wiki/Convex_set
25 | [2]: /terms/convex-combination/
--------------------------------------------------------------------------------
/terms/neural-turing-machine-ntm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Neural Turing Machine - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Neural_Turing_machine
5 | - link_title: Neural Turing Machines
6 | link_url: https://arxiv.org/pdf/1410.5401.pdf
7 | related_terms:
8 | - recurrent-neural-network
9 | - neural-network
10 | - long-short-term-memory
11 | title: Neural Turing Machine (NTM)
12 | ---
13 | Neural Turing Machines (NTM) consist of an RNN (commonly with LSTM cells) and a memory bank that the neural network can read from and write to. Because each operation of the NTM is differentiable, it can be trained efficiently with gradient descent.
14 | 
15 | The main idea of the NTM is to use the memory bank -- a large, addressable memory -- to give the RNN an external memory that it can read from and write to, yielding a practical mechanism for learning programs. The NTM has been shown to be able to infer simple algorithms, such as copying, sorting, and associative recall, from input and output examples.
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_word-break.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | WORD BREAK
11 |
12 | Base:
13 | word = word-break
14 |
15 | Media Query Extensions:
16 | -ns = not-small
17 | -m = medium
18 | -l = large
19 |
20 | */
21 |
22 | .word-normal { word-break: normal; }
23 | .word-wrap { word-break: break-all; }
24 | .word-nowrap { word-break: keep-all; }
25 |
26 | @media #{$breakpoint-not-small} {
27 | .word-normal-ns { word-break: normal; }
28 | .word-wrap-ns { word-break: break-all; }
29 | .word-nowrap-ns { word-break: keep-all; }
30 | }
31 |
32 | @media #{$breakpoint-medium} {
33 | .word-normal-m { word-break: normal; }
34 | .word-wrap-m { word-break: break-all; }
35 | .word-nowrap-m { word-break: keep-all; }
36 | }
37 |
38 | @media #{$breakpoint-large} {
39 | .word-normal-l { word-break: normal; }
40 | .word-wrap-l { word-break: break-all; }
41 | .word-nowrap-l { word-break: keep-all; }
42 | }
43 |
44 |
--------------------------------------------------------------------------------
/terms/vector-quantized-variational-autoencoder-vqvae.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: "Vector-Quantized Variational Autoencoders (VQ-VAE)"
3 | references:
4 | - link_title: "Understanding VQ-VAE (DALL-E Explained Pt. 1) - Machine Learning @ Berkeley"
5 | link_url: https://ml.berkeley.edu/blog/posts/vq-vae/
6 | - link_title: "Neural Discrete Representation Learning"
7 | link_url: https://arxiv.org/abs/1711.00937
8 | related_terms:
9 | - codebook
10 | - variational-autoencoder-vae
11 | - generative-adversarial-network-gan
12 | ---
13 |
14 | The **Vector-Quantized Variational Autoencoder (VQ-VAE)** is a type of [variational autoencoder][1] where the autoencoder's encoder neural network emits discrete--not continuous--values by mapping the encoder's embedding values to a fixed number of [codebook][2] values.
15 |
16 | The VQ-VAE was originally introduced in the [Neural Discrete Representation Learning][3] paper from Google.
17 |
18 | [1]: /terms/variational-autoencoder-vae
19 | [2]: /terms/codebook
20 | [3]: https://arxiv.org/abs/1711.00937
21 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_text-align.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | TEXT ALIGN
11 | Docs: http://tachyons.io/docs/typography/text-align/
12 |
13 | Base
14 | t = text-align
15 |
16 | Modifiers
17 | l = left
18 | r = right
19 | c = center
20 |
21 | Media Query Extensions:
22 | -ns = not-small
23 | -m = medium
24 | -l = large
25 |
26 | */
27 |
28 | .tl { text-align: left; }
29 | .tr { text-align: right; }
30 | .tc { text-align: center; }
31 |
32 | @media #{$breakpoint-not-small} {
33 | .tl-ns { text-align: left; }
34 | .tr-ns { text-align: right; }
35 | .tc-ns { text-align: center; }
36 | }
37 |
38 | @media #{$breakpoint-medium} {
39 | .tl-m { text-align: left; }
40 | .tr-m { text-align: right; }
41 | .tc-m { text-align: center; }
42 | }
43 |
44 | @media #{$breakpoint-large} {
45 | .tl-l { text-align: left; }
46 | .tr-l { text-align: right; }
47 | .tc-l { text-align: center; }
48 | }
49 |
50 |
--------------------------------------------------------------------------------
/terms/mean-reciprocal-rank-mrr.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Chapter 14: Question Answering and Information Retrieval - Speech and Language Processing'
4 | link_url: https://web.stanford.edu/~jurafsky/slp3/14.pdf
5 | - link_title: Mean reciprocal rank - Wikipedia
6 | link_url: https://en.wikipedia.org/wiki/Mean_reciprocal_rank
7 | title: Mean Reciprocal Rank (MRR)
8 | ---
9 | $\newcommand{\Correctrank}{\mathrm{rank}}$
10 |
11 | Mean Reciprocal Rank is a measure to evaluate systems that return
12 | a ranked list of answers to queries.
13 |
14 | For a single query, the *reciprocal rank* is
15 | $\frac 1 \Correctrank$ where $\Correctrank$ is the position of the
16 | highest-ranked correct answer ($1, 2, 3, \ldots, N$ for $N$ answers returned
17 | in a query). If no correct answer was returned in the query, then the reciprocal
18 | rank is 0.
19 |
20 | For multiple queries $Q$, the Mean Reciprocal Rank is the mean
21 | of the $Q$ reciprocal ranks.
22 |
23 | $$\mathrm{MRR} = \frac 1 Q \sum_{i=1}^{Q} \frac 1 {\Correctrank_i}$$
24 |
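25 | A minimal Python sketch of this formula, where each query is represented by the rank of its highest-ranked correct answer (`None` meaning no correct answer was returned; the ranks are made-up examples):
26 | 
27 | ```python
28 | def mean_reciprocal_rank(ranks):
29 |     """ranks: position of the first correct answer per query, or None if absent."""
30 |     reciprocal = [1.0 / r if r is not None else 0.0 for r in ranks]
31 |     return sum(reciprocal) / len(reciprocal)
32 | 
33 | print(mean_reciprocal_rank([1, 3, None, 2]))  # (1 + 1/3 + 0 + 1/2) / 4 ≈ 0.458
34 | ```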
--------------------------------------------------------------------------------
/terms/vggnet.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Very Deep Convolutional Networks for Large-Scale Visual Recognition
4 | - Department of Engineering Science, University of Oxford
5 | link_url: http://www.robots.ox.ac.uk/~vgg/research/very_deep/
6 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks
7 | link_url: http://cs231n.github.io/convolutional-networks/#case
8 | - link_title: Very Deep Convolutional Networks for Large-Scale Image Recognition
9 | link_url: https://arxiv.org/abs/1409.1556
10 | related_terms:
11 | - convolutional-neural-network-cnn
12 | title: VGGNet
13 | ---
14 | VGGNet is a deep convolutional neural network
15 | for image recognition, trained by
16 | the Visual Geometry Group (VGG) at the University of Oxford.
17 |
18 | VGGNet helped the VGG team secure the [first place
19 | in Localization and second place in Classification][1]
20 | in the 2014 ImageNet Large Scale Visual Recognition Competition.
21 |
22 | [1]: http://www.image-net.org/challenges/LSVRC/2014/results#clsloc
--------------------------------------------------------------------------------
/terms/catastrophic-forgetting.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Overcoming catastrophic forgetting in neural networks
4 | link_url: https://arxiv.org/abs/1612.00796
5 | - link_title: Catastrophic interference - Wikipedia
6 | link_url: https://en.wikipedia.org/wiki/Catastrophic_interference
7 | - link_title: Catastrophic forgetting - Standout Publishing
8 | link_url: http://standoutpublishing.com/g/catastrophic-forgetting.html
9 | - link_title: 'Catastrophic Interference in Connectionist Networks: The Sequential
10 | Learning Problem'
11 | link_url: http://www.sciencedirect.com/science/article/pii/S0079742108605368
12 | title: Catastrophic forgetting
13 | ---
14 | Catastrophic forgetting (or catastrophic interference) is a problem
15 | in machine learning where a model forgets an existing learned pattern
16 | when learning a new one.
17 |
18 | The model uses the same parameters to recognize both patterns,
19 | and learning the second pattern overwrites the parameters'
20 | configuration from having learned the first pattern.
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_background-size.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | BACKGROUND SIZE
11 | Docs: http://tachyons.io/docs/themes/background-size/
12 |
13 | Media Query Extensions:
14 | -ns = not-small
15 | -m = medium
16 | -l = large
17 |
18 | */
19 |
20 | /*
21 | Often used in combination with background image set as an inline style
22 | on an html element.
23 | */
24 |
25 | .cover { background-size: cover!important; }
26 | .contain { background-size: contain!important; }
27 |
28 | @media #{$breakpoint-not-small} {
29 | .cover-ns { background-size: cover!important; }
30 | .contain-ns { background-size: contain!important; }
31 | }
32 |
33 | @media #{$breakpoint-medium} {
34 | .cover-m { background-size: cover!important; }
35 | .contain-m { background-size: contain!important; }
36 | }
37 |
38 | @media #{$breakpoint-large} {
39 | .cover-l { background-size: cover!important; }
40 | .contain-l { background-size: contain!important; }
41 | }
42 |
--------------------------------------------------------------------------------
/terms/connectionist-temporal-classification-ctc.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Connectionist Temporal Classification: Labelling Unsegmented Sequence
4 | Data with Recurrent Neural Networks'
5 | link_url: http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf
6 | related_terms:
7 | - recurrent-neural-network
8 | - temporal-classification
9 | title: Connectionist Temporal Classification (CTC)
10 | ---
11 | *Connectionist Temporal Classification* is a term coined in a paper by
12 | Graves et al. titled [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks][1].
13 |
14 | It refers to the use of [recurrent neural networks][2]
15 | (which is a form of [connectionism][3]) for the purpose of
16 | labeling unsegmented data sequences (AKA [temporal classification][4]).
17 |
18 | [1]: http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf
19 | [2]: /terms/recurrent-neural-network/
20 | [3]: /terms/connectionism/
21 | [4]: /terms/temporal-classification/
--------------------------------------------------------------------------------
/terms/language-segmentation.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Language segmentation
3 | related_terms:
4 | - natural-language-processing
5 | ---
6 | This phrase is most concisely described in [this work by David Alfter][1]:
7 |
8 | > Language segmentation consists in finding the boundaries where one
9 | > language ends and another language begins in a text written in more than one language.
10 | > This is important for all natural language processing tasks.
11 | >
12 | > [...]
13 | >
14 | > One important point that has to be borne in mind is the difference between language
15 | > identification and language segmentation. Language identification is concerned with recognizing
16 | > the language at hand. It is possible to use language identification for language segmentation.
17 | > Indeed, by identifying the languages in a text, the segmentation is implicitly obtained.
18 | > Language segmentation on the other hand is only concerned with identifying language
19 | > boundaries. No claims about the languages involved are made.
20 |
21 | [1]: https://arxiv.org/abs/1510.01717
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_clears.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | CLEARFIX
11 | http://tachyons.io/docs/layout/clearfix/
12 |
13 | */
14 |
15 | /* Nicolas Gallaghers Clearfix solution
16 | Ref: http://nicolasgallagher.com/micro-clearfix-hack/ */
17 |
18 | .cf:before,
19 | .cf:after { content: " "; display: table; }
20 | .cf:after { clear: both; }
21 | .cf { *zoom: 1; }
22 |
23 | .cl { clear: left; }
24 | .cr { clear: right; }
25 | .cb { clear: both; }
26 | .cn { clear: none; }
27 |
28 | @media #{$breakpoint-not-small} {
29 | .cl-ns { clear: left; }
30 | .cr-ns { clear: right; }
31 | .cb-ns { clear: both; }
32 | .cn-ns { clear: none; }
33 | }
34 |
35 | @media #{$breakpoint-medium} {
36 | .cl-m { clear: left; }
37 | .cr-m { clear: right; }
38 | .cb-m { clear: both; }
39 | .cn-m { clear: none; }
40 | }
41 |
42 | @media #{$breakpoint-large} {
43 | .cl-l { clear: left; }
44 | .cr-l { clear: right; }
45 | .cb-l { clear: both; }
46 | .cn-l { clear: none; }
47 | }
48 |
--------------------------------------------------------------------------------
/terms/padding-convolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Padding (convolution)
3 | related_terms:
4 | - zero-padding
5 | - convolution
6 | - convolutional-neural-network-cnn
7 | ---
8 | Padding is a preprocessing step before a convolution operation.
9 |
10 | When we [convolve][1] a $n \times n$ image with an $f \times f$ filter
11 | and a stride length of $1$,
12 | the output is a matrix of dimension $(n - f + 1) \times (n - f + 1)$.
13 |
14 | For deep convolutional neural networks that apply many convolutions,
15 | this repeated shrinking would cause the feature maps to become
16 | too small.
17 |
18 | Additionally, values in the middle of the input matrix have a greater
19 | influence on the output than values on the edges.
20 |
21 | There are several different methods for choosing what values to pad an input
22 | matrix with:
23 |
24 | - [Zero-padding][2] -- padding with zeroes
25 | - Repeating the nearest border values as values for padding
26 | - Using values from the opposite side of the matrix as padding values
27 |
28 | [1]: /terms/convolution/
29 | [2]: /terms/zero-pad/
30 |
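31 | A minimal NumPy sketch of zero-padding an input and checking the resulting output size for a convolution with stride 1 (the sizes are made-up examples):
32 | 
33 | ```python
34 | import numpy as np
35 | 
36 | n, f, pad = 6, 3, 1
37 | image = np.arange(n * n, dtype=float).reshape(n, n)
38 | 
39 | padded = np.pad(image, pad_width=pad, mode="constant")   # zero-padding
40 | print(padded.shape)                                      # (8, 8)
41 | 
42 | # Output size for stride 1: n + 2 * pad - f + 1 on each side.
43 | print(n + 2 * pad - f + 1)                               # 6, i.e. "same" output size
44 | ```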
--------------------------------------------------------------------------------
/terms/batch-normalization.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: 'Batch Normalization: Accelerating Deep Network Training by Reducing
4 | Internal Covariate Shift'
5 | link_url: https://arxiv.org/abs/1502.03167
6 | - link_title: "Batch Normalization\u200A\u2014\u200AWhat the hey?"
7 | link_url: https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b
8 | - link_title: Why does batch normalization help? - Quora
9 | link_url: https://www.quora.com/Why-does-batch-normalization-help
10 | - link_title: Understanding the backward pass through Batch Normalization Layer
11 | link_url: http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html
12 | title: Batch normalization
13 | ---
14 | Batch normalization is a technique used to improve the stability and speed of training deep neural networks. It works by normalizing each layer's inputs over the current mini-batch (to zero mean and unit variance) and then scaling and shifting them with learned parameters, which allows the network to learn more effectively. Batch normalization has been shown to improve training times, accuracy, and robustness of deep neural networks.
15 |
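16 | A minimal NumPy sketch of the batch-normalization forward pass for one layer during training (gamma, beta, and the batch are made-up examples; the running statistics used at evaluation time are omitted):
17 | 
18 | ```python
19 | import numpy as np
20 | 
21 | def batch_norm_forward(x, gamma, beta, eps=1e-5):
22 |     """Normalize each feature over the mini-batch, then scale and shift."""
23 |     mean = x.mean(axis=0)                    # per-feature mean over the batch
24 |     var = x.var(axis=0)                      # per-feature variance over the batch
25 |     x_hat = (x - mean) / np.sqrt(var + eps)  # zero mean, unit variance
26 |     return gamma * x_hat + beta              # learned scale and shift
27 | 
28 | x = np.random.default_rng(0).normal(size=(32, 4))     # batch of 32, 4 features
29 | out = batch_norm_forward(x, gamma=np.ones(4), beta=np.zeros(4))
30 | print(out.mean(axis=0).round(3), out.std(axis=0).round(3))
31 | ```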
--------------------------------------------------------------------------------
/terms/distance-metric.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Distance metric
3 | references:
4 | - link_title: Metric (mathematics)
5 | link_url: https://en.wikipedia.org/wiki/Metric_(mathematics)
6 | ---
7 | As per Wikipedia, a distance metric, metric, or distance
8 | function, "is a function that defines a distance between each pair of elements of a set."
9 |
10 | A distance metric $d(\cdot)$ requires the following four axioms to be true
11 | for all elements $x$, $y$, and $z$ in a given set.
12 |
13 | - **Non-negativity:** $d(x, y) \geq 0$ -- The distance must always be
14 | greater than or equal to zero.
15 | - **Identity of indiscernibles:** $d(x, y) = 0 \Leftrightarrow x = y$ -- The distance must be zero for two elements that are the same (i.e. indiscernible from each other).
16 | - **Symmetry:** $d(x,y) = d(y,x)$ -- The distances must be the same, no matter which order the parameters are given.
17 | - **Triangle inequality:** $d(x,z) \leq d(x,y) + d(y,z)$ -- For three elements in the set, the sum of the distances for any two pairs must be greater than or equal to the distance for the remaining pair.
18 |
19 |
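20 | A small NumPy check of these axioms for the Euclidean distance on a few example points:
21 | 
22 | ```python
23 | import numpy as np
24 | 
25 | def d(x, y):
26 |     """Euclidean distance, which satisfies all four metric axioms."""
27 |     return np.linalg.norm(np.asarray(x) - np.asarray(y))
28 | 
29 | x, y, z = [0.0, 0.0], [3.0, 4.0], [6.0, 8.0]
30 | assert d(x, y) >= 0                          # non-negativity
31 | assert d(x, x) == 0                          # identity of indiscernibles
32 | assert d(x, y) == d(y, x)                    # symmetry
33 | assert d(x, z) <= d(x, y) + d(y, z)          # triangle inequality
34 | ```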
--------------------------------------------------------------------------------
/terms/yolo-object-detection.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: YOLO (object detection algorithm)
3 | related_terms:
4 | - computer-vision
5 | - convolutional-neural-network-cnn
6 | - object-detection
7 | - object-localization
8 | references:
9 | - link_title: "You Only Look Once: Realtime Object detection"
10 | link_url: https://pjreddie.com/media/files/papers/yolo.pdf
11 | - link_title: YOLO project homepage
12 | link_url: https://pjreddie.com/darknet/yolo/
13 | ---
14 |
15 | *YOLO* (an acronym standing for the phrase "You Only Look Once")
16 | refers to a fast object detection algorithm. Previous attempts
17 | at building object detection algorithms involved running
18 | [object detectors][1] or [object localizers][2] multiple times over
19 | a single image.
20 |
21 | Instead of needing multiple executions over a single image, YOLO
22 | detects objects by sending the image through a single forward
23 | pass of a [convolutional neural network][3].
24 |
25 | [1]: /terms/object-detection/
26 | [2]: /terms/object-localization/
27 | [3]: /terms/convolutional-neural-network-cnn/
28 |
--------------------------------------------------------------------------------
/terms/bag-of-n-grams.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Bag-of-n-grams
3 | related_terms:
4 | - bag-of-words
5 | ---
6 | A bag-of-$n$-grams model is a way to represent a document,
7 | similar to a [bag-of-words](/terms/bag-of-words/) model.
8 |
9 | A bag-of-$n$-grams model represents a text document as
10 | an unordered collection of its $n$-grams.
11 |
12 | For example, let's use the following phrase and divide
13 | it into bi-grams ($n = 2$).
14 |
15 | > James is the best person ever.
16 |
17 | becomes
18 |
19 | - `James is`
20 | - `is the`
21 | - `the best`
22 | - `best person`
23 | - `person ever.`
24 | 
25 | 
26 | In a typical bag-of-$n$-grams model,
27 | these 5 bigrams would be
28 | a sample from a large number of bigrams observed in a corpus.
29 | And then *James is the best person ever.* would be encoded
30 | in a representation showing which of the corpus's bigrams
31 | were observed in the sentence.
32 |
33 | A bag-of-$n$-grams model has the simplicity of the bag-of-words
34 | model, but allows the preservation of more word locality
35 | information.
36 |
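37 | A minimal Python sketch of splitting a sentence into bigrams, as in the example above:
38 | 
39 | ```python
40 | def ngrams(tokens, n=2):
41 |     """Return the list of n-grams (as strings) in a token sequence."""
42 |     return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
43 | 
44 | tokens = "James is the best person ever.".split()
45 | print(ngrams(tokens, n=2))
46 | # ['James is', 'is the', 'the best', 'best person', 'person ever.']
47 | ```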
--------------------------------------------------------------------------------
/terms/facet-plotting.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Facet (plotting)
4 | link_url: http://www.cookbook-r.com/Graphs/Facets_(ggplot2)
5 | - link_title: Plotting multiple groups with facets in ggplot2
6 | link_url: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html
7 | title: Facet
8 | related_terms:
9 | - facets-tool
10 | ---
11 |
12 | In statistical plotting, a facet is a type of plot. Data
13 | is split into subsets and the subsets are plotted
14 | in a row or grid of subplots.
15 |
16 | The term is common among users of [ggplot2](http://ggplot2.org/),
17 | a plotting package for the
18 | [R statistical computing language](https://www.r-project.org/about.html).
19 |
20 | [Facets](/terms/facets-tool) is also the name of a plotting and
21 | visualization tool created by the People + AI Research (PAIR)
22 | initiative at Google.
23 |
24 | ![This is a facet wrap as generated by the R package `ggplot2`. This image comes from [Plotting multiple groups with facets in ggplot2][1].](/images/faceting.png)
25 |
26 | [1]: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html
--------------------------------------------------------------------------------
/terms/weak-supervision.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Weak supervision
3 | references:
4 | - link_title: "Weak Supervision: A New Programming Paradigm for Machine Learning - The Stanford AI Lab Blog"
5 | link_url: https://ai.stanford.edu/blog/weak-supervision/
6 | - link_title: "Snorkel and The Dawn of Weakly Supervised Machine Learning - Stanford DAWN"
7 | link_url: https://dawn.cs.stanford.edu/2017/05/08/snorkel/
8 | - link_title: "Snuba: Automating Weak Supervision to Label Training Data - Stanford University"
9 | link_url: http://www.vldb.org/pvldb/vol12/p223-varma.pdf
10 | - link_title: "Weak supervision - Wikipedia"
11 | link_url: https://en.wikipedia.org/wiki/Weak_supervision
12 | related_terms:
13 | - active-learning
14 | - semi-supervised-learning
15 | - transfer-learning
16 | ---
17 | **Weak supervision** describes the use of noisy or error-prone data labels for training supervised learning models.
18 |
19 | It can be expensive or impractical to obtain highly accurate labels for a large dataset. Weak supervision trades label quality for quantity: it uses a larger number of cheaper, less accurate labels instead.
20 |
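21 | One common weak-supervision pattern is to write several noisy, heuristic
22 | *labeling functions* and combine their votes. The sketch below combines
23 | votes with a simple majority; systems such as Snorkel instead learn a
24 | label model that weights the functions. All names and heuristics here are
25 | made up for illustration:
26 |
27 | ```python
28 | from collections import Counter
29 |
30 | ABSTAIN = None
31 |
32 | # Heuristic labeling functions: each may be wrong, and each may abstain.
33 | def lf_mentions_refund(text):
34 |     return "complaint" if "refund" in text.lower() else ABSTAIN
35 |
36 | def lf_has_exclamation(text):
37 |     return "complaint" if "!" in text else ABSTAIN
38 |
39 | def lf_short_message(text):
40 |     return "other" if len(text.split()) < 4 else ABSTAIN
41 |
42 | def weak_label(text, lfs):
43 |     """Combine noisy votes by majority; abstain if no function fires."""
44 |     votes = [lf(text) for lf in lfs]
45 |     votes = [v for v in votes if v is not ABSTAIN]
46 |     return Counter(votes).most_common(1)[0][0] if votes else ABSTAIN
47 |
48 | lfs = [lf_mentions_refund, lf_has_exclamation, lf_short_message]
49 | unlabeled = ["I want a refund!", "Thanks a lot", "Where is my refund"]
50 | print([weak_label(t, lfs) for t in unlabeled])
51 | ```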
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_position.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | POSITIONING
11 | Docs: http://tachyons.io/docs/layout/position/
12 |
13 | Media Query Extensions:
14 | -ns = not-small
15 | -m = medium
16 | -l = large
17 |
18 | */
19 |
20 | .static { position: static; }
21 | .relative { position: relative; }
22 | .absolute { position: absolute; }
23 | .fixed { position: fixed; }
24 |
25 | @media #{$breakpoint-not-small} {
26 | .static-ns { position: static; }
27 | .relative-ns { position: relative; }
28 | .absolute-ns { position: absolute; }
29 | .fixed-ns { position: fixed; }
30 | }
31 |
32 | @media #{$breakpoint-medium} {
33 | .static-m { position: static; }
34 | .relative-m { position: relative; }
35 | .absolute-m { position: absolute; }
36 | .fixed-m { position: fixed; }
37 | }
38 |
39 | @media #{$breakpoint-large} {
40 | .static-l { position: static; }
41 | .relative-l { position: relative; }
42 | .absolute-l { position: absolute; }
43 | .fixed-l { position: fixed; }
44 | }
45 |
--------------------------------------------------------------------------------
/terms/neural-checklist-model.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Globally Coherent Text Generation with Neural Checklist Models
4 | link_url: https://homes.cs.washington.edu/~yejin/Papers/emnlp16_neuralchecklist.pdf
5 | related_terms:
6 | - recurrent-neural-network
7 | - attention-neural-networks
8 | title: Neural checklist model
9 | ---
10 | Neural checklist models were introduced in the paper [Globally Coherent Text Generation with Neural Checklist Models](https://homes.cs.washington.edu/~yejin/Papers/emnlp16_neuralchecklist.pdf) by Kiddon et al.
11 |
12 | A neural checklist model is a recurrent neural network that tracks an agenda of text strings that should be mentioned in the output.
13 |
14 | This technique allows the neural checklist model to generate *globally coherent* text, as opposed to text from traditional RNNs that is only locally coherent.
15 |
16 | The original paper applies the neural checklist model
17 | to recipe generation and to dialogue system responses,
18 | settings where there is a pre-existing notion of all
19 | the topics that should appear in the generated text.
--------------------------------------------------------------------------------
/_sass/tachyons/license.txt:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2015 @mrmrs (mrmrs.io)
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
6 |
7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
8 |
9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_text-decoration.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | TEXT DECORATION
11 | Docs: http://tachyons.io/docs/typography/text-decoration/
12 |
13 |
14 | Media Query Extensions:
15 | -ns = not-small
16 | -m = medium
17 | -l = large
18 |
19 | */
20 |
21 | .strike { text-decoration: line-through; }
22 | .underline { text-decoration: underline; }
23 | .no-underline { text-decoration: none; }
24 |
25 |
26 | @media #{$breakpoint-not-small} {
27 | .strike-ns { text-decoration: line-through; }
28 | .underline-ns { text-decoration: underline; }
29 | .no-underline-ns { text-decoration: none; }
30 | }
31 |
32 | @media #{$breakpoint-medium} {
33 | .strike-m { text-decoration: line-through; }
34 | .underline-m { text-decoration: underline; }
35 | .no-underline-m { text-decoration: none; }
36 | }
37 |
38 | @media #{$breakpoint-large} {
39 | .strike-l { text-decoration: line-through; }
40 | .underline-l { text-decoration: underline; }
41 | .no-underline-l { text-decoration: none; }
42 | }
43 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_line-height.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | LINE HEIGHT / LEADING
11 | Docs: http://tachyons.io/docs/typography/line-height
12 |
13 | Media Query Extensions:
14 | -ns = not-small
15 | -m = medium
16 | -l = large
17 |
18 | */
19 |
20 | .lh-solid { line-height: $line-height-solid; }
21 | .lh-title { line-height: $line-height-title; }
22 | .lh-copy { line-height: $line-height-copy; }
23 |
24 | @media #{$breakpoint-not-small} {
25 | .lh-solid-ns { line-height: $line-height-solid; }
26 | .lh-title-ns { line-height: $line-height-title; }
27 | .lh-copy-ns { line-height: $line-height-copy; }
28 | }
29 |
30 | @media #{$breakpoint-medium} {
31 | .lh-solid-m { line-height: $line-height-solid; }
32 | .lh-title-m { line-height: $line-height-title; }
33 | .lh-copy-m { line-height: $line-height-copy; }
34 | }
35 |
36 | @media #{$breakpoint-large} {
37 | .lh-solid-l { line-height: $line-height-solid; }
38 | .lh-title-l { line-height: $line-height-title; }
39 | .lh-copy-l { line-height: $line-height-copy; }
40 | }
41 |
42 |
--------------------------------------------------------------------------------
/terms/distributed-representation.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Distributed Representation - Directory of Cognitive Science
4 | link_url: http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/D/distributed.html
5 | - link_title: Local and distributed representations - Programming Methods for Cognitive
6 | Science
7 | link_url: http://www.indiana.edu/~gasser/Q530/Notes/representation.html
8 | related_terms:
9 | - word-embedding
10 | title: Distributed representation
11 | ---
12 | In machine learning, data with a *local representation* typically has 1 unit per element.
13 | A 5-word vocabulary might be defined by a 5-dimensional vector, with
14 | $[1, 0, 0, 0, 0]^T$ denoting the first word, $[0, 1, 0, 0, 0]^T$ denoting the second word,
15 | and so forth.
16 |
17 | Distributed representations are the opposite: instead of concentrating the meaning
18 | of a data point in a single component or "element", the meaning of the
19 | data is spread across the whole vector.
20 |
21 | The word that is $[1, 0, 0, 0, 0]^T$ in a local representation might look like
22 | $[-0.150, -0.024, -0.233, -0.253, -0.183]^T$ in a distributed representation.
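23 |
24 | A minimal numpy sketch of the contrast (the embedding values below are
25 | random placeholders; in practice they would be learned):
26 |
27 | ```python
28 | import numpy as np
29 |
30 | vocab = ["the", "cat", "sat", "on", "mat"]
31 |
32 | # Local (one-hot) representation: one unit per word, a single 1 per vector.
33 | one_hot = np.eye(len(vocab))
34 | print(one_hot[1])     # "cat" -> [0. 1. 0. 0. 0.]
35 |
36 | # Distributed representation: meaning is spread across every dimension.
37 | rng = np.random.default_rng(0)
38 | embeddings = rng.normal(size=(len(vocab), 5))
39 | print(embeddings[1])  # "cat" -> a dense 5-dimensional vector
40 | ```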
--------------------------------------------------------------------------------
/terms/word2vec.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Efficient Estimation of Word Representations in Vector Space
4 | link_url: https://arxiv.org/abs/1301.3781
5 | - link_title: Vector Representations of Words - Tutorials - TensorFlow Documentation
6 | link_url: https://www.tensorflow.org/tutorials/word2vec
7 | related_terms:
8 | - word-embedding
9 | - skip-gram
10 | - continuous-bag-of-words-cbow
11 | - doc2vec
12 | title: word2vec
13 | ---
14 | `word2vec` refers to a pair of models, open-source software, and pre-trained word embeddings
15 | from Google.
16 |
17 | The models are:
18 |
19 | - [skip-gram](/terms/skip-gram/), which uses a word to predict the surrounding $n$ words
20 | - [continuous-bag-of-words (CBOW)](/terms/continuous-bag-of-words-cbow), which uses the surrounding
21 |   $n$ context words to predict the center word
22 |
23 | The original paper is titled [Efficient Estimation of Word Representations in
24 | Vector Space](https://arxiv.org/abs/1301.3781) by Mikolov et al.
25 |
26 | The source code was originally hosted on
27 | [Google Code](https://code.google.com/p/word2vec) but is now
28 | located [on Github](https://github.com/tmikolov/word2vec).
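29 |
30 | A minimal sketch of training word2vec-style embeddings with the `gensim`
31 | library (not Google's original C implementation); the parameter names
32 | below assume gensim 4.x, and the toy corpus is made up:
33 |
34 | ```python
35 | from gensim.models import Word2Vec
36 |
37 | # Each "document" is a list of tokens.
38 | sentences = [
39 |     ["the", "cat", "sat", "on", "the", "mat"],
40 |     ["the", "dog", "sat", "on", "the", "rug"],
41 | ]
42 |
43 | # sg=1 selects the skip-gram model; sg=0 would select CBOW.
44 | model = Word2Vec(sentences, vector_size=25, window=2, min_count=1, sg=1, epochs=50)
45 |
46 | print(model.wv["cat"])               # the learned embedding for "cat"
47 | print(model.wv.most_similar("cat"))  # nearest neighbours in embedding space
48 | ```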
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_utilities.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | UTILITIES
11 |
12 | Media Query Extensions:
13 | -ns = not-small
14 | -m = medium
15 | -l = large
16 |
17 | */
18 |
19 | /* Equivalent to .overflow-y-scroll */
20 | .overflow-container {
21 | overflow-y: scroll;
22 | }
23 |
24 | .center {
25 | margin-right: auto;
26 | margin-left: auto;
27 | }
28 |
29 | .mr-auto { margin-right: auto; }
30 | .ml-auto { margin-left: auto; }
31 |
32 | @media #{$breakpoint-not-small}{
33 | .center-ns {
34 | margin-right: auto;
35 | margin-left: auto;
36 | }
37 | .mr-auto-ns { margin-right: auto; }
38 | .ml-auto-ns { margin-left: auto; }
39 | }
40 |
41 | @media #{$breakpoint-medium}{
42 | .center-m {
43 | margin-right: auto;
44 | margin-left: auto;
45 | }
46 | .mr-auto-m { margin-right: auto; }
47 | .ml-auto-m { margin-left: auto; }
48 | }
49 |
50 | @media #{$breakpoint-large}{
51 | .center-l {
52 | margin-right: auto;
53 | margin-left: auto;
54 | }
55 | .mr-auto-l { margin-right: auto; }
56 | .ml-auto-l { margin-left: auto; }
57 | }
58 |
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_vertical-align.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | VERTICAL ALIGN
11 |
12 | Media Query Extensions:
13 | -ns = not-small
14 | -m = medium
15 | -l = large
16 |
17 | */
18 |
19 | .v-base { vertical-align: baseline; }
20 | .v-mid { vertical-align: middle; }
21 | .v-top { vertical-align: top; }
22 | .v-btm { vertical-align: bottom; }
23 |
24 | @media #{$breakpoint-not-small} {
25 | .v-base-ns { vertical-align: baseline; }
26 | .v-mid-ns { vertical-align: middle; }
27 | .v-top-ns { vertical-align: top; }
28 | .v-btm-ns { vertical-align: bottom; }
29 | }
30 |
31 | @media #{$breakpoint-medium} {
32 | .v-base-m { vertical-align: baseline; }
33 | .v-mid-m { vertical-align: middle; }
34 | .v-top-m { vertical-align: top; }
35 | .v-btm-m { vertical-align: bottom; }
36 | }
37 |
38 | @media #{$breakpoint-large} {
39 | .v-base-l { vertical-align: baseline; }
40 | .v-mid-l { vertical-align: middle; }
41 | .v-top-l { vertical-align: top; }
42 | .v-btm-l { vertical-align: bottom; }
43 | }
44 |
--------------------------------------------------------------------------------
/terms/sequence-to-sequence-learning-seq2seq.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Sequence-to-Sequence Models - TensorFlow Tutorials
4 | link_url: https://www.tensorflow.org/tutorials/seq2seq
5 | - link_title: Sequence to Sequence Learning with Neural Networks
6 | link_url: https://arxiv.org/abs/1409.3215
7 | related_terms:
8 | - long-short-term-memory-lstm
9 | - recurrent-neural-network
10 | title: Sequence to Sequence Learning (seq2seq)
11 | ---
12 | This typically refers to the method originally described by Sutskever et al. in the paper
13 | [Sequence to Sequence Learning with Neural Networks][1].
14 |
15 | Feedforward neural networks and many other models can learn complex patterns, but require fixed-length
16 | input. This makes it difficult for these models to handle variable-length sequences. To solve this,
17 | the authors used one [LSTM](/terms/long-short-term-memory-lstm/) to read the input sequence
18 | and a second LSTM to generate the output sequence.
19 |
20 | A few potential applications of sequence to sequence learning include:
21 |
22 | - Machine translation
23 | - Text summarization
24 | - Speech-to-text conversion
25 |
26 | [1]: https://arxiv.org/abs/1409.3215
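27 |
28 | Below is a minimal sketch of this encoder-decoder structure using
29 | `tf.keras`. The vocabulary sizes, layer width, and training setup are
30 | illustrative assumptions, not the exact configuration from the paper:
31 |
32 | ```python
33 | import tensorflow as tf
34 | from tensorflow.keras import layers
35 |
36 | num_encoder_tokens, num_decoder_tokens, latent_dim = 1000, 1000, 256
37 |
38 | # Encoder: read the variable-length input sequence, keep only its final state.
39 | encoder_inputs = layers.Input(shape=(None, num_encoder_tokens))
40 | _, state_h, state_c = layers.LSTM(latent_dim, return_state=True)(encoder_inputs)
41 |
42 | # Decoder: generate the output sequence, initialized with the encoder's state.
43 | decoder_inputs = layers.Input(shape=(None, num_decoder_tokens))
44 | decoder_lstm = layers.LSTM(latent_dim, return_sequences=True)
45 | decoder_outputs = decoder_lstm(decoder_inputs, initial_state=[state_h, state_c])
46 | decoder_outputs = layers.Dense(num_decoder_tokens, activation="softmax")(decoder_outputs)
47 |
48 | model = tf.keras.Model([encoder_inputs, decoder_inputs], decoder_outputs)
49 | model.compile(optimizer="adam", loss="categorical_crossentropy")
50 | model.summary()
51 | ```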
--------------------------------------------------------------------------------
/terms/same-convolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool
4 | of tensorflow?
5 | link_url: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t
6 | related_terms:
7 | - convolution
8 | - convolutional-neural-network-cnn
9 | - padding-convolution
10 | title: Same convolution
11 | ---
12 | A *same convolution* is a type of convolution where the output
13 | matrix is of the same dimension as the input matrix.
14 |
15 | For an $n \times n$ input matrix $A$ and an $f \times f$ filter matrix $F$,
16 | the output of the convolution $A * F$ has dimensions
17 | $\left( \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \right) \times \left( \left\lfloor \frac{n + 2p - f}{s} \right\rfloor + 1 \right)$
18 | where $s$ is the stride length and
19 | $p$ is the amount of padding.
20 |
21 | In a same convolution:
22 |
23 | - $s$ is typically set to $1$
24 | - $p$ is set to $\frac{f - 1}{2}$
25 | - $f$ is an odd number
26 |
27 | The result is that $A$ is padded to be $(n + 2p) \times (n + 2p)$
28 | and $A * F$ comes out as $n \times n$ -- the same as the original
29 | dimensions of $A$.
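30 |
31 | This can be checked numerically with scipy's 2-D convolution, whose
32 | `mode="same"` option pads the input so the output keeps the input's shape
33 | (the array contents below are arbitrary):
34 |
35 | ```python
36 | import numpy as np
37 | from scipy.signal import convolve2d
38 |
39 | n, f = 6, 3                           # input size and (odd) filter size
40 | A = np.random.rand(n, n)
41 | F = np.random.rand(f, f)
42 |
43 | out = convolve2d(A, F, mode="same")   # zero-pads by p = (f - 1) / 2 per side
44 | print(A.shape, out.shape)             # (6, 6) (6, 6)
45 | ```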
--------------------------------------------------------------------------------
/terms/valid-convolution.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool
4 | of tensorflow?
5 | link_url: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t
6 | related_terms:
7 | - convolution
8 | - same-convolution
9 | - convolutional-neural-network-cnn
10 | - padding-convolution
11 | title: Valid convolution
12 | ---
13 | A *valid convolution* is a type of [convolution][1] operation that does not use any [padding][2] on the input.
14 |
15 | For an $n \times n$ input matrix and an $f \times f$ filter, a valid convolution
16 | will return an output matrix of dimensions
17 |
18 | $$
19 | \left \lfloor \frac{n - f}{s} + 1 \right \rfloor \times
20 | \left \lfloor \frac{n - f}{s} + 1 \right \rfloor
21 | $$
22 |
23 | where $s$ is the [stride][3] length of the convolution.
24 |
25 | This is in contrast to a [same convolution][4], which pads the
26 | $n \times n$ input matrix such that the output matrix is also $n
27 | \times n$.
28 |
29 | [1]: /terms/convolution/
30 | [2]: /terms/padding-convolution/
31 | [3]: /terms/stride-convolution/
32 | [4]: /terms/same-convolution/
33 |
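34 | The same numerical check for a valid convolution: scipy's `mode="valid"`
35 | uses no padding, so with stride $s = 1$ a $6 \times 6$ input and a
36 | $3 \times 3$ filter produce a $4 \times 4$ output, matching the formula
37 | above:
38 |
39 | ```python
40 | import numpy as np
41 | from scipy.signal import convolve2d
42 |
43 | n, f = 6, 3
44 | A = np.random.rand(n, n)
45 | F = np.random.rand(f, f)
46 |
47 | out = convolve2d(A, F, mode="valid")  # no padding, stride 1
48 | print(out.shape)                      # (4, 4), i.e. (n - f + 1, n - f + 1)
49 | ```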
--------------------------------------------------------------------------------
/terms/anchor-box.md:
--------------------------------------------------------------------------------
1 | ---
2 | title: Anchor box
3 | related_terms:
4 | - yolo-object-detection
5 | - computer-vision
6 | - convolutional-neural-network-cnn
7 | - bounding-box
8 | references:
9 | - link_title: Anchor Boxes - Convolutional Neural Networks - deeplearning.ai
10 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/yNwO0/anchor-boxes
11 | ---
12 | *Anchor boxes* are a technique used in some [computer vision][4]
13 | [object detection][3] algorithms to help identify objects of different shapes.
14 |
15 | Anchor boxes are hand-picked boxes of different height/width ratios
16 | (for 2-dimensional boxes) designed to match the relative ratios of
17 | the object classes being detected. For example, an object detector
18 | that detects cars and people may have a wide anchor box to detect
19 | cars and a tall, narrow box to detect people.
20 |
21 | The YOLOv2 ([YOLO9000][1]) paper introduced the idea of using
22 | [$k$-means clustering][2] to automatically determine appropriate
23 | anchor box dimensions for a given number $k$ of anchor boxes.
24 |
25 | [1]: /terms/yolo-object-detection/
26 | [2]: /terms/k-means-clustering/
27 | [3]: /terms/object-detection/
28 | [4]: /terms/computer-vision/
29 |
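30 | A minimal sketch of clustering ground-truth box dimensions to choose
31 | anchors: YOLOv2 uses an IoU-based distance, whereas this simplified
32 | version clusters raw (width, height) pairs with ordinary $k$-means, and
33 | the box sizes are made up:
34 |
35 | ```python
36 | import numpy as np
37 | from sklearn.cluster import KMeans
38 |
39 | # (width, height) of ground-truth boxes, in pixels.
40 | boxes = np.array([
41 |     [180, 60], [200, 70], [160, 55],   # wide boxes (e.g. cars)
42 |     [40, 120], [35, 140], [45, 110],   # tall, narrow boxes (e.g. people)
43 | ])
44 |
45 | k = 2  # number of anchor boxes to pick
46 | kmeans = KMeans(n_clusters=k, n_init=10, random_state=0).fit(boxes)
47 | print(kmeans.cluster_centers_)  # each row is one anchor's (width, height)
48 | ```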
--------------------------------------------------------------------------------
/terms/out-of-core.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Out-of-core algorithm
4 | link_url: https://en.wikipedia.org/wiki/Out-of-core_algorithm
5 | - link_title: Scaling with instances using out-of-core learning - scikit-learn documentation
6 | link_url: http://scikit-learn.org/stable/modules/scaling_strategies.html#scaling-with-instances-using-out-of-core-learning
7 | title: Out-of-core
8 | ---
9 | The term *out-of-core* typically refers to processing data that is too large
10 | to fit into a computer's main memory.
11 |
12 | Typically, when a dataset fits neatly into a computer's main memory,
13 | randomly accessing sections of data has a (relatively) small performance
14 | penalty.
15 |
16 | When data must be stored in a medium like a large spinning hard drive
17 | or an external computer network, it becomes very expensive to randomly
18 | seek to an arbitrary section of data or to process the same data
19 | multiple times.
20 |
21 | In such a case, an out-of-core algorithm tries to access all relevant
22 | data in a single sequential pass.
23 |
24 | However, modern computers have a deep memory hierarchy, and replacing
25 | random access with sequential access can increase performance even
26 | on datasets that fit within memory.
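27 |
28 | In scikit-learn, out-of-core learning typically means streaming chunks of
29 | data into an estimator's `partial_fit` method. The sketch below simulates
30 | the data stream with randomly generated chunks:
31 |
32 | ```python
33 | import numpy as np
34 | from sklearn.linear_model import SGDClassifier
35 |
36 | classes = np.array([0, 1])   # every possible label must be declared up front
37 | clf = SGDClassifier()
38 | rng = np.random.default_rng(0)
39 |
40 | # Pretend each chunk was read from disk or a network stream.
41 | for _ in range(100):
42 |     X_chunk = rng.normal(size=(500, 20))
43 |     y_chunk = (X_chunk[:, 0] > 0).astype(int)
44 |     clf.partial_fit(X_chunk, y_chunk, classes=classes)
45 |
46 | X_test = rng.normal(size=(1000, 20))
47 | y_test = (X_test[:, 0] > 0).astype(int)
48 | print(clf.score(X_test, y_test))
49 | ```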
--------------------------------------------------------------------------------
/terms/rectified-linear-unit-relu.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Why do we use ReLU in neural networks and how do we use it?
4 | link_url: https://stats.stackexchange.com/questions/226923/why-do-we-use-relu-in-neural-networks-and-how-do-we-use-it
5 | - link_title: What are the advantages of ReLU over sigmoid function in deep neural
6 | networks?
7 | link_url: https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks
8 | related_terms:
9 | - activation-function
10 | - neural-network
11 | title: Rectified Linear Unit (ReLU)
12 | ---
13 | A Rectified Linear Unit is a common name for a neuron (the "unit")
14 | with an activation function of $f(x) = \max(0,x)$.
15 |
16 | Neural networks built with ReLU have the following advantages:
17 |
18 | - [gradient][1] computation is simpler because the activation
19 | function is computationally cheaper than comparable activation
20 | functions such as $\tanh(x)$.
21 | - Neural networks with ReLU are less susceptible to
22 | the [vanishing gradient problem][2] but may suffer from
23 | the [dying ReLU problem][3].
24 |
25 | [1]: /terms/gradient/
26 | [2]: /terms/vanishing-gradient-problem/
27 | [3]: /terms/dying-relu/
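28 |
29 | Both the activation and its gradient reduce to a single comparison, as in
30 | this small numpy sketch (the subgradient at $x = 0$ is taken to be $0$ by
31 | convention here):
32 |
33 | ```python
34 | import numpy as np
35 |
36 | def relu(x):
37 |     """f(x) = max(0, x), applied elementwise."""
38 |     return np.maximum(0.0, x)
39 |
40 | def relu_grad(x):
41 |     """Derivative: 1 where x > 0, otherwise 0."""
42 |     return (x > 0).astype(float)
43 |
44 | x = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
45 | print(relu(x))       # [0.  0.  0.  0.5 2. ]
46 | print(relu_grad(x))  # [0. 0. 0. 1. 1.]
47 | ```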
--------------------------------------------------------------------------------
/_sass/tachyons/scss/_letter-spacing.scss:
--------------------------------------------------------------------------------
1 |
2 | // Converted Variables
3 |
4 |
5 | // Custom Media Query Variables
6 |
7 |
8 | /*
9 |
10 | LETTER SPACING
11 | Docs: http://tachyons.io/docs/typography/tracking/
12 |
13 | Media Query Extensions:
14 | -ns = not-small
15 | -m = medium
16 | -l = large
17 |
18 | */
19 |
20 | .tracked { letter-spacing: $letter-spacing-1; }
21 | .tracked-tight { letter-spacing: $letter-spacing-tight; }
22 | .tracked-mega { letter-spacing: $letter-spacing-2; }
23 |
24 | @media #{$breakpoint-not-small} {
25 | .tracked-ns { letter-spacing: $letter-spacing-1; }
26 | .tracked-tight-ns { letter-spacing: $letter-spacing-tight; }
27 | .tracked-mega-ns { letter-spacing: $letter-spacing-2; }
28 | }
29 |
30 | @media #{$breakpoint-medium} {
31 | .tracked-m { letter-spacing: $letter-spacing-1; }
32 | .tracked-tight-m { letter-spacing: $letter-spacing-tight; }
33 | .tracked-mega-m { letter-spacing: $letter-spacing-2; }
34 | }
35 |
36 | @media #{$breakpoint-large} {
37 | .tracked-l { letter-spacing: $letter-spacing-1; }
38 | .tracked-tight-l { letter-spacing: $letter-spacing-tight; }
39 | .tracked-mega-l { letter-spacing: $letter-spacing-2; }
40 | }
41 |
--------------------------------------------------------------------------------
/terms/deep-convolutional-generative-adversarial-network-dcgan.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Unsupervised Representation Learning with Deep Convolutional Generative
4 | Adversarial Networks
5 | link_url: https://arxiv.org/abs/1511.06434
6 | related_terms:
7 | - generative-adversarial-network-gan
8 | - convolutional-neural-network-cnn
9 | title: Deep Convolutional Generative Adversarial Network (DCGAN)
10 | ---
11 | DCGAN refers to a model described by [Radford, Metz, and Chintala][1]
12 | that uses deep convolutional neural networks in a generative adversarial network model.
13 |
14 | Generative adversarial networks (GANs) are structured as a competition between
15 | two models:
16 |
17 | 1. a generative model that tries to create fake examples that look like the real training data.
18 | 2. a discriminative model that tries to distinguish real examples from fake ones.
19 |
20 | DCGAN uses deep convolutional neural networks for both models. Convolutional neural networks (CNNs)
21 | are well known for their performance on image data, and DCGAN uses this strength
22 | to learn [unsupervised representations][2] of the input data.
23 |
24 | [1]: https://arxiv.org/abs/1511.06434
25 | [2]: /terms/unsupervised-learning/
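26 |
27 | For reference, this competition is commonly formalized by the minimax
28 | objective from the original GAN paper (Goodfellow et al., 2014), where
29 | $G$ is the generator, $D$ the discriminator, and $z$ a noise vector:
30 |
31 | $$
32 | \min_G \max_D \; \mathbb{E}_{x \sim p_\text{data}}[\log D(x)] +
33 | \mathbb{E}_{z \sim p_z}[\log (1 - D(G(z)))]
34 | $$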
--------------------------------------------------------------------------------
/terms/multiple-crops-at-test-time.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: State of computer vision - Convolutional Neural Networks - deeplearning.ai
4 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/D9ra2/state-of-computer-vision
5 | related_terms:
6 | - data-augmentation
7 | - alexnet
8 | title: Multiple crops at test time
9 | ---
10 | *Multi-crop at test time* is a form of data augmentation that a model uses
11 | during test time, as opposed to most data augmentation techniques
12 | that run during training time.
13 |
14 | Broadly, the technique involves:
15 |
16 | - cropping a test image in multiple ways
17 | - using the model to classify these cropped variants of the test image
18 | - averaging the results of the model's many predictions
19 |
20 | Some machine learning researchers use multiple crops at test time
21 | to improve prediction accuracy. The technique
22 | became popular among competitors in the
23 | ImageNet Large Scale Visual Recognition Challenge (ILSVRC)
24 | after the famous AlexNet paper, titled
25 | [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), used
26 | the technique.
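27 |
28 | A minimal sketch of the idea: crop the test image several ways (here the
29 | four corners and the center), classify each crop, and average the
30 | predicted probabilities. The `predict_proba` function below is a stand-in
31 | for a real classifier; AlexNet additionally averaged over horizontal
32 | flips of each crop.
33 |
34 | ```python
35 | import numpy as np
36 |
37 | def five_crops(image, size):
38 |     """Return the four corner crops and the center crop of an H x W x C image."""
39 |     h, w = image.shape[:2]
40 |     top, left = (h - size) // 2, (w - size) // 2
41 |     return [image[:size, :size], image[:size, w - size:],
42 |             image[h - size:, :size], image[h - size:, w - size:],
43 |             image[top:top + size, left:left + size]]
44 |
45 | def predict_proba(crop):
46 |     """Stand-in for a trained classifier; returns fake class probabilities."""
47 |     rng = np.random.default_rng(int(crop.sum() * 1000) % 2**32)
48 |     p = rng.random(10)
49 |     return p / p.sum()
50 |
51 | image = np.random.rand(256, 256, 3)
52 | probs = np.mean([predict_proba(c) for c in five_crops(image, 224)], axis=0)
53 | print(probs.argmax())  # class chosen after averaging over all crops
54 | ```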
--------------------------------------------------------------------------------
/terms/long-short-term-memory-lstm.md:
--------------------------------------------------------------------------------
1 | ---
2 | references:
3 | - link_title: Long Short-Term Memory - Wikipedia
4 | link_url: https://en.wikipedia.org/wiki/Long_short-term_memory
5 | - link_title: LONG SHORT-TERM MEMORY
6 | link_url: http://www.bioinf.jku.at/publications/older/2604.pdf
7 | related_terms:
8 | - recurrent-neural-network
9 | - backpropagation
10 | title: Long Short-Term Memory (LSTM)
11 | ---
12 | Long short-term memory (LSTM) networks reduce the vanishing and exploding gradient problems that arise during backpropagation in recurrent neural networks. An LSTM is an RNN in which each unit has a memory cell and three gates: an input gate, an output gate, and a forget gate. The memory cell retains information the RNN has seen previously, or discards it when needed. LSTMs are explicitly designed to avoid the long-term dependency problem in plain RNNs and have been shown to learn complex sequences better than simple RNNs.
13 |
14 | Each memory cell is controlled by its gates: the input gate determines how much information from the previous layer gets stored in the cell; the output gate determines how much of the cell's state is exposed to the next layer; and the forget gate determines what to discard from the cell's current state.
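15 |
16 | A common formulation of these gates, where $\sigma$ is the logistic
17 | sigmoid, $\odot$ is elementwise multiplication, $x_t$ is the input,
18 | $h_t$ the hidden state, and $C_t$ the cell state, is:
19 |
20 | $$
21 | \begin{aligned}
22 | f_t &= \sigma(W_f [h_{t-1}, x_t] + b_f) \\
23 | i_t &= \sigma(W_i [h_{t-1}, x_t] + b_i) \\
24 | \tilde{C}_t &= \tanh(W_C [h_{t-1}, x_t] + b_C) \\
25 | C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\
26 | o_t &= \sigma(W_o [h_{t-1}, x_t] + b_o) \\
27 | h_t &= o_t \odot \tanh(C_t)
28 | \end{aligned}
29 | $$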
--------------------------------------------------------------------------------
/meta/unfinished.html:
--------------------------------------------------------------------------------
1 | ---
2 | layout: page
3 | title: "Meta: Unfinished Terms"
4 | ---
5 |
6 | This page contains links to terms that have short amounts of content. They
7 | should be expanded and turned into full-fledged glossary entries.
8 |