├── .ruby-version ├── terms ├── graph.md ├── rmsprop.md ├── antonym.md ├── clustering.md ├── lexeme.md ├── meronym.md ├── polysemy.md ├── troponym.md ├── affine-space.md ├── covariance.md ├── learning-rate.md ├── loss-function.md ├── pertainym.md ├── rand-index.md ├── random-search.md ├── early-stopping.md ├── expectation.md ├── hyperparameter.md ├── neural-network.md ├── optimization.md ├── conditional-gan.md ├── model-averaging.md ├── multi-armed-bandit.md ├── negative-sampling.md ├── regularization.md ├── confusion-matrix.md ├── hamming-distance.md ├── hierarchical-softmax.md ├── identity-mapping.md ├── importance-sampling.md ├── model-compression.md ├── sparse-autoencoder.md ├── wronskian-matrix.md ├── collaborative-filtering.md ├── convex-optimization.md ├── mixed-membership-model.md ├── stochastic-optimization.md ├── time-delayed-signal.md ├── attention-neural-networks.md ├── cosine-similarity-distance.md ├── chinese-restaurant-process.md ├── differential-topic-modeling.md ├── hidden-markov-model-hmm.md ├── hypernym.md ├── hyponym.md ├── minimal-matching-distance.md ├── exploding-gradient-problem.md ├── moore-penrose-pseudoinverse.md ├── natural-language-processing.md ├── co-clustering.md ├── grid-search.md ├── latent-dirichlet-allocation-lda.md ├── linear-discriminant-analysis-lda.md ├── markov-chain-monte-carlo-mcmc.md ├── skip-gram.md ├── conditional-markov-models-cmms.md ├── kullback-leibler-kl-divergence.md ├── max-margin-loss.md ├── pointwise-mutual-information-pmi.md ├── principal-component-analysis-pca.md ├── q-learning.md ├── regression-based-latent-factors.md ├── singular-value-decomposition-svd.md ├── underfitting.md ├── additive-clustering.md ├── backprop.md ├── minibatching.md ├── random-forest.md ├── stacked-autoencoder.md ├── bilingual-evaluation-understudy-bleu.md ├── clustering-stability.md ├── cross-entropy-loss.md ├── denoising-autoencoder.md ├── k-means-clustering.md ├── laplacian-matrix.md ├── maximum-a-posteriori-map-estimation.md ├── probabilistic-matrix-factorization-pmf.md ├── receiver-operating-characteristic-roc.md ├── spearman-rank-correlation-coefficient.md ├── gibbs-sampling.md ├── multilayer-lstm.md ├── stochastic-gradient-variational-bayes.md ├── bidirectional-lstm.md ├── black-box-optimization.md ├── nonparametric-clustering.md ├── textual-entailment.md ├── average-pooling.md ├── feature-learning.md ├── googlenet-neural-network.md ├── helvetica-scenario.md ├── parameter-sharing.md ├── contractive-autoencoder-cae.md ├── kernel-convolution.md ├── one-dimensional-convolution.md ├── triplet-loss-function.md ├── bayesian-optimization.md ├── jacobian-matrix.md ├── multinomial-distribution.md ├── object-detection.md ├── temporal-classification.md ├── learning-rate-annealing.md ├── vanishing-gradient-problem.md ├── collaborative-topic-regression.md ├── parametric-clustering.md ├── market-basket-analysis.md ├── max-pooling.md ├── nested-chinese-restaurant-process.md ├── pitman-yor-topic-modeling-pytm.md ├── hessian-matrix.md ├── hypernetwork.md ├── recursive-neural-network.md ├── hierarchical-dirichlet-process-hdp.md ├── latent-semantic-analysis-lsa.md ├── pagerank.md ├── point-estimator.md ├── yolo9000-object-detection.md ├── dirichlet-process.md ├── time-delayed-neural-network.md ├── variational-autoencoder-vae.md ├── minibatch-gradient-descent.md ├── narrow-convolution.md ├── categorical-mixture-model-cmm.md ├── probabilistic-latent-semantic-indexing-plsi.md ├── stochastic-gradient-descent-sgd.md ├── positive-pointwise-mutual-information-ppmi.md ├── 
additive-model.md ├── bayesian-probabilistic-matrix-factorization.md ├── maximum-likelihood-estimation-mle.md ├── wide-convolution.md ├── bidirectional-recurrent-neural-network-brnn.md ├── affinity-analysis.md ├── domain-adaptation.md ├── error-correcting-tournaments.md ├── random-optimization.md ├── gaussian-mixture-model-gmm.md ├── abscissa.md ├── autocorrelation-matrix.md ├── connectionism.md ├── hierarchical-latent-dirichlet-allocation-hlda.md ├── meta-learning.md ├── stochastic-block-model-sbm.md ├── differential-evolution.md ├── fast-fourier-transform-fft.md ├── finite-state-transducer-fst.md ├── maxout-activation-function.md ├── standard-deviation.md ├── boltzmann-machine.md ├── conditional-random-fields-crf.md ├── variation-of-information-distance.md ├── affinity-propagation-clustering.md ├── nonparametric.md ├── polya-urn-model.md ├── adaboost.md ├── pixel-recurrent-neural-network.md ├── trust-region-policy-optimization.md ├── likelihood.md ├── minimum-description-length-mdl-principle.md ├── nonparametric-regression.md ├── gap-statistic.md ├── global-average-pooling-gap.md ├── sequential-pattern-mining.md ├── community-structure.md ├── generalized-additive-model-gam.md ├── graph-neural-network.md ├── covariate-shift.md ├── indian-buffet-process.md ├── accams.md ├── support-vector-machine-svm.md ├── dirichlet-multinomial-distribution.md ├── mention-ranking-coreference-model.md ├── community-detection.md ├── gradient.md ├── mutual-information.md ├── temporal-generative-adversarial-network-tgan.md ├── adversarial-autoencoder.md ├── tabu-search.md ├── binary-tree-lstm.md ├── n-ary-tree-lstm.md ├── structured-bayesian-optimization-sbo.md ├── structured-learning.md ├── child-sum-tree-lstm.md ├── policy-gradient.md ├── backpropagation-through-time-bptt.md ├── bias.md ├── dependency-tree-lstm.md ├── fast-r-cnn.md ├── k-max-pooling.md ├── constituency-tree-lstm.md ├── dynamic-k-max-pooling.md ├── stride-convolution.md ├── weighted-finite-state-transducer-wfst.md ├── abstractive-sentence-summarization.md ├── named-entity-recognition-ner.md ├── poisson-additive-co-clustering-paco.md ├── test-term.md ├── deep-learning.md ├── independent-identically-distributed-iid.md ├── doc2vec.md ├── facet.md ├── mention-pair-coreference-model.md ├── smooth-support-vector-machine-ssvm.md ├── yolov2-object-detection.md ├── association-rule-mining.md ├── reinforce-policy-gradient-algorithm.md ├── textrank.md ├── alternating-conditional-expectation-ace-algorithm.md ├── bagging.md ├── continuous-bag-of-words-cbow.md ├── expectation-maximization-em-algorithm.md ├── passive-aggressive-algorithm.md ├── decision-tree.md ├── sequential-model-based-optimization-smbo.md ├── second-order-information.md ├── boosting.md ├── contextual-bandit.md ├── perplexity.md ├── adversarial-variational-bayes.md ├── gradient-descent.md ├── nchw.md ├── transduction.md ├── adadelta.md ├── bounding-box.md ├── exponential-linear-unit-elu.md ├── gradient-clipping.md ├── multidimensional-recurrent-neural-network-mdrnn.md ├── computer-vision.md ├── learning-rate-decay.md ├── synset.md ├── face-detection.md ├── supervised-learning.md ├── unsupervised-learning.md ├── negative-log-likelihood.md ├── derivative-free-optimization.md ├── glove-word-embeddings.md ├── overfitting.md ├── representation-learning.md ├── jaccard-index.md ├── chunking.md ├── learning-to-rank-ltr.md ├── r-cnn.md ├── recurrent-neural-network-language-model-rnnlm.md ├── nhwc.md ├── fasttext.md ├── hinge-loss.md ├── dimensionality-reduction.md ├── paragraph-vector.md ├── 
backpropagation.md ├── hessian-free-optimization.md ├── meteor-machine-translation.md ├── search-based-software-engineering-sbse.md ├── no-free-lunch-nfl-theorem.md ├── softmax.md ├── top-5-error-rate.md ├── inceptionism.md ├── reinforcement-learning.md ├── hadamard-product.md ├── filter-convolution.md ├── data-parallelism.md ├── stochastic-convex-hull-sch.md ├── convolutional-neural-network-cnn.md ├── extractive-sentence-summarization.md ├── first-order-information.md ├── named-entity-recognition-in-query-nerq.md ├── parameter-budget.md ├── word2phrase.md ├── reparameterization-trick.md ├── variance.md ├── face-recognition.md ├── stationary-environment.md ├── distributional-similarity.md ├── latent-semantic-indexing-lsi.md ├── top-1-error-rate.md ├── sobel-filter-convolution.md ├── activation-function.md ├── gram-matrix.md ├── one-shot-learning.md ├── tree-lstm.md ├── lenet.md ├── face-verification.md ├── alexnet.md ├── weight-sharing.md ├── co-adaptation.md ├── hypergraph.md ├── adagrad.md ├── facets-tool.md ├── he-initialization.md ├── leaky-relu.md ├── resnet.md ├── adaptive-learning-rate.md ├── model-parallelism.md ├── coreference-resolution.md ├── momentum-optimization.md ├── convex-combination.md ├── one-hot-encoding.md ├── part-of-speech-pos-tagging.md ├── termite.md ├── imputation.md ├── random-initialization.md ├── bias-variance-tradeoff.md ├── syntaxnet.md ├── codebook-collapse.md ├── margin.md ├── latent-dirichlet-allocation-differential-evolution-ldade.md ├── dying-relu.md ├── zero-shot-learning.md ├── bit-transparency-audio.md ├── sense2vec.md ├── adam-optimizer.md ├── symmetry-breaking.md ├── inverted-dropout.md ├── siamese-neural-network.md ├── unk.md ├── pseudo-labeling.md ├── convex-hull.md ├── neural-turing-machine-ntm.md ├── vector-quantized-variational-autoencoder-vqvae.md ├── mean-reciprocal-rank-mrr.md ├── vggnet.md ├── catastrophic-forgetting.md ├── connectionist-temporal-classification-ctc.md ├── language-segmentation.md ├── padding-convolution.md ├── batch-normalization.md ├── distance-metric.md ├── yolo-object-detection.md ├── bag-of-n-grams.md ├── facet-plotting.md ├── weak-supervision.md ├── neural-checklist-model.md ├── distributed-representation.md ├── word2vec.md ├── sequence-to-sequence-learning-seq2seq.md ├── same-convolution.md ├── valid-convolution.md ├── anchor-box.md ├── out-of-core.md ├── rectified-linear-unit-relu.md ├── deep-convolutional-generative-adversarial-network-dcgan.md ├── multiple-crops-at-test-time.md └── long-short-term-memory-lstm.md ├── .gitignore ├── CONTRIBUTING.md ├── _includes ├── banners │ ├── README.md │ ├── banner_icon.html │ ├── incomplete_term.html │ ├── needs_review.html │ └── default.html ├── synonyms.html └── references.html ├── 404.md ├── images ├── faceting.png ├── margin.png ├── overfitting.png ├── hoovertowernight.jpg ├── starry_night_google.jpg └── starry_stanford_bigger.png ├── LICENSE.md ├── .gitattributes ├── _sass └── tachyons │ ├── scss │ ├── _styles.scss │ ├── _code.scss │ ├── _lists.scss │ ├── _images.scss │ ├── _debug_children.scss │ ├── _forms.scss │ ├── _debug-children.scss │ ├── _module-template.scss │ ├── _gradients.scss │ ├── _opacity.scss │ ├── _links.scss │ ├── _box-sizing.scss │ ├── _font-style.scss │ ├── _tables.scss │ ├── _white-space.scss │ ├── _outlines.scss │ ├── _word-break.scss │ ├── _text-align.scss │ ├── _background-size.scss │ ├── _clears.scss │ ├── _position.scss │ ├── _text-decoration.scss │ ├── _line-height.scss │ ├── _utilities.scss │ ├── _vertical-align.scss │ └── _letter-spacing.scss │ 
├── readme.md │ └── license.txt ├── index.md ├── netlify.toml ├── DEPENDENCIES.md ├── serve.sh ├── meta ├── needs-review.html ├── no-references.html ├── index.md ├── no-related.html └── unfinished.html ├── Gemfile ├── _build ├── entrypoint.sh └── install-git-lfs.sh ├── _layouts ├── redirect.html └── page.html ├── admin ├── index.html └── config.yml ├── all.html └── acronyms.html /.ruby-version: -------------------------------------------------------------------------------- 1 | 2.7.1 -------------------------------------------------------------------------------- /terms/graph.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Graph 3 | --- -------------------------------------------------------------------------------- /terms/rmsprop.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: RMSProp 3 | --- -------------------------------------------------------------------------------- /terms/antonym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Antonym 3 | --- 4 | -------------------------------------------------------------------------------- /terms/clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Clustering 3 | --- -------------------------------------------------------------------------------- /terms/lexeme.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Lexeme 3 | --- 4 | -------------------------------------------------------------------------------- /terms/meronym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Meronym 3 | --- 4 | -------------------------------------------------------------------------------- /terms/polysemy.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Polysemy 3 | --- 4 | -------------------------------------------------------------------------------- /terms/troponym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Troponym 3 | --- 4 | -------------------------------------------------------------------------------- /terms/affine-space.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Affine space 3 | --- -------------------------------------------------------------------------------- /terms/covariance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Covariance 3 | --- 4 | -------------------------------------------------------------------------------- /terms/learning-rate.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Learning rate 3 | --- -------------------------------------------------------------------------------- /terms/loss-function.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Loss function 3 | --- -------------------------------------------------------------------------------- /terms/pertainym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Pertainym 3 | --- 4 | -------------------------------------------------------------------------------- /terms/rand-index.md: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: Rand Index 3 | --- 4 | -------------------------------------------------------------------------------- /terms/random-search.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Random search 3 | --- -------------------------------------------------------------------------------- /terms/early-stopping.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Early stopping 3 | --- -------------------------------------------------------------------------------- /terms/expectation.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Expectation 3 | --- 4 | -------------------------------------------------------------------------------- /terms/hyperparameter.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hyperparameter 3 | --- -------------------------------------------------------------------------------- /terms/neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Neural network 3 | --- -------------------------------------------------------------------------------- /terms/optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Optimization 3 | --- 4 | -------------------------------------------------------------------------------- /terms/conditional-gan.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Conditional GAN 3 | --- 4 | -------------------------------------------------------------------------------- /terms/model-averaging.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Model averaging 3 | --- 4 | -------------------------------------------------------------------------------- /terms/multi-armed-bandit.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Multi-Armed Bandit 3 | --- -------------------------------------------------------------------------------- /terms/negative-sampling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Negative Sampling 3 | --- -------------------------------------------------------------------------------- /terms/regularization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Regularization 3 | --- 4 | -------------------------------------------------------------------------------- /terms/confusion-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Confusion matrix 3 | --- 4 | -------------------------------------------------------------------------------- /terms/hamming-distance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hamming distance 3 | --- 4 | -------------------------------------------------------------------------------- /terms/hierarchical-softmax.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hierarchical Softmax 3 | --- -------------------------------------------------------------------------------- /terms/identity-mapping.md: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: Identity mapping 3 | --- 4 | -------------------------------------------------------------------------------- /terms/importance-sampling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Importance sampling 3 | --- -------------------------------------------------------------------------------- /terms/model-compression.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Model compression 3 | --- 4 | -------------------------------------------------------------------------------- /terms/sparse-autoencoder.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Sparse autoencoder 3 | --- 4 | -------------------------------------------------------------------------------- /terms/wronskian-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Wronskian matrix 3 | --- 4 | -------------------------------------------------------------------------------- /terms/collaborative-filtering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Collaborative filtering 3 | --- -------------------------------------------------------------------------------- /terms/convex-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Convex optimization 3 | --- 4 | -------------------------------------------------------------------------------- /terms/mixed-membership-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Mixed-membership model 3 | --- -------------------------------------------------------------------------------- /terms/stochastic-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stochastic Optimization 3 | --- -------------------------------------------------------------------------------- /terms/time-delayed-signal.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Time-delayed signal 3 | --- 4 | -------------------------------------------------------------------------------- /terms/attention-neural-networks.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Attention Mechanism 3 | --- 4 | -------------------------------------------------------------------------------- /terms/cosine-similarity-distance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Cosine similarity 3 | --- 4 | -------------------------------------------------------------------------------- /terms/chinese-restaurant-process.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Chinese Restaurant Process 3 | --- -------------------------------------------------------------------------------- /terms/differential-topic-modeling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Differential Topic Modeling 3 | --- -------------------------------------------------------------------------------- /terms/hidden-markov-model-hmm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hidden 
Markov Models (HMMs) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/hypernym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hypernym 3 | related_terms: 4 | - hyponym 5 | --- 6 | -------------------------------------------------------------------------------- /terms/hyponym.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hyponym 3 | related_terms: 4 | - hypernym 5 | --- 6 | -------------------------------------------------------------------------------- /terms/minimal-matching-distance.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Minimal matching distance 3 | --- 4 | -------------------------------------------------------------------------------- /terms/exploding-gradient-problem.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Exploding gradient problem 3 | --- 4 | -------------------------------------------------------------------------------- /terms/moore-penrose-pseudoinverse.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Moore-Penrose Pseudoinverse 3 | --- 4 | -------------------------------------------------------------------------------- /terms/natural-language-processing.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Natural Language Processing 3 | --- 4 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | _site 2 | .sass-cache 3 | .jekyll-metadata 4 | .vscode 5 | .bundle 6 | .jekyll-cache -------------------------------------------------------------------------------- /terms/co-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Co-clustering 3 | related_terms: 4 | - clustering 5 | --- -------------------------------------------------------------------------------- /terms/grid-search.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Grid search 3 | related_terms: 4 | - random-search 5 | --- -------------------------------------------------------------------------------- /terms/latent-dirichlet-allocation-lda.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Latent Dirichlet allocation (LDA) 3 | --- -------------------------------------------------------------------------------- /terms/linear-discriminant-analysis-lda.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Linear discriminant analysis (LDA) 3 | --- -------------------------------------------------------------------------------- /terms/markov-chain-monte-carlo-mcmc.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Markov Chain Monte Carlo (MCMC) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/skip-gram.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Skip-Gram 3 | related_terms: 4 | - word-embedding 5 | --- 6 | -------------------------------------------------------------------------------- /terms/conditional-markov-models-cmms.md:
-------------------------------------------------------------------------------- 1 | --- 2 | title: Conditional Markov Models (CMMs) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/kullback-leibler-kl-divergence.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Kullback-Leibler (KL) divergence 3 | --- 4 | -------------------------------------------------------------------------------- /terms/max-margin-loss.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Max-margin loss 3 | related_terms: 4 | - loss-function 5 | --- -------------------------------------------------------------------------------- /terms/pointwise-mutual-information-pmi.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Pointwise Mutual Information (PMI) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/principal-component-analysis-pca.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Principal Component Analysis (PCA) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/q-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Q-learning 3 | related_terms: 4 | - reinforcement-learning 5 | --- -------------------------------------------------------------------------------- /terms/regression-based-latent-factors.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Regression-based latent factors (RLFM) 3 | --- -------------------------------------------------------------------------------- /terms/singular-value-decomposition-svd.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Singular Value Decomposition (SVD) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/underfitting.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Underfitting 3 | related_terms: 4 | - overfitting 5 | --- 6 | -------------------------------------------------------------------------------- /terms/additive-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Additive clustering 3 | related_terms: 4 | - clustering 5 | --- -------------------------------------------------------------------------------- /terms/backprop.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Backprop 3 | layout: redirect 4 | destination: backpropagation 5 | --- 6 | -------------------------------------------------------------------------------- /terms/minibatching.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Mini-Batching 3 | layout: redirect 4 | destination: minibatch-gradient-descent 5 | --- 6 | -------------------------------------------------------------------------------- /terms/random-forest.md: -------------------------------------------------------------------------------- 1 | --- 2 | related_terms: 3 | - decision-tree 4 | title: Random Forest (RF) 5 | --- 6 | -------------------------------------------------------------------------------- /terms/stacked-autoencoder.md:
-------------------------------------------------------------------------------- 1 | --- 2 | title: Stacked autoencoder 3 | related_terms: 4 | - autoencoder 5 | --- -------------------------------------------------------------------------------- /CONTRIBUTING.md: -------------------------------------------------------------------------------- 1 | # Contributing to Machine Learning Glossary 2 | 3 | (Remind James to fill this out later.) 4 | -------------------------------------------------------------------------------- /terms/bilingual-evaluation-understudy-bleu.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bilingual Evaluation Understudy (BLEU) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/clustering-stability.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Clustering stability 3 | related_terms: 4 | - clustering 5 | --- -------------------------------------------------------------------------------- /terms/cross-entropy-loss.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Cross-Entropy loss 3 | related_terms: 4 | - loss-function 5 | --- 6 | -------------------------------------------------------------------------------- /terms/denoising-autoencoder.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Denoising autoencoder 3 | related_terms: 4 | - autoencoder 5 | --- -------------------------------------------------------------------------------- /terms/k-means-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: K-Means clustering 3 | related_terms: 4 | - clustering 5 | --- 6 | -------------------------------------------------------------------------------- /terms/laplacian-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Laplacian matrix 3 | related_terms: 4 | - wronskian-matrix 5 | --- 6 | -------------------------------------------------------------------------------- /terms/maximum-a-posteriori-map-estimation.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Maximum A Posteriori (MAP) Estimation 3 | --- 4 | -------------------------------------------------------------------------------- /terms/probabilistic-matrix-factorization-pmf.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Probabilistic Matrix Factorization (PMF) 3 | --- -------------------------------------------------------------------------------- /terms/receiver-operating-characteristic-roc.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Receiver Operating Characteristic (ROC) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/spearman-rank-correlation-coefficient.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Spearman's Rank Correlation Coefficient 3 | --- 4 | -------------------------------------------------------------------------------- /_includes/banners/README.md: -------------------------------------------------------------------------------- 1 | This directory contains informational banners that are displayed 2 | above a term. 
3 | -------------------------------------------------------------------------------- /terms/gibbs-sampling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Gibbs sampling 3 | related_terms: 4 | - markov-chain-monte-carlo-mcmc 5 | --- -------------------------------------------------------------------------------- /terms/multilayer-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Multilayer LSTM 3 | related_terms: 4 | - long-short-term-memory-lstm 5 | --- -------------------------------------------------------------------------------- /terms/stochastic-gradient-variational-bayes.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stochastic Gradient Variational Bayes (SGVB) 3 | --- 4 | -------------------------------------------------------------------------------- /terms/bidirectional-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bidirectional LSTM 3 | related_terms: 4 | - long-short-term-memory-lstm 5 | --- -------------------------------------------------------------------------------- /terms/black-box-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Black-Box optimization 3 | related_terms: 4 | - optimization 5 | --- 6 | -------------------------------------------------------------------------------- /terms/nonparametric-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Nonparametric clustering 3 | related_terms: 4 | - clustering 5 | --- 6 | -------------------------------------------------------------------------------- /terms/textual-entailment.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Textual entailment 3 | layout: redirect 4 | destination: entailment 5 | --- 6 | -------------------------------------------------------------------------------- /terms/average-pooling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Average pooling 3 | related_terms: 4 | - pooling-layer 5 | - max-pooling 6 | --- 7 | -------------------------------------------------------------------------------- /terms/feature-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Feature learning 3 | layout: redirect 4 | destination: representation-learning 5 | --- -------------------------------------------------------------------------------- /terms/googlenet-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: GoogLeNet 3 | related_terms: 4 | - convolutional-neural-network-cnn 5 | --- -------------------------------------------------------------------------------- /terms/helvetica-scenario.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Helvetica scenario 3 | layout: redirect 4 | destination: mode-collapse 5 | --- 6 | -------------------------------------------------------------------------------- /terms/parameter-sharing.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Parameter sharing 3 | layout: redirect 4 | destination: weight-sharing 5 | --- 6 | 
-------------------------------------------------------------------------------- /terms/contractive-autoencoder-cae.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Contractive autoencoder (CAE) 3 | related_terms: 4 | - autoencoder 5 | --- 6 | -------------------------------------------------------------------------------- /terms/kernel-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Kernel (convolution) 3 | layout: redirect 4 | destination: filter-convolution 5 | --- 6 | -------------------------------------------------------------------------------- /terms/one-dimensional-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: One-dimensional convolution 3 | related_terms: 4 | - convolution 5 | --- 6 | -------------------------------------------------------------------------------- /terms/triplet-loss-function.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Triplet loss function 3 | layout: redirect 4 | destination: triplet-loss 5 | --- 6 | -------------------------------------------------------------------------------- /terms/bayesian-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian optimization 3 | related_terms: 4 | - optimization 5 | - hyperparameter 6 | --- -------------------------------------------------------------------------------- /terms/jacobian-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Jacobian matrix 3 | related_terms: 4 | - laplacian-matrix 5 | - wronskian-matrix 6 | --- 7 | -------------------------------------------------------------------------------- /terms/multinomial-distribution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Multinomial distribution 3 | related_terms: 4 | - multinomial-mixture-model-mmm 5 | --- -------------------------------------------------------------------------------- /terms/object-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Object detection 3 | related_terms: 4 | - object-localization 5 | - computer-vision 6 | --- 7 | -------------------------------------------------------------------------------- /terms/temporal-classification.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Temporal classification 3 | related_terms: 4 | - recurrent-neural-network 5 | --- 6 | -------------------------------------------------------------------------------- /terms/learning-rate-annealing.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Learning rate annealing 3 | layout: redirect 4 | destination: learning-rate-decay 5 | --- 6 | -------------------------------------------------------------------------------- /terms/vanishing-gradient-problem.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Vanishing gradient problem 3 | related_terms: 4 | - exploding-gradient-problem 5 | --- 6 | -------------------------------------------------------------------------------- /terms/collaborative-topic-regression.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 
Collaborative Topic Regression (CTR) 3 | related_terms: 4 | - collaborative-filtering 5 | --- -------------------------------------------------------------------------------- /terms/parametric-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Parametric clustering 3 | related_terms: 4 | - nonparametric-clustering 5 | - clustering 6 | --- 7 | -------------------------------------------------------------------------------- /terms/market-basket-analysis.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Market basket analysis 3 | related_terms: 4 | - affinity-analysis 5 | - collaborative-filtering 6 | --- -------------------------------------------------------------------------------- /terms/max-pooling.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Max Pooling 3 | related_terms: 4 | - dynamic-k-max-pooling 5 | - k-max-pooling 6 | - pooling-layer 7 | --- 8 | -------------------------------------------------------------------------------- /terms/nested-chinese-restaurant-process.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Nested Chinese Restaurant Process 3 | related_terms: 4 | - chinese-restaurant-process 5 | --- -------------------------------------------------------------------------------- /terms/pitman-yor-topic-modeling-pytm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Pitman-Yor Topic Modeling (PYTM) 3 | related_terms: 4 | - latent-dirichlet-allocation-lda 5 | --- -------------------------------------------------------------------------------- /terms/hessian-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hessian matrix 3 | related_terms: 4 | - jacobian-matrix 5 | - laplacian-matrix 6 | - wronskian-matrix 7 | --- 8 | -------------------------------------------------------------------------------- /terms/hypernetwork.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Hypernetworks 4 | link_url: https://arxiv.org/abs/1609.09106 5 | title: Hypernetwork 6 | --- 7 | -------------------------------------------------------------------------------- /terms/recursive-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Recursive Neural Network 3 | related_terms: 4 | - recurrent-neural-network 5 | - neural-network 6 | --- 7 | -------------------------------------------------------------------------------- /404.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: 404 Error 3 | layout: page 4 | --- 5 | We couldn't find the page you were looking for. 6 | 7 | [Click here](/) to return to the home page. 
-------------------------------------------------------------------------------- /images/faceting.png: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:6811ec8ed8c1ce67cd377e544d4a5f7c38d0fd50a43e928a3adb17ee13f2b70a 3 | size 27152 4 | -------------------------------------------------------------------------------- /images/margin.png: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:3006804d951bf380d18f285c19414f4e8c491e07923e4b682fbf239d0acdb85b 3 | size 79821 4 | -------------------------------------------------------------------------------- /terms/hierarchical-dirichlet-process-hdp.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hierarchical Dirichlet process (HDP) 3 | related_terms: 4 | - latent-dirichlet-allocation-lda 5 | --- -------------------------------------------------------------------------------- /terms/latent-semantic-analysis-lsa.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Latent semantic analysis (LSA) 3 | layout: redirect 4 | destination: latent-semantic-indexing-lsi 5 | --- 6 | -------------------------------------------------------------------------------- /terms/pagerank.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: PageRank - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/PageRank 5 | title: PageRank 6 | --- 7 | -------------------------------------------------------------------------------- /terms/point-estimator.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Point Estimator 3 | --- 4 | A point estimator estimates population parameters (e.g. mean, variance) with sample data. 
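A minimal sketch in Python (assuming NumPy, which is not part of this site's tooling) of two common point estimators, the sample mean and the unbiased sample variance:

```python
import numpy as np

rng = np.random.default_rng(0)
sample = rng.normal(loc=5.0, scale=2.0, size=1000)  # sample drawn from a population

mean_hat = sample.mean()      # point estimate of the population mean
var_hat = sample.var(ddof=1)  # unbiased point estimate of the population variance

# Each estimator returns a single number (a "point"), not a range or distribution.
print(mean_hat, var_hat)
```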
5 | -------------------------------------------------------------------------------- /terms/yolo9000-object-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: YOLO9000 (object detection algorithm) 3 | layout: redirect 4 | destination: yolov2-object-detection 5 | --- 6 | -------------------------------------------------------------------------------- /images/overfitting.png: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:6901a8a57a6b7abe9e89f93a9f20eaf0d3a518dda2f4a3d4d9938ad709b2b4a5 3 | size 14447 4 | -------------------------------------------------------------------------------- /terms/dirichlet-process.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Dirichlet process 3 | related_terms: 4 | - hierarchical-dirichlet-process-hdp 5 | - latent-dirichlet-allocation-lda 6 | --- -------------------------------------------------------------------------------- /terms/time-delayed-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Time-delayed neural network 3 | related_terms: 4 | - neural-network 5 | - recurrent-neural-network 6 | --- 7 | -------------------------------------------------------------------------------- /terms/variational-autoencoder-vae.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Variational Autoencoder (VAE) 3 | related_terms: 4 | - autoencoder 5 | - generative-adversarial-network-gan 6 | --- -------------------------------------------------------------------------------- /images/hoovertowernight.jpg: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:8852910ed1594af535c3dee012b020b9ee4045798645885d20a9445d12e98093 3 | size 216598 4 | -------------------------------------------------------------------------------- /images/starry_night_google.jpg: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:c31741a09141c2e2d07fc7aa6bdf135c369b804f9433e43fa3ceac19386d1ee6 3 | size 613337 4 | -------------------------------------------------------------------------------- /terms/minibatch-gradient-descent.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Minibatch Gradient Descent 3 | related_terms: 4 | - stochastic-gradient-descent-sgd 5 | - gradient-descent 6 | --- 7 | -------------------------------------------------------------------------------- /terms/narrow-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Narrow convolution 3 | related_terms: 4 | - convolutional-neural-network-cnn 5 | - convolution 6 | - zero-padding 7 | --- 8 | -------------------------------------------------------------------------------- /images/starry_stanford_bigger.png: -------------------------------------------------------------------------------- 1 | version https://git-lfs.github.com/spec/v1 2 | oid sha256:fa77fa5115d1d60be333e82a6583b7a15226137de0cd55323b6e197a4c762881 3 | size 13376089 4 | -------------------------------------------------------------------------------- /terms/categorical-mixture-model-cmm.md: 
-------------------------------------------------------------------------------- 1 | --- 2 | title: Categorical mixture model 3 | related_terms: 4 | - multinomial-mixture-model-mmm 5 | - gaussian-mixture-model-gmm 6 | --- -------------------------------------------------------------------------------- /terms/probabilistic-latent-semantic-indexing-plsi.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Probabilistic Latent Semantic Indexing (PLSI) 3 | related_terms: 4 | - latent-semantic-indexing-lsi 5 | --- -------------------------------------------------------------------------------- /terms/stochastic-gradient-descent-sgd.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stochastic Gradient Descent (SGD) 3 | related_terms: 4 | - stochastic-optimization 5 | - gradient-descent 6 | --- 7 | -------------------------------------------------------------------------------- /terms/positive-pointwise-mutual-information-ppmi.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Positive Pointwise Mutual Information (PPMI) 3 | related_terms: 4 | - pointwise-mutual-information-pmi 5 | --- 6 | -------------------------------------------------------------------------------- /terms/additive-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Additive model - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Additive_model 5 | title: Additive model 6 | --- 7 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | The content of Machine Learning Glossary is licensed under 2 | a [Creative Commons Attribution 4.0 International License][1]. 3 | 4 | [1]: https://creativecommons.org/licenses/by/4.0/ -------------------------------------------------------------------------------- /terms/bayesian-probabilistic-matrix-factorization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bayesian Probabilistic Matrix Factorization (BPMF) 3 | related_terms: 4 | - probabilistic-matrix-factorization-pmf 5 | --- -------------------------------------------------------------------------------- /terms/maximum-likelihood-estimation-mle.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Maximum Likelihood Estimation (MLE) 3 | related_terms: 4 | - maximum-a-posteriori-map-estimation 5 | - negative-log-likelihood 6 | --- -------------------------------------------------------------------------------- /terms/wide-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Wide convolution 3 | related_terms: 4 | - narrow-convolution 5 | - convolution 6 | - convolutional-neural-network-cnn 7 | - zero-padding 8 | --- 9 | -------------------------------------------------------------------------------- /terms/bidirectional-recurrent-neural-network-brnn.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bidirectional Recurrent Neural Network (BRNN) 3 | related_terms: 4 | - recurrent-neural-network 5 | - bidirectional-lstm 6 | --- -------------------------------------------------------------------------------- /.gitattributes:
-------------------------------------------------------------------------------- 1 | *.png filter=lfs diff=lfs merge=lfs -text 2 | *.jpg filter=lfs diff=lfs merge=lfs -text 3 | *.jpeg filter=lfs diff=lfs merge=lfs -text 4 | *.gif filter=lfs diff=lfs merge=lfs -text 5 | -------------------------------------------------------------------------------- /terms/affinity-analysis.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Affinity analysis - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Affinity_analysis 5 | title: Affinity analysis 6 | --- 7 | -------------------------------------------------------------------------------- /terms/domain-adaptation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Domain adaptation - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Domain_adaptation 5 | title: Domain adaptation 6 | --- 7 | -------------------------------------------------------------------------------- /terms/error-correcting-tournaments.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Error-Correcting Tournaments 4 | link_url: https://arxiv.org/abs/0902.3176 5 | title: Error-Correcting Tournaments 6 | --- 7 | -------------------------------------------------------------------------------- /terms/random-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Random optimization - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Random_optimization 5 | title: Random optimization 6 | --- 7 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_styles.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | STYLES 11 | 12 | Add custom styles here. 13 | 14 | */ 15 | 16 | -------------------------------------------------------------------------------- /terms/gaussian-mixture-model-gmm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Mixture model - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Mixture_model 5 | title: Gaussian mixture model (GMM) 6 | --- 7 | -------------------------------------------------------------------------------- /terms/abscissa.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Abscissa 3 | related_terms: 4 | - ordinate 5 | --- 6 | 7 | **Abscissa** is an obscure term for the horizontal axis (the x-axis) of a two-dimensional coordinate plane, or for a point's coordinate along that axis.
8 | -------------------------------------------------------------------------------- /terms/autocorrelation-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Autocorrelation matrix - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Autocorrelation_matrix 5 | title: Autocorrelation matrix 6 | --- 7 | -------------------------------------------------------------------------------- /terms/connectionism.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Connectionism - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Connectionism 5 | related_terms: 6 | - neural-network 7 | title: Connectionism 8 | --- 9 | -------------------------------------------------------------------------------- /terms/hierarchical-latent-dirichlet-allocation-hlda.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hierarchical Latent Dirichlet allocation (hLDA) 3 | related_terms: 4 | - latent-dirichlet-allocation-lda 5 | - nested-chinese-restaurant-process 6 | --- -------------------------------------------------------------------------------- /terms/meta-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Meta learning (computer science) - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Meta_learning_(computer_science) 5 | title: Meta learning 6 | --- 7 | -------------------------------------------------------------------------------- /terms/stochastic-block-model-sbm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stochastic block model (SBM) 3 | --- 4 | The stochastic block model is a common model for detecting 5 | [community structure][1] in networks. 6 | 7 | [1]: /terms/community-structure/ -------------------------------------------------------------------------------- /terms/differential-evolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Differential evolution - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Differential_evolution 5 | title: Differential Evolution (DE) 6 | --- 7 | -------------------------------------------------------------------------------- /terms/fast-fourier-transform-fft.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Fast Fourier transform - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Fast_Fourier_transform 5 | title: Fast Fourier transform (FFT) 6 | --- 7 | -------------------------------------------------------------------------------- /terms/finite-state-transducer-fst.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Finite state transducer - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Finite-state_transducer 5 | title: Finite-state transducer (FST) 6 | --- 7 | -------------------------------------------------------------------------------- /terms/maxout-activation-function.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Maxout networks 4 | link_url: https://arxiv.org/abs/1302.4389 5 | related_terms: 6 | - activation-function 7 | - dropout 8 | title: Maxout 9 | --- 10 |
-------------------------------------------------------------------------------- /terms/standard-deviation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Standard deviation - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Standard_deviation 5 | related_terms: 6 | - variance 7 | title: Standard deviation 8 | --- 9 | -------------------------------------------------------------------------------- /terms/boltzmann-machine.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Boltzmann machine - Wikipedia 4 | link_url: http://en.wikipedia.org/wiki/Boltzmann_machine 5 | related_terms: 6 | - hopfield-network-hn 7 | title: Boltzmann machine 8 | --- 9 | -------------------------------------------------------------------------------- /terms/conditional-random-fields-crf.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Conditional Random Fields (CRFs) 3 | references: 4 | - link_title: Conditional Random Field - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Conditional_Random_Field 6 | --- 7 | -------------------------------------------------------------------------------- /terms/variation-of-information-distance.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Variation of information - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Variation_of_information 5 | title: Variation of Information distance 6 | --- 7 | -------------------------------------------------------------------------------- /terms/affinity-propagation-clustering.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Affinity propagation 4 | link_url: https://en.wikipedia.org/wiki/Affinity_propagation 5 | related_terms: 6 | - clustering 7 | title: Affinity propagation clustering 8 | --- 9 | -------------------------------------------------------------------------------- /terms/nonparametric.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Nonparametric statistics - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Nonparametric_statistics 5 | related_terms: 6 | - nonparametric-clustering 7 | title: Nonparametric 8 | --- 9 | -------------------------------------------------------------------------------- /terms/polya-urn-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: "P\xF3lya urn model - Wikipedia" 4 | link_url: https://en.wikipedia.org/wiki/P%C3%B3lya_urn_model 5 | related_terms: 6 | - chinese-restaurant-process 7 | title: "P\xF3lya urn model" 8 | --- 9 | -------------------------------------------------------------------------------- /terms/adaboost.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: AdaBoost 3 | --- 4 | AdaBoost, short for Adaptive Boosting (sometimes described as weight-boosted trees), reuses the same training set over and over, reweighting it so that each new base learner emphasizes the examples its predecessors misclassified. Like [SVMs][1], it tends to increase the margin (separation) between classes.
5 | 6 | [1]: /terms/support-vector-machine-svm/ 7 | -------------------------------------------------------------------------------- /terms/pixel-recurrent-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Pixel Recurrent Neural Networks 4 | link_url: https://arxiv.org/abs/1601.06759 5 | related_terms: 6 | - recurrent-neural-network 7 | title: Pixel Recurrent Neural Network 8 | --- 9 | -------------------------------------------------------------------------------- /terms/trust-region-policy-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Trust Region Policy Optimization 4 | link_url: https://arxiv.org/abs/1502.05477 5 | related_terms: 6 | - optimization 7 | title: Trust Region Policy Optimization (TRPO) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/likelihood.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Likelihood 3 | --- 4 | In statistics, the likelihood is the probability of data that have already been observed, 5 | viewed as a function of a model's parameters: $L(\theta \mid x) = P(x \mid \theta)$. Informally, probability is concerned with predicting future outcomes, 6 | while likelihood measures how well a parameter setting explains outcomes that have already occurred. 7 | -------------------------------------------------------------------------------- /terms/minimum-description-length-mdl-principle.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Minimum description length - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Minimum_description_length 5 | title: Minimum description length (MDL) principle 6 | --- 7 | -------------------------------------------------------------------------------- /terms/nonparametric-regression.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Nonparametric regression - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Nonparametric_regression 5 | related_terms: 6 | - additive-model 7 | title: Nonparametric regression 8 | --- 9 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_code.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | CODE 11 | 12 | */ 13 | 14 | .pre { 15 | overflow-x: auto; 16 | overflow-y: hidden; 17 | overflow: scroll; 18 | } 19 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_lists.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | LISTS 11 | http://tachyons.io/docs/elements/lists/ 12 | 13 | */ 14 | 15 | .list { list-style-type: none; } 16 | -------------------------------------------------------------------------------- /terms/gap-statistic.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Estimating the number of clusters in a data set via the gap statistic 4 | link_url: http://statweb.stanford.edu/~gwalther/gap.pdf 5 | related_terms: 6 | - k-means-clustering 7 | title: Gap statistic 8 | --- 9 | --------------------------------------------------------------------------------
/terms/global-average-pooling-gap.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Learning Deep Features for Discriminative Localization 4 | link_url: http://cnnlocalization.csail.mit.edu/ 5 | related_terms: 6 | - pooling-layer 7 | title: Global Average Pooling (GAP) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/sequential-pattern-mining.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Sequential pattern mining - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Sequential_pattern_mining 5 | related_terms: 6 | - association-rule-mining 7 | title: Sequential pattern mining 8 | --- 9 | -------------------------------------------------------------------------------- /index.md: -------------------------------------------------------------------------------- 1 | --- 2 | # You don't need to edit this file, it's empty on purpose. 3 | # Edit theme's home layout instead if you wanna make some changes 4 | # See: https://jekyllrb.com/docs/themes/#overriding-theme-defaults 5 | layout: homepage 6 | title: Machine Learning Glossary 7 | --- 8 | -------------------------------------------------------------------------------- /terms/community-structure.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Community structure - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Community_structure 5 | related_terms: 6 | - stochastic-block-model-sbm 7 | - clustering 8 | title: Community structure 9 | --- 10 | -------------------------------------------------------------------------------- /terms/generalized-additive-model-gam.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Generalized additive model - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Generalized_additive_model 5 | related_terms: 6 | - additive-model 7 | title: Generalized additive model (GAM) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/graph-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: The Graph Neural Network Model 4 | link_url: http://repository.hkbu.edu.hk/cgi/viewcontent.cgi?article=1000&context=vprd_ja 5 | related_terms: 6 | - recursive-neural-network 7 | title: Graph Neural Network 8 | --- 9 | -------------------------------------------------------------------------------- /netlify.toml: -------------------------------------------------------------------------------- 1 | [build] 2 | command = "jekyll build" 3 | publish = "_site" 4 | 5 | [context.production] 6 | # Only update the Algolia search index for production builds. 7 | # Deploy previews and branch builds shouldn't be indexed. 
8 | command = "jekyll build && jekyll algolia" 9 | -------------------------------------------------------------------------------- /terms/covariate-shift.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Real simple covariate shift correction - Alex Smola 4 | link_url: http://blog.smola.org/post/4110255196/real-simple-covariate-shift-correction 5 | related_terms: 6 | - batch-normalization 7 | title: Covariate shift 8 | --- 9 | -------------------------------------------------------------------------------- /terms/indian-buffet-process.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Indian buffet process - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Chinese_restaurant_process#The_Indian_buffet_process 5 | related_terms: 6 | - chinese-restaurant-process 7 | title: Indian Buffet Process 8 | --- 9 | -------------------------------------------------------------------------------- /_includes/banners/banner_icon.html: -------------------------------------------------------------------------------- 1 | 2 | info icon 3 | 4 | -------------------------------------------------------------------------------- /terms/accams.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly' 4 | link_url: http://arxiv.org/abs/1501.00199 5 | related_terms: 6 | - co-clustering 7 | title: ACCAMS 8 | --- 9 | ACCAMS: Additive Co-Clustering to Approximate Matrices Succinctly -------------------------------------------------------------------------------- /terms/support-vector-machine-svm.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Support Vector Machine (SVM) 3 | --- 4 | Support Vector Machine is a classification method in supervised learning that seeks to use support vectors (cases close to the boundary) to find an optimal hyperplane separating items from different classes. 5 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_images.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | IMAGES 11 | Docs: http://tachyons.io/docs/elements/images/ 12 | 13 | */ 14 | 15 | /* Responsive images! 
*/ 16 | 17 | img { max-width: 100%; } 18 | 19 | -------------------------------------------------------------------------------- /terms/dirichlet-multinomial-distribution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Dirichlet-multinomial distribution 4 | link_url: https://en.wikipedia.org/wiki/Dirichlet-multinomial_distribution 5 | related_terms: 6 | - multinomial-distribution 7 | title: Dirichlet-multinomial distribution 8 | --- 9 | -------------------------------------------------------------------------------- /terms/mention-ranking-coreference-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Deep Reinforcement Learning for Mention-Ranking Coreference Models 4 | link_url: https://arxiv.org/pdf/1609.08667.pdf 5 | related_terms: 6 | - coreference-resolution 7 | title: Mention-ranking coreference model 8 | --- 9 | -------------------------------------------------------------------------------- /terms/community-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Community detection 3 | related_terms: 4 | - community-structure 5 | - stochastic-block-model-sbm 6 | --- 7 | Community detection refers to the problem of detecting whether 8 | a graph has [community structure][1]. 9 | 10 | [1]: /terms/community-structure/ -------------------------------------------------------------------------------- /terms/gradient.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Gradient 3 | --- 4 | The gradient is the vector generalization of the derivative. 5 | 6 | For a function $f([x_1, \ldots, x_n]^T)$, the gradient $\nabla_x f([x_1, \ldots, x_n]^T)$ 7 | is the vector containing the $n$ partial derivatives of $f$ with respect to each $x_i$. 
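As a concrete check of this definition, here is a minimal NumPy sketch (the quadratic test function and step size $h$ are arbitrary illustrative choices) that approximates each partial derivative with central finite differences:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x with central differences."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        e = np.zeros_like(x)
        e[i] = h
        grad[i] = (f(x + e) - f(x - e)) / (2 * h)  # i-th partial derivative
    return grad

f = lambda x: (x ** 2).sum()     # f(x) = sum_i x_i^2, so the gradient is 2x
x = np.array([1.0, -2.0, 3.0])
print(numerical_gradient(f, x))  # approximately [ 2. -4.  6.]
```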
8 | -------------------------------------------------------------------------------- /terms/mutual-information.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Mutual information - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Mutual_information 5 | related_terms: 6 | - pointwise-mutual-information-pmi 7 | - variation-of-information-distance 8 | title: Mutual information 9 | --- 10 | -------------------------------------------------------------------------------- /terms/temporal-generative-adversarial-network-tgan.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Temporal Generative Adversarial Networks 4 | link_url: https://arxiv.org/abs/1611.06624 5 | related_terms: 6 | - generative-adversarial-network-gan 7 | title: Temporal Generative Adversarial Network (TGAN) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/adversarial-autoencoder.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Adversarial Autoencoders 4 | link_url: https://arxiv.org/abs/1511.05644 5 | related_terms: 6 | - autoencoder 7 | - generative-adversarial-network-gan 8 | - variational-autoencoder-vae 9 | title: Adversarial autoencoder 10 | --- 11 | -------------------------------------------------------------------------------- /terms/tabu-search.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Tabu Search - Clever Algorithms 4 | link_url: http://www.cleveralgorithms.com/nature-inspired/stochastic/tabu_search.html 5 | - link_title: Tabu search - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Tabu_search 7 | title: Tabu Search 8 | --- 9 | -------------------------------------------------------------------------------- /terms/binary-tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | - tree-lstm 9 | title: Binary Tree LSTM 10 | --- 11 | -------------------------------------------------------------------------------- /terms/n-ary-tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | - tree-lstm 9 | title: N-ary Tree LSTM 10 | --- 11 | -------------------------------------------------------------------------------- /terms/structured-bayesian-optimization-sbo.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'BOAT: Building Auto-Tuners with Structured Bayesian Optimization' 4 | link_url: https://www.cl.cam.ac.uk/~mks40/pubs/www_2017.pdf 5 | related_terms: 6 | - bayesian-optimization 7 | title: Structured Bayesian optimization (SBO) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/structured-learning.md: -------------------------------------------------------------------------------- 1 | 
--- 2 | references: 3 | - link_title: Structured prediction - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Structured_prediction 5 | - link_title: What is structured learning? - PyStruct 6 | link_url: http://pystruct.github.io/intro.html#intro 7 | title: Structured learning 8 | --- 9 | -------------------------------------------------------------------------------- /terms/child-sum-tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | - tree-lstm 9 | title: Child-Sum Tree-LSTM 10 | --- 11 | -------------------------------------------------------------------------------- /terms/policy-gradient.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Policy gradient methods - Scholarpedia 4 | link_url: http://www.scholarpedia.org/article/Policy_gradient_methods 5 | related_terms: 6 | - reinforcement-learning 7 | - trust-region-policy-optimization 8 | - gradient 9 | title: Policy Gradient 10 | --- 11 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_debug_children.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | DEBUG CHILDREN 11 | 12 | Just add the debug class to any element to see outlines on its 13 | children. 14 | 15 | */ 16 | 17 | .debug * { outline: 1px solid gold; } 18 | 19 | -------------------------------------------------------------------------------- /terms/backpropagation-through-time-bptt.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Backpropagation through time - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Backpropagation_through_time 5 | related_terms: 6 | - backpropagation 7 | - recurrent-neural-network 8 | title: Backpropagation Through Time (BPTT) 9 | --- 10 | -------------------------------------------------------------------------------- /terms/bias.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Bias of an estimator - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Bias_of_an_estimator 5 | - link_title: Bias (statistics) - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Bias_(statistics) 7 | related_terms: 8 | - variance 9 | title: Bias 10 | --- 11 | -------------------------------------------------------------------------------- /terms/dependency-tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | - tree-lstm 9 | title: Dependency Tree LSTM 10 | --- 11 | -------------------------------------------------------------------------------- /terms/fast-r-cnn.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Fast R-CNN 3 | related_terms: 4 | - yolo-object-detection 5 | - r-cnn 6 | - convolutional-neural-network-cnn 7 | references: 8 |
- link_title: "Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks" 9 | link_url: https://arxiv.org/abs/1506.01497 10 | --- 11 | -------------------------------------------------------------------------------- /terms/k-max-pooling.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: A Convolutional Neural Network for Modelling Sentences 4 | link_url: https://arxiv.org/abs/1404.2188 5 | related_terms: 6 | - pooling-layer 7 | - convolutional-neural-network-cnn 8 | - dynamic-k-max-pooling 9 | - max-pooling 10 | title: k-Max Pooling 11 | --- 12 | -------------------------------------------------------------------------------- /terms/constituency-tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | - tree-lstm 9 | title: Constituency Tree-LSTM 10 | --- 11 | -------------------------------------------------------------------------------- /terms/dynamic-k-max-pooling.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: A Convolutional Neural Network for Modelling Sentences 4 | link_url: https://arxiv.org/abs/1404.2188 5 | related_terms: 6 | - pooling-layer 7 | - convolutional-neural-network-cnn 8 | - max-pooling 9 | - k-max-pooling 10 | title: Dynamic k-Max Pooling 11 | --- 12 | -------------------------------------------------------------------------------- /terms/stride-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stride (convolution) 3 | related_terms: 4 | - same-convolution 5 | - filter-convolution 6 | - convolution 7 | - convolutional-neural-network-cnn 8 | --- 9 | 10 | In convolutions, the *stride* is the number of horizontal 11 | and vertical steps that the filter takes over the original matrix. -------------------------------------------------------------------------------- /terms/weighted-finite-state-transducer-wfst.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Weighted automata - Finite-state transducer - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Finite-state_transducer#Weighted_automata 5 | related_terms: 6 | - finite-state-transducer-fst 7 | title: Weighted finite-state transducer (WFST) 8 | --- 9 | -------------------------------------------------------------------------------- /terms/abstractive-sentence-summarization.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Abstractive sentence summarization 3 | --- 4 | Abstractive sentence summarization refers to creating a shorter version of a 5 | sentence with the same meaning. 6 | 7 | This is in contrast to extractive sentence summarization, which pulls the 8 | most informative sentences from a document. 
9 | -------------------------------------------------------------------------------- /terms/named-entity-recognition-ner.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Stanford Named Entity Recognizer (NER) - The Stanford Natural Language 4 | Processing Group 5 | link_url: https://nlp.stanford.edu/software/CRF-NER.shtml 6 | related_terms: 7 | - natural-language-processing 8 | title: Named Entity Recognition (NER) 9 | --- 10 | -------------------------------------------------------------------------------- /terms/poisson-additive-co-clustering-paco.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Explaining reviews and ratings with PACO: Poisson Additive Co-Clustering' 4 | link_url: https://arxiv.org/abs/1512.01845 5 | related_terms: 6 | - accams 7 | - co-clustering 8 | - additive-model 9 | title: Poisson Additive Co-Clustering (PACO) 10 | --- 11 | -------------------------------------------------------------------------------- /terms/test-term.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Test term 3 | related_terms: 4 | - c 5 | - c 6 | new_references: 7 | - link_title: Google 8 | link_url: 'https://google.com' 9 | --- 10 | This is the body of yet another test term. I think that this is going to be extremely promising. 11 | 12 | 13 | $$ 14 | 15 | \int_0^{100} x^2 \, dx 16 | 17 | $$ 18 | -------------------------------------------------------------------------------- /DEPENDENCIES.md: -------------------------------------------------------------------------------- 1 | # Licenses for dependencies 2 | 3 | ## jekyll-compress-html 4 | https://github.com/penibelst/jekyll-compress-html/blob/master/LICENSE 5 | 6 | ## tachyons-sass 7 | https://github.com/tachyons-css/tachyons-sass/blob/master/license 8 | 9 | ## normalize.css 10 | https://github.com/necolas/normalize.css/blob/master/LICENSE.md 11 | -------------------------------------------------------------------------------- /terms/deep-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Deep Learning 3 | references: 4 | - link_title: Deep Learning - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Deep_learning 6 | --- 7 | Deep Learning is about learning using [neural networks][1] with multiple [hidden layers][2]. 8 | 9 | [1]: /terms/neural-network/ 10 | [2]: /terms/hidden-layer/ 11 | -------------------------------------------------------------------------------- /terms/independent-identically-distributed-iid.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Independent and Identically Distributed (i.i.d) 3 | --- 4 | A collection of random variables is independent and identically distributed 5 | if they have these properties: 6 | 7 | 1. they all have the same probability distribution. 8 | 2. they are all mutually independent of each other. 9 | -------------------------------------------------------------------------------- /terms/doc2vec.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: doc2vec 3 | related_terms: 4 | - paragraph-vector 5 | --- 6 | [doc2vec](https://radimrehurek.com/gensim/models/doc2vec.html) is the gensim 7 | library's name for its [paragraph vector](/terms/paragraph-vector/) implementation.
8 | doc2vec can be used to generate unsupervised representations of sentences, paragraphs, 9 | and documents. -------------------------------------------------------------------------------- /terms/facet.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Facet (disambiguation) 3 | --- 4 | A [facet](/terms/facet-plotting) can refer to a type 5 | of plot or chart designed to efficiently display 6 | multidimensional data. 7 | 8 | [Facets](/terms/facets-tool) is also the name of a tool 9 | from Google's People + AI Research (PAIR) lab designed 10 | to help explore datasets. 11 | -------------------------------------------------------------------------------- /terms/mention-pair-coreference-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Entity-Centric Coreference Resolution with Model Stacking 4 | link_url: https://nlp.stanford.edu/pubs/clark-manning-acl15-entity.pdf 5 | related_terms: 6 | - coreference-resolution 7 | - mention-ranking-coreference-model 8 | title: Mention-pair coreference model 9 | --- 10 | -------------------------------------------------------------------------------- /terms/smooth-support-vector-machine-ssvm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'SSVM: A Smooth Support Vector Machine for Classification' 4 | link_url: http://jupiter.math.nctu.edu.tw/~yuhjye/assets/file/publications/conference_papers/C29_SSVM.pdf 5 | related_terms: 6 | - support-vector-machine-svm 7 | title: Smooth support vector machine (SSVM) 8 | --- 9 | -------------------------------------------------------------------------------- /_includes/banners/incomplete_term.html: -------------------------------------------------------------------------------- 1 | 10 | -------------------------------------------------------------------------------- /_includes/synonyms.html: -------------------------------------------------------------------------------- 1 | {% assign slug = include.page.url | split: "/" | last %} 2 | {% assign synonyms = site.pages | where_exp: "item": "item.destination == slug" %} 3 | {% if synonyms.size > 0 %} 4 |

Synonyms

5 | 10 | {% endif %} -------------------------------------------------------------------------------- /terms/yolov2-object-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: YOLOv2 (object detection algorithm) 3 | related_terms: 4 | - yolo-object-detection 5 | - convolutional-neural-network-cnn 6 | - object-detection 7 | - object-localization 8 | references: 9 | - link_title: "YOLO9000: Better, Faster, Stronger" 10 | link_url: https://arxiv.org/pdf/1612.08242.pdf 11 | --- 12 | -------------------------------------------------------------------------------- /terms/association-rule-mining.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Association rule learning - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Association_rule_learning 5 | - link_title: Mining association rules between sets of items in large databases 6 | link_url: http://dl.acm.org/citation.cfm?doid=170035.170072 7 | title: Association rule mining 8 | --- 9 | -------------------------------------------------------------------------------- /terms/reinforce-policy-gradient-algorithm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Simple statistical gradient-following algorithms for connectionist reinforcement 4 | learning 5 | link_url: http://citeseer.ist.psu.edu/viewdoc/summary?doi=10.1.1.129.8871 6 | related_terms: 7 | - reinforcement-learning 8 | title: REINFORCE Policy Gradient Algorithm 9 | --- 10 | -------------------------------------------------------------------------------- /terms/textrank.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'TextRank: Bringing Order into Texts' 4 | link_url: http://www.aclweb.org/anthology/W04-3252 5 | related_terms: 6 | - pagerank 7 | title: TextRank 8 | --- 9 | TextRank is a graph ranking algorithm applied to text. 10 | It is useful for various unsupervised learning 11 | tasks in natural language processing. -------------------------------------------------------------------------------- /terms/alternating-conditional-expectation-ace-algorithm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Estimating Optimal Transformations for Multiple Regression Using the 4 | ACE Algorithm 5 | link_url: http://www.jds-online.com/files/JDS-156.pdf 6 | related_terms: 7 | - nonparametric-regression 8 | title: Alternating conditional expectation (ACE) algorithm 9 | --- 10 | -------------------------------------------------------------------------------- /_includes/banners/needs_review.html: -------------------------------------------------------------------------------- 1 | 10 | -------------------------------------------------------------------------------- /terms/bagging.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bagging 3 | references: 4 | - link_title: Bootstrap aggregating - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Bootstrap_aggregating 6 | --- 7 | Bagging, short for bootstrap aggregating, trains different base learners on different subsets of the training set, with each subset drawn at random from the given sample with replacement.
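A minimal sketch of the bootstrap-aggregating procedure described in the *Bagging* entry above (the decision-tree base learner, ensemble size, and majority-vote rule for binary 0/1 labels are illustrative assumptions):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def bagging_predict(X, y, X_new, n_learners=25, seed=0):
    """Fit base learners on bootstrap samples; combine by majority vote."""
    rng = np.random.default_rng(seed)
    votes = []
    for _ in range(n_learners):
        idx = rng.integers(0, len(X), size=len(X))  # draw with replacement
        tree = DecisionTreeClassifier().fit(X[idx], y[idx])
        votes.append(tree.predict(X_new))
    # Majority vote across the ensemble (binary labels assumed).
    return (np.stack(votes).mean(axis=0) > 0.5).astype(int)
```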
8 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_forms.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | FORMS 11 | 12 | */ 13 | 14 | .input-reset { 15 | -webkit-appearance: none; 16 | -moz-appearance: none; 17 | } 18 | 19 | .button-reset::-moz-focus-inner, 20 | .input-reset::-moz-focus-inner { 21 | border: 0; 22 | padding: 0; 23 | } 24 | -------------------------------------------------------------------------------- /terms/continuous-bag-of-words-cbow.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Continuous-Bag-of-Words (CBOW) 3 | related_terms: 4 | - word-embedding 5 | - bag-of-words 6 | - word2vec 7 | --- 8 | Continuous Bag of Words refers to an algorithm 9 | that predicts a target word from its 10 | surrounding context. 11 | 12 | CBOW is one of the algorithms used for training 13 | [word2vec](/terms/word2vec/) vectors. -------------------------------------------------------------------------------- /terms/expectation-maximization-em-algorithm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Expectation-maximization algorithm 4 | link_url: https://en.wikipedia.org/wiki/Expectation%E2%80%93maximization_algorithm 5 | related_terms: 6 | - maximum-a-posteriori-map-estimation 7 | - maximum-likelihood-estimation-mle 8 | - expectation 9 | title: Expectation-maximization (EM) algorithm 10 | --- 11 | -------------------------------------------------------------------------------- /terms/passive-aggressive-algorithm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Online Passive-Aggressive Algorithms 4 | link_url: http://jmlr.csail.mit.edu/papers/volume7/crammer06a/crammer06a.pdf 5 | - link_title: Passive Aggressive Algorithms - scikit-learn Documentation 6 | link_url: http://scikit-learn.org/stable/modules/linear_model.html#passive-aggressive 7 | title: Passive-Aggressive Algorithm 8 | --- 9 | -------------------------------------------------------------------------------- /serve.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # This runs "jekyll serve" inside the container, which will build the website. 3 | exec docker run \ 4 | --rm \ 5 | -it \ 6 | --init \ 7 | --volume="${PWD}":/opt/buildhome/repo \ 8 | --workdir=/opt/buildhome/repo \ 9 | --entrypoint=/opt/build-bin/build \ 10 | -p 4000:4000 \ 11 | netlify/build:xenial \ 12 | jekyll serve --host=0.0.0.0 --incremental 13 | -------------------------------------------------------------------------------- /terms/decision-tree.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Decision tree 3 | references: 4 | - link_title: Decision tree - Wikipedia 5 | link_url: https://en.wikipedia.org/wiki/Decision_tree 6 | related_terms: 7 | - random-forest 8 | --- 9 | A supervised learning method that iteratively refines a prediction by asking questions about the input feature most likely to affect the outcome, making a 'tree' of question branches.
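A minimal usage sketch for the *Decision tree* entry above, using scikit-learn's `DecisionTreeClassifier` on a bundled toy dataset (the depth limit is an arbitrary illustrative choice):

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=3).fit(X, y)  # limit tree depth
print(clf.predict(X[:5]))  # predicted classes for the first five samples
print(clf.score(X, y))     # accuracy on the training data
```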
10 | -------------------------------------------------------------------------------- /terms/sequential-model-based-optimization-smbo.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Sequential Model-Based Optimization for General Algorithm Configuration 4 | (extended version) 5 | link_url: https://www.cs.ubc.ca/~hutter/papers/10-TR-SMAC.pdf 6 | related_terms: 7 | - bayesian-optimization 8 | - structured-bayesian-optimization-sbo 9 | title: Sequential Model-Based Optimization (SMBO) 10 | --- 11 | -------------------------------------------------------------------------------- /terms/second-order-information.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Second-order information 3 | needs_review: true 4 | related_terms: 5 | - first-order-information 6 | - hessian-matrix 7 | - hessian-free-optimization 8 | --- 9 | The term *second-order information* refers to information 10 | about a function gained by computing its second derivative. 11 | The second derivative reveals information about the function's 12 | curvature. -------------------------------------------------------------------------------- /terms/boosting.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Boosting 3 | references: 4 | - link_title: Boosting (machine learning) - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Boosting_(machine_learning) 6 | --- 7 | Boosting trains base learners serially, so that instances on which the preceding learners were inaccurate receive more emphasis when training later learners; it actively tries to generate complementary learners instead of leaving this to chance. 8 | -------------------------------------------------------------------------------- /terms/contextual-bandit.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Contextual Bandit - Multi-armed Bandit - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Multi-armed_bandit#Contextual_Bandit 5 | - link_title: An Introduction to Contextual Bandits - Stream 6 | link_url: https://getstream.io/blog/introduction-contextual-bandits/ 7 | related_terms: 8 | - multi-armed-bandit 9 | title: Contextual Bandit 10 | --- 11 | -------------------------------------------------------------------------------- /_includes/references.html: -------------------------------------------------------------------------------- 1 | {% if include.page.references %} 2 |

References

3 | 10 | {% endif %} -------------------------------------------------------------------------------- /terms/perplexity.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Perplexity 3 | --- 4 | Wikipedia [defines perplexity][1] as the following: 5 | 6 | > In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. A low perplexity indicates the probability distribution is good at predicting the sample. 7 | 8 | [1]: https://en.wikipedia.org/wiki/Perplexity -------------------------------------------------------------------------------- /terms/adversarial-variational-bayes.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Adversarial Variational Bayes: Unifying Variational Autoencoders and 4 | Generative Adversarial Networks' 5 | link_url: https://arxiv.org/abs/1701.04722 6 | related_terms: 7 | - autoencoder 8 | - adversarial-autoencoder 9 | - variational-autoencoder-vae 10 | - generative-adversarial-network-gan 11 | title: Adversarial Variational Bayes 12 | --- 13 | -------------------------------------------------------------------------------- /terms/gradient-descent.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Gradient descent 3 | --- 4 | Gradient descent is an optimization algorithm designed 5 | to find the minimum of a function. Many machine learning 6 | algorithms use gradient descent or a variant. 7 | 8 | Common variants include: 9 | - [Stochastic Gradient Descent (SGD)](/terms/stochastic-gradient-descent-sgd/) 10 | - [Minibatch Gradient Descent](/terms/minibatch-gradient-descent/) 11 | 12 | -------------------------------------------------------------------------------- /terms/nchw.md: -------------------------------------------------------------------------------- 1 | --- 2 | related_terms: 3 | - nhwc 4 | title: NCHW 5 | --- 6 | 7 | **NCHW** is an acronym describing the order of the axes in a tensor containing image data samples. 8 | 9 | * **N**: Number of data samples. 10 | * **C**: Image channels. A red-green-blue (RGB) image will have 3 channels. 11 | * **H**: Image height. 12 | * **W**: Image width. 13 | 14 | NCHW is sometimes referred to as a **channels-first** layout. 15 | -------------------------------------------------------------------------------- /meta/needs-review.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Meta: Terms Needing Review" 4 | --- 5 |

6 | This page links to terms that have "complete" definitions, 7 | but are in need of review or further editing. 8 |

9 | -------------------------------------------------------------------------------- /terms/transduction.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Transduction 3 | references: 4 | - link_title: Transduction (machine learning) - Wikipedia 5 | link_url: https://en.wikipedia.org/wiki/Transduction_(machine_learning) 6 | related_terms: 7 | - supervised-learning 8 | --- 9 | Transduction is similar to supervised learning, but it does not explicitly construct a function: instead, it tries to predict new outputs directly from the training inputs, training outputs, and new inputs. 10 | -------------------------------------------------------------------------------- /terms/adadelta.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'ADADELTA: An Adaptive Learning Rate Method' 4 | link_url: https://arxiv.org/abs/1212.5701 5 | related_terms: 6 | - learning-rate 7 | - adam-optimizer 8 | - adagrad 9 | - stochastic-gradient-descent-sgd 10 | title: ADADELTA 11 | --- 12 | ADADELTA is a gradient descent-based optimization algorithm. Like [AdaGrad][1], 13 | ADADELTA automatically tunes the learning rate. 14 | 15 | [1]: /terms/adagrad/ -------------------------------------------------------------------------------- /terms/bounding-box.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bounding box 3 | related_terms: 4 | - object-localization 5 | - object-detection 6 | --- 7 | A bounding box is a rectangle (in 2D datasets) or rectangular prism 8 | (in 3D datasets) drawn around an object identified in an image. 9 | 10 | [Object localization](/terms/object-localization) is a task in computer 11 | vision where a model is trained to draw bounding boxes around 12 | objects detected in an image.
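Predicted bounding boxes are commonly scored against ground-truth boxes with intersection over union (IoU); a minimal sketch, assuming 2-D boxes given as `(x1, y1, x2, y2)` corner coordinates:

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 2, 2), (1, 1, 3, 3)))  # 1 / 7, about 0.143
```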
13 | -------------------------------------------------------------------------------- /terms/exponential-linear-unit-elu.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Fast and Accurate Deep Network Learning by Exponential Linear Units 4 | (ELUs) 5 | link_url: https://arxiv.org/abs/1511.07289 6 | - link_title: Deep Residual Networks with Exponential Linear Unit 7 | link_url: https://arxiv.org/abs/1604.04112 8 | related_terms: 9 | - rectified-linear-unit-relu 10 | - activation-function 11 | title: Exponential Linear Unit (ELU) 12 | --- 13 | -------------------------------------------------------------------------------- /terms/gradient-clipping.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Gradient Clipping - Tensorflow Python Documentation 4 | link_url: https://www.tensorflow.org/versions/r0.12/api_docs/python/train/gradient_clipping 5 | - link_title: On the difficulty of training Recurrent Neural Networks 6 | link_url: https://arxiv.org/abs/1211.5063 7 | related_terms: 8 | - recurrent-neural-network 9 | - long-short-term-memory-lstm 10 | title: Gradient Clipping 11 | --- 12 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_debug-children.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | DEBUG CHILDREN 11 | Docs: http://tachyons.io/docs/debug/ 12 | 13 | Just add the debug class to any element to see outlines on its 14 | children. 15 | 16 | */ 17 | 18 | .debug * { outline: 1px solid gold; } 19 | .debug-white * { outline: 1px solid white; } 20 | .debug-black * { outline: 1px solid black; } 21 | 22 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_module-template.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | MODULE NAME 11 | 12 | Use this scaffolding to create or extend your own modules with tachyons 13 | style architecture. 
14 | 15 | */ 16 | 17 | 18 | @media #{$breakpoint-not-small} { 19 | 20 | } 21 | 22 | @media #{$breakpoint-medium} { 23 | 24 | } 25 | 26 | @media #{$breakpoint-large} { 27 | 28 | } 29 | 30 | -------------------------------------------------------------------------------- /terms/multidimensional-recurrent-neural-network-mdrnn.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Hessian-free Optimization for Learning Deep Multidimensional Recurrent 4 | Neural Networks 5 | link_url: https://arxiv.org/abs/1509.03475 6 | - link_title: Multi-Dimensional Recurrent Neural Networks 7 | link_url: https://arxiv.org/abs/0705.2011 8 | related_terms: 9 | - recurrent-neural-network 10 | title: Multidimensional recurrent neural network (MDRNN) 11 | --- 12 | -------------------------------------------------------------------------------- /terms/computer-vision.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Computer vision - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Computer_vision 5 | title: Computer vision 6 | --- 7 | Computer vision is the field of teaching computers to perceive sensor 8 | data--such as from cameras, RADAR, and LIDAR sensors--to achieve 9 | an understanding of what is in the data. 10 | 11 | Computer vision is a wide-ranging field that comprises many techniques 12 | and subfields. -------------------------------------------------------------------------------- /terms/learning-rate-decay.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Decaying the learning rate - TensorFlow Documentation 4 | link_url: https://www.tensorflow.org/versions/r0.11/api_docs/python/train/decaying_the_learning_rate 5 | - link_title: Annealing the learning rate - Stanford CS231n 6 | link_url: http://cs231n.github.io/neural-networks-3/#anneal 7 | related_terms: 8 | - learning-rate 9 | - stochastic-gradient-descent-sgd 10 | title: Learning rate decay 11 | --- 12 | -------------------------------------------------------------------------------- /terms/synset.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Synset 3 | published: true 4 | --- 5 | A synset is [WordNet][1]'s terminology for a [synonym ring][2]. 6 | 7 | WordNet is a database of English words grouped into sets of synonyms. 8 | WordNet's synsets are often useful in information retrieval and natural 9 | language processing tasks to discover when two different words can mean 10 | similar things. 11 | 12 | [1]: https://wordnet.princeton.edu/ 13 | [2]: https://en.wikipedia.org/wiki/Synonym_ring -------------------------------------------------------------------------------- /terms/face-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Face detection 3 | --- 4 | *Face detection* is the problem of detecting whether an image has 5 | a (usually human) face in it. 6 | 7 | The problem of identifying whether the image has a specific 8 | *single* person's face is called [face verification][1]. The problem 9 | of identifying whether the image has any of $k$ people's faces 10 | is called [face recognition][2].
11 | 12 | [1]: /terms/face-verification/ 13 | [2]: /terms/face-recognition/ -------------------------------------------------------------------------------- /terms/supervised-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Supervised learning 3 | references: 4 | - link_title: Supervised learning - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Supervised_learning 6 | related_terms: 7 | - unsupervised-learning 8 | - structured-learning 9 | --- 10 | Supervised learning is about training on labeled data, i.e. questions with known answers. The task is to find the function that maps a set of inputs (x1, x2, x3, ...) to the value to be predicted (y). 11 | -------------------------------------------------------------------------------- /terms/unsupervised-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Unsupervised learning 3 | references: 4 | - link_title: Unsupervised learning - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Unsupervised_learning 6 | related_terms: 7 | - anomaly-detection 8 | - clustering 9 | - supervised-learning 10 | - structured-learning 11 | --- 12 | Unsupervised learning is about problems where we don't have 13 | labeled answers, such as clustering, dimensionality reduction, and anomaly detection. 14 | -------------------------------------------------------------------------------- /terms/negative-log-likelihood.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Why Minimize Negative Log Likelihood? - Quantivity 4 | link_url: https://quantivity.wordpress.com/2011/05/23/why-minimize-negative-log-likelihood/ 5 | - link_title: I am wondering why we use negative (log) likelihood sometimes? - Cross 6 | Validated 7 | link_url: https://stats.stackexchange.com/questions/141087/i-am-wondering-why-we-use-negative-log-likelihood-sometimes 8 | title: Negative Log Likelihood 9 | --- 10 | -------------------------------------------------------------------------------- /terms/derivative-free-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Derivative-free optimization - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Derivative-free_optimization 5 | - link_title: 'Derivative-free optimization: A review of algorithms and comparison 6 | of software implementations' 7 | link_url: http://thales.cheme.cmu.edu/dfo/comparison/dfo.pdf 8 | related_terms: 9 | - optimization 10 | - black-box-optimization 11 | title: Derivative-free optimization 12 | --- 13 | -------------------------------------------------------------------------------- /terms/glove-word-embeddings.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'GloVe: Global Vectors for Word Representation - Stanford NLP Group' 4 | link_url: https://nlp.stanford.edu/pubs/glove.pdf 5 | - link_title: GloVe - Stanford NLP project page 6 | link_url: https://nlp.stanford.edu/projects/glove/ 7 | related_terms: 8 | - word2vec 9 | - word-embedding 10 | title: GloVe (Global Vectors) embeddings 11 | --- 12 | GloVe, or Global Vectors, refers to a word embedding algorithm 13 | from the Stanford NLP group. 
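Word embeddings such as GloVe vectors are typically compared by cosine similarity. A minimal sketch with made-up 3-dimensional vectors (real GloVe embeddings are 50-300 dimensional and are loaded from the project's released files):

```python
import numpy as np

def cosine_similarity(u, v):
    """Cosine of the angle between two embedding vectors."""
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

# Toy 3-d "embeddings", purely for illustration.
king = np.array([0.9, 0.8, 0.1])
queen = np.array([0.85, 0.82, 0.12])
apple = np.array([0.1, 0.2, 0.95])
print(cosine_similarity(king, queen))  # close to 1: similar words
print(cosine_similarity(king, apple))  # much lower: dissimilar words
```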
-------------------------------------------------------------------------------- /terms/overfitting.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Overfitting - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Overfitting 5 | related_terms: 6 | - supervised-learning 7 | title: Overfitting 8 | --- 9 | Overfitting is when a statistical model learns patterns in the training 10 | data that are too complex to generalize well. 11 | 12 | ![Example of 1-dimensional overfitted data from [Wikipedia][1]](/images/overfitting.png) 13 | 14 | [1]: https://en.wikipedia.org/wiki/File:Overfitted_Data.png -------------------------------------------------------------------------------- /terms/representation-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Feature learning - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Feature_learning 5 | - link_title: 'Representation Learning: A Review and New Perspectives' 6 | link_url: http://www.cl.uni-heidelberg.de/courses/ws14/deepl/BengioETAL12.pdf 7 | - link_title: Representation Learning - Deep Learning Book 8 | link_url: http://www.deeplearningbook.org/contents/representation.html 9 | title: Representation learning 10 | --- 11 | -------------------------------------------------------------------------------- /Gemfile: -------------------------------------------------------------------------------- 1 | source "https://rubygems.org" 2 | ruby RUBY_VERSION 3 | gem "jekyll", "~> 4.2" 4 | gem "nokogiri", "~> 1.12" 5 | gem "ffi", "~> 1.15" 6 | gem "kramdown", ">= 2.3.0" 7 | 8 | group :jekyll_plugins do 9 | gem "jekyll-last-modified-at" 10 | gem "jekyll-sitemap" 11 | gem "jekyll-pandoc" 12 | gem 'jekyll-algolia', '~> 1.0' 13 | end 14 | 15 | # Windows does not include zoneinfo files, so bundle the tzinfo-data gem 16 | gem 'tzinfo-data', platforms: [:mingw, :mswin, :x64_mingw, :jruby] 17 | 18 | -------------------------------------------------------------------------------- /terms/jaccard-index.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Jaccard index (intersection over union) 3 | --- 4 | The *Jaccard index*--otherwise known as *intersection over union*--is used to calculate the similarity or difference 5 | of sample sets. 6 | 7 | $$ 8 | J(\mathbb{A}, \mathbb{B}) = 9 | \frac{\left | \mathbb{A} \cap \mathbb{B} \right |} 10 | {\left | \mathbb{A} \cup \mathbb{B} \right |} 11 | $$ 12 | 13 | $$ 14 | 0 \leq J(\mathbb{A}, \mathbb{B}) \leq 1 15 | $$ 16 | 17 | The index is defined to be 1 if the sets are empty. 18 | -------------------------------------------------------------------------------- /terms/chunking.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Chunking 3 | related_terms: 4 | - natural-language-processing 5 | --- 6 | The paper [Natural Language Processing (almost) from Scratch][1] describes chunking as: 7 | 8 | > Also called shallow parsing, chunking aims at labeling segments of a sentence with syntactic constituents such as noun or verb phrases (NP or VP). Each word is assigned only one unique tag, often encoded as a begin-chunk (e.g., B-NP) or inside-chunk tag (e.g., I-NP). 
9 | 10 | [1]: https://arxiv.org/abs/1103.0398 -------------------------------------------------------------------------------- /_build/entrypoint.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | # Run with original Bourne shell because we're using an Alpine Docker image. 3 | set -e 4 | if [ "$1" = "serve" ] 5 | then 6 | echo "Serving..." 7 | exec jekyll serve --host 0.0.0.0 "$@" 8 | elif [ "$1" = "deploy" ] 9 | then 10 | echo "Starting build..." 11 | jekyll build 12 | echo 'Completed build. starting algolia push...' 13 | jekyll algolia push 14 | echo "Completed algolia push, starting deploy..." 15 | exec jekyll-s3 --headless 16 | else 17 | exec jekyll "$@" 18 | fi 19 | -------------------------------------------------------------------------------- /meta/no-references.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Meta: Terms Without References" 4 | --- 5 |

6 | This page links to all terms that do not have listed external 7 | references. 8 |

9 | {% assign no_references = site.pages | where_exp: "item", "item.references == nil" %} 10 | -------------------------------------------------------------------------------- /terms/learning-to-rank-ltr.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Learning to rank - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Learning_to_rank 5 | - link_title: Learning To Rank - Apache Solr Documentation 6 | link_url: https://cwiki.apache.org/confluence/display/solr/Learning+To+Rank 7 | related_terms: 8 | - collaborative-filtering 9 | title: Learning To Rank (LTR) 10 | --- 11 | Learning-to-rank is the application of machine learning 12 | to ranking search results, recommendations, or similar 13 | information. -------------------------------------------------------------------------------- /terms/r-cnn.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: R-CNN 3 | related_terms: 4 | - convolutional-neural-network-cnn 5 | - object-detection 6 | - yolo-object-detection 7 | references: 8 | - link_title: Rich feature hierarchies for accurate object detection and semantic segmentation 9 | link_url: https://arxiv.org/abs/1311.2524 10 | - link_title: Region Proposals - Convolutional Neural Networks - deeplearning.ai 11 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/aCYZv/optional-region-proposals 12 | --- 13 | -------------------------------------------------------------------------------- /_includes/banners/default.html: -------------------------------------------------------------------------------- 1 | 13 | -------------------------------------------------------------------------------- /terms/recurrent-neural-network-language-model-rnnlm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Recurrent Neural Networks - Tutorials - TensorFlow Documentation 4 | link_url: https://www.tensorflow.org/tutorials/recurrent#language_modeling 5 | related_terms: 6 | - natural-language-processing 7 | - recurrent-neural-network 8 | - neural-network 9 | title: Recurrent Neural Network Language Model (RNNLM) 10 | --- 11 | A Recurrent Neural Network Language Model (RNNLM) is 12 | a recurrent neural network used for language modeling--typically, 13 | predicting the next word in a sequence. -------------------------------------------------------------------------------- /terms/nhwc.md: -------------------------------------------------------------------------------- 1 | --- 2 | related_terms: 3 | - nchw 4 | title: NHWC 5 | --- 6 | 7 | **NHWC** is an acronym describing the order of the axes in a tensor containing image data samples. 8 | 9 | Software frameworks for training machine learning models--such as TensorFlow--use the acronym NHWC to describe the following axis ordering: 10 | 11 | * **N**: Number of data samples. 12 | * **H**: Image height. 13 | * **W**: Image width. 14 | * **C**: Image channels. A red-green-blue (RGB) image will have 3 channels. 15 | 16 | NHWC is sometimes referred to as a **channels-last** layout. 
-------------------------------------------------------------------------------- /terms/fasttext.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: fastText 3 | related_terms: 4 | - word2vec 5 | - glove-word-embeddings 6 | - word-embedding 7 | --- 8 | fastText is a project from Facebook Research for learning 9 | [word embeddings](/terms/word-embedding/) and performing sentence 10 | classification.
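As a rough sketch of how fastText-style word embeddings can be trained through the gensim wrapper (assuming gensim >= 4.0 is installed; the corpus and every parameter value below are illustrative):

```python
# A minimal sketch of training fastText-style word embeddings with gensim.
# Assumes gensim >= 4.0; the corpus and parameters are toy examples.
from gensim.models import FastText

sentences = [
    ["machine", "learning", "glossary"],
    ["fasttext", "learns", "subword", "embeddings"],
]

# min_n/max_n set the character n-gram range that lets fastText build
# vectors even for words it never saw during training.
model = FastText(sentences, vector_size=32, window=3, min_count=1,
                 min_n=3, max_n=5)

print(model.wv["fasttext"].shape)  # (32,)
```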
11 | 12 | The fastText project is [hosted on Github](https://github.com/facebookresearch/fastText/) and 13 | instructions for using their pre-trained word embeddings 14 | can be [found here](https://github.com/facebookresearch/fastText/blob/master/pretrained-vectors.md). -------------------------------------------------------------------------------- /terms/hinge-loss.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hinge loss 3 | needs_review: true 4 | related_terms: 5 | - loss-function 6 | --- 7 | 8 | From the scikit-learn documentation, we get [the following definition][1]: 9 | 10 | > The hinge_loss function computes the average distance between the model and the data using hinge loss, a one-sided metric that considers only prediction errors. (Hinge loss is used in maximal margin classifiers such as support vector machines.) 11 | 12 | [1]: http://scikit-learn.org/stable/modules/model_evaluation.html#scoring-parameter -------------------------------------------------------------------------------- /_build/install-git-lfs.sh: -------------------------------------------------------------------------------- 1 | #!/bin/bash 2 | 3 | # This script is for installing git-lfs on Travis CI Linux environments. 4 | # Don't use this script to install git-lfs on your personal computer. 5 | # This is necessary until https://github.com/travis-ci/packer-templates/issues/386 6 | # is resolved. 7 | mkdir -p $HOME/bin 8 | wget https://github.com/git-lfs/git-lfs/releases/download/v2.1.1/git-lfs-linux-amd64-2.1.1.tar.gz 9 | tar xvfz git-lfs-linux-amd64-2.1.1.tar.gz 10 | mv git-lfs-2.1.1/git-lfs $HOME/bin/git-lfs 11 | export PATH=$PATH:$HOME/bin/ 12 | -------------------------------------------------------------------------------- /terms/dimensionality-reduction.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Dimensionality reduction 3 | references: 4 | - link_title: Dimensionality reduction - Wikipedia 5 | link_url: https://en.wikipedia.org/wiki/Dimensionality_reduction 6 | related_terms: 7 | - unsupervised-learning 8 | --- 9 | Dimensionality reduction is the process of transforming data to use fewer dimensions (features) while retaining as much of the original information as possible. The result is a smaller dataset that is easier to store and model but still captures most of what the original data conveyed. 10 | -------------------------------------------------------------------------------- /terms/paragraph-vector.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Paragraph vector 3 | related_terms: 4 | - doc2vec 5 | - bag-of-words 6 | --- 7 | *Paragraph Vectors* is the name of the model proposed by [Le and Mikolov][1] to 8 | generate unsupervised representations of sentences, paragraphs, or entire documents 9 | without losing local word order. 10 | 11 | This is in contrast to [bag-of-words](/terms/bag-of-words/) representations, which 12 | can offer useful representations of documents but lose all word order information.
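gensim's Doc2Vec is a widely used implementation of the Paragraph Vector model. A minimal sketch, assuming gensim >= 4.0 and a toy corpus:

```python
# A minimal sketch of Paragraph Vectors via gensim's Doc2Vec implementation.
# Assumes gensim >= 4.0; the corpus and parameters are toy examples.
from gensim.models.doc2vec import Doc2Vec, TaggedDocument

docs = [
    TaggedDocument(words=["the", "cat", "sat"], tags=["doc0"]),
    TaggedDocument(words=["the", "dog", "ran", "away"], tags=["doc1"]),
]

# Each tagged document gets its own vector, trained jointly with word vectors.
model = Doc2Vec(docs, vector_size=16, min_count=1, epochs=40)

vec = model.infer_vector(["a", "cat", "ran"])  # embed an unseen document
print(vec.shape)  # (16,)
```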
13 | 14 | [1]: https://arxiv.org/abs/1405.4053 -------------------------------------------------------------------------------- /terms/backpropagation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Yes you should understand backprop - Andrej Karpathy 4 | link_url: https://medium.com/@karpathy/yes-you-should-understand-backprop-e2f06eab496b 5 | - link_title: Backpropagation - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Backpropagation 7 | related_terms: 8 | - gradient-descent 9 | title: Backpropagation 10 | --- 11 | A technique for computing the gradient of a neural network's loss with respect to each weight by applying the chain rule backwards through the network, layer by layer. Optimizers such as [gradient descent](/terms/gradient-descent/) then use these gradients to update the weights and improve prediction quality. 12 | -------------------------------------------------------------------------------- /terms/hessian-free-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Hessian Free Optimization - Andrew Gibiansky 4 | link_url: http://andrew.gibiansky.com/blog/machine-learning/hessian-free-optimization/ 5 | - link_title: Deep learning via Hessian-free optimization 6 | link_url: http://www.cs.toronto.edu/~jmartens/docs/Deep_HessianFree.pdf 7 | - link_title: Hessian-free Optimization for Learning Deep Multidimensional Recurrent 8 | Neural Networks 9 | link_url: https://arxiv.org/abs/1509.03475 10 | title: Hessian-free optimization 11 | --- 12 | -------------------------------------------------------------------------------- /terms/meteor-machine-translation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: METEOR - Automatic Machine Translation Evaluation System 4 | link_url: http://www.cs.cmu.edu/~alavie/METEOR/ 5 | related_terms: 6 | - bilingual-evaluation-understudy-bleu 7 | title: METEOR 8 | --- 9 | METEOR is an automatic evaluation metric for machine translation, 10 | designed to mitigate perceived weaknesses in 11 | [BLEU](/terms/bilingual-evaluation-understudy-bleu/). METEOR scores 12 | machine translation *hypotheses* by aligning them to reference translations, 13 | much like BLEU does. -------------------------------------------------------------------------------- /terms/search-based-software-engineering-sbse.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Search-based software engineering - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Search-based_software_engineering 5 | title: Search-based software engineering (SBSE) 6 | --- 7 | Search-based software engineering applies search and 8 | optimization techniques to software engineering problems. 9 | 10 | [LDADE](/terms/latent-dirichlet-allocation-differential-evolution-ldade/) is an example of a system that applies search-based software engineering to optimize topic modeling. -------------------------------------------------------------------------------- /_layouts/redirect.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: compress 3 | --- 4 | {% assign destination = "/terms/" | append: page.destination | append: "/" %} 5 | 6 | 7 | 8 | 9 | {{ page.title }} - {{ site.title }} 10 | 11 | 12 |

{{ page.title }} is a synonym for another term.

13 |

Click here if your browser does not automatically redirect.

14 | 15 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_gradients.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | GRADIENTS 11 | 12 | 13 | */ 14 | 15 | .gradient-blue { 16 | background-image: linear-gradient(#4570B0, #0081C2); 17 | } 18 | 19 | .gradient-blue-reversed { 20 | background-image: linear-gradient(#0081C2, #4570B0); 21 | } 22 | 23 | .gradient-light-blue { 24 | background-image: linear-gradient(#76D3FE, #008AE0); 25 | } 26 | 27 | .gradient-light-blue-reversed { 28 | background-image: linear-gradient(#008AE0, #76D3FE); 29 | } 30 | -------------------------------------------------------------------------------- /terms/no-free-lunch-nfl-theorem.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Doesn't the NFL theorem show that black box optimization is flawed? 4 | - Black Box Optimization Competition 5 | link_url: https://bbcomp.ini.rub.de/faq.html#q20 6 | - link_title: No free lunch theorem - Wikipedia 7 | link_url: https://en.wikipedia.org/wiki/No_free_lunch_theorem 8 | related_terms: 9 | - optimization 10 | title: No Free Lunch (NFL) theorem 11 | --- 12 | The "No Free Lunch" theorem is the idea that all optimizers perform equally well 13 | when averaged across all possible optimization problems. -------------------------------------------------------------------------------- /terms/softmax.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Softmax function - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Softmax_function 5 | related_terms: 6 | - hierarchical-softmax 7 | title: Softmax 8 | --- 9 | The softmax turns a vector of $n$ real numbers 10 | into a probability distribution, assigning larger probabilities 11 | to larger numbers. 12 | 13 | Given an $n$-dimensional vector $\mathbf v$ with all components 14 | in $\mathbb R$, the softmax of $\mathbf v$ is: 15 | $$ 16 | \mathrm{softmax}(\mathbf v)_i = 17 | \frac{\exp{(v_i)}} 18 | {\sum_{j=1}^{n} \exp{(v_j)}} 19 | $$ -------------------------------------------------------------------------------- /terms/top-5-error-rate.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'ImageNet: what is top-1 and top-5 error rate?' 4 | link_url: https://stats.stackexchange.com/questions/156471/imagenet-what-is-top-1-and-top-5-error-rate 5 | related_terms: 6 | - top-1-error-rate 7 | title: Top-5 error rate 8 | --- 9 | The term *top-5 error rate* refers to a method of benchmarking 10 | machine learning models in the ImageNet 11 | Large Scale Visual Recognition Competition. 12 | 13 | The model is considered to have classified a given image correctly 14 | if the target label is one of the model's top 5 predictions. -------------------------------------------------------------------------------- /terms/inceptionism.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Inceptionism 3 | related_terms: 4 | - convolutional-neural-network-cnn 5 | - googlenet-neural-network 6 | - inception-neural-network 7 | --- 8 | 9 | *Inceptionism* refers to a visualization technique to understand what 10 | a neural network learned.
The network is fed an image, 11 | asked what it detected, and then that feature in the 12 | image is *amplified*. The full technique is described in the 13 | Google Research blog post titled [Inceptionism: Going Deeper into Neural Networks](https://research.googleblog.com/2015/06/inceptionism-going-deeper-into-neural.html). -------------------------------------------------------------------------------- /_sass/tachyons/scss/_opacity.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | OPACITY 11 | Docs: http://tachyons.io/docs/themes/opacity/ 12 | 13 | */ 14 | 15 | .o-100 { opacity: 1; } 16 | .o-90 { opacity: .9; } 17 | .o-80 { opacity: .8; } 18 | .o-70 { opacity: .7; } 19 | .o-60 { opacity: .6; } 20 | .o-50 { opacity: .5; } 21 | .o-40 { opacity: .4; } 22 | .o-30 { opacity: .3; } 23 | .o-20 { opacity: .2; } 24 | .o-10 { opacity: .1; } 25 | .o-05 { opacity: .05; } 26 | .o-025 { opacity: .025; } 27 | .o-0 { opacity: 0; } 28 | -------------------------------------------------------------------------------- /terms/reinforcement-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Reinforcement Learning (RL) 3 | references: 4 | - link_title: Reinforcement learning - Wikipedia 5 | link_url: http://en.wikipedia.org/wiki/Reinforcement_learning 6 | related_terms: 7 | - supervised-learning 8 | - unsupervised-learning 9 | --- 10 | Reinforcement learning is about learning from feedback (reinforcement) while learning 'on the job', i.e. learning by trying, rather than from labeled answer data. 11 | This is how robots may learn, but it is also used for playing games, where a tight feedback loop through e.g. the score helps give the algorithm an idea of what works well. 12 | -------------------------------------------------------------------------------- /terms/hadamard-product.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Hadamard product 3 | --- 4 | The Hadamard product refers to component-wise multiplication of two matrices (or vectors) of the same dimensions. 5 | The $\odot$ symbol is commonly used as the Hadamard product operator. 6 | 7 | Here is an example of the Hadamard product for a pair of $3 \times 3$ matrices.
8 | 9 | $$ 10 | \begin{bmatrix} 11 | a & b & c \\ 12 | d & e & f \\ 13 | g & h & i 14 | \end{bmatrix} 15 | \odot 16 | \begin{bmatrix} 17 | j & k & l \\ 18 | m & n & o \\ 19 | p & q & r 20 | \end{bmatrix} 21 | = 22 | \begin{bmatrix} 23 | aj & bk & cl \\ 24 | dm & en & fo \\ 25 | gp & hq & ir 26 | \end{bmatrix} 27 | $$ 28 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_links.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | LINKS 11 | Docs: http://tachyons.io/docs/elements/links/ 12 | 13 | */ 14 | 15 | .link { 16 | text-decoration: none; 17 | transition: color .15s ease-in; 18 | } 19 | 20 | .link:link, 21 | .link:visited { 22 | transition: color .15s ease-in; 23 | } 24 | .link:hover { 25 | transition: color .15s ease-in; 26 | } 27 | .link:active { 28 | transition: color .15s ease-in; 29 | } 30 | .link:focus { 31 | transition: color .15s ease-in; 32 | outline: 1px dotted currentColor; 33 | } 34 | 35 | -------------------------------------------------------------------------------- /terms/filter-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Kernel (image processing) - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Kernel_(image_processing) 5 | title: Filter (convolution) 6 | --- 7 | A *filter* (also known as a *kernel*) is a small matrix 8 | used in convolution operations. 9 | 10 | Convolution filters are commonly used in image processing 11 | to modify images or extract features. 12 | 13 | The dimensions of a convolution filter are typically small, 14 | odd, and square. For example, convolution filters are typically 15 | $3 \times 3$ or $5 \times 5$ matrices. Odd dimensions are 16 | preferred to even dimensions. -------------------------------------------------------------------------------- /terms/data-parallelism.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Data parallelism - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Data_parallelism 5 | - link_title: What is the difference between model parallelism and data parallelism? 6 | - Quora 7 | link_url: https://www.quora.com/What-is-the-difference-between-model-parallelism-and-data-parallelism 8 | title: Data parallelism 9 | --- 10 | Data parallelism is when data is distributed across multiple 11 | nodes in a distributed computing environment, and then 12 | each node acts on the data in parallel. 13 | 14 | On each node, the computation is the same, but the data is 15 | different.
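As a toy sketch of this idea using only Python's standard library (the shard count and the per-shard computation are arbitrary stand-ins for a real workload):

```python
# A minimal sketch of data parallelism: the same computation runs on every
# worker, and each worker receives a different shard of the data.
from multiprocessing import Pool

def process_shard(shard):
    # Identical "model" logic on every worker/node.
    return sum(x * x for x in shard)

if __name__ == "__main__":
    data = list(range(1_000))
    shards = [data[i::4] for i in range(4)]  # split the data four ways

    with Pool(processes=4) as pool:
        partial_results = pool.map(process_shard, shards)

    # Combine the per-shard results, analogous to a gradient all-reduce.
    print(sum(partial_results))
```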
-------------------------------------------------------------------------------- /terms/stochastic-convex-hull-sch.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: On the expected diameter, width, and complexity of a stochastic convex-hull 4 | link_url: http://arxiv.org/abs/1704.07028 5 | - link_title: On the Separability of Stochastic Geometric Objects, with Applications 6 | link_url: https://arxiv.org/abs/1603.07021 7 | - link_title: Convex Hulls under Uncertainty 8 | link_url: https://www.cs.ucsb.edu/~suri/psdir/esa14.pdf 9 | - link_title: Probabilistic Convex Hull Queries over Uncertain Data 10 | link_url: http://ieeexplore.ieee.org/document/6858080/ 11 | related_terms: 12 | - convex-hull 13 | title: Stochastic convex hull (SCH) 14 | --- 15 | -------------------------------------------------------------------------------- /admin/index.html: -------------------------------------------------------------------------------- 1 | 2 | 3 | 4 | 5 | 6 | Content Manager 7 | 8 | 9 | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | -------------------------------------------------------------------------------- /terms/convolutional-neural-network-cnn.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Convolutional neural network - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Convolutional_neural_network 5 | - link_title: Stanford CS231n Convolutional Neural Networks for Visual Recognition 6 | link_url: http://cs231n.github.io/convolutional-networks/ 7 | - link_title: Understanding convolutional neural networks for NLP - WildML 8 | link_url: http://www.wildml.com/2015/11/understanding-convolutional-neural-networks-for-nlp/ 9 | related_terms: 10 | - convolution 11 | - attention-neural-networks 12 | - neural-network 13 | title: Convolutional Neural Networks (CNN) 14 | --- 15 | -------------------------------------------------------------------------------- /terms/extractive-sentence-summarization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Extractive Summarization Using Supervised and Semi-supervised Learning 4 | link_url: http://anthology.aclweb.org/C/C08/C08-1124.pdf 5 | related_terms: 6 | - abstractive-sentence-summarization 7 | - textrank 8 | title: Extractive sentence summarization 9 | --- 10 | Extractive sentence summarization refers to programmatically 11 | creating a shorter version of a document by extracting 12 | the "important" parts. 13 | 14 | [TextRank][1] is an example of an algorithm that can 15 | rank sentences in a document for the purpose of extractive 16 | summarization. 17 | 18 | [1]: /terms/textrank -------------------------------------------------------------------------------- /terms/first-order-information.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: First-order information 3 | needs_review: true 4 | --- 5 | First-order information is a term used to mean information obtained 6 | by computing the first derivative of a function. The first 7 | derivative of a function reveals the slope of a tangent 8 | line to the function. This gives a general idea of how the function 9 | is changing at that point, but does not give information 10 | about the *curvature* of the function--the second derivative 11 | is required for that. 
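As a toy illustration, a central finite difference recovers exactly this first-order information--the slope of the tangent line--at a point (the function below is arbitrary):

```python
# Estimate first-order information (the slope of the tangent line) of a
# function at a point using a central finite difference.
def f(x):
    return x ** 3 - 2 * x

def first_derivative(f, x, h=1e-5):
    return (f(x + h) - f(x - h)) / (2 * h)

print(first_derivative(f, 1.0))  # ~1.0, since f'(x) = 3x^2 - 2
```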
12 | 13 | First-order information should not be confused with 14 | [first-order logic][1]. 15 | 16 | [1]: https://en.wikipedia.org/wiki/First-order_logic -------------------------------------------------------------------------------- /terms/named-entity-recognition-in-query-nerq.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Named Entity Recognition in Query 4 | link_url: https://soumen.cse.iitb.ac.in/~soumen/doc/www2013/QirWoo/GuoXCL2009nerq.pdf 5 | - link_title: Named entity recognition in query - Google Patents 6 | link_url: https://www.google.com/patents/US9009134 7 | related_terms: 8 | - named-entity-recognition-ner 9 | title: Named Entity Recognition in Query (NERQ) 10 | --- 11 | *Named Entity Recognition in Query* is a phrase used in a research paper and patent from 12 | Microsoft, referring to the [Named Entity Recognition](/terms/named-entity-recognition-ner/) 13 | problem in web search queries. -------------------------------------------------------------------------------- /_sass/tachyons/scss/_box-sizing.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | BOX SIZING 11 | 12 | */ 13 | 14 | html, 15 | body, 16 | div, 17 | article, 18 | section, 19 | main, 20 | footer, 21 | header, 22 | form, 23 | fieldset, 24 | legend, 25 | pre, 26 | code, 27 | a, 28 | h1,h2,h3,h4,h5,h6, 29 | p, 30 | ul, 31 | ol, 32 | li, 33 | dl, 34 | dt, 35 | dd, 36 | textarea, 37 | table, 38 | td, 39 | th, 40 | tr, 41 | input[type="email"], 42 | input[type="number"], 43 | input[type="password"], 44 | input[type="tel"], 45 | input[type="text"], 46 | input[type="url"], 47 | .border-box { 48 | box-sizing: border-box; 49 | } 50 | -------------------------------------------------------------------------------- /terms/parameter-budget.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Parameter budget 3 | needs_review: true 4 | related_terms: 5 | - optimization 6 | --- 7 | A *parameter budget* refers to the idea of constraining the number 8 | of learnable parameters for a machine learning model. Some types 9 | of parameters are more useful for improving a model 10 | than others, so they should be prioritized in a model 11 | with a restricted parameter budget. 12 | 13 | In neural networks, deeper networks seem to work better when the parameter 14 | budget is constrained. 15 | 16 | A related idea is the *computational budget*, but the budget for overall computation is not strictly tied to the number of parameters in a model. -------------------------------------------------------------------------------- /terms/word2phrase.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: word2phrase 3 | related_terms: 4 | - word2vec 5 | --- 6 | word2phrase refers to a program in the 7 | [word2vec](/terms/word2vec) toolkit that discovers 8 | multi-word phrases in a corpus of words. 9 | 10 | From the [original word2vec Google Code page](https://code.google.com/archive/p/word2vec/): 11 | 12 | > In certain applications, it is useful to have vector representation of larger pieces of text. For example, it is desirable to have only one vector for representing 'san francisco'.
This can be achieved by pre-processing the training data set to form the phrases using the word2phrase tool, as is shown in the example script ./demo-phrases.sh. 13 | -------------------------------------------------------------------------------- /terms/reparameterization-trick.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Auto-Encoding Variational Bayes 4 | link_url: https://arxiv.org/abs/1312.6114 5 | - link_title: How does the reparameterization trick for VAEs work and why is it important? 6 | - Cross Validated 7 | link_url: https://stats.stackexchange.com/questions/199605/how-does-the-reparameterization-trick-for-vaes-work-and-why-is-it-important 8 | - link_title: Variational Auto-Encoders and Extensions - NIPS 2015 workshop 9 | link_url: http://dpkingma.com/wordpress/wp-content/uploads/2015/12/talk_nips_workshop_2015.pdf 10 | related_terms: 11 | - variational-autoencoder-vae 12 | title: Reparameterization trick 13 | --- 14 | -------------------------------------------------------------------------------- /terms/variance.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Variance - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Variance 5 | title: Variance 6 | --- 7 | Wikipedia describes variance as follows: 8 | 9 | > In probability theory and statistics, variance is the expectation of the squared deviation of a random variable from its mean, and it informally measures how far a set of (random) numbers are spread out from their mean. 10 | 11 | Variance is the square of the standard deviation, or rather, [standard deviation][1] is the square root of variance. 12 | Thus, sometimes variance is written as $\sigma^2$ where $\sigma$ stands for the standard deviation. 13 | 14 | [1]: /terms/standard-deviation/ -------------------------------------------------------------------------------- /meta/index.md: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: Meta 4 | --- 5 | This section is for *meta-pages*: pages that don't contain machine 6 | learning terminology, but are useful for helping edit, navigate, 7 | and understand this website. 8 | 9 | - [Unfinished Terms](/meta/unfinished/): This page lists terms that currently 10 | have incomplete definitions. 11 | - [Needs Review](/meta/needs-review/): This page lists terms that have definitions, but they have been marked for review. 12 | - [No Related Terms](/meta/no-related/): This page lists terms that are not connected to any other term via *Related Terms*. 13 | - [No References](/meta/no-references/): This page lists terms that do not have any references. -------------------------------------------------------------------------------- /terms/face-recognition.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Face recognition 3 | references: 4 | - link_title: What is face recognition? - Convolutional Neural Networks - deeplearning.ai 5 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/lUBYU/what-is-face-recognition 6 | --- 7 | Face recognition is the problem of identifying whether an input 8 | image contains the faces of any of $k$ people... or if the image has 9 | none of the $k$ faces.
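A toy sketch of this $k$-way comparison, assuming an upstream model has already mapped each face image to an embedding vector (the vectors and the threshold below are made up):

```python
# Toy face recognition as a k-way comparison of face embeddings.
# `known` maps each of the k people to an embedding from a hypothetical
# upstream model; made-up 2-d vectors stand in for real embeddings.
import numpy as np

def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

def recognize(query, known, threshold=0.8):
    """Return the best-matching identity among k people, or None for no match."""
    name, score = max(((n, cosine(query, e)) for n, e in known.items()),
                      key=lambda pair: pair[1])
    return name if score >= threshold else None

known = {"ada": np.array([0.9, 0.1]), "grace": np.array([0.1, 0.9])}
print(recognize(np.array([0.8, 0.2]), known))  # 'ada'
```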
10 | 11 | Face recognition is a harder problem than [face verification][1] 12 | because face verification only compares a single image to one person, 13 | whereas face recognition does this for $k$ people. 14 | 15 | [1]: /terms/face-verification/ 16 | -------------------------------------------------------------------------------- /terms/stationary-environment.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Stationary environment 3 | --- 4 | A stationary environment refers to data-generating distributions 5 | that do not change over time. 6 | 7 | A non-stationary environment, in contrast, refers to data-generating 8 | distributions that do change over time. 9 | 10 | It is a difficult problem to train machine learning algorithms 11 | to generalize well in non-stationary environments. See 12 | [Machine Learning in Non-Stationary Environments][1] for 13 | more information. 14 | 15 | [1]: https://mitpress.mit.edu/books/machine-learning-non-stationary-environments "Machine Learning in Non-Stationary Environments: Introduction to Covariate Shift Adaptation - Mit Press" -------------------------------------------------------------------------------- /_layouts/page.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: compress 3 | --- 4 | {% include header.html %} 5 |
6 |
7 |

Search Results

8 |
9 |
10 |
11 |

{{ page.title }}

12 |
13 |
14 | {{ content }} 15 |
16 | 21 |
22 |
23 | 24 | {% include footer.html %} 25 | -------------------------------------------------------------------------------- /terms/distributional-similarity.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Distributional semantics - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Distributional_semantics 5 | - link_title: word2vec - Stanford CS224N Lecture 2 6 | link_url: https://www.youtube.com/watch?v=ERibwqs9p38 7 | title: Distributional similarity 8 | --- 9 | Distributional similarity is the idea that the meaning of words can be understood 10 | from their context. 11 | 12 | This should not be confused with the term [distributed representation][1], which refers to the 13 | idea of representing information with relatively dense vectors as opposed to a one-hot 14 | representation. 15 | 16 | [1]: /terms/distributed-representation/ -------------------------------------------------------------------------------- /terms/latent-semantic-indexing-lsi.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Latent semantic indexing - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Latent_semantic_analysis#Latent_semantic_indexing 5 | - link_title: What is latent semantic indexing? - Search Engine Journal 6 | link_url: https://www.searchenginejournal.com/what-is-latent-semantic-indexing-seo-defined/21642/ 7 | - link_title: Latent Semantic Indexing - Introduction to Information Retrieval 8 | link_url: https://nlp.stanford.edu/IR-book/html/htmledition/latent-semantic-indexing-1.html 9 | related_terms: 10 | - latent-dirichlet-allocation-lda 11 | - singular-value-decomposition-svd 12 | title: Latent Semantic Indexing (LSI) 13 | --- 14 | -------------------------------------------------------------------------------- /terms/top-1-error-rate.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'ImageNet: what is top-1 and top-5 error rate?' 4 | link_url: https://stats.stackexchange.com/questions/156471/imagenet-what-is-top-1-and-top-5-error-rate 5 | title: Top-1 error rate 6 | --- 7 | The term *top-1 error rate* refers to a method of benchmarking 8 | machine learning models in the ImageNet 9 | Large Scale Visual Recognition Competition. 10 | 11 | The model is considered to have classified a given image correctly 12 | if the target label is the model's top prediction. This 13 | is in contrast to the [top-5 error rate](/terms/top-5-error-rate/) 14 | where the model only needs to identify the correct label in the 15 | model's top 5 predictions. -------------------------------------------------------------------------------- /terms/sobel-filter-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Sobel operator - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Sobel_operator 5 | related_terms: 6 | - filter-convolution 7 | - convolution 8 | - convolutional-neural-network-cnn 9 | title: Sobel filter (convolution) 10 | --- 11 | The Sobel filter is a set of two convolution filters used to detect horizontal 12 | and vertical edges in images.
13 | 14 | The horizontal filter is 15 | 16 | $$ 17 | \begin{bmatrix} 18 | 1 & 0 & -1 \\ 19 | 2 & 0 & -2 \\ 20 | 1 & 0 & -1 21 | \end{bmatrix} 22 | $$ 23 | 24 | and the vertical filter is 25 | 26 | $$ 27 | \begin{bmatrix} 28 | 1 & 2 & 1 \\ 29 | 0 & 0 & 0 \\ 30 | -1 & -2 & -1 31 | \end{bmatrix} 32 | $$ -------------------------------------------------------------------------------- /terms/activation-function.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Activation function - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Activation_function 5 | - link_title: Commonly used activation functions - Stanford CS231n notes 6 | link_url: http://cs231n.github.io/neural-networks-1/#actfun 7 | related_terms: 8 | - neural-network 9 | title: Activation function 10 | --- 11 | In neural networks, an activation function defines 12 | the output of a neuron. 13 | 14 | The activation function is applied to the dot product of 15 | the input to the neuron ($\mathbf x$) and the weights ($\mathbf w$), producing the neuron's output. 16 | 17 | Typically activation functions are nonlinear, as that allows the 18 | network to approximate a wider variety of functions. -------------------------------------------------------------------------------- /terms/gram-matrix.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Gramian matrix - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Gramian_matrix 5 | - link_title: Gram Matrix - Wolfram Mathworld 6 | link_url: http://mathworld.wolfram.com/GramMatrix.html 7 | - link_title: Gram matrix - Encyclopedia of Mathematics 8 | link_url: https://www.encyclopediaofmath.org/index.php/Gram_matrix 9 | title: Gram matrix 10 | --- 11 | [Wolfram Mathworld defines Gram matrix as][1]: 12 | 13 | > Given a set $V$ of $m$ vectors (points in $\mathbb R^n$), the Gram matrix $G$ is 14 | > the matrix of all possible inner products of $V$, i.e., 15 | > $$ g_{ij} = \mathbf v_i^T \mathbf v_j $$ 16 | 17 | [1]: http://mathworld.wolfram.com/GramMatrix.html -------------------------------------------------------------------------------- /terms/one-shot-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: One-shot learning 3 | related_terms: 4 | - representation-learning 5 | references: 6 | - link_title: One Shot Learning - Convolutional Neural Networks - deeplearning.ai 7 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/gjckG/one-shot-learning 8 | --- 9 | One-shot learning refers to the problem of 10 | training a statistical model (such as a 11 | classifier) with only a single example per class. 12 | 13 | One way to build a system capable of 14 | one-shot learning is to use [representation learning][1], to learn representations or features of data 15 | that can be used to accurately classify a single example. 16 | 17 | [1]: /terms/representation-learning/ -------------------------------------------------------------------------------- /terms/tree-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Improved Semantic Representations From Tree-Structured Long Short-Term 4 | Memory Networks 5 | link_url: https://arxiv.org/abs/1503.00075 6 | related_terms: 7 | - long-short-term-memory-lstm 8 | title: Tree-LSTM 9 | --- 10 | Tree-LSTMs are a variant of Long Short Term Memory (LSTM) neural networks.
11 | 12 | A traditional LSTM is structured as a linear chain, and displays 13 | strong performance on sequence modeling tasks--such as machine translation. 14 | 15 | However, some types of data (such as text) are better represented as 16 | tree structures instead of sequences. Thus, Tree-LSTMs were 17 | [introduced by Tai et al.][1] in 2015. 18 | 19 | [1]: https://arxiv.org/abs/1503.00075 -------------------------------------------------------------------------------- /terms/lenet.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks 4 | link_url: http://cs231n.github.io/convolutional-networks/#case 5 | - link_title: Gradient-Based Learning Applied to Document Recognition 6 | link_url: http://yann.lecun.com/exdb/publis/pdf/lecun-98.pdf 7 | related_terms: 8 | - convolutional-neural-network-cnn 9 | title: LeNet 10 | --- 11 | LeNet was an early convolutional neural network proposed 12 | by LeCun et al. in the paper 13 | [Gradient-Based Learning Applied to Document Recognition](http://yann.lecun.com/exdb/publis/pdf/lecun-01a.pdf). 14 | 15 | LeNet was designed for handwriting recognition. Many modern 16 | convolutional neural network architectures are inspired by LeNet. -------------------------------------------------------------------------------- /terms/face-verification.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Face verification 3 | related_terms: 4 | - face-recognition 5 | - face-detection 6 | references: 7 | - link_title: What is face recognition? - Convolutional Neural Networks - deeplearning.ai 8 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/lUBYU/what-is-face-recognition 9 | --- 10 | *Face verification* is the problem of determining 11 | whether an image depicts a given person--with 12 | one image and one person as the input. 13 | 14 | Face verification is an easier problem than 15 | [face recognition][1] because face verification only compares 16 | a single image to one person, whereas face recognition does this 17 | for $k$ people. 18 | 19 | [1]: /terms/face-recognition/ -------------------------------------------------------------------------------- /terms/alexnet.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: ImageNet Classification with Deep Convolutional Neural Networks 4 | link_url: https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf 5 | related_terms: 6 | - convolutional-neural-network-cnn 7 | title: AlexNet 8 | --- 9 | AlexNet is a convolutional neural network architecture proposed by 10 | Alex Krizhevsky, Ilya Sutskever, and Geoffrey Hinton in 2012. 11 | 12 | At the time, it achieved state-of-the-art performance on 13 | the test set for the 2010 ImageNet Large Scale Visual Recognition Competition (LSVRC). A variant of the model won the 14 | 2012 ImageNet LSVRC with a top-5 test error rate of 15 | 15.3%--ten percentage points ahead of the second place winner.
-------------------------------------------------------------------------------- /terms/weight-sharing.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Simplifying Neural Networks by Soft Weight-Sharing 4 | link_url: http://www.cs.toronto.edu/~fritz/absps/sunspots.pdf 5 | - link_title: Shared Weights - Convolutional Neural Networks - Deep Learning Tutorial 6 | link_url: http://deeplearning.net/tutorial/lenet.html#shared-weights 7 | - link_title: Soft Weight-Sharing for Neural Network Compression 8 | link_url: https://arxiv.org/abs/1702.04008 9 | related_terms: 10 | - neural-network 11 | title: Weight sharing 12 | --- 13 | In neural networks, weight sharing is a way to reduce the number of parameters while allowing 14 | for more robust feature detection. Reducing the number of parameters can be 15 | considered a form of [model compression](/terms/model-compression/). -------------------------------------------------------------------------------- /terms/co-adaptation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What does "co-adaptation of neurons" in a Neural network mean? - Quora 4 | link_url: https://www.quora.com/What-does-co-adaptation-of-neurons-in-a-Neural-network-mean 5 | related_terms: 6 | - dropout 7 | - regularization 8 | title: Co-adaptation 9 | --- 10 | In neural networks, co-adaptation refers to when different hidden 11 | units in a neural network have highly correlated behavior. 12 | 13 | It is better for computational efficiency and the model's ability 14 | to learn a general representation if hidden units can detect 15 | features independently of each other. 16 | 17 | A few different regularization techniques aim at reducing 18 | co-adaptation--[dropout][1] being a notable one. 19 | 20 | [1]: /terms/dropout/ -------------------------------------------------------------------------------- /terms/hypergraph.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Hypergraph - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Hypergraph 5 | - link_title: 'Learning with Hypergraphs: Clustering, Classification, Embedding' 6 | link_url: https://papers.nips.cc/paper/3128-learning-with-hypergraphs-clustering-classification-and-embedding.pdf 7 | - link_title: What are the applications of hypergraphs? - Math Overflow 8 | link_url: https://mathoverflow.net/questions/13750/what-are-the-applications-of-hypergraphs 9 | related_terms: 10 | - graph 11 | title: Hypergraph 12 | --- 13 | A hypergraph is a generalization of the [graph][1]. A graph has edges that connect 14 | pairs of vertices, but a hypergraph has hyperedges that can connect any number 15 | of vertices.
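A toy sketch of one common way to represent a hypergraph in code--each named hyperedge stored as a set of vertices:

```python
# A hyperedge is a set of any number of vertices, not just a pair as in a graph.
hypergraph = {
    "e1": {"a", "b", "c"},      # one hyperedge connecting three vertices
    "e2": {"b", "d"},           # an ordinary pairwise edge is a special case
    "e3": {"a", "c", "d", "e"},
}

def incident_edges(h, vertex):
    """Return the hyperedges containing a given vertex."""
    return [name for name, members in h.items() if vertex in members]

print(incident_edges(hypergraph, "a"))  # ['e1', 'e3']
```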
16 | 17 | [1]: /terms/graph/ -------------------------------------------------------------------------------- /terms/adagrad.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Adaptive Subgradient Methods for Online Learning and Stochastic Optimization 4 | link_url: http://jmlr.org/papers/v12/duchi11a.html 5 | - link_title: An overview of gradient descent optimization algorithms - Sebastian 6 | Ruder 7 | link_url: http://sebastianruder.com/optimizing-gradient-descent/ 8 | related_terms: 9 | - stochastic-gradient-descent-sgd 10 | - momentum-optimization 11 | - learning-rate 12 | title: AdaGrad 13 | --- 14 | AdaGrad is a gradient-descent based optimization algorithm. It automatically 15 | tunes the [learning rate][1] based on its observations of the data's geometry. 16 | AdaGrad is designed to perform well with datasets that have infrequently-occurring 17 | features. 18 | 19 | [1]: /terms/learning-rate/ -------------------------------------------------------------------------------- /terms/facets-tool.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Facets home page at Google PAIR 4 | link_url: https://pair-code.github.io/facets/ 5 | title: Facet (visualization tool) 6 | --- 7 | Facets is a plotting and visualization tool created by 8 | the People + AI Research (PAIR) initiative at Google. 9 | 10 | Facets is broken into two tools with the following goals: 11 | 12 | - **Facets Overview** -- summarize statistics for features collected from datasets 13 | - **Facets Dive** -- explore the relationship between different features in a dataset 14 | 15 | The Facets homepage states that 16 | 17 | > Success stories of (Facets) Dive include the detection of classifier failure, identification of systematic errors, evaluating ground truth and potential new signals for ranking. -------------------------------------------------------------------------------- /terms/he-initialization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Delving Deep into Rectifiers: Surpassing Human-Level Performance on 4 | ImageNet Classification' 5 | link_url: https://arxiv.org/abs/1502.01852 6 | related_terms: 7 | - symmetry-breaking 8 | - random-initialization 9 | title: He initialization 10 | --- 11 | The term *He initialization* refers to the first author in the paper 12 | "[Delving Deep into Rectifiers: Surpassing Human-Level Performance on ImageNet Classification](https://arxiv.org/abs/1502.01852)". 13 | 14 | He initialization initializes the bias vectors of a neural network 15 | to $0$ and the weights to random numbers drawn from a Gaussian 16 | distribution where the mean is $0$ and the standard deviation is 17 | $\sqrt{2/n_l}$ (i.e., the variance is $2/n_l$), where $n_l$ is the dimension of the previous layer. -------------------------------------------------------------------------------- /terms/leaky-relu.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Leaky ReLU 3 | related_terms: 4 | - rectified-linear-unit-relu 5 | - activation-function 6 | --- 7 | Leaky ReLU is a type of [activation function][1] that tries 8 | to solve the [Dying ReLU problem][2]. 9 | 10 | A traditional rectified linear unit $f(x)$ returns 0 when $x \leq 0$. 11 | The *Dying ReLU problem* refers to when the unit gets stuck this 12 | way--always returning 0 for any input.
13 | 14 | Leaky ReLU aims to fix this by returning a small non-zero value 15 | for negative inputs instead of 0, as follows: 16 | 17 | $$ 18 | f(x) = 19 | \begin{cases} 20 | x & x > 0 \\ 21 | \alpha x & x \leq 0 22 | \end{cases} 23 | $$ 24 | where $\alpha$ is typically a small value like $\alpha = 0.0001$. 25 | 26 | [1]: /terms/activation-function/ 27 | [2]: /terms/dying-relu/ -------------------------------------------------------------------------------- /terms/resnet.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks 4 | link_url: http://cs231n.github.io/convolutional-networks/#case 5 | - link_title: Deep Residual Learning for Image Recognition 6 | link_url: https://arxiv.org/abs/1512.03385 7 | - link_title: Identity Mappings in Deep Residual Networks 8 | link_url: https://arxiv.org/abs/1603.05027 9 | related_terms: 10 | - convolutional-neural-network-cnn 11 | title: ResNet 12 | --- 13 | ResNet stands for "Residual Network" and was introduced in the paper 14 | [Deep Residual Learning for Image Recognition](https://arxiv.org/abs/1512.03385). 15 | ResNet won the [ImageNet Large Scale Visual Recognition Challenge (ILSVRC)][1] 2015 16 | competition. 17 | 18 | [1]: http://www.image-net.org/challenges/LSVRC/ -------------------------------------------------------------------------------- /terms/adaptive-learning-rate.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Adaptive learning rate 3 | related_terms: 4 | - learning-rate 5 | - learning-rate-decay 6 | - adagrad 7 | - adam-optimizer 8 | --- 9 | The term *adaptive learning rate* refers to variants 10 | of [stochastic gradient descent][1] with learning 11 | rates that change over the course of the algorithm's 12 | execution. 13 | 14 | Allowing the learning rate to change dynamically 15 | eliminates the need to pick a "good" static learning rate, 16 | and can lead to faster training and a trained model 17 | with better performance. 18 | 19 | Some adaptive learning rate algorithms are: 20 | - [Adagrad][2] 21 | - [ADADELTA][3] 22 | - [ADAM][4] 23 | 24 | [1]: /terms/stochastic-gradient-descent-sgd/ 25 | [2]: /terms/adagrad/ 26 | [3]: /terms/adadelta/ 27 | [4]: /terms/adam-optimizer/ 28 | -------------------------------------------------------------------------------- /terms/model-parallelism.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What is the difference between model parallelism and data parallelism? 4 | - Quora 5 | link_url: https://www.quora.com/What-is-the-difference-between-model-parallelism-and-data-parallelism 6 | - link_title: Training with Multiple GPUs Using Model Parallelism - MXNet documentation 7 | link_url: http://mxnet.io/how_to/model_parallel_lstm.html 8 | related_terms: 9 | - data-parallelism 10 | title: Model parallelism 11 | --- 12 | Model parallelism is where a single model is split across multiple 13 | computing nodes, with each node evaluating a different part of the 14 | model (for example, different layers) on the same data. 15 | 16 | In contrast to model parallelism, in 17 | [data parallelism](/terms/data-parallelism/) 18 | the different computing nodes hold the same 19 | parameters but operate on different data.
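A minimal sketch of this layer-splitting form of model parallelism in PyTorch, assuming PyTorch is installed and two CUDA devices are available (substitute "cpu" for both device strings to experiment without GPUs):

```python
# A minimal sketch of model parallelism: different layers of one model live
# on different devices, and activations move between them during the forward pass.
import torch
import torch.nn as nn

class TwoDeviceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.part1 = nn.Linear(128, 64).to("cuda:0")   # first half of the model
        self.part2 = nn.Linear(64, 10).to("cuda:1")    # second half on another device

    def forward(self, x):
        h = torch.relu(self.part1(x.to("cuda:0")))
        # Only the activations are transferred between devices, not the dataset.
        return self.part2(h.to("cuda:1"))

net = TwoDeviceNet()
out = net(torch.randn(32, 128))
print(out.shape)  # torch.Size([32, 10])
```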
-------------------------------------------------------------------------------- /_sass/tachyons/scss/_font-style.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | FONT STYLE 11 | Docs: http://tachyons.io/docs/typography/font-style/ 12 | 13 | Media Query Extensions: 14 | -ns = not-small 15 | -m = medium 16 | -l = large 17 | 18 | */ 19 | 20 | .i { font-style: italic; } 21 | .fs-normal { font-style: normal; } 22 | 23 | @media #{$breakpoint-not-small} { 24 | .i-ns { font-style: italic; } 25 | .fs-normal-ns { font-style: normal; } 26 | } 27 | 28 | @media #{$breakpoint-medium} { 29 | .i-m { font-style: italic; } 30 | .fs-normal-m { font-style: normal; } 31 | } 32 | 33 | @media #{$breakpoint-large} { 34 | .i-l { font-style: italic; } 35 | .fs-normal-l { font-style: normal; } 36 | } 37 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_tables.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | TABLES 11 | Docs: http://tachyons.io/docs/elements/tables/ 12 | 13 | */ 14 | 15 | .collapse { 16 | border-collapse: collapse; 17 | border-spacing: 0; 18 | } 19 | 20 | .striped--light-silver:nth-child(odd) { 21 | background-color: $light-silver; 22 | } 23 | 24 | .striped--moon-gray:nth-child(odd) { 25 | background-color: $moon-gray; 26 | } 27 | 28 | .striped--light-gray:nth-child(odd) { 29 | background-color: $light-gray; 30 | } 31 | 32 | .striped--near-white:nth-child(odd) { 33 | background-color: $near-white; 34 | } 35 | 36 | .stripe-light:nth-child(odd) { 37 | background-color: $white-10; 38 | } 39 | 40 | .stripe-dark:nth-child(odd) { 41 | background-color: $black-10; 42 | } 43 | -------------------------------------------------------------------------------- /meta/no-related.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Meta: Terms Without Related Terms" 4 | --- 5 |

6 | This is a list of terms that have an empty list of outbound 7 | related terms, and also are not referenced by any other term as a 8 | related term.

10 | -------------------------------------------------------------------------------- /terms/coreference-resolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Coreference resolution 3 | related_terms: 4 | - named-entity-recognition-ner 5 | - natural-language-processing 6 | --- 7 | The Stanford NLP group [defines coreference resolution][1] as: 8 | 9 | > Coreference resolution is the task of finding all expressions that refer to the same entity in a text. It is an important step for a lot of higher level NLP tasks that involve natural language understanding such as document summarization, question answering, and information extraction. 10 | 11 | Coreference resolution should not be confused with 12 | [Named Entity Recognition](/terms/named-entity-recognition-ner/), which is focused on labeling 13 | sequences of text that refer to entities--but not focused 14 | on linking those entities together. 15 | 16 | [1]: https://nlp.stanford.edu/projects/coref.shtml -------------------------------------------------------------------------------- /terms/momentum-optimization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Why Momentum Really Works - Distill 4 | link_url: http://distill.pub/2017/momentum/ 5 | related_terms: 6 | - stochastic-optimization 7 | - stochastic-gradient-descent-sgd 8 | title: Momentum 9 | --- 10 | Momentum is commonly understood as a variation of [stochastic gradient descent][1], 11 | but with one important difference: stochastic gradient descent can 12 | unnecessarily oscillate, and doesn't accelerate based on the shape of the 13 | curve. 14 | 15 | In contrast, momentum can dampen oscillations and accelerate convergence. 16 | 17 | Momentum was originally [proposed in 1964 by Boris T. Polyak][2]. 18 | 19 | [1]: /terms/stochastic-gradient-descent-sgd/ 20 | [2]: https://www.researchgate.net/publication/243648538_Some_methods_of_speeding_up_the_convergence_of_iteration_methods -------------------------------------------------------------------------------- /terms/convex-combination.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Convex combination - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Convex_combination 5 | related_terms: 6 | - convex-optimization 7 | title: Convex combination 8 | --- 9 | A convex combination is a linear combination where all 10 | the coefficients are non-negative and sum to 1.
11 | 12 | The [Convex combination Wikipedia article][1] gives the following example: 13 | 14 | Given a finite number of points $x_1, x_2, \ldots, x_n$ in a real vector 15 | space, a convex combination of these points is a point of the form 16 | 17 | $$ 18 | a_1 x_1 + a_2 x_2 + \ldots + a_n x_n 19 | $$ 20 | where all the real numbers $a_i \geq 0$ and 21 | $a_1 + a_2 + \ldots + a_n = 1$ 22 | 23 | [1]: https://en.wikipedia.org/wiki/Convex_combination "Convex combination - Wikipedia" -------------------------------------------------------------------------------- /admin/config.yml: -------------------------------------------------------------------------------- 1 | backend: 2 | name: git-gateway 3 | branch: master 4 | publish_mode: editorial_workflow 5 | media_folder: images 6 | collections: 7 | - name: terms 8 | label: Term 9 | folder: terms 10 | create: true 11 | slug: "{{slug}}" 12 | fields: 13 | - label: Title 14 | name: title 15 | widget: string 16 | required: true 17 | - label: Related Terms 18 | name: related_terms 19 | widget: list 20 | - label: Reference 21 | name: references 22 | widget: list 23 | fields: 24 | - label: Link Title 25 | name: link_title 26 | widget: string 27 | - label: Link URL 28 | name: link_url 29 | widget: string 30 | - label: Definition 31 | name: body 32 | widget: markdown 33 | -------------------------------------------------------------------------------- /terms/one-hot-encoding.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: One-hot - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/One-hot 5 | - link_title: What is one hot encoding and when is it used in data science? 6 | link_url: https://www.quora.com/What-is-one-hot-encoding-and-when-is-it-used-in-data-science 7 | related_terms: 8 | - distributed-representation 9 | title: One-hot encoding 10 | --- 11 | *One-hot encoding* refers to a way of transforming data into vectors 12 | where all components are 0, except for one component with a value of 1, 13 | e.g.: 14 | $$ 15 | 0 = [1, 0, 0, 0, 0]^T 16 | $$ 17 | $$ 18 | 1 = [0, 1, 0, 0, 0]^T 19 | $$ 20 | $$ 21 | \ldots 22 | $$ 23 | $$ 24 | 4 = [0, 0, 0, 0, 1]^T 25 | $$ 26 | and so on. 27 | 28 | One-hot encoding can make it easier for machine learning algorithms to 29 | manipulate and learn categorical variables. -------------------------------------------------------------------------------- /terms/part-of-speech-pos-tagging.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Part-of-speech tagging 4 | link_url: https://en.wikipedia.org/wiki/Part-of-speech_tagging 5 | related_terms: 6 | - natural-language-processing 7 | title: Part-of-Speech (POS) Tagging 8 | --- 9 | Part-of-Speech tagging is the process of reading 10 | natural language text and assigning parts of speech to each 11 | token. 12 | 13 | One could imagine taking in a sentence like: 14 | 15 | > The dog ran away. 16 | 17 | and creating a data structure with the following annotations: 18 | 19 | > The*[article]* dog*[noun]* ran*[verb]* away*[adverb]*. 20 | 21 | Words can have different parts-of-speech depending on their 22 | context. For example, the word *away* can be either an [adverb 23 | or an adjective, or part of a larger phrase][1].
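As a quick sketch, NLTK's off-the-shelf tagger assigns Penn Treebank tags to the example sentence above (assuming nltk is installed and its punkt and averaged_perceptron_tagger resources have been downloaded; the output comment is illustrative):

```python
# Part-of-speech tagging with NLTK's default (Penn Treebank) tagger.
# Requires: nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
import nltk

tokens = nltk.word_tokenize("The dog ran away.")
print(nltk.pos_tag(tokens))
# e.g. [('The', 'DT'), ('dog', 'NN'), ('ran', 'VBD'), ('away', 'RB'), ('.', '.')]
```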
24 | 25 | [1]: http://www.dictionary.com/browse/away -------------------------------------------------------------------------------- /terms/termite.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Termite: Visualization Techniques for Assessing Textual Topic Models 4 | - Stanford Visualization Group' 5 | link_url: http://vis.stanford.edu/papers/termite 6 | - link_title: Online Termite Demo 7 | link_url: http://vis.stanford.edu/topic-diagnostics/model/silverStandards/ 8 | - link_title: Termite Data Server - Github 9 | link_url: https://github.com/uwdata/termite-data-server 10 | related_terms: 11 | - latent-dirichlet-allocation-lda 12 | title: Termite 13 | --- 14 | Termite is a visual analysis tool to determine the quality of topic models 15 | like [latent Dirichlet allocation](/terms/latent-dirichlet-allocation-lda/). 16 | 17 | Termite lays out document terms as a table of circles where: 18 | 19 | - rows represent document terms 20 | - columns represent topics 21 | - circular areas represent term probabilities -------------------------------------------------------------------------------- /terms/imputation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 6 Different Ways to Compensate for Missing Values In a Dataset (Data Imputation with examples) - Will Badr 4 | link_url: https://towardsdatascience.com/6-different-ways-to-compensate-for-missing-values-data-imputation-with-examples-6022d9ca0779 5 | - link_title: Imputation (statistics) 6 | link_url: https://en.wikipedia.org/wiki/Imputation_(statistics) 7 | - link_title: Defining, Analysing, and Implementing Imputation Techniques - Shashank Singhal 8 | link_url: https://scikit-learn.org/stable/glossary.html#term-imputation 9 | related_terms: 10 | title: Imputation 11 | --- 12 | Imputation means replacing missing data values with substitute values. 13 | There are several ways to do this, such as drawing substitute values from a random distribution to avoid bias, or 14 | replacing each missing value with the mean or median of its column. 15 | 16 | -------------------------------------------------------------------------------- /terms/random-initialization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Why doesn't backpropagation work when you initialize the weights the 4 | same value? -- Cross Validated 5 | link_url: https://stats.stackexchange.com/questions/45087/why-doesnt-backpropagation-work-when-you-initialize-the-weights-the-same-value 6 | - link_title: Random Initialization - Coursera Machine Learning 7 | link_url: https://www.coursera.org/learn/machine-learning/lecture/ND5G5/random-initialization 8 | related_terms: 9 | - symmetry-breaking 10 | title: Random initialization 11 | --- 12 | Random initialization refers to the practice of using random numbers 13 | to initialize the weights of a machine learning model. 14 | 15 | Random initialization is one way of performing [symmetry breaking](/terms/symmetry-breaking), which is the act of preventing all of 16 | the weights in the machine learning model from being the same.
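A minimal sketch of random initialization for one fully-connected layer, using NumPy (the layer sizes and the scale factor 0.01 are just common illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n_inputs, n_units = 4, 3

# Small random weights break the symmetry between units;
# biases can safely start at zero once the weights differ.
W = rng.normal(loc=0.0, scale=0.01, size=(n_inputs, n_units))
b = np.zeros(n_units)
```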
-------------------------------------------------------------------------------- /terms/bias-variance-tradeoff.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Bias-variance tradeoff - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Bias%E2%80%93variance_tradeoff 5 | related_terms: 6 | - regularization 7 | - variance 8 | - bias 9 | - supervised-learning 10 | - underfitting 11 | - overfitting 12 | title: Bias-variance tradeoff 13 | --- 14 | The bias-variance tradeoff refers to the problem of minimizing two different sources of error 15 | when training a supervised learning model: 16 | 17 | 1. **Bias** - Bias is a consistent error, possibly from the algorithm having 18 | made an incorrect assumption about the training data. Bias is often related to underfitting. 19 | 20 | 2. **Variance** - Variance comes from a high sensitivity to differences in training data. 21 | Variance is often related to overfitting. 22 | 23 | It is typically difficult to simultaneously minimize bias and variance. -------------------------------------------------------------------------------- /terms/syntaxnet.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: "Announcing SyntaxNet: The World\u2019s Most Accurate Parser Goes Open\ 4 | \ Source - Google Research Blog" 5 | link_url: https://research.googleblog.com/2016/05/announcing-syntaxnet-worlds-most.html 6 | - link_title: Globally Normalized Transition-Based Neural Networks 7 | link_url: https://arxiv.org/abs/1603.06042 8 | related_terms: 9 | - natural-language-processing 10 | title: SyntaxNet 11 | --- 12 | SyntaxNet is a framework for natural language syntactic 13 | parsers released by Google in 2016. 14 | 15 | SyntaxNet tags words in a sentence with their syntactic part-of-speech 16 | and creates a parse tree showing dependencies between words 17 | in a sentence. 18 | 19 | Parsey McParseface is a SyntaxNet model trained on the English 20 | language. At its time of release, Parsey McParseface was the 21 | world's most accurate model of its kind.
-------------------------------------------------------------------------------- /_sass/tachyons/scss/_white-space.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | WHITE SPACE 11 | 12 | Media Query Extensions: 13 | -ns = not-small 14 | -m = medium 15 | -l = large 16 | 17 | */ 18 | 19 | 20 | .ws-normal { white-space: normal; } 21 | .nowrap { white-space: nowrap; } 22 | .pre { white-space: pre; } 23 | 24 | @media #{$breakpoint-not-small} { 25 | .ws-normal-ns { white-space: normal; } 26 | .nowrap-ns { white-space: nowrap; } 27 | .pre-ns { white-space: pre; } 28 | } 29 | 30 | @media #{$breakpoint-medium} { 31 | .ws-normal-m { white-space: normal; } 32 | .nowrap-m { white-space: nowrap; } 33 | .pre-m { white-space: pre; } 34 | } 35 | 36 | @media #{$breakpoint-large} { 37 | .ws-normal-l { white-space: normal; } 38 | .nowrap-l { white-space: nowrap; } 39 | .pre-l { white-space: pre; } 40 | } 41 | 42 | -------------------------------------------------------------------------------- /terms/codebook-collapse.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Codebook collapse 3 | related_terms: 4 | - vector-quantized-variational-autoencoder-vqvae 5 | - codebook 6 | - mode-collapse 7 | --- 8 | 9 | **Codebook collapse** is a problem that arises when training generative machine learning models that generate outputs using a fixed-length codebook, such as the [Vector-Quantized Variational Autoencoder (VQ-VAE)][2]. 10 | 11 | In ideal scenarios, the model's fixed-size codebook is large enough to create a diverse set of output values. Codebook collapse happens when the model only learns to use a few of the values in the codebook--artificially limiting the diversity of outputs that the model can generate. 12 | 13 | Codebook collapse is analogous to [mode collapse][1], another problem commonly faced when training generative models. 14 | 15 | [1]: /terms/mode-collapse 16 | [2]: /terms/vector-quantized-variational-autoencoder-vqvae 17 | -------------------------------------------------------------------------------- /terms/margin.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Margin 3 | related_terms: 4 | - support-vector-machine-svm 5 | references: 6 | - link_title: Hard Margin - Support Vector Machines 7 | link_url: https://en.wikipedia.org/wiki/Support_vector_machine#Hard-margin 8 | --- 9 | In machine learning, a *margin* often refers to the 10 | distance between the two hyperplanes that separate linearly-separable classes of data points. 11 | 12 | ![In this [image from Wikipedia][1], the dotted lines represent the two hyperplanes dividing the white and black data points. The region between the lines is the margin.](/images/margin.png) 13 | 14 | The term is most commonly used when discussing 15 | [support vector machines][2], but often appears in 16 | other literature discussing boundaries between points in a vector space. 17 | 18 | [1]: https://en.wikipedia.org/wiki/Support_vector_machine#Hard-margin 19 | [2]: /terms/support-vector-machine-svm/ 20 | -------------------------------------------------------------------------------- /terms/latent-dirichlet-allocation-differential-evolution-ldade.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What is Wrong with Topic Modeling? 
(and How to Fix it Using Search-based 4 | Software Engineering) 5 | link_url: https://arxiv.org/abs/1608.08176 6 | related_terms: 7 | - latent-dirichlet-allocation-lda 8 | - differential-evolution 9 | - clustering-stability 10 | - search-based-software-engineering-sbse 11 | title: Latent Dirichlet Allocation Differential Evolution (LDADE) 12 | --- 13 | LDADE is a tool proposed by Agrawal et al. in a paper 14 | titled [What is Wrong with Topic Modeling? (and How to Fix it Using Search-based Software Engineering)](https://arxiv.org/abs/1608.08176). 15 | 16 | It tunes [LDA](/terms/latent-dirichlet-allocation-lda) parameters 17 | using 18 | [Differential Evolution](/terms/differential-evolution/) to increase 19 | the [clustering stability](/terms/clustering-stability/) of standard LDA. -------------------------------------------------------------------------------- /terms/dying-relu.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: "What is the \u201Cdying ReLU\u201D problem in neural networks? - Data\ 4 | \ Science StackExchange" 5 | link_url: https://datascience.stackexchange.com/questions/5706/what-is-the-dying-relu-problem-in-neural-networks 6 | related_terms: 7 | - leaky-relu 8 | - rectified-linear-unit-relu 9 | title: Dying ReLU 10 | --- 11 | *Dying ReLU* refers to a problem when training neural 12 | networks with [rectified linear units (ReLU)][1]. 13 | A unit dies when it outputs 0 for every input. 14 | 15 | Because a dead unit's gradient is also 0, the unit 16 | is not likely to return to life when training with stochastic 17 | gradient descent, and it will no longer be useful during training. 18 | 19 | [Leaky ReLU][2] is a variant that solves the Dying ReLU problem 20 | by returning a small non-zero value, proportional to the input, when the input $x$ is less than 0. 21 | 22 | [1]: /terms/rectified-linear-unit-relu/ 23 | [2]: /terms/leaky-relu/ -------------------------------------------------------------------------------- /terms/zero-shot-learning.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What is zero-shot learning? 4 | link_url: https://www.quora.com/What-is-zero-shot-learning 5 | - link_title: Representation Learning - Deep Learning Book 6 | link_url: http://www.deeplearningbook.org/contents/representation.html 7 | related_terms: 8 | - one-shot-learning 9 | - representation-learning 10 | title: Zero-shot learning 11 | --- 12 | Ian Goodfellow in [a Quora answer][1] defines zero-shot learning as the following: 13 | 14 | > Zero-shot learning is being able to solve a task despite not having received any training examples of that task. For a concrete example, imagine recognizing a category of object in photos without ever having seen a photo of that kind of object before. If you've read a very detailed description of a cat, you might be able to tell what a cat is in a photograph the first time you see it. 15 | 16 | 17 | [1]: https://www.quora.com/What-is-zero-shot-learning -------------------------------------------------------------------------------- /_sass/tachyons/readme.md: -------------------------------------------------------------------------------- 1 | # tachyons-sass [![Build Status](https://travis-ci.org/tachyons-css/tachyons-sass.svg?branch=master)](https://travis-ci.org/tachyons-css/tachyons-sass) 2 | 3 | Transpiled partials for Tachyons.
4 | 5 | ## Installation 6 | 7 | ```bash 8 | npm install --save tachyons-sass 9 | ``` 10 | 11 | ## Usage 12 | 13 | ```scss 14 | @import "path/to/tachyons.scss"; 15 | ``` 16 | 17 | ## License 18 | 19 | MIT 20 | 21 | ## Contributing 22 | 23 | 1. Fork it 24 | 2. Create your feature branch (`git checkout -b my-new-feature`) 25 | 3. Commit your changes (`git commit -am 'Add some feature'`) 26 | 4. Push to the branch (`git push origin my-new-feature`) 27 | 5. Create new Pull Request 28 | 29 | Built by [@mrmrs_](https://twitter.com/mrmrs_) & [@4lpine](https://twitter.com/4lpine). 30 | 31 | *** 32 | 33 | > This package was initially generated with [yeoman](http://yeoman.io) and the [p generator](https://github.com/johnotander/generator-p.git). 34 | -------------------------------------------------------------------------------- /terms/bit-transparency-audio.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bit transparency (audio) 3 | references: 4 | - link_title: Bit Transparency 5 | link_url: https://www.soundonsound.com/techniques/bit-transparency 6 | --- 7 | 8 | A digital audio system satisfies **bit transparency** if audio data can pass through the system without being changed. 9 | 10 | A system can fail to be bit-transparent if it performs any type of digital signal processing--such as changing the audio's sample rate. Some audio operations--like [converting audio samples from integer to float and back][1]--can either be bit-transparent or not depending on the implementation. 11 | 12 | An audio system [can be tested][2] for bit-transparency by giving a random sequence of bits as input and testing that the output is bit-for-bit identical to the input. 13 | 14 | [1]: http://blog.bjornroche.com/2009/12/int-float-int-its-jungle-out-there.html 15 | [2]: https://benchmarkmedia.com/blogs/wiki/14949565-bit-transparency 16 | -------------------------------------------------------------------------------- /terms/sense2vec.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: sense2vec - A Fast and Accurate Method for Word Sense Disambiguation 4 | In Neural Word Embeddings 5 | link_url: https://arxiv.org/abs/1511.06388 6 | - link_title: Sense2vec with spaCy and Gensim - Explosion AI 7 | link_url: https://explosion.ai/blog/sense2vec-with-spacy 8 | related_terms: 9 | - word2vec 10 | - word-embedding 11 | - glove-word-embeddings 12 | title: sense2vec 13 | --- 14 | sense2vec refers to a system in a paper titled 15 | [sense2vec - A Fast and Accurate Method for Word Sense Disambiguation In Neural Word Embeddings](https://arxiv.org/abs/1511.06388). 16 | It solves a problem with previous word embeddings like [word2vec](/terms/word2vec) and [GloVe](/terms/glove-word-embeddings/) 17 | where words of different senses (e.g. "duck" as an animal and "duck" as a verb) are represented by the 18 | same embedding. 19 | 20 | sense2vec uses word sense information to train more accurate word embeddings. 
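The core trick can be approximated with off-the-shelf tools: tag each token with its sense (here, simply a coarse part-of-speech label) before training an ordinary word2vec model, so that `duck|NOUN` and `duck|VERB` receive separate vectors. A hedged sketch using gensim (the toy corpus is made up, and the `vector_size` parameter name applies to gensim 4.x):

```python
from gensim.models import Word2Vec

# Tokens are tagged with a sense label before training, so each
# (word, sense) pair gets its own embedding.
sentences = [
    ["the|DET", "duck|NOUN", "swam|VERB", "away|ADV"],
    ["please|INTJ", "duck|VERB", "your|PRON", "head|NOUN"],
]

model = Word2Vec(sentences, vector_size=50, window=2, min_count=1)
print(model.wv["duck|NOUN"].shape)  # (50,)
```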
-------------------------------------------------------------------------------- /terms/adam-optimizer.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Adam: A Method for Stochastic Optimization' 4 | link_url: https://arxiv.org/abs/1412.6980 5 | - link_title: ADAM - An overview of gradient descent optimization algorithms - Sebastian 6 | Ruder 7 | link_url: http://sebastianruder.com/optimizing-gradient-descent/index.html#adam 8 | related_terms: 9 | - adagrad 10 | - rmsprop 11 | - stochastic-optimization 12 | - stochastic-gradient-descent-sgd 13 | title: ADAM Optimizer 14 | --- 15 | ADAM, or **Ada**ptive **M**oment Estimation, is a stochastic optimization 16 | method [introduced by Diederik P. Kingma and Jimmy Lei Ba][5]. 17 | 18 | They intended to combine the advantages of [Adagrad][1]'s 19 | handling of sparse [gradients][3] and [RMSProp][2]'s handling 20 | of [non-stationary environments][4]. 21 | 22 | [1]: /terms/adagrad/ 23 | [2]: /terms/rmsprop/ 24 | [3]: /terms/gradient/ 25 | [4]: /terms/stationary-environment/ 26 | [5]: https://arxiv.org/abs/1412.6980 -------------------------------------------------------------------------------- /terms/symmetry-breaking.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Why doesn't backpropagation work when you initialize the weights the 4 | same value? -- Cross Validated 5 | link_url: https://stats.stackexchange.com/questions/45087/why-doesnt-backpropagation-work-when-you-initialize-the-weights-the-same-value 6 | - link_title: Random Initialization - Coursera Machine Learning 7 | link_url: https://www.coursera.org/learn/machine-learning/lecture/ND5G5/random-initialization 8 | title: Symmetry breaking 9 | --- 10 | Symmetry breaking refers to a requirement when initializing machine 11 | learning models like neural networks. 12 | 13 | When some machine learning models have weights all initialized 14 | to the same value, it can be difficult or impossible for the 15 | weights to differ as the model is trained. This is the "symmetry". 16 | 17 | Initializing the model to small random values breaks the symmetry 18 | and allows different weights to learn independently of each other.
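A small demonstration of the problem with a hand-rolled two-layer network (not any particular library's API; the numbers are arbitrary and the loss is squared error): when the hidden weights and output weights all start identical, both hidden units receive identical gradients and can never diverge.

```python
import numpy as np

x = np.array([0.5, -1.0])          # one training example, 2 features
W1 = np.full((2, 2), 0.3)          # both hidden units initialized identically
w2 = np.full(2, 0.5)               # identical output weights
y = 1.0

h = np.tanh(W1 @ x)                # both hidden activations are equal
err = w2 @ h - y                   # squared-error gradient at the output

grad_w2 = err * h                  # equal entries
grad_W1 = np.outer(err * w2 * (1 - h**2), x)

print(grad_W1)                     # identical rows -> the units stay identical
```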
-------------------------------------------------------------------------------- /_sass/tachyons/scss/_outlines.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | OUTLINES 11 | 12 | Media Query Extensions: 13 | -ns = not-small 14 | -m = medium 15 | -l = large 16 | 17 | */ 18 | 19 | .outline { outline: 1px solid; } 20 | .outline-transparent { outline: 1px solid transparent; } 21 | .outline-0 { outline: 0; } 22 | 23 | @media #{$breakpoint-not-small} { 24 | .outline-ns { outline: 1px solid; } 25 | .outline-transparent-ns { outline: 1px solid transparent; } 26 | .outline-0-ns { outline: 0; } 27 | } 28 | 29 | @media #{$breakpoint-medium} { 30 | .outline-m { outline: 1px solid; } 31 | .outline-transparent-m { outline: 1px solid transparent; } 32 | .outline-0-m { outline: 0; } 33 | } 34 | 35 | @media #{$breakpoint-large} { 36 | .outline-l { outline: 1px solid; } 37 | .outline-transparent-l { outline: 1px solid transparent; } 38 | .outline-0-l { outline: 0; } 39 | } 40 | -------------------------------------------------------------------------------- /all.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: All Words 4 | --- 5 | {% assign terms = site.pages | where_exp:"item","item.url contains 'terms/'" %} 6 | {% for term in terms %} 7 | {% if term.title %} 8 | {% assign link_name = term.url | split: "/" | last %} 9 |
{{ term.title }}
10 | {% if term.layout == "redirect" %} 11 | {% assign destination_url = "/terms/" | append: term.destination | append: "/" %} 12 | {% assign destination = terms | where_exp: "item", "item.url == destination_url" | first %} 13 | See {{ destination.title }}. 14 | {% else %} 15 | {{ term.content | markdownify | replace: 'href="/terms/', 'href="#' }} 16 | {% include synonyms.html page=term %} 17 | {% include related_terms.html page=term local=true %} 18 | {% include references.html page=term %} 19 | {% endif %} 20 |
21 | {% endif %} 22 | {% endfor %} -------------------------------------------------------------------------------- /terms/inverted-dropout.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Analysis of Dropout 4 | link_url: https://pgaleone.eu/deep-learning/regularization/2017/01/10/anaysis-of-dropout/ 5 | related_terms: 6 | - regularization 7 | - model-averaging 8 | - dropout 9 | title: Inverted dropout 10 | --- 11 | Inverted dropout is a variant of the original [dropout](/terms/dropout) 12 | technique developed by Hinton et al. 13 | 14 | Just like traditional dropout, inverted dropout randomly 15 | keeps some weights and sets others to zero. The probability 16 | of keeping a unit is known as the "keep probability" $p$. 17 | 18 | The one difference is that, during the training of a neural 19 | network, inverted dropout scales the kept activations by 20 | the inverse of the keep probability, multiplying them by $1/p$. 21 | 22 | This keeps the expected magnitude of the activations the same 23 | during training and evaluation, and does not require any changes 24 | to the network during evaluation. 25 | 26 | In contrast, traditional dropout requires scaling to be implemented 27 | during the test phase. -------------------------------------------------------------------------------- /terms/siamese-neural-network.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Siamese neural network 3 | related_terms: 4 | - neural-network 5 | - representation-learning 6 | - one-shot-learning 7 | - triplet-loss 8 | references: 9 | - link_title: Siamese network - Convolutional Neural Networks - deeplearning.ai 10 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/bjhmj/siamese-network 11 | --- 12 | A *Siamese neural network* is a neural network architecture that runs two pieces of data through identical neural networks and then feeds both outputs to a loss function that measures their similarity. 13 | 14 | Siamese neural networks are a common model architecture for [one-shot learning][1]. 15 | 16 | For example, a Siamese neural network might be used to train a model to measure similarity between two different images, for the purpose of identifying whether the images are of the same object--but without training on many examples of that object. 17 | 18 | [1]: /terms/one-shot-learning -------------------------------------------------------------------------------- /terms/unk.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: UNK 3 | --- 4 | UNK, unk, and `<unk>` are variants of a symbol used in natural language processing 5 | and machine translation to indicate an out-of-vocabulary word. 6 | 7 | Many language models do calculations upon representations of 8 | the $n$ most frequent words in the corpus. Words that are less 9 | frequent are replaced with the `<unk>` symbol. 10 | 11 | This is what such a transformation might look like. The below 12 | is an example of a source document in a corpus with 13 | common English words. 14 | 15 | 16 | > Today I'll bake; tomorrow I'll brew, 17 | > Then I'll fetch the queen's new child, 18 | > It is good that no one knows, 19 | > **Rumpelstiltskin** is my name. 20 | 21 | Every word in the above quote is common in English, except for 22 | Rumpelstiltskin, which is replaced as follows: 23 | 24 | > Today I'll bake; tomorrow I'll brew, 25 | > Then I'll fetch the queen's new child, 26 | > It is good that no one knows, 27 | > **<unk>** is my name.
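A hedged sketch of this preprocessing step (the tiny corpus and the vocabulary-size cutoff are illustrative):

```python
from collections import Counter

corpus = [
    "today i'll bake tomorrow i'll brew",
    "rumpelstiltskin is my name",
]

tokens = [tok for line in corpus for tok in line.split()]
counts = Counter(tokens)

# Keep only the n most frequent words; everything else becomes <unk>.
n = 5
vocab = {word for word, _ in counts.most_common(n)}

replaced = [
    " ".join(tok if tok in vocab else "<unk>" for tok in line.split())
    for line in corpus
]
print(replaced)
```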
28 | -------------------------------------------------------------------------------- /terms/pseudo-labeling.md: -------------------------------------------------------------------------------- 1 | --- 2 | related_terms: 3 | - semi-supervised-learning 4 | title: Pseudo-labeling 5 | --- 6 | 7 | **Pseudo-labeling** is when: 8 | 9 | 1. A machine learning model is trained on a labeled training set. 10 | 2. The model is used to compute predicted labels for unlabeled data. 11 | 3. The model is retrained from a new dataset that adds the data with predicted labels to the training set. 12 | 13 | Pseudo-labeling can sometimes be very effective in improving a machine learning model's accuracy. The underlying theory is that pseudo-labeling can make it easier for a classification model to learn more precise boundaries between classes. However, in order for pseudo-labeling to work, the original training set must be large enough--and representative of all classes--for the model's predicted labels to be reasonably accurate. 14 | 15 | On the other hand, if the training set is already very large relative to the number of parameters in the model, then pseudo-labeling may be unnecessary. 16 | -------------------------------------------------------------------------------- /acronyms.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "List of Acronyms" 4 | --- 5 |
6 | This page contains alphabetically-sorted links to all terms with well-known acronyms. 7 |
 8 | -------------------------------------------------------------------------------- /terms/convex-hull.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Convex hull - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Convex_hull 5 | - link_title: Convex set - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Convex_set 7 | - link_title: Convex combination - Wikipedia 8 | link_url: https://en.wikipedia.org/wiki/Convex_combination 9 | related_terms: 10 | - convex-combination 11 | - affine-space 12 | title: Convex hull 13 | --- 14 | The convex hull of a set $X$ in an affine space over the reals is the smallest 15 | [convex set][1] that contains $X$. When the points are two dimensional, 16 | the convex hull can be thought of as the rubber band around the points of $X$. 17 | 18 | As per Wikipedia, a [convex set][1] is a set that is 19 | closed under convex combinations. 20 | 21 | A [convex combination][2] is a linear combination where 22 | all the coefficients are greater than or equal to 0 and sum to 1. 23 | 24 | [1]: https://en.wikipedia.org/wiki/Convex_set 25 | [2]: /terms/convex-combination/ -------------------------------------------------------------------------------- /terms/neural-turing-machine-ntm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Neural Turing Machine - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Neural_Turing_machine 5 | - link_title: Neural Turing Machines 6 | link_url: https://arxiv.org/pdf/1410.5401.pdf 7 | related_terms: 8 | - recurrent-neural-network 9 | - neural-network 10 | - long-short-term-memory 11 | title: Neural Turing Machine (NTM) 12 | --- 13 | A Neural Turing Machine (NTM) consists of an RNN (commonly with LSTM cells) and a memory bank to which the neural network can read and write. Because each operation of the NTM is differentiable, it can be trained efficiently with gradient descent. 14 | 15 | The main idea of the NTM is to use the memory bank -- a large, addressable memory -- to give the RNN a memory that it can read from and write to, yielding a practical mechanism for learning programs. The NTM has been shown to be able to infer simple algorithms, such as copying, sorting and associative recall from input and output examples.
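A toy NumPy sketch of the differentiable memory access at the heart of the NTM (content-based addressing only; the full architecture adds location-based addressing, and the key, erase, and add vectors would be emitted by the controller rather than sampled randomly):

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

memory = np.random.randn(8, 4)     # 8 memory slots of width 4
key = np.random.randn(4)           # would come from the controller RNN

# Content-based addressing: cosine similarity between the key and every
# slot, turned into soft (differentiable) attention weights.
scores = memory @ key / (np.linalg.norm(memory, axis=1) * np.linalg.norm(key) + 1e-8)
w = softmax(scores)

read_vector = w @ memory           # a blend of all slots, weighted by attention

# A soft write: every slot is nudged toward the written value in
# proportion to its weight, keeping the whole operation differentiable.
erase, add = np.full(4, 0.1), np.random.randn(4)
memory = memory * (1 - np.outer(w, erase)) + np.outer(w, add)
```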
-------------------------------------------------------------------------------- /_sass/tachyons/scss/_word-break.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | WORD BREAK 11 | 12 | Base: 13 | word = word-break 14 | 15 | Media Query Extensions: 16 | -ns = not-small 17 | -m = medium 18 | -l = large 19 | 20 | */ 21 | 22 | .word-normal { word-break: normal; } 23 | .word-wrap { word-break: break-all; } 24 | .word-nowrap { word-break: keep-all; } 25 | 26 | @media #{$breakpoint-not-small} { 27 | .word-normal-ns { word-break: normal; } 28 | .word-wrap-ns { word-break: break-all; } 29 | .word-nowrap-ns { word-break: keep-all; } 30 | } 31 | 32 | @media #{$breakpoint-medium} { 33 | .word-normal-m { word-break: normal; } 34 | .word-wrap-m { word-break: break-all; } 35 | .word-nowrap-m { word-break: keep-all; } 36 | } 37 | 38 | @media #{$breakpoint-large} { 39 | .word-normal-l { word-break: normal; } 40 | .word-wrap-l { word-break: break-all; } 41 | .word-nowrap-l { word-break: keep-all; } 42 | } 43 | 44 | -------------------------------------------------------------------------------- /terms/vector-quantized-variational-autoencoder-vqvae.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: "Vector-Quantized Variational Autoencoders (VQ-VAE)" 3 | references: 4 | - link_title: "Understanding VQ-VAE (DALL-E Explained Pt. 1) - Machine Learning @ Berkeley" 5 | link_url: https://ml.berkeley.edu/blog/posts/vq-vae/ 6 | - link_title: "Neural Discrete Representation Learning" 7 | link_url: https://arxiv.org/abs/1711.00937 8 | related_terms: 9 | - codebook 10 | - variational-autoencoder-vae 11 | - generative-adversarial-network-gan 12 | --- 13 | 14 | The **Vector-Quantized Variational Autoencoder (VQ-VAE)** is a type of [variational autoencoder][1] where the autoencoder's encoder neural network emits discrete--not continuous--values by mapping the encoder's embedding values to a fixed number of [codebook][2] values. 15 | 16 | The VQ-VAE was originally introduced in the [Neural Discrete Representation Learning][3] paper from Google.
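A minimal NumPy sketch of the quantization step (the shapes are illustrative; in the real model the codebook is learned and gradients flow through the discrete step via a straight-through estimator):

```python
import numpy as np

codebook = np.random.randn(512, 64)   # 512 discrete codes, 64-dim each
z_e = np.random.randn(10, 64)         # continuous encoder outputs

# Map each encoder vector to its nearest codebook entry (L2 distance).
d = ((z_e[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)  # (10, 512)
indices = d.argmin(axis=1)            # the discrete codes
z_q = codebook[indices]               # quantized vectors fed to the decoder
```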
17 | 18 | [1]: /terms/variational-autoencoder-vae 19 | [2]: /terms/codebook 20 | [3]: https://arxiv.org/abs/1711.00937 21 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_text-align.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | TEXT ALIGN 11 | Docs: http://tachyons.io/docs/typography/text-align/ 12 | 13 | Base 14 | t = text-align 15 | 16 | Modifiers 17 | l = left 18 | r = right 19 | c = center 20 | 21 | Media Query Extensions: 22 | -ns = not-small 23 | -m = medium 24 | -l = large 25 | 26 | */ 27 | 28 | .tl { text-align: left; } 29 | .tr { text-align: right; } 30 | .tc { text-align: center; } 31 | 32 | @media #{$breakpoint-not-small} { 33 | .tl-ns { text-align: left; } 34 | .tr-ns { text-align: right; } 35 | .tc-ns { text-align: center; } 36 | } 37 | 38 | @media #{$breakpoint-medium} { 39 | .tl-m { text-align: left; } 40 | .tr-m { text-align: right; } 41 | .tc-m { text-align: center; } 42 | } 43 | 44 | @media #{$breakpoint-large} { 45 | .tl-l { text-align: left; } 46 | .tr-l { text-align: right; } 47 | .tc-l { text-align: center; } 48 | } 49 | 50 | -------------------------------------------------------------------------------- /terms/mean-reciprocal-rank-mrr.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Chapter 14: Question Answering and Information Retrieval - Speech and Language Processing' 4 | link_url: https://web.stanford.edu/~jurafsky/slp3/14.pdf 5 | - link_title: Mean reciprocal rank - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Mean_reciprocal_rank 7 | title: Mean Reciprocal Rank (MRR) 8 | --- 9 | $\newcommand{\Correctrank}{\mathrm{rank}}$ 10 | 11 | Mean Reciprocal Rank is a measure to evaluate systems that return 12 | a ranked list of answers to queries. 13 | 14 | For a single query, the *reciprocal rank* is 15 | $\frac 1 \Correctrank$ where $\Correctrank$ is the position of the 16 | highest-ranked correct answer ($1, 2, 3, \ldots, N$ for $N$ answers returned 17 | in a query). If no correct answer was returned in the query, then the reciprocal 18 | rank is 0. 19 | 20 | For $Q$ queries, the Mean Reciprocal Rank is the mean 21 | of the $Q$ reciprocal ranks. 22 | 23 | $$\mathrm{MRR} = \frac 1 Q \sum_{i=1}^{Q} \frac 1 {\Correctrank_i}$$ 24 | -------------------------------------------------------------------------------- /terms/vggnet.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Very Deep Convolutional Networks for Large-Scale Visual Recognition 4 | - Department of Engineering Science, University of Oxford 5 | link_url: http://www.robots.ox.ac.uk/~vgg/research/very_deep/ 6 | - link_title: Case Studies - Stanford CS231n Convolutional Neural Networks 7 | link_url: http://cs231n.github.io/convolutional-networks/#case 8 | - link_title: Very Deep Convolutional Networks for Large-Scale Image Recognition 9 | link_url: https://arxiv.org/abs/1409.1556 10 | related_terms: 11 | - convolutional-neural-network-cnn 12 | title: VGGNet 13 | --- 14 | VGGNet is a deep convolutional neural network 15 | for image recognition, trained by 16 | the Visual Geometry Group (VGG) at the University of Oxford.
17 | 18 | VGGNet helped the VGG team secure the [first place 19 | in Localization and second place in Classification][1] 20 | in the 2014 ImageNet Large Scale Visual Recognition Competition. 21 | 22 | [1]: http://www.image-net.org/challenges/LSVRC/2014/results#clsloc -------------------------------------------------------------------------------- /terms/catastrophic-forgetting.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Overcoming catastrophic forgetting in neural networks 4 | link_url: https://arxiv.org/abs/1612.00796 5 | - link_title: Catastrophic interference - Wikipedia 6 | link_url: https://en.wikipedia.org/wiki/Catastrophic_interference 7 | - link_title: Catastrophic forgetting - Standout Publishing 8 | link_url: http://standoutpublishing.com/g/catastrophic-forgetting.html 9 | - link_title: 'Catastrophic Interference in Connectionist Networks: The Sequential 10 | Learning Problem' 11 | link_url: http://www.sciencedirect.com/science/article/pii/S0079742108605368 12 | title: Catastrophic forgetting 13 | --- 14 | Catastrophic forgetting (or catastrophic interference) is a problem 15 | in machine learning where a model forgets an existing learned pattern 16 | when learning a new one. 17 | 18 | The model uses the same parameters to recognize both patterns, 19 | and learning the second pattern overwrites the parameters' 20 | configuration from having learned the first pattern. -------------------------------------------------------------------------------- /_sass/tachyons/scss/_background-size.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | BACKGROUND SIZE 11 | Docs: http://tachyons.io/docs/themes/background-size/ 12 | 13 | Media Query Extensions: 14 | -ns = not-small 15 | -m = medium 16 | -l = large 17 | 18 | */ 19 | 20 | /* 21 | Often used in combination with background image set as an inline style 22 | on an html element. 23 | */ 24 | 25 | .cover { background-size: cover!important; } 26 | .contain { background-size: contain!important; } 27 | 28 | @media #{$breakpoint-not-small} { 29 | .cover-ns { background-size: cover!important; } 30 | .contain-ns { background-size: contain!important; } 31 | } 32 | 33 | @media #{$breakpoint-medium} { 34 | .cover-m { background-size: cover!important; } 35 | .contain-m { background-size: contain!important; } 36 | } 37 | 38 | @media #{$breakpoint-large} { 39 | .cover-l { background-size: cover!important; } 40 | .contain-l { background-size: contain!important; } 41 | } 42 | -------------------------------------------------------------------------------- /terms/connectionist-temporal-classification-ctc.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Connectionist Temporal Classification: Labelling Unsegmented Sequence 4 | Data with Recurrent Neural Networks' 5 | link_url: http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf 6 | related_terms: 7 | - recurrent-neural-network 8 | - temporal-classification 9 | title: Connectionist Temporal Classification (CTC) 10 | --- 11 | *Connectionist Temporal Classification* is a term coined in a paper by 12 | Graves et al. titled [Connectionist Temporal Classification: Labelling Unsegmented Sequence Data with Recurrent Neural Networks][1]. 
13 | 14 | It refers to the use of [recurrent neural networks][2] 15 | (a form of [connectionism][3]) for the purpose of 16 | labeling unsegmented data sequences (AKA [temporal classification][4]). 17 | 18 | [1]: http://www.machinelearning.org/proceedings/icml2006/047_Connectionist_Tempor.pdf 19 | [2]: /terms/recurrent-neural-network/ 20 | [3]: /terms/connectionism/ 21 | [4]: /terms/temporal-classification/ -------------------------------------------------------------------------------- /terms/language-segmentation.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Language segmentation 3 | related_terms: 4 | - natural-language-processing 5 | --- 6 | This phrase is most concisely described in [this work by David Alfter][1]: 7 | 8 | > Language segmentation consists in finding the boundaries where one 9 | > language ends and another language begins in a text written in more than one language. 10 | > This is important for all natural language processing tasks. 11 | > 12 | > [...] 13 | > 14 | > One important point that has to be borne in mind is the difference between language 15 | > identification and language segmentation. Language identification is concerned with recognizing 16 | > the language at hand. It is possible to use language identification for language segmentation. 17 | > Indeed, by identifying the languages in a text, the segmentation is implicitly obtained. 18 | > Language segmentation on the other hand is only concerned with identifying language 19 | > boundaries. No claims about the languages involved are made. 20 | 21 | [1]: https://arxiv.org/abs/1510.01717 -------------------------------------------------------------------------------- /_sass/tachyons/scss/_clears.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | CLEARFIX 11 | http://tachyons.io/docs/layout/clearfix/ 12 | 13 | */ 14 | 15 | /* Nicolas Gallaghers Clearfix solution 16 | Ref: http://nicolasgallagher.com/micro-clearfix-hack/ */ 17 | 18 | .cf:before, 19 | .cf:after { content: " "; display: table; } 20 | .cf:after { clear: both; } 21 | .cf { *zoom: 1; } 22 | 23 | .cl { clear: left; } 24 | .cr { clear: right; } 25 | .cb { clear: both; } 26 | .cn { clear: none; } 27 | 28 | @media #{$breakpoint-not-small} { 29 | .cl-ns { clear: left; } 30 | .cr-ns { clear: right; } 31 | .cb-ns { clear: both; } 32 | .cn-ns { clear: none; } 33 | } 34 | 35 | @media #{$breakpoint-medium} { 36 | .cl-m { clear: left; } 37 | .cr-m { clear: right; } 38 | .cb-m { clear: both; } 39 | .cn-m { clear: none; } 40 | } 41 | 42 | @media #{$breakpoint-large} { 43 | .cl-l { clear: left; } 44 | .cr-l { clear: right; } 45 | .cb-l { clear: both; } 46 | .cn-l { clear: none; } 47 | } 48 | -------------------------------------------------------------------------------- /terms/padding-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Padding (convolution) 3 | related_terms: 4 | - zero-padding 5 | - convolution 6 | - convolutional-neural-network-cnn 7 | --- 8 | Padding is a preprocessing step before a convolution operation. 9 | 10 | When we [convolve][1] an $n \times n$ image with an $f \times f$ filter 11 | and a stride length of $1$, 12 | the output is a matrix of dimension $(n - f + 1) \times (n - f + 1)$.
13 | 14 | For deep convolutional neural networks that may do many convolutions, 15 | this would cause the input matrix to dramatically shrink and become 16 | too small. 17 | 18 | Additionally, values in the middle of the input matrix have a greater 19 | influence on the output than values on the edges. 20 | 21 | There are several different methods for choosing what values to pad an input 22 | matrix with: 23 | 24 | - [Zero-padding][2] -- padding with zeroes 25 | - Repeating the nearest border values as values for padding 26 | - Using values from the opposite side of the matrix as padding values 27 | 28 | [1]: /terms/convolution/ 29 | [2]: /terms/zero-pad/ 30 | -------------------------------------------------------------------------------- /terms/batch-normalization.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: 'Batch Normalization: Accelerating Deep Network Training by Reducing 4 | Internal Covariate Shift' 5 | link_url: https://arxiv.org/abs/1502.03167 6 | - link_title: "Batch Normalization\u200A\u2014\u200AWhat the hey?" 7 | link_url: https://gab41.lab41.org/batch-normalization-what-the-hey-d480039a9e3b 8 | - link_title: Why does batch normalization help? - Quora 9 | link_url: https://www.quora.com/Why-does-batch-normalization-help 10 | - link_title: Understanding the backward pass through Batch Normalization Layer 11 | link_url: http://kratzert.github.io/2016/02/12/understanding-the-gradient-flow-through-the-batch-normalization-layer.html 12 | title: Batch normalization 13 | --- 14 | Batch normalization is a technique used to improve the stability and performance of deep neural networks. It works by normalizing each layer's inputs over the current mini-batch (to zero mean and unit variance) and then applying a learned scale and shift, which allows the network to learn more effectively. Batch normalization has been shown to improve training times, accuracy, and robustness of deep neural networks. 15 | -------------------------------------------------------------------------------- /terms/distance-metric.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Distance metric 3 | references: 4 | - link_title: Metric (mathematics) 5 | link_url: https://en.wikipedia.org/wiki/Metric_(mathematics) 6 | --- 7 | As per Wikipedia, a distance metric, metric, or distance 8 | function, "is a function that defines a distance between each pair of elements of a set." 9 | 10 | A distance metric $d(\cdot)$ requires the following four axioms to be true 11 | for all elements $x$, $y$, and $z$ in a given set. 12 | 13 | - **Non-negativity:** $d(x, y) \geq 0$ -- The distance must always be 14 | greater than or equal to zero. 15 | - **Identity of indiscernibles:** $d(x, y) = 0 \Leftrightarrow x = y$ -- The distance must be zero for two elements that are the same (i.e. indiscernible from each other). 16 | - **Symmetry:** $d(x,y) = d(y,x)$ -- The distances must be the same, no matter which order the parameters are given. 17 | - **Triangle inequality:** $d(x,z) \leq d(x,y) + d(y,z)$ -- For three elements in the set, the sum of the distances for any two pairs must be at least as large as the distance for the remaining pair.
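A quick numerical spot-check of the four axioms for the Euclidean distance (a spot-check on random points, not a proof):

```python
import numpy as np

def d(x, y):
    return np.linalg.norm(x - y)

rng = np.random.default_rng(0)
x, y, z = rng.normal(size=(3, 4))

assert d(x, y) >= 0                            # non-negativity
assert np.isclose(d(x, x), 0)                  # identity of indiscernibles
assert np.isclose(d(x, y), d(y, x))            # symmetry
assert d(x, z) <= d(x, y) + d(y, z) + 1e-12    # triangle inequality
```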
18 | 19 | -------------------------------------------------------------------------------- /terms/yolo-object-detection.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: YOLO (object detection algorithm) 3 | related_terms: 4 | - computer-vision 5 | - convolutional-neural-network-cnn 6 | - object-detection 7 | - object-localization 8 | references: 9 | - link_title: "You Only Look Once: Realtime Object detection" 10 | link_url: https://pjreddie.com/media/files/papers/yolo.pdf 11 | - link_title: YOLO project homepage 12 | link_url: https://pjreddie.com/darknet/yolo/ 13 | --- 14 | 15 | *YOLO* (an acronym standing for the phrase "You Only Look Once") 16 | refers to a fast object detection algorithm. Previous attempts 17 | at building object detection algorithms involved running 18 | [object detectors][1] or [object localizers][2] multiple times over 19 | a single image. 20 | 21 | Instead of needing multiple executions over a single image, YOLO 22 | detects objects with a single forward 23 | pass through a [convolutional neural network][3]. 24 | 25 | [1]: /terms/object-detection/ 26 | [2]: /terms/object-localization/ 27 | [3]: /terms/convolutional-neural-network-cnn/ 28 | -------------------------------------------------------------------------------- /terms/bag-of-n-grams.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Bag-of-n-grams 3 | related_terms: 4 | - bag-of-words 5 | --- 6 | A bag-of-$n$-grams model is a way to represent a document, 7 | similar to a [bag-of-words](/terms/bag-of-words/) model. 8 | 9 | A bag-of-$n$-grams model represents a text document as 10 | an unordered collection of its $n$-grams. 11 | 12 | For example, let's use the following phrase and divide 13 | it into bi-grams ($n = 2$). 14 | 15 | > James is the best person ever. 16 | 17 | becomes 18 | 19 | - `James is` 20 | - `is the` 21 | - `the best` 22 | - `best person` 23 | - `person ever.` 24 | 25 | In a typical bag-of-$n$-grams model, these 5 bigrams would be 26 | a sample from a large number of bigrams observed in a corpus. 27 | And then *James is the best person ever.* would be encoded 28 | in a representation showing which of the corpus's bigrams 29 | were observed in the sentence. 30 | 31 | A bag-of-$n$-grams model has the simplicity of the bag-of-words 32 | model, but allows the preservation of more word locality 33 | information. 34 | -------------------------------------------------------------------------------- /terms/facet-plotting.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Facet (plotting) 4 | link_url: http://www.cookbook-r.com/Graphs/Facets_(ggplot2) 5 | - link_title: Plotting multiple groups with facets in ggplot2 6 | link_url: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html 7 | title: Facet 8 | related_terms: 9 | - facets-tool 10 | --- 11 | 12 | In statistical plotting, a facet is a type of plot. Data 13 | is split into subsets and the subsets are plotted 14 | in a row or grid of subplots. 15 | 16 | The term is common among users of [ggplot2](http://ggplot2.org/), 17 | a plotting package for the 18 | [R statistical computing language](https://www.r-project.org/about.html). 19 | 20 | [Facet](/terms/facets-tool) is also the name of a plotting and 21 | visualization tool created by the People + AI Research (PAIR) 22 | initiative at Google.
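A rough Python analogue of ggplot2's faceting uses seaborn's `FacetGrid` (the `tips` example dataset ships with seaborn; the column choices are arbitrary):

```python
import matplotlib.pyplot as plt
import seaborn as sns

tips = sns.load_dataset("tips")

# One subplot ("facet") per value of the `time` column.
g = sns.FacetGrid(tips, col="time")
g.map(plt.hist, "total_bill")
plt.show()
```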
23 | 24 | ![This is a facet wrap as generated by the R package `ggplot2`. This image comes from [Plotting multiple groups with facets in ggplot2][1].](/images/faceting.png) 25 | 26 | [1]: https://www3.nd.edu/~steve/computing_with_data/13_Facets/facets.html -------------------------------------------------------------------------------- /terms/weak-supervision.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Weak supervision 3 | references: 4 | - link_title: "Weak Supervision: A New Programming Paradigm for Machine Learning - The Stanford AI Lab Blog" 5 | link_url: https://ai.stanford.edu/blog/weak-supervision/ 6 | - link_title: "Snorkel and The Dawn of Weakly Supervised Machine Learning - Stanford DAWN" 7 | link_url: https://dawn.cs.stanford.edu/2017/05/08/snorkel/ 8 | - link_title: "Snuba: Automating Weak Supervision to Label Training Data - Stanford University" 9 | link_url: http://www.vldb.org/pvldb/vol12/p223-varma.pdf 10 | - link_title: "Weak supervision - Wikipedia" 11 | link_url: https://en.wikipedia.org/wiki/Weak_supervision 12 | related_terms: 13 | - active-learning 14 | - semi-supervised-learning 15 | - transfer-learning 16 | --- 17 | **Weak supervision** describes the use of noisy or error-prone data labels for training supervised learning models. 18 | 19 | It can be expensive or impractical to create or obtain highly-accurate labels for a large dataset. Weak supervision offers the choice of using a larger number of somewhat-less-accurate data labels. 20 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_position.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | POSITIONING 11 | Docs: http://tachyons.io/docs/layout/position/ 12 | 13 | Media Query Extensions: 14 | -ns = not-small 15 | -m = medium 16 | -l = large 17 | 18 | */ 19 | 20 | .static { position: static; } 21 | .relative { position: relative; } 22 | .absolute { position: absolute; } 23 | .fixed { position: fixed; } 24 | 25 | @media #{$breakpoint-not-small} { 26 | .static-ns { position: static; } 27 | .relative-ns { position: relative; } 28 | .absolute-ns { position: absolute; } 29 | .fixed-ns { position: fixed; } 30 | } 31 | 32 | @media #{$breakpoint-medium} { 33 | .static-m { position: static; } 34 | .relative-m { position: relative; } 35 | .absolute-m { position: absolute; } 36 | .fixed-m { position: fixed; } 37 | } 38 | 39 | @media #{$breakpoint-large} { 40 | .static-l { position: static; } 41 | .relative-l { position: relative; } 42 | .absolute-l { position: absolute; } 43 | .fixed-l { position: fixed; } 44 | } 45 | -------------------------------------------------------------------------------- /terms/neural-checklist-model.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Globally Coherent Text Generation with Neural Checklist Models 4 | link_url: https://homes.cs.washington.edu/~yejin/Papers/emnlp16_neuralchecklist.pdf 5 | related_terms: 6 | - recurrent-neural-network 7 | - attention-neural-networks 8 | title: Neural checklist model 9 | --- 10 | Neural checklist models were introduced in the paper [Globally Coherent Text Generation with Neural Checklist Models](https://homes.cs.washington.edu/~yejin/Papers/emnlp16_neuralchecklist.pdf) by Kiddon et al. 
11 | 12 | A neural checklist model is a recurrent neural network that tracks an agenda of text strings that should be mentioned in the output. 13 | 14 | This technique allows the neural checklist model to generate *globally coherent* text, as opposed to text from traditional RNNs that is only locally coherent. 15 | 16 | The original paper describes applying the neural checklist model 17 | to recipes and dialogue responses for information systems, 18 | where there is a pre-existing notion of all 19 | the topics that should be present in a natural language response. -------------------------------------------------------------------------------- /_sass/tachyons/license.txt: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2015 @mrmrs (mrmrs.io) 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions: 6 | 7 | The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software. 8 | 9 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
10 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_text-decoration.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | TEXT DECORATION 11 | Docs: http://tachyons.io/docs/typography/text-decoration/ 12 | 13 | 14 | Media Query Extensions: 15 | -ns = not-small 16 | -m = medium 17 | -l = large 18 | 19 | */ 20 | 21 | .strike { text-decoration: line-through; } 22 | .underline { text-decoration: underline; } 23 | .no-underline { text-decoration: none; } 24 | 25 | 26 | @media #{$breakpoint-not-small} { 27 | .strike-ns { text-decoration: line-through; } 28 | .underline-ns { text-decoration: underline; } 29 | .no-underline-ns { text-decoration: none; } 30 | } 31 | 32 | @media #{$breakpoint-medium} { 33 | .strike-m { text-decoration: line-through; } 34 | .underline-m { text-decoration: underline; } 35 | .no-underline-m { text-decoration: none; } 36 | } 37 | 38 | @media #{$breakpoint-large} { 39 | .strike-l { text-decoration: line-through; } 40 | .underline-l { text-decoration: underline; } 41 | .no-underline-l { text-decoration: none; } 42 | } 43 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_line-height.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | LINE HEIGHT / LEADING 11 | Docs: http://tachyons.io/docs/typography/line-height 12 | 13 | Media Query Extensions: 14 | -ns = not-small 15 | -m = medium 16 | -l = large 17 | 18 | */ 19 | 20 | .lh-solid { line-height: $line-height-solid; } 21 | .lh-title { line-height: $line-height-title; } 22 | .lh-copy { line-height: $line-height-copy; } 23 | 24 | @media #{$breakpoint-not-small} { 25 | .lh-solid-ns { line-height: $line-height-solid; } 26 | .lh-title-ns { line-height: $line-height-title; } 27 | .lh-copy-ns { line-height: $line-height-copy; } 28 | } 29 | 30 | @media #{$breakpoint-medium} { 31 | .lh-solid-m { line-height: $line-height-solid; } 32 | .lh-title-m { line-height: $line-height-title; } 33 | .lh-copy-m { line-height: $line-height-copy; } 34 | } 35 | 36 | @media #{$breakpoint-large} { 37 | .lh-solid-l { line-height: $line-height-solid; } 38 | .lh-title-l { line-height: $line-height-title; } 39 | .lh-copy-l { line-height: $line-height-copy; } 40 | } 41 | 42 | -------------------------------------------------------------------------------- /terms/distributed-representation.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Distributed Representation - Directory of Cognitive Science 4 | link_url: http://www.bcp.psych.ualberta.ca/~mike/Pearl_Street/Dictionary/contents/D/distributed.html 5 | - link_title: Local and distributed representations - Programming Methods for Cognitive 6 | Science 7 | link_url: http://www.indiana.edu/~gasser/Q530/Notes/representation.html 8 | related_terms: 9 | - word-embedding 10 | title: Distributed representation 11 | --- 12 | In machine learning, data with a *local representation* typically has 1 unit per element. 13 | A 5-word vocabulary might be defined by a 5-dimensional vector, with 14 | $[1, 0, 0, 0, 0]^T$ denoting the first word, $[0, 1, 0, 0, 0]^T$ denoting the second word, 15 | and so forth. 
16 | 17 | Distributed representations are the opposite: instead of concentrating the meaning 18 | of a data point into one component or one "element", the meaning of the 19 | data is distributed across the whole vector. 20 | 21 | The word that is $[1, 0, 0, 0, 0]^T$ in a local representation might look like 22 | $[-0.150, -0.024, -0.233, -0.253, -0.183]^T$ in a distributed representation. -------------------------------------------------------------------------------- /terms/word2vec.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Efficient Estimation of Word Representations in Vector Space 4 | link_url: https://arxiv.org/abs/1301.3781 5 | - link_title: Vector Representations of Words - Tutorials - TensorFlow Documentation 6 | link_url: https://www.tensorflow.org/tutorials/word2vec 7 | related_terms: 8 | - word-embedding 9 | - skip-gram 10 | - continuous-bag-of-words-cbow 11 | - doc2vec 12 | title: word2vec 13 | --- 14 | `word2vec` refers to a pair of models, open-source software, and pre-trained word embeddings 15 | from Google. 16 | 17 | The models are: 18 | 19 | - [skip-gram](/terms/skip-gram/), using a word to predict the surrounding $n$ words 20 | - [continuous-bag-of-words (CBOW)](/terms/continuous-bag-of-words-cbow), using the context of the surrounding 21 | $n$ words to predict the center word. 22 | 23 | The original paper is titled [Efficient Estimation of Word Representations in 24 | Vector Space](https://arxiv.org/abs/1301.3781) by Mikolov et al. 25 | 26 | The source code was originally hosted on 27 | [Google Code](https://code.google.com/p/word2vec) but is now 28 | located [on Github](https://github.com/tmikolov/word2vec). -------------------------------------------------------------------------------- /_sass/tachyons/scss/_utilities.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | UTILITIES 11 | 12 | Media Query Extensions: 13 | -ns = not-small 14 | -m = medium 15 | -l = large 16 | 17 | */ 18 | 19 | /* Equivalent to .overflow-y-scroll */ 20 | .overflow-container { 21 | overflow-y: scroll; 22 | } 23 | 24 | .center { 25 | margin-right: auto; 26 | margin-left: auto; 27 | } 28 | 29 | .mr-auto { margin-right: auto; } 30 | .ml-auto { margin-left: auto; } 31 | 32 | @media #{$breakpoint-not-small}{ 33 | .center-ns { 34 | margin-right: auto; 35 | margin-left: auto; 36 | } 37 | .mr-auto-ns { margin-right: auto; } 38 | .ml-auto-ns { margin-left: auto; } 39 | } 40 | 41 | @media #{$breakpoint-medium}{ 42 | .center-m { 43 | margin-right: auto; 44 | margin-left: auto; 45 | } 46 | .mr-auto-m { margin-right: auto; } 47 | .ml-auto-m { margin-left: auto; } 48 | } 49 | 50 | @media #{$breakpoint-large}{ 51 | .center-l { 52 | margin-right: auto; 53 | margin-left: auto; 54 | } 55 | .mr-auto-l { margin-right: auto; } 56 | .ml-auto-l { margin-left: auto; } 57 | } 58 | -------------------------------------------------------------------------------- /_sass/tachyons/scss/_vertical-align.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | VERTICAL ALIGN 11 | 12 | Media Query Extensions: 13 | -ns = not-small 14 | -m = medium 15 | -l = large 16 | 17 | */ 18 | 19 | .v-base { vertical-align: baseline; } 20 | .v-mid { vertical-align: middle; } 21 | .v-top {
vertical-align: top; } 22 | .v-btm { vertical-align: bottom; } 23 | 24 | @media #{$breakpoint-not-small} { 25 | .v-base-ns { vertical-align: baseline; } 26 | .v-mid-ns { vertical-align: middle; } 27 | .v-top-ns { vertical-align: top; } 28 | .v-btm-ns { vertical-align: bottom; } 29 | } 30 | 31 | @media #{$breakpoint-medium} { 32 | .v-base-m { vertical-align: baseline; } 33 | .v-mid-m { vertical-align: middle; } 34 | .v-top-m { vertical-align: top; } 35 | .v-btm-m { vertical-align: bottom; } 36 | } 37 | 38 | @media #{$breakpoint-large} { 39 | .v-base-l { vertical-align: baseline; } 40 | .v-mid-l { vertical-align: middle; } 41 | .v-top-l { vertical-align: top; } 42 | .v-btm-l { vertical-align: bottom; } 43 | } 44 | -------------------------------------------------------------------------------- /terms/sequence-to-sequence-learning-seq2seq.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Sequence-to-Sequence Models - TensorFlow Tutorials 4 | link_url: https://www.tensorflow.org/tutorials/seq2seq 5 | - link_title: Sequence to Sequence Learning with Neural Networks 6 | link_url: https://arxiv.org/abs/1409.3215 7 | related_terms: 8 | - long-short-term-memory-lstm 9 | - recurrent-neural-network 10 | title: Sequence to Sequence Learning (seq2seq) 11 | --- 12 | This typically refers to the method originally described by Sutskever et al. in the paper 13 | [Sequence to Sequence Learning with Neural Networks][1]. 14 | 15 | Feedforward neural networks and many other models can learn complex patterns, but they require fixed-length 16 | input. This makes it difficult for these models to learn from variable-length sequences. To solve this, 17 | the authors applied one [LSTM](/terms/long-short-term-memory-lstm/) to read the input sequence 18 | and a second LSTM to generate the output sequence. 19 | 20 | A few potential applications of sequence to sequence learning include: 21 | 22 | - Machine translation 23 | - Text summarization 24 | - Speech-to-text conversion 25 | 26 | [1]: https://arxiv.org/abs/1409.3215 -------------------------------------------------------------------------------- /terms/same-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool 4 | of tensorflow? 5 | link_url: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t 6 | related_terms: 7 | - convolution 8 | - convolutional-neural-network-cnn 9 | - padding-convolution 10 | title: Same convolution 11 | --- 12 | A *same convolution* is a type of convolution where the output 13 | matrix is of the same dimension as the input matrix. 14 | 15 | For an $n \times n$ input matrix $A$ and an $f \times f$ filter matrix $F$, 16 | the output of the convolution $A * F$ is of dimension 17 | $\left( \left \lfloor \frac{n + 2p - f}{s} \right \rfloor + 1 \right) \times \left( \left \lfloor \frac{n + 2p - f}{s} \right \rfloor + 1 \right)$, 18 | where $s$ represents the stride length and 19 | $p$ represents the padding. 20 | 21 | In a same convolution: 22 | 23 | - $s$ is typically set to $1$ 24 | - $p$ is set to $\frac{f - 1}{2}$ 25 | - $f$ is an odd number 26 | 27 | The result is that $A$ is padded to be $(n + 2p) \times (n + 2p)$ 28 | and $A * F$ becomes $n \times n$ -- the same as the original 29 | dimensions of $A$.
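As a quick sanity check of the formulas above, here is a minimal sketch using SciPy (an assumed dependency; the entry does not prescribe a library), whose `mode="same"` option applies exactly this style of padding at stride $s = 1$:

```python
import numpy as np
from scipy.signal import convolve2d

n, f, s = 6, 3, 1                     # input size, (odd) filter size, stride
A = np.random.rand(n, n)
F = np.random.rand(f, f)

p = (f - 1) // 2                      # padding that preserves the input size
out_dim = (n + 2 * p - f) // s + 1    # the general output-dimension formula
out = convolve2d(A, F, mode="same")   # SciPy pads implicitly for "same"

assert out_dim == n
assert out.shape == (n, n)            # output matches the input dimensions
```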
-------------------------------------------------------------------------------- /terms/valid-convolution.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: What is the difference between 'SAME' and 'VALID' padding in tf.nn.max_pool 4 | of tensorflow? 5 | link_url: https://stackoverflow.com/questions/37674306/what-is-the-difference-between-same-and-valid-padding-in-tf-nn-max-pool-of-t 6 | related_terms: 7 | - convolution 8 | - same-convolution 9 | - convolutional-neural-network-cnn 10 | - padding-convolution 11 | title: Valid convolution 12 | --- 13 | A *valid convolution* is a type of [convolution][1] operation that does not use any [padding][2] on the input. 14 | 15 | For an $n \times n$ input matrix and an $f \times f$ filter, a valid convolution 16 | will return an output matrix of dimensions 17 | 18 | $$ 19 | \left \lfloor \frac{n - f}{s} + 1 \right \rfloor \times 20 | \left \lfloor \frac{n - f}{s} + 1 \right \rfloor 21 | $$ 22 | 23 | where $s$ is the [stride][3] length of the convolution. 24 | 25 | This is in contrast to a [same convolution][4], which pads the 26 | $n \times n$ input matrix such that the output matrix is also $n 27 | \times n$. 28 | 29 | [1]: /terms/convolution/ 30 | [2]: /terms/padding-convolution/ 31 | [3]: /terms/stride-convolution/ 32 | [4]: /terms/same-convolution/ 33 | -------------------------------------------------------------------------------- /terms/anchor-box.md: -------------------------------------------------------------------------------- 1 | --- 2 | title: Anchor box 3 | related_terms: 4 | - yolo-object-detection 5 | - computer-vision 6 | - convolutional-neural-network-cnn 7 | - bounding-box 8 | references: 9 | - link_title: Anchor Boxes - Convolutional Neural Networks - deeplearning.ai 10 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/yNwO0/anchor-boxes 11 | --- 12 | *Anchor boxes* are a technique used in some [computer vision][4] 13 | [object detection][3] algorithms to help identify objects of different shapes. 14 | 15 | Anchor boxes are hand-picked boxes of different height/width ratios 16 | (for 2-dimensional boxes) designed to match the relative ratios of 17 | the object classes being detected. For example, an object detector 18 | that detects cars and people may have a wide anchor box to detect 19 | cars and a tall, narrow box to detect people. 20 | 21 | The [YOLO9000][1] paper introduced the idea of using 22 | [$k$-means clustering][2] to automatically determine appropriate 23 | anchor box dimensions for a given number $k$ of anchor boxes. 24 | 25 | [1]: /terms/yolo9000-object-detection/ 26 | [2]: /terms/k-means-clustering/ 27 | [3]: /terms/object-detection/ 28 | [4]: /terms/computer-vision/ 29 | -------------------------------------------------------------------------------- /terms/out-of-core.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Out-of-core algorithm 4 | link_url: https://en.wikipedia.org/wiki/Out-of-core_algorithm 5 | - link_title: Scaling with instances using out-of-core learning - scikit-learn documentation 6 | link_url: http://scikit-learn.org/stable/modules/scaling_strategies.html#scaling-with-instances-using-out-of-core-learning 7 | title: Out-of-core 8 | --- 9 | The term *out-of-core* typically refers to processing data that is too large 10 | to fit into a computer's main memory.
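For instance, here is a minimal sketch of out-of-core processing with pandas (the library choice, file name, and column name are all hypothetical), which streams a file sequentially in chunks rather than loading it whole:

```python
import pandas as pd

total, count = 0.0, 0
# Read the file in fixed-size chunks instead of loading it all at once.
for chunk in pd.read_csv("huge_dataset.csv", chunksize=100_000):
    total += chunk["value"].sum()
    count += len(chunk)

print("mean of the 'value' column:", total / count)
```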
11 | 12 | Typically, when a dataset fits neatly into a computer's main memory, 13 | randomly accessing sections of data has a (relatively) small performance 14 | penalty. 15 | 16 | When data must be stored on a medium like a large spinning hard drive 17 | or an external computer network, it becomes very expensive to randomly 18 | seek to an arbitrary section of data or to process the same data 19 | multiple times. 20 | 21 | In such a case, an out-of-core algorithm tries to access all relevant 22 | data in one sequential pass. 23 | 24 | However, modern computers have a deep memory hierarchy, and replacing 25 | random access with sequential access can increase performance even 26 | on datasets that fit within memory. -------------------------------------------------------------------------------- /terms/rectified-linear-unit-relu.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Why do we use ReLU in neural networks and how do we use it? 4 | link_url: https://stats.stackexchange.com/questions/226923/why-do-we-use-relu-in-neural-networks-and-how-do-we-use-it 5 | - link_title: What are the advantages of ReLU over sigmoid function in deep neural 6 | networks? 7 | link_url: https://stats.stackexchange.com/questions/126238/what-are-the-advantages-of-relu-over-sigmoid-function-in-deep-neural-networks 8 | related_terms: 9 | - activation-function 10 | - neural-network 11 | title: Rectified Linear Unit (ReLU) 12 | --- 13 | A Rectified Linear Unit is a common name for a neuron (the "unit") 14 | with an activation function of $f(x) = \max(0,x)$. 15 | 16 | Neural networks built with ReLUs have the following advantages: 17 | 18 | - [Gradient][1] computation is simpler because the activation 19 | function is cheaper to compute than comparable activation 20 | functions like $\tanh(x)$. 21 | - Neural networks with ReLUs are less susceptible to 22 | the [vanishing gradient problem][2] but may suffer from 23 | the [dying ReLU problem][3].
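A minimal NumPy sketch (NumPy being an assumption of this example, not something the definition requires) of the activation and its almost-everywhere gradient:

```python
import numpy as np

def relu(x):
    # f(x) = max(0, x), applied elementwise.
    return np.maximum(0.0, x)

def relu_grad(x):
    # The gradient is 1 where x > 0 and 0 elsewhere -- much cheaper than,
    # say, tanh'(x) = 1 - tanh(x)**2.
    return (x > 0).astype(float)

x = np.array([-2.0, -0.5, 0.0, 1.5])
print(relu(x))       # [0.  0.  0.  1.5]
print(relu_grad(x))  # [0. 0. 0. 1.]
```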
24 | 25 | [1]: /terms/gradient/ 26 | [2]: /terms/vanishing-gradient-problem/ 27 | [3]: /terms/dying-relu/ -------------------------------------------------------------------------------- /_sass/tachyons/scss/_letter-spacing.scss: -------------------------------------------------------------------------------- 1 | 2 | // Converted Variables 3 | 4 | 5 | // Custom Media Query Variables 6 | 7 | 8 | /* 9 | 10 | LETTER SPACING 11 | Docs: http://tachyons.io/docs/typography/tracking/ 12 | 13 | Media Query Extensions: 14 | -ns = not-small 15 | -m = medium 16 | -l = large 17 | 18 | */ 19 | 20 | .tracked { letter-spacing: $letter-spacing-1; } 21 | .tracked-tight { letter-spacing: $letter-spacing-tight; } 22 | .tracked-mega { letter-spacing: $letter-spacing-2; } 23 | 24 | @media #{$breakpoint-not-small} { 25 | .tracked-ns { letter-spacing: $letter-spacing-1; } 26 | .tracked-tight-ns { letter-spacing: $letter-spacing-tight; } 27 | .tracked-mega-ns { letter-spacing: $letter-spacing-2; } 28 | } 29 | 30 | @media #{$breakpoint-medium} { 31 | .tracked-m { letter-spacing: $letter-spacing-1; } 32 | .tracked-tight-m { letter-spacing: $letter-spacing-tight; } 33 | .tracked-mega-m { letter-spacing: $letter-spacing-2; } 34 | } 35 | 36 | @media #{$breakpoint-large} { 37 | .tracked-l { letter-spacing: $letter-spacing-1; } 38 | .tracked-tight-l { letter-spacing: $letter-spacing-tight; } 39 | .tracked-mega-l { letter-spacing: $letter-spacing-2; } 40 | } 41 | -------------------------------------------------------------------------------- /terms/deep-convolutional-generative-adversarial-network-dcgan.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Unsupervised Representation Learning with Deep Convolutional Generative 4 | Adversarial Networks 5 | link_url: https://arxiv.org/abs/1511.06434 6 | related_terms: 7 | - generative-adversarial-network-gan 8 | - convolutional-neural-network-cnn 9 | title: Deep Convolutional Generative Adversarial Network (DCGAN) 10 | --- 11 | DCGAN refers to a model described by [Radford, Metz, and Chintala][1] 12 | that uses deep convolutional neural networks in a generative adversarial network model. 13 | 14 | Generative adversarial networks (GANs) are structured as a competition between 15 | two models: 16 | 17 | 1. a generative model that tries to create fake examples that are indistinguishable from the real training data 18 | 2. a discriminative model that tries to tell the real examples apart from the fake ones 19 | 20 | DCGAN uses deep convolutional neural networks for both models. Convolutional neural networks (CNNs) 21 | are well-known for their performance on image data. DCGAN uses the strong performance of CNNs 22 | to learn [unsupervised representations][2] of the input data.
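The competition between the two models can be sketched in PyTorch (a toy sketch under assumed sizes, not the authors' code; fully-connected layers stand in for the deep convolutional architectures that DCGAN actually uses, purely to keep the example short):

```python
import torch
import torch.nn as nn

z_dim, x_dim, batch = 16, 28 * 28, 32          # hypothetical toy dimensions

G = nn.Sequential(                             # generator: noise -> fake example
    nn.Linear(z_dim, 64), nn.ReLU(),
    nn.Linear(64, x_dim), nn.Tanh())
D = nn.Sequential(                             # discriminator: example -> P(real)
    nn.Linear(x_dim, 64), nn.LeakyReLU(0.2),
    nn.Linear(64, 1), nn.Sigmoid())

bce = nn.BCELoss()
opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)

real = torch.rand(batch, x_dim)                # stand-in for a real minibatch
ones, zeros = torch.ones(batch, 1), torch.zeros(batch, 1)

# Discriminator step: push D(real) toward 1 and D(fake) toward 0.
fake = G(torch.randn(batch, z_dim)).detach()   # detach: don't update G here
d_loss = bce(D(real), ones) + bce(D(fake), zeros)
opt_d.zero_grad(); d_loss.backward(); opt_d.step()

# Generator step: update G so that D mistakes its fakes for real data.
g_loss = bce(D(G(torch.randn(batch, z_dim))), ones)
opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```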
23 | 24 | [1]: https://arxiv.org/abs/1511.06434 25 | [2]: /terms/unsupervised-learning/ -------------------------------------------------------------------------------- /terms/multiple-crops-at-test-time.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: State of computer vision - Convolutional Neural Networks - deeplearning.ai 4 | link_url: https://www.coursera.org/learn/convolutional-neural-networks/lecture/D9ra2/state-of-computer-vision 5 | related_terms: 6 | - data-augmentation 7 | - alexnet 8 | title: Multiple crops at test time 9 | --- 10 | *Multi-crop at test time* is a form of data augmentation that a model uses 11 | at test time, as opposed to most data augmentation techniques, 12 | which run at training time. 13 | 14 | Broadly, the technique involves: 15 | 16 | - cropping a test image in multiple ways 17 | - using the model to classify each cropped variant of the test image 18 | - averaging the results of the model's many predictions 19 | 20 | Multi-crop at test time is a technique that some machine learning researchers 21 | use to improve accuracy at test time. It 22 | gained popularity among competitors in the 23 | ImageNet Large Scale Visual Recognition Competition 24 | after the famous AlexNet paper, titled 25 | [ImageNet Classification with Deep Convolutional Neural Networks](https://papers.nips.cc/paper/4824-imagenet-classification-with-deep-convolutional-neural-networks.pdf), used 26 | it. -------------------------------------------------------------------------------- /terms/long-short-term-memory-lstm.md: -------------------------------------------------------------------------------- 1 | --- 2 | references: 3 | - link_title: Long Short-Term Memory - Wikipedia 4 | link_url: https://en.wikipedia.org/wiki/Long_short-term_memory 5 | - link_title: LONG SHORT-TERM MEMORY 6 | link_url: http://www.bioinf.jku.at/publications/older/2604.pdf 7 | related_terms: 8 | - recurrent-neural-network 9 | - backpropagation 10 | title: Long Short-Term Memory (LSTM) 11 | --- 12 | Long short-term memory (LSTM) networks try to reduce the vanishing and exploding gradient problems that occur during backpropagation in recurrent neural networks. An LSTM is, in general, an RNN in which each unit has a memory cell and three gates: input, output, and forget. The purpose of the memory cell is to retain information previously seen by the RNN, or to forget it if needed. LSTMs are explicitly designed to avoid the long-term dependency problem in RNNs, and they have been shown to learn complex sequences better than simple RNNs. 13 | 14 | The structure of a memory cell is: an input gate, which determines how much of the information from the previous layer gets stored in the cell; an output gate, which determines how much the next layer gets to know about the state of the current cell; and a forget gate, which determines what to discard from the current state of the memory cell. -------------------------------------------------------------------------------- /meta/unfinished.html: -------------------------------------------------------------------------------- 1 | --- 2 | layout: page 3 | title: "Meta: Unfinished Terms" 4 | --- 5 | <p>

6 | This page contains links to terms that have short amounts of content. They 7 | should be expanded and turned into full-fledged glossary entries. 8 | </p> 9 | 10 | <table> 11 | <thead> 12 | <tr> 13 | <th>Page Title</th> 14 | <th>Character(s)</th> 15 | <th>Reference(s)</th> 16 | </tr> 17 | </thead> 18 | {% assign pages = site.pages | where_exp:"item","item.url contains 'terms/'" | sort: 'title' %} 19 | {% for page in pages %} 20 | {% assign page_size = page.content | size %} 21 | {% if page_size < site.small_page_size and page.layout != redirect %} 22 | <tr> 23 | <td><a href="{{ page.url }}">{{ page.title }}</a></td> 24 | {% assign references = 0 %} 25 | {% if page.references %} 26 | {% assign references = page.references.size %} 27 | {% endif %} 28 | <td>{{ page.content | size }}</td> 29 | <td>{{ references }}</td> 30 | </tr> 31 | {% endif %} 32 | {% endfor %} 33 | </table> 34 | 35 | --------------------------------------------------------------------------------