├── .gitignore ├── LICENSE.md ├── README.md ├── atlas.json ├── images ├── animation.gif ├── animation_matrix.gif ├── digits.png ├── digits_tsne.png ├── distributions.png ├── similarity.png └── spheres.png └── page.js /.gitignore: -------------------------------------------------------------------------------- 1 | .ipynb_checkpoints 2 | *.pyc 3 | .DS_Store 4 | .awspublish-* 5 | compiled -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | Contributor License Agreement 2 | 3 | Revised: August 26, 2013 4 | 5 | 6 | 7 | Contributors to O'Reilly Media, Inc. ("ORM") content projects are required to agree to this non-exclusive Contributor License Agreement ("CLA") in which ORM acknowledges that each contributor ("you") retains ownership of the copyright in each contribution and remains free to use each of the contributions for other purposes and independently of the ORM content project to which it was contributed, and grants a non-exclusive license to the contribution to ORM, and its licensees and assigns (the "ORM Parties"). 8 | 9 | 1\. In consideration of the opportunity provided to you by or on behalf of O'Reilly Media, Inc. to participate in and to make contributions to one or more ORM content projects (collectively, the "ORM Project"), you agree to be legally bound by this CLA. ORM's consent to your participation in and contribution to the ORM Project is conditioned on your agreement to and compliance with this CLA and the [ORM Privacy Policy][1]. Because the ORM Project is in development, access to it is provided to you only for purposes of your participation in its development under this CLA. 
You agree and understand that your participation in the ORM Project is at your own risk, and that it if you need backups of your Contributions, backing them up is your own responsibility and that ORM is not obligated to maintain any back-ups for your benefit. This CLA applies to all of your current and future Contributions to any current or future ORM Project. 10 | 11 | 2\. ORM understands and agrees that you reserve your ownership in the copyright in any content, data, feedback, contributions, errata, and suggestions made by you regarding any ORM Project and/or contributed by you to any ORM Project (collectively, "Contributions"). You grant ORM and its licensees and assigns the irrevocable and non-exclusive right in their sole discretion to use your Contributions as they deem appropriate in connection with the ORM Project, and/or any other work, product or service, and the ORM Parties will not have any obligation or liability to you with respect any use and/or incorporation of any of your Contributions (as submitted by you, or as modified or combined with other content by or on behalf of the ORM Parties) into the ORM Project, or any other work, product or service. You understand and agree that as between you and ORM, ORM is and will be the owner of all rights in and to the ORM Project, and any other work, product or service that may utilize the Contribution(s) under this CLA. You grant the ORM Parties an irrevocable license, but not the obligation, to use, in connection with any use made of your Contribution, the name under which you made the Contribution. 12 | 13 | 3\. 
You warrant and represent that you are legally entitled to enter into this CLA and to grant this license, and (a) if any of your employers has rights to intellectual property created by you which include your Contributions, that you have received permission to make Contributions on behalf of that employer, or that your employer has waived such rights with respect to your Contributions, or that your employer has agreed to its own separate CLA with ORM; (b) if you have employees, you also represent that any employee making Contributions on your behalf is fully authorized to do so, and that you have sufficient rights in the employee's Contributions to grant this license; and (c) the name and other contact information provided by you is accurate. You also warrant and represent that your participation in the ORM Project, and/or your Contributions, will not (d) include unauthorized disclosure(s) of personal information, trade secrets, or confidential information; (e) violate anyone's rights, including without limitation intellectual property rights; (f) contain software viruses or any other elements designed to interrupt, destroy or limit the functionality of any software, systems, or devices; (g) contain or link to commercial solicitations; (h) contain data or technology subject to restriction under laws regulating the export and other dissemination of information or technology; and (i) be inaccurate, defamatory, obscene, harassing, or otherwise objectionable. If you become aware of any facts or circumstances that would make your representations under this CLA inaccurate in any respect, or if your contact information changes, you will notify ORM promptly at atlas@oreilly.com. As between you and ORM, you assume all risk and consequences resulting from any third party's use of any Contribution. 
You will indemnify ORM, and our agents, licensees, affiliates, and employees, and hold them and ORM harmless, against any liability, loss or cost, including reasonable attorney's fees, arising out of any breach of these warranties and representations. Your contact information is subject to our [Privacy Policy]. 14 | 15 | 4\. California law and applicable U.S. federal law govern this CLA, and any dispute related to this CLA, and you agree to submit to the personal and exclusive jurisdiction of the courts located within the county of San Francisco, California, U.S.A. 16 | 17 | 5\. ORM may amend this CLA at any time by posting the amended CLA with or without notice to you on contributor-agreements.oreilly.com and your participation and/or contribution to any ORM Project after such posting will constitute consent to the modified CLA. A current version of this CLA will always be available at http://contributor-agreements.oreilly.com/contributor_agreement. 18 | 19 | 20 | [1]: http://oreilly.com/oreilly/privacy.csp 21 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # An illustrated introduction to the t-SNE algorithm 2 | 3 | In the Big Data era, data is not only becoming bigger and bigger; it is also becoming more and more complex. This translates into a spectacular increase of the dimensionality of the data. For example, the dimensionality of a set of images is the number of pixels in any image, which ranges from thousands to millions. 4 | 5 | Computers have no problem processing that many dimensions. However, we humans are limited to three dimensions. Computers still need us (thankfully), so we often need ways to effectively visualize high-dimensional data before handing it over to the computer. 
 6 | 7 | How can we possibly reduce the dimensionality of a dataset from an arbitrary number to two or three, which is what we're doing when we visualize data on a screen? 8 | 9 | The answer lies in the observation that many real-world datasets have a low intrinsic dimensionality, even though they're embedded in a high-dimensional space. Imagine that you're shooting a panoramic landscape with your camera while rotating on the spot. We can consider every picture as a point in a 16,000,000-dimensional space (assuming a 16-megapixel camera). Yet, the set of pictures approximately lies in a three-dimensional space (yaw, pitch, roll). This low-dimensional space is embedded within the high-dimensional space in a complex, nonlinear way. Hidden in the data, this structure can only be recovered via specific mathematical methods. 10 | 11 | This is the topic of [**manifold learning**](http://en.wikipedia.org/wiki/Nonlinear_dimensionality_reduction), also called **nonlinear dimensionality reduction**, a branch of machine learning (more specifically, _unsupervised learning_). Developing algorithms that can automatically recover a hidden structure in a high-dimensional dataset is still an active area of research. 12 | 13 | This post is an introduction to a popular dimensionality reduction algorithm: [**t-distributed stochastic neighbor embedding (t-SNE)**](http://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding). Developed by [Laurens van der Maaten](http://lvdmaaten.github.io/) and [Geoffrey Hinton](http://www.cs.toronto.edu/~hinton/) (see the [original paper here](http://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf)), this algorithm has been successfully applied to many real-world datasets. Here, we'll follow the original paper and describe the key mathematical concepts of the method, applied to a toy dataset (handwritten digits). 
We'll use Python and the [scikit-learn](http://scikit-learn.org/stable/index.html) library. 14 | 15 | ## Visualizing handwritten digits 16 | 17 | Let's first import a few libraries. 18 | 19 |
 22 | # That's an impressive list of imports. 23 | import numpy as np 24 | from numpy import linalg 25 | from numpy.linalg import norm 26 | from scipy.spatial.distance import squareform, pdist 27 | 28 | # We import sklearn. 29 | import sklearn 30 | from sklearn.manifold import TSNE 31 | from sklearn.datasets import load_digits 32 | from sklearn.preprocessing import scale 33 | 34 | # We'll hack a bit with the t-SNE code in sklearn 0.15.2. 35 | from sklearn.metrics.pairwise import pairwise_distances 36 | from sklearn.manifold.t_sne import (_joint_probabilities, 37 | _kl_divergence) 38 | from sklearn.utils.extmath import _ravel 39 | # Random state. 40 | RS = 20150101 41 | 42 | # We'll use matplotlib for graphics. 43 | import matplotlib.pyplot as plt 44 | import matplotlib.patheffects as PathEffects 45 | import matplotlib 46 | %matplotlib inline 47 | 48 | # We import seaborn to make nice plots. 49 | import seaborn as sns 50 | sns.set_style('darkgrid') 51 | sns.set_palette('muted') 52 | sns.set_context("notebook", font_scale=1.5, 53 | rc={"lines.linewidth": 2.5}) 54 | 55 | # We'll generate an animation with matplotlib and moviepy. 56 | from moviepy.video.io.bindings import mplfig_to_npimage 57 | import moviepy.editor as mpy 58 |59 | 60 | Now we load the classic [_handwritten digits_](http://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_digits.html#sklearn.datasets.load_digits) dataset. It contains 1797 images with \\(8 \times 8 = 64\\) pixels each. 61 | 62 |
65 | digits = load_digits() 66 | digits.data.shape 67 |68 | 69 |
72 | print(digits['DESCR']) 73 |74 | 75 | Here are the images: 76 | 77 |
80 | nrows, ncols = 2, 5 81 | plt.figure(figsize=(6,3)) 82 | plt.gray() 83 | for i in range(ncols * nrows): 84 | ax = plt.subplot(nrows, ncols, i + 1) 85 | ax.matshow(digits.images[i,...]) 86 | plt.xticks([]); plt.yticks([]) 87 | plt.title(digits.target[i]) 88 | plt.savefig('images/digits-generated.png', dpi=150) 89 |90 | 91 |  92 | 93 | Now let's run the t-SNE algorithm on the dataset. It just takes one line with scikit-learn. 94 | 95 |
98 | # We first reorder the data points according to the handwritten numbers. 99 | X = np.vstack([digits.data[digits.target==i] 100 | for i in range(10)]) 101 | y = np.hstack([digits.target[digits.target==i] 102 | for i in range(10)]) 103 |104 | 105 |
108 | digits_proj = TSNE(random_state=RS).fit_transform(X) 109 |110 | 111 | Here is a utility function used to display the transformed dataset. The color of each point refers to the actual digit (of course, this information was not used by the dimensionality reduction algorithm). 112 | 113 |
 116 | def scatter(x, colors): 117 | # We choose a color palette with seaborn. 118 | palette = np.array(sns.color_palette("hls", 10)) 119 | 120 | # We create a scatter plot. 121 | f = plt.figure(figsize=(8, 8)) 122 | ax = plt.subplot(aspect='equal') 123 | sc = ax.scatter(x[:,0], x[:,1], lw=0, s=40, 124 | c=palette[colors.astype(int)]) 125 | plt.xlim(-25, 25) 126 | plt.ylim(-25, 25) 127 | ax.axis('off') 128 | ax.axis('tight') 129 | 130 | # We add the labels for each digit. 131 | txts = [] 132 | for i in range(10): 133 | # Position of each label. 134 | xtext, ytext = np.median(x[colors == i, :], axis=0) 135 | txt = ax.text(xtext, ytext, str(i), fontsize=24) 136 | txt.set_path_effects([ 137 | PathEffects.Stroke(linewidth=5, foreground="w"), 138 | PathEffects.Normal()]) 139 | txts.append(txt) 140 | 141 | return f, ax, sc, txts 142 |143 | 144 | Here is the result. 145 | 146 |
 149 | scatter(digits_proj, y) 150 | plt.savefig('images/digits_tsne-generated.png', dpi=120) 151 |152 | 153 |  154 | 155 | We observe that the images corresponding to the different digits are clearly separated into different clusters of points. 156 | 157 | ## Mathematical framework 158 | 159 | Let's explain how the algorithm works. First, a few definitions. 160 | 161 | A **data point** is a point \\(x_i\\) in the original **data space** \\(\mathbf{R}^D\\), where \\(D=64\\) is the **dimensionality** of the data space. Every point is an image of a handwritten digit here. There are \\(N=1797\\) points. 162 | 163 | A **map point** is a point \\(y_i\\) in the **map space** \\(\mathbf{R}^2\\). This space will contain our final representation of the dataset. There is a _bijection_ between the data points and the map points: every map point represents one of the original images. 164 | 165 | How do we choose the positions of the map points? We want to preserve the structure of the data. More specifically, if two data points are close together, we want the two corresponding map points to be close too. Let \\(\left| x_i - x_j \right|\\) be the Euclidean distance between two data points, and \\(\left| y_i - y_j \right|\\) the distance between the corresponding map points. We first define a conditional similarity between the two data points: 166 | 167 | \\(p_{j|i} = \frac{\exp\left(-\left| x_i - x_j\right|^2 \big/ 2\sigma_i^2\right)}{\displaystyle\sum_{k \neq i} \exp\left(-\left| x_i - x_k\right|^2 \big/ 2\sigma_i^2\right)}\\) 168 | 169 | This measures how close \\(x_j\\) is to \\(x_i\\), considering a **Gaussian distribution** around \\(x_i\\) with a given variance \\(\sigma_i^2\\). This variance is different for every point; it is chosen such that points in dense areas are given a smaller variance than points in sparse areas. The original paper details how this variance is computed exactly. 
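
To make the definition of \\(p_{j|i}\\) concrete, here is a minimal NumPy sketch that evaluates it on a handful of random points. The function name `conditional_similarities`, the toy data and the choice of a common \\(\sigma_i = 1\\) are assumptions made for illustration only; they are not part of scikit-learn or of the original paper's code.

```python
import numpy as np


def conditional_similarities(X, sigmas):
    """Return the matrix whose entry (i, j) is p_{j|i} (hypothetical helper)."""
    # Squared Euclidean distances between all pairs of rows of X.
    sq_dists = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    # Gaussian kernel in log space; row i uses its own bandwidth sigma_i.
    logits = -sq_dists / (2.0 * sigmas[:, None] ** 2)
    # The sum in the denominator runs over k != i, so drop the diagonal.
    np.fill_diagonal(logits, -np.inf)
    P = np.exp(logits)
    # Normalize every row so that it is a probability distribution over j.
    return P / P.sum(axis=1, keepdims=True)


rng = np.random.default_rng(0)
X_toy = rng.normal(size=(5, 3))   # 5 toy points in 3 dimensions
sigmas_toy = np.full(5, 1.0)      # assumed: same bandwidth for every point
P_cond = conditional_similarities(X_toy, sigmas_toy)
print(P_cond.round(3))
```

Each row of `P_cond` sums to one, but the matrix is generally not symmetric, because every row has its own normalization.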
170 | 171 | Now, we define the similarity as a symmetrized version of the conditional similarity: 172 | 173 | \\(p_{ij} = \frac{p_{j|i} + p_{i|j}}{2N}\\) 174 | 175 | We obtain a **similarity matrix** for our original dataset. What does this matrix look like? 176 | 177 | ## Similarity matrix 178 | 179 | The following function computes the similarity with a constant \\(\sigma\\). 180 | 181 |
 184 | def _joint_probabilities_constant_sigma(D, sigma): 185 | P = np.exp(-D**2/2 * sigma**2) # as written, this evaluates exp(-D**2 * sigma**2 / 2), so sigma acts as an inverse width here 186 | P /= np.sum(P, axis=1, keepdims=True) # normalize each row i so that it holds p_{j|i} 187 | return P 188 |189 | 190 | We now compute the similarity with a \\(\sigma_i\\) depending on the data point (found via a binary search, according to the original t-SNE paper). This algorithm is implemented in the `_joint_probabilities` private function in scikit-learn's code. 191 | 192 |
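
The binary search can be sketched for a single data point as follows. In the original paper, \\(\sigma_i\\) is chosen so that the perplexity \\(2^{H(P_i)}\\) of the conditional distribution matches a user-supplied value. This is a simplified illustration, not scikit-learn's implementation: the helper names `perplexity_of_row` and `find_sigma`, the search bounds and the number of steps are all assumptions.

```python
import numpy as np


def perplexity_of_row(sq_dists_i, sigma):
    """Perplexity 2**H of p_{.|i}, given squared distances from x_i to the other points."""
    logits = -sq_dists_i / (2.0 * sigma ** 2)
    logits -= logits.max()            # shift for numerical stability
    p = np.exp(logits)
    p /= p.sum()
    nonzero = p > 0
    entropy = -np.sum(p[nonzero] * np.log2(p[nonzero]))
    return 2.0 ** entropy


def find_sigma(sq_dists_i, target_perplexity, n_steps=60):
    """Bisection on a log scale; the perplexity grows monotonically with sigma."""
    lo, hi = 1e-10, 1e10              # assumed search bounds
    for _ in range(n_steps):
        mid = np.sqrt(lo * hi)        # geometric midpoint
        if perplexity_of_row(sq_dists_i, mid) > target_perplexity:
            hi = mid
        else:
            lo = mid
    return np.sqrt(lo * hi)


rng = np.random.default_rng(0)
points = rng.normal(size=(50, 4))
# Squared distances from the first point to the 49 others.
sq_dists_0 = np.sum((points[1:] - points[0]) ** 2, axis=1)
sigma_0 = find_sigma(sq_dists_0, target_perplexity=10.0)
print(sigma_0, perplexity_of_row(sq_dists_0, sigma_0))
```

A point in a dense region reaches the target perplexity with a small \\(\sigma_i\\), a point in a sparse region needs a larger one, which is the behavior described above.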
195 | # Pairwise distances between all data points. 196 | D = pairwise_distances(X, squared=True) 197 | # Similarity with constant sigma. 198 | P_constant = _joint_probabilities_constant_sigma(D, .002) 199 | # Similarity with variable sigma. 200 | P_binary = _joint_probabilities(D, 30., False) 201 | # The output of this function needs to be reshaped to a square matrix. 202 | P_binary_s = squareform(P_binary) 203 |204 | 205 | We can now display the distance matrix of the data points, and the similarity matrix with both a constant and variable sigma. 206 | 207 |
 210 | plt.figure(figsize=(12, 4)) 211 | pal = sns.light_palette("blue", as_cmap=True) 212 | 213 | plt.subplot(131) 214 | plt.imshow(D[::10, ::10], interpolation='none', cmap=pal) 215 | plt.axis('off') 216 | plt.title("Distance matrix", fontdict={'fontsize': 16}) 217 | 218 | plt.subplot(132) 219 | plt.imshow(P_constant[::10, ::10], interpolation='none', cmap=pal) 220 | plt.axis('off') 221 | plt.title(r"$p_{j|i}$ (constant $\sigma$)", fontdict={'fontsize': 16}) 222 | 223 | plt.subplot(133) 224 | plt.imshow(P_binary_s[::10, ::10], interpolation='none', cmap=pal) 225 | plt.axis('off') 226 | plt.title(r"$p_{j|i}$ (variable $\sigma$)", fontdict={'fontsize': 16}) 227 | plt.savefig('images/similarity-generated.png', dpi=120) 228 |229 | 230 | We can already observe the 10 groups in the data, corresponding to the 10 digits. 231 | 232 | Let's also define a similarity matrix for our map points. 233 | 234 | \\(q_{ij} = \frac{f(\left| y_i - y_j\right|)}{\displaystyle\sum_{k \neq l} f(\left| y_k - y_l\right|)} \quad \textrm{with} \quad f(z) = \frac{1}{1+z^2}\\) 235 | 236 | This is the same idea as for the data points, but with a different distribution ([**t-Student with one degree of freedom**](http://en.wikipedia.org/wiki/Student%27s_t-distribution), or [**Cauchy distribution**](http://en.wikipedia.org/wiki/Cauchy_distribution), instead of a Gaussian distribution). We'll elaborate on this choice later. 237 | 238 | Whereas the data similarity matrix \\(\big(p_{ij}\big)\\) is fixed, the map similarity matrix \\(\big(q_{ij}\big)\\) depends on the map points. What we want is for these two matrices to be as close as possible. This would mean that similar data points yield similar map points. 239 | 240 | ## A physical analogy 241 | 242 | Let's assume that our map points are all connected with springs. 
The stiffness of a spring connecting points \\(i\\) and \\(j\\) depends on the mismatch between the similarity of the two data points and the similarity of the two map points, that is, \\(p_{ij} - q_{ij}\\). Now, we let the system evolve according to the laws of physics. If two map points are far apart while the data points are close, they are attracted together. If they are nearby while the data points are dissimilar, they are repelled. 243 | 244 | The final mapping is obtained when the equilibrium is reached. 245 | 246 | Here is an illustration of a dynamic graph layout based on a similar idea. Nodes are connected via springs and the system evolves according to the laws of physics (example by [Mike Bostock](http://bl.ocks.org/mbostock/4062045)). 247 | 248 | 250 | 251 | 252 | 253 | 254 | 255 | 256 | 257 | 258 | 259 | 260 | 261 | ## Algorithm 262 | 263 | Remarkably, this physical analogy stems naturally from the mathematical algorithm. It corresponds to minimizing the [Kullback-Leibler](http://en.wikipedia.org/wiki/Kullback%E2%80%93Leibler_divergence) divergence between the two distributions \\(\big(p_{ij}\big)\\) and \\(\big(q_{ij}\big)\\): 264 | 265 | \\(KL(P||Q) = \sum_{i, j} p_{ij} \, \log \frac{p_{ij}}{q_{ij}}.\\) 266 | 267 | This measures the dissimilarity between our two similarity matrices (it is not symmetric, so it is not a distance in the strict sense). 268 | 269 | To minimize this score, we perform a gradient descent. The gradient can be computed analytically: 270 | 271 | \\(\frac{\partial \, KL(P || Q)}{\partial y_i} = 4 \sum_j (p_{ij} - q_{ij}) g\left( \left| y_i - y_j\right| \right) u_{ij} \quad \textrm{where} \, g(z) = \frac{z}{1+z^2}.\\) 272 | 273 | Here, \\(u_{ij}\\) is a unit vector going from \\(y_j\\) to \\(y_i\\). This gradient expresses the sum of all spring forces applied to map point \\(i\\). 274 | 275 | Let's illustrate this process by creating an animation of the convergence. 
We'll have to [monkey-patch](http://en.wikipedia.org/wiki/Monkey_patch) the internal `_gradient_descent()` function from scikit-learn's t-SNE implementation in order to register the position of the map points at every iteration. 276 | 277 |
 280 | # This list will contain the positions of the map points at every iteration. 281 | positions = [] 282 | def _gradient_descent(objective, p0, it, n_iter, n_iter_without_progress=30, 283 | momentum=0.5, learning_rate=1000.0, min_gain=0.01, 284 | min_grad_norm=1e-7, min_error_diff=1e-7, verbose=0, 285 | args=[]): 286 | # The documentation of this function can be found in scikit-learn's code. 287 | p = p0.copy().ravel() 288 | update = np.zeros_like(p) 289 | gains = np.ones_like(p) 290 | error = np.finfo(float).max 291 | best_error = np.finfo(float).max 292 | best_iter = 0 293 | 294 | for i in range(it, n_iter): 295 | # We save the current position. 296 | positions.append(p.copy()) 297 | 298 | new_error, grad = objective(p, *args) 299 | error_diff = np.abs(new_error - error) 300 | error = new_error 301 | grad_norm = linalg.norm(grad) 302 | 303 | if error < best_error: 304 | best_error = error 305 | best_iter = i 306 | elif i - best_iter > n_iter_without_progress: 307 | break 308 | if min_grad_norm >= grad_norm: 309 | break 310 | if min_error_diff >= error_diff: 311 | break 312 | 313 | inc = update * grad >= 0.0 314 | dec = np.invert(inc) 315 | gains[inc] += 0.05 316 | gains[dec] *= 0.95 317 | np.clip(gains, min_gain, np.inf, out=gains) 318 | grad *= gains 319 | update = momentum * update - learning_rate * grad 320 | p += update 321 | 322 | return p, error, i 323 | sklearn.manifold.t_sne._gradient_descent = _gradient_descent 324 |325 | 326 | Let's run the algorithm again, but this time saving all intermediate positions. 327 | 328 |
331 | X_proj = TSNE(random_state=RS).fit_transform(X) 332 |333 | 334 |
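
The objective and its analytic gradient from the Algorithm section can be checked numerically on a toy problem. The sketch below is independent of scikit-learn's internals: the helper names (`map_similarities`, `kl_divergence`, `kl_gradient`, `numerical_gradient`) and the random toy matrix `P_toy` are assumptions made for illustration. It relies on \\(g(z)\,u_{ij} = (y_i - y_j) / (1 + \left| y_i - y_j \right|^2)\\) and compares the formula with central finite differences.

```python
import numpy as np


def map_similarities(Y):
    """Return (Q, W): joint map similarities q_ij and unnormalized weights f(|y_i - y_j|)."""
    sq = np.sum((Y[:, None, :] - Y[None, :, :]) ** 2, axis=-1)
    W = 1.0 / (1.0 + sq)
    np.fill_diagonal(W, 0.0)          # pairs with i == j are excluded
    return W / W.sum(), W


def kl_divergence(P, Y):
    Q, _ = map_similarities(Y)
    mask = P > 0                      # skips the zero diagonal
    return np.sum(P[mask] * np.log(P[mask] / Q[mask]))


def kl_gradient(P, Y):
    """Analytic gradient: 4 * sum_j (p_ij - q_ij) (y_i - y_j) / (1 + |y_i - y_j|^2)."""
    Q, W = map_similarities(Y)
    coeff = (P - Q) * W
    return 4.0 * (coeff.sum(axis=1)[:, None] * Y - coeff @ Y)


def numerical_gradient(P, Y, eps=1e-6):
    """Central finite differences, one coordinate at a time."""
    grad = np.zeros_like(Y)
    for idx in np.ndindex(*Y.shape):
        Y_plus, Y_minus = Y.copy(), Y.copy()
        Y_plus[idx] += eps
        Y_minus[idx] -= eps
        grad[idx] = (kl_divergence(P, Y_plus) - kl_divergence(P, Y_minus)) / (2 * eps)
    return grad


rng = np.random.default_rng(0)
n = 6
A = rng.random((n, n))
A = A + A.T                           # symmetric, like p_ij
np.fill_diagonal(A, 0.0)
P_toy = A / A.sum()                   # sums to one over all pairs
Y_toy = rng.normal(size=(n, 2))       # random map points

analytic = kl_gradient(P_toy, Y_toy)
numeric = numerical_gradient(P_toy, Y_toy)
print(np.max(np.abs(analytic - numeric)))
```

The formula assumes that \\(\big(p_{ij}\big)\\) is symmetric and sums to one over all pairs, which is why the toy matrix is built that way.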
 337 | X_iter = np.dstack([position.reshape(-1, 2) 338 | for position in positions]) 339 |340 | 341 | We create an animation using [MoviePy](http://zulko.github.io/moviepy/). 342 | 343 |
346 | f, ax, sc, txts = scatter(X_iter[..., -1], y) 347 | 348 | def make_frame_mpl(t): 349 | i = int(t*40) 350 | x = X_iter[..., i] 351 | sc.set_offsets(x) 352 | for j, txt in zip(range(10), txts): 353 | xtext, ytext = np.median(x[y == j, :], axis=0) 354 | txt.set_x(xtext) 355 | txt.set_y(ytext) 356 | return mplfig_to_npimage(f) 357 | 358 | animation = mpy.VideoClip(make_frame_mpl, 359 | duration=X_iter.shape[2]/40.) 360 | animation.write_gif("images/animation.gif", fps=20) 361 |362 | 363 |
372 | n = 1. / (pdist(X_iter[..., -1], "sqeuclidean") + 1) 373 | Q = n / (2.0 * np.sum(n)) 374 | Q = squareform(Q) 375 | 376 | f = plt.figure(figsize=(6, 6)) 377 | ax = plt.subplot(aspect='equal') 378 | im = ax.imshow(Q, interpolation='none', cmap=pal) 379 | plt.axis('tight') 380 | plt.axis('off') 381 | 382 | def make_frame_mpl(t): 383 | i = int(t*40) 384 | n = 1. / (pdist(X_iter[..., i], "sqeuclidean") + 1) 385 | Q = n / (2.0 * np.sum(n)) 386 | Q = squareform(Q) 387 | im.set_data(Q) 388 | return mplfig_to_npimage(f) 389 | 390 | animation = mpy.VideoClip(make_frame_mpl, 391 | duration=X_iter.shape[2]/40.) 392 | animation.write_gif("images/animation_matrix.gif", fps=20) 393 |394 | 395 |
406 | npoints = 1000 407 | plt.figure(figsize=(15, 4)) 408 | for i, D in enumerate((2, 5, 10)): 409 | # Normally distributed points. 410 | u = np.random.randn(npoints, D) 411 | # Now on the sphere. 412 | u /= norm(u, axis=1)[:, None] 413 | # Uniform radius. 414 | r = np.random.rand(npoints, 1) 415 | # Uniformly within the ball. 416 | points = u * r**(1./D) 417 | # Plot. 418 | ax = plt.subplot(1, 3, i+1) 419 | ax.set_xlabel('Ball radius') 420 | if i == 0: 421 | ax.set_ylabel('Distance from origin') 422 | ax.hist(norm(points, axis=1), 423 | bins=np.linspace(0., 1., 50)) 424 | ax.set_title('D=%d' % D, loc='left') 425 | plt.savefig('images/spheres-generated.png', dpi=100, bbox_inches='tight') 426 |427 | 428 |  429 | 430 | When reducing the dimensionality of a dataset, if we used the same Gaussian distribution for the data points and the map points, we would get an _imbalance_ in the distribution of the distances of a point's neighbors. This is because the distribution of the distances is so different between a high-dimensional space and a low-dimensional space. Yet, the algorithm tries to reproduce the same distances in the two spaces. This imbalance would lead to an excess of attraction forces and a sometimes unappealing mapping. This is actually what happens in the original SNE algorithm, by [Hinton and Roweis (2002)](http://www.cs.toronto.edu/~fritz/absps/sne.pdf). 431 | 432 | The t-SNE algorithm works around this problem by using a t-Student with one degree of freedom (or Cauchy) distribution for the map points. This distribution has a much heavier tail than the Gaussian distribution, which _compensates_ the original imbalance. For a given similarity between two data points, the two corresponding map points will need to be much further apart in order for their similarity to match the data similarity. This can be seen in the following plot. 433 | 434 |
437 | z = np.linspace(0., 5., 1000) 438 | gauss = np.exp(-z**2) 439 | cauchy = 1/(1+z**2) 440 | plt.plot(z, gauss, label='Gaussian distribution') 441 | plt.plot(z, cauchy, label='Cauchy distribution') 442 | plt.legend() 443 | plt.savefig('images/distributions-generated.png', dpi=100) 444 |445 | 446 |  447 | 448 | Using this distribution leads to more effective data visualizations, where clusters of points are more distinctly separated. 449 | 450 | ## Conclusion 451 | 452 | The t-SNE algorithm provides an effective method to visualize a complex dataset. It successfully uncovers hidden structures in the data, exposing natural clusters and smooth nonlinear variations along the dimensions. It has been implemented in many languages, including Python, and it can be easily used thanks to the scikit-learn library. 453 | 454 | The references below describe some optimizations and improvements that can be made to the algorithm and implementations. In particular, the algorithm described here is quadratic in the number of samples, which makes it unscalable to large datasets. One could for example obtain an \\(O(N \log N)\\) complexity by using the Barnes-Hut algorithm to accelerate the N-body simulation via a quadtree or an octree. 
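
As a rough illustration of the Barnes-Hut variant, recent scikit-learn releases expose it through the `method` and `angle` parameters of `TSNE`. This assumes a newer scikit-learn than the 0.15.2 release used in the rest of this post; the 300-sample subset is an arbitrary choice to keep the run short.

```python
from sklearn.datasets import load_digits
from sklearn.manifold import TSNE

# A small subset of the digits, only to keep the example fast.
digits_small = load_digits().data[:300]

# method="barnes_hut" selects the approximate O(N log N) gradient computation;
# angle trades accuracy for speed (smaller values are more accurate and slower).
tsne_bh = TSNE(n_components=2, method="barnes_hut", angle=0.5,
               random_state=20150101)
embedding = tsne_bh.fit_transform(digits_small)
print(embedding.shape)
```

Setting `method="exact"` instead selects the quadratic algorithm described in this post, which makes it possible to compare the two on the same data.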
 455 | 456 | ## References 457 | 458 | * [Original paper](http://jmlr.csail.mit.edu/papers/volume9/vandermaaten08a/vandermaaten08a.pdf) 459 | * [Optimized t-SNE paper](http://lvdmaaten.github.io/publications/papers/JMLR_2014.pdf) 460 | * [A notebook on t-SNE by Alexander Fabisch](http://nbviewer.ipython.org/urls/gist.githubusercontent.com/AlexanderFabisch/1a0c648de22eff4a2a3e/raw/59d5bc5ed8f8bfd9ff1f7faa749d1b095aa97d5a/t-SNE.ipynb) 461 | * [Official t-SNE page](http://lvdmaaten.github.io/tsne/) 462 | * [scikit-learn documentation](http://scikit-learn.org/stable/modules/generated/sklearn.manifold.TSNE.html) 463 | * [Barnes-Hut t-SNE implementation in Python](https://github.com/danielfrg/tsne) 464 | * [Barnes-Hut on Wikipedia](http://en.wikipedia.org/wiki/Barnes%E2%80%93Hut_simulation) 465 | * [t-SNE on Wikipedia](http://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding) 466 | * [Implementation in scikit-learn](https://github.com/scikit-learn/scikit-learn/blob/master/sklearn/manifold/t_sne.py) 467 | -------------------------------------------------------------------------------- /atlas.json: -------------------------------------------------------------------------------- 1 | { 2 | "branch": "master", 3 | "files": [ 4 | "README.md" 5 | ], 6 | "formats": { 7 | "pdf": { 8 | "version": false, 9 | "index": false, 10 | "toc": false, 11 | "syntaxhighlighting": false, 12 | "show_comments": false 13 | }, 14 | "epub": { 15 | "index": false, 16 | "toc": false, 17 | "epubcheck": false, 18 | "syntaxhighlighting": false, 19 | "show_comments": false 20 | }, 21 | "mobi": { 22 | "index": false, 23 | "toc": false, 24 | "syntaxhighlighting": false, 25 | "show_comments": false 26 | }, 27 | "html": { 28 | "index": false, 29 | "toc": false, 30 | "syntaxhighlighting": false, 31 | "show_comments": false, 32 | "consolidated": false, 33 | "javascripts": [ 34 | "http://rawgit.com/oreillymedia/thebe/master/static/main-built.js", 35 | "page.js" 36 | ] 37 | } 38 | }, 39 | "theme": 
"oreillymedia/jupyter_theme", 40 | "title": "An illustrated introduction to the t-SNE algorithm", 41 | "github_url": "https://github.com/oreillymedia/ipython-tutorial-content", 42 | "uuid": "b0c6f45f-cc4f-4c32-9a70-c3c4adc54c70" 43 | } -------------------------------------------------------------------------------- /images/animation.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/animation.gif -------------------------------------------------------------------------------- /images/animation_matrix.gif: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/animation_matrix.gif -------------------------------------------------------------------------------- /images/digits.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/digits.png -------------------------------------------------------------------------------- /images/digits_tsne.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/digits_tsne.png -------------------------------------------------------------------------------- /images/distributions.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/distributions.png -------------------------------------------------------------------------------- /images/similarity.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/similarity.png -------------------------------------------------------------------------------- /images/spheres.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/oreillymedia/t-SNE-tutorial/0e2ab3332525979ad1d710aa09bebcfbdab96c08/images/spheres.png -------------------------------------------------------------------------------- /page.js: -------------------------------------------------------------------------------- 1 | $(function(){ 2 | new Thebe({url:"https://oreillyorchard.com:8000/"}); 3 | }); --------------------------------------------------------------------------------