├── .gitignore
├── .travis.yml
├── LICENSE.md
├── README.md
├── logs
│   └── projector
│       └── .gitignore
├── mnist_data
│   └── mnist_10k_sprite.png
├── mnist_t-sne.py
├── mnist_with_summaries.py
└── tests
    └── test_example.py

--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
/mnist_with_summaries.zip
/mnist_data.zip
/mnist_data/train-labels-idx1-ubyte.gz
/mnist_data/train-images-idx3-ubyte.gz
/mnist_data/t10k-labels-idx1-ubyte.gz
/mnist_data/t10k-images-idx3-ubyte.gz
/.idea
*.pyc
.DS_Store
/logs/test
/logs/train

--------------------------------------------------------------------------------
/.travis.yml:
--------------------------------------------------------------------------------
sudo: required
dist: trusty
language: python
python:
  - "2.7"
  - "3.4"
  - "3.5"
# command to install dependencies
install:
  # install TensorFlow from https://storage.googleapis.com/tensorflow/
  - if [[ "$TRAVIS_PYTHON_VERSION" == "2.7" ]]; then
      pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0rc0-cp27-none-linux_x86_64.whl;
    elif [[ "$TRAVIS_PYTHON_VERSION" == "3.4" ]]; then
      pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0rc0-cp34-cp34m-linux_x86_64.whl;
    elif [[ "$TRAVIS_PYTHON_VERSION" == "3.5" ]]; then
      pip install https://storage.googleapis.com/tensorflow/linux/cpu/tensorflow-1.0.0rc0-cp35-cp35m-linux_x86_64.whl;
    fi

# command to run tests
script:
  - python -m unittest discover -s tests;

--------------------------------------------------------------------------------
/LICENSE.md:
--------------------------------------------------------------------------------
The MIT License (MIT)

Copyright (c) 2015 Norman Heckscher

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# mnist-tensorboard-embeddings
[![Build Status](https://travis-ci.org/normanheckscher/mnist-tensorboard-embeddings.svg?branch=master)](https://travis-ci.org/normanheckscher/mnist-tensorboard-embeddings)

TensorBoard is a suite of web applications for inspecting and understanding
your TensorFlow runs and graphs. The TensorFlow documentation isn't especially
explicit about how to set up these visualizations. The code within
`mnist_t-sne.py` is a working example of how to implement a 3-dimensional
visualization of the MNIST dataset and its embedded images.

The full tutorial is on the [TensorFlow website](https://www.tensorflow.org/how_tos/embedding_viz/).

By default, the Embedding Projector performs 3-dimensional
[principal component analysis](https://en.wikipedia.org/wiki/Principal_component_analysis),
meaning it takes high-dimensional data and tries to find a
structure-preserving projection onto three-dimensional space. It does
this by rotating the data so that the first three dimensions reveal as much of
the variance in the data as possible. There's a nice visual explanation
[here](http://setosa.io/ev/principal-component-analysis/). Another extremely
useful projection is
[t-SNE](https://en.wikipedia.org/wiki/T-distributed_stochastic_neighbor_embedding).
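For intuition, here is a minimal NumPy sketch of what a 3-dimensional PCA
projection amounts to (an illustration only, not the projector's actual
implementation):

```
import numpy as np

def pca_project_3d(data):
    """Project (n_samples, n_features) data onto its top three principal axes."""
    centered = data - data.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return np.dot(centered, vt[:3].T)  # shape: (n_samples, 3)
```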
# Requirements
- [TensorFlow](http://www.tensorflow.org) r1.0

# Sample output

Run the `mnist_t-sne.py` file from within its directory to generate the
embeddings and visualization.

Once you have event files, run TensorBoard and provide the log directory. If
you're using a precompiled TensorFlow package (e.g. you installed via pip), run:

```
tensorboard --logdir=path/to/logs
```

This should print that TensorBoard has started. Next, connect to http://localhost:6006.

TensorBoard requires a `logdir` to read logs from. For info on configuring
TensorBoard, run `tensorboard --help`.

TensorBoard can be used in Google Chrome or Firefox. Other browsers might
work, but there may be bugs or performance issues.

The second file, `mnist_with_summaries.py`, is a full example of the
embedding visualization together with subsequent model training. This second
file mostly mirrors the official TensorFlow tutorial file.
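The projector wiring at the heart of both scripts boils down to a few calls,
condensed below from the code in this repository (the paths here are
illustrative stand-ins for the scripts' `--log_dir` and `--data_dir` flags):

```
from tensorflow.contrib.tensorboard.plugins import projector

config = projector.ProjectorConfig()
embed = config.embeddings.add()
embed.tensor_name = 'embedding:0'  # the Variable holding the test images
embed.metadata_path = 'logs/projector/metadata.tsv'  # one label per row
embed.sprite.image_path = 'mnist_data/mnist_10k_sprite.png'
embed.sprite.single_image_dim.extend([28, 28])
projector.visualize_embeddings(writer, config)  # writer is a tf.summary.FileWriter
```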
# Contribution
Your comments (issues) and PRs are always welcome.

--------------------------------------------------------------------------------
/logs/projector/.gitignore:
--------------------------------------------------------------------------------
# Ignore everything in this directory
*
# Except this file
!.gitignore

--------------------------------------------------------------------------------
/mnist_data/mnist_10k_sprite.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/normanheckscher/mnist-tensorboard-embeddings/9d57a364f58896d9be254082e31b49338103bb0a/mnist_data/mnist_10k_sprite.png

--------------------------------------------------------------------------------
/mnist_t-sne.py:
--------------------------------------------------------------------------------
# Copyright 2016 Norman Heckscher. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""MNIST dimensionality reduction with TensorFlow and TensorBoard.

This demonstrates the functionality of the TensorBoard Embedding Visualization
dashboard using MNIST.

https://www.tensorflow.org/versions/r0.12/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization
"""
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import os
import sys

import numpy as np
import tensorflow as tf
from tensorflow.contrib.tensorboard.plugins import projector
from tensorflow.examples.tutorials.mnist import input_data

FLAGS = None


def generate_embeddings():
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir,
                                      one_hot=True,
                                      fake_data=FLAGS.fake_data)
    sess = tf.InteractiveSession()

    # Input set for Embedded TensorBoard visualization.
    # Performed with cpu to conserve memory and processing power
    with tf.device("/cpu:0"):
        embedding = tf.Variable(tf.stack(mnist.test.images[:FLAGS.max_steps], axis=0),
                                trainable=False, name='embedding')

    tf.global_variables_initializer().run()

    saver = tf.train.Saver()
    writer = tf.summary.FileWriter(FLAGS.log_dir + '/projector', sess.graph)

    # Add embedding tensorboard visualization. Needs TensorFlow version
    # >= 0.12.0RC0
    config = projector.ProjectorConfig()
    embed = config.embeddings.add()
    embed.tensor_name = 'embedding:0'
    embed.metadata_path = os.path.join(FLAGS.log_dir, 'projector/metadata.tsv')
    embed.sprite.image_path = os.path.join(FLAGS.data_dir, 'mnist_10k_sprite.png')

    # Specify the width and height of a single thumbnail.
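    # (The sprite is one large image containing all 10,000 test digits laid
    # out in a grid; the projector slices it into individual thumbnails
    # using the 28x28 pixel size of a single MNIST digit.)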
    embed.sprite.single_image_dim.extend([28, 28])
    projector.visualize_embeddings(writer, config)

    saver.save(sess, os.path.join(
        FLAGS.log_dir, 'projector/a_model.ckpt'), global_step=FLAGS.max_steps)


def generate_metadata_file():
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir,
                                      one_hot=True,
                                      fake_data=FLAGS.fake_data)

    def save_metadata(file):
        with open(file, 'w') as f:
            # The test labels are one-hot encoded; argmax recovers the digit
            # class for each image.
            labels = np.argmax(mnist.test.labels, axis=1)
            for i in range(FLAGS.max_steps):
                f.write('{}\n'.format(labels[i]))

    save_metadata(FLAGS.log_dir + '/projector/metadata.tsv')


def main(_):
    if tf.gfile.Exists(FLAGS.log_dir + '/projector'):
        tf.gfile.DeleteRecursively(FLAGS.log_dir + '/projector')
    tf.gfile.MakeDirs(FLAGS.log_dir + '/projector')  # also creates parent dirs
    generate_metadata_file()
    generate_embeddings()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--fake_data', nargs='?', const=True, type=bool,
                        default=False,
                        help='If true, uses fake data for unit testing.')
    parser.add_argument('--max_steps', type=int, default=10000,
                        help='Number of steps to run trainer.')
    parser.add_argument('--data_dir', type=str,
                        default='/Users/norman/Documents/workspace/mnist-tensorboard-embeddings/mnist_data',
                        help='Directory for storing input data')
    parser.add_argument('--log_dir', type=str,
                        default='/Users/norman/Documents/workspace/mnist-tensorboard-embeddings/logs',
                        help='Summaries log directory')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)

--------------------------------------------------------------------------------
/mnist_with_summaries.py:
--------------------------------------------------------------------------------
# Copyright 2015 The TensorFlow Authors. All Rights Reserved.
#
# Licensed under the Apache License, Version 2.0 (the 'License');
# you may not use this file except in compliance with the License.
# You may obtain a copy of the License at
#
#     http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing, software
# distributed under the License is distributed on an 'AS IS' BASIS,
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
# See the License for the specific language governing permissions and
# limitations under the License.
# ==============================================================================
"""A simple MNIST classifier which displays summaries in TensorBoard.

This is an unimpressive MNIST model, but it is a good example of using
tf.name_scope to make a graph legible in the TensorBoard graph explorer, and of
naming summary tags so that they are grouped meaningfully in TensorBoard.

It demonstrates the functionality of every TensorBoard dashboard.

Updated by Norman Heckscher to display the Embedding Visualization.
https://www.tensorflow.org/versions/r0.12/how_tos/embedding_viz/index.html#tensorboard-embedding-visualization
"""
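# Typical invocation, assuming the repository layout above (the relative
# paths here are illustrative; the --data_dir and --log_dir defaults below
# point at the author's machine and will usually need overriding):
#   python mnist_with_summaries.py --data_dir=./mnist_data --log_dir=./logs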
from __future__ import absolute_import
from __future__ import division
from __future__ import print_function

import argparse
import sys

# Additions for the embedding projector:
import os

import numpy as np
from tensorflow.contrib.tensorboard.plugins import projector

import tensorflow as tf

from tensorflow.examples.tutorials.mnist import input_data

FLAGS = None


def train():
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir,
                                      one_hot=True,
                                      fake_data=FLAGS.fake_data)
    sess = tf.InteractiveSession()
    # Create a multilayer model.

    # Input placeholders
    with tf.name_scope('input'):
        x = tf.placeholder(tf.float32, [None, 784], name='x-input')
        y_ = tf.placeholder(tf.float32, [None, 10], name='y-input')

    # Input set for Embedded TensorBoard visualization.
    # Performed with cpu to conserve memory and processing power
    with tf.device("/cpu:0"):
        embedding = tf.Variable(tf.stack(mnist.test.images[:FLAGS.max_steps], axis=0),
                                trainable=False, name='embedding')

    with tf.name_scope('input_reshape'):
        image_shaped_input = tf.reshape(x, [-1, 28, 28, 1])
        tf.summary.image('input', image_shaped_input, 10)

    # We can't initialize these variables to 0 - the network will get stuck.
    def weight_variable(shape):
        """Create a weight variable with appropriate initialization."""
        initial = tf.truncated_normal(shape, stddev=0.1)
        return tf.Variable(initial)

    def bias_variable(shape):
        """Create a bias variable with appropriate initialization."""
        initial = tf.constant(0.1, shape=shape)
        return tf.Variable(initial)

    def variable_summaries(var):
        """Attach a lot of summaries to a Tensor (for TensorBoard visualization)."""
        with tf.name_scope('summaries'):
            mean = tf.reduce_mean(var)
            tf.summary.scalar('mean', mean)
            with tf.name_scope('stddev'):
                stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
            tf.summary.scalar('stddev', stddev)
            tf.summary.scalar('max', tf.reduce_max(var))
            tf.summary.scalar('min', tf.reduce_min(var))
            tf.summary.histogram('histogram', var)

    def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):
        """Reusable code for making a simple neural net layer.

        It does a matrix multiply, bias add, and then uses relu to nonlinearize.
        It also sets up name scoping so that the resultant graph is easy to read,
        and adds a number of summary ops.
        """
        # Adding a name scope ensures logical grouping of the layers in the graph.
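        # For example, every summary from the first layer shows up under
        # 'layer1/...' in TensorBoard instead of being scattered across
        # the graph.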
        with tf.name_scope(layer_name):
            # This Variable will hold the state of the weights for the layer
            with tf.name_scope('weights'):
                weights = weight_variable([input_dim, output_dim])
                variable_summaries(weights)
            with tf.name_scope('biases'):
                biases = bias_variable([output_dim])
                variable_summaries(biases)
            with tf.name_scope('Wx_plus_b'):
                preactivate = tf.matmul(input_tensor, weights) + biases
                tf.summary.histogram('pre_activations', preactivate)
            activations = act(preactivate, name='activation')
            tf.summary.histogram('activations', activations)
            return activations

    hidden1 = nn_layer(x, 784, 500, 'layer1')

    with tf.name_scope('dropout'):
        keep_prob = tf.placeholder(tf.float32)
        tf.summary.scalar('dropout_keep_probability', keep_prob)
        dropped = tf.nn.dropout(hidden1, keep_prob)

    # Do not apply softmax activation yet, see below.
    y = nn_layer(dropped, 500, 10, 'layer2', act=tf.identity)

    with tf.name_scope('cross_entropy'):
        # The raw formulation of cross-entropy,
        #
        # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)),
        #                               reduction_indices=[1]))
        #
        # can be numerically unstable.
        #
        # So here we use tf.nn.softmax_cross_entropy_with_logits on the
        # raw outputs of the nn_layer above, and then average across
        # the batch.
        diff = tf.nn.softmax_cross_entropy_with_logits(logits=y, labels=y_)
        with tf.name_scope('total'):
            cross_entropy = tf.reduce_mean(diff)
    tf.summary.scalar('cross_entropy', cross_entropy)

    with tf.name_scope('train'):
        train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(
            cross_entropy)

    with tf.name_scope('accuracy'):
        with tf.name_scope('correct_prediction'):
            correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
        with tf.name_scope('accuracy'):
            accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
    tf.summary.scalar('accuracy', accuracy)

    # Merge all the summaries and write them out to /tmp/mnist_logs (by default)
    merged = tf.summary.merge_all()
    train_writer = tf.summary.FileWriter(FLAGS.log_dir + '/train',
                                         sess.graph)
    test_writer = tf.summary.FileWriter(FLAGS.log_dir + '/test')

    tf.global_variables_initializer().run()

    saver = tf.train.Saver()
    writer = tf.summary.FileWriter(FLAGS.log_dir + '/projector', sess.graph)
    # Add embedding tensorboard visualization. Needs TensorFlow version
    # >= 0.12.0RC0
    config = projector.ProjectorConfig()
    embed = config.embeddings.add()
    embed.tensor_name = 'embedding:0'
    embed.metadata_path = os.path.join(FLAGS.log_dir, 'projector/metadata.tsv')
    embed.sprite.image_path = os.path.join(FLAGS.data_dir, 'mnist_10k_sprite.png')
    # Specify the width and height of a single thumbnail.
    embed.sprite.single_image_dim.extend([28, 28])
    projector.visualize_embeddings(writer, config)
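    # Note: TensorBoard's projector reads the actual embedding values from the
    # checkpoint that saver.save() writes after the training loop below;
    # without a checkpoint in the log directory the Embeddings tab has
    # nothing to show.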
    # Train the model, and also write summaries.
    # Every 10th step, measure test-set accuracy, and write test summaries
    # All other steps, run train_step on training data, & add training summaries

    def feed_dict(train):
        """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
        if train or FLAGS.fake_data:
            xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
            k = FLAGS.dropout
        else:
            xs, ys = mnist.test.images, mnist.test.labels
            k = 1.0
        return {x: xs, y_: ys, keep_prob: k}

    for i in range(FLAGS.max_steps):
        if i % 10 == 0:  # Record summaries and test-set accuracy
            summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
            test_writer.add_summary(summary, i)
            print('Accuracy at step %s: %s' % (i, acc))
        else:  # Record train set summaries, and train
            if i % 100 == 99:  # Record execution stats
                run_options = tf.RunOptions(trace_level=tf.RunOptions.FULL_TRACE)
                run_metadata = tf.RunMetadata()
                summary, _ = sess.run([merged, train_step],
                                      feed_dict=feed_dict(True),
                                      options=run_options,
                                      run_metadata=run_metadata)
                train_writer.add_run_metadata(run_metadata, 'step%03d' % i)
                train_writer.add_summary(summary, i)
                print('Adding run metadata for', i)
            else:  # Record a summary
                summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
                train_writer.add_summary(summary, i)
    saver.save(sess, os.path.join(
        FLAGS.log_dir, 'projector/a_model.ckpt'), global_step=FLAGS.max_steps)
    train_writer.close()
    test_writer.close()


def generate_metadata_file():
    # Import data
    mnist = input_data.read_data_sets(FLAGS.data_dir,
                                      one_hot=True,
                                      fake_data=FLAGS.fake_data)

    def save_metadata(file):
        with open(file, 'w') as f:
            # f.write('id\tchar\n')
            # The test labels are one-hot encoded; argmax recovers the digit
            # class for each image.
            labels = np.argmax(mnist.test.labels, axis=1)
            for i in range(FLAGS.max_steps):
                f.write('{}\n'.format(labels[i]))

    # save metadata file
    save_metadata(FLAGS.log_dir + '/projector/metadata.tsv')


def main(_):
    # Start from clean log directories so stale runs don't pollute TensorBoard,
    # then make sure logs/projector exists before the metadata file is written.
    for sub in ('/train', '/test', '/projector'):
        if tf.gfile.Exists(FLAGS.log_dir + sub):
            tf.gfile.DeleteRecursively(FLAGS.log_dir + sub)
    tf.gfile.MakeDirs(FLAGS.log_dir + '/projector')
    generate_metadata_file()
    train()


if __name__ == '__main__':
    parser = argparse.ArgumentParser()
    parser.add_argument('--fake_data', nargs='?', const=True, type=bool,
                        default=False,
                        help='If true, uses fake data for unit testing.')
    parser.add_argument('--max_steps', type=int, default=10000,
                        help='Number of steps to run trainer.')
    parser.add_argument('--learning_rate', type=float, default=0.001,
                        help='Initial learning rate')
    parser.add_argument('--dropout', type=float, default=0.9,
                        help='Keep probability for training dropout.')
    parser.add_argument('--data_dir', type=str,
                        default='/Users/norman/Documents/workspace/mnist-tensorboard-embeddings/mnist_data',
                        help='Directory for storing input data')
    parser.add_argument('--log_dir', type=str,
                        default='/Users/norman/Documents/workspace/mnist-tensorboard-embeddings/logs',
                        help='Summaries log directory')
    FLAGS, unparsed = parser.parse_known_args()
    tf.app.run(main=main, argv=[sys.argv[0]] + unparsed)
--------------------------------------------------------------------------------
/tests/test_example.py:
--------------------------------------------------------------------------------
import unittest


class TestStringMethods(unittest.TestCase):

    def test_upper(self):
        self.assertEqual('foo'.upper(), 'FOO')

    def test_isupper(self):
        self.assertTrue('FOO'.isupper())
        self.assertFalse('Foo'.isupper())

    def test_split(self):
        s = 'hello world'
        self.assertEqual(s.split(), ['hello', 'world'])
        # check that s.split fails when the separator is not a string
        with self.assertRaises(TypeError):
            s.split(2)


if __name__ == '__main__':
    unittest.main()
--------------------------------------------------------------------------------