├── .github
│   ├── ISSUE_TEMPLATE.md
│   └── PULL_REQUEST_TEMPLATE.md
├── .gitignore
├── Dockerfile
├── OWNERS
├── README.md
├── bazel
│   ├── BUILD
│   ├── WORKSPACE
│   ├── hello_lib.py
│   └── hello_main.py
├── cifar10
│   ├── README.md
│   ├── cifar10.py
│   ├── cifar10_async_dist_train.py
│   ├── cifar10_eval.py
│   ├── cifar10_input.py
│   ├── cifar10_sync_dist_train.py
│   └── cifar10_train.py
├── data
│   ├── README.md
│   ├── ptb.test.txt
│   ├── ptb.train.txt
│   ├── ptb.valid.txt
│   ├── t10k-images-idx3-ubyte.gz
│   ├── t10k-labels-idx1-ubyte.gz
│   ├── text8.zip
│   ├── train-images-idx3-ubyte.gz
│   └── train-labels-idx1-ubyte.gz
├── distributed
│   ├── README.md
│   ├── create_tf_server_yaml.py
│   ├── mnist_cnn.py
│   ├── mnist_dnn.py
│   ├── mnist_dnn_sync_update.py
│   ├── start_servers.py
│   ├── start_tf.sh
│   └── word2vector.py
├── file_server.py
├── imagenet_serving
│   ├── Dockerfile
│   ├── README.md
│   ├── pic
│   │   ├── 02ea79e4aad9d6275da78a9170fa4e82.jpg
│   │   ├── 07889356d62fa6517b0db6cf9dcf1f96.jpg
│   │   ├── 516308313.jpg
│   │   └── 7e7e745620d307aa2cb4afcdfa90d189.jpg
│   └── serving.json
├── k8s_tf.yaml
├── k8s_tf_runner.yaml
├── notebooks
│   ├── RNN_PennTreeBank_LanguageModeling.ipynb
│   ├── hello_world.ipynb
│   ├── mnist.ipynb
│   ├── mnist_cnn.ipynb
│   ├── mnist_skflow.ipynb
│   ├── mnist_tensorboard.ipynb
│   ├── scope.ipynb
│   └── word2vec_basic.ipynb
├── picture
│   ├── create_local.png
│   ├── create_terminal.png
│   ├── dist_creation.png
│   ├── expanded_view.png
│   ├── homepage.png
│   ├── jupyter.png
│   ├── list_view.png
│   └── terminal_view.png
├── run_tf.sh
└── word2vector
    ├── BUILD
    ├── README.md
    ├── __init__.py
    ├── index.html
    ├── questions-words.txt
    ├── word2vec.py
    ├── word2vec_basic.py
    ├── word2vec_kernels.cc
    ├── word2vec_ops.cc
    ├── word2vec_optimized.py
    └── words_calculator_server.py
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | **Is this a BUG REPORT or FEATURE REQUEST?**:
4 |
5 | > Uncomment only one, leave it on its own line:
6 | >
7 | > /kind bug
8 | > /kind feature
9 |
10 | **What happened**:
11 |
12 | **What you expected to happen**:
13 |
14 | **How to reproduce it (as minimally and precisely as possible)**:
15 |
16 | **Anything else we need to know?**:
17 |
--------------------------------------------------------------------------------
/.github/PULL_REQUEST_TEMPLATE.md:
--------------------------------------------------------------------------------
1 |
2 |
3 | **What this PR does / why we need it**:
4 |
5 | Add your description
6 |
7 | **Which issue(s) this PR is related to** *(optional, link to 3rd issue(s))*:
8 |
9 | Fixes #
10 |
11 | Reference to #
12 |
13 |
14 | **Special notes for your reviewer**:
15 |
16 | /cc @your-reviewer
17 |
18 |
26 |
27 | **Release note**:
28 |
32 |
33 | ```release-note
34 | NONE
35 | ```
36 |
37 |
47 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | *~
2 | word2vector/text8*
3 | word2vector/glove*
4 |
--------------------------------------------------------------------------------
/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM index.caicloud.io/caicloud/ml-libs
2 |
3 | RUN apt-get update && apt-get install -y bc
4 |
5 | RUN rm -rf /notebooks/*
6 |
7 | COPY data /tmp/data
8 | COPY file_server.py /file_server.py
9 | COPY run_tf.sh /run_tf.sh
10 |
11 | COPY notebooks /notebooks
12 | COPY distributed /distributed
13 |
14 | CMD ["/run_tf.sh"]
15 |
--------------------------------------------------------------------------------
/OWNERS:
--------------------------------------------------------------------------------
1 | approvers:
2 | - perhapszzy
3 | reviewers:
4 | - perhapszzy
5 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Examples
2 | This repository includes TensorFlow example code in both distributed and non-distributed versions. **Contributions are very welcome.**
3 |
4 | ## Local examples
5 | To run the local examples in Jupyter Notebook, you can either use caicloud.io directly or run them in Docker with the caicloud TensorFlow image.
6 |
7 | ### Use caicloud.io machine learning SaaS
8 | - Step 1. Log in to [caicloud.io](https://console.caicloud.io/login). Register [here](https://console.caicloud.io/reg) if you don't have a caicloud account. After logging in, you should see something like this:
9 | 
10 |
11 | - Step 2. Click "机器学习" (Machine Learning) and then "单机实验" (Single-Machine Experiment). If you haven't created an experiment yet, you should see something like the picture below. If you have already created one, you can skip Step 3.
12 | 
13 |
14 | - Step 3. Create an experiment environment by clicking "创建单机实验" (Create Single-Machine Experiment) and filling in the required fields.
15 | 
16 | 
17 |
18 | - Step 4. Open Jupyter Notebook
19 | 
20 |
21 | ### Use caicloud TensorFlow docker image
22 | - Step 1. [Install Docker](https://docs.docker.com/engine/installation/)
23 |
24 | - Step 2. Pull image
25 |
26 | ```
27 | docker pull index.caicloud.io/tensorflow:0.8.0
28 | ```
29 |
30 | Note that you need a [caicloud account](https://console.caicloud.io/reg) to pull the image.
31 |
32 | - Step 3. Start the image
33 |
34 | ```
35 | docker run --net=host index.caicloud.io/tensorflow:0.8.0
36 | ```
37 |
38 | - Step 4. Access the Jupyter Notebook at ```localhost:8888```
39 |
40 |
41 | ## Distributed examples
42 | Distributed TensorFlow examples can only be run on [caicloud.io](caicloud.io).
43 |
44 | - Step 1. Create a distributed TensorFlow cluster. This may take a few minutes. Note that you'll need to create a Kubernetes cluster before deploying a TensorFlow cluster. This [doc](http://www.clipular.com/c/4898024607711232.png?k=8TxxmTwy57gXs7SZ9iVVopscjKg) describes how to create a Kubernetes cluster on caicloud.io.
45 | 
46 |
47 | - Step 2. Open Jupyter Notebook.
48 |
49 | - Step 3. Create a terminal.
50 | 
51 | 
52 |
53 | - Step 4. Go into the distributed examples directory:
54 |
55 | ```
56 | cd /distributed
57 | ls
58 | ```
59 |
60 | - Step 5. Run the examples by following the instructions [here](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/README.md).
61 |
62 |
--------------------------------------------------------------------------------
/bazel/BUILD:
--------------------------------------------------------------------------------
1 | py_library(
2 |     name = "hello_lib",
3 |     srcs = [
4 |         "hello_lib.py",
5 |     ]
6 | )
7 |
8 | py_binary(
9 |     name = "hello_main",
10 |     srcs = [
11 |         "hello_main.py",
12 |     ],
13 |     deps = [
14 |         ":hello_lib",
15 |     ],
16 | )
17 |
18 |
--------------------------------------------------------------------------------
/bazel/WORKSPACE:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/bazel/WORKSPACE
--------------------------------------------------------------------------------
/bazel/hello_lib.py:
--------------------------------------------------------------------------------
1 | def print_hello_world():
2 |     print('Hello World')
3 |
--------------------------------------------------------------------------------
/bazel/hello_main.py:
--------------------------------------------------------------------------------
1 | import hello_lib
2 | hello_lib.print_hello_world()
3 |
--------------------------------------------------------------------------------
/cifar10/README.md:
--------------------------------------------------------------------------------
1 | CIFAR-10 is a common benchmark in machine learning for image recognition.
2 |
3 | http://www.cs.toronto.edu/~kriz/cifar.html
4 |
5 | Code in this directory demonstrates how to use TensorFlow to train and evaluate a convolutional neural network (CNN) on both CPU and GPU. We also demonstrate how to train a CNN over multiple GPUs.
6 |
7 | Detailed instructions on how to get started are available at:
8 |
9 | http://tensorflow.org/tutorials/deep_cnn/
10 |
11 |
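As a rough illustration of the network that `inference()` in `cifar10.py` builds, the sketch below traces how the spatial size shrinks through the two stride-2 pooling layers with `padding='SAME'`, and how wide the flattened input to the first fully connected layer (`local3`) ends up. The 24x24 input size is an assumption (the value of `IMAGE_SIZE` in the upstream `cifar10_input.py`, which is not shown here), and `same_padding_output` is a hypothetical helper rather than part of this repository.

```python
import math


def same_padding_output(size, stride):
    """Spatial output size of a conv/pool layer that uses padding='SAME'."""
    return int(math.ceil(size / float(stride)))


# Assumption: inputs are 24x24 crops (IMAGE_SIZE in upstream cifar10_input.py).
size = 24
size = same_padding_output(size, 1)  # conv1: 5x5 kernel, stride 1
size = same_padding_output(size, 2)  # pool1: 3x3 window, stride 2
size = same_padding_output(size, 1)  # conv2: 5x5 kernel, stride 1
size = same_padding_output(size, 2)  # pool2: 3x3 window, stride 2

# conv2 produces 64 channels, so local3 sees size * size * 64 features.
flattened_dim = size * size * 64
print(size, flattened_dim)
```

Under that assumption the `dim` value computed in the `local3` scope would be 2304.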
--------------------------------------------------------------------------------
/cifar10/cifar10.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """Builds the CIFAR-10 network.
17 |
18 | Summary of available functions:
19 |
20 | # Compute input images and labels for training. If you would like to run
21 | # evaluations, use inputs() instead.
22 | inputs, labels = distorted_inputs()
23 |
24 | # Compute inference on the model inputs to make a prediction.
25 | predictions = inference(inputs)
26 |
27 | # Compute the total loss of the prediction with respect to the labels.
28 | loss = loss(predictions, labels)
29 |
30 | # Create a graph to run one step of training with respect to the loss.
31 | train_op = train(loss, global_step)
32 | """
33 | # pylint: disable=missing-docstring
34 | from __future__ import absolute_import
35 | from __future__ import division
36 | from __future__ import print_function
37 |
38 | import gzip
39 | import os
40 | import re
41 | import sys
42 | import tarfile
43 |
44 | from six.moves import urllib
45 | import tensorflow as tf
46 |
47 | from tensorflow.models.image.cifar10 import cifar10_input
48 |
49 | FLAGS = tf.app.flags.FLAGS
50 |
51 | # Basic model parameters.
52 | tf.app.flags.DEFINE_integer('batch_size', 128,
53 |                             """Number of images to process in a batch.""")
54 | tf.app.flags.DEFINE_string('data_dir', '/tmp/cifar10_data',
55 |                            """Path to the CIFAR-10 data directory.""")
56 |
57 | # Global constants describing the CIFAR-10 data set.
58 | IMAGE_SIZE = cifar10_input.IMAGE_SIZE
59 | NUM_CLASSES = cifar10_input.NUM_CLASSES
60 | NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
61 | NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = cifar10_input.NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
62 |
63 |
64 | # Constants describing the training process.
65 | MOVING_AVERAGE_DECAY = 0.9999     # The decay to use for the moving average.
66 | NUM_EPOCHS_PER_DECAY = 350.0      # Epochs after which learning rate decays.
67 | LEARNING_RATE_DECAY_FACTOR = 0.1  # Learning rate decay factor.
68 | INITIAL_LEARNING_RATE = 0.1       # Initial learning rate.
69 |
70 | # If a model is trained with multiple GPUs, prefix all Op names with tower_name
71 | # to differentiate the operations. Note that this prefix is removed from the
72 | # names of the summaries when visualizing a model.
73 | TOWER_NAME = 'tower'
74 |
75 | DATA_URL = 'http://www.cs.toronto.edu/~kriz/cifar-10-binary.tar.gz'
76 |
77 |
78 | def _activation_summary(x):
79 |   """Helper to create summaries for activations.
80 |
81 |   Creates a summary that provides a histogram of activations.
82 |   Creates a summary that measures the sparsity of activations.
83 |
84 |   Args:
85 |     x: Tensor
86 |   Returns:
87 |     nothing
88 |   """
89 |   # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
90 |   # session. This helps the clarity of presentation on tensorboard.
91 |   tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
92 |   tf.histogram_summary(tensor_name + '/activations', x)
93 |   tf.scalar_summary(tensor_name + '/sparsity', tf.nn.zero_fraction(x))
94 |
95 |
96 | def _variable_on_cpu(name, shape, initializer):
97 |   """Helper to create a Variable stored on CPU memory.
98 |
99 |   Args:
100 |     name: name of the variable
101 |     shape: list of ints
102 |     initializer: initializer for Variable
103 |
104 |   Returns:
105 |     Variable Tensor
106 |   """
107 |   with tf.device('/cpu:0'):
108 |     var = tf.get_variable(name, shape, initializer=initializer)
109 |   return var
110 |
111 |
112 | def _variable_with_weight_decay(name, shape, stddev, wd):
113 |   """Helper to create an initialized Variable with weight decay.
114 |
115 |   Note that the Variable is initialized with a truncated normal distribution.
116 |   A weight decay is added only if one is specified.
117 |
118 |   Args:
119 |     name: name of the variable
120 |     shape: list of ints
121 |     stddev: standard deviation of a truncated Gaussian
122 |     wd: add L2Loss weight decay multiplied by this float. If None, weight
123 |         decay is not added for this Variable.
124 |
125 |   Returns:
126 |     Variable Tensor
127 |   """
128 |   var = _variable_on_cpu(name, shape,
129 |                          tf.truncated_normal_initializer(stddev=stddev))
130 |   if wd is not None:
131 |     weight_decay = tf.mul(tf.nn.l2_loss(var), wd, name='weight_loss')
132 |     tf.add_to_collection('losses', weight_decay)
133 |   return var
134 |
135 |
136 | def distorted_inputs():
137 |   """Construct distorted input for CIFAR training using the Reader ops.
138 |
139 |   Returns:
140 |     images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
141 |     labels: Labels. 1D tensor of [batch_size] size.
142 |
143 |   Raises:
144 |     ValueError: If no data_dir
145 |   """
146 |   if not FLAGS.data_dir:
147 |     raise ValueError('Please supply a data_dir')
148 |   data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
149 |   return cifar10_input.distorted_inputs(data_dir=data_dir,
150 |                                         batch_size=FLAGS.batch_size)
151 |
152 |
153 | def inputs(eval_data):
154 |   """Construct input for CIFAR evaluation using the Reader ops.
155 |
156 |   Args:
157 |     eval_data: bool, indicating if one should use the train or eval data set.
158 |
159 |   Returns:
160 |     images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
161 |     labels: Labels. 1D tensor of [batch_size] size.
162 |
163 |   Raises:
164 |     ValueError: If no data_dir
165 |   """
166 |   if not FLAGS.data_dir:
167 |     raise ValueError('Please supply a data_dir')
168 |   data_dir = os.path.join(FLAGS.data_dir, 'cifar-10-batches-bin')
169 |   return cifar10_input.inputs(eval_data=eval_data, data_dir=data_dir,
170 |                               batch_size=FLAGS.batch_size)
171 |
172 |
173 | def inference(images):
174 |   """Build the CIFAR-10 model.
175 |
176 |   Args:
177 |     images: Images returned from distorted_inputs() or inputs().
178 |
179 |   Returns:
180 |     Logits.
181 |   """
182 |   # We instantiate all variables using tf.get_variable() instead of
183 |   # tf.Variable() in order to share variables across multiple GPU training runs.
184 |   # If we only ran this model on a single GPU, we could simplify this function
185 |   # by replacing all instances of tf.get_variable() with tf.Variable().
186 |   #
187 |   # conv1
188 |   with tf.variable_scope('conv1') as scope:
189 |     kernel = _variable_with_weight_decay('weights', shape=[5, 5, 3, 64],
190 |                                          stddev=1e-4, wd=0.0)
191 |     conv = tf.nn.conv2d(images, kernel, [1, 1, 1, 1], padding='SAME')
192 |     biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.0))
193 |     bias = tf.nn.bias_add(conv, biases)
194 |     conv1 = tf.nn.relu(bias, name=scope.name)
195 |     _activation_summary(conv1)
196 |
197 |   # pool1
198 |   pool1 = tf.nn.max_pool(conv1, ksize=[1, 3, 3, 1], strides=[1, 2, 2, 1],
199 |                          padding='SAME', name='pool1')
200 |   # norm1
201 |   norm1 = tf.nn.lrn(pool1, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
202 |                     name='norm1')
203 |
204 |   # conv2
205 |   with tf.variable_scope('conv2') as scope:
206 |     kernel = _variable_with_weight_decay('weights', shape=[5, 5, 64, 64],
207 |                                          stddev=1e-4, wd=0.0)
208 |     conv = tf.nn.conv2d(norm1, kernel, [1, 1, 1, 1], padding='SAME')
209 |     biases = _variable_on_cpu('biases', [64], tf.constant_initializer(0.1))
210 |     bias = tf.nn.bias_add(conv, biases)
211 |     conv2 = tf.nn.relu(bias, name=scope.name)
212 |     _activation_summary(conv2)
213 |
214 |   # norm2
215 |   norm2 = tf.nn.lrn(conv2, 4, bias=1.0, alpha=0.001 / 9.0, beta=0.75,
216 |                     name='norm2')
217 |   # pool2
218 |   pool2 = tf.nn.max_pool(norm2, ksize=[1, 3, 3, 1],
219 |                          strides=[1, 2, 2, 1], padding='SAME', name='pool2')
220 |
221 |   # local3
222 |   with tf.variable_scope('local3') as scope:
223 |     # Move everything into depth so we can perform a single matrix multiply.
224 |     reshape = tf.reshape(pool2, [FLAGS.batch_size, -1])
225 |     dim = reshape.get_shape()[1].value
226 |     weights = _variable_with_weight_decay('weights', shape=[dim, 384],
227 |                                           stddev=0.04, wd=0.004)
228 |     biases = _variable_on_cpu('biases', [384], tf.constant_initializer(0.1))
229 |     local3 = tf.nn.relu(tf.matmul(reshape, weights) + biases, name=scope.name)
230 |     _activation_summary(local3)
231 |
232 |   # local4
233 |   with tf.variable_scope('local4') as scope:
234 |     weights = _variable_with_weight_decay('weights', shape=[384, 192],
235 |                                           stddev=0.04, wd=0.004)
236 |     biases = _variable_on_cpu('biases', [192], tf.constant_initializer(0.1))
237 |     local4 = tf.nn.relu(tf.matmul(local3, weights) + biases, name=scope.name)
238 |     _activation_summary(local4)
239 |
240 |   # softmax, i.e. softmax(WX + b)
241 |   with tf.variable_scope('softmax_linear') as scope:
242 |     weights = _variable_with_weight_decay('weights', [192, NUM_CLASSES],
243 |                                           stddev=1/192.0, wd=0.0)
244 |     biases = _variable_on_cpu('biases', [NUM_CLASSES],
245 |                               tf.constant_initializer(0.0))
246 |     softmax_linear = tf.add(tf.matmul(local4, weights), biases, name=scope.name)
247 |     _activation_summary(softmax_linear)
248 |
249 |   return softmax_linear
250 |
251 |
252 | def loss(logits, labels):
253 |   """Add L2Loss to all the trainable variables.
254 |
255 |   Add summary for "Loss" and "Loss/avg".
256 |   Args:
257 |     logits: Logits from inference().
258 |     labels: Labels from distorted_inputs or inputs(). 1-D tensor
259 |             of shape [batch_size]
260 |
261 |   Returns:
262 |     Loss tensor of type float.
263 |   """
264 |   # Calculate the average cross entropy loss across the batch.
265 |   labels = tf.cast(labels, tf.int64)
266 |   cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
267 |       logits, labels, name='cross_entropy_per_example')
268 |   cross_entropy_mean = tf.reduce_mean(cross_entropy, name='cross_entropy')
269 |   tf.add_to_collection('losses', cross_entropy_mean)
270 |
271 |   # The total loss is defined as the cross entropy loss plus all of the weight
272 |   # decay terms (L2 loss).
273 |   return tf.add_n(tf.get_collection('losses'), name='total_loss')
274 |
275 |
276 | def _add_loss_summaries(total_loss):
277 |   """Add summaries for losses in CIFAR-10 model.
278 |
279 |   Generates moving average for all losses and associated summaries for
280 |   visualizing the performance of the network.
281 |
282 |   Args:
283 |     total_loss: Total loss from loss().
284 |   Returns:
285 |     loss_averages_op: op for generating moving averages of losses.
286 |   """
287 |   # Compute the moving average of all individual losses and the total loss.
288 |   loss_averages = tf.train.ExponentialMovingAverage(0.9, name='avg')
289 |   losses = tf.get_collection('losses')
290 |   loss_averages_op = loss_averages.apply(losses + [total_loss])
291 |
292 |   # Attach a scalar summary to all individual losses and the total loss; do the
293 |   # same for the averaged version of the losses.
294 |   for l in losses + [total_loss]:
295 |     # Name each loss as '(raw)' and name the moving average version of the loss
296 |     # as the original loss name.
297 |     tf.scalar_summary(l.op.name + ' (raw)', l)
298 |     tf.scalar_summary(l.op.name, loss_averages.average(l))
299 |
300 |   return loss_averages_op
301 |
302 |
303 | def train(total_loss, global_step):
304 |   """Train CIFAR-10 model.
305 |
306 |   Create an optimizer and apply to all trainable variables. Add moving
307 |   average for all trainable variables.
308 |
309 |   Args:
310 |     total_loss: Total loss from loss().
311 |     global_step: Integer Variable counting the number of training steps
312 |       processed.
313 |   Returns:
314 |     train_op: op for training.
315 |   """
316 |   # Variables that affect learning rate.
317 |   num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
318 |   decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
319 |
320 |   # Decay the learning rate exponentially based on the number of steps.
321 |   lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
322 |                                   global_step,
323 |                                   decay_steps,
324 |                                   LEARNING_RATE_DECAY_FACTOR,
325 |                                   staircase=True)
326 |   tf.scalar_summary('learning_rate', lr)
327 |
328 |   # Generate moving averages of all losses and associated summaries.
329 |   loss_averages_op = _add_loss_summaries(total_loss)
330 |
331 |   # Compute gradients.
332 |   with tf.control_dependencies([loss_averages_op]):
333 |     opt = tf.train.GradientDescentOptimizer(lr)
334 |     grads = opt.compute_gradients(total_loss)
335 |
336 |   # Apply gradients.
337 |   apply_gradient_op = opt.apply_gradients(grads, global_step=global_step)
338 |
339 |   # Add histograms for trainable variables.
340 |   for var in tf.trainable_variables():
341 |     tf.histogram_summary(var.op.name, var)
342 |
343 |   # Add histograms for gradients.
344 |   for grad, var in grads:
345 |     if grad is not None:
346 |       tf.histogram_summary(var.op.name + '/gradients', grad)
347 |
348 |   # Track the moving averages of all trainable variables.
349 |   variable_averages = tf.train.ExponentialMovingAverage(
350 |       MOVING_AVERAGE_DECAY, global_step)
351 |   variables_averages_op = variable_averages.apply(tf.trainable_variables())
352 |
353 |   with tf.control_dependencies([apply_gradient_op, variables_averages_op]):
354 |     train_op = tf.no_op(name='train')
355 |
356 |   return train_op
357 |
358 |
359 | def maybe_download_and_extract():
360 |   """Download and extract the tarball from Alex's website."""
361 |   dest_directory = FLAGS.data_dir
362 |   if not os.path.exists(dest_directory):
363 |     os.makedirs(dest_directory)
364 |   filename = DATA_URL.split('/')[-1]
365 |   filepath = os.path.join(dest_directory, filename)
366 |   if not os.path.exists(filepath):
367 |     def _progress(count, block_size, total_size):
368 |       sys.stdout.write('\r>> Downloading %s %.1f%%' % (filename,
369 |           float(count * block_size) / float(total_size) * 100.0))
370 |       sys.stdout.flush()
371 |     filepath, _ = urllib.request.urlretrieve(DATA_URL, filepath, _progress)
372 |     print()
373 |     statinfo = os.stat(filepath)
374 |     print('Successfully downloaded', filename, statinfo.st_size, 'bytes.')
375 |     tarfile.open(filepath, 'r:gz').extractall(dest_directory)
376 |
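The learning-rate schedule set up in `train()` can be checked by hand with plain Python. The sketch below mirrors the staircase form of exponential decay using the constants defined in this file. The figure of 50000 training examples per epoch is an assumption (it comes from `cifar10_input.py`, which is not shown here), and both helper names are hypothetical.

```python
def decay_steps(batch_size=128, examples_per_epoch=50000,
                epochs_per_decay=350.0):
    """Number of training steps between two learning-rate drops."""
    # Assumption: examples_per_epoch matches NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN.
    batches_per_epoch = examples_per_epoch / float(batch_size)
    return int(batches_per_epoch * epochs_per_decay)


def staircase_learning_rate(step, initial_lr=0.1, decay_factor=0.1,
                            steps=None):
    """Staircase decay: initial_lr * decay_factor ** floor(step / steps)."""
    if steps is None:
        steps = decay_steps()
    return initial_lr * decay_factor ** (step // steps)


print(decay_steps())
print(staircase_learning_rate(0))
print(staircase_learning_rate(200000))
```

With these values the rate stays at 0.1 for the first 136718 steps and then drops by a factor of ten, so a run much shorter than that never sees a decay.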
--------------------------------------------------------------------------------
/cifar10/cifar10_async_dist_train.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import
2 | from __future__ import division
3 | from __future__ import print_function
4 |
5 | from datetime import datetime
6 | import os.path
7 | import time
8 |
9 | import numpy as np
10 | from six.moves import xrange
11 | import tensorflow as tf
12 |
13 | from tensorflow.models.image.cifar10 import cifar10
14 |
15 | FLAGS = tf.app.flags.FLAGS
16 |
17 | tf.app.flags.DEFINE_string('job_name', '', 'One of "ps", "worker"')
18 | tf.app.flags.DEFINE_string('ps_hosts', '',
19 |                            """Comma-separated list of hostname:port for the """
20 |                            """parameter server jobs. e.g. """
21 |                            """'machine1:2222,machine2:1111,machine2:2222'""")
22 | tf.app.flags.DEFINE_string('worker_hosts', '',
23 |                            """Comma-separated list of hostname:port for the """
24 |                            """worker jobs. e.g. """
25 |                            """'machine1:2222,machine2:1111,machine2:2222'""")
26 | tf.app.flags.DEFINE_integer('task_id', 0, 'Task ID of the worker/replica running the training.')
27 |
28 |
29 | tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
30 |                            """Directory where to write event logs """
31 |                            """and checkpoint.""")
32 | tf.app.flags.DEFINE_integer('max_steps', 1000000,
33 |                             """Number of batches to run.""")
34 | tf.app.flags.DEFINE_boolean('log_device_placement', False,
35 |                             """Whether to log device placement.""")
36 |
37 | tf.logging.set_verbosity(tf.logging.INFO)
38 |
39 | def train():
40 |   ps_hosts = FLAGS.ps_hosts.split(',')
41 |   worker_hosts = FLAGS.worker_hosts.split(',')
42 |   print ('PS hosts are: %s' % ps_hosts)
43 |   print ('Worker hosts are: %s' % worker_hosts)
44 |
45 |   server = tf.train.Server(
46 |       {'ps': ps_hosts, 'worker': worker_hosts},
47 |       job_name=FLAGS.job_name,
48 |       task_index=FLAGS.task_id)
49 |
50 |   if FLAGS.job_name == 'ps':
51 |     server.join()
52 |
53 |   is_chief = (FLAGS.task_id == 0)
54 |   if is_chief:
55 |     if tf.gfile.Exists(FLAGS.train_dir):
56 |       tf.gfile.DeleteRecursively(FLAGS.train_dir)
57 |     tf.gfile.MakeDirs(FLAGS.train_dir)
58 |
59 |   device_setter = tf.train.replica_device_setter(ps_tasks=1)
60 |   with tf.device('/job:worker/task:%d' % FLAGS.task_id):
61 |     with tf.device(device_setter):
62 |       global_step = tf.Variable(0, trainable=False)
63 |
64 |       # Get images and labels for CIFAR-10.
65 |       images, labels = cifar10.distorted_inputs()
66 |
67 |       # Build a Graph that computes the logits predictions from the
68 |       # inference model.
69 |       logits = cifar10.inference(images)
70 |
71 |       # Calculate loss.
72 |       loss = cifar10.loss(logits, labels)
73 |       train_op = cifar10.train(loss, global_step)
74 |
75 |   saver = tf.train.Saver()
76 |   # A non-None summary_op lets the chief's Supervisor compute summaries in a
77 |   # separate service thread. If running summaries and training operations in
78 |   # parallel runs out of GPU memory, pass summary_op=None and run the
79 |   # summaries in the training thread instead.
80 |   sv = tf.train.Supervisor(is_chief=is_chief,
81 |                            logdir=FLAGS.train_dir,
82 |                            init_op=tf.initialize_all_variables(),
83 |                            summary_op=tf.merge_all_summaries(),
84 |                            global_step=global_step,
85 |                            saver=saver,
86 |                            save_model_secs=60)
87 |
88 |   tf.logging.info('%s Supervisor' % datetime.now())
89 |
90 |   sess_config = tf.ConfigProto(allow_soft_placement=True,
91 |                                log_device_placement=FLAGS.log_device_placement)
92 |
93 |   print ("Before session init")
94 |   # Get a session.
95 |   sess = sv.prepare_or_wait_for_session(server.target, config=sess_config)
96 |   print ("Session init done")
97 |
98 |   # Start the queue runners.
99 |   queue_runners = tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)
100 |   sv.start_queue_runners(sess, queue_runners)
101 |   print ('Started %d queues for processing input data.' % len(queue_runners))
102 |
103 |   # Train CIFAR-10 for a number of steps.
104 |   for step in xrange(FLAGS.max_steps):
105 |     start_time = time.time()
106 |     _, loss_value, gs = sess.run([train_op, loss, global_step])
107 |     duration = time.time() - start_time
108 |
109 |     assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
110 |
111 |     if step % 10 == 0:
112 |       num_examples_per_step = FLAGS.batch_size
113 |       examples_per_sec = num_examples_per_step / duration
114 |       sec_per_batch = float(duration)
115 |
116 |       format_str = ('%s: step %d (global_step %d), loss = %.2f (%.1f examples/sec; %.3f sec/batch)')
117 |       print (format_str % (datetime.now(), step, gs, loss_value, examples_per_sec, sec_per_batch))
118 |
119 |   if is_chief:
120 |     saver.save(sess, os.path.join(FLAGS.train_dir, 'model.ckpt'), global_step=global_step)
121 |
122 |
123 | def main(argv=None):
124 |   cifar10.maybe_download_and_extract()
125 |   train()
126 |
127 | if __name__ == '__main__':
128 |   tf.app.run()
129 |
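One detail of the host-list handling in `train()` is easy to miss: the `ps_hosts` and `worker_hosts` flags default to the empty string, and `''.split(',')` yields `['']` rather than an empty list. The sketch below shows a more defensive way to turn the two flag values into the cluster dictionary passed to `tf.train.Server`. `parse_hosts` and `build_cluster` are hypothetical helpers, not part of this repository, and the host names are placeholders.

```python
def parse_hosts(flag_value):
    """Split a comma-separated host:port list, dropping empty entries."""
    return [host.strip() for host in flag_value.split(',') if host.strip()]


def build_cluster(ps_hosts_flag, worker_hosts_flag):
    """Build the {'ps': [...], 'worker': [...]} dict describing the cluster."""
    return {
        'ps': parse_hosts(ps_hosts_flag),
        'worker': parse_hosts(worker_hosts_flag),
    }


cluster = build_cluster('machine1:2222', 'machine2:1111, machine2:2222')
print(cluster)
print(''.split(','), parse_hosts(''))
```

Checking for an empty `'ps'` or `'worker'` list before starting the server would turn a forgotten flag into an early, readable error.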
--------------------------------------------------------------------------------
/cifar10/cifar10_eval.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """Evaluation for CIFAR-10.
17 |
18 | Accuracy:
19 | cifar10_train.py achieves 83.0% accuracy after 100K steps (256 epochs
20 | of data) as judged by cifar10_eval.py.
21 |
22 | Speed:
23 | On a single Tesla K40, cifar10_train.py processes a single batch of 128 images
24 | in 0.25-0.35 sec (i.e. 350 - 600 images /sec). The model reaches ~86%
25 | accuracy after 100K steps in 8 hours of training time.
26 |
27 | Usage:
28 | Please see the tutorial and website for how to download the CIFAR-10
29 | data set, compile the program and train the model.
30 |
31 | http://tensorflow.org/tutorials/deep_cnn/
32 | """
33 | from __future__ import absolute_import
34 | from __future__ import division
35 | from __future__ import print_function
36 |
37 | from datetime import datetime
38 | import math
39 | import time
40 |
41 | import numpy as np
42 | import tensorflow as tf
43 |
44 | from tensorflow.models.image.cifar10 import cifar10
45 |
46 | FLAGS = tf.app.flags.FLAGS
47 |
48 | tf.app.flags.DEFINE_string('eval_dir', '/tmp/cifar10_eval',
49 |                            """Directory where to write event logs.""")
50 | tf.app.flags.DEFINE_string('eval_data', 'test',
51 |                            """Either 'test' or 'train_eval'.""")
52 | tf.app.flags.DEFINE_string('checkpoint_dir', '/tmp/cifar10_train',
53 |                            """Directory where to read model checkpoints.""")
54 | tf.app.flags.DEFINE_integer('eval_interval_secs', 60 * 5,
55 |                             """How often to run the eval.""")
56 | tf.app.flags.DEFINE_integer('num_examples', 10000,
57 |                             """Number of examples to run.""")
58 | tf.app.flags.DEFINE_boolean('run_once', False,
59 |                             """Whether to run eval only once.""")
60 |
61 |
62 | def eval_once(saver, summary_writer, top_k_op, summary_op):
63 |   """Run Eval once.
64 |
65 |   Args:
66 |     saver: Saver.
67 |     summary_writer: Summary writer.
68 |     top_k_op: Top K op.
69 |     summary_op: Summary op.
70 |   """
71 |   with tf.Session() as sess:
72 |     ckpt = tf.train.get_checkpoint_state(FLAGS.checkpoint_dir)
73 |     if ckpt and ckpt.model_checkpoint_path:
74 |       # Restores from checkpoint
75 |       saver.restore(sess, ckpt.model_checkpoint_path)
76 |       # Assuming model_checkpoint_path looks something like:
77 |       #   /my-favorite-path/cifar10_train/model.ckpt-0,
78 |       # extract global_step from it.
79 |       global_step = ckpt.model_checkpoint_path.split('/')[-1].split('-')[-1]
80 |     else:
81 |       print('No checkpoint file found')
82 |       return
83 |
84 |     # Start the queue runners.
85 |     coord = tf.train.Coordinator()
86 |     try:
87 |       threads = []
88 |       for qr in tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS):
89 |         threads.extend(qr.create_threads(sess, coord=coord, daemon=True,
90 |                                          start=True))
91 |
92 |       num_iter = int(math.ceil(FLAGS.num_examples / FLAGS.batch_size))
93 |       true_count = 0  # Counts the number of correct predictions.
94 |       total_sample_count = num_iter * FLAGS.batch_size
95 |       step = 0
96 |       while step < num_iter and not coord.should_stop():
97 |         predictions = sess.run([top_k_op])
98 |         true_count += np.sum(predictions)
99 |         step += 1
100 |
101 |       # Compute precision @ 1.
102 |       precision = true_count / total_sample_count
103 |       print('%s: precision @ 1 = %.3f' % (datetime.now(), precision))
104 |
105 |       summary = tf.Summary()
106 |       summary.ParseFromString(sess.run(summary_op))
107 |       summary.value.add(tag='Precision @ 1', simple_value=precision)
108 |       summary_writer.add_summary(summary, global_step)
109 |     except Exception as e:  # pylint: disable=broad-except
110 |       coord.request_stop(e)
111 |
112 |     coord.request_stop()
113 |     coord.join(threads, stop_grace_period_secs=10)
114 |
115 |
116 | def evaluate():
117 |   """Eval CIFAR-10 for a number of steps."""
118 |   with tf.Graph().as_default() as g:
119 |     # Get images and labels for CIFAR-10.
120 |     eval_data = FLAGS.eval_data == 'test'
121 |     images, labels = cifar10.inputs(eval_data=eval_data)
122 |
123 |     # Build a Graph that computes the logits predictions from the
124 |     # inference model.
125 |     logits = cifar10.inference(images)
126 |
127 |     # Calculate predictions.
128 |     top_k_op = tf.nn.in_top_k(logits, labels, 1)
129 |
130 |     # Restore the moving average version of the learned variables for eval.
131 |     variable_averages = tf.train.ExponentialMovingAverage(
132 |         cifar10.MOVING_AVERAGE_DECAY)
133 |     variables_to_restore = variable_averages.variables_to_restore()
134 |     saver = tf.train.Saver(variables_to_restore)
135 |
136 |     # Build the summary operation based on the TF collection of Summaries.
137 |     summary_op = tf.merge_all_summaries()
138 |
139 |     summary_writer = tf.train.SummaryWriter(FLAGS.eval_dir, g)
140 |
141 |     while True:
142 |       eval_once(saver, summary_writer, top_k_op, summary_op)
143 |       if FLAGS.run_once:
144 |         break
145 |       time.sleep(FLAGS.eval_interval_secs)
146 |
147 |
148 | def main(argv=None): # pylint: disable=unused-argument
149 | cifar10.maybe_download_and_extract()
150 | if tf.gfile.Exists(FLAGS.eval_dir):
151 | tf.gfile.DeleteRecursively(FLAGS.eval_dir)
152 | tf.gfile.MakeDirs(FLAGS.eval_dir)
153 | evaluate()
154 |
155 |
156 | if __name__ == '__main__':
157 | tf.app.run()
158 |
--------------------------------------------------------------------------------
/cifar10/cifar10_input.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """Routine for decoding the CIFAR-10 binary file format."""
17 |
18 | from __future__ import absolute_import
19 | from __future__ import division
20 | from __future__ import print_function
21 |
22 | import os
23 |
24 | from six.moves import xrange # pylint: disable=redefined-builtin
25 | import tensorflow as tf
26 |
27 | # Process images of this size. Note that this differs from the original CIFAR
28 | # image size of 32 x 32. If one alters this number, then the entire model
29 | # architecture will change and any model would need to be retrained.
30 | IMAGE_SIZE = 24
31 |
32 | # Global constants describing the CIFAR-10 data set.
33 | NUM_CLASSES = 10
34 | NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
35 | NUM_EXAMPLES_PER_EPOCH_FOR_EVAL = 10000
36 |
37 |
38 | def read_cifar10(filename_queue):
39 | """Reads and parses examples from CIFAR10 data files.
40 |
41 | Recommendation: if you want N-way read parallelism, call this function
42 | N times. This will give you N independent Readers reading different
43 | files & positions within those files, which will give better mixing of
44 | examples.
45 |
46 | Args:
47 | filename_queue: A queue of strings with the filenames to read from.
48 |
49 | Returns:
50 | An object representing a single example, with the following fields:
51 | height: number of rows in the result (32)
52 | width: number of columns in the result (32)
53 | depth: number of color channels in the result (3)
54 | key: a scalar string Tensor describing the filename & record number
55 | for this example.
56 | label: an int32 Tensor with the label in the range 0..9.
57 | uint8image: a [height, width, depth] uint8 Tensor with the image data
58 | """
59 |
60 | class CIFAR10Record(object):
61 | pass
62 | result = CIFAR10Record()
63 |
64 | # Dimensions of the images in the CIFAR-10 dataset.
65 | # See http://www.cs.toronto.edu/~kriz/cifar.html for a description of the
66 | # input format.
67 | label_bytes = 1 # 2 for CIFAR-100
68 | result.height = 32
69 | result.width = 32
70 | result.depth = 3
71 | image_bytes = result.height * result.width * result.depth
72 | # Every record consists of a label followed by the image, with a
73 | # fixed number of bytes for each.
74 | record_bytes = label_bytes + image_bytes
75 |
76 | # Read a record, getting filenames from the filename_queue. No
77 | # header or footer in the CIFAR-10 format, so we leave header_bytes
78 | # and footer_bytes at their default of 0.
79 | reader = tf.FixedLengthRecordReader(record_bytes=record_bytes)
80 | result.key, value = reader.read(filename_queue)
81 |
82 | # Convert from a string to a vector of uint8 that is record_bytes long.
83 | record_bytes = tf.decode_raw(value, tf.uint8)
84 |
85 | # The first bytes represent the label, which we convert from uint8->int32.
86 | result.label = tf.cast(
87 | tf.slice(record_bytes, [0], [label_bytes]), tf.int32)
88 |
89 | # The remaining bytes after the label represent the image, which we reshape
90 | # from [depth * height * width] to [depth, height, width].
91 | depth_major = tf.reshape(tf.slice(record_bytes, [label_bytes], [image_bytes]),
92 | [result.depth, result.height, result.width])
93 | # Convert from [depth, height, width] to [height, width, depth].
94 | result.uint8image = tf.transpose(depth_major, [1, 2, 0])
95 |
96 | return result
97 |
98 |
99 | def _generate_image_and_label_batch(image, label, min_queue_examples,
100 | batch_size, shuffle):
101 | """Construct a queued batch of images and labels.
102 |
103 | Args:
104 | image: 3-D Tensor of [height, width, 3] of type.float32.
105 | label: 1-D Tensor of type.int32
106 | min_queue_examples: int32, minimum number of samples to retain
107 | in the queue that provides batches of examples.
108 | batch_size: Number of images per batch.
109 | shuffle: boolean indicating whether to use a shuffling queue.
110 |
111 | Returns:
112 | images: Images. 4D tensor of [batch_size, height, width, 3] size.
113 | labels: Labels. 1D tensor of [batch_size] size.
114 | """
115 | # Create a queue that shuffles the examples, and then
116 | # read 'batch_size' images + labels from the example queue.
117 | num_preprocess_threads = 16
118 | if shuffle:
119 | images, label_batch = tf.train.shuffle_batch(
120 | [image, label],
121 | batch_size=batch_size,
122 | num_threads=num_preprocess_threads,
123 | capacity=min_queue_examples + 3 * batch_size,
124 | min_after_dequeue=min_queue_examples)
125 | else:
126 | images, label_batch = tf.train.batch(
127 | [image, label],
128 | batch_size=batch_size,
129 | num_threads=num_preprocess_threads,
130 | capacity=min_queue_examples + 3 * batch_size)
131 |
132 | # Display the training images in the visualizer.
133 | tf.image_summary('images', images)
134 |
135 | return images, tf.reshape(label_batch, [batch_size])
136 |
137 |
138 | def distorted_inputs(data_dir, batch_size):
139 | """Construct distorted input for CIFAR training using the Reader ops.
140 |
141 | Args:
142 | data_dir: Path to the CIFAR-10 data directory.
143 | batch_size: Number of images per batch.
144 |
145 | Returns:
146 | images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
147 | labels: Labels. 1D tensor of [batch_size] size.
148 | """
149 | filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
150 | for i in xrange(1, 6)]
151 | for f in filenames:
152 | if not tf.gfile.Exists(f):
153 | raise ValueError('Failed to find file: ' + f)
154 |
155 | # Create a queue that produces the filenames to read.
156 | filename_queue = tf.train.string_input_producer(filenames)
157 |
158 | # Read examples from files in the filename queue.
159 | read_input = read_cifar10(filename_queue)
160 | reshaped_image = tf.cast(read_input.uint8image, tf.float32)
161 |
162 | height = IMAGE_SIZE
163 | width = IMAGE_SIZE
164 |
165 | # Image processing for training the network. Note the many random
166 | # distortions applied to the image.
167 |
168 | # Randomly crop a [height, width] section of the image.
169 | distorted_image = tf.random_crop(reshaped_image, [height, width, 3])
170 |
171 | # Randomly flip the image horizontally.
172 | distorted_image = tf.image.random_flip_left_right(distorted_image)
173 |
174 | # Because these operations are not commutative, consider randomizing
175 | # the order of their operation.
176 | distorted_image = tf.image.random_brightness(distorted_image,
177 | max_delta=63)
178 | distorted_image = tf.image.random_contrast(distorted_image,
179 | lower=0.2, upper=1.8)
180 |
181 | # Subtract off the mean and divide by the standard deviation of the pixels.
182 | float_image = tf.image.per_image_whitening(distorted_image)
183 |
184 | # Ensure that the random shuffling has good mixing properties.
185 | min_fraction_of_examples_in_queue = 0.4
186 | min_queue_examples = int(NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN *
187 | min_fraction_of_examples_in_queue)
188 | print ('Filling queue with %d CIFAR images before starting to train. '
189 | 'This will take a few minutes.' % min_queue_examples)
190 |
191 | # Generate a batch of images and labels by building up a queue of examples.
192 | return _generate_image_and_label_batch(float_image, read_input.label,
193 | min_queue_examples, batch_size,
194 | shuffle=True)
195 |
196 |
197 | def inputs(eval_data, data_dir, batch_size):
198 | """Construct input for CIFAR evaluation using the Reader ops.
199 |
200 | Args:
201 | eval_data: bool, indicating if one should use the train or eval data set.
202 | data_dir: Path to the CIFAR-10 data directory.
203 | batch_size: Number of images per batch.
204 |
205 | Returns:
206 | images: Images. 4D tensor of [batch_size, IMAGE_SIZE, IMAGE_SIZE, 3] size.
207 | labels: Labels. 1D tensor of [batch_size] size.
208 | """
209 | if not eval_data:
210 | filenames = [os.path.join(data_dir, 'data_batch_%d.bin' % i)
211 | for i in xrange(1, 6)]
212 | num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN
213 | else:
214 | filenames = [os.path.join(data_dir, 'test_batch.bin')]
215 | num_examples_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_EVAL
216 |
217 | for f in filenames:
218 | if not tf.gfile.Exists(f):
219 | raise ValueError('Failed to find file: ' + f)
220 |
221 | # Create a queue that produces the filenames to read.
222 | filename_queue = tf.train.string_input_producer(filenames)
223 |
224 | # Read examples from files in the filename queue.
225 | read_input = read_cifar10(filename_queue)
226 | reshaped_image = tf.cast(read_input.uint8image, tf.float32)
227 |
228 | height = IMAGE_SIZE
229 | width = IMAGE_SIZE
230 |
231 | # Image processing for evaluation.
232 | # Crop the central [height, width] of the image.
233 | resized_image = tf.image.resize_image_with_crop_or_pad(reshaped_image,
234 | height, width)
235 |
236 | # Subtract off the mean and divide by the standard deviation of the pixels.
237 | float_image = tf.image.per_image_whitening(resized_image)
238 |
239 | # Ensure that the random shuffling has good mixing properties.
240 | min_fraction_of_examples_in_queue = 0.4
241 | min_queue_examples = int(num_examples_per_epoch *
242 | min_fraction_of_examples_in_queue)
243 |
244 | # Generate a batch of images and labels by building up a queue of examples.
245 | return _generate_image_and_label_batch(float_image, read_input.label,
246 | min_queue_examples, batch_size,
247 | shuffle=False)
248 |
--------------------------------------------------------------------------------
/cifar10/cifar10_sync_dist_train.py:
--------------------------------------------------------------------------------
1 | from __future__ import absolute_import
2 | from __future__ import division
3 | from __future__ import print_function
4 |
5 | from datetime import datetime
6 | import os.path
7 | import time
8 |
9 | import numpy as np
10 | from six.moves import xrange # pylint: disable=redefined-builtin
11 | import tensorflow as tf
12 |
13 | from tensorflow.models.image.cifar10 import cifar10
14 |
15 | FLAGS = tf.app.flags.FLAGS
16 |
17 | tf.app.flags.DEFINE_string('job_name', '', 'One of "ps", "worker"')
18 | tf.app.flags.DEFINE_string('ps_hosts', '',
19 | """Comma-separated list of hostname:port for the """
20 | """parameter server jobs. e.g. """
21 | """'machine1:2222,machine2:1111,machine2:2222'""")
22 | tf.app.flags.DEFINE_string('worker_hosts', '',
23 | """Comma-separated list of hostname:port for the """
24 | """worker jobs. e.g. """
25 | """'machine1:2222,machine2:1111,machine2:2222'""")
26 | tf.app.flags.DEFINE_integer('task_id', 0, 'Task ID of the worker/replica running the training.')
27 |
28 |
29 | tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
30 | """Directory where to write event logs """
31 | """and checkpoint.""")
32 | tf.app.flags.DEFINE_integer('max_steps', 1000000,
33 | """Number of batches to run.""")
34 | tf.app.flags.DEFINE_boolean('log_device_placement', False,
35 | """Whether to log device placement.""")
36 |
37 | tf.logging.set_verbosity(tf.logging.INFO)
38 |
39 | INITIAL_LEARNING_RATE = 0.1 # Initial learning rate.
40 | MOVING_AVERAGE_DECAY = 0.9999 # The decay to use for the moving average.
41 | NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN = 50000
42 | NUM_EPOCHS_PER_DECAY = 350.0 # Epochs after which learning rate decays.
43 | LEARNING_RATE_DECAY_FACTOR = 0.1 # Learning rate decay factor.
44 |
45 | def train():
46 | ps_hosts = FLAGS.ps_hosts.split(',')
47 | worker_hosts = FLAGS.worker_hosts.split(',')
48 | print ('PS hosts are: %s' % ps_hosts)
49 | print ('Worker hosts are: %s' % worker_hosts)
50 |
51 | server = tf.train.Server({'ps': ps_hosts, 'worker': worker_hosts},
52 | job_name = FLAGS.job_name,
53 | task_index=FLAGS.task_id)
54 |
55 | if FLAGS.job_name == 'ps':
56 | server.join()
57 |
58 | is_chief = (FLAGS.task_id == 0)
59 | if is_chief:
60 | if tf.gfile.Exists(FLAGS.train_dir):
61 | tf.gfile.DeleteRecursively(FLAGS.train_dir)
62 | tf.gfile.MakeDirs(FLAGS.train_dir)
63 |
64 |
65 | device_setter = tf.train.replica_device_setter(ps_tasks=1)
66 | with tf.device('/job:worker/task:%d' % FLAGS.task_id):
67 | with tf.device(device_setter):
68 | global_step = tf.Variable(0, trainable=False)
69 |
70 | # Get images and labels for CIFAR-10.
71 | images, labels = cifar10.distorted_inputs()
72 |
73 | # Build a Graph that computes the logits predictions from the
74 | # inference model.
75 | logits = cifar10.inference(images)
76 |
77 | # Calculate loss.
78 | loss = cifar10.loss(logits, labels)
79 |
80 | # Need to re-implement the training part for sync updates.
81 | num_batches_per_epoch = NUM_EXAMPLES_PER_EPOCH_FOR_TRAIN / FLAGS.batch_size
82 | decay_steps = int(num_batches_per_epoch * NUM_EPOCHS_PER_DECAY)
83 |
84 | # Decay the learning rate exponentially based on the number of steps.
85 | lr = tf.train.exponential_decay(INITIAL_LEARNING_RATE,
86 | global_step,
87 | decay_steps,
88 | LEARNING_RATE_DECAY_FACTOR,
89 | staircase=True)
90 | tf.scalar_summary('learning_rate', lr)
91 | opt = tf.train.GradientDescentOptimizer(lr)
92 |
93 | # Track the moving averages of all trainable variables.
94 | exp_moving_averager = tf.train.ExponentialMovingAverage(MOVING_AVERAGE_DECAY, global_step)
95 | variables_to_average = (tf.trainable_variables() + tf.moving_average_variables())
96 |
97 | opt = tf.train.SyncReplicasOptimizer(
98 | opt,
99 | replicas_to_aggregate=len(worker_hosts),
100 | replica_id=FLAGS.task_id,
101 | total_num_replicas=len(worker_hosts),
102 | variable_averages=exp_moving_averager,
103 | variables_to_average=variables_to_average)
104 |
105 |
106 | # Compute gradients with respect to the loss.
107 | grads = opt.compute_gradients(loss)
108 |
109 | # Add histograms for gradients.
110 | for grad, var in grads:
111 | if grad is not None:
112 | tf.histogram_summary(var.op.name + '/gradients', grad)
113 |
114 | apply_gradients_op = opt.apply_gradients(grads, global_step=global_step)
115 |
116 | with tf.control_dependencies([apply_gradients_op]):
117 | train_op = tf.identity(loss, name='train_op')
118 |
119 |
120 | chief_queue_runners = [opt.get_chief_queue_runner()]
121 | init_tokens_op = opt.get_init_tokens_op()
122 |
123 | saver = tf.train.Saver()
124 | # Note: a summary_op is passed to the Supervisor below, so the chief starts a
125 | # summary thread alongside training. To run summaries in the training thread
126 | # instead (for example, to avoid running out of GPU memory), pass
127 | # summary_op=None and evaluate the summaries manually.
128 | sv = tf.train.Supervisor(is_chief=is_chief,
129 | logdir=FLAGS.train_dir,
130 | init_op=tf.initialize_all_variables(),
131 | summary_op=tf.merge_all_summaries(),
132 | global_step=global_step,
133 | saver=saver,
134 | save_model_secs=60)
135 |
136 | tf.logging.info('%s Supervisor' % datetime.now())
137 |
138 | sess_config = tf.ConfigProto(allow_soft_placement=True,
139 | log_device_placement=FLAGS.log_device_placement)
140 |
141 | print ("Before session init")
142 | # Get a session.
143 | sess = sv.prepare_or_wait_for_session(server.target, config=sess_config)
144 | print ("Session init done")
145 |
146 | # Start the queue runners.
147 | queue_runners = tf.get_collection(tf.GraphKeys.QUEUE_RUNNERS)
148 | sv.start_queue_runners(sess, queue_runners)
149 | print ('Started %d queues for processing input data.' % len(queue_runners))
150 |
151 | sv.start_queue_runners(sess, chief_queue_runners)
152 | sess.run(init_tokens_op)
153 |
154 | print ('Start training')
155 | # Train CIFAR-10 for a number of steps.
156 | for step in xrange(FLAGS.max_steps):
157 | start_time = time.time()
158 | _, loss_value, gs = sess.run([train_op, loss, global_step])
159 | duration = time.time() - start_time
160 |
161 | assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
162 |
163 | if step % 10 == 0:
164 | num_examples_per_step = FLAGS.batch_size
165 | examples_per_sec = num_examples_per_step / duration
166 | sec_per_batch = float(duration)
167 |
168 | format_str = ('%s: step %d (global_step %d), loss = %.2f (%.1f examples/sec; %.3f sec/batch)')
169 | print (format_str % (datetime.now(), step, gs, loss_value, examples_per_sec, sec_per_batch))
170 |
171 | if is_chief:
172 | saver.save(sess, os.path.join(FLAGS.train_dir, 'model.ckpt'),
173 | global_step=global_step)
174 |
175 | def main(argv=None):
176 | cifar10.maybe_download_and_extract()
177 | train()
178 |
179 | if __name__ == '__main__':
180 | tf.app.run()
181 |
--------------------------------------------------------------------------------
/cifar10/cifar10_train.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """A binary to train CIFAR-10 using a single GPU.
17 |
18 | Accuracy:
19 | cifar10_train.py achieves ~86% accuracy after 100K steps (256 epochs of
20 | data) as judged by cifar10_eval.py.
21 |
22 | Speed: With batch_size 128.
23 |
24 | System | Step Time (sec/batch) | Accuracy
25 | ------------------------------------------------------------------
26 | 1 Tesla K20m | 0.35-0.60 | ~86% at 60K steps (5 hours)
27 | 1 Tesla K40m | 0.25-0.35 | ~86% at 100K steps (4 hours)
28 |
29 | Usage:
30 | Please see the tutorial and website for how to download the CIFAR-10
31 | data set, compile the program and train the model.
32 |
33 | http://tensorflow.org/tutorials/deep_cnn/
34 | """
35 | from __future__ import absolute_import
36 | from __future__ import division
37 | from __future__ import print_function
38 |
39 | from datetime import datetime
40 | import os.path
41 | import time
42 |
43 | import numpy as np
44 | from six.moves import xrange # pylint: disable=redefined-builtin
45 | import tensorflow as tf
46 |
47 | from tensorflow.models.image.cifar10 import cifar10
48 |
49 | FLAGS = tf.app.flags.FLAGS
50 |
51 | tf.app.flags.DEFINE_string('train_dir', '/tmp/cifar10_train',
52 | """Directory where to write event logs """
53 | """and checkpoint.""")
54 | tf.app.flags.DEFINE_integer('max_steps', 1000000,
55 | """Number of batches to run.""")
56 | tf.app.flags.DEFINE_boolean('log_device_placement', False,
57 | """Whether to log device placement.""")
58 |
59 |
60 | def train():
61 | """Train CIFAR-10 for a number of steps."""
62 | with tf.Graph().as_default():
63 | global_step = tf.Variable(0, trainable=False)
64 |
65 | # Get images and labels for CIFAR-10.
66 | images, labels = cifar10.distorted_inputs()
67 |
68 | # Build a Graph that computes the logits predictions from the
69 | # inference model.
70 | logits = cifar10.inference(images)
71 |
72 | # Calculate loss.
73 | loss = cifar10.loss(logits, labels)
74 |
75 | # Build a Graph that trains the model with one batch of examples and
76 | # updates the model parameters.
77 | train_op = cifar10.train(loss, global_step)
78 |
79 | # Create a saver.
80 | saver = tf.train.Saver(tf.all_variables())
81 |
82 | # Build the summary operation based on the TF collection of Summaries.
83 | summary_op = tf.merge_all_summaries()
84 |
85 | # Build an initialization operation to run below.
86 | init = tf.initialize_all_variables()
87 |
88 | # Start running operations on the Graph.
89 | sess = tf.Session(config=tf.ConfigProto(
90 | log_device_placement=FLAGS.log_device_placement))
91 | sess.run(init)
92 |
93 | # Start the queue runners.
94 | tf.train.start_queue_runners(sess=sess)
95 |
96 | summary_writer = tf.train.SummaryWriter(FLAGS.train_dir, sess.graph)
97 |
98 | for step in xrange(FLAGS.max_steps):
99 | start_time = time.time()
100 | _, loss_value = sess.run([train_op, loss])
101 | duration = time.time() - start_time
102 |
103 | assert not np.isnan(loss_value), 'Model diverged with loss = NaN'
104 |
105 | if step % 10 == 0:
106 | num_examples_per_step = FLAGS.batch_size
107 | examples_per_sec = num_examples_per_step / duration
108 | sec_per_batch = float(duration)
109 |
110 | format_str = ('%s: step %d, loss = %.2f (%.1f examples/sec; %.3f '
111 | 'sec/batch)')
112 | print (format_str % (datetime.now(), step, loss_value,
113 | examples_per_sec, sec_per_batch))
114 |
115 | if step % 100 == 0:
116 | summary_str = sess.run(summary_op)
117 | summary_writer.add_summary(summary_str, step)
118 |
119 | # Save the model checkpoint periodically.
120 | if step % 1000 == 0 or (step + 1) == FLAGS.max_steps:
121 | checkpoint_path = os.path.join(FLAGS.train_dir, 'model.ckpt')
122 | saver.save(sess, checkpoint_path, global_step=step)
123 |
124 |
125 | def main(argv=None): # pylint: disable=unused-argument
126 | cifar10.maybe_download_and_extract()
127 | if tf.gfile.Exists(FLAGS.train_dir):
128 | tf.gfile.DeleteRecursively(FLAGS.train_dir)
129 | tf.gfile.MakeDirs(FLAGS.train_dir)
130 | train()
131 |
132 |
133 | if __name__ == '__main__':
134 | tf.app.run()
135 |
--------------------------------------------------------------------------------
/data/README.md:
--------------------------------------------------------------------------------
1 | ### Data needed in the examples.
2 |
--------------------------------------------------------------------------------
/data/t10k-images-idx3-ubyte.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/data/t10k-images-idx3-ubyte.gz
--------------------------------------------------------------------------------
/data/t10k-labels-idx1-ubyte.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/data/t10k-labels-idx1-ubyte.gz
--------------------------------------------------------------------------------
/data/text8.zip:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/data/text8.zip
--------------------------------------------------------------------------------
/data/train-images-idx3-ubyte.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/data/train-images-idx3-ubyte.gz
--------------------------------------------------------------------------------
/data/train-labels-idx1-ubyte.gz:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/data/train-labels-idx1-ubyte.gz
--------------------------------------------------------------------------------
/distributed/README.md:
--------------------------------------------------------------------------------
1 | ## Distributed TensorFlow examples.
2 | This directory includes a YAML generator and several distributed TensorFlow examples.
3 |
4 | #### Create yaml to start TensorFlow servers
5 | This [script](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/create_tf_server_yaml.py) generates the YAML file needed to create a distributed TensorFlow cluster. Use ```--num_workers``` to specify the number of workers and ```--num_parameter_servers``` to specify the number of parameter servers.
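
As a rough sketch of what the generated cluster looks like, the snippet below lists the host:port endpoints implied by the two flags. It assumes the `tf-worker<N>` and `tf-ps<N>` service names and the default gRPC port 2222 used by the templates in `create_tf_server_yaml.py`. The `cluster_hosts` helper itself is hypothetical and is not part of the script.

```python
def cluster_hosts(num_workers, num_parameter_servers, port=2222):
    """Return the endpoints for each job as a dict of host:port lists.

    Hypothetical helper for illustration; the names follow the tf-worker<N>
    and tf-ps<N> service naming used in the generated YAML.
    """
    if num_workers <= 0 or num_parameter_servers <= 0:
        raise ValueError('both counts must be greater than 0')
    return {
        'worker': ['tf-worker%d:%d' % (i, port) for i in range(num_workers)],
        'ps': ['tf-ps%d:%d' % (i, port) for i in range(num_parameter_servers)],
    }


hosts = cluster_hosts(num_workers=2, num_parameter_servers=1)
print(hosts['worker'])  # ['tf-worker0:2222', 'tf-worker1:2222']
print(hosts['ps'])      # ['tf-ps0:2222']
```

A dict of this shape matches the cluster definition that `cifar10_sync_dist_train.py` passes to `tf.train.Server`.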
6 |
7 |
8 | #### Start examples using the TensorFlow servers
9 | This [script](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/start_tf.sh) runs any of the following examples on distributed TensorFlow. Example command:
10 | ```
11 | ./start_tf.sh 8 3 mnist_cnn.py
12 | ```
13 | The first parameter gives the number of workers. This can be equal to or smaller than the number of workers specified when creating the cluster.
14 |
15 | The second parameter gives the number of parameter servers. This must be the same as num_parameter_servers specified when creating the TensorFlow cluster.
16 |
17 | The third parameter gives the code to be run.
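
The constraints on the first two parameters can be written as a small check. This is only a sketch: `check_launch_args` and its argument names are hypothetical and do not appear in `start_tf.sh`, and the lower bound of one worker is an assumption. It simply encodes the two rules stated above.

```python
def check_launch_args(num_workers, num_ps, cluster_workers, cluster_ps):
    """Validate start_tf.sh-style arguments against the cluster's size.

    Hypothetical helper: cluster_workers and cluster_ps are the values that
    were used when the TensorFlow cluster was created.
    """
    # Assumption: at least one worker is needed to run anything.
    if not 1 <= num_workers <= cluster_workers:
        raise ValueError('number of workers must be between 1 and %d, got %d'
                         % (cluster_workers, num_workers))
    # The parameter server count must match the cluster exactly.
    if num_ps != cluster_ps:
        raise ValueError('number of parameter servers must be %d, got %d'
                         % (cluster_ps, num_ps))
    return True


# Mirrors "./start_tf.sh 8 3 mnist_cnn.py" on a cluster created with
# 8 workers and 3 parameter servers.
print(check_launch_args(8, 3, cluster_workers=8, cluster_ps=3))  # True
```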
18 |
19 | #### MNIST examples
20 | - DNN example ([code](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/mnist_dnn.py))
21 | - DNN example with Sync updates ([code](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/mnist_dnn_sync_update.py))
22 | - CNN example ([code](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/mnist_cnn.py))
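
The difference between the plain DNN example and the variant with sync updates can be illustrated with a toy calculation. This is a simplified sketch in plain Python, not the code in `mnist_dnn_sync_update.py`: the gradient values are made up, and real asynchronous training would recompute each gradient from the current (possibly stale) weights.

```python
def async_updates(weight, gradients, learning_rate):
    """Apply each replica's gradient as soon as it arrives."""
    for grad in gradients:
        weight -= learning_rate * grad
    return weight


def sync_update(weight, gradients, learning_rate):
    """Average the gradients from all replicas, then apply one update."""
    mean_grad = sum(gradients) / len(gradients)
    return weight - learning_rate * mean_grad


replica_grads = [1.0, 2.0, 3.0]  # made-up gradients from three workers
print(async_updates(10.0, replica_grads, 0.5))  # 7.0
print(sync_update(10.0, replica_grads, 0.5))    # 9.0
```

With synchronous updates the global step advances once per aggregated batch of gradients, whereas asynchronous updates advance it once per worker step.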
23 |
24 | #### Word to Vector example
25 | - Word to Vector example ([code](https://github.com/caicloud/tensorflow-demo/blob/master/distributed/word2vector.py))
26 |
27 |
--------------------------------------------------------------------------------
/distributed/create_tf_server_yaml.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved.
3 | #
4 | # Licensed under the Apache License, Version 2.0 (the "License");
5 | # you may not use this file except in compliance with the License.
6 | # You may obtain a copy of the License at
7 | #
8 | # http://www.apache.org/licenses/LICENSE-2.0
9 | #
10 | # Unless required by applicable law or agreed to in writing, software
11 | # distributed under the License is distributed on an "AS IS" BASIS,
12 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
13 | # See the License for the specific language governing permissions and
14 | # limitations under the License.
15 | # ==============================================================================
16 |
17 | """Generates YAML configuration files for distributed TensorFlow workers.
18 | The workers will be run in a Kubernetes (k8s) container cluster.
19 | """
20 | from __future__ import absolute_import
21 | from __future__ import division
22 | from __future__ import print_function
23 |
24 | import argparse
25 | import sys
26 |
27 | # Note: It is intentional that we do not import tensorflow in this script. The
28 | # machine that launches a TensorFlow k8s cluster does not have to have the
29 | # Python package of TensorFlow installed on it.
30 |
31 |
32 | DEFAULT_DOCKER_IMAGE = 'tensorflow/tf_grpc_test_server'
33 | DEFAULT_PORT = 2222
34 |
35 | # TODO(cais): Consider adding resource requests/limits to the pods.
36 | WORKER_RC = (
37 | """apiVersion: v1
38 | kind: ReplicationController
39 | metadata:
40 | name: tf-worker{worker_id}
41 | spec:
42 | replicas: 1
43 | template:
44 | metadata:
45 | labels:
46 | tf-worker: "{worker_id}"
47 | spec:
48 | containers:
49 | - name: tf-worker{worker_id}
50 | image: {docker_image}
51 | args:
52 | - --cluster_spec={cluster_spec}
53 | - --job_name=worker
54 | - --task_id={worker_id}
55 | ports:
56 | - containerPort: {port}
57 | """)
58 | WORKER_SVC = (
59 | """apiVersion: v1
60 | kind: Service
61 | metadata:
62 | name: tf-worker{worker_id}
63 | labels:
64 | tf-worker: "{worker_id}"
65 | spec:
66 | ports:
67 | - port: {port}
68 | targetPort: {port}
69 | selector:
70 | tf-worker: "{worker_id}"
71 | """)
72 | WORKER_LB_SVC = (
73 | """apiVersion: v1
74 | kind: Service
75 | metadata:
76 | name: tf-worker{worker_id}
77 | labels:
78 | tf-worker: "{worker_id}"
79 | spec:
80 | type: LoadBalancer
81 | ports:
82 | - port: {port}
83 | selector:
84 | tf-worker: "{worker_id}"
85 | """)
86 | PARAM_SERVER_RC = (
87 | """apiVersion: v1
88 | kind: ReplicationController
89 | metadata:
90 | name: tf-ps{param_server_id}
91 | spec:
92 | replicas: 1
93 | template:
94 | metadata:
95 | labels:
96 | tf-ps: "{param_server_id}"
97 | spec:
98 | containers:
99 | - name: tf-ps{param_server_id}
100 | image: {docker_image}
101 | args:
102 | - --cluster_spec={cluster_spec}
103 | - --job_name=ps
104 | - --task_id={param_server_id}
105 | ports:
106 | - containerPort: {port}
107 | """)
108 | PARAM_SERVER_SVC = (
109 | """apiVersion: v1
110 | kind: Service
111 | metadata:
112 | name: tf-ps{param_server_id}
113 | labels:
114 | tf-ps: "{param_server_id}"
115 | spec:
116 | ports:
117 | - port: {port}
118 | selector:
119 | tf-ps: "{param_server_id}"
120 | """)
121 |
122 |
123 | def main():
124 | """Do arg parsing."""
125 | parser = argparse.ArgumentParser()
126 | parser.add_argument('--num_workers',
127 | type=int,
128 | default=2,
129 | help='How many worker pods to run')
130 | parser.add_argument('--num_parameter_servers',
131 | type=int,
132 | default=1,
133 | help='How many parameter server pods to run')
134 | parser.add_argument('--grpc_port',
135 | type=int,
136 | default=DEFAULT_PORT,
137 | help='GRPC server port (Default: %d)' % DEFAULT_PORT)
138 | parser.add_argument('--request_load_balancer',
139 | type=lambda s: s.lower() in ('true', '1', 'yes'),
140 | default=False,
141 | help='To request worker0 to be exposed on a public IP '
142 | 'address via an external load balancer, enabling you to '
143 | 'run client processes from outside the cluster')
144 | parser.add_argument('--docker_image',
145 | type=str,
146 | default=DEFAULT_DOCKER_IMAGE,
147 | help='Override default docker image for the TensorFlow '
148 | 'GRPC server')
149 | args = parser.parse_args()
150 |
151 | if args.num_workers <= 0:
152 | sys.stderr.write('--num_workers must be greater than 0; received %d\n'
153 | % args.num_workers)
154 | sys.exit(1)
155 | if args.num_parameter_servers <= 0:
156 | sys.stderr.write(
157 | '--num_parameter_servers must be greater than 0; received %d\n'
158 | % args.num_parameter_servers)
159 | sys.exit(1)
160 |
161 | # Generate contents of yaml config
162 | yaml_config = GenerateConfig(args.num_workers,
163 | args.num_parameter_servers,
164 | args.grpc_port,
165 | args.request_load_balancer,
166 | args.docker_image)
167 | print(yaml_config) # pylint: disable=superfluous-parens
168 |
169 |
170 | def GenerateConfig(num_workers,
171 | num_param_servers,
172 | port,
173 | request_load_balancer,
174 | docker_image):
175 | """Generate configuration strings."""
176 | config = ''
177 | for worker in range(num_workers):
178 | config += WORKER_RC.format(
179 | port=port,
180 | worker_id=worker,
181 | docker_image=docker_image,
182 | cluster_spec=WorkerClusterSpecString(num_workers,
183 | num_param_servers,
184 | port))
185 | config += '---\n'
186 | if request_load_balancer:
187 | config += WORKER_LB_SVC.format(port=port,
188 | worker_id=worker)
189 | else:
190 | config += WORKER_SVC.format(port=port,
191 | worker_id=worker)
192 | config += '---\n'
193 |
194 | for param_server in range(num_param_servers):
195 | config += PARAM_SERVER_RC.format(
196 | port=port,
197 | param_server_id=param_server,
198 | docker_image=docker_image,
199 | cluster_spec=ParamServerClusterSpecString(num_workers,
200 | num_param_servers,
201 | port))
202 | config += '---\n'
203 | config += PARAM_SERVER_SVC.format(port=port,
204 | param_server_id=param_server)
205 | config += '---\n'
206 |
207 | return config
208 |
209 |
210 | def WorkerClusterSpecString(num_workers,
211 | num_param_servers,
212 | port):
213 | """Generates worker cluster spec."""
214 | return ClusterSpecString(num_workers, num_param_servers, port)
215 |
216 |
217 | def ParamServerClusterSpecString(num_workers,
218 | num_param_servers,
219 | port):
220 | """Generates parameter server spec."""
221 | return ClusterSpecString(num_workers, num_param_servers, port)
222 |
223 |
224 | def ClusterSpecString(num_workers,
225 | num_param_servers,
226 | port):
227 | """Generates general cluster spec."""
228 | spec = 'worker|'
229 | for worker in range(num_workers):
230 | spec += 'tf-worker%d:%d' % (worker, port)
231 | if worker != num_workers-1:
232 | spec += ';'
233 |
234 | spec += ',ps|'
235 | for param_server in range(num_param_servers):
236 | spec += 'tf-ps%d:%d' % (param_server, port)
237 | if param_server != num_param_servers-1:
238 | spec += ';'
239 |
240 | return spec
241 |
242 |
243 | if __name__ == '__main__':
244 | main()
245 |
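The cluster spec format built by `ClusterSpecString` above (jobs separated by `,`, job name and tasks by `|`, tasks by `;`) can be sketched more compactly with `str.join`. This is a minimal illustration rather than the script's own code: the function name `cluster_spec_string` is hypothetical, and port 2222 is assumed only because the launcher script in this repository uses it.

```python
def cluster_spec_string(num_workers, num_param_servers, port):
    """Build 'worker|host:port;...,ps|host:port;...' (hypothetical helper)."""
    # One 'tf-worker<i>:<port>' entry per worker, separated by ';'.
    workers = ';'.join('tf-worker%d:%d' % (i, port) for i in range(num_workers))
    # One 'tf-ps<i>:<port>' entry per parameter server, separated by ';'.
    param_servers = ';'.join('tf-ps%d:%d' % (i, port)
                             for i in range(num_param_servers))
    # Jobs are separated by ','; the job name and its task list by '|'.
    return 'worker|%s,ps|%s' % (workers, param_servers)


if __name__ == '__main__':
    # Prints: worker|tf-worker0:2222;tf-worker1:2222,ps|tf-ps0:2222
    print(cluster_spec_string(2, 1, 2222))
```

Using `join` avoids the explicit "is this the last element" checks, at the cost of building the task lists up front.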
--------------------------------------------------------------------------------
/distributed/mnist_cnn.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 | import datetime
4 |
5 | import numpy
6 | import tensorflow as tf
7 | from tensorflow.examples.tutorials.mnist import input_data
8 |
9 | flags = tf.app.flags
10 | flags.DEFINE_integer("worker_index", 0,
11 | "Worker task index, should be >= 0. worker_index=0 is "
12 | "the master worker task that performs the variable "
13 | "initialization ")
14 |
15 | flags.DEFINE_string("workers", None,
16 | "The worker url list, separated by comma (e.g. tf-worker1:2222,1.2.3.4:2222)")
17 |
18 | flags.DEFINE_string("parameter_servers", None,
19 | "The ps url list, separated by comma (e.g. tf-ps2:2222,1.2.3.5:2222)")
20 |
21 | flags.DEFINE_string("worker_grpc_url", None,
22 | "Worker GRPC URL (e.g., grpc://1.2.3.4:2222, or "
23 | "grpc://tf-worker0:2222)")
24 |
25 | flags.DEFINE_string("name_scope", None, "The variable name scope.")
26 | FLAGS = flags.FLAGS
27 |
28 | TRAING_STEP = 5000
29 | BATCH_SIZE = 64
30 | EVAL_SIZE = 50
31 | IMAGE_SIZE = 28
32 | NUM_CHANNELS = 1
33 | NUM_LABELS = 10
34 |
35 | print("Loading data from worker index = %d" % FLAGS.worker_index)
36 | mnist = input_data.read_data_sets("/tmp/data", one_hot=True)
37 | print("Testing set size: %d" % len(mnist.test.images))
38 | print("Training set size: %d" % len(mnist.train.images))
39 |
40 | print("Worker GRPC URL: %s" % FLAGS.worker_grpc_url)
41 | print("Workers = %s" % FLAGS.workers)
42 | print("Using name scope %s" % FLAGS.name_scope)
43 |
44 | is_chief = (FLAGS.worker_index == 0)
45 | if is_chief: tf.reset_default_graph()
46 |
47 | cluster = tf.train.ClusterSpec({"ps": FLAGS.parameter_servers.split(","), "worker": FLAGS.workers.split(",")})
48 | # Construct device setter object
49 | device_setter = tf.train.replica_device_setter(cluster=cluster)
50 |
51 | # The device setter will automatically place Variable ops on separate
52 | # parameter servers (ps). The non-Variable ops will be placed on the workers.
53 | with tf.device(device_setter):
54 | with tf.name_scope(FLAGS.name_scope):
55 | global_step = tf.Variable(0, trainable=False)
56 |
57 | # The variables below hold all the trainable weights.
58 | # Convolutional layers.
59 | conv1_weights = tf.Variable(tf.truncated_normal([5, 5, NUM_CHANNELS, 32], stddev=0.1, seed = 2))
60 | conv1_biases = tf.Variable(tf.zeros([32]))
61 |
62 | conv2_weights = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1, seed = 2))
63 | conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]))
64 |
65 | # fully connected, depth 512.
66 | fc1_weights = tf.Variable(tf.truncated_normal([IMAGE_SIZE // 4 * IMAGE_SIZE // 4 * 64, 512], stddev=0.1, seed=2))
67 | fc1_biases = tf.Variable(tf.constant(0.1, shape=[512]))
68 |
69 | fc2_weights = tf.Variable(tf.truncated_normal([512, NUM_LABELS], stddev=0.1, seed=2))
70 | fc2_biases = tf.Variable(tf.constant(0.1, shape=[NUM_LABELS]))
71 |
72 | x = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))
73 | y_ = tf.placeholder(tf.float32, shape=(None, NUM_LABELS))
74 |
75 | def model(data, train=False):
76 | """The Model definition."""
77 | # 2D convolution, with 'SAME' padding (i.e. the output feature map has
78 | # the same size as the input). Note that {strides} is a 4D array whose
79 | # shape matches the data layout: [image index, y, x, depth].
80 | conv = tf.nn.conv2d(data, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')
81 | # Bias and rectified linear non-linearity.
82 | relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))
83 | # Max pooling. The kernel size spec {ksize} also follows the layout of
84 | # the data. Here we have a pooling window of 2, and a stride of 2.
85 | pool = tf.nn.max_pool(relu, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
86 |
87 | conv1 = tf.nn.conv2d(pool, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')
88 | relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv2_biases))
89 | pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')
90 |
91 | # Reshape the feature map into a 2D matrix to feed it to the fully connected layers.
92 | pool_shape = pool1.get_shape().as_list()
93 | reshape = tf.reshape(pool1, [-1, pool_shape[1] * pool_shape[2] * pool_shape[3]])
94 |
95 | # Fully connected layer. Note that the '+' operation automatically broadcasts the biases.
96 | hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)
97 | # Add a 50% dropout during training only. Dropout also scales
98 | # activations such that no rescaling is needed at evaluation time.
99 | if train: hidden = tf.nn.dropout(hidden, 0.5)
100 | return tf.nn.softmax(tf.matmul(hidden, fc2_weights) + fc2_biases)
101 |
102 | train_y = model(x, True)
103 | loss = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(train_y, 1e-10, 1.0)))  # clip to avoid log(0)
104 | regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) +
105 | tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))
106 | loss += 5e-4 * regularizers
107 |
108 | # Decay once per epoch, using an exponential schedule starting at 0.01.
109 | learning_rate = tf.train.exponential_decay(
110 | 0.01, # Base learning rate.
111 | global_step * BATCH_SIZE, # Current index into the dataset.
112 | mnist.train.num_examples, # Decay step.
113 | 0.95, # Decay rate.
114 | staircase=True)
115 |
116 | # Use simple momentum for the optimization.
117 | optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(loss, global_step=global_step)
118 |
119 | # Training accuracy
120 | train_correct_prediction = tf.equal(tf.argmax(y_, 1), tf.argmax(train_y, 1))
121 | train_accuracy = tf.reduce_mean(tf.cast(train_correct_prediction, tf.float32))
122 |
123 | # Predictions for the test and validation, which we'll compute less often.
124 | eval_y = model(x, False)
125 | eval_correct_prediction = tf.reduce_sum(tf.cast(tf.equal(tf.argmax(y_, 1), tf.argmax(eval_y, 1)), tf.float32))
126 |
127 | reshaped_test_data = numpy.reshape(mnist.test.images, [-1, 28, 28, 1])
128 | test_label = mnist.test.labels
129 | reshaped_validate_data = numpy.reshape(mnist.validation.images, [-1, 28, 28, 1])
130 | validate_label = mnist.validation.labels
131 |
132 | sv = tf.train.Supervisor(is_chief=is_chief,
133 | logdir="/tmp/dist-mnist-log/train",
134 | saver=tf.train.Saver(),
135 | init_op=tf.initialize_all_variables(),
136 | recovery_wait_secs=1,
137 | global_step=global_step)
138 | sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True,
139 | device_filters=["/job:ps", "/job:worker/task:%d" % FLAGS.worker_index])
140 |
141 | # The chief worker (worker_index==0) session will prepare the session,
142 | # while the remaining workers will wait for the preparation to complete.
143 | if is_chief:
144 | print("Worker %d: Initializing session..." % FLAGS.worker_index)
145 | else:
146 | print("Worker %d: Waiting for session to be initialized..." % FLAGS.worker_index)
147 |
148 | with sv.prepare_or_wait_for_session(FLAGS.worker_grpc_url, config=sess_config) as sess:
149 | print("Worker %d: Session initialization complete." % FLAGS.worker_index)
150 |
151 | def get_eval(data_x, data_y):
152 | total_len = len(data_x)
153 | start = 0
154 | end = EVAL_SIZE
155 | total_correct = 0
156 | while start < total_len:  # loop on start so the final batch is evaluated too
157 | cur_correct, step = sess.run([eval_correct_prediction, global_step], feed_dict={x: data_x[start:end], y_:data_y[start:end]})
158 | total_correct += cur_correct
159 | start = end
160 | end += EVAL_SIZE
161 | if end > total_len: end = total_len
162 |
163 | return float(total_correct) / float(total_len)
164 |
165 | # Perform training
166 | time_begin = time.time()
167 | print("Training begins @ %f" % time_begin)
168 |
169 | local_step = 0
170 | while True:
171 | # Training feed
172 | batch_xs, batch_ys = mnist.train.next_batch(BATCH_SIZE)
173 | reshaped_x = numpy.reshape(batch_xs, [BATCH_SIZE, 28, 28, 1])
174 | train_feed = {x: reshaped_x, y_: batch_ys}
175 |
176 | _, step = sess.run([optimizer, global_step], feed_dict=train_feed)
177 | if local_step % 100 == 0:
178 | validate_acc = get_eval(reshaped_validate_data, validate_label)
179 | test_acc = get_eval(reshaped_test_data, test_label)
180 | print("Worker %d: After %d training step(s) (global step: %d), validation accuracy = %g, test accuracy = %g" %
181 | (FLAGS.worker_index, local_step, step, validate_acc, test_acc))
182 | if step >= TRAING_STEP: break
183 | local_step += 1
184 |
185 | time_end = time.time()
186 | print("Training ends @ %f" % time_end)
187 | training_time = time_end - time_begin
188 | print("Training elapsed time: %f s" % training_time)
189 |
190 | # Accuracy on test data
191 | test_acc = get_eval(reshaped_test_data, test_label)
192 | print("Final test accuracy = %g" % (test_acc))
193 |
194 |
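The "decay once per epoch" schedule configured above can be checked by hand. The sketch below reproduces the staircase arithmetic in plain Python; the function name `staircase_learning_rate` is hypothetical, and the epoch size of 55000 is an assumption about the training split returned by the tutorial MNIST loader.

```python
def staircase_learning_rate(global_step, batch_size=64, examples_per_epoch=55000,
                            base_rate=0.01, decay_rate=0.95):
    """Learning rate after `global_step` updates under staircase decay."""
    # Number of training examples consumed so far.
    examples_seen = global_step * batch_size
    # Staircase: only whole epochs count, so use integer division.
    completed_epochs = examples_seen // examples_per_epoch
    return base_rate * decay_rate ** completed_epochs


if __name__ == '__main__':
    for step in (0, 859, 860, 5000):
        print(step, staircase_learning_rate(step))
```

Under these assumptions the rate stays at 0.01 through step 859 (54976 examples seen) and drops to 0.0095 at step 860, when the first full epoch has been consumed.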
--------------------------------------------------------------------------------
/distributed/mnist_dnn.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 |
4 | import tensorflow as tf
5 | from tensorflow.examples.tutorials.mnist import input_data
6 |
7 | flags = tf.app.flags
8 | flags.DEFINE_integer("worker_index", 0,
9 | "Worker task index, should be >= 0. worker_index=0 is "
10 | "the master worker task that performs the variable "
11 | "initialization ")
12 |
13 | flags.DEFINE_string("workers", None,
14 | "The worker url list, separated by comma (e.g. tf-worker1:2222,1.2.3.4:2222)")
15 |
16 | flags.DEFINE_string("parameter_servers", None,
17 | "The ps url list, separated by comma (e.g. tf-ps2:2222,1.2.3.5:2222)")
18 |
19 | flags.DEFINE_string("worker_grpc_url", None,
20 | "Worker GRPC URL (e.g., grpc://1.2.3.4:2222, or "
21 | "grpc://tf-worker0:2222)")
22 |
23 | flags.DEFINE_string("name_scope", None, "The variable name scope.")
24 | FLAGS = flags.FLAGS
25 |
26 | TRAING_STEP = 5000
27 | hidden_nodes = 500
28 |
29 | def nn_layer(input_tensor, input_dim, output_dim, act=tf.nn.relu):
30 | with tf.name_scope(FLAGS.name_scope):
31 | weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1, seed = 2))
32 | biases = tf.Variable(tf.constant(0.1, shape=[output_dim]))
33 | activations = act(tf.matmul(input_tensor, weights) + biases)
34 | return activations
35 |
36 | def model(x, y_, global_step):
37 | hidden1 = nn_layer(x, 784, hidden_nodes)
38 | y = nn_layer(hidden1, hidden_nodes, 10, act=tf.nn.softmax)
39 |
40 | cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
41 | train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy, global_step=global_step)
42 |
43 | correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
44 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
45 |
46 | return train_step, accuracy
47 |
48 | print("Loading data from worker index = %d" % FLAGS.worker_index)
49 | mnist = input_data.read_data_sets("/tmp/data", one_hot=True)
50 | print("Testing set size: %d" % len(mnist.test.images))
51 | print("Training set size: %d" % len(mnist.train.images))
52 |
53 | print("Worker GRPC URL: %s" % FLAGS.worker_grpc_url)
54 | print("Workers = %s" % FLAGS.workers)
55 | print("Using name scope %s" % FLAGS.name_scope)
56 |
57 | is_chief = (FLAGS.worker_index == 0)
58 | cluster = tf.train.ClusterSpec({"ps": FLAGS.parameter_servers.split(","), "worker": FLAGS.workers.split(",")})
59 | # Construct device setter object
60 | device_setter = tf.train.replica_device_setter(cluster=cluster)
61 |
62 | # The device setter will automatically place Variable ops on separate
63 | # parameter servers (ps). The non-Variable ops will be placed on the workers.
64 | with tf.device(device_setter):
65 | with tf.name_scope(FLAGS.name_scope):
66 | global_step = tf.Variable(0, trainable=False)
67 |
68 | x = tf.placeholder(tf.float32, [None, 784])
69 | y_ = tf.placeholder(tf.float32, [None, 10])
70 | val_feed = {x: mnist.test.images, y_: mnist.test.labels}
71 |
72 | train_step, accuracy = model(x, y_, global_step)
73 |
74 | sv = tf.train.Supervisor(is_chief=is_chief,
75 | logdir="/tmp/dist-mnist-log/train",
76 | saver=tf.train.Saver(),
77 | init_op=tf.initialize_all_variables(),
78 | recovery_wait_secs=1,
79 | global_step=global_step)
80 | sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True,
81 | device_filters=["/job:ps", "/job:worker/task:%d" % FLAGS.worker_index])
82 |
83 | # The chief worker (worker_index==0) session will prepare the session,
84 | # while the remaining workers will wait for the preparation to complete.
85 | if is_chief:
86 | print("Worker %d: Initializing session..." % FLAGS.worker_index)
87 | else:
88 | print("Worker %d: Waiting for session to be initialized..." % FLAGS.worker_index)
89 |
90 | with sv.prepare_or_wait_for_session(FLAGS.worker_grpc_url, config=sess_config) as sess:
91 | print("Worker %d: Session initialization complete." % FLAGS.worker_index)
92 |
93 | # Perform training
94 | time_begin = time.time()
95 | print("Training begins @ %f" % time_begin)
96 |
97 | local_step = 0
98 | while True:
99 | # Training feed
100 | batch_xs, batch_ys = mnist.train.next_batch(100)
101 | train_feed = {x: batch_xs, y_: batch_ys}
102 |
103 | _, step = sess.run([train_step, global_step], feed_dict=train_feed)
104 | if local_step % 100 == 0:
105 | print("Worker %d: training step %d done (global step: %d); Accuracy: %g" %
106 | (FLAGS.worker_index, local_step, step, sess.run(accuracy, feed_dict=val_feed)))
107 | if step >= TRAING_STEP: break
108 | local_step += 1
109 |
110 | time_end = time.time()
111 | print("Training ends @ %f" % time_end)
112 | training_time = time_end - time_begin
113 | print("Training elapsed time: %f s" % training_time)
114 |
115 | # Accuracy on test data
116 | print("Final test accuracy = %g" % (sess.run(accuracy, feed_dict=val_feed)))
117 |
118 |
--------------------------------------------------------------------------------
/distributed/mnist_dnn_sync_update.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import time
3 |
4 | import tensorflow as tf
5 | from tensorflow.examples.tutorials.mnist import input_data
6 |
7 | flags = tf.app.flags
8 | flags.DEFINE_integer("worker_index", 0,
9 | "Worker task index, should be >= 0. worker_index=0 is "
10 | "the master worker task that performs the variable "
11 | "initialization ")
12 |
13 | flags.DEFINE_string("workers", None,
14 | "The worker url list, separated by comma (e.g. tf-worker1:2222,1.2.3.4:2222)")
15 |
16 | flags.DEFINE_string("parameter_servers", None,
17 | "The ps url list, separated by comma (e.g. tf-ps2:2222,1.2.3.5:2222)")
18 |
19 | flags.DEFINE_string("worker_grpc_url", None,
20 | "Worker GRPC URL (e.g., grpc://1.2.3.4:2222, or "
21 | "grpc://tf-worker0:2222)")
22 |
23 | flags.DEFINE_string("name_scope", None, "The variable name scope.")
24 | FLAGS = flags.FLAGS
25 |
26 | TRAING_STEP = 5000
27 | hidden_nodes = 500
28 |
29 | def nn_layer(input_tensor, input_dim, output_dim, act=tf.nn.relu):
30 | with tf.name_scope(FLAGS.name_scope):
31 | weights = tf.Variable(tf.truncated_normal([input_dim, output_dim], stddev=0.1, seed = 2))
32 | biases = tf.Variable(tf.constant(0.1, shape=[output_dim]))
33 | activations = act(tf.matmul(input_tensor, weights) + biases)
34 | return activations
35 |
36 | def model(x, y_, global_step):
37 | hidden1 = nn_layer(x, 784, hidden_nodes)
38 | y = nn_layer(hidden1, hidden_nodes, 10, act=tf.nn.softmax)
39 |
40 | cross_entropy = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0)))
41 | num_workers = len(FLAGS.workers.split(","))
42 | opt = tf.train.SyncReplicasOptimizer(
43 | tf.train.AdamOptimizer(0.01),
44 | replicas_to_aggregate=num_workers,
45 | total_num_replicas=num_workers,
46 | replica_id=FLAGS.worker_index,
47 | name="mnist_sync_replicas")
48 | train_step = opt.minimize(cross_entropy, global_step=global_step)
49 |
50 | correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
51 | accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
52 |
53 | return opt, train_step, accuracy
54 |
55 | print("Loading data from worker index = %d" % FLAGS.worker_index)
56 | mnist = input_data.read_data_sets("/tmp/data", one_hot=True)
57 | print("Testing set size: %d" % len(mnist.test.images))
58 | print("Training set size: %d" % len(mnist.train.images))
59 |
60 | print("Worker GRPC URL: %s" % FLAGS.worker_grpc_url)
61 | print("Workers = %s" % FLAGS.workers)
62 | print("Using name scope %s" % FLAGS.name_scope)
63 |
64 | is_chief = (FLAGS.worker_index == 0)
65 | cluster = tf.train.ClusterSpec({"ps": FLAGS.parameter_servers.split(","), "worker": FLAGS.workers.split(",")})
66 | # Construct device setter object
67 | device_setter = tf.train.replica_device_setter(cluster=cluster)
68 |
69 | # The device setter will automatically place Variable ops on separate
70 | # parameter servers (ps). The non-Variable ops will be placed on the workers.
71 | with tf.device(device_setter):
72 | with tf.name_scope(FLAGS.name_scope):
73 | global_step = tf.Variable(0, trainable=False)
74 |
75 | x = tf.placeholder(tf.float32, [None, 784])
76 | y_ = tf.placeholder(tf.float32, [None, 10])
77 | val_feed = {x: mnist.test.images, y_: mnist.test.labels}
78 |
79 | opt, train_step, accuracy = model(x, y_, global_step)
80 |
81 | chief_queue_runner = opt.get_chief_queue_runner()
82 | init_tokens_op = opt.get_init_tokens_op()
83 |
84 | sv = tf.train.Supervisor(is_chief=is_chief,
85 | logdir="/tmp/dist-mnist-log/train",
86 | saver=tf.train.Saver(),
87 | init_op=tf.initialize_all_variables(),
88 | recovery_wait_secs=1,
89 | global_step=global_step)
90 | sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True,
91 | device_filters=["/job:ps", "/job:worker/task:%d" % FLAGS.worker_index])
92 |
93 | # The chief worker (worker_index==0) session will prepare the session,
94 | # while the remaining workers will wait for the preparation to complete.
95 | if is_chief:
96 | print("Worker %d: Initializing session..." % FLAGS.worker_index)
97 | else:
98 | print("Worker %d: Waiting for session to be initialized..." % FLAGS.worker_index)
99 |
100 | with sv.prepare_or_wait_for_session(FLAGS.worker_grpc_url, config=sess_config) as sess:
101 | print("Worker %d: Session initialization complete." % FLAGS.worker_index)
102 | print("Starting chief queue runner and running init_tokens_op")
103 | sv.start_queue_runners(sess, [chief_queue_runner])
104 | sess.run(init_tokens_op)
105 |
106 | # Perform training
107 | time_begin = time.time()
108 | print("Training begins @ %f" % time_begin)
109 |
110 | local_step = 0
111 | while True:
112 | # Training feed
113 | batch_xs, batch_ys = mnist.train.next_batch(100)
114 | train_feed = {x: batch_xs, y_: batch_ys}
115 |
116 | _, step = sess.run([train_step, global_step], feed_dict=train_feed)
117 | if local_step % 100 == 0:
118 | print("Worker %d: training step %d done (global step: %d); Accuracy: %g" %
119 | (FLAGS.worker_index, local_step, step, sess.run(accuracy, feed_dict=val_feed)))
120 | if step >= TRAING_STEP: break
121 | local_step += 1
122 |
123 | time_end = time.time()
124 | print("Training ends @ %f" % time_end)
125 | training_time = time_end - time_begin
126 | print("Training elapsed time: %f s" % training_time)
127 |
128 | # Accuracy on test data
129 | print("Final test accuracy = %g" % (sess.run(accuracy, feed_dict=val_feed)))
130 |
131 |
--------------------------------------------------------------------------------
/distributed/start_servers.py:
--------------------------------------------------------------------------------
1 | import tensorflow as tf
2 | import time
3 | c = tf.constant("Hello, distributed TensorFlow!")
4 | server = tf.train.Server.create_local_server()
5 | sess = tf.Session(server.target) # Create a session on the server.
6 | print(server.target)
7 | print(sess.run(c))
8 |
9 | time.sleep(10)
10 |
11 |
--------------------------------------------------------------------------------
/distributed/start_tf.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | NUM_WORKER=${1}
4 | NUM_PS=${2}
5 | CODE=${3}
6 |
7 | WORKER_GRPC_URLS="grpc://tf-worker0:2222"
8 | WORKER_URLS="tf-worker0:2222"
9 | IDX=1
10 | while true; do
11 | if [[ "${IDX}" == "${NUM_WORKER}" ]]; then
12 | break
13 | fi
14 |
15 | WORKER_GRPC_URLS="${WORKER_GRPC_URLS} grpc://tf-worker${IDX}:2222"
16 | WORKER_URLS="${WORKER_URLS},tf-worker${IDX}:2222"
17 | ((IDX++))
18 | done
19 |
20 | PS_URLS="tf-ps0:2222"
21 | IDX=1
22 | while true; do
23 | if [[ "${IDX}" == "${NUM_PS}" ]]; then
24 | break
25 | fi
26 |
27 | PS_URLS="${PS_URLS},tf-ps${IDX}:2222"
28 | ((IDX++))
29 | done
30 |
31 | echo "WORKERS = ${WORKER_URLS}"
32 | echo "PARAMETER_SERVERS = ${PS_URLS}"
33 | echo "Running ${CODE}"
34 | WKR_LOG_PREFIX="/tmp/worker"
35 | URLS=($WORKER_GRPC_URLS)
36 | CUR_BATCH=$(date "+%H%M%S%d%m%y")
37 |
38 | IDX=0
39 | ((NUM_WORKER--))
40 | while true; do
41 | if [[ "${IDX}" == "${NUM_WORKER}" ]]; then
42 | break
43 | fi
44 |
45 | WORKER_GRPC_URL="${URLS[IDX]}"
46 | python "${CODE}" \
47 | --worker_grpc_url="${WORKER_GRPC_URL}" \
48 | --worker_index=${IDX} \
49 | --workers=${WORKER_URLS} \
50 | --name_scope=${CUR_BATCH} \
51 | --parameter_servers=${PS_URLS} > "${WKR_LOG_PREFIX}${IDX}.log" &
52 | echo "Worker ${IDX}: "
53 | echo " GRPC URL: ${WORKER_GRPC_URL}"
54 | echo " log file: ${WKR_LOG_PREFIX}${IDX}.log"
55 |
56 | ((IDX++))
57 | done
58 |
59 | WORKER_GRPC_URL="${URLS[IDX]}"
60 | python "${CODE}" \
61 | --worker_grpc_url="${WORKER_GRPC_URL}" \
62 | --worker_index=${IDX} \
63 | --workers=${WORKER_URLS} \
64 | --name_scope=${CUR_BATCH} \
65 | --parameter_servers=${PS_URLS}
66 |
67 | echo "Done!"
68 |
--------------------------------------------------------------------------------
/distributed/word2vector.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import math
3 | import random
4 | import time
5 | import zipfile
6 |
7 | import numpy as np
8 | import tensorflow as tf
9 | from tensorflow.examples.tutorials.mnist import input_data
10 |
11 | flags = tf.app.flags
12 | flags.DEFINE_integer("worker_index", 0,
13 | "Worker task index, should be >= 0. worker_index=0 is "
14 | "the master worker task that performs the variable "
15 | "initialization ")
16 |
17 | flags.DEFINE_string("workers", None,
18 | "The worker url list, separated by comma (e.g. tf-worker1:2222,1.2.3.4:2222)")
19 |
20 | flags.DEFINE_string("parameter_servers", None,
21 | "The ps url list, separated by comma (e.g. tf-ps2:2222,1.2.3.5:2222)")
22 |
23 | flags.DEFINE_string("worker_grpc_url", None,
24 | "Worker GRPC URL (e.g., grpc://1.2.3.4:2222, or "
25 | "grpc://tf-worker0:2222)")
26 | FLAGS = flags.FLAGS
27 |
28 | vocabulary_size = 50000
29 | batch_size = 128
30 | valid_size = 8 # Random set of words to evaluate similarity on.
31 | valid_window = 100 # Only pick dev samples in the head of the distribution.
32 | embedding_size = 128 # Dimension of the embedding vector.
33 | num_sampled = 64 # Number of negative examples to sample.
34 | train_step = 500000
35 | skip_window = 1 # How many words to consider left and right.
36 | num_skips = 2 # How many times to reuse an input to generate a label.
37 |
38 |
39 | # Read data and split words.
40 | def read_data(filename):
41 | """Extract the first file enclosed in a zip file as a list of words"""
42 | with zipfile.ZipFile(filename) as f:
43 | data = f.read(f.namelist()[0]).split()
44 | return data
45 |
46 | # Build dictionary.
47 | def build_dataset(words):
48 | count = [['UNK', -1]]
49 | count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
50 | dictionary = dict()
51 | for word, _ in count:
52 | dictionary[word] = len(dictionary)
53 | data = list()
54 | unk_count = 0
55 | for word in words:
56 | if word in dictionary:
57 | index = dictionary[word]
58 | else:
59 | index = 0 # dictionary['UNK']
60 | unk_count += 1
61 | data.append(index)
62 | count[0][1] = unk_count
63 | reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
64 | return data, count, dictionary, reverse_dictionary
65 |
66 | data_index = 0
67 | # function to generate training batch
68 | def generate_batch(batch_size, num_skips, skip_window):
69 | global data_index
70 | assert batch_size % num_skips == 0
71 | assert num_skips <= 2 * skip_window
72 | batch = np.ndarray(shape=(batch_size), dtype=np.int32)
73 | labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
74 | span = 2 * skip_window + 1 # [ skip_window target skip_window ]
75 | buffer = collections.deque(maxlen=span)
76 | for _ in range(span):
77 | buffer.append(data[data_index])
78 | data_index = (data_index + 1) % len(data)
79 | for i in range(batch_size // num_skips):
80 | target = skip_window # target label at the center of the buffer
81 | targets_to_avoid = [ skip_window ]
82 | for j in range(num_skips):
83 | while target in targets_to_avoid:
84 | target = random.randint(0, span - 1)
85 | targets_to_avoid.append(target)
86 | batch[i * num_skips + j] = buffer[skip_window]
87 | labels[i * num_skips + j, 0] = buffer[target]
88 | buffer.append(data[data_index])
89 | data_index = (data_index + 1) % len(data)
90 | return batch, labels
91 |
92 | print("Loading data from worker index = %d" % FLAGS.worker_index)
93 | words = read_data("/tmp/data/text8.zip")
94 | print("Data size: %d" % len(words))
95 | data, count, dictionary, reverse_dictionary = build_dataset(words)
96 |
97 | print("Worker GRPC URL: %s" % FLAGS.worker_grpc_url)
98 | print("Workers = %s" % FLAGS.workers)
99 | is_chief = (FLAGS.worker_index == 0)
100 | if is_chief: tf.reset_default_graph()
101 |
102 | cluster = tf.train.ClusterSpec({"ps": FLAGS.parameter_servers.split(","), "worker": FLAGS.workers.split(",")})
103 | # Construct device setter object
104 | device_setter = tf.train.replica_device_setter(cluster=cluster)
105 |
106 | # The device setter will automatically place Variable ops on separate
107 | # parameter servers (ps). The non-Variable ops will be placed on the workers.
108 | with tf.device(device_setter):
109 | global_step = tf.Variable(0, trainable=False)
110 |
111 | # Training input data.
112 | train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
113 | train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
114 |
115 | # Validate input data.
116 | valid_examples = np.random.choice(valid_window, valid_size, replace=False)
117 | valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
118 |
119 | # Look up embeddings for inputs.
120 | embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
121 | embed = tf.nn.embedding_lookup(embeddings, train_inputs)
122 |
123 | # Construct the variables for the NCE loss
124 | nce_weights = tf.Variable(
125 | tf.truncated_normal([vocabulary_size, embedding_size],
126 | stddev=1.0 / math.sqrt(embedding_size)))
127 | nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
128 |
129 | # Compute the average NCE loss for the batch.
130 | # tf.nn.nce_loss automatically draws a new sample of the negative labels each
131 | # time we evaluate the loss.
132 | loss = tf.reduce_mean(tf.nn.nce_loss(
133 | nce_weights, nce_biases, embed, train_labels, num_sampled, vocabulary_size))
134 |
135 | # Construct the SGD optimizer using a learning rate of 1.0.
136 | optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss, global_step=global_step)
137 |
138 | # Compute the cosine similarity between minibatch examples and all embeddings.
139 | norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
140 | normalized_embeddings = embeddings / norm
141 | valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
142 | similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)
143 |
144 | sv = tf.train.Supervisor(is_chief=is_chief,
145 | logdir="/tmp/dist-w2v",
146 | saver=tf.train.Saver(),
147 | init_op=tf.initialize_all_variables(),
148 | recovery_wait_secs=1,
149 | global_step=global_step)
150 | sess_config = tf.ConfigProto(allow_soft_placement=True, log_device_placement=True,
151 | device_filters=["/job:ps", "/job:worker/task:%d" % FLAGS.worker_index])
152 |
153 | # The chief worker (worker_index==0) session will prepare the session,
154 | # while the remaining workers will wait for the preparation to complete.
155 | if is_chief:
156 | print("Worker %d: Initializing session..." % FLAGS.worker_index)
157 | else:
158 | print("Worker %d: Waiting for session to be initialized..." % FLAGS.worker_index)
159 |
160 | with sv.prepare_or_wait_for_session(FLAGS.worker_grpc_url, config=sess_config) as sess:
161 | print("Worker %d: Session initialization complete." % FLAGS.worker_index)
162 |
163 | # Output evaluation result
164 | def print_eval_result():
165 | sim = similarity.eval()
166 | for i in range(valid_size):
167 | valid_word = reverse_dictionary[valid_examples[i]]
168 | top_k = 8 # number of nearest neighbors
169 | nearest = (-sim[i, :]).argsort()[1:top_k + 1]
170 | log_str = "Nearest to %s:" % valid_word
171 | for k in range(top_k):
172 | close_word = reverse_dictionary[nearest[k]]
173 | log_str = "%s %s," % (log_str, close_word)
174 | print(log_str)
175 |
176 | # Perform training
177 | time_begin = time.time()
178 | average_loss = 0
179 | local_step = 0
180 | print("Training begins @ %f" % time_begin)
181 | while True:
182 | # Training feed
183 | batch_inputs, batch_labels = generate_batch(batch_size, num_skips, skip_window)
184 | feed_dict = {train_inputs : batch_inputs, train_labels : batch_labels}
185 |
186 | _, loss_val, step = sess.run([optimizer, loss, global_step], feed_dict=feed_dict)
187 | average_loss += loss_val
188 | if local_step % 5000 == 0:
189 | if local_step > 0: average_loss /= 5000
190 | print("Worker %d: Finished %d training steps (global step: %d): Average Loss: %g" % (FLAGS.worker_index, local_step, step, average_loss))
191 | average_loss = 0
192 |
193 | if local_step % 20000 == 0:
194 | print("Validate evaluation result at step ", local_step)
195 | print_eval_result()
196 |
197 | if step >= train_step: break
198 | local_step += 1
199 |
200 | time_end = time.time()
201 | print("Training ends @ %f" % time_end)
202 | training_time = time_end - time_begin
203 | print("Training elapsed time: %f s" % training_time)
204 |
205 | # Final word vectors.
206 | print("Final answer:")
207 | print_eval_result()
208 |
--------------------------------------------------------------------------------
/file_server.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | # -*- coding: utf-8 -*-
3 |
4 | """Simple HTTP Server With Upload.
5 | This module builds on BaseHTTPServer by implementing the standard GET
6 | and HEAD requests in a fairly straightforward manner.
7 | """
8 |
9 |
10 | __version__ = "0.1"
11 | __all__ = ["SimpleHTTPRequestHandler"]
12 | __author__ = "bones7456"
13 | __home_page__ = "http://li2z.cn/"
14 |
15 | import os
16 | import posixpath
17 | import BaseHTTPServer
18 | import urllib
19 | import cgi
20 | import shutil
21 | import mimetypes
22 | import re
23 | try:
24 | from cStringIO import StringIO
25 | except ImportError:
26 | from StringIO import StringIO
27 |
28 |
29 | class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
30 |
31 | """Simple HTTP request handler with GET/HEAD/POST commands.
32 | This serves files from the current directory and any of its
33 | subdirectories. The MIME type for files is determined by
34 | calling the .guess_type() method. It can also receive files
35 | uploaded by a client through POST.
36 | The GET and HEAD requests are identical except that the HEAD
37 | request omits the actual contents of the file.
38 | """
39 |
40 | server_version = "SimpleHTTPWithUpload/" + __version__
41 |
42 | def do_GET(self):
43 | """Serve a GET request."""
44 | f = self.send_head()
45 | if f:
46 | self.copyfile(f, self.wfile)
47 | f.close()
48 |
49 | def do_HEAD(self):
50 | """Serve a HEAD request."""
51 | f = self.send_head()
52 | if f:
53 | f.close()
54 |
55 | def do_POST(self):
56 | """Serve a POST request."""
57 | r, info = self.deal_post_data()
58 | print r, info, "by: ", self.client_address
59 | f = StringIO()
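60 | f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
61 | f.write("<html>\n<title>Upload Result Page</title>\n")
62 | f.write("<body>\n<h2>Upload Result Page</h2>\n")
63 | f.write("<hr>\n")
64 | if r:
65 | f.write("<strong>Success:</strong>")
66 | else:
67 | f.write("<strong>Failed:</strong>")
68 | f.write(info)
69 | f.write("<br><a href=\"%s\">back</a>" % self.headers['referer'])
70 | f.write("<hr><small>Powered by: bones7456, check new version at ")
71 | f.write("<a href=\"%s\">" % __home_page__)
72 | f.write("here</a>.</small></body>\n</html>\n")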
73 | length = f.tell()
74 | f.seek(0)
75 | self.send_response(200)
76 | self.send_header("Content-type", "text/html")
77 | self.send_header("Content-Length", str(length))
78 | self.end_headers()
79 | if f:
80 | self.copyfile(f, self.wfile)
81 | f.close()
82 |
83 | def deal_post_data(self):
84 | boundary = self.headers.plisttext.split("=")[1]
85 | remainbytes = int(self.headers['content-length'])
86 | line = self.rfile.readline()
87 | remainbytes -= len(line)
88 | if not boundary in line:
89 | return (False, "Content does not begin with boundary")
90 | line = self.rfile.readline()
91 | remainbytes -= len(line)
92 | fn = re.findall(r'Content-Disposition.*name="file"; filename="(.*)"', line)
93 | if not fn:
94 | return (False, "Can't find out file name...")
95 | path = self.translate_path(self.path)
96 | fn = os.path.join(path, fn[0])
97 | line = self.rfile.readline()
98 | remainbytes -= len(line)
99 | line = self.rfile.readline()
100 | remainbytes -= len(line)
101 | try:
102 | out = open(fn, 'wb')
103 | except IOError:
104 | return (False, "Can't create file to write, do you have permission to write?")
105 |
106 | preline = self.rfile.readline()
107 | remainbytes -= len(preline)
108 | while remainbytes > 0:
109 | line = self.rfile.readline()
110 | remainbytes -= len(line)
111 | if boundary in line:
112 | preline = preline[0:-1]
113 | if preline.endswith('\r'):
114 | preline = preline[0:-1]
115 | out.write(preline)
116 | out.close()
117 | return (True, "File '%s' upload success!" % fn)
118 | else:
119 | out.write(preline)
120 | preline = line
121 | return (False, "Unexpected end of data.")
122 |
123 | def send_head(self):
124 | """Common code for GET and HEAD commands.
125 | This sends the response code and MIME headers.
126 | Return value is either a file object (which has to be copied
127 | to the outputfile by the caller unless the command was HEAD,
128 | and must be closed by the caller under all circumstances), or
129 | None, in which case the caller has nothing further to do.
130 | """
131 | path = self.translate_path(self.path)
132 | f = None
133 | if os.path.isdir(path):
134 | if not self.path.endswith('/'):
135 | # redirect browser - doing basically what apache does
136 | self.send_response(301)
137 | self.send_header("Location", self.path + "/")
138 | self.end_headers()
139 | return None
140 | for index in "index.html", "index.htm":
141 | index = os.path.join(path, index)
142 | if os.path.exists(index):
143 | path = index
144 | break
145 | else:
146 | return self.list_directory(path)
147 | ctype = self.guess_type(path)
148 | try:
149 | # Always read in binary mode. Opening files in text mode may cause
150 | # newline translations, making the actual size of the content
151 | # transmitted *less* than the content-length!
152 | f = open(path, 'rb')
153 | except IOError:
154 | self.send_error(404, "File not found")
155 | return None
156 | self.send_response(200)
157 | self.send_header("Content-type", ctype)
158 | fs = os.fstat(f.fileno())
159 | self.send_header("Content-Length", str(fs[6]))
160 | self.send_header("Last-Modified", self.date_time_string(fs.st_mtime))
161 | self.end_headers()
162 | return f
163 |
164 | def list_directory(self, path):
165 | """Helper to produce a directory listing (absent index.html).
166 | Return value is either a file object, or None (indicating an
167 | error). In either case, the headers are sent, making the
168 | interface the same as for send_head().
169 | """
170 | try:
171 | list = os.listdir(path)
172 | except os.error:
173 | self.send_error(404, "No permission to list directory")
174 | return None
175 | print list
176 | list.sort(key=lambda a: a.lower())
177 | f = StringIO()
178 | displaypath = cgi.escape(urllib.unquote(self.path))
179 | f.write('<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">')
180 | f.write("<html>\n<title>Directory listing for %s</title>\n" % displaypath)
181 | f.write("<body>\n<h2>Directory listing for %s</h2>\n" % displaypath)
182 | f.write("<hr>\n")
183 | f.write("<form ENCTYPE=\"multipart/form-data\" method=\"post\">")
184 | f.write("<input name=\"file\" type=\"file\"/>")
185 | f.write("<input type=\"submit\" value=\"upload\"/></form>\n")
186 | f.write("<hr>\n<ul>\n")
187 | for name in list:
188 | fullname = os.path.join(path, name)
189 | displayname = linkname = name
190 | # Append / for directories or @ for symbolic links
191 | if os.path.isdir(fullname):
192 | displayname = name + "/"
193 | linkname = name + "/"
194 | if os.path.islink(fullname):
195 | displayname = name + "@"
196 | # Note: a link to a directory displays with @ and links with /
197 | f.write('<li><a href="%s">%s</a>\n'
198 | % (urllib.quote(linkname), cgi.escape(displayname)))
199 | f.write("</ul>\n<hr>\n</body>\n</html>\n")
200 | length = f.tell()
201 | f.seek(0)
202 | self.send_response(200)
203 | self.send_header("Content-type", "text/html")
204 | self.send_header("Content-Length", str(length))
205 | self.end_headers()
206 | return f
207 |
208 | def translate_path(self, path):
209 | """Translate a /-separated PATH to the local filename syntax.
210 | Components that mean special things to the local file system
211 | (e.g. drive or directory names) are ignored. (XXX They should
212 | probably be diagnosed.)
213 | """
214 | # abandon query parameters
215 | path = path.split('?',1)[0]
216 | path = path.split('#',1)[0]
217 | path = posixpath.normpath(urllib.unquote(path))
218 | words = path.split('/')
219 | words = filter(None, words)
220 | path = os.getcwd()
221 | for word in words:
222 | drive, word = os.path.splitdrive(word)
223 | head, word = os.path.split(word)
224 | if word in (os.curdir, os.pardir): continue
225 | path = os.path.join(path, word)
226 | return path
227 |
228 | def copyfile(self, source, outputfile):
229 | """Copy all data between two file objects.
230 | The SOURCE argument is a file object open for reading
231 | (or anything with a read() method) and the DESTINATION
232 | argument is a file object open for writing (or
233 | anything with a write() method).
234 | The only reason for overriding this would be to change
235 | the block size or perhaps to replace newlines by CRLF
236 | -- note however that the default server uses this
237 | to copy binary data as well.
238 | """
239 | shutil.copyfileobj(source, outputfile)
240 |
241 | def guess_type(self, path):
242 | """Guess the type of a file.
243 | Argument is a PATH (a filename).
244 | Return value is a string of the form type/subtype,
245 | usable for a MIME Content-type header.
246 | The default implementation looks the file's extension
247 | up in the table self.extensions_map, using application/octet-stream
248 | as a default; however it would be permissible (if
249 | slow) to look inside the data to make a better guess.
250 | """
251 |
252 | base, ext = posixpath.splitext(path)
253 | if ext in self.extensions_map:
254 | return self.extensions_map[ext]
255 | ext = ext.lower()
256 | if ext in self.extensions_map:
257 | return self.extensions_map[ext]
258 | else:
259 | return self.extensions_map['']
260 |
261 | if not mimetypes.inited:
262 | mimetypes.init() # try to read system mime.types
263 | extensions_map = mimetypes.types_map.copy()
264 | extensions_map.update({
265 | '': 'application/octet-stream', # Default
266 | '.py': 'text/plain',
267 | '.c': 'text/plain',
268 | '.h': 'text/plain',
269 | })
270 |
271 |
272 | def test(HandlerClass = SimpleHTTPRequestHandler,
273 | ServerClass = BaseHTTPServer.HTTPServer):
274 | BaseHTTPServer.test(HandlerClass, ServerClass)
275 |
276 | if __name__ == '__main__':
277 | test()
278 |
--------------------------------------------------------------------------------
/imagenet_serving/Dockerfile:
--------------------------------------------------------------------------------
1 | FROM ubuntu:14.04
2 |
3 | MAINTAINER Jeremiah Harmsen
4 |
5 | RUN apt-get update && apt-get install -y \
6 | build-essential \
7 | curl \
8 | git \
9 | libfreetype6-dev \
10 | libpng12-dev \
11 | libzmq3-dev \
12 | pkg-config \
13 | python-dev \
14 | python-numpy \
15 | python-pip \
16 | software-properties-common \
17 | swig \
18 | zip \
19 | zlib1g-dev \
20 | && \
21 | apt-get clean && \
22 | rm -rf /var/lib/apt/lists/*
23 |
24 | RUN curl -fSsL -O https://bootstrap.pypa.io/get-pip.py && \
25 | python get-pip.py && \
26 | rm get-pip.py
27 |
28 | # Set up grpc
29 |
30 | RUN pip install enum34 futures six && \
31 | pip install --pre "protobuf>=3.0.0a3" && \
32 | pip install -i https://testpypi.python.org/simple --pre grpcio
33 |
34 | # Set up Bazel.
35 |
36 | # We need to add a custom PPA to pick up JDK8, since trusty doesn't
37 | # have an openjdk8 backport. openjdk-r is maintained by a reliable contributor:
38 | # Matthias Klose (https://launchpad.net/~doko). It will do until
39 | # we either update the base image beyond 14.04 or openjdk-8 is
40 | # finally backported to trusty; see e.g.
41 | # https://bugs.launchpad.net/trusty-backports/+bug/1368094
42 | RUN add-apt-repository -y ppa:openjdk-r/ppa && \
43 | apt-get update && \
44 | apt-get install -y openjdk-8-jdk openjdk-8-jre-headless && \
45 | apt-get clean && \
46 | rm -rf /var/lib/apt/lists/*
47 |
48 | # Running bazel inside a `docker build` command causes trouble, cf:
49 | # https://github.com/bazelbuild/bazel/issues/134
50 | # The easiest solution is to set up a bazelrc file forcing --batch.
51 | RUN echo "startup --batch" >>/root/.bazelrc
52 | # Similarly, we need to work around sandboxing issues:
53 | # https://github.com/bazelbuild/bazel/issues/418
54 | RUN echo "build --spawn_strategy=standalone --genrule_strategy=standalone" \
55 | >>/root/.bazelrc
56 | ENV BAZELRC /root/.bazelrc
57 | # Install the Bazel release pinned by BAZEL_VERSION.
58 | ENV BAZEL_VERSION 0.2.0
59 | WORKDIR /
60 | RUN mkdir /bazel && \
61 | cd /bazel && \
62 | curl -fSsL -O https://github.com/bazelbuild/bazel/releases/download/$BAZEL_VERSION/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
63 | curl -fSsL -o /bazel/LICENSE.txt https://raw.githubusercontent.com/bazelbuild/bazel/master/LICENSE.txt && \
64 | chmod +x bazel-*.sh && \
65 | ./bazel-$BAZEL_VERSION-installer-linux-x86_64.sh && \
66 | cd / && \
67 | rm -f /bazel/bazel-$BAZEL_VERSION-installer-linux-x86_64.sh
68 |
69 | CMD ["/bin/bash"]
70 |
--------------------------------------------------------------------------------
/imagenet_serving/README.md:
--------------------------------------------------------------------------------
1 | ## [ImageNet](http://www.image-net.org/) Online Classification Server
2 | This example classifies new images with a pre-trained model. Caicloud provides a prebuilt image, ```index.caicloud.io/caicloud/inception_serving```. For details on how this image is built, see [this guide](https://tensorflow.github.io/serving/serving_inception).
3 |
4 | Once the image is available, use serving.json to start the service in Kubernetes:
5 | ```
6 | kubectl create -f serving.json
7 | ```
8 |
9 | After the service has been created in Kubernetes, run the following command to find its port:
10 | ```
11 | kubectl describe svc inception-service
12 | ```
13 |
14 | The output looks similar to this:
15 | ```
16 | Name: inception-service
17 | Namespace: default
18 | Labels:
19 | Selector: worker=inception-pod
20 | Type: NodePort
21 | IP: 10.254.121.195
22 | Port: 9000/TCP
23 | NodePort: 32668/TCP
24 | ```
25 | ```NodePort``` is the port number we need. We also need an IP address, which the commands below provide. First, list all nodes:
26 | ```
27 | kubectl get nodes
28 | ```
29 | The output looks similar to this:
30 | ```
31 | NAME STATUS AGE
32 | i-dh4t40ez Ready 19d
33 | i-jnr9dxhz Ready,SchedulingDisabled 19d
34 | i-tiga0i1q Ready 19d
35 | ```
36 | Pick any node and look up its IP address:
37 | ```
38 | kubectl describe node i-dh4t40ez
39 | ```
40 | The result looks similar to this:
41 | ```
42 | Name: i-dh4t40ez
43 | Labels: failure-domain.beta.kubernetes.io/region=ac1,failure-domain.beta.kubernetes.io/zone=ac1,kubernetes.io/hostname=i-dh4t40ez
44 | CreationTimestamp: Thu, 26 May 2016 07:10:59 +0800
45 | Phase:
46 | Conditions:
47 | Type Status LastHeartbeatTime LastTransitionTime Reason Message
48 | ---- ------ ----------------- ------------------ ------ -------
49 | OutOfDisk False Tue, 14 Jun 2016 10:53:07 +0800 Thu, 26 May 2016 07:10:59 +0800 KubeletHasSufficientDisk kubelet has sufficient disk space available
50 | Ready True Tue, 14 Jun 2016 10:53:07 +0800 Thu, 26 May 2016 07:10:59 +0800 KubeletReady kubelet is posting ready status
51 | Addresses: 180.101.191.78,10.244.1.0
52 | ```
53 | ```Addresses``` lists the external IP ```180.101.191.78```, so the image classifier is reachable at ```180.101.191.78:32668```.
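
The address can also be assembled in a script. The following is a minimal Python sketch and is not part of this repository: `parse_node_port` and `service_address` are hypothetical helper names, and the code assumes the `kubectl describe svc` output contains a `NodePort:` line in the form shown above.

```python
import re


def parse_node_port(describe_output):
    """Return the NodePort number found in `kubectl describe svc` text, or None."""
    # Assumption: the line starts with "NodePort:" and ends with "<port>/TCP".
    match = re.search(r"^NodePort:[^\n]*?(\d+)/TCP", describe_output, re.MULTILINE)
    return int(match.group(1)) if match else None


def service_address(node_ip, describe_output):
    """Combine a node IP with the service's NodePort into host:port form."""
    port = parse_node_port(describe_output)
    if port is None:
        raise ValueError("no NodePort line found in the service description")
    return "%s:%d" % (node_ip, port)


# Sample text copied from the output shown in this README.
sample = """Name: inception-service
Namespace: default
Type: NodePort
IP: 10.254.121.195
Port: 9000/TCP
NodePort: 32668/TCP
"""

print(service_address("180.101.191.78", sample))
```

With the sample output from this README, the sketch prints ```180.101.191.78:32668```.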
54 |
55 | ## Client
56 | #### Use the Docker image directly
57 | ```
58 | docker run -it -v "$PWD"/pic:/pic index.caicloud.io/caicloud/inception_serving
59 | cd serving
60 | ```
61 |
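62 | #### Build locally
63 | 1. Install the required tools by following the [documentation](https://tensorflow.github.io/serving/setup#prerequisites)
64 |
65 | 2. Download the code and build the client (the first build is slow and takes 2-4 hours):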
66 | ```
67 | git clone --recurse-submodules https://github.com/tensorflow/serving
68 | cd serving/tensorflow
69 | ./configure
70 | cd ..
71 | bazel build -c opt tensorflow_serving/...
72 | ```
73 |
74 | ## Classify an image with the client
75 | ```
76 | bazel-bin/tensorflow_serving/example/inception_client --server=180.101.191.78:32668 --image=/pic/02ea79e4aad9d6275da78a9170fa4e82.jpg
77 | ```
78 | Replace the ```server``` argument with the address of the server you started.
79 |
80 | The request may time out. If it does, adjust the timeout parameter:
81 | ```
82 | vi tensorflow_serving/example/inception_client.py
83 | ```
84 | Change the timeout on the following line:
85 | ```
86 | result = stub.Classify(request, 10.0) # 10 secs timeout
87 | ```
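
Instead of editing the constant by hand, the call could be retried with progressively longer timeouts. The sketch below is an illustration only: `call_with_timeouts` is a hypothetical helper, the timeout values are arbitrary, and it assumes the client call raises an exception when its deadline passes.

```python
def call_with_timeouts(call, timeouts=(10.0, 30.0, 60.0), retry_on=(Exception,)):
    """Run call(timeout) with each timeout in turn and return the first result.

    If every attempt fails, the last error is raised again.
    """
    last_error = None
    for timeout in timeouts:
        try:
            return call(timeout)
        except retry_on as error:
            # Remember the failure and try again with the next, longer timeout.
            last_error = error
    if last_error is None:
        raise ValueError("timeouts must not be empty")
    raise last_error


# Hypothetical usage inside inception_client.py, where `stub` and `request`
# are the objects already created by that script:
# result = call_with_timeouts(lambda t: stub.Classify(request, t))
```

Passing the timeout as an argument keeps the retry logic independent of the gRPC stub, so the same helper works for any call that takes a deadline in seconds.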
88 |
89 |
--------------------------------------------------------------------------------
/imagenet_serving/pic/02ea79e4aad9d6275da78a9170fa4e82.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/imagenet_serving/pic/02ea79e4aad9d6275da78a9170fa4e82.jpg
--------------------------------------------------------------------------------
/imagenet_serving/pic/07889356d62fa6517b0db6cf9dcf1f96.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/imagenet_serving/pic/07889356d62fa6517b0db6cf9dcf1f96.jpg
--------------------------------------------------------------------------------
/imagenet_serving/pic/516308313.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/imagenet_serving/pic/516308313.jpg
--------------------------------------------------------------------------------
/imagenet_serving/pic/7e7e745620d307aa2cb4afcdfa90d189.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/imagenet_serving/pic/7e7e745620d307aa2cb4afcdfa90d189.jpg
--------------------------------------------------------------------------------
/imagenet_serving/serving.json:
--------------------------------------------------------------------------------
1 | {
2 | "apiVersion": "v1",
3 | "kind": "ReplicationController",
4 | "metadata": {
5 | "name": "inception-controller"
6 | },
7 | "spec": {
8 | "replicas": 3,
9 | "selector": {
10 | "worker": "inception-pod"
11 | },
12 | "template": {
13 | "metadata": {
14 | "labels": {
15 | "worker": "inception-pod"
16 | }
17 | },
18 | "spec": {
19 | "containers": [
20 | {
21 | "name": "inception-container",
22 | "image": "index.caicloud.io/caicloud/inception_serving",
23 | "command": [
24 | "/bin/sh",
25 | "-c"
26 | ],
27 | "args": [
28 | "/serving/bazel-bin/tensorflow_serving/example/inception_inference --port=9000 /serving/inception-export"
29 | ],
30 | "ports": [
31 | {
32 | "containerPort": 9000
33 | }
34 | ]
35 | }
36 | ],
37 | "restartPolicy": "Always"
38 | }
39 | }
40 | }
41 | }
42 |
43 | {
44 | "kind": "Service",
45 | "apiVersion": "v1",
46 | "metadata": {
47 | "name": "inception-service"
48 | },
49 | "spec": {
50 | "ports": [
51 | {
52 | "port": 9000
53 | }
54 | ],
55 | "selector": {
56 | "worker": "inception-pod"
57 | },
58 | "type": "NodePort"
59 | }
60 | }
61 |
--------------------------------------------------------------------------------
/k8s_tf.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: ReplicationController
3 | metadata:
4 | name: tf-worker1
5 | spec:
6 | replicas: 1
7 | template:
8 | metadata:
9 | labels:
10 | tf-worker: "0"
11 | spec:
12 | containers:
13 | - name: tf-worker1
14 | image: index.caicloud.io/caicloud/tf_grpc_test_server
15 | args:
16 | - --cluster_spec=worker|tf-worker1:2222;tf-worker3:2222;tf-worker5:2222,ps|tf-ps2:2222;tf-ps4:2222
17 | - --job_name=worker
18 | - --task_id=0
19 | ports:
20 | - containerPort: 2222
21 | ---
22 | apiVersion: v1
23 | kind: Service
24 | metadata:
25 | name: tf-worker1
26 | labels:
27 | tf-worker: "0"
28 | spec:
29 | type: LoadBalancer
30 | ports:
31 | - port: 2222
32 | selector:
33 | tf-worker: "0"
34 | ---
35 | apiVersion: v1
36 | kind: ReplicationController
37 | metadata:
38 | name: tf-worker3
39 | spec:
40 | replicas: 1
41 | template:
42 | metadata:
43 | labels:
44 | tf-worker: "1"
45 | spec:
46 | containers:
47 | - name: tf-worker3
48 | image: index.caicloud.io/caicloud/tf_grpc_test_server
49 | args:
50 | - --cluster_spec=worker|tf-worker1:2222;tf-worker3:2222;tf-worker5:2222,ps|tf-ps2:2222;tf-ps4:2222
51 | - --job_name=worker
52 | - --task_id=1
53 | ports:
54 | - containerPort: 2222
55 | ---
56 | apiVersion: v1
57 | kind: Service
58 | metadata:
59 | name: tf-worker3
60 | labels:
61 | tf-worker: "1"
62 | spec:
63 | type: LoadBalancer
64 | ports:
65 | - port: 2222
66 | selector:
67 | tf-worker: "1"
68 | ---
69 | apiVersion: v1
70 | kind: ReplicationController
71 | metadata:
72 | name: tf-worker5
73 | spec:
74 | replicas: 1
75 | template:
76 | metadata:
77 | labels:
78 | tf-worker: "2"
79 | spec:
80 | containers:
81 | - name: tf-worker5
82 | image: index.caicloud.io/caicloud/tf_grpc_test_server
83 | args:
84 | - --cluster_spec=worker|tf-worker1:2222;tf-worker3:2222;tf-worker5:2222,ps|tf-ps2:2222;tf-ps4:2222
85 | - --job_name=worker
86 | - --task_id=2
87 | ports:
88 | - containerPort: 2222
89 | ---
90 | apiVersion: v1
91 | kind: Service
92 | metadata:
93 | name: tf-worker5
94 | labels:
95 | tf-worker: "2"
96 | spec:
97 | type: LoadBalancer
98 | ports:
99 | - port: 2222
100 | selector:
101 | tf-worker: "2"
102 | ---
103 | apiVersion: v1
104 | kind: ReplicationController
105 | metadata:
106 | name: tf-ps2
107 | spec:
108 | replicas: 1
109 | template:
110 | metadata:
111 | labels:
112 | tf-ps: "0"
113 | spec:
114 | containers:
115 | - name: tf-ps2
116 | image: index.caicloud.io/caicloud/tf_grpc_test_server
117 | args:
118 | - --cluster_spec=worker|tf-worker1:2222;tf-worker3:2222;tf-worker5:2222,ps|tf-ps2:2222;tf-ps4:2222
119 | - --job_name=ps
120 | - --task_id=0
121 | ports:
122 | - containerPort: 2222
123 | ---
124 | apiVersion: v1
125 | kind: Service
126 | metadata:
127 | name: tf-ps2
128 | labels:
129 | tf-ps: "0"
130 | spec:
131 | ports:
132 | - port: 2222
133 | selector:
134 | tf-ps: "0"
135 | ---
136 | apiVersion: v1
137 | kind: ReplicationController
138 | metadata:
139 | name: tf-ps4
140 | spec:
141 | replicas: 1
142 | template:
143 | metadata:
144 | labels:
145 | tf-ps: "1"
146 | spec:
147 | containers:
148 | - name: tf-ps4
149 | image: index.caicloud.io/caicloud/tf_grpc_test_server
150 | args:
151 | - --cluster_spec=worker|tf-worker1:2222;tf-worker3:2222;tf-worker5:2222,ps|tf-ps2:2222;tf-ps4:2222
152 | - --job_name=ps
153 | - --task_id=1
154 | ports:
155 | - containerPort: 2222
156 | ---
157 | apiVersion: v1
158 | kind: Service
159 | metadata:
160 | name: tf-ps4
161 | labels:
162 | tf-ps: "1"
163 | spec:
164 | ports:
165 | - port: 2222
166 | selector:
167 | tf-ps: "1"
168 | ---
169 |
170 |
--------------------------------------------------------------------------------
/k8s_tf_runner.yaml:
--------------------------------------------------------------------------------
1 | apiVersion: v1
2 | kind: ReplicationController
3 | metadata:
4 | name: tf-runner
5 | spec:
6 | replicas: 1
7 | template:
8 | metadata:
9 | labels:
10 | name: tf-runner
11 | spec:
12 | containers:
13 | - name: tf-runner
14 | image: index.caicloud.io/tensorflow:0.8.0
15 | imagePullPolicy: Always
16 | ports:
17 | - containerPort: 6006
18 | name: tensorboard
19 | - containerPort: 8888
20 | name: jupyter
21 | - containerPort: 8000
22 | name: fileserver
23 | ---
24 | apiVersion: v1
25 | kind: Service
26 | metadata:
27 | name: tf-runner
28 | labels:
29 | name: tf-runner
30 | spec:
31 | type: LoadBalancer
32 | ports:
33 | - name: tensorboard
34 | port: 6006
35 | - name: jupyter
36 | port: 8888
37 | - name: fileserver
38 | port: 8000
39 | selector:
40 | name: tf-runner
41 |
42 |
--------------------------------------------------------------------------------
/notebooks/RNN_PennTreeBank_LanguageModeling.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Step 1: Load the data"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": false
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "Words in training data: 929589\n",
22 | "Words in validating data: 73760\n",
23 | "Words in testing data: 82430\n",
24 | "Example training data: [9970, 9971, 9972, 9974, 9975, 9976, 9980, 9981, 9982, 9983]\n",
25 | "Example validating data: [1132, 93, 358, 5, 329, 51, 9836, 6, 326, 2476]\n",
26 | "Example testing data: [102, 14, 24, 32, 752, 381, 2, 29, 120, 0]\n"
27 | ]
28 | }
29 | ],
30 | "source": [
31 | "import time\n",
32 | "import collections\n",
33 | "import os\n",
34 | "\n",
35 | "import numpy as np\n",
36 | "import tensorflow as tf\n",
37 | "\n",
38 | "def read_words(filename):\n",
39 | " with tf.gfile.GFile(filename, \"r\") as f:\n",
40 | " return f.read().replace(\"\\n\", \"<eos>\").split()\n",
41 | "\n",
42 | "def build_vocab(filename):\n",
43 | " data = read_words(filename)\n",
44 | " counter = collections.Counter(data)\n",
45 | " count_pairs = sorted(counter.items(), key=lambda x: (-x[1], x[0]))\n",
46 | " words, _ = list(zip(*count_pairs))\n",
47 | " word_to_id = dict(zip(words, range(len(words))))\n",
48 | " return word_to_id\n",
49 | "\n",
50 | "def file_to_word_ids(filename, word_to_id):\n",
51 | " data = read_words(filename)\n",
52 | " return [word_to_id[word] for word in data]\n",
53 | "\n",
54 | "def ptb_raw_data():\n",
55 | " train_path = \"ptb.train.txt\"\n",
56 | " valid_path = \"ptb.valid.txt\"\n",
57 | " test_path = \"ptb.test.txt\"\n",
58 | "\n",
59 | " word_to_id = build_vocab(train_path)\n",
60 | " train_data = file_to_word_ids(train_path, word_to_id)\n",
61 | " valid_data = file_to_word_ids(valid_path, word_to_id)\n",
62 | " test_data = file_to_word_ids(test_path, word_to_id)\n",
63 | " return train_data, valid_data, test_data\n",
64 | "\n",
65 | "train_data, valid_data, test_data = ptb_raw_data()\n",
66 | "print \"Words in training data:\", len(train_data)\n",
67 | "print \"Words in validating data:\", len(valid_data)\n",
68 | "print \"Words in testing data:\", len(test_data)\n",
69 | "print \"Example training data:\", train_data[:10]\n",
70 | "print \"Example validating data:\", valid_data[:10]\n",
71 | "print \"Example testing data:\", test_data[:10]"
72 | ]
73 | },
74 | {
75 | "cell_type": "markdown",
76 | "metadata": {},
77 | "source": [
78 | "### Step 2: Arrange the data into RNN batches"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 2,
84 | "metadata": {
85 | "collapsed": false
86 | },
87 | "outputs": [
88 | {
89 | "name": "stdout",
90 | "output_type": "stream",
91 | "text": [
92 | "X: [[ 0 1 2]\n",
93 | " [ 8 9 10]\n",
94 | " [16 17 18]]\n",
95 | "Y: [[ 1 2 3]\n",
96 | " [ 9 10 11]\n",
97 | " [17 18 19]]\n",
98 | "-------------------\n",
99 | "X: [[ 3 4 5]\n",
100 | " [11 12 13]\n",
101 | " [19 20 21]]\n",
102 | "Y: [[ 4 5 6]\n",
103 | " [12 13 14]\n",
104 | " [20 21 22]]\n",
105 | "-------------------\n"
106 | ]
107 | }
108 | ],
109 | "source": [
110 | "def ptb_iterator(raw_data, batch_size, num_steps):\n",
111 | " raw_data = np.array(raw_data, dtype=np.int32)\n",
112 | " data_len = len(raw_data)\n",
113 | " batch_len = data_len // batch_size\n",
114 | " data = np.zeros([batch_size, batch_len], dtype=np.int32)\n",
115 | " for i in range(batch_size):\n",
116 | " data[i] = raw_data[batch_len * i:batch_len * (i + 1)]\n",
117 | "\n",
118 | " epoch_size = (batch_len - 1) // num_steps\n",
119 | " if epoch_size == 0:\n",
120 | " raise ValueError(\"epoch_size == 0, decrease batch_size or num_steps\")\n",
121 | "\n",
122 | " for i in range(epoch_size):\n",
123 | " x = data[:, i*num_steps:(i+1)*num_steps]\n",
124 | " y = data[:, i*num_steps+1:(i+1)*num_steps+1]\n",
125 | " yield (x, y)\n",
126 | "\n",
127 | "result = ptb_iterator(range(25), 3, 3)\n",
128 | "for x, y in result:\n",
129 | " print \"X:\", x\n",
130 | " print \"Y:\", y\n",
131 | " print \"-------------------\"\n",
132 | " "
133 | ]
134 | },
135 | {
136 | "cell_type": "markdown",
137 | "metadata": {},
138 | "source": [
139 | "### Step 3: Build the RNN network"
140 | ]
141 | },
142 | {
143 | "cell_type": "code",
144 | "execution_count": null,
145 | "metadata": {
146 | "collapsed": false
147 | },
148 | "outputs": [
149 | {
150 | "name": "stdout",
151 | "output_type": "stream",
152 | "text": [
153 | "Model generated!\n"
154 | ]
155 | }
156 | ],
157 | "source": [
158 | "hidden_size = 650\n",
159 | "num_layer = 2\n",
160 | "vocab_size = 10000\n",
161 | "\n",
162 | "class PTBModel(object):\n",
163 | " def __init__(self, is_training, batch_size, num_steps):\n",
164 | " self.batch_size = batch_size\n",
165 | " self.num_steps = num_steps\n",
166 | " \n",
167 | " # Define Input & Output\n",
168 | " self.input_data = tf.placeholder(tf.int32, [batch_size, num_steps])\n",
169 | " self.targets = tf.placeholder(tf.int32, [batch_size, num_steps])\n",
170 | " \n",
171 | " # Define RNN network\n",
172 | " lstm_cell = tf.nn.rnn_cell.BasicLSTMCell(hidden_size, forget_bias=0.0)\n",
173 | " if is_training :\n",
174 | " lstm_cell = tf.nn.rnn_cell.DropoutWrapper(lstm_cell, output_keep_prob=0.5)\n",
175 | " cell = tf.nn.rnn_cell.MultiRNNCell([lstm_cell] * num_layer)\n",
176 | "\n",
177 | " # Embedding\n",
178 | " self.initial_state = cell.zero_state(batch_size, tf.float32)\n",
179 | " embedding = tf.get_variable(\"embedding\", [vocab_size, hidden_size])\n",
180 | " inputs = tf.nn.embedding_lookup(embedding, self.input_data)\n",
181 | " if is_training: inputs = tf.nn.dropout(inputs, 0.5)\n",
182 | "\n",
183 | " # Forward propagation\n",
184 | " outputs = []\n",
185 | " state = self.initial_state\n",
186 | " with tf.variable_scope(\"RNN\"):\n",
187 | " for time_step in range(num_steps):\n",
188 | " if time_step > 0: tf.get_variable_scope().reuse_variables()\n",
189 | " (cell_output, state) = cell(inputs[:, time_step, :], state)\n",
190 | " outputs.append(cell_output)\n",
191 | "\n",
192 | " output = tf.reshape(tf.concat(1, outputs), [-1, hidden_size])\n",
193 | " softmax_w = tf.get_variable(\"softmax_w\", [hidden_size, vocab_size])\n",
194 | " softmax_b = tf.get_variable(\"softmax_b\", [vocab_size])\n",
195 | " logits = tf.matmul(output, softmax_w) + softmax_b\n",
196 | " loss = tf.nn.seq2seq.sequence_loss_by_example(\n",
197 | " [logits], [tf.reshape(self.targets, [-1])], [tf.ones([batch_size * num_steps])])\n",
198 | " self.cost = cost = tf.reduce_sum(loss) / batch_size\n",
199 | " self.final_state = state\n",
200 | "\n",
201 | " if not is_training: return\n",
202 | " self.lr = tf.Variable(0.0, trainable=False)\n",
203 | " tvars = tf.trainable_variables()\n",
204 | " grads, _ = tf.clip_by_global_norm(tf.gradients(cost, tvars), 5)\n",
205 | " optimizer = tf.train.GradientDescentOptimizer(self.lr)\n",
206 | " self.train_op = optimizer.apply_gradients(zip(grads, tvars))\n",
207 | "\n",
208 | " def assign_lr(self, session, lr_value):\n",
209 | " session.run(tf.assign(self.lr, lr_value))\n",
210 | " \n",
211 | "print(\"Model generated!\")"
212 | ]
213 | },
214 | {
215 | "cell_type": "markdown",
216 | "metadata": {
217 | "collapsed": true
218 | },
219 | "source": [
220 | "### Step 4: Train the model"
221 | ]
222 | },
223 | {
224 | "cell_type": "code",
225 | "execution_count": null,
226 | "metadata": {
227 | "collapsed": false
228 | },
229 | "outputs": [
230 | {
231 | "name": "stdout",
232 | "output_type": "stream",
233 | "text": [
234 | "Epoch: 1 Learning rate: 1.000\n",
235 | "0.008 perplexity: 5743.727 speed: 197 wps\n",
236 | "0.107 perplexity: 1201.711 speed: 233 wps\n",
237 | "0.206 perplexity: 863.146 speed: 235 wps\n",
238 | "0.306 perplexity: 692.729 speed: 237 wps\n",
239 | "0.405 perplexity: 595.858 speed: 239 wps\n",
240 | "0.505 perplexity: 529.792 speed: 240 wps\n",
241 | "0.604 perplexity: 475.562 speed: 241 wps\n",
242 | "0.704 perplexity: 437.009 speed: 241 wps\n",
243 | "0.803 perplexity: 407.033 speed: 242 wps\n",
244 | "0.903 perplexity: 380.664 speed: 242 wps\n",
245 | "Epoch: 1 Train Perplexity: 360.271\n",
246 | "Epoch: 1 Valid Perplexity: 213.824\n",
247 | "Epoch: 2 Learning rate: 1.000\n",
248 | "0.008 perplexity: 257.275 speed: 252 wps\n",
249 | "0.107 perplexity: 199.571 speed: 231 wps\n",
250 | "0.206 perplexity: 207.185 speed: 231 wps\n",
251 | "0.306 perplexity: 201.614 speed: 234 wps\n",
252 | "0.405 perplexity: 199.146 speed: 236 wps\n",
253 | "0.505 perplexity: 196.261 speed: 240 wps\n",
254 | "0.604 perplexity: 190.792 speed: 241 wps\n",
255 | "0.704 perplexity: 187.582 speed: 242 wps\n",
256 | "0.803 perplexity: 184.645 speed: 243 wps\n",
257 | "0.903 perplexity: 180.317 speed: 244 wps\n",
258 | "Epoch: 2 Train Perplexity: 177.518\n",
259 | "Epoch: 2 Valid Perplexity: 153.699\n",
260 | "Epoch: 3 Learning rate: 1.000\n",
261 | "0.008 perplexity: 185.005 speed: 229 wps\n",
262 | "0.107 perplexity: 143.224 speed: 239 wps\n",
263 | "0.206 perplexity: 153.393 speed: 229 wps\n",
264 | "0.306 perplexity: 149.645 speed: 207 wps\n",
265 | "0.405 perplexity: 148.905 speed: 200 wps\n",
266 | "0.505 perplexity: 147.926 speed: 197 wps\n",
267 | "0.604 perplexity: 144.739 speed: 195 wps\n",
268 | "0.704 perplexity: 143.543 speed: 190 wps\n",
269 | "0.803 perplexity: 142.433 speed: 188 wps\n",
270 | "0.903 perplexity: 139.689 speed: 188 wps\n",
271 | "Epoch: 3 Train Perplexity: 138.276\n",
272 | "Epoch: 3 Valid Perplexity: 130.090\n",
273 | "Epoch: 4 Learning rate: 1.000\n",
274 | "0.008 perplexity: 151.183 speed: 186 wps\n",
275 | "0.107 perplexity: 116.480 speed: 182 wps\n"
276 | ]
277 | }
278 | ],
279 | "source": [
280 | "def run_epoch(session, m, data, eval_op, verbose=False):\n",
281 | " epoch_size = ((len(data) // m.batch_size) - 1) // m.num_steps\n",
282 | " start_time = time.time()\n",
283 | " costs = 0.0\n",
284 | " iters = 0\n",
285 | " state = m.initial_state.eval()\n",
286 | " for step, (x, y) in enumerate(ptb_iterator(data, m.batch_size, m.num_steps)):\n",
287 | " cost, state, _ = session.run([m.cost, m.final_state, eval_op], \n",
288 | " {m.input_data: x, m.targets: y, m.initial_state: state})\n",
289 | " costs += cost\n",
290 | " iters += m.num_steps\n",
291 | "\n",
292 | " if verbose and step % (epoch_size // 10) == 10:\n",
293 | " print(\"%.3f perplexity: %.3f speed: %.0f wps\" % \n",
294 | " (step * 1.0 / epoch_size, np.exp(costs / iters),\n",
295 | " iters * m.batch_size / (time.time() - start_time)))\n",
296 | " return np.exp(costs / iters)\n",
297 | "\n",
298 | "with tf.Session() as session:\n",
299 | " initializer = tf.random_uniform_initializer(-0.05, 0.05)\n",
300 | " with tf.variable_scope(\"model\", reuse=None, initializer=initializer):\n",
301 | " m = PTBModel(True, 20, 35)\n",
302 | " with tf.variable_scope(\"model\", reuse=True, initializer=initializer):\n",
303 | " mtest = PTBModel(False, 1, 1)\n",
304 | "\n",
305 | " tf.initialize_all_variables().run()\n",
306 | "\n",
307 | " for i in range(39):\n",
308 | " base_lr = 1.0\n",
309 | " lr_decay = 0.8 ** max(i - 6, 0.0)\n",
310 | " m.assign_lr(session, base_lr * lr_decay)\n",
311 | "\n",
312 | " print(\"Epoch: %d Learning rate: %.3f\" % (i + 1, session.run(m.lr)))\n",
313 | " train_perplexity = run_epoch(session, m, train_data, m.train_op, verbose=True)\n",
314 | " print(\"Epoch: %d Train Perplexity: %.3f\" % (i + 1, train_perplexity))\n",
315 | " valid_perplexity = run_epoch(session, mtest, valid_data, tf.no_op())\n",
316 | " print(\"Epoch: %d Valid Perplexity: %.3f\" % (i + 1, valid_perplexity))\n",
317 | "\n",
318 | " test_perplexity = run_epoch(session, mtest, test_data, tf.no_op())\n",
319 | " print(\"Test Perplexity: %.3f\" % test_perplexity)"
320 | ]
321 | },
322 | {
323 | "cell_type": "code",
324 | "execution_count": null,
325 | "metadata": {
326 | "collapsed": true
327 | },
328 | "outputs": [],
329 | "source": []
330 | }
331 | ],
332 | "metadata": {
333 | "kernelspec": {
334 | "display_name": "Python 2",
335 | "language": "python",
336 | "name": "python2"
337 | },
338 | "language_info": {
339 | "codemirror_mode": {
340 | "name": "ipython",
341 | "version": 2
342 | },
343 | "file_extension": ".py",
344 | "mimetype": "text/x-python",
345 | "name": "python",
346 | "nbconvert_exporter": "python",
347 | "pygments_lexer": "ipython2",
348 | "version": "2.7.6"
349 | }
350 | },
351 | "nbformat": 4,
352 | "nbformat_minor": 0
353 | }
354 |
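The training loop in the notebook above holds the learning rate at 1.0 for the first seven epochs and then multiplies it by 0.8 each epoch. It reports perplexity as the exponential of the average cost per time step. The following is a minimal pure-Python sketch of those two calculations, independent of TensorFlow. The function names `learning_rate_for_epoch` and `perplexity` are illustrative assumptions and do not appear in the notebook.

```python
import math


def learning_rate_for_epoch(epoch_index, base_lr=1.0, decay=0.8, hold_epochs=6):
    """Learning rate for a zero-based epoch index.

    Mirrors `base_lr * 0.8 ** max(i - 6, 0.0)` from the training loop:
    the rate stays at base_lr up to index `hold_epochs`, then decays.
    """
    return base_lr * decay ** max(epoch_index - hold_epochs, 0)


def perplexity(total_cost, total_steps):
    """Perplexity as exp(average cost per time step), as in run_epoch."""
    return math.exp(total_cost / total_steps)


if __name__ == "__main__":
    # Print the schedule around the point where decay starts.
    for i in (0, 6, 7, 8):
        print("Epoch: %d Learning rate: %.3f" % (i + 1, learning_rate_for_epoch(i)))
```

In `run_epoch`, `costs` accumulates the per-batch cost and `iters` grows by `num_steps` per batch, so `costs / iters` is the quantity passed to `perplexity` here.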
--------------------------------------------------------------------------------
/notebooks/hello_world.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "Import the TensorFlow package."
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": true
15 | },
16 | "outputs": [],
17 | "source": [
18 | "import tensorflow as tf"
19 | ]
20 | },
21 | {
22 | "cell_type": "markdown",
23 | "metadata": {},
24 | "source": [
25 | "Create a session to run the TensorFlow program. Every operation on the nodes of a TensorFlow graph must run inside a session, which maintains the execution context."
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 2,
31 | "metadata": {
32 | "collapsed": false
33 | },
34 | "outputs": [],
35 | "source": [
36 | "sess = tf.InteractiveSession()"
37 | ]
38 | },
39 | {
40 | "cell_type": "markdown",
41 | "metadata": {},
42 | "source": [
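43 | "Use tf.name_scope to specify a namespace; nodes in the same namespace are automatically collapsed together when the graph is visualized.\n",
44 | "\n",
45 | "The input scope defines three inputs. The first is a constant, created with tf.constant. The second is a variable, created with tf.Variable; a variable needs an initial value, which here is drawn uniformly at random from [0, 1). The third is a placeholder, which can be thought of as an interface for feeding in input data."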
46 | ]
47 | },
48 | {
49 | "cell_type": "code",
50 | "execution_count": 3,
51 | "metadata": {
52 | "collapsed": true
53 | },
54 | "outputs": [],
55 | "source": [
56 | "with tf.name_scope('input'):\n",
57 | " input1 = tf.constant([1.0, 2.0, 3.0], name=\"input1\")\n",
58 | " input2 = tf.Variable(tf.random_uniform([3]), name=\"input2\") \n",
59 | " input3 = tf.placeholder(tf.float32, [3], name=\"input3\")"
60 | ]
61 | },
62 | {
63 | "cell_type": "markdown",
64 | "metadata": {},
65 | "source": [
66 | "Initialize all variables. TensorFlow requires every variable to be initialized before it is used in a computation."
67 | ]
68 | },
69 | {
70 | "cell_type": "code",
71 | "execution_count": 4,
72 | "metadata": {
73 | "collapsed": false
74 | },
75 | "outputs": [
76 | {
77 | "name": "stdout",
78 | "output_type": "stream",
79 | "text": [
80 | "\n",
81 | "[ 0.9391259 0.75194204 0.15929091]\n"
82 | ]
83 | }
84 | ],
85 | "source": [
86 | "tf.initialize_all_variables().run() \n",
87 | "print input2\n",
88 | "print input2.eval()"
89 | ]
90 | },
91 | {
92 | "cell_type": "markdown",
93 | "metadata": {},
94 | "source": [
95 | "Define an addition operation. Note that this adds three vectors, not three scalars. Like NumPy, TensorFlow supports operations on arrays of any dimensionality."
96 | ]
97 | },
98 | {
99 | "cell_type": "code",
100 | "execution_count": 5,
101 | "metadata": {
102 | "collapsed": false
103 | },
104 | "outputs": [],
105 | "source": [
106 | "with tf.name_scope(\"add\"):\n",
107 | " output = tf.add_n([input1, input2, input3], name=\"add\")"
108 | ]
109 | },
110 | {
111 | "cell_type": "markdown",
112 | "metadata": {},
113 | "source": [
114 | "Write the computation graph to a log so that TensorBoard can visualize it."
115 | ]
116 | },
117 | {
118 | "cell_type": "code",
119 | "execution_count": 8,
120 | "metadata": {
121 | "collapsed": false
122 | },
123 | "outputs": [
124 | {
125 | "name": "stdout",
126 | "output_type": "stream",
127 | "text": [
128 | "input/input1:0\n"
129 | ]
130 | }
131 | ],
132 | "source": [
133 | "writer = tf.train.SummaryWriter(\"/log/demo-hello\", sess.graph)"
134 | ]
135 | },
136 | {
137 | "cell_type": "markdown",
138 | "metadata": {},
139 | "source": [
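140 | "Print the result of the addition. Note that TensorFlow uses lazy evaluation: a value is computed only when it is needed, so to inspect a result you must explicitly name the tensor you want to fetch."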
141 | ]
142 | },
143 | {
144 | "cell_type": "code",
145 | "execution_count": 7,
146 | "metadata": {
147 | "collapsed": false
148 | },
149 | "outputs": [
150 | {
151 | "name": "stdout",
152 | "output_type": "stream",
153 | "text": [
154 | "[ 3.93912601 5.75194216 7.15929079]\n"
155 | ]
156 | }
157 | ],
158 | "source": [
159 | "print sess.run(output, {input3: [2, 3, 4]})"
160 | ]
161 | },
162 | {
163 | "cell_type": "code",
164 | "execution_count": null,
165 | "metadata": {
166 | "collapsed": true
167 | },
168 | "outputs": [],
169 | "source": []
170 | }
171 | ],
172 | "metadata": {
173 | "kernelspec": {
174 | "display_name": "Python 2",
175 | "language": "python",
176 | "name": "python2"
177 | },
178 | "language_info": {
179 | "codemirror_mode": {
180 | "name": "ipython",
181 | "version": 2
182 | },
183 | "file_extension": ".py",
184 | "mimetype": "text/x-python",
185 | "name": "python",
186 | "nbconvert_exporter": "python",
187 | "pygments_lexer": "ipython2",
188 | "version": "2.7.6"
189 | }
190 | },
191 | "nbformat": 4,
192 | "nbformat_minor": 0
193 | }
194 |
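The notebook above adds three length-3 vectors with `tf.add_n`. As a rough illustration of what that operation computes, here is a pure-Python sketch of an element-wise sum over equally sized vectors. The helper `add_n` below is a hypothetical stand-in written for this example, not TensorFlow's implementation, and the value used for `input2` is the one printed in the notebook's output.

```python
def add_n(vectors):
    """Element-wise sum of equally sized 1-D vectors (stand-in for tf.add_n)."""
    length = len(vectors[0])
    if any(len(v) != length for v in vectors):
        raise ValueError("all inputs must have the same length")
    # zip(*vectors) groups the i-th component of every vector together.
    return [sum(components) for components in zip(*vectors)]


input1 = [1.0, 2.0, 3.0]                      # the constant
input2 = [0.9391259, 0.75194204, 0.15929091]  # variable value printed in the notebook
input3 = [2.0, 3.0, 4.0]                      # value fed to the placeholder

output = add_n([input1, input2, input3])
print(output)
```

Up to floating-point rounding, the result should agree with the values printed by `sess.run(output, {input3: [2, 3, 4]})` in the notebook.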
--------------------------------------------------------------------------------
/notebooks/mnist_cnn.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Step 1: Load the data"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": false
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n",
22 | "Extracting /tmp/data/train-labels-idx1-ubyte.gz\n",
23 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n",
24 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n",
25 | "Training data size: 55000\n",
26 | "Validating data size: 5000\n",
27 | "Testing data size: 10000\n"
28 | ]
29 | }
30 | ],
31 | "source": [
32 | "import tensorflow as tf\n",
33 | "import time\n",
34 | "from tensorflow.examples.tutorials.mnist import input_data\n",
35 | "\n",
36 | "mnist = input_data.read_data_sets(\"/tmp/data\", one_hot=True)\n",
37 | "\n",
38 | "print \"Training data size: \", mnist.train.num_examples\n",
39 | "print \"Validating data size: \", mnist.validation.num_examples\n",
40 | "print \"Testing data size: \", mnist.test.num_examples"
41 | ]
42 | },
43 | {
44 | "cell_type": "markdown",
45 | "metadata": {},
46 | "source": [
47 | "### Step 2: Build the neural network"
48 | ]
49 | },
50 | {
51 | "cell_type": "code",
52 | "execution_count": 2,
53 | "metadata": {
54 | "collapsed": false
55 | },
56 | "outputs": [
57 | {
58 | "name": "stdout",
59 | "output_type": "stream",
60 | "text": [
61 | "Network created!\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "# This is where training samples and labels are fed to the graph.\n",
67 | "# These placeholder nodes will be fed a batch of training data at each\n",
68 | "# training step using the {feed_dict} argument to the Run() call below.\n",
69 | "BATCH_SIZE = 64\n",
70 | "EVAL_SIZE = 10000\n",
71 | "IMAGE_SIZE = 28\n",
72 | "NUM_CHANNELS = 1\n",
73 | "NUM_LABELS = 10\n",
74 | "\n",
75 | "x = tf.placeholder(tf.float32, shape=(None, IMAGE_SIZE, IMAGE_SIZE, NUM_CHANNELS))\n",
76 | "y_ = tf.placeholder(tf.float32, shape=(None, NUM_LABELS))\n",
77 | "\n",
78 | "# The variables below hold all the trainable weights. \n",
79 | "# Convolutional layers.\n",
80 | "conv1_weights = tf.Variable(tf.truncated_normal([5, 5, NUM_CHANNELS, 32], stddev=0.1, seed = 2))\n",
81 | "conv1_biases = tf.Variable(tf.zeros([32]))\n",
82 | "\n",
83 | "conv2_weights = tf.Variable(tf.truncated_normal([5, 5, 32, 64], stddev=0.1, seed = 2))\n",
84 | "conv2_biases = tf.Variable(tf.constant(0.1, shape=[64]))\n",
85 | "\n",
86 | "# fully connected, depth 512.\n",
87 | "fc1_weights = tf.Variable(tf.truncated_normal([IMAGE_SIZE // 4 * IMAGE_SIZE // 4 * 64, 512], stddev=0.1, seed = 2))\n",
88 | "fc1_biases = tf.Variable(tf.constant(0.1, shape=[512]))\n",
89 | "\n",
90 | "fc2_weights = tf.Variable(tf.truncated_normal([512, NUM_LABELS], stddev=0.1, seed = 2))\n",
91 | "fc2_biases = tf.Variable(tf.constant(0.1, shape=[NUM_LABELS]))\n",
92 | "\n",
93 | "def model(data, train=False):\n",
94 | " \"\"\"The Model definition.\"\"\"\n",
95 | " # 2D convolution, with 'SAME' padding (i.e. the output feature map has\n",
96 | " # the same size as the input). Note that {strides} is a 4D array whose\n",
97 | " # shape matches the data layout: [image index, y, x, depth].\n",
98 | " conv = tf.nn.conv2d(data, conv1_weights, strides=[1, 1, 1, 1], padding='SAME')\n",
99 | " # Bias and rectified linear non-linearity.\n",
100 | " relu = tf.nn.relu(tf.nn.bias_add(conv, conv1_biases))\n",
101 | " # Max pooling. The kernel size spec {ksize} also follows the layout of\n",
102 | " # the data. Here we have a pooling window of 2, and a stride of 2.\n",
103 | " pool = tf.nn.max_pool(relu, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')\n",
104 | " \n",
105 | " conv1 = tf.nn.conv2d(pool, conv2_weights, strides=[1, 1, 1, 1], padding='SAME')\n",
106 | " relu1 = tf.nn.relu(tf.nn.bias_add(conv1, conv2_biases))\n",
107 | " pool1 = tf.nn.max_pool(relu1, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], padding='SAME')\n",
108 | " \n",
109 | " # Reshape the feature map into a 2D matrix to feed it to the fully connected layers.\n",
110 | " pool_shape = pool1.get_shape().as_list()\n",
111 | " reshape = tf.reshape(pool1, [-1, pool_shape[1] * pool_shape[2] * pool_shape[3]])\n",
112 | " \n",
113 | " # Fully connected layer. Note that the '+' operation automatically broadcasts the biases.\n",
114 | " hidden = tf.nn.relu(tf.matmul(reshape, fc1_weights) + fc1_biases)\n",
115 | " # Add a 50% dropout during training only. Dropout also scales\n",
116 | " # activations such that no rescaling is needed at evaluation time.\n",
117 | " if train: hidden = tf.nn.dropout(hidden, 0.5)\n",
118 | " return tf.nn.softmax(tf.matmul(hidden, fc2_weights) + fc2_biases)\n",
119 | "\n",
120 | "print(\"Network created!\")"
121 | ]
122 | },
123 | {
124 | "cell_type": "markdown",
125 | "metadata": {},
126 | "source": [
127 | "### Step 3: Define the training procedure"
128 | ]
129 | },
130 | {
131 | "cell_type": "code",
132 | "execution_count": 3,
133 | "metadata": {
134 | "collapsed": false
135 | },
136 | "outputs": [
137 | {
138 | "name": "stdout",
139 | "output_type": "stream",
140 | "text": [
141 | "Training & eval step setup!\n"
142 | ]
143 | }
144 | ],
145 | "source": [
146 | "# Predictions for the current training minibatch.\n",
147 | "train_y = model(x, True)\n",
148 | "\n",
149 | "# softmax cross entropy loss.\n",
150 | "loss = -tf.reduce_mean(y_ * tf.log(tf.clip_by_value(train_y, 1e-10, 1.0)))\n",
151 | "# L2 regularization for the fully connected parameters.\n",
152 | "regularizers = (tf.nn.l2_loss(fc1_weights) + tf.nn.l2_loss(fc1_biases) + \n",
153 | " tf.nn.l2_loss(fc2_weights) + tf.nn.l2_loss(fc2_biases))\n",
154 | "# Add the regularization term to the loss.\n",
155 | "loss += 5e-4 * regularizers\n",
156 | "\n",
157 | "# Optimizer: set up a variable that's incremented once per batch and\n",
158 | "# controls the learning rate decay.\n",
159 | "step = tf.Variable(0)\n",
160 | "\n",
161 | "# Decay once per epoch, using an exponential schedule starting at 0.01.\n",
162 | "learning_rate = tf.train.exponential_decay(\n",
163 | " 0.01, # Base learning rate.\n",
164 | " step * BATCH_SIZE, # Current index into the dataset.\n",
165 | " mnist.train.num_examples, # Decay step.\n",
166 | " 0.95, # Decay rate.\n",
167 | " staircase=True)\n",
168 | "\n",
169 | "# Use simple momentum for the optimization.\n",
170 | "optimizer = tf.train.MomentumOptimizer(learning_rate, 0.9).minimize(loss, global_step=step)\n",
171 | "\n",
172 | "# Training accuracy\n",
173 | "train_correct_prediction = tf.equal(tf.argmax(y_, 1), tf.argmax(train_y, 1))\n",
174 | "train_accuracy = tf.reduce_mean(tf.cast(train_correct_prediction, tf.float32))\n",
175 | " \n",
176 | "# Predictions for the test and validation, which we'll compute less often.\n",
177 | "eval_y = model(x, False)\n",
178 | "eval_correct_prediction = tf.equal(tf.argmax(y_, 1), tf.argmax(eval_y, 1))\n",
179 | "eval_accuracy = tf.reduce_mean(tf.cast(eval_correct_prediction, tf.float32))\n",
180 | "\n",
181 | "print(\"Training & eval step setup!\")"
182 | ]
183 | },
184 | {
185 | "cell_type": "markdown",
186 | "metadata": {},
187 | "source": [
188 | "### Step 4: Train the model"
189 | ]
190 | },
191 | {
192 | "cell_type": "code",
193 | "execution_count": 4,
194 | "metadata": {
195 | "collapsed": false
196 | },
197 | "outputs": [
198 | {
199 | "name": "stdout",
200 | "output_type": "stream",
201 | "text": [
202 | "Initialized!\n",
203 | "After 0 training step(s), validation accuracy = 0.1118, test accuracy = 0.1245\n",
204 | "After 100 training step(s), validation accuracy = 0.831, test accuracy = 0.8325\n",
205 | "After 200 training step(s), validation accuracy = 0.896, test accuracy = 0.8913\n",
206 | "After 300 training step(s), validation accuracy = 0.9232, test accuracy = 0.9247\n",
207 | "After 400 training step(s), validation accuracy = 0.9318, test accuracy = 0.9332\n",
208 | "Final accuracy = 0.9396\n"
209 | ]
210 | }
211 | ],
212 | "source": [
213 | "import numpy\n",
214 | "\n",
215 | "# Create a local session to run the training.\n",
216 | "start_time = time.time()\n",
217 | "ROUNDS = 500\n",
218 | "\n",
219 | "reshaped_test_data = numpy.reshape(mnist.test.images, [-1, 28, 28, 1])\n",
220 | "test_label = mnist.test.labels\n",
221 | "reshaped_validate_data = numpy.reshape(mnist.validation.images, [-1, 28, 28, 1])\n",
222 | "validate_label = mnist.validation.labels\n",
223 | "\n",
224 | "with tf.Session() as sess:\n",
225 | " # Run all the initializers to prepare the trainable parameters.\n",
226 | " tf.initialize_all_variables().run()\n",
227 | " print('Initialized!')\n",
228 | " # Loop through training steps.\n",
229 | " for i in range(ROUNDS):\n",
230 | " # Run the graph and fetch some of the nodes.\n",
231 | " xs, ys = mnist.train.next_batch(BATCH_SIZE)\n",
232 | " reshaped_x = numpy.reshape(xs, [BATCH_SIZE, 28, 28, 1])\n",
233 | " sess.run(optimizer, feed_dict={x: reshaped_x, y_: ys})\n",
234 | " \n",
235 | " if i % 100 == 0:\n",
236 | " elapsed_time = time.time() - start_time\n",
237 | " start_time = time.time()\n",
238 | "\n",
239 | " validate_acc = sess.run(eval_accuracy, feed_dict={x: reshaped_validate_data, y_:validate_label})\n",
240 | " test_acc = sess.run(eval_accuracy, feed_dict={x: reshaped_test_data, y_:test_label})\n",
241 | " print(\"After %d training step(s), validation accuracy = %g, test accuracy = %g\" % \n",
242 | " (i, validate_acc, test_acc))\n",
243 | "\n",
244 | " test_acc = sess.run(eval_accuracy, feed_dict={x: reshaped_test_data, y_:test_label})\n",
245 | " print(\"Final accuracy = %g\" % (test_acc))"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": null,
251 | "metadata": {
252 | "collapsed": true
253 | },
254 | "outputs": [],
255 | "source": []
256 | }
257 | ],
258 | "metadata": {
259 | "kernelspec": {
260 | "display_name": "Python 2",
261 | "language": "python",
262 | "name": "python2"
263 | },
264 | "language_info": {
265 | "codemirror_mode": {
266 | "name": "ipython",
267 | "version": 2
268 | },
269 | "file_extension": ".py",
270 | "mimetype": "text/x-python",
271 | "name": "python",
272 | "nbconvert_exporter": "python",
273 | "pygments_lexer": "ipython2",
274 | "version": "2.7.6"
275 | }
276 | },
277 | "nbformat": 4,
278 | "nbformat_minor": 0
279 | }
280 |
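The CNN notebook above passes `staircase=True` to `tf.train.exponential_decay`, so the learning rate drops by a factor of 0.95 only once the number of examples seen crosses a multiple of the training-set size. The following is a minimal pure-Python sketch of that schedule, assuming the documented formula `base_lr * decay_rate ** (global_step / decay_steps)` with integer division in staircase mode. The function names are illustrative, and the constants are copied from the notebook.

```python
def exponential_decay(base_lr, global_step, decay_steps, decay_rate, staircase=True):
    """Decayed learning rate; with staircase=True the exponent is an integer."""
    if staircase:
        exponent = global_step // decay_steps
    else:
        exponent = global_step / float(decay_steps)
    return base_lr * decay_rate ** exponent


BATCH_SIZE = 64      # as in the notebook
TRAIN_SIZE = 55000   # mnist.train.num_examples printed in Step 1


def learning_rate_at(step):
    """Learning rate after `step` training batches, using the notebook's settings."""
    return exponential_decay(0.01, step * BATCH_SIZE, TRAIN_SIZE, 0.95)


if __name__ == "__main__":
    for step in (0, 499, 859, 860):
        print("step %d: learning rate = %g" % (step, learning_rate_at(step)))
```

With `ROUNDS = 500` the run sees 500 * 64 = 32000 examples, fewer than one epoch, so under this formula the rate stays at 0.01 for the whole run. The first decay would occur at step 860.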
--------------------------------------------------------------------------------
/notebooks/mnist_tensorboard.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "### Step 1: Load the data"
8 | ]
9 | },
10 | {
11 | "cell_type": "code",
12 | "execution_count": 1,
13 | "metadata": {
14 | "collapsed": false
15 | },
16 | "outputs": [
17 | {
18 | "name": "stdout",
19 | "output_type": "stream",
20 | "text": [
21 | "Extracting /tmp/data/train-images-idx3-ubyte.gz\n",
22 | "Extracting /tmp/data/train-labels-idx1-ubyte.gz\n",
23 | "Extracting /tmp/data/t10k-images-idx3-ubyte.gz\n",
24 | "Extracting /tmp/data/t10k-labels-idx1-ubyte.gz\n",
25 | "Training data size: 55000\n",
26 | "Validating data size: 5000\n",
27 | "Testing data size: 10000\n"
28 | ]
29 | }
30 | ],
31 | "source": [
32 | "import tensorflow as tf\n",
33 | "import time\n",
34 | "from tensorflow.examples.tutorials.mnist import input_data\n",
35 | "\n",
36 | "SUMMARY_DIR = \"/log/mnist-log\"\n",
37 | "if tf.gfile.Exists(SUMMARY_DIR):\n",
38 | " tf.gfile.DeleteRecursively(SUMMARY_DIR)\n",
39 | "tf.gfile.MakeDirs(SUMMARY_DIR)\n",
40 | "\n",
41 | "mnist = input_data.read_data_sets(\"/tmp/data\", one_hot=True)\n",
42 | "\n",
43 | "print \"Training data size: \", mnist.train.num_examples\n",
44 | "print \"Validating data size: \", mnist.validation.num_examples\n",
45 | "print \"Testing data size: \", mnist.test.num_examples"
46 | ]
47 | },
48 | {
49 | "cell_type": "markdown",
50 | "metadata": {},
51 | "source": [
52 | "### Step 2: Build the neural network and define the summaries to log"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 2,
58 | "metadata": {
59 | "collapsed": false
60 | },
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "Network created!\n"
67 | ]
68 | }
69 | ],
70 | "source": [
71 | "tf.reset_default_graph()\n",
72 | "sess = tf.InteractiveSession()\n",
73 | "\n",
74 | "def weight_variable(shape):\n",
75 | " \"\"\"Create a weight variable with appropriate initialization.\"\"\"\n",
76 | " initial = tf.truncated_normal(shape, stddev=0.1, seed = 2)\n",
77 | " return tf.Variable(initial)\n",
78 | "\n",
79 | "def bias_variable(shape):\n",
80 | " \"\"\"Create a bias variable with appropriate initialization.\"\"\"\n",
81 | " initial = tf.constant(0.1, shape=shape)\n",
82 | " return tf.Variable(initial)\n",
83 | "\n",
84 | "def variable_summaries(var, name):\n",
85 | " \"\"\"Attach a lot of summaries to a Tensor.\"\"\"\n",
86 | " with tf.name_scope('summaries'):\n",
87 | " mean = tf.reduce_mean(var)\n",
88 | " tf.scalar_summary('mean/' + name, mean)\n",
89 | " with tf.name_scope('stddev'):\n",
90 | " stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))\n",
91 | " tf.scalar_summary('stddev/' + name, stddev)\n",
92 | " tf.scalar_summary('max/' + name, tf.reduce_max(var))\n",
93 | " tf.scalar_summary('min/' + name, tf.reduce_min(var))\n",
94 | " tf.histogram_summary(name, var)\n",
95 | "\n",
96 | "def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu):\n",
97 | " \"\"\"Reusable code for making a simple neural net layer.\n",
98 | " It does a matrix multiply, bias add, and then uses relu to nonlinearize.\n",
99 | " It also sets up name scoping so that the resultant graph is easy to read, and\n",
100 | " adds a number of summary ops.\n",
101 | " \"\"\"\n",
102 | " # Adding a name scope ensures logical grouping of the layers in the graph.\n",
103 | " with tf.name_scope(layer_name):\n",
104 | " # This Variable will hold the state of the weights for the layer\n",
105 | " with tf.name_scope('weights'):\n",
106 | " weights = weight_variable([input_dim, output_dim])\n",
107 | " variable_summaries(weights, layer_name + '/weights')\n",
108 | " with tf.name_scope('biases'):\n",
109 | " biases = bias_variable([output_dim])\n",
110 | " variable_summaries(biases, layer_name + '/biases')\n",
111 | " with tf.name_scope('Wx_plus_b'):\n",
112 | " preactivate = tf.matmul(input_tensor, weights) + biases\n",
113 | " tf.histogram_summary(layer_name + '/pre_activations', preactivate)\n",
114 | " activations = act(preactivate, 'activation')\n",
115 | " tf.histogram_summary(layer_name + '/activations', activations)\n",
116 | " return activations\n",
117 | "\n",
118 | "# Create a multilayer model.\n",
119 | "with tf.name_scope('input'):\n",
120 | " x = tf.placeholder(tf.float32, [None, 784], name='x-input')\n",
121 | " image_shaped_input = tf.reshape(x, [-1, 28, 28, 1])\n",
122 | " tf.image_summary('input', image_shaped_input, 20)\n",
123 | " y_ = tf.placeholder(tf.float32, [None, 10], name='y-input')\n",
124 | "\n",
125 | "hidden_nodes = 500\n",
126 | "hidden1 = nn_layer(x, 784, hidden_nodes, 'layer1')\n",
127 | "y = nn_layer(hidden1, hidden_nodes, 10, 'layer2', act=tf.nn.softmax)\n",
128 | "print(\"Network created!\")"
129 | ]
130 | },
131 | {
132 | "cell_type": "markdown",
133 | "metadata": {},
134 | "source": [
135 | "### Step 3: Define the training procedure"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 3,
141 | "metadata": {
142 | "collapsed": false
143 | },
144 | "outputs": [
145 | {
146 | "name": "stdout",
147 | "output_type": "stream",
148 | "text": [
149 | "Training & eval step setup!\n"
150 | ]
151 | }
152 | ],
153 | "source": [
154 | "with tf.name_scope('cross_entropy'):\n",
155 | " diff = y_ * tf.log(tf.clip_by_value(y, 1e-10, 1.0))\n",
156 | " with tf.name_scope('total'):\n",
157 | " cross_entropy = -tf.reduce_mean(diff)\n",
158 | " tf.scalar_summary('cross entropy', cross_entropy)\n",
159 | "\n",
160 | "with tf.name_scope('train'):\n",
161 | " train_step = tf.train.AdamOptimizer(0.001).minimize(cross_entropy)\n",
162 | "\n",
163 | "with tf.name_scope('accuracy'):\n",
164 | " with tf.name_scope('correct_prediction'):\n",
165 | " correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))\n",
166 | " with tf.name_scope('accuracy'):\n",
167 | " accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))\n",
168 | " tf.scalar_summary('accuracy', accuracy)\n",
169 | "\n",
170 | "print(\"Training & eval step setup!\")"
171 | ]
172 | },
173 | {
174 | "cell_type": "markdown",
175 | "metadata": {},
176 | "source": [
177 | "### Step 4: Set the log directory and initialize all variables"
178 | ]
179 | },
180 | {
181 | "cell_type": "code",
182 | "execution_count": 4,
183 | "metadata": {
184 | "collapsed": false
185 | },
186 | "outputs": [
187 | {
188 | "name": "stdout",
189 | "output_type": "stream",
190 | "text": [
191 | "Log init done!\n"
192 | ]
193 | }
194 | ],
195 | "source": [
196 | "# Merge all the summaries and write them out to SUMMARY_DIR\n",
197 | "merged = tf.merge_all_summaries()\n",
198 | "train_writer = tf.train.SummaryWriter(SUMMARY_DIR + '/train', sess.graph)\n",
199 | "test_writer = tf.train.SummaryWriter(SUMMARY_DIR + '/test')\n",
200 | "\n",
201 | "tf.initialize_all_variables().run()\n",
202 | "\n",
203 | "validate_feed = {x: mnist.validation.images, y_: mnist.validation.labels}\n",
204 | "test_feed = {x: mnist.test.images, y_: mnist.test.labels}\n",
205 | "print(\"Log init done!\")"
206 | ]
207 | },
208 | {
209 | "cell_type": "markdown",
210 | "metadata": {},
211 | "source": [
212 | "### Step 5: Train the model"
213 | ]
214 | },
215 | {
216 | "cell_type": "code",
217 | "execution_count": 5,
218 | "metadata": {
219 | "collapsed": false
220 | },
221 | "outputs": [
222 | {
223 | "name": "stdout",
224 | "output_type": "stream",
225 | "text": [
226 | "Training begins @ 1466980873.216786\n",
227 | "After 0 training step(s), validation accuracy = 0.2188, test accuracy = 0.2096\n",
228 | "After 100 training step(s), validation accuracy = 0.9168, test accuracy = 0.9113\n",
229 | "After 200 training step(s), validation accuracy = 0.9382, test accuracy = 0.934\n",
230 | "After 300 training step(s), validation accuracy = 0.9528, test accuracy = 0.9483\n",
231 | "After 400 training step(s), validation accuracy = 0.9538, test accuracy = 0.9535\n",
232 | "Training ends @ 1466980891.376311\n",
233 | "Training elapsed time: 18.159525 s\n",
234 | "After 500 training step(s), test accuracy = 0.9557\n"
235 | ]
236 | }
237 | ],
238 | "source": [
239 | "def feed_dict(train):\n",
240 | " \"\"\"Make a TensorFlow feed_dict: maps data onto Tensor placeholders.\"\"\"\n",
241 | " if train:\n",
242 | " xs, ys = mnist.train.next_batch(100)\n",
243 | " else:\n",
244 | " xs, ys = mnist.test.images, mnist.test.labels\n",
245 | " return {x: xs, y_: ys}\n",
246 | "\n",
247 | "STEPS = 500\n",
248 | "saver = tf.train.Saver()\n",
249 | "time_begin = time.time()\n",
250 | "print(\"Training begins @ %f\" % time_begin)\n",
251 | "for i in range(STEPS):\n",
252 | " _, summary = sess.run([train_step, merged], feed_dict=feed_dict(True))\n",
253 | "\n",
254 | " if i % 100 == 0:\n",
255 | " # Write summary\n",
256 | " train_writer.add_summary(summary, i)\n",
257 | " \n",
258 | " summary = sess.run(merged, feed_dict=feed_dict(False))\n",
259 | " test_writer.add_summary(summary, i) \n",
260 | " \n",
261 | " # Print training information.\n",
262 | " validate_acc = sess.run(accuracy, feed_dict=validate_feed)\n",
263 | " test_acc = sess.run(accuracy, feed_dict=test_feed)\n",
264 | " print(\"After %d training step(s), validation accuracy = %g, test accuracy = %g\" % \n",
265 | " (i, validate_acc, test_acc))\n",
266 | " \n",
267 | " # Store model.\n",
268 | " if i == 300: saver.save(sess, \"/tmp/saved_model\")\n",
269 | "\n",
270 | " \n",
271 | "time_end = time.time()\n",
272 | "print(\"Training ends @ %f\" % time_end)\n",
273 | "training_time = time_end - time_begin\n",
274 | "print(\"Training elapsed time: %f s\" % training_time)\n",
275 | "test_acc = sess.run(accuracy, feed_dict=test_feed)\n",
276 | "print(\"After %d training step(s), test accuracy = %g\" % (STEPS, test_acc))"
277 | ]
278 | },
279 | {
280 | "cell_type": "code",
281 | "execution_count": 6,
282 | "metadata": {
283 | "collapsed": false
284 | },
285 | "outputs": [
286 | {
287 | "name": "stdout",
288 | "output_type": "stream",
289 | "text": [
290 | "Test accuracy for stored model = 0.9483\n"
291 | ]
292 | }
293 | ],
294 | "source": [
295 | "saver.restore(sess, \"/tmp/saved_model\")\n",
296 | "test_acc = sess.run(accuracy, feed_dict=test_feed)\n",
297 | "print(\"Test accuracy for stored model = %g\" % (test_acc))\n",
298 | "\n",
299 | "sess.close()"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": null,
305 | "metadata": {
306 | "collapsed": true
307 | },
308 | "outputs": [],
309 | "source": []
310 | }
311 | ],
312 | "metadata": {
313 | "kernelspec": {
314 | "display_name": "Python 2",
315 | "language": "python",
316 | "name": "python2"
317 | },
318 | "language_info": {
319 | "codemirror_mode": {
320 | "name": "ipython",
321 | "version": 2
322 | },
323 | "file_extension": ".py",
324 | "mimetype": "text/x-python",
325 | "name": "python",
326 | "nbconvert_exporter": "python",
327 | "pygments_lexer": "ipython2",
328 | "version": "2.7.6"
329 | }
330 | },
331 | "nbformat": 4,
332 | "nbformat_minor": 0
333 | }
334 |
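The `cross_entropy` scope in the notebook above clips the predicted probabilities to [1e-10, 1.0] before taking the logarithm, then averages `y_ * log(y)` over every element of the batch. The following is a small pure-Python sketch of that calculation on nested lists. The function name `clipped_cross_entropy` is an assumption made for this example, and the code only approximates what the TensorFlow ops compute.

```python
import math


def clipped_cross_entropy(labels, probs, eps=1e-10):
    """Negative mean of label * log(clipped prob) over all elements.

    `labels` and `probs` are lists of rows (one row per example).
    Clipping keeps log() finite when a predicted probability is 0.
    """
    total = 0.0
    count = 0
    for label_row, prob_row in zip(labels, probs):
        for label, prob in zip(label_row, prob_row):
            clipped = min(max(prob, eps), 1.0)
            total += label * math.log(clipped)
            count += 1
    return -total / count


if __name__ == "__main__":
    labels = [[0.0, 1.0], [1.0, 0.0]]
    probs = [[0.2, 0.8], [0.0, 1.0]]   # second row assigns 0 to the true class
    print(clipped_cross_entropy(labels, probs))
```

Because the mean runs over all batch-by-class elements rather than over examples, the value is the per-example cross entropy divided by the number of classes (10 in the notebook).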
--------------------------------------------------------------------------------
/notebooks/scope.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {
7 | "collapsed": false
8 | },
9 | "outputs": [
10 | {
11 | "name": "stdout",
12 | "output_type": "stream",
13 | "text": [
14 | "a/b/Variable:0\n",
15 | "a/b/b:0\n"
16 | ]
17 | }
18 | ],
19 | "source": [
20 | "import tensorflow as tf\n",
21 | "\n",
22 | "with tf.name_scope(\"a\"):\n",
23 | " with tf.name_scope(\"b\"):\n",
24 | " a = tf.Variable([1]) \n",
25 | " print a.name\n",
26 | " b = tf.Variable([1], name=\"b\")\n",
27 | " print b.name"
28 | ]
29 | },
30 | {
31 | "cell_type": "code",
32 | "execution_count": 2,
33 | "metadata": {
34 | "collapsed": false
35 | },
36 | "outputs": [
37 | {
38 | "name": "stdout",
39 | "output_type": "stream",
40 | "text": [
41 | "a_1/b/Variable:0\n",
42 | "a_1/b/b:0\n"
43 | ]
44 | }
45 | ],
46 | "source": [
47 | "with tf.variable_scope(\"a\"):\n",
48 | " with tf.variable_scope(\"b\"):\n",
49 | " a = tf.Variable([1]) \n",
50 | " print a.name\n",
51 | " b = tf.Variable([1], name=\"b\")\n",
52 | " print b.name"
53 | ]
54 | },
55 | {
56 | "cell_type": "code",
57 | "execution_count": 3,
58 | "metadata": {
59 | "collapsed": false
60 | },
61 | "outputs": [
62 | {
63 | "name": "stdout",
64 | "output_type": "stream",
65 | "text": [
66 | "a_2/b/Variable:0\n",
67 | "a_2/b/b:0\n",
68 | "a/b/b_1:0\n",
69 | "a/b/b_1:0\n"
70 | ]
71 | }
72 | ],
73 | "source": [
74 | "with tf.variable_scope(\"a\"):\n",
75 | " with tf.variable_scope(\"b\"):\n",
76 | " a = tf.Variable([1]) \n",
77 | " print a.name\n",
78 | " b = tf.Variable([1], name=\"b\")\n",
79 | " print b.name\n",
80 | " v = tf.get_variable(\"b\", [1])\n",
81 | " print v.name\n",
82 | " with tf.variable_scope(\"b\", reuse=True):\n",
83 | " v1 = tf.get_variable(\"b\", [1])\n",
84 | " print v1.name "
85 | ]
86 | },
87 | {
88 | "cell_type": "code",
89 | "execution_count": 4,
90 | "metadata": {
91 | "collapsed": false
92 | },
93 | "outputs": [
94 | {
95 | "name": "stdout",
96 | "output_type": "stream",
97 | "text": [
98 | "foo/bar/v:0\n",
99 | "foo/a/bar/Variable:0\n",
100 | "foo/a/bar/Const:0\n"
101 | ]
102 | }
103 | ],
104 | "source": [
105 | "with tf.variable_scope(\"foo\"):\n",
106 | "    with tf.name_scope(\"a\"):\n",
107 | "        with tf.variable_scope(\"bar\"):\n",
108 | "            v = tf.get_variable(\"v\", [1])\n",
109 | "            print v.name\n",
110 | " \n",
111 | "            v1 = tf.Variable([1]) \n",
112 | "            print v1.name\n",
113 | "\n",
114 | "            c = tf.constant(10.0)\n",
115 | "            print c.name"
116 | ]
117 | }
118 | ],
119 | "metadata": {
120 | "kernelspec": {
121 | "display_name": "Python 2",
122 | "language": "python",
123 | "name": "python2"
124 | },
125 | "language_info": {
126 | "codemirror_mode": {
127 | "name": "ipython",
128 | "version": 2
129 | },
130 | "file_extension": ".py",
131 | "mimetype": "text/x-python",
132 | "name": "python",
133 | "nbconvert_exporter": "python",
134 | "pygments_lexer": "ipython2",
135 | "version": "2.7.6"
136 | }
137 | },
138 | "nbformat": 4,
139 | "nbformat_minor": 0
140 | }
141 |
--------------------------------------------------------------------------------
/picture/create_local.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/create_local.png
--------------------------------------------------------------------------------
/picture/create_terminal.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/create_terminal.png
--------------------------------------------------------------------------------
/picture/dist_creation.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/dist_creation.png
--------------------------------------------------------------------------------
/picture/expanded_view.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/expanded_view.png
--------------------------------------------------------------------------------
/picture/homepage.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/homepage.png
--------------------------------------------------------------------------------
/picture/jupyter.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/jupyter.png
--------------------------------------------------------------------------------
/picture/list_view.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/list_view.png
--------------------------------------------------------------------------------
/picture/terminal_view.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/caicloud/tensorflow-demo/87460710b5fc9c829de369013c0008a9322b6c4f/picture/terminal_view.png
--------------------------------------------------------------------------------
/run_tf.sh:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env bash
2 |
3 | cd /
4 | # run file server.
5 | python /file_server.py &
6 |
7 | cd notebooks
8 | # run tensorboard
9 | tensorboard --logdir=/log &
10 |
11 | # run jupyter
12 | bash /run_jupyter.sh
13 |
--------------------------------------------------------------------------------
/word2vector/BUILD:
--------------------------------------------------------------------------------
1 | # Description:
2 | # TensorFlow model for word2vec
3 |
4 | package(default_visibility = ["//tensorflow:internal"])
5 |
6 | licenses(["notice"]) # Apache 2.0
7 |
8 | exports_files(["LICENSE"])
9 |
10 | load("//tensorflow:tensorflow.bzl", "tf_gen_op_wrapper_py")
11 |
12 | py_library(
13 | name = "package",
14 | srcs = [
15 | "__init__.py",
16 | ],
17 | srcs_version = "PY2AND3",
18 | visibility = ["//tensorflow:__subpackages__"],
19 | deps = [
20 | ":gen_word2vec",
21 | ":word2vec",
22 | ":word2vec_optimized",
23 | ],
24 | )
25 |
26 | py_binary(
27 | name = "word2vec",
28 | srcs = [
29 | "word2vec.py",
30 | ],
31 | srcs_version = "PY2AND3",
32 | deps = [
33 | ":gen_word2vec",
34 | "//tensorflow:tensorflow_py",
35 | "//tensorflow/python:platform",
36 | ],
37 | )
38 |
39 | py_binary(
40 | name = "word2vec_optimized",
41 | srcs = [
42 | "word2vec_optimized.py",
43 | ],
44 | srcs_version = "PY2AND3",
45 | deps = [
46 | ":gen_word2vec",
47 | "//tensorflow:tensorflow_py",
48 | "//tensorflow/python:platform",
49 | ],
50 | )
51 |
52 | cc_library(
53 | name = "word2vec_ops",
54 | srcs = [
55 | "word2vec_ops.cc",
56 | ],
57 | linkstatic = 1,
58 | visibility = ["//tensorflow:internal"],
59 | deps = [
60 | "//tensorflow/core:framework",
61 | ],
62 | alwayslink = 1,
63 | )
64 |
65 | cc_library(
66 | name = "word2vec_kernels",
67 | srcs = [
68 | "word2vec_kernels.cc",
69 | ],
70 | linkstatic = 1,
71 | visibility = ["//tensorflow:internal"],
72 | deps = [
73 | ":word2vec_ops",
74 | "//tensorflow/core",
75 | ],
76 | alwayslink = 1,
77 | )
78 |
79 | tf_gen_op_wrapper_py(
80 | name = "gen_word2vec",
81 | out = "gen_word2vec.py",
82 | deps = [":word2vec_ops"],
83 | )
84 |
85 | filegroup(
86 | name = "all_files",
87 | srcs = glob(
88 | ["**/*"],
89 | exclude = [
90 | "**/METADATA",
91 | "**/OWNERS",
92 | ],
93 | ),
94 | visibility = ["//tensorflow:__subpackages__"],
95 | )
96 |
--------------------------------------------------------------------------------
/word2vector/README.md:
--------------------------------------------------------------------------------
1 | # Vector Representations of Words
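2 | ## Background
3 | Traditional natural language processing typically uses the bag-of-words model, which treats each word as a discrete symbol. For example, "cat" is represented by Id123 and "kitty" by Id456. The biggest drawback of this representation is that it ignores the semantic relationships between words. The bag-of-words model also produces overly sparse feature matrices. Representing each word as a vector (word to vector, embedding) mitigates both problems to some extent. The background, methods and applications of Word2Vec are covered in detail in [this paper](http://papers.nips.cc/paper/5021-distributed-representations-of-words-and-phrases-and-their-compositionality.pdf), so we do not repeat them here. Below we describe how to run the Word2Vec algorithm on TensorFlow, followed by a small application of Word2Vec.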
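
To make the contrast concrete, here is a minimal sketch comparing symbol-style (one-hot) vectors with dense vectors under cosine similarity. The three-word vocabulary and the dense values are invented for illustration and do not come from a trained model.

```python
import math

def one_hot(index, size):
    """Return a symbol-style vector: all zeros except position `index`."""
    return [1.0 if i == index else 0.0 for i in range(size)]

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

# Hypothetical vocabulary: the ids are arbitrary symbols.
vocab = {"cat": 0, "kitty": 1, "car": 2}
cat_id = one_hot(vocab["cat"], len(vocab))
kitty_id = one_hot(vocab["kitty"], len(vocab))

# Hand-picked dense vectors (assumed values, for illustration only).
dense = {"cat": [0.9, 0.1], "kitty": [0.8, 0.2], "car": [0.1, 0.9]}

print(cosine(cat_id, kitty_id))              # distinct one-hot vectors are orthogonal
print(cosine(dense["cat"], dense["kitty"]))  # related words can sit close together
print(cosine(dense["cat"], dense["car"]))    # unrelated words can sit far apart
```

With one-hot vectors every pair of distinct words looks equally unrelated, while dense vectors leave room for "cat" and "kitty" to be near each other.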
4 |
5 | ## Basic Word2Vec
6 | ```
7 | python word2vec_basic.py
8 | ```
9 |
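10 | ## Faster I/O version
11 | If you have already run the basic Word2Vec, the training data has already been downloaded. Otherwise, download it with the following command: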
12 | ```
13 | wget http://mattmahoney.net/dc/text8.zip
14 | ```
15 |
16 | Unzip the training data:
17 | ```
18 | unzip text8.zip
19 | ```
20 |
21 | Run the training program:
22 | ```
23 | python word2vec.py --train_data=text8 --eval_data=questions-words.txt --save_path=/tmp
24 | ```
25 |
26 | On a single machine, this program may take more than ten hours to run.
27 |
28 | ## Faster training version
29 | If the data is not prepared yet, download it as described above. Once the data is ready, run:
30 | ```
31 | python word2vec_optimized.py --train_data=text8 --eval_data=questions-words.txt --save_path=/tmp
32 | ```
33 | Compared with the model above, this approach is roughly 15-20 times faster.
34 |
35 |
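36 | ## Word addition and subtraction
37 | #### Using the vectors trained above
38 | None of the programs above write out the final vector learned for each word. To use their results, you need to output the vector for each word in the following format: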
39 | ```
40 | word1 vector1
41 | word2 vector2
42 | ...
43 | wordn vectorn
44 | ```
45 |
46 | The words are stored in ```self._options.vocab_words```, and the embedding for each word is in ```self._emb``` (word2vec.py) or ```self._w_in``` (word2vec_optimized.py).
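
As a rough sketch of the export step, the helper below writes one `word vector` line per vocabulary entry. The function name `write_embeddings`, the output path, and the stand-in data are assumptions for illustration. With a trained model, `words` would be the vocabulary list and `vectors` the evaluated embedding matrix (for example, the result of running the embedding variable in the session), and the words may need decoding if they are stored as bytes.

```python
def write_embeddings(words, vectors, path):
    """Write one line per word: the word followed by its vector components."""
    with open(path, "w") as f:
        for word, vector in zip(words, vectors):
            components = " ".join("%.6f" % x for x in vector)
            f.write("%s %s\n" % (word, components))

# Stand-in data (hypothetical), so the sketch runs without a trained model.
words = ["UNK", "the", "of"]
vectors = [[0.0, 0.5], [0.25, -1.0], [1.5, 2.0]]
write_embeddings(words, vectors, "embeddings.txt")
```

Each line holds the word and its components separated by single spaces, the same general layout used by the pre-trained GloVe text files.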
47 |
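48 | #### Using pre-trained vectors
49 | Many pre-trained word-vector models are available online. For example, [GloVe](http://nlp.stanford.edu/projects/glove/) from the Stanford NLP group provides a number of models, which can be downloaded directly with the following commands: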
50 | ```
51 | wget http://nlp.stanford.edu/data/glove.6B.zip
52 | unzip glove.6B.zip
53 | ```
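
Once vectors are available in the `word v1 v2 ... vn` text format, word addition and subtraction can be sketched as below. This is a minimal illustration and not the implementation in `words_calculator_server.py`: the function names `load_vectors` and `calculate` are hypothetical, the toy vectors are made up so the example runs without the download, and a linear scan like this would be slow on a full vocabulary.

```python
import io
import math

def load_vectors(path):
    """Parse lines of the form `word v1 v2 ... vn` into a dict of lists."""
    vectors = {}
    with io.open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split(" ")
            vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors

def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def calculate(vectors, a, b, c):
    """Return the word whose vector is closest to a - b + c, excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vectors[a], vectors[b], vectors[c])]
    candidates = (w for w in vectors if w not in (a, b, c))
    return max(candidates, key=lambda w: cosine(vectors[w], target))

# Toy vectors (made up) standing in for a real embedding file.
toy = {
    "man": [1.0, 0.0, 0.0],
    "woman": [0.0, 1.0, 0.0],
    "king": [1.0, 0.0, 1.0],
    "queen": [0.0, 1.0, 1.0],
    "apple": [1.0, 1.0, 0.0],
}
print(calculate(toy, "king", "man", "woman"))
```

With real data, `toy` would be replaced by the result of `load_vectors` on one of the unzipped text files.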
54 |
55 | #### Running the word calculator
56 |
57 |
--------------------------------------------------------------------------------
/word2vector/__init__.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """Import generated word2vec optimized ops into embedding package."""
17 | from __future__ import absolute_import
18 | from __future__ import division
19 | from __future__ import print_function
20 |
21 | from tensorflow.models.embedding import gen_word2vec
22 |
--------------------------------------------------------------------------------
/word2vector/index.html:
--------------------------------------------------------------------------------
1 |
2 |
3 |
4 |
24 |
25 |
26 | - + =
27 |
28 |
29 |
30 |
31 |
32 |
--------------------------------------------------------------------------------
/word2vector/word2vec_basic.py:
--------------------------------------------------------------------------------
1 | import collections
2 | import math
3 | import os
4 | import random
5 | import zipfile
6 |
7 | import numpy as np
8 | from six.moves import urllib
9 | from six.moves import xrange
10 |
11 | # Step1: prepare data.
12 | url = 'http://mattmahoney.net/dc/'
13 |
14 | def download(filename):
15 |     print('Starting downloading data ...')
16 |     filename, _ = urllib.request.urlretrieve(url + filename, filename)
17 |     print('Data downloaded!')
18 |     return filename
19 |
20 | def maybe_download(filename, expected_bytes):
21 |     """Download a file if not present or size is incorrect."""
22 |     if not os.path.exists(filename):
23 |         filename = download(filename)
24 |     else:
25 |         if os.stat(filename).st_size != expected_bytes:
26 |             os.remove(filename)
27 |             filename = download(filename)
28 |
29 |     statinfo = os.stat(filename)
30 |     if statinfo.st_size == expected_bytes:
31 |         print('Found and verified', filename)
32 |     else:
33 |         print(statinfo.st_size)
34 |         raise Exception('Failed to verify ' + filename + '. Can you get to it with a browser?')
35 |     return filename
36 |
37 | filename = maybe_download('text8.zip', 31344016)
38 | print("Data prepared!")
39 |
40 | # Step2: split words.
41 | def read_data(filename):
42 |     """Extract the first file enclosed in a zip file as a list of words"""
43 |     with zipfile.ZipFile(filename) as f:
44 |         data = f.read(f.namelist()[0]).split()
45 |     return data
46 |
47 | words = read_data(filename)
48 | print('Data size', len(words))
49 |
50 | # Step3: Build dictionary.
51 | vocabulary_size = 50000
52 | def build_dataset(words):
53 |     count = [['UNK', -1]]
54 |     count.extend(collections.Counter(words).most_common(vocabulary_size - 1))
55 |     dictionary = dict()
56 |     for word, _ in count:
57 |         dictionary[word] = len(dictionary)
58 |     data = list()
59 |     unk_count = 0
60 |     for word in words:
61 |         if word in dictionary:
62 |             index = dictionary[word]
63 |         else:
64 |             index = 0  # dictionary['UNK']
65 |             unk_count += 1
66 |         data.append(index)
67 |     count[0][1] = unk_count
68 |     reverse_dictionary = dict(zip(dictionary.values(), dictionary.keys()))
69 |     return data, count, dictionary, reverse_dictionary
70 |
71 | data, count, dictionary, reverse_dictionary = build_dataset(words)
72 | del words # Hint to reduce memory.
73 | print('Most common words (+UNK)', count[:5])
74 | print('Sample data', data[:10])
75 |
76 | # Step4: function to generate training batch
77 | data_index = 0
78 |
79 | def generate_batch(batch_size, num_skips, skip_window):
80 |     global data_index
81 |     assert batch_size % num_skips == 0
82 |     assert num_skips <= 2 * skip_window
83 |     batch = np.ndarray(shape=(batch_size), dtype=np.int32)
84 |     labels = np.ndarray(shape=(batch_size, 1), dtype=np.int32)
85 |     span = 2 * skip_window + 1  # [ skip_window target skip_window ]
86 |     buffer = collections.deque(maxlen=span)
87 |     for _ in range(span):
88 |         buffer.append(data[data_index])
89 |         data_index = (data_index + 1) % len(data)
90 |     for i in range(batch_size // num_skips):
91 |         target = skip_window  # target label at the center of the buffer
92 |         targets_to_avoid = [skip_window]
93 |         for j in range(num_skips):
94 |             while target in targets_to_avoid:
95 |                 target = random.randint(0, span - 1)
96 |             targets_to_avoid.append(target)
97 |             batch[i * num_skips + j] = buffer[skip_window]
98 |             labels[i * num_skips + j, 0] = buffer[target]
99 |         buffer.append(data[data_index])
100 |         data_index = (data_index + 1) % len(data)
101 |     return batch, labels
102 |
103 | print(generate_batch(4, 2, 1))
104 | print(generate_batch(6, 3, 2))
105 |
106 |
107 | # Step5: Build graph
108 | import tensorflow as tf
109 |
110 | # Training input data.
111 | batch_size = 128
112 | train_inputs = tf.placeholder(tf.int32, shape=[batch_size])
113 | train_labels = tf.placeholder(tf.int32, shape=[batch_size, 1])
114 |
115 | # Validation input data.
116 | valid_size = 8 # Random set of words to evaluate similarity on.
117 | valid_window = 100 # Only pick dev samples in the head of the distribution.
118 | valid_examples = np.random.choice(valid_window, valid_size, replace=False)
119 | valid_dataset = tf.constant(valid_examples, dtype=tf.int32)
120 |
121 | # Look up embeddings for inputs.
122 | embedding_size = 128 # Dimension of the embedding vector.
123 | embeddings = tf.Variable(tf.random_uniform([vocabulary_size, embedding_size], -1.0, 1.0))
124 | embed = tf.nn.embedding_lookup(embeddings, train_inputs)
125 |
126 | # Construct the variables for the NCE loss
127 | nce_weights = tf.Variable(
128 | tf.truncated_normal([vocabulary_size, embedding_size],
129 | stddev=1.0 / math.sqrt(embedding_size)))
130 | nce_biases = tf.Variable(tf.zeros([vocabulary_size]))
131 |
132 | # Compute the average NCE loss for the batch.
133 | # tf.nn.nce_loss automatically draws a new sample of the negative labels each
134 | # time we evaluate the loss.
135 | num_sampled = 64 # Number of negative examples to sample.
136 | loss = tf.reduce_mean(tf.nn.nce_loss(
137 | nce_weights, nce_biases, embed, train_labels, num_sampled, vocabulary_size))
138 |
139 | # Construct the SGD optimizer using a learning rate of 1.0.
140 | optimizer = tf.train.GradientDescentOptimizer(1.0).minimize(loss)
141 |
142 | # Compute the cosine similarity between minibatch examples and all embeddings.
143 | norm = tf.sqrt(tf.reduce_sum(tf.square(embeddings), 1, keep_dims=True))
144 | normalized_embeddings = embeddings / norm
145 | valid_embeddings = tf.nn.embedding_lookup(normalized_embeddings, valid_dataset)
146 | similarity = tf.matmul(valid_embeddings, normalized_embeddings, transpose_b=True)
147 |
148 | sess = tf.InteractiveSession()
149 | tf.initialize_all_variables().run()
150 |
151 | # Output evaluation result
152 | def print_eval_result():
153 |     sim = similarity.eval()
154 |     for i in xrange(valid_size):
155 |         valid_word = reverse_dictionary[valid_examples[i]]
156 |         top_k = 8  # number of nearest neighbors
157 |         nearest = (-sim[i, :]).argsort()[1:top_k + 1]
158 |         log_str = "Nearest to %s:" % valid_word
159 |         for k in xrange(top_k):
160 |             close_word = reverse_dictionary[nearest[k]]
161 |             log_str = "%s %s," % (log_str, close_word)
162 |         print(log_str)
163 |
164 | print_eval_result()
165 |
166 | # Step6: train model.
167 | num_steps = 100001
168 | average_loss = 0
169 | skip_window = 1 # How many words to consider left and right.
170 | num_skips = 2 # How many times to reuse an input to generate a label.
171 | for step in xrange(num_steps):
172 |     batch_inputs, batch_labels = generate_batch(batch_size, num_skips, skip_window)
173 |     feed_dict = {train_inputs: batch_inputs, train_labels: batch_labels}
174 |
175 |     # We perform one update step by evaluating the optimizer op (including it
176 |     # in the list of returned values for session.run()).
177 |     _, loss_val = sess.run([optimizer, loss], feed_dict=feed_dict)
178 |     average_loss += loss_val
179 |
180 |     if step % 5000 == 0:
181 |         if step > 0: average_loss /= 5000
182 |         # The average loss is an estimate of the loss over the last 5000 batches.
183 |         print("Average loss at step ", step, ": ", average_loss)
184 |         average_loss = 0
185 |
186 |     if step % 20000 == 0:
187 |         print("Validation result at step ", step)
188 |         print_eval_result()
189 |
190 | final_embeddings = normalized_embeddings.eval()
191 | sess.close()
192 |
193 | # Step 7: Visualize the embeddings.
194 | def plot_with_labels(low_dim_embs, labels, filename='tsne.png'):
195 |     print(low_dim_embs.shape)
196 |     assert low_dim_embs.shape[0] >= len(labels), "More labels than embeddings"
197 |     plt.figure(figsize=(18, 18))  # in inches
198 |     for i, label in enumerate(labels):
199 |         x, y = low_dim_embs[i, :]
200 |         plt.scatter(x, y)
201 |         plt.annotate(label,
202 |                      xy=(x, y),
203 |                      xytext=(5, 2),
204 |                      textcoords='offset points',
205 |                      ha='right',
206 |                      va='bottom')
207 |     plt.savefig(filename)  # save first; after show() returns the figure may already be closed
208 |     plt.show()
209 |
210 | try:
211 |     from sklearn.manifold import TSNE
212 |     import matplotlib.pyplot as plt
213 |
214 |     tsne = TSNE(perplexity=30, n_components=2, init='pca', n_iter=5000)
215 |     plot_only = 500
216 |     low_dim_embs = tsne.fit_transform(final_embeddings[:plot_only, :])
217 |     labels = [reverse_dictionary[i] for i in xrange(plot_only)]
218 |     plot_with_labels(low_dim_embs, labels)
219 |
220 | except ImportError:
221 |     print("Please install sklearn and matplotlib to visualize embeddings.")
222 |
--------------------------------------------------------------------------------
/word2vector/word2vec_kernels.cc:
--------------------------------------------------------------------------------
1 | /* Copyright 2015 Google Inc. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | #include "tensorflow/core/framework/op.h"
17 | #include "tensorflow/core/framework/op_kernel.h"
18 | #include "tensorflow/core/lib/core/stringpiece.h"
19 | #include "tensorflow/core/lib/gtl/map_util.h"
20 | #include "tensorflow/core/lib/random/distribution_sampler.h"
21 | #include "tensorflow/core/lib/random/philox_random.h"
22 | #include "tensorflow/core/lib/random/simple_philox.h"
23 | #include "tensorflow/core/lib/strings/str_util.h"
24 | #include "tensorflow/core/platform/thread_annotations.h"
25 | #include "tensorflow/core/util/guarded_philox_random.h"
26 |
27 | namespace tensorflow {
28 |
29 | // Number of examples to precalculate.
30 | const int kPrecalc = 3000;
31 | // Number of words to read into a sentence before processing.
32 | const int kSentenceSize = 1000;
33 |
34 | namespace {
35 |
36 | bool ScanWord(StringPiece* input, string* word) {
37 | str_util::RemoveLeadingWhitespace(input);
38 | StringPiece tmp;
39 | if (str_util::ConsumeNonWhitespace(input, &tmp)) {
40 | word->assign(tmp.data(), tmp.size());
41 | return true;
42 | } else {
43 | return false;
44 | }
45 | }
46 |
47 | } // end namespace
48 |
49 | class SkipgramOp : public OpKernel {
50 | public:
51 | explicit SkipgramOp(OpKernelConstruction* ctx)
52 | : OpKernel(ctx), rng_(&philox_) {
53 | string filename;
54 | OP_REQUIRES_OK(ctx, ctx->GetAttr("filename", &filename));
55 | OP_REQUIRES_OK(ctx, ctx->GetAttr("batch_size", &batch_size_));
56 | OP_REQUIRES_OK(ctx, ctx->GetAttr("window_size", &window_size_));
57 | OP_REQUIRES_OK(ctx, ctx->GetAttr("min_count", &min_count_));
58 | OP_REQUIRES_OK(ctx, ctx->GetAttr("subsample", &subsample_));
59 | OP_REQUIRES_OK(ctx, Init(ctx->env(), filename));
60 |
61 | mutex_lock l(mu_);
62 | example_pos_ = corpus_size_;
63 | label_pos_ = corpus_size_;
64 | label_limit_ = corpus_size_;
65 | sentence_index_ = kSentenceSize;
66 | for (int i = 0; i < kPrecalc; ++i) {
67 | NextExample(&precalc_examples_[i].input, &precalc_examples_[i].label);
68 | }
69 | }
70 |
71 | void Compute(OpKernelContext* ctx) override {
72 | Tensor words_per_epoch(DT_INT64, TensorShape({}));
73 | Tensor current_epoch(DT_INT32, TensorShape({}));
74 | Tensor total_words_processed(DT_INT64, TensorShape({}));
75 | Tensor examples(DT_INT32, TensorShape({batch_size_}));
76 | auto Texamples = examples.flat<int32>();
77 | Tensor labels(DT_INT32, TensorShape({batch_size_}));
78 | auto Tlabels = labels.flat<int32>();
79 | {
80 | mutex_lock l(mu_);
81 | for (int i = 0; i < batch_size_; ++i) {
82 | Texamples(i) = precalc_examples_[precalc_index_].input;
83 | Tlabels(i) = precalc_examples_[precalc_index_].label;
84 | precalc_index_++;
85 | if (precalc_index_ >= kPrecalc) {
86 | precalc_index_ = 0;
87 | for (int j = 0; j < kPrecalc; ++j) {
88 | NextExample(&precalc_examples_[j].input,
89 | &precalc_examples_[j].label);
90 | }
91 | }
92 | }
93 | words_per_epoch.scalar<int64>()() = corpus_size_;
94 | current_epoch.scalar<int32>()() = current_epoch_;
95 | total_words_processed.scalar<int64>()() = total_words_processed_;
96 | }
97 | ctx->set_output(0, word_);
98 | ctx->set_output(1, freq_);
99 | ctx->set_output(2, words_per_epoch);
100 | ctx->set_output(3, current_epoch);
101 | ctx->set_output(4, total_words_processed);
102 | ctx->set_output(5, examples);
103 | ctx->set_output(6, labels);
104 | }
105 |
106 | private:
107 | struct Example {
108 | int32 input;
109 | int32 label;
110 | };
111 |
112 | int32 batch_size_ = 0;
113 | int32 window_size_ = 5;
114 | float subsample_ = 1e-3;
115 | int min_count_ = 5;
116 | int32 vocab_size_ = 0;
117 | Tensor word_;
118 | Tensor freq_;
119 | int32 corpus_size_ = 0;
120 | std::vector<int32> corpus_;
121 | std::vector<Example> precalc_examples_;
122 | int precalc_index_ = 0;
123 | std::vector<int32> sentence_;
124 | int sentence_index_ = 0;
125 |
126 | mutex mu_;
127 | random::PhiloxRandom philox_ GUARDED_BY(mu_);
128 | random::SimplePhilox rng_ GUARDED_BY(mu_);
129 | int32 current_epoch_ GUARDED_BY(mu_) = -1;
130 | int64 total_words_processed_ GUARDED_BY(mu_) = 0;
131 | int32 example_pos_ GUARDED_BY(mu_);
132 | int32 label_pos_ GUARDED_BY(mu_);
133 | int32 label_limit_ GUARDED_BY(mu_);
134 |
135 | // {example_pos_, label_pos_} is the cursor for the next example.
136 | // example_pos_ wraps around at the end of corpus_. For each
137 | // example, we randomly generate [label_pos_, label_limit) for
138 | // labels.
139 | void NextExample(int32* example, int32* label) EXCLUSIVE_LOCKS_REQUIRED(mu_) {
140 | while (true) {
141 | if (label_pos_ >= label_limit_) {
142 | ++total_words_processed_;
143 | ++sentence_index_;
144 | if (sentence_index_ >= kSentenceSize) {
145 | sentence_index_ = 0;
146 | for (int i = 0; i < kSentenceSize; ++i, ++example_pos_) {
147 | if (example_pos_ >= corpus_size_) {
148 | ++current_epoch_;
149 | example_pos_ = 0;
150 | }
151 | if (subsample_ > 0) {
152 | int32 word_freq = freq_.flat<int32>()(corpus_[example_pos_]);
153 | // See Eq. 5 in http://arxiv.org/abs/1310.4546
154 | float keep_prob =
155 | (std::sqrt(word_freq / (subsample_ * corpus_size_)) + 1) *
156 | (subsample_ * corpus_size_) / word_freq;
157 | if (rng_.RandFloat() > keep_prob) {
158 | i--;
159 | continue;
160 | }
161 | }
162 | sentence_[i] = corpus_[example_pos_];
163 | }
164 | }
165 | const int32 skip = 1 + rng_.Uniform(window_size_);
166 | label_pos_ = std::max<int32>(0, sentence_index_ - skip);
167 | label_limit_ =
168 | std::min<int32>(kSentenceSize, sentence_index_ + skip + 1);
169 | }
170 | if (sentence_index_ != label_pos_) {
171 | break;
172 | }
173 | ++label_pos_;
174 | }
175 | *example = sentence_[sentence_index_];
176 | *label = sentence_[label_pos_++];
177 | }
178 |
179 | Status Init(Env* env, const string& filename) {
180 | string data;
181 | TF_RETURN_IF_ERROR(ReadFileToString(env, filename, &data));
182 | StringPiece input = data;
183 | string w;
184 | corpus_size_ = 0;
185 | std::unordered_map<string, int32> word_freq;
186 | while (ScanWord(&input, &w)) {
187 | ++(word_freq[w]);
188 | ++corpus_size_;
189 | }
190 | if (corpus_size_ < window_size_ * 10) {
191 | return errors::InvalidArgument("The text file ", filename,
192 | " contains too little data: ",
193 | corpus_size_, " words");
194 | }
195 | typedef std::pair<string, int32> WordFreq;
196 | std::vector<WordFreq> ordered;
197 | for (const auto& p : word_freq) {
198 | if (p.second >= min_count_) ordered.push_back(p);
199 | }
200 | LOG(INFO) << "Data file: " << filename << " contains " << data.size()
201 | << " bytes, " << corpus_size_ << " words, " << word_freq.size()
202 | << " unique words, " << ordered.size()
203 | << " unique frequent words.";
204 | word_freq.clear();
205 | std::sort(ordered.begin(), ordered.end(),
206 | [](const WordFreq& x, const WordFreq& y) {
207 | return x.second > y.second;
208 | });
209 | vocab_size_ = static_cast<int32>(1 + ordered.size());
210 | Tensor word(DT_STRING, TensorShape({vocab_size_}));
211 | Tensor freq(DT_INT32, TensorShape({vocab_size_}));
212 | word.flat<string>()(0) = "UNK";
213 | static const int32 kUnkId = 0;
214 | std::unordered_map<string, int32> word_id;
215 | int64 total_counted = 0;
216 | for (std::size_t i = 0; i < ordered.size(); ++i) {
217 | const auto& w = ordered[i].first;
218 | auto id = i + 1;
219 | word.flat<string>()(id) = w;
220 | auto word_count = ordered[i].second;
221 | freq.flat<int32>()(id) = word_count;
222 | total_counted += word_count;
223 | word_id[w] = id;
224 | }
225 | freq.flat<int32>()(kUnkId) = corpus_size_ - total_counted;
226 | word_ = word;
227 | freq_ = freq;
228 | corpus_.reserve(corpus_size_);
229 | input = data;
230 | while (ScanWord(&input, &w)) {
231 | corpus_.push_back(gtl::FindWithDefault(word_id, w, kUnkId));
232 | }
233 | precalc_examples_.resize(kPrecalc);
234 | sentence_.resize(kSentenceSize);
235 | return Status::OK();
236 | }
237 | };
238 |
239 | REGISTER_KERNEL_BUILDER(Name("Skipgram").Device(DEVICE_CPU), SkipgramOp);
240 |
241 | class NegTrainOp : public OpKernel {
242 | public:
243 | explicit NegTrainOp(OpKernelConstruction* ctx) : OpKernel(ctx) {
244 | base_.Init(0, 0);
245 |
246 | OP_REQUIRES_OK(ctx, ctx->GetAttr("num_negative_samples", &num_samples_));
247 |
248 | std::vector<int32> vocab_count;
249 | OP_REQUIRES_OK(ctx, ctx->GetAttr("vocab_count", &vocab_count));
250 |
251 | std::vector<float> vocab_weights;
252 | vocab_weights.reserve(vocab_count.size());
253 | for (const auto& f : vocab_count) {
254 | float r = std::pow(static_cast<float>(f), 0.75f);
255 | vocab_weights.push_back(r);
256 | }
257 | sampler_ = new random::DistributionSampler(vocab_weights);
258 | }
259 |
260 | ~NegTrainOp() { delete sampler_; }
261 |
262 | void Compute(OpKernelContext* ctx) override {
263 | Tensor w_in = ctx->mutable_input(0, false);
264 | OP_REQUIRES(ctx, TensorShapeUtils::IsMatrix(w_in.shape()),
265 | errors::InvalidArgument("Must be a matrix"));
266 | Tensor w_out = ctx->mutable_input(1, false);
267 | OP_REQUIRES(ctx, w_in.shape() == w_out.shape(),
268 | errors::InvalidArgument("w_in.shape == w_out.shape"));
269 | const Tensor& examples = ctx->input(2);
270 | OP_REQUIRES(ctx, TensorShapeUtils::IsVector(examples.shape()),
271 | errors::InvalidArgument("Must be a vector"));
272 | const Tensor& labels = ctx->input(3);
273 | OP_REQUIRES(ctx, examples.shape() == labels.shape(),
274 | errors::InvalidArgument("examples.shape == labels.shape"));
275 | const Tensor& learning_rate = ctx->input(4);
276 | OP_REQUIRES(ctx, TensorShapeUtils::IsScalar(learning_rate.shape()),
277 | errors::InvalidArgument("Must be a scalar"));
278 |
279 | auto Tw_in = w_in.matrix<float>();
280 | auto Tw_out = w_out.matrix<float>();
281 | auto Texamples = examples.flat<int32>();
282 | auto Tlabels = labels.flat<int32>();
283 | auto lr = learning_rate.scalar<float>()();
284 | const int64 vocab_size = w_in.dim_size(0);
285 | const int64 dims = w_in.dim_size(1);
286 | const int64 batch_size = examples.dim_size(0);
287 | OP_REQUIRES(ctx, vocab_size == sampler_->num(),
288 | errors::InvalidArgument("vocab_size mismatches: ", vocab_size,
289 | " vs. ", sampler_->num()));
290 |
291 | // Gradient accumulator for v_in.
292 | Tensor buf(DT_FLOAT, TensorShape({dims}));
293 | auto Tbuf = buf.flat<float>();
294 |
295 | // Scalar buffer to hold sigmoid(+/- dot).
296 | Tensor g_buf(DT_FLOAT, TensorShape({}));
297 | auto g = g_buf.scalar<float>();
298 |
299 | // The following loop needs 2 random 32-bit values per negative
300 | // sample. We reserve 8 values per sample just in case the
301 | // underlying implementation changes.
302 | auto rnd = base_.ReserveSamples32(batch_size * num_samples_ * 8);
303 | random::SimplePhilox srnd(&rnd);
304 |
305 | for (int64 i = 0; i < batch_size; ++i) {
306 | const int32 example = Texamples(i);
307 | DCHECK(0 <= example && example < vocab_size) << example;
308 | const int32 label = Tlabels(i);
309 | DCHECK(0 <= label && label < vocab_size) << label;
310 | auto v_in = Tw_in.chip<0>(example);
311 |
312 | // Positive: example predicts label.
313 | // forward: x = v_in' * v_out
314 | // l = log(sigmoid(x))
315 | // backward: dl/dx = g = sigmoid(-x)
316 | // dl/d(v_in) = g * v_out'
317 | // dl/d(v_out) = v_in' * g
318 | {
319 | auto v_out = Tw_out.chip<0>(label);
320 | auto dot = (v_in * v_out).sum();
321 | g = (dot.exp() + 1.f).inverse();
322 | Tbuf = v_out * (g() * lr);
323 | v_out += v_in * (g() * lr);
324 | }
325 |
326 | // Negative samples:
327 | // forward: x = v_in' * v_sample
328 | // l = log(sigmoid(-x))
329 | // backward: dl/dx = g = -sigmoid(x)
330 | // dl/d(v_in) = g * v_out'
331 | // dl/d(v_out) = v_in' * g
332 | for (int j = 0; j < num_samples_; ++j) {
333 | const int sample = sampler_->Sample(&srnd);
334 | if (sample == label) continue; // Skip.
335 | auto v_sample = Tw_out.chip<0>(sample);
336 | auto dot = (v_in * v_sample).sum();
337 | g = -((-dot).exp() + 1.f).inverse();
338 | Tbuf += v_sample * (g() * lr);
339 | v_sample += v_in * (g() * lr);
340 | }
341 |
342 | // Applies the gradient on v_in.
343 | v_in += Tbuf;
344 | }
345 | }
346 |
347 | private:
348 | int32 num_samples_ = 0;
349 | random::DistributionSampler* sampler_ = nullptr;
350 | GuardedPhiloxRandom base_;
351 | };
352 |
353 | REGISTER_KERNEL_BUILDER(Name("NegTrain").Device(DEVICE_CPU), NegTrainOp);
354 |
355 | } // end namespace tensorflow
356 |
--------------------------------------------------------------------------------
/word2vector/word2vec_ops.cc:
--------------------------------------------------------------------------------
1 | /* Copyright 2015 Google Inc. All Rights Reserved.
2 |
3 | Licensed under the Apache License, Version 2.0 (the "License");
4 | you may not use this file except in compliance with the License.
5 | You may obtain a copy of the License at
6 |
7 | http://www.apache.org/licenses/LICENSE-2.0
8 |
9 | Unless required by applicable law or agreed to in writing, software
10 | distributed under the License is distributed on an "AS IS" BASIS,
11 | WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | See the License for the specific language governing permissions and
13 | limitations under the License.
14 | ==============================================================================*/
15 |
16 | #include "tensorflow/core/framework/op.h"
17 |
18 | namespace tensorflow {
19 |
20 | REGISTER_OP("Skipgram")
21 | .Output("vocab_word: string")
22 | .Output("vocab_freq: int32")
23 | .Output("words_per_epoch: int64")
24 | .Output("current_epoch: int32")
25 | .Output("total_words_processed: int64")
26 | .Output("examples: int32")
27 | .Output("labels: int32")
28 | .SetIsStateful()
29 | .Attr("filename: string")
30 | .Attr("batch_size: int")
31 | .Attr("window_size: int = 5")
32 | .Attr("min_count: int = 5")
33 | .Attr("subsample: float = 1e-3")
34 | .Doc(R"doc(
35 | Parses a text file and creates a batch of examples.
36 |
37 | vocab_word: A vector of words in the corpus.
38 | vocab_freq: Frequencies of words, sorted in non-ascending order.
39 | words_per_epoch: Number of words per epoch in the data file.
40 | current_epoch: The current epoch number.
41 | total_words_processed: The total number of words processed so far.
42 | examples: A vector of word ids.
43 | labels: A vector of word ids.
44 | filename: The corpus's text file name.
45 | batch_size: The size of produced batch.
46 | window_size: The number of words to predict to the left and right of the target.
47 | min_count: The minimum number of word occurrences for it to be included in the
48 | vocabulary.
49 | subsample: Threshold for word occurrence. Words that appear with higher
50 | frequency will be randomly down-sampled. Set to 0 to disable.
51 | )doc");
52 |
53 | REGISTER_OP("NegTrain")
54 | .Input("w_in: Ref(float)")
55 | .Input("w_out: Ref(float)")
56 | .Input("examples: int32")
57 | .Input("labels: int32")
58 | .Input("lr: float")
59 | .SetIsStateful()
60 | .Attr("vocab_count: list(int)")
61 | .Attr("num_negative_samples: int")
62 | .Doc(R"doc(
63 | Training via negative sampling.
64 |
65 | w_in: input word embedding.
66 | w_out: output word embedding.
67 | examples: A vector of word ids.
68 | labels: A vector of word ids.
69 | vocab_count: Count of words in the vocabulary.
70 | num_negative_samples: Number of negative samples per example.
71 | )doc");
72 |
73 | } // end namespace tensorflow
74 |
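The `subsample` attribute of `Skipgram` is documented only as a threshold. As a rough illustration, the sketch below assumes the keep-probability formula used by the original word2vec C tool; the kernel behind `Skipgram` may differ, and `keep_probability` and its parameters are hypothetical names.

```python
import math


def keep_probability(word_count, corpus_size, subsample):
    """Probability of keeping one occurrence of a word.

    Assumed formula: (sqrt(f / T) + 1) * T / f, where f is the word count
    and T = subsample * corpus_size. This simplifies to
    sqrt(T / f) + T / f, capped at 1.
    """
    if subsample <= 0:
        return 1.0  # Subsampling disabled, matching "Set to 0 to disable."
    ratio = (subsample * corpus_size) / word_count
    return min(1.0, math.sqrt(ratio) + ratio)
```

Under this assumption, with the default threshold of 1e-3 a word that makes up 10% of the corpus is kept about 11% of the time, while rare words are always kept.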
--------------------------------------------------------------------------------
/word2vector/word2vec_optimized.py:
--------------------------------------------------------------------------------
1 | # Copyright 2015 Google Inc. All Rights Reserved.
2 | #
3 | # Licensed under the Apache License, Version 2.0 (the "License");
4 | # you may not use this file except in compliance with the License.
5 | # You may obtain a copy of the License at
6 | #
7 | # http://www.apache.org/licenses/LICENSE-2.0
8 | #
9 | # Unless required by applicable law or agreed to in writing, software
10 | # distributed under the License is distributed on an "AS IS" BASIS,
11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
12 | # See the License for the specific language governing permissions and
13 | # limitations under the License.
14 | # ==============================================================================
15 |
16 | """Multi-threaded word2vec unbatched skip-gram model.
17 |
18 | Trains the model described in:
19 | (Mikolov et al.) Efficient Estimation of Word Representations in Vector Space
20 | ICLR 2013.
21 | http://arxiv.org/abs/1301.3781
22 | This model does true SGD (i.e. no minibatching). To do this efficiently, custom
23 | ops are used to sequentially process data within a 'batch'.
24 |
25 | The key ops used are:
26 | * skipgram custom op that does input processing.
27 | * neg_train custom op that efficiently calculates and applies the gradient using
28 | true SGD.
29 | """
30 | from __future__ import absolute_import
31 | from __future__ import division
32 | from __future__ import print_function
33 |
34 | import os
35 | import sys
36 | import threading
37 | import time
38 |
39 | from six.moves import xrange # pylint: disable=redefined-builtin
40 |
41 | import numpy as np
42 | import tensorflow as tf
43 |
44 | from tensorflow.models.embedding import gen_word2vec as word2vec
45 |
46 | flags = tf.app.flags
47 |
48 | flags.DEFINE_string("save_path", None, "Directory to write the model.")
49 | flags.DEFINE_string(
50 | "train_data", None,
51 | "Training data. E.g., unzipped file http://mattmahoney.net/dc/text8.zip.")
52 | flags.DEFINE_string(
53 | "eval_data", None, "Analogy questions. "
54 | "https://word2vec.googlecode.com/svn/trunk/questions-words.txt.")
55 | flags.DEFINE_integer("embedding_size", 200, "The embedding dimension size.")
56 | flags.DEFINE_integer(
57 | "epochs_to_train", 15,
58 | "Number of epochs to train. Each epoch processes the training data once "
59 | "completely.")
60 | flags.DEFINE_float("learning_rate", 0.025, "Initial learning rate.")
61 | flags.DEFINE_integer("num_neg_samples", 25,
62 | "Negative samples per training example.")
63 | flags.DEFINE_integer("batch_size", 500,
64 |                      "Number of training examples each step processes "
65 | "(no minibatching).")
66 | flags.DEFINE_integer("concurrent_steps", 12,
67 | "The number of concurrent training steps.")
68 | flags.DEFINE_integer("window_size", 5,
69 | "The number of words to predict to the left and right "
70 | "of the target word.")
71 | flags.DEFINE_integer("min_count", 5,
72 | "The minimum number of word occurrences for it to be "
73 | "included in the vocabulary.")
74 | flags.DEFINE_float("subsample", 1e-3,
75 | "Subsample threshold for word occurrence. Words that appear "
76 | "with higher frequency will be randomly down-sampled. Set "
77 | "to 0 to disable.")
78 | flags.DEFINE_boolean(
79 | "interactive", False,
80 | "If true, enters an IPython interactive session to play with the trained "
81 | "model. E.g., try model.analogy('france', 'paris', 'russia') and "
82 | "model.nearby(['proton', 'elephant', 'maxwell'])")
83 |
84 | FLAGS = flags.FLAGS
85 |
86 |
87 | class Options(object):
88 | """Options used by our word2vec model."""
89 |
90 | def __init__(self):
91 | # Model options.
92 |
93 | # Embedding dimension.
94 | self.emb_dim = FLAGS.embedding_size
95 |
96 | # Training options.
97 |
98 | # The training text file.
99 | self.train_data = FLAGS.train_data
100 |
101 | # Number of negative samples per example.
102 | self.num_samples = FLAGS.num_neg_samples
103 |
104 | # The initial learning rate.
105 | self.learning_rate = FLAGS.learning_rate
106 |
107 |     # Number of epochs to train. The learning rate decays linearly toward
108 |     # zero over these epochs, after which training stops.
109 | self.epochs_to_train = FLAGS.epochs_to_train
110 |
111 | # Concurrent training steps.
112 | self.concurrent_steps = FLAGS.concurrent_steps
113 |
114 | # Number of examples for one training step.
115 | self.batch_size = FLAGS.batch_size
116 |
117 | # The number of words to predict to the left and right of the target word.
118 | self.window_size = FLAGS.window_size
119 |
120 | # The minimum number of word occurrences for it to be included in the
121 | # vocabulary.
122 | self.min_count = FLAGS.min_count
123 |
124 | # Subsampling threshold for word occurrence.
125 | self.subsample = FLAGS.subsample
126 |
127 | # Where to write out summaries.
128 | self.save_path = FLAGS.save_path
129 |
130 | # Eval options.
131 |
132 | # The text file for eval.
133 | self.eval_data = FLAGS.eval_data
134 |
135 |
136 | class Word2Vec(object):
137 | """Word2Vec model (Skipgram)."""
138 |
139 | def __init__(self, options, session):
140 | self._options = options
141 | self._session = session
142 | self._word2id = {}
143 | self._id2word = []
144 | self.build_graph()
145 | self.build_eval_graph()
146 | self.save_vocab()
147 | self._read_analogies()
148 |
149 | def _read_analogies(self):
150 | """Reads through the analogy question file.
151 |
152 |     Sets:
153 |       self._analogy_questions: a [n, 4] numpy array containing the analogy
154 |         questions' word ids. Questions with unknown words are skipped and
155 |         counted in the printed summary.
156 | """
157 | questions = []
158 | questions_skipped = 0
159 | with open(self._options.eval_data, "rb") as analogy_f:
160 | for line in analogy_f:
161 |         if line.startswith(b":"):  # Skip section header lines.
162 | continue
163 | words = line.strip().lower().split(b" ")
164 | ids = [self._word2id.get(w.strip()) for w in words]
165 | if None in ids or len(ids) != 4:
166 | questions_skipped += 1
167 | else:
168 | questions.append(np.array(ids))
169 | print("Eval analogy file: ", self._options.eval_data)
170 | print("Questions: ", len(questions))
171 | print("Skipped: ", questions_skipped)
172 | self._analogy_questions = np.array(questions, dtype=np.int32)
173 |
174 | def build_graph(self):
175 | """Build the model graph."""
176 | opts = self._options
177 |
178 | # The training data. A text file.
179 | (words, counts, words_per_epoch, current_epoch, total_words_processed,
180 | examples, labels) = word2vec.skipgram(filename=opts.train_data,
181 | batch_size=opts.batch_size,
182 | window_size=opts.window_size,
183 | min_count=opts.min_count,
184 | subsample=opts.subsample)
185 | (opts.vocab_words, opts.vocab_counts,
186 | opts.words_per_epoch) = self._session.run([words, counts, words_per_epoch])
187 | opts.vocab_size = len(opts.vocab_words)
188 | print("Data file: ", opts.train_data)
189 | print("Vocab size: ", opts.vocab_size - 1, " + UNK")
190 | print("Words per epoch: ", opts.words_per_epoch)
191 |
192 | self._id2word = opts.vocab_words
193 | for i, w in enumerate(self._id2word):
194 | self._word2id[w] = i
195 |
196 | # Declare all variables we need.
197 | # Input words embedding: [vocab_size, emb_dim]
198 | w_in = tf.Variable(
199 | tf.random_uniform(
200 | [opts.vocab_size,
201 | opts.emb_dim], -0.5 / opts.emb_dim, 0.5 / opts.emb_dim),
202 | name="w_in")
203 |
204 |     # Output words embedding: [vocab_size, emb_dim]
205 | w_out = tf.Variable(tf.zeros([opts.vocab_size, opts.emb_dim]), name="w_out")
206 |
207 | # Global step: []
208 | global_step = tf.Variable(0, name="global_step")
209 |
210 | # Linear learning rate decay.
211 | words_to_train = float(opts.words_per_epoch * opts.epochs_to_train)
212 | lr = opts.learning_rate * tf.maximum(
213 | 0.0001,
214 | 1.0 - tf.cast(total_words_processed, tf.float32) / words_to_train)
215 |
216 | # Training nodes.
217 | inc = global_step.assign_add(1)
218 | with tf.control_dependencies([inc]):
219 | train = word2vec.neg_train(w_in,
220 | w_out,
221 | examples,
222 | labels,
223 | lr,
224 | vocab_count=opts.vocab_counts.tolist(),
225 | num_negative_samples=opts.num_samples)
226 |
227 | self._w_in = w_in
228 | self._examples = examples
229 | self._labels = labels
230 | self._lr = lr
231 | self._train = train
232 | self.step = global_step
233 | self._epoch = current_epoch
234 | self._words = total_words_processed
235 |
236 | def save_vocab(self):
237 | """Save the vocabulary to a file so the model can be reloaded."""
238 | opts = self._options
239 | with open(os.path.join(opts.save_path, "vocab.txt"), "w") as f:
240 | for i in xrange(opts.vocab_size):
241 | f.write("%s %d\n" % (tf.compat.as_text(opts.vocab_words[i]),
242 | opts.vocab_counts[i]))
243 |
244 | def build_eval_graph(self):
245 | """Build the evaluation graph."""
246 | # Eval graph
247 | opts = self._options
248 |
249 | # Each analogy task is to predict the 4th word (d) given three
250 | # words: a, b, c. E.g., a=italy, b=rome, c=france, we should
251 | # predict d=paris.
252 |
253 | # The eval feeds three vectors of word ids for a, b, c, each of
254 | # which is of size N, where N is the number of analogies we want to
255 | # evaluate in one batch.
256 | analogy_a = tf.placeholder(dtype=tf.int32) # [N]
257 | analogy_b = tf.placeholder(dtype=tf.int32) # [N]
258 | analogy_c = tf.placeholder(dtype=tf.int32) # [N]
259 |
260 | # Normalized word embeddings of shape [vocab_size, emb_dim].
261 | nemb = tf.nn.l2_normalize(self._w_in, 1)
262 |
263 | # Each row of a_emb, b_emb, c_emb is a word's embedding vector.
264 | # They all have the shape [N, emb_dim]
265 | a_emb = tf.gather(nemb, analogy_a) # a's embs
266 | b_emb = tf.gather(nemb, analogy_b) # b's embs
267 | c_emb = tf.gather(nemb, analogy_c) # c's embs
268 |
269 | # We expect that d's embedding vectors on the unit hyper-sphere is
270 | # near: c_emb + (b_emb - a_emb), which has the shape [N, emb_dim].
271 | target = c_emb + (b_emb - a_emb)
272 |
273 |     # Score every vocab word against each target. nemb rows are unit length,
274 |     # so the ranking matches cosine similarity. dist has shape [N, vocab_size].
275 | dist = tf.matmul(target, nemb, transpose_b=True)
276 |
277 | # For each question (row in dist), find the top 4 words.
278 | _, pred_idx = tf.nn.top_k(dist, 4)
279 |
280 |     # Nodes for computing neighbors for a given word according to
281 |     # their cosine similarity.
282 | nearby_word = tf.placeholder(dtype=tf.int32) # word id
283 | nearby_emb = tf.gather(nemb, nearby_word)
284 | nearby_dist = tf.matmul(nearby_emb, nemb, transpose_b=True)
285 | nearby_val, nearby_idx = tf.nn.top_k(nearby_dist,
286 | min(1000, opts.vocab_size))
287 |
288 | # Nodes in the construct graph which are used by training and
289 | # evaluation to run/feed/fetch.
290 | self._analogy_a = analogy_a
291 | self._analogy_b = analogy_b
292 | self._analogy_c = analogy_c
293 | self._analogy_pred_idx = pred_idx
294 | self._nearby_word = nearby_word
295 | self._nearby_val = nearby_val
296 | self._nearby_idx = nearby_idx
297 |
298 | # Properly initialize all variables.
299 | tf.initialize_all_variables().run()
300 |
301 | self.saver = tf.train.Saver()
302 |
303 | def _train_thread_body(self):
304 | initial_epoch, = self._session.run([self._epoch])
305 | while True:
306 | _, epoch = self._session.run([self._train, self._epoch])
307 | if epoch != initial_epoch:
308 | break
309 |
310 | def train(self):
311 | """Train the model."""
312 | opts = self._options
313 |
314 | initial_epoch, initial_words = self._session.run([self._epoch, self._words])
315 |
316 | workers = []
317 | for _ in xrange(opts.concurrent_steps):
318 | t = threading.Thread(target=self._train_thread_body)
319 | t.start()
320 | workers.append(t)
321 |
322 | last_words, last_time = initial_words, time.time()
323 | while True:
324 |       time.sleep(5)  # Report progress every 5 seconds.
325 | (epoch, step, words,
326 | lr) = self._session.run([self._epoch, self.step, self._words, self._lr])
327 | now = time.time()
328 | last_words, last_time, rate = words, now, (words - last_words) / (
329 | now - last_time)
330 | print("Epoch %4d Step %8d: lr = %5.3f words/sec = %8.0f\r" % (epoch, step,
331 | lr, rate),
332 | end="")
333 | sys.stdout.flush()
334 | if epoch != initial_epoch:
335 | break
336 |
337 | for t in workers:
338 | t.join()
339 |
340 | def _predict(self, analogy):
341 | """Predict the top 4 answers for analogy questions."""
342 | idx, = self._session.run([self._analogy_pred_idx], {
343 | self._analogy_a: analogy[:, 0],
344 | self._analogy_b: analogy[:, 1],
345 | self._analogy_c: analogy[:, 2]
346 | })
347 | return idx
348 |
349 | def eval(self):
350 | """Evaluate analogy questions and reports accuracy."""
351 |
352 | # How many questions we get right at precision@1.
353 | correct = 0
354 |
355 | total = self._analogy_questions.shape[0]
356 | start = 0
357 | while start < total:
358 | limit = start + 2500
359 | sub = self._analogy_questions[start:limit, :]
360 | idx = self._predict(sub)
361 | start = limit
362 | for question in xrange(sub.shape[0]):
363 | for j in xrange(4):
364 | if idx[question, j] == sub[question, 3]:
365 | # Bingo! We predicted correctly. E.g., [italy, rome, france, paris].
366 | correct += 1
367 | break
368 | elif idx[question, j] in sub[question, :3]:
369 | # We need to skip words already in the question.
370 | continue
371 | else:
372 |             # Any other word means the top prediction is wrong.
373 | break
374 | print()
375 | print("Eval %4d/%d accuracy = %4.1f%%" % (correct, total,
376 | correct * 100.0 / total))
377 |
378 | def analogy(self, w0, w1, w2):
379 | """Predict word w3 as in w0:w1 vs w2:w3."""
380 | wid = np.array([[self._word2id.get(w, 0) for w in [w0, w1, w2]]])
381 | idx = self._predict(wid)
382 | for c in [self._id2word[i] for i in idx[0, :]]:
383 | if c not in [w0, w1, w2]:
384 | return c
385 | return "unknown"
386 |
387 | def nearby(self, words, num=20):
388 | """Prints out nearby words given a list of words."""
389 | ids = np.array([self._word2id.get(x, 0) for x in words])
390 | vals, idx = self._session.run(
391 | [self._nearby_val, self._nearby_idx], {self._nearby_word: ids})
392 | for i in xrange(len(words)):
393 | print("\n%s\n=====================================" % (words[i]))
394 | for (neighbor, distance) in zip(idx[i, :num], vals[i, :num]):
395 | print("%-20s %6.4f" % (self._id2word[neighbor], distance))
396 |
397 |
398 | def _start_shell(local_ns=None):
399 | # An interactive shell is useful for debugging/development.
400 | import IPython
401 | user_ns = {}
402 | if local_ns:
403 | user_ns.update(local_ns)
404 | user_ns.update(globals())
405 | IPython.start_ipython(argv=[], user_ns=user_ns)
406 |
407 |
408 | def main(_):
409 | """Train a word2vec model."""
410 | if not FLAGS.train_data or not FLAGS.eval_data or not FLAGS.save_path:
411 | print("--train_data --eval_data and --save_path must be specified.")
412 | sys.exit(1)
413 | opts = Options()
414 | with tf.Graph().as_default(), tf.Session() as session:
415 | with tf.device("/cpu:0"):
416 | model = Word2Vec(opts, session)
417 | for _ in xrange(opts.epochs_to_train):
418 | model.train() # Process one epoch
419 | model.eval() # Eval analogies.
420 | # Perform a final save.
421 | model.saver.save(session, os.path.join(opts.save_path, "model.ckpt"),
422 | global_step=model.step)
423 | if FLAGS.interactive:
424 | # E.g.,
425 |       # [0]: model.analogy('france', 'paris', 'russia')
426 |       # [1]: model.nearby(['proton', 'elephant', 'maxwell'])
427 | _start_shell(locals())
428 |
429 |
430 | if __name__ == "__main__":
431 | tf.app.run()
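The linear learning-rate decay built in `build_graph` can be checked outside the graph with plain Python. This is a small sketch of the same arithmetic; `decayed_learning_rate` is a hypothetical helper, not part of the model.

```python
def decayed_learning_rate(initial_lr, words_processed, words_per_epoch,
                          epochs_to_train):
    """Linear decay with a floor of 0.0001 * initial_lr."""
    words_to_train = float(words_per_epoch * epochs_to_train)
    remaining = 1.0 - words_processed / words_to_train
    return initial_lr * max(0.0001, remaining)
```

With the default flags (`learning_rate=0.025`, `epochs_to_train=15`), the rate is half its initial value midway through training and stays at the small positive floor once all words have been processed.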
432 |
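The analogy evaluation in `build_eval_graph` and `analogy` can be reproduced for a single question in NumPy. The sketch below is an illustration under stated assumptions: `predict_analogy` is a hypothetical name, `emb` stands in for the trained `w_in` matrix, and every row is assumed to be non-zero so that normalization is defined.

```python
import numpy as np


def predict_analogy(emb, a, b, c, top_k=4):
    """Return the id of d in "a is to b as c is to d", or None."""
    # Normalize each row to unit length, like tf.nn.l2_normalize(w_in, 1).
    nemb = emb / np.linalg.norm(emb, axis=1, keepdims=True)
    # Expected location of d: c + (b - a).
    target = nemb[c] + (nemb[b] - nemb[a])
    # Dot products against unit rows rank words by cosine similarity.
    scores = nemb @ target
    ranked = np.argsort(-scores)[:top_k]
    # Skip the question words themselves, as analogy() does.
    for idx in ranked:
        if idx not in (a, b, c):
            return int(idx)
    return None
```

Taking the top 4 candidates is enough because at most three of them can be the question words.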
--------------------------------------------------------------------------------
/word2vector/words_calculator_server.py:
--------------------------------------------------------------------------------
1 | import sys
2 | import BaseHTTPServer
3 | import numpy
4 | from sklearn.metrics.pairwise import cosine_similarity
5 | from urlparse import urlparse
6 |
7 | v = []
8 | reverse = {}
9 | dic = {}
10 |
11 | def isword(s):
12 | for i in range(len(s)):
13 | if s[i] <= 'z' and s[i] >= 'a':
14 | return True
15 | return False
16 |
17 | class SimpleHTTPRequestHandler(BaseHTTPServer.BaseHTTPRequestHandler):
18 | def do_GET(self):
19 | global v
20 | global reverse
21 | global dic
22 |
23 | query = urlparse(self.path).query
24 | query_components = dict(qc.split("=") for qc in query.split("&"))
25 | a = query_components["a"]
26 | b = query_components["b"]
27 | c = query_components["c"]
28 |
29 | print "Get request with: ", a, b, c
30 |         # Report the first missing word and stop. Continuing past this point
31 |         # would raise a KeyError in the lookup below.
32 |         for word in (a, b, c):
33 |             if word not in dic:
34 |                 self.send_response(200)
35 |                 self.send_header('Content-type', 'text/html')
36 |                 self.end_headers()
37 |                 self.wfile.write("Word " + word + " is not in the dictionary")
38 |                 return
45 |
46 | cur = v[dic[a]] - v[dic[b]] + v[dic[c]]
47 | sim = cosine_similarity([cur], v)[0]
48 | x = numpy.argsort(-sim)
49 | content = "\n"
50 | i = 0
51 | j = 0
52 | while j < 5:
53 | if reverse[x[i]] != a and reverse[x[i]] != c:
54 | if j == 0:
55 | content += " " + reverse[x[i]] + " "
56 | else:
57 | content += " " + reverse[x[i]] + " \n"
58 | j += 1
59 | i += 1
60 | content += " "
61 | print "Response: " + content
62 |
63 | self.send_response(200)
64 | self.send_header('Content-type', 'text/html')
65 | self.end_headers()
66 | self.wfile.write(content)
67 |
68 | fname = sys.argv[1]
69 | with open(fname) as fp:
70 | print "Loading Word2Vec ..."
71 | i = 0
72 | for line in fp:
73 | paras = line.split(" ")
74 | if not isword(paras[0]): continue
75 | dic[paras[0]] = i
76 | reverse[i] = paras[0]
77 | a = numpy.asarray(map(float, paras[1:]))
78 | v.append(a / numpy.sqrt(sum(numpy.square(a))))
79 | i += 1
80 | print "Word2Vec dictionary loaded!"
81 |
82 | Handler = SimpleHTTPRequestHandler
83 | Server = BaseHTTPServer.HTTPServer
84 | Protocol = "HTTP/1.0"
85 | if sys.argv[2:]:
86 | port = int(sys.argv[2])
87 | else:
88 | port = 8000
89 | server_address = ('127.0.0.1', port)
90 | Handler.protocol_version = Protocol
91 | httpd = Server(server_address, Handler)
92 | print("Serving HTTP")
93 | httpd.serve_forever()
94 |
95 |
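The handler parses the query string with `dict(qc.split("=") ...)`, which raises on an empty query or on a value that itself contains `=`. A more tolerant variant can be sketched with the standard library. Note the assumptions: this uses Python 3's `urllib.parse`, whereas the server above targets Python 2, and `parse_analogy_query` is a hypothetical helper.

```python
from urllib.parse import parse_qs, urlparse


def parse_analogy_query(path):
    """Extract the a, b and c words from a request path.

    Returns a dict with keys "a", "b" and "c", or None when any of the
    three parameters is missing or empty.
    """
    params = parse_qs(urlparse(path).query)
    words = {}
    for key in ("a", "b", "c"):
        values = params.get(key)
        if not values:
            return None
        words[key] = values[0]  # Use the first value if a key repeats.
    return words
```

A caller could answer with an error page when the result is `None` and otherwise continue with the dictionary lookup.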
--------------------------------------------------------------------------------