├── deploy ├── leftnav_files ├── index.md ├── tfserve.md └── hadoop.md ├── images ├── getting_started_add.png ├── getting_started_adder.png ├── getting_started_final.png └── getting_started_triple.png ├── extend ├── leftnav_files ├── index.md ├── new_data_formats.md ├── tool_developers │ └── index.md ├── architecture.md └── add_filesys.md ├── install ├── leftnav_files ├── index.md ├── install_c.md ├── install_windows.md ├── install_go.md └── install_java.md ├── performance ├── leftnav_files ├── index.md ├── xla │ ├── developing_new_backend.md │ ├── index.md │ ├── shapes.md │ ├── jit.md │ └── broadcasting.md └── performance_guide.md ├── get_started ├── leftnav_files ├── index.md └── summaries_and_tensorboard.md ├── tutorials ├── leftnav_files ├── index.md ├── mandelbrot.md ├── pdes.md ├── using_gpu.md ├── recurrent.md └── linear.md ├── programmers_guide ├── leftnav_files ├── index.md ├── dims_types.md ├── tfdbg-tflearn.md ├── data_versions.md ├── threading_and_queues.md ├── version_semantics.md ├── variables.md └── meta_graph.md ├── .gitignore ├── SUMMARY.md ├── README.md └── book.json /deploy/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | distributed.md 3 | tfserve.md 4 | hadoop.md 5 | -------------------------------------------------------------------------------- /images/getting_started_add.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_add.png -------------------------------------------------------------------------------- /images/getting_started_adder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_adder.png -------------------------------------------------------------------------------- /images/getting_started_final.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_final.png -------------------------------------------------------------------------------- /images/getting_started_triple.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_triple.png -------------------------------------------------------------------------------- /extend/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | architecture.md 3 | adding_an_op.md 4 | add_filesys.md 5 | new_data_formats.md 6 | estimators.md 7 | language_bindings.md 8 | tool_developers/index.md 9 | -------------------------------------------------------------------------------- /install/leftnav_files: -------------------------------------------------------------------------------- 1 | install_linux.md 2 | install_mac.md 3 | install_windows.md 4 | install_sources.md 5 | >>> 6 | migration.md 7 | >>> 8 | install_java.md 9 | install_go.md 10 | install_c.md 11 | -------------------------------------------------------------------------------- /performance/leftnav_files: -------------------------------------------------------------------------------- 1 | performance_guide.md 2 | xla/index.md 3 | xla/broadcasting.md 4 | xla/developing_new_backend.md 5 | xla/jit.md 6 | xla/operation_semantics.md 7 | xla/shapes.md 
8 | xla/tfcompile.md 9 | quantization.md 10 | -------------------------------------------------------------------------------- /get_started/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | get_started.md 3 | mnist/beginners.md 4 | mnist/pros.md 5 | mnist/mechanics.md 6 | tflearn.md 7 | input_fn.md 8 | monitors.md 9 | summaries_and_tensorboard.md 10 | embedding_viz.md 11 | graph_viz.md 12 | -------------------------------------------------------------------------------- /tutorials/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | using_gpu.md 3 | image_recognition.md 4 | image_retraining.md 5 | layers.md 6 | deep_cnn.md 7 | word2vec.md 8 | recurrent.md 9 | seq2seq.md 10 | linear.md 11 | wide.md 12 | wide_and_deep.md 13 | mandelbrot.md 14 | pdes.md 15 | -------------------------------------------------------------------------------- /programmers_guide/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | variables.md 3 | dims_types.md 4 | variable_scope.md 5 | threading_and_queues.md 6 | reading_data.md 7 | supervisor.md 8 | debugger.md 9 | tfdbg-tflearn.md 10 | meta_graph.md 11 | version_semantics.md 12 | data_versions.md 13 | faq.md 14 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Node rules: 2 | ## Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 3 | .grunt 4 | 5 | ## Dependency directory 6 | ## Commenting this out is preferred by some people, see 7 | ## https://docs.npmjs.com/misc/faq#should-i-check-my-node_modules-folder-into-git 8 | node_modules 9 | 10 | # Book build output 11 | _book 12 | 13 | # eBook build output 14 | *.epub 15 | *.mobi 16 | *.pdf -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | * [介绍](README.md) 4 | * [安装](install/index.md) 5 | * [在 Ubuntu 上安装 TensorFlow](install/install_linux.md) 6 | * [在 Windows 上安装 TensorFlow](install/install_windows.md) 7 | * [开发](get_started/index.md) 8 | * [开始](get_started/index.md) 9 | * [初识 TensorFlow](get_started/get_started.md) 10 | * [Programmers' Guide](programmers_guide/index.md) 11 | * [教程](tutorials/index.md) 12 | * [性能](performance/index.md) 13 | * [部署](deploy/index.md) 14 | * [延伸](extend/index.md) 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /install/index.md: -------------------------------------------------------------------------------- 1 | # 安装TensorFlow 2 | 3 | 如下指南描述了如何安装TensorFlow的不同版本。 4 | 5 | * [在 Ubuntu 上安装 TensorFlow](./install_linux.md) 6 | * [在 Mac OS X 上安装 TensorFlow](./install_mac.md) 7 | * [在 Windows 上安装 TensorFlow](./install_windows.md) 8 | * [从源码安装 TensorFlow](./install_sources.md) 9 | 10 | Python TensorFlow API 自版本 0.n 到 1.0 变化花了很多。如下指南描述了如何从老旧 TensorFlow 应用迁移到1.0版本。 11 | 12 | [迁移到 TensorFlow 1.0](./migration.md) 13 | 14 | 如下指南描述了如何安装其他语言的TensorFlow库。这些API是为了在应用中使用TensorFlow模型,所以并不如Python API一样具有扩展性。 15 | 16 | * [为 Java 安装 TensorFlow](./install_java.md) 17 | * [为 C 安装 TensorFlow](./install_c.md) 18 | * [为 GO 安装 TensorFlow](./install_go.md) 19 | 20 | -------------------------------------------------------------------------------- /deploy/index.md: 
-------------------------------------------------------------------------------- 1 | # Deploy 2 | 3 | This section focuses on deploying real-world models. It contains 4 | the following documents: 5 | 6 | * @{$distributed$Distributed TensorFlow}, which explains how to create 7 | a cluster of TensorFlow servers. 8 | * @{$tfserve$TensorFlow Serving}, which describes TensorFlow Serving--an 9 | open-source serving system for machine learning models. This document 10 | provides a short introduction to TensorFlow Serving; the bulk of the 11 | documentation about TensorFlow Serving is in a 12 | [separate website](https://tensorflow.github.io/serving/serving_basic). 13 | * @{$hadoop$How to run TensorFlow on Hadoop}, which has a highly 14 | self-explanatory title. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 说明 2 | 3 | [TensorFlow](https://www.tensorflow.org/) 正式版本 V1.0 已经发布,api 较之前 V0.xx 版本发生了较大变化,Tutorial、HowTo 等文档也发生了很大变化。新的文档更加合理,对TensorFlow甚至机器学习的新手更加友好,更适合循序渐进的学习。 4 | 5 | 网络上流传较广的TensorFlow中文文档大多为 [TensorFlow中文社区](http://tensorfly.cn/) 的文档,翻译自 V0.5。 6 | 7 | 在此,选取版本 r1.1 的文档进行翻译,r1.1与r1.0的文档内容区别不大,结构做了一些调整。 8 | 9 | --- 10 | 11 | **具体内容查看 **[**目录**](/SUMMARY.md)**。** 12 | 13 | --- 14 | 15 | 本项目同时更新于 GitBook 与 GitHub。 16 | 17 | GitHub 项目地址:[https://github.com/efeiefei/tensorflow\_documents\_zh/](https://github.com/efeiefei/tensorflow_documents_zh/) 18 | 19 | GitBook 阅读地址:[https://efeiefei.gitbooks.io/tensorflow\_documents\_zh/](https://efeiefei.gitbooks.io/tensorflow_documents_zh/) 20 | 21 | 欢迎联系,一起翻译! 22 | 23 | -------------------------------------------------------------------------------- /book.json: -------------------------------------------------------------------------------- 1 | { 2 | "author": "虞连飞", 3 | "description": "TensorFlow正式版中文文档", 4 | "extension": null, 5 | "generator": "site", 6 | "links": { 7 | "sharing": { 8 | "all": null, 9 | "facebook": null, 10 | "google": null, 11 | "twitter": null, 12 | "weibo": null 13 | }, 14 | "sidebar": { 15 | "Github": "https://github.com/efeiefei/" 16 | } 17 | }, 18 | "output": null, 19 | "pdf": { 20 | "fontSize": 12, 21 | "footerTemplate": null, 22 | "headerTemplate": null, 23 | "margin": { 24 | "bottom": 36, 25 | "left": 62, 26 | "right": 62, 27 | "top": 36 28 | }, 29 | "pageNumbers": false, 30 | "paperSize": "a4" 31 | }, 32 | "plugins": [], 33 | "title": "TensorFlow正式版中文文档", 34 | "variables": {} 35 | } 36 | -------------------------------------------------------------------------------- /get_started/index.md: -------------------------------------------------------------------------------- 1 | # 开始 2 | 3 | 查看如下指南可对 TensorFlow 程序有个简要的概览: 4 | 5 | * [初识 TensorFlow](./get_started.md) 6 | 7 | MINIST 是实验新的机器学习工具的经典数据集。我们提供了三篇指南,每篇介绍了一种不同的方法在 8 | TensorFlow 上训练 MNIST 模型。 9 | 10 | * [MNIST-机器学习初学者](./mnits/beginners.md),通过高级 API 介绍了 MNIST。 11 | * [MNIST-机器学习专家](./mnist/pro.md),比“MNIST-机器学习初学者”更加深入, 12 | 假设读者对机器学习的概念有所了解。 13 | * [TensorFlow 机制 101](./mnist/mechanics.md),通过底层 API 介绍了 MNIST。 14 | 15 | 对刚接触 TensorFlow 的开发者来说,可以从高级 API 开始。下面的指南可以帮助读者学习高级 API: 16 | 17 | * [tf.contrib.learn 快速开始](./tflearn.md),介绍该 API。 18 | * [通过 tf.contrib.learn 构建输入函数](./input_fn.md),让你对该 API 更深入的使用。 19 | * [通过 tf.contrib.learn 打日志并做监控](./monitors.md),解释了如何观察(audit)训练速度。 20 | 21 | TensorBoard 是可视化机器学习的各个方面的工具。如下指南描述了如何使用 TensorBoard。 22 | 23 | * [TensorBoard:可视化学习](./summaries_and_tensorboard.md),让你开始。 24 | * [TensorBoard:Embedding 
可视化](./embedding_viz.md),演示了如何查看高维数据并与之交互。 25 | * [TensorBoard:图可视化](./graph_viz.md),演示了如何可视化计算图。图可视化一般对使用底层 26 | API 的开发者更有用。 27 | 28 | -------------------------------------------------------------------------------- /deploy/tfserve.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Serving 2 | 3 | ## Introduction 4 | 5 | TensorFlow Serving is a flexible, high-performance serving system for machine 6 | learning models, designed for production environments. TensorFlow Serving 7 | makes it easy to deploy new algorithms and experiments, while keeping the same 8 | server architecture and APIs. 9 | 10 | ## Basic Serving Tutorial 11 | 12 | See the [basic tutorial](https://tensorflow.github.io/serving/serving_basic) 13 | on the TensorFlow Serving site to learn how to export a trained TensorFlow 14 | model and build a server to serve the exported model. 15 | 16 | ## Advanced Serving Tutorial 17 | 18 | See the 19 | [advanced tutorial](https://tensorflow.github.io/serving/serving_advanced) 20 | on the TensorFlow Serving site to learn how to build a server that 21 | dynamically discovers and serves new versions of a trained TensorFlow 22 | model. 23 | 24 | ## Serving Inception Model Tutorial 25 | 26 | See the 27 | [serving inception tutorial](https://tensorflow.github.io/serving/serving_inception) 28 | on the TensorFlow Serving site to learn how to serve the inception model with 29 | TensorFlow Serving and Kubernetes. 30 | 31 | -------------------------------------------------------------------------------- /extend/index.md: -------------------------------------------------------------------------------- 1 | # Extend 2 | 3 | This section explains how developers can add functionality to TensorFlow's 4 | capabilities. Begin by reading the following architectural overview: 5 | 6 | * @{$architecture$TensorFlow Architecture} 7 | 8 | The following guides explain how to extend particular aspects of 9 | TensorFlow: 10 | 11 | * @{$adding_an_op$Adding a New Op}, which explains how to create your own 12 | operations. 13 | * @{$add_filesys$Adding a Custom Filesystem Plugin}, which explains how to 14 | add support for your own shared or distributed filesystem. 15 | * @{$new_data_formats$Custom Data Readers}, which details how to add support 16 | for your own file and record formats. 17 | * @{$estimators$Creating Estimators in tf.contrib.learn}, which explains how 18 | to write your own custom Estimator. For example, you could build your 19 | own Estimator to implement some variation on standard linear regression. 20 | 21 | Python is currently the only language supported by TensorFlow's API stability 22 | promises. However, TensorFlow also provides functionality in C++, Java, and Go, 23 | plus community support for [Haskell](https://github.com/tensorflow/haskell) 24 | and [Rust](https://github.com/tensorflow/rust). 
If you'd like to create or 25 | develop TensorFlow features in a language other than these languages, read the 26 | following guide: 27 | 28 | * @{$language_bindings$TensorFlow in Other Languages} 29 | 30 | To create tools compatible with TensorFlow's model format, read the following 31 | guide: 32 | 33 | * @{$tool_developers$A Tool Developer's Guide to TensorFlow Model Files} 34 | 35 | 36 | -------------------------------------------------------------------------------- /performance/index.md: -------------------------------------------------------------------------------- 1 | # Performance 2 | 3 | Performance is often a significant issue when training a machine learning 4 | model. This section explains various ways to optimize performance. Start 5 | your investigation with the following guide: 6 | 7 | * @{$performance_guide$Performance}, which contains a collection of best 8 | practices for optimizing your TensorFlow code. 9 | 10 | XLA (Accelerated Linear Algebra) is an experimental compiler for linear 11 | algebra that optimizes TensorFlow computations. The following guides explore 12 | XLA: 13 | 14 | * @{$xla$XLA Overview}, which introduces XLA. 15 | * @{$broadcasting$Broadcasting Semantics}, which describes XLA's 16 | broadcasting semantics. 17 | * @{$developing_new_backend$Developing a new back end for XLA}, which 18 | explains how to re-target TensorFlow in order to optimize the performance 19 | of the computational graph for particular hardware. 20 | * @{$jit$Using JIT Compilation}, which describes the XLA JIT compiler that 21 | compiles and runs parts of TensorFlow graphs via XLA in order to optimize 22 | performance. 23 | * @{$operation_semantics$Operation Semantics}, which is a reference manual 24 | describing the semantics of operations in the `ComputationBuilder` 25 | interface. 26 | * @{$shapes$Shapes and Layout}, which details the `Shape` protocol buffer. 27 | * @{$tfcompile$Using AOT compilation}, which explains `tfcompile`, a 28 | standalone tool that compiles TensorFlow graphs into executable code in 29 | order to optimize performance. 30 | 31 | And finally, we offer the following guide: 32 | 33 | * @{$quantization$How to Quantize Neural Networks with TensorFlow}, which 34 | can explains how to use quantization to reduce model size, both in storage 35 | and at runtime. Quantization can improve performance, especially on 36 | mobile hardware. 37 | 38 | -------------------------------------------------------------------------------- /tutorials/index.md: -------------------------------------------------------------------------------- 1 | # Tutorials 2 | 3 | This section contains tutorials demonstrating how to do specific tasks 4 | in TensorFlow. If you are new to TensorFlow, we recommend reading the 5 | documents in the "Get Started" section before reading these tutorials. 6 | 7 | The following tutorial explains the interaction of CPUs and GPUs on a 8 | TensorFlow system: 9 | 10 | * @{$using_gpu$Using GPUs} 11 | 12 | The following tutorials cover different aspects of image recognition: 13 | 14 | * @{$image_recognition$Image Recognition}, which introduces the field of 15 | image recognition and a model (Inception) for recognizing images. 16 | * @{$image_retraining$How to Retrain Inception's Final Layer for New Categories}, 17 | which has a wonderfully self-explanatory title. 18 | * @{$layers$A Guide to TF Layers: Building a Convolutional Neural Network}, 19 | which introduces convolutional neural networks (CNNs) and demonstrates how 20 | to build a CNN in TensorFlow. 
21 | * @{$deep_cnn$Convolutional Neural Networks}, which demonstrates how to 22 | build a small CNN for recognizing images. This tutorial is aimed at 23 | advanced TensorFlow users. 24 | 25 | The following tutorials focus on machine learning problems in human language: 26 | 27 | * @{$word2vec$Vector Representations of Words}, which demonstrates how to 28 | create an embedding for words. 29 | * @{$recurrent$Recurrent Neural Networks}, which demonstrates how to use a 30 | recurrent neural network to predict the next word in a sentence. 31 | * @{$seq2seq$Sequence-to-Sequence Models}, which demonstrates how to use a 32 | sequence-to-sequence model to translate text from English to French. 33 | 34 | The following tutorials focus on linear models: 35 | 36 | * @{$linear$Large-Scale Linear Models with TensorFlow}, which introduces 37 | linear models and demonstrates how to build them with the high-level API. 38 | * @{$wide$TensorFlow Linear Model Tutorial}, which demonstrates how to solve 39 | a binary classification problem in TensorFlow. 40 | * @{$wide_and_deep$TensorFlow Wide & Deep Learning Tutorial}, which explains 41 | how to use the high-level API to jointly train both a wide linear model 42 | and a deep feed-forward neural network. 43 | 44 | Although TensorFlow specializes in machine learning, you may also use 45 | TensorFlow to solve other kinds of math problems. For example: 46 | 47 | * @{$mandelbrot$Mandelbrot Set} 48 | * @{$pdes$Partial Differential Equations} 49 | 50 | -------------------------------------------------------------------------------- /deploy/hadoop.md: -------------------------------------------------------------------------------- 1 | # How to run TensorFlow on Hadoop 2 | 3 | This document describes how to run TensorFlow on Hadoop. It will be expanded to 4 | describe running on various cluster managers, but only describes running on HDFS 5 | at the moment. 6 | 7 | ## HDFS 8 | 9 | We assume that you are familiar with @{$reading_data$reading data}. 10 | 11 | To use HDFS with TensorFlow, change the file paths you use to read and write 12 | data to an HDFS path. For example: 13 | 14 | ```python 15 | filename_queue = tf.train.string_input_producer([ 16 | "hdfs://namenode:8020/path/to/file1.csv", 17 | "hdfs://namenode:8020/path/to/file2.csv", 18 | ]) 19 | ``` 20 | 21 | If you want to use the namenode specified in your HDFS configuration files, then 22 | change the file prefix to `hdfs://default/`. 23 | 24 | When launching your TensorFlow program, the following environment variables must 25 | be set: 26 | 27 | * **JAVA_HOME**: The location of your Java installation. 28 | * **HADOOP_HDFS_HOME**: The location of your HDFS installation. You can also 29 | set this environment variable by running: 30 | 31 | ```shell 32 | source ${HADOOP_HOME}/libexec/hadoop-config.sh 33 | ``` 34 | 35 | * **LD_LIBRARY_PATH**: To include the path to libjvm.so, and optionally the path 36 | to libhdfs.so if your Hadoop distribution does not install libhdfs.so in 37 | `$HADOOP_HDFS_HOME/lib/native`. On Linux: 38 | 39 | ```shell 40 | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/amd64/server 41 | ``` 42 | 43 | * **CLASSPATH**: The Hadoop jars must be added prior to running your 44 | TensorFlow program. The CLASSPATH set by 45 | `${HADOOP_HOME}/libexec/hadoop-config.sh` is insufficient. 
Globs must be 46 | expanded as described in the libhdfs documentation: 47 | 48 | ```shell 49 | CLASSPATH=$($HADOOP_HDFS_HOME}/bin/hadoop classpath --glob) python your_script.py 50 | ``` 51 | For older version of Hadoop/libhdfs (older than 2.6.0), you have to expand the 52 | classpath wildcard manually. For more details, see 53 | [HADOOP-10903](https://issues.apache.org/jira/browse/HADOOP-10903). 54 | 55 | If the Hadoop cluster is in secure mode, the following environment variable must 56 | be set: 57 | 58 | * **KERB_TICKET_CACHE_PATH**: The path of Kerberos ticket cache file. For example: 59 | 60 | ```shell 61 | export KERB_TICKET_CACHE_PATH=/tmp/krb5cc_10002 62 | ``` 63 | 64 | If you are running @{$distributed$Distributed TensorFlow}, then all 65 | workers must have the environment variables set and Hadoop installed. 66 | -------------------------------------------------------------------------------- /programmers_guide/index.md: -------------------------------------------------------------------------------- 1 | # Programmer's Guide 2 | 3 | The documents in this unit dive into the details of writing TensorFlow 4 | code. This section begins with the following guides, each of which 5 | explain a particular aspect of TensorFlow: 6 | 7 | * @{$variables$Variables: Creation, Initialization, Saving, and Loading}, 8 | which details the mechanics of TensorFlow Variables. 9 | * @{$dims_types$Tensor Ranks, Shapes, and Types}, which explains Tensor 10 | rank (the number of dimensions), shape (the size of each dimension), 11 | and datatypes. 12 | * @{$variable_scope$Sharing Variables}, which explains how to share and 13 | manage large sets of variables when building complex models. 14 | * @{$threading_and_queues$Threading and Queues}, which explains TensorFlow's 15 | rich queuing system. 16 | * @{$reading_data$Reading Data}, which documents three different mechanisms 17 | for getting data into a TensorFlow program. 18 | 19 | The following guide is helpful when training a complex model over multiple 20 | days: 21 | 22 | * @{$supervisor$Supervisor: Training Helper for Days-Long Trainings}, which 23 | explains how to gracefully handle system crashes during a lengthy training 24 | session. 25 | 26 | TensorFlow provides a debugger named `tfdbg`, which is documented in the 27 | following two guides: 28 | 29 | * @{$debugger$TensorFlow Debugger (tfdbg) Command-Line-Interface Tutorial: MNIST}, 30 | which walks you through the use of `tfdbg` within an application written 31 | in the low-level TensorFlow API. 32 | * @{$tfdbg-tflearn$How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn}, 33 | which demonstrates how to use `tfdbg` within the Estimators API. 34 | 35 | A `MetaGraph` consists of both a computational graph and its associated 36 | metadata. A `MetaGraph` contains the information required to continue 37 | training, perform evaluation, or run inference on a previously 38 | trained graph. The following guide details `MetaGraph` objects: 39 | 40 | * @{$meta_graph$Exporting and Importing a MetaGraph}. 41 | 42 | To learn about the TensorFlow versioning scheme, consult the following two 43 | guides: 44 | 45 | * @{$version_semantics$TensorFlow Version Semantics}, which explains 46 | TensorFlow's versioning nomenclature and compatibility rules. 47 | * @{$data_versions$TensorFlow Data Versioning: GraphDefs and Checkpoints}, 48 | which explains how TensorFlow adds versioning information to computational 49 | graphs and checkpoints in order to support compatibility across versions. 
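As a quick illustration of that version metadata, the following minimal sketch prints the `GraphDef` version information TensorFlow attaches to every graph (the exact numbers depend on the TensorFlow build you have installed):

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1.0, name="c")

# Each graph carries GraphDef version metadata used for compatibility checks.
print(g.graph_def_versions)       # e.g. "producer: 21 ..." -- values vary by TF build
print(g.as_graph_def().versions)  # the same VersionDef field on the serialized GraphDef
```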
50 | 51 | We conclude this section with a FAQ about TensorFlow programming: 52 | 53 | * @{$faq$Frequently Asked Questions} 54 | -------------------------------------------------------------------------------- /programmers_guide/dims_types.md: -------------------------------------------------------------------------------- 1 | # Tensor Ranks, Shapes, and Types 2 | 3 | TensorFlow programs use a tensor data structure to represent all data. You can 4 | think of a TensorFlow tensor as an n-dimensional array or list. 5 | A tensor has a static type and dynamic dimensions. Only tensors may be passed 6 | between nodes in the computation graph. 7 | 8 | ## Rank 9 | 10 | In the TensorFlow system, tensors are described by a unit of dimensionality 11 | known as *rank*. Tensor rank is not the same as matrix rank. Tensor rank 12 | (sometimes referred to as *order* or *degree* or *n-dimension*) is the number 13 | of dimensions of the tensor. For example, the following tensor (defined as a 14 | Python list) has a rank of 2: 15 | 16 | t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] 17 | 18 | A rank two tensor is what we typically think of as a matrix, a rank one tensor 19 | is a vector. For a rank two tensor you can access any element with the syntax 20 | `t[i, j]`. For a rank three tensor you would need to address an element with 21 | `t[i, j, k]`. 22 | 23 | Rank | Math entity | Python example 24 | --- | --- | --- 25 | 0 | Scalar (magnitude only) | `s = 483` 26 | 1 | Vector (magnitude and direction) | `v = [1.1, 2.2, 3.3]` 27 | 2 | Matrix (table of numbers) | `m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` 28 | 3 | 3-Tensor (cube of numbers) | `t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]` 29 | n | n-Tensor (you get the idea) | `....` 30 | 31 | ## Shape 32 | 33 | The TensorFlow documentation uses three notational conventions to describe 34 | tensor dimensionality: rank, shape, and dimension number. The following table 35 | shows how these relate to one another: 36 | 37 | Rank | Shape | Dimension number | Example 38 | --- | --- | --- | --- 39 | 0 | [] | 0-D | A 0-D tensor. A scalar. 40 | 1 | [D0] | 1-D | A 1-D tensor with shape [5]. 41 | 2 | [D0, D1] | 2-D | A 2-D tensor with shape [3, 4]. 42 | 3 | [D0, D1, D2] | 3-D | A 3-D tensor with shape [1, 4, 3]. 43 | n | [D0, D1, ... Dn-1] | n-D | A tensor with shape [D0, D1, ... Dn-1]. 44 | 45 | Shapes can be represented via Python lists / tuples of ints, or with the 46 | @{tf.TensorShape}. 47 | 48 | ## Data types 49 | 50 | In addition to dimensionality, Tensors have a data type. You can assign any one 51 | of the following data types to a tensor: 52 | 53 | Data type | Python type | Description 54 | --- | --- | --- 55 | `DT_FLOAT` | `tf.float32` | 32 bits floating point. 56 | `DT_DOUBLE` | `tf.float64` | 64 bits floating point. 57 | `DT_INT8` | `tf.int8` | 8 bits signed integer. 58 | `DT_INT16` | `tf.int16` | 16 bits signed integer. 59 | `DT_INT32` | `tf.int32` | 32 bits signed integer. 60 | `DT_INT64` | `tf.int64` | 64 bits signed integer. 61 | `DT_UINT8` | `tf.uint8` | 8 bits unsigned integer. 62 | `DT_UINT16` | `tf.uint16` | 16 bits unsigned integer. 63 | `DT_STRING` | `tf.string` | Variable length byte arrays. Each element of a Tensor is a byte array. 64 | `DT_BOOL` | `tf.bool` | Boolean. 65 | `DT_COMPLEX64` | `tf.complex64` | Complex number made of two 32 bits floating points: real and imaginary parts. 66 | `DT_COMPLEX128` | `tf.complex128` | Complex number made of two 64 bits floating points: real and imaginary parts. 
67 | `DT_QINT8` | `tf.qint8` | 8 bits signed integer used in quantized Ops. 68 | `DT_QINT32` | `tf.qint32` | 32 bits signed integer used in quantized Ops. 69 | `DT_QUINT8` | `tf.quint8` | 8 bits unsigned integer used in quantized Ops. 70 | -------------------------------------------------------------------------------- /tutorials/mandelbrot.md: -------------------------------------------------------------------------------- 1 | # Mandelbrot Set 2 | 3 | Visualizing the [Mandelbrot set](https://en.wikipedia.org/wiki/Mandelbrot_set) 4 | doesn't have anything to do with machine learning, but it makes for a fun 5 | example of how one can use TensorFlow for general mathematics. This is 6 | actually a pretty naive implementation of the visualization, but it makes the 7 | point. (We may end up providing a more elaborate implementation down the line 8 | to produce more truly beautiful images.) 9 | 10 | Note: This tutorial was originally prepared as an IPython notebook. 11 | 12 | ## Basic Setup 13 | 14 | We'll need a few imports to get started. 15 | 16 | ```python 17 | # Import libraries for simulation 18 | import tensorflow as tf 19 | import numpy as np 20 | 21 | # Imports for visualization 22 | import PIL.Image 23 | from io import BytesIO 24 | from IPython.display import Image, display 25 | ``` 26 | 27 | Now we'll define a function to actually display the image once we have 28 | iteration counts. 29 | 30 | ```python 31 | def DisplayFractal(a, fmt='jpeg'): 32 | """Display an array of iteration counts as a 33 | colorful picture of a fractal.""" 34 | a_cyclic = (6.28*a/20.0).reshape(list(a.shape)+[1]) 35 | img = np.concatenate([10+20*np.cos(a_cyclic), 36 | 30+50*np.sin(a_cyclic), 37 | 155-80*np.cos(a_cyclic)], 2) 38 | img[a==a.max()] = 0 39 | a = img 40 | a = np.uint8(np.clip(a, 0, 255)) 41 | f = BytesIO() 42 | PIL.Image.fromarray(a).save(f, fmt) 43 | display(Image(data=f.getvalue())) 44 | ``` 45 | 46 | ## Session and Variable Initialization 47 | 48 | For playing around like this, we often use an interactive session, but a regular 49 | session would work as well. 50 | 51 | ```python 52 | sess = tf.InteractiveSession() 53 | ``` 54 | 55 | It's handy that we can freely mix NumPy and TensorFlow. 56 | 57 | ```python 58 | # Use NumPy to create a 2D array of complex numbers 59 | 60 | Y, X = np.mgrid[-1.3:1.3:0.005, -2:1:0.005] 61 | Z = X+1j*Y 62 | ``` 63 | 64 | Now we define and initialize TensorFlow tensors. 65 | 66 | ```python 67 | xs = tf.constant(Z.astype(np.complex64)) 68 | zs = tf.Variable(xs) 69 | ns = tf.Variable(tf.zeros_like(xs, tf.float32)) 70 | ``` 71 | 72 | TensorFlow requires that you explicitly initialize variables before using them. 73 | 74 | ```python 75 | tf.global_variables_initializer().run() 76 | ``` 77 | 78 | ## Defining and Running the Computation 79 | 80 | Now we specify more of the computation... 81 | 82 | ```python 83 | # Compute the new values of z: z^2 + x 84 | zs_ = zs*zs + xs 85 | 86 | # Have we diverged with this new value? 87 | not_diverged = tf.abs(zs_) < 4 88 | 89 | # Operation to update the zs and the iteration count. 90 | # 91 | # Note: We keep computing zs after they diverge! This 92 | # is very wasteful! There are better, if a little 93 | # less simple, ways to do this. 94 | # 95 | step = tf.group( 96 | zs.assign(zs_), 97 | ns.assign_add(tf.cast(not_diverged, tf.float32)) 98 | ) 99 | ``` 100 | 101 | ... and run it for a couple hundred steps 102 | 103 | ```python 104 | for i in range(200): step.run() 105 | ``` 106 | 107 | Let's see what we've got. 
108 | 109 | ```python 110 | DisplayFractal(ns.eval()) 111 | ``` 112 | 113 | ![jpeg](../images/mandelbrot_output.jpg) 114 | 115 | Not bad! 116 | 117 | 118 | -------------------------------------------------------------------------------- /tutorials/pdes.md: -------------------------------------------------------------------------------- 1 | # Partial Differential Equations 2 | 3 | TensorFlow isn't just for machine learning. Here we give a (somewhat 4 | pedestrian) example of using TensorFlow for simulating the behavior of a 5 | [partial differential equation]( 6 | https://en.wikipedia.org/wiki/Partial_differential_equation). 7 | We'll simulate the surface of square pond as a few raindrops land on it. 8 | 9 | Note: This tutorial was originally prepared as an IPython notebook. 10 | 11 | ## Basic Setup 12 | 13 | A few imports we'll need. 14 | 15 | ```python 16 | #Import libraries for simulation 17 | import tensorflow as tf 18 | import numpy as np 19 | 20 | #Imports for visualization 21 | import PIL.Image 22 | from io import BytesIO 23 | from IPython.display import clear_output, Image, display 24 | ``` 25 | 26 | A function for displaying the state of the pond's surface as an image. 27 | 28 | ```python 29 | def DisplayArray(a, fmt='jpeg', rng=[0,1]): 30 | """Display an array as a picture.""" 31 | a = (a - rng[0])/float(rng[1] - rng[0])*255 32 | a = np.uint8(np.clip(a, 0, 255)) 33 | f = BytesIO() 34 | PIL.Image.fromarray(a).save(f, fmt) 35 | clear_output(wait = True) 36 | display(Image(data=f.getvalue())) 37 | ``` 38 | 39 | Here we start an interactive TensorFlow session for convenience in playing 40 | around. A regular session would work as well if we were doing this in an 41 | executable .py file. 42 | 43 | ```python 44 | sess = tf.InteractiveSession() 45 | ``` 46 | 47 | ## Computational Convenience Functions 48 | 49 | 50 | ```python 51 | def make_kernel(a): 52 | """Transform a 2D array into a convolution kernel""" 53 | a = np.asarray(a) 54 | a = a.reshape(list(a.shape) + [1,1]) 55 | return tf.constant(a, dtype=1) 56 | 57 | def simple_conv(x, k): 58 | """A simplified 2D convolution operation""" 59 | x = tf.expand_dims(tf.expand_dims(x, 0), -1) 60 | y = tf.nn.depthwise_conv2d(x, k, [1, 1, 1, 1], padding='SAME') 61 | return y[0, :, :, 0] 62 | 63 | def laplace(x): 64 | """Compute the 2D laplacian of an array""" 65 | laplace_k = make_kernel([[0.5, 1.0, 0.5], 66 | [1.0, -6., 1.0], 67 | [0.5, 1.0, 0.5]]) 68 | return simple_conv(x, laplace_k) 69 | ``` 70 | 71 | ## Define the PDE 72 | 73 | Our pond is a perfect 500 x 500 square, as is the case for most ponds found in 74 | nature. 75 | 76 | ```python 77 | N = 500 78 | ``` 79 | 80 | Here we create our pond and hit it with some rain drops. 81 | 82 | ```python 83 | # Initial Conditions -- some rain drops hit a pond 84 | 85 | # Set everything to zero 86 | u_init = np.zeros([N, N], dtype=np.float32) 87 | ut_init = np.zeros([N, N], dtype=np.float32) 88 | 89 | # Some rain drops hit a pond at random points 90 | for n in range(40): 91 | a,b = np.random.randint(0, N, 2) 92 | u_init[a,b] = np.random.uniform() 93 | 94 | DisplayArray(u_init, rng=[-0.1, 0.1]) 95 | ``` 96 | 97 | ![jpeg](../images/pde_output_1.jpg) 98 | 99 | 100 | Now let's specify the details of the differential equation. 
101 | 102 | 103 | ```python 104 | # Parameters: 105 | # eps -- time resolution 106 | # damping -- wave damping 107 | eps = tf.placeholder(tf.float32, shape=()) 108 | damping = tf.placeholder(tf.float32, shape=()) 109 | 110 | # Create variables for simulation state 111 | U = tf.Variable(u_init) 112 | Ut = tf.Variable(ut_init) 113 | 114 | # Discretized PDE update rules 115 | U_ = U + eps * Ut 116 | Ut_ = Ut + eps * (laplace(U) - damping * Ut) 117 | 118 | # Operation to update the state 119 | step = tf.group( 120 | U.assign(U_), 121 | Ut.assign(Ut_)) 122 | ``` 123 | 124 | ## Run The Simulation 125 | 126 | This is where it gets fun -- running time forward with a simple for loop. 127 | 128 | ```python 129 | # Initialize state to initial conditions 130 | tf.global_variables_initializer().run() 131 | 132 | # Run 1000 steps of PDE 133 | for i in range(1000): 134 | # Step simulation 135 | step.run({eps: 0.03, damping: 0.04}) 136 | DisplayArray(U.eval(), rng=[-0.1, 0.1]) 137 | ``` 138 | 139 | ![jpeg](../images/pde_output_2.jpg) 140 | 141 | Look! Ripples! 142 | 143 | -------------------------------------------------------------------------------- /install/install_c.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for C 2 | 3 | TensorFlow provides a C API defined in 4 | [`c_api.h`](https://github.com/tensorflow/tensorflow/tree/master/c/c_api.h), 5 | which is suitable for 6 | [building bindings for other languages](https://www.tensorflow.org/extend/language_bindings). 7 | The API leans towards simplicity and uniformity rather than convenience. 8 | 9 | 10 | ## Supported Platforms 11 | 12 | You may install TensorFlow for C on the following operating systems: 13 | 14 | * Linux 15 | * Mac OS X 16 | 17 | 18 | ## Installation 19 | 20 | Take the following steps to install the TensorFlow for C library and 21 | enable TensorFlow for C: 22 | 23 | 1. Decide whether you will run TensorFlow for C on CPU(s) only or 24 | with the help of GPU(s). To help you decide, read the section 25 | entitled "Determine which TensorFlow to install" in one of the 26 | following guides: 27 | 28 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 29 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 30 | 31 | 2. Download and extract the TensorFlow C library into `/usr/local/lib` by 32 | invoking the following shell commands: 33 | 34 | TF_TYPE="cpu" # Change to "gpu" for GPU support 35 | OS="linux" # Change to "darwin" for Mac OS 36 | TARGET_DIRECTORY="/usr/local" 37 | curl -L \ 38 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.1.0.tar.gz" | 39 | sudo tar -C $TARGET_DIRECTORY -xz 40 | 41 | The `tar` command extracts the TensorFlow C library into the `lib` 42 | subdirectory of `TARGET_DIRECTORY`. For example, specifying `/usr/local` 43 | as `TARGET_DIRECTORY` causes `tar` to extract the TensorFlow C library 44 | into `/usr/local/lib`. 45 | 46 | If you'd prefer to extract the library into a different directory, 47 | adjust `TARGET_DIRECTORY` accordingly. 48 | 49 | 3. In Step 2, if you specified a system directory (for example, `/usr/local`) 50 | as the `TARGET_DIRECTORY`, then run `ldconfig` to configure the linker. 51 | For example: 52 | 53 |
sudo ldconfig
54 | 55 | If you assigned a `TARGET_DIRECTORY` other than a system 56 | directory (for example, `~/mydir`), then you must append the extraction 57 | directory (for example, `~/mydir/lib`) to two environment variables. 58 | For example: 59 | 60 |
 export LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib # For both Linux and Mac OS X
 61 |      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib # For Linux only
 62 |      export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib # For Mac OS X only
63 | 64 | 65 | 66 | ## Validate your installation 67 | 68 | After installing TensorFlow for C, enter the following code into a file named 69 | `hello_tf.c`: 70 | 71 | ```c 72 | #include <stdio.h> 73 | #include <tensorflow/c/c_api.h> 74 | 75 | int main() { 76 | printf("Hello from TensorFlow C library version %s\n", TF_Version()); 77 | return 0; 78 | } 79 | ``` 80 | 81 | ### Build and Run 82 | 83 | Build `hello_tf.c` by invoking the following command: 84 | 85 | 86 |
gcc hello_tf.c
87 | 88 | 89 | Running the resulting executable should output the following message: 90 | 91 | 92 |
a.out
 93 | Hello from TensorFlow C library version number
94 | 95 | 96 | ### Troubleshooting 97 | 98 | If building the program fails, the most likely culprit is that `gcc` cannot 99 | find the TensorFlow C library. One way to fix this problem is to specify 100 | the `-I` and `-L` options to `gcc`. For example, if the `TARGET_LIBRARY` 101 | was `/usr/local`, you would invoke `gcc` as follows: 102 | 103 |
gcc -I/usr/local/include -L/usr/local/lib hello_tf.c -ltensorflow
104 | 105 | If executing `a.out` fails, ask yourself the following questions: 106 | 107 | * Did the program build without error? 108 | * Have you assigned the correct directory to the environment variables 109 | noted in Step 3 of [Installation](#installation)? 110 | * Did you export those environment variables? 111 | 112 | If you are still seeing build or execution error messages, search (or post to) 113 | [StackOverflow](www.stackoverflow.com/questions/tagged/tensorflow) for 114 | possible solutions. 115 | 116 | -------------------------------------------------------------------------------- /performance/xla/developing_new_backend.md: -------------------------------------------------------------------------------- 1 | # Developing a new backend for XLA 2 | 3 | This preliminary guide is for early adopters that want to easily retarget 4 | TensorFlow to their hardware in an efficient manner. The guide is not 5 | step-by-step and assumes knowledge of [LLVM](http://llvm.org), 6 | [Bazel](https://bazel.build/), and TensorFlow. 7 | 8 | XLA provides an abstract interface that a new architecture or accelerator can 9 | implement to create a backend to run TensorFlow graphs. Retargeting XLA should 10 | be significantly simpler and scalable than implementing every existing 11 | TensorFlow Op for new hardware. 12 | 13 | Most implementations will fall into one of the following scenarios: 14 | 15 | 1. Existing CPU architecture not yet officially supported by XLA, with or 16 | without an existing [LLVM](http://llvm.org) backend. 17 | 2. Non-CPU-like hardware with an existing LLVM backend. 18 | 3. Non-CPU-like hardware without an existing LLVM backend. 19 | 20 | > Note: An LLVM backend can mean either one of the officially released LLVM 21 | > backends or a custom LLVM backend developed in-house. 22 | 23 | ## Scenario 1: Existing CPU architecture not yet officially supported by XLA 24 | 25 | In this scenario, start by looking at the existing [XLA CPU backend] 26 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/). 27 | XLA makes it easy to retarget TensorFlow to different CPUs by using LLVM, since 28 | the main difference between XLA backends for CPUs is the code generated by LLVM. 29 | Google tests XLA for x64 and ARM64 architectures. 30 | 31 | If the hardware vendor has an LLVM backend for their hardware, it is simple to 32 | link the backend with the LLVM built with XLA. In JIT mode, the XLA CPU backend 33 | emits code for the host CPU. For ahead-of-time compilation, 34 | [`xla::AotCompilationOptions`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h) 35 | can provide an LLVM triple to configure the target architecture. 36 | 37 | If there is no existing LLVM backend but another kind of code generator exists, 38 | it should be possible to reuse most of the existing CPU backend. 39 | 40 | ## Scenario 2: Non-CPU-like hardware with an existing LLVM backend 41 | 42 | It is possible to model a new 43 | [`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h) 44 | implementation on the existing [`xla::CPUCompiler`] 45 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/cpu_compiler.cc) 46 | and [`xla::GPUCompiler`] 47 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc) 48 | classes, since these already emit LLVM IR. 
Depending on the nature of the 49 | hardware, it is possible that many of the LLVM IR generation aspects will have 50 | to be changed, but a lot of code can be shared with the existing backends. 51 | 52 | A good example to follow is the [GPU backend] 53 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/) 54 | of XLA. The GPU backend targets a non-CPU-like ISA, and therefore some aspects 55 | of its code generation are unique to the GPU domain. Other kinds of hardware, 56 | e.g. DSPs like Hexagon (which has an upstream LLVM backend), can reuse parts of 57 | the LLVM IR emission logic, but other parts will be unique. 58 | 59 | ## Scenario 3: Non-CPU-like hardware without an existing LLVM backend 60 | 61 | If it is not possible to utilize LLVM, then the best option is to implement a 62 | new backend for XLA for the desired hardware. This option requires the most 63 | effort. The classes that need to be implemented are as follows: 64 | 65 | * [StreamExecutor](https://www.tensorflow.org/code/tensorflow/stream_executor/stream_executor.h): 66 | For many devices not all methods of `StreamExecutor` are needed. See 67 | existing `StreamExecutor` implementations for details. 68 | * [xla::Compiler](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h): 69 | This class encapsulates the compilation of a HLO computation into an 70 | `xla::Executable`. 71 | * [`xla::Executable`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/executable.h): 72 | This class is used to launch a compiled computation on the platform. 73 | * [`xla::TransferManager`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/transfer_manager.h): 74 | This class enables backends to provide platform-specific mechanisms for 75 | constructing XLA literal data from given device memory handles. In other 76 | words, it helps encapsulate the transfer of data from the host to the device 77 | and back. 78 | -------------------------------------------------------------------------------- /performance/xla/index.md: -------------------------------------------------------------------------------- 1 | # XLA Overview 2 | 3 | > Note: XLA is experimental and considered alpha. Most use cases will not 4 | > see improvements in performance (speed or decreased memory usage). We have 5 | > released XLA early so the Open Source Community can contribute to its 6 | > development, as well as create a path for integration with hardware 7 | > accelerators. 8 | 9 | XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear 10 | algebra that optimizes TensorFlow computations. The results are improvements in 11 | speed, memory usage, and portability on server and mobile platforms. Initially, 12 | most users will not see large benefits from XLA, but are welcome to experiment 13 | by using XLA via @{$jit$just-in-time (JIT) compilation} or @{$tfcompile$ahead-of-time (AOT) compilation}. Developers targeting new hardware accelerators are 14 | especially encouraged to try out XLA. 15 | 16 | The XLA framework is experimental and in active development. In particular, 17 | while it is unlikely that the semantics of existing operations will change, it 18 | is expected that more operations will be added to cover important use cases. The 19 | team welcomes feedback from the community about missing functionality and 20 | community contributions via GitHub. 21 | 22 | ## Why did we build XLA? 
23 | 24 | We had several objectives for XLA to work with TensorFlow: 25 | 26 | * *Improve execution speed.* Compile subgraphs to reduce the execution time of 27 | short-lived Ops to eliminate overhead from the TensorFlow runtime, fuse 28 | pipelined operations to reduce memory overhead, and specialize to known 29 | tensor shapes to allow for more aggressive constant propagation. 30 | 31 | * *Improve memory usage.* Analyze and schedule memory usage, in principle 32 | eliminating many intermediate storage buffers. 33 | 34 | * *Reduce reliance on custom Ops.* Remove the need for many custom Ops by 35 | improving the performance of automatically fused low-level Ops to match the 36 | performance of custom Ops that were fused by hand. 37 | 38 | * *Reduce mobile footprint.* Eliminate the TensorFlow runtime by ahead-of-time 39 | compiling the subgraph and emitting an object/header file pair that can be 40 | linked directly into another application. The results can reduce the 41 | footprint for mobile inference by several orders of magnitude. 42 | 43 | * *Improve portability.* Make it relatively easy to write a new backend for 44 | novel hardware, at which point a large fraction of TensorFlow programs will 45 | run unmodified on that hardware. This is in contrast with the approach of 46 | specializing individual monolithic Ops for new hardware, which requires 47 | TensorFlow programs to be rewritten to make use of those Ops. 48 | 49 | ## How does XLA work? 50 | 51 | The input language to XLA is called "HLO IR", or just HLO (High Level 52 | Optimizer). The semantics of HLO are described on the 53 | @{$operation_semantics$Operation Semantics} page. It 54 | is most convenient to think of HLO as a [compiler 55 | IR](https://en.wikipedia.org/wiki/Intermediate_representation). 56 | 57 | XLA takes graphs ("computations") defined in HLO and compiles them into machine 58 | instructions for various architectures. XLA is modular in the sense that it is 59 | easy to slot in an alternative backend to @{$developing_new_backend$target some novel HW architecture}. The CPU backend for x64 and ARM64 as 60 | well as the NVIDIA GPU backend are in the TensorFlow source tree. 61 | 62 | The following diagram shows the compilation process in XLA: 63 | 64 |
*[Figure: the XLA compilation process]*
67 | 68 | XLA comes with several optimizations and analyses that are target-independent, 69 | such as [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination), 70 | target-independent operation fusion, and buffer analysis for allocating runtime 71 | memory for the computation. 72 | 73 | After the target-independent step, XLA sends the HLO computation to a backend. 74 | The backend can perform further HLO-level analyses and optimizations, this time 75 | with target specific information and needs in mind. For example, the XLA GPU 76 | backend may perform operation fusion beneficial specifically for the GPU 77 | programming model and determine how to partition the computation into streams. 78 | At this stage, backends may also pattern-match certain operations or 79 | combinations thereof to optimized library calls. 80 | 81 | The next step is target-specific code generation. The CPU and GPU backends 82 | included with XLA use [LLVM](http://llvm.org) for low-level IR, optimization, 83 | and code-generation. These backends emit the LLVM IR necessary to represent the 84 | XLA HLO computation in an efficient manner, and then invoke LLVM to emit native 85 | code from this LLVM IR. 86 | 87 | The GPU backend currently supports NVIDIA GPUs via the LLVM NVPTX backend; the 88 | CPU backend supports multiple CPU ISAs. 89 | 90 | ## Supported Platforms 91 | 92 | XLA currently supports @{$jit$JIT compilation} on x86-64 and NVIDIA GPUs; and 93 | @{$tfcompile$AOT compilation} for x86-64 and ARM. 94 | -------------------------------------------------------------------------------- /install/install_windows.md: -------------------------------------------------------------------------------- 1 | # 在 Windows 上安装 TensorFlow 2 | 3 | 这篇指南描述了如何在 Windows 上安装 TensorFlow。 4 | 5 | ## 确定 TensorFlow 版本 6 | 7 | 如下之中选择一种来安装: 8 | 9 | * **只支持 CPU 的 TensorFlow**。如果你的系统不支持 NVIDIA® GPU, 你必须安装这个版本。这个版本的 TensorFlow 通常安装起来比较简单(一般 5 到 10分钟),所以即使你拥有 NVIDIA GPU,我们也推荐首先安装这个版本。 10 | * **支持 GPU 的 TensorFlow**. TensorFlow 在 GPU 上通常比在 CPU 上的执行的更快。所以如果你有符合如下要求的 NVIDIA® GPU 并且需要注重性能,可以随后安装这个版本。 11 | 12 | ### GPU support TensorFlow 的 NVIDIA 需求 13 | 14 | 需要事先安装如下软件: 15 | 16 | * CUDA® Toolkit 8.0。详见 [NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/)。确保按照文档中描述的将 Cuda 相关路径加入到 `%PATH%` 环境变量中。 17 | * CUDA Toolkit 8.0 相关的 NVIDIA 驱动。 18 | * cuDNN v5.1。详见 [NVIDIA's documentation](https://developer.nvidia.com/cudnn)。注意:cuDNN 通常与其他 CUDA DLLs 安装的位置不同。确保将 cuDNN 库的安装目录加入到了`%PATH%`中。 19 | * CUDA Compute Capability 3.0 或更高的 GPU 芯片。支持的 GPU 芯片详见 [NVIDIA documentation](https://developer.nvidia.com/cuda-gpus) 。 20 | 21 | 如果上述软件版本较老,请将其升级到指定版本。 22 | 23 | 24 | ## 确定如何安装 TensorFlow 25 | 26 | 有如下选择: 27 | 28 | * "native" pip 29 | * Anaconda 30 | 31 | 原生 pip 直接在系统中安装 TensorFlow,而不使用虚拟环境。 32 | 因为原生 pip 安装没有使用独立的容器隔离开,所以可能干扰其他基于Python的安装。 33 | 不过,如果你理解 pip 和 Python 环境,原生 pip 安装通常只需要一个命令! 34 | 如果使用原生 pip 安装,用户可在任何目录中执行 TensorFlow 程序。 35 | 36 | 在 Anaconda 中,你可以通过 conda 创建一个虚拟环境。 37 | 然而,我们推荐使用 `pip install` 安装 TensorFlow,而非`conda install`。 38 | 39 | **注意:**conda 包是社区支持而非官方支持。也就是说 TensorFlow 团队没有测试也没有管理过 conda 包。 40 | 使用这个包需要自行承担风险。 41 | 42 | 43 | ## 原生 pip 安装 44 | 45 | 如果如下版本的 Python 没有安装,先安装: 46 | 47 | * [Python 3.5.x from python.org](https://www.python.org/downloads/release/python-352/) 48 | 49 | TensorFlow 在 Windows 上支持 Python 3.5.x。 50 | 注意 Python 3.5.x 使用 pip3,我们用 pip3 来安装 TensorFlow。 51 | 52 | 在 terminal 中输入如下命令安装只支持 CPU 的 TensorFlow: 53 | 54 |
C:\> pip3 install --upgrade tensorflow
55 | 56 | 安装支持 GPU 的 TensorFlow,使用如下命令: 57 | 58 |
C:\> pip3 install --upgrade tensorflow-gpu
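安装完成后,如果想确认装上的是否为 GPU 版本,可以在 Python 中做一个简单检查(仅作示例;`tf.test.is_built_with_cuda()` 只说明安装包本身是否带 CUDA 支持,能否真正使用 GPU 还取决于驱动和 CUDA/cuDNN 的配置):

```python
>>> import tensorflow as tf
>>> print(tf.test.is_built_with_cuda())  # GPU 版本应输出 True,CPU 版本输出 False
```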
59 | 60 | 61 | ## Anaconda 安装 62 | 63 | **Anaconda 安装是社区支持,而非官方支持** 64 | 65 | 按照如下步骤在 Anaconda 环境中安装 TensorFlow: 66 | 67 | 1. 按说明下载并安装 Anaconda: 68 | [Anaconda download site](https://www.continuum.io/downloads) 69 | 70 | 2. 建立一个 conda 环境,命名为 tensorflow,以便运行某个 Python 版本: 71 | 72 |
C:\> conda create -n tensorflow 
73 | 74 | 3. 激活 anaconda 环境: 75 | 76 |
C:\> activate tensorflow
 77 |      (tensorflow)C:\>  # 你的提示符应该发生变化 
78 | 79 | 4. 在你的 conda 环境中安装只支持 CPU 的 TensorFlow(写在一行): 80 | 81 |
(tensorflow)C:\> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl 
82 | 83 | 安装支持 GPU 的 TensorFlow(写在一行): 84 | 85 |
(tensorflow)C:\> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl 
86 | 87 | ## 验证安装结果 88 | 89 | 启动 terminal。 90 | 91 | 如果通过 Anaconda 安装,激活 Anaconda 环境。 92 | 93 | 启动 Python: 94 | 95 |
$ python
96 | 97 | 在 Python 交互式环境中输入 98 | 99 | ```python 100 | >>> import tensorflow as tf 101 | >>> hello = tf.constant('Hello, TensorFlow!') 102 | >>> sess = tf.Session() 103 | >>> print(sess.run(hello)) 104 | ``` 105 | 106 | 如果系统输出如下,则安装成功: 107 | 108 |
Hello, TensorFlow!
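如果还想确认实际安装的 TensorFlow 版本,可以在同一个 Python 交互式环境中继续输入(输出以实际安装的版本为准,例如 1.1.0):

```python
>>> print(tf.__version__)  # 打印已安装的 TensorFlow 版本号
```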
109 | 110 | 如果你新接触 TensorFlow,参考[初识 TensorFlow](../get_started)进行下一步学习。 111 | 112 | 如果系统输出错误信息而非欢迎信息,查看[常见安装问题](#common_installation_problems)。 113 | 114 | ## 常见安装问题 115 | 116 | 我们依靠 Stack Overflow 来编写 TensorFlow 安装问题及解决方案的文档。 117 | 如下表格包含了 Stack Overflow 上比较常见的安装问题的连接。 118 | 如果你遇到了不在列表中的新的错误信息或者其他安装问题,请在 Stack Overflow 上搜索。 119 | 如果搜索不到,请在 Stack Overflow 上提出一个新的问题,并打上 `tensorflow` 的标签。 120 | 121 | 122 | 123 | 124 | 125 | 126 | 129 | 130 | 131 | 132 | 133 | 136 | 137 | 138 | 139 | 140 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 157 | 158 | 159 | 160 | 161 | 164 | 165 | 166 |
| Stack Overflow Link | Error Message |
| --- | --- |
| [41007279](https://stackoverflow.com/q/41007279) | `[...\stream_executor\dso_loader.cc] Couldn't open CUDA library nvcuda.dll` |
| [41007279](https://stackoverflow.com/q/41007279) | `[...\stream_executor\cuda\cuda_dnn.cc] Unable to load cuDNN DSO` |
| [42006320](https://stackoverflow.com/q/42006320) | `ImportError: Traceback (most recent call last):`<br>`File "...\tensorflow\core\framework\graph_pb2.py", line 6, in <module>`<br>`from google.protobuf import descriptor as _descriptor`<br>`ImportError: cannot import name 'descriptor'` |
| [42011070](https://stackoverflow.com/q/42011070) | `No module named "pywrap_tensorflow"` |
| [42217532](https://stackoverflow.com/q/42217532) | `OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits` |
| [43134753](https://stackoverflow.com/q/43134753) | `The TensorFlow library wasn't compiled to use SSE instructions` |
167 | 168 | -------------------------------------------------------------------------------- /programmers_guide/tfdbg-tflearn.md: -------------------------------------------------------------------------------- 1 | # How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn 2 | 3 | [TOC] 4 | 5 | In @{$debugger$a previous tutorial}, we described how to use TensorFlow Debugger (**tfdbg**) 6 | to debug TensorFlow graphs running in 7 | @{tf.Session} 8 | objects managed by yourself. However, many users find 9 | @{$tflearn$`tf.contrib.learn`} 10 | @{tf.contrib.learn.Estimator$Estimator}s 11 | to be a convenient higher-level API for creating and using models 12 | in TensorFlow. Part of the convenience is that `Estimator`s manage `Session`s 13 | internally. Fortunately, you can still use `tfdbg` with `Estimator`s by adding 14 | special hooks. 15 | 16 | ## Debugging tf.contrib.learn Estimators 17 | 18 | Currently, **tfdbg** can debug the 19 | @{tf.contrib.learn.BaseEstimator.fit$`fit()`} 20 | @{tf.contrib.learn.BaseEstimator.evaluate$`evaluate()`} 21 | methods of tf-learn `Estimator`s. To debug `Estimator.fit()`, 22 | create a `LocalCLIDebugHook` and supply it as the `monitors` argument. For example: 23 | 24 | ```python 25 | # First, let your BUILD target depend on "//tensorflow/python/debug:debug_py" 26 | # (You don't need to worry about the BUILD dependency if you are using a pip 27 | # install of open-source TensorFlow.) 28 | from tensorflow.python import debug as tf_debug 29 | 30 | hooks = [tf_debug.LocalCLIDebugHook()] 31 | 32 | # Create a local CLI debug hook and use it as a monitor when calling fit(). 33 | classifier.fit(x=training_set.data, 34 | y=training_set.target, 35 | steps=1000, 36 | monitors=hooks) 37 | ``` 38 | 39 | To debug `Estimator.evaluate()`, you can follow the example below: 40 | 41 | ```python 42 | accuracy_score = classifier.evaluate(x=test_set.data, 43 | y=test_set.target, 44 | hooks=hooks)["accuracy"] 45 | ``` 46 | 47 | 48 | For a detailed [example](https://www.tensorflow.org/code/tensorflow/python/debug/examples/debug_tflearn_iris.py) based on 49 | @{$tflearn$tf-learn's iris tutorial}, 50 | run: 51 | 52 | ```none 53 | python -m tensorflow.python.debug.examples.debug_tflearn_iris --debug 54 | ``` 55 | 56 | ## Debugging tf.contrib.learn Experiments 57 | 58 | `Experiment` is a construct in `tf.contrib.learn` at a higher level than 59 | `Estimator`. 60 | It provides a single interface for training and evaluating a model. To debug 61 | the `train()` and `evaluate()` calls to an `Experiment` object, you can 62 | use the keyword arguments `train_monitors` and `eval_hooks`, respectively, when 63 | calling its constructor. For example: 64 | 65 | ```python 66 | # First, let your BUILD target depend on "//tensorflow/python/debug:debug_py" 67 | # (You don't need to worry about the BUILD dependency if you are using a pip 68 | # install of open-source TensorFlow.) 
69 | from tensorflow.python import debug as tf_debug 70 | 71 | hooks = [tf_debug.LocalCLIDebugHook()] 72 | 73 | ex = experiment.Experiment(classifier, 74 | train_input_fn=iris_input_fn, 75 | eval_input_fn=iris_input_fn, 76 | train_steps=FLAGS.train_steps, 77 | eval_delay_secs=0, 78 | eval_steps=1, 79 | train_monitors=hooks, 80 | eval_hooks=hooks) 81 | 82 | ex.train() 83 | accuracy_score = ex.evaluate()["accuracy"] 84 | ``` 85 | 86 | To see the `debug_tflearn_iris` example run in the `Experiment` mode, do: 87 | 88 | ```none 89 | python -m tensorflow.python.debug.examples.debug_tflearn_iris \ 90 | --use_experiment --debug 91 | ``` 92 | 93 | ## Debugging Estimators and Experiments without Terminal Access 94 | 95 | If your `Estimator` or `Experiment` is running in an environment to which you 96 | do not have command-line access (e.g., a remote server), you can use the 97 | non-interactive `DumpingDebugHook`. For example: 98 | 99 | ```python 100 | # Let your BUILD target depend on "//tensorflow/python/debug:debug_py 101 | # (You don't need to worry about the BUILD dependency if you are using a pip 102 | # install of open-source TensorFlow.) 103 | from tensorflow.python import debug as tf_debug 104 | 105 | hooks = [tf_debug.DumpingDebugHook("/shared/storage/location/tfdbg_dumps_1")] 106 | ``` 107 | 108 | Then this `hook` can be used in the same way as the `LocalCLIDebugHook` examples 109 | above. As the training and/or evalution of `Estimator` or `Experiment` 110 | happens, directories of the naming pattern 111 | `/shared/storage/location/tfdbg_dumps_1/run__` 112 | will appear. Each directory corresponds to a `Session.run()` call that underlies 113 | the `fit()` or `evaluate()` call. You can load these directories and inspect 114 | them in a command-line interface in an offline manner using the 115 | `offline_analyzer` offered by **tfdbg**. For example: 116 | 117 | ```bash 118 | python -m tensorflow.python.debug.cli.offline_analyzer \ 119 | --dump_dir="/shared/storage/location/tfdbg_dumps_1/run__" 120 | ``` 121 | 122 | The `LocalCLIDebugHook` also allows you to configure a `watch_fn` that can be 123 | used to flexibly specify what `Tensor`s to watch on different `Session.run()` 124 | calls, as a function of the `fetches` and `feed_dict` and other states. See 125 | @{tfdbg.DumpingDebugWrapperSession.__init__$this API doc} 126 | for more details. 127 | -------------------------------------------------------------------------------- /install/install_go.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for Go 2 | 3 | TensorFlow provides APIs for use in Go programs. These APIs are particularly 4 | well-suited to loading models created in Python and executing them within 5 | a Go application. This guide explains how to install and set up the 6 | [TensorFlow Go package](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go). 7 | 8 | **WARNING:** The TensorFlow Go API is *not* covered by the TensorFlow 9 | [API stability guarantees](https://www.tensorflow.org/programmers_guide/version_semantics). 10 | 11 | 12 | ## Supported Platforms 13 | 14 | You may install TensorFlow for Go on the following operating systems: 15 | 16 | * Linux 17 | * Mac OS X 18 | 19 | 20 | ## Installation 21 | 22 | TensorFlow for Go depends on the TensorFlow C library. Take the following 23 | steps to install this library and enable TensorFlow for Go: 24 | 25 | 1. Decide whether you will run TensorFlow for Go on CPU(s) only or with 26 | the help of GPU(s). 
To help you decide, read the section entitled 27 | "Determine which TensorFlow to install" in one of the following guides: 28 | 29 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 30 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 31 | 32 | 2. Download and extract the TensorFlow C library into `/usr/local/lib` by 33 | invoking the following shell commands: 34 | 35 | TF_TYPE="cpu" # Change to "gpu" for GPU support 36 | TARGET_DIRECTORY='/usr/local' 37 | curl -L \ 38 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.1.0.tar.gz" | 39 | sudo tar -C $TARGET_DIRECTORY -xz 40 | 41 | The `tar` command extracts the TensorFlow C library into the `lib` 42 | subdirectory of `TARGET_DIRECTORY`. For example, specifying `/usr/local` 43 | as `TARGET_DIRECTORY` causes `tar` to extract the TensorFlow C library 44 | into `/usr/local/lib`. 45 | 46 | If you'd prefer to extract the library into a different directory, 47 | adjust `TARGET_DIRECTORY` accordingly. 48 | 49 | 3. In Step 2, if you specified a system directory (for example, `/usr/local`) 50 | as the `TARGET_DIRECTORY`, then run `ldconfig` to configure the linker. 51 | For example: 52 | 53 |
sudo ldconfig
54 | 55 | If you assigned a `TARGET_DIRECTORY` other than a system 56 | directory (for example, `~/mydir`), then you must append the extraction 57 | directory (for example, `~/mydir/lib`) to two environment variables 58 | as follows: 59 | 60 |
 export LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib # For both Linux and Mac OS X
 61 |      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib # For Linux only
 62 |      export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib # For Mac OS X only
63 | 64 | 4. Now that the TensorFlow C library is installed, invoke `go get` as follows 65 | to download the appropriate packages and their dependencies: 66 | 67 |
go get github.com/tensorflow/tensorflow/tensorflow/go
68 | 69 | 5. Invoke `go test` as follows to validate the TensorFlow for Go 70 | installation: 71 | 72 |
go test github.com/tensorflow/tensorflow/tensorflow/go
73 | 74 | If `go get` or `go test` generate error messages, search (or post to) 75 | [StackOverflow](http://www.stackoverflow.com/questions/tagged/tensorflow) 76 | for possible solutions. 77 | 78 | 79 | ## Hello World 80 | 81 | After installing TensorFlow for Go, enter the following code into a 82 | file named `hello_tf.go`: 83 | 84 | ```go 85 | package main 86 | 87 | import ( 88 | tf "github.com/tensorflow/tensorflow/tensorflow/go" 89 | "github.com/tensorflow/tensorflow/tensorflow/go/op" 90 | "fmt" 91 | ) 92 | 93 | func main() { 94 | // Construct a graph with an operation that produces a string constant. 95 | s := op.NewScope() 96 | c := op.Const(s, "Hello from TensorFlow version " + tf.Version()) 97 | graph, err := s.Finalize() 98 | if err != nil { 99 | panic(err) 100 | } 101 | 102 | // Execute the graph in a session. 103 | sess, err := tf.NewSession(graph, nil) 104 | if err != nil { 105 | panic(err) 106 | } 107 | output, err := sess.Run(nil, []tf.Output{c}, nil) 108 | if err != nil { 109 | panic(err) 110 | } 111 | fmt.Println(output[0].Value()) 112 | } 113 | ``` 114 | 115 | For a more advanced example of TensorFlow in Go, look at the 116 | [example in the API documentation](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go#ex-package), 117 | which uses a pre-trained TensorFlow model to label contents of an image. 118 | 119 | 120 | ### Running 121 | 122 | Run `hello_tf.go` by invoking the following command: 123 | 124 |
go run hello_tf.go
125 | Hello from TensorFlow version number
126 | 127 | The program might also generate multiple warning messages of the 128 | following form, which you can ignore: 129 | 130 |
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
131 | wasn't compiled to use *Type* instructions, but these are available on your
132 | machine and could speed up CPU computations.
133 | 134 | 135 | ## Building from source code 136 | 137 | TensorFlow is open-source. You may build TensorFlow for Go from the 138 | TensorFlow source code by following the instructions in a 139 | [separate document](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/go/README.md). 140 | -------------------------------------------------------------------------------- /performance/xla/shapes.md: -------------------------------------------------------------------------------- 1 | # Shapes and Layout 2 | 3 | The XLA `Shape` proto 4 | ([xla_data.proto](https://www.tensorflow.org/code/tensorflow/compiler/xla/xla_data.proto)) 5 | describes the rank, size, and data type of an N-dimensional array (*array* in 6 | short). 7 | 8 | ## Terminology, Notation, and Conventions 9 | 10 | * The rank of an array is equal to the number of dimensions. The *true rank* 11 | of an array is the number of dimensions which have a size greater than 1. 12 | 13 | * Dimensions are numbered from `0` up to `N-1` for an `N` dimensional array. 14 | The dimension numbers are arbitrary labels for convenience. The order of 15 | these dimension numbers does not imply a particular minor/major ordering in 16 | the layout of the shape. The layout is determined by the `Layout` proto. 17 | 18 | * By convention, dimensions are listed in increasing order of dimension 19 | number. For example, for a 3-dimensional array of size `[A x B x C]`, 20 | dimension 0 has size `A`, dimension 1 has size `B` and dimension 2 has size 21 | `C`. 22 | 23 | Some utilities in XLA also support negative indexing, similarly to Python; 24 | dimension -1 is the last dimension (equivalent to `N-1` for an `N` 25 | dimensional array). For example, for the 3-dimensional array described 26 | above, dimension -1 has size `C`, dimension -2 has size `B` and so on. 27 | 28 | * Two, three, and four dimensional arrays often have specific letters 29 | associated with dimensions. For example, for a 2D array: 30 | 31 | * dimension 0: `y` 32 | * dimension 1: `x` 33 | 34 | For a 3D array: 35 | 36 | * dimension 0: `z` 37 | * dimension 1: `y` 38 | * dimension 2: `x` 39 | 40 | For a 4D array: 41 | 42 | * dimension 0: `p` 43 | * dimension 1: `z` 44 | * dimension 2: `y` 45 | * dimension 3: `x` 46 | 47 | * Functions in the XLA API which take dimensions do so in increasing order of 48 | dimension number. This matches the ordering used when passing dimensions as 49 | an `initializer_list`; e.g. 50 | 51 | `ShapeUtil::MakeShape(F32, {A, B, C, D})` 52 | 53 | Will create a shape whose dimension size array consists of the sequence 54 | `[A, B, C, D]`. 55 | 56 | ## Layout 57 | 58 | The `Layout` proto describes how an array is represented in memory. The `Layout` 59 | proto includes the following fields: 60 | 61 | ``` 62 | message Layout { 63 | repeated int64 minor_to_major = 1; 64 | repeated int64 padded_dimensions = 2; 65 | optional PaddingValue padding_value = 3; 66 | } 67 | ``` 68 | 69 | ### Minor-to-major dimension ordering 70 | 71 | The only required field is `minor_to_major`. This field describes the 72 | minor-to-major ordering of the dimensions within a shape. Values in 73 | `minor_to_major` are an ordering of the dimensions of the array (`0` to `N-1` 74 | for an `N` dimensional array) with the first value being the most-minor 75 | dimension up to the last value which is the most-major dimension. The most-minor 76 | dimension is the dimension which changes most rapidly when stepping through the 77 | elements of the array laid out in linear memory. 
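Concretely, `minor_to_major` fixes the stride of each dimension: the most-minor dimension has stride 1, and each subsequent dimension's stride is the product of the sizes of all dimensions that are more minor than it. The short Python sketch below is an illustration only (the `linear_index` helper is not part of the XLA API); the `[2 x 3]` example that follows works through the same two orderings by hand.

```python
def linear_index(index, dims, minor_to_major):
    """Linear position of an element for a given minor-to-major layout.

    index: per-dimension indices, e.g. (i0, i1) for a rank-2 array.
    dims: per-dimension sizes, e.g. (2, 3) for a [2 x 3] array.
    minor_to_major: dimension numbers listed from most-minor to most-major.
    """
    linear, stride = 0, 1
    for dim in minor_to_major:   # walk from the most-minor dimension outward
        linear += index[dim] * stride
        stride *= dims[dim]
    return linear

# For a [2 x 3] array, layout [0, 1] ("dim 0 is minor") places element (1, 0)
# at linear position 1, while layout [1, 0] ("dim 0 is major", i.e. row-major
# at rank 2) places the same element at position 3.
assert linear_index((1, 0), (2, 3), minor_to_major=[0, 1]) == 1
assert linear_index((1, 0), (2, 3), minor_to_major=[1, 0]) == 3
```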
78 | 79 | For example, consider the following 2D array of size `[2 x 3]`: 80 | 81 | ``` 82 | a b c 83 | d e f 84 | ``` 85 | 86 | Here dimension `0` is size 2, and dimension `1` is size 3. If the 87 | `minor_to_major` field in the layout is `[0, 1]` then dimension `0` is the 88 | most-minor dimension and dimension `1` is the most-major dimension. This 89 | corresponds to the following layout in linear memory: 90 | 91 | ``` 92 | a d b e c f 93 | ``` 94 | 95 | This minor-to-major dimension order of `0` up to `N-1` is akin to *column-major* 96 | (at rank 2). Assuming a monotonic ordering of dimensions, another name we may 97 | use to refer to this layout in the code is simply "dim 0 is minor". 98 | 99 | On the other hand, if the `minor_to_major` field in the layout is `[1, 0]` then 100 | the layout in linear memory is: 101 | 102 | ``` 103 | a b c d e f 104 | ``` 105 | 106 | A minor-to-major dimension order of `N-1` down to `0` for an `N` dimensional 107 | array is akin to *row-major* (at rank 2). Assuming a monotonic ordering of 108 | dimensions, another name we may use to refer to this layout in the code is 109 | simply "dim 0 is major". 110 | 111 | #### Default minor-to-major ordering 112 | 113 | The default layout for newly created Shapes is "dimension order is 114 | major-to-minor" (akin to row-major at rank 2). 115 | 116 | ### Padding 117 | 118 | Padding is defined in the optional `padded_dimensions` and `padding_value` 119 | fields. The field `padded_dimensions` describes the sizes (widths) to which each 120 | dimension is padded. If present, the number of elements in `padded_dimensions` 121 | must equal the rank of the shape. 122 | 123 | For example, given the `[2 x 3]` array defined above, if `padded_dimension` is 124 | `[3, 5]` then dimension 0 is padded to a width of 3 and dimension 1 is padded to 125 | a width of 5. The layout in linear memory (assuming a padding value of 0 and 126 | column-major layout) is: 127 | 128 | ``` 129 | a d 0 b e 0 c f 0 0 0 0 0 0 0 130 | ``` 131 | 132 | This is equivalent to the layout of the following array with the same 133 | minor-to-major dimension order: 134 | 135 | ``` 136 | a b c 0 0 137 | d e f 0 0 138 | 0 0 0 0 0 139 | ``` 140 | 141 | ### Indexing into arrays 142 | 143 | The class `IndexUtil` in 144 | [index_util.h](https://www.tensorflow.org/code/tensorflow/compiler/xla/index_util.h) 145 | provides utilities for converting between multidimensional indices and linear 146 | indices given a shape and layout. Multidimensional indices include a `int64` 147 | index for each dimension. Linear indices are a single `int64` value which 148 | indexes into the buffer holding the array. See `shape_util.h` and 149 | `layout_util.h` in the same directory for utilities that simplify creation and 150 | manipulation of shapes and layouts. 151 | -------------------------------------------------------------------------------- /performance/performance_guide.md: -------------------------------------------------------------------------------- 1 | # Performance 2 | 3 | This guide contains a collection of best practices for optimizing your 4 | TensorFlow code. The best practices apply to both new and experienced 5 | Tensorflow users. 6 | 7 | ## Best Practices 8 | While optimizing implementations of different types of models can be different, 9 | the topics below cover best practices to get the most performance from 10 | TensorFlow. Although these suggestions focus on image-based models, we will 11 | regularly add tips for all kinds of models. 
The following list highlights key 12 | best practices: 13 | 14 | * Build and install from source 15 | * Utilize queues for reading data 16 | * Preprocessing on the CPU 17 | * Use `NCHW` image data format 18 | * Place shared parameters on the GPU 19 | * Use fused batch norm 20 | 21 | The following sections detail the preceding suggestions. 22 | 23 | ### Build and install from source 24 | 25 | To install the most optimized version of TensorFlow, build and install 26 | TensorFlow from source by following [Installing TensorFlow from Source](../install/install_sources). 27 | Building from source with compiler optimizations for the target hardware and 28 | ensuring the latest CUDA platform and cuDNN libraries are installed results in 29 | the highest performing installs. 30 | 31 | For the most stable experience, build from the [latest release](https://github.com/tensorflow/tensorflow/releases) 32 | branch. To get the latest performance changes and accept some stability risk, 33 | build from [master](https://github.com/tensorflow/tensorflow). 34 | 35 | If there is a need to build TensorFlow on a platform that has different hardware 36 | than the target, then cross-compile with the highest optimizations for the target 37 | platform. The following command is an example of telling `bazel` to compile for 38 | a specific platform: 39 | 40 | ```python 41 | # This command optimizes for Intel’s Broadwell processor 42 | bazel build -c opt --copt=-march="broadwell" --config=cuda //tensorflow/tools/pip_package:build_pip_package 43 | 44 | ``` 45 | 46 | #### Environment, build, and install tips 47 | 48 | * Compile with the highest level of compute the [GPU 49 | supports](http://developer.nvidia.com/cuda-gpus), e.g. P100: 6.0, Titan X 50 | (pascal): 6.2, Titan X (maxwell): 5.2, and K80: 3.7. 51 | * Install the latest CUDA platform and cuDNN libraries. 52 | * Make sure to use a version of gcc that supports all of the optimizations of 53 | the target CPU. The recommended minimum gcc version is 4.8.3. 54 | * TensorFlow checks on startup whether it has been compiled with the 55 | optimizations available on the CPU. If the optimizations are not included, 56 | TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not 57 | included. 58 | 59 | ### Utilize queues for reading data 60 | 61 | One common cause of poor performance is underutilizing GPUs, or essentially 62 | "starving" them of data by not setting up an efficient pipeline. Make sure to 63 | set up an input pipeline to utilize queues and stream data effectively. Review 64 | the @{$reading_data#reading_from_files$Reading Data guide} for implementation 65 | details. One way to identify a "starved" GPU is to generate and review 66 | timelines. A detailed tutorial for timelines does not exist, but a quick example 67 | of generating a timeline exists as part of the @{$jit$XLA JIT} tutorial. Another 68 | simple way to check if a GPU is underutilized is to run `watch nvidia-smi`, and 69 | if GPU utilization is not approaching 100% then the GPU is not getting data fast 70 | enough. 71 | 72 | Unless for a special circumstance or for example code, do not feed data 73 | into the session from Python variables, e.g. `dictionary`. 74 | 75 | ```python 76 | # This will result in poor performance. 77 | sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) 78 | ``` 79 | 80 | ### Preprocessing on the CPU 81 | 82 | Placing preprocessing operations on the CPU can significantly improve 83 | performance. 
When preprocessing occurs on the GPU the flow of data is 84 | CPU -> GPU (preprocessing) -> CPU -> GPU (training). The data is bounced back 85 | and forth between the CPU and GPU. When preprocessing is placed on the CPU, 86 | the data flow is CPU (preprocessing) -> GPU (training). Another benefit is 87 | preprocessing on the CPU frees GPU time to focus on training. 88 | 89 | Placing preprocessing on the CPU can result in a 6X+ increase in samples/sec 90 | processed, which could lead to training in 1/6th of the time. To ensure 91 | preprocessing is on the CPU, wrap the preprocessing operations as shown below: 92 | 93 | ```python 94 | with tf.device('/cpu:0'): 95 | # function to get and process images or data. 96 | distorted_inputs = load_and_distort_images() 97 | ``` 98 | 99 | ### Use large files 100 | 101 | Under some circumstances, both the CPU and GPU can be starved for data by the 102 | I/O system. If you are using many small files to form your input data set, you 103 | may be limited by the speed of your filesystem. If your training loop runs 104 | faster when using SSDs vs HDDs for storing your input data, you could could be 105 | I/O bottlenecked. 106 | 107 | If this is the case, you should pre-process your input data, creating a few 108 | large TFRecord files. 109 | 110 | ### Use NCHW image data format 111 | 112 | Image data format refers to the representation of batches of images. TensorFlow 113 | supports `NHWC` (TensorFlow default) and `NCHW` (cuDNN default). N refers to the 114 | number of images in a batch, H refers to the number of pixels in the vertical 115 | dimension, W refers to the number of pixels in the horizontal dimension, and C 116 | refers to the channels (e.g. 1 for black and white, 3 for RGB, etc.) Although 117 | cuDNN can operate on both formats, it is faster to operate in its default 118 | format. 119 | 120 | The best practice is to build models that work with both `NCHW` and `NHWC` as it 121 | is common to train using `NCHW` on GPU, and then do inference with NHWC on CPU. 122 | 123 | The very brief history of these two formats is that TensorFlow started by using 124 | `NHWC` because it was a little faster on CPUs. Then the TensorFlow team 125 | discovered that `NCHW` performs better when using the NVIDIA cuDNN library. The 126 | current recommendation is that users support both formats in their models. In 127 | the long term, we plan to rewrite graphs to make switching between the formats 128 | transparent. 129 | 130 | ### Use fused batch norm 131 | 132 | When using batch norm 133 | @{tf.contrib.layers.batch_norm} set the attribute `fused=True`: 134 | 135 | ```python 136 | bn = tf.contrib.layers.batch_norm( 137 | input_layer, fused=True, data_format='NCHW' 138 | scope=scope, **kwargs) 139 | ``` 140 | 141 | The non-fused batch norm does computations using several individual Ops. Fused 142 | batch norm combines the individual operations into a single kernel, which runs 143 | faster. 144 | -------------------------------------------------------------------------------- /programmers_guide/data_versions.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Data Versioning: GraphDefs and Checkpoints 2 | 3 | As described in 4 | @{$version_semantics#compatibility-for-graphs-and-checkpoints$Compatibility for Graphs and Checkpoints}, 5 | TensorFlow marks each kind of data with version information in order to maintain 6 | backward compatibility. 
This document provides additional details about the 7 | versioning mechanism, and how to use it to safely change data formats. 8 | 9 | ## Backward and partial forward compatibility 10 | 11 | The two core artifacts exported from and imported into TensorFlow are 12 | checkpoints (serialized variable states) and `GraphDef`s (serialized computation 13 | graphs). Any approach to versioning these artifacts must take into account the 14 | following requirements: 15 | 16 | * **Backward compatibility** to support loading `GraphDefs` created with older 17 | versions of TensorFlow. 18 | * **Forward compatibility** to support scenarios where the producer of a 19 | `GraphDef` is upgraded to a newer version of TensorFlow before the consumer. 20 | * Enable evolving TensorFlow in incompatible ways. For example, removing Ops, 21 | adding attributes, and removing attributes. 22 | 23 | For `GraphDef`s, backward compatibility is enforced within a major version. This 24 | means functionality can only be removed between major versions. Forward 25 | compatibility is enforced within Patch releases (1.x.1 -> 1.x.2, for example). 26 | 27 | 28 | In order to achieve backward and forward compatibility as well as know when to 29 | enforce changes in formats, the serialized representations of graphs and 30 | variable state need to have metadata that describes when they were produced. The 31 | sections below detail the TensorFlow implementation and guidelines for evolving 32 | `GraphDef` versions. 33 | 34 | ### Independent data version schemes 35 | 36 | There are data versions for `GraphDef`s and checkpoints. Both data formats 37 | evolve at different rates, and also at different speeds than the version of 38 | TensorFlow. Both versioning systems are defined in 39 | [`core/public/version.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/version.h). 40 | Whenever a new version is added a note is added to the header detailing what 41 | changed and the date. 42 | 43 | ### Data, producers, and consumers 44 | 45 | This section discusses version information for **data**, binaries that produce 46 | data (**producers**), and binaries that consume data (**consumers**): 47 | 48 | * Producer binaries have a version (`producer`) and a minimum consumer version 49 | that they are compatible with (`min_consumer`). 50 | * Consumer binaries have a version (`consumer`) and a minimum producer version 51 | that they are compatible with (`min_producer`). 52 | * Each piece of versioned data has a [`VersionDef 53 | versions`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/versions.proto) 54 | field which records the `producer` that made the data, the `min_consumer` 55 | that it is compatible with, and a list of `bad_consumers` versions that are 56 | disallowed. 57 | 58 | By default, when a producer makes some data, the data inherits the producer's 59 | `producer` and `min_consumer` versions. `bad_consumers` can be set if specific 60 | consumer versions are known to contain bugs and must be avoided. 
A consumer can 61 | accept a piece of data if 62 | 63 | * `consumer` >= data's `min_consumer` 64 | * data's `producer` >= consumer's `min_producer` 65 | * `consumer` not in data's `bad_consumers` 66 | 67 | Since both producers and consumers come from the same TensorFlow code base, 68 | [`core/public/version.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/version.h) 69 | contains a main binary version which is treated as either `producer` or 70 | `consumer` depending on context and both `min_consumer` and `min_producer` 71 | (needed by producers and consumers, respectively). Specifically, 72 | 73 | * For `GraphDef` versions, we have `TF_GRAPH_DEF_VERSION`, 74 | `TF_GRAPH_DEF_VERSION_MIN_CONSUMER`, and 75 | `TF_GRAPH_DEF_VERSION_MIN_PRODUCER`. 76 | * For checkpoint versions, we have `TF_CHECKPOINT_VERSION`, 77 | `TF_CHECKPOINT_VERSION_MIN_CONSUMER`, and 78 | `TF_CHECKPOINT_VERSION_MIN_PRODUCER`. 79 | 80 | ### Evolving GraphDef versions 81 | 82 | This section presents examples of using this versioning mechanism to make 83 | changes to the `GraphDef` format. 84 | 85 | **Adding a new Op:** 86 | 87 | 1. Add the new Op to both consumers and producers at the same time, and do not 88 | change any `GraphDef` versions. This type of change is automatically 89 | backward compatible, and does not impact forward compatibility plan since 90 | existing producer scripts will not suddenly use the new functionality. 91 | 92 | **Adding a new Op and switching existing Python wrappers to use it:** 93 | 94 | 1. Implement new consumer functionality and increment the binary version. 95 | 2. If it is possible to make the wrappers use the new functionality only in 96 | cases that did not work before, the wrappers can be updated now. 97 | 3. Change Python wrappers to use the new functionality. Do not increment 98 | `min_consumer`, since models which do not use this Op should not break. 99 | 100 | **Removing an Op or restricting the functionality of an Op:** 101 | 102 | 1. Fix all producer scripts (not TensorFlow itself) to not use the banned Op or 103 | functionality. 104 | 2. Increment the binary version and implement new consumer functionality that 105 | bans the removed Op or functionality for GraphDefs at the new version and 106 | above. If possible, make TensorFlow stop producing `GraphDefs` with the 107 | banned functionality. This can be done with 108 | [`REGISTER_OP(...).Deprecated(deprecated_at_version, 109 | message)`](https://github.com/tensorflow/tensorflow/blob/b289bc7a50fc0254970c60aaeba01c33de61a728/tensorflow/core/ops/array_ops.cc#L1009). 110 | 3. Wait for a major release for backward compatibility purposes. 111 | 4. Increase `min_producer` to the GraphDef version from (2) and remove the 112 | functionality entirely. 113 | 114 | **Changing the functionality of an Op:** 115 | 116 | 1. Add a new similar Op named `SomethingV2` or similar and go through the 117 | process of adding it and switching existing Python wrappers to use it (may 118 | take 3 weeks if forward compatibility is desired). 119 | 2. Remove the old Op (Can only take place with a major version change due to 120 | backward compatibility). 121 | 3. Increase `min_consumer` to rule out consumers with the old Op, add back the 122 | old Op as an alias for `SomethingV2`, and go through the process to switch 123 | existing Python wrappers to use it. 124 | 4. Go through the process to remove `SomethingV2`. 125 | 126 | **Banning a single consumer version that cannot run safely:** 127 | 128 | 1. 
Bump the binary version and add the bad version to `bad_consumers` for all 129 | new GraphDefs. If possible, add to `bad_consumers` only for GraphDefs which 130 | contain a certain Op or similar. 131 | 2. If existing consumers have the bad version, push them out as soon as 132 | possible. 133 | -------------------------------------------------------------------------------- /performance/xla/jit.md: -------------------------------------------------------------------------------- 1 | # Using JIT Compilation 2 | 3 | > Note: TensorFlow must be compiled from source to include XLA. 4 | 5 | ## Why use just-in-time (JIT) compilation? 6 | 7 | The TensorFlow/XLA JIT compiler compiles and runs parts of TensorFlow graphs via 8 | XLA. The benefit of this over the standard TensorFlow implementation is that XLA 9 | can fuse multiple operators (kernel fusion) into a small number of compiled 10 | kernels. Fusing operators can reduce memory bandwidth requirements and improve 11 | performance compared to executing operators one-at-a-time, as the TensorFlow 12 | executor does. 13 | 14 | ## Running TensorFlow graphs via XLA 15 | 16 | There are two ways to run TensorFlow computations via XLA, either by 17 | JIT-compiling operators placed on a CPU or GPU device, or by placing operators 18 | on the `XLA_CPU` or `XLA_GPU` TensorFlow devices. Placing operators directly on 19 | a TensorFlow XLA device forces the operator to run on that device and is mainly 20 | used for testing. 21 | 22 | > Note: The XLA CPU backend produces fast single-threaded code (in most cases), 23 | > but does not yet parallelize as well as the TensorFlow CPU backend. The XLA 24 | > GPU backend is competitive with the standard TensorFlow implementation, 25 | > sometimes faster, sometimes slower. 26 | 27 | ### Turning on JIT compilation 28 | 29 | JIT compilation can be turned on at the session level or manually for select 30 | operations. Both of these approaches are zero-copy --- data does not need to be 31 | copied when passing data between a compiled XLA kernel and a TensorFlow operator 32 | placed on the same device. 33 | 34 | #### Session 35 | 36 | Turning on JIT compilation at the session level will result in all possible 37 | operators being greedily compiled into XLA computations. Each XLA computation 38 | will be compiled into one or more kernels for the underlying device. 39 | 40 | Subject to a few constraints, if there are two adjacent operators in the graph 41 | that both have XLA implementations, then they will be compiled into a single XLA 42 | computation. 43 | 44 | JIT compilation is turned on at the session level by setting the 45 | `global_jit_level` config to `tf.OptimizerOptions.ON_1` and passing the config 46 | during session initialization. 47 | 48 | ```python 49 | # Config to turn on JIT compilation 50 | config = tf.ConfigProto() 51 | config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1 52 | 53 | sess = tf.Session(config=config) 54 | ``` 55 | 56 | > Note: Turning on JIT at the session level will not result in operations being 57 | > compiled for the CPU. JIT compilation for CPU operations must be done via 58 | > the manual method documented below. This decision was made due to the CPU 59 | > backend being single-threaded. 60 | 61 | #### Manual 62 | 63 | JIT compilation can also be turned on manually for one or more operators. This 64 | is done by tagging the operators to compile with the attribute 65 | `_XlaCompile=true`. 
The simplest way to do this is via the 66 | `tf.contrib.compiler.jit.experimental_jit_scope()` scope defined in 67 | [`tensorflow/contrib/compiler/jit.py`](https://www.tensorflow.org/code/tensorflow/contrib/compiler/jit.py). 68 | Example usage: 69 | 70 | ```python 71 | jit_scope = tf.contrib.compiler.jit.experimental_jit_scope 72 | 73 | x = tf.placeholder(np.float32) 74 | with jit_scope(): 75 | y = tf.add(x, x) # The "add" will be compiled with XLA. 76 | ``` 77 | 78 | The `_XlaCompile` attribute is currently supported on a best-effort basis. If an 79 | operator cannot be compiled, TensorFlow will silently fall back to the normal 80 | implementation. 81 | 82 | ### Placing operators on XLA devices 83 | 84 | Another way to run computations via XLA is to place an operator on a specific 85 | XLA device. This method is normally only used for testing. Valid targets are 86 | `XLA_CPU` or `XLA_GPU`. 87 | 88 | ```python 89 | with tf.device("/job:localhost/replica:0/task:0/device:XLA_GPU:0"): 90 | output = tf.add(input1, input2) 91 | ``` 92 | 93 | Unlike JIT compilation on the standard CPU and GPU devices, these devices make a 94 | copy of data when it is transferred on and off the device. The extra copy makes 95 | it expensive to mix XLA and TensorFlow operators in the same graph. 96 | 97 | ## Tutorial 98 | 99 | This tutorial covers training a simple version of MNIST softmax with JIT turned 100 | on. Currently JIT at the session level, which is what is used for the tutorial, 101 | only supports GPU. 102 | 103 | Before starting the tutorial verify that the LD_LIBRARY environment variable or 104 | ldconfig contains `$CUDA_ROOT/extras/CUPTI/lib64`, which contains libraries for 105 | the CUDA Profiling Tools Interface [(CUPTI)](http://docs.nvidia.com/cuda/cupti/index.html). 106 | TensorFlow uses CUPTI to pull tracing information from the GPU. 107 | 108 | ### Step #1: Prepare sample script 109 | 110 | Download or move 111 | [mnist_softmax_xla.py](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_softmax_xla.py) 112 | into a folder outside of the TensorFlow source tree. 113 | 114 | ### Step #2: Run without XLA 115 | 116 | Execute the python script to train the model without XLA. 117 | 118 | ```shell 119 | python mnist_softmax_xla.py --xla='' 120 | ``` 121 | 122 | Using the Chrome Trace Event Profiler (browse to chrome://tracing), 123 | open the timeline file created when the script finishes: `timeline.ctf.json`. 124 | The rendered timeline should look similar to the picture below with multiple 125 | green boxes labeled `MatMul`, possibly across multiple CPUs. 126 |
127 | [timeline screenshot: many separate MatMul kernels when running without XLA] 128 |
129 | 130 | ### Step #3 Run with XLA 131 | 132 | Execute the python script to train the model with XLA and turn on a debugging 133 | feature of XLA via an environmental variable that outputs the XLA graph. 134 | 135 | ```shell 136 | TF_XLA_FLAGS=--xla_generate_hlo_graph=.* python mnist_softmax_xla.py 137 | ``` 138 | 139 | Open the timeline file created (`timeline.ctf.json`). The rendered timeline 140 | should look similar to the picture below with one long bar labeled `_XlaLaunch`. 141 |
142 | [timeline screenshot: one long _XlaLaunch bar when running with XLA] 143 |
144 | 145 | To understand what is happening in `_XlaLaunch`, look at the console output for 146 | statements similar to the following: 147 | 148 | ```shell 149 | computation cluster_0[_XlaCompiledKernel=true,_XlaNumConstantArgs=1].v82 [CPU: 150 | pipeline start, before inline]: /tmp/hlo_graph_0.dot 151 | 152 | ``` 153 | 154 | The console statements point to the location of `hlo_graph_xx.dot` files that 155 | contain information about the graph created by XLA. The process that XLA takes 156 | to fuse Ops is visible by starting at `hlo_graph_0.dot` and viewing each diagram 157 | in succession. 158 | 159 | To Render the .dot file into a png, install 160 | [GraphViz](http://www.graphviz.org/Download..php) and run: 161 | 162 | ```shell 163 | dot -Tpng hlo_graph_80.dot -o hlo_graph_80.png 164 | ``` 165 | 166 | The result will look like the following: 167 |
168 | [image: hlo_graph_80.dot rendered to PNG with GraphViz] 169 |
170 | -------------------------------------------------------------------------------- /programmers_guide/threading_and_queues.md: -------------------------------------------------------------------------------- 1 | # Threading and Queues 2 | 3 | Queues are a powerful mechanism for asynchronous computation using TensorFlow. 4 | 5 | Like everything in TensorFlow, a queue is a node in a TensorFlow graph. It's a 6 | stateful node, like a variable: other nodes can modify its content. In 7 | particular, nodes can enqueue new items in to the queue, or dequeue existing 8 | items from the queue. 9 | 10 | To get a feel for queues, let's consider a simple example. We will create a 11 | "first in, first out" queue (`FIFOQueue`) and fill it with zeros. 12 | Then we'll construct a graph 13 | that takes an item off the queue, adds one to that item, and puts it back on the 14 | end of the queue. Slowly, the numbers on the queue increase. 15 | 16 |
17 | [animation: the FIFOQueue being dequeued, incremented, and re-enqueued] 18 |
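Written out in code, the loop described above looks roughly like the following; this is a minimal single-threaded sketch of the same dequeue, add one, re-enqueue cycle:

```python
import tensorflow as tf

# A first-in, first-out queue holding three scalar floats, filled with zeros.
q = tf.FIFOQueue(3, tf.float32)
init = q.enqueue_many(([0., 0., 0.],))

# Take an item off the queue, add one to it, and put it back on the end.
x = q.dequeue()
y = x + 1
q_inc = q.enqueue([y])

with tf.Session() as sess:
    sess.run(init)
    for _ in range(10):
        sess.run(q_inc)  # each run increments one element of the queue
    print(sess.run(q.dequeue_many(3)))  # the queued values have slowly grown
```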
19 | 20 | `Enqueue`, `EnqueueMany`, and `Dequeue` are special nodes. They take a pointer 21 | to the queue instead of a normal value, allowing them to change it. We recommend 22 | you think of these as being like methods of the queue. In fact, in the Python 23 | API, they are methods of the queue object (e.g. `q.enqueue(...)`). 24 | 25 | **N.B.** Queue methods (such as `q.enqueue(...)`) *must* run on the same device 26 | as the queue. Incompatible device placement directives will be ignored when 27 | creating these operations. 28 | 29 | Now that you have a bit of a feel for queues, let's dive into the details... 30 | 31 | ## Queue usage overview 32 | 33 | Queues, such as @{tf.FIFOQueue} 34 | and @{tf.RandomShuffleQueue}, 35 | are important TensorFlow objects for computing tensors asynchronously in a 36 | graph. 37 | 38 | For example, a typical input architecture is to use a `RandomShuffleQueue` to 39 | prepare inputs for training a model: 40 | 41 | * Multiple threads prepare training examples and push them in the queue. 42 | * A training thread executes a training op that dequeues mini-batches from the 43 | queue 44 | 45 | This architecture has many benefits, as highlighted in the 46 | @{$reading_data$Reading data how to}, which also gives an overview of 47 | functions that simplify the construction of input pipelines. 48 | 49 | The TensorFlow `Session` object is multithreaded, so multiple threads can 50 | easily use the same session and run ops in parallel. However, it is not always 51 | easy to implement a Python program that drives threads as described above. All 52 | threads must be able to stop together, exceptions must be caught and 53 | reported, and queues must be properly closed when stopping. 54 | 55 | TensorFlow provides two classes to help: 56 | @{tf.train.Coordinator} and 57 | @{tf.train.QueueRunner}. These two classes 58 | are designed to be used together. The `Coordinator` class helps multiple threads 59 | stop together and report exceptions to a program that waits for them to stop. 60 | The `QueueRunner` class is used to create a number of threads cooperating to 61 | enqueue tensors in the same queue. 62 | 63 | ## Coordinator 64 | 65 | The `Coordinator` class helps multiple threads stop together. 66 | 67 | Its key methods are: 68 | 69 | * @{tf.train.Coordinator.should_stop}: returns True if the threads should stop. 70 | * @{tf.train.Coordinator.request_stop}: requests that threads should stop. 71 | * @{tf.train.Coordinator.join}: waits until the specified threads have stopped. 72 | 73 | You first create a `Coordinator` object, and then create a number of threads 74 | that use the coordinator. The threads typically run loops that stop when 75 | `should_stop()` returns `True`. 76 | 77 | Any thread can decide that the computation should stop. It only has to call 78 | `request_stop()` and the other threads will stop as `should_stop()` will then 79 | return `True`. 80 | 81 | ```python 82 | # Thread body: loop until the coordinator indicates a stop was requested. 83 | # If some condition becomes true, ask the coordinator to stop. 84 | def MyLoop(coord): 85 | while not coord.should_stop(): 86 | ...do something... 87 | if ...some condition...: 88 | coord.request_stop() 89 | 90 | # Main thread: create a coordinator. 91 | coord = tf.train.Coordinator() 92 | 93 | # Create 10 threads that run 'MyLoop()' 94 | threads = [threading.Thread(target=MyLoop, args=(coord,)) for i in xrange(10)] 95 | 96 | # Start the threads and wait for all of them to stop. 
97 | for t in threads: 98 | t.start() 99 | coord.join(threads) 100 | ``` 101 | 102 | Obviously, the coordinator can manage threads doing very different things. 103 | They don't have to be all the same as in the example above. The coordinator 104 | also has support to capture and report exceptions. See the @{tf.train.Coordinator} documentation for more details. 105 | 106 | ## QueueRunner 107 | 108 | The `QueueRunner` class creates a number of threads that repeatedly run an 109 | enqueue op. These threads can use a coordinator to stop together. In 110 | addition, a queue runner runs a *closer thread* that automatically closes the 111 | queue if an exception is reported to the coordinator. 112 | 113 | You can use a queue runner to implement the architecture described above. 114 | 115 | First build a graph that uses a TensorFlow queue (e.g. a `tf.RandomShuffleQueue`) for input examples. Add ops that 116 | process examples and enqueue them in the queue. Add training ops that start by 117 | dequeueing from the queue. 118 | 119 | ```python 120 | example = ...ops to create one example... 121 | # Create a queue, and an op that enqueues examples one at a time in the queue. 122 | queue = tf.RandomShuffleQueue(...) 123 | enqueue_op = queue.enqueue(example) 124 | # Create a training graph that starts by dequeuing a batch of examples. 125 | inputs = queue.dequeue_many(batch_size) 126 | train_op = ...use 'inputs' to build the training part of the graph... 127 | ``` 128 | 129 | In the Python training program, create a `QueueRunner` that will run a few 130 | threads to process and enqueue examples. Create a `Coordinator` and ask the 131 | queue runner to start its threads with the coordinator. Write a training loop 132 | that also uses the coordinator. 133 | 134 | ``` 135 | # Create a queue runner that will run 4 threads in parallel to enqueue 136 | # examples. 137 | qr = tf.train.QueueRunner(queue, [enqueue_op] * 4) 138 | 139 | # Launch the graph. 140 | sess = tf.Session() 141 | # Create a coordinator, launch the queue runner threads. 142 | coord = tf.train.Coordinator() 143 | enqueue_threads = qr.create_threads(sess, coord=coord, start=True) 144 | # Run the training loop, controlling termination with the coordinator. 145 | for step in xrange(1000000): 146 | if coord.should_stop(): 147 | break 148 | sess.run(train_op) 149 | # When done, ask the threads to stop. 150 | coord.request_stop() 151 | # And wait for them to actually do it. 152 | coord.join(enqueue_threads) 153 | ``` 154 | 155 | ## Handling exceptions 156 | 157 | Threads started by queue runners do more than just run the enqueue ops. They 158 | also catch and handle exceptions generated by queues, including the 159 | `tf.errors.OutOfRangeError` exception, which is used to report that a queue was closed. 160 | 161 | A training program that uses a coordinator must similarly catch and report 162 | exceptions in its main loop. 163 | 164 | Here is an improved version of the training loop above. 165 | 166 | ```python 167 | try: 168 | for step in xrange(1000000): 169 | if coord.should_stop(): 170 | break 171 | sess.run(train_op) 172 | except Exception, e: 173 | # Report exceptions to the coordinator. 174 | coord.request_stop(e) 175 | finally: 176 | # Terminate as usual. It is safe to call `coord.request_stop()` twice. 
177 | coord.request_stop() 178 | coord.join(threads) 179 | ``` 180 | -------------------------------------------------------------------------------- /programmers_guide/version_semantics.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Version Semantics 2 | 3 | ## Semantic Versioning 2.0 4 | 5 | TensorFlow follows Semantic Versioning 2.0 ([semver](http://semver.org)) for its 6 | public API. Each release version of TensorFlow has the form `MAJOR.MINOR.PATCH`. 7 | Changes to the each number have the following meaning: 8 | 9 | * **MAJOR**: Backwards incompatible changes. Code and data that worked with 10 | a previous major release will not necessarily work with a new release. 11 | However, in some cases existing TensorFlow data (graphs, checkpoints, and 12 | other protobufs) may be migratable to the newer release; see below for details 13 | on data compatibility. 14 | 15 | * **MINOR**: Backwards compatible features, speed improvements, etc. Code and 16 | data that worked with a previous minor release *and* which depends only the 17 | public API will continue to work unchanged. For details on what is and is 18 | not the public API, see below. 19 | 20 | * **PATCH**: Backwards compatible bug fixes. 21 | 22 | ## What is covered 23 | 24 | Only the public APIs of TensorFlow are backwards compatible across minor and 25 | patch versions. The public APIs consist of 26 | 27 | * The documented public [Python](../api_docs/python) API, excluding `tf.contrib`. 28 | This includes all public functions and classes (whose names do not start with 29 | `_`) in the tensorflow module and its submodules. Note that the code in 30 | the `examples/` to `tools/` directories is not reachable through the 31 | tensorflow Python module and is thus not covered by the compatibility 32 | guarantee. 33 | 34 | If a symbol is available through the tensorflow Python module or its 35 | submodules, but is not documented, then it is _not_ considered part of the 36 | public API. 37 | 38 | * The [C API](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api.h). 39 | 40 | * The following protocol buffer files: 41 | [`attr_value`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/attr_value.proto), 42 | [`config`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto), 43 | [`event`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/event.proto), 44 | [`graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto), 45 | [`op_def`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto), 46 | [`reader_base`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/reader_base.proto), 47 | [`summary`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto), 48 | [`tensor`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.proto), 49 | [`tensor_shape`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor_shape.proto), 50 | and [`types`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/types.proto). 51 | 52 | ## What is *not* covered 53 | 54 | Some API functions are explicitly marked as "experimental" and can change in 55 | backward incompatible ways between minor releases. 
These include: 56 | 57 | * **Experimental APIs**: The @{tf.contrib} module and its submodules in Python 58 | and any functions in the C API or fields in protocol buffers that are 59 | explicitly commented as being experimental. 60 | 61 | * **Other languages**: TensorFlow APIs in languages other than Python and C, 62 | such as: 63 | 64 | - @{$cc/guide$C++} (exposed through header files in 65 | [`tensorflow/cc`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/cc)). 66 | - [Java](../api_docs/java/reference/org/tensorflow/package-summary), and 67 | - [Go](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go) 68 | 69 | * **Details of composite ops:** Many public functions in Python expand to 70 | several primitive ops in the graph, and these details will be part of any 71 | graphs saved to disk as `GraphDef`s. These details are allowed to change for 72 | minor releases. In particular, regressions tests that check for exact 73 | matching between graphs are likely to break across minor releases, even 74 | though the behavior of the graph should be unchanged and existing 75 | checkpoints will still work. 76 | 77 | * **Floating point numerical details:** The specific floating point values 78 | computed by ops may change at any time: users should rely only on 79 | approximate accuracy and numerical stability, not on the specific bits 80 | computed. Changes to numerical formulas in minor and patch releases should 81 | result in comparable or improved accuracy, with the caveat that in machine 82 | learning improved accuracy of specific formulas may result in worse accuracy 83 | for the overall system. 84 | 85 | * **Random numbers:** The specific random numbers computed by the 86 | @{$python/constant_op#Random_Tensors$random ops} may change at any time: 87 | users should rely only on approximately correct distributions and 88 | statistical strength, not the specific bits computed. However, we will make 89 | changes to random bits rarely and ideally never for patch releases, and all 90 | such intended changes will be documented. 91 | 92 | * **Distributed Tensorflow:** Running 2 different versions of TensorFlow in a 93 | single cluster is unsupported. There are no guarantees about backwards 94 | compatibility of the wire protocol. 95 | 96 | Furthermore, any API methods marked "deprecated" in the 1.0 release can 97 | be deleted in any subsequent minor release. 98 | 99 | ## Compatibility for Graphs and Checkpoints 100 | 101 | Many users of TensorFlow will be saving graphs and trained models to disk for 102 | later evaluation or more training, often changing versions of TensorFlow in the 103 | process. First, following semver, any graph or checkpoint written out with one 104 | version of TensorFlow can be loaded and evaluated with a later version of 105 | TensorFlow with the same major release. However, we will endeavour to preserve 106 | backwards compatibility even across major releases when possible, so that the 107 | serialized files are usable over long periods of time. 108 | 109 | There are two main classes of saved TensorFlow data: graphs and checkpoints. 110 | Graphs describe the data flow graphs of ops to be run during training and 111 | inference, and checkpoints contain the saved tensor values of variables in a 112 | graph. 113 | 114 | Graphs are serialized via the `GraphDef` protocol buffer. To facilitate (rare) 115 | backwards incompatible changes to graphs, each `GraphDef` has an integer version 116 | separate from the TensorFlow version. 
The semantics are: 117 | 118 | * Each version of TensorFlow supports an interval of `GraphDef` versions. This 119 | interval with be constant across patch releases, and will only grow across 120 | minor releases. Dropping support for a `GraphDef` version will only occur 121 | for a major release of TensorFlow. 122 | 123 | * Newly created graphs use the newest `GraphDef` version. 124 | 125 | * If a given version of TensorFlow supports the `GraphDef` version of a graph, 126 | it will load and evaluate with the same behavior as when it was written out 127 | (except for floating point numerical details and random numbers), regardless 128 | of the major version of TensorFlow. In particular, all checkpoint files will 129 | be compatible. 130 | 131 | * If the `GraphDef` upper bound is increased to X in a (minor) release, there 132 | will be at least six months before the lower bound is increased to X. 133 | 134 | For example (numbers and versions hypothetical), TensorFlow 1.2 might support 135 | `GraphDef` versions 4 to 7. TensorFlow 1.3 could add `GraphDef` version 8 and 136 | support versions 4 to 8. At least six months later, TensorFlow 2.0.0 could drop 137 | support for versions 4 to 7, leaving version 8 only. 138 | 139 | Finally, when support for a `GraphDef` version is dropped, we will attempt to 140 | provide tools for automatically converting graphs to a newer supported 141 | `GraphDef` version. 142 | 143 | For developer-level details about `GraphDef` versioning, including how to evolve 144 | the versions to account for changes, see 145 | @{$data_versions$TensorFlow Data Versioning}. 146 | -------------------------------------------------------------------------------- /tutorials/using_gpu.md: -------------------------------------------------------------------------------- 1 | # Using GPUs 2 | 3 | ## Supported devices 4 | 5 | On a typical system, there are multiple computing devices. In TensorFlow, the 6 | supported device types are `CPU` and `GPU`. They are represented as `strings`. 7 | For example: 8 | 9 | * `"/cpu:0"`: The CPU of your machine. 10 | * `"/gpu:0"`: The GPU of your machine, if you have one. 11 | * `"/gpu:1"`: The second GPU of your machine, etc. 12 | 13 | If a TensorFlow operation has both CPU and GPU implementations, the GPU devices 14 | will be given priority when the operation is assigned to a device. For example, 15 | `matmul` has both CPU and GPU kernels. On a system with devices `cpu:0` and 16 | `gpu:0`, `gpu:0` will be selected to run `matmul`. 17 | 18 | ## Logging Device placement 19 | 20 | To find out which devices your operations and tensors are assigned to, create 21 | the session with `log_device_placement` configuration option set to `True`. 22 | 23 | ```python 24 | # Creates a graph. 25 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 26 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 27 | c = tf.matmul(a, b) 28 | # Creates a session with log_device_placement set to True. 29 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 30 | # Runs the op. 31 | print(sess.run(c)) 32 | ``` 33 | 34 | You should see the following output: 35 | 36 | ``` 37 | Device mapping: 38 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus 39 | id: 0000:05:00.0 40 | b: /job:localhost/replica:0/task:0/gpu:0 41 | a: /job:localhost/replica:0/task:0/gpu:0 42 | MatMul: /job:localhost/replica:0/task:0/gpu:0 43 | [[ 22. 28.] 44 | [ 49. 
64.]] 45 | 46 | ``` 47 | 48 | ## Manual device placement 49 | 50 | If you would like a particular operation to run on a device of your choice 51 | instead of what's automatically selected for you, you can use `with tf.device` 52 | to create a device context such that all the operations within that context will 53 | have the same device assignment. 54 | 55 | ```python 56 | # Creates a graph. 57 | with tf.device('/cpu:0'): 58 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 59 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 60 | c = tf.matmul(a, b) 61 | # Creates a session with log_device_placement set to True. 62 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 63 | # Runs the op. 64 | print(sess.run(c)) 65 | ``` 66 | 67 | You will see that now `a` and `b` are assigned to `cpu:0`. 68 | 69 | ``` 70 | Device mapping: 71 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus 72 | id: 0000:05:00.0 73 | b: /job:localhost/replica:0/task:0/cpu:0 74 | a: /job:localhost/replica:0/task:0/cpu:0 75 | MatMul: /job:localhost/replica:0/task:0/gpu:0 76 | [[ 22. 28.] 77 | [ 49. 64.]] 78 | ``` 79 | 80 | ## Allowing GPU memory growth 81 | 82 | By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to 83 | [`CUDA_VISIBLE_DEVICES`](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars)) 84 | visible to the process. This is done to more efficiently use the relatively 85 | precious GPU memory resources on the devices by reducing [memory 86 | fragmentation](https://en.wikipedia.org/wiki/Fragmentation_\(computing\)). 87 | 88 | In some cases it is desirable for the process to only allocate a subset of the 89 | available memory, or to only grow the memory usage as is needed by the process. 90 | TensorFlow provides two Config options on the Session to control this. 91 | 92 | The first is the `allow_growth` option, which attempts to allocate only as much 93 | GPU memory based on runtime allocations: it starts out allocating very little 94 | memory, and as Sessions get run and more GPU memory is needed, we extend the GPU 95 | memory region needed by the TensorFlow process. Note that we do not release 96 | memory, since that can lead to even worse memory fragmentation. To turn this 97 | option on, set the option in the ConfigProto by: 98 | 99 | ```python 100 | config = tf.ConfigProto() 101 | config.gpu_options.allow_growth = True 102 | session = tf.Session(config=config, ...) 103 | ``` 104 | 105 | The second method is the `per_process_gpu_memory_fraction` option, which 106 | determines the fraction of the overall amount of memory that each visible GPU 107 | should be allocated. For example, you can tell TensorFlow to only allocate 40% 108 | of the total memory of each GPU by: 109 | 110 | ```python 111 | config = tf.ConfigProto() 112 | config.gpu_options.per_process_gpu_memory_fraction = 0.4 113 | session = tf.Session(config=config, ...) 114 | ``` 115 | 116 | This is useful if you want to truly bound the amount of GPU memory available to 117 | the TensorFlow process. 118 | 119 | ## Using a single GPU on a multi-GPU system 120 | 121 | If you have more than one GPU in your system, the GPU with the lowest ID will be 122 | selected by default. If you would like to run on a different GPU, you will need 123 | to specify the preference explicitly: 124 | 125 | ```python 126 | # Creates a graph. 
127 | with tf.device('/gpu:2'): 128 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 129 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 130 | c = tf.matmul(a, b) 131 | # Creates a session with log_device_placement set to True. 132 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 133 | # Runs the op. 134 | print(sess.run(c)) 135 | ``` 136 | 137 | If the device you have specified does not exist, you will get 138 | `InvalidArgumentError`: 139 | 140 | ``` 141 | InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b': 142 | Could not satisfy explicit device specification '/gpu:2' 143 | [[Node: b = Const[dtype=DT_FLOAT, value=Tensor, _device="/gpu:2"]()]] 145 | ``` 146 | 147 | If you would like TensorFlow to automatically choose an existing and supported 148 | device to run the operations in case the specified one doesn't exist, you can 149 | set `allow_soft_placement` to `True` in the configuration option when creating 150 | the session. 151 | 152 | ```python 153 | # Creates a graph. 154 | with tf.device('/gpu:2'): 155 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 156 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 157 | c = tf.matmul(a, b) 158 | # Creates a session with allow_soft_placement and log_device_placement set 159 | # to True. 160 | sess = tf.Session(config=tf.ConfigProto( 161 | allow_soft_placement=True, log_device_placement=True)) 162 | # Runs the op. 163 | print(sess.run(c)) 164 | ``` 165 | 166 | ## Using multiple GPUs 167 | 168 | If you would like to run TensorFlow on multiple GPUs, you can construct your 169 | model in a multi-tower fashion where each tower is assigned to a different GPU. 170 | For example: 171 | 172 | ``` 173 | # Creates a graph. 174 | c = [] 175 | for d in ['/gpu:2', '/gpu:3']: 176 | with tf.device(d): 177 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3]) 178 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2]) 179 | c.append(tf.matmul(a, b)) 180 | with tf.device('/cpu:0'): 181 | sum = tf.add_n(c) 182 | # Creates a session with log_device_placement set to True. 183 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 184 | # Runs the op. 185 | print(sess.run(sum)) 186 | ``` 187 | 188 | You will see the following output. 189 | 190 | ``` 191 | Device mapping: 192 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus 193 | id: 0000:02:00.0 194 | /job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus 195 | id: 0000:03:00.0 196 | /job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus 197 | id: 0000:83:00.0 198 | /job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus 199 | id: 0000:84:00.0 200 | Const_3: /job:localhost/replica:0/task:0/gpu:3 201 | Const_2: /job:localhost/replica:0/task:0/gpu:3 202 | MatMul_1: /job:localhost/replica:0/task:0/gpu:3 203 | Const_1: /job:localhost/replica:0/task:0/gpu:2 204 | Const: /job:localhost/replica:0/task:0/gpu:2 205 | MatMul: /job:localhost/replica:0/task:0/gpu:2 206 | AddN: /job:localhost/replica:0/task:0/cpu:0 207 | [[ 44. 56.] 208 | [ 98. 128.]] 209 | ``` 210 | 211 | The @{$deep_cnn$cifar10 tutorial} is a good example 212 | demonstrating how to do training with multiple GPUs. 
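If you do not know in advance how many GPUs a machine exposes, you can combine the multi-tower pattern with `allow_soft_placement`, or enumerate the local devices first. The sketch below uses the internal `device_lib` helper, which is not part of the stable public API, so treat it as an illustration rather than a supported recipe:

```python
# A minimal sketch: build one tower per visible GPU, falling back to the CPU.
import tensorflow as tf
from tensorflow.python.client import device_lib  # internal, non-stable module

gpu_names = [d.name for d in device_lib.list_local_devices()
             if d.device_type == 'GPU']
devices = gpu_names if gpu_names else ['/cpu:0']

c = []
for device_name in devices:
  with tf.device(device_name):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  total = tf.add_n(c)

# allow_soft_placement lets TensorFlow fall back to a supported device if a
# requested one is unavailable.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
print(sess.run(total))
```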
213 | -------------------------------------------------------------------------------- /tutorials/recurrent.md: -------------------------------------------------------------------------------- 1 | # Recurrent Neural Networks 2 | 3 | ## Introduction 4 | 5 | Take a look at [this great article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) 6 | for an introduction to recurrent neural networks and LSTMs in particular. 7 | 8 | ## Language Modeling 9 | 10 | In this tutorial we will show how to train a recurrent neural network on 11 | a challenging task of language modeling. The goal of the problem is to fit a 12 | probabilistic model which assigns probabilities to sentences. It does so by 13 | predicting next words in a text given a history of previous words. For this 14 | purpose we will use the [Penn Tree Bank](https://catalog.ldc.upenn.edu/ldc99t42) 15 | (PTB) dataset, which is a popular benchmark for measuring the quality of these 16 | models, whilst being small and relatively fast to train. 17 | 18 | Language modeling is key to many interesting problems such as speech 19 | recognition, machine translation, or image captioning. It is also fun -- 20 | take a look [here](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). 21 | 22 | For the purpose of this tutorial, we will reproduce the results from 23 | [Zaremba et al., 2014](http://arxiv.org/abs/1409.2329) 24 | ([pdf](http://arxiv.org/pdf/1409.2329.pdf)), which achieves very good quality 25 | on the PTB dataset. 26 | 27 | ## Tutorial Files 28 | 29 | This tutorial references the following files from `models/tutorials/rnn/ptb` in the [TensorFlow models repo](https://github.com/tensorflow/models): 30 | 31 | File | Purpose 32 | --- | --- 33 | `ptb_word_lm.py` | The code to train a language model on the PTB dataset. 34 | `reader.py` | The code to read the dataset. 35 | 36 | ## Download and Prepare the Data 37 | 38 | The data required for this tutorial is in the `data/` directory of the 39 | PTB dataset from Tomas Mikolov's webpage: 40 | http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz 41 | 42 | The dataset is already preprocessed and contains overall 10000 different words, 43 | including the end-of-sentence marker and a special symbol (\) for rare 44 | words. In `reader.py`, we convert each word to a unique integer identifier, 45 | in order to make it easy for the neural network to process the data. 46 | 47 | ## The Model 48 | 49 | ### LSTM 50 | 51 | The core of the model consists of an LSTM cell that processes one word at a 52 | time and computes probabilities of the possible values for the next word in the 53 | sentence. The memory state of the network is initialized with a vector of zeros 54 | and gets updated after reading each word. For computational reasons, we will 55 | process data in mini-batches of size `batch_size`. 56 | 57 | The basic pseudocode is as follows: 58 | 59 | ```python 60 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size) 61 | # Initial state of the LSTM memory. 62 | state = tf.zeros([batch_size, lstm.state_size]) 63 | probabilities = [] 64 | loss = 0.0 65 | for current_batch_of_words in words_in_dataset: 66 | # The value of state is updated after processing each batch of words. 
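    # (Pseudocode caveat: with the default state_is_tuple=True,
    # BasicLSTMCell's state is an LSTMStateTuple of (c, h) rather than the
    # single tensor shown above; in real code the zero state would come from
    # lstm.zero_state(batch_size, tf.float32).)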
67 | output, state = lstm(current_batch_of_words, state) 68 | 69 | # The LSTM output can be used to make next word predictions 70 | logits = tf.matmul(output, softmax_w) + softmax_b 71 | probabilities.append(tf.nn.softmax(logits)) 72 | loss += loss_function(probabilities, target_words) 73 | ``` 74 | 75 | ### Truncated Backpropagation 76 | 77 | By design, the output of a recurrent neural network (RNN) depends on arbitrarily 78 | distant inputs. Unfortunately, this makes backpropagation computation difficult. 79 | In order to make the learning process tractable, it is common practice to create 80 | an "unrolled" version of the network, which contains a fixed number 81 | (`num_steps`) of LSTM inputs and outputs. The model is then trained on this 82 | finite approximation of the RNN. This can be implemented by feeding inputs of 83 | length `num_steps` at a time and performing a backward pass after each 84 | such input block. 85 | 86 | Here is a simplified block of code for creating a graph which performs 87 | truncated backpropagation: 88 | 89 | ```python 90 | # Placeholder for the inputs in a given iteration. 91 | words = tf.placeholder(tf.int32, [batch_size, num_steps]) 92 | 93 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size) 94 | # Initial state of the LSTM memory. 95 | initial_state = state = tf.zeros([batch_size, lstm.state_size]) 96 | 97 | for i in range(num_steps): 98 | # The value of state is updated after processing each batch of words. 99 | output, state = lstm(words[:, i], state) 100 | 101 | # The rest of the code. 102 | # ... 103 | 104 | final_state = state 105 | ``` 106 | 107 | And this is how to implement an iteration over the whole dataset: 108 | 109 | ```python 110 | # A numpy array holding the state of LSTM after each batch of words. 111 | numpy_state = initial_state.eval() 112 | total_loss = 0.0 113 | for current_batch_of_words in words_in_dataset: 114 | numpy_state, current_loss = session.run([final_state, loss], 115 | # Initialize the LSTM state from the previous iteration. 116 | feed_dict={initial_state: numpy_state, words: current_batch_of_words}) 117 | total_loss += current_loss 118 | ``` 119 | 120 | ### Inputs 121 | 122 | The word IDs will be embedded into a dense representation (see the 123 | @{$word2vec$Vector Representations Tutorial}) before feeding to 124 | the LSTM. This allows the model to efficiently represent the knowledge about 125 | particular words. It is also easy to write: 126 | 127 | ```python 128 | # embedding_matrix is a tensor of shape [vocabulary_size, embedding size] 129 | word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids) 130 | ``` 131 | 132 | The embedding matrix will be initialized randomly and the model will learn to 133 | differentiate the meaning of words just by looking at the data. 134 | 135 | ### Loss Function 136 | 137 | We want to minimize the average negative log probability of the target words: 138 | 139 | $$ \text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i} $$ 140 | 141 | It is not very difficult to implement but the function 142 | `sequence_loss_by_example` is already available, so we can just use it here. 143 | 144 | The typical measure reported in the papers is average per-word perplexity (often 145 | just called perplexity), which is equal to 146 | 147 | $$e^{-\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i}} = e^{\text{loss}} $$ 148 | 149 | and we will monitor its value throughout the training process. 
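The tutorial code relies on the `sequence_loss_by_example` helper for this. Purely to make the two formulas concrete, a minimal sketch with placeholder `logits` and `targets` (the names and shapes here are assumptions, not the variables used in `ptb_word_lm.py`) might look like:

```python
import tensorflow as tf

batch_size, num_steps, vocab_size = 20, 35, 10000
# Per-step logits over the vocabulary, and the true next-word IDs.
logits = tf.placeholder(tf.float32, [batch_size, num_steps, vocab_size])
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)
loss = tf.reduce_mean(cross_entropy)  # average negative log probability per word
perplexity = tf.exp(loss)             # e^loss, the value reported in the papers
```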
150 | 151 | ### Stacking multiple LSTMs 152 | 153 | To give the model more expressive power, we can add multiple layers of LSTMs 154 | to process the data. The output of the first layer will become the input of 155 | the second and so on. 156 | 157 | We have a class called `MultiRNNCell` that makes the implementation seamless: 158 | 159 | ```python 160 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=False) 161 | stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm] * number_of_layers, 162 | state_is_tuple=False) 163 | 164 | initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32) 165 | for i in range(num_steps): 166 | # The value of state is updated after processing each batch of words. 167 | output, state = stacked_lstm(words[:, i], state) 168 | 169 | # The rest of the code. 170 | # ... 171 | 172 | final_state = state 173 | ``` 174 | 175 | ## Run the Code 176 | 177 | Start by cloning the [TensorFlow models repo](https://github.com/tensorflow/models) from GitHub. 178 | You'll also need to download the PTB dataset, as discussed at the beginning of 179 | this tutorial; we'll assume the dataset is located in `/tmp/simple-examples/data`. 180 | 181 | Run the following commands: 182 | 183 | ```bash 184 | cd models/tutorials/rnn/ptb 185 | python ptb_word_lm.py --data_path=/tmp/simple-examples/data/ --model=small 186 | ``` 187 | 188 | There are 3 supported model configurations in the tutorial code: "small", 189 | "medium" and "large". The difference between them is in size of the LSTMs and 190 | the set of hyperparameters used for training. 191 | 192 | The larger the model, the better results it should get. The `small` model should 193 | be able to reach perplexity below 120 on the test set and the `large` one below 194 | 80, though it might take several hours to train. 195 | 196 | ## What Next? 197 | 198 | There are several tricks that we haven't mentioned that make the model better, 199 | including: 200 | 201 | * decreasing learning rate schedule, 202 | * dropout between the LSTM layers. 203 | 204 | Study the code and modify it to improve the model even further. 205 | -------------------------------------------------------------------------------- /performance/xla/broadcasting.md: -------------------------------------------------------------------------------- 1 | # Broadcasting semantics 2 | 3 | This document describes how the broadcasting semantics in XLA work. 4 | 5 | ## What is broadcasting? 6 | 7 | Broadcasting is the process of making arrays with different shapes have 8 | compatible shapes for arithmetic operations. The terminology is borrowed from 9 | Numpy 10 | [(broadcasting)](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 11 | 12 | Broadcasting may be required for operations between multi-dimensional arrays of 13 | different ranks, or between multi-dimensional arrays with different but 14 | compatible shapes. Consider the addition `X+v` where `X` is a matrix (an array 15 | of rank 2) and `v` is a vector (an array of rank 1). To perform element-wise 16 | addition, XLA needs to "broadcast" the vector `v` to the same rank as the 17 | matrix `X`, by replicating `v` a certain number of times. The vector's length 18 | has to match at least one of the dimensions of the matrix. 19 | 20 | For example: 21 | 22 | |1 2 3| + |7 8 9| 23 | |4 5 6| 24 | 25 | The matrix's dimensions are (2,3), the vector's are (3). 
The vector is broadcast 26 | by replicating it over rows to get: 27 | 28 | |1 2 3| + |7 8 9| = |8 10 12| 29 | |4 5 6| |7 8 9| |11 13 15| 30 | 31 | In Numpy, this is called [broadcasting] 32 | (http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 33 | 34 | ## Principles 35 | 36 | XLA is a low-level infrastructure, so the XLA language is as strict and 37 | explicit as possible, avoiding implicit and "magical" features that may make 38 | some computations slightly easier to define, at the cost of more assumptions 39 | baked into user code that will be difficult to change in the long term. If 40 | necessary, implicit and magical features can be added in client-level wrappers. 41 | 42 | With regard to broadcasting, an explicit broadcasting specification is required 43 | on operations between arrays of different ranks. This is different from Numpy, 44 | which infers the specification when possible. 45 | 46 | ## Broadcasting a lower-rank array onto a higher-rank array 47 | 48 | *Scalars* can always be broadcast over arrays without an explicit specification 49 | of broadcasting dimensions. An element-wise binary operation between a scalar 50 | and an array means applying the operation with the scalar for each element in 51 | the array. For example, adding a scalar to a matrix means producing a matrix 52 | each element of which is the sum of the scalar and the corresponding input 53 | matrix element. 54 | 55 | |1 2 3| + 7 = |8 9 10| 56 | |4 5 6| |11 12 13| 57 | 58 | Most broadcasting needs can be captured by using a tuple of dimensions on a 59 | binary operation. When the inputs to the operation have different ranks, this 60 | broadcasting tuple specifies which dimension(s) in the **higher-rank** array to 61 | match with the **lower-rank** array. 62 | 63 | Consider the previous example: instead of adding a scalar to a (2,3) matrix, add 64 | a vector of dimension (3) to a matrix of dimensions (2,3). *Without specifying 65 | broadcasting, this operation is invalid.* To correctly request matrix-vector 66 | addition, specify the broadcasting dimension to be (1), meaning the vector's 67 | dimension is matched to dimension 1 of the matrix. In 2D, if dimension 0 is 68 | considered as rows and dimension 1 as columns, this means that each element of 69 | the vector becomes a column of a size matching the number of rows in the matrix: 70 | 71 | |7 8 9| ==> |7 8 9| 72 | |7 8 9| 73 | 74 | As a more complex example, consider adding a 3-element vector (dimension (3)) to 75 | a 3x3 matrix (dimensions (3,3)). There are two ways broadcasting can happen for 76 | this example: 77 | 78 | (1) A broadcasting dimension of 1 can be used. Each vector element becomes a 79 | column and the vector is duplicated for each row in the matrix. 80 | 81 | |7 8 9| ==> |7 8 9| 82 | |7 8 9| 83 | |7 8 9| 84 | 85 | (2) A broadcasting dimension of 0 can be used. Each vector element becomes a row 86 | and the vector is duplicated for each column in the matrix. 87 | 88 | |7| ==> |7 7 7| 89 | |8| |8 8 8| 90 | |9| |9 9 9| 91 | 92 | > Note: when adding a 2x3 matrix to a 3-element vector, a broadcasting dimension 93 | > of 0 is invalid. 94 | 95 | The broadcasting dimensions can be a tuple that describes how a smaller rank 96 | shape is broadcast into a larger rank shape. For example, given a 2x3x4 cuboid 97 | and a 3x4 matrix, a broadcasting tuple (1,2) means matching the matrix to 98 | dimensions 1 and 2 of the cuboid. 
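XLA itself is not driven from numpy, but because the terminology is borrowed from it, the dimension-matching rule above can be illustrated with a short numpy sketch (illustration only, not how XLA is invoked):

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)  # a 3x3 matrix
v = np.array([7, 8, 9])             # a 3-element vector

# Broadcasting dimension (1): the vector is matched to columns, so it is
# replicated across rows -- case (1) above.
by_dim1 = m + v.reshape(1, 3)

# Broadcasting dimension (0): the vector is matched to rows, so it is
# replicated across columns -- case (2) above.
by_dim0 = m + v.reshape(3, 1)

# The cuboid example: matching a 3x4 matrix to dimensions 1 and 2 of a
# 2x3x4 cuboid corresponds to inserting a leading axis of size 1.
cuboid = np.zeros((2, 3, 4))
matrix = np.ones((3, 4))
combined = cuboid + matrix.reshape(1, 3, 4)
```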
99 | 100 | This type of broadcast is used in the binary ops in `ComputationBuilder`, if the 101 | `broadcast_dimensions` argument is given. For example, see 102 | [ComputationBuilder::Add](https://www.tensorflow.org/code/tensorflow/compiler/xla/client/computation_builder.cc). 103 | In the XLA source code, this type of broadcasting is sometimes called "InDim" 104 | broadcasting. 105 | 106 | ### Formal definition 107 | 108 | The broadcasting attribute allows matching a lower-rank array to a higher-rank 109 | array, by specifying which dimensions of the higher-rank array to match. For 110 | example, for an array with dimensions MxNxPxQ, a vector with dimension T can be 111 | matched as follows: 112 | 113 | MxNxPxQ 114 | 115 | dim 3: T 116 | dim 2: T 117 | dim 1: T 118 | dim 0: T 119 | 120 | In each case, T has to be equal to the matching dimension of the higher-rank 121 | array. The vector's values are then broadcast from the matched dimension to all 122 | the other dimensions. 123 | 124 | To match a TxV matrix onto the MxNxPxQ array, a pair of broadcasting dimensions 125 | are used: 126 | 127 | MxNxPxQ 128 | dim 2,3: T V 129 | dim 1,2: T V 130 | dim 0,3: T V 131 | etc... 132 | 133 | The order of dimensions in the broadcasting tuple has to be the order in which 134 | the lower-rank array's dimensions are expected to match the higher-rank array's 135 | dimensions. The first element in the tuple says which dimension in the 136 | higher-rank array has to match dimension 0 in the lower-rank array. The second 137 | element for dimension 1, and so on. The order of broadcast dimensions has to be 138 | strictly increasing. For example, in the previous example it is illegal to match 139 | V to N and T to P; it is also illegal to match V to both P and N. 140 | 141 | ## Broadcasting similar-rank arrays with degenerate dimensions 142 | 143 | A related broadcasting problem is broadcasting two arrays that have the same 144 | rank but different dimension sizes. Similarly to Numpy's rules, this is only 145 | possible when the arrays are *compatible*. Two arrays are compatible when all 146 | their dimensions are compatible. Two dimensions are compatible if: 147 | 148 | * They are equal, or 149 | * One of them is 1 (a "degenerate" dimension) 150 | 151 | When two compatible arrays are encountered, the result shape has the maximum 152 | among the two inputs at every dimension index. 153 | 154 | Examples: 155 | 156 | 1. (2,1) and (2,3) broadcast to (2,3). 157 | 2. (1,2,5) and (7,2,5) broadcast to (7,2,5) 158 | 3. (7,2,5) and (7,1,5) broadcast to (7,2,5) 159 | 4. (7,2,5) and (7,2,6) are incompatible and cannot be broadcast. 160 | 161 | A special case arises, and is also supported, where each of the input arrays has 162 | a degenerate dimension at a different index. In this case, the result is an 163 | "outer operation": (2,1) and (1,3) broadcast to (2,3). For more examples, 164 | consult the [Numpy documentation on 165 | broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 166 | 167 | ## Broadcast composition 168 | 169 | Broadcasting of a lower-rank array to a higher-rank array **and** broadcasting 170 | using degenerate dimensions can both be performed in the same binary operation. 171 | For example, a vector of size 4 and an matrix of size 1x2 can be added together 172 | using broadcast dimensions value of (0): 173 | 174 | |1 2 3 4| + [5 6] // [5 6] is a 1x2 matrix, not a vector. 175 | 176 | First the vector is broadcast up to rank 2 (matrix) using the broadcast 177 | dimensions. 
The single value (0) in the broadcast dimensions indicates that 178 | dimension zero of the vector matches to dimension zero of the matrix. This 179 | produces an matrix of size 4xM where the value M is chosen to match the 180 | corresponding dimension size in the 1x2 array. Therefore, a 4x2 matrix is 181 | produced: 182 | 183 | |1 1| + [5 6] 184 | |2 2| 185 | |3 3| 186 | |4 4| 187 | 188 | Then "degenerate dimension broadcasting" broadcasts dimension zero of the 1x2 189 | matrix to match the corresponding dimension size of the right hand side: 190 | 191 | |1 1| + |5 6| |6 7| 192 | |2 2| + |5 6| = |7 8| 193 | |3 3| + |5 6| |8 9| 194 | |4 4| + |5 6| |9 10| 195 | 196 | A more complicated example is a matrix of size 1x2 added to an array of size 197 | 4x3x1 using broadcast dimensions of (1, 2). First the 1x2 matrix is broadcast up 198 | to rank 3 using the broadcast dimensions to produces an intermediate Mx1x2 array 199 | where the dimension size M is determined by the size of the larger operand (the 200 | 4x3x1 array) producing a 4x1x2 intermediate array. The M is at dimension 0 201 | (left-most dimension) because the dimensions 1 and 2 are mapped to the 202 | dimensions of the original 1x2 matrix as the broadcast dimension are (1, 2). 203 | This intermediate array can be added to the 4x3x1 matrix using broadcasting of 204 | degenerate dimensions to produce a 4x3x2 array result. 205 | -------------------------------------------------------------------------------- /install/install_java.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for Java 2 | 3 | TensorFlow provides APIs for use in Java programs. These APIs are particularly 4 | well-suited to loading models created in Python and executing them within a 5 | Java application. This guide explains how to install 6 | [TensorFlow for Java](https://www.tensorflow.org/api_docs/java/reference/org/tensorflow/package-summary) 7 | and use it in a Java application. 8 | 9 | **WARNING:** The TensorFlow Java API is *not* covered by the TensorFlow 10 | [API stability guarantees](https://www.tensorflow.org/programmers_guide/version_semantics). 11 | 12 | 13 | ## Supported Platforms 14 | 15 | TensorFlow for Java is supported on the following operating systems: 16 | 17 | * Linux 18 | * Mac OS X 19 | * Windows 20 | * Android 21 | 22 | The installation instructions for Android are in a separate 23 | [Android TensorFlow Support page](https://www.tensorflow.org/code/tensorflow/contrib/android). 24 | After installation, please see this 25 | [complete example](https://www.tensorflow.org/code/tensorflow/examples/android) 26 | of TensorFlow on Android. 27 | 28 | ## Using TensorFlow with a Maven project 29 | 30 | If your project uses [Apache Maven](https://maven.apache.org), then add the 31 | following to the project's `pom.xml` to use the TensorFlow Java APIs: 32 | 33 | ```xml 34 | 35 | org.tensorflow 36 | tensorflow 37 | 1.1.0 38 | 39 | ``` 40 | 41 | That's all. 42 | 43 | ### Example 44 | 45 | As an example, these steps will create a Maven project that uses TensorFlow: 46 | 47 | 1. Create the project's `pom.xml`: 48 | 49 | 50 | 51 | 4.0.0 52 | org.myorg 53 | label-image 54 | 1.0-SNAPSHOT 55 | 56 | HelloTF 57 | 58 | 59 | 1.7 60 | 1.7 61 | 62 | 63 | 64 | org.tensorflow 65 | tensorflow 66 | 1.1.0 67 | 68 | 69 | 70 | 71 | 72 | 2. 
Create the source file (`src/main/java/HelloTF.java`): 73 | 74 | 75 | import org.tensorflow.Graph; 76 | import org.tensorflow.Session; 77 | import org.tensorflow.Tensor; 78 | import org.tensorflow.TensorFlow; 79 | 80 | public class HelloTF { 81 | public static void main(String[] args) throws Exception { 82 | try (Graph g = new Graph()) { 83 | final String value = "Hello from " + TensorFlow.version(); 84 | 85 | // Construct the computation graph with a single operation, a constant 86 | // named "MyConst" with a value "value". 87 | try (Tensor t = Tensor.create(value.getBytes("UTF-8"))) { 88 | // The Java API doesn't yet include convenience functions for adding operations. 89 | g.opBuilder("Const", "MyConst").setAttr("dtype", t.dataType()).setAttr("value", t).build(); 90 | } 91 | 92 | // Execute the "MyConst" operation in a Session. 93 | try (Session s = new Session(g); 94 | Tensor output = s.runner().fetch("MyConst").run().get(0)) { 95 | System.out.println(new String(output.bytesValue(), "UTF-8")); 96 | } 97 | } 98 | } 99 | } 100 | 101 | 102 | 3. Compile and execute: 103 | 104 |
 # Use -q to hide logging from the mvn tool
105 |      mvn -q compile exec:java
106 | 107 | 108 | The preceding command should output Hello from version. If it 109 | does, you've successfully set up TensorFlow for Java and are ready to use it in 110 | Maven projects. If not, check 111 | [Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) 112 | for possible solutions. You can skip reading the rest of this document. 113 | 114 | ## Using TensorFlow with JDK 115 | 116 | This section describes how to use TensorFlow with the `java` and `javac` 117 | commands from a JDK installation. If your project uses Apache Maven, then 118 | refer to the simpler instructions above instead. 119 | 120 | ### Install on Linux or Mac OS 121 | 122 | Take the following steps to install TensorFlow for Java on Linux or Mac OS: 123 | 124 | 1. Download 125 | [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar), 126 | which is the TensorFlow Java Archive (JAR). 127 | 128 | 2. Decide whether you will run TensorFlow for Java on CPU(s) only or with 129 | the help of GPU(s). To help you decide, read the section entitled 130 | "Determine which TensorFlow to install" in one of the following guides: 131 | 132 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 133 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 134 | 135 | 3. Download and extract the appropriate Java Native Interface (JNI) 136 | file for your operating system and processor support by running the 137 | following shell commands: 138 | 139 | 140 | TF_TYPE="cpu" # Default processor is CPU. If you want GPU, set to "gpu" 141 | OS=$(uname -s | tr '[:upper:]' '[:lower:]') 142 | mkdir -p ./jni 143 | curl -L \ 144 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.1.0.tar.gz" | 145 | tar -xz -C ./jni 146 | 147 | ### Install on Windows 148 | 149 | Take the following steps to install TensorFlow for Java on Windows: 150 | 151 | 1. Download 152 | [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar), 153 | which is the TensorFlow Java Archive (JAR). 154 | 2. Download the following Java Native Interface (JNI) file appropriate for 155 | [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.1.0.zip). 156 | 3. Extract this .zip file. 157 | 158 | 159 | 160 | ### Validate the installation 161 | 162 | After installing TensorFlow for Java, validate your installation by entering 163 | the following code into a file named `HelloTF.java`: 164 | 165 | ```java 166 | import org.tensorflow.Graph; 167 | import org.tensorflow.Session; 168 | import org.tensorflow.Tensor; 169 | import org.tensorflow.TensorFlow; 170 | 171 | public class HelloTF { 172 | public static void main(String[] args) throws Exception { 173 | try (Graph g = new Graph()) { 174 | final String value = "Hello from " + TensorFlow.version(); 175 | 176 | // Construct the computation graph with a single operation, a constant 177 | // named "MyConst" with a value "value". 178 | try (Tensor t = Tensor.create(value.getBytes("UTF-8"))) { 179 | // The Java API doesn't yet include convenience functions for adding operations. 180 | g.opBuilder("Const", "MyConst").setAttr("dtype", t.dataType()).setAttr("value", t).build(); 181 | } 182 | 183 | // Execute the "MyConst" operation in a Session. 
184 | try (Session s = new Session(g); 185 | Tensor output = s.runner().fetch("MyConst").run().get(0)) { 186 | System.out.println(new String(output.bytesValue(), "UTF-8")); 187 | } 188 | } 189 | } 190 | } 191 | ``` 192 | 193 | And use the instructions below to compile and run `HelloTF.java`. 194 | 195 | 196 | ### Compiling 197 | 198 | When compiling a Java program that uses TensorFlow, the downloaded `.jar` 199 | must be part of your `classpath`. For example, you can include the 200 | downloaded `.jar` in your `classpath` by using the `-cp` compilation flag 201 | as follows: 202 | 203 |
javac -cp libtensorflow-1.1.0.jar HelloTF.java
204 | 205 | 206 | ### Running 207 | 208 | To execute a Java program that depends on TensorFlow, ensure that the following 209 | two files are available to the JVM: 210 | 211 | * the downloaded `.jar` file 212 | * the extracted JNI library 213 | 214 | For example, the following command line executes the `HelloTF` program: 215 | 216 |
java -cp libtensorflow-1.1.0.jar:. -Djava.library.path=./jni HelloTF
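Note: the command above uses the Linux/Mac OS classpath separator. On Windows, the separator is a semicolon, so the equivalent command would look something like `java -cp libtensorflow-1.1.0.jar;. -Djava.library.path=./jni HelloTF` (adjust the path to wherever you extracted the JNI library).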
217 | 218 | If the program prints Hello from version, you've successfully 219 | installed TensorFlow for Java and are ready to use the API. If the program 220 | outputs something else, check 221 | [Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) 222 | for possible solutions. 223 | 224 | 225 | ### Advanced Example 226 | 227 | For a more sophisticated example, see 228 | [LabelImage.java](https://www.tensorflow.org/code/tensorflow/java/src/main/java/org/tensorflow/examples/LabelImage.java), 229 | which recognizes objects in an image. 230 | 231 | 232 | ## Building from source code 233 | 234 | TensorFlow is open-source. You may build TensorFlow for Java from the 235 | TensorFlow source code by following the instructions in a 236 | [separate document](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/README.md). 237 | -------------------------------------------------------------------------------- /extend/new_data_formats.md: -------------------------------------------------------------------------------- 1 | # Custom Data Readers 2 | 3 | PREREQUISITES: 4 | 5 | * Some familiarity with C++. 6 | * Must have 7 | @{$install_sources$downloaded TensorFlow source}, and be 8 | able to build it. 9 | 10 | We divide the task of supporting a file format into two pieces: 11 | 12 | * File formats: We use a *Reader* Op to read a *record* (which can be any 13 | string) from a file. 14 | * Record formats: We use decoder or parsing Ops to turn a string record 15 | into tensors usable by TensorFlow. 16 | 17 | For example, to read a 18 | [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values), we use 19 | @{tf.TextLineReader$a Reader for text files} 20 | followed by 21 | @{tf.decode_csv$an Op that parses CSV data from a line of text}. 22 | 23 | [TOC] 24 | 25 | ## Writing a Reader for a file format 26 | 27 | A `Reader` is something that reads records from a file. There are some examples 28 | of Reader Ops already built into TensorFlow: 29 | 30 | * @{tf.TFRecordReader} 31 | ([source in `kernels/tf_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/tf_record_reader_op.cc)) 32 | * @{tf.FixedLengthRecordReader} 33 | ([source in `kernels/fixed_length_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/fixed_length_record_reader_op.cc)) 34 | * @{tf.TextLineReader} 35 | ([source in `kernels/text_line_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/text_line_reader_op.cc)) 36 | 37 | You can see these all expose the same interface, the only differences 38 | are in their constructors. The most important method is `read`. 39 | It takes a queue argument, which is where it gets filenames to 40 | read from whenever it needs one (e.g. when the `read` op first runs, or 41 | the previous `read` reads the last record from a file). It produces 42 | two scalar tensors: a string key and a string value. 43 | 44 | To create a new reader called `SomeReader`, you will need to: 45 | 46 | 1. In C++, define a subclass of 47 | [`tensorflow::ReaderBase`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h) 48 | called `SomeReader`. 49 | 2. In C++, register a new reader op and kernel with the name `"SomeReader"`. 50 | 3. In Python, define a subclass of @{tf.ReaderBase} called `SomeReader`. 51 | 52 | You can put all the C++ code in a file in 53 | `tensorflow/core/user_ops/some_reader_op.cc`. 
The code to read a file will live 54 | in a descendant of the C++ `ReaderBase` class, which is defined in 55 | [`tensorflow/core/kernels/reader_base.h`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h). 56 | You will need to implement the following methods: 57 | 58 | * `OnWorkStartedLocked`: open the next file 59 | * `ReadLocked`: read a record or report EOF/error 60 | * `OnWorkFinishedLocked`: close the current file, and 61 | * `ResetLocked`: get a clean slate after, e.g., an error 62 | 63 | These methods have names ending in "Locked" since `ReaderBase` makes sure 64 | to acquire a mutex before calling any of these methods, so you generally don't 65 | have to worry about thread safety (though that only protects the members of the 66 | class, not global state). 67 | 68 | For `OnWorkStartedLocked`, the name of the file to open is the value returned by 69 | the `current_work()` method. `ReadLocked` has this signature: 70 | 71 | ```c++ 72 | Status ReadLocked(string* key, string* value, bool* produced, bool* at_end) 73 | ``` 74 | 75 | If `ReadLocked` successfully reads a record from the file, it should fill in: 76 | 77 | * `*key`: with an identifier for the record, that a human could use to find 78 | this record again. You can include the filename from `current_work()`, 79 | and append a record number or whatever. 80 | * `*value`: with the contents of the record. 81 | * `*produced`: set to `true`. 82 | 83 | If you hit the end of a file (EOF), set `*at_end` to `true`. In either case, 84 | return `Status::OK()`. If there is an error, simply return it using one of the 85 | helper functions from 86 | [`tensorflow/core/lib/core/errors.h`](https://www.tensorflow.org/code/tensorflow/core/lib/core/errors.h) 87 | without modifying any arguments. 88 | 89 | Next you will create the actual Reader op. It will help if you are familiar 90 | with @{$adding_an_op$the adding an op how-to}. The main steps 91 | are: 92 | 93 | * Registering the op. 94 | * Define and register an `OpKernel`. 95 | 96 | To register the op, you will use a `REGISTER_OP` call defined in 97 | [`tensorflow/core/framework/op.h`](https://www.tensorflow.org/code/tensorflow/core/framework/op.h). 98 | Reader ops never take any input and always have a single output with type 99 | `resource`. They should have string `container` and `shared_name` attrs. 100 | You may optionally define additional attrs 101 | for configuration or include documentation in a `Doc`. For examples, see 102 | [`tensorflow/core/ops/io_ops.cc`](https://www.tensorflow.org/code/tensorflow/core/ops/io_ops.cc), 103 | e.g.: 104 | 105 | ```c++ 106 | #include "tensorflow/core/framework/op.h" 107 | 108 | REGISTER_OP("TextLineReader") 109 | .Output("reader_handle: resource") 110 | .Attr("skip_header_lines: int = 0") 111 | .Attr("container: string = ''") 112 | .Attr("shared_name: string = ''") 113 | .SetIsStateful() 114 | .SetShapeFn(shape_inference::ScalarShape) 115 | .Doc(R"doc( 116 | A Reader that outputs the lines of a file delimited by '\n'. 117 | )doc"); 118 | ``` 119 | 120 | To define an `OpKernel`, Readers can use the shortcut of descending from 121 | `ReaderOpKernel`, defined in 122 | [`tensorflow/core/framework/reader_op_kernel.h`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_op_kernel.h), 123 | and implement a constructor that calls `SetReaderFactory`. After defining 124 | your class, you will need to register it using `REGISTER_KERNEL_BUILDER(...)`. 
125 | An example with no attrs: 126 | 127 | ```c++ 128 | #include "tensorflow/core/framework/reader_op_kernel.h" 129 | 130 | class TFRecordReaderOp : public ReaderOpKernel { 131 | public: 132 | explicit TFRecordReaderOp(OpKernelConstruction* context) 133 | : ReaderOpKernel(context) { 134 | Env* env = context->env(); 135 | SetReaderFactory([this, env]() { return new TFRecordReader(name(), env); }); 136 | } 137 | }; 138 | 139 | REGISTER_KERNEL_BUILDER(Name("TFRecordReader").Device(DEVICE_CPU), 140 | TFRecordReaderOp); 141 | ``` 142 | 143 | An example with attrs: 144 | 145 | ```c++ 146 | #include "tensorflow/core/framework/reader_op_kernel.h" 147 | 148 | class TextLineReaderOp : public ReaderOpKernel { 149 | public: 150 | explicit TextLineReaderOp(OpKernelConstruction* context) 151 | : ReaderOpKernel(context) { 152 | int skip_header_lines = -1; 153 | OP_REQUIRES_OK(context, 154 | context->GetAttr("skip_header_lines", &skip_header_lines)); 155 | OP_REQUIRES(context, skip_header_lines >= 0, 156 | errors::InvalidArgument("skip_header_lines must be >= 0 not ", 157 | skip_header_lines)); 158 | Env* env = context->env(); 159 | SetReaderFactory([this, skip_header_lines, env]() { 160 | return new TextLineReader(name(), skip_header_lines, env); 161 | }); 162 | } 163 | }; 164 | 165 | REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU), 166 | TextLineReaderOp); 167 | ``` 168 | 169 | The last step is to add the Python wrapper. You can either do this by 170 | @{$adding_an_op#building_the_op_library$compiling a dynamic library} 171 | or, if you are building TensorFlow from source, adding to `user_ops.py`. 172 | For the latter, you will import `tensorflow.python.ops.io_ops` in 173 | [`tensorflow/python/user_ops/user_ops.py`](https://www.tensorflow.org/code/tensorflow/python/user_ops/user_ops.py) 174 | and add a descendant of [`io_ops.ReaderBase`](https://www.tensorflow.org/code/tensorflow/python/ops/io_ops.py). 175 | 176 | ```python 177 | from tensorflow.python.framework import ops 178 | from tensorflow.python.ops import common_shapes 179 | from tensorflow.python.ops import io_ops 180 | 181 | class SomeReader(io_ops.ReaderBase): 182 | 183 | def __init__(self, name=None): 184 | rr = gen_user_ops.some_reader(name=name) 185 | super(SomeReader, self).__init__(rr) 186 | 187 | 188 | ops.NotDifferentiable("SomeReader") 189 | ``` 190 | 191 | You can see some examples in 192 | [`tensorflow/python/ops/io_ops.py`](https://www.tensorflow.org/code/tensorflow/python/ops/io_ops.py). 193 | 194 | ## Writing an Op for a record format 195 | 196 | Generally this is an ordinary op that takes a scalar string record as input, and 197 | so follow @{$adding_an_op$the instructions to add an Op}. 198 | You may optionally take a scalar string key as input, and include that in error 199 | messages reporting improperly formatted data. That way users can more easily 200 | track down where the bad data came from. 201 | 202 | Examples of Ops useful for decoding records: 203 | 204 | * @{tf.parse_single_example} 205 | (and 206 | @{tf.parse_example}) 207 | * @{tf.decode_csv} 208 | * @{tf.decode_raw} 209 | 210 | Note that it can be useful to use multiple Ops to decode a particular record 211 | format. For example, you may have an image saved as a string in 212 | [a `tf.train.Example` protocol buffer](https://www.tensorflow.org/code/tensorflow/core/example/example.proto). 
213 | Depending on the format of that image, you might take the corresponding output 214 | from a 215 | @{tf.parse_single_example} 216 | op and call @{tf.image.decode_jpeg}, 217 | @{tf.image.decode_png}, or 218 | @{tf.decode_raw}. It is common to 219 | take the output of `tf.decode_raw` and use 220 | @{tf.slice} and 221 | @{tf.reshape} to extract pieces. 222 | -------------------------------------------------------------------------------- /extend/tool_developers/index.md: -------------------------------------------------------------------------------- 1 | # A Tool Developer's Guide to TensorFlow Model Files 2 | 3 | Most users shouldn't need to care about the internal details of how TensorFlow 4 | stores data on disk, but you might if you're a tool developer. For example, you 5 | may want to analyze models, or convert back and forth between TensorFlow and 6 | other formats. This guide tries to explain some of the details of how you can 7 | work with the main files that hold model data, to make it easier to develop 8 | those kind of tools. 9 | 10 | [TOC] 11 | 12 | ## Protocol Buffers 13 | 14 | All of TensorFlow's file formats are based on 15 | [Protocol Buffers](https://developers.google.com/protocol-buffers/?hl=en), so to 16 | start it's worth getting familiar with how they work. The summary is that you 17 | define data structures in text files, and the protobuf tools generate classes in 18 | C, Python, and other languages that can load, save, and access the data in a 19 | friendly way. We often refer to Protocol Buffers as protobufs, and I'll use 20 | that convention in this guide. 21 | 22 | ## GraphDef 23 | 24 | The foundation of computation in TensorFlow is the `Graph` object. This holds a 25 | network of nodes, each representing one operation, connected to each other as 26 | inputs and outputs. After you've created a `Graph` object, you can save it out 27 | by calling `as_graph_def()`, which returns a `GraphDef` object. 28 | 29 | The GraphDef class is an object created by the ProtoBuf library from the 30 | definition in 31 | [tensorflow/core/framework/graph.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto). The protobuf tools parse 32 | this text file, and generate the code to load, store, and manipulate graph 33 | definitions. If you see a standalone TensorFlow file representing a model, it's 34 | likely to contain a serialized version of one of these `GraphDef` objects 35 | saved out by the protobuf code. 36 | 37 | This generated code is used to save and load the GraphDef files from disk. The code that actually loads the model looks like this: 38 | 39 | ```python 40 | graph_def = graph_pb2.GraphDef() 41 | ``` 42 | 43 | This line creates an empty `GraphDef` object, the class that's been created 44 | from the textual definition in graph.proto. This is the object we're going to 45 | populate with the data from our file. 46 | 47 | ```python 48 | with open(FLAGS.graph, "rb") as f: 49 | ``` 50 | 51 | Here we get a file handle for the path we've passed in to the script 52 | 53 | ```python 54 | if FLAGS.input_binary: 55 | graph_def.ParseFromString(f.read()) 56 | else: 57 | text_format.Merge(f.read(), graph_def) 58 | ``` 59 | 60 | ## Text or Binary? 61 | 62 | There are actually two different formats that a ProtoBuf can be saved in. 63 | TextFormat is a human-readable form, which makes it nice for debugging and 64 | editing, but can get large when there's numerical data like weights stored in 65 | it. 
You can see a small example of that in 66 | [graph_run_run2.pbtxt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/components/tf_tensorboard/test/data/graph_run_run2.pbtxt). 67 | 68 | Binary format files are a lot smaller than their text equivalents, even though 69 | they're not as readable for us. In this script, we ask the user to supply a 70 | flag indicating whether the input file is binary or text, so we know the right 71 | function to call. You can find an example of a large binary file inside the 72 | [inception_v3 archive](https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz), 73 | as `inception_v3_2016_08_28_frozen.pb`. 74 | 75 | The API itself can be a bit confusing - the binary call is actually 76 | `ParseFromString()`, whereas you use a utility function from the `text_format` 77 | module to load textual files. 78 | 79 | ## Nodes 80 | 81 | Once you've loaded a file into the `graph_def` variable, you can now access the 82 | data inside it. For most practical purposes, the important section is the list 83 | of nodes stored in the node member. Here's the code that loops through those: 84 | 85 | ```python 86 | for node in graph_def.node 87 | ``` 88 | 89 | Each node is a `NodeDef` object, defined in 90 | [tensorflow/core/framework/node_def.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/node_def.proto). These 91 | are the fundamental building blocks of TensorFlow graphs, with each one defining 92 | a single operation along with its input connections. Here are the members of a 93 | `NodeDef`, and what they mean. 94 | 95 | ### `name` 96 | 97 | Every node should have a unique identifier that's not used by any other nodes 98 | in the graph. If you don't specify one as you're building a graph using the 99 | Python API, one reflecting the name of operation, such as "MatMul", 100 | concatenated with a monotonically increasing number, such as "5", will be 101 | picked for you. The name is used when defining the connections between nodes, 102 | and when setting inputs and outputs for the whole graph when it's run. 103 | 104 | ### `op` 105 | 106 | This defines what operation to run, for example `"Add"`, `"MatMul"`, or 107 | `"Conv2D"`. When a graph is run, this op name is looked up in a registry to 108 | find an implementation. The registry is populated by calls to the 109 | `REGISTER_OP()` macro, like those in 110 | [tensorflow/core/ops/nn_ops.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/nn_ops.cc). 111 | 112 | ### `input` 113 | 114 | A list of strings, each one of which is the name of another node, optionally 115 | followed by a colon and an output port number. For example, a node with two 116 | inputs might have a list like `["some_node_name", "another_node_name"]`, which 117 | is equivalent to `["some_node_name:0", "another_node_name:0"]`, and defines the 118 | node's first input as the first output from the node with the name 119 | `"some_node_name"`, and a second input from the first output of 120 | `"another_node_name"` 121 | 122 | ### `device` 123 | 124 | In most cases you can ignore this, since it defines where to run a node in a 125 | distributed environment, or when you want to force the operation onto CPU or 126 | GPU. 127 | 128 | ### `attr` 129 | 130 | This is a key/value store holding all the attributes of a node. 
These are the 131 | permanent properties of nodes, things that don't change at runtime such as the 132 | size of filters for convolutions, or the values of constant ops. Because there 133 | can be so many different types of attribute values, from strings, to ints, to 134 | arrays of tensor values, there's a separate protobuf file defining the data 135 | structure that holds them, in 136 | [tensorflow/core/framework/attr_value.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/attr_value.proto). 137 | 138 | Each attribute has a unique name string, and the expected attributes are listed 139 | when the operation is defined. If an attribute isn't present in a node, but it 140 | has a default listed in the operation definition, that default is used when the 141 | graph is created. 142 | 143 | You can access all of these members by calling `node.name`, `node.op`, etc. in 144 | Python. The list of nodes stored in the `GraphDef` is a full definition of the 145 | model architecture. 146 | 147 | ## Freezing 148 | 149 | One confusing part about this is that the weights usually aren't stored inside 150 | the file format during training. Instead, they're held in separate checkpoint 151 | files, and there are `Variable` ops in the graph that load the latest values 152 | when they're initialized. It's often not very convenient to have separate files 153 | when you're deploying to production, so there's the 154 | [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py) script that takes a graph definition and a set 155 | of checkpoints and freezes them together into a single file. 156 | 157 | What this does is load the `GraphDef`, pull in the values for all the variables 158 | from the latest checkpoint file, and then replace each `Variable` op with a 159 | `Const` that has the numerical data for the weights stored in its attributes 160 | It then strips away all the extraneous nodes that aren't used for forward 161 | inference, and saves out the resulting `GraphDef` into an output file. 162 | 163 | ## Weight Formats 164 | 165 | If you're dealing with TensorFlow models that represent neural networks, one of 166 | the most common problems is extracting and interpreting the weight values. A 167 | common way to store them, for example in graphs created by the freeze_graph 168 | script, is as `Const` ops containing the weights as `Tensors`. These are 169 | defined in 170 | [tensorflow/core/framework/tensor.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.proto), and contain information 171 | about the size and type of the data, as well as the values themselves. In 172 | Python, you get a `TensorProto` object from a `NodeDef` representing a `Const` 173 | op by calling something like `some_node_def.attr['value'].tensor`. 174 | 175 | This will give you an object representing the weights data. The data itself 176 | will be stored in one of the lists with the suffix _val as indicated by the 177 | type of the object, for example `float_val` for 32-bit float data types. 178 | 179 | The ordering of convolution weight values is often tricky to deal with when 180 | converting between different frameworks. In TensorFlow, the filter weights for 181 | the `Conv2D` operation are stored on the second input, and are expected to be 182 | in the order `[filter_height, filter_width, input_depth, output_depth]`, where 183 | filter_count increasing by one means moving to an adjacent value in memory. 
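Putting these pieces together, here is a minimal sketch of a tool that loads a frozen, binary `GraphDef` and prints the shape of every `Const` weight tensor. It uses the internal `tensor_util` module and a hypothetical file path, so treat it as an illustration rather than a supported recipe:

```python
from tensorflow.core.framework import graph_pb2
from tensorflow.python.framework import tensor_util  # internal, non-stable module

graph_def = graph_pb2.GraphDef()
with open("/tmp/frozen_graph.pb", "rb") as f:  # hypothetical path to a frozen model
  graph_def.ParseFromString(f.read())          # binary format, per the section above

for node in graph_def.node:
  if node.op == "Const":
    # Convert the TensorProto stored in the 'value' attr to a numpy array.
    weights = tensor_util.MakeNdarray(node.attr["value"].tensor)
    print(node.name, weights.shape, weights.dtype)
```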
184 | 185 | Hopefully this rundown gives you a better idea of what's going on inside 186 | TensorFlow model files, and will help you if you ever need to manipulate them. 187 | -------------------------------------------------------------------------------- /programmers_guide/variables.md: -------------------------------------------------------------------------------- 1 | # Variables: Creation, Initialization, Saving, and Loading 2 | 3 | When you train a model, you use @{$python/state_ops$variables} 4 | to hold and update parameters. Variables are in-memory buffers containing 5 | tensors. They must be explicitly initialized and can be saved to disk during 6 | and after training. You can later restore saved values to exercise or analyze 7 | the model. 8 | 9 | This document references the following TensorFlow classes. Follow the links to 10 | their reference manual for a complete description of their API: 11 | 12 | * The @{tf.Variable} class. 13 | * The @{tf.train.Saver} class. 14 | 15 | 16 | ## Creation 17 | 18 | When you create a @{$python/state_ops$Variable} you pass a 19 | `Tensor` as its initial value to the `Variable()` constructor. TensorFlow 20 | provides a collection of ops that produce tensors often used for initialization 21 | from @{$python/constant_op$constants or random values}. 22 | 23 | Note that all these ops require you to specify the shape of the tensors. That 24 | shape automatically becomes the shape of the variable. Variables generally 25 | have a fixed shape, but TensorFlow provides advanced mechanisms to reshape 26 | variables. 27 | 28 | ```python 29 | # Create two variables. 30 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 31 | name="weights") 32 | biases = tf.Variable(tf.zeros([200]), name="biases") 33 | ``` 34 | 35 | Calling `tf.Variable()` adds several ops to the graph: 36 | 37 | * A `variable` op that holds the variable value. 38 | * An initializer op that sets the variable to its initial value. This is 39 | actually a `tf.assign` op. 40 | * The ops for the initial value, such as the `zeros` op for the `biases` 41 | variable in the example are also added to the graph. 42 | 43 | The value returned by `tf.Variable()` value is an instance of the Python class 44 | `tf.Variable`. 45 | 46 | ### Device placement 47 | 48 | A variable can be pinned to a particular device when it is created, using a 49 | @{tf.device$`with tf.device(...):`} block: 50 | 51 | ```python 52 | # Pin a variable to CPU. 53 | with tf.device("/cpu:0"): 54 | v = tf.Variable(...) 55 | 56 | # Pin a variable to GPU. 57 | with tf.device("/gpu:0"): 58 | v = tf.Variable(...) 59 | 60 | # Pin a variable to a particular parameter server task. 61 | with tf.device("/job:ps/task:7"): 62 | v = tf.Variable(...) 63 | ``` 64 | 65 | **N.B.** Operations that mutate a variable, such as 66 | @{tf.Variable.assign} and the parameter 67 | update operations in a 68 | @{tf.train.Optimizer} *must* run on 69 | the same device as the variable. Incompatible device placement directives will 70 | be ignored when creating these operations. 71 | 72 | Device placement is particularly important when running in a replicated 73 | setting. See 74 | @{tf.train.replica_device_setter} 75 | for details of a device function that can simplify the configuration for devices 76 | for a replicated model. 77 | 78 | ## Initialization 79 | 80 | Variable initializers must be run explicitly before other ops in your model can 81 | be run. 
The easiest way to do that is to add an op that runs all the variable 82 | initializers, and run that op before using the model. 83 | 84 | You can alternatively restore variable values from a checkpoint file, see 85 | below. 86 | 87 | Use `tf.global_variables_initializer()` to add an op to run variable initializers. 88 | Only run that op after you have fully constructed your model and launched it in 89 | a session. 90 | 91 | ```python 92 | # Create two variables. 93 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 94 | name="weights") 95 | biases = tf.Variable(tf.zeros([200]), name="biases") 96 | ... 97 | # Add an op to initialize the variables. 98 | init_op = tf.global_variables_initializer() 99 | 100 | # Later, when launching the model 101 | with tf.Session() as sess: 102 | # Run the init operation. 103 | sess.run(init_op) 104 | ... 105 | # Use the model 106 | ... 107 | ``` 108 | 109 | ### Initialization from another Variable 110 | 111 | You sometimes need to initialize a variable from the initial value of another 112 | variable. As the op added by `tf.global_variables_initializer()` initializes all 113 | variables in parallel you have to be careful when this is needed. 114 | 115 | To initialize a new variable from the value of another variable use the other 116 | variable's `initialized_value()` property. You can use the initialized value 117 | directly as the initial value for the new variable, or you can use it as any 118 | other tensor to compute a value for the new variable. 119 | 120 | 121 | ```python 122 | # Create a variable with a random value. 123 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 124 | name="weights") 125 | # Create another variable with the same value as 'weights'. 126 | w2 = tf.Variable(weights.initialized_value(), name="w2") 127 | # Create another variable with twice the value of 'weights' 128 | w_twice = tf.Variable(weights.initialized_value() * 2.0, name="w_twice") 129 | ``` 130 | 131 | ### Custom Initialization 132 | 133 | The convenience function `tf.global_variables_initializer()` adds an op to 134 | initialize *all variables* in the model. You can also pass an explicit list of 135 | variables to initialize to `tf.variables_initializer`. See the 136 | @{$python/state_ops$Variables Documentation} for more options, 137 | including checking if variables are initialized. 138 | 139 | ## Saving and Restoring 140 | 141 | The easiest way to save and restore a model is to use a `tf.train.Saver` object. 142 | The constructor adds `save` and `restore` ops to the graph for all, or a 143 | specified list, of the variables in the graph. The saver object provides 144 | methods to run these ops, specifying paths for the checkpoint files to write to 145 | or read from. 146 | 147 | ### Checkpoint Files 148 | 149 | Variables are saved in binary files that, roughly, contain a map from variable 150 | names to tensor values. 151 | 152 | When you create a `Saver` object, you can optionally choose names for the 153 | variables in the checkpoint files. By default, it uses the value of the 154 | @{tf.Variable.name} property for 155 | each variable. 156 | 157 | To understand what variables are in a checkpoint, you can use the 158 | [`inspect_checkpoint`](https://www.tensorflow.org/code/tensorflow/python/tools/inspect_checkpoint.py) 159 | library, and in particular, the `print_tensors_in_checkpoint_file` function. 160 | 161 | ### Saving Variables 162 | 163 | Create a `Saver` with `tf.train.Saver()` to manage all variables in 164 | the model. 
165 | 166 | ```python 167 | # Create some variables. 168 | v1 = tf.Variable(..., name="v1") 169 | v2 = tf.Variable(..., name="v2") 170 | ... 171 | # Add an op to initialize the variables. 172 | init_op = tf.global_variables_initializer() 173 | 174 | # Add ops to save and restore all the variables. 175 | saver = tf.train.Saver() 176 | 177 | # Later, launch the model, initialize the variables, do some work, save the 178 | # variables to disk. 179 | with tf.Session() as sess: 180 | sess.run(init_op) 181 | # Do some work with the model. 182 | .. 183 | # Save the variables to disk. 184 | save_path = saver.save(sess, "/tmp/model.ckpt") 185 | print("Model saved in file: %s" % save_path) 186 | ``` 187 | 188 | ### Restoring Variables 189 | 190 | The same `Saver` object is used to restore variables. Note that when you 191 | restore variables from a file you do not have to initialize them beforehand. 192 | 193 | ```python 194 | # Create some variables. 195 | v1 = tf.Variable(..., name="v1") 196 | v2 = tf.Variable(..., name="v2") 197 | ... 198 | # Add ops to save and restore all the variables. 199 | saver = tf.train.Saver() 200 | 201 | # Later, launch the model, use the saver to restore variables from disk, and 202 | # do some work with the model. 203 | with tf.Session() as sess: 204 | # Restore variables from disk. 205 | saver.restore(sess, "/tmp/model.ckpt") 206 | print("Model restored.") 207 | # Do some work with the model 208 | ... 209 | ``` 210 | 211 | ### Choosing which Variables to Save and Restore 212 | 213 | If you do not pass any argument to `tf.train.Saver()` the saver handles all 214 | variables in the graph. Each one of them is saved under the name that was 215 | passed when the variable was created. 216 | 217 | It is sometimes useful to explicitly specify names for variables in the 218 | checkpoint files. For example, you may have trained a model with a variable 219 | named `"weights"` whose value you want to restore in a new variable named 220 | `"params"`. 221 | 222 | It is also sometimes useful to only save or restore a subset of the variables 223 | used by a model. For example, you may have trained a neural net with 5 layers, 224 | and you now want to train a new model with 6 layers, restoring the parameters 225 | from the 5 layers of the previously trained model into the first 5 layers of 226 | the new model. 227 | 228 | You can easily specify the names and variables to save by passing to the 229 | `tf.train.Saver()` constructor a Python dictionary: keys are the 230 | names to use, values are the variables to manage. 231 | 232 | Notes: 233 | 234 | * You can create as many saver objects as you want if you need to save and 235 | restore different subsets of the model variables. The same variable can be 236 | listed in multiple saver objects, its value is only changed when the saver 237 | `restore()` method is run. 238 | 239 | * If you only restore a subset of the model variables at the start 240 | of a session, you have to run an initialize op for the other variables. See 241 | @{tf.variables_initializer} 242 | for more information. 243 | 244 | ```python 245 | # Create some variables. 246 | v1 = tf.Variable(..., name="v1") 247 | v2 = tf.Variable(..., name="v2") 248 | ... 249 | # Add ops to save and restore only 'v2' using the name "my_v2" 250 | saver = tf.train.Saver({"my_v2": v2}) 251 | # Use the saver object normally after that. 252 | ... 
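# Note: an illustrative sketch, not part of the original example. Because this
# saver only manages 'v2', the variable 'v1' is not restored by it and still
# needs an explicit initialization op before use (assuming a running session
# `sess`), for example:
#
#   sess.run(tf.variables_initializer([v1]))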
253 | ``` 254 | -------------------------------------------------------------------------------- /extend/architecture.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Architecture 2 | 3 | We designed TensorFlow for large-scale distributed training and inference, but 4 | it is also flexible enough to support experimentation with new machine 5 | learning models and system-level optimizations. 6 | 7 | This document describes the system architecture that makes possible this 8 | combination of scale and flexibility. It assumes that you have basic familiarity 9 | with TensorFlow programming concepts such as the computation graph, operations, 10 | and sessions. See @{$get_started$Getting Started} 11 | for an introduction to these topics. Some familiarity 12 | with @{$distributed$distributed TensorFlow} 13 | will also be helpful. 14 | 15 | This document is for developers who want to extend TensorFlow in some way not 16 | supported by current APIs, hardware engineers who want to optimize for 17 | TensorFlow, implementers of machine learning systems working on scaling and 18 | distribution, or anyone who wants to look under Tensorflow's hood. After 19 | reading it you should understand TensorFlow architecture well enough to read 20 | and modify the core TensorFlow code. 21 | 22 | ## Overview 23 | 24 | The TensorFlow runtime is a cross-platform library. Figure 1 illustrates its 25 | general architecture. A C API separates user level code in different languages 26 | from the core runtime. 27 | 28 | ![TensorFlow Layers](../images/layers.png){: width="300"} 29 | 30 | **Figure 1** 31 | 32 | 33 | This document focuses on the following layers: 34 | 35 | * **Client**: 36 | * Defines the computation as a dataflow graph. 37 | * Initiates graph execution using a [**session**]( 38 | https://www.tensorflow.org/code/tensorflow/python/client/session.py) 39 | * **Distributed Master** 40 | * Prunes a specific subgraph from the graph, as defined by the arguments 41 | to Session.run(). 42 | * Partitions the subgraph into multiple pieces that run in different 43 | processes and devices. 44 | * Distributes the graph pieces to worker services. 45 | * Initiates graph piece execution by worker services. 46 | * **Worker Services** (one for each task) 47 | * Schedule the execution of graph operations using kernel implementations 48 | appropriate to the available hardware (CPUs, GPUs, etc). 49 | * Send and receive operation results to and from other worker services. 50 | * **Kernel Implementations** 51 | * Perform the computation for individual graph operations. 52 | 53 | Figure 2 illustrates the interaction of these components. "/job:worker/task:0" and 54 | "/job:ps/task:0" are both tasks with worker services. "PS" stands for "parameter 55 | server": a task responsible for storing and updating the model's parameters. 56 | Other tasks send updates to these parameters as they work on optimizing the 57 | parameters. This particular division of labor between tasks is not required, but 58 | it is common for distributed training. 59 | 60 | ![TensorFlow Architecture Diagram](../images/diag1.svg){: width="500"} 61 | 62 | **Figure 2** 63 | 64 | Note that the Distributed Master and Worker Service only exist in 65 | distributed TensorFlow. The single-process version of TensorFlow includes a 66 | special Session implementation that does everything the distributed master does 67 | but only communicates with devices in the local process. 
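As a rough, illustrative sketch of how these components appear from the client's
point of view, the snippet below starts the two tasks from Figure 2 inside a
single Python process and runs one step against a worker service. The cluster
addresses, variable shapes, and single-process setup are assumptions made purely
for illustration; they are not part of the architecture itself.

```python
import tensorflow as tf

# Describe the cluster from Figure 2: one parameter-server task, one worker task.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223"]})

# Each task runs a server that hosts its worker service (both started in this
# process only so that the example is self-contained).
ps_server = tf.train.Server(cluster, job_name="ps", task_index=0)
worker_server = tf.train.Server(cluster, job_name="worker", task_index=0)

# The client defines the dataflow graph, pinning the parameters to the ps task.
with tf.device("/job:ps/task:0"):
  weights = tf.Variable(tf.zeros([10]), name="weights")
with tf.device("/job:worker/task:0"):
  update = weights.assign_add(tf.ones([10]))

# Creating a session against a server's target hands the graph to the
# distributed master, which prunes and partitions it and dispatches the
# resulting pieces to the worker services.
with tf.Session(worker_server.target) as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(update)
```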
68 | 69 | The following sections describe the core TensorFlow layers in greater detail and 70 | step through the processing of an example graph. 71 | 72 | ## Client 73 | 74 | Users write the client TensorFlow program that builds the computation graph. 75 | This program can either directly compose individual operations or use a 76 | convenience library like the Estimators API to compose neural network layers and 77 | other higher-level abstractions. TensorFlow supports multiple client 78 | languages, and we have prioritized Python and C++, because our internal users 79 | are most familiar with these languages. As features become more established, 80 | we typically port them to C++, so that users can access an optimized 81 | implementation from all client languages. Most of the training libraries are 82 | still Python-only, but C++ does have support for efficient inference. 83 | 84 | The client creates a session, which sends the graph definition to the 85 | distributed master as a @{tf.GraphDef} 86 | protocol buffer. When the client evaluates a node or nodes in the 87 | graph, the evaluation triggers a call to the distributed master to initiate 88 | computation. 89 | 90 | In Figure 3, the client has built a graph that applies weights (w) to a 91 | feature vector (x), adds a bias term (b) and saves the result in a variable 92 | (s). 93 | 94 | ![TensorFlow Architecture Diagram: Client](../images/graph_client.svg){: width="700"} 95 | 96 | **Figure 3** 97 | 98 | ### Code 99 | 100 | * @{tf.Session} 101 | 102 | ## Distributed master 103 | 104 | The distributed master: 105 | 106 | * prunes the graph to obtain the subgraph required to evaluate the nodes 107 | requested by the client, 108 | * partitions the graph to obtain graph pieces for 109 | each participating device, and 110 | * caches these pieces so that they may be re-used in subsequent steps. 111 | 112 | Since the master sees the overall computation for 113 | a step, it applies standard optimizations such as common subexpression 114 | elimination and constant folding. It then coordinates execution of the 115 | optimized subgraphs across a set of tasks. 116 | 117 | ![TensorFlow Architecture Diagram: Master](../images/graph_master_cln.svg){: width="700"} 118 | 119 | **Figure 4** 120 | 121 | 122 | Figure 5 shows a possible partition of our example graph. The distributed 123 | master has grouped the model parameters in order to place them together on the 124 | parameter server. 125 | 126 | ![Partitioned Graph](../images/graph_split1.svg){: width="700"} 127 | 128 | **Figure 5** 129 | 130 | 131 | Where graph edges are cut by the partition, the distributed master inserts 132 | send and receive nodes to pass information between the distributed tasks 133 | (Figure 6). 134 | 135 | ![Partitioned Graph](../images/graph_split2.svg){: width="700"} 136 | 137 | **Figure 6** 138 | 139 | 140 | The distributed master then ships the graph pieces to the distributed tasks. 
141 | 142 | ![Partitioned Graph](../images/graph_workers_cln.svg){: width="700"} 143 | 144 | **Figure 7** 145 | 146 | ### Code 147 | 148 | * [MasterService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/master_service.proto) 149 | * [Master interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/master_interface.h) 150 | 151 | ## Worker Service 152 | 153 | The worker service in each task: 154 | 155 | * handles requests from the master, 156 | * schedules the execution of the kernels for the operations that comprise a 157 | local subgraph, and 158 | * mediates direct communication between tasks. 159 | 160 | We optimize the worker service for running large graphs with low overhead. Our 161 | current implementation can execute tens of thousands of subgraphs per second, 162 | which enables a large number of replicas to make rapid, fine-grained training 163 | steps. The worker service dispatches kernels to local devices and runs kernels 164 | in parallel when possible, for example by using multiple CPU cores or GPU 165 | streams. 166 | 167 | We specialize Send and Recv operations for each pair of source and destination 168 | device types: 169 | 170 | * Transfers between local CPU and GPU devices use the 171 | `cudaMemcpyAsync()` API to overlap computation and data transfer. 172 | * Transfers between two local GPUs use peer-to-peer DMA, to avoid an expensive 173 | copy via the host CPU. 174 | 175 | For transfers between tasks, TensorFlow uses multiple protocols, including: 176 | 177 | * gRPC over TCP. 178 | * RDMA over Converged Ethernet. 179 | 180 | We also have preliminary support for NVIDIA's NCCL library for multi-GPU 181 | communication (see [`tf.contrib.nccl`]( 182 | https://www.tensorflow.org/code/tensorflow/contrib/nccl/python/ops/nccl_ops.py)). 183 | 184 | ![Partitioned Graph](../images/graph_send_recv.svg){: width="700"} 185 | 186 | **Figure 8** 187 | 188 | ### Code 189 | 190 | * [WorkerService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/worker_service.proto) 191 | * [Worker interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/worker_interface.h) 192 | * [Remote rendezvous (for Send and Recv implementations)](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/rpc/rpc_rendezvous_mgr.h) 193 | 194 | ## Kernel Implementations 195 | 196 | The runtime contains over 200 standard operations, including mathematical, array 197 | manipulation, control flow, and state management operations. Each of these 198 | operations can have kernel implementations optimized for a variety of devices. 199 | Many of the operation kernels are implemented using Eigen::Tensor, which uses 200 | C++ templates to generate efficient parallel code for multicore CPUs and GPUs; 201 | however, we liberally use libraries like cuDNN where a more efficient kernel 202 | implementation is possible. We have also implemented 203 | @{$quantization$quantization}, which enables 204 | faster inference in environments such as mobile devices and high-throughput 205 | datacenter applications, and use the 206 | [gemmlowp](https://github.com/google/gemmlowp) low-precision matrix library to 207 | accelerate quantized computation. 208 | 209 | If it is difficult or inefficient to represent a subcomputation as a composition 210 | of operations, users can register additional kernels that provide an efficient 211 | implementation written in C++. 
For example, we recommend registering your own 212 | fused kernels for some performance critical operations, such as the ReLU and 213 | Sigmoid activation functions and their corresponding gradients. The @{$xla$XLA Compiler} has an 214 | experimental implementation of automatic kernel fusion. 215 | 216 | ### Code 217 | 218 | * [`OpKernel` interface](https://www.tensorflow.org/code/tensorflow/core/framework/op_kernel.h) 219 | -------------------------------------------------------------------------------- /get_started/summaries_and_tensorboard.md: -------------------------------------------------------------------------------- 1 | # TensorBoard: Visualizing Learning 2 | 3 | The computations you'll use TensorFlow for - like training a massive 4 | deep neural network - can be complex and confusing. To make it easier to 5 | understand, debug, and optimize TensorFlow programs, we've included a suite of 6 | visualization tools called TensorBoard. You can use TensorBoard to visualize 7 | your TensorFlow graph, plot quantitative metrics about the execution of your 8 | graph, and show additional data like images that pass through it. When 9 | TensorBoard is fully configured, it looks like this: 10 | 11 | ![MNIST TensorBoard](../images/mnist_tensorboard.png "MNIST TensorBoard") 12 | 13 |
18 | 19 | This tutorial is intended to get you started with simple TensorBoard usage. 20 | There are other resources available as well! The [TensorBoard README](https://www.tensorflow.org/code/tensorflow/tensorboard/README.md) 21 | has a lot more information on TensorBoard usage, including tips & tricks, and 22 | debugging information. 23 | 24 | ## Serializing the data 25 | 26 | TensorBoard operates by reading TensorFlow events files, which contain summary 27 | data that you can generate when running TensorFlow. Here's the general 28 | lifecycle for summary data within TensorBoard. 29 | 30 | First, create the TensorFlow graph that you'd like to collect summary 31 | data from, and decide which nodes you would like to annotate with 32 | @{$python/summary$summary operations}. 33 | 34 | For example, suppose you are training a convolutional neural network for 35 | recognizing MNIST digits. You'd like to record how the learning rate 36 | varies over time, and how the objective function is changing. Collect these by 37 | attaching @{tf.summary.scalar} ops 38 | to the nodes that output the learning rate and loss respectively. Then, give 39 | each `scalar_summary` a meaningful `tag`, like `'learning rate'` or `'loss 40 | function'`. 41 | 42 | Perhaps you'd also like to visualize the distributions of activations coming 43 | off a particular layer, or the distribution of gradients or weights. Collect 44 | this data by attaching 45 | @{tf.summary.histogram} ops to 46 | the gradient outputs and to the variable that holds your weights, respectively. 47 | 48 | For details on all of the summary operations available, check out the docs on 49 | @{$python/summary$summary operations}. 50 | 51 | Operations in TensorFlow don't do anything until you run them, or an op that 52 | depends on their output. And the summary nodes that we've just created are 53 | peripheral to your graph: none of the ops you are currently running depend on 54 | them. So, to generate summaries, we need to run all of these summary nodes. 55 | Managing them by hand would be tedious, so use 56 | @{tf.summary.merge_all} 57 | to combine them into a single op that generates all the summary data. 58 | 59 | Then, you can just run the merged summary op, which will generate a serialized 60 | `Summary` protobuf object with all of your summary data at a given step. 61 | Finally, to write this summary data to disk, pass the summary protobuf to a 62 | @{tf.summary.FileWriter}. 63 | 64 | The `FileWriter` takes a logdir in its constructor - this logdir is quite 65 | important, it's the directory where all of the events will be written out. 66 | Also, the `FileWriter` can optionally take a `Graph` in its constructor. 67 | If it receives a `Graph` object, then TensorBoard will visualize your graph 68 | along with tensor shape information. This will give you a much better sense of 69 | what flows through the graph: see 70 | @{$graph_viz#tensor-shape-information$Tensor shape information}. 71 | 72 | Now that you've modified your graph and have a `FileWriter`, you're ready to 73 | start running your network! If you want, you could run the merged summary op 74 | every single step, and record a ton of training data. That's likely to be more 75 | data than you need, though. Instead, consider running the merged summary op 76 | every `n` steps. 77 | 78 | The code example below is a modification of the 79 | @{$beginners$simple MNIST tutorial}, 80 | in which we have added some summary ops, and run them every ten steps. 
If you 81 | run this and then launch `tensorboard --logdir=/tmp/mnist_logs`, you'll be able 82 | to visualize statistics, such as how the weights or accuracy varied during 83 | training. The code below is an excerpt; full source is 84 | [here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py). 85 | 86 | ```python 87 | def variable_summaries(var): 88 | """Attach a lot of summaries to a Tensor (for TensorBoard visualization).""" 89 | with tf.name_scope('summaries'): 90 | mean = tf.reduce_mean(var) 91 | tf.summary.scalar('mean', mean) 92 | with tf.name_scope('stddev'): 93 | stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) 94 | tf.summary.scalar('stddev', stddev) 95 | tf.summary.scalar('max', tf.reduce_max(var)) 96 | tf.summary.scalar('min', tf.reduce_min(var)) 97 | tf.summary.histogram('histogram', var) 98 | 99 | def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu): 100 | """Reusable code for making a simple neural net layer. 101 | 102 | It does a matrix multiply, bias add, and then uses relu to nonlinearize. 103 | It also sets up name scoping so that the resultant graph is easy to read, 104 | and adds a number of summary ops. 105 | """ 106 | # Adding a name scope ensures logical grouping of the layers in the graph. 107 | with tf.name_scope(layer_name): 108 | # This Variable will hold the state of the weights for the layer 109 | with tf.name_scope('weights'): 110 | weights = weight_variable([input_dim, output_dim]) 111 | variable_summaries(weights) 112 | with tf.name_scope('biases'): 113 | biases = bias_variable([output_dim]) 114 | variable_summaries(biases) 115 | with tf.name_scope('Wx_plus_b'): 116 | preactivate = tf.matmul(input_tensor, weights) + biases 117 | tf.summary.histogram('pre_activations', preactivate) 118 | activations = act(preactivate, name='activation') 119 | tf.summary.histogram('activations', activations) 120 | return activations 121 | 122 | hidden1 = nn_layer(x, 784, 500, 'layer1') 123 | 124 | with tf.name_scope('dropout'): 125 | keep_prob = tf.placeholder(tf.float32) 126 | tf.summary.scalar('dropout_keep_probability', keep_prob) 127 | dropped = tf.nn.dropout(hidden1, keep_prob) 128 | 129 | # Do not apply softmax activation yet, see below. 130 | y = nn_layer(dropped, 500, 10, 'layer2', act=tf.identity) 131 | 132 | with tf.name_scope('cross_entropy'): 133 | # The raw formulation of cross-entropy, 134 | # 135 | # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)), 136 | # reduction_indices=[1])) 137 | # 138 | # can be numerically unstable. 139 | # 140 | # So here we use tf.nn.softmax_cross_entropy_with_logits on the 141 | # raw outputs of the nn_layer above, and then average across 142 | # the batch. 
143 |   diff = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
144 |   with tf.name_scope('total'):
145 |     cross_entropy = tf.reduce_mean(diff)
146 | tf.summary.scalar('cross_entropy', cross_entropy)
147 |
148 | with tf.name_scope('train'):
149 |   train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(
150 |       cross_entropy)
151 |
152 | with tf.name_scope('accuracy'):
153 |   with tf.name_scope('correct_prediction'):
154 |     correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
155 |   with tf.name_scope('accuracy'):
156 |     accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
157 | tf.summary.scalar('accuracy', accuracy)
158 |
159 | # Merge all the summaries and write them out to /tmp/mnist_logs (by default)
160 | merged = tf.summary.merge_all()
161 | train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
162 |                                      sess.graph)
163 | test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
164 | tf.global_variables_initializer().run()
165 | ```
166 |
167 | After we've initialized the `FileWriters`, we have to add summaries to the
168 | `FileWriters` as we train and test the model.
169 |
170 | ```python
171 | # Train the model, and also write summaries.
172 | # Every 10th step, measure test-set accuracy, and write test summaries
173 | # All other steps, run train_step on training data, & add training summaries
174 |
175 | def feed_dict(train):
176 |   """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
177 |   if train or FLAGS.fake_data:
178 |     xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
179 |     k = FLAGS.dropout
180 |   else:
181 |     xs, ys = mnist.test.images, mnist.test.labels
182 |     k = 1.0
183 |   return {x: xs, y_: ys, keep_prob: k}
184 |
185 | for i in range(FLAGS.max_steps):
186 |   if i % 10 == 0:  # Record summaries and test-set accuracy
187 |     summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
188 |     test_writer.add_summary(summary, i)
189 |     print('Accuracy at step %s: %s' % (i, acc))
190 |   else:  # Record train set summaries, and train
191 |     summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
192 |     train_writer.add_summary(summary, i)
193 | ```
194 |
195 | You're now all set to visualize this data using TensorBoard.
196 |
197 |
198 | ## Launching TensorBoard
199 |
200 | To run TensorBoard, use the following command (alternatively `python -m
201 | tensorflow.tensorboard`):
202 |
203 | ```bash
204 | tensorboard --logdir=path/to/log-directory
205 | ```
206 |
207 | where `logdir` points to the directory where the `FileWriter` serialized its
208 | data. If this `logdir` directory contains subdirectories which contain
209 | serialized data from separate runs, then TensorBoard will visualize the data
210 | from all of those runs. Once TensorBoard is running, navigate your web browser
211 | to `localhost:6006` to view TensorBoard.
212 |
213 | When looking at TensorBoard, you will see the navigation tabs in the top right
214 | corner. Each tab represents a set of serialized data that can be visualized.
215 |
216 | For in-depth information on how to use the *graph* tab to visualize your graph,
217 | see @{$graph_viz$TensorBoard: Graph Visualization}.
218 |
219 | For more usage information on TensorBoard in general, see the [TensorBoard
220 | README](https://www.tensorflow.org/code/tensorflow/tensorboard/README.md).
221 | -------------------------------------------------------------------------------- /tutorials/linear.md: -------------------------------------------------------------------------------- 1 | # Large-scale Linear Models with TensorFlow 2 | 3 | The tf.learn API provides (among other things) a rich set of tools for working 4 | with linear models in TensorFlow. This document provides an overview of those 5 | tools. It explains: 6 | 7 | * what a linear model is. 8 | * why you might want to use a linear model. 9 | * how tf.learn makes it easy to build linear models in TensorFlow. 10 | * how you can use tf.learn to combine linear models with 11 | deep learning to get the advantages of both. 12 | 13 | Read this overview to decide whether the tf.learn linear model tools might be 14 | useful to you. Then do the @{$wide$Linear Models tutorial} to 15 | give it a try. This overview uses code samples from the tutorial, but the 16 | tutorial walks through the code in greater detail. 17 | 18 | To understand this overview it will help to have some familiarity 19 | with basic machine learning concepts, and also with 20 | @{$tflearn$tf.learn}. 21 | 22 | [TOC] 23 | 24 | ## What is a linear model? 25 | 26 | A *linear model* uses a single weighted sum of features to make a prediction. 27 | For example, if you have [data](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names) 28 | on age, years of education, and weekly hours of 29 | work for a population, you can learn weights for each of those numbers so that 30 | their weighted sum estimates a person's salary. You can also use linear models 31 | for classification. 32 | 33 | Some linear models transform the weighted sum into a more convenient form. For 34 | example, *logistic regression* plugs the weighted sum into the logistic 35 | function to turn the output into a value between 0 and 1. But you still just 36 | have one weight for each input feature. 37 | 38 | ## Why would you want to use a linear model? 39 | 40 | Why would you want to use so simple a model when recent research has 41 | demonstrated the power of more complex neural networks with many layers? 42 | 43 | Linear models: 44 | 45 | * train quickly, compared to deep neural nets. 46 | * can work well on very large feature sets. 47 | * can be trained with algorithms that don't require a lot of fiddling 48 | with learning rates, etc. 49 | * can be interpreted and debugged more easily than neural nets. 50 | You can examine the weights assigned to each feature to figure out what's 51 | having the biggest impact on a prediction. 52 | * provide an excellent starting point for learning about machine learning. 53 | * are widely used in industry. 54 | 55 | ## How does tf.learn help you build linear models? 56 | 57 | You can build a linear model from scratch in TensorFlow without the help of a 58 | special API. But tf.learn provides some tools that make it easier to build 59 | effective large-scale linear models. 60 | 61 | ### Feature columns and transformations 62 | 63 | Much of the work of designing a linear model consists of transforming raw data 64 | into suitable input features. tf.learn uses the `FeatureColumn` abstraction to 65 | enable these transformations. 66 | 67 | A `FeatureColumn` represents a single feature in your data. A `FeatureColumn` 68 | may represent a quantity like 'height', or it may represent a category like 69 | 'eye_color' where the value is drawn from a set of discrete possibilities like {'blue', 'brown', 'green'}. 
70 | 71 | In the case of both *continuous features* like 'height' and *categorical 72 | features* like 'eye_color', a single value in the data might get transformed 73 | into a sequence of numbers before it is input into the model. The 74 | `FeatureColumn` abstraction lets you manipulate the feature as a single 75 | semantic unit in spite of this fact. You can specify transformations and 76 | select features to include without dealing with specific indices in the 77 | tensors you feed into the model. 78 | 79 | #### Sparse columns 80 | 81 | Categorical features in linear models are typically translated into a sparse 82 | vector in which each possible value has a corresponding index or id. For 83 | example, if there are only three possible eye colors you can represent 84 | 'eye_color' as a length 3 vector: 'brown' would become [1, 0, 0], 'blue' would 85 | become [0, 1, 0] and 'green' would become [0, 0, 1]. These vectors are called 86 | "sparse" because they may be very long, with many zeros, when the set of 87 | possible values is very large (such as all English words). 88 | 89 | While you don't need to use sparse columns to use tf.learn linear models, one 90 | of the strengths of linear models is their ability to deal with large sparse 91 | vectors. Sparse features are a primary use case for the tf.learn linear model 92 | tools. 93 | 94 | ##### Encoding sparse columns 95 | 96 | `FeatureColumn` handles the conversion of categorical values into vectors 97 | automatically, with code like this: 98 | 99 | ```python 100 | eye_color = tf.contrib.layers.sparse_column_with_keys( 101 | column_name="eye_color", keys=["blue", "brown", "green"]) 102 | ``` 103 | 104 | where `eye_color` is the name of a column in your source data. 105 | 106 | You can also generate `FeatureColumn`s for categorical features for which you 107 | don't know all possible values. For this case you would use 108 | `sparse_column_with_hash_bucket()`, which uses a hash function to assign 109 | indices to feature values. 110 | 111 | ```python 112 | education = tf.contrib.layers.sparse_column_with_hash_bucket(\ 113 | "education", hash_bucket_size=1000) 114 | ``` 115 | 116 | ##### Feature Crosses 117 | 118 | Because linear models assign independent weights to separate features, they 119 | can't learn the relative importance of specific combinations of feature 120 | values. If you have a feature 'favorite_sport' and a feature 'home_city' and 121 | you're trying to predict whether a person likes to wear red, your linear model 122 | won't be able to learn that baseball fans from St. Louis especially like to 123 | wear red. 124 | 125 | You can get around this limitation by creating a new feature 126 | 'favorite_sport_x_home_city'. The value of this feature for a given person is 127 | just the concatenation of the values of the two source features: 128 | 'baseball_x_stlouis', for example. This sort of combination feature is called 129 | a *feature cross*. 
130 | 131 | The `crossed_column()` method makes it easy to set up feature crosses: 132 | 133 | ```python 134 | sport = tf.contrib.layers.sparse_column_with_hash_bucket(\ 135 | "sport", hash_bucket_size=1000) 136 | city = tf.contrib.layers.sparse_column_with_hash_bucket(\ 137 | "city", hash_bucket_size=1000) 138 | sport_x_city = tf.contrib.layers.crossed_column( 139 | [sport, city], hash_bucket_size=int(1e4)) 140 | ``` 141 | 142 | #### Continuous columns 143 | 144 | You can specify a continuous feature like so: 145 | 146 | ```python 147 | age = tf.contrib.layers.real_valued_column("age") 148 | ``` 149 | 150 | Although, as a single real number, a continuous feature can often be input 151 | directly into the model, tf.learn offers useful transformations for this sort 152 | of column as well. 153 | 154 | ##### Bucketization 155 | 156 | *Bucketization* turns a continuous column into a categorical column. This 157 | transformation lets you use continuous features in feature crosses, or learn 158 | cases where specific value ranges have particular importance. 159 | 160 | Bucketization divides the range of possible values into subranges called 161 | buckets: 162 | 163 | ```python 164 | age_buckets = tf.contrib.layers.bucketized_column( 165 | age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]) 166 | ``` 167 | 168 | The bucket into which a value falls becomes the categorical label for 169 | that value. 170 | 171 | #### Input function 172 | 173 | `FeatureColumn`s provide a specification for the input data for your model, 174 | indicating how to represent and transform the data. But they do not provide 175 | the data itself. You provide the data through an input function. 176 | 177 | The input function must return a dictionary of tensors. Each key corresponds to 178 | the name of a `FeatureColumn`. Each key's value is a tensor containing the 179 | values of that feature for all data instances. See 180 | @{$input_fn$Building Input Functions with tf.contrib.learn} for a 181 | more comprehensive look at input functions, and `input_fn` in the 182 | [linear models tutorial code](https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py) 183 | for an example implementation of an input function. 184 | 185 | The input function is passed to the `fit()` and `evaluate()` calls that 186 | initiate training and testing, as described in the next section. 187 | 188 | ### Linear estimators 189 | 190 | tf.learn's estimator classes provide a unified training and evaluation harness 191 | for regression and classification models. They take care of the details of the 192 | training and evaluation loops and allow the user to focus on model inputs and 193 | architecture. 194 | 195 | To build a linear estimator, you can use either the 196 | `tf.contrib.learn.LinearClassifier` estimator or the 197 | `tf.contrib.learn.LinearRegressor` estimator, for classification and 198 | regression respectively. 199 | 200 | As with all tf.learn estimators, to run the estimator you just: 201 | 202 | 1. Instantiate the estimator class. For the two linear estimator classes, 203 | you pass a list of `FeatureColumn`s to the constructor. 204 | 2. Call the estimator's `fit()` method to train it. 205 | 3. Call the estimator's `evaluate()` method to see how it does. 
206 | 207 | For example: 208 | 209 | ```python 210 | e = tf.contrib.learn.LinearClassifier(feature_columns=[ 211 | native_country, education, occupation, workclass, marital_status, 212 | race, age_buckets, education_x_occupation, age_buckets_x_race_x_occupation], 213 | model_dir=YOUR_MODEL_DIRECTORY) 214 | e.fit(input_fn=input_fn_train, steps=200) 215 | # Evaluate for one step (one pass through the test data). 216 | results = e.evaluate(input_fn=input_fn_test, steps=1) 217 | 218 | # Print the stats for the evaluation. 219 | for key in sorted(results): 220 | print("%s: %s" % (key, results[key])) 221 | ``` 222 | 223 | ### Wide and deep learning 224 | 225 | The tf.learn API also provides an estimator class that lets you jointly train 226 | a linear model and a deep neural network. This novel approach combines the 227 | ability of linear models to "memorize" key features with the generalization 228 | ability of neural nets. Use `tf.contrib.learn.DNNLinearCombinedClassifier` to 229 | create this sort of "wide and deep" model: 230 | 231 | ```python 232 | e = tf.contrib.learn.DNNLinearCombinedClassifier( 233 | model_dir=YOUR_MODEL_DIR, 234 | linear_feature_columns=wide_columns, 235 | dnn_feature_columns=deep_columns, 236 | dnn_hidden_units=[100, 50]) 237 | ``` 238 | For more information, see the @{$wide_and_deep$Wide and Deep Learning tutorial}. 239 | -------------------------------------------------------------------------------- /programmers_guide/meta_graph.md: -------------------------------------------------------------------------------- 1 | # Exporting and Importing a MetaGraph 2 | 3 | A [`MetaGraph`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) contains both a TensorFlow GraphDef 4 | as well as associated metadata necessary for running computation in a 5 | graph when crossing a process boundary. It can also be used for long 6 | term storage of graphs. The MetaGraph contains the information required 7 | to continue training, perform evaluation, or run inference on a previously trained graph. 8 | 9 | The APIs for exporting and importing the complete model are in 10 | the @{tf.train.Saver} class: 11 | @{tf.train.export_meta_graph} 12 | and 13 | @{tf.train.import_meta_graph}. 14 | 15 | ## What's in a MetaGraph 16 | 17 | The information contained in a MetaGraph is expressed as a 18 | [`MetaGraphDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) 19 | protocol buffer. It contains the following fields: 20 | 21 | * [`MetaInfoDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) for meta information, such as version and other user information. 22 | * [`GraphDef`](https://www.tensorflow.org/code/tensorflow/core/framework/graph.proto) for describing the graph. 23 | * [`SaverDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/saver.proto) for the saver. 24 | * [`CollectionDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) 25 | map that further describes additional components of the model, such as 26 | @{$python/state_ops$`Variables`}, 27 | @{tf.train.QueueRunner}, etc. In order for a Python object to be serialized 28 | to and from `MetaGraphDef`, the Python class must implement `to_proto()` and 29 | `from_proto()` methods, and register them with the system using 30 | `register_proto_function`. 31 | 32 | For example, 33 | 34 | ```Python 35 | def to_proto(self, export_scope=None): 36 | 37 | """Converts a `Variable` to a `VariableDef` protocol buffer. 
38 | 39 | Args: 40 | export_scope: Optional `string`. Name scope to remove. 41 | 42 | Returns: 43 | A `VariableDef` protocol buffer, or `None` if the `Variable` is not 44 | in the specified name scope. 45 | """ 46 | if (export_scope is None or 47 | self._variable.name.startswith(export_scope)): 48 | var_def = variable_pb2.VariableDef() 49 | var_def.variable_name = ops.strip_name_scope( 50 | self._variable.name, export_scope) 51 | var_def.initializer_name = ops.strip_name_scope( 52 | self.initializer.name, export_scope) 53 | var_def.snapshot_name = ops.strip_name_scope( 54 | self._snapshot.name, export_scope) 55 | if self._save_slice_info: 56 | var_def.save_slice_info_def.MergeFrom(self._save_slice_info.to_proto( 57 | export_scope=export_scope)) 58 | return var_def 59 | else: 60 | return None 61 | 62 | @staticmethod 63 | def from_proto(variable_def, import_scope=None): 64 | """Returns a `Variable` object created from `variable_def`.""" 65 | return Variable(variable_def=variable_def, import_scope=import_scope) 66 | 67 | ops.register_proto_function(ops.GraphKeys.GLOBAL_VARIABLES, 68 | proto_type=variable_pb2.VariableDef, 69 | to_proto=Variable.to_proto, 70 | from_proto=Variable.from_proto) 71 | ``` 72 | 73 | ## Exporting a Complete Model to MetaGraph 74 | 75 | The API for exporting a running model as a MetaGraph is `export_meta_graph()`. 76 | 77 | ```Python 78 | def export_meta_graph(filename=None, collection_list=None, as_text=False): 79 | """Writes `MetaGraphDef` to save_path/filename. 80 | 81 | Args: 82 | filename: Optional meta_graph filename including the path. 83 | collection_list: List of string keys to collect. 84 | as_text: If `True`, writes the meta_graph as an ASCII proto. 85 | 86 | Returns: 87 | A `MetaGraphDef` proto. 88 | """ 89 | ``` 90 | 91 | A `collection` can contain any Python objects that users would like to 92 | be able to uniquely identify and easily retrieve. These objects can be 93 | special operations in the graph, such as `train_op`, or hyper parameters, 94 | such as "learning rate". Users can specify the list of collections 95 | they would like to export. If no `collection_list` is specified, 96 | all collections in the model will be exported. 97 | 98 | The API returns a serialized protocol buffer. If `filename` is 99 | specified, the protocol buffer will also be written to a file. 100 | 101 | Here are some of the typical usage models: 102 | 103 | * Export the default running graph: 104 | 105 | ```Python 106 | # Build the model 107 | ... 108 | with tf.Session() as sess: 109 | # Use the model 110 | ... 111 | # Export the model to /tmp/my-model.meta. 112 | meta_graph_def = tf.train.export_meta_graph(filename='/tmp/my-model.meta') 113 | ``` 114 | 115 | * Export the default running graph and only a subset of the collections. 116 | 117 | ```Python 118 | meta_graph_def = tf.train.export_meta_graph( 119 | filename='/tmp/my-model.meta', 120 | collection_list=["input_tensor", "output_tensor"]) 121 | ``` 122 | 123 | 124 | The MetaGraph is also automatically exported via the `save()` API in 125 | @{tf.train.Saver}. 126 | 127 | 128 | ## Import a MetaGraph 129 | 130 | The API for importing a MetaGraph file into a graph is `import_meta_graph()`. 131 | 132 | Here are some of the typical usage models: 133 | 134 | * Import and continue training without building the model from scratch. 135 | 136 | ```Python 137 | ... 138 | # Create a saver. 139 | saver = tf.train.Saver(...variables...) 140 | # Remember the training_op we want to run by adding it to a collection. 
141 | tf.add_to_collection('train_op', train_op) 142 | sess = tf.Session() 143 | for step in xrange(1000000): 144 | sess.run(train_op) 145 | if step % 1000 == 0: 146 | # Saves checkpoint, which by default also exports a meta_graph 147 | # named 'my-model-global_step.meta'. 148 | saver.save(sess, 'my-model', global_step=step) 149 | ``` 150 | 151 | Later we can continue training from this saved `meta_graph` without building 152 | the model from scratch. 153 | 154 | ```Python 155 | with tf.Session() as sess: 156 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta') 157 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 158 | # tf.get_collection() returns a list. In this example we only want the 159 | # first one. 160 | train_op = tf.get_collection('train_op')[0] 161 | for step in xrange(1000000): 162 | sess.run(train_op) 163 | ``` 164 | 165 | * Import and extend the graph. 166 | 167 | For example, we can first build an inference graph, export it as a meta graph: 168 | 169 | ```Python 170 | # Creates an inference graph. 171 | # Hidden 1 172 | images = tf.constant(1.2, tf.float32, shape=[100, 28]) 173 | with tf.name_scope("hidden1"): 174 | weights = tf.Variable( 175 | tf.truncated_normal([28, 128], 176 | stddev=1.0 / math.sqrt(float(28))), 177 | name="weights") 178 | biases = tf.Variable(tf.zeros([128]), 179 | name="biases") 180 | hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases) 181 | # Hidden 2 182 | with tf.name_scope("hidden2"): 183 | weights = tf.Variable( 184 | tf.truncated_normal([128, 32], 185 | stddev=1.0 / math.sqrt(float(128))), 186 | name="weights") 187 | biases = tf.Variable(tf.zeros([32]), 188 | name="biases") 189 | hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases) 190 | # Linear 191 | with tf.name_scope("softmax_linear"): 192 | weights = tf.Variable( 193 | tf.truncated_normal([32, 10], 194 | stddev=1.0 / math.sqrt(float(32))), 195 | name="weights") 196 | biases = tf.Variable(tf.zeros([10]), 197 | name="biases") 198 | logits = tf.matmul(hidden2, weights) + biases 199 | tf.add_to_collection("logits", logits) 200 | 201 | init_all_op = tf.global_variables_initializer() 202 | 203 | with tf.Session() as sess: 204 | # Initializes all the variables. 205 | sess.run(init_all_op) 206 | # Runs to logit. 207 | sess.run(logits) 208 | # Creates a saver. 209 | saver0 = tf.train.Saver() 210 | saver0.save(sess, 'my-save-dir/my-model-10000') 211 | # Generates MetaGraphDef. 212 | saver0.export_meta_graph('my-save-dir/my-model-10000.meta') 213 | ``` 214 | 215 | Then later import it and extend it to a training graph. 216 | 217 | ```Python 218 | with tf.Session() as sess: 219 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta') 220 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 221 | # Addes loss and train. 222 | labels = tf.constant(0, tf.int32, shape=[100], name="labels") 223 | batch_size = tf.size(labels) 224 | labels = tf.expand_dims(labels, 1) 225 | indices = tf.expand_dims(tf.range(0, batch_size), 1) 226 | concated = tf.concat([indices, labels], 1) 227 | onehot_labels = tf.sparse_to_dense( 228 | concated, tf.stack([batch_size, 10]), 1.0, 0.0) 229 | logits = tf.get_collection("logits")[0] 230 | cross_entropy = tf.nn.softmax_cross_entropy_with_logits( 231 | labels=onehot_labels, logits=logits, name="xentropy") 232 | loss = tf.reduce_mean(cross_entropy, name="xentropy_mean") 233 | 234 | tf.summary.scalar('loss', loss) 235 | # Creates the gradient descent optimizer with the given learning rate. 
236 | optimizer = tf.train.GradientDescentOptimizer(0.01) 237 | 238 | # Runs train_op. 239 | train_op = optimizer.minimize(loss) 240 | sess.run(train_op) 241 | ``` 242 | 243 | * Import a graph with preset devices. 244 | 245 | Sometimes an exported meta graph is from a training environment that the 246 | importer doesn't have. For example, the model might have been trained 247 | on GPUs, or in a distributed environment with replicas. When importing 248 | such models, it's useful to be able to clear the device settings in 249 | the graph so that we can run it on locally available devices. This can 250 | be achieved by calling `import_meta_graph` with the `clear_devices` 251 | option set to `True`. 252 | 253 | ```Python 254 | with tf.Session() as sess: 255 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta', 256 | clear_devices=True) 257 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 258 | ... 259 | ``` 260 | 261 | * Import within the default graph. 262 | 263 | Sometimes you might want to run `export_meta_graph` and `import_meta_graph` 264 | in codelab using the default graph. In that case, you need to reset 265 | the default graph by calling `tf.reset_default_graph()` first before 266 | running import. 267 | 268 | ```Python 269 | meta_graph_def = tf.train.export_meta_graph() 270 | ... 271 | tf.reset_default_graph() 272 | ... 273 | tf.train.import_meta_graph(meta_graph_def) 274 | ... 275 | ``` 276 | 277 | * Retrieve Hyper Parameters 278 | 279 | ```Python 280 | filename = ".".join([tf.train.latest_checkpoint(train_dir), "meta"]) 281 | tf.train.import_meta_graph(filename) 282 | hparams = tf.get_collection("hparams") 283 | ``` 284 | -------------------------------------------------------------------------------- /extend/add_filesys.md: -------------------------------------------------------------------------------- 1 | # Adding a Custom Filesystem Plugin 2 | 3 | ## Background 4 | 5 | The TensorFlow framework is often used in multi-process and 6 | multi-machine environments, such as Google data centers, Google Cloud 7 | Machine Learning, Amazon Web Services (AWS), and on-site distributed clusters. 8 | In order to both share and save certain types of state produced by TensorFlow, 9 | the framework assumes the existence of a reliable, shared filesystem. This 10 | shared filesystem has numerous uses, for example: 11 | 12 | * Checkpoints of state are often saved to a distributed filesystem for 13 | reliability and fault-tolerance. 14 | * Training processes communicate with TensorBoard by writing event files 15 | to a directory, which TensorBoard watches. A shared filesystem allows this 16 | communication to work even when TensorBoard runs in a different process or 17 | machine. 18 | 19 | There are many different implementations of shared or distributed filesystems in 20 | the real world, so TensorFlow provides an ability for users to implement a 21 | custom FileSystem plugin that can be registered with the TensorFlow runtime. 22 | When the TensorFlow runtime attempts to write to a file through the `FileSystem` 23 | interface, it uses a portion of the pathname to dynamically select the 24 | implementation that should be used for filesystem operations. Thus, adding 25 | support for your custom filesystem requires implementing a `FileSystem` 26 | interface, building a shared object containing that implementation, and loading 27 | that object at runtime in whichever process needs to write to that filesystem. 
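To make the scheme-based dispatch concrete, here is a small, hedged sketch using
the Python `tf.gfile` wrapper, which routes file operations through the
`FileSystem` interface (see "What goes through this interface?" below). The
bucket, cluster, and file names are placeholders rather than real locations, and
the non-local schemes only work when the corresponding filesystem support is
available in your build:

```python
import tensorflow as tf

# The path prefix (scheme) selects the registered FileSystem implementation.
tf.gfile.Exists("/tmp/model.ckpt.index")                    # local POSIX filesystem
tf.gfile.Exists("gs://some-bucket/model.ckpt.index")        # Google Cloud Storage
tf.gfile.Exists("hdfs://namenode:8020/user/me/model.ckpt")  # HDFS
```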
28 |
29 | Note that TensorFlow already includes many filesystem implementations, such as:
30 |
31 | * A standard POSIX filesystem
32 |
33 |   Note: NFS filesystems often mount as a POSIX interface, and so standard
34 |   TensorFlow can work on top of NFS-mounted remote filesystems.
35 | * HDFS - the Hadoop File System
36 | * GCS - Google Cloud Storage filesystem
37 | * A "memory-mapped-file" filesystem
38 |
39 | The rest of this guide describes how to implement a custom filesystem.
40 |
41 | ## Implementing a custom filesystem plugin
42 |
43 | To implement a custom filesystem plugin, you must do the following:
44 |
45 | * Implement subclasses of `RandomAccessFile`, `WritableFile`,
46 |   `AppendableFile`, and `ReadOnlyMemoryRegion`.
47 | * Implement the `FileSystem` interface as a subclass.
48 | * Register the `FileSystem` implementation with an appropriate prefix pattern.
49 | * Load the filesystem plugin in a process that wants to write to that
50 |   filesystem.
51 |
52 | ### The FileSystem interface
53 |
54 | The `FileSystem` interface is an abstract C++ interface defined in
55 | [file_system.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h).
56 | An implementation of the `FileSystem` interface should implement all the
57 | relevant methods defined by the interface. Implementing the interface requires
58 | defining operations such as creating `RandomAccessFile`, `WritableFile`, and
59 | implementing standard filesystem operations such as `FileExists`, `IsDirectory`,
60 | `GetMatchingPaths`, `DeleteFile`, and so on. An implementation of these
61 | interfaces will often involve translating the function's input arguments to
62 | delegate to an already-existing library function implementing the equivalent
63 | functionality in your custom filesystem.
64 |
65 | For example, the `PosixFileSystem` implementation implements `DeleteFile` using
66 | the POSIX `unlink()` function; `CreateDir` simply calls `mkdir()`; `GetFileSize`
67 | involves calling `stat()` on the file and then returns the filesize as reported
68 | by the stat object. Similarly, for the `HDFSFileSystem`
69 | implementation, these calls simply delegate to the `libHDFS` implementation of
70 | similar functionality, such as `hdfsDelete` for
71 | [DeleteFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.cc#L386).
72 |
73 | We suggest looking through these code examples to get an idea of how different
74 | filesystem implementations call their existing libraries. Examples include:
75 |
76 | * [POSIX
77 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/posix/posix_file_system.h)
78 | * [HDFS
79 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h)
80 | * [GCS
81 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cloud/gcs_file_system.h)
82 |
83 | #### The File interfaces
84 |
85 | Beyond operations that allow you to query and manipulate files and directories
86 | in a filesystem, the `FileSystem` interface requires you to implement factories
87 | that return implementations of abstract objects such as the
88 | [RandomAccessFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h#L223)
89 | and the `WritableFile`, so that TensorFlow code can read from and write to files in that
90 | `FileSystem` implementation.
91 | 92 | To implement a `RandomAccessFile`, you must implement a single interface called 93 | `Read()`, in which the implementation must provide a way to read from an offset 94 | within a named file. 95 | 96 | For example, below is the implementation of RandomAccessFile for the POSIX 97 | filesystem, which uses the `pread()` random-access POSIX function to implement 98 | read. Notice that the particular implementation must know how to retry or 99 | propagate errors from the underlying filesystem. 100 | 101 | ```C++ 102 | class PosixRandomAccessFile : public RandomAccessFile { 103 | public: 104 | PosixRandomAccessFile(const string& fname, int fd) 105 | : filename_(fname), fd_(fd) {} 106 | ~PosixRandomAccessFile() override { close(fd_); } 107 | 108 | Status Read(uint64 offset, size_t n, StringPiece* result, 109 | char* scratch) const override { 110 | Status s; 111 | char* dst = scratch; 112 | while (n > 0 && s.ok()) { 113 | ssize_t r = pread(fd_, dst, n, static_cast(offset)); 114 | if (r > 0) { 115 | dst += r; 116 | n -= r; 117 | offset += r; 118 | } else if (r == 0) { 119 | s = Status(error::OUT_OF_RANGE, "Read less bytes than requested"); 120 | } else if (errno == EINTR || errno == EAGAIN) { 121 | // Retry 122 | } else { 123 | s = IOError(filename_, errno); 124 | } 125 | } 126 | *result = StringPiece(scratch, dst - scratch); 127 | return s; 128 | } 129 | 130 | private: 131 | string filename_; 132 | int fd_; 133 | }; 134 | ``` 135 | 136 | To implement the WritableFile sequential-writing abstraction, one must implement 137 | a few interfaces, such as `Append()`, `Flush()`, `Sync()`, and `Close()`. 138 | 139 | For example, below is the implementation of WritableFile for the POSIX 140 | filesystem, which takes a `FILE` object in its constructor and uses standard 141 | posix functions on that object to implement the interface. 142 | 143 | ```C++ 144 | class PosixWritableFile : public WritableFile { 145 | public: 146 | PosixWritableFile(const string& fname, FILE* f) 147 | : filename_(fname), file_(f) {} 148 | 149 | ~PosixWritableFile() override { 150 | if (file_ != NULL) { 151 | fclose(file_); 152 | } 153 | } 154 | 155 | Status Append(const StringPiece& data) override { 156 | size_t r = fwrite(data.data(), 1, data.size(), file_); 157 | if (r != data.size()) { 158 | return IOError(filename_, errno); 159 | } 160 | return Status::OK(); 161 | } 162 | 163 | Status Close() override { 164 | Status result; 165 | if (fclose(file_) != 0) { 166 | result = IOError(filename_, errno); 167 | } 168 | file_ = NULL; 169 | return result; 170 | } 171 | 172 | Status Flush() override { 173 | if (fflush(file_) != 0) { 174 | return IOError(filename_, errno); 175 | } 176 | return Status::OK(); 177 | } 178 | 179 | Status Sync() override { 180 | Status s; 181 | if (fflush(file_) != 0) { 182 | s = IOError(filename_, errno); 183 | } 184 | return s; 185 | } 186 | 187 | private: 188 | string filename_; 189 | FILE* file_; 190 | }; 191 | 192 | ``` 193 | 194 | For more details, please see the documentations of those interfaces, and look at 195 | example implementations for inspiration. 196 | 197 | ### Registering and loading the filesystem 198 | 199 | Once you have implemented the `FileSystem` implementation for your custom 200 | filesystem, you need to register it under a "scheme" so that paths prefixed with 201 | that scheme are directed to your implementation. 
To do this, you call
202 | `REGISTER_FILE_SYSTEM`:
203 |
204 | ```
205 | REGISTER_FILE_SYSTEM("foobar", FooBarFileSystem);
206 | ```
207 |
208 | When TensorFlow tries to operate on a file whose path starts with `foobar://`,
209 | it will use the `FooBarFileSystem` implementation.
210 |
211 | ```C++
212 | string filename = "foobar://path/to/file.txt";
213 | std::unique_ptr<WritableFile> file;
214 |
215 | // Calls FooBarFileSystem::NewWritableFile to return
216 | // a WritableFile class, which happens to be the FooBarFileSystem's
217 | // WritableFile implementation.
218 | TF_RETURN_IF_ERROR(env->NewWritableFile(filename, &file));
219 | ```
220 |
221 | Next, you must build a shared object containing this implementation. An example
222 | of doing so using bazel's `cc_binary` rule can be found
223 | [here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/BUILD#L244),
224 | but you may use any build system to do so. See the section on @{$adding_an_op#build-the-op-library$building the op library} for similar
225 | instructions.
226 |
227 | The result of building this target is a `.so` shared object file.
228 |
229 | Lastly, you must dynamically load this implementation in the process. In Python,
230 | you can call the `tf.load_file_system_library(file_system_library)` function,
231 | passing the path to the shared object. Calling this in your client program loads
232 | the shared object in the process, thus registering your implementation as
233 | available for any file operations going through the `FileSystem` interface. You
234 | can see
235 | [test_file_system.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/file_system_test.py)
236 | for an example.
237 |
238 | ## What goes through this interface?
239 |
240 | Almost all core C++ file operations within TensorFlow use the `FileSystem`
241 | interface, such as the `CheckpointWriter`, the `EventsWriter`, and many other
242 | utilities. This means that providing a `FileSystem` implementation allows most of
243 | your TensorFlow programs to write to your shared filesystem.
244 |
245 | In Python, the `gfile` and `file_io` classes bind underneath to the `FileSystem`
246 | implementation via SWIG, which means that once you have loaded this filesystem
247 | library, you can do:
248 |
249 | ```
250 | with gfile.Open("foobar://path/to/file.txt") as w:
251 |   w.write("hi")
252 | ```
253 |
254 | When you do this, a file containing "hi" will appear at "/path/to/file.txt" on
255 | your shared filesystem.
256 |
--------------------------------------------------------------------------------