├── deploy ├── leftnav_files ├── index.md ├── tfserve.md └── hadoop.md ├── images ├── getting_started_add.png ├── getting_started_adder.png ├── getting_started_final.png └── getting_started_triple.png ├── extend ├── leftnav_files ├── index.md ├── new_data_formats.md ├── tool_developers │ └── index.md ├── architecture.md └── add_filesys.md ├── install ├── leftnav_files ├── index.md ├── install_c.md ├── install_windows.md ├── install_go.md └── install_java.md ├── performance ├── leftnav_files ├── index.md ├── xla │ ├── developing_new_backend.md │ ├── index.md │ ├── shapes.md │ ├── jit.md │ └── broadcasting.md └── performance_guide.md ├── get_started ├── leftnav_files ├── index.md └── summaries_and_tensorboard.md ├── tutorials ├── leftnav_files ├── index.md ├── mandelbrot.md ├── pdes.md ├── using_gpu.md ├── recurrent.md └── linear.md ├── programmers_guide ├── leftnav_files ├── index.md ├── dims_types.md ├── tfdbg-tflearn.md ├── data_versions.md ├── threading_and_queues.md ├── version_semantics.md ├── variables.md └── meta_graph.md ├── .gitignore ├── SUMMARY.md ├── README.md └── book.json /deploy/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | distributed.md 3 | tfserve.md 4 | hadoop.md 5 | -------------------------------------------------------------------------------- /images/getting_started_add.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_add.png -------------------------------------------------------------------------------- /images/getting_started_adder.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_adder.png -------------------------------------------------------------------------------- /images/getting_started_final.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_final.png -------------------------------------------------------------------------------- /images/getting_started_triple.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/efeiefei/tensorflow_documents_zh/HEAD/images/getting_started_triple.png -------------------------------------------------------------------------------- /extend/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | architecture.md 3 | adding_an_op.md 4 | add_filesys.md 5 | new_data_formats.md 6 | estimators.md 7 | language_bindings.md 8 | tool_developers/index.md 9 | -------------------------------------------------------------------------------- /install/leftnav_files: -------------------------------------------------------------------------------- 1 | install_linux.md 2 | install_mac.md 3 | install_windows.md 4 | install_sources.md 5 | >>> 6 | migration.md 7 | >>> 8 | install_java.md 9 | install_go.md 10 | install_c.md 11 | -------------------------------------------------------------------------------- /performance/leftnav_files: -------------------------------------------------------------------------------- 1 | performance_guide.md 2 | xla/index.md 3 | xla/broadcasting.md 4 | xla/developing_new_backend.md 5 | xla/jit.md 6 | xla/operation_semantics.md 7 | xla/shapes.md 
8 | xla/tfcompile.md 9 | quantization.md 10 | -------------------------------------------------------------------------------- /get_started/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | get_started.md 3 | mnist/beginners.md 4 | mnist/pros.md 5 | mnist/mechanics.md 6 | tflearn.md 7 | input_fn.md 8 | monitors.md 9 | summaries_and_tensorboard.md 10 | embedding_viz.md 11 | graph_viz.md 12 | -------------------------------------------------------------------------------- /tutorials/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | using_gpu.md 3 | image_recognition.md 4 | image_retraining.md 5 | layers.md 6 | deep_cnn.md 7 | word2vec.md 8 | recurrent.md 9 | seq2seq.md 10 | linear.md 11 | wide.md 12 | wide_and_deep.md 13 | mandelbrot.md 14 | pdes.md 15 | -------------------------------------------------------------------------------- /programmers_guide/leftnav_files: -------------------------------------------------------------------------------- 1 | index.md 2 | variables.md 3 | dims_types.md 4 | variable_scope.md 5 | threading_and_queues.md 6 | reading_data.md 7 | supervisor.md 8 | debugger.md 9 | tfdbg-tflearn.md 10 | meta_graph.md 11 | version_semantics.md 12 | data_versions.md 13 | faq.md 14 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Node rules: 2 | ## Grunt intermediate storage (http://gruntjs.com/creating-plugins#storing-task-files) 3 | .grunt 4 | 5 | ## Dependency directory 6 | ## Commenting this out is preferred by some people, see 7 | ## https://docs.npmjs.com/misc/faq#should-i-check-my-node_modules-folder-into-git 8 | node_modules 9 | 10 | # Book build output 11 | _book 12 | 13 | # eBook build output 14 | *.epub 15 | *.mobi 16 | *.pdf -------------------------------------------------------------------------------- /SUMMARY.md: -------------------------------------------------------------------------------- 1 | # Summary 2 | 3 | * [介绍](README.md) 4 | * [安装](install/index.md) 5 | * [在 Ubuntu 上安装 TensorFlow](install/install_linux.md) 6 | * [在 Windows 上安装 TensorFlow](install/install_windows.md) 7 | * [开发](get_started/index.md) 8 | * [开始](get_started/index.md) 9 | * [初识 TensorFlow](get_started/get_started.md) 10 | * [Programmers' Guide](programmers_guide/index.md) 11 | * [教程](tutorials/index.md) 12 | * [性能](performance/index.md) 13 | * [部署](deploy/index.md) 14 | * [延伸](extend/index.md) 15 | 16 | 17 | 18 | -------------------------------------------------------------------------------- /install/index.md: -------------------------------------------------------------------------------- 1 | # 安装TensorFlow 2 | 3 | 如下指南描述了如何安装TensorFlow的不同版本。 4 | 5 | * [在 Ubuntu 上安装 TensorFlow](./install_linux.md) 6 | * [在 Mac OS X 上安装 TensorFlow](./install_mac.md) 7 | * [在 Windows 上安装 TensorFlow](./install_windows.md) 8 | * [从源码安装 TensorFlow](./install_sources.md) 9 | 10 | Python TensorFlow API 自版本 0.n 到 1.0 变化花了很多。如下指南描述了如何从老旧 TensorFlow 应用迁移到1.0版本。 11 | 12 | [迁移到 TensorFlow 1.0](./migration.md) 13 | 14 | 如下指南描述了如何安装其他语言的TensorFlow库。这些API是为了在应用中使用TensorFlow模型,所以并不如Python API一样具有扩展性。 15 | 16 | * [为 Java 安装 TensorFlow](./install_java.md) 17 | * [为 C 安装 TensorFlow](./install_c.md) 18 | * [为 GO 安装 TensorFlow](./install_go.md) 19 | 20 | -------------------------------------------------------------------------------- /deploy/index.md: 
-------------------------------------------------------------------------------- 1 | # Deploy 2 | 3 | This section focuses on deploying real-world models. It contains 4 | the following documents: 5 | 6 | * @{$distributed$Distributed TensorFlow}, which explains how to create 7 | a cluster of TensorFlow servers. 8 | * @{$tfserve$TensorFlow Serving}, which describes TensorFlow Serving--an 9 | open-source serving system for machine learning models. This document 10 | provides a short introduction to TensorFlow Serving; the bulk of the 11 | documentation about TensorFlow Serving is in a 12 | [separate website](https://tensorflow.github.io/serving/serving_basic). 13 | * @{$hadoop$How to run TensorFlow on Hadoop}, which has a highly 14 | self-explanatory title. 15 | 16 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # 说明 2 | 3 | [TensorFlow](https://www.tensorflow.org/) 正式版本 V1.0 已经发布,api 较之前 V0.xx 版本发生了较大变化,Tutorial、HowTo 等文档也发生了很大变化。新的文档更加合理,对TensorFlow甚至机器学习的新手更加友好,更适合循序渐进的学习。 4 | 5 | 网络上流传较广的TensorFlow中文文档大多为 [TensorFlow中文社区](http://tensorfly.cn/) 的文档,翻译自 V0.5。 6 | 7 | 在此,选取版本 r1.1 的文档进行翻译,r1.1与r1.0的文档内容区别不大,结构做了一些调整。 8 | 9 | --- 10 | 11 | **具体内容查看 **[**目录**](/SUMMARY.md)**。** 12 | 13 | --- 14 | 15 | 本项目同时更新于 GitBook 与 GitHub。 16 | 17 | GitHub 项目地址:[https://github.com/efeiefei/tensorflow\_documents\_zh/](https://github.com/efeiefei/tensorflow_documents_zh/) 18 | 19 | GitBook 阅读地址:[https://efeiefei.gitbooks.io/tensorflow\_documents\_zh/](https://efeiefei.gitbooks.io/tensorflow_documents_zh/) 20 | 21 | 欢迎联系,一起翻译! 22 | 23 | -------------------------------------------------------------------------------- /book.json: -------------------------------------------------------------------------------- 1 | { 2 | "author": "虞连飞", 3 | "description": "TensorFlow正式版中文文档", 4 | "extension": null, 5 | "generator": "site", 6 | "links": { 7 | "sharing": { 8 | "all": null, 9 | "facebook": null, 10 | "google": null, 11 | "twitter": null, 12 | "weibo": null 13 | }, 14 | "sidebar": { 15 | "Github": "https://github.com/efeiefei/" 16 | } 17 | }, 18 | "output": null, 19 | "pdf": { 20 | "fontSize": 12, 21 | "footerTemplate": null, 22 | "headerTemplate": null, 23 | "margin": { 24 | "bottom": 36, 25 | "left": 62, 26 | "right": 62, 27 | "top": 36 28 | }, 29 | "pageNumbers": false, 30 | "paperSize": "a4" 31 | }, 32 | "plugins": [], 33 | "title": "TensorFlow正式版中文文档", 34 | "variables": {} 35 | } 36 | -------------------------------------------------------------------------------- /get_started/index.md: -------------------------------------------------------------------------------- 1 | # 开始 2 | 3 | 查看如下指南可对 TensorFlow 程序有个简要的概览: 4 | 5 | * [初识 TensorFlow](./get_started.md) 6 | 7 | MINIST 是实验新的机器学习工具的经典数据集。我们提供了三篇指南,每篇介绍了一种不同的方法在 8 | TensorFlow 上训练 MNIST 模型。 9 | 10 | * [MNIST-机器学习初学者](./mnits/beginners.md),通过高级 API 介绍了 MNIST。 11 | * [MNIST-机器学习专家](./mnist/pro.md),比“MNIST-机器学习初学者”更加深入, 12 | 假设读者对机器学习的概念有所了解。 13 | * [TensorFlow 机制 101](./mnist/mechanics.md),通过底层 API 介绍了 MNIST。 14 | 15 | 对刚接触 TensorFlow 的开发者来说,可以从高级 API 开始。下面的指南可以帮助读者学习高级 API: 16 | 17 | * [tf.contrib.learn 快速开始](./tflearn.md),介绍该 API。 18 | * [通过 tf.contrib.learn 构建输入函数](./input_fn.md),让你对该 API 更深入的使用。 19 | * [通过 tf.contrib.learn 打日志并做监控](./monitors.md),解释了如何观察(audit)训练速度。 20 | 21 | TensorBoard 是可视化机器学习的各个方面的工具。如下指南描述了如何使用 TensorBoard。 22 | 23 | * [TensorBoard:可视化学习](./summaries_and_tensorboard.md),让你开始。 24 | * [TensorBoard:Embedding 
可视化](./embedding_viz.md),演示了如何查看高维数据并与之交互。 25 | * [TensorBoard:图可视化](./graph_viz.md),演示了如何可视化计算图。图可视化一般对使用底层 26 | API 的开发者更有用。 27 | 28 | -------------------------------------------------------------------------------- /deploy/tfserve.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Serving 2 | 3 | ## Introduction 4 | 5 | TensorFlow Serving is a flexible, high-performance serving system for machine 6 | learning models, designed for production environments. TensorFlow Serving 7 | makes it easy to deploy new algorithms and experiments, while keeping the same 8 | server architecture and APIs. 9 | 10 | ## Basic Serving Tutorial 11 | 12 | See the [basic tutorial](https://tensorflow.github.io/serving/serving_basic) 13 | on the TensorFlow Serving site to learn how to export a trained TensorFlow 14 | model and build a server to serve the exported model. 15 | 16 | ## Advanced Serving Tutorial 17 | 18 | See the 19 | [advanced tutorial](https://tensorflow.github.io/serving/serving_advanced) 20 | on the TensorFlow Serving site to learn how to build a server that 21 | dynamically discovers and serves new versions of a trained TensorFlow 22 | model. 23 | 24 | ## Serving Inception Model Tutorial 25 | 26 | See the 27 | [serving inception tutorial](https://tensorflow.github.io/serving/serving_inception) 28 | on the TensorFlow Serving site to learn how to serve the inception model with 29 | TensorFlow Serving and Kubernetes. 30 | 31 | -------------------------------------------------------------------------------- /extend/index.md: -------------------------------------------------------------------------------- 1 | # Extend 2 | 3 | This section explains how developers can add functionality to TensorFlow's 4 | capabilities. Begin by reading the following architectural overview: 5 | 6 | * @{$architecture$TensorFlow Architecture} 7 | 8 | The following guides explain how to extend particular aspects of 9 | TensorFlow: 10 | 11 | * @{$adding_an_op$Adding a New Op}, which explains how to create your own 12 | operations. 13 | * @{$add_filesys$Adding a Custom Filesystem Plugin}, which explains how to 14 | add support for your own shared or distributed filesystem. 15 | * @{$new_data_formats$Custom Data Readers}, which details how to add support 16 | for your own file and record formats. 17 | * @{$estimators$Creating Estimators in tf.contrib.learn}, which explains how 18 | to write your own custom Estimator. For example, you could build your 19 | own Estimator to implement some variation on standard linear regression. 20 | 21 | Python is currently the only language supported by TensorFlow's API stability 22 | promises. However, TensorFlow also provides functionality in C++, Java, and Go, 23 | plus community support for [Haskell](https://github.com/tensorflow/haskell) 24 | and [Rust](https://github.com/tensorflow/rust). 
If you'd like to create or 25 | develop TensorFlow features in a language other than these languages, read the 26 | following guide: 27 | 28 | * @{$language_bindings$TensorFlow in Other Languages} 29 | 30 | To create tools compatible with TensorFlow's model format, read the following 31 | guide: 32 | 33 | * @{$tool_developers$A Tool Developer's Guide to TensorFlow Model Files} 34 | 35 | 36 | -------------------------------------------------------------------------------- /performance/index.md: -------------------------------------------------------------------------------- 1 | # Performance 2 | 3 | Performance is often a significant issue when training a machine learning 4 | model. This section explains various ways to optimize performance. Start 5 | your investigation with the following guide: 6 | 7 | * @{$performance_guide$Performance}, which contains a collection of best 8 | practices for optimizing your TensorFlow code. 9 | 10 | XLA (Accelerated Linear Algebra) is an experimental compiler for linear 11 | algebra that optimizes TensorFlow computations. The following guides explore 12 | XLA: 13 | 14 | * @{$xla$XLA Overview}, which introduces XLA. 15 | * @{$broadcasting$Broadcasting Semantics}, which describes XLA's 16 | broadcasting semantics. 17 | * @{$developing_new_backend$Developing a new back end for XLA}, which 18 | explains how to re-target TensorFlow in order to optimize the performance 19 | of the computational graph for particular hardware. 20 | * @{$jit$Using JIT Compilation}, which describes the XLA JIT compiler that 21 | compiles and runs parts of TensorFlow graphs via XLA in order to optimize 22 | performance. 23 | * @{$operation_semantics$Operation Semantics}, which is a reference manual 24 | describing the semantics of operations in the `ComputationBuilder` 25 | interface. 26 | * @{$shapes$Shapes and Layout}, which details the `Shape` protocol buffer. 27 | * @{$tfcompile$Using AOT compilation}, which explains `tfcompile`, a 28 | standalone tool that compiles TensorFlow graphs into executable code in 29 | order to optimize performance. 30 | 31 | And finally, we offer the following guide: 32 | 33 | * @{$quantization$How to Quantize Neural Networks with TensorFlow}, which 34 | can explains how to use quantization to reduce model size, both in storage 35 | and at runtime. Quantization can improve performance, especially on 36 | mobile hardware. 37 | 38 | -------------------------------------------------------------------------------- /tutorials/index.md: -------------------------------------------------------------------------------- 1 | # Tutorials 2 | 3 | This section contains tutorials demonstrating how to do specific tasks 4 | in TensorFlow. If you are new to TensorFlow, we recommend reading the 5 | documents in the "Get Started" section before reading these tutorials. 6 | 7 | The following tutorial explains the interaction of CPUs and GPUs on a 8 | TensorFlow system: 9 | 10 | * @{$using_gpu$Using GPUs} 11 | 12 | The following tutorials cover different aspects of image recognition: 13 | 14 | * @{$image_recognition$Image Recognition}, which introduces the field of 15 | image recognition and a model (Inception) for recognizing images. 16 | * @{$image_retraining$How to Retrain Inception's Final Layer for New Categories}, 17 | which has a wonderfully self-explanatory title. 18 | * @{$layers$A Guide to TF Layers: Building a Convolutional Neural Network}, 19 | which introduces convolutional neural networks (CNNs) and demonstrates how 20 | to build a CNN in TensorFlow. 
21 | * @{$deep_cnn$Convolutional Neural Networks}, which demonstrates how to 22 | build a small CNN for recognizing images. This tutorial is aimed at 23 | advanced TensorFlow users. 24 | 25 | The following tutorials focus on machine learning problems in human language: 26 | 27 | * @{$word2vec$Vector Representations of Words}, which demonstrates how to 28 | create an embedding for words. 29 | * @{$recurrent$Recurrent Neural Networks}, which demonstrates how to use a 30 | recurrent neural network to predict the next word in a sentence. 31 | * @{$seq2seq$Sequence-to-Sequence Models}, which demonstrates how to use a 32 | sequence-to-sequence model to translate text from English to French. 33 | 34 | The following tutorials focus on linear models: 35 | 36 | * @{$linear$Large-Scale Linear Models with TensorFlow}, which introduces 37 | linear models and demonstrates how to build them with the high-level API. 38 | * @{$wide$TensorFlow Linear Model Tutorial}, which demonstrates how to solve 39 | a binary classification problem in TensorFlow. 40 | * @{$wide_and_deep$TensorFlow Wide & Deep Learning Tutorial}, which explains 41 | how to use the high-level API to jointly train both a wide linear model 42 | and a deep feed-forward neural network. 43 | 44 | Although TensorFlow specializes in machine learning, you may also use 45 | TensorFlow to solve other kinds of math problems. For example: 46 | 47 | * @{$mandelbrot$Mandelbrot Set} 48 | * @{$pdes$Partial Differential Equations} 49 | 50 | -------------------------------------------------------------------------------- /deploy/hadoop.md: -------------------------------------------------------------------------------- 1 | # How to run TensorFlow on Hadoop 2 | 3 | This document describes how to run TensorFlow on Hadoop. It will be expanded to 4 | describe running on various cluster managers, but only describes running on HDFS 5 | at the moment. 6 | 7 | ## HDFS 8 | 9 | We assume that you are familiar with @{$reading_data$reading data}. 10 | 11 | To use HDFS with TensorFlow, change the file paths you use to read and write 12 | data to an HDFS path. For example: 13 | 14 | ```python 15 | filename_queue = tf.train.string_input_producer([ 16 | "hdfs://namenode:8020/path/to/file1.csv", 17 | "hdfs://namenode:8020/path/to/file2.csv", 18 | ]) 19 | ``` 20 | 21 | If you want to use the namenode specified in your HDFS configuration files, then 22 | change the file prefix to `hdfs://default/`. 23 | 24 | When launching your TensorFlow program, the following environment variables must 25 | be set: 26 | 27 | * **JAVA_HOME**: The location of your Java installation. 28 | * **HADOOP_HDFS_HOME**: The location of your HDFS installation. You can also 29 | set this environment variable by running: 30 | 31 | ```shell 32 | source ${HADOOP_HOME}/libexec/hadoop-config.sh 33 | ``` 34 | 35 | * **LD_LIBRARY_PATH**: To include the path to libjvm.so, and optionally the path 36 | to libhdfs.so if your Hadoop distribution does not install libhdfs.so in 37 | `$HADOOP_HDFS_HOME/lib/native`. On Linux: 38 | 39 | ```shell 40 | export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:${JAVA_HOME}/jre/lib/amd64/server 41 | ``` 42 | 43 | * **CLASSPATH**: The Hadoop jars must be added prior to running your 44 | TensorFlow program. The CLASSPATH set by 45 | `${HADOOP_HOME}/libexec/hadoop-config.sh` is insufficient. 
Globs must be 46 | expanded as described in the libhdfs documentation: 47 | 48 | ```shell 49 | CLASSPATH=$($HADOOP_HDFS_HOME}/bin/hadoop classpath --glob) python your_script.py 50 | ``` 51 | For older version of Hadoop/libhdfs (older than 2.6.0), you have to expand the 52 | classpath wildcard manually. For more details, see 53 | [HADOOP-10903](https://issues.apache.org/jira/browse/HADOOP-10903). 54 | 55 | If the Hadoop cluster is in secure mode, the following environment variable must 56 | be set: 57 | 58 | * **KERB_TICKET_CACHE_PATH**: The path of Kerberos ticket cache file. For example: 59 | 60 | ```shell 61 | export KERB_TICKET_CACHE_PATH=/tmp/krb5cc_10002 62 | ``` 63 | 64 | If you are running @{$distributed$Distributed TensorFlow}, then all 65 | workers must have the environment variables set and Hadoop installed. 66 | -------------------------------------------------------------------------------- /programmers_guide/index.md: -------------------------------------------------------------------------------- 1 | # Programmer's Guide 2 | 3 | The documents in this unit dive into the details of writing TensorFlow 4 | code. This section begins with the following guides, each of which 5 | explain a particular aspect of TensorFlow: 6 | 7 | * @{$variables$Variables: Creation, Initialization, Saving, and Loading}, 8 | which details the mechanics of TensorFlow Variables. 9 | * @{$dims_types$Tensor Ranks, Shapes, and Types}, which explains Tensor 10 | rank (the number of dimensions), shape (the size of each dimension), 11 | and datatypes. 12 | * @{$variable_scope$Sharing Variables}, which explains how to share and 13 | manage large sets of variables when building complex models. 14 | * @{$threading_and_queues$Threading and Queues}, which explains TensorFlow's 15 | rich queuing system. 16 | * @{$reading_data$Reading Data}, which documents three different mechanisms 17 | for getting data into a TensorFlow program. 18 | 19 | The following guide is helpful when training a complex model over multiple 20 | days: 21 | 22 | * @{$supervisor$Supervisor: Training Helper for Days-Long Trainings}, which 23 | explains how to gracefully handle system crashes during a lengthy training 24 | session. 25 | 26 | TensorFlow provides a debugger named `tfdbg`, which is documented in the 27 | following two guides: 28 | 29 | * @{$debugger$TensorFlow Debugger (tfdbg) Command-Line-Interface Tutorial: MNIST}, 30 | which walks you through the use of `tfdbg` within an application written 31 | in the low-level TensorFlow API. 32 | * @{$tfdbg-tflearn$How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn}, 33 | which demonstrates how to use `tfdbg` within the Estimators API. 34 | 35 | A `MetaGraph` consists of both a computational graph and its associated 36 | metadata. A `MetaGraph` contains the information required to continue 37 | training, perform evaluation, or run inference on a previously 38 | trained graph. The following guide details `MetaGraph` objects: 39 | 40 | * @{$meta_graph$Exporting and Importing a MetaGraph}. 41 | 42 | To learn about the TensorFlow versioning scheme, consult the following two 43 | guides: 44 | 45 | * @{$version_semantics$TensorFlow Version Semantics}, which explains 46 | TensorFlow's versioning nomenclature and compatibility rules. 47 | * @{$data_versions$TensorFlow Data Versioning: GraphDefs and Checkpoints}, 48 | which explains how TensorFlow adds versioning information to computational 49 | graphs and checkpoints in order to support compatibility across versions. 
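As a quick illustration of that version metadata, the following minimal sketch prints the `GraphDef` version information TensorFlow attaches to every graph (the exact numbers depend on the TensorFlow build you have installed):

```python
import tensorflow as tf

g = tf.Graph()
with g.as_default():
    tf.constant(1.0, name="c")

# Each graph carries GraphDef version metadata used for compatibility checks.
print(g.graph_def_versions)       # e.g. "producer: 21 ..." -- values vary by TF build
print(g.as_graph_def().versions)  # the same VersionDef field on the serialized GraphDef
```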
50 | 51 | We conclude this section with a FAQ about TensorFlow programming: 52 | 53 | * @{$faq$Frequently Asked Questions} 54 | -------------------------------------------------------------------------------- /programmers_guide/dims_types.md: -------------------------------------------------------------------------------- 1 | # Tensor Ranks, Shapes, and Types 2 | 3 | TensorFlow programs use a tensor data structure to represent all data. You can 4 | think of a TensorFlow tensor as an n-dimensional array or list. 5 | A tensor has a static type and dynamic dimensions. Only tensors may be passed 6 | between nodes in the computation graph. 7 | 8 | ## Rank 9 | 10 | In the TensorFlow system, tensors are described by a unit of dimensionality 11 | known as *rank*. Tensor rank is not the same as matrix rank. Tensor rank 12 | (sometimes referred to as *order* or *degree* or *n-dimension*) is the number 13 | of dimensions of the tensor. For example, the following tensor (defined as a 14 | Python list) has a rank of 2: 15 | 16 | t = [[1, 2, 3], [4, 5, 6], [7, 8, 9]] 17 | 18 | A rank two tensor is what we typically think of as a matrix, a rank one tensor 19 | is a vector. For a rank two tensor you can access any element with the syntax 20 | `t[i, j]`. For a rank three tensor you would need to address an element with 21 | `t[i, j, k]`. 22 | 23 | Rank | Math entity | Python example 24 | --- | --- | --- 25 | 0 | Scalar (magnitude only) | `s = 483` 26 | 1 | Vector (magnitude and direction) | `v = [1.1, 2.2, 3.3]` 27 | 2 | Matrix (table of numbers) | `m = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]` 28 | 3 | 3-Tensor (cube of numbers) | `t = [[[2], [4], [6]], [[8], [10], [12]], [[14], [16], [18]]]` 29 | n | n-Tensor (you get the idea) | `....` 30 | 31 | ## Shape 32 | 33 | The TensorFlow documentation uses three notational conventions to describe 34 | tensor dimensionality: rank, shape, and dimension number. The following table 35 | shows how these relate to one another: 36 | 37 | Rank | Shape | Dimension number | Example 38 | --- | --- | --- | --- 39 | 0 | [] | 0-D | A 0-D tensor. A scalar. 40 | 1 | [D0] | 1-D | A 1-D tensor with shape [5]. 41 | 2 | [D0, D1] | 2-D | A 2-D tensor with shape [3, 4]. 42 | 3 | [D0, D1, D2] | 3-D | A 3-D tensor with shape [1, 4, 3]. 43 | n | [D0, D1, ... Dn-1] | n-D | A tensor with shape [D0, D1, ... Dn-1]. 44 | 45 | Shapes can be represented via Python lists / tuples of ints, or with the 46 | @{tf.TensorShape}. 47 | 48 | ## Data types 49 | 50 | In addition to dimensionality, Tensors have a data type. You can assign any one 51 | of the following data types to a tensor: 52 | 53 | Data type | Python type | Description 54 | --- | --- | --- 55 | `DT_FLOAT` | `tf.float32` | 32 bits floating point. 56 | `DT_DOUBLE` | `tf.float64` | 64 bits floating point. 57 | `DT_INT8` | `tf.int8` | 8 bits signed integer. 58 | `DT_INT16` | `tf.int16` | 16 bits signed integer. 59 | `DT_INT32` | `tf.int32` | 32 bits signed integer. 60 | `DT_INT64` | `tf.int64` | 64 bits signed integer. 61 | `DT_UINT8` | `tf.uint8` | 8 bits unsigned integer. 62 | `DT_UINT16` | `tf.uint16` | 16 bits unsigned integer. 63 | `DT_STRING` | `tf.string` | Variable length byte arrays. Each element of a Tensor is a byte array. 64 | `DT_BOOL` | `tf.bool` | Boolean. 65 | `DT_COMPLEX64` | `tf.complex64` | Complex number made of two 32 bits floating points: real and imaginary parts. 66 | `DT_COMPLEX128` | `tf.complex128` | Complex number made of two 64 bits floating points: real and imaginary parts. 
67 | `DT_QINT8` | `tf.qint8` | 8 bits signed integer used in quantized Ops. 68 | `DT_QINT32` | `tf.qint32` | 32 bits signed integer used in quantized Ops. 69 | `DT_QUINT8` | `tf.quint8` | 8 bits unsigned integer used in quantized Ops. 70 | -------------------------------------------------------------------------------- /tutorials/mandelbrot.md: -------------------------------------------------------------------------------- 1 | # Mandelbrot Set 2 | 3 | Visualizing the [Mandelbrot set](https://en.wikipedia.org/wiki/Mandelbrot_set) 4 | doesn't have anything to do with machine learning, but it makes for a fun 5 | example of how one can use TensorFlow for general mathematics. This is 6 | actually a pretty naive implementation of the visualization, but it makes the 7 | point. (We may end up providing a more elaborate implementation down the line 8 | to produce more truly beautiful images.) 9 | 10 | Note: This tutorial was originally prepared as an IPython notebook. 11 | 12 | ## Basic Setup 13 | 14 | We'll need a few imports to get started. 15 | 16 | ```python 17 | # Import libraries for simulation 18 | import tensorflow as tf 19 | import numpy as np 20 | 21 | # Imports for visualization 22 | import PIL.Image 23 | from io import BytesIO 24 | from IPython.display import Image, display 25 | ``` 26 | 27 | Now we'll define a function to actually display the image once we have 28 | iteration counts. 29 | 30 | ```python 31 | def DisplayFractal(a, fmt='jpeg'): 32 | """Display an array of iteration counts as a 33 | colorful picture of a fractal.""" 34 | a_cyclic = (6.28*a/20.0).reshape(list(a.shape)+[1]) 35 | img = np.concatenate([10+20*np.cos(a_cyclic), 36 | 30+50*np.sin(a_cyclic), 37 | 155-80*np.cos(a_cyclic)], 2) 38 | img[a==a.max()] = 0 39 | a = img 40 | a = np.uint8(np.clip(a, 0, 255)) 41 | f = BytesIO() 42 | PIL.Image.fromarray(a).save(f, fmt) 43 | display(Image(data=f.getvalue())) 44 | ``` 45 | 46 | ## Session and Variable Initialization 47 | 48 | For playing around like this, we often use an interactive session, but a regular 49 | session would work as well. 50 | 51 | ```python 52 | sess = tf.InteractiveSession() 53 | ``` 54 | 55 | It's handy that we can freely mix NumPy and TensorFlow. 56 | 57 | ```python 58 | # Use NumPy to create a 2D array of complex numbers 59 | 60 | Y, X = np.mgrid[-1.3:1.3:0.005, -2:1:0.005] 61 | Z = X+1j*Y 62 | ``` 63 | 64 | Now we define and initialize TensorFlow tensors. 65 | 66 | ```python 67 | xs = tf.constant(Z.astype(np.complex64)) 68 | zs = tf.Variable(xs) 69 | ns = tf.Variable(tf.zeros_like(xs, tf.float32)) 70 | ``` 71 | 72 | TensorFlow requires that you explicitly initialize variables before using them. 73 | 74 | ```python 75 | tf.global_variables_initializer().run() 76 | ``` 77 | 78 | ## Defining and Running the Computation 79 | 80 | Now we specify more of the computation... 81 | 82 | ```python 83 | # Compute the new values of z: z^2 + x 84 | zs_ = zs*zs + xs 85 | 86 | # Have we diverged with this new value? 87 | not_diverged = tf.abs(zs_) < 4 88 | 89 | # Operation to update the zs and the iteration count. 90 | # 91 | # Note: We keep computing zs after they diverge! This 92 | # is very wasteful! There are better, if a little 93 | # less simple, ways to do this. 94 | # 95 | step = tf.group( 96 | zs.assign(zs_), 97 | ns.assign_add(tf.cast(not_diverged, tf.float32)) 98 | ) 99 | ``` 100 | 101 | ... and run it for a couple hundred steps 102 | 103 | ```python 104 | for i in range(200): step.run() 105 | ``` 106 | 107 | Let's see what we've got. 
108 | 109 | ```python 110 | DisplayFractal(ns.eval()) 111 | ``` 112 | 113 | ![jpeg](../images/mandelbrot_output.jpg) 114 | 115 | Not bad! 116 | 117 | 118 | -------------------------------------------------------------------------------- /tutorials/pdes.md: -------------------------------------------------------------------------------- 1 | # Partial Differential Equations 2 | 3 | TensorFlow isn't just for machine learning. Here we give a (somewhat 4 | pedestrian) example of using TensorFlow for simulating the behavior of a 5 | [partial differential equation]( 6 | https://en.wikipedia.org/wiki/Partial_differential_equation). 7 | We'll simulate the surface of square pond as a few raindrops land on it. 8 | 9 | Note: This tutorial was originally prepared as an IPython notebook. 10 | 11 | ## Basic Setup 12 | 13 | A few imports we'll need. 14 | 15 | ```python 16 | #Import libraries for simulation 17 | import tensorflow as tf 18 | import numpy as np 19 | 20 | #Imports for visualization 21 | import PIL.Image 22 | from io import BytesIO 23 | from IPython.display import clear_output, Image, display 24 | ``` 25 | 26 | A function for displaying the state of the pond's surface as an image. 27 | 28 | ```python 29 | def DisplayArray(a, fmt='jpeg', rng=[0,1]): 30 | """Display an array as a picture.""" 31 | a = (a - rng[0])/float(rng[1] - rng[0])*255 32 | a = np.uint8(np.clip(a, 0, 255)) 33 | f = BytesIO() 34 | PIL.Image.fromarray(a).save(f, fmt) 35 | clear_output(wait = True) 36 | display(Image(data=f.getvalue())) 37 | ``` 38 | 39 | Here we start an interactive TensorFlow session for convenience in playing 40 | around. A regular session would work as well if we were doing this in an 41 | executable .py file. 42 | 43 | ```python 44 | sess = tf.InteractiveSession() 45 | ``` 46 | 47 | ## Computational Convenience Functions 48 | 49 | 50 | ```python 51 | def make_kernel(a): 52 | """Transform a 2D array into a convolution kernel""" 53 | a = np.asarray(a) 54 | a = a.reshape(list(a.shape) + [1,1]) 55 | return tf.constant(a, dtype=1) 56 | 57 | def simple_conv(x, k): 58 | """A simplified 2D convolution operation""" 59 | x = tf.expand_dims(tf.expand_dims(x, 0), -1) 60 | y = tf.nn.depthwise_conv2d(x, k, [1, 1, 1, 1], padding='SAME') 61 | return y[0, :, :, 0] 62 | 63 | def laplace(x): 64 | """Compute the 2D laplacian of an array""" 65 | laplace_k = make_kernel([[0.5, 1.0, 0.5], 66 | [1.0, -6., 1.0], 67 | [0.5, 1.0, 0.5]]) 68 | return simple_conv(x, laplace_k) 69 | ``` 70 | 71 | ## Define the PDE 72 | 73 | Our pond is a perfect 500 x 500 square, as is the case for most ponds found in 74 | nature. 75 | 76 | ```python 77 | N = 500 78 | ``` 79 | 80 | Here we create our pond and hit it with some rain drops. 81 | 82 | ```python 83 | # Initial Conditions -- some rain drops hit a pond 84 | 85 | # Set everything to zero 86 | u_init = np.zeros([N, N], dtype=np.float32) 87 | ut_init = np.zeros([N, N], dtype=np.float32) 88 | 89 | # Some rain drops hit a pond at random points 90 | for n in range(40): 91 | a,b = np.random.randint(0, N, 2) 92 | u_init[a,b] = np.random.uniform() 93 | 94 | DisplayArray(u_init, rng=[-0.1, 0.1]) 95 | ``` 96 | 97 | ![jpeg](../images/pde_output_1.jpg) 98 | 99 | 100 | Now let's specify the details of the differential equation. 
101 | 102 | 103 | ```python 104 | # Parameters: 105 | # eps -- time resolution 106 | # damping -- wave damping 107 | eps = tf.placeholder(tf.float32, shape=()) 108 | damping = tf.placeholder(tf.float32, shape=()) 109 | 110 | # Create variables for simulation state 111 | U = tf.Variable(u_init) 112 | Ut = tf.Variable(ut_init) 113 | 114 | # Discretized PDE update rules 115 | U_ = U + eps * Ut 116 | Ut_ = Ut + eps * (laplace(U) - damping * Ut) 117 | 118 | # Operation to update the state 119 | step = tf.group( 120 | U.assign(U_), 121 | Ut.assign(Ut_)) 122 | ``` 123 | 124 | ## Run The Simulation 125 | 126 | This is where it gets fun -- running time forward with a simple for loop. 127 | 128 | ```python 129 | # Initialize state to initial conditions 130 | tf.global_variables_initializer().run() 131 | 132 | # Run 1000 steps of PDE 133 | for i in range(1000): 134 | # Step simulation 135 | step.run({eps: 0.03, damping: 0.04}) 136 | DisplayArray(U.eval(), rng=[-0.1, 0.1]) 137 | ``` 138 | 139 | ![jpeg](../images/pde_output_2.jpg) 140 | 141 | Look! Ripples! 142 | 143 | -------------------------------------------------------------------------------- /install/install_c.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for C 2 | 3 | TensorFlow provides a C API defined in 4 | [`c_api.h`](https://github.com/tensorflow/tensorflow/tree/master/c/c_api.h), 5 | which is suitable for 6 | [building bindings for other languages](https://www.tensorflow.org/extend/language_bindings). 7 | The API leans towards simplicity and uniformity rather than convenience. 8 | 9 | 10 | ## Supported Platforms 11 | 12 | You may install TensorFlow for C on the following operating systems: 13 | 14 | * Linux 15 | * Mac OS X 16 | 17 | 18 | ## Installation 19 | 20 | Take the following steps to install the TensorFlow for C library and 21 | enable TensorFlow for C: 22 | 23 | 1. Decide whether you will run TensorFlow for C on CPU(s) only or 24 | with the help of GPU(s). To help you decide, read the section 25 | entitled "Determine which TensorFlow to install" in one of the 26 | following guides: 27 | 28 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 29 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 30 | 31 | 2. Download and extract the TensorFlow C library into `/usr/local/lib` by 32 | invoking the following shell commands: 33 | 34 | TF_TYPE="cpu" # Change to "gpu" for GPU support 35 | OS="linux" # Change to "darwin" for Mac OS 36 | TARGET_DIRECTORY="/usr/local" 37 | curl -L \ 38 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-${OS}-x86_64-1.1.0.tar.gz" | 39 | sudo tar -C $TARGET_DIRECTORY -xz 40 | 41 | The `tar` command extracts the TensorFlow C library into the `lib` 42 | subdirectory of `TARGET_DIRECTORY`. For example, specifying `/usr/local` 43 | as `TARGET_DIRECTORY` causes `tar` to extract the TensorFlow C library 44 | into `/usr/local/lib`. 45 | 46 | If you'd prefer to extract the library into a different directory, 47 | adjust `TARGET_DIRECTORY` accordingly. 48 | 49 | 3. In Step 2, if you specified a system directory (for example, `/usr/local`) 50 | as the `TARGET_DIRECTORY`, then run `ldconfig` to configure the linker. 51 | For example: 52 | 53 |
sudo ldconfig
54 | 55 | If you assigned a `TARGET_DIRECTORY` other than a system 56 | directory (for example, `~/mydir`), then you must append the extraction 57 | directory (for example, `~/mydir/lib`) to two environment variables. 58 | For example: 59 | 60 |
 export LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib # For both Linux and Mac OS X
 61 |      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib # For Linux only
 62 |      export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib # For Mac OS X only
63 | 64 | 65 | 66 | ## Validate your installation 67 | 68 | After installing TensorFlow for C, enter the following code into a file named 69 | `hello_tf.c`: 70 | 71 | ```c 72 | #include <stdio.h> 73 | #include <tensorflow/c/c_api.h> 74 | 75 | int main() { 76 | printf("Hello from TensorFlow C library version %s\n", TF_Version()); 77 | return 0; 78 | } 79 | ``` 80 | 81 | ### Build and Run 82 | 83 | Build `hello_tf.c` by invoking the following command: 84 | 85 | 86 |
gcc hello_tf.c
87 | 88 | 89 | Running the resulting executable should output the following message: 90 | 91 | 92 |
a.out
 93 | Hello from TensorFlow C library version number
94 | 95 | 96 | ### Troubleshooting 97 | 98 | If building the program fails, the most likely culprit is that `gcc` cannot 99 | find the TensorFlow C library. One way to fix this problem is to specify 100 | the `-I` and `-L` options to `gcc`. For example, if the `TARGET_LIBRARY` 101 | was `/usr/local`, you would invoke `gcc` as follows: 102 | 103 |
gcc -I/usr/local/include -L/usr/local/lib hello_tf.c -ltensorflow
104 | 105 | If executing `a.out` fails, ask yourself the following questions: 106 | 107 | * Did the program build without error? 108 | * Have you assigned the correct directory to the environment variables 109 | noted in Step 3 of [Installation](#installation)? 110 | * Did you export those environment variables? 111 | 112 | If you are still seeing build or execution error messages, search (or post to) 113 | [StackOverflow](www.stackoverflow.com/questions/tagged/tensorflow) for 114 | possible solutions. 115 | 116 | -------------------------------------------------------------------------------- /performance/xla/developing_new_backend.md: -------------------------------------------------------------------------------- 1 | # Developing a new backend for XLA 2 | 3 | This preliminary guide is for early adopters that want to easily retarget 4 | TensorFlow to their hardware in an efficient manner. The guide is not 5 | step-by-step and assumes knowledge of [LLVM](http://llvm.org), 6 | [Bazel](https://bazel.build/), and TensorFlow. 7 | 8 | XLA provides an abstract interface that a new architecture or accelerator can 9 | implement to create a backend to run TensorFlow graphs. Retargeting XLA should 10 | be significantly simpler and scalable than implementing every existing 11 | TensorFlow Op for new hardware. 12 | 13 | Most implementations will fall into one of the following scenarios: 14 | 15 | 1. Existing CPU architecture not yet officially supported by XLA, with or 16 | without an existing [LLVM](http://llvm.org) backend. 17 | 2. Non-CPU-like hardware with an existing LLVM backend. 18 | 3. Non-CPU-like hardware without an existing LLVM backend. 19 | 20 | > Note: An LLVM backend can mean either one of the officially released LLVM 21 | > backends or a custom LLVM backend developed in-house. 22 | 23 | ## Scenario 1: Existing CPU architecture not yet officially supported by XLA 24 | 25 | In this scenario, start by looking at the existing [XLA CPU backend] 26 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/). 27 | XLA makes it easy to retarget TensorFlow to different CPUs by using LLVM, since 28 | the main difference between XLA backends for CPUs is the code generated by LLVM. 29 | Google tests XLA for x64 and ARM64 architectures. 30 | 31 | If the hardware vendor has an LLVM backend for their hardware, it is simple to 32 | link the backend with the LLVM built with XLA. In JIT mode, the XLA CPU backend 33 | emits code for the host CPU. For ahead-of-time compilation, 34 | [`xla::AotCompilationOptions`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h) 35 | can provide an LLVM triple to configure the target architecture. 36 | 37 | If there is no existing LLVM backend but another kind of code generator exists, 38 | it should be possible to reuse most of the existing CPU backend. 39 | 40 | ## Scenario 2: Non-CPU-like hardware with an existing LLVM backend 41 | 42 | It is possible to model a new 43 | [`xla::Compiler`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h) 44 | implementation on the existing [`xla::CPUCompiler`] 45 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/cpu/cpu_compiler.cc) 46 | and [`xla::GPUCompiler`] 47 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/gpu_compiler.cc) 48 | classes, since these already emit LLVM IR. 
Depending on the nature of the 49 | hardware, it is possible that many of the LLVM IR generation aspects will have 50 | to be changed, but a lot of code can be shared with the existing backends. 51 | 52 | A good example to follow is the [GPU backend] 53 | (https://www.tensorflow.org/code/tensorflow/compiler/xla/service/gpu/) 54 | of XLA. The GPU backend targets a non-CPU-like ISA, and therefore some aspects 55 | of its code generation are unique to the GPU domain. Other kinds of hardware, 56 | e.g. DSPs like Hexagon (which has an upstream LLVM backend), can reuse parts of 57 | the LLVM IR emission logic, but other parts will be unique. 58 | 59 | ## Scenario 3: Non-CPU-like hardware without an existing LLVM backend 60 | 61 | If it is not possible to utilize LLVM, then the best option is to implement a 62 | new backend for XLA for the desired hardware. This option requires the most 63 | effort. The classes that need to be implemented are as follows: 64 | 65 | * [StreamExecutor](https://www.tensorflow.org/code/tensorflow/stream_executor/stream_executor.h): 66 | For many devices not all methods of `StreamExecutor` are needed. See 67 | existing `StreamExecutor` implementations for details. 68 | * [xla::Compiler](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/compiler.h): 69 | This class encapsulates the compilation of a HLO computation into an 70 | `xla::Executable`. 71 | * [`xla::Executable`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/executable.h): 72 | This class is used to launch a compiled computation on the platform. 73 | * [`xla::TransferManager`](https://www.tensorflow.org/code/tensorflow/compiler/xla/service/transfer_manager.h): 74 | This class enables backends to provide platform-specific mechanisms for 75 | constructing XLA literal data from given device memory handles. In other 76 | words, it helps encapsulate the transfer of data from the host to the device 77 | and back. 78 | -------------------------------------------------------------------------------- /performance/xla/index.md: -------------------------------------------------------------------------------- 1 | # XLA Overview 2 | 3 | > Note: XLA is experimental and considered alpha. Most use cases will not 4 | > see improvements in performance (speed or decreased memory usage). We have 5 | > released XLA early so the Open Source Community can contribute to its 6 | > development, as well as create a path for integration with hardware 7 | > accelerators. 8 | 9 | XLA (Accelerated Linear Algebra) is a domain-specific compiler for linear 10 | algebra that optimizes TensorFlow computations. The results are improvements in 11 | speed, memory usage, and portability on server and mobile platforms. Initially, 12 | most users will not see large benefits from XLA, but are welcome to experiment 13 | by using XLA via @{$jit$just-in-time (JIT) compilation} or @{$tfcompile$ahead-of-time (AOT) compilation}. Developers targeting new hardware accelerators are 14 | especially encouraged to try out XLA. 15 | 16 | The XLA framework is experimental and in active development. In particular, 17 | while it is unlikely that the semantics of existing operations will change, it 18 | is expected that more operations will be added to cover important use cases. The 19 | team welcomes feedback from the community about missing functionality and 20 | community contributions via GitHub. 21 | 22 | ## Why did we build XLA? 
23 | 24 | We had several objectives for XLA to work with TensorFlow: 25 | 26 | * *Improve execution speed.* Compile subgraphs to reduce the execution time of 27 | short-lived Ops to eliminate overhead from the TensorFlow runtime, fuse 28 | pipelined operations to reduce memory overhead, and specialize to known 29 | tensor shapes to allow for more aggressive constant propagation. 30 | 31 | * *Improve memory usage.* Analyze and schedule memory usage, in principle 32 | eliminating many intermediate storage buffers. 33 | 34 | * *Reduce reliance on custom Ops.* Remove the need for many custom Ops by 35 | improving the performance of automatically fused low-level Ops to match the 36 | performance of custom Ops that were fused by hand. 37 | 38 | * *Reduce mobile footprint.* Eliminate the TensorFlow runtime by ahead-of-time 39 | compiling the subgraph and emitting an object/header file pair that can be 40 | linked directly into another application. The results can reduce the 41 | footprint for mobile inference by several orders of magnitude. 42 | 43 | * *Improve portability.* Make it relatively easy to write a new backend for 44 | novel hardware, at which point a large fraction of TensorFlow programs will 45 | run unmodified on that hardware. This is in contrast with the approach of 46 | specializing individual monolithic Ops for new hardware, which requires 47 | TensorFlow programs to be rewritten to make use of those Ops. 48 | 49 | ## How does XLA work? 50 | 51 | The input language to XLA is called "HLO IR", or just HLO (High Level 52 | Optimizer). The semantics of HLO are described on the 53 | @{$operation_semantics$Operation Semantics} page. It 54 | is most convenient to think of HLO as a [compiler 55 | IR](https://en.wikipedia.org/wiki/Intermediate_representation). 56 | 57 | XLA takes graphs ("computations") defined in HLO and compiles them into machine 58 | instructions for various architectures. XLA is modular in the sense that it is 59 | easy to slot in an alternative backend to @{$developing_new_backend$target some novel HW architecture}. The CPU backend for x64 and ARM64 as 60 | well as the NVIDIA GPU backend are in the TensorFlow source tree. 61 | 62 | The following diagram shows the compilation process in XLA: 63 | 64 |
*[Figure: the XLA compilation process]*
67 | 68 | XLA comes with several optimizations and analyses that are target-independent, 69 | such as [CSE](https://en.wikipedia.org/wiki/Common_subexpression_elimination), 70 | target-independent operation fusion, and buffer analysis for allocating runtime 71 | memory for the computation. 72 | 73 | After the target-independent step, XLA sends the HLO computation to a backend. 74 | The backend can perform further HLO-level analyses and optimizations, this time 75 | with target specific information and needs in mind. For example, the XLA GPU 76 | backend may perform operation fusion beneficial specifically for the GPU 77 | programming model and determine how to partition the computation into streams. 78 | At this stage, backends may also pattern-match certain operations or 79 | combinations thereof to optimized library calls. 80 | 81 | The next step is target-specific code generation. The CPU and GPU backends 82 | included with XLA use [LLVM](http://llvm.org) for low-level IR, optimization, 83 | and code-generation. These backends emit the LLVM IR necessary to represent the 84 | XLA HLO computation in an efficient manner, and then invoke LLVM to emit native 85 | code from this LLVM IR. 86 | 87 | The GPU backend currently supports NVIDIA GPUs via the LLVM NVPTX backend; the 88 | CPU backend supports multiple CPU ISAs. 89 | 90 | ## Supported Platforms 91 | 92 | XLA currently supports @{$jit$JIT compilation} on x86-64 and NVIDIA GPUs; and 93 | @{$tfcompile$AOT compilation} for x86-64 and ARM. 94 | -------------------------------------------------------------------------------- /install/install_windows.md: -------------------------------------------------------------------------------- 1 | # 在 Windows 上安装 TensorFlow 2 | 3 | 这篇指南描述了如何在 Windows 上安装 TensorFlow。 4 | 5 | ## 确定 TensorFlow 版本 6 | 7 | 如下之中选择一种来安装: 8 | 9 | * **只支持 CPU 的 TensorFlow**。如果你的系统不支持 NVIDIA® GPU, 你必须安装这个版本。这个版本的 TensorFlow 通常安装起来比较简单(一般 5 到 10分钟),所以即使你拥有 NVIDIA GPU,我们也推荐首先安装这个版本。 10 | * **支持 GPU 的 TensorFlow**. TensorFlow 在 GPU 上通常比在 CPU 上的执行的更快。所以如果你有符合如下要求的 NVIDIA® GPU 并且需要注重性能,可以随后安装这个版本。 11 | 12 | ### GPU support TensorFlow 的 NVIDIA 需求 13 | 14 | 需要事先安装如下软件: 15 | 16 | * CUDA® Toolkit 8.0。详见 [NVIDIA's documentation](http://docs.nvidia.com/cuda/cuda-installation-guide-microsoft-windows/)。确保按照文档中描述的将 Cuda 相关路径加入到 `%PATH%` 环境变量中。 17 | * CUDA Toolkit 8.0 相关的 NVIDIA 驱动。 18 | * cuDNN v5.1。详见 [NVIDIA's documentation](https://developer.nvidia.com/cudnn)。注意:cuDNN 通常与其他 CUDA DLLs 安装的位置不同。确保将 cuDNN 库的安装目录加入到了`%PATH%`中。 19 | * CUDA Compute Capability 3.0 或更高的 GPU 芯片。支持的 GPU 芯片详见 [NVIDIA documentation](https://developer.nvidia.com/cuda-gpus) 。 20 | 21 | 如果上述软件版本较老,请将其升级到指定版本。 22 | 23 | 24 | ## 确定如何安装 TensorFlow 25 | 26 | 有如下选择: 27 | 28 | * "native" pip 29 | * Anaconda 30 | 31 | 原生 pip 直接在系统中安装 TensorFlow,而不使用虚拟环境。 32 | 因为原生 pip 安装没有使用独立的容器隔离开,所以可能干扰其他基于Python的安装。 33 | 不过,如果你理解 pip 和 Python 环境,原生 pip 安装通常只需要一个命令! 34 | 如果使用原生 pip 安装,用户可在任何目录中执行 TensorFlow 程序。 35 | 36 | 在 Anaconda 中,你可以通过 conda 创建一个虚拟环境。 37 | 然而,我们推荐使用 `pip install` 安装 TensorFlow,而非`conda install`。 38 | 39 | **注意:**conda 包是社区支持而非官方支持。也就是说 TensorFlow 团队没有测试也没有管理过 conda 包。 40 | 使用这个包需要自行承担风险。 41 | 42 | 43 | ## 原生 pip 安装 44 | 45 | 如果如下版本的 Python 没有安装,先安装: 46 | 47 | * [Python 3.5.x from python.org](https://www.python.org/downloads/release/python-352/) 48 | 49 | TensorFlow 在 Windows 上支持 Python 3.5.x。 50 | 注意 Python 3.5.x 使用 pip3,我们用 pip3 来安装 TensorFlow。 51 | 52 | 在 terminal 中输入如下命令安装只支持 CPU 的 TensorFlow: 53 | 54 |
C:\> pip3 install --upgrade tensorflow
55 | 56 | 安装支持 GPU 的 TensorFlow,使用如下命令: 57 | 58 |
C:\> pip3 install --upgrade tensorflow-gpu
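安装完成后,如果想确认装上的是否为 GPU 版本,可以在 Python 中做一个简单检查(仅作示例;`tf.test.is_built_with_cuda()` 只说明安装包本身是否带 CUDA 支持,能否真正使用 GPU 还取决于驱动和 CUDA/cuDNN 的配置):

```python
>>> import tensorflow as tf
>>> print(tf.test.is_built_with_cuda())  # GPU 版本应输出 True,CPU 版本输出 False
```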
59 | 60 | 61 | ## Anaconda 安装 62 | 63 | **Anaconda 安装是社区支持,而非官方支持** 64 | 65 | 按照如下步骤在 Anaconda 环境中安装 TensorFlow: 66 | 67 | 1. 按说明下载并安装 Anaconda: 68 | [Anaconda download site](https://www.continuum.io/downloads) 69 | 70 | 2. 建立一个 conda 环境,命名为 tensorflow,以便运行某个 Python 版本: 71 | 72 |
C:\> conda create -n tensorflow 
73 | 74 | 3. 激活 anaconda 环境: 75 | 76 |
C:\> activate tensorflow
 77 |      (tensorflow)C:\>  # 你的提示符应该发生变化 
78 | 79 | 4. 在你的 conda 环境中安装只支持 CPU 的 TensorFlow(写在一行): 80 | 81 |
(tensorflow)C:\> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/cpu/tensorflow-1.1.0-cp35-cp35m-win_amd64.whl 
82 | 83 | 安装支持 GPU 的 TensorFlow(写在一行): 84 | 85 |
(tensorflow)C:\> pip install --ignore-installed --upgrade https://storage.googleapis.com/tensorflow/windows/gpu/tensorflow_gpu-1.1.0-cp35-cp35m-win_amd64.whl 
86 | 87 | ## 验证安装结果 88 | 89 | 启动 terminal。 90 | 91 | 如果通过 Anaconda 安装,激活 Anaconda 环境。 92 | 93 | 启动 Python: 94 | 95 |
$ python
96 | 97 | 在 Python 交互式环境中输入 98 | 99 | ```python 100 | >>> import tensorflow as tf 101 | >>> hello = tf.constant('Hello, TensorFlow!') 102 | >>> sess = tf.Session() 103 | >>> print(sess.run(hello)) 104 | ``` 105 | 106 | 如果系统输出如下,则安装成功: 107 | 108 |
Hello, TensorFlow!
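如果还想确认实际安装的 TensorFlow 版本,可以在同一个 Python 交互式环境中继续输入(输出以实际安装的版本为准,例如 1.1.0):

```python
>>> print(tf.__version__)  # 打印已安装的 TensorFlow 版本号
```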
109 | 110 | 如果你新接触 TensorFlow,参考[初识 TensorFlow](../get_started)进行下一步学习。 111 | 112 | 如果系统输出错误信息而非欢迎信息,查看[常见安装问题](#common_installation_problems)。 113 | 114 | ## 常见安装问题 115 | 116 | 我们依靠 Stack Overflow 来编写 TensorFlow 安装问题及解决方案的文档。 117 | 如下表格包含了 Stack Overflow 上比较常见的安装问题的连接。 118 | 如果你遇到了不在列表中的新的错误信息或者其他安装问题,请在 Stack Overflow 上搜索。 119 | 如果搜索不到,请在 Stack Overflow 上提出一个新的问题,并打上 `tensorflow` 的标签。 120 | 121 | 122 | 123 | 124 | 125 | 126 | 129 | 130 | 131 | 132 | 133 | 136 | 137 | 138 | 139 | 140 | 145 | 146 | 147 | 148 | 149 | 150 | 151 | 152 | 153 | 154 | 157 | 158 | 159 | 160 | 161 | 164 | 165 | 166 |
| Stack Overflow Link | Error Message |
| --- | --- |
| [41007279](https://stackoverflow.com/q/41007279) | `[...\stream_executor\dso_loader.cc] Couldn't open CUDA library nvcuda.dll` |
| [41007279](https://stackoverflow.com/q/41007279) | `[...\stream_executor\cuda\cuda_dnn.cc] Unable to load cuDNN DSO` |
| [42006320](https://stackoverflow.com/q/42006320) | `ImportError: Traceback (most recent call last):`<br>`File "...\tensorflow\core\framework\graph_pb2.py", line 6, in <module>`<br>`from google.protobuf import descriptor as _descriptor`<br>`ImportError: cannot import name 'descriptor'` |
| [42011070](https://stackoverflow.com/q/42011070) | `No module named "pywrap_tensorflow"` |
| [42217532](https://stackoverflow.com/q/42217532) | `OpKernel ('op: "BestSplits" device_type: "CPU"') for unknown op: BestSplits` |
| [43134753](https://stackoverflow.com/q/43134753) | `The TensorFlow library wasn't compiled to use SSE instructions` |
167 | 168 | -------------------------------------------------------------------------------- /programmers_guide/tfdbg-tflearn.md: -------------------------------------------------------------------------------- 1 | # How to Use TensorFlow Debugger (tfdbg) with tf.contrib.learn 2 | 3 | [TOC] 4 | 5 | In @{$debugger$a previous tutorial}, we described how to use TensorFlow Debugger (**tfdbg**) 6 | to debug TensorFlow graphs running in 7 | @{tf.Session} 8 | objects managed by yourself. However, many users find 9 | @{$tflearn$`tf.contrib.learn`} 10 | @{tf.contrib.learn.Estimator$Estimator}s 11 | to be a convenient higher-level API for creating and using models 12 | in TensorFlow. Part of the convenience is that `Estimator`s manage `Session`s 13 | internally. Fortunately, you can still use `tfdbg` with `Estimator`s by adding 14 | special hooks. 15 | 16 | ## Debugging tf.contrib.learn Estimators 17 | 18 | Currently, **tfdbg** can debug the 19 | @{tf.contrib.learn.BaseEstimator.fit$`fit()`} 20 | @{tf.contrib.learn.BaseEstimator.evaluate$`evaluate()`} 21 | methods of tf-learn `Estimator`s. To debug `Estimator.fit()`, 22 | create a `LocalCLIDebugHook` and supply it as the `monitors` argument. For example: 23 | 24 | ```python 25 | # First, let your BUILD target depend on "//tensorflow/python/debug:debug_py" 26 | # (You don't need to worry about the BUILD dependency if you are using a pip 27 | # install of open-source TensorFlow.) 28 | from tensorflow.python import debug as tf_debug 29 | 30 | hooks = [tf_debug.LocalCLIDebugHook()] 31 | 32 | # Create a local CLI debug hook and use it as a monitor when calling fit(). 33 | classifier.fit(x=training_set.data, 34 | y=training_set.target, 35 | steps=1000, 36 | monitors=hooks) 37 | ``` 38 | 39 | To debug `Estimator.evaluate()`, you can follow the example below: 40 | 41 | ```python 42 | accuracy_score = classifier.evaluate(x=test_set.data, 43 | y=test_set.target, 44 | hooks=hooks)["accuracy"] 45 | ``` 46 | 47 | 48 | For a detailed [example](https://www.tensorflow.org/code/tensorflow/python/debug/examples/debug_tflearn_iris.py) based on 49 | @{$tflearn$tf-learn's iris tutorial}, 50 | run: 51 | 52 | ```none 53 | python -m tensorflow.python.debug.examples.debug_tflearn_iris --debug 54 | ``` 55 | 56 | ## Debugging tf.contrib.learn Experiments 57 | 58 | `Experiment` is a construct in `tf.contrib.learn` at a higher level than 59 | `Estimator`. 60 | It provides a single interface for training and evaluating a model. To debug 61 | the `train()` and `evaluate()` calls to an `Experiment` object, you can 62 | use the keyword arguments `train_monitors` and `eval_hooks`, respectively, when 63 | calling its constructor. For example: 64 | 65 | ```python 66 | # First, let your BUILD target depend on "//tensorflow/python/debug:debug_py" 67 | # (You don't need to worry about the BUILD dependency if you are using a pip 68 | # install of open-source TensorFlow.) 
69 | from tensorflow.python import debug as tf_debug 70 | 71 | hooks = [tf_debug.LocalCLIDebugHook()] 72 | 73 | ex = experiment.Experiment(classifier, 74 | train_input_fn=iris_input_fn, 75 | eval_input_fn=iris_input_fn, 76 | train_steps=FLAGS.train_steps, 77 | eval_delay_secs=0, 78 | eval_steps=1, 79 | train_monitors=hooks, 80 | eval_hooks=hooks) 81 | 82 | ex.train() 83 | accuracy_score = ex.evaluate()["accuracy"] 84 | ``` 85 | 86 | To see the `debug_tflearn_iris` example run in the `Experiment` mode, do: 87 | 88 | ```none 89 | python -m tensorflow.python.debug.examples.debug_tflearn_iris \ 90 | --use_experiment --debug 91 | ``` 92 | 93 | ## Debugging Estimators and Experiments without Terminal Access 94 | 95 | If your `Estimator` or `Experiment` is running in an environment to which you 96 | do not have command-line access (e.g., a remote server), you can use the 97 | non-interactive `DumpingDebugHook`. For example: 98 | 99 | ```python 100 | # Let your BUILD target depend on "//tensorflow/python/debug:debug_py 101 | # (You don't need to worry about the BUILD dependency if you are using a pip 102 | # install of open-source TensorFlow.) 103 | from tensorflow.python import debug as tf_debug 104 | 105 | hooks = [tf_debug.DumpingDebugHook("/shared/storage/location/tfdbg_dumps_1")] 106 | ``` 107 | 108 | Then this `hook` can be used in the same way as the `LocalCLIDebugHook` examples 109 | above. As the training and/or evalution of `Estimator` or `Experiment` 110 | happens, directories of the naming pattern 111 | `/shared/storage/location/tfdbg_dumps_1/run__` 112 | will appear. Each directory corresponds to a `Session.run()` call that underlies 113 | the `fit()` or `evaluate()` call. You can load these directories and inspect 114 | them in a command-line interface in an offline manner using the 115 | `offline_analyzer` offered by **tfdbg**. For example: 116 | 117 | ```bash 118 | python -m tensorflow.python.debug.cli.offline_analyzer \ 119 | --dump_dir="/shared/storage/location/tfdbg_dumps_1/run__" 120 | ``` 121 | 122 | The `LocalCLIDebugHook` also allows you to configure a `watch_fn` that can be 123 | used to flexibly specify what `Tensor`s to watch on different `Session.run()` 124 | calls, as a function of the `fetches` and `feed_dict` and other states. See 125 | @{tfdbg.DumpingDebugWrapperSession.__init__$this API doc} 126 | for more details. 127 | -------------------------------------------------------------------------------- /install/install_go.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for Go 2 | 3 | TensorFlow provides APIs for use in Go programs. These APIs are particularly 4 | well-suited to loading models created in Python and executing them within 5 | a Go application. This guide explains how to install and set up the 6 | [TensorFlow Go package](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go). 7 | 8 | **WARNING:** The TensorFlow Go API is *not* covered by the TensorFlow 9 | [API stability guarantees](https://www.tensorflow.org/programmers_guide/version_semantics). 10 | 11 | 12 | ## Supported Platforms 13 | 14 | You may install TensorFlow for Go on the following operating systems: 15 | 16 | * Linux 17 | * Mac OS X 18 | 19 | 20 | ## Installation 21 | 22 | TensorFlow for Go depends on the TensorFlow C library. Take the following 23 | steps to install this library and enable TensorFlow for Go: 24 | 25 | 1. Decide whether you will run TensorFlow for Go on CPU(s) only or with 26 | the help of GPU(s). 
To help you decide, read the section entitled 27 | "Determine which TensorFlow to install" in one of the following guides: 28 | 29 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 30 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 31 | 32 | 2. Download and extract the TensorFlow C library into `/usr/local/lib` by 33 | invoking the following shell commands: 34 | 35 | TF_TYPE="cpu" # Change to "gpu" for GPU support 36 | TARGET_DIRECTORY='/usr/local' 37 | curl -L \ 38 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-${TF_TYPE}-$(go env GOOS)-x86_64-1.1.0.tar.gz" | 39 | sudo tar -C $TARGET_DIRECTORY -xz 40 | 41 | The `tar` command extracts the TensorFlow C library into the `lib` 42 | subdirectory of `TARGET_DIRECTORY`. For example, specifying `/usr/local` 43 | as `TARGET_DIRECTORY` causes `tar` to extract the TensorFlow C library 44 | into `/usr/local/lib`. 45 | 46 | If you'd prefer to extract the library into a different directory, 47 | adjust `TARGET_DIRECTORY` accordingly. 48 | 49 | 3. In Step 2, if you specified a system directory (for example, `/usr/local`) 50 | as the `TARGET_DIRECTORY`, then run `ldconfig` to configure the linker. 51 | For example: 52 | 53 |
sudo ldconfig
54 | 55 | If you assigned a `TARGET_DIRECTORY` other than a system 56 | directory (for example, `~/mydir`), then you must append the extraction 57 | directory (for example, `~/mydir/lib`) to two environment variables 58 | as follows: 59 | 60 |
 export LIBRARY_PATH=$LIBRARY_PATH:~/mydir/lib # For both Linux and Mac OS X
 61 |      export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:~/mydir/lib # For Linux only
 62 |      export DYLD_LIBRARY_PATH=$DYLD_LIBRARY_PATH:~/mydir/lib # For Mac OS X only
63 | 64 | 4. Now that the TensorFlow C library is installed, invoke `go get` as follows 65 | to download the appropriate packages and their dependencies: 66 | 67 |
go get github.com/tensorflow/tensorflow/tensorflow/go
68 | 69 | 5. Invoke `go test` as follows to validate the TensorFlow for Go 70 | installation: 71 | 72 |
go test github.com/tensorflow/tensorflow/tensorflow/go
73 | 74 | If `go get` or `go test` generate error messages, search (or post to) 75 | [StackOverflow](http://www.stackoverflow.com/questions/tagged/tensorflow) 76 | for possible solutions. 77 | 78 | 79 | ## Hello World 80 | 81 | After installing TensorFlow for Go, enter the following code into a 82 | file named `hello_tf.go`: 83 | 84 | ```go 85 | package main 86 | 87 | import ( 88 | tf "github.com/tensorflow/tensorflow/tensorflow/go" 89 | "github.com/tensorflow/tensorflow/tensorflow/go/op" 90 | "fmt" 91 | ) 92 | 93 | func main() { 94 | // Construct a graph with an operation that produces a string constant. 95 | s := op.NewScope() 96 | c := op.Const(s, "Hello from TensorFlow version " + tf.Version()) 97 | graph, err := s.Finalize() 98 | if err != nil { 99 | panic(err) 100 | } 101 | 102 | // Execute the graph in a session. 103 | sess, err := tf.NewSession(graph, nil) 104 | if err != nil { 105 | panic(err) 106 | } 107 | output, err := sess.Run(nil, []tf.Output{c}, nil) 108 | if err != nil { 109 | panic(err) 110 | } 111 | fmt.Println(output[0].Value()) 112 | } 113 | ``` 114 | 115 | For a more advanced example of TensorFlow in Go, look at the 116 | [example in the API documentation](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go#ex-package), 117 | which uses a pre-trained TensorFlow model to label contents of an image. 118 | 119 | 120 | ### Running 121 | 122 | Run `hello_tf.go` by invoking the following command: 123 | 124 |
go run hello_tf.go
125 | Hello from TensorFlow version number
126 | 127 | The program might also generate multiple warning messages of the 128 | following form, which you can ignore: 129 | 130 |
W tensorflow/core/platform/cpu_feature_guard.cc:45] The TensorFlow library
131 | wasn't compiled to use *Type* instructions, but these are available on your
132 | machine and could speed up CPU computations.
133 | 134 | 135 | ## Building from source code 136 | 137 | TensorFlow is open-source. You may build TensorFlow for Go from the 138 | TensorFlow source code by following the instructions in a 139 | [separate document](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/go/README.md). 140 | -------------------------------------------------------------------------------- /performance/xla/shapes.md: -------------------------------------------------------------------------------- 1 | # Shapes and Layout 2 | 3 | The XLA `Shape` proto 4 | ([xla_data.proto](https://www.tensorflow.org/code/tensorflow/compiler/xla/xla_data.proto)) 5 | describes the rank, size, and data type of an N-dimensional array (*array* in 6 | short). 7 | 8 | ## Terminology, Notation, and Conventions 9 | 10 | * The rank of an array is equal to the number of dimensions. The *true rank* 11 | of an array is the number of dimensions which have a size greater than 1. 12 | 13 | * Dimensions are numbered from `0` up to `N-1` for an `N` dimensional array. 14 | The dimension numbers are arbitrary labels for convenience. The order of 15 | these dimension numbers does not imply a particular minor/major ordering in 16 | the layout of the shape. The layout is determined by the `Layout` proto. 17 | 18 | * By convention, dimensions are listed in increasing order of dimension 19 | number. For example, for a 3-dimensional array of size `[A x B x C]`, 20 | dimension 0 has size `A`, dimension 1 has size `B` and dimension 2 has size 21 | `C`. 22 | 23 | Some utilities in XLA also support negative indexing, similarly to Python; 24 | dimension -1 is the last dimension (equivalent to `N-1` for an `N` 25 | dimensional array). For example, for the 3-dimensional array described 26 | above, dimension -1 has size `C`, dimension -2 has size `B` and so on. 27 | 28 | * Two, three, and four dimensional arrays often have specific letters 29 | associated with dimensions. For example, for a 2D array: 30 | 31 | * dimension 0: `y` 32 | * dimension 1: `x` 33 | 34 | For a 3D array: 35 | 36 | * dimension 0: `z` 37 | * dimension 1: `y` 38 | * dimension 2: `x` 39 | 40 | For a 4D array: 41 | 42 | * dimension 0: `p` 43 | * dimension 1: `z` 44 | * dimension 2: `y` 45 | * dimension 3: `x` 46 | 47 | * Functions in the XLA API which take dimensions do so in increasing order of 48 | dimension number. This matches the ordering used when passing dimensions as 49 | an `initializer_list`; e.g. 50 | 51 | `ShapeUtil::MakeShape(F32, {A, B, C, D})` 52 | 53 | Will create a shape whose dimension size array consists of the sequence 54 | `[A, B, C, D]`. 55 | 56 | ## Layout 57 | 58 | The `Layout` proto describes how an array is represented in memory. The `Layout` 59 | proto includes the following fields: 60 | 61 | ``` 62 | message Layout { 63 | repeated int64 minor_to_major = 1; 64 | repeated int64 padded_dimensions = 2; 65 | optional PaddingValue padding_value = 3; 66 | } 67 | ``` 68 | 69 | ### Minor-to-major dimension ordering 70 | 71 | The only required field is `minor_to_major`. This field describes the 72 | minor-to-major ordering of the dimensions within a shape. Values in 73 | `minor_to_major` are an ordering of the dimensions of the array (`0` to `N-1` 74 | for an `N` dimensional array) with the first value being the most-minor 75 | dimension up to the last value which is the most-major dimension. The most-minor 76 | dimension is the dimension which changes most rapidly when stepping through the 77 | elements of the array laid out in linear memory. 
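Concretely, `minor_to_major` fixes the stride of each dimension: the most-minor dimension has stride 1, and each subsequent dimension's stride is the product of the sizes of all dimensions that are more minor than it. The short Python sketch below is an illustration only (the `linear_index` helper is not part of the XLA API); the `[2 x 3]` example that follows works through the same two orderings by hand.

```python
def linear_index(index, dims, minor_to_major):
    """Linear position of an element for a given minor-to-major layout.

    index: per-dimension indices, e.g. (i0, i1) for a rank-2 array.
    dims: per-dimension sizes, e.g. (2, 3) for a [2 x 3] array.
    minor_to_major: dimension numbers listed from most-minor to most-major.
    """
    linear, stride = 0, 1
    for dim in minor_to_major:   # walk from the most-minor dimension outward
        linear += index[dim] * stride
        stride *= dims[dim]
    return linear

# For a [2 x 3] array, layout [0, 1] ("dim 0 is minor") places element (1, 0)
# at linear position 1, while layout [1, 0] ("dim 0 is major", i.e. row-major
# at rank 2) places the same element at position 3.
assert linear_index((1, 0), (2, 3), minor_to_major=[0, 1]) == 1
assert linear_index((1, 0), (2, 3), minor_to_major=[1, 0]) == 3
```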
78 | 79 | For example, consider the following 2D array of size `[2 x 3]`: 80 | 81 | ``` 82 | a b c 83 | d e f 84 | ``` 85 | 86 | Here dimension `0` is size 2, and dimension `1` is size 3. If the 87 | `minor_to_major` field in the layout is `[0, 1]` then dimension `0` is the 88 | most-minor dimension and dimension `1` is the most-major dimension. This 89 | corresponds to the following layout in linear memory: 90 | 91 | ``` 92 | a d b e c f 93 | ``` 94 | 95 | This minor-to-major dimension order of `0` up to `N-1` is akin to *column-major* 96 | (at rank 2). Assuming a monotonic ordering of dimensions, another name we may 97 | use to refer to this layout in the code is simply "dim 0 is minor". 98 | 99 | On the other hand, if the `minor_to_major` field in the layout is `[1, 0]` then 100 | the layout in linear memory is: 101 | 102 | ``` 103 | a b c d e f 104 | ``` 105 | 106 | A minor-to-major dimension order of `N-1` down to `0` for an `N` dimensional 107 | array is akin to *row-major* (at rank 2). Assuming a monotonic ordering of 108 | dimensions, another name we may use to refer to this layout in the code is 109 | simply "dim 0 is major". 110 | 111 | #### Default minor-to-major ordering 112 | 113 | The default layout for newly created Shapes is "dimension order is 114 | major-to-minor" (akin to row-major at rank 2). 115 | 116 | ### Padding 117 | 118 | Padding is defined in the optional `padded_dimensions` and `padding_value` 119 | fields. The field `padded_dimensions` describes the sizes (widths) to which each 120 | dimension is padded. If present, the number of elements in `padded_dimensions` 121 | must equal the rank of the shape. 122 | 123 | For example, given the `[2 x 3]` array defined above, if `padded_dimension` is 124 | `[3, 5]` then dimension 0 is padded to a width of 3 and dimension 1 is padded to 125 | a width of 5. The layout in linear memory (assuming a padding value of 0 and 126 | column-major layout) is: 127 | 128 | ``` 129 | a d 0 b e 0 c f 0 0 0 0 0 0 0 130 | ``` 131 | 132 | This is equivalent to the layout of the following array with the same 133 | minor-to-major dimension order: 134 | 135 | ``` 136 | a b c 0 0 137 | d e f 0 0 138 | 0 0 0 0 0 139 | ``` 140 | 141 | ### Indexing into arrays 142 | 143 | The class `IndexUtil` in 144 | [index_util.h](https://www.tensorflow.org/code/tensorflow/compiler/xla/index_util.h) 145 | provides utilities for converting between multidimensional indices and linear 146 | indices given a shape and layout. Multidimensional indices include a `int64` 147 | index for each dimension. Linear indices are a single `int64` value which 148 | indexes into the buffer holding the array. See `shape_util.h` and 149 | `layout_util.h` in the same directory for utilities that simplify creation and 150 | manipulation of shapes and layouts. 151 | -------------------------------------------------------------------------------- /performance/performance_guide.md: -------------------------------------------------------------------------------- 1 | # Performance 2 | 3 | This guide contains a collection of best practices for optimizing your 4 | TensorFlow code. The best practices apply to both new and experienced 5 | Tensorflow users. 6 | 7 | ## Best Practices 8 | While optimizing implementations of different types of models can be different, 9 | the topics below cover best practices to get the most performance from 10 | TensorFlow. Although these suggestions focus on image-based models, we will 11 | regularly add tips for all kinds of models. 
The following list highlights key 12 | best practices: 13 | 14 | * Build and install from source 15 | * Utilize queues for reading data 16 | * Preprocessing on the CPU 17 | * Use `NCHW` image data format 18 | * Place shared parameters on the GPU 19 | * Use fused batch norm 20 | 21 | The following sections detail the preceding suggestions. 22 | 23 | ### Build and install from source 24 | 25 | To install the most optimized version of TensorFlow, build and install 26 | TensorFlow from source by following [Installing TensorFlow from Source](../install/install_sources). 27 | Building from source with compiler optimizations for the target hardware and 28 | ensuring the latest CUDA platform and cuDNN libraries are installed results in 29 | the highest performing installs. 30 | 31 | For the most stable experience, build from the [latest release](https://github.com/tensorflow/tensorflow/releases) 32 | branch. To get the latest performance changes and accept some stability risk, 33 | build from [master](https://github.com/tensorflow/tensorflow). 34 | 35 | If there is a need to build TensorFlow on a platform that has different hardware 36 | than the target, then cross-compile with the highest optimizations for the target 37 | platform. The following command is an example of telling `bazel` to compile for 38 | a specific platform: 39 | 40 | ```python 41 | # This command optimizes for Intel’s Broadwell processor 42 | bazel build -c opt --copt=-march="broadwell" --config=cuda //tensorflow/tools/pip_package:build_pip_package 43 | 44 | ``` 45 | 46 | #### Environment, build, and install tips 47 | 48 | * Compile with the highest level of compute the [GPU 49 | supports](http://developer.nvidia.com/cuda-gpus), e.g. P100: 6.0, Titan X 50 | (pascal): 6.2, Titan X (maxwell): 5.2, and K80: 3.7. 51 | * Install the latest CUDA platform and cuDNN libraries. 52 | * Make sure to use a version of gcc that supports all of the optimizations of 53 | the target CPU. The recommended minimum gcc version is 4.8.3. 54 | * TensorFlow checks on startup whether it has been compiled with the 55 | optimizations available on the CPU. If the optimizations are not included, 56 | TensorFlow will emit warnings, e.g. AVX, AVX2, and FMA instructions not 57 | included. 58 | 59 | ### Utilize queues for reading data 60 | 61 | One common cause of poor performance is underutilizing GPUs, or essentially 62 | "starving" them of data by not setting up an efficient pipeline. Make sure to 63 | set up an input pipeline to utilize queues and stream data effectively. Review 64 | the @{$reading_data#reading_from_files$Reading Data guide} for implementation 65 | details. One way to identify a "starved" GPU is to generate and review 66 | timelines. A detailed tutorial for timelines does not exist, but a quick example 67 | of generating a timeline exists as part of the @{$jit$XLA JIT} tutorial. Another 68 | simple way to check if a GPU is underutilized is to run `watch nvidia-smi`, and 69 | if GPU utilization is not approaching 100% then the GPU is not getting data fast 70 | enough. 71 | 72 | Unless for a special circumstance or for example code, do not feed data 73 | into the session from Python variables, e.g. `dictionary`. 74 | 75 | ```python 76 | # This will result in poor performance. 77 | sess.run(train_step, feed_dict={x: batch_xs, y_: batch_ys}) 78 | ``` 79 | 80 | ### Preprocessing on the CPU 81 | 82 | Placing preprocessing operations on the CPU can significantly improve 83 | performance. 
When preprocessing occurs on the GPU the flow of data is 84 | CPU -> GPU (preprocessing) -> CPU -> GPU (training). The data is bounced back 85 | and forth between the CPU and GPU. When preprocessing is placed on the CPU, 86 | the data flow is CPU (preprocessing) -> GPU (training). Another benefit is 87 | preprocessing on the CPU frees GPU time to focus on training. 88 | 89 | Placing preprocessing on the CPU can result in a 6X+ increase in samples/sec 90 | processed, which could lead to training in 1/6th of the time. To ensure 91 | preprocessing is on the CPU, wrap the preprocessing operations as shown below: 92 | 93 | ```python 94 | with tf.device('/cpu:0'): 95 | # function to get and process images or data. 96 | distorted_inputs = load_and_distort_images() 97 | ``` 98 | 99 | ### Use large files 100 | 101 | Under some circumstances, both the CPU and GPU can be starved for data by the 102 | I/O system. If you are using many small files to form your input data set, you 103 | may be limited by the speed of your filesystem. If your training loop runs 104 | faster when using SSDs vs HDDs for storing your input data, you could could be 105 | I/O bottlenecked. 106 | 107 | If this is the case, you should pre-process your input data, creating a few 108 | large TFRecord files. 109 | 110 | ### Use NCHW image data format 111 | 112 | Image data format refers to the representation of batches of images. TensorFlow 113 | supports `NHWC` (TensorFlow default) and `NCHW` (cuDNN default). N refers to the 114 | number of images in a batch, H refers to the number of pixels in the vertical 115 | dimension, W refers to the number of pixels in the horizontal dimension, and C 116 | refers to the channels (e.g. 1 for black and white, 3 for RGB, etc.) Although 117 | cuDNN can operate on both formats, it is faster to operate in its default 118 | format. 119 | 120 | The best practice is to build models that work with both `NCHW` and `NHWC` as it 121 | is common to train using `NCHW` on GPU, and then do inference with NHWC on CPU. 122 | 123 | The very brief history of these two formats is that TensorFlow started by using 124 | `NHWC` because it was a little faster on CPUs. Then the TensorFlow team 125 | discovered that `NCHW` performs better when using the NVIDIA cuDNN library. The 126 | current recommendation is that users support both formats in their models. In 127 | the long term, we plan to rewrite graphs to make switching between the formats 128 | transparent. 129 | 130 | ### Use fused batch norm 131 | 132 | When using batch norm 133 | @{tf.contrib.layers.batch_norm} set the attribute `fused=True`: 134 | 135 | ```python 136 | bn = tf.contrib.layers.batch_norm( 137 | input_layer, fused=True, data_format='NCHW' 138 | scope=scope, **kwargs) 139 | ``` 140 | 141 | The non-fused batch norm does computations using several individual Ops. Fused 142 | batch norm combines the individual operations into a single kernel, which runs 143 | faster. 144 | -------------------------------------------------------------------------------- /programmers_guide/data_versions.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Data Versioning: GraphDefs and Checkpoints 2 | 3 | As described in 4 | @{$version_semantics#compatibility-for-graphs-and-checkpoints$Compatibility for Graphs and Checkpoints}, 5 | TensorFlow marks each kind of data with version information in order to maintain 6 | backward compatibility. 
This document provides additional details about the 7 | versioning mechanism, and how to use it to safely change data formats. 8 | 9 | ## Backward and partial forward compatibility 10 | 11 | The two core artifacts exported from and imported into TensorFlow are 12 | checkpoints (serialized variable states) and `GraphDef`s (serialized computation 13 | graphs). Any approach to versioning these artifacts must take into account the 14 | following requirements: 15 | 16 | * **Backward compatibility** to support loading `GraphDefs` created with older 17 | versions of TensorFlow. 18 | * **Forward compatibility** to support scenarios where the producer of a 19 | `GraphDef` is upgraded to a newer version of TensorFlow before the consumer. 20 | * Enable evolving TensorFlow in incompatible ways. For example, removing Ops, 21 | adding attributes, and removing attributes. 22 | 23 | For `GraphDef`s, backward compatibility is enforced within a major version. This 24 | means functionality can only be removed between major versions. Forward 25 | compatibility is enforced within Patch releases (1.x.1 -> 1.x.2, for example). 26 | 27 | 28 | In order to achieve backward and forward compatibility as well as know when to 29 | enforce changes in formats, the serialized representations of graphs and 30 | variable state need to have metadata that describes when they were produced. The 31 | sections below detail the TensorFlow implementation and guidelines for evolving 32 | `GraphDef` versions. 33 | 34 | ### Independent data version schemes 35 | 36 | There are data versions for `GraphDef`s and checkpoints. Both data formats 37 | evolve at different rates, and also at different speeds than the version of 38 | TensorFlow. Both versioning systems are defined in 39 | [`core/public/version.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/version.h). 40 | Whenever a new version is added a note is added to the header detailing what 41 | changed and the date. 42 | 43 | ### Data, producers, and consumers 44 | 45 | This section discusses version information for **data**, binaries that produce 46 | data (**producers**), and binaries that consume data (**consumers**): 47 | 48 | * Producer binaries have a version (`producer`) and a minimum consumer version 49 | that they are compatible with (`min_consumer`). 50 | * Consumer binaries have a version (`consumer`) and a minimum producer version 51 | that they are compatible with (`min_producer`). 52 | * Each piece of versioned data has a [`VersionDef 53 | versions`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/versions.proto) 54 | field which records the `producer` that made the data, the `min_consumer` 55 | that it is compatible with, and a list of `bad_consumers` versions that are 56 | disallowed. 57 | 58 | By default, when a producer makes some data, the data inherits the producer's 59 | `producer` and `min_consumer` versions. `bad_consumers` can be set if specific 60 | consumer versions are known to contain bugs and must be avoided. 
A consumer can 61 | accept a piece of data if 62 | 63 | * `consumer` >= data's `min_consumer` 64 | * data's `producer` >= consumer's `min_producer` 65 | * `consumer` not in data's `bad_consumers` 66 | 67 | Since both producers and consumers come from the same TensorFlow code base, 68 | [`core/public/version.h`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/public/version.h) 69 | contains a main binary version which is treated as either `producer` or 70 | `consumer` depending on context and both `min_consumer` and `min_producer` 71 | (needed by producers and consumers, respectively). Specifically, 72 | 73 | * For `GraphDef` versions, we have `TF_GRAPH_DEF_VERSION`, 74 | `TF_GRAPH_DEF_VERSION_MIN_CONSUMER`, and 75 | `TF_GRAPH_DEF_VERSION_MIN_PRODUCER`. 76 | * For checkpoint versions, we have `TF_CHECKPOINT_VERSION`, 77 | `TF_CHECKPOINT_VERSION_MIN_CONSUMER`, and 78 | `TF_CHECKPOINT_VERSION_MIN_PRODUCER`. 79 | 80 | ### Evolving GraphDef versions 81 | 82 | This section presents examples of using this versioning mechanism to make 83 | changes to the `GraphDef` format. 84 | 85 | **Adding a new Op:** 86 | 87 | 1. Add the new Op to both consumers and producers at the same time, and do not 88 | change any `GraphDef` versions. This type of change is automatically 89 | backward compatible, and does not impact forward compatibility plan since 90 | existing producer scripts will not suddenly use the new functionality. 91 | 92 | **Adding a new Op and switching existing Python wrappers to use it:** 93 | 94 | 1. Implement new consumer functionality and increment the binary version. 95 | 2. If it is possible to make the wrappers use the new functionality only in 96 | cases that did not work before, the wrappers can be updated now. 97 | 3. Change Python wrappers to use the new functionality. Do not increment 98 | `min_consumer`, since models which do not use this Op should not break. 99 | 100 | **Removing an Op or restricting the functionality of an Op:** 101 | 102 | 1. Fix all producer scripts (not TensorFlow itself) to not use the banned Op or 103 | functionality. 104 | 2. Increment the binary version and implement new consumer functionality that 105 | bans the removed Op or functionality for GraphDefs at the new version and 106 | above. If possible, make TensorFlow stop producing `GraphDefs` with the 107 | banned functionality. This can be done with 108 | [`REGISTER_OP(...).Deprecated(deprecated_at_version, 109 | message)`](https://github.com/tensorflow/tensorflow/blob/b289bc7a50fc0254970c60aaeba01c33de61a728/tensorflow/core/ops/array_ops.cc#L1009). 110 | 3. Wait for a major release for backward compatibility purposes. 111 | 4. Increase `min_producer` to the GraphDef version from (2) and remove the 112 | functionality entirely. 113 | 114 | **Changing the functionality of an Op:** 115 | 116 | 1. Add a new similar Op named `SomethingV2` or similar and go through the 117 | process of adding it and switching existing Python wrappers to use it (may 118 | take 3 weeks if forward compatibility is desired). 119 | 2. Remove the old Op (Can only take place with a major version change due to 120 | backward compatibility). 121 | 3. Increase `min_consumer` to rule out consumers with the old Op, add back the 122 | old Op as an alias for `SomethingV2`, and go through the process to switch 123 | existing Python wrappers to use it. 124 | 4. Go through the process to remove `SomethingV2`. 125 | 126 | **Banning a single consumer version that cannot run safely:** 127 | 128 | 1. 
Bump the binary version and add the bad version to `bad_consumers` for all 129 | new GraphDefs. If possible, add to `bad_consumers` only for GraphDefs which 130 | contain a certain Op or similar. 131 | 2. If existing consumers have the bad version, push them out as soon as 132 | possible. 133 | -------------------------------------------------------------------------------- /performance/xla/jit.md: -------------------------------------------------------------------------------- 1 | # Using JIT Compilation 2 | 3 | > Note: TensorFlow must be compiled from source to include XLA. 4 | 5 | ## Why use just-in-time (JIT) compilation? 6 | 7 | The TensorFlow/XLA JIT compiler compiles and runs parts of TensorFlow graphs via 8 | XLA. The benefit of this over the standard TensorFlow implementation is that XLA 9 | can fuse multiple operators (kernel fusion) into a small number of compiled 10 | kernels. Fusing operators can reduce memory bandwidth requirements and improve 11 | performance compared to executing operators one-at-a-time, as the TensorFlow 12 | executor does. 13 | 14 | ## Running TensorFlow graphs via XLA 15 | 16 | There are two ways to run TensorFlow computations via XLA, either by 17 | JIT-compiling operators placed on a CPU or GPU device, or by placing operators 18 | on the `XLA_CPU` or `XLA_GPU` TensorFlow devices. Placing operators directly on 19 | a TensorFlow XLA device forces the operator to run on that device and is mainly 20 | used for testing. 21 | 22 | > Note: The XLA CPU backend produces fast single-threaded code (in most cases), 23 | > but does not yet parallelize as well as the TensorFlow CPU backend. The XLA 24 | > GPU backend is competitive with the standard TensorFlow implementation, 25 | > sometimes faster, sometimes slower. 26 | 27 | ### Turning on JIT compilation 28 | 29 | JIT compilation can be turned on at the session level or manually for select 30 | operations. Both of these approaches are zero-copy --- data does not need to be 31 | copied when passing data between a compiled XLA kernel and a TensorFlow operator 32 | placed on the same device. 33 | 34 | #### Session 35 | 36 | Turning on JIT compilation at the session level will result in all possible 37 | operators being greedily compiled into XLA computations. Each XLA computation 38 | will be compiled into one or more kernels for the underlying device. 39 | 40 | Subject to a few constraints, if there are two adjacent operators in the graph 41 | that both have XLA implementations, then they will be compiled into a single XLA 42 | computation. 43 | 44 | JIT compilation is turned on at the session level by setting the 45 | `global_jit_level` config to `tf.OptimizerOptions.ON_1` and passing the config 46 | during session initialization. 47 | 48 | ```python 49 | # Config to turn on JIT compilation 50 | config = tf.ConfigProto() 51 | config.graph_options.optimizer_options.global_jit_level = tf.OptimizerOptions.ON_1 52 | 53 | sess = tf.Session(config=config) 54 | ``` 55 | 56 | > Note: Turning on JIT at the session level will not result in operations being 57 | > compiled for the CPU. JIT compilation for CPU operations must be done via 58 | > the manual method documented below. This decision was made due to the CPU 59 | > backend being single-threaded. 60 | 61 | #### Manual 62 | 63 | JIT compilation can also be turned on manually for one or more operators. This 64 | is done by tagging the operators to compile with the attribute 65 | `_XlaCompile=true`. 
The simplest way to do this is via the 66 | `tf.contrib.compiler.jit.experimental_jit_scope()` scope defined in 67 | [`tensorflow/contrib/compiler/jit.py`](https://www.tensorflow.org/code/tensorflow/contrib/compiler/jit.py). 68 | Example usage: 69 | 70 | ```python 71 | jit_scope = tf.contrib.compiler.jit.experimental_jit_scope 72 | 73 | x = tf.placeholder(np.float32) 74 | with jit_scope(): 75 | y = tf.add(x, x) # The "add" will be compiled with XLA. 76 | ``` 77 | 78 | The `_XlaCompile` attribute is currently supported on a best-effort basis. If an 79 | operator cannot be compiled, TensorFlow will silently fall back to the normal 80 | implementation. 81 | 82 | ### Placing operators on XLA devices 83 | 84 | Another way to run computations via XLA is to place an operator on a specific 85 | XLA device. This method is normally only used for testing. Valid targets are 86 | `XLA_CPU` or `XLA_GPU`. 87 | 88 | ```python 89 | with tf.device("/job:localhost/replica:0/task:0/device:XLA_GPU:0"): 90 | output = tf.add(input1, input2) 91 | ``` 92 | 93 | Unlike JIT compilation on the standard CPU and GPU devices, these devices make a 94 | copy of data when it is transferred on and off the device. The extra copy makes 95 | it expensive to mix XLA and TensorFlow operators in the same graph. 96 | 97 | ## Tutorial 98 | 99 | This tutorial covers training a simple version of MNIST softmax with JIT turned 100 | on. Currently JIT at the session level, which is what is used for the tutorial, 101 | only supports GPU. 102 | 103 | Before starting the tutorial verify that the LD_LIBRARY environment variable or 104 | ldconfig contains `$CUDA_ROOT/extras/CUPTI/lib64`, which contains libraries for 105 | the CUDA Profiling Tools Interface [(CUPTI)](http://docs.nvidia.com/cuda/cupti/index.html). 106 | TensorFlow uses CUPTI to pull tracing information from the GPU. 107 | 108 | ### Step #1: Prepare sample script 109 | 110 | Download or move 111 | [mnist_softmax_xla.py](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_softmax_xla.py) 112 | into a folder outside of the TensorFlow source tree. 113 | 114 | ### Step #2: Run without XLA 115 | 116 | Execute the python script to train the model without XLA. 117 | 118 | ```shell 119 | python mnist_softmax_xla.py --xla='' 120 | ``` 121 | 122 | Using the Chrome Trace Event Profiler (browse to chrome://tracing), 123 | open the timeline file created when the script finishes: `timeline.ctf.json`. 124 | The rendered timeline should look similar to the picture below with multiple 125 | green boxes labeled `MatMul`, possibly across multiple CPUs. 126 |
127 | [timeline screenshot: many separate MatMul kernels when running without XLA] 128 |
129 | 130 | ### Step #3 Run with XLA 131 | 132 | Execute the python script to train the model with XLA and turn on a debugging 133 | feature of XLA via an environmental variable that outputs the XLA graph. 134 | 135 | ```shell 136 | TF_XLA_FLAGS=--xla_generate_hlo_graph=.* python mnist_softmax_xla.py 137 | ``` 138 | 139 | Open the timeline file created (`timeline.ctf.json`). The rendered timeline 140 | should look similar to the picture below with one long bar labeled `_XlaLaunch`. 141 |
142 | [timeline screenshot: one long _XlaLaunch bar when running with XLA] 143 |
144 | 145 | To understand what is happening in `_XlaLaunch`, look at the console output for 146 | statements similar to the following: 147 | 148 | ```shell 149 | computation cluster_0[_XlaCompiledKernel=true,_XlaNumConstantArgs=1].v82 [CPU: 150 | pipeline start, before inline]: /tmp/hlo_graph_0.dot 151 | 152 | ``` 153 | 154 | The console statements point to the location of `hlo_graph_xx.dot` files that 155 | contain information about the graph created by XLA. The process that XLA takes 156 | to fuse Ops is visible by starting at `hlo_graph_0.dot` and viewing each diagram 157 | in succession. 158 | 159 | To Render the .dot file into a png, install 160 | [GraphViz](http://www.graphviz.org/Download..php) and run: 161 | 162 | ```shell 163 | dot -Tpng hlo_graph_80.dot -o hlo_graph_80.png 164 | ``` 165 | 166 | The result will look like the following: 167 |
168 | [image: hlo_graph_80.dot rendered to PNG with GraphViz] 169 |
170 | -------------------------------------------------------------------------------- /programmers_guide/threading_and_queues.md: -------------------------------------------------------------------------------- 1 | # Threading and Queues 2 | 3 | Queues are a powerful mechanism for asynchronous computation using TensorFlow. 4 | 5 | Like everything in TensorFlow, a queue is a node in a TensorFlow graph. It's a 6 | stateful node, like a variable: other nodes can modify its content. In 7 | particular, nodes can enqueue new items in to the queue, or dequeue existing 8 | items from the queue. 9 | 10 | To get a feel for queues, let's consider a simple example. We will create a 11 | "first in, first out" queue (`FIFOQueue`) and fill it with zeros. 12 | Then we'll construct a graph 13 | that takes an item off the queue, adds one to that item, and puts it back on the 14 | end of the queue. Slowly, the numbers on the queue increase. 15 | 16 |
17 | [animation: the FIFOQueue being dequeued, incremented, and re-enqueued] 18 |
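Written out in code, the loop described above looks roughly like the following; this is a minimal single-threaded sketch of the same dequeue, add one, re-enqueue cycle:

```python
import tensorflow as tf

# A first-in, first-out queue holding three scalar floats, filled with zeros.
q = tf.FIFOQueue(3, tf.float32)
init = q.enqueue_many(([0., 0., 0.],))

# Take an item off the queue, add one to it, and put it back on the end.
x = q.dequeue()
y = x + 1
q_inc = q.enqueue([y])

with tf.Session() as sess:
    sess.run(init)
    for _ in range(10):
        sess.run(q_inc)  # each run increments one element of the queue
    print(sess.run(q.dequeue_many(3)))  # the queued values have slowly grown
```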
19 | 20 | `Enqueue`, `EnqueueMany`, and `Dequeue` are special nodes. They take a pointer 21 | to the queue instead of a normal value, allowing them to change it. We recommend 22 | you think of these as being like methods of the queue. In fact, in the Python 23 | API, they are methods of the queue object (e.g. `q.enqueue(...)`). 24 | 25 | **N.B.** Queue methods (such as `q.enqueue(...)`) *must* run on the same device 26 | as the queue. Incompatible device placement directives will be ignored when 27 | creating these operations. 28 | 29 | Now that you have a bit of a feel for queues, let's dive into the details... 30 | 31 | ## Queue usage overview 32 | 33 | Queues, such as @{tf.FIFOQueue} 34 | and @{tf.RandomShuffleQueue}, 35 | are important TensorFlow objects for computing tensors asynchronously in a 36 | graph. 37 | 38 | For example, a typical input architecture is to use a `RandomShuffleQueue` to 39 | prepare inputs for training a model: 40 | 41 | * Multiple threads prepare training examples and push them in the queue. 42 | * A training thread executes a training op that dequeues mini-batches from the 43 | queue 44 | 45 | This architecture has many benefits, as highlighted in the 46 | @{$reading_data$Reading data how to}, which also gives an overview of 47 | functions that simplify the construction of input pipelines. 48 | 49 | The TensorFlow `Session` object is multithreaded, so multiple threads can 50 | easily use the same session and run ops in parallel. However, it is not always 51 | easy to implement a Python program that drives threads as described above. All 52 | threads must be able to stop together, exceptions must be caught and 53 | reported, and queues must be properly closed when stopping. 54 | 55 | TensorFlow provides two classes to help: 56 | @{tf.train.Coordinator} and 57 | @{tf.train.QueueRunner}. These two classes 58 | are designed to be used together. The `Coordinator` class helps multiple threads 59 | stop together and report exceptions to a program that waits for them to stop. 60 | The `QueueRunner` class is used to create a number of threads cooperating to 61 | enqueue tensors in the same queue. 62 | 63 | ## Coordinator 64 | 65 | The `Coordinator` class helps multiple threads stop together. 66 | 67 | Its key methods are: 68 | 69 | * @{tf.train.Coordinator.should_stop}: returns True if the threads should stop. 70 | * @{tf.train.Coordinator.request_stop}: requests that threads should stop. 71 | * @{tf.train.Coordinator.join}: waits until the specified threads have stopped. 72 | 73 | You first create a `Coordinator` object, and then create a number of threads 74 | that use the coordinator. The threads typically run loops that stop when 75 | `should_stop()` returns `True`. 76 | 77 | Any thread can decide that the computation should stop. It only has to call 78 | `request_stop()` and the other threads will stop as `should_stop()` will then 79 | return `True`. 80 | 81 | ```python 82 | # Thread body: loop until the coordinator indicates a stop was requested. 83 | # If some condition becomes true, ask the coordinator to stop. 84 | def MyLoop(coord): 85 | while not coord.should_stop(): 86 | ...do something... 87 | if ...some condition...: 88 | coord.request_stop() 89 | 90 | # Main thread: create a coordinator. 91 | coord = tf.train.Coordinator() 92 | 93 | # Create 10 threads that run 'MyLoop()' 94 | threads = [threading.Thread(target=MyLoop, args=(coord,)) for i in xrange(10)] 95 | 96 | # Start the threads and wait for all of them to stop. 
97 | for t in threads: 98 | t.start() 99 | coord.join(threads) 100 | ``` 101 | 102 | Obviously, the coordinator can manage threads doing very different things. 103 | They don't have to be all the same as in the example above. The coordinator 104 | also has support to capture and report exceptions. See the @{tf.train.Coordinator} documentation for more details. 105 | 106 | ## QueueRunner 107 | 108 | The `QueueRunner` class creates a number of threads that repeatedly run an 109 | enqueue op. These threads can use a coordinator to stop together. In 110 | addition, a queue runner runs a *closer thread* that automatically closes the 111 | queue if an exception is reported to the coordinator. 112 | 113 | You can use a queue runner to implement the architecture described above. 114 | 115 | First build a graph that uses a TensorFlow queue (e.g. a `tf.RandomShuffleQueue`) for input examples. Add ops that 116 | process examples and enqueue them in the queue. Add training ops that start by 117 | dequeueing from the queue. 118 | 119 | ```python 120 | example = ...ops to create one example... 121 | # Create a queue, and an op that enqueues examples one at a time in the queue. 122 | queue = tf.RandomShuffleQueue(...) 123 | enqueue_op = queue.enqueue(example) 124 | # Create a training graph that starts by dequeuing a batch of examples. 125 | inputs = queue.dequeue_many(batch_size) 126 | train_op = ...use 'inputs' to build the training part of the graph... 127 | ``` 128 | 129 | In the Python training program, create a `QueueRunner` that will run a few 130 | threads to process and enqueue examples. Create a `Coordinator` and ask the 131 | queue runner to start its threads with the coordinator. Write a training loop 132 | that also uses the coordinator. 133 | 134 | ``` 135 | # Create a queue runner that will run 4 threads in parallel to enqueue 136 | # examples. 137 | qr = tf.train.QueueRunner(queue, [enqueue_op] * 4) 138 | 139 | # Launch the graph. 140 | sess = tf.Session() 141 | # Create a coordinator, launch the queue runner threads. 142 | coord = tf.train.Coordinator() 143 | enqueue_threads = qr.create_threads(sess, coord=coord, start=True) 144 | # Run the training loop, controlling termination with the coordinator. 145 | for step in xrange(1000000): 146 | if coord.should_stop(): 147 | break 148 | sess.run(train_op) 149 | # When done, ask the threads to stop. 150 | coord.request_stop() 151 | # And wait for them to actually do it. 152 | coord.join(enqueue_threads) 153 | ``` 154 | 155 | ## Handling exceptions 156 | 157 | Threads started by queue runners do more than just run the enqueue ops. They 158 | also catch and handle exceptions generated by queues, including the 159 | `tf.errors.OutOfRangeError` exception, which is used to report that a queue was closed. 160 | 161 | A training program that uses a coordinator must similarly catch and report 162 | exceptions in its main loop. 163 | 164 | Here is an improved version of the training loop above. 165 | 166 | ```python 167 | try: 168 | for step in xrange(1000000): 169 | if coord.should_stop(): 170 | break 171 | sess.run(train_op) 172 | except Exception, e: 173 | # Report exceptions to the coordinator. 174 | coord.request_stop(e) 175 | finally: 176 | # Terminate as usual. It is safe to call `coord.request_stop()` twice. 
177 | coord.request_stop() 178 | coord.join(threads) 179 | ``` 180 | -------------------------------------------------------------------------------- /programmers_guide/version_semantics.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Version Semantics 2 | 3 | ## Semantic Versioning 2.0 4 | 5 | TensorFlow follows Semantic Versioning 2.0 ([semver](http://semver.org)) for its 6 | public API. Each release version of TensorFlow has the form `MAJOR.MINOR.PATCH`. 7 | Changes to the each number have the following meaning: 8 | 9 | * **MAJOR**: Backwards incompatible changes. Code and data that worked with 10 | a previous major release will not necessarily work with a new release. 11 | However, in some cases existing TensorFlow data (graphs, checkpoints, and 12 | other protobufs) may be migratable to the newer release; see below for details 13 | on data compatibility. 14 | 15 | * **MINOR**: Backwards compatible features, speed improvements, etc. Code and 16 | data that worked with a previous minor release *and* which depends only the 17 | public API will continue to work unchanged. For details on what is and is 18 | not the public API, see below. 19 | 20 | * **PATCH**: Backwards compatible bug fixes. 21 | 22 | ## What is covered 23 | 24 | Only the public APIs of TensorFlow are backwards compatible across minor and 25 | patch versions. The public APIs consist of 26 | 27 | * The documented public [Python](../api_docs/python) API, excluding `tf.contrib`. 28 | This includes all public functions and classes (whose names do not start with 29 | `_`) in the tensorflow module and its submodules. Note that the code in 30 | the `examples/` to `tools/` directories is not reachable through the 31 | tensorflow Python module and is thus not covered by the compatibility 32 | guarantee. 33 | 34 | If a symbol is available through the tensorflow Python module or its 35 | submodules, but is not documented, then it is _not_ considered part of the 36 | public API. 37 | 38 | * The [C API](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/c/c_api.h). 39 | 40 | * The following protocol buffer files: 41 | [`attr_value`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/attr_value.proto), 42 | [`config`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/protobuf/config.proto), 43 | [`event`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/util/event.proto), 44 | [`graph`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto), 45 | [`op_def`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/op_def.proto), 46 | [`reader_base`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/kernels/reader_base.proto), 47 | [`summary`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/summary.proto), 48 | [`tensor`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.proto), 49 | [`tensor_shape`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor_shape.proto), 50 | and [`types`](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/types.proto). 51 | 52 | ## What is *not* covered 53 | 54 | Some API functions are explicitly marked as "experimental" and can change in 55 | backward incompatible ways between minor releases. 
These include: 56 | 57 | * **Experimental APIs**: The @{tf.contrib} module and its submodules in Python 58 | and any functions in the C API or fields in protocol buffers that are 59 | explicitly commented as being experimental. 60 | 61 | * **Other languages**: TensorFlow APIs in languages other than Python and C, 62 | such as: 63 | 64 | - @{$cc/guide$C++} (exposed through header files in 65 | [`tensorflow/cc`](https://github.com/tensorflow/tensorflow/tree/master/tensorflow/cc)). 66 | - [Java](../api_docs/java/reference/org/tensorflow/package-summary), and 67 | - [Go](https://godoc.org/github.com/tensorflow/tensorflow/tensorflow/go) 68 | 69 | * **Details of composite ops:** Many public functions in Python expand to 70 | several primitive ops in the graph, and these details will be part of any 71 | graphs saved to disk as `GraphDef`s. These details are allowed to change for 72 | minor releases. In particular, regressions tests that check for exact 73 | matching between graphs are likely to break across minor releases, even 74 | though the behavior of the graph should be unchanged and existing 75 | checkpoints will still work. 76 | 77 | * **Floating point numerical details:** The specific floating point values 78 | computed by ops may change at any time: users should rely only on 79 | approximate accuracy and numerical stability, not on the specific bits 80 | computed. Changes to numerical formulas in minor and patch releases should 81 | result in comparable or improved accuracy, with the caveat that in machine 82 | learning improved accuracy of specific formulas may result in worse accuracy 83 | for the overall system. 84 | 85 | * **Random numbers:** The specific random numbers computed by the 86 | @{$python/constant_op#Random_Tensors$random ops} may change at any time: 87 | users should rely only on approximately correct distributions and 88 | statistical strength, not the specific bits computed. However, we will make 89 | changes to random bits rarely and ideally never for patch releases, and all 90 | such intended changes will be documented. 91 | 92 | * **Distributed Tensorflow:** Running 2 different versions of TensorFlow in a 93 | single cluster is unsupported. There are no guarantees about backwards 94 | compatibility of the wire protocol. 95 | 96 | Furthermore, any API methods marked "deprecated" in the 1.0 release can 97 | be deleted in any subsequent minor release. 98 | 99 | ## Compatibility for Graphs and Checkpoints 100 | 101 | Many users of TensorFlow will be saving graphs and trained models to disk for 102 | later evaluation or more training, often changing versions of TensorFlow in the 103 | process. First, following semver, any graph or checkpoint written out with one 104 | version of TensorFlow can be loaded and evaluated with a later version of 105 | TensorFlow with the same major release. However, we will endeavour to preserve 106 | backwards compatibility even across major releases when possible, so that the 107 | serialized files are usable over long periods of time. 108 | 109 | There are two main classes of saved TensorFlow data: graphs and checkpoints. 110 | Graphs describe the data flow graphs of ops to be run during training and 111 | inference, and checkpoints contain the saved tensor values of variables in a 112 | graph. 113 | 114 | Graphs are serialized via the `GraphDef` protocol buffer. To facilitate (rare) 115 | backwards incompatible changes to graphs, each `GraphDef` has an integer version 116 | separate from the TensorFlow version. 
The semantics are: 117 | 118 | * Each version of TensorFlow supports an interval of `GraphDef` versions. This 119 | interval with be constant across patch releases, and will only grow across 120 | minor releases. Dropping support for a `GraphDef` version will only occur 121 | for a major release of TensorFlow. 122 | 123 | * Newly created graphs use the newest `GraphDef` version. 124 | 125 | * If a given version of TensorFlow supports the `GraphDef` version of a graph, 126 | it will load and evaluate with the same behavior as when it was written out 127 | (except for floating point numerical details and random numbers), regardless 128 | of the major version of TensorFlow. In particular, all checkpoint files will 129 | be compatible. 130 | 131 | * If the `GraphDef` upper bound is increased to X in a (minor) release, there 132 | will be at least six months before the lower bound is increased to X. 133 | 134 | For example (numbers and versions hypothetical), TensorFlow 1.2 might support 135 | `GraphDef` versions 4 to 7. TensorFlow 1.3 could add `GraphDef` version 8 and 136 | support versions 4 to 8. At least six months later, TensorFlow 2.0.0 could drop 137 | support for versions 4 to 7, leaving version 8 only. 138 | 139 | Finally, when support for a `GraphDef` version is dropped, we will attempt to 140 | provide tools for automatically converting graphs to a newer supported 141 | `GraphDef` version. 142 | 143 | For developer-level details about `GraphDef` versioning, including how to evolve 144 | the versions to account for changes, see 145 | @{$data_versions$TensorFlow Data Versioning}. 146 | -------------------------------------------------------------------------------- /tutorials/using_gpu.md: -------------------------------------------------------------------------------- 1 | # Using GPUs 2 | 3 | ## Supported devices 4 | 5 | On a typical system, there are multiple computing devices. In TensorFlow, the 6 | supported device types are `CPU` and `GPU`. They are represented as `strings`. 7 | For example: 8 | 9 | * `"/cpu:0"`: The CPU of your machine. 10 | * `"/gpu:0"`: The GPU of your machine, if you have one. 11 | * `"/gpu:1"`: The second GPU of your machine, etc. 12 | 13 | If a TensorFlow operation has both CPU and GPU implementations, the GPU devices 14 | will be given priority when the operation is assigned to a device. For example, 15 | `matmul` has both CPU and GPU kernels. On a system with devices `cpu:0` and 16 | `gpu:0`, `gpu:0` will be selected to run `matmul`. 17 | 18 | ## Logging Device placement 19 | 20 | To find out which devices your operations and tensors are assigned to, create 21 | the session with `log_device_placement` configuration option set to `True`. 22 | 23 | ```python 24 | # Creates a graph. 25 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 26 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 27 | c = tf.matmul(a, b) 28 | # Creates a session with log_device_placement set to True. 29 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 30 | # Runs the op. 31 | print(sess.run(c)) 32 | ``` 33 | 34 | You should see the following output: 35 | 36 | ``` 37 | Device mapping: 38 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus 39 | id: 0000:05:00.0 40 | b: /job:localhost/replica:0/task:0/gpu:0 41 | a: /job:localhost/replica:0/task:0/gpu:0 42 | MatMul: /job:localhost/replica:0/task:0/gpu:0 43 | [[ 22. 28.] 44 | [ 49. 
64.]] 45 | 46 | ``` 47 | 48 | ## Manual device placement 49 | 50 | If you would like a particular operation to run on a device of your choice 51 | instead of what's automatically selected for you, you can use `with tf.device` 52 | to create a device context such that all the operations within that context will 53 | have the same device assignment. 54 | 55 | ```python 56 | # Creates a graph. 57 | with tf.device('/cpu:0'): 58 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 59 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 60 | c = tf.matmul(a, b) 61 | # Creates a session with log_device_placement set to True. 62 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 63 | # Runs the op. 64 | print(sess.run(c)) 65 | ``` 66 | 67 | You will see that now `a` and `b` are assigned to `cpu:0`. 68 | 69 | ``` 70 | Device mapping: 71 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K40c, pci bus 72 | id: 0000:05:00.0 73 | b: /job:localhost/replica:0/task:0/cpu:0 74 | a: /job:localhost/replica:0/task:0/cpu:0 75 | MatMul: /job:localhost/replica:0/task:0/gpu:0 76 | [[ 22. 28.] 77 | [ 49. 64.]] 78 | ``` 79 | 80 | ## Allowing GPU memory growth 81 | 82 | By default, TensorFlow maps nearly all of the GPU memory of all GPUs (subject to 83 | [`CUDA_VISIBLE_DEVICES`](http://docs.nvidia.com/cuda/cuda-c-programming-guide/index.html#env-vars)) 84 | visible to the process. This is done to more efficiently use the relatively 85 | precious GPU memory resources on the devices by reducing [memory 86 | fragmentation](https://en.wikipedia.org/wiki/Fragmentation_\(computing\)). 87 | 88 | In some cases it is desirable for the process to only allocate a subset of the 89 | available memory, or to only grow the memory usage as is needed by the process. 90 | TensorFlow provides two Config options on the Session to control this. 91 | 92 | The first is the `allow_growth` option, which attempts to allocate only as much 93 | GPU memory based on runtime allocations: it starts out allocating very little 94 | memory, and as Sessions get run and more GPU memory is needed, we extend the GPU 95 | memory region needed by the TensorFlow process. Note that we do not release 96 | memory, since that can lead to even worse memory fragmentation. To turn this 97 | option on, set the option in the ConfigProto by: 98 | 99 | ```python 100 | config = tf.ConfigProto() 101 | config.gpu_options.allow_growth = True 102 | session = tf.Session(config=config, ...) 103 | ``` 104 | 105 | The second method is the `per_process_gpu_memory_fraction` option, which 106 | determines the fraction of the overall amount of memory that each visible GPU 107 | should be allocated. For example, you can tell TensorFlow to only allocate 40% 108 | of the total memory of each GPU by: 109 | 110 | ```python 111 | config = tf.ConfigProto() 112 | config.gpu_options.per_process_gpu_memory_fraction = 0.4 113 | session = tf.Session(config=config, ...) 114 | ``` 115 | 116 | This is useful if you want to truly bound the amount of GPU memory available to 117 | the TensorFlow process. 118 | 119 | ## Using a single GPU on a multi-GPU system 120 | 121 | If you have more than one GPU in your system, the GPU with the lowest ID will be 122 | selected by default. If you would like to run on a different GPU, you will need 123 | to specify the preference explicitly: 124 | 125 | ```python 126 | # Creates a graph. 
127 | with tf.device('/gpu:2'): 128 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 129 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 130 | c = tf.matmul(a, b) 131 | # Creates a session with log_device_placement set to True. 132 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 133 | # Runs the op. 134 | print(sess.run(c)) 135 | ``` 136 | 137 | If the device you have specified does not exist, you will get 138 | `InvalidArgumentError`: 139 | 140 | ``` 141 | InvalidArgumentError: Invalid argument: Cannot assign a device to node 'b': 142 | Could not satisfy explicit device specification '/gpu:2' 143 | [[Node: b = Const[dtype=DT_FLOAT, value=Tensor, _device="/gpu:2"]()]] 145 | ``` 146 | 147 | If you would like TensorFlow to automatically choose an existing and supported 148 | device to run the operations in case the specified one doesn't exist, you can 149 | set `allow_soft_placement` to `True` in the configuration option when creating 150 | the session. 151 | 152 | ```python 153 | # Creates a graph. 154 | with tf.device('/gpu:2'): 155 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3], name='a') 156 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2], name='b') 157 | c = tf.matmul(a, b) 158 | # Creates a session with allow_soft_placement and log_device_placement set 159 | # to True. 160 | sess = tf.Session(config=tf.ConfigProto( 161 | allow_soft_placement=True, log_device_placement=True)) 162 | # Runs the op. 163 | print(sess.run(c)) 164 | ``` 165 | 166 | ## Using multiple GPUs 167 | 168 | If you would like to run TensorFlow on multiple GPUs, you can construct your 169 | model in a multi-tower fashion where each tower is assigned to a different GPU. 170 | For example: 171 | 172 | ``` 173 | # Creates a graph. 174 | c = [] 175 | for d in ['/gpu:2', '/gpu:3']: 176 | with tf.device(d): 177 | a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3]) 178 | b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2]) 179 | c.append(tf.matmul(a, b)) 180 | with tf.device('/cpu:0'): 181 | sum = tf.add_n(c) 182 | # Creates a session with log_device_placement set to True. 183 | sess = tf.Session(config=tf.ConfigProto(log_device_placement=True)) 184 | # Runs the op. 185 | print(sess.run(sum)) 186 | ``` 187 | 188 | You will see the following output. 189 | 190 | ``` 191 | Device mapping: 192 | /job:localhost/replica:0/task:0/gpu:0 -> device: 0, name: Tesla K20m, pci bus 193 | id: 0000:02:00.0 194 | /job:localhost/replica:0/task:0/gpu:1 -> device: 1, name: Tesla K20m, pci bus 195 | id: 0000:03:00.0 196 | /job:localhost/replica:0/task:0/gpu:2 -> device: 2, name: Tesla K20m, pci bus 197 | id: 0000:83:00.0 198 | /job:localhost/replica:0/task:0/gpu:3 -> device: 3, name: Tesla K20m, pci bus 199 | id: 0000:84:00.0 200 | Const_3: /job:localhost/replica:0/task:0/gpu:3 201 | Const_2: /job:localhost/replica:0/task:0/gpu:3 202 | MatMul_1: /job:localhost/replica:0/task:0/gpu:3 203 | Const_1: /job:localhost/replica:0/task:0/gpu:2 204 | Const: /job:localhost/replica:0/task:0/gpu:2 205 | MatMul: /job:localhost/replica:0/task:0/gpu:2 206 | AddN: /job:localhost/replica:0/task:0/cpu:0 207 | [[ 44. 56.] 208 | [ 98. 128.]] 209 | ``` 210 | 211 | The @{$deep_cnn$cifar10 tutorial} is a good example 212 | demonstrating how to do training with multiple GPUs. 
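If you do not know in advance how many GPUs a machine exposes, you can combine the multi-tower pattern with `allow_soft_placement`, or enumerate the local devices first. The sketch below uses the internal `device_lib` helper, which is not part of the stable public API, so treat it as an illustration rather than a supported recipe:

```python
# A minimal sketch: build one tower per visible GPU, falling back to the CPU.
import tensorflow as tf
from tensorflow.python.client import device_lib  # internal, non-stable module

gpu_names = [d.name for d in device_lib.list_local_devices()
             if d.device_type == 'GPU']
devices = gpu_names if gpu_names else ['/cpu:0']

c = []
for device_name in devices:
  with tf.device(device_name):
    a = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[2, 3])
    b = tf.constant([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], shape=[3, 2])
    c.append(tf.matmul(a, b))
with tf.device('/cpu:0'):
  total = tf.add_n(c)

# allow_soft_placement lets TensorFlow fall back to a supported device if a
# requested one is unavailable.
sess = tf.Session(config=tf.ConfigProto(
    allow_soft_placement=True, log_device_placement=True))
print(sess.run(total))
```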
213 | -------------------------------------------------------------------------------- /tutorials/recurrent.md: -------------------------------------------------------------------------------- 1 | # Recurrent Neural Networks 2 | 3 | ## Introduction 4 | 5 | Take a look at [this great article](http://colah.github.io/posts/2015-08-Understanding-LSTMs/) 6 | for an introduction to recurrent neural networks and LSTMs in particular. 7 | 8 | ## Language Modeling 9 | 10 | In this tutorial we will show how to train a recurrent neural network on 11 | a challenging task of language modeling. The goal of the problem is to fit a 12 | probabilistic model which assigns probabilities to sentences. It does so by 13 | predicting next words in a text given a history of previous words. For this 14 | purpose we will use the [Penn Tree Bank](https://catalog.ldc.upenn.edu/ldc99t42) 15 | (PTB) dataset, which is a popular benchmark for measuring the quality of these 16 | models, whilst being small and relatively fast to train. 17 | 18 | Language modeling is key to many interesting problems such as speech 19 | recognition, machine translation, or image captioning. It is also fun -- 20 | take a look [here](http://karpathy.github.io/2015/05/21/rnn-effectiveness/). 21 | 22 | For the purpose of this tutorial, we will reproduce the results from 23 | [Zaremba et al., 2014](http://arxiv.org/abs/1409.2329) 24 | ([pdf](http://arxiv.org/pdf/1409.2329.pdf)), which achieves very good quality 25 | on the PTB dataset. 26 | 27 | ## Tutorial Files 28 | 29 | This tutorial references the following files from `models/tutorials/rnn/ptb` in the [TensorFlow models repo](https://github.com/tensorflow/models): 30 | 31 | File | Purpose 32 | --- | --- 33 | `ptb_word_lm.py` | The code to train a language model on the PTB dataset. 34 | `reader.py` | The code to read the dataset. 35 | 36 | ## Download and Prepare the Data 37 | 38 | The data required for this tutorial is in the `data/` directory of the 39 | PTB dataset from Tomas Mikolov's webpage: 40 | http://www.fit.vutbr.cz/~imikolov/rnnlm/simple-examples.tgz 41 | 42 | The dataset is already preprocessed and contains overall 10000 different words, 43 | including the end-of-sentence marker and a special symbol (\) for rare 44 | words. In `reader.py`, we convert each word to a unique integer identifier, 45 | in order to make it easy for the neural network to process the data. 46 | 47 | ## The Model 48 | 49 | ### LSTM 50 | 51 | The core of the model consists of an LSTM cell that processes one word at a 52 | time and computes probabilities of the possible values for the next word in the 53 | sentence. The memory state of the network is initialized with a vector of zeros 54 | and gets updated after reading each word. For computational reasons, we will 55 | process data in mini-batches of size `batch_size`. 56 | 57 | The basic pseudocode is as follows: 58 | 59 | ```python 60 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size) 61 | # Initial state of the LSTM memory. 62 | state = tf.zeros([batch_size, lstm.state_size]) 63 | probabilities = [] 64 | loss = 0.0 65 | for current_batch_of_words in words_in_dataset: 66 | # The value of state is updated after processing each batch of words. 
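    # (Pseudocode caveat: with the default state_is_tuple=True,
    # BasicLSTMCell's state is an LSTMStateTuple of (c, h) rather than the
    # single tensor shown above; in real code the zero state would come from
    # lstm.zero_state(batch_size, tf.float32).)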
67 | output, state = lstm(current_batch_of_words, state) 68 | 69 | # The LSTM output can be used to make next word predictions 70 | logits = tf.matmul(output, softmax_w) + softmax_b 71 | probabilities.append(tf.nn.softmax(logits)) 72 | loss += loss_function(probabilities, target_words) 73 | ``` 74 | 75 | ### Truncated Backpropagation 76 | 77 | By design, the output of a recurrent neural network (RNN) depends on arbitrarily 78 | distant inputs. Unfortunately, this makes backpropagation computation difficult. 79 | In order to make the learning process tractable, it is common practice to create 80 | an "unrolled" version of the network, which contains a fixed number 81 | (`num_steps`) of LSTM inputs and outputs. The model is then trained on this 82 | finite approximation of the RNN. This can be implemented by feeding inputs of 83 | length `num_steps` at a time and performing a backward pass after each 84 | such input block. 85 | 86 | Here is a simplified block of code for creating a graph which performs 87 | truncated backpropagation: 88 | 89 | ```python 90 | # Placeholder for the inputs in a given iteration. 91 | words = tf.placeholder(tf.int32, [batch_size, num_steps]) 92 | 93 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size) 94 | # Initial state of the LSTM memory. 95 | initial_state = state = tf.zeros([batch_size, lstm.state_size]) 96 | 97 | for i in range(num_steps): 98 | # The value of state is updated after processing each batch of words. 99 | output, state = lstm(words[:, i], state) 100 | 101 | # The rest of the code. 102 | # ... 103 | 104 | final_state = state 105 | ``` 106 | 107 | And this is how to implement an iteration over the whole dataset: 108 | 109 | ```python 110 | # A numpy array holding the state of LSTM after each batch of words. 111 | numpy_state = initial_state.eval() 112 | total_loss = 0.0 113 | for current_batch_of_words in words_in_dataset: 114 | numpy_state, current_loss = session.run([final_state, loss], 115 | # Initialize the LSTM state from the previous iteration. 116 | feed_dict={initial_state: numpy_state, words: current_batch_of_words}) 117 | total_loss += current_loss 118 | ``` 119 | 120 | ### Inputs 121 | 122 | The word IDs will be embedded into a dense representation (see the 123 | @{$word2vec$Vector Representations Tutorial}) before feeding to 124 | the LSTM. This allows the model to efficiently represent the knowledge about 125 | particular words. It is also easy to write: 126 | 127 | ```python 128 | # embedding_matrix is a tensor of shape [vocabulary_size, embedding size] 129 | word_embeddings = tf.nn.embedding_lookup(embedding_matrix, word_ids) 130 | ``` 131 | 132 | The embedding matrix will be initialized randomly and the model will learn to 133 | differentiate the meaning of words just by looking at the data. 134 | 135 | ### Loss Function 136 | 137 | We want to minimize the average negative log probability of the target words: 138 | 139 | $$ \text{loss} = -\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i} $$ 140 | 141 | It is not very difficult to implement but the function 142 | `sequence_loss_by_example` is already available, so we can just use it here. 143 | 144 | The typical measure reported in the papers is average per-word perplexity (often 145 | just called perplexity), which is equal to 146 | 147 | $$e^{-\frac{1}{N}\sum_{i=1}^{N} \ln p_{\text{target}_i}} = e^{\text{loss}} $$ 148 | 149 | and we will monitor its value throughout the training process. 
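The tutorial code relies on the `sequence_loss_by_example` helper for this. Purely to make the two formulas concrete, a minimal sketch with placeholder `logits` and `targets` (the names and shapes here are assumptions, not the variables used in `ptb_word_lm.py`) might look like:

```python
import tensorflow as tf

batch_size, num_steps, vocab_size = 20, 35, 10000
# Per-step logits over the vocabulary, and the true next-word IDs.
logits = tf.placeholder(tf.float32, [batch_size, num_steps, vocab_size])
targets = tf.placeholder(tf.int32, [batch_size, num_steps])

cross_entropy = tf.nn.sparse_softmax_cross_entropy_with_logits(
    labels=targets, logits=logits)
loss = tf.reduce_mean(cross_entropy)  # average negative log probability per word
perplexity = tf.exp(loss)             # e^loss, the value reported in the papers
```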
150 | 151 | ### Stacking multiple LSTMs 152 | 153 | To give the model more expressive power, we can add multiple layers of LSTMs 154 | to process the data. The output of the first layer will become the input of 155 | the second and so on. 156 | 157 | We have a class called `MultiRNNCell` that makes the implementation seamless: 158 | 159 | ```python 160 | lstm = tf.contrib.rnn.BasicLSTMCell(lstm_size, state_is_tuple=False) 161 | stacked_lstm = tf.contrib.rnn.MultiRNNCell([lstm] * number_of_layers, 162 | state_is_tuple=False) 163 | 164 | initial_state = state = stacked_lstm.zero_state(batch_size, tf.float32) 165 | for i in range(num_steps): 166 | # The value of state is updated after processing each batch of words. 167 | output, state = stacked_lstm(words[:, i], state) 168 | 169 | # The rest of the code. 170 | # ... 171 | 172 | final_state = state 173 | ``` 174 | 175 | ## Run the Code 176 | 177 | Start by cloning the [TensorFlow models repo](https://github.com/tensorflow/models) from GitHub. 178 | You'll also need to download the PTB dataset, as discussed at the beginning of 179 | this tutorial; we'll assume the dataset is located in `/tmp/simple-examples/data`. 180 | 181 | Run the following commands: 182 | 183 | ```bash 184 | cd models/tutorials/rnn/ptb 185 | python ptb_word_lm.py --data_path=/tmp/simple-examples/data/ --model=small 186 | ``` 187 | 188 | There are 3 supported model configurations in the tutorial code: "small", 189 | "medium" and "large". The difference between them is in size of the LSTMs and 190 | the set of hyperparameters used for training. 191 | 192 | The larger the model, the better results it should get. The `small` model should 193 | be able to reach perplexity below 120 on the test set and the `large` one below 194 | 80, though it might take several hours to train. 195 | 196 | ## What Next? 197 | 198 | There are several tricks that we haven't mentioned that make the model better, 199 | including: 200 | 201 | * decreasing learning rate schedule, 202 | * dropout between the LSTM layers. 203 | 204 | Study the code and modify it to improve the model even further. 205 | -------------------------------------------------------------------------------- /performance/xla/broadcasting.md: -------------------------------------------------------------------------------- 1 | # Broadcasting semantics 2 | 3 | This document describes how the broadcasting semantics in XLA work. 4 | 5 | ## What is broadcasting? 6 | 7 | Broadcasting is the process of making arrays with different shapes have 8 | compatible shapes for arithmetic operations. The terminology is borrowed from 9 | Numpy 10 | [(broadcasting)](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 11 | 12 | Broadcasting may be required for operations between multi-dimensional arrays of 13 | different ranks, or between multi-dimensional arrays with different but 14 | compatible shapes. Consider the addition `X+v` where `X` is a matrix (an array 15 | of rank 2) and `v` is a vector (an array of rank 1). To perform element-wise 16 | addition, XLA needs to "broadcast" the vector `v` to the same rank as the 17 | matrix `X`, by replicating `v` a certain number of times. The vector's length 18 | has to match at least one of the dimensions of the matrix. 19 | 20 | For example: 21 | 22 | |1 2 3| + |7 8 9| 23 | |4 5 6| 24 | 25 | The matrix's dimensions are (2,3), the vector's are (3). 
The vector is broadcast 26 | by replicating it over rows to get: 27 | 28 | |1 2 3| + |7 8 9| = |8 10 12| 29 | |4 5 6| |7 8 9| |11 13 15| 30 | 31 | In Numpy, this is called [broadcasting] 32 | (http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 33 | 34 | ## Principles 35 | 36 | XLA is a low-level infrastructure, so the XLA language is as strict and 37 | explicit as possible, avoiding implicit and "magical" features that may make 38 | some computations slightly easier to define, at the cost of more assumptions 39 | baked into user code that will be difficult to change in the long term. If 40 | necessary, implicit and magical features can be added in client-level wrappers. 41 | 42 | With regard to broadcasting, an explicit broadcasting specification is required 43 | on operations between arrays of different ranks. This is different from Numpy, 44 | which infers the specification when possible. 45 | 46 | ## Broadcasting a lower-rank array onto a higher-rank array 47 | 48 | *Scalars* can always be broadcast over arrays without an explicit specification 49 | of broadcasting dimensions. An element-wise binary operation between a scalar 50 | and an array means applying the operation with the scalar for each element in 51 | the array. For example, adding a scalar to a matrix means producing a matrix 52 | each element of which is the sum of the scalar and the corresponding input 53 | matrix element. 54 | 55 | |1 2 3| + 7 = |8 9 10| 56 | |4 5 6| |11 12 13| 57 | 58 | Most broadcasting needs can be captured by using a tuple of dimensions on a 59 | binary operation. When the inputs to the operation have different ranks, this 60 | broadcasting tuple specifies which dimension(s) in the **higher-rank** array to 61 | match with the **lower-rank** array. 62 | 63 | Consider the previous example: instead of adding a scalar to a (2,3) matrix, add 64 | a vector of dimension (3) to a matrix of dimensions (2,3). *Without specifying 65 | broadcasting, this operation is invalid.* To correctly request matrix-vector 66 | addition, specify the broadcasting dimension to be (1), meaning the vector's 67 | dimension is matched to dimension 1 of the matrix. In 2D, if dimension 0 is 68 | considered as rows and dimension 1 as columns, this means that each element of 69 | the vector becomes a column of a size matching the number of rows in the matrix: 70 | 71 | |7 8 9| ==> |7 8 9| 72 | |7 8 9| 73 | 74 | As a more complex example, consider adding a 3-element vector (dimension (3)) to 75 | a 3x3 matrix (dimensions (3,3)). There are two ways broadcasting can happen for 76 | this example: 77 | 78 | (1) A broadcasting dimension of 1 can be used. Each vector element becomes a 79 | column and the vector is duplicated for each row in the matrix. 80 | 81 | |7 8 9| ==> |7 8 9| 82 | |7 8 9| 83 | |7 8 9| 84 | 85 | (2) A broadcasting dimension of 0 can be used. Each vector element becomes a row 86 | and the vector is duplicated for each column in the matrix. 87 | 88 | |7| ==> |7 7 7| 89 | |8| |8 8 8| 90 | |9| |9 9 9| 91 | 92 | > Note: when adding a 2x3 matrix to a 3-element vector, a broadcasting dimension 93 | > of 0 is invalid. 94 | 95 | The broadcasting dimensions can be a tuple that describes how a smaller rank 96 | shape is broadcast into a larger rank shape. For example, given a 2x3x4 cuboid 97 | and a 3x4 matrix, a broadcasting tuple (1,2) means matching the matrix to 98 | dimensions 1 and 2 of the cuboid. 
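XLA itself is not driven from numpy, but because the terminology is borrowed from it, the dimension-matching rule above can be illustrated with a short numpy sketch (illustration only, not how XLA is invoked):

```python
import numpy as np

m = np.arange(1, 10).reshape(3, 3)  # a 3x3 matrix
v = np.array([7, 8, 9])             # a 3-element vector

# Broadcasting dimension (1): the vector is matched to columns, so it is
# replicated across rows -- case (1) above.
by_dim1 = m + v.reshape(1, 3)

# Broadcasting dimension (0): the vector is matched to rows, so it is
# replicated across columns -- case (2) above.
by_dim0 = m + v.reshape(3, 1)

# The cuboid example: matching a 3x4 matrix to dimensions 1 and 2 of a
# 2x3x4 cuboid corresponds to inserting a leading axis of size 1.
cuboid = np.zeros((2, 3, 4))
matrix = np.ones((3, 4))
combined = cuboid + matrix.reshape(1, 3, 4)
```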
99 | 100 | This type of broadcast is used in the binary ops in `ComputationBuilder`, if the 101 | `broadcast_dimensions` argument is given. For example, see 102 | [ComputationBuilder::Add](https://www.tensorflow.org/code/tensorflow/compiler/xla/client/computation_builder.cc). 103 | In the XLA source code, this type of broadcasting is sometimes called "InDim" 104 | broadcasting. 105 | 106 | ### Formal definition 107 | 108 | The broadcasting attribute allows matching a lower-rank array to a higher-rank 109 | array, by specifying which dimensions of the higher-rank array to match. For 110 | example, for an array with dimensions MxNxPxQ, a vector with dimension T can be 111 | matched as follows: 112 | 113 | MxNxPxQ 114 | 115 | dim 3: T 116 | dim 2: T 117 | dim 1: T 118 | dim 0: T 119 | 120 | In each case, T has to be equal to the matching dimension of the higher-rank 121 | array. The vector's values are then broadcast from the matched dimension to all 122 | the other dimensions. 123 | 124 | To match a TxV matrix onto the MxNxPxQ array, a pair of broadcasting dimensions 125 | are used: 126 | 127 | MxNxPxQ 128 | dim 2,3: T V 129 | dim 1,2: T V 130 | dim 0,3: T V 131 | etc... 132 | 133 | The order of dimensions in the broadcasting tuple has to be the order in which 134 | the lower-rank array's dimensions are expected to match the higher-rank array's 135 | dimensions. The first element in the tuple says which dimension in the 136 | higher-rank array has to match dimension 0 in the lower-rank array. The second 137 | element for dimension 1, and so on. The order of broadcast dimensions has to be 138 | strictly increasing. For example, in the previous example it is illegal to match 139 | V to N and T to P; it is also illegal to match V to both P and N. 140 | 141 | ## Broadcasting similar-rank arrays with degenerate dimensions 142 | 143 | A related broadcasting problem is broadcasting two arrays that have the same 144 | rank but different dimension sizes. Similarly to Numpy's rules, this is only 145 | possible when the arrays are *compatible*. Two arrays are compatible when all 146 | their dimensions are compatible. Two dimensions are compatible if: 147 | 148 | * They are equal, or 149 | * One of them is 1 (a "degenerate" dimension) 150 | 151 | When two compatible arrays are encountered, the result shape has the maximum 152 | among the two inputs at every dimension index. 153 | 154 | Examples: 155 | 156 | 1. (2,1) and (2,3) broadcast to (2,3). 157 | 2. (1,2,5) and (7,2,5) broadcast to (7,2,5) 158 | 3. (7,2,5) and (7,1,5) broadcast to (7,2,5) 159 | 4. (7,2,5) and (7,2,6) are incompatible and cannot be broadcast. 160 | 161 | A special case arises, and is also supported, where each of the input arrays has 162 | a degenerate dimension at a different index. In this case, the result is an 163 | "outer operation": (2,1) and (1,3) broadcast to (2,3). For more examples, 164 | consult the [Numpy documentation on 165 | broadcasting](http://docs.scipy.org/doc/numpy/user/basics.broadcasting.html). 166 | 167 | ## Broadcast composition 168 | 169 | Broadcasting of a lower-rank array to a higher-rank array **and** broadcasting 170 | using degenerate dimensions can both be performed in the same binary operation. 171 | For example, a vector of size 4 and an matrix of size 1x2 can be added together 172 | using broadcast dimensions value of (0): 173 | 174 | |1 2 3 4| + [5 6] // [5 6] is a 1x2 matrix, not a vector. 175 | 176 | First the vector is broadcast up to rank 2 (matrix) using the broadcast 177 | dimensions. 
The single value (0) in the broadcast dimensions indicates that 178 | dimension zero of the vector matches to dimension zero of the matrix. This 179 | produces an matrix of size 4xM where the value M is chosen to match the 180 | corresponding dimension size in the 1x2 array. Therefore, a 4x2 matrix is 181 | produced: 182 | 183 | |1 1| + [5 6] 184 | |2 2| 185 | |3 3| 186 | |4 4| 187 | 188 | Then "degenerate dimension broadcasting" broadcasts dimension zero of the 1x2 189 | matrix to match the corresponding dimension size of the right hand side: 190 | 191 | |1 1| + |5 6| |6 7| 192 | |2 2| + |5 6| = |7 8| 193 | |3 3| + |5 6| |8 9| 194 | |4 4| + |5 6| |9 10| 195 | 196 | A more complicated example is a matrix of size 1x2 added to an array of size 197 | 4x3x1 using broadcast dimensions of (1, 2). First the 1x2 matrix is broadcast up 198 | to rank 3 using the broadcast dimensions to produces an intermediate Mx1x2 array 199 | where the dimension size M is determined by the size of the larger operand (the 200 | 4x3x1 array) producing a 4x1x2 intermediate array. The M is at dimension 0 201 | (left-most dimension) because the dimensions 1 and 2 are mapped to the 202 | dimensions of the original 1x2 matrix as the broadcast dimension are (1, 2). 203 | This intermediate array can be added to the 4x3x1 matrix using broadcasting of 204 | degenerate dimensions to produce a 4x3x2 array result. 205 | -------------------------------------------------------------------------------- /install/install_java.md: -------------------------------------------------------------------------------- 1 | # Installing TensorFlow for Java 2 | 3 | TensorFlow provides APIs for use in Java programs. These APIs are particularly 4 | well-suited to loading models created in Python and executing them within a 5 | Java application. This guide explains how to install 6 | [TensorFlow for Java](https://www.tensorflow.org/api_docs/java/reference/org/tensorflow/package-summary) 7 | and use it in a Java application. 8 | 9 | **WARNING:** The TensorFlow Java API is *not* covered by the TensorFlow 10 | [API stability guarantees](https://www.tensorflow.org/programmers_guide/version_semantics). 11 | 12 | 13 | ## Supported Platforms 14 | 15 | TensorFlow for Java is supported on the following operating systems: 16 | 17 | * Linux 18 | * Mac OS X 19 | * Windows 20 | * Android 21 | 22 | The installation instructions for Android are in a separate 23 | [Android TensorFlow Support page](https://www.tensorflow.org/code/tensorflow/contrib/android). 24 | After installation, please see this 25 | [complete example](https://www.tensorflow.org/code/tensorflow/examples/android) 26 | of TensorFlow on Android. 27 | 28 | ## Using TensorFlow with a Maven project 29 | 30 | If your project uses [Apache Maven](https://maven.apache.org), then add the 31 | following to the project's `pom.xml` to use the TensorFlow Java APIs: 32 | 33 | ```xml 34 | 35 | org.tensorflow 36 | tensorflow 37 | 1.1.0 38 | 39 | ``` 40 | 41 | That's all. 42 | 43 | ### Example 44 | 45 | As an example, these steps will create a Maven project that uses TensorFlow: 46 | 47 | 1. Create the project's `pom.xml`: 48 | 49 | 50 | 51 | 4.0.0 52 | org.myorg 53 | label-image 54 | 1.0-SNAPSHOT 55 | 56 | HelloTF 57 | 58 | 59 | 1.7 60 | 1.7 61 | 62 | 63 | 64 | org.tensorflow 65 | tensorflow 66 | 1.1.0 67 | 68 | 69 | 70 | 71 | 72 | 2. 
Create the source file (`src/main/java/HelloTF.java`): 73 | 74 | 75 | import org.tensorflow.Graph; 76 | import org.tensorflow.Session; 77 | import org.tensorflow.Tensor; 78 | import org.tensorflow.TensorFlow; 79 | 80 | public class HelloTF { 81 | public static void main(String[] args) throws Exception { 82 | try (Graph g = new Graph()) { 83 | final String value = "Hello from " + TensorFlow.version(); 84 | 85 | // Construct the computation graph with a single operation, a constant 86 | // named "MyConst" with a value "value". 87 | try (Tensor t = Tensor.create(value.getBytes("UTF-8"))) { 88 | // The Java API doesn't yet include convenience functions for adding operations. 89 | g.opBuilder("Const", "MyConst").setAttr("dtype", t.dataType()).setAttr("value", t).build(); 90 | } 91 | 92 | // Execute the "MyConst" operation in a Session. 93 | try (Session s = new Session(g); 94 | Tensor output = s.runner().fetch("MyConst").run().get(0)) { 95 | System.out.println(new String(output.bytesValue(), "UTF-8")); 96 | } 97 | } 98 | } 99 | } 100 | 101 | 102 | 3. Compile and execute: 103 | 104 |
 # Use -q to hide logging from the mvn tool
105 |      mvn -q compile exec:java
106 | 107 | 108 | The preceding command should output Hello from version. If it 109 | does, you've successfully set up TensorFlow for Java and are ready to use it in 110 | Maven projects. If not, check 111 | [Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) 112 | for possible solutions. You can skip reading the rest of this document. 113 | 114 | ## Using TensorFlow with JDK 115 | 116 | This section describes how to use TensorFlow with the `java` and `javac` 117 | commands from a JDK installation. If your project uses Apache Maven, then 118 | refer to the simpler instructions above instead. 119 | 120 | ### Install on Linux or Mac OS 121 | 122 | Take the following steps to install TensorFlow for Java on Linux or Mac OS: 123 | 124 | 1. Download 125 | [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar), 126 | which is the TensorFlow Java Archive (JAR). 127 | 128 | 2. Decide whether you will run TensorFlow for Java on CPU(s) only or with 129 | the help of GPU(s). To help you decide, read the section entitled 130 | "Determine which TensorFlow to install" in one of the following guides: 131 | 132 | * @{$install_linux#determine_which_tensorflow_to_install$Installing TensorFlow on Linux} 133 | * @{$install_mac#determine_which_tensorflow_to_install$Installing TensorFlow on Mac OS} 134 | 135 | 3. Download and extract the appropriate Java Native Interface (JNI) 136 | file for your operating system and processor support by running the 137 | following shell commands: 138 | 139 | 140 | TF_TYPE="cpu" # Default processor is CPU. If you want GPU, set to "gpu" 141 | OS=$(uname -s | tr '[:upper:]' '[:lower:]') 142 | mkdir -p ./jni 143 | curl -L \ 144 | "https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-${TF_TYPE}-${OS}-x86_64-1.1.0.tar.gz" | 145 | tar -xz -C ./jni 146 | 147 | ### Install on Windows 148 | 149 | Take the following steps to install TensorFlow for Java on Windows: 150 | 151 | 1. Download 152 | [libtensorflow.jar](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow-1.1.0.jar), 153 | which is the TensorFlow Java Archive (JAR). 154 | 2. Download the following Java Native Interface (JNI) file appropriate for 155 | [TensorFlow for Java on Windows](https://storage.googleapis.com/tensorflow/libtensorflow/libtensorflow_jni-cpu-windows-x86_64-1.1.0.zip). 156 | 3. Extract this .zip file. 157 | 158 | 159 | 160 | ### Validate the installation 161 | 162 | After installing TensorFlow for Java, validate your installation by entering 163 | the following code into a file named `HelloTF.java`: 164 | 165 | ```java 166 | import org.tensorflow.Graph; 167 | import org.tensorflow.Session; 168 | import org.tensorflow.Tensor; 169 | import org.tensorflow.TensorFlow; 170 | 171 | public class HelloTF { 172 | public static void main(String[] args) throws Exception { 173 | try (Graph g = new Graph()) { 174 | final String value = "Hello from " + TensorFlow.version(); 175 | 176 | // Construct the computation graph with a single operation, a constant 177 | // named "MyConst" with a value "value". 178 | try (Tensor t = Tensor.create(value.getBytes("UTF-8"))) { 179 | // The Java API doesn't yet include convenience functions for adding operations. 180 | g.opBuilder("Const", "MyConst").setAttr("dtype", t.dataType()).setAttr("value", t).build(); 181 | } 182 | 183 | // Execute the "MyConst" operation in a Session. 
184 | try (Session s = new Session(g); 185 | Tensor output = s.runner().fetch("MyConst").run().get(0)) { 186 | System.out.println(new String(output.bytesValue(), "UTF-8")); 187 | } 188 | } 189 | } 190 | } 191 | ``` 192 | 193 | And use the instructions below to compile and run `HelloTF.java`. 194 | 195 | 196 | ### Compiling 197 | 198 | When compiling a Java program that uses TensorFlow, the downloaded `.jar` 199 | must be part of your `classpath`. For example, you can include the 200 | downloaded `.jar` in your `classpath` by using the `-cp` compilation flag 201 | as follows: 202 | 203 |
javac -cp libtensorflow-1.1.0.jar HelloTF.java
204 | 205 | 206 | ### Running 207 | 208 | To execute a Java program that depends on TensorFlow, ensure that the following 209 | two files are available to the JVM: 210 | 211 | * the downloaded `.jar` file 212 | * the extracted JNI library 213 | 214 | For example, the following command line executes the `HelloTF` program: 215 | 216 |
java -cp libtensorflow-1.1.0.jar:. -Djava.library.path=./jni HelloTF
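Note: the command above uses the Linux/Mac OS classpath separator. On Windows, the separator is a semicolon, so the equivalent command would look something like `java -cp libtensorflow-1.1.0.jar;. -Djava.library.path=./jni HelloTF` (adjust the path to wherever you extracted the JNI library).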
217 | 218 | If the program prints Hello from version, you've successfully 219 | installed TensorFlow for Java and are ready to use the API. If the program 220 | outputs something else, check 221 | [Stack Overflow](http://stackoverflow.com/questions/tagged/tensorflow) 222 | for possible solutions. 223 | 224 | 225 | ### Advanced Example 226 | 227 | For a more sophisticated example, see 228 | [LabelImage.java](https://www.tensorflow.org/code/tensorflow/java/src/main/java/org/tensorflow/examples/LabelImage.java), 229 | which recognizes objects in an image. 230 | 231 | 232 | ## Building from source code 233 | 234 | TensorFlow is open-source. You may build TensorFlow for Java from the 235 | TensorFlow source code by following the instructions in a 236 | [separate document](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/java/README.md). 237 | -------------------------------------------------------------------------------- /extend/new_data_formats.md: -------------------------------------------------------------------------------- 1 | # Custom Data Readers 2 | 3 | PREREQUISITES: 4 | 5 | * Some familiarity with C++. 6 | * Must have 7 | @{$install_sources$downloaded TensorFlow source}, and be 8 | able to build it. 9 | 10 | We divide the task of supporting a file format into two pieces: 11 | 12 | * File formats: We use a *Reader* Op to read a *record* (which can be any 13 | string) from a file. 14 | * Record formats: We use decoder or parsing Ops to turn a string record 15 | into tensors usable by TensorFlow. 16 | 17 | For example, to read a 18 | [CSV file](https://en.wikipedia.org/wiki/Comma-separated_values), we use 19 | @{tf.TextLineReader$a Reader for text files} 20 | followed by 21 | @{tf.decode_csv$an Op that parses CSV data from a line of text}. 22 | 23 | [TOC] 24 | 25 | ## Writing a Reader for a file format 26 | 27 | A `Reader` is something that reads records from a file. There are some examples 28 | of Reader Ops already built into TensorFlow: 29 | 30 | * @{tf.TFRecordReader} 31 | ([source in `kernels/tf_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/tf_record_reader_op.cc)) 32 | * @{tf.FixedLengthRecordReader} 33 | ([source in `kernels/fixed_length_record_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/fixed_length_record_reader_op.cc)) 34 | * @{tf.TextLineReader} 35 | ([source in `kernels/text_line_reader_op.cc`](https://www.tensorflow.org/code/tensorflow/core/kernels/text_line_reader_op.cc)) 36 | 37 | You can see these all expose the same interface, the only differences 38 | are in their constructors. The most important method is `read`. 39 | It takes a queue argument, which is where it gets filenames to 40 | read from whenever it needs one (e.g. when the `read` op first runs, or 41 | the previous `read` reads the last record from a file). It produces 42 | two scalar tensors: a string key and a string value. 43 | 44 | To create a new reader called `SomeReader`, you will need to: 45 | 46 | 1. In C++, define a subclass of 47 | [`tensorflow::ReaderBase`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h) 48 | called `SomeReader`. 49 | 2. In C++, register a new reader op and kernel with the name `"SomeReader"`. 50 | 3. In Python, define a subclass of @{tf.ReaderBase} called `SomeReader`. 51 | 52 | You can put all the C++ code in a file in 53 | `tensorflow/core/user_ops/some_reader_op.cc`. 
The code to read a file will live 54 | in a descendant of the C++ `ReaderBase` class, which is defined in 55 | [`tensorflow/core/kernels/reader_base.h`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_base.h). 56 | You will need to implement the following methods: 57 | 58 | * `OnWorkStartedLocked`: open the next file 59 | * `ReadLocked`: read a record or report EOF/error 60 | * `OnWorkFinishedLocked`: close the current file, and 61 | * `ResetLocked`: get a clean slate after, e.g., an error 62 | 63 | These methods have names ending in "Locked" since `ReaderBase` makes sure 64 | to acquire a mutex before calling any of these methods, so you generally don't 65 | have to worry about thread safety (though that only protects the members of the 66 | class, not global state). 67 | 68 | For `OnWorkStartedLocked`, the name of the file to open is the value returned by 69 | the `current_work()` method. `ReadLocked` has this signature: 70 | 71 | ```c++ 72 | Status ReadLocked(string* key, string* value, bool* produced, bool* at_end) 73 | ``` 74 | 75 | If `ReadLocked` successfully reads a record from the file, it should fill in: 76 | 77 | * `*key`: with an identifier for the record, that a human could use to find 78 | this record again. You can include the filename from `current_work()`, 79 | and append a record number or whatever. 80 | * `*value`: with the contents of the record. 81 | * `*produced`: set to `true`. 82 | 83 | If you hit the end of a file (EOF), set `*at_end` to `true`. In either case, 84 | return `Status::OK()`. If there is an error, simply return it using one of the 85 | helper functions from 86 | [`tensorflow/core/lib/core/errors.h`](https://www.tensorflow.org/code/tensorflow/core/lib/core/errors.h) 87 | without modifying any arguments. 88 | 89 | Next you will create the actual Reader op. It will help if you are familiar 90 | with @{$adding_an_op$the adding an op how-to}. The main steps 91 | are: 92 | 93 | * Registering the op. 94 | * Define and register an `OpKernel`. 95 | 96 | To register the op, you will use a `REGISTER_OP` call defined in 97 | [`tensorflow/core/framework/op.h`](https://www.tensorflow.org/code/tensorflow/core/framework/op.h). 98 | Reader ops never take any input and always have a single output with type 99 | `resource`. They should have string `container` and `shared_name` attrs. 100 | You may optionally define additional attrs 101 | for configuration or include documentation in a `Doc`. For examples, see 102 | [`tensorflow/core/ops/io_ops.cc`](https://www.tensorflow.org/code/tensorflow/core/ops/io_ops.cc), 103 | e.g.: 104 | 105 | ```c++ 106 | #include "tensorflow/core/framework/op.h" 107 | 108 | REGISTER_OP("TextLineReader") 109 | .Output("reader_handle: resource") 110 | .Attr("skip_header_lines: int = 0") 111 | .Attr("container: string = ''") 112 | .Attr("shared_name: string = ''") 113 | .SetIsStateful() 114 | .SetShapeFn(shape_inference::ScalarShape) 115 | .Doc(R"doc( 116 | A Reader that outputs the lines of a file delimited by '\n'. 117 | )doc"); 118 | ``` 119 | 120 | To define an `OpKernel`, Readers can use the shortcut of descending from 121 | `ReaderOpKernel`, defined in 122 | [`tensorflow/core/framework/reader_op_kernel.h`](https://www.tensorflow.org/code/tensorflow/core/framework/reader_op_kernel.h), 123 | and implement a constructor that calls `SetReaderFactory`. After defining 124 | your class, you will need to register it using `REGISTER_KERNEL_BUILDER(...)`. 
125 | An example with no attrs: 126 | 127 | ```c++ 128 | #include "tensorflow/core/framework/reader_op_kernel.h" 129 | 130 | class TFRecordReaderOp : public ReaderOpKernel { 131 | public: 132 | explicit TFRecordReaderOp(OpKernelConstruction* context) 133 | : ReaderOpKernel(context) { 134 | Env* env = context->env(); 135 | SetReaderFactory([this, env]() { return new TFRecordReader(name(), env); }); 136 | } 137 | }; 138 | 139 | REGISTER_KERNEL_BUILDER(Name("TFRecordReader").Device(DEVICE_CPU), 140 | TFRecordReaderOp); 141 | ``` 142 | 143 | An example with attrs: 144 | 145 | ```c++ 146 | #include "tensorflow/core/framework/reader_op_kernel.h" 147 | 148 | class TextLineReaderOp : public ReaderOpKernel { 149 | public: 150 | explicit TextLineReaderOp(OpKernelConstruction* context) 151 | : ReaderOpKernel(context) { 152 | int skip_header_lines = -1; 153 | OP_REQUIRES_OK(context, 154 | context->GetAttr("skip_header_lines", &skip_header_lines)); 155 | OP_REQUIRES(context, skip_header_lines >= 0, 156 | errors::InvalidArgument("skip_header_lines must be >= 0 not ", 157 | skip_header_lines)); 158 | Env* env = context->env(); 159 | SetReaderFactory([this, skip_header_lines, env]() { 160 | return new TextLineReader(name(), skip_header_lines, env); 161 | }); 162 | } 163 | }; 164 | 165 | REGISTER_KERNEL_BUILDER(Name("TextLineReader").Device(DEVICE_CPU), 166 | TextLineReaderOp); 167 | ``` 168 | 169 | The last step is to add the Python wrapper. You can either do this by 170 | @{$adding_an_op#building_the_op_library$compiling a dynamic library} 171 | or, if you are building TensorFlow from source, adding to `user_ops.py`. 172 | For the latter, you will import `tensorflow.python.ops.io_ops` in 173 | [`tensorflow/python/user_ops/user_ops.py`](https://www.tensorflow.org/code/tensorflow/python/user_ops/user_ops.py) 174 | and add a descendant of [`io_ops.ReaderBase`](https://www.tensorflow.org/code/tensorflow/python/ops/io_ops.py). 175 | 176 | ```python 177 | from tensorflow.python.framework import ops 178 | from tensorflow.python.ops import common_shapes 179 | from tensorflow.python.ops import io_ops 180 | 181 | class SomeReader(io_ops.ReaderBase): 182 | 183 | def __init__(self, name=None): 184 | rr = gen_user_ops.some_reader(name=name) 185 | super(SomeReader, self).__init__(rr) 186 | 187 | 188 | ops.NotDifferentiable("SomeReader") 189 | ``` 190 | 191 | You can see some examples in 192 | [`tensorflow/python/ops/io_ops.py`](https://www.tensorflow.org/code/tensorflow/python/ops/io_ops.py). 193 | 194 | ## Writing an Op for a record format 195 | 196 | Generally this is an ordinary op that takes a scalar string record as input, and 197 | so follow @{$adding_an_op$the instructions to add an Op}. 198 | You may optionally take a scalar string key as input, and include that in error 199 | messages reporting improperly formatted data. That way users can more easily 200 | track down where the bad data came from. 201 | 202 | Examples of Ops useful for decoding records: 203 | 204 | * @{tf.parse_single_example} 205 | (and 206 | @{tf.parse_example}) 207 | * @{tf.decode_csv} 208 | * @{tf.decode_raw} 209 | 210 | Note that it can be useful to use multiple Ops to decode a particular record 211 | format. For example, you may have an image saved as a string in 212 | [a `tf.train.Example` protocol buffer](https://www.tensorflow.org/code/tensorflow/core/example/example.proto). 
213 | Depending on the format of that image, you might take the corresponding output 214 | from a 215 | @{tf.parse_single_example} 216 | op and call @{tf.image.decode_jpeg}, 217 | @{tf.image.decode_png}, or 218 | @{tf.decode_raw}. It is common to 219 | take the output of `tf.decode_raw` and use 220 | @{tf.slice} and 221 | @{tf.reshape} to extract pieces. 222 | -------------------------------------------------------------------------------- /extend/tool_developers/index.md: -------------------------------------------------------------------------------- 1 | # A Tool Developer's Guide to TensorFlow Model Files 2 | 3 | Most users shouldn't need to care about the internal details of how TensorFlow 4 | stores data on disk, but you might if you're a tool developer. For example, you 5 | may want to analyze models, or convert back and forth between TensorFlow and 6 | other formats. This guide tries to explain some of the details of how you can 7 | work with the main files that hold model data, to make it easier to develop 8 | those kind of tools. 9 | 10 | [TOC] 11 | 12 | ## Protocol Buffers 13 | 14 | All of TensorFlow's file formats are based on 15 | [Protocol Buffers](https://developers.google.com/protocol-buffers/?hl=en), so to 16 | start it's worth getting familiar with how they work. The summary is that you 17 | define data structures in text files, and the protobuf tools generate classes in 18 | C, Python, and other languages that can load, save, and access the data in a 19 | friendly way. We often refer to Protocol Buffers as protobufs, and I'll use 20 | that convention in this guide. 21 | 22 | ## GraphDef 23 | 24 | The foundation of computation in TensorFlow is the `Graph` object. This holds a 25 | network of nodes, each representing one operation, connected to each other as 26 | inputs and outputs. After you've created a `Graph` object, you can save it out 27 | by calling `as_graph_def()`, which returns a `GraphDef` object. 28 | 29 | The GraphDef class is an object created by the ProtoBuf library from the 30 | definition in 31 | [tensorflow/core/framework/graph.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/graph.proto). The protobuf tools parse 32 | this text file, and generate the code to load, store, and manipulate graph 33 | definitions. If you see a standalone TensorFlow file representing a model, it's 34 | likely to contain a serialized version of one of these `GraphDef` objects 35 | saved out by the protobuf code. 36 | 37 | This generated code is used to save and load the GraphDef files from disk. The code that actually loads the model looks like this: 38 | 39 | ```python 40 | graph_def = graph_pb2.GraphDef() 41 | ``` 42 | 43 | This line creates an empty `GraphDef` object, the class that's been created 44 | from the textual definition in graph.proto. This is the object we're going to 45 | populate with the data from our file. 46 | 47 | ```python 48 | with open(FLAGS.graph, "rb") as f: 49 | ``` 50 | 51 | Here we get a file handle for the path we've passed in to the script 52 | 53 | ```python 54 | if FLAGS.input_binary: 55 | graph_def.ParseFromString(f.read()) 56 | else: 57 | text_format.Merge(f.read(), graph_def) 58 | ``` 59 | 60 | ## Text or Binary? 61 | 62 | There are actually two different formats that a ProtoBuf can be saved in. 63 | TextFormat is a human-readable form, which makes it nice for debugging and 64 | editing, but can get large when there's numerical data like weights stored in 65 | it. 
You can see a small example of that in 66 | [graph_run_run2.pbtxt](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/tensorboard/components/tf_tensorboard/test/data/graph_run_run2.pbtxt). 67 | 68 | Binary format files are a lot smaller than their text equivalents, even though 69 | they're not as readable for us. In this script, we ask the user to supply a 70 | flag indicating whether the input file is binary or text, so we know the right 71 | function to call. You can find an example of a large binary file inside the 72 | [inception_v3 archive](https://storage.googleapis.com/download.tensorflow.org/models/inception_v3_2016_08_28_frozen.pb.tar.gz), 73 | as `inception_v3_2016_08_28_frozen.pb`. 74 | 75 | The API itself can be a bit confusing - the binary call is actually 76 | `ParseFromString()`, whereas you use a utility function from the `text_format` 77 | module to load textual files. 78 | 79 | ## Nodes 80 | 81 | Once you've loaded a file into the `graph_def` variable, you can now access the 82 | data inside it. For most practical purposes, the important section is the list 83 | of nodes stored in the node member. Here's the code that loops through those: 84 | 85 | ```python 86 | for node in graph_def.node 87 | ``` 88 | 89 | Each node is a `NodeDef` object, defined in 90 | [tensorflow/core/framework/node_def.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/node_def.proto). These 91 | are the fundamental building blocks of TensorFlow graphs, with each one defining 92 | a single operation along with its input connections. Here are the members of a 93 | `NodeDef`, and what they mean. 94 | 95 | ### `name` 96 | 97 | Every node should have a unique identifier that's not used by any other nodes 98 | in the graph. If you don't specify one as you're building a graph using the 99 | Python API, one reflecting the name of operation, such as "MatMul", 100 | concatenated with a monotonically increasing number, such as "5", will be 101 | picked for you. The name is used when defining the connections between nodes, 102 | and when setting inputs and outputs for the whole graph when it's run. 103 | 104 | ### `op` 105 | 106 | This defines what operation to run, for example `"Add"`, `"MatMul"`, or 107 | `"Conv2D"`. When a graph is run, this op name is looked up in a registry to 108 | find an implementation. The registry is populated by calls to the 109 | `REGISTER_OP()` macro, like those in 110 | [tensorflow/core/ops/nn_ops.cc](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/ops/nn_ops.cc). 111 | 112 | ### `input` 113 | 114 | A list of strings, each one of which is the name of another node, optionally 115 | followed by a colon and an output port number. For example, a node with two 116 | inputs might have a list like `["some_node_name", "another_node_name"]`, which 117 | is equivalent to `["some_node_name:0", "another_node_name:0"]`, and defines the 118 | node's first input as the first output from the node with the name 119 | `"some_node_name"`, and a second input from the first output of 120 | `"another_node_name"` 121 | 122 | ### `device` 123 | 124 | In most cases you can ignore this, since it defines where to run a node in a 125 | distributed environment, or when you want to force the operation onto CPU or 126 | GPU. 127 | 128 | ### `attr` 129 | 130 | This is a key/value store holding all the attributes of a node. 
These are the 131 | permanent properties of nodes, things that don't change at runtime such as the 132 | size of filters for convolutions, or the values of constant ops. Because there 133 | can be so many different types of attribute values, from strings, to ints, to 134 | arrays of tensor values, there's a separate protobuf file defining the data 135 | structure that holds them, in 136 | [tensorflow/core/framework/attr_value.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/attr_value.proto). 137 | 138 | Each attribute has a unique name string, and the expected attributes are listed 139 | when the operation is defined. If an attribute isn't present in a node, but it 140 | has a default listed in the operation definition, that default is used when the 141 | graph is created. 142 | 143 | You can access all of these members by calling `node.name`, `node.op`, etc. in 144 | Python. The list of nodes stored in the `GraphDef` is a full definition of the 145 | model architecture. 146 | 147 | ## Freezing 148 | 149 | One confusing part about this is that the weights usually aren't stored inside 150 | the file format during training. Instead, they're held in separate checkpoint 151 | files, and there are `Variable` ops in the graph that load the latest values 152 | when they're initialized. It's often not very convenient to have separate files 153 | when you're deploying to production, so there's the 154 | [freeze_graph.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py) script that takes a graph definition and a set 155 | of checkpoints and freezes them together into a single file. 156 | 157 | What this does is load the `GraphDef`, pull in the values for all the variables 158 | from the latest checkpoint file, and then replace each `Variable` op with a 159 | `Const` that has the numerical data for the weights stored in its attributes 160 | It then strips away all the extraneous nodes that aren't used for forward 161 | inference, and saves out the resulting `GraphDef` into an output file. 162 | 163 | ## Weight Formats 164 | 165 | If you're dealing with TensorFlow models that represent neural networks, one of 166 | the most common problems is extracting and interpreting the weight values. A 167 | common way to store them, for example in graphs created by the freeze_graph 168 | script, is as `Const` ops containing the weights as `Tensors`. These are 169 | defined in 170 | [tensorflow/core/framework/tensor.proto](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/framework/tensor.proto), and contain information 171 | about the size and type of the data, as well as the values themselves. In 172 | Python, you get a `TensorProto` object from a `NodeDef` representing a `Const` 173 | op by calling something like `some_node_def.attr['value'].tensor`. 174 | 175 | This will give you an object representing the weights data. The data itself 176 | will be stored in one of the lists with the suffix _val as indicated by the 177 | type of the object, for example `float_val` for 32-bit float data types. 178 | 179 | The ordering of convolution weight values is often tricky to deal with when 180 | converting between different frameworks. In TensorFlow, the filter weights for 181 | the `Conv2D` operation are stored on the second input, and are expected to be 182 | in the order `[filter_height, filter_width, input_depth, output_depth]`, where 183 | filter_count increasing by one means moving to an adjacent value in memory. 
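Putting these pieces together, here is a minimal sketch of a tool that loads a frozen, binary `GraphDef` and prints the shape of every `Const` weight tensor. It uses the internal `tensor_util` module and a hypothetical file path, so treat it as an illustration rather than a supported recipe:

```python
from tensorflow.core.framework import graph_pb2
from tensorflow.python.framework import tensor_util  # internal, non-stable module

graph_def = graph_pb2.GraphDef()
with open("/tmp/frozen_graph.pb", "rb") as f:  # hypothetical path to a frozen model
  graph_def.ParseFromString(f.read())          # binary format, per the section above

for node in graph_def.node:
  if node.op == "Const":
    # Convert the TensorProto stored in the 'value' attr to a numpy array.
    weights = tensor_util.MakeNdarray(node.attr["value"].tensor)
    print(node.name, weights.shape, weights.dtype)
```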
184 | 185 | Hopefully this rundown gives you a better idea of what's going on inside 186 | TensorFlow model files, and will help you if you ever need to manipulate them. 187 | -------------------------------------------------------------------------------- /programmers_guide/variables.md: -------------------------------------------------------------------------------- 1 | # Variables: Creation, Initialization, Saving, and Loading 2 | 3 | When you train a model, you use @{$python/state_ops$variables} 4 | to hold and update parameters. Variables are in-memory buffers containing 5 | tensors. They must be explicitly initialized and can be saved to disk during 6 | and after training. You can later restore saved values to exercise or analyze 7 | the model. 8 | 9 | This document references the following TensorFlow classes. Follow the links to 10 | their reference manual for a complete description of their API: 11 | 12 | * The @{tf.Variable} class. 13 | * The @{tf.train.Saver} class. 14 | 15 | 16 | ## Creation 17 | 18 | When you create a @{$python/state_ops$Variable} you pass a 19 | `Tensor` as its initial value to the `Variable()` constructor. TensorFlow 20 | provides a collection of ops that produce tensors often used for initialization 21 | from @{$python/constant_op$constants or random values}. 22 | 23 | Note that all these ops require you to specify the shape of the tensors. That 24 | shape automatically becomes the shape of the variable. Variables generally 25 | have a fixed shape, but TensorFlow provides advanced mechanisms to reshape 26 | variables. 27 | 28 | ```python 29 | # Create two variables. 30 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 31 | name="weights") 32 | biases = tf.Variable(tf.zeros([200]), name="biases") 33 | ``` 34 | 35 | Calling `tf.Variable()` adds several ops to the graph: 36 | 37 | * A `variable` op that holds the variable value. 38 | * An initializer op that sets the variable to its initial value. This is 39 | actually a `tf.assign` op. 40 | * The ops for the initial value, such as the `zeros` op for the `biases` 41 | variable in the example are also added to the graph. 42 | 43 | The value returned by `tf.Variable()` value is an instance of the Python class 44 | `tf.Variable`. 45 | 46 | ### Device placement 47 | 48 | A variable can be pinned to a particular device when it is created, using a 49 | @{tf.device$`with tf.device(...):`} block: 50 | 51 | ```python 52 | # Pin a variable to CPU. 53 | with tf.device("/cpu:0"): 54 | v = tf.Variable(...) 55 | 56 | # Pin a variable to GPU. 57 | with tf.device("/gpu:0"): 58 | v = tf.Variable(...) 59 | 60 | # Pin a variable to a particular parameter server task. 61 | with tf.device("/job:ps/task:7"): 62 | v = tf.Variable(...) 63 | ``` 64 | 65 | **N.B.** Operations that mutate a variable, such as 66 | @{tf.Variable.assign} and the parameter 67 | update operations in a 68 | @{tf.train.Optimizer} *must* run on 69 | the same device as the variable. Incompatible device placement directives will 70 | be ignored when creating these operations. 71 | 72 | Device placement is particularly important when running in a replicated 73 | setting. See 74 | @{tf.train.replica_device_setter} 75 | for details of a device function that can simplify the configuration for devices 76 | for a replicated model. 77 | 78 | ## Initialization 79 | 80 | Variable initializers must be run explicitly before other ops in your model can 81 | be run. 
The easiest way to do that is to add an op that runs all the variable 82 | initializers, and run that op before using the model. 83 | 84 | You can alternatively restore variable values from a checkpoint file, see 85 | below. 86 | 87 | Use `tf.global_variables_initializer()` to add an op to run variable initializers. 88 | Only run that op after you have fully constructed your model and launched it in 89 | a session. 90 | 91 | ```python 92 | # Create two variables. 93 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 94 | name="weights") 95 | biases = tf.Variable(tf.zeros([200]), name="biases") 96 | ... 97 | # Add an op to initialize the variables. 98 | init_op = tf.global_variables_initializer() 99 | 100 | # Later, when launching the model 101 | with tf.Session() as sess: 102 | # Run the init operation. 103 | sess.run(init_op) 104 | ... 105 | # Use the model 106 | ... 107 | ``` 108 | 109 | ### Initialization from another Variable 110 | 111 | You sometimes need to initialize a variable from the initial value of another 112 | variable. As the op added by `tf.global_variables_initializer()` initializes all 113 | variables in parallel you have to be careful when this is needed. 114 | 115 | To initialize a new variable from the value of another variable use the other 116 | variable's `initialized_value()` property. You can use the initialized value 117 | directly as the initial value for the new variable, or you can use it as any 118 | other tensor to compute a value for the new variable. 119 | 120 | 121 | ```python 122 | # Create a variable with a random value. 123 | weights = tf.Variable(tf.random_normal([784, 200], stddev=0.35), 124 | name="weights") 125 | # Create another variable with the same value as 'weights'. 126 | w2 = tf.Variable(weights.initialized_value(), name="w2") 127 | # Create another variable with twice the value of 'weights' 128 | w_twice = tf.Variable(weights.initialized_value() * 2.0, name="w_twice") 129 | ``` 130 | 131 | ### Custom Initialization 132 | 133 | The convenience function `tf.global_variables_initializer()` adds an op to 134 | initialize *all variables* in the model. You can also pass an explicit list of 135 | variables to initialize to `tf.variables_initializer`. See the 136 | @{$python/state_ops$Variables Documentation} for more options, 137 | including checking if variables are initialized. 138 | 139 | ## Saving and Restoring 140 | 141 | The easiest way to save and restore a model is to use a `tf.train.Saver` object. 142 | The constructor adds `save` and `restore` ops to the graph for all, or a 143 | specified list, of the variables in the graph. The saver object provides 144 | methods to run these ops, specifying paths for the checkpoint files to write to 145 | or read from. 146 | 147 | ### Checkpoint Files 148 | 149 | Variables are saved in binary files that, roughly, contain a map from variable 150 | names to tensor values. 151 | 152 | When you create a `Saver` object, you can optionally choose names for the 153 | variables in the checkpoint files. By default, it uses the value of the 154 | @{tf.Variable.name} property for 155 | each variable. 156 | 157 | To understand what variables are in a checkpoint, you can use the 158 | [`inspect_checkpoint`](https://www.tensorflow.org/code/tensorflow/python/tools/inspect_checkpoint.py) 159 | library, and in particular, the `print_tensors_in_checkpoint_file` function. 160 | 161 | ### Saving Variables 162 | 163 | Create a `Saver` with `tf.train.Saver()` to manage all variables in 164 | the model. 
165 | 166 | ```python 167 | # Create some variables. 168 | v1 = tf.Variable(..., name="v1") 169 | v2 = tf.Variable(..., name="v2") 170 | ... 171 | # Add an op to initialize the variables. 172 | init_op = tf.global_variables_initializer() 173 | 174 | # Add ops to save and restore all the variables. 175 | saver = tf.train.Saver() 176 | 177 | # Later, launch the model, initialize the variables, do some work, save the 178 | # variables to disk. 179 | with tf.Session() as sess: 180 | sess.run(init_op) 181 | # Do some work with the model. 182 | .. 183 | # Save the variables to disk. 184 | save_path = saver.save(sess, "/tmp/model.ckpt") 185 | print("Model saved in file: %s" % save_path) 186 | ``` 187 | 188 | ### Restoring Variables 189 | 190 | The same `Saver` object is used to restore variables. Note that when you 191 | restore variables from a file you do not have to initialize them beforehand. 192 | 193 | ```python 194 | # Create some variables. 195 | v1 = tf.Variable(..., name="v1") 196 | v2 = tf.Variable(..., name="v2") 197 | ... 198 | # Add ops to save and restore all the variables. 199 | saver = tf.train.Saver() 200 | 201 | # Later, launch the model, use the saver to restore variables from disk, and 202 | # do some work with the model. 203 | with tf.Session() as sess: 204 | # Restore variables from disk. 205 | saver.restore(sess, "/tmp/model.ckpt") 206 | print("Model restored.") 207 | # Do some work with the model 208 | ... 209 | ``` 210 | 211 | ### Choosing which Variables to Save and Restore 212 | 213 | If you do not pass any argument to `tf.train.Saver()` the saver handles all 214 | variables in the graph. Each one of them is saved under the name that was 215 | passed when the variable was created. 216 | 217 | It is sometimes useful to explicitly specify names for variables in the 218 | checkpoint files. For example, you may have trained a model with a variable 219 | named `"weights"` whose value you want to restore in a new variable named 220 | `"params"`. 221 | 222 | It is also sometimes useful to only save or restore a subset of the variables 223 | used by a model. For example, you may have trained a neural net with 5 layers, 224 | and you now want to train a new model with 6 layers, restoring the parameters 225 | from the 5 layers of the previously trained model into the first 5 layers of 226 | the new model. 227 | 228 | You can easily specify the names and variables to save by passing to the 229 | `tf.train.Saver()` constructor a Python dictionary: keys are the 230 | names to use, values are the variables to manage. 231 | 232 | Notes: 233 | 234 | * You can create as many saver objects as you want if you need to save and 235 | restore different subsets of the model variables. The same variable can be 236 | listed in multiple saver objects, its value is only changed when the saver 237 | `restore()` method is run. 238 | 239 | * If you only restore a subset of the model variables at the start 240 | of a session, you have to run an initialize op for the other variables. See 241 | @{tf.variables_initializer} 242 | for more information. 243 | 244 | ```python 245 | # Create some variables. 246 | v1 = tf.Variable(..., name="v1") 247 | v2 = tf.Variable(..., name="v2") 248 | ... 249 | # Add ops to save and restore only 'v2' using the name "my_v2" 250 | saver = tf.train.Saver({"my_v2": v2}) 251 | # Use the saver object normally after that. 252 | ... 
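# Note: an illustrative sketch, not part of the original example. Because this
# saver only manages 'v2', the variable 'v1' is not restored by it and still
# needs an explicit initialization op before use (assuming a running session
# `sess`), for example:
#
#   sess.run(tf.variables_initializer([v1]))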
253 | ``` 254 | -------------------------------------------------------------------------------- /extend/architecture.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Architecture 2 | 3 | We designed TensorFlow for large-scale distributed training and inference, but 4 | it is also flexible enough to support experimentation with new machine 5 | learning models and system-level optimizations. 6 | 7 | This document describes the system architecture that makes possible this 8 | combination of scale and flexibility. It assumes that you have basic familiarity 9 | with TensorFlow programming concepts such as the computation graph, operations, 10 | and sessions. See @{$get_started$Getting Started} 11 | for an introduction to these topics. Some familiarity 12 | with @{$distributed$distributed TensorFlow} 13 | will also be helpful. 14 | 15 | This document is for developers who want to extend TensorFlow in some way not 16 | supported by current APIs, hardware engineers who want to optimize for 17 | TensorFlow, implementers of machine learning systems working on scaling and 18 | distribution, or anyone who wants to look under Tensorflow's hood. After 19 | reading it you should understand TensorFlow architecture well enough to read 20 | and modify the core TensorFlow code. 21 | 22 | ## Overview 23 | 24 | The TensorFlow runtime is a cross-platform library. Figure 1 illustrates its 25 | general architecture. A C API separates user level code in different languages 26 | from the core runtime. 27 | 28 | ![TensorFlow Layers](../images/layers.png){: width="300"} 29 | 30 | **Figure 1** 31 | 32 | 33 | This document focuses on the following layers: 34 | 35 | * **Client**: 36 | * Defines the computation as a dataflow graph. 37 | * Initiates graph execution using a [**session**]( 38 | https://www.tensorflow.org/code/tensorflow/python/client/session.py) 39 | * **Distributed Master** 40 | * Prunes a specific subgraph from the graph, as defined by the arguments 41 | to Session.run(). 42 | * Partitions the subgraph into multiple pieces that run in different 43 | processes and devices. 44 | * Distributes the graph pieces to worker services. 45 | * Initiates graph piece execution by worker services. 46 | * **Worker Services** (one for each task) 47 | * Schedule the execution of graph operations using kernel implementations 48 | appropriate to the available hardware (CPUs, GPUs, etc). 49 | * Send and receive operation results to and from other worker services. 50 | * **Kernel Implementations** 51 | * Perform the computation for individual graph operations. 52 | 53 | Figure 2 illustrates the interaction of these components. "/job:worker/task:0" and 54 | "/job:ps/task:0" are both tasks with worker services. "PS" stands for "parameter 55 | server": a task responsible for storing and updating the model's parameters. 56 | Other tasks send updates to these parameters as they work on optimizing the 57 | parameters. This particular division of labor between tasks is not required, but 58 | it is common for distributed training. 59 | 60 | ![TensorFlow Architecture Diagram](../images/diag1.svg){: width="500"} 61 | 62 | **Figure 2** 63 | 64 | Note that the Distributed Master and Worker Service only exist in 65 | distributed TensorFlow. The single-process version of TensorFlow includes a 66 | special Session implementation that does everything the distributed master does 67 | but only communicates with devices in the local process. 
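As a rough, illustrative sketch of how these components appear from the client's
point of view, the snippet below starts the two tasks from Figure 2 inside a
single Python process and runs one step against a worker service. The cluster
addresses, variable shapes, and single-process setup are assumptions made purely
for illustration; they are not part of the architecture itself.

```python
import tensorflow as tf

# Describe the cluster from Figure 2: one parameter-server task, one worker task.
cluster = tf.train.ClusterSpec({"ps": ["localhost:2222"],
                                "worker": ["localhost:2223"]})

# Each task runs a server that hosts its worker service (both started in this
# process only so that the example is self-contained).
ps_server = tf.train.Server(cluster, job_name="ps", task_index=0)
worker_server = tf.train.Server(cluster, job_name="worker", task_index=0)

# The client defines the dataflow graph, pinning the parameters to the ps task.
with tf.device("/job:ps/task:0"):
  weights = tf.Variable(tf.zeros([10]), name="weights")
with tf.device("/job:worker/task:0"):
  update = weights.assign_add(tf.ones([10]))

# Creating a session against a server's target hands the graph to the
# distributed master, which prunes and partitions it and dispatches the
# resulting pieces to the worker services.
with tf.Session(worker_server.target) as sess:
  sess.run(tf.global_variables_initializer())
  sess.run(update)
```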
68 | 69 | The following sections describe the core TensorFlow layers in greater detail and 70 | step through the processing of an example graph. 71 | 72 | ## Client 73 | 74 | Users write the client TensorFlow program that builds the computation graph. 75 | This program can either directly compose individual operations or use a 76 | convenience library like the Estimators API to compose neural network layers and 77 | other higher-level abstractions. TensorFlow supports multiple client 78 | languages, and we have prioritized Python and C++, because our internal users 79 | are most familiar with these languages. As features become more established, 80 | we typically port them to C++, so that users can access an optimized 81 | implementation from all client languages. Most of the training libraries are 82 | still Python-only, but C++ does have support for efficient inference. 83 | 84 | The client creates a session, which sends the graph definition to the 85 | distributed master as a @{tf.GraphDef} 86 | protocol buffer. When the client evaluates a node or nodes in the 87 | graph, the evaluation triggers a call to the distributed master to initiate 88 | computation. 89 | 90 | In Figure 3, the client has built a graph that applies weights (w) to a 91 | feature vector (x), adds a bias term (b) and saves the result in a variable 92 | (s). 93 | 94 | ![TensorFlow Architecture Diagram: Client](../images/graph_client.svg){: width="700"} 95 | 96 | **Figure 3** 97 | 98 | ### Code 99 | 100 | * @{tf.Session} 101 | 102 | ## Distributed master 103 | 104 | The distributed master: 105 | 106 | * prunes the graph to obtain the subgraph required to evaluate the nodes 107 | requested by the client, 108 | * partitions the graph to obtain graph pieces for 109 | each participating device, and 110 | * caches these pieces so that they may be re-used in subsequent steps. 111 | 112 | Since the master sees the overall computation for 113 | a step, it applies standard optimizations such as common subexpression 114 | elimination and constant folding. It then coordinates execution of the 115 | optimized subgraphs across a set of tasks. 116 | 117 | ![TensorFlow Architecture Diagram: Master](../images/graph_master_cln.svg){: width="700"} 118 | 119 | **Figure 4** 120 | 121 | 122 | Figure 5 shows a possible partition of our example graph. The distributed 123 | master has grouped the model parameters in order to place them together on the 124 | parameter server. 125 | 126 | ![Partitioned Graph](../images/graph_split1.svg){: width="700"} 127 | 128 | **Figure 5** 129 | 130 | 131 | Where graph edges are cut by the partition, the distributed master inserts 132 | send and receive nodes to pass information between the distributed tasks 133 | (Figure 6). 134 | 135 | ![Partitioned Graph](../images/graph_split2.svg){: width="700"} 136 | 137 | **Figure 6** 138 | 139 | 140 | The distributed master then ships the graph pieces to the distributed tasks. 
141 | 142 | ![Partitioned Graph](../images/graph_workers_cln.svg){: width="700"} 143 | 144 | **Figure 7** 145 | 146 | ### Code 147 | 148 | * [MasterService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/master_service.proto) 149 | * [Master interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/master_interface.h) 150 | 151 | ## Worker Service 152 | 153 | The worker service in each task: 154 | 155 | * handles requests from the master, 156 | * schedules the execution of the kernels for the operations that comprise a 157 | local subgraph, and 158 | * mediates direct communication between tasks. 159 | 160 | We optimize the worker service for running large graphs with low overhead. Our 161 | current implementation can execute tens of thousands of subgraphs per second, 162 | which enables a large number of replicas to make rapid, fine-grained training 163 | steps. The worker service dispatches kernels to local devices and runs kernels 164 | in parallel when possible, for example by using multiple CPU cores or GPU 165 | streams. 166 | 167 | We specialize Send and Recv operations for each pair of source and destination 168 | device types: 169 | 170 | * Transfers between local CPU and GPU devices use the 171 | `cudaMemcpyAsync()` API to overlap computation and data transfer. 172 | * Transfers between two local GPUs use peer-to-peer DMA, to avoid an expensive 173 | copy via the host CPU. 174 | 175 | For transfers between tasks, TensorFlow uses multiple protocols, including: 176 | 177 | * gRPC over TCP. 178 | * RDMA over Converged Ethernet. 179 | 180 | We also have preliminary support for NVIDIA's NCCL library for multi-GPU 181 | communication (see [`tf.contrib.nccl`]( 182 | https://www.tensorflow.org/code/tensorflow/contrib/nccl/python/ops/nccl_ops.py)). 183 | 184 | ![Partitioned Graph](../images/graph_send_recv.svg){: width="700"} 185 | 186 | **Figure 8** 187 | 188 | ### Code 189 | 190 | * [WorkerService API definition](https://www.tensorflow.org/code/tensorflow/core/protobuf/worker_service.proto) 191 | * [Worker interface](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/worker_interface.h) 192 | * [Remote rendezvous (for Send and Recv implementations)](https://www.tensorflow.org/code/tensorflow/core/distributed_runtime/rpc/rpc_rendezvous_mgr.h) 193 | 194 | ## Kernel Implementations 195 | 196 | The runtime contains over 200 standard operations, including mathematical, array 197 | manipulation, control flow, and state management operations. Each of these 198 | operations can have kernel implementations optimized for a variety of devices. 199 | Many of the operation kernels are implemented using Eigen::Tensor, which uses 200 | C++ templates to generate efficient parallel code for multicore CPUs and GPUs; 201 | however, we liberally use libraries like cuDNN where a more efficient kernel 202 | implementation is possible. We have also implemented 203 | @{$quantization$quantization}, which enables 204 | faster inference in environments such as mobile devices and high-throughput 205 | datacenter applications, and use the 206 | [gemmlowp](https://github.com/google/gemmlowp) low-precision matrix library to 207 | accelerate quantized computation. 208 | 209 | If it is difficult or inefficient to represent a subcomputation as a composition 210 | of operations, users can register additional kernels that provide an efficient 211 | implementation written in C++. 
For example, we recommend registering your own 212 | fused kernels for some performance critical operations, such as the ReLU and 213 | Sigmoid activation functions and their corresponding gradients. The @{$xla$XLA Compiler} has an 214 | experimental implementation of automatic kernel fusion. 215 | 216 | ### Code 217 | 218 | * [`OpKernel` interface](https://www.tensorflow.org/code/tensorflow/core/framework/op_kernel.h) 219 | -------------------------------------------------------------------------------- /get_started/summaries_and_tensorboard.md: -------------------------------------------------------------------------------- 1 | # TensorBoard: Visualizing Learning 2 | 3 | The computations you'll use TensorFlow for - like training a massive 4 | deep neural network - can be complex and confusing. To make it easier to 5 | understand, debug, and optimize TensorFlow programs, we've included a suite of 6 | visualization tools called TensorBoard. You can use TensorBoard to visualize 7 | your TensorFlow graph, plot quantitative metrics about the execution of your 8 | graph, and show additional data like images that pass through it. When 9 | TensorBoard is fully configured, it looks like this: 10 | 11 | ![MNIST TensorBoard](../images/mnist_tensorboard.png "MNIST TensorBoard") 12 | 13 |
18 | 19 | This tutorial is intended to get you started with simple TensorBoard usage. 20 | There are other resources available as well! The [TensorBoard README](https://www.tensorflow.org/code/tensorflow/tensorboard/README.md) 21 | has a lot more information on TensorBoard usage, including tips & tricks, and 22 | debugging information. 23 | 24 | ## Serializing the data 25 | 26 | TensorBoard operates by reading TensorFlow events files, which contain summary 27 | data that you can generate when running TensorFlow. Here's the general 28 | lifecycle for summary data within TensorBoard. 29 | 30 | First, create the TensorFlow graph that you'd like to collect summary 31 | data from, and decide which nodes you would like to annotate with 32 | @{$python/summary$summary operations}. 33 | 34 | For example, suppose you are training a convolutional neural network for 35 | recognizing MNIST digits. You'd like to record how the learning rate 36 | varies over time, and how the objective function is changing. Collect these by 37 | attaching @{tf.summary.scalar} ops 38 | to the nodes that output the learning rate and loss respectively. Then, give 39 | each `scalar_summary` a meaningful `tag`, like `'learning rate'` or `'loss 40 | function'`. 41 | 42 | Perhaps you'd also like to visualize the distributions of activations coming 43 | off a particular layer, or the distribution of gradients or weights. Collect 44 | this data by attaching 45 | @{tf.summary.histogram} ops to 46 | the gradient outputs and to the variable that holds your weights, respectively. 47 | 48 | For details on all of the summary operations available, check out the docs on 49 | @{$python/summary$summary operations}. 50 | 51 | Operations in TensorFlow don't do anything until you run them, or an op that 52 | depends on their output. And the summary nodes that we've just created are 53 | peripheral to your graph: none of the ops you are currently running depend on 54 | them. So, to generate summaries, we need to run all of these summary nodes. 55 | Managing them by hand would be tedious, so use 56 | @{tf.summary.merge_all} 57 | to combine them into a single op that generates all the summary data. 58 | 59 | Then, you can just run the merged summary op, which will generate a serialized 60 | `Summary` protobuf object with all of your summary data at a given step. 61 | Finally, to write this summary data to disk, pass the summary protobuf to a 62 | @{tf.summary.FileWriter}. 63 | 64 | The `FileWriter` takes a logdir in its constructor - this logdir is quite 65 | important, it's the directory where all of the events will be written out. 66 | Also, the `FileWriter` can optionally take a `Graph` in its constructor. 67 | If it receives a `Graph` object, then TensorBoard will visualize your graph 68 | along with tensor shape information. This will give you a much better sense of 69 | what flows through the graph: see 70 | @{$graph_viz#tensor-shape-information$Tensor shape information}. 71 | 72 | Now that you've modified your graph and have a `FileWriter`, you're ready to 73 | start running your network! If you want, you could run the merged summary op 74 | every single step, and record a ton of training data. That's likely to be more 75 | data than you need, though. Instead, consider running the merged summary op 76 | every `n` steps. 77 | 78 | The code example below is a modification of the 79 | @{$beginners$simple MNIST tutorial}, 80 | in which we have added some summary ops, and run them every ten steps. 
If you 81 | run this and then launch `tensorboard --logdir=/tmp/mnist_logs`, you'll be able 82 | to visualize statistics, such as how the weights or accuracy varied during 83 | training. The code below is an excerpt; full source is 84 | [here](https://www.tensorflow.org/code/tensorflow/examples/tutorials/mnist/mnist_with_summaries.py). 85 | 86 | ```python 87 | def variable_summaries(var): 88 | """Attach a lot of summaries to a Tensor (for TensorBoard visualization).""" 89 | with tf.name_scope('summaries'): 90 | mean = tf.reduce_mean(var) 91 | tf.summary.scalar('mean', mean) 92 | with tf.name_scope('stddev'): 93 | stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean))) 94 | tf.summary.scalar('stddev', stddev) 95 | tf.summary.scalar('max', tf.reduce_max(var)) 96 | tf.summary.scalar('min', tf.reduce_min(var)) 97 | tf.summary.histogram('histogram', var) 98 | 99 | def nn_layer(input_tensor, input_dim, output_dim, layer_name, act=tf.nn.relu): 100 | """Reusable code for making a simple neural net layer. 101 | 102 | It does a matrix multiply, bias add, and then uses relu to nonlinearize. 103 | It also sets up name scoping so that the resultant graph is easy to read, 104 | and adds a number of summary ops. 105 | """ 106 | # Adding a name scope ensures logical grouping of the layers in the graph. 107 | with tf.name_scope(layer_name): 108 | # This Variable will hold the state of the weights for the layer 109 | with tf.name_scope('weights'): 110 | weights = weight_variable([input_dim, output_dim]) 111 | variable_summaries(weights) 112 | with tf.name_scope('biases'): 113 | biases = bias_variable([output_dim]) 114 | variable_summaries(biases) 115 | with tf.name_scope('Wx_plus_b'): 116 | preactivate = tf.matmul(input_tensor, weights) + biases 117 | tf.summary.histogram('pre_activations', preactivate) 118 | activations = act(preactivate, name='activation') 119 | tf.summary.histogram('activations', activations) 120 | return activations 121 | 122 | hidden1 = nn_layer(x, 784, 500, 'layer1') 123 | 124 | with tf.name_scope('dropout'): 125 | keep_prob = tf.placeholder(tf.float32) 126 | tf.summary.scalar('dropout_keep_probability', keep_prob) 127 | dropped = tf.nn.dropout(hidden1, keep_prob) 128 | 129 | # Do not apply softmax activation yet, see below. 130 | y = nn_layer(dropped, 500, 10, 'layer2', act=tf.identity) 131 | 132 | with tf.name_scope('cross_entropy'): 133 | # The raw formulation of cross-entropy, 134 | # 135 | # tf.reduce_mean(-tf.reduce_sum(y_ * tf.log(tf.softmax(y)), 136 | # reduction_indices=[1])) 137 | # 138 | # can be numerically unstable. 139 | # 140 | # So here we use tf.nn.softmax_cross_entropy_with_logits on the 141 | # raw outputs of the nn_layer above, and then average across 142 | # the batch. 
143 |   diff = tf.nn.softmax_cross_entropy_with_logits(labels=y_, logits=y)
144 |   with tf.name_scope('total'):
145 |     cross_entropy = tf.reduce_mean(diff)
146 | tf.summary.scalar('cross_entropy', cross_entropy)
147 |
148 | with tf.name_scope('train'):
149 |   train_step = tf.train.AdamOptimizer(FLAGS.learning_rate).minimize(
150 |       cross_entropy)
151 |
152 | with tf.name_scope('accuracy'):
153 |   with tf.name_scope('correct_prediction'):
154 |     correct_prediction = tf.equal(tf.argmax(y, 1), tf.argmax(y_, 1))
155 |   with tf.name_scope('accuracy'):
156 |     accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32))
157 | tf.summary.scalar('accuracy', accuracy)
158 |
159 | # Merge all the summaries and write them out to /tmp/mnist_logs (by default)
160 | merged = tf.summary.merge_all()
161 | train_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/train',
162 |                                      sess.graph)
163 | test_writer = tf.summary.FileWriter(FLAGS.summaries_dir + '/test')
164 | tf.global_variables_initializer().run()
165 | ```
166 |
167 | After we've initialized the `FileWriters`, we have to add summaries to the
168 | `FileWriters` as we train and test the model.
169 |
170 | ```python
171 | # Train the model, and also write summaries.
172 | # Every 10th step, measure test-set accuracy, and write test summaries
173 | # All other steps, run train_step on training data, & add training summaries
174 |
175 | def feed_dict(train):
176 |   """Make a TensorFlow feed_dict: maps data onto Tensor placeholders."""
177 |   if train or FLAGS.fake_data:
178 |     xs, ys = mnist.train.next_batch(100, fake_data=FLAGS.fake_data)
179 |     k = FLAGS.dropout
180 |   else:
181 |     xs, ys = mnist.test.images, mnist.test.labels
182 |     k = 1.0
183 |   return {x: xs, y_: ys, keep_prob: k}
184 |
185 | for i in range(FLAGS.max_steps):
186 |   if i % 10 == 0:  # Record summaries and test-set accuracy
187 |     summary, acc = sess.run([merged, accuracy], feed_dict=feed_dict(False))
188 |     test_writer.add_summary(summary, i)
189 |     print('Accuracy at step %s: %s' % (i, acc))
190 |   else:  # Record train set summaries, and train
191 |     summary, _ = sess.run([merged, train_step], feed_dict=feed_dict(True))
192 |     train_writer.add_summary(summary, i)
193 | ```
194 |
195 | You're now all set to visualize this data using TensorBoard.
196 |
197 |
198 | ## Launching TensorBoard
199 |
200 | To run TensorBoard, use the following command (alternatively `python -m
201 | tensorflow.tensorboard`):
202 |
203 | ```bash
204 | tensorboard --logdir=path/to/log-directory
205 | ```
206 |
207 | where `logdir` points to the directory where the `FileWriter` serialized its
208 | data. If this `logdir` directory contains subdirectories which contain
209 | serialized data from separate runs, then TensorBoard will visualize the data
210 | from all of those runs. Once TensorBoard is running, navigate your web browser
211 | to `localhost:6006` to view TensorBoard.
212 |
213 | When looking at TensorBoard, you will see the navigation tabs in the top right
214 | corner. Each tab represents a set of serialized data that can be visualized.
215 |
216 | For in-depth information on how to use the *graph* tab to visualize your graph,
217 | see @{$graph_viz$TensorBoard: Graph Visualization}.
218 |
219 | For more usage information on TensorBoard in general, see the [TensorBoard
220 | README](https://www.tensorflow.org/code/tensorflow/tensorboard/README.md).
221 | -------------------------------------------------------------------------------- /tutorials/linear.md: -------------------------------------------------------------------------------- 1 | # Large-scale Linear Models with TensorFlow 2 | 3 | The tf.learn API provides (among other things) a rich set of tools for working 4 | with linear models in TensorFlow. This document provides an overview of those 5 | tools. It explains: 6 | 7 | * what a linear model is. 8 | * why you might want to use a linear model. 9 | * how tf.learn makes it easy to build linear models in TensorFlow. 10 | * how you can use tf.learn to combine linear models with 11 | deep learning to get the advantages of both. 12 | 13 | Read this overview to decide whether the tf.learn linear model tools might be 14 | useful to you. Then do the @{$wide$Linear Models tutorial} to 15 | give it a try. This overview uses code samples from the tutorial, but the 16 | tutorial walks through the code in greater detail. 17 | 18 | To understand this overview it will help to have some familiarity 19 | with basic machine learning concepts, and also with 20 | @{$tflearn$tf.learn}. 21 | 22 | [TOC] 23 | 24 | ## What is a linear model? 25 | 26 | A *linear model* uses a single weighted sum of features to make a prediction. 27 | For example, if you have [data](https://archive.ics.uci.edu/ml/machine-learning-databases/adult/adult.names) 28 | on age, years of education, and weekly hours of 29 | work for a population, you can learn weights for each of those numbers so that 30 | their weighted sum estimates a person's salary. You can also use linear models 31 | for classification. 32 | 33 | Some linear models transform the weighted sum into a more convenient form. For 34 | example, *logistic regression* plugs the weighted sum into the logistic 35 | function to turn the output into a value between 0 and 1. But you still just 36 | have one weight for each input feature. 37 | 38 | ## Why would you want to use a linear model? 39 | 40 | Why would you want to use so simple a model when recent research has 41 | demonstrated the power of more complex neural networks with many layers? 42 | 43 | Linear models: 44 | 45 | * train quickly, compared to deep neural nets. 46 | * can work well on very large feature sets. 47 | * can be trained with algorithms that don't require a lot of fiddling 48 | with learning rates, etc. 49 | * can be interpreted and debugged more easily than neural nets. 50 | You can examine the weights assigned to each feature to figure out what's 51 | having the biggest impact on a prediction. 52 | * provide an excellent starting point for learning about machine learning. 53 | * are widely used in industry. 54 | 55 | ## How does tf.learn help you build linear models? 56 | 57 | You can build a linear model from scratch in TensorFlow without the help of a 58 | special API. But tf.learn provides some tools that make it easier to build 59 | effective large-scale linear models. 60 | 61 | ### Feature columns and transformations 62 | 63 | Much of the work of designing a linear model consists of transforming raw data 64 | into suitable input features. tf.learn uses the `FeatureColumn` abstraction to 65 | enable these transformations. 66 | 67 | A `FeatureColumn` represents a single feature in your data. A `FeatureColumn` 68 | may represent a quantity like 'height', or it may represent a category like 69 | 'eye_color' where the value is drawn from a set of discrete possibilities like {'blue', 'brown', 'green'}. 
70 | 71 | In the case of both *continuous features* like 'height' and *categorical 72 | features* like 'eye_color', a single value in the data might get transformed 73 | into a sequence of numbers before it is input into the model. The 74 | `FeatureColumn` abstraction lets you manipulate the feature as a single 75 | semantic unit in spite of this fact. You can specify transformations and 76 | select features to include without dealing with specific indices in the 77 | tensors you feed into the model. 78 | 79 | #### Sparse columns 80 | 81 | Categorical features in linear models are typically translated into a sparse 82 | vector in which each possible value has a corresponding index or id. For 83 | example, if there are only three possible eye colors you can represent 84 | 'eye_color' as a length 3 vector: 'brown' would become [1, 0, 0], 'blue' would 85 | become [0, 1, 0] and 'green' would become [0, 0, 1]. These vectors are called 86 | "sparse" because they may be very long, with many zeros, when the set of 87 | possible values is very large (such as all English words). 88 | 89 | While you don't need to use sparse columns to use tf.learn linear models, one 90 | of the strengths of linear models is their ability to deal with large sparse 91 | vectors. Sparse features are a primary use case for the tf.learn linear model 92 | tools. 93 | 94 | ##### Encoding sparse columns 95 | 96 | `FeatureColumn` handles the conversion of categorical values into vectors 97 | automatically, with code like this: 98 | 99 | ```python 100 | eye_color = tf.contrib.layers.sparse_column_with_keys( 101 | column_name="eye_color", keys=["blue", "brown", "green"]) 102 | ``` 103 | 104 | where `eye_color` is the name of a column in your source data. 105 | 106 | You can also generate `FeatureColumn`s for categorical features for which you 107 | don't know all possible values. For this case you would use 108 | `sparse_column_with_hash_bucket()`, which uses a hash function to assign 109 | indices to feature values. 110 | 111 | ```python 112 | education = tf.contrib.layers.sparse_column_with_hash_bucket(\ 113 | "education", hash_bucket_size=1000) 114 | ``` 115 | 116 | ##### Feature Crosses 117 | 118 | Because linear models assign independent weights to separate features, they 119 | can't learn the relative importance of specific combinations of feature 120 | values. If you have a feature 'favorite_sport' and a feature 'home_city' and 121 | you're trying to predict whether a person likes to wear red, your linear model 122 | won't be able to learn that baseball fans from St. Louis especially like to 123 | wear red. 124 | 125 | You can get around this limitation by creating a new feature 126 | 'favorite_sport_x_home_city'. The value of this feature for a given person is 127 | just the concatenation of the values of the two source features: 128 | 'baseball_x_stlouis', for example. This sort of combination feature is called 129 | a *feature cross*. 
130 | 131 | The `crossed_column()` method makes it easy to set up feature crosses: 132 | 133 | ```python 134 | sport = tf.contrib.layers.sparse_column_with_hash_bucket(\ 135 | "sport", hash_bucket_size=1000) 136 | city = tf.contrib.layers.sparse_column_with_hash_bucket(\ 137 | "city", hash_bucket_size=1000) 138 | sport_x_city = tf.contrib.layers.crossed_column( 139 | [sport, city], hash_bucket_size=int(1e4)) 140 | ``` 141 | 142 | #### Continuous columns 143 | 144 | You can specify a continuous feature like so: 145 | 146 | ```python 147 | age = tf.contrib.layers.real_valued_column("age") 148 | ``` 149 | 150 | Although, as a single real number, a continuous feature can often be input 151 | directly into the model, tf.learn offers useful transformations for this sort 152 | of column as well. 153 | 154 | ##### Bucketization 155 | 156 | *Bucketization* turns a continuous column into a categorical column. This 157 | transformation lets you use continuous features in feature crosses, or learn 158 | cases where specific value ranges have particular importance. 159 | 160 | Bucketization divides the range of possible values into subranges called 161 | buckets: 162 | 163 | ```python 164 | age_buckets = tf.contrib.layers.bucketized_column( 165 | age, boundaries=[18, 25, 30, 35, 40, 45, 50, 55, 60, 65]) 166 | ``` 167 | 168 | The bucket into which a value falls becomes the categorical label for 169 | that value. 170 | 171 | #### Input function 172 | 173 | `FeatureColumn`s provide a specification for the input data for your model, 174 | indicating how to represent and transform the data. But they do not provide 175 | the data itself. You provide the data through an input function. 176 | 177 | The input function must return a dictionary of tensors. Each key corresponds to 178 | the name of a `FeatureColumn`. Each key's value is a tensor containing the 179 | values of that feature for all data instances. See 180 | @{$input_fn$Building Input Functions with tf.contrib.learn} for a 181 | more comprehensive look at input functions, and `input_fn` in the 182 | [linear models tutorial code](https://www.tensorflow.org/code/tensorflow/examples/learn/wide_n_deep_tutorial.py) 183 | for an example implementation of an input function. 184 | 185 | The input function is passed to the `fit()` and `evaluate()` calls that 186 | initiate training and testing, as described in the next section. 187 | 188 | ### Linear estimators 189 | 190 | tf.learn's estimator classes provide a unified training and evaluation harness 191 | for regression and classification models. They take care of the details of the 192 | training and evaluation loops and allow the user to focus on model inputs and 193 | architecture. 194 | 195 | To build a linear estimator, you can use either the 196 | `tf.contrib.learn.LinearClassifier` estimator or the 197 | `tf.contrib.learn.LinearRegressor` estimator, for classification and 198 | regression respectively. 199 | 200 | As with all tf.learn estimators, to run the estimator you just: 201 | 202 | 1. Instantiate the estimator class. For the two linear estimator classes, 203 | you pass a list of `FeatureColumn`s to the constructor. 204 | 2. Call the estimator's `fit()` method to train it. 205 | 3. Call the estimator's `evaluate()` method to see how it does. 
206 | 207 | For example: 208 | 209 | ```python 210 | e = tf.contrib.learn.LinearClassifier(feature_columns=[ 211 | native_country, education, occupation, workclass, marital_status, 212 | race, age_buckets, education_x_occupation, age_buckets_x_race_x_occupation], 213 | model_dir=YOUR_MODEL_DIRECTORY) 214 | e.fit(input_fn=input_fn_train, steps=200) 215 | # Evaluate for one step (one pass through the test data). 216 | results = e.evaluate(input_fn=input_fn_test, steps=1) 217 | 218 | # Print the stats for the evaluation. 219 | for key in sorted(results): 220 | print("%s: %s" % (key, results[key])) 221 | ``` 222 | 223 | ### Wide and deep learning 224 | 225 | The tf.learn API also provides an estimator class that lets you jointly train 226 | a linear model and a deep neural network. This novel approach combines the 227 | ability of linear models to "memorize" key features with the generalization 228 | ability of neural nets. Use `tf.contrib.learn.DNNLinearCombinedClassifier` to 229 | create this sort of "wide and deep" model: 230 | 231 | ```python 232 | e = tf.contrib.learn.DNNLinearCombinedClassifier( 233 | model_dir=YOUR_MODEL_DIR, 234 | linear_feature_columns=wide_columns, 235 | dnn_feature_columns=deep_columns, 236 | dnn_hidden_units=[100, 50]) 237 | ``` 238 | For more information, see the @{$wide_and_deep$Wide and Deep Learning tutorial}. 239 | -------------------------------------------------------------------------------- /programmers_guide/meta_graph.md: -------------------------------------------------------------------------------- 1 | # Exporting and Importing a MetaGraph 2 | 3 | A [`MetaGraph`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) contains both a TensorFlow GraphDef 4 | as well as associated metadata necessary for running computation in a 5 | graph when crossing a process boundary. It can also be used for long 6 | term storage of graphs. The MetaGraph contains the information required 7 | to continue training, perform evaluation, or run inference on a previously trained graph. 8 | 9 | The APIs for exporting and importing the complete model are in 10 | the @{tf.train.Saver} class: 11 | @{tf.train.export_meta_graph} 12 | and 13 | @{tf.train.import_meta_graph}. 14 | 15 | ## What's in a MetaGraph 16 | 17 | The information contained in a MetaGraph is expressed as a 18 | [`MetaGraphDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) 19 | protocol buffer. It contains the following fields: 20 | 21 | * [`MetaInfoDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) for meta information, such as version and other user information. 22 | * [`GraphDef`](https://www.tensorflow.org/code/tensorflow/core/framework/graph.proto) for describing the graph. 23 | * [`SaverDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/saver.proto) for the saver. 24 | * [`CollectionDef`](https://www.tensorflow.org/code/tensorflow/core/protobuf/meta_graph.proto) 25 | map that further describes additional components of the model, such as 26 | @{$python/state_ops$`Variables`}, 27 | @{tf.train.QueueRunner}, etc. In order for a Python object to be serialized 28 | to and from `MetaGraphDef`, the Python class must implement `to_proto()` and 29 | `from_proto()` methods, and register them with the system using 30 | `register_proto_function`. 31 | 32 | For example, 33 | 34 | ```Python 35 | def to_proto(self, export_scope=None): 36 | 37 | """Converts a `Variable` to a `VariableDef` protocol buffer. 
38 | 39 | Args: 40 | export_scope: Optional `string`. Name scope to remove. 41 | 42 | Returns: 43 | A `VariableDef` protocol buffer, or `None` if the `Variable` is not 44 | in the specified name scope. 45 | """ 46 | if (export_scope is None or 47 | self._variable.name.startswith(export_scope)): 48 | var_def = variable_pb2.VariableDef() 49 | var_def.variable_name = ops.strip_name_scope( 50 | self._variable.name, export_scope) 51 | var_def.initializer_name = ops.strip_name_scope( 52 | self.initializer.name, export_scope) 53 | var_def.snapshot_name = ops.strip_name_scope( 54 | self._snapshot.name, export_scope) 55 | if self._save_slice_info: 56 | var_def.save_slice_info_def.MergeFrom(self._save_slice_info.to_proto( 57 | export_scope=export_scope)) 58 | return var_def 59 | else: 60 | return None 61 | 62 | @staticmethod 63 | def from_proto(variable_def, import_scope=None): 64 | """Returns a `Variable` object created from `variable_def`.""" 65 | return Variable(variable_def=variable_def, import_scope=import_scope) 66 | 67 | ops.register_proto_function(ops.GraphKeys.GLOBAL_VARIABLES, 68 | proto_type=variable_pb2.VariableDef, 69 | to_proto=Variable.to_proto, 70 | from_proto=Variable.from_proto) 71 | ``` 72 | 73 | ## Exporting a Complete Model to MetaGraph 74 | 75 | The API for exporting a running model as a MetaGraph is `export_meta_graph()`. 76 | 77 | ```Python 78 | def export_meta_graph(filename=None, collection_list=None, as_text=False): 79 | """Writes `MetaGraphDef` to save_path/filename. 80 | 81 | Args: 82 | filename: Optional meta_graph filename including the path. 83 | collection_list: List of string keys to collect. 84 | as_text: If `True`, writes the meta_graph as an ASCII proto. 85 | 86 | Returns: 87 | A `MetaGraphDef` proto. 88 | """ 89 | ``` 90 | 91 | A `collection` can contain any Python objects that users would like to 92 | be able to uniquely identify and easily retrieve. These objects can be 93 | special operations in the graph, such as `train_op`, or hyper parameters, 94 | such as "learning rate". Users can specify the list of collections 95 | they would like to export. If no `collection_list` is specified, 96 | all collections in the model will be exported. 97 | 98 | The API returns a serialized protocol buffer. If `filename` is 99 | specified, the protocol buffer will also be written to a file. 100 | 101 | Here are some of the typical usage models: 102 | 103 | * Export the default running graph: 104 | 105 | ```Python 106 | # Build the model 107 | ... 108 | with tf.Session() as sess: 109 | # Use the model 110 | ... 111 | # Export the model to /tmp/my-model.meta. 112 | meta_graph_def = tf.train.export_meta_graph(filename='/tmp/my-model.meta') 113 | ``` 114 | 115 | * Export the default running graph and only a subset of the collections. 116 | 117 | ```Python 118 | meta_graph_def = tf.train.export_meta_graph( 119 | filename='/tmp/my-model.meta', 120 | collection_list=["input_tensor", "output_tensor"]) 121 | ``` 122 | 123 | 124 | The MetaGraph is also automatically exported via the `save()` API in 125 | @{tf.train.Saver}. 126 | 127 | 128 | ## Import a MetaGraph 129 | 130 | The API for importing a MetaGraph file into a graph is `import_meta_graph()`. 131 | 132 | Here are some of the typical usage models: 133 | 134 | * Import and continue training without building the model from scratch. 135 | 136 | ```Python 137 | ... 138 | # Create a saver. 139 | saver = tf.train.Saver(...variables...) 140 | # Remember the training_op we want to run by adding it to a collection. 
141 | tf.add_to_collection('train_op', train_op) 142 | sess = tf.Session() 143 | for step in xrange(1000000): 144 | sess.run(train_op) 145 | if step % 1000 == 0: 146 | # Saves checkpoint, which by default also exports a meta_graph 147 | # named 'my-model-global_step.meta'. 148 | saver.save(sess, 'my-model', global_step=step) 149 | ``` 150 | 151 | Later we can continue training from this saved `meta_graph` without building 152 | the model from scratch. 153 | 154 | ```Python 155 | with tf.Session() as sess: 156 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta') 157 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 158 | # tf.get_collection() returns a list. In this example we only want the 159 | # first one. 160 | train_op = tf.get_collection('train_op')[0] 161 | for step in xrange(1000000): 162 | sess.run(train_op) 163 | ``` 164 | 165 | * Import and extend the graph. 166 | 167 | For example, we can first build an inference graph, export it as a meta graph: 168 | 169 | ```Python 170 | # Creates an inference graph. 171 | # Hidden 1 172 | images = tf.constant(1.2, tf.float32, shape=[100, 28]) 173 | with tf.name_scope("hidden1"): 174 | weights = tf.Variable( 175 | tf.truncated_normal([28, 128], 176 | stddev=1.0 / math.sqrt(float(28))), 177 | name="weights") 178 | biases = tf.Variable(tf.zeros([128]), 179 | name="biases") 180 | hidden1 = tf.nn.relu(tf.matmul(images, weights) + biases) 181 | # Hidden 2 182 | with tf.name_scope("hidden2"): 183 | weights = tf.Variable( 184 | tf.truncated_normal([128, 32], 185 | stddev=1.0 / math.sqrt(float(128))), 186 | name="weights") 187 | biases = tf.Variable(tf.zeros([32]), 188 | name="biases") 189 | hidden2 = tf.nn.relu(tf.matmul(hidden1, weights) + biases) 190 | # Linear 191 | with tf.name_scope("softmax_linear"): 192 | weights = tf.Variable( 193 | tf.truncated_normal([32, 10], 194 | stddev=1.0 / math.sqrt(float(32))), 195 | name="weights") 196 | biases = tf.Variable(tf.zeros([10]), 197 | name="biases") 198 | logits = tf.matmul(hidden2, weights) + biases 199 | tf.add_to_collection("logits", logits) 200 | 201 | init_all_op = tf.global_variables_initializer() 202 | 203 | with tf.Session() as sess: 204 | # Initializes all the variables. 205 | sess.run(init_all_op) 206 | # Runs to logit. 207 | sess.run(logits) 208 | # Creates a saver. 209 | saver0 = tf.train.Saver() 210 | saver0.save(sess, 'my-save-dir/my-model-10000') 211 | # Generates MetaGraphDef. 212 | saver0.export_meta_graph('my-save-dir/my-model-10000.meta') 213 | ``` 214 | 215 | Then later import it and extend it to a training graph. 216 | 217 | ```Python 218 | with tf.Session() as sess: 219 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta') 220 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 221 | # Addes loss and train. 222 | labels = tf.constant(0, tf.int32, shape=[100], name="labels") 223 | batch_size = tf.size(labels) 224 | labels = tf.expand_dims(labels, 1) 225 | indices = tf.expand_dims(tf.range(0, batch_size), 1) 226 | concated = tf.concat([indices, labels], 1) 227 | onehot_labels = tf.sparse_to_dense( 228 | concated, tf.stack([batch_size, 10]), 1.0, 0.0) 229 | logits = tf.get_collection("logits")[0] 230 | cross_entropy = tf.nn.softmax_cross_entropy_with_logits( 231 | labels=onehot_labels, logits=logits, name="xentropy") 232 | loss = tf.reduce_mean(cross_entropy, name="xentropy_mean") 233 | 234 | tf.summary.scalar('loss', loss) 235 | # Creates the gradient descent optimizer with the given learning rate. 
236 | optimizer = tf.train.GradientDescentOptimizer(0.01) 237 | 238 | # Runs train_op. 239 | train_op = optimizer.minimize(loss) 240 | sess.run(train_op) 241 | ``` 242 | 243 | * Import a graph with preset devices. 244 | 245 | Sometimes an exported meta graph is from a training environment that the 246 | importer doesn't have. For example, the model might have been trained 247 | on GPUs, or in a distributed environment with replicas. When importing 248 | such models, it's useful to be able to clear the device settings in 249 | the graph so that we can run it on locally available devices. This can 250 | be achieved by calling `import_meta_graph` with the `clear_devices` 251 | option set to `True`. 252 | 253 | ```Python 254 | with tf.Session() as sess: 255 | new_saver = tf.train.import_meta_graph('my-save-dir/my-model-10000.meta', 256 | clear_devices=True) 257 | new_saver.restore(sess, 'my-save-dir/my-model-10000') 258 | ... 259 | ``` 260 | 261 | * Import within the default graph. 262 | 263 | Sometimes you might want to run `export_meta_graph` and `import_meta_graph` 264 | in codelab using the default graph. In that case, you need to reset 265 | the default graph by calling `tf.reset_default_graph()` first before 266 | running import. 267 | 268 | ```Python 269 | meta_graph_def = tf.train.export_meta_graph() 270 | ... 271 | tf.reset_default_graph() 272 | ... 273 | tf.train.import_meta_graph(meta_graph_def) 274 | ... 275 | ``` 276 | 277 | * Retrieve Hyper Parameters 278 | 279 | ```Python 280 | filename = ".".join([tf.train.latest_checkpoint(train_dir), "meta"]) 281 | tf.train.import_meta_graph(filename) 282 | hparams = tf.get_collection("hparams") 283 | ``` 284 | -------------------------------------------------------------------------------- /extend/add_filesys.md: -------------------------------------------------------------------------------- 1 | # Adding a Custom Filesystem Plugin 2 | 3 | ## Background 4 | 5 | The TensorFlow framework is often used in multi-process and 6 | multi-machine environments, such as Google data centers, Google Cloud 7 | Machine Learning, Amazon Web Services (AWS), and on-site distributed clusters. 8 | In order to both share and save certain types of state produced by TensorFlow, 9 | the framework assumes the existence of a reliable, shared filesystem. This 10 | shared filesystem has numerous uses, for example: 11 | 12 | * Checkpoints of state are often saved to a distributed filesystem for 13 | reliability and fault-tolerance. 14 | * Training processes communicate with TensorBoard by writing event files 15 | to a directory, which TensorBoard watches. A shared filesystem allows this 16 | communication to work even when TensorBoard runs in a different process or 17 | machine. 18 | 19 | There are many different implementations of shared or distributed filesystems in 20 | the real world, so TensorFlow provides an ability for users to implement a 21 | custom FileSystem plugin that can be registered with the TensorFlow runtime. 22 | When the TensorFlow runtime attempts to write to a file through the `FileSystem` 23 | interface, it uses a portion of the pathname to dynamically select the 24 | implementation that should be used for filesystem operations. Thus, adding 25 | support for your custom filesystem requires implementing a `FileSystem` 26 | interface, building a shared object containing that implementation, and loading 27 | that object at runtime in whichever process needs to write to that filesystem. 
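To make the scheme-based dispatch concrete, here is a small, hedged sketch using
the Python `tf.gfile` wrapper, which routes file operations through the
`FileSystem` interface (see "What goes through this interface?" below). The
bucket, cluster, and file names are placeholders rather than real locations, and
the non-local schemes only work when the corresponding filesystem support is
available in your build:

```python
import tensorflow as tf

# The path prefix (scheme) selects the registered FileSystem implementation.
tf.gfile.Exists("/tmp/model.ckpt.index")                    # local POSIX filesystem
tf.gfile.Exists("gs://some-bucket/model.ckpt.index")        # Google Cloud Storage
tf.gfile.Exists("hdfs://namenode:8020/user/me/model.ckpt")  # HDFS
```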
28 |
29 | Note that TensorFlow already includes many filesystem implementations, such as:
30 |
31 | * A standard POSIX filesystem
32 |
33 |   Note: NFS filesystems often mount as a POSIX interface, and so standard
34 |   TensorFlow can work on top of NFS-mounted remote filesystems.
35 | * HDFS - the Hadoop File System
36 | * GCS - Google Cloud Storage filesystem
37 | * A "memory-mapped-file" filesystem
38 |
39 | The rest of this guide describes how to implement a custom filesystem.
40 |
41 | ## Implementing a custom filesystem plugin
42 |
43 | To implement a custom filesystem plugin, you must do the following:
44 |
45 | * Implement subclasses of `RandomAccessFile`, `WritableFile`,
46 |   `AppendableFile`, and `ReadOnlyMemoryRegion`.
47 | * Implement the `FileSystem` interface as a subclass.
48 | * Register the `FileSystem` implementation with an appropriate prefix pattern.
49 | * Load the filesystem plugin in a process that wants to write to that
50 |   filesystem.
51 |
52 | ### The FileSystem interface
53 |
54 | The `FileSystem` interface is an abstract C++ interface defined in
55 | [file_system.h](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h).
56 | An implementation of the `FileSystem` interface should implement all the
57 | relevant methods defined by the interface. Implementing the interface requires
58 | defining operations such as creating `RandomAccessFile`, `WritableFile`, and
59 | implementing standard filesystem operations such as `FileExists`, `IsDirectory`,
60 | `GetMatchingPaths`, `DeleteFile`, and so on. An implementation of these
61 | interfaces will often involve translating the function's input arguments to
62 | delegate to an already-existing library function implementing the equivalent
63 | functionality in your custom filesystem.
64 |
65 | For example, the `PosixFileSystem` implementation implements `DeleteFile` using
66 | the POSIX `unlink()` function; `CreateDir` simply calls `mkdir()`; `GetFileSize`
67 | involves calling `stat()` on the file and then returns the filesize as reported
68 | by the stat object. Similarly, for the `HDFSFileSystem`
69 | implementation, these calls simply delegate to the `libHDFS` implementation of
70 | similar functionality, such as `hdfsDelete` for
71 | [DeleteFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.cc#L386).
72 |
73 | We suggest looking through these code examples to get an idea of how different
74 | filesystem implementations call their existing libraries. Examples include:
75 |
76 | * [POSIX
77 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/posix/posix_file_system.h)
78 | * [HDFS
79 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/hadoop/hadoop_file_system.h)
80 | * [GCS
81 |   plugin](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/cloud/gcs_file_system.h)
82 |
83 | #### The File interfaces
84 |
85 | Beyond operations that allow you to query and manipulate files and directories
86 | in a filesystem, the `FileSystem` interface requires you to implement factories
87 | that return implementations of abstract objects such as the
88 | [RandomAccessFile](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/core/platform/file_system.h#L223)
89 | and the `WritableFile`, so that TensorFlow code can read from and write to files in that
90 | `FileSystem` implementation.
91 | 92 | To implement a `RandomAccessFile`, you must implement a single interface called 93 | `Read()`, in which the implementation must provide a way to read from an offset 94 | within a named file. 95 | 96 | For example, below is the implementation of RandomAccessFile for the POSIX 97 | filesystem, which uses the `pread()` random-access POSIX function to implement 98 | read. Notice that the particular implementation must know how to retry or 99 | propagate errors from the underlying filesystem. 100 | 101 | ```C++ 102 | class PosixRandomAccessFile : public RandomAccessFile { 103 | public: 104 | PosixRandomAccessFile(const string& fname, int fd) 105 | : filename_(fname), fd_(fd) {} 106 | ~PosixRandomAccessFile() override { close(fd_); } 107 | 108 | Status Read(uint64 offset, size_t n, StringPiece* result, 109 | char* scratch) const override { 110 | Status s; 111 | char* dst = scratch; 112 | while (n > 0 && s.ok()) { 113 | ssize_t r = pread(fd_, dst, n, static_cast(offset)); 114 | if (r > 0) { 115 | dst += r; 116 | n -= r; 117 | offset += r; 118 | } else if (r == 0) { 119 | s = Status(error::OUT_OF_RANGE, "Read less bytes than requested"); 120 | } else if (errno == EINTR || errno == EAGAIN) { 121 | // Retry 122 | } else { 123 | s = IOError(filename_, errno); 124 | } 125 | } 126 | *result = StringPiece(scratch, dst - scratch); 127 | return s; 128 | } 129 | 130 | private: 131 | string filename_; 132 | int fd_; 133 | }; 134 | ``` 135 | 136 | To implement the WritableFile sequential-writing abstraction, one must implement 137 | a few interfaces, such as `Append()`, `Flush()`, `Sync()`, and `Close()`. 138 | 139 | For example, below is the implementation of WritableFile for the POSIX 140 | filesystem, which takes a `FILE` object in its constructor and uses standard 141 | posix functions on that object to implement the interface. 142 | 143 | ```C++ 144 | class PosixWritableFile : public WritableFile { 145 | public: 146 | PosixWritableFile(const string& fname, FILE* f) 147 | : filename_(fname), file_(f) {} 148 | 149 | ~PosixWritableFile() override { 150 | if (file_ != NULL) { 151 | fclose(file_); 152 | } 153 | } 154 | 155 | Status Append(const StringPiece& data) override { 156 | size_t r = fwrite(data.data(), 1, data.size(), file_); 157 | if (r != data.size()) { 158 | return IOError(filename_, errno); 159 | } 160 | return Status::OK(); 161 | } 162 | 163 | Status Close() override { 164 | Status result; 165 | if (fclose(file_) != 0) { 166 | result = IOError(filename_, errno); 167 | } 168 | file_ = NULL; 169 | return result; 170 | } 171 | 172 | Status Flush() override { 173 | if (fflush(file_) != 0) { 174 | return IOError(filename_, errno); 175 | } 176 | return Status::OK(); 177 | } 178 | 179 | Status Sync() override { 180 | Status s; 181 | if (fflush(file_) != 0) { 182 | s = IOError(filename_, errno); 183 | } 184 | return s; 185 | } 186 | 187 | private: 188 | string filename_; 189 | FILE* file_; 190 | }; 191 | 192 | ``` 193 | 194 | For more details, please see the documentations of those interfaces, and look at 195 | example implementations for inspiration. 196 | 197 | ### Registering and loading the filesystem 198 | 199 | Once you have implemented the `FileSystem` implementation for your custom 200 | filesystem, you need to register it under a "scheme" so that paths prefixed with 201 | that scheme are directed to your implementation. 
To do this, you call
202 | `REGISTER_FILE_SYSTEM`:
203 |
204 | ```
205 | REGISTER_FILE_SYSTEM("foobar", FooBarFileSystem);
206 | ```
207 |
208 | When TensorFlow tries to operate on a file whose path starts with `foobar://`,
209 | it will use the `FooBarFileSystem` implementation.
210 |
211 | ```C++
212 | string filename = "foobar://path/to/file.txt";
213 | std::unique_ptr<WritableFile> file;
214 |
215 | // Calls FooBarFileSystem::NewWritableFile to return
216 | // a WritableFile class, which happens to be the FooBarFileSystem's
217 | // WritableFile implementation.
218 | TF_RETURN_IF_ERROR(env->NewWritableFile(filename, &file));
219 | ```
220 |
221 | Next, you must build a shared object containing this implementation. An example
222 | of doing so using bazel's `cc_binary` rule can be found
223 | [here](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/BUILD#L244),
224 | but you may use any build system to do so. See the section on @{$adding_an_op#build-the-op-library$building the op library} for similar
225 | instructions.
226 |
227 | The result of building this target is a `.so` shared object file.
228 |
229 | Lastly, you must dynamically load this implementation in the process. In Python,
230 | you can call the `tf.load_file_system_library(file_system_library)` function,
231 | passing the path to the shared object. Calling this in your client program loads
232 | the shared object in the process, thus registering your implementation as
233 | available for any file operations going through the `FileSystem` interface. You
234 | can see
235 | [test_file_system.py](https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/framework/file_system_test.py)
236 | for an example.
237 |
238 | ## What goes through this interface?
239 |
240 | Almost all core C++ file operations within TensorFlow use the `FileSystem`
241 | interface, such as the `CheckpointWriter`, the `EventsWriter`, and many other
242 | utilities. This means that providing a `FileSystem` implementation allows most of
243 | your TensorFlow programs to write to your shared filesystem.
244 |
245 | In Python, the `gfile` and `file_io` classes bind underneath to the `FileSystem`
246 | implementation via SWIG, which means that once you have loaded this filesystem
247 | library, you can do:
248 |
249 | ```
250 | with gfile.Open("foobar://path/to/file.txt") as w:
251 |   w.write("hi")
252 | ```
253 |
254 | When you do this, a file containing "hi" will appear at "/path/to/file.txt" on
255 | your shared filesystem.
256 |
--------------------------------------------------------------------------------