├── README.md
└── setupTensorFlowGCE.sh

/README.md:
--------------------------------------------------------------------------------
# Tensor-Flow-on-Google-Compute-Engine

**A simple script to set up a GCE environment in Google Cloud so it is ready to run TensorFlow, an Open Source Software Library for Machine Intelligence. Machine learning made simple!**

![alt text](https://cloud.githubusercontent.com/assets/4972997/13024300/8889655a-d1a7-11e5-8bb5-5bb4e72bf21e.png "Recognizing Pandas on GCE!")

## Requirements

You can probably get away with a lower spec machine (it may run slower: when compiling, all vCPUs were used at 100%), but the one thing you should pay attention to is RAM. Even with the roughly 4 GB we have here, the build still had to use swap at certain points. Given that disks on GCE are not physically attached to the machine (they are network attached, I believe), this is not something you want to do often. 4 GB seemed to be the sweet spot: the build spent most of its time in RAM, except for one or two occasions where it used almost the full 8 GB of RAM + swap.

Anyhow, here is my machine spec on GCE:

* n1-highcpu-4 instance with roughly 4 GB RAM (3.6 GB, as noted in the script).
* Running vanilla Ubuntu Trusty 14.04 LTS.
* 20GB persistent disk (we shall be using 4GB of this for swap, and the rest you will need to store all your images and such). You could probably get away with less, but this gives us some flexibility. As an FYI, after I had everything installed, including the OS, I had used just under 14GB of space.


## Usage / Quick Start

A heads up of what your 5 minutes of fun will look like (sped up 400%):
![tfinstall](https://cloud.githubusercontent.com/assets/4972997/13024353/24cfb9d2-d1a8-11e5-9e61-3f5e81e8fe66.gif)

Let's go...

1. Save the script to your home directory. ```git clone https://github.com/jasonmayes/Tensor-Flow-on-Google-Compute-Engine.git ```
2. Change into the cloned directory:
   ```cd Tensor-Flow-on-Google-Compute-Engine```
3. ```chmod +x setupTensorFlowGCE.sh```
4. Edit the file and remove the swap sections if your machine has >= 8GB RAM. Then run: ```./setupTensorFlowGCE.sh``` and follow any instructions that appear (basically say yes to everything and accept the Java licence). This will take about 5 mins to install everything if you are watching the screen :-) Once it has finished, run ```source ~/.bashrc``` to ensure your terminal can find bazel. Alternatively you can just log out and in again. In the final step you will be asked to locate python; please use /usr/bin/python unless you have a different version you wish to use.
5. Now that the environment is set up we can compile TensorFlow. Ensure you are in the correct directory: ```cd ~/tensorflow/tensorflow``` (this path assumes the script was run from your home directory; the script creates `tensorflow/tensorflow` under whatever directory it is run from) and then run: ```bazel build -c opt //tensorflow/tools/pip_package:build_pip_package```. This will take some time to compile. Grab a coffee. No really, we are looking at about 35 minutes here...
6. TensorFlow is now ready to be used! Woohoo! Run the included example to test: ```bazel run tensorflow/models/image/imagenet:classify_image``` (this will also take time the first time you run it). At the end of execution you should see that the highest probability is a "Panda", which is the example image being tested.
7. Optional: You may also want to compile the examples for image classification and labelling if you plan to use those: ```bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain``` and ```bazel build -c opt --copt=-mavx tensorflow/examples/label_image``` These commands compile with AVX support to get the speed improvements described below. Please ensure your CPU supports AVX if you run these. At the time of writing, GCE vCPUs do.


## Notes
This script has been tried and tested within the Google Compute Engine environment.
I have good faith it would work on other cloud services too, assuming the base OS image is the same and the CPU is 64-bit with AVX support.

Once you have run this script, you can run the following commands to classify your own image using the default trained model provided by TensorFlow:

```shell
cd /home/yourUsername/tensorflow/tensorflow
```

```shell
bazel-bin/tensorflow/models/image/imagenet/classify_image --image_file=foo.jpg
```

Also worthy of note is that in this script we fetch and compile Python from source. Depending on what repos you wish to add to your server you may be able to simplify this step by using this instead:

```shell
sudo add-apt-repository ppa:fkrull/deadsnakes
sudo apt-get update
sudo apt-get install python2.7
```


## Performance

One must remember here that this is raw CPU based data. If you have GPU support in the cloud, performance will probably be better, but at the time of writing GCE only has CPU based instances. That being said, GCE supports AVX, which you can use to speed things up. I have provided results both with and without it below for comparison.

I have tested performance via a real world scenario: retraining one of the TensorFlow examples to recognize a custom object, on a variety of instance sizes. This is by no means a scientific test, simply what I observed, with the average taken. When you retrain the top layers of the model the system creates "bottlenecks" (a term referring to the layer just before the final output layer that actually does the classification) for all the input images. These take time to create, and generating them pushes all vCPUs close to 100%, even on multi-vCPU systems. So let's have some fun...

![cpuusage](https://cloud.githubusercontent.com/assets/4972997/13094360/0b3db572-d4bf-11e5-8555-acc9bf143987.gif)
(Yep, I had to try on a 24 vCPU system just for fun...)
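
Since the AVX builds only make sense on a CPU that actually reports AVX, here is a minimal sketch of how you might check before compiling with `--copt=-mavx`. It assumes a Linux machine that exposes `/proc/cpuinfo`; the helper name `cpu_has_flag` is made up for illustration and is not part of `setupTensorFlowGCE.sh`.

```shell
#!/bin/bash
# Hypothetical helper (not part of setupTensorFlowGCE.sh).
# Succeeds if the given CPU flag appears as a whole word in the first
# "flags" line of a cpuinfo-style file (default: /proc/cpuinfo).
cpu_has_flag() {
  local flag="$1"
  local cpuinfo="${2:-/proc/cpuinfo}"
  # -m1 stops at the first matching line; -w matches whole words only,
  # so asking for "avx" does not match "avx2".
  grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw "$flag"
}

if cpu_has_flag avx; then
  echo "AVX reported by the CPU: --copt=-mavx builds should run here."
else
  echo "AVX not reported: build without --copt=-mavx."
fi
```

The cpuinfo path is an optional second argument, which makes it possible to try the function against a saved copy of another machine's `/proc/cpuinfo`.
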

**Input data:** 1920x1080 pixel resolution images (consider that you will typically have 2000 - 3000 images as input data, maybe more, and each image needs a corresponding "bottleneck" file to be generated).


### Results

I got the following results on different instance sizes (without compiling AVX support):

* **n1-highcpu-4** generated 4 bottlenecks per minute on average (4.1 hours per 1000 images).
* **n1-highcpu-16** generated 15 bottlenecks per minute on average (1.1 hours per 1000 images).
* **n1-highcpu-24** generated 19 bottlenecks per minute on average (53 mins per 1000 images). It should be noted that with a default Google account 24 CPUs is the maximum you can have in any one region without requesting an upgrade.

However, if we recompile the retrainer and labeler with AVX support using:

```shell
bazel build -c opt --copt=-mavx tensorflow/examples/image_retraining:retrain
bazel build -c opt --copt=-mavx tensorflow/examples/label_image
```

Then our training times drop dramatically:

* **n1-highcpu-4** generated **150 bottlenecks per minute** on average (**6.6 minutes per 1000 images**).
* **n1-highcpu-24** generated **185 bottlenecks per minute** on average (**5.4 minutes per 1000 images**).

With AVX enabled, the performance gain from going from 4 vCPUs to 24 is pretty minimal (150 vs 185 bottlenecks per minute).


## What gets installed?

Not a lot, but it's the setup and the dependencies for the below that take time to find if doing this by yourself!
Life is always simpler when you have the answer in front of you...

* Java 8
* [Bazel](https://github.com/bazelbuild/bazel) for building
* Python and associated deps.
* [TensorFlow](https://github.com/tensorflow/tensorflow)
* Git
* Unzip
* Dependencies for all of the above. See script for exact details.


## What gets changed?

* A swap file of roughly 4GB (`/var/swap.img`) is created on your persistent disk.
* The swappiness value for the OS is set to 0 so that it prefers to use RAM. It will only use swap when RAM is full.
* $PATH has $HOME/bin added to it.


## Questions / Comments / Disclaimer

This has been tried and tested with the current version of [TensorFlow](https://github.com/tensorflow/tensorflow) as of 12th Feb 2016.

Also, I would just like to say that I am not by any means an expert on TensorFlow or machine learning. I am learning as I go along and sharing what I find, in the hope it will help others who are also just getting started and want to get playing quickly in the cloud.

If you found this useful, you may enjoy my other ramblings and discoveries. Feel free to [check out my website to connect with me on social channels](http://www.jasonmayes.com).

--------------------------------------------------------------------------------
/setupTensorFlowGCE.sh:
--------------------------------------------------------------------------------
#!/bin/bash
#
# This script will hopefully save you a lot of time setting up GCE
# (Google Compute Engine) to be ready to run TensorFlow.
#
# Tested using the following environment:
# - n1-highcpu-4 instance with 3.6GB RAM
# - Running vanilla Ubuntu Trusty 14.04 LTS.
# - 20GB persistent disk.
#
# @author Jason Mayes
#
# Excessive commenting has been included below for clarity :-)
# Save this script to /home/yourUserName, chmod +x setupTensorFlowGCE.sh, + run
# using ./setupTensorFlowGCE.sh

# Everything is installed into ./tensorflow, relative to where the script runs.
mkdir -p tensorflow
cd tensorflow || exit 1

################################################################################
# Install utils.
################################################################################
echo -e "\e[36m***Installing utilities*** \e[0m"
sudo apt-get update
sudo apt-get install unzip git-all pkg-config zip g++ zlib1g-dev

################################################################################
# Install Java deps.
################################################################################
echo -e "\e[36m***Installing Java8. Press ENTER when prompted*** \e[0m"
echo -e "\e[36m***And accept licence*** \e[0m"
sudo add-apt-repository ppa:webupd8team/java
sudo apt-get update
sudo apt-get install oracle-java8-installer

################################################################################
# Install Bazel dep.
################################################################################
echo -e "\e[36m***Installing Bazel*** \e[0m"
wget https://goo.gl/OQ2ZCl -O bazel-installer-linux-x86_64.sh
chmod +x bazel-installer-linux-x86_64.sh
sudo ./bazel-installer-linux-x86_64.sh
rm bazel-installer-linux-x86_64.sh
sudo chown "$USER:$USER" ~/.cache/bazel/

################################################################################
# Fetch Swig and Python deps.
################################################################################
echo -e "\e[36m***Installing python deps*** \e[0m"
sudo apt-get install swig
sudo apt-get install build-essential python-dev python-pip checkinstall
sudo apt-get install libreadline-gplv2-dev libncursesw5-dev libssl-dev \
    libsqlite3-dev tk-dev libgdbm-dev libc6-dev libbz2-dev

################################################################################
# Fetch and install Python.
################################################################################
echo -e "\e[36m***Installing Python*** \e[0m"
wget https://www.python.org/ftp/python/3.5.1/Python-3.5.1.tgz
tar -xvf Python-3.5.1.tgz
cd Python-3.5.1
./configure
make
sudo make install
cd ../
rm Python-3.5.1.tgz
# ~/.bashrc belongs to the current user, so no sudo is needed for the append.
echo "alias python=python3.5" >> ~/.bashrc
# Note: aliases only take effect in interactive shells, so run
# `source ~/.bashrc` (or log out and in again) once this script has finished.
source ~/.bashrc

################################################################################
# Grab v0.8 TensorFlow from git.
################################################################################
echo -e "\e[36m***Cloning TensorFlow from GitHub*** \e[0m"
git clone --recurse-submodules -b r0.8 https://github.com/tensorflow/tensorflow.git
sed -i 's/kDefaultTotalBytesLimit = 64/kDefaultTotalBytesLimit = 128/' tensorflow/google/protobuf/src/google/protobuf/io/coded_stream.h

################################################################################
# We need Numpy for this TensorFlow to work.
################################################################################
echo -e "\e[36m***Installing Numpy*** \e[0m"
sudo apt-get install python-numpy
sudo pip install numpy --upgrade

################################################################################
# GCE has no swap by default. Set swappiness to 0 so that the swap created
# below is only used when RAM is full.
################################################################################
echo -e "\e[36m***Changing swappiness*** \e[0m"
sudo sysctl vm.swappiness=0
# Make change persistent even after reboot.
echo "vm.swappiness=0" | sudo tee -a /etc/sysctl.conf > /dev/null

################################################################################
# Make a swap which is used only if RAM not available.
################################################################################
echo -e "\e[36m***Creating swap file*** \e[0m"
sudo touch /var/swap.img
sudo chmod 600 /var/swap.img
# Create approx 4GB swap assuming 3.6GB RAM (almost 8GB total space available)
sudo dd if=/dev/zero of=/var/swap.img bs=1024k count=4000
sudo mkswap /var/swap.img
sudo swapon /var/swap.img
free
echo -e "\e[36mReady to run TensorFlow! \e[0m"

################################################################################
# Now let's configure TensorFlow.
################################################################################
echo -e "\e[36m***Configuring TensorFlow*** \e[0m"
echo -e "\e[36mType /usr/bin/python for config and say NO to GPU support. \e[0m"
echo -e "\e[36mRunning configure: \e[0m"
cd tensorflow
./configure
--------------------------------------------------------------------------------