├── .github └── ISSUE_TEMPLATE.md ├── .gitignore ├── 01_Simple_Linear_Model.ipynb ├── 02_Convolutional_Neural_Network.ipynb ├── 03B_Layers_API.ipynb ├── 03C_Keras_API.ipynb ├── 03_PrettyTensor.ipynb ├── 04_Save_Restore.ipynb ├── 05_Ensemble_Learning.ipynb ├── 06_CIFAR-10.ipynb ├── 07_Inception_Model.ipynb ├── 08_Transfer_Learning.ipynb ├── 09_Video_Data.ipynb ├── 10_Fine-Tuning.ipynb ├── 11_Adversarial_Examples.ipynb ├── 12_Adversarial_Noise_MNIST.ipynb ├── 13B_Visual_Analysis_MNIST.ipynb ├── 13_Visual_Analysis.ipynb ├── 14_DeepDream.ipynb ├── 15_Style_Transfer.ipynb ├── 16_Reinforcement_Learning.ipynb ├── 17_Estimator_API.ipynb ├── 18_TFRecords_Dataset_API.ipynb ├── 19_Hyper-Parameters.ipynb ├── 20_Natural_Language_Processing.ipynb ├── 21_Machine_Translation.ipynb ├── 22_Image_Captioning.ipynb ├── 23_Time-Series-Prediction.ipynb ├── LICENSE ├── README.md ├── cache.py ├── cifar10.py ├── coco.py ├── convert.py ├── dataset.py ├── download.py ├── europarl.py ├── forks.md ├── images ├── 02_convolution.png ├── 02_convolution.svg ├── 02_network_flowchart.png ├── 02_network_flowchart.svg ├── 06_network_flowchart.png ├── 06_network_flowchart.svg ├── 07_inception_flowchart.png ├── 08_transfer_learning_flowchart.png ├── 08_transfer_learning_flowchart.svg ├── 09_transfer_learning_flowchart.png ├── 09_transfer_learning_flowchart.svg ├── 10_transfer_learning_flowchart.png ├── 10_transfer_learning_flowchart.svg ├── 11_adversarial_examples_flowchart.png ├── 11_adversarial_examples_flowchart.svg ├── 12_adversarial_noise_flowchart.png ├── 12_adversarial_noise_flowchart.svg ├── 13_visual_analysis_flowchart.png ├── 13_visual_analysis_flowchart.svg ├── 13b_visual_analysis_flowchart.png ├── 13b_visual_analysis_flowchart.svg ├── 14_deepdream_flowchart.png ├── 14_deepdream_flowchart.svg ├── 14_deepdream_recursive_flowchart.png ├── 14_deepdream_recursive_flowchart.svg ├── 15_style_transfer_flowchart.png ├── 15_style_transfer_flowchart.svg ├── 16_flowchart.png ├── 16_flowchart.svg ├── 
16_motion-trace.png ├── 16_problem.png ├── 16_problem.svg ├── 16_q-values-details.png ├── 16_q-values-details.svg ├── 16_q-values-simple.png ├── 16_q-values-simple.svg ├── 16_training_stability.png ├── 16_training_stability.svg ├── 19_flowchart_bayesian_optimization.png ├── 19_flowchart_bayesian_optimization.svg ├── 20_natural_language_flowchart.png ├── 20_natural_language_flowchart.svg ├── 20_recurrent_unit.png ├── 20_recurrent_unit.svg ├── 20_unrolled_3layers_flowchart.png ├── 20_unrolled_3layers_flowchart.svg ├── 20_unrolled_flowchart.png ├── 20_unrolled_flowchart.svg ├── 21_machine_translation_flowchart.png ├── 21_machine_translation_flowchart.svg ├── 22_image_captioning_flowchart.png ├── 22_image_captioning_flowchart.svg ├── 23_time_series_flowchart.png ├── 23_time_series_flowchart.svg ├── Denmark.jpg ├── Europe.jpg ├── elon_musk.jpg ├── elon_musk_100x100.jpg ├── escher_planefilling2.jpg ├── giger.jpg ├── hulk.jpg ├── parrot.jpg ├── parrot_cropped1.jpg ├── parrot_cropped2.jpg ├── parrot_cropped3.jpg ├── parrot_padded.jpg ├── style1.jpg ├── style2.jpg ├── style3.jpg ├── style4.jpg ├── style5.jpg ├── style6.jpg ├── style7.jpg ├── style8.jpg ├── style9.jpg ├── willy_wonka_new.jpg └── willy_wonka_old.jpg ├── imdb.py ├── inception.py ├── inception5h.py ├── knifey.py ├── mnist.py ├── reinforcement_learning.py ├── requirements.txt ├── vgg16.py └── weather.py /.github/ISSUE_TEMPLATE.md: -------------------------------------------------------------------------------- 1 | # STOP! 2 | 3 | **Please don't waste my time!** 4 | 5 | Most of the problems people are having are already described in the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md). 6 | 7 | You should first make a serious attempt to solve your problem. 8 | If you ask a question that has already been answered elsewhere, or if you do not 9 | give enough details about your problem, then your issue may be closed immediately. 
10 | 11 | ## Python 3 12 | 13 | These tutorials were developed in **Python 3.5** (and higher) and may give strange errors in Python 2.7. 14 | 15 | ## Missing Files 16 | 17 | You need to **download the whole repository**, either using `git clone` or as a zip-file. See the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md). 18 | 19 | ## Questions about TensorFlow 20 | 21 | General questions about TensorFlow should either be asked on [StackOverflow](http://stackoverflow.com/questions/tagged/tensorflow) or the [official TensorFlow repository](https://github.com/tensorflow/tensorflow/issues). 22 | 23 | ## Modifications 24 | 25 | Questions about modifications or how to use these tutorials on your own data-set should also be asked on [StackOverflow](http://stackoverflow.com/questions/tagged/tensorflow). 26 | 27 | Thousands of people are using these tutorials. It is impossible for me to give individual support for your project. 28 | 29 | ## Suggestions for Changes 30 | 31 | The tutorials cannot change too much because it would make the [YouTube videos](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ) too different from the source-code. 32 | 33 | ## Requests for New Tutorials 34 | 35 | These tutorials were made by a single person on his own time. It took a very long time to 36 | research and produce the tutorials. If a topic is not covered then the best thing is to make 37 | a new tutorial by yourself. All you need is a decent microphone, a screen-grabbing tool, and a 38 | video editor. I used the free version of [DaVinci Resolve](https://www.blackmagicdesign.com/products/davinciresolve). 39 | 40 | ## Other Issues? 41 | 42 | Please carefully read the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md) and only open an issue if you are still having problems. 
43 | -------------------------------------------------------------------------------- /.gitignore: -------------------------------------------------------------------------------- 1 | # Various 2 | sandbox*.py 3 | 4 | # Data for TensorFlow 5 | data/ 6 | inception/ 7 | vgg16/ 8 | checkpoints/ 9 | checkpoints* 10 | logs/ 11 | summary/ 12 | 13 | # PyCharm 14 | .idea/ 15 | 16 | 17 | # Byte-compiled / optimized / DLL files 18 | __pycache__/ 19 | *.py[cod] 20 | *$py.class 21 | 22 | # C extensions 23 | *.so 24 | 25 | # Distribution / packaging 26 | .Python 27 | env/ 28 | build/ 29 | develop-eggs/ 30 | dist/ 31 | downloads/ 32 | eggs/ 33 | .eggs/ 34 | lib/ 35 | lib64/ 36 | parts/ 37 | sdist/ 38 | var/ 39 | *.egg-info/ 40 | .installed.cfg 41 | *.egg 42 | 43 | # PyInstaller 44 | # Usually these files are written by a python script from a template 45 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 46 | *.manifest 47 | *.spec 48 | 49 | # Installer logs 50 | pip-log.txt 51 | pip-delete-this-directory.txt 52 | 53 | # Unit test / coverage reports 54 | htmlcov/ 55 | .tox/ 56 | .coverage 57 | .coverage.* 58 | .cache 59 | nosetests.xml 60 | coverage.xml 61 | *.cover 62 | .hypothesis/ 63 | 64 | # Translations 65 | *.mo 66 | *.pot 67 | 68 | # Django stuff: 69 | *.log 70 | local_settings.py 71 | 72 | # Flask stuff: 73 | instance/ 74 | .webassets-cache 75 | 76 | # Scrapy stuff: 77 | .scrapy 78 | 79 | # Sphinx documentation 80 | docs/_build/ 81 | 82 | # PyBuilder 83 | target/ 84 | 85 | # IPython Notebook 86 | .ipynb_checkpoints 87 | 88 | # pyenv 89 | .python-version 90 | 91 | # celery beat schedule file 92 | celerybeat-schedule 93 | 94 | # dotenv 95 | .env 96 | 97 | # virtualenv 98 | venv/ 99 | ENV/ 100 | 101 | # Spyder project settings 102 | .spyderproject 103 | 104 | # Rope project settings 105 | .ropeproject 106 | -------------------------------------------------------------------------------- /LICENSE: 
-------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 by Magnus Erik Hvass Pedersen 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Tutorials 2 | 3 | [Original repository on GitHub](https://github.com/Hvass-Labs/TensorFlow-Tutorials) 4 | 5 | Original author is [Magnus Erik Hvass Pedersen](http://www.hvass-labs.org) 6 | 7 | ## Introduction 8 | 9 | * These tutorials are intended for beginners in Deep Learning and TensorFlow. 10 | * Each tutorial covers a single topic. 11 | * The source-code is well-documented. 12 | * There is a [YouTube video](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ) for each tutorial. 
13 | 14 | ## Tutorials for TensorFlow 2 15 | 16 | The following tutorials have been updated and work with **TensorFlow 2** 17 | (some of them run in "v.1 compatibility mode"). 18 | 19 | 1. Simple Linear Model 20 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/01_Simple_Linear_Model.ipynb)) 21 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/01_Simple_Linear_Model.ipynb)) 22 | 23 | 2. Convolutional Neural Network 24 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb)) 25 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb)) 26 | 27 | 3-C. Keras API 28 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03C_Keras_API.ipynb)) 29 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03C_Keras_API.ipynb)) 30 | 31 | 10. Fine-Tuning 32 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/10_Fine-Tuning.ipynb)) 33 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/10_Fine-Tuning.ipynb)) 34 | 35 | 13-B. Visual Analysis for MNIST 36 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/13B_Visual_Analysis_MNIST.ipynb)) 37 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/13B_Visual_Analysis_MNIST.ipynb)) 38 | 39 | 16. Reinforcement Learning 40 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/16_Reinforcement_Learning.ipynb)) 41 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/16_Reinforcement_Learning.ipynb)) 42 | 43 | 19. 
Hyper-Parameter Optimization 44 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/19_Hyper-Parameters.ipynb)) 45 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/19_Hyper-Parameters.ipynb)) 46 | 47 | 20. Natural Language Processing 48 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/20_Natural_Language_Processing.ipynb)) 49 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/20_Natural_Language_Processing.ipynb)) 50 | 51 | 21. Machine Translation 52 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/21_Machine_Translation.ipynb)) 53 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/21_Machine_Translation.ipynb)) 54 | 55 | 22. Image Captioning 56 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/22_Image_Captioning.ipynb)) 57 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/22_Image_Captioning.ipynb)) 58 | 59 | 23. Time-Series Prediction 60 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb)) 61 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb)) 62 | 63 | ## Tutorials for TensorFlow 1 64 | 65 | The following tutorials only work with the older **TensorFlow 1** API, so you 66 | would need to install an older version of TensorFlow to run these. It would take 67 | too much time and effort to convert these tutorials to TensorFlow 2. 68 | 69 | 3. Pretty Tensor 70 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03_PrettyTensor.ipynb)) 71 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03_PrettyTensor.ipynb)) 72 | 73 | 3-B. 
Layers API 74 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03B_Layers_API.ipynb)) 75 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03B_Layers_API.ipynb)) 76 | 77 | 4. Save & Restore 78 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb)) 79 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb)) 80 | 81 | 5. Ensemble Learning 82 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/05_Ensemble_Learning.ipynb)) 83 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/05_Ensemble_Learning.ipynb)) 84 | 85 | 6. CIFAR-10 86 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb)) 87 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb)) 88 | 89 | 7. Inception Model 90 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/07_Inception_Model.ipynb)) 91 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/07_Inception_Model.ipynb)) 92 | 93 | 8. Transfer Learning 94 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb)) 95 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb)) 96 | 97 | 9. Video Data 98 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb)) 99 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb)) 100 | 101 | 11. 
Adversarial Examples 102 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb)) 103 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb)) 104 | 105 | 12. Adversarial Noise for MNIST 106 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Adversarial_Noise_MNIST.ipynb)) 107 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Adversarial_Noise_MNIST.ipynb)) 108 | 109 | 13. Visual Analysis 110 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/13_Visual_Analysis.ipynb)) 111 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/13_Visual_Analysis.ipynb)) 112 | 113 | 14. DeepDream 114 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb)) 115 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb)) 116 | 117 | 15. Style Transfer 118 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/15_Style_Transfer.ipynb)) 119 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/15_Style_Transfer.ipynb)) 120 | 121 | 17. Estimator API 122 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/17_Estimator_API.ipynb)) 123 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/17_Estimator_API.ipynb)) 124 | 125 | 18. 
TFRecords & Dataset API 126 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/18_TFRecords_Dataset_API.ipynb)) 127 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/18_TFRecords_Dataset_API.ipynb)) 128 | 129 | ## Videos 130 | 131 | These tutorials are also available as [YouTube videos](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ). 132 | 133 | ## Translations 134 | 135 | These tutorials have been translated to the following languages: 136 | 137 | * [Chinese](https://github.com/Hvass-Labs/TensorFlow-Tutorials-Chinese) 138 | 139 | ### New Translations 140 | 141 | You can help by translating the remaining tutorials or reviewing the ones that have already been translated. You can also help by translating to other languages. 142 | 143 | It is a very big job to translate all the tutorials, so you should just start with Tutorials #01, #02 and #03-C which are the most important for beginners. 144 | 145 | ### New Videos 146 | 147 | You are also very welcome to record your own YouTube videos in other languages. It is strongly recommended that you get a decent microphone because good sound quality is very important. I used `vokoscreen` for recording the videos and the free [DaVinci Resolve](https://www.blackmagicdesign.com/products/davinciresolve/) for editing the videos. 148 | 149 | ## Forks 150 | 151 | See the [selected list of forks](forks.md) for community modifications to these tutorials. 152 | 153 | ## Installation 154 | 155 | There are different ways of installing and running TensorFlow. This section describes how I did it 156 | for these tutorials. You may want to do it differently and you can search the internet for instructions. 157 | 158 | If you are new to using Python and Linux then this may be challenging 159 | to get working and you may need to do internet searches for error-messages, etc. 160 | It will get easier with practice. 
You can also run the tutorials without installing 161 | anything by using Google Colab, see further below. 162 | 163 | Some of the Python Notebooks use source-code located in different files to allow for easy re-use 164 | across multiple tutorials. It is therefore recommended that you download the whole repository 165 | from GitHub, instead of just downloading the individual Python Notebooks. 166 | 167 | ### Git 168 | 169 | The easiest way to download and install these tutorials is by using git from the command-line: 170 | 171 | git clone https://github.com/Hvass-Labs/TensorFlow-Tutorials.git 172 | 173 | This will create the directory `TensorFlow-Tutorials` and download all the files to it. 174 | 175 | This also makes it easy to update the tutorials, simply by executing this command inside that directory: 176 | 177 | git pull 178 | 179 | ### Download Zip-File 180 | 181 | You can also [download](https://github.com/Hvass-Labs/TensorFlow-Tutorials/archive/master.zip) 182 | the contents of the GitHub repository as a Zip-file and extract it manually. 183 | 184 | ### Environment 185 | 186 | I use [Anaconda](https://www.continuum.io/downloads) because it comes with many Python 187 | packages already installed and it is easy to work with. After installing Anaconda, 188 | you should create a [conda environment](http://conda.pydata.org/docs/using/envs.html) 189 | so you do not destroy your main installation in case you make a mistake somewhere: 190 | 191 | conda create --name tf python=3 192 | 193 | When Python gets updated to a new version, it takes a while before TensorFlow also 194 | uses the new Python version. 
So if the TensorFlow installation fails, then you may 195 | have to specify an older Python version for your new environment, such as: 196 | 197 | conda create --name tf python=3.6 198 | 199 | Now you can switch to the new environment by running the following (on Linux): 200 | 201 | source activate tf 202 | 203 | ### Required Packages 204 | 205 | The tutorials require several Python packages to be installed. The packages are listed in 206 | [requirements.txt](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/requirements.txt) 207 | 208 | To install the required Python packages and dependencies you first have to activate the 209 | conda-environment as described above, and then run the following command 210 | in a terminal: 211 | 212 | pip install -r requirements.txt 213 | 214 | Starting with version 2.1, TensorFlow includes both the CPU and GPU versions and will 215 | automatically use the GPU if one is available. But this requires the installation of various 216 | NVIDIA drivers, which is a bit complicated and is not described here. 217 | 218 | ### Python Version 3.5 or Later 219 | 220 | These tutorials were developed on Linux using **Python 3.5 / 3.6** (the [Anaconda](https://www.continuum.io/downloads) distribution) and [PyCharm](https://www.jetbrains.com/pycharm/). 221 | 222 | There are reports that Python 2.7 gives error messages with these tutorials. Please make sure you are using **Python 3.5** or later! 223 | 224 | ## How To Run 225 | 226 | If you have followed the above installation instructions, you should 227 | now be able to run the tutorials in the Python Notebooks: 228 | 229 | cd ~/development/TensorFlow-Tutorials/ # Your installation directory. 230 | jupyter notebook 231 | 232 | This should start a web-browser that shows the list of tutorials. Click on a tutorial to load it. 
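As a quick sanity-check before launching the Notebooks, you can verify that the conda-environment meets the requirements described above. This is just a sketch (it is not part of the tutorials themselves); it mirrors the Python 3.5+ and TensorFlow 2 requirements:

```python
import sys

# The tutorials require Python 3.5 or later.
assert sys.version_info >= (3, 5), "Python 3.5 or later is required."
print("Python version:", sys.version.split()[0])

# The updated tutorials require TensorFlow 2; if it is missing,
# print a hint instead of raising an ImportError.
try:
    import tensorflow as tf
    print("TensorFlow version:", tf.__version__)
except ImportError:
    print("TensorFlow not found - run: pip install -r requirements.txt")
```

If the printed TensorFlow version starts with 1., the Notebooks listed under "Tutorials for TensorFlow 2" may not work as described.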
233 | 234 | ### Run in Google Colab 235 | 236 | If you do not want to install anything on your own computer, then the Notebooks 237 | can be viewed, edited and run entirely on the internet by using 238 | [Google Colab](https://colab.research.google.com). There is a 239 | [YouTube video](https://www.youtube.com/watch?v=Hs6HI2YWchM) explaining how to do this. 240 | Click the "Google Colab"-link next to each tutorial listed above. 241 | You can view the Notebook on Colab, but in order to run it you need to log in using 242 | your Google account. 243 | Then you need to execute the following commands at the top of the Notebook, 244 | which clone the contents of this repository to your work-directory on Colab. 245 | 246 | # Clone the repository from GitHub to Google Colab's temporary drive. 247 | import os 248 | work_dir = "/content/TensorFlow-Tutorials/" 249 | if not os.path.exists(work_dir): 250 | !git clone https://github.com/Hvass-Labs/TensorFlow-Tutorials.git 251 | os.chdir(work_dir) 252 | 253 | All required packages should already be installed on Colab; otherwise you 254 | can run the following command: 255 | 256 | !pip install -r requirements.txt 257 | 258 | ## Older Versions 259 | 260 | Sometimes the source-code has changed from that shown in the YouTube videos. This may be due to 261 | bug-fixes, improvements, or because code-sections are moved to separate files for easy re-use. 262 | 263 | If you want to see the exact versions of the source-code that were used in the YouTube videos, 264 | then you can [browse the history](https://github.com/Hvass-Labs/TensorFlow-Tutorials/commits/master) 265 | of commits to the GitHub repository. 266 | 267 | ## License (MIT) 268 | 269 | These tutorials and source-code are published under the [MIT License](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/LICENSE) 270 | which allows very broad use for both academic and commercial purposes. 
271 | 272 | A few of the images used for demonstration purposes may be under copyright. These images are included under the "fair usage" laws. 273 | 274 | You are very welcome to modify these tutorials and use them in your own projects. 275 | Please keep a link to the [original repository](https://github.com/Hvass-Labs/TensorFlow-Tutorials). 276 | -------------------------------------------------------------------------------- /cache.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Cache-wrapper for a function or class. 4 | # 5 | # Save the result of calling a function or creating an object-instance 6 | # to harddisk. This is used to persist the data so it can be reloaded 7 | # very quickly and easily. 8 | # 9 | # Implemented in Python 3.5 10 | # 11 | ######################################################################## 12 | # 13 | # This file is part of the TensorFlow Tutorials available at: 14 | # 15 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 16 | # 17 | # Published under the MIT License. See the file LICENSE for details. 18 | # 19 | # Copyright 2016 by Magnus Erik Hvass Pedersen 20 | # 21 | ######################################################################## 22 | 23 | import os 24 | import pickle 25 | import numpy as np 26 | 27 | ######################################################################## 28 | 29 | 30 | def cache(cache_path, fn, *args, **kwargs): 31 | """ 32 | Cache-wrapper for a function or class. If the cache-file exists 33 | then the data is reloaded and returned, otherwise the function 34 | is called and the result is saved to cache. The fn-argument can 35 | also be a class instead, in which case an object-instance is 36 | created and saved to the cache-file. 37 | 38 | :param cache_path: 39 | File-path for the cache-file. 40 | 41 | :param fn: 42 | Function or class to be called. 
43 | 44 | :param args: 45 | Arguments to the function or class-init. 46 | 47 | :param kwargs: 48 | Keyword arguments to the function or class-init. 49 | 50 | :return: 51 | The result of calling the function or creating the object-instance. 52 | """ 53 | 54 | # If the cache-file exists. 55 | if os.path.exists(cache_path): 56 | # Load the cached data from the file. 57 | with open(cache_path, mode='rb') as file: 58 | obj = pickle.load(file) 59 | 60 | print("- Data loaded from cache-file: " + cache_path) 61 | else: 62 | # The cache-file does not exist. 63 | 64 | # Call the function / class-init with the supplied arguments. 65 | obj = fn(*args, **kwargs) 66 | 67 | # Save the data to a cache-file. 68 | with open(cache_path, mode='wb') as file: 69 | pickle.dump(obj, file) 70 | 71 | print("- Data saved to cache-file: " + cache_path) 72 | 73 | return obj 74 | 75 | 76 | ######################################################################## 77 | 78 | 79 | def convert_numpy2pickle(in_path, out_path): 80 | """ 81 | Convert a numpy-file to pickle-file. 82 | 83 | The first version of the cache-function used numpy for saving the data. 84 | Instead of re-calculating all the data, you can just convert the 85 | cache-file using this function. 86 | 87 | :param in_path: 88 | Input file in numpy-format written using numpy.save(). 89 | 90 | :param out_path: 91 | Output file written as a pickle-file. 92 | 93 | :return: 94 | Nothing. 95 | """ 96 | 97 | # Load the data using numpy. 98 | data = np.load(in_path) 99 | 100 | # Save the data using pickle. 101 | with open(out_path, mode='wb') as file: 102 | pickle.dump(data, file) 103 | 104 | 105 | ######################################################################## 106 | 107 | if __name__ == '__main__': 108 | # This is a short example of using a cache-file. 109 | 110 | # This is the function that will only get called if the result 111 | # is not already saved in the cache-file. 
This would normally 112 | # be a function that takes a long time to compute, or if you 113 | # need persistent data for some other reason. 114 | def expensive_function(a, b): 115 | return a * b 116 | 117 | print('Computing expensive_function() ...') 118 | 119 | # Either load the result from a cache-file if it already exists, 120 | # otherwise calculate expensive_function(a=123, b=456) and 121 | # save the result to the cache-file for next time. 122 | result = cache(cache_path='cache_expensive_function.pkl', 123 | fn=expensive_function, a=123, b=456) 124 | 125 | print('result =', result) 126 | 127 | # Newline. 128 | print() 129 | 130 | # This is another example which saves an object to a cache-file. 131 | 132 | # We want to cache an object-instance of this class. 133 | # The motivation is to do an expensive computation only once, 134 | # or if we need to persist the data for some other reason. 135 | class ExpensiveClass: 136 | def __init__(self, c, d): 137 | self.c = c 138 | self.d = d 139 | self.result = c * d 140 | 141 | def print_result(self): 142 | print('c =', self.c) 143 | print('d =', self.d) 144 | print('result = c * d =', self.result) 145 | 146 | print('Creating object from ExpensiveClass() ...') 147 | 148 | # Either load the object from a cache-file if it already exists, 149 | # otherwise make an object-instance ExpensiveClass(c=123, d=456) 150 | # and save the object to the cache-file for the next time. 
151 | obj = cache(cache_path='cache_ExpensiveClass.pkl', 152 | fn=ExpensiveClass, c=123, d=456) 153 | 154 | obj.print_result() 155 | 156 | ######################################################################## 157 | -------------------------------------------------------------------------------- /cifar10.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading the CIFAR-10 data-set from the internet 4 | # and loading it into memory. 5 | # 6 | # Implemented in Python 3.5 7 | # 8 | # Usage: 9 | # 1) Set the variable data_path with the desired storage path. 10 | # 2) Call maybe_download_and_extract() to download the data-set 11 | # if it is not already located in the given data_path. 12 | # 3) Call load_class_names() to get an array of the class-names. 13 | # 4) Call load_training_data() and load_test_data() to get 14 | # the images, class-numbers and one-hot encoded class-labels 15 | # for the training-set and test-set. 16 | # 5) Use the returned data in your own program. 17 | # 18 | # Format: 19 | # The images for the training- and test-sets are returned as 4-dim numpy 20 | # arrays each with the shape: [image_number, height, width, channel] 21 | # where the individual pixels are floats between 0.0 and 1.0. 22 | # 23 | ######################################################################## 24 | # 25 | # This file is part of the TensorFlow Tutorials available at: 26 | # 27 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 28 | # 29 | # Published under the MIT License. See the file LICENSE for details. 
30 | # 31 | # Copyright 2016 by Magnus Erik Hvass Pedersen 32 | # 33 | ######################################################################## 34 | 35 | import numpy as np 36 | import pickle 37 | import os 38 | import download 39 | from dataset import one_hot_encoded 40 | 41 | ######################################################################## 42 | 43 | # Directory where you want to download and save the data-set. 44 | # Set this before you start calling any of the functions below. 45 | data_path = "data/CIFAR-10/" 46 | 47 | # URL for the data-set on the internet. 48 | data_url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz" 49 | 50 | ######################################################################## 51 | # Various constants for the size of the images. 52 | # Use these constants in your own program. 53 | 54 | # Width and height of each image. 55 | img_size = 32 56 | 57 | # Number of channels in each image, 3 channels: Red, Green, Blue. 58 | num_channels = 3 59 | 60 | # Length of an image when flattened to a 1-dim array. 61 | img_size_flat = img_size * img_size * num_channels 62 | 63 | # Number of classes. 64 | num_classes = 10 65 | 66 | ######################################################################## 67 | # Various constants used to allocate arrays of the correct size. 68 | 69 | # Number of files for the training-set. 70 | _num_files_train = 5 71 | 72 | # Number of images for each batch-file in the training-set. 73 | _images_per_file = 10000 74 | 75 | # Total number of images in the training-set. 76 | # This is used to pre-allocate arrays for efficiency. 77 | _num_images_train = _num_files_train * _images_per_file 78 | 79 | ######################################################################## 80 | # Private functions for downloading, unpacking and loading data-files. 81 | 82 | 83 | def _get_file_path(filename=""): 84 | """ 85 | Return the full path of a data-file for the data-set. 
86 | 87 | If filename=="" then return the directory of the files. 88 | """ 89 | 90 | return os.path.join(data_path, "cifar-10-batches-py/", filename) 91 | 92 | 93 | def _unpickle(filename): 94 | """ 95 | Unpickle the given file and return the data. 96 | 97 | Note that the appropriate dir-name is prepended to the filename. 98 | """ 99 | 100 | # Create full path for the file. 101 | file_path = _get_file_path(filename) 102 | 103 | print("Loading data: " + file_path) 104 | 105 | with open(file_path, mode='rb') as file: 106 | # In Python 3.X it is important to set the encoding, 107 | # otherwise an exception is raised here. 108 | data = pickle.load(file, encoding='bytes') 109 | 110 | return data 111 | 112 | 113 | def _convert_images(raw): 114 | """ 115 | Convert images from the CIFAR-10 format and 116 | return a 4-dim array with shape: [image_number, height, width, channel] 117 | where the pixels are floats between 0.0 and 1.0. 118 | """ 119 | 120 | # Convert the raw images from the data-files to floating-point. 121 | raw_float = np.array(raw, dtype=float) / 255.0 122 | 123 | # Reshape the array to 4-dimensions. 124 | images = raw_float.reshape([-1, num_channels, img_size, img_size]) 125 | 126 | # Reorder the indices of the array. 127 | images = images.transpose([0, 2, 3, 1]) 128 | 129 | return images 130 | 131 | 132 | def _load_data(filename): 133 | """ 134 | Load a pickled data-file from the CIFAR-10 data-set 135 | and return the converted images (see above) and the class-number 136 | for each image. 137 | """ 138 | 139 | # Load the pickled data-file. 140 | data = _unpickle(filename) 141 | 142 | # Get the raw images. 143 | raw_images = data[b'data'] 144 | 145 | # Get the class-numbers for each image. Convert to numpy-array. 146 | cls = np.array(data[b'labels']) 147 | 148 | # Convert the images.
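As a self-contained sketch of what the `_convert_images()` steps above do (toy sizes instead of the real 32x32 CIFAR-10 images; all values below are made up for illustration):

```python
import numpy as np

# Two toy "images": 3 channels of 4x4 pixels each, stored channel-first
# and flattened into one row per image, like the pickled 'data' array.
num_channels, img_size = 3, 4
raw = np.arange(2 * num_channels * img_size * img_size,
                dtype=np.uint8).reshape(2, -1)

# The same steps as _convert_images(): scale to [0.0, 1.0],
# reshape to [image_number, channel, height, width],
# then reorder to [image_number, height, width, channel].
raw_float = np.array(raw, dtype=float) / 255.0
imgs = raw_float.reshape([-1, num_channels, img_size, img_size])
imgs = imgs.transpose([0, 2, 3, 1])

# Pixel (h, w) of channel c ends up at imgs[i, h, w, c].
assert imgs.shape == (2, 4, 4, 3)
```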
149 | images = _convert_images(raw_images) 150 | 151 | return images, cls 152 | 153 | 154 | ######################################################################## 155 | # Public functions that you may call to download the data-set from 156 | # the internet and load the data into memory. 157 | 158 | 159 | def maybe_download_and_extract(): 160 | """ 161 | Download and extract the CIFAR-10 data-set if it doesn't already exist 162 | in data_path (set this variable first to the desired path). 163 | """ 164 | 165 | download.maybe_download_and_extract(url=data_url, download_dir=data_path) 166 | 167 | 168 | def load_class_names(): 169 | """ 170 | Load the names for the classes in the CIFAR-10 data-set. 171 | 172 | Returns a list with the names. Example: names[3] is the name 173 | associated with class-number 3. 174 | """ 175 | 176 | # Load the class-names from the pickled file. 177 | raw = _unpickle(filename="batches.meta")[b'label_names'] 178 | 179 | # Convert from binary strings. 180 | names = [x.decode('utf-8') for x in raw] 181 | 182 | return names 183 | 184 | 185 | def load_training_data(): 186 | """ 187 | Load all the training-data for the CIFAR-10 data-set. 188 | 189 | The data-set is split into 5 data-files which are merged here. 190 | 191 | Returns the images, class-numbers and one-hot encoded class-labels. 192 | """ 193 | 194 | # Pre-allocate the arrays for the images and class-numbers for efficiency. 195 | images = np.zeros(shape=[_num_images_train, img_size, img_size, num_channels], dtype=float) 196 | cls = np.zeros(shape=[_num_images_train], dtype=int) 197 | 198 | # Begin-index for the current batch. 199 | begin = 0 200 | 201 | # For each data-file. 202 | for i in range(_num_files_train): 203 | # Load the images and class-numbers from the data-file. 204 | images_batch, cls_batch = _load_data(filename="data_batch_" + str(i + 1)) 205 | 206 | # Number of images in this batch. 207 | num_images = len(images_batch) 208 | 209 | # End-index for the current batch. 
210 | end = begin + num_images 211 | 212 | # Store the images into the array. 213 | images[begin:end, :] = images_batch 214 | 215 | # Store the class-numbers into the array. 216 | cls[begin:end] = cls_batch 217 | 218 | # The begin-index for the next batch is the current end-index. 219 | begin = end 220 | 221 | return images, cls, one_hot_encoded(class_numbers=cls, num_classes=num_classes) 222 | 223 | 224 | def load_test_data(): 225 | """ 226 | Load all the test-data for the CIFAR-10 data-set. 227 | 228 | Returns the images, class-numbers and one-hot encoded class-labels. 229 | """ 230 | 231 | images, cls = _load_data(filename="test_batch") 232 | 233 | return images, cls, one_hot_encoded(class_numbers=cls, num_classes=num_classes) 234 | 235 | ######################################################################## 236 | -------------------------------------------------------------------------------- /coco.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading the COCO data-set from the internet 4 | # and loading it into memory. This data-set contains images and 5 | # various associated data such as text-captions describing the images. 6 | # 7 | # http://cocodataset.org 8 | # 9 | # Implemented in Python 3.6 10 | # 11 | # Usage: 12 | # 1) Call set_data_dir() to set the desired storage directory. 13 | # 2) Call maybe_download_and_extract() to download the data-set 14 | # if it is not already located in the given data_dir. 15 | # 3) Call load_records(train=True) and load_records(train=False) 16 | # to load the data-records for the training- and validation sets. 17 | # 4) Use the returned data in your own program. 18 | # 19 | # Format: 20 | # The COCO data-set contains a large number of images and various 21 | # data for each image stored in a JSON-file.
22 | # Functionality is provided for getting a list of image-filenames 23 | # (but not actually loading the images) along with their associated 24 | # data such as text-captions describing the contents of the images. 25 | # 26 | ######################################################################## 27 | # 28 | # This file is part of the TensorFlow Tutorials available at: 29 | # 30 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 31 | # 32 | # Published under the MIT License. See the file LICENSE for details. 33 | # 34 | # Copyright 2018 by Magnus Erik Hvass Pedersen 35 | # 36 | ######################################################################## 37 | 38 | import json 39 | import os 40 | import download 41 | from cache import cache 42 | 43 | ######################################################################## 44 | 45 | # Directory where you want to download and save the data-set. 46 | # Set this before you start calling any of the functions below. 47 | # Use the function set_data_dir() to also update train_dir and val_dir. 48 | data_dir = "data/coco/" 49 | 50 | # Sub-directories for the training- and validation-sets. 51 | train_dir = "data/coco/train2017" 52 | val_dir = "data/coco/val2017" 53 | 54 | # Base-URL for the data-sets on the internet. 55 | data_url = "http://images.cocodataset.org/" 56 | 57 | 58 | ######################################################################## 59 | # Private helper-functions. 60 | 61 | def _load_records(train=True): 62 | """ 63 | Load the image-filenames and captions 64 | for either the training-set or the validation-set. 65 | """ 66 | 67 | if train: 68 | # Training-set. 69 | filename = "captions_train2017.json" 70 | else: 71 | # Validation-set. 72 | filename = "captions_val2017.json" 73 | 74 | # Full path for the data-file. 75 | path = os.path.join(data_dir, "annotations", filename) 76 | 77 | # Load the file. 
78 | with open(path, "r", encoding="utf-8") as file: 79 | data_raw = json.load(file) 80 | 81 | # Convenience variables. 82 | images = data_raw['images'] 83 | annotations = data_raw['annotations'] 84 | 85 | # Initialize the dict for holding our data. 86 | # The lookup-key is the image-id. 87 | records = dict() 88 | 89 | # Collect all the filenames for the images. 90 | for image in images: 91 | # Get the id and filename for this image. 92 | image_id = image['id'] 93 | filename = image['file_name'] 94 | 95 | # Initialize a new data-record. 96 | record = dict() 97 | 98 | # Set the image-filename in the data-record. 99 | record['filename'] = filename 100 | 101 | # Initialize an empty list of image-captions 102 | # which will be filled further below. 103 | record['captions'] = list() 104 | 105 | # Save the record using the image-id as the lookup-key. 106 | records[image_id] = record 107 | 108 | # Collect all the captions for the images. 109 | for ann in annotations: 110 | # Get the id and caption for an image. 111 | image_id = ann['image_id'] 112 | caption = ann['caption'] 113 | 114 | # Lookup the data-record for this image-id. 115 | # This data-record should already exist from the loop above. 116 | record = records[image_id] 117 | 118 | # Append the current caption to the list of captions in the 119 | # data-record that was initialized in the loop above. 120 | record['captions'].append(caption) 121 | 122 | # Convert the records-dict to a list of tuples. 123 | records_list = [(key, record['filename'], record['captions']) 124 | for key, record in sorted(records.items())] 125 | 126 | # Convert the list of tuples to separate tuples with the data. 127 | ids, filenames, captions = zip(*records_list) 128 | 129 | return ids, filenames, captions 130 | 131 | 132 | ######################################################################## 133 | # Public functions that you may call to download the data-set from 134 | # the internet and load the data into memory.
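The record-merging done by `_load_records()` above can be sketched with toy data; the dicts below are made-up stand-ins for the `'images'` and `'annotations'` entries of the real JSON-file:

```python
# Hypothetical stand-ins for data_raw['images'] and data_raw['annotations'].
images = [{'id': 1, 'file_name': 'a.jpg'},
          {'id': 2, 'file_name': 'b.jpg'}]
annotations = [{'image_id': 1, 'caption': 'a cat'},
               {'image_id': 2, 'caption': 'a dog'},
               {'image_id': 1, 'caption': 'a sleeping cat'}]

# One record per image, keyed by image-id.
records = {img['id']: {'filename': img['file_name'], 'captions': []}
           for img in images}

# Append each caption to the record for its image.
for ann in annotations:
    records[ann['image_id']]['captions'].append(ann['caption'])

# Flatten to parallel tuples, sorted by image-id.
records_list = [(key, rec['filename'], rec['captions'])
                for key, rec in sorted(records.items())]
ids, filenames, captions = zip(*records_list)

# ids == (1, 2) and captions[0] holds both captions for image 1.
```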
135 | 136 | 137 | def set_data_dir(new_data_dir): 138 | """ 139 | Set the base-directory for data-files and then 140 | set the sub-dirs for training and validation data. 141 | """ 142 | 143 | # Ensure we update the global variables. 144 | global data_dir, train_dir, val_dir 145 | 146 | data_dir = new_data_dir 147 | train_dir = os.path.join(new_data_dir, "train2017") 148 | val_dir = os.path.join(new_data_dir, "val2017") 149 | 150 | 151 | def maybe_download_and_extract(): 152 | """ 153 | Download and extract the COCO data-set if the data-files don't 154 | already exist in data_dir. 155 | """ 156 | 157 | # Filenames to download from the internet. 158 | filenames = ["zips/train2017.zip", "zips/val2017.zip", 159 | "annotations/annotations_trainval2017.zip"] 160 | 161 | # Download these files. 162 | for filename in filenames: 163 | # Create the full URL for the given file. 164 | url = data_url + filename 165 | 166 | print("Downloading " + url) 167 | 168 | download.maybe_download_and_extract(url=url, download_dir=data_dir) 169 | 170 | 171 | def load_records(train=True): 172 | """ 173 | Load the data-records for the data-set. This returns the image ids, 174 | filenames and text-captions for either the training-set or validation-set. 175 | 176 | This wraps _load_records() above with a cache, so if the cache-file already 177 | exists then it is loaded instead of processing the original data-file. 178 | 179 | :param train: 180 | Bool whether to load the training-set (True) or validation-set (False). 181 | 182 | :return: 183 | ids, filenames, captions for the images in the data-set. 184 | """ 185 | 186 | if train: 187 | # Cache-file for the training-set data. 188 | cache_filename = "records_train.pkl" 189 | else: 190 | # Cache-file for the validation-set data. 191 | cache_filename = "records_val.pkl" 192 | 193 | # Path for the cache-file. 
194 | cache_path = os.path.join(data_dir, cache_filename) 195 | 196 | # If the data-records already exist in a cache-file then load it, 197 | # otherwise call the _load_records() function and save its 198 | # return-values to the cache-file so it can be loaded the next time. 199 | records = cache(cache_path=cache_path, 200 | fn=_load_records, 201 | train=train) 202 | 203 | return records 204 | 205 | ######################################################################## 206 | -------------------------------------------------------------------------------- /convert.py: -------------------------------------------------------------------------------- 1 | #!/usr/bin/python 2 | 3 | ######################################################################## 4 | # 5 | # Function and script for converting videos to images. 6 | # 7 | # This can be run as a script in a Linux shell by typing: 8 | # 9 | # python convert.py 10 | # 11 | # Or by running: 12 | # 13 | # chmod +x convert.py 14 | # ./convert.py 15 | # 16 | # Requires the program avconv to be installed. 17 | # Tested with avconv v. 9.18-6 on Linux Mint. 18 | # 19 | # Implemented in Python 3.5 (seems to work in Python 2.7 as well) 20 | # 21 | ######################################################################## 22 | # 23 | # This file is part of the TensorFlow Tutorials available at: 24 | # 25 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 26 | # 27 | # Published under the MIT License. See the file LICENSE for details. 28 | # 29 | # Copyright 2016 by Magnus Erik Hvass Pedersen 30 | # 31 | ######################################################################## 32 | 33 | import os 34 | import subprocess 35 | import argparse 36 | 37 | ######################################################################## 38 | 39 | 40 | def video2images(in_dir, out_dir, crop_size, out_size, framerate, video_exts): 41 | """ 42 | Convert videos to images. 
The videos are located in the directory in_dir 43 | and all its sub-directories which are processed recursively. The directory 44 | structure is replicated to out_dir where the jpeg-images are saved. 45 | 46 | :param in_dir: 47 | Input directory for the videos e.g. "/home/magnus/video/" 48 | All sub-directories are processed recursively. 49 | 50 | :param out_dir: 51 | Output directory for the images e.g. "/home/magnus/video-images/" 52 | 53 | :param crop_size: 54 | Integer. First the videos are cropped to this width and height. 55 | 56 | :param out_size: 57 | Integer. After cropping, the videos are resized to this width and height. 58 | 59 | :param framerate: 60 | Integer. Number of frames to grab per second. 61 | 62 | :param video_exts: 63 | Tuple of strings. Extensions for video-files e.g. ('.mts', '.mp4') 64 | Not case-sensitive. 65 | 66 | :return: 67 | Nothing. 68 | """ 69 | 70 | # Convert all video extensions to lower-case. 71 | video_exts = tuple(ext.lower() for ext in video_exts) 72 | 73 | # Number of videos processed. 74 | video_count = 0 75 | 76 | # Process all the sub-dirs recursively. 77 | for current_dir, dir_names, file_names in os.walk(in_dir): 78 | # The current dir relative to the input directory. 79 | relative_path = os.path.relpath(current_dir, in_dir) 80 | 81 | # Name of the new directory for the output images. 82 | new_dir = os.path.join(out_dir, relative_path) 83 | 84 | # If the output-directory does not exist, then create it. 85 | if not os.path.exists(new_dir): 86 | os.makedirs(new_dir) 87 | 88 | # For all the files in the current directory. 89 | for file_name in file_names: 90 | # If the file has a valid video-extension. Compare lower-cases. 91 | if file_name.lower().endswith(video_exts): 92 | # File-path for the input video. 93 | in_file = os.path.join(current_dir, file_name) 94 | 95 | # Split the file-path in root and extension. 
96 | file_root, file_ext = os.path.splitext(file_name) 97 | 98 | # Create the template file-name for the output images. 99 | new_file_name = file_root + "-%4d.jpg" 100 | 101 | # Complete file-path for the output images incl. all sub-dirs. 102 | new_file_path = os.path.join(new_dir, new_file_name) 103 | 104 | # Clean up the path by removing e.g. "/./" 105 | new_file_path = os.path.normpath(new_file_path) 106 | 107 | # Print status. 108 | print("Converting video to images:") 109 | print("- Input video: {0}".format(in_file)) 110 | print("- Output images: {0}".format(new_file_path)) 111 | 112 | # Command to be run in the shell for the video-conversion tool. Crop and scale are combined in one comma-separated filter-chain, because a second -vf option would override the first. 113 | cmd = "avconv -i {0} -r {1} -vf crop={2}:{2},scale={3}:{3} -qscale 2 {4}" 114 | 115 | # Fill in the arguments for the command-line. 116 | cmd = cmd.format(in_file, framerate, crop_size, out_size, new_file_path) 117 | 118 | # Run the command-line in a shell. 119 | subprocess.call(cmd, shell=True) 120 | 121 | # Increase the number of videos processed. 122 | video_count += 1 123 | 124 | # Print newline. 125 | print() 126 | 127 | print("Number of videos converted: {0}".format(video_count)) 128 | 129 | 130 | ######################################################################## 131 | # This script allows you to run the video-conversion from the command-line. 132 | 133 | if __name__ == "__main__": 134 | # Argument description. 135 | desc = "Convert videos to images. " \ 136 | "Recursively processes all sub-dirs of INDIR " \ 137 | "and replicates the dir-structure to OUTDIR. " \ 138 | "The video is first cropped to CROP:CROP pixels, " \ 139 | "then resized to SIZE:SIZE pixels and written as a jpeg-file. " 140 | 141 | # Create the argument parser. 142 | parser = argparse.ArgumentParser(description=desc) 143 | 144 | # Add arguments to the parser.
145 | parser.add_argument("--indir", required=True, 146 | help="input directory where videos are located") 147 | 148 | parser.add_argument("--outdir", required=True, 149 | help="output directory where images will be saved") 150 | 151 | parser.add_argument("--crop", required=True, type=int, 152 | help="the input videos are first cropped to CROP:CROP pixels") 153 | 154 | parser.add_argument("--size", required=True, type=int, 155 | help="the input videos are then resized to SIZE:SIZE pixels") 156 | 157 | parser.add_argument("--rate", required=False, type=int, default=5, 158 | help="the number of frames to convert per second") 159 | 160 | parser.add_argument("--exts", required=False, nargs="+", 161 | help="list of extensions for video-files e.g. .mts .mp4") 162 | 163 | # Parse the command-line arguments. 164 | args = parser.parse_args() 165 | 166 | # Get the arguments. 167 | in_dir = args.indir 168 | out_dir = args.outdir 169 | crop_size = args.crop 170 | out_size = args.size 171 | framerate = args.rate 172 | video_exts = args.exts 173 | 174 | if video_exts is None: 175 | # Default extensions for video-files. 176 | video_exts = (".MTS", ".mp4") 177 | else: 178 | # A list of strings is provided as a command-line argument, but we 179 | # need a tuple instead of a list, so convert it to a tuple. 180 | video_exts = tuple(video_exts) 181 | 182 | # Print the arguments. 183 | print("Convert videos to images.") 184 | print("- Input dir: " + in_dir) 185 | print("- Output dir: " + out_dir) 186 | print("- Crop width and height: {0}".format(crop_size)) 187 | print("- Resize width and height: {0}".format(out_size)) 188 | print("- Frame-rate: {0}".format(framerate)) 189 | print("- Video extensions: {0}".format(video_exts)) 190 | print() 191 | 192 | # Perform the conversions. 
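For concreteness, this is how the avconv command template used by video2images() expands once the arguments are filled in (the file-names and sizes below are hypothetical; note that crop and scale belong in one comma-separated filter-chain, since avconv only honours the last -vf option):

```python
# Template as used by video2images(), with crop and scale in one filter-chain.
cmd = "avconv -i {0} -r {1} -vf crop={2}:{2},scale={3}:{3} -qscale 2 {4}"

# Hypothetical arguments: input video, frame-rate, crop, size, output pattern.
cmd = cmd.format("video/clip.mp4", 5, 720, 256, "images/clip-%4d.jpg")
```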
193 | video2images(in_dir=in_dir, out_dir=out_dir, 194 | crop_size=crop_size, out_size=out_size, 195 | framerate=framerate, video_exts=video_exts) 196 | 197 | ######################################################################## 198 | -------------------------------------------------------------------------------- /dataset.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Class for creating a data-set consisting of all files in a directory. 4 | # 5 | # Example usage is shown in the file knifey.py and Tutorial #09. 6 | # 7 | # Implemented in Python 3.5 8 | # 9 | ######################################################################## 10 | # 11 | # This file is part of the TensorFlow Tutorials available at: 12 | # 13 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 14 | # 15 | # Published under the MIT License. See the file LICENSE for details. 16 | # 17 | # Copyright 2016 by Magnus Erik Hvass Pedersen 18 | # 19 | ######################################################################## 20 | 21 | import numpy as np 22 | import os 23 | import shutil 24 | from cache import cache 25 | 26 | ######################################################################## 27 | 28 | 29 | def one_hot_encoded(class_numbers, num_classes=None): 30 | """ 31 | Generate the One-Hot encoded class-labels from an array of integers. 32 | 33 | For example, if class_number=2 and num_classes=4 then 34 | the one-hot encoded label is the float array: [0. 0. 1. 0.] 35 | 36 | :param class_numbers: 37 | Array of integers with class-numbers. 38 | Assume the integers are from zero to num_classes-1 inclusive. 39 | 40 | :param num_classes: 41 | Number of classes. If None then use max(class_numbers)+1. 42 | 43 | :return: 44 | 2-dim array of shape: [len(class_numbers), num_classes] 45 | """ 46 | 47 | # Find the number of classes if None is provided. 
48 | # Assumes the lowest class-number is zero. 49 | if num_classes is None: 50 | num_classes = np.max(class_numbers) + 1 51 | 52 | return np.eye(num_classes, dtype=float)[class_numbers] 53 | 54 | 55 | ######################################################################## 56 | 57 | 58 | class DataSet: 59 | def __init__(self, in_dir, exts='.jpg'): 60 | """ 61 | Create a data-set consisting of the filenames in the given directory 62 | and sub-dirs that match the given filename-extensions. 63 | 64 | For example, the knifey-spoony data-set (see knifey.py) has the 65 | following dir-structure: 66 | 67 | knifey-spoony/forky/ 68 | knifey-spoony/knifey/ 69 | knifey-spoony/spoony/ 70 | knifey-spoony/forky/test/ 71 | knifey-spoony/knifey/test/ 72 | knifey-spoony/spoony/test/ 73 | 74 | This means there are 3 classes called: forky, knifey, and spoony. 75 | 76 | If we set in_dir = "knifey-spoony/" and create a new DataSet-object 77 | then it will scan through these directories and create a training-set 78 | and test-set for each of these classes. 79 | 80 | The training-set will contain a list of all the *.jpg filenames 81 | in the following directories: 82 | 83 | knifey-spoony/forky/ 84 | knifey-spoony/knifey/ 85 | knifey-spoony/spoony/ 86 | 87 | The test-set will contain a list of all the *.jpg filenames 88 | in the following directories: 89 | 90 | knifey-spoony/forky/test/ 91 | knifey-spoony/knifey/test/ 92 | knifey-spoony/spoony/test/ 93 | 94 | See the TensorFlow Tutorial #09 for a usage example. 95 | 96 | :param in_dir: 97 | Root-dir for the files in the data-set. 98 | This would be 'knifey-spoony/' in the example above. 99 | 100 | :param exts: 101 | String or tuple of strings with valid filename-extensions. 102 | Not case-sensitive. 103 | 104 | :return: 105 | Object instance. 106 | """ 107 | 108 | # Extend the input directory to the full path. 109 | in_dir = os.path.abspath(in_dir) 110 | 111 | # Input directory. 
112 | self.in_dir = in_dir 113 | 114 | # Convert all file-extensions to lower-case. A single extension given as a string is first wrapped in a tuple, otherwise iterating over it would split it into single characters. 115 | self.exts = tuple(ext.lower() for ext in ((exts,) if isinstance(exts, str) else exts)) 116 | 117 | # Names for the classes. 118 | self.class_names = [] 119 | 120 | # Filenames for all the files in the training-set. 121 | self.filenames = [] 122 | 123 | # Filenames for all the files in the test-set. 124 | self.filenames_test = [] 125 | 126 | # Class-number for each file in the training-set. 127 | self.class_numbers = [] 128 | 129 | # Class-number for each file in the test-set. 130 | self.class_numbers_test = [] 131 | 132 | # Total number of classes in the data-set. 133 | self.num_classes = 0 134 | 135 | # For all files/dirs in the input directory. 136 | for name in os.listdir(in_dir): 137 | # Full path for the file / dir. 138 | current_dir = os.path.join(in_dir, name) 139 | 140 | # If it is a directory. 141 | if os.path.isdir(current_dir): 142 | # Add the dir-name to the list of class-names. 143 | self.class_names.append(name) 144 | 145 | # Training-set. 146 | 147 | # Get all the valid filenames in the dir (not sub-dirs). 148 | filenames = self._get_filenames(current_dir) 149 | 150 | # Append them to the list of all filenames for the training-set. 151 | self.filenames.extend(filenames) 152 | 153 | # The class-number for this class. 154 | class_number = self.num_classes 155 | 156 | # Create an array of class-numbers. 157 | class_numbers = [class_number] * len(filenames) 158 | 159 | # Append them to the list of all class-numbers for the training-set. 160 | self.class_numbers.extend(class_numbers) 161 | 162 | # Test-set. 163 | 164 | # Get all the valid filenames in the sub-dir named 'test'. 165 | filenames_test = self._get_filenames(os.path.join(current_dir, 'test')) 166 | 167 | # Append them to the list of all filenames for the test-set. 168 | self.filenames_test.extend(filenames_test) 169 | 170 | # Create an array of class-numbers.
171 | class_numbers = [class_number] * len(filenames_test) 172 | 173 | # Append them to the list of all class-numbers for the test-set. 174 | self.class_numbers_test.extend(class_numbers) 175 | 176 | # Increase the total number of classes in the data-set. 177 | self.num_classes += 1 178 | 179 | def _get_filenames(self, dir): 180 | """ 181 | Create and return a list of filenames with matching extensions in the given directory. 182 | 183 | :param dir: 184 | Directory to scan for files. Sub-dirs are not scanned. 185 | 186 | :return: 187 | List of filenames. Only filenames. Does not include the directory. 188 | """ 189 | 190 | # Initialize empty list. 191 | filenames = [] 192 | 193 | # If the directory exists. 194 | if os.path.exists(dir): 195 | # Get all the filenames with matching extensions. 196 | for filename in os.listdir(dir): 197 | if filename.lower().endswith(self.exts): 198 | filenames.append(filename) 199 | 200 | return filenames 201 | 202 | def get_paths(self, test=False): 203 | """ 204 | Get the full paths for the files in the data-set. 205 | 206 | :param test: 207 | Boolean. Return the paths for the test-set (True) or training-set (False). 208 | 209 | :return: 210 | Iterator with strings for the path-names. 211 | """ 212 | 213 | if test: 214 | # Use the filenames and class-numbers for the test-set. 215 | filenames = self.filenames_test 216 | class_numbers = self.class_numbers_test 217 | 218 | # Sub-dir for test-set. 219 | test_dir = "test/" 220 | else: 221 | # Use the filenames and class-numbers for the training-set. 222 | filenames = self.filenames 223 | class_numbers = self.class_numbers 224 | 225 | # Don't use a sub-dir for test-set. 226 | test_dir = "" 227 | 228 | for filename, cls in zip(filenames, class_numbers): 229 | # Full path-name for the file. 
230 | path = os.path.join(self.in_dir, self.class_names[cls], test_dir, filename) 231 | 232 | yield path 233 | 234 | def get_training_set(self): 235 | """ 236 | Return the list of paths for the files in the training-set, 237 | and the list of class-numbers as integers, 238 | and the class-numbers as one-hot encoded arrays. 239 | """ 240 | 241 | return list(self.get_paths()), \ 242 | np.asarray(self.class_numbers), \ 243 | one_hot_encoded(class_numbers=self.class_numbers, 244 | num_classes=self.num_classes) 245 | 246 | def get_test_set(self): 247 | """ 248 | Return the list of paths for the files in the test-set, 249 | and the list of class-numbers as integers, 250 | and the class-numbers as one-hot encoded arrays. 251 | """ 252 | 253 | return list(self.get_paths(test=True)), \ 254 | np.asarray(self.class_numbers_test), \ 255 | one_hot_encoded(class_numbers=self.class_numbers_test, 256 | num_classes=self.num_classes) 257 | 258 | def copy_files(self, train_dir, test_dir): 259 | """ 260 | Copy all the files in the training-set to train_dir 261 | and copy all the files in the test-set to test_dir. 262 | 263 | For example, the normal directory structure for the 264 | different classes in the training-set is: 265 | 266 | knifey-spoony/forky/ 267 | knifey-spoony/knifey/ 268 | knifey-spoony/spoony/ 269 | 270 | Normally the test-set is a sub-dir of the training-set: 271 | 272 | knifey-spoony/forky/test/ 273 | knifey-spoony/knifey/test/ 274 | knifey-spoony/spoony/test/ 275 | 276 | But some APIs use another dir-structure for the training-set: 277 | 278 | knifey-spoony/train/forky/ 279 | knifey-spoony/train/knifey/ 280 | knifey-spoony/train/spoony/ 281 | 282 | and for the test-set: 283 | 284 | knifey-spoony/test/forky/ 285 | knifey-spoony/test/knifey/ 286 | knifey-spoony/test/spoony/ 287 | 288 | :param train_dir: Directory for the training-set e.g. 'knifey-spoony/train/' 289 | :param test_dir: Directory for the test-set e.g. 'knifey-spoony/test/' 290 | :return: Nothing. 
291 | """ 292 | 293 | # Helper-function for actually copying the files. 294 | def _copy_files(src_paths, dst_dir, class_numbers): 295 | 296 | # Create a list of dirs for each class, e.g.: 297 | # ['knifey-spoony/test/forky/', 298 | # 'knifey-spoony/test/knifey/', 299 | # 'knifey-spoony/test/spoony/'] 300 | class_dirs = [os.path.join(dst_dir, class_name + "/") 301 | for class_name in self.class_names] 302 | 303 | # Check if each class-directory exists, otherwise create it. 304 | for dir in class_dirs: 305 | if not os.path.exists(dir): 306 | os.makedirs(dir) 307 | 308 | # For all the file-paths and associated class-numbers, 309 | # copy the file to the destination dir for that class. 310 | for src, cls in zip(src_paths, class_numbers): 311 | shutil.copy(src=src, dst=class_dirs[cls]) 312 | 313 | # Copy the files for the training-set. 314 | _copy_files(src_paths=self.get_paths(test=False), 315 | dst_dir=train_dir, 316 | class_numbers=self.class_numbers) 317 | 318 | print("- Copied training-set to:", train_dir) 319 | 320 | # Copy the files for the test-set. 321 | _copy_files(src_paths=self.get_paths(test=True), 322 | dst_dir=test_dir, 323 | class_numbers=self.class_numbers_test) 324 | 325 | print("- Copied test-set to:", test_dir) 326 | 327 | 328 | ######################################################################## 329 | 330 | 331 | def load_cached(cache_path, in_dir): 332 | """ 333 | Wrapper-function for creating a DataSet-object, which will be 334 | loaded from a cache-file if it already exists, otherwise a new 335 | object will be created and saved to the cache-file. 336 | 337 | This is useful if you need to ensure the ordering of the 338 | filenames is consistent every time you load the data-set, 339 | for example if you use the DataSet-object in combination 340 | with Transfer Values saved to another cache-file, see e.g. 341 | Tutorial #09 for an example of this. 342 | 343 | :param cache_path: 344 | File-path for the cache-file. 
345 | 346 | :param in_dir: 347 | Root-dir for the files in the data-set. 348 | This is an argument for the DataSet-init function. 349 | 350 | :return: 351 | The DataSet-object. 352 | """ 353 | 354 | print("Creating dataset from the files in: " + in_dir) 355 | 356 | # If the object-instance for DataSet(in_dir=data_dir) already 357 | # exists in the cache-file then reload it, otherwise create 358 | # an object instance and save it to the cache-file for next time. 359 | dataset = cache(cache_path=cache_path, 360 | fn=DataSet, in_dir=in_dir) 361 | 362 | return dataset 363 | 364 | 365 | ######################################################################## 366 | -------------------------------------------------------------------------------- /download.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading and extracting data-files from the internet. 4 | # 5 | # Implemented in Python 3.5 6 | # 7 | ######################################################################## 8 | # 9 | # This file is part of the TensorFlow Tutorials available at: 10 | # 11 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 12 | # 13 | # Published under the MIT License. See the file LICENSE for details. 14 | # 15 | # Copyright 2016 by Magnus Erik Hvass Pedersen 16 | # 17 | ######################################################################## 18 | 19 | import sys 20 | import os 21 | import urllib.request 22 | import tarfile 23 | import zipfile 24 | 25 | ######################################################################## 26 | 27 | 28 | def _print_download_progress(count, block_size, total_size): 29 | """ 30 | Function used for printing the download progress. 31 | Used as a call-back function in maybe_download_and_extract(). 32 | """ 33 | 34 | # Percentage completion. 
35 | pct_complete = float(count * block_size) / total_size 36 | 37 | # Limit it because rounding errors may cause it to exceed 100%. 38 | pct_complete = min(1.0, pct_complete) 39 | 40 | # Status-message. Note the \r which means the line should overwrite itself. 41 | msg = "\r- Download progress: {0:.1%}".format(pct_complete) 42 | 43 | # Print it. 44 | sys.stdout.write(msg) 45 | sys.stdout.flush() 46 | 47 | 48 | ######################################################################## 49 | 50 | def download(base_url, filename, download_dir): 51 | """ 52 | Download the given file if it does not already exist in the download_dir. 53 | 54 | :param base_url: The internet URL without the filename. 55 | :param filename: The filename that will be added to the base_url. 56 | :param download_dir: Local directory for storing the file. 57 | :return: Nothing. 58 | """ 59 | 60 | # Path for local file. 61 | save_path = os.path.join(download_dir, filename) 62 | 63 | # Check if the file already exists, otherwise we need to download it now. 64 | if not os.path.exists(save_path): 65 | # Check if the download directory exists, otherwise create it. 66 | if not os.path.exists(download_dir): 67 | os.makedirs(download_dir) 68 | 69 | print("Downloading", filename, "...") 70 | 71 | # Download the file from the internet. 72 | url = base_url + filename 73 | file_path, _ = urllib.request.urlretrieve(url=url, 74 | filename=save_path, 75 | reporthook=_print_download_progress) 76 | 77 | print(" Done!") 78 | 79 | 80 | def maybe_download_and_extract(url, download_dir): 81 | """ 82 | Download and extract the data if it doesn't already exist. 83 | Assumes the url points to a tar-ball or zip-file. 84 | 85 | :param url: 86 | Internet URL for the tar- or zip-file to download. 87 | Example: "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz" 88 | 89 | :param download_dir: 90 | Directory where the downloaded file is saved. 91 | Example: "data/CIFAR-10/" 92 | 93 | :return: 94 | Nothing.
95 | """ 96 | 97 | # Filename for saving the file downloaded from the internet. 98 | # Use the filename from the URL and add it to the download_dir. 99 | filename = url.split('/')[-1] 100 | file_path = os.path.join(download_dir, filename) 101 | 102 | # Check if the file already exists. 103 | # If it exists then we assume it has also been extracted, 104 | # otherwise we need to download and extract it now. 105 | if not os.path.exists(file_path): 106 | # Check if the download directory exists, otherwise create it. 107 | if not os.path.exists(download_dir): 108 | os.makedirs(download_dir) 109 | 110 | # Download the file from the internet. 111 | file_path, _ = urllib.request.urlretrieve(url=url, 112 | filename=file_path, 113 | reporthook=_print_download_progress) 114 | 115 | print() 116 | print("Download finished. Extracting files.") 117 | 118 | if file_path.endswith(".zip"): 119 | # Unpack the zip-file. 120 | zipfile.ZipFile(file=file_path, mode="r").extractall(download_dir) 121 | elif file_path.endswith((".tar.gz", ".tgz")): 122 | # Unpack the tar-ball. 123 | tarfile.open(name=file_path, mode="r:gz").extractall(download_dir) 124 | 125 | print("Done.") 126 | else: 127 | print("Data has apparently already been downloaded and unpacked.") 128 | 129 | 130 | ######################################################################## 131 | -------------------------------------------------------------------------------- /europarl.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading the Europarl data-set from the internet 4 | # and loading it into memory. This data-set is used for translation 5 | # between English and most European languages. 6 | # 7 | # http://www.statmt.org/europarl/ 8 | # 9 | # Implemented in Python 3.6 10 | # 11 | # Usage: 12 | # 1) Set the variable data_dir with the desired storage directory. 
13 | # 2) Determine the language-code to use e.g. "da" for Danish. 14 | # 3) Call maybe_download_and_extract() to download the data-set 15 | # if it is not already located in the given data_dir. 16 | # 4) Call load_data(english=True) and load_data(english=False) 17 | # to load the two data-files. 18 | # 5) Use the returned data in your own program. 19 | # 20 | # Format: 21 | # The Europarl data-set contains millions of text-pairs between English 22 | # and most European languages. The data is stored in two text-files. 23 | # The data is returned as lists of strings by the load_data() function. 24 | # 25 | # The list of currently supported languages and their codes are as follows: 26 | # 27 | # bg - Bulgarian 28 | # cs - Czech 29 | # da - Danish 30 | # de - German 31 | # el - Greek 32 | # es - Spanish 33 | # et - Estonian 34 | # fi - Finnish 35 | # fr - French 36 | # hu - Hungarian 37 | # it - Italian 38 | # lt - Lithuanian 39 | # lv - Latvian 40 | # nl - Dutch 41 | # pl - Polish 42 | # pt - Portuguese 43 | # ro - Romanian 44 | # sk - Slovak 45 | # sl - Slovene 46 | # sv - Swedish 47 | # 48 | ######################################################################## 49 | # 50 | # This file is part of the TensorFlow Tutorials available at: 51 | # 52 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 53 | # 54 | # Published under the MIT License. See the file LICENSE for details. 55 | # 56 | # Copyright 2018 by Magnus Erik Hvass Pedersen 57 | # 58 | ######################################################################## 59 | 60 | import os 61 | import download 62 | 63 | ######################################################################## 64 | 65 | # Directory where you want to download and save the data-set. 66 | # Set this before you start calling any of the functions below. 67 | data_dir = "data/europarl/" 68 | 69 | # Base-URL for the data-sets on the internet. 
70 | data_url = "http://www.statmt.org/europarl/v7/" 71 | 72 | 73 | ######################################################################## 74 | # Public functions that you may call to download the data-set from 75 | # the internet and load the data into memory. 76 | 77 | 78 | def maybe_download_and_extract(language_code="da"): 79 | """ 80 | Download and extract the Europarl data-set if the data-file doesn't 81 | already exist in data_dir. The data-set is for translating between 82 | English and the given language-code (e.g. 'da' for Danish, see the 83 | list of available language-codes above). 84 | """ 85 | 86 | # Create the full URL for the file with this data-set. 87 | url = data_url + language_code + "-en.tgz" 88 | 89 | download.maybe_download_and_extract(url=url, download_dir=data_dir) 90 | 91 | 92 | def load_data(english=True, language_code="da", start="", end=""): 93 | """ 94 | Load the data-file for either the English-language texts or 95 | for the other language (e.g. "da" for Danish). 96 | 97 | All lines of the data-file are returned as a list of strings. 98 | 99 | :param english: 100 | Boolean whether to load the data-file for 101 | English (True) or the other language (False). 102 | 103 | :param language_code: 104 | Two-char code for the other language e.g. "da" for Danish. 105 | See list of available codes above. 106 | 107 | :param start: 108 | Prepend each line with this text e.g. "ssss " to indicate start of line. 109 | 110 | :param end: 111 | Append each line with this text e.g. " eeee" to indicate end of line. 112 | 113 | :return: 114 | List of strings with all the lines of the data-file. 115 | """ 116 | 117 | if english: 118 | # Load the English data. 119 | filename = "europarl-v7.{0}-en.en".format(language_code) 120 | else: 121 | # Load the other language. 122 | filename = "europarl-v7.{0}-en.{0}".format(language_code) 123 | 124 | # Full path for the data-file. 
125 | path = os.path.join(data_dir, filename) 126 | 127 | # Open and read all the contents of the data-file. 128 | with open(path, encoding="utf-8") as file: 129 | # Read each line from the file, strip leading and trailing whitespace, 130 | # prepend the start-text and append the end-text. 131 | texts = [start + line.strip() + end for line in file] 132 | 133 | return texts 134 | 135 | 136 | ######################################################################## 137 | -------------------------------------------------------------------------------- /forks.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Tutorials - Forks 2 | 3 | These are forks of the [original TensorFlow Tutorials by Hvass-Labs](https://github.com/Hvass-Labs/TensorFlow-Tutorials). 4 | They are not developed or even reviewed by the original author, who takes no responsibility for these forks. 5 | 6 | If you have made a fork of the TensorFlow Tutorials with substantial modifications that you feel may be useful to others, 7 | then please [open a new issue on GitHub](https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues) with a link and short description.
8 | 9 | * [Keras port of some tutorials.](https://github.com/chidochipotle/TensorFlow-Tutorials) 10 | * [The Inception model as an OpenFaaS function.](https://github.com/faas-and-furious/inception-function) 11 | -------------------------------------------------------------------------------- /images/02_convolution.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/02_convolution.png -------------------------------------------------------------------------------- /images/02_network_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/02_network_flowchart.png -------------------------------------------------------------------------------- /images/06_network_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/06_network_flowchart.png -------------------------------------------------------------------------------- /images/07_inception_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/07_inception_flowchart.png -------------------------------------------------------------------------------- /images/08_transfer_learning_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/08_transfer_learning_flowchart.png -------------------------------------------------------------------------------- 
/images/09_transfer_learning_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/09_transfer_learning_flowchart.png -------------------------------------------------------------------------------- /images/10_transfer_learning_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/10_transfer_learning_flowchart.png -------------------------------------------------------------------------------- /images/11_adversarial_examples_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/11_adversarial_examples_flowchart.png -------------------------------------------------------------------------------- /images/12_adversarial_noise_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/12_adversarial_noise_flowchart.png -------------------------------------------------------------------------------- /images/13_visual_analysis_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/13_visual_analysis_flowchart.png -------------------------------------------------------------------------------- /images/13b_visual_analysis_flowchart.png: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/13b_visual_analysis_flowchart.png -------------------------------------------------------------------------------- /images/14_deepdream_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/14_deepdream_flowchart.png -------------------------------------------------------------------------------- /images/14_deepdream_recursive_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/14_deepdream_recursive_flowchart.png -------------------------------------------------------------------------------- /images/15_style_transfer_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/15_style_transfer_flowchart.png -------------------------------------------------------------------------------- /images/16_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_flowchart.png -------------------------------------------------------------------------------- /images/16_motion-trace.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_motion-trace.png -------------------------------------------------------------------------------- /images/16_problem.png: 
-------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_problem.png -------------------------------------------------------------------------------- /images/16_q-values-details.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_q-values-details.png -------------------------------------------------------------------------------- /images/16_q-values-simple.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_q-values-simple.png -------------------------------------------------------------------------------- /images/16_training_stability.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_training_stability.png -------------------------------------------------------------------------------- /images/19_flowchart_bayesian_optimization.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/19_flowchart_bayesian_optimization.png -------------------------------------------------------------------------------- /images/20_natural_language_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_natural_language_flowchart.png 
-------------------------------------------------------------------------------- /images/20_natural_language_flowchart.svg: -------------------------------------------------------------------------------- [SVG flowchart; text labels: Raw Text "This is not a good movie!" -> Tokenizer (converts text to integer-tokens, e.g. [11, 6, 21, 3, 49, 17]) -> Embedding (converts integer-tokens to real-valued vectors) -> Recurrent Neural Network (process sequences of arbitrary length) -> Sigmoid (dense output layer to predict class) -> 0.0 (Negative) / 1.0 (Positive)] --------------------------------------------------------------------------------
/images/20_recurrent_unit.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_recurrent_unit.png --------------------------------------------------------------------------------
/images/20_recurrent_unit.svg: -------------------------------------------------------------------------------- [SVG diagram of a recurrent unit; text labels: Input, Old State, two Gates, New State, Output] --------------------------------------------------------------------------------
/images/20_unrolled_3layers_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_unrolled_3layers_flowchart.png --------------------------------------------------------------------------------
/images/20_unrolled_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_unrolled_flowchart.png --------------------------------------------------------------------------------
/images/20_unrolled_flowchart.svg: -------------------------------------------------------------------------------- [SVG of an unrolled recurrent network; text labels: First State Zero feeding a chain of recurrent units (RU) that process the tokens "this", "is", "not", "a", "very", "good", "movie", with (States) passed between units and the final (Output) classified as Positive or Negative] --------------------------------------------------------------------------------
/images/21_machine_translation_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/21_machine_translation_flowchart.png --------------------------------------------------------------------------------
/images/22_image_captioning_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/22_image_captioning_flowchart.png --------------------------------------------------------------------------------
/images/23_time_series_flowchart.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/23_time_series_flowchart.png --------------------------------------------------------------------------------
/images/Denmark.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/Denmark.jpg --------------------------------------------------------------------------------
/images/Europe.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/Europe.jpg --------------------------------------------------------------------------------
/images/elon_musk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/elon_musk.jpg
-------------------------------------------------------------------------------- /images/elon_musk_100x100.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/elon_musk_100x100.jpg -------------------------------------------------------------------------------- /images/escher_planefilling2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/escher_planefilling2.jpg -------------------------------------------------------------------------------- /images/giger.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/giger.jpg -------------------------------------------------------------------------------- /images/hulk.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/hulk.jpg -------------------------------------------------------------------------------- /images/parrot.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot.jpg -------------------------------------------------------------------------------- /images/parrot_cropped1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped1.jpg -------------------------------------------------------------------------------- 
/images/parrot_cropped2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped2.jpg -------------------------------------------------------------------------------- /images/parrot_cropped3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped3.jpg -------------------------------------------------------------------------------- /images/parrot_padded.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_padded.jpg -------------------------------------------------------------------------------- /images/style1.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style1.jpg -------------------------------------------------------------------------------- /images/style2.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style2.jpg -------------------------------------------------------------------------------- /images/style3.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style3.jpg -------------------------------------------------------------------------------- /images/style4.jpg: -------------------------------------------------------------------------------- 
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style4.jpg -------------------------------------------------------------------------------- /images/style5.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style5.jpg -------------------------------------------------------------------------------- /images/style6.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style6.jpg -------------------------------------------------------------------------------- /images/style7.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style7.jpg -------------------------------------------------------------------------------- /images/style8.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style8.jpg -------------------------------------------------------------------------------- /images/style9.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style9.jpg -------------------------------------------------------------------------------- /images/willy_wonka_new.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/willy_wonka_new.jpg 
-------------------------------------------------------------------------------- /images/willy_wonka_old.jpg: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/willy_wonka_old.jpg -------------------------------------------------------------------------------- /imdb.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading the IMDB Review data-set from the internet 4 | # and loading it into memory. 5 | # 6 | # Implemented in Python 3.6 7 | # 8 | # Usage: 9 | # 1) Set the variable data_dir with the desired storage directory. 10 | # 2) Call maybe_download_and_extract() to download the data-set 11 | # if it is not already located in the given data_dir. 12 | # 3) Call load_data(train=True) to load the training-set. 13 | # 4) Call load_data(train=False) to load the test-set. 14 | # 5) Use the returned data in your own program. 15 | # 16 | # Format: 17 | # The IMDB Review data-set consists of 50000 reviews of movies 18 | # that are split into 25000 reviews for the training- and test-set, 19 | # and each of those is split into 12500 positive and 12500 negative reviews. 20 | # These are returned as lists of strings by the load_data() function. 21 | # 22 | ######################################################################## 23 | # 24 | # This file is part of the TensorFlow Tutorials available at: 25 | # 26 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 27 | # 28 | # Published under the MIT License. See the file LICENSE for details. 
29 | # 30 | # Copyright 2018 by Magnus Erik Hvass Pedersen 31 | # 32 | ######################################################################## 33 | 34 | import os 35 | import download 36 | import glob 37 | 38 | ######################################################################## 39 | 40 | # Directory where you want to download and save the data-set. 41 | # Set this before you start calling any of the functions below. 42 | data_dir = "data/IMDB/" 43 | 44 | # URL for the data-set on the internet. 45 | data_url = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz" 46 | 47 | 48 | ######################################################################## 49 | # Private helper-functions. 50 | 51 | def _read_text_file(path): 52 | """ 53 | Read and return all the contents of the text-file with the given path. 54 | It is returned as a single string where all lines are concatenated. 55 | """ 56 | 57 | with open(path, 'rt', encoding='utf-8') as file: 58 | # Read a list of strings. 59 | lines = file.readlines() 60 | 61 | # Concatenate to a single string. 62 | text = " ".join(lines) 63 | 64 | return text 65 | 66 | 67 | ######################################################################## 68 | # Public functions that you may call to download the data-set from 69 | # the internet and load the data into memory. 70 | 71 | 72 | def maybe_download_and_extract(): 73 | """ 74 | Download and extract the IMDB Review data-set if it doesn't already exist 75 | in data_dir (set this variable first to the desired directory). 76 | """ 77 | 78 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 79 | 80 | 81 | def load_data(train=True): 82 | """ 83 | Load all the data from the IMDB Review data-set for sentiment analysis. 84 | 85 | :param train: Boolean whether to load the training-set (True) 86 | or the test-set (False). 
87 | 88 | :return: A list of all the reviews as text-strings, 89 | and a list of the corresponding sentiments 90 | where 1.0 is positive and 0.0 is negative. 91 | """ 92 | 93 | # Part of the path-name for either training or test-set. 94 | train_test_path = "train" if train else "test" 95 | 96 | # Base-directory where the extracted data is located. 97 | dir_base = os.path.join(data_dir, "aclImdb", train_test_path) 98 | 99 | # Filename-patterns for the data-files. 100 | path_pattern_pos = os.path.join(dir_base, "pos", "*.txt") 101 | path_pattern_neg = os.path.join(dir_base, "neg", "*.txt") 102 | 103 | # Get lists of all the file-paths for the data. 104 | paths_pos = glob.glob(path_pattern_pos) 105 | paths_neg = glob.glob(path_pattern_neg) 106 | 107 | # Read all the text-files. 108 | data_pos = [_read_text_file(path) for path in paths_pos] 109 | data_neg = [_read_text_file(path) for path in paths_neg] 110 | 111 | # Concatenate the positive and negative data. 112 | x = data_pos + data_neg 113 | 114 | # Create a list of the sentiments for the text-data. 115 | # 1.0 is a positive sentiment, 0.0 is a negative sentiment. 116 | y = [1.0] * len(data_pos) + [0.0] * len(data_neg) 117 | 118 | return x, y 119 | 120 | 121 | ######################################################################## 122 | -------------------------------------------------------------------------------- /inception.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # The Inception Model v3 for TensorFlow. 4 | # 5 | # This is a pre-trained Deep Neural Network for classifying images. 6 | # You provide an image or filename for a jpeg-file which will be 7 | # loaded and input to the Inception model, which will then output 8 | # an array of numbers indicating how likely it is that the 9 | # input-image is of each class. 
10 | # 11 | # See the example code at the bottom of this file or in the 12 | # accompanying Python Notebooks. 13 | # 14 | # Tutorial #07 shows how to use the Inception model. 15 | # Tutorial #08 shows how to use it for Transfer Learning. 16 | # 17 | # What is Transfer Learning? 18 | # 19 | # Transfer Learning is the use of a Neural Network for classifying 20 | # images from a data-set other than the one it was trained on. For example, 21 | # the Inception model was trained on the ImageNet data-set using 22 | # a very powerful and expensive computer. But the Inception model 23 | # can be re-used on data-sets it was not trained on without having 24 | # to re-train the entire model, even though the number of classes 25 | # is different for the two data-sets. This allows you to use the 26 | # Inception model on your own data-sets without the need for a 27 | # very powerful and expensive computer to train it. 28 | # 29 | # The last layer of the Inception model before the softmax-classifier 30 | # is called the Transfer Layer because the output of that layer will 31 | # be used as the input in your new softmax-classifier (or as the 32 | # input for another neural network), which will then be trained on 33 | # your own data-set. 34 | # 35 | # The output values of the Transfer Layer are called Transfer Values. 36 | # These are the actual values that will be input to your new 37 | # softmax-classifier or to another neural network that you create. 38 | # 39 | # The word 'bottleneck' is also sometimes used to refer to the 40 | # Transfer Layer or Transfer Values, but it is a confusing word 41 | # that is not used here. 42 | # 43 | # Implemented in Python 3.5 with TensorFlow v0.10.0rc0 44 | # 45 | ######################################################################## 46 | # 47 | # This file is part of the TensorFlow Tutorials available at: 48 | # 49 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 50 | # 51 | # Published under the MIT License.
See the file LICENSE for details. 52 | # 53 | # Copyright 2016 by Magnus Erik Hvass Pedersen 54 | # 55 | ######################################################################## 56 | 57 | import numpy as np 58 | import tensorflow as tf 59 | import download 60 | from cache import cache 61 | import os 62 | import sys 63 | 64 | ######################################################################## 65 | # Various directories and file-names. 66 | 67 | # Internet URL for the tar-file with the Inception model. 68 | # Note that this might change in the future and will need to be updated. 69 | data_url = "http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz" 70 | 71 | # Directory to store the downloaded data. 72 | data_dir = "inception/" 73 | 74 | # File containing the mappings between class-number and uid. (Downloaded) 75 | path_uid_to_cls = "imagenet_2012_challenge_label_map_proto.pbtxt" 76 | 77 | # File containing the mappings between uid and string. (Downloaded) 78 | path_uid_to_name = "imagenet_synset_to_human_label_map.txt" 79 | 80 | # File containing the TensorFlow graph definition. (Downloaded) 81 | path_graph_def = "classify_image_graph_def.pb" 82 | 83 | ######################################################################## 84 | 85 | 86 | def maybe_download(): 87 | """ 88 | Download the Inception model from the internet if it does not already 89 | exist in the data_dir. The file is about 85 MB. 90 | """ 91 | 92 | print("Downloading Inception v3 Model ...") 93 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 94 | 95 | 96 | ######################################################################## 97 | 98 | 99 | class NameLookup: 100 | """ 101 | Used for looking up the name associated with a class-number. 102 | This is used to print the name of a class instead of its number, 103 | e.g. "plant" or "horse". 104 | 105 | Maps between: 106 | - cls is the class-number as an integer between 1 and 1000 (inclusive). 
107 | - uid is a class-id as a string from the ImageNet data-set, e.g. "n00017222". 108 | - name is the class-name as a string, e.g. "plant, flora, plant life" 109 | 110 | There are actually 1008 output classes of the Inception model 111 | but there are only 1000 named classes in these mapping-files. 112 | The remaining 8 output classes of the model should not be used. 113 | """ 114 | 115 | def __init__(self): 116 | # Mappings between uid, cls and name are dicts, where insertions and 117 | # lookup have O(1) time-usage on average, but may be O(n) in worst case. 118 | self._uid_to_cls = {} # Map from uid to cls. 119 | self._uid_to_name = {} # Map from uid to name. 120 | self._cls_to_uid = {} # Map from cls to uid. 121 | 122 | # Read the uid-to-name mappings from file. 123 | path = os.path.join(data_dir, path_uid_to_name) 124 | with open(file=path, mode='r') as file: 125 | # Read all lines from the file. 126 | lines = file.readlines() 127 | 128 | for line in lines: 129 | # Remove newlines. 130 | line = line.replace("\n", "") 131 | 132 | # Split the line on tabs. 133 | elements = line.split("\t") 134 | 135 | # Get the uid. 136 | uid = elements[0] 137 | 138 | # Get the class-name. 139 | name = elements[1] 140 | 141 | # Insert into the lookup-dict. 142 | self._uid_to_name[uid] = name 143 | 144 | # Read the uid-to-cls mappings from file. 145 | path = os.path.join(data_dir, path_uid_to_cls) 146 | with open(file=path, mode='r') as file: 147 | # Read all lines from the file. 148 | lines = file.readlines() 149 | 150 | for line in lines: 151 | # We assume the file is in the proper format, 152 | # so the following lines come in pairs. Other lines are ignored. 153 | 154 | if line.startswith(" target_class: "): 155 | # This line must be the class-number as an integer. 156 | 157 | # Split the line. 158 | elements = line.split(": ") 159 | 160 | # Get the class-number as an integer. 
161 | cls = int(elements[1]) 162 | 163 | elif line.startswith(" target_class_string: "): 164 | # This line must be the uid as a string. 165 | 166 | # Split the line. 167 | elements = line.split(": ") 168 | 169 | # Get the uid as a string, e.g. "n01494475". 170 | uid = elements[1] 171 | 172 | # Remove the enclosing "" from the string. 173 | uid = uid[1:-2] 174 | 175 | # Insert into the lookup-dicts for both ways between uid and cls. 176 | self._uid_to_cls[uid] = cls 177 | self._cls_to_uid[cls] = uid 178 | 179 | def uid_to_cls(self, uid): 180 | """ 181 | Return the class-number as an integer for the given uid-string. 182 | """ 183 | 184 | return self._uid_to_cls[uid] 185 | 186 | def uid_to_name(self, uid, only_first_name=False): 187 | """ 188 | Return the class-name for the given uid string. 189 | 190 | Some class-names are lists of names; if you only want the first name, 191 | then set only_first_name=True. 192 | """ 193 | 194 | # Lookup the name from the uid. 195 | name = self._uid_to_name[uid] 196 | 197 | # Only use the first name in the list? 198 | if only_first_name: 199 | name = name.split(",")[0] 200 | 201 | return name 202 | 203 | def cls_to_name(self, cls, only_first_name=False): 204 | """ 205 | Return the class-name from the integer class-number. 206 | 207 | Some class-names are lists of names; if you only want the first name, 208 | then set only_first_name=True. 209 | """ 210 | 211 | # Lookup the uid from the cls. 212 | uid = self._cls_to_uid[cls] 213 | 214 | # Lookup the name from the uid. 215 | name = self.uid_to_name(uid=uid, only_first_name=only_first_name) 216 | 217 | return name 218 | 219 | 220 | ######################################################################## 221 | 222 | 223 | class Inception: 224 | """ 225 | The Inception model is a Deep Neural Network which has already been 226 | trained for classifying images into 1000 different categories.
227 | 228 | When you create a new instance of this class, the Inception model 229 | will be loaded and can be used immediately without training. 230 | 231 | The Inception model can also be used for Transfer Learning. 232 | """ 233 | 234 | # Name of the tensor for feeding the input image as jpeg. 235 | tensor_name_input_jpeg = "DecodeJpeg/contents:0" 236 | 237 | # Name of the tensor for feeding the decoded input image. 238 | # Use this for feeding images in other formats than jpeg. 239 | tensor_name_input_image = "DecodeJpeg:0" 240 | 241 | # Name of the tensor for the resized input image. 242 | # This is used to retrieve the image after it has been resized. 243 | tensor_name_resized_image = "ResizeBilinear:0" 244 | 245 | # Name of the tensor for the output of the softmax-classifier. 246 | # This is used for classifying images with the Inception model. 247 | tensor_name_softmax = "softmax:0" 248 | 249 | # Name of the tensor for the unscaled outputs of the softmax-classifier (aka. logits). 250 | tensor_name_softmax_logits = "softmax/logits:0" 251 | 252 | # Name of the tensor for the output of the Inception model. 253 | # This is used for Transfer Learning. 254 | tensor_name_transfer_layer = "pool_3:0" 255 | 256 | def __init__(self): 257 | # Mappings between class-numbers and class-names. 258 | # Used to print the class-name as a string e.g. "horse" or "plant". 259 | self.name_lookup = NameLookup() 260 | 261 | # Now load the Inception model from file. The way TensorFlow 262 | # does this is confusing and requires several steps. 263 | 264 | # Create a new TensorFlow computational graph. 265 | self.graph = tf.Graph() 266 | 267 | # Set the new graph as the default. 268 | with self.graph.as_default(): 269 | 270 | # TensorFlow graphs are saved to disk as so-called Protocol Buffers 271 | # aka. proto-bufs which is a file-format that works on multiple 272 | # platforms. In this case it is saved as a binary file. 273 | 274 | # Open the graph-def file for binary reading. 
275 | path = os.path.join(data_dir, path_graph_def) 276 | with tf.gfile.FastGFile(path, 'rb') as file: 277 | # The graph-def is a saved copy of a TensorFlow graph. 278 | # First we need to create an empty graph-def. 279 | graph_def = tf.GraphDef() 280 | 281 | # Then we load the proto-buf file into the graph-def. 282 | graph_def.ParseFromString(file.read()) 283 | 284 | # Finally we import the graph-def to the default TensorFlow graph. 285 | tf.import_graph_def(graph_def, name='') 286 | 287 | # Now self.graph holds the Inception model from the proto-buf file. 288 | 289 | # Get the output of the Inception model by looking up the tensor 290 | # with the appropriate name for the output of the softmax-classifier. 291 | self.y_pred = self.graph.get_tensor_by_name(self.tensor_name_softmax) 292 | 293 | # Get the unscaled outputs for the Inception model (aka. softmax-logits). 294 | self.y_logits = self.graph.get_tensor_by_name(self.tensor_name_softmax_logits) 295 | 296 | # Get the tensor for the resized image that is input to the neural network. 297 | self.resized_image = self.graph.get_tensor_by_name(self.tensor_name_resized_image) 298 | 299 | # Get the tensor for the last layer of the graph, aka. the transfer-layer. 300 | self.transfer_layer = self.graph.get_tensor_by_name(self.tensor_name_transfer_layer) 301 | 302 | # Get the number of elements in the transfer-layer. 303 | self.transfer_len = self.transfer_layer.get_shape()[3] 304 | 305 | # Create a TensorFlow session for executing the graph. 306 | self.session = tf.Session(graph=self.graph) 307 | 308 | def close(self): 309 | """ 310 | Call this function when you are done using the Inception model. 311 | It closes the TensorFlow session to release its resources. 312 | """ 313 | 314 | self.session.close() 315 | 316 | def _write_summary(self, logdir='summary/'): 317 | """ 318 | Write graph to summary-file so it can be shown in TensorBoard. 
319 | 320 | This function is used for debugging and may be changed or removed in the future. 321 | 322 | :param logdir: 323 | Directory for writing the summary-files. 324 | 325 | :return: 326 | Nothing. 327 | """ 328 | 329 | writer = tf.train.SummaryWriter(logdir=logdir, graph=self.graph) 330 | writer.close() 331 | 332 | def _create_feed_dict(self, image_path=None, image=None): 333 | """ 334 | Create and return a feed-dict with an image. 335 | 336 | :param image_path: 337 | The input image is a jpeg-file with this file-path. 338 | 339 | :param image: 340 | The input image is a 3-dim array which is already decoded. 341 | The pixels MUST be values between 0 and 255 (float or int). 342 | 343 | :return: 344 | Dict for feeding to the Inception graph in TensorFlow. 345 | """ 346 | 347 | if image is not None: 348 | # Image is passed in as a 3-dim array that is already decoded. 349 | feed_dict = {self.tensor_name_input_image: image} 350 | 351 | elif image_path is not None: 352 | # Read the jpeg-image as an array of bytes. 353 | image_data = tf.gfile.FastGFile(image_path, 'rb').read() 354 | 355 | # Image is passed in as a jpeg-encoded image. 356 | feed_dict = {self.tensor_name_input_jpeg: image_data} 357 | 358 | else: 359 | raise ValueError("Either image or image_path must be set.") 360 | 361 | return feed_dict 362 | 363 | def classify(self, image_path=None, image=None): 364 | """ 365 | Use the Inception model to classify a single image. 366 | 367 | The image will be resized automatically to 299 x 299 pixels, 368 | see the discussion in the Python Notebook for Tutorial #07. 369 | 370 | :param image_path: 371 | The input image is a jpeg-file with this file-path. 372 | 373 | :param image: 374 | The input image is a 3-dim array which is already decoded. 375 | The pixels MUST be values between 0 and 255 (float or int). 376 | 377 | :return: 378 | Array of floats (aka. softmax-array) indicating how likely 379 | the Inception model thinks the image is of each given class. 
380 | """ 381 | 382 | # Create a feed-dict for the TensorFlow graph with the input image. 383 | feed_dict = self._create_feed_dict(image_path=image_path, image=image) 384 | 385 | # Execute the TensorFlow session to get the predicted labels. 386 | pred = self.session.run(self.y_pred, feed_dict=feed_dict) 387 | 388 | # Reduce the array to a single dimension. 389 | pred = np.squeeze(pred) 390 | 391 | return pred 392 | 393 | def get_resized_image(self, image_path=None, image=None): 394 | """ 395 | Input an image to the Inception model and return 396 | the resized image. The resized image can be plotted so 397 | we can see what the neural network sees as its input. 398 | 399 | :param image_path: 400 | The input image is a jpeg-file with this file-path. 401 | 402 | :param image: 403 | The input image is a 3-dim array which is already decoded. 404 | The pixels MUST be values between 0 and 255 (float or int). 405 | 406 | :return: 407 | A 3-dim array holding the image. 408 | """ 409 | 410 | # Create a feed-dict for the TensorFlow graph with the input image. 411 | feed_dict = self._create_feed_dict(image_path=image_path, image=image) 412 | 413 | # Execute the TensorFlow session to get the resized image. 414 | resized_image = self.session.run(self.resized_image, feed_dict=feed_dict) 415 | 416 | # Remove the 1st dimension of the 4-dim tensor. 417 | resized_image = resized_image.squeeze(axis=0) 418 | 419 | # Scale pixels to be between 0.0 and 1.0 420 | resized_image = resized_image.astype(float) / 255.0 421 | 422 | return resized_image 423 | 424 | def print_scores(self, pred, k=10, only_first_name=True): 425 | """ 426 | Print the scores (or probabilities) for the top-k predicted classes. 427 | 428 | :param pred: 429 | Predicted class-labels returned from the classify() function. 430 | 431 | :param k: 432 | How many classes to print.
433 | 434 | :param only_first_name: 435 | Some class-names are lists of names; if you only want the first name, 436 | then set only_first_name=True. 437 | 438 | :return: 439 | Nothing. 440 | """ 441 | 442 | # Get a sorted index for the pred-array. 443 | idx = pred.argsort() 444 | 445 | # The index is sorted lowest-to-highest values. Take the last k. 446 | top_k = idx[-k:] 447 | 448 | # Iterate the top-k classes in reversed order (i.e. highest first). 449 | for cls in reversed(top_k): 450 | # Lookup the class-name. 451 | name = self.name_lookup.cls_to_name(cls=cls, only_first_name=only_first_name) 452 | 453 | # Predicted score (or probability) for this class. 454 | score = pred[cls] 455 | 456 | # Print the score and class-name. 457 | print("{0:>6.2%} : {1}".format(score, name)) 458 | 459 | def transfer_values(self, image_path=None, image=None): 460 | """ 461 | Calculate the transfer-values for the given image. 462 | These are the values of the last layer of the Inception model before 463 | the softmax-layer, when inputting the image to the Inception model. 464 | 465 | The transfer-values allow us to use the Inception model in so-called 466 | Transfer Learning for other data-sets and different classifications. 467 | 468 | It may take several hours or more to calculate the transfer-values 469 | for all images in a data-set. It is therefore useful to cache the 470 | results using the function transfer_values_cache() below. 471 | 472 | :param image_path: 473 | The input image is a jpeg-file with this file-path. 474 | 475 | :param image: 476 | The input image is a 3-dim array which is already decoded. 477 | The pixels MUST be values between 0 and 255 (float or int). 478 | 479 | :return: 480 | The transfer-values for that image. 481 | """ 482 | 483 | # Create a feed-dict for the TensorFlow graph with the input image. 484 | feed_dict = self._create_feed_dict(image_path=image_path, image=image) 485 | 486 | # Use TensorFlow to run the graph for the Inception model.
487 | # This calculates the values for the last layer of the Inception model 488 | # prior to the softmax-classification, which we call transfer-values. 489 | transfer_values = self.session.run(self.transfer_layer, feed_dict=feed_dict) 490 | 491 | # Reduce to a 1-dim array. 492 | transfer_values = np.squeeze(transfer_values) 493 | 494 | return transfer_values 495 | 496 | 497 | ######################################################################## 498 | # Batch-processing. 499 | 500 | 501 | def process_images(fn, images=None, image_paths=None): 502 | """ 503 | Call the function fn() for each image, e.g. transfer_values() from 504 | the Inception model above. All the results are concatenated and returned. 505 | 506 | :param fn: 507 | Function to be called for each image. 508 | 509 | :param images: 510 | List of images to process. 511 | 512 | :param image_paths: 513 | List of file-paths for the images to process. 514 | 515 | :return: 516 | Numpy array with the results. 517 | """ 518 | 519 | # Are we using images or image_paths? 520 | using_images = images is not None 521 | 522 | # Number of images. 523 | if using_images: 524 | num_images = len(images) 525 | else: 526 | num_images = len(image_paths) 527 | 528 | # Pre-allocate list for the results. 529 | # This holds references to other arrays. Initially the references are None. 530 | result = [None] * num_images 531 | 532 | # For each input image. 533 | for i in range(num_images): 534 | # Status-message. Note the \r which means the line should overwrite itself. 535 | msg = "\r- Processing image: {0:>6} / {1}".format(i+1, num_images) 536 | 537 | # Print the status message. 538 | sys.stdout.write(msg) 539 | sys.stdout.flush() 540 | 541 | # Process the image and store the result for later use. 542 | if using_images: 543 | result[i] = fn(image=images[i]) 544 | else: 545 | result[i] = fn(image_path=image_paths[i]) 546 | 547 | # Print newline. 548 | print() 549 | 550 | # Convert the result to a numpy array. 
551 | result = np.array(result) 552 | 553 | return result 554 | 555 | 556 | ######################################################################## 557 | 558 | 559 | def transfer_values_cache(cache_path, model, images=None, image_paths=None): 560 | """ 561 | This function either loads the transfer-values if they have 562 | already been calculated, otherwise it calculates the values 563 | and saves them to a file that can be re-loaded again later. 564 | 565 | Because the transfer-values can be expensive to compute, it can 566 | be useful to cache the values through this function instead 567 | of calling transfer_values() directly on the Inception model. 568 | 569 | See Tutorial #08 for an example on how to use this function. 570 | 571 | :param cache_path: 572 | File containing the cached transfer-values for the images. 573 | 574 | :param model: 575 | Instance of the Inception model. 576 | 577 | :param images: 578 | 4-dim array with images. [image_number, height, width, colour_channel] 579 | 580 | :param image_paths: 581 | Array of file-paths for images (must be jpeg-format). 582 | 583 | :return: 584 | The transfer-values from the Inception model for those images. 585 | """ 586 | 587 | # Helper-function for processing the images if the cache-file does not exist. 588 | # This is needed because we cannot supply both fn=process_images 589 | # and fn=model.transfer_values to the cache()-function. 590 | def fn(): 591 | return process_images(fn=model.transfer_values, images=images, image_paths=image_paths) 592 | 593 | # Read the transfer-values from a cache-file, or calculate them if the file does not exist. 594 | transfer_values = cache(cache_path=cache_path, fn=fn) 595 | 596 | return transfer_values 597 | 598 | 599 | ######################################################################## 600 | # Example usage. 601 | 602 | if __name__ == '__main__': 603 | print(tf.__version__) 604 | 605 | # Download Inception model if not already done. 
606 | maybe_download() 607 | 608 | # Load the Inception model so it is ready for classifying images. 609 | model = Inception() 610 | 611 | # Path for a jpeg-image that is included in the downloaded data. 612 | image_path = os.path.join(data_dir, 'cropped_panda.jpg') 613 | 614 | # Use the Inception model to classify the image. 615 | pred = model.classify(image_path=image_path) 616 | 617 | # Print the scores and names for the top-10 predictions. 618 | model.print_scores(pred=pred, k=10) 619 | 620 | # Close the TensorFlow session. 621 | model.close() 622 | 623 | # Transfer Learning is demonstrated in Tutorial #08. 624 | 625 | ######################################################################## 626 | -------------------------------------------------------------------------------- /inception5h.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # The Inception Model 5h for TensorFlow. 4 | # 5 | # This variant of the Inception model is easier to use for DeepDream 6 | # and other imaging techniques. This is because it allows the input 7 | # image to be any size, and the optimized images are also prettier. 8 | # 9 | # It is unclear which Inception model this implements because the 10 | # Google developers have (as usual) neglected to document it. 11 | # It is dubbed the 5h-model because that is the name of the zip-file, 12 | # but it is apparently simpler than the v.3 model. 13 | # 14 | # See the Python Notebook for Tutorial #14 for an example usage. 15 | # 16 | # Implemented in Python 3.5 with TensorFlow v0.11.0rc0 17 | # 18 | ######################################################################## 19 | # 20 | # This file is part of the TensorFlow Tutorials available at: 21 | # 22 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 23 | # 24 | # Published under the MIT License. See the file LICENSE for details. 
25 | # 26 | # Copyright 2016 by Magnus Erik Hvass Pedersen 27 | # 28 | ######################################################################## 29 | 30 | import numpy as np 31 | import tensorflow as tf 32 | import download 33 | import os 34 | 35 | ######################################################################## 36 | # Various directories and file-names. 37 | 38 | # Internet URL for the tar-file with the Inception model. 39 | # Note that this might change in the future and will need to be updated. 40 | data_url = "http://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip" 41 | 42 | # Directory to store the downloaded data. 43 | data_dir = "inception/5h/" 44 | 45 | # File containing the TensorFlow graph definition. (Downloaded) 46 | path_graph_def = "tensorflow_inception_graph.pb" 47 | 48 | ######################################################################## 49 | 50 | 51 | def maybe_download(): 52 | """ 53 | Download the Inception model from the internet if it does not already 54 | exist in the data_dir. The file is about 50 MB. 55 | """ 56 | 57 | print("Downloading Inception 5h Model ...") 58 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 59 | 60 | 61 | ######################################################################## 62 | 63 | 64 | class Inception5h: 65 | """ 66 | The Inception model is a Deep Neural Network which has already been 67 | trained for classifying images into 1000 different categories. 68 | 69 | When you create a new instance of this class, the Inception model 70 | will be loaded and can be used immediately without training. 71 | """ 72 | 73 | # Name of the tensor for feeding the input image. 74 | tensor_name_input_image = "input:0" 75 | 76 | # Names for some of the commonly used layers in the Inception model. 
77 | layer_names = ['conv2d0', 'conv2d1', 'conv2d2', 78 | 'mixed3a', 'mixed3b', 79 | 'mixed4a', 'mixed4b', 'mixed4c', 'mixed4d', 'mixed4e', 80 | 'mixed5a', 'mixed5b'] 81 | 82 | def __init__(self): 83 | # Now load the Inception model from file. The way TensorFlow 84 | # does this is confusing and requires several steps. 85 | 86 | # Create a new TensorFlow computational graph. 87 | self.graph = tf.Graph() 88 | 89 | # Set the new graph as the default. 90 | with self.graph.as_default(): 91 | 92 | # TensorFlow graphs are saved to disk as so-called Protocol Buffers 93 | # aka. proto-bufs which is a file-format that works on multiple 94 | # platforms. In this case it is saved as a binary file. 95 | 96 | # Open the graph-def file for binary reading. 97 | path = os.path.join(data_dir, path_graph_def) 98 | with tf.gfile.FastGFile(path, 'rb') as file: 99 | # The graph-def is a saved copy of a TensorFlow graph. 100 | # First we need to create an empty graph-def. 101 | graph_def = tf.GraphDef() 102 | 103 | # Then we load the proto-buf file into the graph-def. 104 | graph_def.ParseFromString(file.read()) 105 | 106 | # Finally we import the graph-def to the default TensorFlow graph. 107 | tf.import_graph_def(graph_def, name='') 108 | 109 | # Now self.graph holds the Inception model from the proto-buf file. 110 | 111 | # Get a reference to the tensor for inputting images to the graph. 112 | self.input = self.graph.get_tensor_by_name(self.tensor_name_input_image) 113 | 114 | # Get references to the tensors for the commonly used layers. 115 | self.layer_tensors = [self.graph.get_tensor_by_name(name + ":0") for name in self.layer_names] 116 | 117 | def create_feed_dict(self, image=None): 118 | """ 119 | Create and return a feed-dict with an image. 120 | 121 | :param image: 122 | The input image is a 3-dim array which is already decoded. 123 | The pixels MUST be values between 0 and 255 (float or int). 124 | 125 | :return: 126 | Dict for feeding to the Inception graph in TensorFlow. 
127 | """ 128 | 129 | # Expand 3-dim array to 4-dim by prepending an 'empty' dimension. 130 | # This is because we are only feeding a single image, but the 131 | # Inception model was built to take multiple images as input. 132 | image = np.expand_dims(image, axis=0) 133 | 134 | # Image is passed in as a 3-dim array of raw pixel-values. 135 | feed_dict = {self.tensor_name_input_image: image} 136 | 137 | return feed_dict 138 | 139 | def get_gradient(self, tensor): 140 | """ 141 | Get the gradient of the given tensor with respect to 142 | the input image. This allows us to modify the input 143 | image so as to maximize the given tensor. 144 | 145 | For use in e.g. DeepDream and Visual Analysis. 146 | 147 | :param tensor: 148 | The tensor whose value we want to maximize 149 | by changing the input image. 150 | 151 | :return: 152 | Gradient for the tensor with regard to the input image. 153 | """ 154 | 155 | # Set the graph as default so we can add operations to it. 156 | with self.graph.as_default(): 157 | # Square the tensor-values. 158 | # You can try and remove this to see the effect. 159 | tensor = tf.square(tensor) 160 | 161 | # Average the tensor so we get a single scalar value. 162 | tensor_mean = tf.reduce_mean(tensor) 163 | 164 | # Use TensorFlow to automatically create a mathematical 165 | # formula for the gradient using the chain-rule of 166 | # differentiation. 167 | gradient = tf.gradients(tensor_mean, self.input)[0] 168 | 169 | return gradient 170 | 171 | ######################################################################## 172 | -------------------------------------------------------------------------------- /knifey.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading the Knifey-Spoony data-set from the internet 4 | # and loading it into memory. 
Note that this only loads the file-names 5 | # for the images in the data-set and does not load the actual images. 6 | # 7 | # Implemented in Python 3.5 8 | # 9 | ######################################################################## 10 | # 11 | # This file is part of the TensorFlow Tutorials available at: 12 | # 13 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 14 | # 15 | # Published under the MIT License. See the file LICENSE for details. 16 | # 17 | # Copyright 2016 by Magnus Erik Hvass Pedersen 18 | # 19 | ######################################################################## 20 | 21 | from dataset import load_cached 22 | import download 23 | import os 24 | 25 | ######################################################################## 26 | 27 | # Directory where you want to download and save the data-set. 28 | # Set this before you start calling any of the functions below. 29 | data_dir = "data/knifey-spoony/" 30 | 31 | # Directory for the training-set after copying the files using copy_files(). 32 | train_dir = os.path.join(data_dir, "train/") 33 | 34 | # Directory for the test-set after copying the files using copy_files(). 35 | test_dir = os.path.join(data_dir, "test/") 36 | 37 | # URL for the data-set on the internet. 38 | data_url = "https://github.com/Hvass-Labs/knifey-spoony/raw/master/knifey-spoony.tar.gz" 39 | 40 | ######################################################################## 41 | # Various constants for the size of the images. 42 | # Use these constants in your own program. 43 | 44 | # Width and height of each image. 45 | img_size = 200 46 | 47 | # Number of channels in each image, 3 channels: Red, Green, Blue. 48 | num_channels = 3 49 | 50 | # Shape of the numpy-array for an image. 51 | img_shape = [img_size, img_size, num_channels] 52 | 53 | # Length of an image when flattened to a 1-dim array. 54 | img_size_flat = img_size * img_size * num_channels 55 | 56 | # Number of classes. 
57 | num_classes = 3 58 | 59 | ######################################################################## 60 | # Public functions that you may call to download the data-set from 61 | # the internet and load the data into memory. 62 | 63 | 64 | def maybe_download_and_extract(): 65 | """ 66 | Download and extract the Knifey-Spoony data-set if it doesn't already exist 67 | in data_dir (set this variable first to the desired directory). 68 | """ 69 | 70 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 71 | 72 | 73 | def load(): 74 | """ 75 | Load the Knifey-Spoony data-set into memory. 76 | 77 | This uses a cache-file which is reloaded if it already exists, 78 | otherwise the Knifey-Spoony data-set is created and saved to 79 | the cache-file. The reason for using a cache-file is that it 80 | ensures the files are ordered consistently each time the data-set 81 | is loaded. This is important when the data-set is used in 82 | combination with Transfer Learning as is done in Tutorial #09. 83 | 84 | :return: 85 | A DataSet-object for the Knifey-Spoony data-set. 86 | """ 87 | 88 | # Path for the cache-file. 89 | cache_path = os.path.join(data_dir, "knifey-spoony.pkl") 90 | 91 | # If the DataSet-object already exists in a cache-file 92 | # then load it, otherwise create a new object and save 93 | # it to the cache-file so it can be loaded the next time. 94 | dataset = load_cached(cache_path=cache_path, 95 | in_dir=data_dir) 96 | 97 | return dataset 98 | 99 | 100 | def copy_files(): 101 | """ 102 | Copy all the files in the training-set to train_dir 103 | and copy all the files in the test-set to test_dir. 104 | 105 | This creates the directories if they don't already exist, 106 | and it overwrites the images if they already exist. 107 | 108 | The images are originally stored in a directory-structure 109 | that is incompatible with e.g. the Keras API. This function 110 | copies the files to a dir-structure that works with e.g. Keras. 
111 | """ 112 | 113 | # Load the Knifey-Spoony dataset. 114 | # This is very fast as it only gathers lists of the files 115 | # and does not actually load the images into memory. 116 | dataset = load() 117 | 118 | # Copy the files to separate training- and test-dirs. 119 | dataset.copy_files(train_dir=train_dir, test_dir=test_dir) 120 | 121 | ######################################################################## 122 | 123 | if __name__ == '__main__': 124 | # Download and extract the data-set if it doesn't already exist. 125 | maybe_download_and_extract() 126 | 127 | # Load the data-set. 128 | dataset = load() 129 | 130 | # Get the file-paths for the images and their associated class-numbers 131 | # and class-labels. This is for the training-set. 132 | image_paths_train, cls_train, labels_train = dataset.get_training_set() 133 | 134 | # Get the file-paths for the images and their associated class-numbers 135 | # and class-labels. This is for the test-set. 136 | image_paths_test, cls_test, labels_test = dataset.get_test_set() 137 | 138 | # Check if the training-set looks OK. 139 | 140 | # Print some of the file-paths for the training-set. 141 | for path in image_paths_train[0:5]: 142 | print(path) 143 | 144 | # Print the associated class-numbers. 145 | print(cls_train[0:5]) 146 | 147 | # Print the class-numbers as one-hot encoded arrays. 148 | print(labels_train[0:5]) 149 | 150 | # Check if the test-set looks OK. 151 | 152 | # Print some of the file-paths for the test-set. 153 | for path in image_paths_test[0:5]: 154 | print(path) 155 | 156 | # Print the associated class-numbers. 157 | print(cls_test[0:5]) 158 | 159 | # Print the class-numbers as one-hot encoded arrays. 
160 | print(labels_test[0:5]) 161 | 162 | ######################################################################## 163 | -------------------------------------------------------------------------------- /mnist.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Downloads the MNIST data-set for recognizing hand-written digits. 4 | # 5 | # Implemented in Python 3.6 6 | # 7 | # Usage: 8 | # 1) Create a new object instance: data = MNIST(data_dir="data/MNIST/") 9 | # This automatically downloads the files to the given dir. 10 | # 2) Use the training-set as data.x_train, data.y_train and data.y_train_cls 11 | # 3) Get random batches of training data using data.random_batch() 12 | # 4) Use the test-set as data.x_test, data.y_test and data.y_test_cls 13 | # 14 | ######################################################################## 15 | # 16 | # This file is part of the TensorFlow Tutorials available at: 17 | # 18 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 19 | # 20 | # Published under the MIT License. See the file LICENSE for details. 21 | # 22 | # Copyright 2016-18 by Magnus Erik Hvass Pedersen 23 | # 24 | ######################################################################## 25 | 26 | import numpy as np 27 | import gzip 28 | import os 29 | from dataset import one_hot_encoded 30 | from download import download 31 | 32 | ######################################################################## 33 | 34 | # Base URL for downloading the data-files from the internet. 35 | base_url = "https://storage.googleapis.com/cvdf-datasets/mnist/" 36 | 37 | # Filenames for the data-set. 
38 | filename_x_train = "train-images-idx3-ubyte.gz" 39 | filename_y_train = "train-labels-idx1-ubyte.gz" 40 | filename_x_test = "t10k-images-idx3-ubyte.gz" 41 | filename_y_test = "t10k-labels-idx1-ubyte.gz" 42 | 43 | ######################################################################## 44 | 45 | 46 | class MNIST: 47 | """ 48 | The MNIST data-set for recognizing hand-written digits. 49 | This automatically downloads the data-files if they do 50 | not already exist in the local data_dir. 51 | 52 | Note: Pixel-values are floats between 0.0 and 1.0. 53 | """ 54 | 55 | # The images are 28 pixels in each dimension. 56 | img_size = 28 57 | 58 | # The images are stored in one-dimensional arrays of this length. 59 | img_size_flat = img_size * img_size 60 | 61 | # Tuple with height and width of images used to reshape arrays. 62 | img_shape = (img_size, img_size) 63 | 64 | # Number of colour channels for the images: 1 channel for gray-scale. 65 | num_channels = 1 66 | 67 | # Tuple with height, width and depth used to reshape arrays. 68 | # This is used for reshaping in Keras. 69 | img_shape_full = (img_size, img_size, num_channels) 70 | 71 | # Number of classes, one class for each of 10 digits. 72 | num_classes = 10 73 | 74 | def __init__(self, data_dir="data/MNIST/"): 75 | """ 76 | Load the MNIST data-set. Automatically downloads the files 77 | if they do not already exist locally. 78 | 79 | :param data_dir: Base-directory for downloading files. 80 | """ 81 | 82 | # Copy args to self. 83 | self.data_dir = data_dir 84 | 85 | # Number of images in each sub-set. 86 | self.num_train = 55000 87 | self.num_val = 5000 88 | self.num_test = 10000 89 | 90 | # Download / load the training-set. 91 | x_train = self._load_images(filename=filename_x_train) 92 | y_train_cls = self._load_cls(filename=filename_y_train) 93 | 94 | # Split the training-set into train / validation. 95 | # Pixel-values are converted from ints between 0 and 255 96 | # to floats between 0.0 and 1.0. 
97 | self.x_train = x_train[0:self.num_train] / 255.0 98 | self.x_val = x_train[self.num_train:] / 255.0 99 | self.y_train_cls = y_train_cls[0:self.num_train] 100 | self.y_val_cls = y_train_cls[self.num_train:] 101 | 102 | # Download / load the test-set. 103 | self.x_test = self._load_images(filename=filename_x_test) / 255.0 104 | self.y_test_cls = self._load_cls(filename=filename_y_test) 105 | 106 | # Convert the class-numbers from bytes to ints as that is needed 107 | # in some places in TensorFlow. 108 | self.y_train_cls = self.y_train_cls.astype(int) 109 | self.y_val_cls = self.y_val_cls.astype(int) 110 | self.y_test_cls = self.y_test_cls.astype(int) 111 | 112 | # Convert the integer class-numbers into one-hot encoded arrays. 113 | self.y_train = one_hot_encoded(class_numbers=self.y_train_cls, 114 | num_classes=self.num_classes) 115 | self.y_val = one_hot_encoded(class_numbers=self.y_val_cls, 116 | num_classes=self.num_classes) 117 | self.y_test = one_hot_encoded(class_numbers=self.y_test_cls, 118 | num_classes=self.num_classes) 119 | 120 | def _load_data(self, filename, offset): 121 | """ 122 | Load the data in the given file. Automatically downloads the file 123 | if it does not already exist in the data_dir. 124 | 125 | :param filename: Name of the data-file. 126 | :param offset: Start offset in bytes when reading the data-file. 127 | :return: The data as a numpy array. 128 | """ 129 | 130 | # Download the file from the internet if it does not exist locally. 131 | download(base_url=base_url, filename=filename, download_dir=self.data_dir) 132 | 133 | # Read the data-file. 134 | path = os.path.join(self.data_dir, filename) 135 | with gzip.open(path, 'rb') as f: 136 | data = np.frombuffer(f.read(), np.uint8, offset=offset) 137 | 138 | return data 139 | 140 | def _load_images(self, filename): 141 | """ 142 | Load image-data from the given file. 143 | Automatically downloads the file if it does not exist locally. 
144 | 145 | :param filename: Name of the data-file. 146 | :return: Numpy array. 147 | """ 148 | 149 | # Read the data as one long array of bytes. 150 | data = self._load_data(filename=filename, offset=16) 151 | 152 | # Reshape to 2-dim array with shape (num_images, img_size_flat). 153 | images_flat = data.reshape(-1, self.img_size_flat) 154 | 155 | return images_flat 156 | 157 | def _load_cls(self, filename): 158 | """ 159 | Load class-numbers from the given file. 160 | Automatically downloads the file if it does not exist locally. 161 | 162 | :param filename: Name of the data-file. 163 | :return: Numpy array. 164 | """ 165 | return self._load_data(filename=filename, offset=8) 166 | 167 | def random_batch(self, batch_size=32): 168 | """ 169 | Create a random batch of training-data. 170 | 171 | :param batch_size: Number of images in the batch. 172 | :return: 3 numpy arrays (x, y, y_cls) 173 | """ 174 | 175 | # Create a random index into the training-set. 176 | idx = np.random.randint(low=0, high=self.num_train, size=batch_size) 177 | 178 | # Use the index to lookup random training-data. 
179 | x_batch = self.x_train[idx] 180 | y_batch = self.y_train[idx] 181 | y_batch_cls = self.y_train_cls[idx] 182 | 183 | return x_batch, y_batch, y_batch_cls 184 | 185 | 186 | ######################################################################## 187 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | ################################################################ 2 | # 3 | # Python package requirements for the TensorFlow Tutorials: 4 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 5 | # 6 | # If you are using Anaconda then you can install all required 7 | # Python packages by running the following commands in a shell: 8 | # 9 | # conda create --name tf python=3 10 | # source activate tf 11 | # pip install -r requirements.txt 12 | # 13 | # Note that you may have to edit this file to uncomment the 14 | # optional packages needed by some of the tutorials. 15 | # 16 | ################################################################ 17 | # Basic packages used in many of the tutorials. 18 | 19 | numpy 20 | scipy 21 | jupyter 22 | matplotlib 23 | Pillow 24 | scikit-learn 25 | 26 | ################################################################ 27 | # TensorFlow v.2.1 and above include both CPU and GPU versions. 28 | 29 | tensorflow 30 | 31 | ################################################################ 32 | # Some tutorials use other individual Python packages. 33 | # Uncomment the relevant lines for the tutorials you want to run. 34 | 35 | # gym[atari] # Tutorial #16 on Reinforcement Learning. 36 | # pandas # Tutorial #23 on Time-Series Prediction. 37 | 38 | ################################################################ 39 | # PrettyTensor was used as the builder API for several of the 40 | # earlier tutorials. 
PrettyTensor is apparently no longer being 41 | # maintained and may not work with newer versions of TensorFlow. 42 | 43 | # prettytensor 44 | 45 | ################################################################ 46 | -------------------------------------------------------------------------------- /vgg16.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # The pre-trained VGG16 Model for TensorFlow. 4 | # 5 | # This model seems to produce better-looking images in Style Transfer 6 | # than the Inception 5h model that otherwise works well for DeepDream. 7 | # 8 | # See the Python Notebook for Tutorial #15 for an example usage. 9 | # 10 | # Implemented in Python 3.5 with TensorFlow v0.11.0rc0 11 | # 12 | ######################################################################## 13 | # 14 | # This file is part of the TensorFlow Tutorials available at: 15 | # 16 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 17 | # 18 | # Published under the MIT License. See the file LICENSE for details. 19 | # 20 | # Copyright 2016 by Magnus Erik Hvass Pedersen 21 | # 22 | ######################################################################## 23 | 24 | import numpy as np 25 | import tensorflow as tf 26 | import download 27 | import os 28 | 29 | ######################################################################## 30 | # Various directories and file-names. 31 | 32 | # The pre-trained VGG16 model is taken from this tutorial: 33 | # https://github.com/pkmital/CADL/blob/master/session-4/libs/vgg16.py 34 | 35 | # The class-names are available in the following URL: 36 | # https://s3.amazonaws.com/cadl/models/synset.txt 37 | 38 | # Internet URL for the file with the VGG16 model. 39 | # Note that this might change in the future and will need to be updated. 
40 | data_url = "https://s3.amazonaws.com/cadl/models/vgg16.tfmodel" 41 | 42 | # Directory to store the downloaded data. 43 | data_dir = "vgg16/" 44 | 45 | # File containing the TensorFlow graph definition. (Downloaded) 46 | path_graph_def = "vgg16.tfmodel" 47 | 48 | ######################################################################## 49 | 50 | 51 | def maybe_download(): 52 | """ 53 | Download the VGG16 model from the internet if it does not already 54 | exist in the data_dir. WARNING! The file is about 550 MB. 55 | """ 56 | 57 | print("Downloading VGG16 Model ...") 58 | 59 | # The file on the internet is not stored in a compressed format. 60 | # This function should not extract the file when it does not have 61 | # a relevant filename-extension such as .zip or .tar.gz 62 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 63 | 64 | 65 | ######################################################################## 66 | 67 | 68 | class VGG16: 69 | """ 70 | The VGG16 model is a Deep Neural Network which has already been 71 | trained for classifying images into 1000 different categories. 72 | 73 | When you create a new instance of this class, the VGG16 model 74 | will be loaded and can be used immediately without training. 75 | """ 76 | 77 | # Name of the tensor for feeding the input image. 78 | tensor_name_input_image = "images:0" 79 | 80 | # Names of the tensors for the dropout random-values. 81 | tensor_name_dropout = 'dropout/random_uniform:0' 82 | tensor_name_dropout1 = 'dropout_1/random_uniform:0' 83 | 84 | # Names for the convolutional layers in the model for use in Style Transfer. 85 | layer_names = ['conv1_1/conv1_1', 'conv1_2/conv1_2', 86 | 'conv2_1/conv2_1', 'conv2_2/conv2_2', 87 | 'conv3_1/conv3_1', 'conv3_2/conv3_2', 'conv3_3/conv3_3', 88 | 'conv4_1/conv4_1', 'conv4_2/conv4_2', 'conv4_3/conv4_3', 89 | 'conv5_1/conv5_1', 'conv5_2/conv5_2', 'conv5_3/conv5_3'] 90 | 91 | def __init__(self): 92 | # Now load the model from file. 
The way TensorFlow 93 | # does this is confusing and requires several steps. 94 | 95 | # Create a new TensorFlow computational graph. 96 | self.graph = tf.Graph() 97 | 98 | # Set the new graph as the default. 99 | with self.graph.as_default(): 100 | 101 | # TensorFlow graphs are saved to disk as so-called Protocol Buffers 102 | # aka. proto-bufs which is a file-format that works on multiple 103 | # platforms. In this case it is saved as a binary file. 104 | 105 | # Open the graph-def file for binary reading. 106 | path = os.path.join(data_dir, path_graph_def) 107 | with tf.gfile.FastGFile(path, 'rb') as file: 108 | # The graph-def is a saved copy of a TensorFlow graph. 109 | # First we need to create an empty graph-def. 110 | graph_def = tf.GraphDef() 111 | 112 | # Then we load the proto-buf file into the graph-def. 113 | graph_def.ParseFromString(file.read()) 114 | 115 | # Finally we import the graph-def to the default TensorFlow graph. 116 | tf.import_graph_def(graph_def, name='') 117 | 118 | # Now self.graph holds the VGG16 model from the proto-buf file. 119 | 120 | # Get a reference to the tensor for inputting images to the graph. 121 | self.input = self.graph.get_tensor_by_name(self.tensor_name_input_image) 122 | 123 | # Get references to the tensors for the commonly used layers. 124 | self.layer_tensors = [self.graph.get_tensor_by_name(name + ":0") for name in self.layer_names] 125 | 126 | def get_layer_tensors(self, layer_ids): 127 | """ 128 | Return a list of references to the tensors for the layers with the given id's. 129 | """ 130 | 131 | return [self.layer_tensors[idx] for idx in layer_ids] 132 | 133 | def get_layer_names(self, layer_ids): 134 | """ 135 | Return a list of names for the layers with the given id's. 136 | """ 137 | 138 | return [self.layer_names[idx] for idx in layer_ids] 139 | 140 | def get_all_layer_names(self, startswith=None): 141 | """ 142 | Return a list of all the layers (operations) in the graph. 
143 | The list can be filtered for names that start with the given string. 144 | """ 145 | 146 | # Get a list of the names for all layers (operations) in the graph. 147 | names = [op.name for op in self.graph.get_operations()] 148 | 149 | # Filter the list of names so we only get those starting with 150 | # the given string. 151 | if startswith is not None: 152 | names = [name for name in names if name.startswith(startswith)] 153 | 154 | return names 155 | 156 | def create_feed_dict(self, image): 157 | """ 158 | Create and return a feed-dict with an image. 159 | 160 | :param image: 161 | The input image is a 3-dim array which is already decoded. 162 | The pixels MUST be values between 0 and 255 (float or int). 163 | 164 | :return: 165 | Dict for feeding to the graph in TensorFlow. 166 | """ 167 | 168 | # Expand 3-dim array to 4-dim by prepending an 'empty' dimension. 169 | # This is because we are only feeding a single image, but the 170 | # VGG16 model was built to take multiple images as input. 171 | image = np.expand_dims(image, axis=0) 172 | 173 | if False: 174 | # In the original code using this VGG16 model, the random values 175 | # for the dropout are fixed to 1.0. 176 | # Experiments suggest that it does not seem to matter for 177 | # Style Transfer, and this causes an error with a GPU. 178 | dropout_fix = 1.0 179 | 180 | # Create feed-dict for inputting data to TensorFlow. 181 | feed_dict = {self.tensor_name_input_image: image, 182 | self.tensor_name_dropout: [[dropout_fix]], 183 | self.tensor_name_dropout1: [[dropout_fix]]} 184 | else: 185 | # Create feed-dict for inputting data to TensorFlow. 
186 | feed_dict = {self.tensor_name_input_image: image} 187 | 188 | return feed_dict 189 | 190 | ######################################################################## 191 | -------------------------------------------------------------------------------- /weather.py: -------------------------------------------------------------------------------- 1 | ######################################################################## 2 | # 3 | # Functions for downloading and re-sampling weather-data 4 | # for 5 cities in Denmark between 1980-2018. 5 | # 6 | # The raw data was obtained from: 7 | # 8 | # National Climatic Data Center (NCDC) in USA 9 | # https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd 10 | # 11 | # Note that the NCDC's database functionality may change soon, and 12 | # that the CSV-file needed some manual editing before it could be read. 13 | # See the function _convert_raw_data() below for inspiration if you 14 | # want to convert a new data-file from NCDC's database. 15 | # 16 | # Implemented in Python 3.6 17 | # 18 | # Usage: 19 | # 1) Set the desired storage directory in the data_dir variable. 20 | # 2) Call maybe_download_and_extract() to download the data-set 21 | # if it is not already located in the given data_dir. 22 | # 3) Either call load_original_data() or load_resampled_data() 23 | # to load the original or resampled data for use in your program. 24 | # 25 | # Format: 26 | # The raw data-file from NCDC is not included in the downloaded archive, 27 | # which instead contains a cleaned-up version of the raw data-file 28 | # referred to as the "original data". This data has not yet been resampled. 29 | # The original data-file is available as a pickled file for fast reloading 30 | # with Pandas, and as a CSV-file for broad compatibility. 
31 | # 32 | ######################################################################## 33 | # 34 | # This file is part of the TensorFlow Tutorials available at: 35 | # 36 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials 37 | # 38 | # Published under the MIT License. See the file LICENSE for details. 39 | # 40 | # Copyright 2018 by Magnus Erik Hvass Pedersen 41 | # 42 | ######################################################################## 43 | 44 | import pandas as pd 45 | import os 46 | import download 47 | 48 | ######################################################################## 49 | 50 | # Directory where you want to download and save the data-set. 51 | # Set this before you start calling any of the functions below. 52 | data_dir = "data/weather-denmark/" 53 | 54 | 55 | # Full path for the pickled data-file. (Original data). 56 | def path_original_data_pickle(): 57 | return os.path.join(data_dir, "weather-denmark.pkl") 58 | 59 | 60 | # Full path for the comma-separated text-file. (Original data). 61 | def path_original_data_csv(): 62 | return os.path.join(data_dir, "weather-denmark.csv") 63 | 64 | 65 | # Full path for the resampled data as a pickled file. 66 | def path_resampled_data_pickle(): 67 | return os.path.join(data_dir, "weather-denmark-resampled.pkl") 68 | 69 | 70 | # URL for the data-set on the internet. 71 | data_url = "https://github.com/Hvass-Labs/weather-denmark/raw/master/weather-denmark.tar.gz" 72 | 73 | 74 | # List of the cities in this data-set. These are cities in Denmark. 75 | cities = ['Aalborg', 'Aarhus', 'Esbjerg', 'Odense', 'Roskilde'] 76 | 77 | 78 | ######################################################################## 79 | # Private helper-functions. 80 | 81 | 82 | def _date_string(x): 83 | """Convert two integers to a string for the date and time.""" 84 | 85 | date = x[0] # Date. Example: 19801231 86 | time = x[1] # Time. 
Example: 1230 87 | 88 | return "{0}{1:04d}".format(date, time) 89 | 90 | 91 | def _usaf_to_city(usaf): 92 | """ 93 | The raw data-file uses USAF-codes to identify weather-stations. 94 | If you download another data-set from NCDC then you will have to 95 | change this function to use the USAF-codes in your new data-file. 96 | """ 97 | 98 | table = \ 99 | { 100 | 60300: 'Aalborg', 101 | 60700: 'Aarhus', 102 | 60800: 'Esbjerg', 103 | 61200: 'Odense', 104 | 61700: 'Roskilde' 105 | } 106 | 107 | return table[usaf] 108 | 109 | 110 | def _convert_raw_data(path): 111 | """ 112 | This converts a raw data-file obtained from the NCDC database. 113 | This function may be useful as an inspiration if you want to 114 | download another raw data-file from NCDC, but you will have 115 | to modify this function to match the data you have downloaded. 116 | 117 | Note that you may also have to manually edit the raw data-file, 118 | e.g. because the header is not in a proper comma-separated format. 119 | """ 120 | 121 | # The raw CSV-file uses various markers for "not-available" (NA). 122 | # (This is one of several oddities with NCDC's file-format.) 123 | na_values = ['999', '999.0', '999.9', '9999.9'] 124 | 125 | # Use Pandas to load the comma-separated file. 126 | # Note that you may have to manually edit the file's header 127 | # to get this to load correctly. 128 | df_raw = pd.read_csv(path, sep=',', header=1, 129 | index_col=False, na_values=na_values) 130 | 131 | # Create a new data-frame containing only the data 132 | # we are interested in. 133 | df = pd.DataFrame() 134 | 135 | # Get the city-name / weather-station name from the USAF code. 136 | df['City'] = df_raw['USAF '].apply(_usaf_to_city) 137 | 138 | # Convert the integer date-time to a proper date-time object. 139 | datestr = df_raw[['Date ', 'HrMn']].apply(_date_string, axis=1) 140 | df['DateTime'] = pd.to_datetime(datestr, format='%Y%m%d%H%M') 141 | 142 | # Get the data we are interested in. 
143 | df['Temp'] = df_raw['Temp '] 144 | df['Pressure'] = df_raw['Slp '] 145 | df['WindSpeed'] = df_raw['Spd '] 146 | df['WindDir'] = df_raw['Dir'] 147 | 148 | # Set the city-name and date-time as the index. 149 | df.set_index(['City', 'DateTime'], inplace=True) 150 | 151 | # Save the new data-frame as a pickle for fast reloading. 152 | df.to_pickle(path_original_data_pickle()) 153 | 154 | # Save the new data-frame as a CSV-file for general readability. 155 | df.to_csv(path_original_data_csv()) 156 | 157 | return df 158 | 159 | 160 | def _resample(df): 161 | """ 162 | Resample the contents of a Pandas data-frame by first 163 | removing empty rows and columns, then up-sampling and 164 | interpolating the data for 1-minute intervals, and 165 | finally down-sampling to 60-minute intervals. 166 | """ 167 | 168 | # Remove all empty rows. 169 | df_res = df.dropna(how='all') 170 | 171 | # Upsample so the time-series has data for every minute. 172 | df_res = df_res.resample('1T') 173 | 174 | # Fill in missing values. 175 | df_res = df_res.interpolate(method='time') 176 | 177 | # Downsample so the time-series has data for every hour. 178 | df_res = df_res.resample('60T') 179 | 180 | # Finalize the resampling. (Is this really necessary?) 181 | df_res = df_res.interpolate() 182 | 183 | # Remove all empty rows. 184 | df_res = df_res.dropna(how='all') 185 | 186 | return df_res 187 | 188 | 189 | ######################################################################## 190 | # Public functions that you may call to download the data-set from 191 | # the internet and load the data into memory. 192 | 193 | 194 | def maybe_download_and_extract(): 195 | """ 196 | Download and extract the weather-data if the data-files don't 197 | already exist in the data_dir. 198 | """ 199 | 200 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir) 201 | 202 | 203 | def load_original_data(): 204 | """ 205 | Load and return the original data that has not been resampled. 
206 | 207 | Note that this is not the raw data obtained from NCDC. 208 | It is a cleaned-up version of that data, as written by the 209 | function _convert_raw_data() above. 210 | """ 211 | 212 | return pd.read_pickle(path_original_data_pickle()) 213 | 214 | 215 | def load_resampled_data(): 216 | """ 217 | Load and return the resampled weather-data. 218 | 219 | This has data-points at regular 60-minute intervals where 220 | missing data has been linearly interpolated. 221 | 222 | This uses a cache-file for saving and quickly reloading the data, 223 | so the original data is only resampled once. 224 | """ 225 | 226 | # Path for the cache-file with the resampled data. 227 | path = path_resampled_data_pickle() 228 | 229 | # If the cache-file exists ... 230 | if os.path.exists(path): 231 | # Reload the cache-file. 232 | df = pd.read_pickle(path) 233 | else: 234 | # Otherwise resample the original data and save it in a cache-file. 235 | 236 | # Load the original data. 237 | df_org = load_original_data() 238 | 239 | # Split the original data into separate data-frames for each city. 240 | df_cities = [df_org.xs(city) for city in cities] 241 | 242 | # Resample the data for each city. 243 | df_resampled = [_resample(df_city) for df_city in df_cities] 244 | 245 | # Join the resampled data into a single data-frame. 246 | df = pd.concat(df_resampled, keys=cities, 247 | axis=1, join='inner') 248 | 249 | # Save the resampled data in a cache-file for quick reloading. 250 | df.to_pickle(path) 251 | 252 | return df 253 | 254 | 255 | ######################################################################## 256 | --------------------------------------------------------------------------------
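The resampling pipeline in `_resample()` of weather.py (drop empty rows, up-sample to a 1-minute grid, interpolate by time, then down-sample to hourly values) can be illustrated on a small synthetic series. This is only a sketch: the timestamps and `Temp` values are made up, and it uses `.asfreq()` for the up-sampling and `.mean()` for the hourly aggregation instead of calling `interpolate()` directly on the resampler as `_resample()` does.

```python
import numpy as np
import pandas as pd

# Synthetic, irregularly sampled temperature series with one empty row.
index = pd.to_datetime(["2018-01-01 00:00", "2018-01-01 00:30",
                        "2018-01-01 02:00", "2018-01-01 03:00"])
df = pd.DataFrame({"Temp": [1.0, np.nan, 3.0, 4.0]}, index=index)

# Mirror the steps in _resample():
df_res = df.dropna(how="all")               # Remove all-empty rows.
df_res = df_res.resample("1min").asfreq()   # Up-sample to a 1-minute grid.
df_res = df_res.interpolate(method="time")  # Fill gaps by time-interpolation.
df_res = df_res.resample("60min").mean()    # Down-sample to hourly values.
df_res = df_res.dropna(how="all")           # Remove any remaining empty rows.

print(df_res)
```

The result has one row per hour with the gap at 00:30 filled in by interpolation, which is the same shape of output that `load_resampled_data()` caches for each city.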