├── .github
│   └── ISSUE_TEMPLATE.md
├── .gitignore
├── 01_Simple_Linear_Model.ipynb
├── 02_Convolutional_Neural_Network.ipynb
├── 03B_Layers_API.ipynb
├── 03C_Keras_API.ipynb
├── 03_PrettyTensor.ipynb
├── 04_Save_Restore.ipynb
├── 05_Ensemble_Learning.ipynb
├── 06_CIFAR-10.ipynb
├── 07_Inception_Model.ipynb
├── 08_Transfer_Learning.ipynb
├── 09_Video_Data.ipynb
├── 10_Fine-Tuning.ipynb
├── 11_Adversarial_Examples.ipynb
├── 12_Adversarial_Noise_MNIST.ipynb
├── 13B_Visual_Analysis_MNIST.ipynb
├── 13_Visual_Analysis.ipynb
├── 14_DeepDream.ipynb
├── 15_Style_Transfer.ipynb
├── 16_Reinforcement_Learning.ipynb
├── 17_Estimator_API.ipynb
├── 18_TFRecords_Dataset_API.ipynb
├── 19_Hyper-Parameters.ipynb
├── 20_Natural_Language_Processing.ipynb
├── 21_Machine_Translation.ipynb
├── 22_Image_Captioning.ipynb
├── 23_Time-Series-Prediction.ipynb
├── LICENSE
├── README.md
├── cache.py
├── cifar10.py
├── coco.py
├── convert.py
├── dataset.py
├── download.py
├── europarl.py
├── forks.md
├── images
│   ├── 02_convolution.png
│   ├── 02_convolution.svg
│   ├── 02_network_flowchart.png
│   ├── 02_network_flowchart.svg
│   ├── 06_network_flowchart.png
│   ├── 06_network_flowchart.svg
│   ├── 07_inception_flowchart.png
│   ├── 08_transfer_learning_flowchart.png
│   ├── 08_transfer_learning_flowchart.svg
│   ├── 09_transfer_learning_flowchart.png
│   ├── 09_transfer_learning_flowchart.svg
│   ├── 10_transfer_learning_flowchart.png
│   ├── 10_transfer_learning_flowchart.svg
│   ├── 11_adversarial_examples_flowchart.png
│   ├── 11_adversarial_examples_flowchart.svg
│   ├── 12_adversarial_noise_flowchart.png
│   ├── 12_adversarial_noise_flowchart.svg
│   ├── 13_visual_analysis_flowchart.png
│   ├── 13_visual_analysis_flowchart.svg
│   ├── 13b_visual_analysis_flowchart.png
│   ├── 13b_visual_analysis_flowchart.svg
│   ├── 14_deepdream_flowchart.png
│   ├── 14_deepdream_flowchart.svg
│   ├── 14_deepdream_recursive_flowchart.png
│   ├── 14_deepdream_recursive_flowchart.svg
│   ├── 15_style_transfer_flowchart.png
│   ├── 15_style_transfer_flowchart.svg
│   ├── 16_flowchart.png
│   ├── 16_flowchart.svg
│   ├── 16_motion-trace.png
│   ├── 16_problem.png
│   ├── 16_problem.svg
│   ├── 16_q-values-details.png
│   ├── 16_q-values-details.svg
│   ├── 16_q-values-simple.png
│   ├── 16_q-values-simple.svg
│   ├── 16_training_stability.png
│   ├── 16_training_stability.svg
│   ├── 19_flowchart_bayesian_optimization.png
│   ├── 19_flowchart_bayesian_optimization.svg
│   ├── 20_natural_language_flowchart.png
│   ├── 20_natural_language_flowchart.svg
│   ├── 20_recurrent_unit.png
│   ├── 20_recurrent_unit.svg
│   ├── 20_unrolled_3layers_flowchart.png
│   ├── 20_unrolled_3layers_flowchart.svg
│   ├── 20_unrolled_flowchart.png
│   ├── 20_unrolled_flowchart.svg
│   ├── 21_machine_translation_flowchart.png
│   ├── 21_machine_translation_flowchart.svg
│   ├── 22_image_captioning_flowchart.png
│   ├── 22_image_captioning_flowchart.svg
│   ├── 23_time_series_flowchart.png
│   ├── 23_time_series_flowchart.svg
│   ├── Denmark.jpg
│   ├── Europe.jpg
│   ├── elon_musk.jpg
│   ├── elon_musk_100x100.jpg
│   ├── escher_planefilling2.jpg
│   ├── giger.jpg
│   ├── hulk.jpg
│   ├── parrot.jpg
│   ├── parrot_cropped1.jpg
│   ├── parrot_cropped2.jpg
│   ├── parrot_cropped3.jpg
│   ├── parrot_padded.jpg
│   ├── style1.jpg
│   ├── style2.jpg
│   ├── style3.jpg
│   ├── style4.jpg
│   ├── style5.jpg
│   ├── style6.jpg
│   ├── style7.jpg
│   ├── style8.jpg
│   ├── style9.jpg
│   ├── willy_wonka_new.jpg
│   └── willy_wonka_old.jpg
├── imdb.py
├── inception.py
├── inception5h.py
├── knifey.py
├── mnist.py
├── reinforcement_learning.py
├── requirements.txt
├── vgg16.py
└── weather.py
/.github/ISSUE_TEMPLATE.md:
--------------------------------------------------------------------------------
1 | # STOP!
2 |
3 | **Please don't waste my time!**
4 |
5 | Most of the problems people are having are already described in the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md).
6 |
7 | You should first make a serious attempt to solve your problem.
8 | If you ask a question that has already been answered elsewhere, or if you do not
9 | give enough details about your problem, then your issue may be closed immediately.
10 |
11 | ## Python 3
12 |
13 | These tutorials were developed in **Python 3.5** (and higher) and may give strange errors in Python 2.7.
14 |
15 | ## Missing Files
16 |
17 | You need to **download the whole repository**, either using `git clone` or as a zip-file. See the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md).
18 |
19 | ## Questions about TensorFlow
20 |
21 | General questions about TensorFlow should either be asked on [StackOverflow](http://stackoverflow.com/questions/tagged/tensorflow) or the [official TensorFlow repository](https://github.com/tensorflow/tensorflow/issues).
22 |
23 | ## Modifications
24 |
25 | Questions about modifications or how to use these tutorials on your own data-set should also be asked on [StackOverflow](http://stackoverflow.com/questions/tagged/tensorflow).
26 |
27 | Thousands of people are using these tutorials. It is impossible for me to give individual support for your project.
28 |
29 | ## Suggestions for Changes
30 |
31 | The tutorials cannot change too much because it would make the [YouTube videos](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ) too different from the source-code.
32 |
33 | ## Requests for New Tutorials
34 |
35 | These tutorials were made by a single person on his own time. It took a very long time to
36 | research and produce the tutorials. If a topic is not covered then the best thing is to make
37 | a new tutorial by yourself. All you need is a decent microphone, a screen-grabbing tool, and a
38 | video editor. I used the free version of [DaVinci Resolve](https://www.blackmagicdesign.com/products/davinciresolve).
39 |
40 | ## Other Issues?
41 |
42 | Please carefully read the [installation instructions](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/README.md) and only open an issue if you are still having problems.
43 |
--------------------------------------------------------------------------------
/.gitignore:
--------------------------------------------------------------------------------
1 | # Various
2 | sandbox*.py
3 |
4 | # Data for TensorFlow
5 | data/
6 | inception/
7 | vgg16/
8 | checkpoints/
9 | checkpoints*
10 | logs/
11 | summary/
12 |
13 | # PyCharm
14 | .idea/
15 |
16 |
17 | # Byte-compiled / optimized / DLL files
18 | __pycache__/
19 | *.py[cod]
20 | *$py.class
21 |
22 | # C extensions
23 | *.so
24 |
25 | # Distribution / packaging
26 | .Python
27 | env/
28 | build/
29 | develop-eggs/
30 | dist/
31 | downloads/
32 | eggs/
33 | .eggs/
34 | lib/
35 | lib64/
36 | parts/
37 | sdist/
38 | var/
39 | *.egg-info/
40 | .installed.cfg
41 | *.egg
42 |
43 | # PyInstaller
44 | # Usually these files are written by a python script from a template
45 | # before PyInstaller builds the exe, so as to inject date/other infos into it.
46 | *.manifest
47 | *.spec
48 |
49 | # Installer logs
50 | pip-log.txt
51 | pip-delete-this-directory.txt
52 |
53 | # Unit test / coverage reports
54 | htmlcov/
55 | .tox/
56 | .coverage
57 | .coverage.*
58 | .cache
59 | nosetests.xml
60 | coverage.xml
61 | *,cover
62 | .hypothesis/
63 |
64 | # Translations
65 | *.mo
66 | *.pot
67 |
68 | # Django stuff:
69 | *.log
70 | local_settings.py
71 |
72 | # Flask stuff:
73 | instance/
74 | .webassets-cache
75 |
76 | # Scrapy stuff:
77 | .scrapy
78 |
79 | # Sphinx documentation
80 | docs/_build/
81 |
82 | # PyBuilder
83 | target/
84 |
85 | # IPython Notebook
86 | .ipynb_checkpoints
87 |
88 | # pyenv
89 | .python-version
90 |
91 | # celery beat schedule file
92 | celerybeat-schedule
93 |
94 | # dotenv
95 | .env
96 |
97 | # virtualenv
98 | venv/
99 | ENV/
100 |
101 | # Spyder project settings
102 | .spyderproject
103 |
104 | # Rope project settings
105 | .ropeproject
106 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License (MIT)
2 |
3 | Copyright (c) 2016 by Magnus Erik Hvass Pedersen
4 |
5 | Permission is hereby granted, free of charge, to any person obtaining a copy
6 | of this software and associated documentation files (the "Software"), to deal
7 | in the Software without restriction, including without limitation the rights
8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 |
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 |
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 |
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Tutorials
2 |
3 | [Original repository on GitHub](https://github.com/Hvass-Labs/TensorFlow-Tutorials)
4 |
5 | Original author is [Magnus Erik Hvass Pedersen](http://www.hvass-labs.org)
6 |
7 | ## Introduction
8 |
9 | * These tutorials are intended for beginners in Deep Learning and TensorFlow.
10 | * Each tutorial covers a single topic.
11 | * The source-code is well-documented.
12 | * There is a [YouTube video](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ) for each tutorial.
13 |
14 | ## Tutorials for TensorFlow 2
15 |
16 | The following tutorials have been updated and work with **TensorFlow 2**
17 | (some of them run in "v.1 compatibility mode").
18 |
19 | 1. Simple Linear Model
20 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/01_Simple_Linear_Model.ipynb))
21 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/01_Simple_Linear_Model.ipynb))
22 |
23 | 2. Convolutional Neural Network
24 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb))
25 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/02_Convolutional_Neural_Network.ipynb))
26 |
27 | 3-C. Keras API
28 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03C_Keras_API.ipynb))
29 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03C_Keras_API.ipynb))
30 |
31 | 10. Fine-Tuning
32 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/10_Fine-Tuning.ipynb))
33 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/10_Fine-Tuning.ipynb))
34 |
35 | 13-B. Visual Analysis for MNIST
36 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/13B_Visual_Analysis_MNIST.ipynb))
37 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/13B_Visual_Analysis_MNIST.ipynb))
38 |
39 | 16. Reinforcement Learning
40 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/16_Reinforcement_Learning.ipynb))
41 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/16_Reinforcement_Learning.ipynb))
42 |
43 | 19. Hyper-Parameter Optimization
44 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/19_Hyper-Parameters.ipynb))
45 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/19_Hyper-Parameters.ipynb))
46 |
47 | 20. Natural Language Processing
48 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/20_Natural_Language_Processing.ipynb))
49 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/20_Natural_Language_Processing.ipynb))
50 |
51 | 21. Machine Translation
52 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/21_Machine_Translation.ipynb))
53 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/21_Machine_Translation.ipynb))
54 |
55 | 22. Image Captioning
56 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/22_Image_Captioning.ipynb))
57 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/22_Image_Captioning.ipynb))
58 |
59 | 23. Time-Series Prediction
60 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb))
61 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/23_Time-Series-Prediction.ipynb))
62 |
63 | ## Tutorials for TensorFlow 1
64 |
65 | The following tutorials only work with the older **TensorFlow 1** API, so you
66 | would need to install an older version of TensorFlow to run these. It would take
67 | too much time and effort to convert these tutorials to TensorFlow 2.
68 |
69 | 3. Pretty Tensor
70 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03_PrettyTensor.ipynb))
71 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03_PrettyTensor.ipynb))
72 |
73 | 3-B. Layers API
74 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/03B_Layers_API.ipynb))
75 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/03B_Layers_API.ipynb))
76 |
77 | 4. Save & Restore
78 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb))
79 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/04_Save_Restore.ipynb))
80 |
81 | 5. Ensemble Learning
82 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/05_Ensemble_Learning.ipynb))
83 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/05_Ensemble_Learning.ipynb))
84 |
85 | 6. CIFAR-10
86 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb))
87 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/06_CIFAR-10.ipynb))
88 |
89 | 7. Inception Model
90 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/07_Inception_Model.ipynb))
91 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/07_Inception_Model.ipynb))
92 |
93 | 8. Transfer Learning
94 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb))
95 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/08_Transfer_Learning.ipynb))
96 |
97 | 9. Video Data
98 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb))
99 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/09_Video_Data.ipynb))
100 |
101 | 11. Adversarial Examples
102 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb))
103 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/11_Adversarial_Examples.ipynb))
104 |
105 | 12. Adversarial Noise for MNIST
106 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Adversarial_Noise_MNIST.ipynb))
107 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/12_Adversarial_Noise_MNIST.ipynb))
108 |
109 | 13. Visual Analysis
110 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/13_Visual_Analysis.ipynb))
111 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/13_Visual_Analysis.ipynb))
112 |
113 | 14. DeepDream
114 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb))
115 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/14_DeepDream.ipynb))
116 |
117 | 15. Style Transfer
118 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/15_Style_Transfer.ipynb))
119 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/15_Style_Transfer.ipynb))
120 |
121 | 17. Estimator API
122 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/17_Estimator_API.ipynb))
123 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/17_Estimator_API.ipynb))
124 |
125 | 18. TFRecords & Dataset API
126 | ([Notebook](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/18_TFRecords_Dataset_API.ipynb))
127 | ([Google Colab](https://colab.research.google.com/github/Hvass-Labs/TensorFlow-Tutorials/blob/master/18_TFRecords_Dataset_API.ipynb))
128 |
129 | ## Videos
130 |
131 | These tutorials are also available as [YouTube videos](https://www.youtube.com/playlist?list=PL9Hr9sNUjfsmEu1ZniY0XpHSzl5uihcXZ).
132 |
133 | ## Translations
134 |
135 | These tutorials have been translated to the following languages:
136 |
137 | * [Chinese](https://github.com/Hvass-Labs/TensorFlow-Tutorials-Chinese)
138 |
139 | ### New Translations
140 |
141 | You can help by translating the remaining tutorials or reviewing the ones that have already been translated. You can also help by translating to other languages.
142 |
143 | It is a very big job to translate all the tutorials, so you should just start with Tutorials #01, #02 and #03-C, which are the most important for beginners.
144 |
145 | ### New Videos
146 |
147 | You are also very welcome to record your own YouTube videos in other languages. It is strongly recommended that you get a decent microphone because good sound quality is very important. I used `vokoscreen` for recording the videos and the free [DaVinci Resolve](https://www.blackmagicdesign.com/products/davinciresolve/) for editing the videos.
148 |
149 | ## Forks
150 |
151 | See the [selected list of forks](forks.md) for community modifications to these tutorials.
152 |
153 | ## Installation
154 |
155 | There are different ways of installing and running TensorFlow. This section describes how I did it
156 | for these tutorials. You may want to do it differently and you can search the internet for instructions.
157 |
158 | If you are new to Python and Linux, this may be challenging to get working,
159 | and you may need to search the internet for error-messages, etc.
160 | It will get easier with practice. You can also run the tutorials without installing
161 | anything by using Google Colab, see further below.
162 |
163 | Some of the Python Notebooks use source-code located in different files to allow for easy re-use
164 | across multiple tutorials. It is therefore recommended that you download the whole repository
165 | from GitHub, instead of just downloading the individual Python Notebooks.
166 |
167 | ### Git
168 |
169 | The easiest way to download and install these tutorials is by using git from the command-line:
170 |
171 | git clone https://github.com/Hvass-Labs/TensorFlow-Tutorials.git
172 |
173 | This will create the directory `TensorFlow-Tutorials` and download all the files to it.
174 |
175 | This also makes it easy to update the tutorials, simply by executing this command inside that directory:
176 |
177 | git pull
178 |
179 | ### Download Zip-File
180 |
181 | You can also [download](https://github.com/Hvass-Labs/TensorFlow-Tutorials/archive/master.zip)
182 | the contents of the GitHub repository as a Zip-file and extract it manually.
183 |
184 | ### Environment
185 |
186 | I use [Anaconda](https://www.continuum.io/downloads) because it comes with many Python
187 | packages already installed and it is easy to work with. After installing Anaconda,
188 | you should create a [conda environment](http://conda.pydata.org/docs/using/envs.html)
189 | so you do not destroy your main installation in case you make a mistake somewhere:
190 |
191 | conda create --name tf python=3
192 |
193 | When Python gets updated to a new version, it takes a while before TensorFlow also
194 | uses the new Python version. So if the TensorFlow installation fails, then you may
195 | have to specify an older Python version for your new environment, such as:
196 |
197 | conda create --name tf python=3.6
198 |
199 | Now you can switch to the new environment by running the following (on Linux):
200 |
201 | source activate tf
202 |
203 | ### Required Packages
204 |
205 | The tutorials require several Python packages to be installed. The packages are listed in
206 | [requirements.txt](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/requirements.txt)
207 |
208 | To install the required Python packages and dependencies you first have to activate the
209 | conda-environment as described above, and then you run the following command
210 | in a terminal:
211 |
212 | pip install -r requirements.txt
213 |
214 | Starting with version 2.1, TensorFlow includes both the CPU and GPU versions and will
215 | automatically use the GPU if one is available. However, this requires the installation
216 | of various NVIDIA drivers, which is a bit complicated and is not described here.
217 |
218 | ### Python Version 3.5 or Later
219 |
220 | These tutorials were developed on Linux using **Python 3.5 / 3.6** (the [Anaconda](https://www.continuum.io/downloads) distribution) and [PyCharm](https://www.jetbrains.com/pycharm/).
221 |
222 | There are reports that Python 2.7 gives error messages with these tutorials. Please make sure you are using **Python 3.5** or later!
223 |
224 | ## How To Run
225 |
226 | If you have followed the above installation instructions, you should
227 | now be able to run the tutorials in the Python Notebooks:
228 |
229 | cd ~/development/TensorFlow-Tutorials/ # Your installation directory.
230 | jupyter notebook
231 |
232 | This should start a web-browser that shows the list of tutorials. Click on a tutorial to load it.
233 |
234 | ### Run in Google Colab
235 |
236 | If you do not want to install anything on your own computer, then the Notebooks
237 | can be viewed, edited and run entirely on the internet by using
238 | [Google Colab](https://colab.research.google.com). There is a
239 | [YouTube video](https://www.youtube.com/watch?v=Hs6HI2YWchM) explaining how to do this.
240 | Click the "Google Colab" link next to each tutorial listed above.
241 | You can view the Notebook on Colab, but in order to run it you need to log in
242 | with your Google account.
243 | Then you need to execute the following commands at the top of the Notebook,
244 | which clone the contents of this repository to your work-directory on Colab.
245 |
246 | # Clone the repository from GitHub to Google Colab's temporary drive.
247 | import os
248 | work_dir = "/content/TensorFlow-Tutorials/"
249 | if not os.path.exists(work_dir):
250 | !git clone https://github.com/Hvass-Labs/TensorFlow-Tutorials.git
251 | os.chdir(work_dir)
252 |
253 | All required packages should already be installed on Colab, otherwise you
254 | can run the following command:
255 |
256 | !pip install -r requirements.txt
257 |
258 | ## Older Versions
259 |
260 | Sometimes the source-code has changed from that shown in the YouTube videos. This may be due to
261 | bug-fixes, improvements, or because code-sections are moved to separate files for easy re-use.
262 |
263 | If you want to see the exact versions of the source-code that were used in the YouTube videos,
264 | then you can [browse the history](https://github.com/Hvass-Labs/TensorFlow-Tutorials/commits/master)
265 | of commits to the GitHub repository.
266 |
267 | ## License (MIT)
268 |
269 | These tutorials and source-code are published under the [MIT License](https://github.com/Hvass-Labs/TensorFlow-Tutorials/blob/master/LICENSE)
270 | which allows very broad use for both academic and commercial purposes.
271 |
272 | A few of the images used for demonstration purposes may be under copyright. These images are included under the "fair usage" laws.
273 |
274 | You are very welcome to modify these tutorials and use them in your own projects.
275 | Please keep a link to the [original repository](https://github.com/Hvass-Labs/TensorFlow-Tutorials).
276 |
--------------------------------------------------------------------------------
/cache.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Cache-wrapper for a function or class.
4 | #
5 | # Save the result of calling a function or creating an object-instance
6 | # to harddisk. This is used to persist the data so it can be reloaded
7 | # very quickly and easily.
8 | #
9 | # Implemented in Python 3.5
10 | #
11 | ########################################################################
12 | #
13 | # This file is part of the TensorFlow Tutorials available at:
14 | #
15 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
16 | #
17 | # Published under the MIT License. See the file LICENSE for details.
18 | #
19 | # Copyright 2016 by Magnus Erik Hvass Pedersen
20 | #
21 | ########################################################################
22 |
23 | import os
24 | import pickle
25 | import numpy as np
26 |
27 | ########################################################################
28 |
29 |
30 | def cache(cache_path, fn, *args, **kwargs):
31 | """
32 | Cache-wrapper for a function or class. If the cache-file exists
33 | then the data is reloaded and returned, otherwise the function
34 | is called and the result is saved to cache. The fn-argument can
35 | also be a class instead, in which case an object-instance is
36 | created and saved to the cache-file.
37 |
38 | :param cache_path:
39 | File-path for the cache-file.
40 |
41 | :param fn:
42 | Function or class to be called.
43 |
44 | :param args:
45 | Arguments to the function or class-init.
46 |
47 | :param kwargs:
48 | Keyword arguments to the function or class-init.
49 |
50 | :return:
51 | The result of calling the function or creating the object-instance.
52 | """
53 |
54 | # If the cache-file exists.
55 | if os.path.exists(cache_path):
56 | # Load the cached data from the file.
57 | with open(cache_path, mode='rb') as file:
58 | obj = pickle.load(file)
59 |
60 | print("- Data loaded from cache-file: " + cache_path)
61 | else:
62 | # The cache-file does not exist.
63 |
64 | # Call the function / class-init with the supplied arguments.
65 | obj = fn(*args, **kwargs)
66 |
67 | # Save the data to a cache-file.
68 | with open(cache_path, mode='wb') as file:
69 | pickle.dump(obj, file)
70 |
71 | print("- Data saved to cache-file: " + cache_path)
72 |
73 | return obj
74 |
75 |
76 | ########################################################################
77 |
78 |
79 | def convert_numpy2pickle(in_path, out_path):
80 | """
81 | Convert a numpy-file to a pickle-file.
82 |
83 | The first version of the cache-function used numpy for saving the data.
84 | Instead of re-calculating all the data, you can just convert the
85 | cache-file using this function.
86 |
87 | :param in_path:
88 | Input file in numpy-format written using numpy.save().
89 |
90 | :param out_path:
91 | Output file written as a pickle-file.
92 |
93 | :return:
94 | Nothing.
95 | """
96 |
97 | # Load the data using numpy.
98 | data = np.load(in_path)
99 |
100 | # Save the data using pickle.
101 | with open(out_path, mode='wb') as file:
102 | pickle.dump(data, file)
103 |
104 |
105 | ########################################################################
106 |
107 | if __name__ == '__main__':
108 | # This is a short example of using a cache-file.
109 |
110 | # This is the function that will only get called if the result
111 | # is not already saved in the cache-file. This would normally
112 | # be a function that takes a long time to compute, or if you
113 | # need persistent data for some other reason.
114 | def expensive_function(a, b):
115 | return a * b
116 |
117 | print('Computing expensive_function() ...')
118 |
119 | # Either load the result from a cache-file if it already exists,
120 | # otherwise calculate expensive_function(a=123, b=456) and
121 | # save the result to the cache-file for next time.
122 | result = cache(cache_path='cache_expensive_function.pkl',
123 | fn=expensive_function, a=123, b=456)
124 |
125 | print('result =', result)
126 |
127 | # Newline.
128 | print()
129 |
130 | # This is another example which saves an object to a cache-file.
131 |
132 | # We want to cache an object-instance of this class.
133 | # The motivation is to do an expensive computation only once,
134 | # or if we need to persist the data for some other reason.
135 | class ExpensiveClass:
136 | def __init__(self, c, d):
137 | self.c = c
138 | self.d = d
139 | self.result = c * d
140 |
141 | def print_result(self):
142 | print('c =', self.c)
143 | print('d =', self.d)
144 | print('result = c * d =', self.result)
145 |
146 | print('Creating object from ExpensiveClass() ...')
147 |
148 | # Either load the object from a cache-file if it already exists,
149 | # otherwise make an object-instance ExpensiveClass(c=123, d=456)
150 | # and save the object to the cache-file for the next time.
151 | obj = cache(cache_path='cache_ExpensiveClass.pkl',
152 | fn=ExpensiveClass, c=123, d=456)
153 |
154 | obj.print_result()
155 |
156 | ########################################################################
157 |
--------------------------------------------------------------------------------
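The `convert_numpy2pickle` helper above is not exercised by the file's `__main__` demo. A minimal, self-contained sketch of a one-off conversion (the file paths and array contents here are purely illustrative, and the function body is repeated from `cache.py` so the snippet runs on its own):

```python
import os
import pickle
import tempfile

import numpy as np


def convert_numpy2pickle(in_path, out_path):
    # Same logic as the helper in cache.py above:
    # load the data with numpy, then re-save it with pickle.
    data = np.load(in_path)
    with open(out_path, mode='wb') as file:
        pickle.dump(data, file)


# Write a small numpy-file, as the first version of the cache-function did.
tmp_dir = tempfile.mkdtemp()
in_path = os.path.join(tmp_dir, "old_cache.npy")
out_path = os.path.join(tmp_dir, "new_cache.pkl")
np.save(in_path, np.arange(6))

# Convert it to a pickle-file and check that the data survived.
convert_numpy2pickle(in_path=in_path, out_path=out_path)
with open(out_path, mode='rb') as file:
    data = pickle.load(file)
print(data.tolist())  # [0, 1, 2, 3, 4, 5]
```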
/cifar10.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading the CIFAR-10 data-set from the internet
4 | # and loading it into memory.
5 | #
6 | # Implemented in Python 3.5
7 | #
8 | # Usage:
9 | # 1) Set the variable data_path with the desired storage path.
10 | # 2) Call maybe_download_and_extract() to download the data-set
11 | # if it is not already located in the given data_path.
12 | # 3) Call load_class_names() to get an array of the class-names.
13 | # 4) Call load_training_data() and load_test_data() to get
14 | # the images, class-numbers and one-hot encoded class-labels
15 | # for the training-set and test-set.
16 | # 5) Use the returned data in your own program.
17 | #
18 | # Format:
19 | # The images for the training- and test-sets are returned as 4-dim numpy
20 | # arrays each with the shape: [image_number, height, width, channel]
21 | # where the individual pixels are floats between 0.0 and 1.0.
22 | #
23 | ########################################################################
24 | #
25 | # This file is part of the TensorFlow Tutorials available at:
26 | #
27 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
28 | #
29 | # Published under the MIT License. See the file LICENSE for details.
30 | #
31 | # Copyright 2016 by Magnus Erik Hvass Pedersen
32 | #
33 | ########################################################################
34 |
35 | import numpy as np
36 | import pickle
37 | import os
38 | import download
39 | from dataset import one_hot_encoded
40 |
41 | ########################################################################
42 |
43 | # Directory where you want to download and save the data-set.
44 | # Set this before you start calling any of the functions below.
45 | data_path = "data/CIFAR-10/"
46 |
47 | # URL for the data-set on the internet.
48 | data_url = "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
49 |
50 | ########################################################################
51 | # Various constants for the size of the images.
52 | # Use these constants in your own program.
53 |
54 | # Width and height of each image.
55 | img_size = 32
56 |
57 | # Number of channels in each image, 3 channels: Red, Green, Blue.
58 | num_channels = 3
59 |
60 | # Length of an image when flattened to a 1-dim array.
61 | img_size_flat = img_size * img_size * num_channels
62 |
63 | # Number of classes.
64 | num_classes = 10
65 |
66 | ########################################################################
67 | # Various constants used to allocate arrays of the correct size.
68 |
69 | # Number of files for the training-set.
70 | _num_files_train = 5
71 |
72 | # Number of images for each batch-file in the training-set.
73 | _images_per_file = 10000
74 |
75 | # Total number of images in the training-set.
76 | # This is used to pre-allocate arrays for efficiency.
77 | _num_images_train = _num_files_train * _images_per_file
78 |
79 | ########################################################################
80 | # Private functions for downloading, unpacking and loading data-files.
81 |
82 |
83 | def _get_file_path(filename=""):
84 | """
85 | Return the full path of a data-file for the data-set.
86 |
87 | If filename=="" then return the directory of the files.
88 | """
89 |
90 | return os.path.join(data_path, "cifar-10-batches-py/", filename)
91 |
92 |
93 | def _unpickle(filename):
94 | """
95 | Unpickle the given file and return the data.
96 |
97 | #    Note that the appropriate dir-name is prepended to the filename.
98 | """
99 |
100 | # Create full path for the file.
101 | file_path = _get_file_path(filename)
102 |
103 | print("Loading data: " + file_path)
104 |
105 | with open(file_path, mode='rb') as file:
106 | # In Python 3.X it is important to set the encoding,
107 | # otherwise an exception is raised here.
108 | data = pickle.load(file, encoding='bytes')
109 |
110 | return data
111 |
112 |
113 | def _convert_images(raw):
114 | """
115 | Convert images from the CIFAR-10 format and
116 | return a 4-dim array with shape: [image_number, height, width, channel]
117 | where the pixels are floats between 0.0 and 1.0.
118 | """
119 |
120 | # Convert the raw images from the data-files to floating-points.
121 | raw_float = np.array(raw, dtype=float) / 255.0
122 |
123 | # Reshape the array to 4-dimensions.
124 | images = raw_float.reshape([-1, num_channels, img_size, img_size])
125 |
126 | # Reorder the indices of the array.
127 | images = images.transpose([0, 2, 3, 1])
128 |
129 | return images
130 |
131 |
132 | def _load_data(filename):
133 | """
134 | Load a pickled data-file from the CIFAR-10 data-set
135 | and return the converted images (see above) and the class-number
136 | for each image.
137 | """
138 |
139 | # Load the pickled data-file.
140 | data = _unpickle(filename)
141 |
142 | # Get the raw images.
143 | raw_images = data[b'data']
144 |
145 | # Get the class-numbers for each image. Convert to numpy-array.
146 | cls = np.array(data[b'labels'])
147 |
148 | # Convert the images.
149 | images = _convert_images(raw_images)
150 |
151 | return images, cls
152 |
153 |
154 | ########################################################################
155 | # Public functions that you may call to download the data-set from
156 | # the internet and load the data into memory.
157 |
158 |
159 | def maybe_download_and_extract():
160 | """
161 | Download and extract the CIFAR-10 data-set if it doesn't already exist
162 | in data_path (set this variable first to the desired path).
163 | """
164 |
165 | download.maybe_download_and_extract(url=data_url, download_dir=data_path)
166 |
167 |
168 | def load_class_names():
169 | """
170 | Load the names for the classes in the CIFAR-10 data-set.
171 |
172 | Returns a list with the names. Example: names[3] is the name
173 | associated with class-number 3.
174 | """
175 |
176 | # Load the class-names from the pickled file.
177 | raw = _unpickle(filename="batches.meta")[b'label_names']
178 |
179 | # Convert from binary strings.
180 | names = [x.decode('utf-8') for x in raw]
181 |
182 | return names
183 |
184 |
185 | def load_training_data():
186 | """
187 | Load all the training-data for the CIFAR-10 data-set.
188 |
189 | The data-set is split into 5 data-files which are merged here.
190 |
191 | Returns the images, class-numbers and one-hot encoded class-labels.
192 | """
193 |
194 | # Pre-allocate the arrays for the images and class-numbers for efficiency.
195 | images = np.zeros(shape=[_num_images_train, img_size, img_size, num_channels], dtype=float)
196 | cls = np.zeros(shape=[_num_images_train], dtype=int)
197 |
198 | # Begin-index for the current batch.
199 | begin = 0
200 |
201 | # For each data-file.
202 | for i in range(_num_files_train):
203 | # Load the images and class-numbers from the data-file.
204 | images_batch, cls_batch = _load_data(filename="data_batch_" + str(i + 1))
205 |
206 | # Number of images in this batch.
207 | num_images = len(images_batch)
208 |
209 | # End-index for the current batch.
210 | end = begin + num_images
211 |
212 | # Store the images into the array.
213 | images[begin:end, :] = images_batch
214 |
215 | # Store the class-numbers into the array.
216 | cls[begin:end] = cls_batch
217 |
218 | # The begin-index for the next batch is the current end-index.
219 | begin = end
220 |
221 | return images, cls, one_hot_encoded(class_numbers=cls, num_classes=num_classes)
222 |
223 |
224 | def load_test_data():
225 | """
226 | Load all the test-data for the CIFAR-10 data-set.
227 |
228 | Returns the images, class-numbers and one-hot encoded class-labels.
229 | """
230 |
231 | images, cls = _load_data(filename="test_batch")
232 |
233 | return images, cls, one_hot_encoded(class_numbers=cls, num_classes=num_classes)
234 |
235 | ########################################################################
236 |
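The reshape-and-transpose performed by _convert_images() above can be checked in isolation with synthetic data, with no download required. This is a sketch mirroring the module constants (img_size = 32, num_channels = 3); the raw array here is a made-up stand-in for a CIFAR-10 batch:

```python
import numpy as np

# Two fake CIFAR-10 images in the raw batch layout: each row is a
# flattened [channel, height, width] block of 3 * 32 * 32 = 3072 bytes.
num_images, num_channels, img_size = 2, 3, 32
raw = (np.arange(num_images * num_channels * img_size * img_size) % 256)
raw = raw.astype(np.uint8).reshape(num_images, -1)

# Same steps as _convert_images(): scale pixels to [0.0, 1.0],
# un-flatten to [image, channel, height, width], then move the
# channel axis last to get [image, height, width, channel].
raw_float = np.array(raw, dtype=float) / 255.0
images = raw_float.reshape([-1, num_channels, img_size, img_size])
images = images.transpose([0, 2, 3, 1])

print(images.shape)  # (2, 32, 32, 3)
```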
--------------------------------------------------------------------------------
/coco.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading the COCO data-set from the internet
4 | # and loading it into memory. This data-set contains images and
5 | # various associated data such as text-captions describing the images.
6 | #
7 | # http://cocodataset.org
8 | #
9 | # Implemented in Python 3.6
10 | #
11 | # Usage:
12 | # 1) Call set_data_dir() to set the desired storage directory.
13 | # 2) Call maybe_download_and_extract() to download the data-set
14 | # if it is not already located in the given data_dir.
15 | # 3) Call load_records(train=True) and load_records(train=False)
16 | #    to load the data-records for the training- and validation-sets.
17 | # 4) Use the returned data in your own program.
18 | #
19 | # Format:
20 | # The COCO data-set contains a large number of images and various
21 | # data for each image stored in a JSON-file.
22 | # Functionality is provided for getting a list of image-filenames
23 | # (but not actually loading the images) along with their associated
24 | # data such as text-captions describing the contents of the images.
25 | #
26 | ########################################################################
27 | #
28 | # This file is part of the TensorFlow Tutorials available at:
29 | #
30 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
31 | #
32 | # Published under the MIT License. See the file LICENSE for details.
33 | #
34 | # Copyright 2018 by Magnus Erik Hvass Pedersen
35 | #
36 | ########################################################################
37 |
38 | import json
39 | import os
40 | import download
41 | from cache import cache
42 |
43 | ########################################################################
44 |
45 | # Directory where you want to download and save the data-set.
46 | # Set this before you start calling any of the functions below.
47 | # Use the function set_data_dir() to also update train_dir and val_dir.
48 | data_dir = "data/coco/"
49 |
50 | # Sub-directories for the training- and validation-sets.
51 | train_dir = "data/coco/train2017"
52 | val_dir = "data/coco/val2017"
53 |
54 | # Base-URL for the data-sets on the internet.
55 | data_url = "http://images.cocodataset.org/"
56 |
57 |
58 | ########################################################################
59 | # Private helper-functions.
60 |
61 | def _load_records(train=True):
62 | """
63 | Load the image-filenames and captions
64 | for either the training-set or the validation-set.
65 | """
66 |
67 | if train:
68 | # Training-set.
69 | filename = "captions_train2017.json"
70 | else:
71 | # Validation-set.
72 | filename = "captions_val2017.json"
73 |
74 | # Full path for the data-file.
75 | path = os.path.join(data_dir, "annotations", filename)
76 |
77 | # Load the file.
78 | with open(path, "r", encoding="utf-8") as file:
79 | data_raw = json.load(file)
80 |
81 | # Convenience variables.
82 | images = data_raw['images']
83 | annotations = data_raw['annotations']
84 |
85 | # Initialize the dict for holding our data.
86 | # The lookup-key is the image-id.
87 | records = dict()
88 |
89 | # Collect all the filenames for the images.
90 | for image in images:
91 | # Get the id and filename for this image.
92 | image_id = image['id']
93 | filename = image['file_name']
94 |
95 | # Initialize a new data-record.
96 | record = dict()
97 |
98 | # Set the image-filename in the data-record.
99 | record['filename'] = filename
100 |
101 | # Initialize an empty list of image-captions
102 | # which will be filled further below.
103 | record['captions'] = list()
104 |
105 |         # Save the record using the image-id as the lookup-key.
106 | records[image_id] = record
107 |
108 | # Collect all the captions for the images.
109 | for ann in annotations:
110 | # Get the id and caption for an image.
111 | image_id = ann['image_id']
112 | caption = ann['caption']
113 |
114 | # Lookup the data-record for this image-id.
115 | # This data-record should already exist from the loop above.
116 | record = records[image_id]
117 |
118 | # Append the current caption to the list of captions in the
119 | # data-record that was initialized in the loop above.
120 | record['captions'].append(caption)
121 |
122 | # Convert the records-dict to a list of tuples.
123 | records_list = [(key, record['filename'], record['captions'])
124 | for key, record in sorted(records.items())]
125 |
126 | # Convert the list of tuples to separate tuples with the data.
127 | ids, filenames, captions = zip(*records_list)
128 |
129 | return ids, filenames, captions
130 |
131 |
132 | ########################################################################
133 | # Public functions that you may call to download the data-set from
134 | # the internet and load the data into memory.
135 |
136 |
137 | def set_data_dir(new_data_dir):
138 | """
139 | Set the base-directory for data-files and then
140 | set the sub-dirs for training and validation data.
141 | """
142 |
143 | # Ensure we update the global variables.
144 | global data_dir, train_dir, val_dir
145 |
146 | data_dir = new_data_dir
147 | train_dir = os.path.join(new_data_dir, "train2017")
148 | val_dir = os.path.join(new_data_dir, "val2017")
149 |
150 |
151 | def maybe_download_and_extract():
152 | """
153 | Download and extract the COCO data-set if the data-files don't
154 | already exist in data_dir.
155 | """
156 |
157 | # Filenames to download from the internet.
158 | filenames = ["zips/train2017.zip", "zips/val2017.zip",
159 | "annotations/annotations_trainval2017.zip"]
160 |
161 | # Download these files.
162 | for filename in filenames:
163 | # Create the full URL for the given file.
164 | url = data_url + filename
165 |
166 | print("Downloading " + url)
167 |
168 | download.maybe_download_and_extract(url=url, download_dir=data_dir)
169 |
170 |
171 | def load_records(train=True):
172 | """
173 | Load the data-records for the data-set. This returns the image ids,
174 | filenames and text-captions for either the training-set or validation-set.
175 |
176 | This wraps _load_records() above with a cache, so if the cache-file already
177 | exists then it is loaded instead of processing the original data-file.
178 |
179 | :param train:
180 | Bool whether to load the training-set (True) or validation-set (False).
181 |
182 | :return:
183 | ids, filenames, captions for the images in the data-set.
184 | """
185 |
186 | if train:
187 | # Cache-file for the training-set data.
188 | cache_filename = "records_train.pkl"
189 | else:
190 | # Cache-file for the validation-set data.
191 | cache_filename = "records_val.pkl"
192 |
193 | # Path for the cache-file.
194 | cache_path = os.path.join(data_dir, cache_filename)
195 |
196 | # If the data-records already exist in a cache-file then load it,
197 | # otherwise call the _load_records() function and save its
198 | # return-values to the cache-file so it can be loaded the next time.
199 | records = cache(cache_path=cache_path,
200 | fn=_load_records,
201 | train=train)
202 |
203 | return records
204 |
205 | ########################################################################
206 |
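The two-pass merge in _load_records() above can be sketched with hand-made stand-ins for the COCO JSON structures (the real ones come from the downloaded annotation files):

```python
# Hypothetical stand-ins for data_raw['images'] and data_raw['annotations'].
images = [{'id': 2, 'file_name': 'b.jpg'},
          {'id': 1, 'file_name': 'a.jpg'}]
annotations = [{'image_id': 1, 'caption': 'a cat'},
               {'image_id': 2, 'caption': 'a dog'},
               {'image_id': 1, 'caption': 'a sleeping cat'}]

# First pass: one record per image, keyed by image-id.
records = {img['id']: {'filename': img['file_name'], 'captions': []}
           for img in images}

# Second pass: attach each caption to its image's record.
for ann in annotations:
    records[ann['image_id']]['captions'].append(ann['caption'])

# Flatten to parallel tuples, sorted by image-id as in _load_records().
ids, filenames, captions = zip(*[(k, r['filename'], r['captions'])
                                 for k, r in sorted(records.items())])

print(ids)        # (1, 2)
print(filenames)  # ('a.jpg', 'b.jpg')
```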
--------------------------------------------------------------------------------
/convert.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/python
2 |
3 | ########################################################################
4 | #
5 | # Function and script for converting videos to images.
6 | #
7 | # This can be run as a script in a Linux shell by typing:
8 | #
9 | # python convert.py
10 | #
11 | # Or by running:
12 | #
13 | # chmod +x convert.py
14 | # ./convert.py
15 | #
16 | # Requires the program avconv to be installed.
17 | # Tested with avconv v. 9.18-6 on Linux Mint.
18 | #
19 | # Implemented in Python 3.5 (seems to work in Python 2.7 as well)
20 | #
21 | ########################################################################
22 | #
23 | # This file is part of the TensorFlow Tutorials available at:
24 | #
25 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
26 | #
27 | # Published under the MIT License. See the file LICENSE for details.
28 | #
29 | # Copyright 2016 by Magnus Erik Hvass Pedersen
30 | #
31 | ########################################################################
32 |
33 | import os
34 | import subprocess
35 | import argparse
36 |
37 | ########################################################################
38 |
39 |
40 | def video2images(in_dir, out_dir, crop_size, out_size, framerate, video_exts):
41 | """
42 | Convert videos to images. The videos are located in the directory in_dir
43 | and all its sub-directories which are processed recursively. The directory
44 | structure is replicated to out_dir where the jpeg-images are saved.
45 |
46 | :param in_dir:
47 | Input directory for the videos e.g. "/home/magnus/video/"
48 | All sub-directories are processed recursively.
49 |
50 | :param out_dir:
51 | Output directory for the images e.g. "/home/magnus/video-images/"
52 |
53 | :param crop_size:
54 | Integer. First the videos are cropped to this width and height.
55 |
56 | :param out_size:
57 | Integer. After cropping, the videos are resized to this width and height.
58 |
59 | :param framerate:
60 | Integer. Number of frames to grab per second.
61 |
62 | :param video_exts:
63 | Tuple of strings. Extensions for video-files e.g. ('.mts', '.mp4')
64 | Not case-sensitive.
65 |
66 | :return:
67 | Nothing.
68 | """
69 |
70 | # Convert all video extensions to lower-case.
71 | video_exts = tuple(ext.lower() for ext in video_exts)
72 |
73 | # Number of videos processed.
74 | video_count = 0
75 |
76 | # Process all the sub-dirs recursively.
77 | for current_dir, dir_names, file_names in os.walk(in_dir):
78 | # The current dir relative to the input directory.
79 | relative_path = os.path.relpath(current_dir, in_dir)
80 |
81 | # Name of the new directory for the output images.
82 | new_dir = os.path.join(out_dir, relative_path)
83 |
84 | # If the output-directory does not exist, then create it.
85 | if not os.path.exists(new_dir):
86 | os.makedirs(new_dir)
87 |
88 | # For all the files in the current directory.
89 | for file_name in file_names:
90 | # If the file has a valid video-extension. Compare lower-cases.
91 | if file_name.lower().endswith(video_exts):
92 | # File-path for the input video.
93 | in_file = os.path.join(current_dir, file_name)
94 |
95 | # Split the file-path in root and extension.
96 | file_root, file_ext = os.path.splitext(file_name)
97 |
98 |                 # Template file-name for the output images (zero-padded frame numbers).
99 |                 new_file_name = file_root + "-%04d.jpg"
100 |
101 | # Complete file-path for the output images incl. all sub-dirs.
102 | new_file_path = os.path.join(new_dir, new_file_name)
103 |
104 | # Clean up the path by removing e.g. "/./"
105 | new_file_path = os.path.normpath(new_file_path)
106 |
107 | # Print status.
108 | print("Converting video to images:")
109 | print("- Input video: {0}".format(in_file))
110 | print("- Output images: {0}".format(new_file_path))
111 |
112 |                 # Shell command for avconv; crop and scale are chained in one -vf filter-graph.
113 |                 cmd = "avconv -i {0} -r {1} -vf crop={2}:{2},scale={3}:{3} -qscale 2 {4}"
114 |
115 | # Fill in the arguments for the command-line.
116 | cmd = cmd.format(in_file, framerate, crop_size, out_size, new_file_path)
117 |
118 | # Run the command-line in a shell.
119 | subprocess.call(cmd, shell=True)
120 |
121 | # Increase the number of videos processed.
122 | video_count += 1
123 |
124 | # Print newline.
125 | print()
126 |
127 | print("Number of videos converted: {0}".format(video_count))
128 |
129 |
130 | ########################################################################
131 | # This script allows you to run the video-conversion from the command-line.
132 |
133 | if __name__ == "__main__":
134 | # Argument description.
135 | desc = "Convert videos to images. " \
136 | "Recursively processes all sub-dirs of INDIR " \
137 | "and replicates the dir-structure to OUTDIR. " \
138 | "The video is first cropped to CROP:CROP pixels, " \
139 | "then resized to SIZE:SIZE pixels and written as a jpeg-file. "
140 |
141 | # Create the argument parser.
142 | parser = argparse.ArgumentParser(description=desc)
143 |
144 | # Add arguments to the parser.
145 | parser.add_argument("--indir", required=True,
146 | help="input directory where videos are located")
147 |
148 | parser.add_argument("--outdir", required=True,
149 | help="output directory where images will be saved")
150 |
151 | parser.add_argument("--crop", required=True, type=int,
152 | help="the input videos are first cropped to CROP:CROP pixels")
153 |
154 | parser.add_argument("--size", required=True, type=int,
155 | help="the input videos are then resized to SIZE:SIZE pixels")
156 |
157 | parser.add_argument("--rate", required=False, type=int, default=5,
158 | help="the number of frames to convert per second")
159 |
160 | parser.add_argument("--exts", required=False, nargs="+",
161 | help="list of extensions for video-files e.g. .mts .mp4")
162 |
163 | # Parse the command-line arguments.
164 | args = parser.parse_args()
165 |
166 | # Get the arguments.
167 | in_dir = args.indir
168 | out_dir = args.outdir
169 | crop_size = args.crop
170 | out_size = args.size
171 | framerate = args.rate
172 | video_exts = args.exts
173 |
174 | if video_exts is None:
175 | # Default extensions for video-files.
176 | video_exts = (".MTS", ".mp4")
177 | else:
178 | # A list of strings is provided as a command-line argument, but we
179 | # need a tuple instead of a list, so convert it to a tuple.
180 | video_exts = tuple(video_exts)
181 |
182 | # Print the arguments.
183 | print("Convert videos to images.")
184 | print("- Input dir: " + in_dir)
185 | print("- Output dir: " + out_dir)
186 | print("- Crop width and height: {0}".format(crop_size))
187 | print("- Resize width and height: {0}".format(out_size))
188 | print("- Frame-rate: {0}".format(framerate))
189 | print("- Video extensions: {0}".format(video_exts))
190 | print()
191 |
192 | # Perform the conversions.
193 | video2images(in_dir=in_dir, out_dir=out_dir,
194 | crop_size=crop_size, out_size=out_size,
195 | framerate=framerate, video_exts=video_exts)
196 |
197 | ########################################################################
198 |
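Note that avconv (like ffmpeg) only honours the last -vf option on a command-line, so the crop and scale filters should be chained into a single filter-graph. A self-contained sketch of how the command string is assembled (the file names below are hypothetical, and avconv itself is not invoked):

```python
# Assemble the conversion command the way video2images() does, with crop
# and scale chained in one -vf filter-graph so both filters are applied.
in_file = "clip.mp4"            # hypothetical input video
out_pattern = "clip-%04d.jpg"   # zero-padded frame numbers
framerate, crop_size, out_size = 5, 720, 256

cmd = "avconv -i {0} -r {1} -vf crop={2}:{2},scale={3}:{3} -qscale 2 {4}"
cmd = cmd.format(in_file, framerate, crop_size, out_size, out_pattern)

print(cmd)
```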
--------------------------------------------------------------------------------
/dataset.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Class for creating a data-set consisting of all files in a directory.
4 | #
5 | # Example usage is shown in the file knifey.py and Tutorial #09.
6 | #
7 | # Implemented in Python 3.5
8 | #
9 | ########################################################################
10 | #
11 | # This file is part of the TensorFlow Tutorials available at:
12 | #
13 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
14 | #
15 | # Published under the MIT License. See the file LICENSE for details.
16 | #
17 | # Copyright 2016 by Magnus Erik Hvass Pedersen
18 | #
19 | ########################################################################
20 |
21 | import numpy as np
22 | import os
23 | import shutil
24 | from cache import cache
25 |
26 | ########################################################################
27 |
28 |
29 | def one_hot_encoded(class_numbers, num_classes=None):
30 | """
31 | Generate the One-Hot encoded class-labels from an array of integers.
32 |
33 | For example, if class_number=2 and num_classes=4 then
34 | the one-hot encoded label is the float array: [0. 0. 1. 0.]
35 |
36 | :param class_numbers:
37 | Array of integers with class-numbers.
38 | Assume the integers are from zero to num_classes-1 inclusive.
39 |
40 | :param num_classes:
41 | Number of classes. If None then use max(class_numbers)+1.
42 |
43 | :return:
44 | 2-dim array of shape: [len(class_numbers), num_classes]
45 | """
46 |
47 | # Find the number of classes if None is provided.
48 | # Assumes the lowest class-number is zero.
49 | if num_classes is None:
50 | num_classes = np.max(class_numbers) + 1
51 |
52 | return np.eye(num_classes, dtype=float)[class_numbers]
53 |
54 |
55 | ########################################################################
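The indexing trick behind one_hot_encoded() can be seen in isolation: row k of the identity matrix is the one-hot vector for class k, so fancy-indexing np.eye() with the class-numbers encodes them all at once.

```python
import numpy as np

# Row k of np.eye(num_classes) is the one-hot vector for class k,
# so indexing with an array of class-numbers encodes every label.
class_numbers = np.array([2, 0, 1])
labels = np.eye(4, dtype=float)[class_numbers]

print(labels.shape)  # (3, 4)
print(labels[0])     # [0. 0. 1. 0.]
```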
56 |
57 |
58 | class DataSet:
59 | def __init__(self, in_dir, exts='.jpg'):
60 | """
61 | Create a data-set consisting of the filenames in the given directory
62 | and sub-dirs that match the given filename-extensions.
63 |
64 | For example, the knifey-spoony data-set (see knifey.py) has the
65 | following dir-structure:
66 |
67 | knifey-spoony/forky/
68 | knifey-spoony/knifey/
69 | knifey-spoony/spoony/
70 | knifey-spoony/forky/test/
71 | knifey-spoony/knifey/test/
72 | knifey-spoony/spoony/test/
73 |
74 | This means there are 3 classes called: forky, knifey, and spoony.
75 |
76 | If we set in_dir = "knifey-spoony/" and create a new DataSet-object
77 | then it will scan through these directories and create a training-set
78 | and test-set for each of these classes.
79 |
80 | The training-set will contain a list of all the *.jpg filenames
81 | in the following directories:
82 |
83 | knifey-spoony/forky/
84 | knifey-spoony/knifey/
85 | knifey-spoony/spoony/
86 |
87 | The test-set will contain a list of all the *.jpg filenames
88 | in the following directories:
89 |
90 | knifey-spoony/forky/test/
91 | knifey-spoony/knifey/test/
92 | knifey-spoony/spoony/test/
93 |
94 | See the TensorFlow Tutorial #09 for a usage example.
95 |
96 | :param in_dir:
97 | Root-dir for the files in the data-set.
98 | This would be 'knifey-spoony/' in the example above.
99 |
100 | :param exts:
101 | String or tuple of strings with valid filename-extensions.
102 | Not case-sensitive.
103 |
104 | :return:
105 | Object instance.
106 | """
107 |
108 | # Extend the input directory to the full path.
109 | in_dir = os.path.abspath(in_dir)
110 |
111 | # Input directory.
112 | self.in_dir = in_dir
113 |
114 |         # Convert extensions to lower-case; wrap a lone string in a tuple
115 |         self.exts = tuple(ext.lower() for ext in ((exts,) if isinstance(exts, str) else exts))
116 |
117 | # Names for the classes.
118 | self.class_names = []
119 |
120 | # Filenames for all the files in the training-set.
121 | self.filenames = []
122 |
123 | # Filenames for all the files in the test-set.
124 | self.filenames_test = []
125 |
126 | # Class-number for each file in the training-set.
127 | self.class_numbers = []
128 |
129 | # Class-number for each file in the test-set.
130 | self.class_numbers_test = []
131 |
132 | # Total number of classes in the data-set.
133 | self.num_classes = 0
134 |
135 | # For all files/dirs in the input directory.
136 | for name in os.listdir(in_dir):
137 | # Full path for the file / dir.
138 | current_dir = os.path.join(in_dir, name)
139 |
140 | # If it is a directory.
141 | if os.path.isdir(current_dir):
142 | # Add the dir-name to the list of class-names.
143 | self.class_names.append(name)
144 |
145 | # Training-set.
146 |
147 | # Get all the valid filenames in the dir (not sub-dirs).
148 | filenames = self._get_filenames(current_dir)
149 |
150 | # Append them to the list of all filenames for the training-set.
151 | self.filenames.extend(filenames)
152 |
153 | # The class-number for this class.
154 | class_number = self.num_classes
155 |
156 | # Create an array of class-numbers.
157 | class_numbers = [class_number] * len(filenames)
158 |
159 | # Append them to the list of all class-numbers for the training-set.
160 | self.class_numbers.extend(class_numbers)
161 |
162 | # Test-set.
163 |
164 | # Get all the valid filenames in the sub-dir named 'test'.
165 | filenames_test = self._get_filenames(os.path.join(current_dir, 'test'))
166 |
167 | # Append them to the list of all filenames for the test-set.
168 | self.filenames_test.extend(filenames_test)
169 |
170 | # Create an array of class-numbers.
171 | class_numbers = [class_number] * len(filenames_test)
172 |
173 | # Append them to the list of all class-numbers for the test-set.
174 | self.class_numbers_test.extend(class_numbers)
175 |
176 | # Increase the total number of classes in the data-set.
177 | self.num_classes += 1
178 |
179 | def _get_filenames(self, dir):
180 | """
181 | Create and return a list of filenames with matching extensions in the given directory.
182 |
183 | :param dir:
184 | Directory to scan for files. Sub-dirs are not scanned.
185 |
186 | :return:
187 | List of filenames. Only filenames. Does not include the directory.
188 | """
189 |
190 | # Initialize empty list.
191 | filenames = []
192 |
193 | # If the directory exists.
194 | if os.path.exists(dir):
195 | # Get all the filenames with matching extensions.
196 | for filename in os.listdir(dir):
197 | if filename.lower().endswith(self.exts):
198 | filenames.append(filename)
199 |
200 | return filenames
201 |
202 | def get_paths(self, test=False):
203 | """
204 | Get the full paths for the files in the data-set.
205 |
206 | :param test:
207 | Boolean. Return the paths for the test-set (True) or training-set (False).
208 |
209 | :return:
210 | Iterator with strings for the path-names.
211 | """
212 |
213 | if test:
214 | # Use the filenames and class-numbers for the test-set.
215 | filenames = self.filenames_test
216 | class_numbers = self.class_numbers_test
217 |
218 | # Sub-dir for test-set.
219 | test_dir = "test/"
220 | else:
221 | # Use the filenames and class-numbers for the training-set.
222 | filenames = self.filenames
223 | class_numbers = self.class_numbers
224 |
225 | # Don't use a sub-dir for test-set.
226 | test_dir = ""
227 |
228 | for filename, cls in zip(filenames, class_numbers):
229 | # Full path-name for the file.
230 | path = os.path.join(self.in_dir, self.class_names[cls], test_dir, filename)
231 |
232 | yield path
233 |
234 | def get_training_set(self):
235 | """
236 | Return the list of paths for the files in the training-set,
237 | and the list of class-numbers as integers,
238 | and the class-numbers as one-hot encoded arrays.
239 | """
240 |
241 | return list(self.get_paths()), \
242 | np.asarray(self.class_numbers), \
243 | one_hot_encoded(class_numbers=self.class_numbers,
244 | num_classes=self.num_classes)
245 |
246 | def get_test_set(self):
247 | """
248 | Return the list of paths for the files in the test-set,
249 | and the list of class-numbers as integers,
250 | and the class-numbers as one-hot encoded arrays.
251 | """
252 |
253 | return list(self.get_paths(test=True)), \
254 | np.asarray(self.class_numbers_test), \
255 | one_hot_encoded(class_numbers=self.class_numbers_test,
256 | num_classes=self.num_classes)
257 |
258 | def copy_files(self, train_dir, test_dir):
259 | """
260 | Copy all the files in the training-set to train_dir
261 | and copy all the files in the test-set to test_dir.
262 |
263 | For example, the normal directory structure for the
264 | different classes in the training-set is:
265 |
266 | knifey-spoony/forky/
267 | knifey-spoony/knifey/
268 | knifey-spoony/spoony/
269 |
270 | Normally the test-set is a sub-dir of the training-set:
271 |
272 | knifey-spoony/forky/test/
273 | knifey-spoony/knifey/test/
274 | knifey-spoony/spoony/test/
275 |
276 | But some APIs use another dir-structure for the training-set:
277 |
278 | knifey-spoony/train/forky/
279 | knifey-spoony/train/knifey/
280 | knifey-spoony/train/spoony/
281 |
282 | and for the test-set:
283 |
284 | knifey-spoony/test/forky/
285 | knifey-spoony/test/knifey/
286 | knifey-spoony/test/spoony/
287 |
288 | :param train_dir: Directory for the training-set e.g. 'knifey-spoony/train/'
289 | :param test_dir: Directory for the test-set e.g. 'knifey-spoony/test/'
290 | :return: Nothing.
291 | """
292 |
293 | # Helper-function for actually copying the files.
294 | def _copy_files(src_paths, dst_dir, class_numbers):
295 |
296 | # Create a list of dirs for each class, e.g.:
297 | # ['knifey-spoony/test/forky/',
298 | # 'knifey-spoony/test/knifey/',
299 | # 'knifey-spoony/test/spoony/']
300 | class_dirs = [os.path.join(dst_dir, class_name + "/")
301 | for class_name in self.class_names]
302 |
303 | # Check if each class-directory exists, otherwise create it.
304 | for dir in class_dirs:
305 | if not os.path.exists(dir):
306 | os.makedirs(dir)
307 |
308 | # For all the file-paths and associated class-numbers,
309 | # copy the file to the destination dir for that class.
310 | for src, cls in zip(src_paths, class_numbers):
311 | shutil.copy(src=src, dst=class_dirs[cls])
312 |
313 | # Copy the files for the training-set.
314 | _copy_files(src_paths=self.get_paths(test=False),
315 | dst_dir=train_dir,
316 | class_numbers=self.class_numbers)
317 |
318 | print("- Copied training-set to:", train_dir)
319 |
320 | # Copy the files for the test-set.
321 | _copy_files(src_paths=self.get_paths(test=True),
322 | dst_dir=test_dir,
323 | class_numbers=self.class_numbers_test)
324 |
325 | print("- Copied test-set to:", test_dir)
326 |
327 |
328 | ########################################################################
329 |
330 |
331 | def load_cached(cache_path, in_dir):
332 | """
333 | Wrapper-function for creating a DataSet-object, which will be
334 | loaded from a cache-file if it already exists, otherwise a new
335 | object will be created and saved to the cache-file.
336 |
337 | This is useful if you need to ensure the ordering of the
338 | filenames is consistent every time you load the data-set,
339 | for example if you use the DataSet-object in combination
340 | with Transfer Values saved to another cache-file, see e.g.
341 | Tutorial #09 for an example of this.
342 |
343 | :param cache_path:
344 | File-path for the cache-file.
345 |
346 | :param in_dir:
347 | Root-dir for the files in the data-set.
348 | This is an argument for the DataSet-init function.
349 |
350 | :return:
351 | The DataSet-object.
352 | """
353 |
354 | print("Creating dataset from the files in: " + in_dir)
355 |
356 | # If the object-instance for DataSet(in_dir=data_dir) already
357 | # exists in the cache-file then reload it, otherwise create
358 | # an object instance and save it to the cache-file for next time.
359 | dataset = cache(cache_path=cache_path,
360 | fn=DataSet, in_dir=in_dir)
361 |
362 | return dataset
363 |
364 |
365 | ########################################################################
366 |
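The `cache()` helper that `load_cached()` relies on is imported from cache.py, which is not shown here; its core pattern is roughly the following sketch, assuming a simple pickle-based cache:

```python
import os
import pickle

def cache(cache_path, fn, **kwargs):
    # If the cache-file exists then reload the object from it,
    # otherwise call fn(**kwargs) and save the result for next time.
    if os.path.exists(cache_path):
        with open(cache_path, mode="rb") as file:
            obj = pickle.load(file)
    else:
        obj = fn(**kwargs)
        with open(cache_path, mode="wb") as file:
            pickle.dump(obj, file)
    return obj
```

Because the cached object is reloaded verbatim, the ordering of filenames inside a cached DataSet stays identical across runs, which is what makes it safe to pair with separately cached transfer-values.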
--------------------------------------------------------------------------------
/download.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading and extracting data-files from the internet.
4 | #
5 | # Implemented in Python 3.5
6 | #
7 | ########################################################################
8 | #
9 | # This file is part of the TensorFlow Tutorials available at:
10 | #
11 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
12 | #
13 | # Published under the MIT License. See the file LICENSE for details.
14 | #
15 | # Copyright 2016 by Magnus Erik Hvass Pedersen
16 | #
17 | ########################################################################
18 |
19 | import sys
20 | import os
21 | import urllib.request
22 | import tarfile
23 | import zipfile
24 |
25 | ########################################################################
26 |
27 |
28 | def _print_download_progress(count, block_size, total_size):
29 | """
30 | Function used for printing the download progress.
31 | Used as a call-back function in maybe_download_and_extract().
32 | """
33 |
34 | # Percentage completion.
35 | pct_complete = float(count * block_size) / total_size
36 |
37 | # Limit it because rounding errors may cause it to exceed 100%.
38 | pct_complete = min(1.0, pct_complete)
39 |
40 | # Status-message. Note the \r which means the line should overwrite itself.
41 | msg = "\r- Download progress: {0:.1%}".format(pct_complete)
42 |
43 | # Print it.
44 | sys.stdout.write(msg)
45 | sys.stdout.flush()
46 |
47 |
48 | ########################################################################
49 |
50 | def download(base_url, filename, download_dir):
51 | """
52 | Download the given file if it does not already exist in the download_dir.
53 |
54 | :param base_url: The internet URL without the filename.
55 | :param filename: The filename that will be added to the base_url.
56 | :param download_dir: Local directory for storing the file.
57 | :return: Nothing.
58 | """
59 |
60 | # Path for local file.
61 | save_path = os.path.join(download_dir, filename)
62 |
63 | # Check if the file already exists, otherwise we need to download it now.
64 | if not os.path.exists(save_path):
65 | # Check if the download directory exists, otherwise create it.
66 | if not os.path.exists(download_dir):
67 | os.makedirs(download_dir)
68 |
69 | print("Downloading", filename, "...")
70 |
71 | # Download the file from the internet.
72 | url = base_url + filename
73 | file_path, _ = urllib.request.urlretrieve(url=url,
74 | filename=save_path,
75 | reporthook=_print_download_progress)
76 |
77 | print(" Done!")
78 |
79 |
80 | def maybe_download_and_extract(url, download_dir):
81 | """
82 | Download and extract the data if it doesn't already exist.
83 | Assumes the url is a tar-ball file.
84 |
85 | :param url:
86 | Internet URL for the tar-file to download.
87 | Example: "https://www.cs.toronto.edu/~kriz/cifar-10-python.tar.gz"
88 |
89 | :param download_dir:
90 | Directory where the downloaded file is saved.
91 | Example: "data/CIFAR-10/"
92 |
93 | :return:
94 | Nothing.
95 | """
96 |
97 | # Filename for saving the file downloaded from the internet.
98 | # Use the filename from the URL and add it to the download_dir.
99 | filename = url.split('/')[-1]
100 | file_path = os.path.join(download_dir, filename)
101 |
102 | # Check if the file already exists.
103 | # If it exists then we assume it has also been extracted,
104 | # otherwise we need to download and extract it now.
105 | if not os.path.exists(file_path):
106 | # Check if the download directory exists, otherwise create it.
107 | if not os.path.exists(download_dir):
108 | os.makedirs(download_dir)
109 |
110 | # Download the file from the internet.
111 | file_path, _ = urllib.request.urlretrieve(url=url,
112 | filename=file_path,
113 | reporthook=_print_download_progress)
114 |
115 | print()
116 | print("Download finished. Extracting files.")
117 |
118 | if file_path.endswith(".zip"):
119 | # Unpack the zip-file.
120 | zipfile.ZipFile(file=file_path, mode="r").extractall(download_dir)
121 | elif file_path.endswith((".tar.gz", ".tgz")):
122 | # Unpack the tar-ball.
123 | tarfile.open(name=file_path, mode="r:gz").extractall(download_dir)
124 |
125 | print("Done.")
126 | else:
127 | print("Data has apparently already been downloaded and unpacked.")
128 |
129 |
130 | ########################################################################
131 |
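The path- and archive-type handling inside `maybe_download_and_extract()` can be factored out and checked in isolation (these helper names are illustrative, not part of the module):

```python
import os

def archive_save_path(url, download_dir):
    # Use the last component of the URL as the local filename,
    # just as maybe_download_and_extract() does.
    filename = url.split('/')[-1]
    return os.path.join(download_dir, filename)

def archive_kind(file_path):
    # Decide how a downloaded file would be unpacked:
    # zipfile for .zip, tarfile for .tar.gz / .tgz, else nothing.
    if file_path.endswith(".zip"):
        return "zip"
    elif file_path.endswith((".tar.gz", ".tgz")):
        return "tar.gz"
    return None
```

Note that any other extension falls through both branches, so such a file would be downloaded but never extracted.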
--------------------------------------------------------------------------------
/europarl.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading the Europarl data-set from the internet
4 | # and loading it into memory. This data-set is used for translation
5 | # between English and most European languages.
6 | #
7 | # http://www.statmt.org/europarl/
8 | #
9 | # Implemented in Python 3.6
10 | #
11 | # Usage:
12 | # 1) Set the variable data_dir with the desired storage directory.
13 | # 2) Determine the language-code to use e.g. "da" for Danish.
14 | # 3) Call maybe_download_and_extract() to download the data-set
15 | # if it is not already located in the given data_dir.
16 | # 4) Call load_data(english=True) and load_data(english=False)
17 | # to load the two data-files.
18 | # 5) Use the returned data in your own program.
19 | #
20 | # Format:
21 | # The Europarl data-set contains millions of text-pairs between English
22 | # and most European languages. The data is stored in two text-files.
23 | # The data is returned as lists of strings by the load_data() function.
24 | #
25 | # The list of currently supported languages and their codes are as follows:
26 | #
27 | # bg - Bulgarian
28 | # cs - Czech
29 | # da - Danish
30 | # de - German
31 | # el - Greek
32 | # es - Spanish
33 | # et - Estonian
34 | # fi - Finnish
35 | # fr - French
36 | # hu - Hungarian
37 | # it - Italian
38 | # lt - Lithuanian
39 | # lv - Latvian
40 | # nl - Dutch
41 | # pl - Polish
42 | # pt - Portuguese
43 | # ro - Romanian
44 | # sk - Slovak
45 | # sl - Slovene
46 | # sv - Swedish
47 | #
48 | ########################################################################
49 | #
50 | # This file is part of the TensorFlow Tutorials available at:
51 | #
52 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
53 | #
54 | # Published under the MIT License. See the file LICENSE for details.
55 | #
56 | # Copyright 2018 by Magnus Erik Hvass Pedersen
57 | #
58 | ########################################################################
59 |
60 | import os
61 | import download
62 |
63 | ########################################################################
64 |
65 | # Directory where you want to download and save the data-set.
66 | # Set this before you start calling any of the functions below.
67 | data_dir = "data/europarl/"
68 |
69 | # Base-URL for the data-sets on the internet.
70 | data_url = "http://www.statmt.org/europarl/v7/"
71 |
72 |
73 | ########################################################################
74 | # Public functions that you may call to download the data-set from
75 | # the internet and load the data into memory.
76 |
77 |
78 | def maybe_download_and_extract(language_code="da"):
79 | """
80 | Download and extract the Europarl data-set if the data-file doesn't
81 | already exist in data_dir. The data-set is for translating between
82 | English and the given language-code (e.g. 'da' for Danish, see the
83 | list of available language-codes above).
84 | """
85 |
86 | # Create the full URL for the file with this data-set.
87 | url = data_url + language_code + "-en.tgz"
88 |
89 | download.maybe_download_and_extract(url=url, download_dir=data_dir)
90 |
91 |
92 | def load_data(english=True, language_code="da", start="", end=""):
93 | """
94 | Load the data-file for either the English-language texts or
95 | for the other language (e.g. "da" for Danish).
96 |
97 | All lines of the data-file are returned as a list of strings.
98 |
99 | :param english:
100 | Boolean whether to load the data-file for
101 | English (True) or the other language (False).
102 |
103 | :param language_code:
104 | Two-char code for the other language e.g. "da" for Danish.
105 | See list of available codes above.
106 |
107 | :param start:
108 | Prepend each line with this text e.g. "ssss " to indicate start of line.
109 |
110 | :param end:
111 |         Append this text to each line e.g. " eeee" to indicate end of line.
112 |
113 | :return:
114 | List of strings with all the lines of the data-file.
115 | """
116 |
117 | if english:
118 | # Load the English data.
119 | filename = "europarl-v7.{0}-en.en".format(language_code)
120 | else:
121 | # Load the other language.
122 | filename = "europarl-v7.{0}-en.{0}".format(language_code)
123 |
124 | # Full path for the data-file.
125 | path = os.path.join(data_dir, filename)
126 |
127 | # Open and read all the contents of the data-file.
128 | with open(path, encoding="utf-8") as file:
129 |         # Read each line from the file, strip leading and trailing
130 |         # whitespace, prepend the start-text and append the end-text.
131 | texts = [start + line.strip() + end for line in file]
132 |
133 | return texts
134 |
135 |
136 | ########################################################################
137 |
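The filename convention and line-wrapping used by `load_data()` are the two pieces worth checking in isolation; a standalone sketch (hypothetical helper names):

```python
import io

def europarl_filename(language_code="da", english=True):
    # The English half of a language-pair ends in ".en", while the
    # other half repeats the language-code as the file-extension.
    if english:
        return "europarl-v7.{0}-en.en".format(language_code)
    else:
        return "europarl-v7.{0}-en.{0}".format(language_code)

def wrap_lines(file, start="", end=""):
    # Same transform as load_data(): strip each line of whitespace
    # and add the optional start/end marker-texts.
    return [start + line.strip() + end for line in file]
```

The start/end markers are what Tutorial-style sequence-models use to signal where a translated sentence begins and ends.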
--------------------------------------------------------------------------------
/forks.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Tutorials - Forks
2 |
3 | These are forks of the [original TensorFlow Tutorials by Hvass-Labs](https://github.com/Hvass-Labs/TensorFlow-Tutorials).
They are not developed or even reviewed by the original author, who takes no responsibility for these forks.
5 |
6 | If you have made a fork of the TensorFlow Tutorials with substantial modifications that you feel may be useful to others,
7 | then please [open a new issue on GitHub](https://github.com/Hvass-Labs/TensorFlow-Tutorials/issues) with a link and short description.
8 |
9 | * [Keras port of some tutorials.](https://github.com/chidochipotle/TensorFlow-Tutorials)
10 | * [The Inception model as an OpenFaaS function.](https://github.com/faas-and-furious/inception-function)
11 |
--------------------------------------------------------------------------------
/images/02_convolution.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/02_convolution.png
--------------------------------------------------------------------------------
/images/02_network_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/02_network_flowchart.png
--------------------------------------------------------------------------------
/images/06_network_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/06_network_flowchart.png
--------------------------------------------------------------------------------
/images/07_inception_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/07_inception_flowchart.png
--------------------------------------------------------------------------------
/images/08_transfer_learning_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/08_transfer_learning_flowchart.png
--------------------------------------------------------------------------------
/images/09_transfer_learning_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/09_transfer_learning_flowchart.png
--------------------------------------------------------------------------------
/images/10_transfer_learning_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/10_transfer_learning_flowchart.png
--------------------------------------------------------------------------------
/images/11_adversarial_examples_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/11_adversarial_examples_flowchart.png
--------------------------------------------------------------------------------
/images/12_adversarial_noise_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/12_adversarial_noise_flowchart.png
--------------------------------------------------------------------------------
/images/13_visual_analysis_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/13_visual_analysis_flowchart.png
--------------------------------------------------------------------------------
/images/13b_visual_analysis_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/13b_visual_analysis_flowchart.png
--------------------------------------------------------------------------------
/images/14_deepdream_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/14_deepdream_flowchart.png
--------------------------------------------------------------------------------
/images/14_deepdream_recursive_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/14_deepdream_recursive_flowchart.png
--------------------------------------------------------------------------------
/images/15_style_transfer_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/15_style_transfer_flowchart.png
--------------------------------------------------------------------------------
/images/16_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_flowchart.png
--------------------------------------------------------------------------------
/images/16_motion-trace.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_motion-trace.png
--------------------------------------------------------------------------------
/images/16_problem.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_problem.png
--------------------------------------------------------------------------------
/images/16_q-values-details.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_q-values-details.png
--------------------------------------------------------------------------------
/images/16_q-values-simple.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_q-values-simple.png
--------------------------------------------------------------------------------
/images/16_training_stability.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/16_training_stability.png
--------------------------------------------------------------------------------
/images/19_flowchart_bayesian_optimization.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/19_flowchart_bayesian_optimization.png
--------------------------------------------------------------------------------
/images/20_natural_language_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_natural_language_flowchart.png
--------------------------------------------------------------------------------
/images/20_natural_language_flowchart.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/images/20_recurrent_unit.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_recurrent_unit.png
--------------------------------------------------------------------------------
/images/20_recurrent_unit.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/images/20_unrolled_3layers_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_unrolled_3layers_flowchart.png
--------------------------------------------------------------------------------
/images/20_unrolled_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/20_unrolled_flowchart.png
--------------------------------------------------------------------------------
/images/20_unrolled_flowchart.svg:
--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
/images/21_machine_translation_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/21_machine_translation_flowchart.png
--------------------------------------------------------------------------------
/images/22_image_captioning_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/22_image_captioning_flowchart.png
--------------------------------------------------------------------------------
/images/23_time_series_flowchart.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/23_time_series_flowchart.png
--------------------------------------------------------------------------------
/images/Denmark.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/Denmark.jpg
--------------------------------------------------------------------------------
/images/Europe.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/Europe.jpg
--------------------------------------------------------------------------------
/images/elon_musk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/elon_musk.jpg
--------------------------------------------------------------------------------
/images/elon_musk_100x100.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/elon_musk_100x100.jpg
--------------------------------------------------------------------------------
/images/escher_planefilling2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/escher_planefilling2.jpg
--------------------------------------------------------------------------------
/images/giger.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/giger.jpg
--------------------------------------------------------------------------------
/images/hulk.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/hulk.jpg
--------------------------------------------------------------------------------
/images/parrot.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot.jpg
--------------------------------------------------------------------------------
/images/parrot_cropped1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped1.jpg
--------------------------------------------------------------------------------
/images/parrot_cropped2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped2.jpg
--------------------------------------------------------------------------------
/images/parrot_cropped3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_cropped3.jpg
--------------------------------------------------------------------------------
/images/parrot_padded.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/parrot_padded.jpg
--------------------------------------------------------------------------------
/images/style1.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style1.jpg
--------------------------------------------------------------------------------
/images/style2.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style2.jpg
--------------------------------------------------------------------------------
/images/style3.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style3.jpg
--------------------------------------------------------------------------------
/images/style4.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style4.jpg
--------------------------------------------------------------------------------
/images/style5.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style5.jpg
--------------------------------------------------------------------------------
/images/style6.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style6.jpg
--------------------------------------------------------------------------------
/images/style7.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style7.jpg
--------------------------------------------------------------------------------
/images/style8.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style8.jpg
--------------------------------------------------------------------------------
/images/style9.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/style9.jpg
--------------------------------------------------------------------------------
/images/willy_wonka_new.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/willy_wonka_new.jpg
--------------------------------------------------------------------------------
/images/willy_wonka_old.jpg:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/Hvass-Labs/TensorFlow-Tutorials/d5f33973570fe6ef9c78c8a38c7449a932c81010/images/willy_wonka_old.jpg
--------------------------------------------------------------------------------
/imdb.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading the IMDB Review data-set from the internet
4 | # and loading it into memory.
5 | #
6 | # Implemented in Python 3.6
7 | #
8 | # Usage:
9 | # 1) Set the variable data_dir with the desired storage directory.
10 | # 2) Call maybe_download_and_extract() to download the data-set
11 | # if it is not already located in the given data_dir.
12 | # 3) Call load_data(train=True) to load the training-set.
13 | # 4) Call load_data(train=False) to load the test-set.
14 | # 5) Use the returned data in your own program.
15 | #
16 | # Format:
17 | # The IMDB Review data-set consists of 50000 movie-reviews,
18 | # split into a training-set and a test-set of 25000 reviews each,
19 | # and each of those is split into 12500 positive and 12500 negative reviews.
20 | # These are returned as lists of strings by the load_data() function.
21 | #
22 | ########################################################################
23 | #
24 | # This file is part of the TensorFlow Tutorials available at:
25 | #
26 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
27 | #
28 | # Published under the MIT License. See the file LICENSE for details.
29 | #
30 | # Copyright 2018 by Magnus Erik Hvass Pedersen
31 | #
32 | ########################################################################
33 |
34 | import os
35 | import download
36 | import glob
37 |
38 | ########################################################################
39 |
40 | # Directory where you want to download and save the data-set.
41 | # Set this before you start calling any of the functions below.
42 | data_dir = "data/IMDB/"
43 |
44 | # URL for the data-set on the internet.
45 | data_url = "http://ai.stanford.edu/~amaas/data/sentiment/aclImdb_v1.tar.gz"
46 |
47 |
48 | ########################################################################
49 | # Private helper-functions.
50 |
51 | def _read_text_file(path):
52 | """
53 | Read and return all the contents of the text-file with the given path.
54 | It is returned as a single string where all lines are concatenated.
55 | """
56 |
57 | with open(path, 'rt', encoding='utf-8') as file:
58 | # Read a list of strings.
59 | lines = file.readlines()
60 |
61 | # Concatenate to a single string.
62 | text = " ".join(lines)
63 |
64 | return text
65 |
66 |
67 | ########################################################################
68 | # Public functions that you may call to download the data-set from
69 | # the internet and load the data into memory.
70 |
71 |
72 | def maybe_download_and_extract():
73 | """
74 | Download and extract the IMDB Review data-set if it doesn't already exist
75 | in data_dir (set this variable first to the desired directory).
76 | """
77 |
78 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
79 |
80 |
81 | def load_data(train=True):
82 | """
83 | Load all the data from the IMDB Review data-set for sentiment analysis.
84 |
85 | :param train: Boolean whether to load the training-set (True)
86 | or the test-set (False).
87 |
88 | :return: A list of all the reviews as text-strings,
89 | and a list of the corresponding sentiments
90 | where 1.0 is positive and 0.0 is negative.
91 | """
92 |
93 | # Part of the path-name for either training or test-set.
94 | train_test_path = "train" if train else "test"
95 |
96 | # Base-directory where the extracted data is located.
97 | dir_base = os.path.join(data_dir, "aclImdb", train_test_path)
98 |
99 | # Filename-patterns for the data-files.
100 | path_pattern_pos = os.path.join(dir_base, "pos", "*.txt")
101 | path_pattern_neg = os.path.join(dir_base, "neg", "*.txt")
102 |
103 | # Get lists of all the file-paths for the data.
104 | paths_pos = glob.glob(path_pattern_pos)
105 | paths_neg = glob.glob(path_pattern_neg)
106 |
107 | # Read all the text-files.
108 | data_pos = [_read_text_file(path) for path in paths_pos]
109 | data_neg = [_read_text_file(path) for path in paths_neg]
110 |
111 | # Concatenate the positive and negative data.
112 | x = data_pos + data_neg
113 |
114 | # Create a list of the sentiments for the text-data.
115 | # 1.0 is a positive sentiment, 0.0 is a negative sentiment.
116 | y = [1.0] * len(data_pos) + [0.0] * len(data_neg)
117 |
118 | return x, y
119 |
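The labelling scheme in `load_data()` can be sketched without touching the files, using hypothetical in-memory reviews in place of the data read from disk:

```python
# Hypothetical stand-ins for the reviews read from disk.
data_pos = ["great movie", "loved it"]
data_neg = ["terrible"]

# Same construction as in load_data(): texts first, then parallel labels.
x = data_pos + data_neg
y = [1.0] * len(data_pos) + [0.0] * len(data_neg)

print(list(zip(x, y)))
```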
120 |
121 | ########################################################################
122 |
--------------------------------------------------------------------------------
/inception.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # The Inception Model v3 for TensorFlow.
4 | #
5 | # This is a pre-trained Deep Neural Network for classifying images.
6 | # You provide an image or filename for a jpeg-file which will be
7 | # loaded and input to the Inception model, which will then output
8 | # an array of numbers indicating how likely it is that the
9 | # input-image is of each class.
10 | #
11 | # See the example code at the bottom of this file or in the
12 | # accompanying Python Notebooks.
13 | #
14 | # Tutorial #07 shows how to use the Inception model.
15 | # Tutorial #08 shows how to use it for Transfer Learning.
16 | #
17 | # What is Transfer Learning?
18 | #
19 | # Transfer Learning is the use of a Neural Network for classifying
20 | # images from a data-set other than the one it was trained on. For example,
21 | # the Inception model was trained on the ImageNet data-set using
22 | # a very powerful and expensive computer. But the Inception model
23 | # can be re-used on data-sets it was not trained on without having
24 | # to re-train the entire model, even though the number of classes
25 | # is different for the two data-sets. This allows you to use the
26 | # Inception model on your own data-sets without the need for a
27 | # very powerful and expensive computer to train it.
28 | #
29 | # The last layer of the Inception model before the softmax-classifier
30 | # is called the Transfer Layer because the output of that layer will
31 | # be used as the input in your new softmax-classifier (or as the
32 | # input for another neural network), which will then be trained on
33 | # your own data-set.
34 | #
35 | # The output values of the Transfer Layer are called Transfer Values.
36 | # These are the actual values that will be input to your new
37 | # softmax-classifier or to another neural network that you create.
38 | #
39 | # The word 'bottleneck' is also sometimes used to refer to the
40 | # Transfer Layer or Transfer Values, but it is a confusing word
41 | # that is not used here.
42 | #
43 | # Implemented in Python 3.5 with TensorFlow v0.10.0rc0
44 | #
45 | ########################################################################
46 | #
47 | # This file is part of the TensorFlow Tutorials available at:
48 | #
49 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
50 | #
51 | # Published under the MIT License. See the file LICENSE for details.
52 | #
53 | # Copyright 2016 by Magnus Erik Hvass Pedersen
54 | #
55 | ########################################################################
56 |
57 | import numpy as np
58 | import tensorflow as tf
59 | import download
60 | from cache import cache
61 | import os
62 | import sys
63 |
64 | ########################################################################
65 | # Various directories and file-names.
66 |
67 | # Internet URL for the tar-file with the Inception model.
68 | # Note that this might change in the future and will need to be updated.
69 | data_url = "http://download.tensorflow.org/models/image/imagenet/inception-2015-12-05.tgz"
70 |
71 | # Directory to store the downloaded data.
72 | data_dir = "inception/"
73 |
74 | # File containing the mappings between class-number and uid. (Downloaded)
75 | path_uid_to_cls = "imagenet_2012_challenge_label_map_proto.pbtxt"
76 |
77 | # File containing the mappings between uid and string. (Downloaded)
78 | path_uid_to_name = "imagenet_synset_to_human_label_map.txt"
79 |
80 | # File containing the TensorFlow graph definition. (Downloaded)
81 | path_graph_def = "classify_image_graph_def.pb"
82 |
83 | ########################################################################
84 |
85 |
86 | def maybe_download():
87 | """
88 | Download the Inception model from the internet if it does not already
89 | exist in the data_dir. The file is about 85 MB.
90 | """
91 |
92 | print("Downloading Inception v3 Model ...")
93 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
94 |
95 |
96 | ########################################################################
97 |
98 |
99 | class NameLookup:
100 | """
101 | Used for looking up the name associated with a class-number.
102 | This is used to print the name of a class instead of its number,
103 | e.g. "plant" or "horse".
104 |
105 | Maps between:
106 | - cls is the class-number as an integer between 1 and 1000 (inclusive).
107 | - uid is a class-id as a string from the ImageNet data-set, e.g. "n00017222".
108 | - name is the class-name as a string, e.g. "plant, flora, plant life"
109 |
110 | There are actually 1008 output classes of the Inception model
111 | but there are only 1000 named classes in these mapping-files.
112 | The remaining 8 output classes of the model should not be used.
113 | """
114 |
115 | def __init__(self):
116 | # Mappings between uid, cls and name are dicts, where insertions and
117 |         # lookups have O(1) time-usage on average, but may be O(n) in worst case.
118 | self._uid_to_cls = {} # Map from uid to cls.
119 | self._uid_to_name = {} # Map from uid to name.
120 | self._cls_to_uid = {} # Map from cls to uid.
121 |
122 | # Read the uid-to-name mappings from file.
123 | path = os.path.join(data_dir, path_uid_to_name)
124 | with open(file=path, mode='r') as file:
125 | # Read all lines from the file.
126 | lines = file.readlines()
127 |
128 | for line in lines:
129 | # Remove newlines.
130 | line = line.replace("\n", "")
131 |
132 | # Split the line on tabs.
133 | elements = line.split("\t")
134 |
135 | # Get the uid.
136 | uid = elements[0]
137 |
138 | # Get the class-name.
139 | name = elements[1]
140 |
141 | # Insert into the lookup-dict.
142 | self._uid_to_name[uid] = name
143 |
144 | # Read the uid-to-cls mappings from file.
145 | path = os.path.join(data_dir, path_uid_to_cls)
146 | with open(file=path, mode='r') as file:
147 | # Read all lines from the file.
148 | lines = file.readlines()
149 |
150 | for line in lines:
151 | # We assume the file is in the proper format,
152 | # so the following lines come in pairs. Other lines are ignored.
153 |
154 | if line.startswith(" target_class: "):
155 | # This line must be the class-number as an integer.
156 |
157 | # Split the line.
158 | elements = line.split(": ")
159 |
160 | # Get the class-number as an integer.
161 | cls = int(elements[1])
162 |
163 | elif line.startswith(" target_class_string: "):
164 | # This line must be the uid as a string.
165 |
166 | # Split the line.
167 | elements = line.split(": ")
168 |
169 | # Get the uid as a string e.g. "n01494475"
170 | uid = elements[1]
171 |
172 |                 # Remove the enclosing quotes and the trailing newline.
173 | uid = uid[1:-2]
174 |
175 | # Insert into the lookup-dicts for both ways between uid and cls.
176 | self._uid_to_cls[uid] = cls
177 | self._cls_to_uid[cls] = uid
178 |
179 | def uid_to_cls(self, uid):
180 | """
181 | Return the class-number as an integer for the given uid-string.
182 | """
183 |
184 | return self._uid_to_cls[uid]
185 |
186 | def uid_to_name(self, uid, only_first_name=False):
187 | """
188 | Return the class-name for the given uid string.
189 |
190 |         Some class-names are lists of names; if you only want the first name,
191 |         then set only_first_name=True.
192 | """
193 |
194 | # Lookup the name from the uid.
195 | name = self._uid_to_name[uid]
196 |
197 | # Only use the first name in the list?
198 | if only_first_name:
199 | name = name.split(",")[0]
200 |
201 | return name
202 |
203 | def cls_to_name(self, cls, only_first_name=False):
204 | """
205 | Return the class-name from the integer class-number.
206 |
207 |         Some class-names are lists of names; if you only want the first name,
208 |         then set only_first_name=True.
209 | """
210 |
211 | # Lookup the uid from the cls.
212 | uid = self._cls_to_uid[cls]
213 |
214 | # Lookup the name from the uid.
215 | name = self.uid_to_name(uid=uid, only_first_name=only_first_name)
216 |
217 | return name
218 |
219 |
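The pair-wise parsing in NameLookup's constructor can be sketched on a small in-memory sample. The sample lines below are an assumption about the pbtxt layout, chosen to match the prefixes the code checks for:

```python
# Hypothetical snippet in the same format as the downloaded pbtxt-file.
sample = [
    "entry {\n",
    "  target_class: 449\n",
    "  target_class_string: \"n01440764\"\n",
    "}\n",
]

uid_to_cls = {}
cls = None

for line in sample:
    if line.startswith("  target_class: "):
        # The class-number as an integer.
        cls = int(line.split(": ")[1])
    elif line.startswith("  target_class_string: "):
        # The uid as a string; strip the quotes and trailing newline.
        uid = line.split(": ")[1]
        uid = uid[1:-2]
        uid_to_cls[uid] = cls

print(uid_to_cls)
```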
220 | ########################################################################
221 |
222 |
223 | class Inception:
224 | """
225 | The Inception model is a Deep Neural Network which has already been
226 | trained for classifying images into 1000 different categories.
227 |
228 | When you create a new instance of this class, the Inception model
229 | will be loaded and can be used immediately without training.
230 |
231 | The Inception model can also be used for Transfer Learning.
232 | """
233 |
234 | # Name of the tensor for feeding the input image as jpeg.
235 | tensor_name_input_jpeg = "DecodeJpeg/contents:0"
236 |
237 | # Name of the tensor for feeding the decoded input image.
238 | # Use this for feeding images in other formats than jpeg.
239 | tensor_name_input_image = "DecodeJpeg:0"
240 |
241 | # Name of the tensor for the resized input image.
242 | # This is used to retrieve the image after it has been resized.
243 | tensor_name_resized_image = "ResizeBilinear:0"
244 |
245 | # Name of the tensor for the output of the softmax-classifier.
246 | # This is used for classifying images with the Inception model.
247 | tensor_name_softmax = "softmax:0"
248 |
249 | # Name of the tensor for the unscaled outputs of the softmax-classifier (aka. logits).
250 | tensor_name_softmax_logits = "softmax/logits:0"
251 |
252 | # Name of the tensor for the output of the Inception model.
253 | # This is used for Transfer Learning.
254 | tensor_name_transfer_layer = "pool_3:0"
255 |
256 | def __init__(self):
257 | # Mappings between class-numbers and class-names.
258 | # Used to print the class-name as a string e.g. "horse" or "plant".
259 | self.name_lookup = NameLookup()
260 |
261 | # Now load the Inception model from file. The way TensorFlow
262 | # does this is confusing and requires several steps.
263 |
264 | # Create a new TensorFlow computational graph.
265 | self.graph = tf.Graph()
266 |
267 | # Set the new graph as the default.
268 | with self.graph.as_default():
269 |
270 | # TensorFlow graphs are saved to disk as so-called Protocol Buffers
271 | # aka. proto-bufs which is a file-format that works on multiple
272 | # platforms. In this case it is saved as a binary file.
273 |
274 | # Open the graph-def file for binary reading.
275 | path = os.path.join(data_dir, path_graph_def)
276 | with tf.gfile.FastGFile(path, 'rb') as file:
277 | # The graph-def is a saved copy of a TensorFlow graph.
278 | # First we need to create an empty graph-def.
279 | graph_def = tf.GraphDef()
280 |
281 | # Then we load the proto-buf file into the graph-def.
282 | graph_def.ParseFromString(file.read())
283 |
284 | # Finally we import the graph-def to the default TensorFlow graph.
285 | tf.import_graph_def(graph_def, name='')
286 |
287 | # Now self.graph holds the Inception model from the proto-buf file.
288 |
289 | # Get the output of the Inception model by looking up the tensor
290 | # with the appropriate name for the output of the softmax-classifier.
291 | self.y_pred = self.graph.get_tensor_by_name(self.tensor_name_softmax)
292 |
293 | # Get the unscaled outputs for the Inception model (aka. softmax-logits).
294 | self.y_logits = self.graph.get_tensor_by_name(self.tensor_name_softmax_logits)
295 |
296 | # Get the tensor for the resized image that is input to the neural network.
297 | self.resized_image = self.graph.get_tensor_by_name(self.tensor_name_resized_image)
298 |
299 | # Get the tensor for the last layer of the graph, aka. the transfer-layer.
300 | self.transfer_layer = self.graph.get_tensor_by_name(self.tensor_name_transfer_layer)
301 |
302 | # Get the number of elements in the transfer-layer.
303 | self.transfer_len = self.transfer_layer.get_shape()[3]
304 |
305 | # Create a TensorFlow session for executing the graph.
306 | self.session = tf.Session(graph=self.graph)
307 |
308 | def close(self):
309 | """
310 | Call this function when you are done using the Inception model.
311 | It closes the TensorFlow session to release its resources.
312 | """
313 |
314 | self.session.close()
315 |
316 | def _write_summary(self, logdir='summary/'):
317 | """
318 | Write graph to summary-file so it can be shown in TensorBoard.
319 |
320 | This function is used for debugging and may be changed or removed in the future.
321 |
322 | :param logdir:
323 | Directory for writing the summary-files.
324 |
325 | :return:
326 | Nothing.
327 | """
328 |
329 | writer = tf.train.SummaryWriter(logdir=logdir, graph=self.graph)
330 | writer.close()
331 |
332 | def _create_feed_dict(self, image_path=None, image=None):
333 | """
334 | Create and return a feed-dict with an image.
335 |
336 | :param image_path:
337 | The input image is a jpeg-file with this file-path.
338 |
339 | :param image:
340 | The input image is a 3-dim array which is already decoded.
341 | The pixels MUST be values between 0 and 255 (float or int).
342 |
343 | :return:
344 | Dict for feeding to the Inception graph in TensorFlow.
345 | """
346 |
347 | if image is not None:
348 | # Image is passed in as a 3-dim array that is already decoded.
349 | feed_dict = {self.tensor_name_input_image: image}
350 |
351 | elif image_path is not None:
352 | # Read the jpeg-image as an array of bytes.
353 | image_data = tf.gfile.FastGFile(image_path, 'rb').read()
354 |
355 | # Image is passed in as a jpeg-encoded image.
356 | feed_dict = {self.tensor_name_input_jpeg: image_data}
357 |
358 | else:
359 | raise ValueError("Either image or image_path must be set.")
360 |
361 | return feed_dict
362 |
363 | def classify(self, image_path=None, image=None):
364 | """
365 | Use the Inception model to classify a single image.
366 |
367 | The image will be resized automatically to 299 x 299 pixels,
368 | see the discussion in the Python Notebook for Tutorial #07.
369 |
370 | :param image_path:
371 | The input image is a jpeg-file with this file-path.
372 |
373 | :param image:
374 | The input image is a 3-dim array which is already decoded.
375 | The pixels MUST be values between 0 and 255 (float or int).
376 |
377 | :return:
378 | Array of floats (aka. softmax-array) indicating how likely
379 | the Inception model thinks the image is of each given class.
380 | """
381 |
382 | # Create a feed-dict for the TensorFlow graph with the input image.
383 | feed_dict = self._create_feed_dict(image_path=image_path, image=image)
384 |
385 | # Execute the TensorFlow session to get the predicted labels.
386 | pred = self.session.run(self.y_pred, feed_dict=feed_dict)
387 |
388 | # Reduce the array to a single dimension.
389 | pred = np.squeeze(pred)
390 |
391 | return pred
392 |
393 | def get_resized_image(self, image_path=None, image=None):
394 | """
395 | Input an image to the Inception model and return
396 | the resized image. The resized image can be plotted so
397 | we can see what the neural network sees as its input.
398 |
399 | :param image_path:
400 | The input image is a jpeg-file with this file-path.
401 |
402 | :param image:
403 | The input image is a 3-dim array which is already decoded.
404 | The pixels MUST be values between 0 and 255 (float or int).
405 |
406 | :return:
407 | A 3-dim array holding the image.
408 | """
409 |
410 | # Create a feed-dict for the TensorFlow graph with the input image.
411 | feed_dict = self._create_feed_dict(image_path=image_path, image=image)
412 |
413 |         # Execute the TensorFlow session to get the resized image.
414 | resized_image = self.session.run(self.resized_image, feed_dict=feed_dict)
415 |
416 | # Remove the 1st dimension of the 4-dim tensor.
417 | resized_image = resized_image.squeeze(axis=0)
418 |
419 | # Scale pixels to be between 0.0 and 1.0
420 | resized_image = resized_image.astype(float) / 255.0
421 |
422 | return resized_image
423 |
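The post-processing in `get_resized_image()` — dropping the batch-dimension and scaling pixels to [0.0, 1.0] — can be seen in isolation, with a hypothetical random batch standing in for the session output:

```python
import numpy as np

# Hypothetical 4-dim session-output: one image of 299 x 299 RGB pixels.
batch = np.random.randint(0, 256, size=(1, 299, 299, 3))

# Drop the batch-dimension and scale pixels from [0, 255] to [0.0, 1.0].
image = batch.squeeze(axis=0).astype(float) / 255.0

print(image.shape, image.min(), image.max())
```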
424 | def print_scores(self, pred, k=10, only_first_name=True):
425 | """
426 | Print the scores (or probabilities) for the top-k predicted classes.
427 |
428 | :param pred:
429 |         Predicted class-labels returned from the classify() function.
430 |
431 | :param k:
432 | How many classes to print.
433 |
434 | :param only_first_name:
435 |         Some class-names are lists of names; if you only want the first name,
436 |         then set only_first_name=True.
437 |
438 | :return:
439 | Nothing.
440 | """
441 |
442 | # Get a sorted index for the pred-array.
443 | idx = pred.argsort()
444 |
445 | # The index is sorted lowest-to-highest values. Take the last k.
446 | top_k = idx[-k:]
447 |
448 | # Iterate the top-k classes in reversed order (i.e. highest first).
449 | for cls in reversed(top_k):
450 | # Lookup the class-name.
451 | name = self.name_lookup.cls_to_name(cls=cls, only_first_name=only_first_name)
452 |
453 | # Predicted score (or probability) for this class.
454 | score = pred[cls]
455 |
456 | # Print the score and class-name.
457 | print("{0:>6.2%} : {1}".format(score, name))
458 |
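The top-k selection in `print_scores()` rests on numpy's `argsort()`, which sorts lowest-to-highest; a small sketch with made-up scores:

```python
import numpy as np

# Hypothetical softmax-output over four classes.
pred = np.array([0.1, 0.05, 0.6, 0.25])

# argsort() gives indices sorted lowest-to-highest.
idx = pred.argsort()

# Take the last k indices and reverse them so the highest comes first.
k = 2
top_k = [int(cls) for cls in reversed(idx[-k:])]

print(top_k)
```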
459 | def transfer_values(self, image_path=None, image=None):
460 | """
461 | Calculate the transfer-values for the given image.
462 | These are the values of the last layer of the Inception model before
463 | the softmax-layer, when inputting the image to the Inception model.
464 |
465 | The transfer-values allow us to use the Inception model in so-called
466 | Transfer Learning for other data-sets and different classifications.
467 |
468 | It may take several hours or more to calculate the transfer-values
469 | for all images in a data-set. It is therefore useful to cache the
470 | results using the function transfer_values_cache() below.
471 |
472 | :param image_path:
473 | The input image is a jpeg-file with this file-path.
474 |
475 | :param image:
476 | The input image is a 3-dim array which is already decoded.
477 | The pixels MUST be values between 0 and 255 (float or int).
478 |
479 | :return:
480 |         The transfer-values for the image.
481 | """
482 |
483 | # Create a feed-dict for the TensorFlow graph with the input image.
484 | feed_dict = self._create_feed_dict(image_path=image_path, image=image)
485 |
486 | # Use TensorFlow to run the graph for the Inception model.
487 | # This calculates the values for the last layer of the Inception model
488 | # prior to the softmax-classification, which we call transfer-values.
489 | transfer_values = self.session.run(self.transfer_layer, feed_dict=feed_dict)
490 |
491 | # Reduce to a 1-dim array.
492 | transfer_values = np.squeeze(transfer_values)
493 |
494 | return transfer_values
495 |
496 |
497 | ########################################################################
498 | # Batch-processing.
499 |
500 |
501 | def process_images(fn, images=None, image_paths=None):
502 | """
503 | Call the function fn() for each image, e.g. transfer_values() from
504 | the Inception model above. All the results are concatenated and returned.
505 |
506 | :param fn:
507 | Function to be called for each image.
508 |
509 | :param images:
510 | List of images to process.
511 |
512 | :param image_paths:
513 | List of file-paths for the images to process.
514 |
515 | :return:
516 | Numpy array with the results.
517 | """
518 |
519 | # Are we using images or image_paths?
520 | using_images = images is not None
521 |
522 | # Number of images.
523 | if using_images:
524 | num_images = len(images)
525 | else:
526 | num_images = len(image_paths)
527 |
528 | # Pre-allocate list for the results.
529 | # This holds references to other arrays. Initially the references are None.
530 | result = [None] * num_images
531 |
532 | # For each input image.
533 | for i in range(num_images):
534 | # Status-message. Note the \r which means the line should overwrite itself.
535 | msg = "\r- Processing image: {0:>6} / {1}".format(i+1, num_images)
536 |
537 | # Print the status message.
538 | sys.stdout.write(msg)
539 | sys.stdout.flush()
540 |
541 | # Process the image and store the result for later use.
542 | if using_images:
543 | result[i] = fn(image=images[i])
544 | else:
545 | result[i] = fn(image_path=image_paths[i])
546 |
547 | # Print newline.
548 | print()
549 |
550 | # Convert the result to a numpy array.
551 | result = np.array(result)
552 |
553 | return result
554 |
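The batching pattern in `process_images()` is per-item application with a pre-allocated result list, then one stacking step. A sketch with a toy `fn()` standing in for e.g. `model.transfer_values`:

```python
import numpy as np

# Toy stand-in for e.g. model.transfer_values (hypothetical).
def fn(image=None):
    return np.array([image, image * 2])

images = [1, 2, 3]

# Pre-allocate a list of references, fill it, then stack into one array.
result = [None] * len(images)
for i in range(len(images)):
    result[i] = fn(image=images[i])
result = np.array(result)

print(result.shape)
```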
555 |
556 | ########################################################################
557 |
558 |
559 | def transfer_values_cache(cache_path, model, images=None, image_paths=None):
560 | """
561 | This function either loads the transfer-values if they have
562 | already been calculated, otherwise it calculates the values
563 | and saves them to a file that can be re-loaded again later.
564 |
565 | Because the transfer-values can be expensive to compute, it can
566 | be useful to cache the values through this function instead
567 | of calling transfer_values() directly on the Inception model.
568 |
569 | See Tutorial #08 for an example on how to use this function.
570 |
571 | :param cache_path:
572 | File containing the cached transfer-values for the images.
573 |
574 | :param model:
575 | Instance of the Inception model.
576 |
577 | :param images:
578 | 4-dim array with images. [image_number, height, width, colour_channel]
579 |
580 | :param image_paths:
581 | Array of file-paths for images (must be jpeg-format).
582 |
583 | :return:
584 | The transfer-values from the Inception model for those images.
585 | """
586 |
587 | # Helper-function for processing the images if the cache-file does not exist.
588 | # This is needed because we cannot supply both fn=process_images
589 | # and fn=model.transfer_values to the cache()-function.
590 | def fn():
591 | return process_images(fn=model.transfer_values, images=images, image_paths=image_paths)
592 |
593 | # Read the transfer-values from a cache-file, or calculate them if the file does not exist.
594 | transfer_values = cache(cache_path=cache_path, fn=fn)
595 |
596 | return transfer_values
597 |
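The nested `fn()` is the standard trick for handing a zero-argument callable to a cache: the extra arguments are bound in a closure. A self-contained sketch with a minimal pickle-based `cache()` stand-in (an assumption; the real cache.py may differ in details):

```python
import os
import pickle
import tempfile

# Minimal stand-in for cache.py's cache() function (an assumption).
def cache(cache_path, fn):
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            return pickle.load(f)
    obj = fn()
    with open(cache_path, 'wb') as f:
        pickle.dump(obj, f)
    return obj

def expensive(x, y):
    return x + y

# Bind the arguments in a closure so cache() only sees fn().
def fn():
    return expensive(x=2, y=3)

path = os.path.join(tempfile.mkdtemp(), "demo.pkl")
first = cache(cache_path=path, fn=fn)   # computed and saved
second = cache(cache_path=path, fn=fn)  # re-loaded from file

print(first, second)
```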
598 |
599 | ########################################################################
600 | # Example usage.
601 |
602 | if __name__ == '__main__':
603 | print(tf.__version__)
604 |
605 | # Download Inception model if not already done.
606 | maybe_download()
607 |
608 | # Load the Inception model so it is ready for classifying images.
609 | model = Inception()
610 |
611 | # Path for a jpeg-image that is included in the downloaded data.
612 | image_path = os.path.join(data_dir, 'cropped_panda.jpg')
613 |
614 | # Use the Inception model to classify the image.
615 | pred = model.classify(image_path=image_path)
616 |
617 | # Print the scores and names for the top-10 predictions.
618 | model.print_scores(pred=pred, k=10)
619 |
620 | # Close the TensorFlow session.
621 | model.close()
622 |
623 | # Transfer Learning is demonstrated in Tutorial #08.
624 |
625 | ########################################################################
626 |
--------------------------------------------------------------------------------
/inception5h.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # The Inception Model 5h for TensorFlow.
4 | #
5 | # This variant of the Inception model is easier to use for DeepDream
6 | # and other imaging techniques. This is because it allows the input
7 | # image to be any size, and the optimized images are also prettier.
8 | #
9 | # It is unclear which Inception model this implements because the
10 | # Google developers have (as usual) neglected to document it.
11 | # It is dubbed the 5h-model because that is the name of the zip-file,
12 | # but it is apparently simpler than the v.3 model.
13 | #
14 | # See the Python Notebook for Tutorial #14 for an example usage.
15 | #
16 | # Implemented in Python 3.5 with TensorFlow v0.11.0rc0
17 | #
18 | ########################################################################
19 | #
20 | # This file is part of the TensorFlow Tutorials available at:
21 | #
22 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
23 | #
24 | # Published under the MIT License. See the file LICENSE for details.
25 | #
26 | # Copyright 2016 by Magnus Erik Hvass Pedersen
27 | #
28 | ########################################################################
29 |
30 | import numpy as np
31 | import tensorflow as tf
32 | import download
33 | import os
34 |
35 | ########################################################################
36 | # Various directories and file-names.
37 |
38 | # Internet URL for the zip-file with the Inception model.
39 | # Note that this might change in the future and will need to be updated.
40 | data_url = "http://storage.googleapis.com/download.tensorflow.org/models/inception5h.zip"
41 |
42 | # Directory to store the downloaded data.
43 | data_dir = "inception/5h/"
44 |
45 | # File containing the TensorFlow graph definition. (Downloaded)
46 | path_graph_def = "tensorflow_inception_graph.pb"
47 |
48 | ########################################################################
49 |
50 |
51 | def maybe_download():
52 | """
53 | Download the Inception model from the internet if it does not already
54 | exist in the data_dir. The file is about 50 MB.
55 | """
56 |
57 | print("Downloading Inception 5h Model ...")
58 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
59 |
60 |
61 | ########################################################################
62 |
63 |
64 | class Inception5h:
65 | """
66 | The Inception model is a Deep Neural Network which has already been
67 | trained for classifying images into 1000 different categories.
68 |
69 | When you create a new instance of this class, the Inception model
70 | will be loaded and can be used immediately without training.
71 | """
72 |
73 | # Name of the tensor for feeding the input image.
74 | tensor_name_input_image = "input:0"
75 |
76 | # Names for some of the commonly used layers in the Inception model.
77 | layer_names = ['conv2d0', 'conv2d1', 'conv2d2',
78 | 'mixed3a', 'mixed3b',
79 | 'mixed4a', 'mixed4b', 'mixed4c', 'mixed4d', 'mixed4e',
80 | 'mixed5a', 'mixed5b']
81 |
82 | def __init__(self):
83 | # Now load the Inception model from file. The way TensorFlow
84 | # does this is confusing and requires several steps.
85 |
86 | # Create a new TensorFlow computational graph.
87 | self.graph = tf.Graph()
88 |
89 | # Set the new graph as the default.
90 | with self.graph.as_default():
91 |
92 | # TensorFlow graphs are saved to disk as so-called Protocol Buffers
93 | # aka. proto-bufs which is a file-format that works on multiple
94 | # platforms. In this case it is saved as a binary file.
95 |
96 | # Open the graph-def file for binary reading.
97 | path = os.path.join(data_dir, path_graph_def)
98 | with tf.gfile.FastGFile(path, 'rb') as file:
99 | # The graph-def is a saved copy of a TensorFlow graph.
100 | # First we need to create an empty graph-def.
101 | graph_def = tf.GraphDef()
102 |
103 | # Then we load the proto-buf file into the graph-def.
104 | graph_def.ParseFromString(file.read())
105 |
106 | # Finally we import the graph-def to the default TensorFlow graph.
107 | tf.import_graph_def(graph_def, name='')
108 |
109 | # Now self.graph holds the Inception model from the proto-buf file.
110 |
111 | # Get a reference to the tensor for inputting images to the graph.
112 | self.input = self.graph.get_tensor_by_name(self.tensor_name_input_image)
113 |
114 | # Get references to the tensors for the commonly used layers.
115 | self.layer_tensors = [self.graph.get_tensor_by_name(name + ":0") for name in self.layer_names]
116 |
117 | def create_feed_dict(self, image=None):
118 | """
119 | Create and return a feed-dict with an image.
120 |
121 | :param image:
122 | The input image is a 3-dim array which is already decoded.
123 | The pixels MUST be values between 0 and 255 (float or int).
124 |
125 | :return:
126 | Dict for feeding to the Inception graph in TensorFlow.
127 | """
128 |
129 | # Expand 3-dim array to 4-dim by prepending an 'empty' dimension.
130 | # This is because we are only feeding a single image, but the
131 | # Inception model was built to take multiple images as input.
132 | image = np.expand_dims(image, axis=0)
133 |
134 |         # The image is now fed as a 4-dim array of raw pixel-values.
135 | feed_dict = {self.tensor_name_input_image: image}
136 |
137 | return feed_dict
138 |
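The dimension-expansion in `create_feed_dict()` can be seen in isolation:

```python
import numpy as np

# A single 3-dim image (height, width, channels).
image = np.zeros((200, 300, 3))

# Prepend a batch-dimension of size 1, as the model expects batches.
batch = np.expand_dims(image, axis=0)

print(batch.shape)
```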
139 | def get_gradient(self, tensor):
140 | """
141 | Get the gradient of the given tensor with respect to
142 | the input image. This allows us to modify the input
143 | image so as to maximize the given tensor.
144 |
145 | For use in e.g. DeepDream and Visual Analysis.
146 |
147 | :param tensor:
148 | The tensor whose value we want to maximize
149 | by changing the input image.
150 |
151 | :return:
152 | Gradient for the tensor with regard to the input image.
153 | """
154 |
155 | # Set the graph as default so we can add operations to it.
156 | with self.graph.as_default():
157 | # Square the tensor-values.
158 | # You can try and remove this to see the effect.
159 | tensor = tf.square(tensor)
160 |
161 | # Average the tensor so we get a single scalar value.
162 | tensor_mean = tf.reduce_mean(tensor)
163 |
164 | # Use TensorFlow to automatically create a mathematical
165 | # formula for the gradient using the chain-rule of
166 | # differentiation.
167 | gradient = tf.gradients(tensor_mean, self.input)[0]
168 |
169 | return gradient
170 |
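The objective built in `get_gradient()` is the mean of the squared tensor-values; the analytic gradient of mean(x**2) with respect to x is 2*x/n, and one gradient-ascent step on the input increases the objective. A numpy sketch of that idea (no TensorFlow), which is what DeepDream does with the input image:

```python
import numpy as np

x = np.array([0.5, -1.0, 2.0])

# Gradient of mean(x**2) with respect to x is 2*x/n.
def gradient(x):
    return 2.0 * x / x.size

# One gradient-ascent step on the input increases the objective.
step = x + 0.1 * gradient(x)

print(np.mean(x**2), np.mean(step**2))
```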
171 | ########################################################################
172 |
--------------------------------------------------------------------------------
/knifey.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading the Knifey-Spoony data-set from the internet
4 | # and loading it into memory. Note that this only loads the file-names
5 | # for the images in the data-set and does not load the actual images.
6 | #
7 | # Implemented in Python 3.5
8 | #
9 | ########################################################################
10 | #
11 | # This file is part of the TensorFlow Tutorials available at:
12 | #
13 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
14 | #
15 | # Published under the MIT License. See the file LICENSE for details.
16 | #
17 | # Copyright 2016 by Magnus Erik Hvass Pedersen
18 | #
19 | ########################################################################
20 |
21 | from dataset import load_cached
22 | import download
23 | import os
24 |
25 | ########################################################################
26 |
27 | # Directory where you want to download and save the data-set.
28 | # Set this before you start calling any of the functions below.
29 | data_dir = "data/knifey-spoony/"
30 |
31 | # Directory for the training-set after copying the files using copy_files().
32 | train_dir = os.path.join(data_dir, "train/")
33 |
34 | # Directory for the test-set after copying the files using copy_files().
35 | test_dir = os.path.join(data_dir, "test/")
36 |
37 | # URL for the data-set on the internet.
38 | data_url = "https://github.com/Hvass-Labs/knifey-spoony/raw/master/knifey-spoony.tar.gz"
39 |
40 | ########################################################################
41 | # Various constants for the size of the images.
42 | # Use these constants in your own program.
43 |
44 | # Width and height of each image.
45 | img_size = 200
46 |
47 | # Number of channels in each image, 3 channels: Red, Green, Blue.
48 | num_channels = 3
49 |
50 | # Shape of the numpy-array for an image.
51 | img_shape = [img_size, img_size, num_channels]
52 |
53 | # Length of an image when flattened to a 1-dim array.
54 | img_size_flat = img_size * img_size * num_channels
55 |
56 | # Number of classes.
57 | num_classes = 3
58 |
59 | ########################################################################
60 | # Public functions that you may call to download the data-set from
61 | # the internet and load the data into memory.
62 |
63 |
64 | def maybe_download_and_extract():
65 | """
66 | Download and extract the Knifey-Spoony data-set if it doesn't already exist
67 | in data_dir (set this variable first to the desired directory).
68 | """
69 |
70 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
71 |
72 |
73 | def load():
74 | """
75 | Load the Knifey-Spoony data-set into memory.
76 |
77 | This uses a cache-file which is reloaded if it already exists,
78 | otherwise the Knifey-Spoony data-set is created and saved to
79 | the cache-file. The reason for using a cache-file is that it
80 |     ensures the files are ordered consistently each time the data-set
81 | is loaded. This is important when the data-set is used in
82 | combination with Transfer Learning as is done in Tutorial #09.
83 |
84 | :return:
85 | A DataSet-object for the Knifey-Spoony data-set.
86 | """
87 |
88 | # Path for the cache-file.
89 | cache_path = os.path.join(data_dir, "knifey-spoony.pkl")
90 |
91 | # If the DataSet-object already exists in a cache-file
92 | # then load it, otherwise create a new object and save
93 | # it to the cache-file so it can be loaded the next time.
94 | dataset = load_cached(cache_path=cache_path,
95 | in_dir=data_dir)
96 |
97 | return dataset
98 |
99 |
100 | def copy_files():
101 | """
102 | Copy all the files in the training-set to train_dir
103 | and copy all the files in the test-set to test_dir.
104 |
105 | This creates the directories if they don't already exist,
106 | and it overwrites the images if they already exist.
107 |
108 | The images are originally stored in a directory-structure
109 | that is incompatible with e.g. the Keras API. This function
110 | copies the files to a dir-structure that works with e.g. Keras.
111 | """
112 |
113 | # Load the Knifey-Spoony dataset.
114 | # This is very fast as it only gathers lists of the files
115 | # and does not actually load the images into memory.
116 | dataset = load()
117 |
118 | # Copy the files to separate training- and test-dirs.
119 | dataset.copy_files(train_dir=train_dir, test_dir=test_dir)
120 |
121 | ########################################################################
122 |
123 | if __name__ == '__main__':
124 | # Download and extract the data-set if it doesn't already exist.
125 | maybe_download_and_extract()
126 |
127 | # Load the data-set.
128 | dataset = load()
129 |
130 | # Get the file-paths for the images and their associated class-numbers
131 | # and class-labels. This is for the training-set.
132 | image_paths_train, cls_train, labels_train = dataset.get_training_set()
133 |
134 | # Get the file-paths for the images and their associated class-numbers
135 | # and class-labels. This is for the test-set.
136 | image_paths_test, cls_test, labels_test = dataset.get_test_set()
137 |
138 | # Check if the training-set looks OK.
139 |
140 | # Print some of the file-paths for the training-set.
141 | for path in image_paths_train[0:5]:
142 | print(path)
143 |
144 | # Print the associated class-numbers.
145 | print(cls_train[0:5])
146 |
147 | # Print the class-numbers as one-hot encoded arrays.
148 | print(labels_train[0:5])
149 |
150 | # Check if the test-set looks OK.
151 |
152 | # Print some of the file-paths for the test-set.
153 | for path in image_paths_test[0:5]:
154 | print(path)
155 |
156 | # Print the associated class-numbers.
157 | print(cls_test[0:5])
158 |
159 | # Print the class-numbers as one-hot encoded arrays.
160 | print(labels_test[0:5])
161 |
162 | ########################################################################
163 |
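The cache-file behaviour described in `load()` above can be sketched in isolation. This is a minimal illustration, not the repo's actual `load_cached()` from `dataset.py` (which handles more cases); the function name and paths here are made up:

```python
import os
import pickle

def load_cached_sketch(cache_path, create_fn):
    # If the cache-file exists then reload it, otherwise create the
    # object with create_fn() and save it for the next time. Because
    # the saved object is reloaded as-is, e.g. a list of file-names
    # keeps the exact same ordering on every later call.
    if os.path.exists(cache_path):
        with open(cache_path, 'rb') as f:
            obj = pickle.load(f)
    else:
        obj = create_fn()
        with open(cache_path, 'wb') as f:
            pickle.dump(obj, f)
    return obj
```

Called twice with the same path, the second call ignores `create_fn` entirely and returns the pickled result of the first call, which is what keeps the file-ordering stable across runs, as needed for Transfer Learning.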
--------------------------------------------------------------------------------
/mnist.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Downloads the MNIST data-set for recognizing hand-written digits.
4 | #
5 | # Implemented in Python 3.6
6 | #
7 | # Usage:
8 | # 1) Create a new object instance: data = MNIST(data_dir="data/MNIST/")
9 | # This automatically downloads the files to the given dir.
10 | # 2) Use the training-set as data.x_train, data.y_train and data.y_train_cls
11 | # 3) Get random batches of training data using data.random_batch()
12 | # 4) Use the test-set as data.x_test, data.y_test and data.y_test_cls
13 | #
14 | ########################################################################
15 | #
16 | # This file is part of the TensorFlow Tutorials available at:
17 | #
18 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
19 | #
20 | # Published under the MIT License. See the file LICENSE for details.
21 | #
22 | # Copyright 2016-18 by Magnus Erik Hvass Pedersen
23 | #
24 | ########################################################################
25 |
26 | import numpy as np
27 | import gzip
28 | import os
29 | from dataset import one_hot_encoded
30 | from download import download
31 |
32 | ########################################################################
33 |
34 | # Base URL for downloading the data-files from the internet.
35 | base_url = "https://storage.googleapis.com/cvdf-datasets/mnist/"
36 |
37 | # Filenames for the data-set.
38 | filename_x_train = "train-images-idx3-ubyte.gz"
39 | filename_y_train = "train-labels-idx1-ubyte.gz"
40 | filename_x_test = "t10k-images-idx3-ubyte.gz"
41 | filename_y_test = "t10k-labels-idx1-ubyte.gz"
42 |
43 | ########################################################################
44 |
45 |
46 | class MNIST:
47 | """
48 | The MNIST data-set for recognizing hand-written digits.
49 | This automatically downloads the data-files if they do
50 | not already exist in the local data_dir.
51 |
52 | Note: Pixel-values are floats between 0.0 and 1.0.
53 | """
54 |
55 | # The images are 28 pixels in each dimension.
56 | img_size = 28
57 |
58 | # The images are stored in one-dimensional arrays of this length.
59 | img_size_flat = img_size * img_size
60 |
61 | # Tuple with height and width of images used to reshape arrays.
62 | img_shape = (img_size, img_size)
63 |
64 | # Number of colour channels for the images: 1 channel for gray-scale.
65 | num_channels = 1
66 |
67 | # Tuple with height, width and depth used to reshape arrays.
68 | # This is used for reshaping in Keras.
69 | img_shape_full = (img_size, img_size, num_channels)
70 |
71 | # Number of classes, one class for each of 10 digits.
72 | num_classes = 10
73 |
74 | def __init__(self, data_dir="data/MNIST/"):
75 | """
76 | Load the MNIST data-set. Automatically downloads the files
77 | if they do not already exist locally.
78 |
79 | :param data_dir: Base-directory for downloading files.
80 | """
81 |
82 | # Copy args to self.
83 | self.data_dir = data_dir
84 |
85 | # Number of images in each sub-set.
86 | self.num_train = 55000
87 | self.num_val = 5000
88 | self.num_test = 10000
89 |
90 | # Download / load the training-set.
91 | x_train = self._load_images(filename=filename_x_train)
92 | y_train_cls = self._load_cls(filename=filename_y_train)
93 |
94 | # Split the training-set into train / validation.
95 | # Pixel-values are converted from ints between 0 and 255
96 | # to floats between 0.0 and 1.0.
97 | self.x_train = x_train[0:self.num_train] / 255.0
98 | self.x_val = x_train[self.num_train:] / 255.0
99 | self.y_train_cls = y_train_cls[0:self.num_train]
100 | self.y_val_cls = y_train_cls[self.num_train:]
101 |
102 | # Download / load the test-set.
103 | self.x_test = self._load_images(filename=filename_x_test) / 255.0
104 | self.y_test_cls = self._load_cls(filename=filename_y_test)
105 |
106 |         # Convert the class-numbers from bytes to ints, as that is
107 |         # needed in some places in TensorFlow.
108 |         self.y_train_cls = self.y_train_cls.astype(int)
109 |         self.y_val_cls = self.y_val_cls.astype(int)
110 |         self.y_test_cls = self.y_test_cls.astype(int)
111 |
112 | # Convert the integer class-numbers into one-hot encoded arrays.
113 | self.y_train = one_hot_encoded(class_numbers=self.y_train_cls,
114 | num_classes=self.num_classes)
115 | self.y_val = one_hot_encoded(class_numbers=self.y_val_cls,
116 | num_classes=self.num_classes)
117 | self.y_test = one_hot_encoded(class_numbers=self.y_test_cls,
118 | num_classes=self.num_classes)
119 |
120 | def _load_data(self, filename, offset):
121 | """
122 | Load the data in the given file. Automatically downloads the file
123 | if it does not already exist in the data_dir.
124 |
125 | :param filename: Name of the data-file.
126 | :param offset: Start offset in bytes when reading the data-file.
127 | :return: The data as a numpy array.
128 | """
129 |
130 | # Download the file from the internet if it does not exist locally.
131 | download(base_url=base_url, filename=filename, download_dir=self.data_dir)
132 |
133 | # Read the data-file.
134 | path = os.path.join(self.data_dir, filename)
135 | with gzip.open(path, 'rb') as f:
136 | data = np.frombuffer(f.read(), np.uint8, offset=offset)
137 |
138 | return data
139 |
140 | def _load_images(self, filename):
141 | """
142 | Load image-data from the given file.
143 | Automatically downloads the file if it does not exist locally.
144 |
145 | :param filename: Name of the data-file.
146 | :return: Numpy array.
147 | """
148 |
149 | # Read the data as one long array of bytes.
150 | data = self._load_data(filename=filename, offset=16)
151 |
152 | # Reshape to 2-dim array with shape (num_images, img_size_flat).
153 | images_flat = data.reshape(-1, self.img_size_flat)
154 |
155 | return images_flat
156 |
157 | def _load_cls(self, filename):
158 | """
159 | Load class-numbers from the given file.
160 | Automatically downloads the file if it does not exist locally.
161 |
162 | :param filename: Name of the data-file.
163 | :return: Numpy array.
164 | """
165 | return self._load_data(filename=filename, offset=8)
166 |
167 | def random_batch(self, batch_size=32):
168 | """
169 | Create a random batch of training-data.
170 |
171 | :param batch_size: Number of images in the batch.
172 | :return: 3 numpy arrays (x, y, y_cls)
173 | """
174 |
175 | # Create a random index into the training-set.
176 | idx = np.random.randint(low=0, high=self.num_train, size=batch_size)
177 |
178 | # Use the index to lookup random training-data.
179 | x_batch = self.x_train[idx]
180 | y_batch = self.y_train[idx]
181 | y_batch_cls = self.y_train_cls[idx]
182 |
183 | return x_batch, y_batch, y_batch_cls
184 |
185 |
186 | ########################################################################
187 |
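The `random_batch()` method above relies on NumPy "fancy indexing": a single random index-array selects matching rows from the images, the one-hot labels and the class-numbers at the same time. A self-contained sketch with toy arrays standing in for the real MNIST data (the `np.eye()` trick here stands in for the `one_hot_encoded()` helper from `dataset.py`):

```python
import numpy as np

rng = np.random.default_rng(seed=0)

# Toy stand-ins for x_train, y_train and y_train_cls.
num_train = 10
num_classes = 3
x_train = np.arange(num_train * 4).reshape(num_train, 4)  # 10 tiny "images"
y_train_cls = np.arange(num_train) % num_classes          # class-numbers
y_train = np.eye(num_classes)[y_train_cls]                # one-hot labels

def random_batch(batch_size=4):
    # A single random index-array keeps the images, the one-hot
    # labels and the class-numbers aligned across all three arrays.
    idx = rng.integers(low=0, high=num_train, size=batch_size)
    return x_train[idx], y_train[idx], y_train_cls[idx]

x_batch, y_batch, y_batch_cls = random_batch(batch_size=4)
```

Because the same `idx` is applied to every array, row i of `x_batch` always matches row i of `y_batch` and element i of `y_batch_cls`.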
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | ################################################################
2 | #
3 | # Python package requirements for the TensorFlow Tutorials:
4 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
5 | #
6 | # If you are using Anaconda then you can install all required
7 | # Python packages by running the following commands in a shell:
8 | #
9 | # conda create --name tf python=3
10 | # source activate tf
11 | # pip install -r requirements.txt
12 | #
13 | # Note that you may have to edit this file to uncomment the
14 | # optional packages needed for some of the tutorials.
15 | #
16 | ################################################################
17 | # Basic packages used in many of the tutorials.
18 |
19 | numpy
20 | scipy
21 | jupyter
22 | matplotlib
23 | Pillow
24 | scikit-learn
25 |
26 | ################################################################
27 | # TensorFlow v.2.1 and above include both CPU and GPU versions.
28 |
29 | tensorflow
30 |
31 | ################################################################
32 | # Some tutorials use other individual Python packages.
33 | # Uncomment the relevant lines for the tutorials you want to run.
34 |
35 | # gym[atari] # Tutorial #16 on Reinforcement Learning.
36 | # pandas # Tutorial #23 on Time-Series Prediction.
37 |
38 | ################################################################
39 | # PrettyTensor was used as the builder API for several of the
40 | # earlier tutorials. PrettyTensor is apparently no longer being
41 | # maintained and may not work with newer versions of TensorFlow.
42 |
43 | # prettytensor
44 |
45 | ################################################################
46 |
--------------------------------------------------------------------------------
/vgg16.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # The pre-trained VGG16 Model for TensorFlow.
4 | #
5 | # This model seems to produce better-looking images in Style Transfer
6 | # than the Inception 5h model that otherwise works well for DeepDream.
7 | #
8 | # See the Python Notebook for Tutorial #15 for an example usage.
9 | #
10 | # Implemented in Python 3.5 with TensorFlow v0.11.0rc0
11 | #
12 | ########################################################################
13 | #
14 | # This file is part of the TensorFlow Tutorials available at:
15 | #
16 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
17 | #
18 | # Published under the MIT License. See the file LICENSE for details.
19 | #
20 | # Copyright 2016 by Magnus Erik Hvass Pedersen
21 | #
22 | ########################################################################
23 |
24 | import numpy as np
25 | import tensorflow as tf
26 | import download
27 | import os
28 |
29 | ########################################################################
30 | # Various directories and file-names.
31 |
32 | # The pre-trained VGG16 model is taken from this tutorial:
33 | # https://github.com/pkmital/CADL/blob/master/session-4/libs/vgg16.py
34 |
35 | # The class-names are available in the following URL:
36 | # https://s3.amazonaws.com/cadl/models/synset.txt
37 |
38 | # Internet URL for the file with the VGG16 model.
39 | # Note that this might change in the future and will need to be updated.
40 | data_url = "https://s3.amazonaws.com/cadl/models/vgg16.tfmodel"
41 |
42 | # Directory to store the downloaded data.
43 | data_dir = "vgg16/"
44 |
45 | # File containing the TensorFlow graph definition. (Downloaded)
46 | path_graph_def = "vgg16.tfmodel"
47 |
48 | ########################################################################
49 |
50 |
51 | def maybe_download():
52 | """
53 | Download the VGG16 model from the internet if it does not already
54 | exist in the data_dir. WARNING! The file is about 550 MB.
55 | """
56 |
57 | print("Downloading VGG16 Model ...")
58 |
59 | # The file on the internet is not stored in a compressed format.
60 | # This function will not extract the file because it does not have
61 | # a relevant filename-extension such as .zip or .tar.gz
62 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
63 |
64 |
65 | ########################################################################
66 |
67 |
68 | class VGG16:
69 | """
70 | The VGG16 model is a Deep Neural Network which has already been
71 | trained for classifying images into 1000 different categories.
72 |
73 | When you create a new instance of this class, the VGG16 model
74 | will be loaded and can be used immediately without training.
75 | """
76 |
77 | # Name of the tensor for feeding the input image.
78 | tensor_name_input_image = "images:0"
79 |
80 |     # Names of the tensors for the dropout random-values.
81 | tensor_name_dropout = 'dropout/random_uniform:0'
82 | tensor_name_dropout1 = 'dropout_1/random_uniform:0'
83 |
84 | # Names for the convolutional layers in the model for use in Style Transfer.
85 | layer_names = ['conv1_1/conv1_1', 'conv1_2/conv1_2',
86 | 'conv2_1/conv2_1', 'conv2_2/conv2_2',
87 | 'conv3_1/conv3_1', 'conv3_2/conv3_2', 'conv3_3/conv3_3',
88 | 'conv4_1/conv4_1', 'conv4_2/conv4_2', 'conv4_3/conv4_3',
89 | 'conv5_1/conv5_1', 'conv5_2/conv5_2', 'conv5_3/conv5_3']
90 |
91 | def __init__(self):
92 | # Now load the model from file. The way TensorFlow
93 | # does this is confusing and requires several steps.
94 |
95 | # Create a new TensorFlow computational graph.
96 | self.graph = tf.Graph()
97 |
98 | # Set the new graph as the default.
99 | with self.graph.as_default():
100 |
101 |             # TensorFlow graphs are saved to disk as so-called Protocol
102 |             # Buffers (aka. proto-bufs), a file-format that works on
103 |             # multiple platforms. In this case it is saved as a binary file.
104 |
105 | # Open the graph-def file for binary reading.
106 | path = os.path.join(data_dir, path_graph_def)
107 | with tf.gfile.FastGFile(path, 'rb') as file:
108 | # The graph-def is a saved copy of a TensorFlow graph.
109 | # First we need to create an empty graph-def.
110 | graph_def = tf.GraphDef()
111 |
112 | # Then we load the proto-buf file into the graph-def.
113 | graph_def.ParseFromString(file.read())
114 |
115 | # Finally we import the graph-def to the default TensorFlow graph.
116 | tf.import_graph_def(graph_def, name='')
117 |
118 | # Now self.graph holds the VGG16 model from the proto-buf file.
119 |
120 | # Get a reference to the tensor for inputting images to the graph.
121 | self.input = self.graph.get_tensor_by_name(self.tensor_name_input_image)
122 |
123 | # Get references to the tensors for the commonly used layers.
124 | self.layer_tensors = [self.graph.get_tensor_by_name(name + ":0") for name in self.layer_names]
125 |
126 | def get_layer_tensors(self, layer_ids):
127 | """
128 |         Return a list of references to the tensors for the layers with the given ids.
129 | """
130 |
131 | return [self.layer_tensors[idx] for idx in layer_ids]
132 |
133 | def get_layer_names(self, layer_ids):
134 | """
135 |         Return a list of names for the layers with the given ids.
136 | """
137 |
138 | return [self.layer_names[idx] for idx in layer_ids]
139 |
140 | def get_all_layer_names(self, startswith=None):
141 | """
142 | Return a list of all the layers (operations) in the graph.
143 | The list can be filtered for names that start with the given string.
144 | """
145 |
146 | # Get a list of the names for all layers (operations) in the graph.
147 | names = [op.name for op in self.graph.get_operations()]
148 |
149 | # Filter the list of names so we only get those starting with
150 | # the given string.
151 | if startswith is not None:
152 | names = [name for name in names if name.startswith(startswith)]
153 |
154 | return names
155 |
156 | def create_feed_dict(self, image):
157 | """
158 | Create and return a feed-dict with an image.
159 |
160 | :param image:
161 | The input image is a 3-dim array which is already decoded.
162 | The pixels MUST be values between 0 and 255 (float or int).
163 |
164 | :return:
165 | Dict for feeding to the graph in TensorFlow.
166 | """
167 |
168 | # Expand 3-dim array to 4-dim by prepending an 'empty' dimension.
169 | # This is because we are only feeding a single image, but the
170 | # VGG16 model was built to take multiple images as input.
171 | image = np.expand_dims(image, axis=0)
172 |
173 | if False:
174 | # In the original code using this VGG16 model, the random values
175 | # for the dropout are fixed to 1.0.
176 |             # Experiments suggest that it does not matter for
177 | # Style Transfer, and this causes an error with a GPU.
178 | dropout_fix = 1.0
179 |
180 | # Create feed-dict for inputting data to TensorFlow.
181 | feed_dict = {self.tensor_name_input_image: image,
182 | self.tensor_name_dropout: [[dropout_fix]],
183 | self.tensor_name_dropout1: [[dropout_fix]]}
184 | else:
185 | # Create feed-dict for inputting data to TensorFlow.
186 | feed_dict = {self.tensor_name_input_image: image}
187 |
188 | return feed_dict
189 |
190 | ########################################################################
191 |
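The `create_feed_dict()` method above turns a single image into a batch of one via `np.expand_dims`. A self-contained sketch of just that shape-manipulation (random pixels standing in for a real image; the 224x224 size is only illustrative):

```python
import numpy as np

# A single RGB "image" with pixel-values between 0 and 255,
# as create_feed_dict() requires.
image = np.random.uniform(low=0.0, high=255.0, size=(224, 224, 3))

# Prepend an 'empty' batch-dimension, turning the 3-dim array
# into a 4-dim batch containing one image, because the model's
# input tensor expects a batch of images.
batch = np.expand_dims(image, axis=0)

assert batch.shape == (1, 224, 224, 3)
```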
--------------------------------------------------------------------------------
/weather.py:
--------------------------------------------------------------------------------
1 | ########################################################################
2 | #
3 | # Functions for downloading and re-sampling weather-data
4 | # for 5 cities in Denmark between 1980-2018.
5 | #
6 | # The raw data was obtained from:
7 | #
8 | # National Climatic Data Center (NCDC) in USA
9 | # https://www7.ncdc.noaa.gov/CDO/cdoselect.cmd
10 | #
11 | # Note that the NCDC's database functionality may change soon, and
12 | # that the CSV-file needed some manual editing before it could be read.
13 | # See the function _convert_raw_data() below for inspiration if you
14 | # want to convert a new data-file from NCDC's database.
15 | #
16 | # Implemented in Python 3.6
17 | #
18 | # Usage:
19 | # 1) Set the desired storage directory in the data_dir variable.
20 | # 2) Call maybe_download_and_extract() to download the data-set
21 | # if it is not already located in the given data_dir.
22 | # 3) Either call load_original_data() or load_resampled_data()
23 | # to load the original or resampled data for use in your program.
24 | #
25 | # Format:
26 | # The raw data-file from NCDC is not included in the downloaded archive,
27 | # which instead contains a cleaned-up version of the raw data-file
28 | # referred to as the "original data". This data has not yet been resampled.
29 | # The original data-file is available as a pickled file for fast reloading
30 | # with Pandas, and as a CSV-file for broad compatibility.
31 | #
32 | ########################################################################
33 | #
34 | # This file is part of the TensorFlow Tutorials available at:
35 | #
36 | # https://github.com/Hvass-Labs/TensorFlow-Tutorials
37 | #
38 | # Published under the MIT License. See the file LICENSE for details.
39 | #
40 | # Copyright 2018 by Magnus Erik Hvass Pedersen
41 | #
42 | ########################################################################
43 |
44 | import pandas as pd
45 | import os
46 | import download
47 |
48 | ########################################################################
49 |
50 | # Directory where you want to download and save the data-set.
51 | # Set this before you start calling any of the functions below.
52 | data_dir = "data/weather-denmark/"
53 |
54 |
55 | # Full path for the pickled data-file. (Original data).
56 | def path_original_data_pickle():
57 | return os.path.join(data_dir, "weather-denmark.pkl")
58 |
59 |
60 | # Full path for the comma-separated text-file. (Original data).
61 | def path_original_data_csv():
62 | return os.path.join(data_dir, "weather-denmark.csv")
63 |
64 |
65 | # Full path for the resampled data as a pickled file.
66 | def path_resampled_data_pickle():
67 | return os.path.join(data_dir, "weather-denmark-resampled.pkl")
68 |
69 |
70 | # URL for the data-set on the internet.
71 | data_url = "https://github.com/Hvass-Labs/weather-denmark/raw/master/weather-denmark.tar.gz"
72 |
73 |
74 | # List of the cities in this data-set. These are cities in Denmark.
75 | cities = ['Aalborg', 'Aarhus', 'Esbjerg', 'Odense', 'Roskilde']
76 |
77 |
78 | ########################################################################
79 | # Private helper-functions.
80 |
81 |
82 | def _date_string(x):
83 | """Convert two integers to a string for the date and time."""
84 |
85 | date = x[0] # Date. Example: 19801231
86 | time = x[1] # Time. Example: 1230
87 |
88 | return "{0}{1:04d}".format(date, time)
89 |
90 |
91 | def _usaf_to_city(usaf):
92 | """
93 | The raw data-file uses USAF-codes to identify weather-stations.
94 | If you download another data-set from NCDC then you will have to
95 | change this function to use the USAF-codes in your new data-file.
96 | """
97 |
98 | table = \
99 | {
100 | 60300: 'Aalborg',
101 | 60700: 'Aarhus',
102 | 60800: 'Esbjerg',
103 | 61200: 'Odense',
104 | 61700: 'Roskilde'
105 | }
106 |
107 | return table[usaf]
108 |
109 |
110 | def _convert_raw_data(path):
111 | """
112 | This converts a raw data-file obtained from the NCDC database.
113 | This function may be useful as an inspiration if you want to
114 | download another raw data-file from NCDC, but you will have
115 | to modify this function to match the data you have downloaded.
116 |
117 | Note that you may also have to manually edit the raw data-file,
118 | e.g. because the header is not in a proper comma-separated format.
119 | """
120 |
121 | # The raw CSV-file uses various markers for "not-available" (NA).
122 | # (This is one of several oddities with NCDC's file-format.)
123 | na_values = ['999', '999.0', '999.9', '9999.9']
124 |
125 | # Use Pandas to load the comma-separated file.
126 | # Note that you may have to manually edit the file's header
127 | # to get this to load correctly.
128 | df_raw = pd.read_csv(path, sep=',', header=1,
129 | index_col=False, na_values=na_values)
130 |
131 | # Create a new data-frame containing only the data
132 | # we are interested in.
133 | df = pd.DataFrame()
134 |
135 | # Get the city-name / weather-station name from the USAF code.
136 | df['City'] = df_raw['USAF '].apply(_usaf_to_city)
137 |
138 | # Convert the integer date-time to a proper date-time object.
139 | datestr = df_raw[['Date ', 'HrMn']].apply(_date_string, axis=1)
140 | df['DateTime'] = pd.to_datetime(datestr, format='%Y%m%d%H%M')
141 |
142 | # Get the data we are interested in.
143 | df['Temp'] = df_raw['Temp ']
144 | df['Pressure'] = df_raw['Slp ']
145 | df['WindSpeed'] = df_raw['Spd ']
146 | df['WindDir'] = df_raw['Dir']
147 |
148 | # Set the city-name and date-time as the index.
149 | df.set_index(['City', 'DateTime'], inplace=True)
150 |
151 | # Save the new data-frame as a pickle for fast reloading.
152 | df.to_pickle(path_original_data_pickle())
153 |
154 | # Save the new data-frame as a CSV-file for general readability.
155 | df.to_csv(path_original_data_csv())
156 |
157 | return df
158 |
159 |
160 | def _resample(df):
161 | """
162 | Resample the contents of a Pandas data-frame by first
163 | removing empty rows and columns, then up-sampling and
164 | interpolating the data for 1-minute intervals, and
165 | finally down-sampling to 60-minute intervals.
166 | """
167 |
168 | # Remove all empty rows.
169 | df_res = df.dropna(how='all')
170 |
171 | # Upsample so the time-series has data for every minute.
172 | df_res = df_res.resample('1T')
173 |
174 | # Fill in missing values.
175 | df_res = df_res.interpolate(method='time')
176 |
177 | # Downsample so the time-series has data for every hour.
178 | df_res = df_res.resample('60T')
179 |
180 | # Finalize the resampling. (Is this really necessary?)
181 | df_res = df_res.interpolate()
182 |
183 | # Remove all empty rows.
184 | df_res = df_res.dropna(how='all')
185 |
186 | return df_res
187 |
188 |
189 | ########################################################################
190 | # Public functions that you may call to download the data-set from
191 | # the internet and load the data into memory.
192 |
193 |
194 | def maybe_download_and_extract():
195 | """
196 | Download and extract the weather-data if the data-files don't
197 | already exist in the data_dir.
198 | """
199 |
200 | download.maybe_download_and_extract(url=data_url, download_dir=data_dir)
201 |
202 |
203 | def load_original_data():
204 | """
205 | Load and return the original data that has not been resampled.
206 |
207 | Note that this is not the raw data obtained from NCDC.
208 | It is a cleaned-up version of that data, as written by the
209 | function _convert_raw_data() above.
210 | """
211 |
212 | return pd.read_pickle(path_original_data_pickle())
213 |
214 |
215 | def load_resampled_data():
216 | """
217 | Load and return the resampled weather-data.
218 |
219 | This has data-points at regular 60-minute intervals where
220 | missing data has been linearly interpolated.
221 |
222 | This uses a cache-file for saving and quickly reloading the data,
223 | so the original data is only resampled once.
224 | """
225 |
226 | # Path for the cache-file with the resampled data.
227 | path = path_resampled_data_pickle()
228 |
229 | # If the cache-file exists ...
230 | if os.path.exists(path):
231 | # Reload the cache-file.
232 | df = pd.read_pickle(path)
233 | else:
234 | # Otherwise resample the original data and save it in a cache-file.
235 |
236 | # Load the original data.
237 | df_org = load_original_data()
238 |
239 | # Split the original data into separate data-frames for each city.
240 | df_cities = [df_org.xs(city) for city in cities]
241 |
242 | # Resample the data for each city.
243 | df_resampled = [_resample(df_city) for df_city in df_cities]
244 |
245 | # Join the resampled data into a single data-frame.
246 | df = pd.concat(df_resampled, keys=cities,
247 | axis=1, join='inner')
248 |
249 | # Save the resampled data in a cache-file for quick reloading.
250 | df.to_pickle(path)
251 |
252 | return df
253 |
254 |
255 | ########################################################################
256 |
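The up- and down-sampling in `_resample()` can be sketched with a tiny synthetic time-series. This is a hedged, self-contained example using pandas only (no weather-data); it follows the same steps but uses `.asfreq()` explicitly before interpolating, and the newer `'min'` frequency alias in place of the `'T'` used above:

```python
import numpy as np
import pandas as pd

# Synthetic temperature readings at irregular times, with one
# missing value, standing in for the real weather-data.
index = pd.to_datetime(['2018-01-01 00:00', '2018-01-01 00:30',
                        '2018-01-01 02:00', '2018-01-01 03:00'])
df = pd.DataFrame({'Temp': [0.0, 1.0, np.nan, 6.0]}, index=index)

# Remove all empty rows.
df_res = df.dropna(how='all')

# Upsample to 1-minute intervals and fill the gaps by
# time-weighted linear interpolation.
df_res = df_res.resample('1min').asfreq().interpolate(method='time')

# Downsample to 60-minute intervals. The 1-minute series already
# has a value at every hour-mark, so this just selects those rows.
df_res = df_res.resample('60min').asfreq()

# Remove any empty rows that remain.
df_res = df_res.dropna(how='all')
```

The missing reading at 02:00 is dropped, then re-created by interpolation: the result has one row per hour with the 01:00 and 02:00 temperatures interpolated between the 00:30 and 03:00 readings.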
--------------------------------------------------------------------------------