├── .gitignore
├── 01 - Regression
│   ├── -- TensorBoard.ipynb
│   ├── 00.0 - TensorFlow Version Update.ipynb
│   ├── 01.0 - Regression Data Generation.ipynb
│   ├── 02.0 - TF Regression Model - Estimator APIs + Pandas.ipynb
│   ├── 03.0 - TF Regression Model - Experiment APIs + CSV Files.ipynb
│   ├── 04.0 - TF Regression Model - Dataset Input + JSON Serving.ipynb
│   ├── 04.0 - TF Regression Model - Dataset Input.ipynb
│   ├── 05.0 - TF Regression Model - Custom Estimator.ipynb
│   ├── 06.0 - Convert CSV to TFRecords.ipynb
│   ├── 07.0 - TF Regression Model - DNN Wide & Deep + estimator.train_and_evaluate.ipynb
│   ├── 08.0 - TF Regression Example - Housing Price Estimation + Features Scaling.ipynb
│   └── data
│       ├── housingdata.csv
│       ├── new-data.csv
│       ├── new-data.json
│       ├── test-data.csv
│       ├── test-data.tfrecords
│       ├── train-data.csv
│       ├── train-data.tfrecords
│       ├── valid-data.csv
│       └── valid-data.tfrecords
├── 02 - Classification
│   ├── -- TensorBoard.ipynb
│   ├── 00.0 - TensorFlow Version Update.ipynb
│   ├── 01.0 - Classification Data Generation.ipynb
│   ├── 02.0 - Convert CSV to TFRecords.ipynb
│   ├── 03.0 - TF Classification Model - DNN Wide & Deep + Train_And_Evaluate + Dataset + TFRecords.ipynb
│   ├── 04.0 - TF Classification Model - Custom Estimator + Experiment + Dataset + CSV.ipynb
│   ├── 05.0 - Classification Example - Census Income Prediction.ipynb
│   ├── 06.0 - Classification Example - Census Income Prediction - Custom Estimator + Exponential Decay Learning Rate.ipynb
│   └── data
│       ├── adult.data.csv
│       ├── adult.stats.csv
│       ├── adult.test.csv
│       ├── test-data.csv
│       ├── test-data.tfrecords
│       ├── train-data.csv
│       ├── train-data.tfrecords
│       ├── valid-data.csv
│       └── valid-data.tfrecords
├── 03 - Clustering
│   ├── 00.0 - TensorFlow Version Update.ipynb
│   ├── 01.0 - Generate Data Points + SKLearn Clustering.ipynb
│   ├── 02.0 - TF k-means - Estimator API.ipynb
│   ├── 03.0 - TF k-means - Experiment API.ipynb
│   └── data
│       ├── new-data.csv
│       ├── test-data.csv
│       └── train-data.csv
├── 04 - Times Series
│   ├── 00.0 - Generate Time Series Data.ipynb
│   ├── 01.0 - TF ARRegressor - Estimator + Numpy.ipynb
│   ├── 02.0 - TF ARRegressor - Experiment + CSV.ipynb
│   └── data
│       ├── test-data.csv
│       ├── timeseries-multivariate.txt
│       ├── timeseries-univariate.csv
│       └── train-data.csv
├── 05 - Autoencoding
│   ├── 01.0 - Generate Dataset with High-Dimensionality.ipynb
│   ├── 02.0 - Dimensionality Reduction - Autoencoding + Custom Estimator.ipynb
│   ├── 03.0 - Dimensionality Reduction - Autoencoding + Normalizer + XEntropy Loss.ipynb
│   ├── 04.0 - Dimensionality Reduction - Autoencoding + Custom Estimator with MNIST.ipynb
│   └── data
│       └── data-01.csv
├── 06 - Sequence Models
│   ├── 01 - RNN with LSTM - Predicting the Next Values - Single Pattern.ipynb
│   ├── 02 - RNN with LSTM - Predicting the Next Values - Multiple Patterns.ipynb
│   ├── 03 - RNN with LSTM - Sequence Classification.ipynb
│   ├── TODO.txt
│   └── data
│       ├── seq01.test.csv
│       └── seq01.train.csv
├── 07 - Image Analysis
│   ├── 00.0 - TensorFlow Version Update.ipynb
│   ├── 01.0 - CNN Example with CIFAR-10 dataset.ipynb
│   ├── 02.0 - CNN Example with CIFAR-10 dataset using TFRecords.ipynb
│   └── 03.0 - CNN Example with CIFAR-10 (Keras ver.).ipynb
├── 08 - Text Analysis
│   ├── 01 - Text Classification - SMS Ham vs. Spam - Data Preparation.ipynb
│   ├── 02 - Text Classification - SMS Ham vs. Spam - Document Embedding.ipynb
│   ├── 03 - Text Classification - SMS Ham vs. Spam - Word Embeddings + CNN.ipynb
│   ├── 04 - Text Classification - SMS Ham vs. Spam - Word Embeddings + LSTM.ipynb
│   ├── 05 - Text Classification - Hacker News - End-to-End + TF-Hub Sentence Embedding.ipynb
│   ├── 06 - Part_1 - Text Classification - Hacker News - Data Preprocessing with TFT.ipynb
│   ├── 06 - Part_2 - Text Classification - Hacker News - DNNClassifier with TF-Hub Sentence Embedding.ipynb
│   ├── 06 - Part_3 - Text Classification - Hacker News - Custom Estimator Word Embedding.ipynb
│   ├── 06 - Part_4 - Text Classification - Hacker News - DNNClassifier with TF.IDF.ipynb
│   └── data
│       └── sms-spam
│           ├── SMSSpamCollection
│           ├── n_words.tsv
│           ├── train-data.tsv
│           ├── valid-data.tsv
│           └── vocab_list.tsv
├── README.md
└── images
    └── exp-api2.png
/.gitignore:
--------------------------------------------------------------------------------
1 | 01 - Regression/trained_models
2 | 01 - Regression/.ipynb_checkpoints
3 | 01 - Regression/.DS_Store
4 | 02 - Classification/trained_models
5 | 02 - Classification/.ipynb_checkpoints
6 | 02 - Classification/.DS_Store
7 | 03 - Clustering/trained_models
8 | 03 - Clustering/.ipynb_checkpoints
9 | 03 - Clustering/.DS_Store
10 | 04 - Time Series/trained_models
11 | 04 - Time Series/.ipynb_checkpoints
12 | 04 - Time Series/.DS_Store
13 | 05 - Autoencoding/trained_models
14 | 05 - Autoencoding/.ipynb_checkpoints
15 | 05 - Autoencoding/.DS_Store
16 | .ipynb_checkpoints
17 | .DS_Store
18 |
--------------------------------------------------------------------------------
/01 - Regression/-- TensorBoard.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "MODEL_NAME = 'reg-model-01'\n",
10 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n",
11 | "print(model_dir)"
12 | ]
13 | },
14 | {
15 | "cell_type": "markdown",
16 | "metadata": {},
17 | "source": [
18 | "## Start TensorBoard Process"
19 | ]
20 | },
21 | {
22 | "cell_type": "code",
23 | "execution_count": null,
24 | "metadata": {},
25 | "outputs": [],
26 | "source": [
27 | "from google.datalab.ml import TensorBoard\n",
28 | "TensorBoard().start(model_dir)\n",
29 | "TensorBoard().list()"
30 | ]
31 | },
32 | {
33 | "cell_type": "markdown",
34 | "metadata": {},
35 | "source": [
36 | "## Kill TensorBoard Process"
37 | ]
38 | },
39 | {
40 | "cell_type": "code",
41 | "execution_count": null,
42 | "metadata": {},
43 | "outputs": [],
44 | "source": [
45 | "# to stop TensorBoard\n",
46 | "TensorBoard().stop(23002)\n",
47 | "print('stopped TensorBoard')\n",
48 | "TensorBoard().list()"
49 | ]
50 | }
51 | ],
52 | "metadata": {
53 | "kernelspec": {
54 | "display_name": "Python 3",
55 | "language": "python",
56 | "name": "python3"
57 | },
58 | "language_info": {
59 | "codemirror_mode": {
60 | "name": "ipython",
61 | "version": 3
62 | },
63 | "file_extension": ".py",
64 | "mimetype": "text/x-python",
65 | "name": "python",
66 | "nbconvert_exporter": "python",
67 | "pygments_lexer": "ipython3",
68 | "version": "3.6.1"
69 | }
70 | },
71 | "nbformat": 4,
72 | "nbformat_minor": 2
73 | }
74 |
--------------------------------------------------------------------------------
/01 - Regression/00.0 - TensorFlow Version Update.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stdout",
10 | "output_type": "stream",
11 | "text": [
12 | "Collecting tensorflow\n",
13 | " Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n",
14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n",
15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n",
16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
18 | "Requirement already up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
19 | "Collecting enum34>=1.1.6 (from tensorflow)\n",
20 | " Downloading enum34-1.1.6-py3-none-any.whl\n",
21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n",
27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n",
28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n",
29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n",
30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n",
31 | " Found existing installation: tensorflow 1.3.0\n",
32 | " Uninstalling tensorflow-1.3.0:\n",
33 | " Successfully uninstalled tensorflow-1.3.0\n",
34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n"
35 | ]
36 | }
37 | ],
38 | "source": [
39 | "%%bash\n",
40 | "\n",
41 | "pip install -U tensorflow"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "name": "stderr",
51 | "output_type": "stream",
52 | "text": [
53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
54 | " return f(*args, **kwds)\n"
55 | ]
56 | },
57 | {
58 | "name": "stdout",
59 | "output_type": "stream",
60 | "text": [
61 | "1.4.0\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "import tensorflow as tf\n",
67 | "print(tf.__version__)"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {
74 | "collapsed": true
75 | },
76 | "outputs": [],
77 | "source": []
78 | }
79 | ],
80 | "metadata": {
81 | "kernelspec": {
82 | "display_name": "Python 3",
83 | "language": "python",
84 | "name": "python3"
85 | },
86 | "language_info": {
87 | "codemirror_mode": {
88 | "name": "ipython",
89 | "version": 3
90 | },
91 | "file_extension": ".py",
92 | "mimetype": "text/x-python",
93 | "name": "python",
94 | "nbconvert_exporter": "python",
95 | "pygments_lexer": "ipython3",
96 | "version": "3.6.1"
97 | }
98 | },
99 | "nbformat": 4,
100 | "nbformat_minor": 2
101 | }
102 |
--------------------------------------------------------------------------------
/01 - Regression/01.0 - Regression Data Generation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n",
13 | " \"This module will be removed in 0.20.\", DeprecationWarning)\n",
14 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.\n",
15 | " DeprecationWarning)\n",
16 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20\n",
17 | " DeprecationWarning)\n"
18 | ]
19 | }
20 | ],
21 | "source": [
22 | "import numpy as np\n",
23 | "import pandas as pd\n",
24 | "from sklearn import *\n",
25 | "import matplotlib.pyplot as plt\n",
26 | "%matplotlib inline"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": 2,
32 | "metadata": {
33 | "collapsed": true
34 | },
35 | "outputs": [],
36 | "source": [
37 | "sample_size = 5000"
38 | ]
39 | },
40 | {
41 | "cell_type": "code",
42 | "execution_count": 3,
43 | "metadata": {
44 | "collapsed": true
45 | },
46 | "outputs": [],
47 | "source": [
48 | "\n",
49 | "data1,target1 = datasets.make_circles(n_samples=sample_size, factor=.1, noise=0.2)\n",
50 | "target1 = (3*data1[:,0])-(16*data1[:,1]) + (0.5*data1[:,0]*data1[:,1]) + np.random.normal(0,2,size=sample_size)\n",
51 | "\n",
52 | "\n",
53 | "data2,target2 = datasets.make_circles(n_samples=sample_size, factor=.5, noise=0.2)\n",
54 | "target2 = np.power(data2[:,0],2) + 10*np.power(data2[:,1],3) + (50*data2[:,0]*np.power(data2[:,1],2)) + np.random.normal(0,2,size=sample_size)\n",
55 | "\n",
56 | "data3,target3 = datasets.make_moons(n_samples=sample_size,noise=0.2)\n",
57 | "data3[:,0] = (2 * (data3[:, 0]-(-1))/(3))-1\n",
58 | "data3[:,1] = (2 * (data3[:, 1]-(-1))/(2))-1\n",
59 | "target3 = (50*data3[:,0]*np.sin(data3[:,1])) + (50*data3[:,1]*np.cos(data3[:,0]))\n",
60 | "\n",
61 | "data4,target4 = datasets.make_moons(n_samples=sample_size,noise=0.2)\n",
62 | "\n",
63 | "temp = np.copy(data4[:, 0])\n",
64 | "data4[:, 0] = data4[:, 1]\n",
65 | "data4[:, 1] = temp\n",
66 | "data4[:,0] = (2 * (data4[:, 0]-(-1))/(2))-1\n",
67 | "data4[:,1] = (2 * (data4[:, 1]-(-1))/(3))-1\n",
68 | "\n",
69 | "target4 = (30*data1[:,0])-(16*data1[:,1]) - (1.5*data1[:,0]*data1[:,1]) + np.random.normal(0,1,size=sample_size)"
70 | ]
71 | },
72 | {
73 | "cell_type": "code",
74 | "execution_count": 4,
75 | "metadata": {
76 | "collapsed": true
77 | },
78 | "outputs": [],
79 | "source": [
80 | "data = np.concatenate((data1, data2, data3, data4), axis=0)\n",
81 | "target = np.concatenate((target1,target2,target3,target4),axis=0)\n",
82 | "alpha = np.concatenate((np.zeros(sample_size),np.ones(sample_size),np.zeros(sample_size),np.ones(sample_size)), axis=0)\n",
83 | "beta = np.concatenate((np.zeros(sample_size),np.zeros(sample_size),np.ones(sample_size),np.ones(sample_size)), axis=0)"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 5,
89 | "metadata": {
90 | "collapsed": true
91 | },
92 | "outputs": [],
93 | "source": [
94 | "data_frame = pd.DataFrame(data = data,columns=[\"x\",\"y\"])\n",
95 | "data_frame[\"alpha\"] = pd.Series(alpha).map(lambda v: 'ax01' if v==0 else 'ax02')\n",
96 | "data_frame[\"beta\"] = pd.Series(beta).map(lambda v: 'bx01' if v==0 else 'bx02')\n",
97 | "data_frame[\"target\"] = target"
98 | ]
99 | },
100 | {
101 | "cell_type": "code",
102 | "execution_count": 6,
103 | "metadata": {},
104 | "outputs": [
105 | {
106 | "data": {
107 | "text/html": [
108 | "
\n",
109 | "\n",
122 | "
\n",
123 | " \n",
124 | " \n",
125 | " | \n",
126 | " x | \n",
127 | " y | \n",
128 | " target | \n",
129 | "
\n",
130 | " \n",
131 | " \n",
132 | " \n",
133 | " count | \n",
134 | " 20000.000000 | \n",
135 | " 20000.000000 | \n",
136 | " 20000.000000 | \n",
137 | "
\n",
138 | " \n",
139 | " mean | \n",
140 | " 0.063032 | \n",
141 | " 0.061292 | \n",
142 | " 1.326481 | \n",
143 | "
\n",
144 | " \n",
145 | " std | \n",
146 | " 0.577148 | \n",
147 | " 0.577051 | \n",
148 | " 17.741681 | \n",
149 | "
\n",
150 | " \n",
151 | " min | \n",
152 | " -1.567981 | \n",
153 | " -1.578965 | \n",
154 | " -73.096282 | \n",
155 | "
\n",
156 | " \n",
157 | " 25% | \n",
158 | " -0.333928 | \n",
159 | " -0.334557 | \n",
160 | " -6.737629 | \n",
161 | "
\n",
162 | " \n",
163 | " 50% | \n",
164 | " 0.053508 | \n",
165 | " 0.053526 | \n",
166 | " 0.417512 | \n",
167 | "
\n",
168 | " \n",
169 | " 75% | \n",
170 | " 0.477157 | \n",
171 | " 0.475678 | \n",
172 | " 8.707335 | \n",
173 | "
\n",
174 | " \n",
175 | " max | \n",
176 | " 1.617511 | \n",
177 | " 1.724125 | \n",
178 | " 86.776134 | \n",
179 | "
\n",
180 | " \n",
181 | "
\n",
182 | "
"
183 | ],
184 | "text/plain": [
185 | " x y target\n",
186 | "count 20000.000000 20000.000000 20000.000000\n",
187 | "mean 0.063032 0.061292 1.326481\n",
188 | "std 0.577148 0.577051 17.741681\n",
189 | "min -1.567981 -1.578965 -73.096282\n",
190 | "25% -0.333928 -0.334557 -6.737629\n",
191 | "50% 0.053508 0.053526 0.417512\n",
192 | "75% 0.477157 0.475678 8.707335\n",
193 | "max 1.617511 1.724125 86.776134"
194 | ]
195 | },
196 | "execution_count": 6,
197 | "metadata": {},
198 | "output_type": "execute_result"
199 | }
200 | ],
201 | "source": [
202 | "data_frame.describe()"
203 | ]
204 | },
205 | {
206 | "cell_type": "code",
207 | "execution_count": 7,
208 | "metadata": {},
209 | "outputs": [
210 | {
211 | "name": "stdout",
212 | "output_type": "stream",
213 | "text": [
214 | "12000\n",
215 | "3000\n",
216 | "5000\n"
217 | ]
218 | }
219 | ],
220 | "source": [
221 | "distribution = ([0] * sample_size) + ([1] * sample_size) + ([2] * sample_size) + ([3] * sample_size)\n",
222 | "\n",
223 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)\n",
224 | "splits = list(splitter.split(X=data_frame.iloc[:,[0,1,2,3]],y=distribution))\n",
225 | "learn_index = splits[0][0]\n",
226 | "test_index = splits[0][1]\n",
227 | "\n",
228 | "learn_df = data_frame.iloc[learn_index,:]\n",
229 | "\n",
230 | "size2 = int(len(learn_df)/4)\n",
231 | "distribution2 = ([0] * size2) + ([1] * size2) + ([2] * size2) + ([3] * size2)\n",
232 | "\n",
233 | "\n",
234 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)\n",
235 | "splits = list(splitter.split(X=learn_df.iloc[:,[0,1,2,3]],y=distribution2))\n",
236 | "train_index = splits[0][0]\n",
237 | "valid_index = splits[0][1]\n",
238 | "\n",
239 | "\n",
240 | "train_df = learn_df.iloc[train_index,:]\n",
241 | "print(len(train_df))\n",
242 | "\n",
243 | "valid_df = learn_df.iloc[valid_index,:]\n",
244 | "print(len(valid_df))\n",
245 | "\n",
246 | "test_df = data_frame.iloc[test_index,:]\n",
247 | "print(len(test_df))\n"
248 | ]
249 | },
250 | {
251 | "cell_type": "code",
252 | "execution_count": 8,
253 | "metadata": {
254 | "collapsed": true
255 | },
256 | "outputs": [],
257 | "source": [
258 | "train_df.to_csv(path_or_buf=\"data/train-data.csv\", header=False, index=True)\n",
259 | "valid_df.to_csv(path_or_buf=\"data/valid-data.csv\", header=False, index=True)\n",
260 | "test_df.to_csv(path_or_buf=\"data/test-data.csv\", header=False, index=True)"
261 | ]
262 | },
263 | {
264 | "cell_type": "code",
265 | "execution_count": null,
266 | "metadata": {
267 | "collapsed": true
268 | },
269 | "outputs": [],
270 | "source": []
271 | }
272 | ],
273 | "metadata": {
274 | "kernelspec": {
275 | "display_name": "Python 3",
276 | "language": "python",
277 | "name": "python3"
278 | },
279 | "language_info": {
280 | "codemirror_mode": {
281 | "name": "ipython",
282 | "version": 3
283 | },
284 | "file_extension": ".py",
285 | "mimetype": "text/x-python",
286 | "name": "python",
287 | "nbconvert_exporter": "python",
288 | "pygments_lexer": "ipython3",
289 | "version": "3.6.1"
290 | }
291 | },
292 | "nbformat": 4,
293 | "nbformat_minor": 2
294 | }
295 |
--------------------------------------------------------------------------------
/01 - Regression/02.0 - TF Regression Model - Estimator APIs + Pandas.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
13 | " return f(*args, **kwds)\n"
14 | ]
15 | },
16 | {
17 | "name": "stdout",
18 | "output_type": "stream",
19 | "text": [
20 | "1.4.0\n"
21 | ]
22 | }
23 | ],
24 | "source": [
25 | "import tensorflow as tf\n",
26 | "import pandas as pd\n",
27 | "import numpy as np\n",
28 | "import shutil\n",
29 | "import math\n",
30 | "import multiprocessing\n",
31 | "from datetime import datetime\n",
32 | "from tensorflow.python.feature_column import feature_column\n",
33 | "print(tf.__version__)"
34 | ]
35 | },
36 | {
37 | "cell_type": "markdown",
38 | "metadata": {},
39 | "source": [
40 | "## Steps to use the TF Estimator APIs\n",
41 | "1. Define dataset **metadata**\n",
42 | "2. Define **data input function** to read the data from Pandas dataframe + **apply feature processing**\n",
43 | "3. Create TF **feature columns** based on metadata + **extended feature columns**\n",
44 | "4. Instantiate an **estimator** with the required **feature columns & parameters**\n",
45 | "5. **Train** estimator using training data\n",
46 | "6. **Evaluate** estimator using test data\n",
47 | "7. Perform **predictions**\n",
48 | "8. **Save & Serve** the estimator"
49 | ]
50 | },
51 | {
52 | "cell_type": "code",
53 | "execution_count": 2,
54 | "metadata": {
55 | "collapsed": true
56 | },
57 | "outputs": [],
58 | "source": [
59 | "MODEL_NAME = 'reg-model-01'\n",
60 | "\n",
61 | "TRAIN_DATA_FILE = 'data/train-data.csv'\n",
62 | "VALID_DATA_FILE = 'data/valid-data.csv'\n",
63 | "TEST_DATA_FILE = 'data/test-data.csv'\n",
64 | "\n",
65 | "RESUME_TRAINING = False\n",
66 | "PROCESS_FEATURES = True\n",
67 | "MULTI_THREADING = False"
68 | ]
69 | },
70 | {
71 | "cell_type": "markdown",
72 | "metadata": {},
73 | "source": [
74 | "## 1. Define Dataset Metadata\n",
75 | "* CSV file header and defaults\n",
76 | "* Numeric and categorical feature names\n",
77 | "* Target feature name\n",
78 | "* Unused columns"
79 | ]
80 | },
81 | {
82 | "cell_type": "code",
83 | "execution_count": 3,
84 | "metadata": {},
85 | "outputs": [
86 | {
87 | "name": "stdout",
88 | "output_type": "stream",
89 | "text": [
90 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n",
91 | "Numeric Features: ['x', 'y']\n",
92 | "Categorical Features: ['alpha', 'beta']\n",
93 | "Target: target\n",
94 | "Unused Features: ['key']\n"
95 | ]
96 | }
97 | ],
98 | "source": [
99 | "HEADER = ['key','x','y','alpha','beta','target']\n",
100 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], [0.0]]\n",
101 | "\n",
102 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n",
103 | "\n",
104 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n",
105 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n",
106 | "\n",
107 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n",
108 | "\n",
109 | "TARGET_NAME = 'target'\n",
110 | "\n",
111 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n",
112 | "\n",
113 | "print(\"Header: {}\".format(HEADER))\n",
114 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n",
115 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n",
116 | "print(\"Target: {}\".format(TARGET_NAME))\n",
117 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))"
118 | ]
119 | },
120 | {
121 | "cell_type": "markdown",
122 | "metadata": {},
123 | "source": [
124 | "## 2. Define Data Input Function\n",
125 | "* Input csv file name\n",
126 | "* Load pandas Dataframe\n",
127 | "* Apply feature processing\n",
128 | "* Return a function that returns (features, target) tensors"
129 | ]
130 | },
131 | {
132 | "cell_type": "code",
133 | "execution_count": 4,
134 | "metadata": {
135 | "collapsed": true
136 | },
137 | "outputs": [],
138 | "source": [
139 | "def process_dataframe(dataset_df):\n",
140 | " \n",
141 | " dataset_df[\"x_2\"] = np.square(dataset_df['x'])\n",
142 | " dataset_df[\"y_2\"] = np.square(dataset_df['y'])\n",
143 | " dataset_df[\"xy\"] = dataset_df['x'] * dataset_df['y']\n",
144 | " dataset_df['dist_xy'] = np.sqrt(np.square(dataset_df['x']-dataset_df['y']))\n",
145 | " \n",
146 | " return dataset_df\n",
147 | "\n",
148 | "def generate_pandas_input_fn(file_name, mode=tf.estimator.ModeKeys.EVAL,\n",
149 | " skip_header_lines=0,\n",
150 | " num_epochs=1,\n",
151 | " batch_size=100):\n",
152 | "\n",
153 | " df_dataset = pd.read_csv(file_name, names=HEADER, skiprows=skip_header_lines)\n",
154 | " \n",
155 | " x = df_dataset[FEATURE_NAMES].copy()\n",
156 | " if PROCESS_FEATURES:\n",
157 | " x = process_dataframe(x)\n",
158 | " \n",
159 | " y = df_dataset[TARGET_NAME]\n",
160 | " \n",
161 | " shuffle = True if mode == tf.estimator.ModeKeys.TRAIN else False\n",
162 | " \n",
163 | " num_threads=1\n",
164 | " \n",
165 | " if MULTI_THREADING:\n",
166 | " num_threads=multiprocessing.cpu_count()\n",
167 | " num_epochs = int(num_epochs/num_threads) if mode == tf.estimator.ModeKeys.TRAIN else num_epochs\n",
168 | " \n",
169 | " pandas_input_fn = tf.estimator.inputs.pandas_input_fn(\n",
170 | " batch_size=batch_size,\n",
171 | " num_epochs= num_epochs,\n",
172 | " shuffle=shuffle,\n",
173 | " x=x,\n",
174 | " y=y,\n",
175 | " target_column=TARGET_NAME\n",
176 | " )\n",
177 | " \n",
178 | " print(\"\")\n",
179 | " print(\"* data input_fn:\")\n",
180 | " print(\"================\")\n",
181 | " print(\"Input file: {}\".format(file_name))\n",
182 | " print(\"Dataset size: {}\".format(len(df_dataset)))\n",
183 | " print(\"Batch size: {}\".format(batch_size))\n",
184 | " print(\"Epoch Count: {}\".format(num_epochs))\n",
185 | " print(\"Mode: {}\".format(mode))\n",
186 | " print(\"Thread Count: {}\".format(num_threads))\n",
187 | " print(\"Shuffle: {}\".format(shuffle))\n",
188 | " print(\"================\")\n",
189 | " print(\"\")\n",
190 | " \n",
191 | " return pandas_input_fn"
192 | ]
193 | },
194 | {
195 | "cell_type": "code",
196 | "execution_count": 5,
197 | "metadata": {},
198 | "outputs": [
199 | {
200 | "name": "stdout",
201 | "output_type": "stream",
202 | "text": [
203 | "\n",
204 | "* data input_fn:\n",
205 | "================\n",
206 | "Input file: data/train-data.csv\n",
207 | "Dataset size: 12000\n",
208 | "Batch size: 100\n",
209 | "Epoch Count: 1\n",
210 | "Mode: eval\n",
211 | "Thread Count: 1\n",
212 | "Shuffle: False\n",
213 | "================\n",
214 | "\n",
215 | "Feature read from DataFrame: ['x', 'y', 'alpha', 'beta', 'x_2', 'y_2', 'xy', 'dist_xy']\n",
216 | "Target read from DataFrame: Tensor(\"fifo_queue_DequeueUpTo:9\", shape=(?,), dtype=float64)\n"
217 | ]
218 | }
219 | ],
220 | "source": [
221 | "features, target = generate_pandas_input_fn(file_name=TRAIN_DATA_FILE)()\n",
222 | "print(\"Feature read from DataFrame: {}\".format(list(features.keys())))\n",
223 | "print(\"Target read from DataFrame: {}\".format(target))"
224 | ]
225 | },
226 | {
227 | "cell_type": "markdown",
228 | "metadata": {},
229 | "source": [
230 | "## 3. Define Feature Columns\n",
231 | "The input numeric columns are assumed to be normalized (or have the same scale). Otherwise, a normlizer_fn, along with the normlisation params (mean, stdv or min, max) should be passed to tf.feature_column.numeric_column() constructor."
232 | ]
233 | },
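{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of attaching such a normalizer_fn for z-score scaling. X_MEAN and X_STDV are assumed, illustrative statistics (in practice they would be precomputed from the training data); they are not defined elsewhere in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: scale feature 'x' with statistics precomputed on the training set.\n",
"# X_MEAN and X_STDV are assumed/illustrative values.\n",
"X_MEAN, X_STDV = 0.063, 0.577\n",
"x_scaled_column = tf.feature_column.numeric_column(\n",
"    'x', normalizer_fn=lambda tensor: (tensor - X_MEAN) / X_STDV)"
]
},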
234 | {
235 | "cell_type": "code",
236 | "execution_count": 6,
237 | "metadata": {},
238 | "outputs": [
239 | {
240 | "name": "stdout",
241 | "output_type": "stream",
242 | "text": [
243 | "Feature Columns: {'x': _NumericColumn(key='x', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'y': _NumericColumn(key='y', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'x_2': _NumericColumn(key='x_2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'y_2': _NumericColumn(key='y_2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'xy': _NumericColumn(key='xy', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'dist_xy': _NumericColumn(key='dist_xy', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'alpha': _VocabularyListCategoricalColumn(key='alpha', vocabulary_list=('ax01', 'ax02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), 'beta': _VocabularyListCategoricalColumn(key='beta', vocabulary_list=('bx01', 'bx02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), 'alpha_X_beta': _CrossedColumn(keys=(_VocabularyListCategoricalColumn(key='alpha', vocabulary_list=('ax01', 'ax02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), _VocabularyListCategoricalColumn(key='beta', vocabulary_list=('bx01', 'bx02'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), hash_bucket_size=4, hash_key=None)}\n"
244 | ]
245 | }
246 | ],
247 | "source": [
248 | "def get_feature_columns():\n",
249 | " \n",
250 | " \n",
251 | " all_numeric_feature_names = NUMERIC_FEATURE_NAMES\n",
252 | " \n",
253 | " CONSTRUCTED_NUMERIC_FEATURES_NAMES = ['x_2', 'y_2', 'xy', 'dist_xy']\n",
254 | " \n",
255 | " if PROCESS_FEATURES:\n",
256 | " all_numeric_feature_names += CONSTRUCTED_NUMERIC_FEATURES_NAMES\n",
257 | "\n",
258 | " numeric_columns = {feature_name: tf.feature_column.numeric_column(feature_name)\n",
259 | " for feature_name in all_numeric_feature_names}\n",
260 | "\n",
261 | " categorical_column_with_vocabulary = \\\n",
262 | " {item[0]: tf.feature_column.categorical_column_with_vocabulary_list(item[0], item[1])\n",
263 | " for item in CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.items()}\n",
264 | " \n",
265 | " feature_columns = {}\n",
266 | "\n",
267 | " if numeric_columns is not None:\n",
268 | " feature_columns.update(numeric_columns)\n",
269 | "\n",
270 | " if categorical_column_with_vocabulary is not None:\n",
271 | " feature_columns.update(categorical_column_with_vocabulary)\n",
272 | " \n",
273 | " # add extended features (crossing, bucektization, embedding)\n",
274 | " \n",
275 | " feature_columns['alpha_X_beta'] = tf.feature_column.crossed_column(\n",
276 | " [feature_columns['alpha'], feature_columns['beta']], 4)\n",
277 | " \n",
278 | " return feature_columns\n",
279 | "\n",
280 | "feature_columns = get_feature_columns()\n",
281 | "print(\"Feature Columns: {}\".format(feature_columns))"
282 | ]
283 | },
284 | {
285 | "cell_type": "markdown",
286 | "metadata": {},
287 | "source": [
288 | "## 4. Create an Estimator"
289 | ]
290 | },
291 | {
292 | "cell_type": "markdown",
293 | "metadata": {},
294 | "source": [
295 | "### a. Define an Estimator Creation Function\n",
296 | "\n",
297 | "* Get dense (numeric) columns from the feature columns\n",
298 | "* Convert categorical columns to indicator columns\n",
299 | "* Create Instantiate a DNNRegressor estimator given **dense + indicator** feature columns + params"
300 | ]
301 | },
302 | {
303 | "cell_type": "code",
304 | "execution_count": 7,
305 | "metadata": {
306 | "collapsed": true
307 | },
308 | "outputs": [],
309 | "source": [
310 | "def create_estimator(run_config, hparams):\n",
311 | " \n",
312 | " feature_columns = list(get_feature_columns().values())\n",
313 | " \n",
314 | " dense_columns = list(\n",
315 | " filter(lambda column: isinstance(column, feature_column._NumericColumn),\n",
316 | " feature_columns\n",
317 | " )\n",
318 | " )\n",
319 | "\n",
320 | " categorical_columns = list(\n",
321 | " filter(lambda column: isinstance(column, feature_column._VocabularyListCategoricalColumn) |\n",
322 | " isinstance(column, feature_column._BucketizedColumn),\n",
323 | " feature_columns)\n",
324 | " )\n",
325 | "\n",
326 | " indicator_columns = list(\n",
327 | " map(lambda column: tf.feature_column.indicator_column(column),\n",
328 | " categorical_columns)\n",
329 | " )\n",
330 | " \n",
331 | " \n",
332 | " estimator_feature_columns = dense_columns + indicator_columns \n",
333 | " \n",
334 | " estimator = tf.estimator.DNNRegressor(\n",
335 | " \n",
336 | " feature_columns= estimator_feature_columns,\n",
337 | " hidden_units= hparams.hidden_units,\n",
338 | " \n",
339 | " optimizer= tf.train.AdamOptimizer(),\n",
340 | " activation_fn= tf.nn.elu,\n",
341 | " dropout= hparams.dropout_prob,\n",
342 | " \n",
343 | " config= run_config\n",
344 | " )\n",
345 | " \n",
346 | " print(\"\")\n",
347 | " print(\"Estimator Type: {}\".format(type(estimator)))\n",
348 | " print(\"\")\n",
349 | " \n",
350 | " return estimator"
351 | ]
352 | },
353 | {
354 | "cell_type": "markdown",
355 | "metadata": {},
356 | "source": [
357 | "### b. Set hyper-parameter values (HParams)"
358 | ]
359 | },
360 | {
361 | "cell_type": "code",
362 | "execution_count": 8,
363 | "metadata": {},
364 | "outputs": [
365 | {
366 | "name": "stdout",
367 | "output_type": "stream",
368 | "text": [
369 | "Model directory: trained_models/reg-model-01\n",
370 | "Hyper-parameters: [('batch_size', 500), ('dropout_prob', 0.0), ('hidden_units', [8, 4]), ('num_epochs', 100)]\n"
371 | ]
372 | }
373 | ],
374 | "source": [
375 | "hparams = tf.contrib.training.HParams(\n",
376 | " num_epochs = 100,\n",
377 | " batch_size = 500,\n",
378 | " hidden_units=[8, 4], \n",
379 | " dropout_prob = 0.0)\n",
380 | "\n",
381 | "\n",
382 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n",
383 | "\n",
384 | "run_config = tf.estimator.RunConfig().replace(model_dir=model_dir)\n",
385 | "print(\"Model directory: {}\".format(run_config.model_dir))\n",
386 | "print(\"Hyper-parameters: {}\".format(hparams))"
387 | ]
388 | },
389 | {
390 | "cell_type": "markdown",
391 | "metadata": {},
392 | "source": [
393 | "### c. Instantiate the estimator "
394 | ]
395 | },
396 | {
397 | "cell_type": "code",
398 | "execution_count": 9,
399 | "metadata": {},
400 | "outputs": [
401 | {
402 | "name": "stdout",
403 | "output_type": "stream",
404 | "text": [
405 | "INFO:tensorflow:Using config: {'_model_dir': 'trained_models/reg-model-01', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}\n",
406 | "\n",
407 | "Estimator Type: \n",
408 | "\n"
409 | ]
410 | }
411 | ],
412 | "source": [
413 | "estimator = create_estimator(run_config, hparams)"
414 | ]
415 | },
416 | {
417 | "cell_type": "markdown",
418 | "metadata": {},
419 | "source": [
420 | "## 5. Train the Estimator"
421 | ]
422 | },
423 | {
424 | "cell_type": "code",
425 | "execution_count": 10,
426 | "metadata": {},
427 | "outputs": [
428 | {
429 | "name": "stdout",
430 | "output_type": "stream",
431 | "text": [
432 | "\n",
433 | "* data input_fn:\n",
434 | "================\n",
435 | "Input file: data/train-data.csv\n",
436 | "Dataset size: 12000\n",
437 | "Batch size: 500\n",
438 | "Epoch Count: 100\n",
439 | "Mode: train\n",
440 | "Thread Count: 1\n",
441 | "Shuffle: True\n",
442 | "================\n",
443 | "\n",
444 | "Estimator training started at 19:19:12\n",
445 | ".......................................\n",
446 | "INFO:tensorflow:Create CheckpointSaverHook.\n",
447 | "INFO:tensorflow:Saving checkpoints for 1 into trained_models/reg-model-01/model.ckpt.\n",
448 | "INFO:tensorflow:loss = 179225.0, step = 1\n",
449 | "INFO:tensorflow:global_step/sec: 166.515\n",
450 | "INFO:tensorflow:loss = 124778.0, step = 101 (0.602 sec)\n",
451 | "INFO:tensorflow:global_step/sec: 182.042\n",
452 | "INFO:tensorflow:loss = 144432.0, step = 201 (0.550 sec)\n",
453 | "INFO:tensorflow:global_step/sec: 221.401\n",
454 | "INFO:tensorflow:loss = 167542.0, step = 301 (0.451 sec)\n",
455 | "INFO:tensorflow:global_step/sec: 208.414\n",
456 | "INFO:tensorflow:loss = 146349.0, step = 401 (0.480 sec)\n",
457 | "INFO:tensorflow:global_step/sec: 216.184\n",
458 | "INFO:tensorflow:loss = 148680.0, step = 501 (0.462 sec)\n",
459 | "INFO:tensorflow:global_step/sec: 217.155\n",
460 | "INFO:tensorflow:loss = 123907.0, step = 601 (0.460 sec)\n",
461 | "INFO:tensorflow:global_step/sec: 209.701\n",
462 | "INFO:tensorflow:loss = 113046.0, step = 701 (0.477 sec)\n",
463 | "INFO:tensorflow:global_step/sec: 168.637\n",
464 | "INFO:tensorflow:loss = 107878.0, step = 801 (0.594 sec)\n",
465 | "INFO:tensorflow:global_step/sec: 126.787\n",
466 | "INFO:tensorflow:loss = 118305.0, step = 901 (0.788 sec)\n",
467 | "INFO:tensorflow:global_step/sec: 138.261\n",
468 | "INFO:tensorflow:loss = 101507.0, step = 1001 (0.723 sec)\n",
469 | "INFO:tensorflow:global_step/sec: 162.629\n",
470 | "INFO:tensorflow:loss = 106166.0, step = 1101 (0.616 sec)\n",
471 | "INFO:tensorflow:global_step/sec: 210.706\n",
472 | "INFO:tensorflow:loss = 107934.0, step = 1201 (0.474 sec)\n",
473 | "INFO:tensorflow:global_step/sec: 175.23\n",
474 | "INFO:tensorflow:loss = 98094.9, step = 1301 (0.571 sec)\n",
475 | "INFO:tensorflow:global_step/sec: 176.572\n",
476 | "INFO:tensorflow:loss = 89144.2, step = 1401 (0.566 sec)\n",
477 | "INFO:tensorflow:global_step/sec: 177.678\n",
478 | "INFO:tensorflow:loss = 104465.0, step = 1501 (0.563 sec)\n",
479 | "INFO:tensorflow:global_step/sec: 183.081\n",
480 | "INFO:tensorflow:loss = 92220.2, step = 1601 (0.546 sec)\n",
481 | "INFO:tensorflow:global_step/sec: 218.108\n",
482 | "INFO:tensorflow:loss = 79086.9, step = 1701 (0.458 sec)\n",
483 | "INFO:tensorflow:global_step/sec: 138.97\n",
484 | "INFO:tensorflow:loss = 93577.3, step = 1801 (0.724 sec)\n",
485 | "INFO:tensorflow:global_step/sec: 145.418\n",
486 | "INFO:tensorflow:loss = 75269.3, step = 1901 (0.684 sec)\n",
487 | "INFO:tensorflow:global_step/sec: 181.944\n",
488 | "INFO:tensorflow:loss = 73518.7, step = 2001 (0.549 sec)\n",
489 | "INFO:tensorflow:global_step/sec: 165.012\n",
490 | "INFO:tensorflow:loss = 75916.3, step = 2101 (0.607 sec)\n",
491 | "INFO:tensorflow:global_step/sec: 130.054\n",
492 | "INFO:tensorflow:loss = 65138.1, step = 2201 (0.768 sec)\n",
493 | "INFO:tensorflow:global_step/sec: 128.839\n",
494 | "INFO:tensorflow:loss = 65868.5, step = 2301 (0.777 sec)\n",
495 | "INFO:tensorflow:Saving checkpoints for 2400 into trained_models/reg-model-01/model.ckpt.\n",
496 | "INFO:tensorflow:Loss for final step: 88071.1.\n",
497 | ".......................................\n",
498 | "Estimator training finished at 19:19:30\n",
499 | "\n",
500 | "Estimator training elapsed time: 17.686301 seconds\n"
501 | ]
502 | }
503 | ],
504 | "source": [
505 | "train_input_fn = generate_pandas_input_fn(file_name= TRAIN_DATA_FILE, \n",
506 | " mode=tf.estimator.ModeKeys.TRAIN,\n",
507 | " num_epochs=hparams.num_epochs,\n",
508 | " batch_size=hparams.batch_size) \n",
509 | "\n",
510 | "if not RESUME_TRAINING:\n",
511 | " shutil.rmtree(model_dir, ignore_errors=True)\n",
512 | " \n",
513 | "tf.logging.set_verbosity(tf.logging.INFO)\n",
514 | "\n",
515 | "time_start = datetime.utcnow() \n",
516 | "print(\"Estimator training started at {}\".format(time_start.strftime(\"%H:%M:%S\")))\n",
517 | "print(\".......................................\")\n",
518 | "\n",
519 | "estimator.train(input_fn = train_input_fn)\n",
520 | "\n",
521 | "time_end = datetime.utcnow() \n",
522 | "print(\".......................................\")\n",
523 | "print(\"Estimator training finished at {}\".format(time_end.strftime(\"%H:%M:%S\")))\n",
524 | "print(\"\")\n",
525 | "time_elapsed = time_end - time_start\n",
526 | "print(\"Estimator training elapsed time: {} seconds\".format(time_elapsed.total_seconds()))\n"
527 | ]
528 | },
529 | {
530 | "cell_type": "markdown",
531 | "metadata": {},
532 | "source": [
533 | "## 6. Evaluate the Model"
534 | ]
535 | },
536 | {
537 | "cell_type": "code",
538 | "execution_count": 11,
539 | "metadata": {},
540 | "outputs": [
541 | {
542 | "name": "stdout",
543 | "output_type": "stream",
544 | "text": [
545 | "\n",
546 | "* data input_fn:\n",
547 | "================\n",
548 | "Input file: data/test-data.csv\n",
549 | "Dataset size: 5000\n",
550 | "Batch size: 5000\n",
551 | "Epoch Count: 1\n",
552 | "Mode: eval\n",
553 | "Thread Count: 1\n",
554 | "Shuffle: False\n",
555 | "================\n",
556 | "\n",
557 | "INFO:tensorflow:Starting evaluation at 2017-11-14-19:19:30\n",
558 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n",
559 | "INFO:tensorflow:Finished evaluation at 2017-11-14-19:19:31\n",
560 | "INFO:tensorflow:Saving dict for global step 2400: average_loss = 164.862, global_step = 2400, loss = 824311.0\n",
561 | "\n",
562 | "{'average_loss': 164.86218, 'loss': 824310.88, 'global_step': 2400}\n",
563 | "\n",
564 | "RMSE: 12.83987\n"
565 | ]
566 | }
567 | ],
568 | "source": [
569 | "TEST_SIZE = 5000\n",
570 | "\n",
571 | "test_input_fn = generate_pandas_input_fn(file_name=TEST_DATA_FILE, \n",
572 | " mode= tf.estimator.ModeKeys.EVAL,\n",
573 | " batch_size= TEST_SIZE)\n",
574 | "\n",
575 | "results = estimator.evaluate(input_fn=test_input_fn)\n",
576 | "print(\"\")\n",
577 | "print(results)\n",
578 | "rmse = round(math.sqrt(results[\"average_loss\"]),5)\n",
579 | "print(\"\")\n",
580 | "print(\"RMSE: {}\".format(rmse))"
581 | ]
582 | },
583 | {
584 | "cell_type": "markdown",
585 | "metadata": {},
586 | "source": [
587 | "## 7. Prediction"
588 | ]
589 | },
590 | {
591 | "cell_type": "code",
592 | "execution_count": 12,
593 | "metadata": {},
594 | "outputs": [
595 | {
596 | "name": "stdout",
597 | "output_type": "stream",
598 | "text": [
599 | "\n",
600 | "* data input_fn:\n",
601 | "================\n",
602 | "Input file: data/test-data.csv\n",
603 | "Dataset size: 5000\n",
604 | "Batch size: 5\n",
605 | "Epoch Count: 1\n",
606 | "Mode: infer\n",
607 | "Thread Count: 1\n",
608 | "Shuffle: False\n",
609 | "================\n",
610 | "\n",
611 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n",
612 | "\n",
613 | "Predicted Values: [13.141397, -5.9562521, 11.541443, 3.8178449, 2.1242597]\n"
614 | ]
615 | }
616 | ],
617 | "source": [
618 | "import itertools\n",
619 | "\n",
620 | "predict_input_fn = generate_pandas_input_fn(file_name=TEST_DATA_FILE, \n",
621 | " mode= tf.estimator.ModeKeys.PREDICT,\n",
622 | " batch_size= 5)\n",
623 | "\n",
624 | "predictions = estimator.predict(input_fn=predict_input_fn)\n",
625 | "values = list(map(lambda item: item[\"predictions\"][0],list(itertools.islice(predictions, 5))))\n",
626 | "print()\n",
627 | "print(\"Predicted Values: {}\".format(values))"
628 | ]
629 | },
630 | {
631 | "cell_type": "markdown",
632 | "metadata": {},
633 | "source": [
634 | "## 8. Save & Serve the Model"
635 | ]
636 | },
637 | {
638 | "cell_type": "markdown",
639 | "metadata": {},
640 | "source": [
641 | "### a. Define Seving Function"
642 | ]
643 | },
644 | {
645 | "cell_type": "code",
646 | "execution_count": 1,
647 | "metadata": {
648 | "collapsed": true
649 | },
650 | "outputs": [],
651 | "source": [
652 | "def process_features(features):\n",
653 | " \n",
654 | " features[\"x_2\"] = tf.square(features['x'])\n",
655 | " features[\"y_2\"] = tf.square(features['y'])\n",
656 | " features[\"xy\"] = tf.multiply(features['x'], features['y'])\n",
657 | " features['dist_xy'] = tf.sqrt(tf.squared_difference(features['x'],features['y']))\n",
658 | " \n",
659 | " return features\n",
660 | "\n",
661 | "def csv_serving_input_fn():\n",
662 | " \n",
663 | " SERVING_HEADER = ['x','y','alpha','beta']\n",
664 | " SERVING_HEADER_DEFAULTS = [[0.0], [0.0], ['NA'], ['NA']]\n",
665 | "\n",
666 | " rows_string_tensor = tf.placeholder(dtype=tf.string,\n",
667 | " shape=[None],\n",
668 | " name='csv_rows')\n",
669 | " \n",
670 | " receiver_tensor = {'csv_rows': rows_string_tensor}\n",
671 | "\n",
672 | " row_columns = tf.expand_dims(rows_string_tensor, -1)\n",
673 | " columns = tf.decode_csv(row_columns, record_defaults=SERVING_HEADER_DEFAULTS)\n",
674 | " features = dict(zip(SERVING_HEADER, columns))\n",
675 | " \n",
676 | " if PROCESS_FEATURES:\n",
677 | " features = process_features(features)\n",
678 | "\n",
679 | " return tf.estimator.export.ServingInputReceiver(\n",
680 | " features, receiver_tensor)"
681 | ]
682 | },
683 | {
684 | "cell_type": "markdown",
685 | "metadata": {},
686 | "source": [
687 | "### b. Export SavedModel"
688 | ]
689 | },
690 | {
691 | "cell_type": "code",
692 | "execution_count": 31,
693 | "metadata": {},
694 | "outputs": [
695 | {
696 | "name": "stdout",
697 | "output_type": "stream",
698 | "text": [
699 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n",
700 | "INFO:tensorflow:Assets added to graph.\n",
701 | "INFO:tensorflow:No assets to write.\n",
702 | "INFO:tensorflow:SavedModel written to: b\"trained_models/reg-model-01/export/temp-b'1510688109'/saved_model.pbtxt\"\n"
703 | ]
704 | },
705 | {
706 | "data": {
707 | "text/plain": [
708 | "b'trained_models/reg-model-01/export/1510688109'"
709 | ]
710 | },
711 | "execution_count": 31,
712 | "metadata": {},
713 | "output_type": "execute_result"
714 | }
715 | ],
716 | "source": [
717 | "export_dir = model_dir + \"/export\"\n",
718 | "\n",
719 | "estimator.export_savedmodel(\n",
720 | " export_dir_base = export_dir,\n",
721 | " serving_input_receiver_fn = csv_serving_input_fn,\n",
722 | " as_text=True\n",
723 | ")\n"
724 | ]
725 | },
726 | {
727 | "cell_type": "markdown",
728 | "metadata": {},
729 | "source": [
730 | "### c. Serve the Saved Model"
731 | ]
732 | },
733 | {
734 | "cell_type": "code",
735 | "execution_count": 35,
736 | "metadata": {},
737 | "outputs": [
738 | {
739 | "name": "stdout",
740 | "output_type": "stream",
741 | "text": [
742 | "trained_models/reg-model-01/export/1510688109\n",
743 | "INFO:tensorflow:Restoring parameters from b'trained_models/reg-model-01/export/1510688109/variables/variables'\n",
744 | "{'predictions': array([[ 13.15929985],\n",
745 | " [-13.96904373]], dtype=float32)}\n"
746 | ]
747 | }
748 | ],
749 | "source": [
750 | "import os\n",
751 | "\n",
752 | "saved_model_dir = export_dir + \"/\" + os.listdir(path=export_dir)[-1] \n",
753 | "\n",
754 | "print(saved_model_dir)\n",
755 | "\n",
756 | "predictor_fn = tf.contrib.predictor.from_saved_model(\n",
757 | " export_dir = saved_model_dir,\n",
758 | " signature_def_key=\"predict\"\n",
759 | ")\n",
760 | "\n",
761 | "output = predictor_fn({'csv_rows': [\"0.5,1,ax01,bx02\", \"-0.5,-1,ax02,bx02\"]})\n",
762 | "print(output)"
763 | ]
764 | },
765 | {
766 | "cell_type": "markdown",
767 | "metadata": {},
768 | "source": [
769 | "## What can we improve?\n",
770 | "\n",
771 | "* **Use data files instead of DataFrames** - pandas dataframes need to fit in memory, and hard to distribute. Working with (sharded) training data files allows reading records in batches (so we can work with large data set regardless the memory size), as well as supporting distributed training (data parallelism).\n",
772 | "\n",
773 | "\n",
774 | "* **Use Experiment APIs** - Experiment API knows how to invoke training and eval loops in a sensible fashion for local & distributed training.\n",
775 | "\n",
776 | "\n",
777 | "* ** Early Stopping** - Use the validation set evaluation to stop the training and avoid overfitting.\n"
778 | ]
779 | },
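{
"cell_type": "markdown",
"metadata": {},
"source": [
"A minimal sketch of the second point above, assuming the train_input_fn and test_input_fn defined earlier; the TF 1.x contrib Experiment API is only illustrated here, not tested in this notebook:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: let Experiment manage the training and evaluation loops.\n",
"from tensorflow.contrib.learn import Experiment\n",
"\n",
"experiment = Experiment(\n",
"    estimator=estimator,\n",
"    train_input_fn=train_input_fn,\n",
"    eval_input_fn=test_input_fn,\n",
"    eval_steps=None  # None => evaluate on the full evaluation set\n",
")\n",
"# experiment.train_and_evaluate()"
]
},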
780 | {
781 | "cell_type": "code",
782 | "execution_count": null,
783 | "metadata": {
784 | "collapsed": true
785 | },
786 | "outputs": [],
787 | "source": []
788 | }
789 | ],
790 | "metadata": {
791 | "kernelspec": {
792 | "display_name": "Python 3",
793 | "language": "python",
794 | "name": "python3"
795 | },
796 | "language_info": {
797 | "codemirror_mode": {
798 | "name": "ipython",
799 | "version": 3
800 | },
801 | "file_extension": ".py",
802 | "mimetype": "text/x-python",
803 | "name": "python",
804 | "nbconvert_exporter": "python",
805 | "pygments_lexer": "ipython3",
806 | "version": "3.6.1"
807 | }
808 | },
809 | "nbformat": 4,
810 | "nbformat_minor": 2
811 | }
812 |
--------------------------------------------------------------------------------
/01 - Regression/06.0 - Convert CSV to TFRecords.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
13 | " return f(*args, **kwds)\n"
14 | ]
15 | },
16 | {
17 | "name": "stdout",
18 | "output_type": "stream",
19 | "text": [
20 | "1.4.0\n"
21 | ]
22 | }
23 | ],
24 | "source": [
25 | "import tensorflow as tf\n",
26 | "import csv\n",
27 | "import os\n",
28 | "\n",
29 | "print(tf.__version__)"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 2,
35 | "metadata": {
36 | "collapsed": true
37 | },
38 | "outputs": [],
39 | "source": [
40 | "train_data_files = ['data/train-data.csv']\n",
41 | "valid_data_files = ['data/valid-data.csv']\n",
42 | "test_data_files = ['data/test-data.csv']"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 3,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "name": "stdout",
52 | "output_type": "stream",
53 | "text": [
54 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n",
55 | "Numeric Features: ['x', 'y']\n",
56 | "Categorical Features: ['alpha', 'beta']\n",
57 | "Target: target\n",
58 | "Unused Features: ['key']\n"
59 | ]
60 | }
61 | ],
62 | "source": [
63 | "HEADER = ['key','x','y','alpha','beta','target']\n",
64 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], [0.0]]\n",
65 | "\n",
66 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n",
67 | "\n",
68 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n",
69 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n",
70 | "\n",
71 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n",
72 | "\n",
73 | "TARGET_NAME = 'target'\n",
74 | "\n",
75 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n",
76 | "\n",
77 | "print(\"Header: {}\".format(HEADER))\n",
78 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n",
79 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n",
80 | "print(\"Target: {}\".format(TARGET_NAME))\n",
81 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))"
82 | ]
83 | },
84 | {
85 | "cell_type": "code",
86 | "execution_count": 4,
87 | "metadata": {
88 | "collapsed": true
89 | },
90 | "outputs": [],
91 | "source": [
92 | "def create_csv_iterator(csv_file_path, skip_header):\n",
93 | " \n",
94 | " with tf.gfile.Open(csv_file_path) as csv_file:\n",
95 | " reader = csv.reader(csv_file)\n",
96 | " if skip_header: # Skip the header\n",
97 | " next(reader)\n",
98 | " for row in reader:\n",
99 | " yield row"
100 | ]
101 | },
102 | {
103 | "cell_type": "code",
104 | "execution_count": 5,
105 | "metadata": {
106 | "collapsed": true
107 | },
108 | "outputs": [],
109 | "source": [
110 | "def create_example(row):\n",
111 | " \"\"\"\n",
112 | " Returns a tensorflow.Example Protocol Buffer object.\n",
113 | " \"\"\"\n",
114 | " example = tf.train.Example()\n",
115 | "\n",
116 | " for i in range(len(HEADER)):\n",
117 | " \n",
118 | " feature_name = HEADER[i]\n",
119 | " feature_value = row[i]\n",
120 | " \n",
121 | " if feature_name in UNUSED_FEATURE_NAMES:\n",
122 | " continue\n",
123 | " \n",
124 | " if feature_name in NUMERIC_FEATURE_NAMES:\n",
125 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n",
126 | " \n",
127 | " elif feature_name in CATEGORICAL_FEATURE_NAMES:\n",
128 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n",
129 | " \n",
130 | "\n",
131 | " elif feature_name in TARGET_NAME:\n",
132 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n",
133 | "\n",
134 | " return example"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 6,
140 | "metadata": {
141 | "collapsed": true
142 | },
143 | "outputs": [],
144 | "source": [
145 | "def create_tfrecords_file(input_csv_file):\n",
146 | " \"\"\"\n",
147 | " Creates a TFRecords file for the given input data and\n",
148 | " example transofmration function\n",
149 | " \"\"\"\n",
150 | " output_tfrecord_file = input_csv_file.replace(\"csv\",\"tfrecords\")\n",
151 | " writer = tf.python_io.TFRecordWriter(output_tfrecord_file)\n",
152 | " \n",
153 | " print(\"Creating TFRecords file at\", output_tfrecord_file, \"...\")\n",
154 | " \n",
155 | " for i, row in enumerate(create_csv_iterator(input_csv_file, skip_header=False)):\n",
156 | " \n",
157 | " if len(row) == 0:\n",
158 | " continue\n",
159 | " \n",
160 | " example = create_example(row)\n",
161 | " content = example.SerializeToString()\n",
162 | " writer.write(content)\n",
163 | " \n",
164 | " writer.close()\n",
165 | " \n",
166 | " print(\"Finish Writing\", output_tfrecord_file)"
167 | ]
168 | },
169 | {
170 | "cell_type": "code",
171 | "execution_count": 7,
172 | "metadata": {},
173 | "outputs": [
174 | {
175 | "name": "stdout",
176 | "output_type": "stream",
177 | "text": [
178 | "Converting Training Data Files\n",
179 | "Creating TFRecords file at data/train-data.tfrecords ...\n",
180 | "Finish Writing data/train-data.tfrecords\n",
181 | "\n",
182 | "Converting Validation Data Files\n",
183 | "Creating TFRecords file at data/valid-data.tfrecords ...\n",
184 | "Finish Writing data/valid-data.tfrecords\n",
185 | "\n",
186 | "Converting Test Data Files\n",
187 | "Creating TFRecords file at data/test-data.tfrecords ...\n",
188 | "Finish Writing data/test-data.tfrecords\n"
189 | ]
190 | }
191 | ],
192 | "source": [
193 | "print(\"Converting Training Data Files\")\n",
194 | "for input_csv_file in train_data_files:\n",
195 | " create_tfrecords_file(input_csv_file)\n",
196 | "print(\"\")\n",
197 | "\n",
198 | "print(\"Converting Validation Data Files\")\n",
199 | "for input_csv_file in valid_data_files:\n",
200 | " create_tfrecords_file(input_csv_file)\n",
201 | "print(\"\")\n",
202 | "\n",
203 | "print(\"Converting Test Data Files\")\n",
204 | "for input_csv_file in test_data_files:\n",
205 | " create_tfrecords_file(input_csv_file)"
206 | ]
207 | },
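{
"cell_type": "markdown",
"metadata": {},
"source": [
"A quick sanity check (sketch only): read the first record back from one of the generated files and print the parsed tf.train.Example:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"# Sketch only: parse the first serialized record to verify the conversion.\n",
"record_iterator = tf.python_io.tf_record_iterator(path='data/train-data.tfrecords')\n",
"example = tf.train.Example()\n",
"example.ParseFromString(next(record_iterator))\n",
"print(example)"
]
},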
208 | {
209 | "cell_type": "code",
210 | "execution_count": null,
211 | "metadata": {
212 | "collapsed": true
213 | },
214 | "outputs": [],
215 | "source": []
216 | }
217 | ],
218 | "metadata": {
219 | "kernelspec": {
220 | "display_name": "Python 3",
221 | "language": "python",
222 | "name": "python3"
223 | },
224 | "language_info": {
225 | "codemirror_mode": {
226 | "name": "ipython",
227 | "version": 3
228 | },
229 | "file_extension": ".py",
230 | "mimetype": "text/x-python",
231 | "name": "python",
232 | "nbconvert_exporter": "python",
233 | "pygments_lexer": "ipython3",
234 | "version": "3.6.1"
235 | }
236 | },
237 | "nbformat": 4,
238 | "nbformat_minor": 2
239 | }
240 |
--------------------------------------------------------------------------------
/01 - Regression/data/new-data.csv:
--------------------------------------------------------------------------------
1 | 1.3,-0.5,ax01,bx02
--------------------------------------------------------------------------------
/01 - Regression/data/new-data.json:
--------------------------------------------------------------------------------
1 | {"x": 1.3, "y": -0.5, "alpha": "ax01", "beta": "bx02"}
--------------------------------------------------------------------------------
/01 - Regression/data/test-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/test-data.tfrecords
--------------------------------------------------------------------------------
/01 - Regression/data/train-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/train-data.tfrecords
--------------------------------------------------------------------------------
/01 - Regression/data/valid-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/valid-data.tfrecords
--------------------------------------------------------------------------------
/02 - Classification/-- TensorBoard.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stdout",
10 | "output_type": "stream",
11 | "text": [
12 | "trained_models/class-model-01\n"
13 | ]
14 | }
15 | ],
16 | "source": [
17 | "MODEL_NAME = 'class-model-01'\n",
18 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n",
19 | "print(model_dir)"
20 | ]
21 | },
22 | {
23 | "cell_type": "markdown",
24 | "metadata": {},
25 | "source": [
26 | "## Start TensorBoard Process"
27 | ]
28 | },
29 | {
30 | "cell_type": "code",
31 | "execution_count": null,
32 | "metadata": {
33 | "collapsed": true
34 | },
35 | "outputs": [],
36 | "source": [
37 | "from google.datalab.ml import TensorBoard\n",
38 | "TensorBoard().start(model_dir)\n",
39 | "TensorBoard().list()"
40 | ]
41 | },
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
46 | "## Kill TensorBoard Process"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": null,
52 | "metadata": {
53 | "collapsed": true
54 | },
55 | "outputs": [],
56 | "source": [
57 | "# to stop TensorBoard\n",
58 | "TensorBoard().stop(23002)\n",
59 | "print('stopped TensorBoard')\n",
60 | "TensorBoard().list()"
61 | ]
62 | }
63 | ],
64 | "metadata": {
65 | "kernelspec": {
66 | "display_name": "Python 3",
67 | "language": "python",
68 | "name": "python3"
69 | },
70 | "language_info": {
71 | "codemirror_mode": {
72 | "name": "ipython",
73 | "version": 3
74 | },
75 | "file_extension": ".py",
76 | "mimetype": "text/x-python",
77 | "name": "python",
78 | "nbconvert_exporter": "python",
79 | "pygments_lexer": "ipython3",
80 | "version": "3.6.1"
81 | }
82 | },
83 | "nbformat": 4,
84 | "nbformat_minor": 2
85 | }
86 |
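--------------------------------------------------------------------------------
Note: google.datalab.ml.TensorBoard manages the TensorBoard process from inside
a Datalab environment. Outside Datalab, a minimal equivalent (assuming the
standard tensorboard CLI is installed alongside TensorFlow) is to launch it from
a shell against the same model directory:

    tensorboard --logdir=trained_models/class-model-01 --port=6006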
--------------------------------------------------------------------------------
/02 - Classification/00.0 - TensorFlow Version Update.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stdout",
10 | "output_type": "stream",
11 | "text": [
12 | "Collecting tensorflow\n",
13 | " Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n",
14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n",
15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n",
16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
18 | "Requirement already up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
19 | "Collecting enum34>=1.1.6 (from tensorflow)\n",
20 | " Downloading enum34-1.1.6-py3-none-any.whl\n",
21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n",
27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n",
28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n",
29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n",
30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n",
31 | " Found existing installation: tensorflow 1.3.0\n",
32 | " Uninstalling tensorflow-1.3.0:\n",
33 | " Successfully uninstalled tensorflow-1.3.0\n",
34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n"
35 | ]
36 | }
37 | ],
38 | "source": [
39 | "%%bash\n",
40 | "\n",
41 | "pip install -U tensorflow"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "name": "stderr",
51 | "output_type": "stream",
52 | "text": [
53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
54 | " return f(*args, **kwds)\n"
55 | ]
56 | },
57 | {
58 | "name": "stdout",
59 | "output_type": "stream",
60 | "text": [
61 | "1.4.0\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "import tensorflow as tf\n",
67 | "print(tf.__version__)"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {
74 | "collapsed": true
75 | },
76 | "outputs": [],
77 | "source": []
78 | }
79 | ],
80 | "metadata": {
81 | "kernelspec": {
82 | "display_name": "Python 3",
83 | "language": "python",
84 | "name": "python3"
85 | },
86 | "language_info": {
87 | "codemirror_mode": {
88 | "name": "ipython",
89 | "version": 3
90 | },
91 | "file_extension": ".py",
92 | "mimetype": "text/x-python",
93 | "name": "python",
94 | "nbconvert_exporter": "python",
95 | "pygments_lexer": "ipython3",
96 | "version": "3.6.1"
97 | }
98 | },
99 | "nbformat": 4,
100 | "nbformat_minor": 2
101 | }
102 |
--------------------------------------------------------------------------------
/02 - Classification/02.0 - Convert CSV to TFRecords.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stderr",
10 | "output_type": "stream",
11 | "text": [
12 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
13 | " return f(*args, **kwds)\n"
14 | ]
15 | },
16 | {
17 | "name": "stdout",
18 | "output_type": "stream",
19 | "text": [
20 | "1.4.0\n"
21 | ]
22 | }
23 | ],
24 | "source": [
25 | "import tensorflow as tf\n",
26 | "import csv\n",
27 | "import os\n",
28 | "\n",
29 | "print(tf.__version__)"
30 | ]
31 | },
32 | {
33 | "cell_type": "code",
34 | "execution_count": 2,
35 | "metadata": {
36 | "collapsed": true
37 | },
38 | "outputs": [],
39 | "source": [
40 | "train_data_files = ['data/train-data.csv']\n",
41 | "valid_data_files = ['data/valid-data.csv']\n",
42 | "test_data_files = ['data/test-data.csv']"
43 | ]
44 | },
45 | {
46 | "cell_type": "code",
47 | "execution_count": 3,
48 | "metadata": {},
49 | "outputs": [
50 | {
51 | "name": "stdout",
52 | "output_type": "stream",
53 | "text": [
54 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n",
55 | "Numeric Features: ['x', 'y']\n",
56 | "Categorical Features: ['alpha', 'beta']\n",
57 | "Target: target - labels: ['postive', 'negative']\n",
58 | "Unused Features: ['key']\n"
59 | ]
60 | }
61 | ],
62 | "source": [
63 | "HEADER = ['key','x','y','alpha','beta','target']\n",
64 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], ['NA']]\n",
65 | "\n",
66 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n",
67 | "\n",
68 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n",
69 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n",
70 | "\n",
71 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n",
72 | "\n",
73 | "TARGET_NAME = 'target'\n",
74 | "\n",
75 | "TARGET_LABELS = ['postive', 'negative']\n",
76 | "\n",
77 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n",
78 | "\n",
79 | "print(\"Header: {}\".format(HEADER))\n",
80 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n",
81 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n",
82 | "print(\"Target: {} - labels: {}\".format(TARGET_NAME, TARGET_LABELS))\n",
83 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))"
84 | ]
85 | },
86 | {
87 | "cell_type": "code",
88 | "execution_count": 4,
89 | "metadata": {
90 | "collapsed": true
91 | },
92 | "outputs": [],
93 | "source": [
94 | "def create_csv_iterator(csv_file_path, skip_header):\n",
95 | " \n",
96 | " with tf.gfile.Open(csv_file_path) as csv_file:\n",
97 | " reader = csv.reader(csv_file)\n",
98 | " if skip_header: # Skip the header\n",
99 | " next(reader)\n",
100 | " for row in reader:\n",
101 | " yield row"
102 | ]
103 | },
104 | {
105 | "cell_type": "code",
106 | "execution_count": 5,
107 | "metadata": {
108 | "collapsed": true
109 | },
110 | "outputs": [],
111 | "source": [
112 | "def create_example(row):\n",
113 | " \"\"\"\n",
114 | " Returns a tensorflow.Example Protocol Buffer object.\n",
115 | " \"\"\"\n",
116 | " example = tf.train.Example()\n",
117 | "\n",
118 | " for i in range(len(HEADER)):\n",
119 | " \n",
120 | " feature_name = HEADER[i]\n",
121 | " feature_value = row[i]\n",
122 | " \n",
123 | " if feature_name in UNUSED_FEATURE_NAMES:\n",
124 | " continue\n",
125 | " \n",
126 | " if feature_name in NUMERIC_FEATURE_NAMES:\n",
127 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n",
128 | " \n",
129 | " elif feature_name in CATEGORICAL_FEATURE_NAMES:\n",
130 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n",
131 | "\n",
132 | " elif feature_name in TARGET_NAME:\n",
133 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n",
134 | "\n",
135 | " return example"
136 | ]
137 | },
138 | {
139 | "cell_type": "code",
140 | "execution_count": 6,
141 | "metadata": {
142 | "collapsed": true
143 | },
144 | "outputs": [],
145 | "source": [
146 | "def create_tfrecords_file(input_csv_file):\n",
147 | " \"\"\"\n",
148 | " Creates a TFRecords file for the given input data and\n",
149 | " example transofmration function\n",
150 | " \"\"\"\n",
151 | " output_tfrecord_file = input_csv_file.replace(\"csv\",\"tfrecords\")\n",
152 | " writer = tf.python_io.TFRecordWriter(output_tfrecord_file)\n",
153 | " \n",
154 | " print(\"Creating TFRecords file at\", output_tfrecord_file, \"...\")\n",
155 | " \n",
156 | " for i, row in enumerate(create_csv_iterator(input_csv_file, skip_header=False)):\n",
157 | " \n",
158 | " if len(row) == 0:\n",
159 | " continue\n",
160 | " \n",
161 | " example = create_example(row)\n",
162 | " content = example.SerializeToString()\n",
163 | " writer.write(content)\n",
164 | " \n",
165 | " writer.close()\n",
166 | " \n",
167 | " print(\"Finish Writing\", output_tfrecord_file)"
168 | ]
169 | },
170 | {
171 | "cell_type": "code",
172 | "execution_count": 7,
173 | "metadata": {},
174 | "outputs": [
175 | {
176 | "name": "stdout",
177 | "output_type": "stream",
178 | "text": [
179 | "Converting Training Data Files\n",
180 | "Creating TFRecords file at data/train-data.tfrecords ...\n",
181 | "Finish Writing data/train-data.tfrecords\n",
182 | "\n",
183 | "Converting Validation Data Files\n",
184 | "Creating TFRecords file at data/valid-data.tfrecords ...\n",
185 | "Finish Writing data/valid-data.tfrecords\n",
186 | "\n",
187 | "Converting Test Data Files\n",
188 | "Creating TFRecords file at data/test-data.tfrecords ...\n",
189 | "Finish Writing data/test-data.tfrecords\n"
190 | ]
191 | }
192 | ],
193 | "source": [
194 | "print(\"Converting Training Data Files\")\n",
195 | "for input_csv_file in train_data_files:\n",
196 | " create_tfrecords_file(input_csv_file)\n",
197 | "print(\"\")\n",
198 | "\n",
199 | "print(\"Converting Validation Data Files\")\n",
200 | "for input_csv_file in valid_data_files:\n",
201 | " create_tfrecords_file(input_csv_file)\n",
202 | "print(\"\")\n",
203 | "\n",
204 | "print(\"Converting Test Data Files\")\n",
205 | "for input_csv_file in test_data_files:\n",
206 | " create_tfrecords_file(input_csv_file)"
207 | ]
208 | },
209 | {
210 | "cell_type": "code",
211 | "execution_count": null,
212 | "metadata": {
213 | "collapsed": true
214 | },
215 | "outputs": [],
216 | "source": []
217 | }
218 | ],
219 | "metadata": {
220 | "kernelspec": {
221 | "display_name": "Python 3",
222 | "language": "python",
223 | "name": "python3"
224 | },
225 | "language_info": {
226 | "codemirror_mode": {
227 | "name": "ipython",
228 | "version": 3
229 | },
230 | "file_extension": ".py",
231 | "mimetype": "text/x-python",
232 | "name": "python",
233 | "nbconvert_exporter": "python",
234 | "pygments_lexer": "ipython3",
235 | "version": "3.6.1"
236 | }
237 | },
238 | "nbformat": 4,
239 | "nbformat_minor": 2
240 | }
241 |
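--------------------------------------------------------------------------------
Note: the notebook above writes TFRecords but never reads them back. A minimal
sanity-check sketch, assuming the same TF 1.4 APIs used throughout this repo
(tf.python_io.tf_record_iterator yields the serialized records):

    import tensorflow as tf

    # Decode the first serialized record back into a tf.train.Example proto
    # to confirm the features were written with the intended types.
    for record in tf.python_io.tf_record_iterator('data/train-data.tfrecords'):
        example = tf.train.Example()
        example.ParseFromString(record)
        print(example.features.feature['x'].float_list.value)       # numeric feature
        print(example.features.feature['target'].bytes_list.value)  # label bytes
        break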
--------------------------------------------------------------------------------
/02 - Classification/data/adult.stats.csv:
--------------------------------------------------------------------------------
1 | ,max,mean,min,stdv
2 | age,90,38.58164675532078,17,13.640432553581146
3 | fnlwgt,1484705,189778.36651208502,12285,105549.97769702235
4 | education_num,16,10.0806793403151,1,2.5727203320673406
5 | capital_gain,99999,1077.6488437087312,0,7385.292084839299
6 | capital_loss,4356,87.303829734959,0,402.96021864905896
7 | hours_per_week,99,40.437455852092995,1,12.347428681730811
8 |
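--------------------------------------------------------------------------------
Note: adult.stats.csv stores per-feature max/mean/min/stdv for the numeric
census columns, as used by the feature-scaling notebooks. A minimal sketch of
z-score standardization against these precomputed statistics (pandas assumed;
the helper below is illustrative, not part of the repo):

    import pandas as pd

    # Load the precomputed training-set statistics, indexed by feature name.
    stats = pd.read_csv('data/adult.stats.csv', index_col=0)

    def standardize(df, column):
        # z-score: subtract the training-set mean, divide by its stdv.
        return (df[column] - stats.loc[column, 'mean']) / stats.loc[column, 'stdv']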
--------------------------------------------------------------------------------
/02 - Classification/data/test-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/test-data.tfrecords
--------------------------------------------------------------------------------
/02 - Classification/data/train-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/train-data.tfrecords
--------------------------------------------------------------------------------
/02 - Classification/data/valid-data.tfrecords:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/valid-data.tfrecords
--------------------------------------------------------------------------------
/03 - Clustering/00.0 - TensorFlow Version Update.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [
8 | {
9 | "name": "stdout",
10 | "output_type": "stream",
11 | "text": [
12 | "Collecting tensorflow\n",
13 | " Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n",
14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n",
15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n",
16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
18 | "Requirement already up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
19 | "Collecting enum34>=1.1.6 (from tensorflow)\n",
20 | " Downloading enum34-1.1.6-py3-none-any.whl\n",
21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n",
22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n",
26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n",
27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n",
28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n",
29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n",
30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n",
31 | " Found existing installation: tensorflow 1.3.0\n",
32 | " Uninstalling tensorflow-1.3.0:\n",
33 | " Successfully uninstalled tensorflow-1.3.0\n",
34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n"
35 | ]
36 | }
37 | ],
38 | "source": [
39 | "%%bash\n",
40 | "\n",
41 | "pip install -U tensorflow"
42 | ]
43 | },
44 | {
45 | "cell_type": "code",
46 | "execution_count": 2,
47 | "metadata": {},
48 | "outputs": [
49 | {
50 | "name": "stderr",
51 | "output_type": "stream",
52 | "text": [
53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n",
54 | " return f(*args, **kwds)\n"
55 | ]
56 | },
57 | {
58 | "name": "stdout",
59 | "output_type": "stream",
60 | "text": [
61 | "1.4.0\n"
62 | ]
63 | }
64 | ],
65 | "source": [
66 | "import tensorflow as tf\n",
67 | "print(tf.__version__)"
68 | ]
69 | },
70 | {
71 | "cell_type": "code",
72 | "execution_count": null,
73 | "metadata": {
74 | "collapsed": true
75 | },
76 | "outputs": [],
77 | "source": []
78 | }
79 | ],
80 | "metadata": {
81 | "kernelspec": {
82 | "display_name": "Python 3",
83 | "language": "python",
84 | "name": "python3"
85 | },
86 | "language_info": {
87 | "codemirror_mode": {
88 | "name": "ipython",
89 | "version": 3
90 | },
91 | "file_extension": ".py",
92 | "mimetype": "text/x-python",
93 | "name": "python",
94 | "nbconvert_exporter": "python",
95 | "pygments_lexer": "ipython3",
96 | "version": "3.6.1"
97 | }
98 | },
99 | "nbformat": 4,
100 | "nbformat_minor": 2
101 | }
102 |
--------------------------------------------------------------------------------
/03 - Clustering/data/new-data.csv:
--------------------------------------------------------------------------------
1 | 0.5,-0.3,7
--------------------------------------------------------------------------------
/04 - Times Series/data/test-data.csv:
--------------------------------------------------------------------------------
1 | time_index,value
2 | 700,4.062140225471643
3 | 701,3.1703847192297845
4 | 702,2.8296873454315192
5 | 703,3.6961238939396597
6 | 704,3.774688382600603
7 | 705,3.8527851934977764
8 | 706,3.0342060812112686
9 | 707,3.3306549645378025
10 | 708,3.6487243344222495
11 | 709,2.1335273551814233
12 | 710,3.683338770039675
13 | 711,3.4407180005651257
14 | 712,3.1847285264243808
15 | 713,2.8896834732269645
16 | 714,3.443554099017879
17 | 715,3.1628527640306943
18 | 716,3.477295109150082
19 | 717,3.36526258592279
20 | 718,3.122152506677769
21 | 719,3.348075456125274
22 | 720,2.4504905110212265
23 | 721,3.00950452903947
24 | 722,3.205798117619332
25 | 723,2.5846071060841815
26 | 724,2.3892445742499167
27 | 725,3.3945942547686103
28 | 726,2.561123153365352
29 | 727,2.035257638932893
30 | 728,2.9801074278502275
31 | 729,2.9562399361791156
32 | 730,2.1654708168278356
33 | 731,3.4468449142981705
34 | 732,2.6893807928426563
35 | 733,3.025794994419157
36 | 734,2.542869596532311
37 | 735,2.9275771470778706
38 | 736,2.8204505932091055
39 | 737,3.527816758474815
40 | 738,2.197418634802284
41 | 739,2.554280646235888
42 | 740,2.4324240602338847
43 | 741,3.1271375891212405
44 | 742,2.2850209514914006
45 | 743,2.0776756899911613
46 | 744,2.529000935802995
47 | 745,3.297087742223073
48 | 746,2.2394742253878963
49 | 747,3.1367437479006797
50 | 748,2.3953147600203675
51 | 749,2.8848458913301296
52 | 750,2.9185911092297903
53 | 751,2.768126620869814
54 | 752,2.43488473055407
55 | 753,2.8870032425325123
56 | 754,3.317655820661928
57 | 755,2.1790416388446836
58 | 756,2.7702407610447577
59 | 757,2.554226484730687
60 | 758,2.8134188158141438
61 | 759,2.758781861045474
62 | 760,2.272104718154779
63 | 761,2.8103970647324372
64 | 762,2.8972594904941387
65 | 763,3.4002482934772478
66 | 764,3.3455711599757834
67 | 765,2.715918824573258
68 | 766,3.7061718277620113
69 | 767,3.0195081640399204
70 | 768,3.4891004444538325
71 | 769,2.9311254106642193
72 | 770,2.3379837346623598
73 | 771,2.5146941193432775
74 | 772,3.534476172205595
75 | 773,3.0003799070150836
76 | 774,2.8915136087990785
77 | 775,2.393552222327803
78 | 776,3.011905423311392
79 | 777,3.8801787347996632
80 | 778,3.2515228547754886
81 | 779,2.789501465000945
82 | 780,3.426429385272551
83 | 781,3.418712155395797
84 | 782,3.983713621739207
85 | 783,3.6345931864075576
86 | 784,3.028427715036325
87 | 785,3.6675103028495144
88 | 786,4.199142625113263
89 | 787,3.0825750004211327
90 | 788,3.3340944569219486
91 | 789,3.7900930100567076
92 | 790,3.9891451701449654
93 | 791,4.437402936216056
94 | 792,3.483434383801479
95 | 793,4.856432283156268
96 | 794,3.112032068064219
97 | 795,3.764822361311284
98 | 796,4.778499314027573
99 | 797,3.33724185989896
100 | 798,3.8058737331849453
101 | 799,3.9223811712262653
102 | 800,4.80546589113736
103 | 801,4.421552582453292
104 | 802,3.606081714628961
105 | 803,3.9941737176325596
106 | 804,4.662649705612334
107 | 805,4.018590914241019
108 | 806,3.5680466115701646
109 | 807,5.103635450598651
110 | 808,4.553832764926619
111 | 809,4.480087371204185
112 | 810,4.462603498918542
113 | 811,4.2137200426188075
114 | 812,4.189374217427936
115 | 813,4.044349362105051
116 | 814,3.3654308023514417
117 | 815,4.551988909577272
118 | 816,5.281251897092956
119 | 817,4.919655962013503
120 | 818,4.268853670537956
121 | 819,5.326461607549719
122 | 820,4.423531000313117
123 | 821,4.203178570982242
124 | 822,4.120263855677827
125 | 823,3.776759734748973
126 | 824,4.5429757684426
127 | 825,5.351165193685153
128 | 826,4.3428152492354775
129 | 827,5.394929077351952
130 | 828,5.218609727629257
131 | 829,4.9831655977115625
132 | 830,5.602842952189427
133 | 831,5.3664242391999775
134 | 832,5.14450210344502
135 | 833,5.014801223804789
136 | 834,5.404549894954248
137 | 835,4.611614806903722
138 | 836,5.91369549455372
139 | 837,5.575199712425203
140 | 838,4.551385680651797
141 | 839,5.696239295334581
142 | 840,5.673983860921718
143 | 841,5.131646815240395
144 | 842,4.7304053547507685
145 | 843,5.131704930574379
146 | 844,5.350692049974139
147 | 845,5.043122726463051
148 | 846,5.433980654640878
149 | 847,5.392811171818018
150 | 848,6.127902771533075
151 | 849,4.948801899758867
152 | 850,5.672670683819614
153 | 851,4.619344342691638
154 | 852,4.461927290385413
155 | 853,5.134271002568175
156 | 854,5.244183015774759
157 | 855,5.0454199444160315
158 | 856,5.63991663670262
159 | 857,5.444551179447414
160 | 858,5.358876256297602
161 | 859,6.300157516455733
162 | 860,5.521687291262919
163 | 861,6.482989918871226
164 | 862,4.452113139646457
165 | 863,5.947519115228699
166 | 864,4.732843968683768
167 | 865,4.663305658213866
168 | 866,5.060828778618426
169 | 867,5.630137501067726
170 | 868,4.837622754661279
171 | 869,4.589984321432029
172 | 870,5.149519770472633
173 | 871,4.926183108085338
174 | 872,5.529322212911065
175 | 873,4.757430665280789
176 | 874,5.39173836256956
177 | 875,5.23465202505217
178 | 876,4.714170978848213
179 | 877,4.662839356640053
180 | 878,4.60819971256791
181 | 879,4.882721694617192
182 | 880,5.390915345465747
183 | 881,3.8287811359231476
184 | 882,4.905994302868104
185 | 883,5.1710621658328515
186 | 884,4.391188353483403
187 | 885,4.748422379069466
188 | 886,5.83622319255817
189 | 887,5.085489278108183
190 | 888,5.085950301210101
191 | 889,5.267403853016739
192 | 890,5.494662308615086
193 | 891,5.113984813073912
194 | 892,5.1585692571022514
195 | 893,4.4319546043214695
196 | 894,4.387326346526101
197 | 895,4.756366569062057
198 | 896,4.413291810154036
199 | 897,5.013658966087423
200 | 898,4.549167180263243
201 | 899,5.172728541719882
202 | 900,3.886746174651208
203 | 901,4.588569016237771
204 | 902,4.929271797376781
205 | 903,4.599656125469891
206 | 904,4.808639274005556
207 | 905,4.325581040660617
208 | 906,4.194580144654176
209 | 907,3.9974315940262444
210 | 908,4.715515557271075
211 | 909,4.1909689237542285
212 | 910,4.074666135679668
213 | 911,4.901169926148987
214 | 912,3.9552622873015917
215 | 913,3.5796376754546113
216 | 914,4.711809225431517
217 | 915,3.6796683690417753
218 | 916,3.5442679947216176
219 | 917,3.1421330422267806
220 | 918,3.756923255399997
221 | 919,4.221580470129694
222 | 920,3.7462046848740105
223 | 921,4.185030793915441
224 | 922,3.396145019184779
225 | 923,4.5190483220341156
226 | 924,4.397364432124365
227 | 925,4.187464540915996
228 | 926,3.9272795891536907
229 | 927,3.6197921717482333
230 | 928,4.032448297113617
231 | 929,4.220999672128967
232 | 930,4.257691213195231
233 | 931,4.154362183090802
234 | 932,4.263289039195564
235 | 933,4.223630570761886
236 | 934,3.5084926009711603
237 | 935,4.062713546013925
238 | 936,4.273967524010026
239 | 937,3.9695138571146784
240 | 938,3.8583311980482913
241 | 939,2.9247548632596514
242 | 940,3.854652554626871
243 | 941,3.234429271148347
244 | 942,3.044781194988175
245 | 943,3.656513334659649
246 | 944,3.4818997204930073
247 | 945,2.875852392487933
248 | 946,3.961998294895795
249 | 947,4.105299492259828
250 | 948,4.216647670087034
251 | 949,3.6508133990630003
252 | 950,3.8246399910006907
253 | 951,3.9922875618756573
254 | 952,3.588030199403815
255 | 953,4.384184812214926
256 | 954,3.5120901674831724
257 | 955,3.418442907989169
258 | 956,3.2977455863735003
259 | 957,2.752204501626123
260 | 958,3.6410521282086634
261 | 959,3.473679507191042
262 | 960,3.921614723961769
263 | 961,3.8925441023134137
264 | 962,3.4275589055043403
265 | 963,3.2211072262973106
266 | 964,3.109856818870202
267 | 965,4.680680857691393
268 | 966,4.546573184643018
269 | 967,3.306769027768169
270 | 968,3.981130047116258
271 | 969,3.8552929675600307
272 | 970,3.7797822835350914
273 | 971,4.427976023630056
274 | 972,4.202232494544553
275 | 973,5.007984438145008
276 | 974,4.489380804569116
277 | 975,4.378691233249306
278 | 976,3.844107079000932
279 | 977,3.2052757587187792
280 | 978,3.966125426862569
281 | 979,3.871352865158788
282 | 980,3.8431485227426725
283 | 981,3.657077938996487
284 | 982,4.356575779819392
285 | 983,3.991957980024272
286 | 984,3.937984423830078
287 | 985,4.673590913114282
288 | 986,5.608245991898564
289 | 987,4.549470674312809
290 | 988,4.706222218782882
291 | 989,4.006099739650296
292 | 990,4.735846319022681
293 | 991,4.395454732392768
294 | 992,4.807044589998613
295 | 993,4.40877616086605
296 | 994,3.623780385854121
297 | 995,5.332176415583953
298 | 996,4.798612805906225
299 | 997,4.91261012214747
300 | 998,5.1126834613174434
301 | 999,5.6472138130982374
302 |
--------------------------------------------------------------------------------
/04 - Times Series/data/timeseries-multivariate.txt:
--------------------------------------------------------------------------------
1 | 0,0.926906299771,1.99107237682,2.56546245685,3.07914768197,4.04839057867
2 | 1,0.108010001864,1.41645361423,2.1686839775,2.94963962176,4.1263503303
3 | 2,-0.800567600028,1.0172132907,1.96434754116,2.99885333086,4.04300485864
4 | 3,0.0607042871898,0.719540073421,1.9765012584,2.89265588817,4.0951014426
5 | 4,0.933712200629,0.28052120776,1.41018552514,2.69232603996,4.06481164223
6 | 5,-0.171730652974,0.260054421028,1.48770816369,2.62199129293,4.44572807842
7 | 6,-1.00180162933,0.333045158863,1.50006392277,2.88888309683,4.24755865606
8 | 7,0.0580061875336,0.688929398826,1.56543458772,2.99840358953,4.52726873347
9 | 8,0.764139447412,1.24704875327,1.77649279698,3.13578593851,4.63238922951
10 | 9,-0.230331874785,1.47903998963,2.03547545751,3.20624030377,4.77980005228
11 | 10,-1.03846045211,2.01133000781,2.31977503972,3.67951536251,5.09716775897
12 | 11,0.188643592253,2.23285349038,2.68338482249,3.49817168611,5.24928239634
13 | 12,0.91207302309,2.24244446841,2.71362604985,3.96332587625,5.37802271594
14 | 13,-0.296588665881,2.02594634141,3.07733910479,3.99698324956,5.56365901394
15 | 14,-0.959961476551,1.45078629833,3.18996420137,4.3763059609,5.65356015609
16 | 15,0.46313530679,1.01141441548,3.4980215948,4.20224896882,5.88842247449
17 | 16,0.929354125798,0.626635305936,3.70508262244,4.51791573544,5.73945973251
18 | 17,-0.519110731957,0.269249223148,3.39866823332,4.46802003061,5.82768174382
19 | 18,-0.924330981367,0.349602834684,3.21762413294,4.72803587499,5.94918925767
20 | 19,0.253239387885,0.345158023497,3.11071425333,4.79311566935,5.9489259713
21 | 20,0.637408390225,0.698996675371,3.25232492145,4.73814732384,5.9612010251
22 | 21,-0.407396859412,1.17456342803,2.49526823723,4.59323415742,5.82501686811
23 | 22,-0.967485452118,1.66655933642,2.47284606244,4.58316034754,5.88721406681
24 | 23,0.474480867904,1.95018556323,2.0228950072,4.48651142819,5.8255943735
25 | 24,1.04309652155,2.23519892356,1.91924131572,4.19094661783,5.87457348436
26 | 25,-0.517861513772,2.12501967336,1.70266619979,4.05280882887,5.72160912899
27 | 26,-0.945301585146,1.65464653549,1.81567174251,3.92309850635,5.58270493814
28 | 27,0.501153868974,1.40600764889,1.53991387719,3.72853247942,5.60169001727
29 | 28,0.972859524418,1.00344321868,1.5175642828,3.64092376655,5.10567722582
30 | 29,-0.70553406135,0.465306263885,1.7038540803,3.33236870312,5.09182481555
31 | 30,-0.946093634916,0.294539309453,1.88052827037,2.93011492669,4.97354922696
32 | 31,0.47922123231,0.308465865031,2.03445883031,2.90772899045,4.86241793548
33 | 32,0.754030014252,0.549752241167,2.46115815089,2.95063349534,4.71834614627
34 | 33,-0.64875949826,0.894615488148,2.5922463381,2.81269864022,4.43480095104
35 | 34,-0.757829951086,1.39123914261,2.69258079904,2.61834837315,4.36580046156
36 | 35,0.565653301088,1.72360022693,2.97794913834,2.80403840334,4.27327248459
37 | 36,0.867440092372,2.21100730052,3.38648090792,2.84057515729,4.12210169576
38 | 37,-0.894567758095,2.17549105818,3.45532493329,2.90446025717,4.00251740584
39 | 38,-0.715442356893,2.15105389965,3.52041791902,3.03650393392,4.12809249577
40 | 39,0.80671703672,1.81504564517,3.60463324866,3.00747789871,3.98440762467
41 | 40,0.527014790142,1.31803513865,3.43842186337,3.3332594663,4.03232406566
42 | 41,-0.795936862129,0.847809114454,3.09875133548,3.52863155938,3.94883924909
43 | 42,-0.610245806946,0.425530441018,2.92581949152,3.77238736123,4.27287245021
44 | 43,0.611662279431,0.178432049837,2.48128214822,3.73212087883,4.17319013831
45 | 44,0.650866553108,0.220341648392,2.41694642022,4.2609098519,4.27271645905
46 | 45,-0.774156982023,0.632667602331,2.05474356052,4.32889204886,4.18029723271
47 | 46,-0.714058448409,0.924562377599,1.75706135146,4.52492718422,4.3972678094
48 | 47,0.889627293379,1.46207968841,1.78299357672,4.64466731095,4.56317887554
49 | 48,0.520140662861,1.8996333843,1.41377633823,4.48899091177,4.78805049769
50 | 49,-1.03816935616,2.08997002059,1.51218375351,4.84167764204,4.93026048606
51 | 50,-0.40772951362,2.30878972136,1.44144415128,4.76854460997,5.01538444629
52 | 51,0.792730684781,1.91367048509,1.58887384677,4.71739397335,5.25690012199
53 | 52,0.371311881576,1.67565079528,1.81688563053,4.60353107555,5.44265822961
54 | 53,-0.814398070371,1.13374634126,1.80328814859,4.72264252878,5.52674761122
55 | 54,-0.469017949323,0.601244136627,2.29690896736,4.49859178859,5.54126153454
56 | 55,0.871044371426,0.407597593794,2.7499112487,4.19060637761,5.57693767301
57 | 56,0.523764933017,0.247705192709,3.09002071379,4.02095509006,5.80510362182
58 | 57,-0.881326403531,0.31513103164,3.11358205718,3.96079100808,5.81000652365
59 | 58,-0.357928025339,0.486163915865,3.17884556771,3.72634990659,5.85693642011
60 | 59,0.853038779822,1.04218094475,3.45835384454,3.36703969978,5.9585988449
61 | 60,0.435311516013,1.59715085283,3.63313338588,3.11276729421,5.93643818229
62 | 61,-1.02703719138,1.92205832542,3.47606111735,3.06247155999,6.02106646259
63 | 62,-0.246661325557,2.14653802542,3.29446326567,2.89936259181,5.67531541272
64 | 63,1.02554736569,2.25943737733,3.07031591528,2.78176218013,5.78206328989
65 | 64,0.337814475969,2.07589147224,2.80356226089,2.55888206331,5.7094075496
66 | 65,-1.12023369929,1.25333011618,2.56497288445,2.77361359194,5.50799418376
67 | 66,-0.178980246554,1.11937139901,2.51598681313,2.91438309151,5.47469577206
68 | 67,0.97550951531,0.60553823137,2.11657741073,2.88081098981,5.37034999502
69 | 68,0.136653357206,0.365828836075,1.97386033165,3.13217903204,5.07254490219
70 | 69,-1.05607596951,0.153152115069,1.52110743825,3.01308794192,5.08902539125
71 | 70,-0.13095280331,0.337113974483,1.52703079853,3.16687131599,4.86649398514
72 | 71,1.07081057754,0.714247566736,1.53761382634,3.45151989484,4.75892309166
73 | 72,0.0153410376082,1.24631231847,1.61690939161,3.85481994498,4.35683752832
74 | 73,-0.912801257303,1.60791309476,1.8729264524,4.03037260012,4.36072588913
75 | 74,-0.0894895640338,2.02535207407,1.93484909619,4.09557485132,4.35327025188
76 | 75,0.978646999652,2.20085086625,2.09003440427,4.27542353033,4.1805058388
77 | 76,-0.113312642876,2.2444100761,2.50789248839,4.4151861502,4.03267168136
78 | 77,-1.00215099149,1.84305628445,2.61691237246,4.45425147595,3.81203553766
79 | 78,-0.0183234614205,1.49573923116,2.99308471214,4.71134960112,4.0273804959
80 | 79,1.0823738177,1.12211589848,3.27079386925,4.94288270502,4.01851068083
81 | 80,0.124370187893,0.616474412808,3.4284236674,4.76942168327,3.9749536483
82 | 81,-0.929423379352,0.290977090976,3.34131726136,4.78590392707,4.10190661656
83 | 82,0.23766302648,0.155302052254,3.49779513794,4.64605656795,4.15571321107
84 | 83,1.03531486192,0.359702776204,3.4880725919,4.48167586667,4.21134561991
85 | 84,-0.261234571382,0.713877760378,3.42756426614,4.426443869,4.25208300527
86 | 85,-1.03572442277,1.25001113691,2.96908341113,4.25500915322,4.25723010649
87 | 86,0.380034261243,1.70543355622,2.73605932518,4.16703432307,4.63700400788
88 | 87,1.03734873488,1.97544410562,2.55586572141,3.84976673263,4.55282864289
89 | 88,-0.177344253372,2.22614526325,2.09565864891,3.77378097953,4.82577400298
90 | 89,-0.976821526892,2.18385079177,1.78522284118,3.67768223554,5.06302440873
91 | 90,0.264820472091,1.86981946157,1.50048403865,3.43619796921,5.05651761669
92 | 91,1.05642344868,1.47568646076,1.51347671977,3.20898518885,5.50149047462
93 | 92,-0.311607433358,1.04226467636,1.52089650905,3.02291865417,5.4889046232
94 | 93,-0.724285777937,0.553052311957,1.48573560173,2.7365973598,5.72549174225
95 | 94,0.519859192905,0.226520626591,1.61543723167,2.84102086852,5.69330622288
96 | 95,1.0323195039,0.260873217055,1.81913034804,2.83951143848,5.90325028086
97 | 96,-0.53285682538,0.387695521405,1.70935609313,2.57977050631,5.79579213161
98 | 97,-0.975127997215,0.920948771589,2.51292643636,2.71004616612,5.87016469227
99 | 98,0.540246804099,1.36445470181,2.61949412896,2.98482553485,6.02447664937
100 | 99,0.987764008058,1.85581989607,2.84685706149,2.94760204892,6.0212151724
--------------------------------------------------------------------------------
/04 - Times Series/data/timeseries-univariate.csv:
--------------------------------------------------------------------------------
1 | 1,-0.6656603714
2 | 2,-0.1164380359
3 | 3,0.7398626488
4 | 4,0.7368633029
5 | 5,0.2289480898
6 | 6,2.257073255
7 | 7,3.023457405
8 | 8,2.481161007
9 | 9,3.773638612
10 | 10,5.059257738
11 | 11,3.553186083
12 | 12,4.554486452
13 | 13,3.655475698
14 | 14,3.419647598
15 | 15,4.303376245
16 | 16,4.830153934
17 | 17,7.253057441
18 | 18,5.064802335
19 | 19,5.448082106
20 | 20,6.251301517
21 | 21,6.214335675
22 | 22,3.07021164
23 | 23,6.995487627
24 | 24,7.180942656
25 | 25,6.084876071
26 | 26,6.95580607
27 | 27,6.692312738
28 | 28,6.339959049
29 | 29,7.659013269
30 | 30,6.157071564
31 | 31,4.023661782
32 | 32,7.380555018
33 | 33,6.972155839
34 | 34,6.655956847
35 | 35,6.532594924
36 | 36,6.780524726
37 | 37,6.723407547
38 | 38,7.616777776
39 | 39,6.394157367
40 | 40,5.046574011
41 | 41,5.715326568
42 | 42,6.536737479
43 | 43,6.527307846
44 | 44,5.671954159
45 | 45,6.508512087
46 | 46,4.740656344
47 | 47,5.449062618
48 | 48,5.796110609
49 | 49,4.802213058
50 | 50,4.627081034
51 | 51,5.748934924
52 | 52,4.05776044
53 | 53,2.743057715
54 | 54,3.590052501
55 | 55,2.937786376
56 | 56,5.333221794
57 | 57,5.102383904
58 | 58,5.097946146
59 | 59,2.771776766
60 | 60,3.75493571
61 | 61,3.268329562
62 | 62,3.127887555
63 | 63,5.723894838
64 | 64,2.365351066
65 | 65,2.030890988
66 | 66,5.74385257
67 | 67,2.637874242
68 | 68,2.851492945
69 | 69,1.907194917
70 | 70,2.568816256
71 | 71,3.869259698
72 | 72,3.989917724
73 | 73,3.641515351
74 | 74,2.812911768
75 | 75,4.964828171
76 | 76,3.050937945
77 | 77,4.203046785
78 | 78,4.269162745
79 | 79,2.818643243
80 | 80,3.334928424
81 | 81,5.239741508
82 | 82,4.972880771
83 | 83,5.212782208
84 | 84,6.056729012
85 | 85,5.404247421
86 | 86,4.733521027
87 | 87,5.241044888
88 | 88,6.844720502
89 | 89,8.242617764
90 | 90,6.686818708
91 | 91,6.429035591
92 | 92,7.45926043
93 | 93,8.225717423
94 | 94,7.661722793
95 | 95,8.348721917
96 | 96,8.029228135
97 | 97,9.780942864
98 | 98,9.755623978
99 | 99,9.149489124
100 | 100,8.947965351
101 | 101,9.176768019
102 | 102,8.768408716
103 | 103,10.39624874
104 | 104,10.39477408
105 | 105,11.63126076
106 | 106,11.8222078
107 | 107,13.60107691
108 | 108,14.54919169
109 | 109,12.63475358
110 | 110,13.77411599
111 | 111,14.45808191
112 | 112,13.27674112
113 | 113,16.00004992
114 | 114,13.04977221
115 | 115,14.65730048
116 | 116,14.76178039
117 | 117,14.62716229
118 | 118,16.20697047
119 | 119,14.79470608
120 | 120,16.70541749
121 | 121,15.8638474
122 | 122,15.63192699
123 | 123,17.20433954
124 | 124,16.29180965
125 | 125,16.93688521
126 | 126,16.07521662
127 | 127,18.33942893
128 | 128,15.62502668
129 | 129,16.81519558
130 | 130,16.86177911
131 | 131,19.18323671
132 | 132,16.68993279
133 | 133,16.52735528
134 | 134,15.22702085
135 | 135,16.13574242
136 | 136,16.08079964
137 | 137,17.16828833
138 | 138,16.09004409
139 | 139,16.92712829
140 | 140,15.54298161
141 | 141,16.03893798
142 | 142,15.38310389
143 | 143,16.18064645
144 | 144,16.22326501
145 | 145,17.1657127
146 | 146,14.87850136
147 | 147,12.80968507
148 | 148,16.25354113
149 | 149,15.14082073
150 | 150,15.79111348
151 | 151,14.02005588
152 | 152,14.32583767
153 | 153,13.87437546
154 | 154,14.47127314
155 | 155,14.29661188
156 | 156,14.68406313
157 | 157,15.84514503
158 | 158,13.89667867
159 | 159,13.58135083
160 | 160,14.26005818
161 | 161,13.3826131
162 | 162,12.85293827
163 | 163,11.06745237
164 | 164,14.08812275
165 | 165,13.05949205
166 | 166,12.18454971
167 | 167,13.01005879
168 | 168,12.45032762
169 | 169,12.20445297
170 | 170,14.39420173
171 | 171,13.49261191
172 | 172,14.91460871
173 | 173,15.97672915
174 | 174,13.96235436
175 | 175,13.77840615
176 | 176,14.39425289
177 | 177,14.31499272
178 | 178,14.37080989
179 | 179,15.34130707
180 | 180,13.42441434
181 | 181,14.54726137
182 | 182,12.51644144
183 | 183,15.36040785
184 | 184,14.52577002
185 | 185,15.90562887
186 | 186,15.12482026
187 | 187,15.55534424
188 | 188,12.22427756
189 | 189,15.11554898
190 | 190,14.23464612
191 | 191,16.52156964
192 | 192,18.14558077
193 | 193,16.51932129
194 | 194,16.88159194
195 | 195,18.08337828
196 | 196,18.70889734
197 | 197,20.97040748
198 | 198,18.98358689
199 | 199,20.76308391
200 | 200,19.81117586
201 | 201,20.24139919
202 | 202,20.78884634
203 | 203,19.92458806
204 | 204,21.60401889
205 | 205,23.30040897
206 | 206,22.2621713
207 | 207,21.24305034
208 | 208,22.07690632
209 | 209,21.78022193
210 | 210,22.94853418
211 | 211,23.72076264
212 | 212,24.12217213
213 | 213,23.04498673
214 | 214,23.8767225
215 | 215,26.52157498
216 | 216,26.24329682
217 | 217,24.83932457
218 | 218,25.66570111
219 | 219,25.61834475
220 | 220,24.41079934
221 | 221,25.31871793
222 | 222,26.7612452
223 | 223,27.00663389
224 | 224,27.86719501
225 | 225,24.87319457
226 | 226,27.85768696
227 | 227,25.70405436
228 | 228,26.11077958
229 | 229,28.11250875
230 | 230,27.6743468
231 | 231,27.19705336
232 | 232,28.08086799
233 | 233,26.19946123
234 | 234,27.32830376
235 | 235,25.98334256
236 | 236,26.71791978
237 | 237,26.67921906
238 | 238,26.25811051
239 | 239,26.64228363
240 | 240,26.20667398
241 | 241,26.39816025
242 | 242,24.83672957
243 | 243,24.27745854
244 | 244,26.10007483
245 | 245,25.67761738
246 | 246,25.91667268
247 | 247,27.57057095
248 | 248,25.68913621
249 | 249,24.92375989
250 | 250,25.5593706
251 | 251,25.14638402
252 | 252,26.46738639
253 | 253,24.55740644
254 | 254,23.5691458
255 | 255,24.07138538
256 | 256,24.94177528
257 | 257,22.33546227
258 | 258,22.32323763
259 | 259,24.38075647
260 | 260,22.40754744
261 | 261,22.61183469
262 | 262,23.28658677
263 | 263,22.98637689
264 | 264,25.46468191
265 | 265,24.14497597
266 | 266,22.97023633
267 | 267,24.37831161
268 | 268,24.86418705
269 | 269,22.61185053
270 | 270,21.70979546
271 | 271,22.09389192
272 | 272,23.25882086
273 | 273,23.56494308
274 | 274,24.13181731
275 | 275,24.28160263
276 | 276,24.43623736
277 | 277,23.24956419
278 | 278,21.76696726
279 | 279,25.14997786
280 | 280,24.67520728
281 | 281,23.40400797
282 | 282,26.24489282
283 | 283,25.05952039
284 | 284,24.53922399
285 | 285,24.89917455
286 | 286,25.13438134
287 | 287,26.05220822
288 | 288,26.94133112
289 | 289,26.02788294
290 | 290,26.65909349
291 | 291,26.0832158
292 | 292,27.39946496
293 | 293,26.57973099
294 | 294,27.49867838
295 | 295,29.89834253
296 | 296,27.78403709
297 | 297,28.92405258
298 | 298,26.58518509
299 | 299,30.91291741
300 | 300,31.73949474
301 | 301,29.25173685
302 | 302,30.3747463
303 | 303,30.59695095
304 | 304,31.50757627
305 | 305,30.97036633
306 | 306,31.27177079
307 | 307,33.43369051
308 | 308,33.9848363
309 | 309,33.31775176
310 | 310,31.69164009
311 | 311,33.07897081
312 | 312,33.10849644
313 | 313,33.29428375
314 | 314,35.60397723
315 | 315,35.33614012
316 | 316,33.95701506
317 | 317,35.16914759
318 | 318,35.92430987
319 | 319,35.81820171
320 | 320,37.36378976
321 | 321,36.74459793
322 | 322,35.27569759
323 | 323,35.9767425
324 | 324,36.17811539
325 | 325,35.68567729
326 | 326,35.54212562
327 | 327,38.78114238
328 | 328,36.46819618
329 | 329,38.07352601
330 | 330,36.56662256
331 | 331,38.1938068
332 | 332,37.42919226
333 | 333,37.44666875
334 | 334,37.16795054
335 | 335,34.97440399
336 | 336,35.6174255
337 | 337,37.37634133
338 | 338,37.26137677
339 | 339,38.09726659
340 | 340,36.04071363
341 | 341,37.07494746
342 | 342,34.4281316
343 | 343,35.1959716
344 | 344,35.26041345
345 | 345,36.9398346
346 | 346,33.58933988
347 | 347,35.00075536
348 | 348,35.97807689
349 | 349,35.66631707
350 | 350,35.44925794
351 | 351,33.69565848
352 | 352,35.38969147
353 | 353,35.96432261
354 | 354,33.6956667
355 | 355,34.05230212
356 | 356,32.70536873
357 | 357,33.91009672
358 | 358,34.45606416
359 | 359,34.97972516
360 | 360,32.36260234
361 | 361,31.69621537
362 | 362,33.02307596
363 | 363,33.94445036
364 | 364,32.2763097
365 | 365,32.06228645
366 | 366,34.25956906
367 | 367,33.61620818
368 | 368,35.00141908
369 | 369,34.47493965
370 | 370,34.31576327
371 | 371,33.24772844
372 | 372,32.95185358
373 | 373,32.55224164
374 | 374,33.06560689
375 | 375,35.2082848
376 | 376,34.50372086
377 | 377,33.54922461
378 | 378,35.46287805
379 | 379,34.68829823
380 | 380,35.04640557
381 | 381,33.48711975
382 | 382,34.03264662
383 | 383,34.43296169
384 | 384,35.7571391
385 | 385,32.58466542
386 | 386,34.44295272
387 | 387,35.43369124
388 | 388,37.7196386
389 | 389,37.55863215
390 | 390,35.11245844
391 | 391,37.36667774
392 | 392,36.41904568
393 | 393,38.11951592
394 | 394,39.351325
395 | 395,38.87795167
396 | 396,38.8144378
397 | 397,38.96059714
398 | 398,39.95536453
399 | 399,39.78580611
400 | 400,40.70319964
401 | 401,41.32804151
402 | 402,42.79937243
403 | 403,38.43432481
404 | 404,42.12051726
405 | 405,42.50068551
406 | 406,43.89812523
407 | 407,42.18632495
408 | 408,43.99716859
409 | 409,43.67726129
410 | 410,42.98072384
411 | 411,43.59181621
412 | 412,44.98283057
413 | 413,42.17674627
414 | 414,46.49541908
415 | 415,45.58212027
416 | 416,42.7202171
417 | 417,45.66108535
418 | 418,45.03844556
419 | 419,44.96618253
420 | 420,45.0371585
421 | 421,46.12237848
422 | 422,46.18891162
423 | 423,46.82075672
424 | 424,47.25058257
425 | 425,45.91853936
426 | 426,46.83241571
427 | 427,47.77383153
428 | 428,48.12984438
429 | 429,46.74042025
430 | 430,46.66834779
431 | 431,47.41473153
432 | 432,46.93101415
433 | 433,48.24438209
434 | 434,47.41007874
435 | 435,46.92607209
436 | 436,46.77346554
437 | 437,47.80447575
438 | 438,45.7000972
439 | 439,46.60252512
440 | 440,45.59290618
441 | 441,47.37025588
442 | 442,46.46333171
443 | 443,46.19762396
444 | 444,47.57763766
445 | 445,46.92624737
446 | 446,46.1536802
447 | 447,45.94947611
448 | 448,46.37457004
449 | 449,44.22344538
450 | 450,43.18937717
451 | 451,44.3387774
452 | 452,45.63204816
453 | 453,43.87816917
454 | 454,43.67301546
455 | 455,42.11959709
456 | 456,43.89387883
457 | 457,44.40734798
458 | 458,42.67367897
459 | 459,43.76501429
460 | 460,44.74698445
461 | 461,43.14500236
462 | 462,42.41214263
463 | 463,44.1631715
464 | 464,41.81378406
465 | 465,43.00929934
466 | 466,42.80360515
467 | 467,44.30252713
468 | 468,42.88123048
469 | 469,43.47049118
470 | 470,44.42168141
471 | 471,42.43276664
472 | 472,44.57582419
473 | 473,43.56138481
474 | 474,43.4549005
475 | 475,43.06396235
476 | 476,43.8737132
477 | 477,42.1428636
478 | 478,43.60856585
479 | 479,44.16778079
480 | 480,42.90474298
481 | 481,44.99882414
482 | 482,43.304605
483 | 483,44.4468626
484 | 484,45.49241923
485 | 485,44.46713555
486 | 486,46.27348465
487 | 487,45.76034556
488 | 488,45.37440079
489 | 489,46.19246701
490 | 490,48.28190231
491 | 491,47.81719203
492 | 492,47.23213374
493 | 493,48.03313818
494 | 494,46.73599653
495 | 495,47.12327054
496 | 496,48.58597108
497 | 497,48.6738899
498 | 498,48.52018743
499 | 499,48.50385022
500 | 500,50.17026668
--------------------------------------------------------------------------------
/04 - Times Series/data/train-data.csv:
--------------------------------------------------------------------------------
1 | time_index,value
2 | 0,-0.5070139274941298
3 | 1,0.1253712967775968
4 | 2,-0.12267575206840288
5 | 3,-0.16956387939957984
6 | 4,0.30393116333315534
7 | 5,-0.12859637797270826
8 | 6,0.3184790887830743
9 | 7,-0.42364367699352623
10 | 8,0.21185831866839666
11 | 9,-0.3894984396354476
12 | 10,0.780461773776706
13 | 11,0.21716827635006894
14 | 12,0.208066295332867
15 | 13,0.9023076086982981
16 | 14,0.37177403780424256
17 | 15,0.43073320317895214
18 | 16,1.23308015942016
19 | 17,0.9121301645172821
20 | 18,0.8702649415061833
21 | 19,1.3225444506952997
22 | 20,1.2849016700625
23 | 21,0.8053488682091177
24 | 22,0.7683377036246207
25 | 23,0.4057552748095914
26 | 24,0.978329292153113
27 | 25,0.696015936876089
28 | 26,1.5554789446672375
29 | 27,0.4781779876080657
30 | 28,0.14013906948820853
31 | 29,1.6449368085323504
32 | 30,1.6590749923600454
33 | 31,2.0141355497897404
34 | 32,1.2459591113581108
35 | 33,0.9793177817011796
36 | 34,0.241826654996384
37 | 35,1.7570742528353063
38 | 36,1.0695330784833672
39 | 37,1.5210907139820962
40 | 38,0.8710384662192763
41 | 39,1.4839397118303155
42 | 40,1.243793091688579
43 | 41,0.6848339518810963
44 | 42,1.6276559859206734
45 | 43,1.3116497622098806
46 | 44,1.3608905379061378
47 | 45,1.041190994021974
48 | 46,0.9799805971350682
49 | 47,1.2354969054174134
50 | 48,0.22235989417954904
51 | 49,1.1513923265108672
52 | 50,0.9396515278276432
53 | 51,0.30260959424652467
54 | 52,1.0056960398178687
55 | 53,1.6068853568408674
56 | 54,1.7676627305773898
57 | 55,0.9173150845957287
58 | 56,1.9939897894609664
59 | 57,0.7414664658496637
60 | 58,0.5332771867761734
61 | 59,0.8338414219102095
62 | 60,0.8641193396342405
63 | 61,1.8207245814139355
64 | 62,1.443486437588053
65 | 63,1.674623601228037
66 | 64,1.443584875090209
67 | 65,1.308804473574395
68 | 66,1.733630056325742
69 | 67,0.8359910593520475
70 | 68,1.0980006203179862
71 | 69,0.7093105225204377
72 | 70,1.1191743713347615
73 | 71,1.150825963564253
74 | 72,2.3425419082645034
75 | 73,1.5345352246330584
76 | 74,1.6779068527914949
77 | 75,0.8676696369755783
78 | 76,0.7677086011436657
79 | 77,0.8998287612805649
80 | 78,0.6025577314257724
81 | 79,1.5358568541287414
82 | 80,1.7413454713205905
83 | 81,1.7294779805951772
84 | 82,0.24100658343511855
85 | 83,0.8087157551467282
86 | 84,1.5151550594728582
87 | 85,1.6630951453036857
88 | 86,0.6939581780591096
89 | 87,1.8563910702987192
90 | 88,0.8695925892718722
91 | 89,1.4075735259518103
92 | 90,0.813779511845681
93 | 91,1.0075753587561769
94 | 92,0.5362436479057621
95 | 93,1.208505762728457
96 | 94,0.39508516366335494
97 | 95,0.5387949889626957
98 | 96,0.06824975090018343
99 | 97,0.5515019585271188
100 | 98,0.9717347784462206
101 | 99,0.6453799423415064
102 | 100,0.9638253752187474
103 | 101,0.8253807454700055
104 | 102,0.765295849230056
105 | 103,-0.2490667183903401
106 | 104,0.1570243819306409
107 | 105,0.25567212153198093
108 | 106,0.13750761989296234
109 | 107,-0.20660746818522513
110 | 108,0.5933906620273861
111 | 109,0.5663378756409212
112 | 110,1.0593951296282083
113 | 111,0.2521124010736045
114 | 112,0.3260453790642569
115 | 113,0.1540775658086948
116 | 114,1.1988817491487544
117 | 115,0.2736270594678474
118 | 116,0.07319358529892356
119 | 117,0.2355506823573319
120 | 118,-0.3498579462123355
121 | 119,1.1746109061731147
122 | 120,-0.2454734313621706
123 | 121,0.03472511352169072
124 | 122,0.6452287293459306
125 | 123,-0.5196418251954463
126 | 124,0.27339202936903717
127 | 125,0.49343731462487284
128 | 126,1.0226849755300311
129 | 127,-0.19257056273006357
130 | 128,-0.47107762571031686
131 | 129,-0.23374598590110818
132 | 130,-0.007609464770713337
133 | 131,-0.3980476099888386
134 | 132,-0.5558966206457587
135 | 133,-0.1344657899523246
136 | 134,-0.6562174891480932
137 | 135,-0.9529234885623512
138 | 136,-0.1824689939763049
139 | 137,-0.4824704474604228
140 | 138,-0.9436448853093331
141 | 139,-0.3369041551721612
142 | 140,0.14497797127573497
143 | 141,0.016325582764854185
144 | 142,-0.19500561044644937
145 | 143,-0.9654489601806846
146 | 144,-0.9612848959159918
147 | 145,-0.162283592524345
148 | 146,0.22063804277118648
149 | 147,-0.7768224686464962
150 | 148,-0.5474822299406553
151 | 149,-0.22684463014547362
152 | 150,-0.05073639447563938
153 | 151,-0.3540337760171799
154 | 152,0.26733413075841495
155 | 153,-0.48318666001008803
156 | 154,1.0412721613305362
157 | 155,-0.3009441654464442
158 | 156,0.23672219675628858
159 | 157,-0.107098377724405
160 | 158,-0.4440316895674985
161 | 159,-0.24570790256824426
162 | 160,0.5943460949278447
163 | 161,-1.0682094133994264
164 | 162,-0.2680015981885515
165 | 163,0.033828877133002866
166 | 164,-0.28231626805343357
167 | 165,0.025170222611089033
168 | 166,-0.17207076283859424
169 | 167,0.2296022365944559
170 | 168,0.04598573095359981
171 | 169,0.8987253251768407
172 | 170,0.4586360956080101
173 | 171,0.42576578797620623
174 | 172,0.10791234233154612
175 | 173,-0.23875487135166396
176 | 174,-0.34084971403433867
177 | 175,0.6546440666166147
178 | 176,1.0435654514531552
179 | 177,0.6100905299960653
180 | 178,-0.42662090687008325
181 | 179,-0.7205534111701549
182 | 180,0.2370598496105042
183 | 181,0.7156811736776737
184 | 182,0.09764433823741903
185 | 183,0.1713968836530575
186 | 184,1.1557269685335108
187 | 185,0.7253976794449961
188 | 186,0.26055050392723333
189 | 187,1.429007916594629
190 | 188,0.8915221745929881
191 | 189,1.849975707474486
192 | 190,1.2170415838605697
193 | 191,0.28227177326870756
194 | 192,0.3758340112512873
195 | 193,0.489395008190605
196 | 194,-0.07780361911193254
197 | 195,1.0592080566932727
198 | 196,1.260639592592383
199 | 197,0.8778403107793371
200 | 198,0.23693750056347074
201 | 199,1.3629839804917614
202 | 200,0.933989699638078
203 | 201,1.2559119044689113
204 | 202,1.562489659119579
205 | 203,1.4766387826355065
206 | 204,1.491247246204821
207 | 205,1.0330384259109189
208 | 206,1.2717659065426434
209 | 207,0.5450718100076353
210 | 208,1.588810638400759
211 | 209,1.180009407506983
212 | 210,1.2517809833903317
213 | 211,1.5428349841931654
214 | 212,0.9090514743496572
215 | 213,2.0636856127889924
216 | 214,1.7295023405620553
217 | 215,0.7207840153166203
218 | 216,1.329677130143393
219 | 217,1.7757506097740814
220 | 218,0.7162167349060784
221 | 219,2.2275539995837077
222 | 220,2.012786749166425
223 | 221,0.8684124735270828
224 | 222,1.7121799084141875
225 | 223,1.574730314175874
226 | 224,1.9539080144419492
227 | 225,1.0177157482426051
228 | 226,1.7317822328139219
229 | 227,1.931341421920829
230 | 228,2.3984024094583827
231 | 229,1.4395431414826998
232 | 230,2.014204161701857
233 | 231,2.8239742537290016
234 | 232,1.2932303643848833
235 | 233,1.9383163687374438
236 | 234,2.2300236863813026
237 | 235,2.110700974442951
238 | 236,1.9604749859151185
239 | 237,2.873056945604347
240 | 238,3.042481203367786
241 | 239,2.0349069225390064
242 | 240,2.0777500108680504
243 | 241,2.291484544878781
244 | 242,2.769883711814939
245 | 243,2.4088627771736624
246 | 244,1.9491837560161467
247 | 245,3.1833487056181706
248 | 246,2.2988579493188883
249 | 247,1.957192841259238
250 | 248,2.9068158400814905
251 | 249,2.3701889353165955
252 | 250,1.919831480345358
253 | 251,2.6692682753733843
254 | 252,2.3481555360095228
255 | 253,2.0611353817546756
256 | 254,2.084063946698618
257 | 255,2.5871870558846437
258 | 256,2.5349460436653226
259 | 257,2.1937121705238254
260 | 258,2.465205616564662
261 | 259,3.5011148068047655
262 | 260,1.0872350793182335
263 | 261,3.1172222909842504
264 | 262,2.2166159479532883
265 | 263,1.7705159676237796
266 | 264,3.1727664641419024
267 | 265,1.9892904101862803
268 | 266,1.5376910059503701
269 | 267,2.7220079745425036
270 | 268,2.2792294831645616
271 | 269,1.2867915515837316
272 | 270,1.7632027534734713
273 | 271,2.608122652864524
274 | 272,1.8349037986395822
275 | 273,2.5657744969319713
276 | 274,2.0294851497169994
277 | 275,2.701897623997814
278 | 276,2.136856139216165
279 | 277,3.0684468389668704
280 | 278,2.5880287951712986
281 | 279,2.09758044416676
282 | 280,1.3842149430365502
283 | 281,2.090777383464958
284 | 282,2.7737299554551322
285 | 283,1.677875405665155
286 | 284,2.5185996092876226
287 | 285,2.3979314455762495
288 | 286,0.7706116377140514
289 | 287,1.9068300017223707
290 | 288,1.6884109083823688
291 | 289,2.3211973311275136
292 | 290,2.074017902710044
293 | 291,1.8601854699639824
294 | 292,2.5770457442035712
295 | 293,1.1320215688173692
296 | 294,1.6661182755241826
297 | 295,1.5065410558928145
298 | 296,1.3504730246720247
299 | 297,1.4781447340715372
300 | 298,1.8287400728716516
301 | 299,2.4439413941638555
302 | 300,1.1335870239005752
303 | 301,1.1376121185076786
304 | 302,1.883419689823916
305 | 303,1.1643748038367192
306 | 304,0.9052621787229713
307 | 305,1.49029893737825
308 | 306,1.504595817338015
309 | 307,0.5972730492321923
310 | 308,1.4200104774184505
311 | 309,1.7746909364603747
312 | 310,1.2224561324894654
313 | 311,0.9871804251175486
314 | 312,1.5178942988712019
315 | 313,1.6323865515785387
316 | 314,0.8180963782943355
317 | 315,1.179894094942133
318 | 316,0.7957333525885373
319 | 317,1.5596466944625096
320 | 318,1.4642504546959447
321 | 319,1.3051724059444467
322 | 320,1.620731615853558
323 | 321,0.34301128322455676
324 | 322,0.8935376468255153
325 | 323,0.7162418593271469
326 | 324,0.7935228855212434
327 | 325,0.715036157783857
328 | 326,0.47255776093126256
329 | 327,0.06053479272504858
330 | 328,1.1092282612381497
331 | 329,1.5854843894731405
332 | 330,0.7226338344053831
333 | 331,0.5415628726616049
334 | 332,0.769707956298005
335 | 333,1.0811826467816688
336 | 334,1.2324015386438218
337 | 335,0.1850672631368836
338 | 336,0.8329997477763121
339 | 337,1.2827590248617144
340 | 338,0.15409359872707973
341 | 339,0.7131220830375605
342 | 340,0.3538863693240233
343 | 341,0.4729187010748861
344 | 342,0.25167060592814694
345 | 343,0.4795340919745855
346 | 344,1.244545001156966
347 | 345,0.9070580938291402
348 | 346,-0.2474843556396955
349 | 347,0.9318249817242306
350 | 348,0.8691696762707389
351 | 349,0.10928398400582107
352 | 350,1.1954642151715116
353 | 351,1.484077554996629
354 | 352,0.40735836498626776
355 | 353,0.8741504310303659
356 | 354,1.3984054286233911
357 | 355,0.8676682893671924
358 | 356,1.1623868990365802
359 | 357,1.3360059694609663
360 | 358,0.5750144526389834
361 | 359,1.4273445427609106
362 | 360,0.7625919084249725
363 | 361,1.5300702740487817
364 | 362,1.1771428900727505
365 | 363,0.9815884070712058
366 | 364,1.7794529677664737
367 | 365,0.5096286014930141
368 | 366,0.8621047208445878
369 | 367,0.6572835372437593
370 | 368,0.7704329971617637
371 | 369,0.5021708574153725
372 | 370,0.9521055790991899
373 | 371,0.5124929945662163
374 | 372,0.24419637629300284
375 | 373,1.1672863895506451
376 | 374,0.7178411779625429
377 | 375,1.1888475010857138
378 | 376,1.2673469074966683
379 | 377,0.4135564113401964
380 | 378,1.2619200687245757
381 | 379,1.2847332369524564
382 | 380,1.0587556277731935
383 | 381,1.2057020854211569
384 | 382,1.9577257560566927
385 | 383,2.0900437646623122
386 | 384,0.9604295505152993
387 | 385,1.6910295543239444
388 | 386,2.4846342819242393
389 | 387,2.0867271577793156
390 | 388,1.2630580708033659
391 | 389,1.211468044794045
392 | 390,0.955653622356968
393 | 391,1.3890179730031247
394 | 392,2.44345181147831
395 | 393,1.204457284758076
396 | 394,2.845650935135721
397 | 395,1.5276002219192633
398 | 396,1.6151275384846175
399 | 397,2.5959934698396947
400 | 398,1.8126461999813244
401 | 399,2.8157807817978533
402 | 400,1.422968461929559
403 | 401,0.9105198075182803
404 | 402,1.9572272725379791
405 | 403,2.600692828025921
406 | 404,1.3992103054577312
407 | 405,3.018559318027152
408 | 406,1.5093470931108484
409 | 407,2.8971446400960468
410 | 408,1.6865911401470695
411 | 409,2.2332932663990315
412 | 410,2.6608879170904864
413 | 411,2.669532774586354
414 | 412,2.0251473312590482
415 | 413,2.523094088712269
416 | 414,2.911419909915979
417 | 415,2.340494856846694
418 | 416,2.545681180095806
419 | 417,3.099596750711708
420 | 418,2.136135468673287
421 | 419,2.1888107371201158
422 | 420,2.6488645103329107
423 | 421,3.1568399578858273
424 | 422,2.440864677932434
425 | 423,2.610809173659483
426 | 424,1.8178056091295223
427 | 425,2.9315735968273544
428 | 426,2.8184595352670607
429 | 427,2.069996262482516
430 | 428,2.210078410725907
431 | 429,4.014375425597757
432 | 430,2.797331532036595
433 | 431,2.34882390480658
434 | 432,2.4007093792308676
435 | 433,3.2702296111523
436 | 434,3.4303221048765566
437 | 435,3.37858399500833
438 | 436,3.212384205955622
439 | 437,2.216321278261658
440 | 438,2.784889634345377
441 | 439,4.419156418444061
442 | 440,3.957440911283956
443 | 441,2.754337273745299
444 | 442,3.9590830385435067
445 | 443,2.9510304655840445
446 | 444,3.430818337970406
447 | 445,3.417164122127078
448 | 446,3.7727735339485235
449 | 447,2.193807925768478
450 | 448,3.3244896074740726
451 | 449,3.4043681191391784
452 | 450,3.6974548916638454
453 | 451,3.9523937610220634
454 | 452,3.5729878538357385
455 | 453,2.495321372584242
456 | 454,2.0219326537398943
457 | 455,2.6792841395141833
458 | 456,3.5211582944939983
459 | 457,2.9853510573663247
460 | 458,2.725114037495373
461 | 459,3.0731529711235157
462 | 460,2.8064542337639415
463 | 461,3.961218040102244
464 | 462,3.318125102556614
465 | 463,3.7284890757462783
466 | 464,3.4752958055141363
467 | 465,3.278428878730352
468 | 466,3.444283176951884
469 | 467,3.77325515759806
470 | 468,2.2907371586806886
471 | 469,2.938790653954773
472 | 470,3.8176787682406395
473 | 471,2.7734549949442497
474 | 472,3.5007835419150353
475 | 473,3.18851778210803
476 | 474,3.594162344250331
477 | 475,2.1631708225851556
478 | 476,3.876183487335109
479 | 477,3.238333572409265
480 | 478,3.0111588283739703
481 | 479,2.6859592974631026
482 | 480,3.543110508796691
483 | 481,3.3523951980962914
484 | 482,3.3045303739129706
485 | 483,2.1505872783687434
486 | 484,3.255545595834051
487 | 485,2.1834361711617376
488 | 486,3.016004454036089
489 | 487,2.4348048508379136
490 | 488,2.2764720072045774
491 | 489,2.13094695214868
492 | 490,2.4107915906463733
493 | 491,2.2028169203722907
494 | 492,2.5179343324986063
495 | 493,2.05351310068931
496 | 494,2.5003053592680518
497 | 495,2.218774657884143
498 | 496,1.644258843640977
499 | 497,2.062599169684491
500 | 498,2.3219602009985816
501 | 499,2.6096961732821673
502 | 500,2.3765700831025884
503 | 501,3.0287779735690266
504 | 502,2.3451050020349733
505 | 503,2.2730504246034657
506 | 504,1.7170030083608254
507 | 505,3.972820176853931
508 | 506,2.96081928143457
509 | 507,1.7440825856250106
510 | 508,2.283956392920651
511 | 509,3.017090874099012
512 | 510,1.73269060308032
513 | 511,2.4818036862008084
514 | 512,1.9178423638357804
515 | 513,2.008468538769001
516 | 514,1.4303512189916163
517 | 515,2.420028353972818
518 | 516,2.333694471544968
519 | 517,2.0358856878361853
520 | 518,1.930372332964022
521 | 519,2.772314114023144
522 | 520,2.6525270397365643
523 | 521,2.7874542842670094
524 | 522,1.4108282031162545
525 | 523,1.6868876057556874
526 | 524,1.5037689438264519
527 | 525,1.2325642370435734
528 | 526,0.7124232713063819
529 | 527,2.3800907644028957
530 | 528,1.3877233338461814
531 | 529,1.8462752581855768
532 | 530,1.6416133440083642
533 | 531,1.8126044092561209
534 | 532,2.0663554509247932
535 | 533,2.761008359626194
536 | 534,2.1577238764476894
537 | 535,2.017417635006672
538 | 536,1.601923991176038
539 | 537,1.7680351150614104
540 | 538,2.065200901662619
541 | 539,1.725351491022523
542 | 540,1.924858339794002
543 | 541,2.125758189878704
544 | 542,1.2301071586988084
545 | 543,1.709721540041509
546 | 544,1.5239738349384686
547 | 545,2.3385901731902794
548 | 546,2.5132419702994624
549 | 547,1.9750801909178817
550 | 548,0.12333314756948865
551 | 549,2.0991657257046445
552 | 550,1.9142082333554962
553 | 551,1.9309896520009284
554 | 552,1.341544103330502
555 | 553,1.28049058809898
556 | 554,2.6218423637877235
557 | 555,2.1286009393550938
558 | 556,2.2438217064205723
559 | 557,1.456842576721939
560 | 558,2.4680883404735128
561 | 559,2.678058024740523
562 | 560,2.1697469640198386
563 | 561,1.9274790031063742
564 | 562,1.263900280529004
565 | 563,1.3976212029710853
566 | 564,0.7847085746264618
567 | 565,2.239433783516727
568 | 566,1.0804046348105025
569 | 567,2.0291971262278277
570 | 568,1.9031523291722041
571 | 569,1.594750755676741
572 | 570,2.095705429543913
573 | 571,2.1439876601219687
574 | 572,2.17447718714394
575 | 573,2.509210779721647
576 | 574,1.348754113695628
577 | 575,2.2650768647581687
578 | 576,2.469957066691454
579 | 577,1.9094084008513759
580 | 578,2.6907915613546765
581 | 579,2.4283581283856654
582 | 580,2.6198086506106506
583 | 581,2.824532088498242
584 | 582,2.144986257680162
585 | 583,2.6967709905140853
586 | 584,2.155027926934807
587 | 585,3.0763715554468307
588 | 586,2.2804956585199467
589 | 587,2.330030225288893
590 | 588,2.7283262483956863
591 | 589,1.9667832940469763
592 | 590,2.7629914605916728
593 | 591,3.347673260683096
594 | 592,2.7204991774409706
595 | 593,2.2213380726123417
596 | 594,3.295738484226204
597 | 595,3.0943834635313716
598 | 596,3.395245128818069
599 | 597,1.7801657319322999
600 | 598,3.006003815599795
601 | 599,3.7271303717241318
602 | 600,3.3687920925147727
603 | 601,3.494081235835588
604 | 602,2.6398630706074826
605 | 603,3.32437043035613
606 | 604,3.9716937039130134
607 | 605,3.0148970547587193
608 | 606,2.7587802972729034
609 | 607,3.541314808626614
610 | 608,3.419552004582584
611 | 609,4.096884215866331
612 | 610,2.5616750924741796
613 | 611,3.5639179882703873
614 | 612,3.4268808208312347
615 | 613,2.872363219761527
616 | 614,3.3855457293820366
617 | 615,2.93785729220192
618 | 616,3.1458155752845265
619 | 617,3.2465539652297237
620 | 618,2.946748408784293
621 | 619,3.7676191513221244
622 | 620,4.106623396844353
623 | 621,3.4644362851382864
624 | 622,4.2298715687484965
625 | 623,4.614724972380464
626 | 624,4.485952085461564
627 | 625,4.008892979961672
628 | 626,3.4117492445208164
629 | 627,3.6571908075816983
630 | 628,3.8408519603679254
631 | 629,3.431112244463793
632 | 630,4.556865657959409
633 | 631,4.237222933222556
634 | 632,3.222116818635072
635 | 633,3.576165574677214
636 | 634,3.9754856194853234
637 | 635,3.328213368589007
638 | 636,4.104693254466189
639 | 637,3.9691640998211914
640 | 638,3.3161168154225567
641 | 639,3.7508224414841997
642 | 640,4.434260513708335
643 | 641,3.2003371924917166
644 | 642,4.980128628132029
645 | 643,4.545334396893772
646 | 644,4.181800858792881
647 | 645,4.264018934480361
648 | 646,3.81587906099798
649 | 647,4.546549273705075
650 | 648,4.343850164247174
651 | 649,3.785030301942927
652 | 650,3.9667904424294553
653 | 651,4.832489508124424
654 | 652,3.5115344994617628
655 | 653,5.280271863593596
656 | 654,5.170105810177021
657 | 655,4.001193197013245
658 | 656,4.152953851268918
659 | 657,4.349360432568223
660 | 658,3.5909299965583887
661 | 659,4.734825589863275
662 | 660,3.893199952723647
663 | 661,5.383358186348112
664 | 662,3.861226525588445
665 | 663,3.8204241629880973
666 | 664,4.030051057138472
667 | 665,4.01900086168555
668 | 666,4.245729586883419
669 | 667,3.8258969209411626
670 | 668,4.640010552723009
671 | 669,4.283439814229849
672 | 670,4.4789429151128495
673 | 671,3.9050383247869966
674 | 672,4.440188192844067
675 | 673,3.9891542166926928
676 | 674,4.533468802275714
677 | 675,3.2600453449902833
678 | 676,4.435279876652234
679 | 677,3.8421123752652178
680 | 678,3.81433230199235
681 | 679,4.095330470306205
682 | 680,3.7508912112831534
683 | 681,3.6067124664798547
684 | 682,3.773260762561308
685 | 683,3.9661995229630125
686 | 684,4.025269079939741
687 | 685,3.891316308095429
688 | 686,2.6268497228468517
689 | 687,3.9555450836012014
690 | 688,4.217561006995724
691 | 689,3.959901576095089
692 | 690,3.9814289938170444
693 | 691,3.4927129816373235
694 | 692,3.643282736855781
695 | 693,3.415009233378614
696 | 694,3.755798217824436
697 | 695,3.767404234589839
698 | 696,3.2622188273548947
699 | 697,3.7034220234951563
700 | 698,2.449142007600186
701 | 699,2.7817285578755713
702 |
--------------------------------------------------------------------------------
/06 - Sequence Models/TODO.txt:
--------------------------------------------------------------------------------
1 | - RNN, LSTM, etc.
2 | - sequence classification
3 | - predicting next element in the sequence
4 | - time-series using RNN
5 | - ...
--------------------------------------------------------------------------------
/06 - Sequence Models/data/seq01.test.csv:
--------------------------------------------------------------------------------
1 | 0.0,0.687785252292,1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771
2 | 0.687785252292,1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0
3 | 1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229
4 | 1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163
5 | 0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163
6 | 0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229
7 | 0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5
8 | -0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771
9 | -0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837
10 | 0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837
11 | 1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771
12 | 1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0
13 | 2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229
14 | 2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163
15 | 1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163
16 | 1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229
17 | 1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5
18 | 0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771
19 | 0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837
20 | 1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837
21 | 2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771
22 | 2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0
23 | 3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229
24 | 3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163
25 | 2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163
26 | 2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229
27 | 2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5
28 | 1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771
29 | 1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837
30 | 2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837
31 | 3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771
32 | 3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0
33 | 4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229
34 | 4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163
35 | 3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163
36 | 3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229
37 | 3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5
38 | 2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771
39 | 2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837
40 | 3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837
41 | 4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771
42 | 4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0
43 | 5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229
44 | 5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163
45 | 4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163
46 | 4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229
47 | 4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5
48 | 3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771
49 | 3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837
50 | 4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837
51 | 5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771
52 | 5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0
53 | 6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229
54 | 6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163
55 | 5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163
56 | 5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229
57 | 5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5
58 | 4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771
59 | 4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837
60 | 5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837
61 | 6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771
62 | 6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0
63 | 7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229
64 | 7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163
65 | 6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163
66 | 6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229
67 | 6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5
68 | 5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771
69 | 5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837
70 | 6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837
71 | 7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771
72 | 7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0
73 | 8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229
74 | 8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163
75 | 7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163
76 | 7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229
77 | 7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5
78 | 6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771
79 | 6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837
80 | 7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837
81 | 8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771
82 | 8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0
83 | 9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523
84 | 9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163
85 | 8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163
86 | 8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523
87 | 8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5
88 | 7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477
89 | 7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837
90 | 8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837
91 | 9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477
92 | 9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0
93 | 10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523
94 | 10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163
95 | 9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163
96 | 9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523
97 | 9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5
98 | 8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477
99 | 8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477,10.7489434837
100 | 9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477,10.7489434837,10.8489434837
101 |
--------------------------------------------------------------------------------
/07 - Image Analysis/00.0 - TensorFlow Version Update.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "%%bash\n",
10 | "\n",
11 | "pip install -U tensorflow"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": null,
17 | "metadata": {},
18 | "outputs": [],
19 | "source": [
20 | "import tensorflow as tf\n",
21 | "print(tf.__version__)"
22 | ]
23 | }
24 | ],
25 | "metadata": {
26 | "kernelspec": {
27 | "display_name": "Python 2",
28 | "language": "python",
29 | "name": "python2"
30 | },
31 | "language_info": {
32 | "codemirror_mode": {
33 | "name": "ipython",
34 | "version": 2
35 | },
36 | "file_extension": ".py",
37 | "mimetype": "text/x-python",
38 | "name": "python",
39 | "nbconvert_exporter": "python",
40 | "pygments_lexer": "ipython2",
41 | "version": "2.7.13"
42 | }
43 | },
44 | "nbformat": 4,
45 | "nbformat_minor": 2
46 | }
47 |
--------------------------------------------------------------------------------
/08 - Text Analysis/01 - Text Classification - SMS Ham vs. Spam - Data Preparation.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "markdown",
5 | "metadata": {},
6 | "source": [
7 | "## UCI SMS Spam Collection Dataset\n",
8 | "\n",
9 | "### Dataset URL: http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection\n",
10 | "\n",
11 | "A set of labeled SMS messages + label (ham vs Spam)"
12 | ]
13 | },
14 | {
15 | "cell_type": "code",
16 | "execution_count": 1,
17 | "metadata": {
18 | "collapsed": true
19 | },
20 | "outputs": [],
21 | "source": [
22 | "import pandas as pd\n",
23 | "import string\n",
24 | "import re\n",
25 | "from sklearn import model_selection"
26 | ]
27 | },
28 | {
29 | "cell_type": "code",
30 | "execution_count": 2,
31 | "metadata": {},
32 | "outputs": [
33 | {
34 | "data": {
35 | "text/html": [
36 | "\n",
37 | "\n",
50 | "
\n",
51 | " \n",
52 | " \n",
53 | " | \n",
54 | " class | \n",
55 | " sms | \n",
56 | "
\n",
57 | " \n",
58 | " \n",
59 | " \n",
60 | " 0 | \n",
61 | " ham | \n",
62 | " Go until jurong point, crazy.. Available only ... | \n",
63 | "
\n",
64 | " \n",
65 | " 1 | \n",
66 | " ham | \n",
67 | " Ok lar... Joking wif u oni... | \n",
68 | "
\n",
69 | " \n",
70 | " 2 | \n",
71 | " spam | \n",
72 | " Free entry in 2 a wkly comp to win FA Cup fina... | \n",
73 | "
\n",
74 | " \n",
75 | " 3 | \n",
76 | " ham | \n",
77 | " U dun say so early hor... U c already then say... | \n",
78 | "
\n",
79 | " \n",
80 | " 4 | \n",
81 | " ham | \n",
82 | " Nah I don't think he goes to usf, he lives aro... | \n",
83 | "
\n",
84 | " \n",
85 | "
\n",
86 | "
"
87 | ],
88 | "text/plain": [
89 | " class sms\n",
90 | "0 ham Go until jurong point, crazy.. Available only ...\n",
91 | "1 ham Ok lar... Joking wif u oni...\n",
92 | "2 spam Free entry in 2 a wkly comp to win FA Cup fina...\n",
93 | "3 ham U dun say so early hor... U c already then say...\n",
94 | "4 ham Nah I don't think he goes to usf, he lives aro..."
95 | ]
96 | },
97 | "execution_count": 2,
98 | "metadata": {},
99 | "output_type": "execute_result"
100 | }
101 | ],
102 | "source": [
103 | "DATASET_FILE = 'data/sms-spam/SMSSpamCollection'\n",
104 | "dataset = pd.read_csv(DATASET_FILE, sep='\\t', names=['class','sms'])\n",
105 | "dataset.head()"
106 | ]
107 | },
108 | {
109 | "cell_type": "code",
110 | "execution_count": 3,
111 | "metadata": {},
112 | "outputs": [
113 | {
114 | "name": "stdout",
115 | "output_type": "stream",
116 | "text": [
117 | "Dataset Size: 5572\n",
118 | "ham 4825\n",
119 | "spam 747\n",
120 | "Name: class, dtype: int64\n",
121 | "ham %: 86.59\n",
122 | "ham %: 13.41\n"
123 | ]
124 | }
125 | ],
126 | "source": [
127 | "print(\"Dataset Size: {}\".format(len(dataset)))\n",
128 | "value_counts = dataset['class'].value_counts()\n",
129 | "print(value_counts)\n",
130 | "print(\"ham %: {}\".format(round(value_counts[0]/len(dataset)*100,2)))\n",
131 | "print(\"ham %: {}\".format(round(value_counts[1]/len(dataset)*100,2)))"
132 | ]
133 | },
134 | {
135 | "cell_type": "markdown",
136 | "metadata": {},
137 | "source": [
138 | "## Create Training and Validation Datasets"
139 | ]
140 | },
141 | {
142 | "cell_type": "code",
143 | "execution_count": 4,
144 | "metadata": {},
145 | "outputs": [
146 | {
147 | "name": "stdout",
148 | "output_type": "stream",
149 | "text": [
150 | "4179\n",
151 | "1393\n"
152 | ]
153 | }
154 | ],
155 | "source": [
156 | "exclude = ['\\t', '\"']\n",
157 | "def clean_text(text):\n",
158 | " for c in exclude:\n",
159 | " text=text.replace(c,'')\n",
160 | " return text.lower().strip()\n",
161 | "\n",
162 | "sms_processed = list(map(lambda text: clean_text(text), \n",
163 | " dataset['sms'].values))\n",
164 | "\n",
165 | "dataset['sms'] = sms_processed\n",
166 | "\n",
167 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1,\n",
168 | " test_size=0.25, \n",
169 | " random_state=19850610)\n",
170 | "\n",
171 | "splits = list(splitter.split(X=dataset['sms'], y=dataset['class']))\n",
172 | "train_index = splits[0][0]\n",
173 | "valid_index = splits[0][1]\n",
174 | "\n",
175 | "train_df = dataset.loc[train_index,:]\n",
176 | "print(len(train_df))\n",
177 | "\n",
178 | "valid_df = dataset.loc[valid_index,:]\n",
179 | "print(len(valid_df))"
180 | ]
181 | },
182 | {
183 | "cell_type": "code",
184 | "execution_count": 5,
185 | "metadata": {},
186 | "outputs": [
187 | {
188 | "name": "stdout",
189 | "output_type": "stream",
190 | "text": [
191 | "Training Set\n",
192 | "ham 3619\n",
193 | "spam 560\n",
194 | "Name: class, dtype: int64\n",
195 | "ham %: 86.6\n",
196 | "ham %: 13.4\n",
197 | "\n",
198 | "Validation Set\n",
199 | "ham 1206\n",
200 | "spam 187\n",
201 | "Name: class, dtype: int64\n",
202 | "ham %: 86.58\n",
203 | "ham %: 13.42\n"
204 | ]
205 | }
206 | ],
207 | "source": [
208 | "print(\"Training Set\")\n",
209 | "training_value_counts = train_df['class'].value_counts()\n",
210 | "print(training_value_counts)\n",
211 | "print(\"ham %: {}\".format(round(training_value_counts[0]/len(train_df)*100,2)))\n",
212 | "print(\"ham %: {}\".format(round(training_value_counts[1]/len(train_df)*100,2)))\n",
213 | "print(\"\")\n",
214 | "print(\"Validation Set\")\n",
215 | "validation_value_counts = valid_df['class'].value_counts()\n",
216 | "print(validation_value_counts)\n",
217 | "print(\"ham %: {}\".format(round(validation_value_counts[0]/len(valid_df)*100,2)))\n",
218 | "print(\"ham %: {}\".format(round(validation_value_counts[1]/len(valid_df)*100,2)))"
219 | ]
220 | },
221 | {
222 | "cell_type": "markdown",
223 | "metadata": {},
224 | "source": [
225 | "## Save Training and Validation Datasets"
226 | ]
227 | },
228 | {
229 | "cell_type": "code",
230 | "execution_count": 6,
231 | "metadata": {
232 | "collapsed": true
233 | },
234 | "outputs": [],
235 | "source": [
236 | "train_df.to_csv(\"data/sms-spam/train-data.tsv\", header=False, index=False, sep='\\t')\n",
237 | "valid_df.to_csv(\"data/sms-spam/valid-data.tsv\", header=False, index=False, sep='\\t')"
238 | ]
239 | },
240 | {
241 | "cell_type": "code",
242 | "execution_count": 7,
243 | "metadata": {},
244 | "outputs": [
245 | {
246 | "data": {
247 | "text/html": [
248 | "\n",
249 | "\n",
262 | "
\n",
263 | " \n",
264 | " \n",
265 | " | \n",
266 | " class | \n",
267 | " sms | \n",
268 | "
\n",
269 | " \n",
270 | " \n",
271 | " \n",
272 | " 4174 | \n",
273 | " ham | \n",
274 | " just woke up. yeesh its late. but i didn't fal... | \n",
275 | "
\n",
276 | " \n",
277 | " 4175 | \n",
278 | " ham | \n",
279 | " what do u reckon as need 2 arrange transport i... | \n",
280 | "
\n",
281 | " \n",
282 | " 4176 | \n",
283 | " spam | \n",
284 | " free entry into our £250 weekly competition ju... | \n",
285 | "
\n",
286 | " \n",
287 | " 4177 | \n",
288 | " spam | \n",
289 | " -pls stop bootydelious (32/f) is inviting you ... | \n",
290 | "
\n",
291 | " \n",
292 | " 4178 | \n",
293 | " ham | \n",
294 | " tell my bad character which u dnt lik in me. ... | \n",
295 | "
\n",
296 | " \n",
297 | "
\n",
298 | "
"
299 | ],
300 | "text/plain": [
301 | " class sms\n",
302 | "4174 ham just woke up. yeesh its late. but i didn't fal...\n",
303 | "4175 ham what do u reckon as need 2 arrange transport i...\n",
304 | "4176 spam free entry into our £250 weekly competition ju...\n",
305 | "4177 spam -pls stop bootydelious (32/f) is inviting you ...\n",
306 | "4178 ham tell my bad character which u dnt lik in me. ..."
307 | ]
308 | },
309 | "execution_count": 7,
310 | "metadata": {},
311 | "output_type": "execute_result"
312 | }
313 | ],
314 | "source": [
315 | "pd.read_csv(\"data/sms-spam/train-data.tsv\", sep='\\t', names=['class','sms']).tail()"
316 | ]
317 | },
318 | {
319 | "cell_type": "code",
320 | "execution_count": 12,
321 | "metadata": {},
322 | "outputs": [
323 | {
324 | "data": {
325 | "text/html": [
326 | "\n",
327 | "\n",
340 | "
\n",
341 | " \n",
342 | " \n",
343 | " | \n",
344 | " class | \n",
345 | " sms | \n",
346 | "
\n",
347 | " \n",
348 | " \n",
349 | " \n",
350 | " 1387 | \n",
351 | " ham | \n",
352 | " true dear..i sat to pray evening and felt so.s... | \n",
353 | "
\n",
354 | " \n",
355 | " 1388 | \n",
356 | " ham | \n",
357 | " what will we do in the shower, baby? | \n",
358 | "
\n",
359 | " \n",
360 | " 1389 | \n",
361 | " ham | \n",
362 | " where are you ? what are you doing ? are yuou ... | \n",
363 | "
\n",
364 | " \n",
365 | " 1390 | \n",
366 | " spam | \n",
367 | " ur cash-balance is currently 500 pounds - to m... | \n",
368 | "
\n",
369 | " \n",
370 | " 1391 | \n",
371 | " spam | \n",
372 | " not heard from u4 a while. call 4 rude chat pr... | \n",
373 | "
\n",
374 | " \n",
375 | "
\n",
376 | "
"
377 | ],
378 | "text/plain": [
379 | " class sms\n",
380 | "1387 ham true dear..i sat to pray evening and felt so.s...\n",
381 | "1388 ham what will we do in the shower, baby?\n",
382 | "1389 ham where are you ? what are you doing ? are yuou ...\n",
383 | "1390 spam ur cash-balance is currently 500 pounds - to m...\n",
384 | "1391 spam not heard from u4 a while. call 4 rude chat pr..."
385 | ]
386 | },
387 | "execution_count": 12,
388 | "metadata": {},
389 | "output_type": "execute_result"
390 | }
391 | ],
392 | "source": [
393 | "pd.read_csv(\"data/sms-spam/valid-data.tsv\", sep='\\t', names=['class','sms']).tail()"
394 | ]
395 | },
396 | {
397 | "cell_type": "markdown",
398 | "metadata": {},
399 | "source": [
400 | "## Calculate Vocabulary"
401 | ]
402 | },
403 | {
404 | "cell_type": "code",
405 | "execution_count": 9,
406 | "metadata": {
407 | "collapsed": true
408 | },
409 | "outputs": [],
410 | "source": [
411 | "def get_vocab():\n",
412 | " vocab = set()\n",
413 | " for text in train_df['sms'].values:\n",
414 | " words = text.split(' ')\n",
415 | " word_set = set(words)\n",
416 | " vocab.update(word_set)\n",
417 | " \n",
418 | " vocab.remove('')\n",
419 | " return list(vocab)"
420 | ]
421 | },
422 | {
423 | "cell_type": "code",
424 | "execution_count": 10,
425 | "metadata": {},
426 | "outputs": [
427 | {
428 | "name": "stdout",
429 | "output_type": "stream",
430 | "text": [
431 | "11330\n"
432 | ]
433 | },
434 | {
435 | "data": {
436 | "text/plain": [
437 | "['child',\n",
438 | " 'place..',\n",
439 | " 'hi..i',\n",
440 | " 'oso?',\n",
441 | " 'home!',\n",
442 | " 'lasting',\n",
443 | " 'there..do',\n",
444 | " 'clock',\n",
445 | " 'advice',\n",
446 | " 'free...']"
447 | ]
448 | },
449 | "execution_count": 10,
450 | "metadata": {},
451 | "output_type": "execute_result"
452 | }
453 | ],
454 | "source": [
455 | "vocab = get_vocab()\n",
456 | "print(len(vocab))\n",
457 | "vocab[10:20]"
458 | ]
459 | },
460 | {
461 | "cell_type": "markdown",
462 | "metadata": {},
463 | "source": [
464 | "## Save Vocabulary"
465 | ]
466 | },
467 | {
468 | "cell_type": "code",
469 | "execution_count": 11,
470 | "metadata": {
471 | "collapsed": true
472 | },
473 | "outputs": [],
474 | "source": [
475 | "PAD_WORD = '#=KS=#'\n",
476 | "\n",
477 | "with open('data/sms-spam/vocab_list.tsv', 'w') as file:\n",
478 | " file.write(\"{}\\n\".format(PAD_WORD))\n",
479 | " for word in vocab:\n",
480 | " file.write(\"{}\\n\".format(word))\n",
481 | " \n",
482 | "with open('data/sms-spam/n_words.tsv', 'w') as file:\n",
483 | " file.write(str(len(vocab)))"
484 | ]
485 | },
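{
  "cell_type": "markdown",
  "metadata": {},
  "source": [
   "The next cell is an added usage sketch (not part of the original run): it shows how the saved vocabulary file could be consumed later with TF 1.x's `tf.contrib.lookup.index_table_from_file`, which maps each token to its row index in `vocab_list.tsv` (so `PAD_WORD`, written first, gets id 0). The OOV bucket count below is an assumption."
  ]
},
{
  "cell_type": "code",
  "execution_count": null,
  "metadata": {},
  "outputs": [],
  "source": [
   "# Sketch: map tokens to ids via the saved vocab file (TF 1.x graph mode).\n",
   "import tensorflow as tf\n",
   "\n",
   "vocab_table = tf.contrib.lookup.index_table_from_file(\n",
   "    vocabulary_file='data/sms-spam/vocab_list.tsv',\n",
   "    num_oov_buckets=1)  # assumption: route unseen words to a single OOV bucket\n",
   "\n",
   "word_ids = vocab_table.lookup(tf.constant(['free', 'entry', PAD_WORD]))\n",
   "\n",
   "with tf.Session() as sess:\n",
   "    sess.run(tf.tables_initializer())\n",
   "    print(sess.run(word_ids))  # PAD_WORD maps to 0; OOV maps to len(vocab)+1"
  ]
},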
486 | {
487 | "cell_type": "code",
488 | "execution_count": null,
489 | "metadata": {
490 | "collapsed": true
491 | },
492 | "outputs": [],
493 | "source": []
494 | }
495 | ],
496 | "metadata": {
497 | "kernelspec": {
498 | "display_name": "Python 3",
499 | "language": "python",
500 | "name": "python3"
501 | },
502 | "language_info": {
503 | "codemirror_mode": {
504 | "name": "ipython",
505 | "version": 3
506 | },
507 | "file_extension": ".py",
508 | "mimetype": "text/x-python",
509 | "name": "python",
510 | "nbconvert_exporter": "python",
511 | "pygments_lexer": "ipython3",
512 | "version": "3.6.1"
513 | }
514 | },
515 | "nbformat": 4,
516 | "nbformat_minor": 2
517 | }
518 |
--------------------------------------------------------------------------------
/08 - Text Analysis/06 - Part_1 - Text Classification - Hacker News - Data Preprocessing with TFT.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": 1,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "# %%bash\n",
10 | "\n",
11 | "# pip install tensorflow==1.7\n",
12 | "# pip install google-cloud-dataflow==2.3\n",
13 | "# pip install tensorflow-hub"
14 | ]
15 | },
16 | {
17 | "cell_type": "markdown",
18 | "metadata": {},
19 | "source": [
20 | "# Text Classification using TensorFlow and Google Cloud - Part 1\n",
21 | "\n",
22 | "This [bigquery-public-data:hacker_news](https://cloud.google.com/bigquery/public-data/hacker-news) contains all stories and comments from Hacker News from its launch in 2006. Each story contains a story id, url, the title of the story, tthe author that made the post, when it was written, and the number of points the story received.\n",
23 | "\n",
24 | "The objective is, given the title of the story, we want to build an ML model that can predict the source of this story.\n",
25 | "\n",
26 | "## Data preparation with tf.Transform and DataFlow\n",
27 | "\n",
28 | "This notebook illustrates how to build a Beam pipeline using tf.transform to prepare ML 'train' and 'eval' datasets. \n",
29 | "The pipeline includes the following steps:\n",
30 | "1. Read data from BigQuery\n",
31 | "2. Extract and clean features from BQ rows\n",
32 | "3. Use tf.transfrom to process the text and produce the following features for each entry\n",
33 | " * title: Raw text - string\n",
34 | " * bow: Bag of word indecies - sparse vector of integers\n",
35 | " * weight: TF.IDF values - sparse vector of floats\n",
36 | " * source: target feature - string\n",
37 | "4. Save the data as .tfrecord files\n",
38 | " \n",
39 | "\n"
40 | ]
41 | },
42 | {
43 | "cell_type": "markdown",
44 | "metadata": {},
45 | "source": [
46 | "### Setting Global Parameters"
47 | ]
48 | },
49 | {
50 | "cell_type": "code",
51 | "execution_count": 2,
52 | "metadata": {},
53 | "outputs": [],
54 | "source": [
55 | "import os\n",
56 | "\n",
57 | "class Params:\n",
58 | " pass\n",
59 | "\n",
60 | "# Set to run on GCP\n",
61 | "Params.GCP_PROJECT_ID = 'ksalama-gcp-playground'\n",
62 | "Params.REGION = 'europe-west1'\n",
63 | "Params.BUCKET = 'ksalama-gcs-cloudml'\n",
64 | "\n",
65 | "Params.PLATFORM = 'local' # local | GCP\n",
66 | "\n",
67 | "Params.DATA_DIR = 'data/news' if Params.PLATFORM == 'local' else 'gs://{}/data/news'.format(Params.BUCKET)\n",
68 | "\n",
69 | "Params.TRANSFORMED_DATA_DIR = os.path.join(Params.DATA_DIR, 'transformed')\n",
70 | "Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'train')\n",
71 | "Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'eval')\n",
72 | "\n",
73 | "Params.TEMP_DIR = os.path.join(Params.DATA_DIR, 'tmp')\n",
74 | "\n",
75 | "Params.MODELS_DIR = 'models/news' if Params.PLATFORM == 'local' else 'gs://{}/models/news'.format(Params.BUCKET)\n",
76 | "\n",
77 | "Params.TRANSFORM_ARTEFACTS_DIR = os.path.join(Params.MODELS_DIR,'transform')\n",
78 | "\n",
79 | "Params.TRANSFORM = True"
80 | ]
81 | },
82 | {
83 | "cell_type": "markdown",
84 | "metadata": {},
85 | "source": [
86 | "### Importing libraries"
87 | ]
88 | },
89 | {
90 | "cell_type": "code",
91 | "execution_count": 3,
92 | "metadata": {},
93 | "outputs": [
94 | {
95 | "name": "stdout",
96 | "output_type": "stream",
97 | "text": [
98 | "WARNING:tensorflow:From /Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n",
99 | "Instructions for updating:\n",
100 | "Use the retry module or similar alternatives.\n"
101 | ]
102 | }
103 | ],
104 | "source": [
105 | "import apache_beam as beam\n",
106 | "\n",
107 | "import tensorflow as tf\n",
108 | "import tensorflow_transform as tft\n",
109 | "import tensorflow_transform.coders as tft_coders\n",
110 | "\n",
111 | "from tensorflow.contrib.learn.python.learn.utils import input_fn_utils\n",
112 | "\n",
113 | "from tensorflow_transform.beam import impl\n",
114 | "from tensorflow_transform.beam.tft_beam_io import transform_fn_io\n",
115 | "from tensorflow_transform.tf_metadata import metadata_io\n",
116 | "from tensorflow_transform.tf_metadata import dataset_schema\n",
117 | "from tensorflow_transform.tf_metadata import dataset_metadata\n",
118 | "from tensorflow_transform.saved import saved_transform_io"
119 | ]
120 | },
121 | {
122 | "cell_type": "markdown",
123 | "metadata": {},
124 | "source": [
125 | "## 1. Source Query"
126 | ]
127 | },
128 | {
129 | "cell_type": "code",
130 | "execution_count": 4,
131 | "metadata": {},
132 | "outputs": [],
133 | "source": [
134 | "bq_query = '''\n",
135 | "SELECT\n",
136 | " key,\n",
137 | " REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title, \n",
138 | " source\n",
139 | "FROM\n",
140 | "(\n",
141 | " SELECT\n",
142 | " ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,\n",
143 | " title,\n",
144 | " ABS(FARM_FINGERPRINT(title)) AS Key\n",
145 | " FROM\n",
146 | " `bigquery-public-data.hacker_news.stories`\n",
147 | " WHERE\n",
148 | " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')\n",
149 | " AND LENGTH(title) > 10\n",
150 | ")\n",
151 | "WHERE (source = 'github' OR source = 'nytimes' OR source = 'techcrunch')\n",
152 | "'''\n",
153 | "\n",
154 | "def get_source_query(step):\n",
155 | " \n",
156 | " if step == 'train':\n",
157 | " source_query = 'SELECT * FROM ({}) WHERE MOD(key,100) <= 75'.format(bq_query)\n",
158 | " else:\n",
159 | " source_query = 'SELECT * FROM ({}) WHERE MOD(key,100) > 75'.format(bq_query)\n",
160 | " \n",
161 | " return source_query"
162 | ]
163 | },
164 | {
165 | "cell_type": "markdown",
166 | "metadata": {},
167 | "source": [
168 | "## 2. Raw metadata"
169 | ]
170 | },
171 | {
172 | "cell_type": "code",
173 | "execution_count": 5,
174 | "metadata": {},
175 | "outputs": [],
176 | "source": [
177 | "RAW_HEADER = 'key,title,source'.split(',')\n",
178 | "RAW_DEFAULTS = [['NA'],['NA'],['NA']]\n",
179 | "TARGET_FEATURE_NAME = 'source'\n",
180 | "TARGET_LABELS = ['github', 'nytimes', 'techcrunch']\n",
181 | "TEXT_FEATURE_NAME = 'title'\n",
182 | "KEY_COLUMN = 'key'\n",
183 | "\n",
184 | "VOCAB_SIZE = 20000\n",
185 | "TRAIN_SIZE = 73124\n",
186 | "EVAL_SIZE = 23079\n",
187 | "\n",
188 | "DELIMITERS = '.,!?() '\n",
189 | "\n",
190 | "raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({\n",
191 | " KEY_COLUMN: dataset_schema.ColumnSchema(\n",
192 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
193 | " TEXT_FEATURE_NAME: dataset_schema.ColumnSchema(\n",
194 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
195 | " TARGET_FEATURE_NAME: dataset_schema.ColumnSchema(\n",
196 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
197 | "}))"
198 | ]
199 | },
200 | {
201 | "cell_type": "markdown",
202 | "metadata": {},
203 | "source": [
204 | "## 3. Preprocessing functions"
205 | ]
206 | },
207 | {
208 | "cell_type": "code",
209 | "execution_count": 6,
210 | "metadata": {},
211 | "outputs": [],
212 | "source": [
213 | "def get_features(bq_row):\n",
214 | " \n",
215 | " CSV_HEADER = 'key,title,source'.split(',')\n",
216 | " \n",
217 | " input_features = {}\n",
218 | " \n",
219 | " for feature_name in CSV_HEADER:\n",
220 | " input_features[feature_name] = str(bq_row[feature_name]).lower()\n",
221 | " \n",
222 | " return input_features\n",
223 | "\n",
224 | "\n",
225 | "def preprocessing_fn(input_features):\n",
226 | " \n",
227 | " text = input_features[TEXT_FEATURE_NAME]\n",
228 | "\n",
229 | " text_tokens = tf.string_split(text, DELIMITERS)\n",
230 | " text_tokens_indcies = tft.string_to_int(text_tokens, top_k=VOCAB_SIZE)\n",
231 | " bag_of_words_indices, text_weight = tft.tfidf(text_tokens_indcies, VOCAB_SIZE + 1)\n",
232 | " \n",
233 | " output_features = {}\n",
234 | " output_features[TEXT_FEATURE_NAME] = input_features[TEXT_FEATURE_NAME]\n",
235 | " output_features['bow'] = bag_of_words_indices\n",
236 | " output_features['weight'] = text_weight\n",
237 | " output_features[TARGET_FEATURE_NAME] = input_features[TARGET_FEATURE_NAME]\n",
238 | " \n",
239 | " return output_features"
240 | ]
241 | },
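{
  "cell_type": "markdown",
  "metadata": {},
  "source": [
   "Added reference note: `tft.tfidf` returns, per title, the distinct token indices (`bow`) and one weight per index. Up to TFT's exact smoothing, the weight follows the standard TF-IDF formulation\n",
   "\n",
   "$$\\mathrm{tfidf}(t, d) = \\frac{\\mathrm{count}(t, d)}{|d|} \\cdot \\log \\frac{N}{1 + \\mathrm{df}(t)}$$\n",
   "\n",
   "where $d$ is a title, $N$ is the number of titles analyzed, and $\\mathrm{df}(t)$ is the number of titles containing token $t$."
  ]
},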
242 | {
243 | "cell_type": "markdown",
244 | "metadata": {},
245 | "source": [
246 | "## 4. Beam Pipeline"
247 | ]
248 | },
249 | {
250 | "cell_type": "code",
251 | "execution_count": 7,
252 | "metadata": {},
253 | "outputs": [],
254 | "source": [
255 | "import apache_beam as beam\n",
256 | "\n",
257 | "\n",
258 | "def run_pipeline(runner, opts):\n",
259 | " \n",
260 | " print(\"Sink train data files: {}\".format(Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX))\n",
261 | " print(\"Sink data files: {}\".format(Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX))\n",
262 | " print(\"Temporary directory: {}\".format(Params.TEMP_DIR))\n",
263 | " print(\"\")\n",
264 | " \n",
265 | " \n",
266 | " with beam.Pipeline(runner, options=opts) as pipeline:\n",
267 | " with impl.Context(Params.TEMP_DIR): \n",
268 | " \n",
269 | " ###### analyze & transform train #########################################################\n",
270 | " if(runner=='DirectRunner'):\n",
271 | " print(\"\")\n",
272 | " print(\"Transform training data....\")\n",
273 | " print(\"\")\n",
274 | " \n",
275 | " step = 'train'\n",
276 | " source_query = get_source_query(step)\n",
277 | " \n",
278 | " # Read raw train data from BQ and cleanup\n",
279 | " raw_train_data = (\n",
280 | " pipeline\n",
281 | " | '{} - Read Data from BigQuery'.format(step) >> beam.io.Read(beam.io.BigQuerySource(query=source_query, use_standard_sql=True))\n",
282 | " | '{} - Extract Features'.format(step) >> beam.Map(get_features)\n",
283 | " )\n",
284 | " \n",
285 | " # create a train dataset from the data and schema\n",
286 | " raw_train_dataset = (raw_train_data, raw_metadata)\n",
287 | " \n",
288 | " # analyze and transform raw_train_dataset to produced transformed_train_dataset and transform_fn\n",
289 | " transformed_train_dataset, transform_fn = (\n",
290 | " raw_train_dataset \n",
291 | " | '{} - Analyze & Transform'.format(step) >> impl.AnalyzeAndTransformDataset(preprocessing_fn)\n",
292 | " )\n",
293 | " \n",
294 | " # get data and schema separately from the transformed_train_dataset\n",
295 | " transformed_train_data, transformed_metadata = transformed_train_dataset\n",
296 | "\n",
297 | " # write transformed train data to sink\n",
298 | " _ = (\n",
299 | " transformed_train_data \n",
300 | " | '{} - Write Transformed Data as tfrecords'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n",
301 | " file_path_prefix=Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX,\n",
302 | " file_name_suffix=\".tfrecords\",\n",
303 | " num_shards=25,\n",
304 | " coder=tft_coders.example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))\n",
305 | " )\n",
306 | " \n",
307 | " \n",
308 | "# #### TEST write transformed AS TEXT train data to sink\n",
309 | "# _ = (\n",
310 | "# transformed_train_data \n",
311 | "# | '{} - Write Transformed Data as Text'.format(step) >> beam.io.textio.WriteToText(\n",
312 | "# file_path_prefix=Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX,\n",
313 | "# file_name_suffix=\".csv\")\n",
314 | "# )\n",
315 | "# ##################################################\n",
316 | "\n",
317 | "\n",
318 | " ###### transform eval ##################################################################\n",
319 | " \n",
320 | " if(runner=='DirectRunner'):\n",
321 | " print(\"\")\n",
322 | " print(\"Transform eval data....\")\n",
323 | " print(\"\")\n",
324 | " \n",
325 | " step = 'eval'\n",
326 | " source_query = get_source_query(step)\n",
327 | "\n",
328 | " # Read raw eval data from BQ and cleanup\n",
329 | " raw_eval_data = (\n",
330 | " pipeline\n",
331 | " | '{} - Read Data from BigQuery'.format(step) >> beam.io.Read(beam.io.BigQuerySource(query=source_query, use_standard_sql=True))\n",
332 | " | '{} - Extract Features'.format(step) >> beam.Map(get_features)\n",
333 | " )\n",
334 | " \n",
335 | " # create a eval dataset from the data and schema\n",
336 | " raw_eval_dataset = (raw_eval_data, raw_metadata)\n",
337 | " \n",
338 | " # transform eval data based on produced transform_fn (from analyzing train_data)\n",
339 | " transformed_eval_dataset = (\n",
340 | " (raw_eval_dataset, transform_fn) \n",
341 | " | '{} - Transform'.format(step) >> impl.TransformDataset()\n",
342 | " )\n",
343 | " \n",
344 | " # get data from the transformed_eval_dataset\n",
345 | " transformed_eval_data, _ = transformed_eval_dataset\n",
346 | " \n",
347 | " # write transformed eval data to sink\n",
348 | " _ = (\n",
349 | " transformed_eval_data \n",
350 | " | '{} - Write Transformed Data'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n",
351 | " file_path_prefix=Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX,\n",
352 | " file_name_suffix=\".tfrecords\",\n",
353 | " num_shards=10,\n",
354 | " coder=tft_coders.example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))\n",
355 | " )\n",
356 | " \n",
357 | " ###### write transformation metadata #######################################################\n",
358 | " if(runner=='DirectRunner'):\n",
359 | " print(\"\")\n",
360 | " print(\"Saving transformation artefacts ....\")\n",
361 | " print(\"\")\n",
362 | " \n",
363 | " # write transform_fn as tf.graph\n",
364 | " _ = (\n",
365 | " transform_fn \n",
366 | " | 'Write Transform Artefacts' >> transform_fn_io.WriteTransformFn(Params.TRANSFORM_ARTEFACTS_DIR)\n",
367 | " )\n",
368 | "\n",
369 | " if runner=='DataflowRunner':\n",
370 | " pipeline.run()"
371 | ]
372 | },
373 | {
374 | "cell_type": "markdown",
375 | "metadata": {},
376 | "source": [
377 | "## 5. Run Pipeline"
378 | ]
379 | },
380 | {
381 | "cell_type": "code",
382 | "execution_count": 8,
383 | "metadata": {},
384 | "outputs": [
385 | {
386 | "name": "stdout",
387 | "output_type": "stream",
388 | "text": [
389 | "Launching DirectRunner job preprocess-hackernews-data-180514-115222 ... hang on\n",
390 | "Sink train data files: data/news/transformed/train\n",
391 | "Sink data files: data/news/transformed/eval\n",
392 | "Temporary directory: data/news/tmp\n",
393 | "\n",
394 | "\n",
395 | "Transform training data....\n",
396 | "\n",
397 | "\n",
398 | "Transform eval data....\n",
399 | "\n",
400 | "\n",
401 | "Saving transformation artefacts ....\n",
402 | "\n"
403 | ]
404 | },
405 | {
406 | "name": "stderr",
407 | "output_type": "stream",
408 | "text": [
409 | "/Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py:337: DeprecationWarning: options is deprecated since First stable release.. References to .options will not be supported\n",
410 | " pipeline.replace_all(_get_transform_overrides(pipeline.options))\n",
411 | "WARNING:root:Dataset ksalama-gcp-playground:temp_dataset_151e64fa07a3490bae91dd844ce4b7da does not exist so we will create it as temporary with location=None\n",
412 | "WARNING:root:Dataset ksalama-gcp-playground:temp_dataset_f3701d6e27e14e068968a255f43c4b8c does not exist so we will create it as temporary with location=None\n"
413 | ]
414 | },
415 | {
416 | "name": "stdout",
417 | "output_type": "stream",
418 | "text": [
419 | "Pipline completed.\n"
420 | ]
421 | }
422 | ],
423 | "source": [
424 | "from datetime import datetime\n",
425 | "import shutil\n",
426 | "\n",
427 | "job_name = 'preprocess-hackernews-data' + '-' + datetime.utcnow().strftime('%y%m%d-%H%M%S')\n",
428 | "\n",
429 | "options = {\n",
430 | " 'region': Params.REGION,\n",
431 | " 'staging_location': os.path.join(Params.TEMP_DIR, 'staging'),\n",
432 | " 'temp_location': Params.TEMP_DIR,\n",
433 | " 'job_name': job_name,\n",
434 | " 'project': Params.GCP_PROJECT_ID\n",
435 | "}\n",
436 | "\n",
437 | "tf.logging.set_verbosity(tf.logging.ERROR)\n",
438 | "\n",
439 | "opts = beam.pipeline.PipelineOptions(flags=[], **options)\n",
440 | "runner = 'DirectRunner' if Params.PLATFORM == 'local' else 'DirectRunner'\n",
441 | "\n",
442 | "if Params.TRANSFORM:\n",
443 | " \n",
444 | " if Params.PLATFORM == 'local':\n",
445 | " shutil.rmtree(Params.TRANSFORMED_DATA_DIR, ignore_errors=True)\n",
446 | " shutil.rmtree(Params.TRANSFORM_ARTEFACTS_DIR, ignore_errors=True)\n",
447 | " shutil.rmtree(Params.TEMP_DIR, ignore_errors=True)\n",
448 | " \n",
449 | " print 'Launching {} job {} ... hang on'.format(runner, job_name)\n",
450 | " \n",
451 | " run_pipeline(runner, opts)\n",
452 | " \n",
453 | " print \"Pipline completed.\"\n",
454 | "else:\n",
455 | " print \"Transformation skipped!\""
456 | ]
457 | },
458 | {
459 | "cell_type": "code",
460 | "execution_count": 9,
461 | "metadata": {},
462 | "outputs": [
463 | {
464 | "name": "stdout",
465 | "output_type": "stream",
466 | "text": [
467 | "** transformed data:\n",
468 | "eval-00000-of-00010.tfrecords\n",
469 | "eval-00001-of-00010.tfrecords\n",
470 | "eval-00002-of-00010.tfrecords\n",
471 | "eval-00003-of-00010.tfrecords\n",
472 | "eval-00004-of-00010.tfrecords\n",
473 | "eval-00005-of-00010.tfrecords\n",
474 | "eval-00006-of-00010.tfrecords\n",
475 | "eval-00007-of-00010.tfrecords\n",
476 | "eval-00008-of-00010.tfrecords\n",
477 | "eval-00009-of-00010.tfrecords\n",
478 | "train-00000-of-00025.tfrecords\n",
479 | "train-00001-of-00025.tfrecords\n",
480 | "train-00002-of-00025.tfrecords\n",
481 | "train-00003-of-00025.tfrecords\n",
482 | "train-00004-of-00025.tfrecords\n",
483 | "train-00005-of-00025.tfrecords\n",
484 | "train-00006-of-00025.tfrecords\n",
485 | "train-00007-of-00025.tfrecords\n",
486 | "train-00008-of-00025.tfrecords\n",
487 | "train-00009-of-00025.tfrecords\n",
488 | "train-00010-of-00025.tfrecords\n",
489 | "train-00011-of-00025.tfrecords\n",
490 | "train-00012-of-00025.tfrecords\n",
491 | "train-00013-of-00025.tfrecords\n",
492 | "train-00014-of-00025.tfrecords\n",
493 | "train-00015-of-00025.tfrecords\n",
494 | "train-00016-of-00025.tfrecords\n",
495 | "train-00017-of-00025.tfrecords\n",
496 | "train-00018-of-00025.tfrecords\n",
497 | "train-00019-of-00025.tfrecords\n",
498 | "train-00020-of-00025.tfrecords\n",
499 | "train-00021-of-00025.tfrecords\n",
500 | "train-00022-of-00025.tfrecords\n",
501 | "train-00023-of-00025.tfrecords\n",
502 | "train-00024-of-00025.tfrecords\n",
503 | "\n",
504 | "** transform artefacts:\n",
505 | "transform_fn\n",
506 | "transformed_metadata\n",
507 | "\n",
508 | "** transform assets:\n",
509 | "vocab_string_to_int_uniques\n",
510 | "\n",
511 | "the\n",
512 | "a\n",
513 | "to\n",
514 | "for\n",
515 | "in\n",
516 | "of\n",
517 | "and\n",
518 | "s\n",
519 | "on\n",
520 | "with\n"
521 | ]
522 | }
523 | ],
524 | "source": [
525 | "%%bash\n",
526 | "\n",
527 | "echo \"** transformed data:\"\n",
528 | "ls data/news/transformed\n",
529 | "echo \"\"\n",
530 | "\n",
531 | "echo \"** transform artefacts:\"\n",
532 | "ls models/news/transform\n",
533 | "echo \"\"\n",
534 | "\n",
535 | "echo \"** transform assets:\"\n",
536 | "ls models/news/transform/transform_fn/assets\n",
537 | "echo \"\"\n",
538 | "\n",
539 | "head models/news/transform/transform_fn/assets/vocab_string_to_int_uniques"
540 | ]
541 | },
542 | {
543 | "cell_type": "code",
544 | "execution_count": null,
545 | "metadata": {},
546 | "outputs": [],
547 | "source": []
548 | }
549 | ],
550 | "metadata": {
551 | "kernelspec": {
552 | "display_name": "Python 2",
553 | "language": "python",
554 | "name": "python2"
555 | },
556 | "language_info": {
557 | "codemirror_mode": {
558 | "name": "ipython",
559 | "version": 2
560 | },
561 | "file_extension": ".py",
562 | "mimetype": "text/x-python",
563 | "name": "python",
564 | "nbconvert_exporter": "python",
565 | "pygments_lexer": "ipython2",
566 | "version": "2.7.10"
567 | }
568 | },
569 | "nbformat": 4,
570 | "nbformat_minor": 2
571 | }
572 |
--------------------------------------------------------------------------------
/08 - Text Analysis/06 - Part_4 - Text Classification - Hacker News - DNNClassifier with TF.IDF.ipynb:
--------------------------------------------------------------------------------
1 | {
2 | "cells": [
3 | {
4 | "cell_type": "code",
5 | "execution_count": null,
6 | "metadata": {},
7 | "outputs": [],
8 | "source": [
9 | "# %%bash\n",
10 | "\n",
11 | "# pip install tensorflow==1.7\n",
12 | "# pip install tensorflow-transform"
13 | ]
14 | },
15 | {
16 | "cell_type": "markdown",
17 | "metadata": {},
18 | "source": [
19 | "# Text Classification using TensorFlow and Google Cloud - Part 4\n",
20 | "\n",
21 | "This [bigquery-public-data:hacker_news](https://cloud.google.com/bigquery/public-data/hacker-news) contains all stories and comments from Hacker News from its launch in 2006. Each story contains a story id, url, the title of the story, tthe author that made the post, when it was written, and the number of points the story received.\n",
22 | "\n",
23 | "The objective is, given the title of the story, we want to build an ML model that can predict the source of this story.\n",
24 | "\n",
25 | "## TF DNNClassifier with TF.IDF Text Reprsentation\n",
26 | "\n",
27 | "This notebook illustrates how to build a TF premade estimator, namely DNNClassifier, while the input text will be repesented as TF.IDF computed during the preprocessing phase in Part 1. The overall steps are as follows:\n",
28 | "\n",
29 | "\n",
30 | "1. Define the metadata\n",
31 | "2. Define data input function\n",
32 | "2. Create feature columns (using the tfidf)\n",
33 | "3. Create the premade DNNClassifier estimator\n",
34 | "4. Setup experiement\n",
35 | " * Hyper-parameters & RunConfig\n",
36 | " * Serving function (for exported model)\n",
37 | " * TrainSpec & EvalSpec\n",
38 | "5. Run experiement\n",
39 | "6. Evalute the model\n",
40 | "7. Use SavedModel for prediction\n",
41 | " \n",
42 | "\n"
43 | ]
44 | },
45 | {
46 | "cell_type": "markdown",
47 | "metadata": {},
48 | "source": [
49 | "### Setting Global Parameters"
50 | ]
51 | },
52 | {
53 | "cell_type": "code",
54 | "execution_count": 1,
55 | "metadata": {},
56 | "outputs": [],
57 | "source": [
58 | "import os\n",
59 | "\n",
60 | "class Params:\n",
61 | " pass\n",
62 | "\n",
63 | "# Set to run on GCP\n",
64 | "Params.GCP_PROJECT_ID = 'ksalama-gcp-playground'\n",
65 | "Params.REGION = 'europe-west1'\n",
66 | "Params.BUCKET = 'ksalama-gcs-cloudml'\n",
67 | "\n",
68 | "Params.PLATFORM = 'local' # local | GCP\n",
69 | "\n",
70 | "Params.DATA_DIR = 'data/news' if Params.PLATFORM == 'local' else 'gs://{}/data/news'.format(Params.BUCKET)\n",
71 | "\n",
72 | "Params.TRANSFORMED_DATA_DIR = os.path.join(Params.DATA_DIR, 'transformed')\n",
73 | "Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'train')\n",
74 | "Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'eval')\n",
75 | "\n",
76 | "Params.TEMP_DIR = os.path.join(Params.DATA_DIR, 'tmp')\n",
77 | "\n",
78 | "Params.MODELS_DIR = 'models/news' if Params.PLATFORM == 'local' else 'gs://{}/models/news'.format(Params.BUCKET)\n",
79 | "\n",
80 | "Params.TRANSFORM_ARTEFACTS_DIR = os.path.join(Params.MODELS_DIR,'transform')\n",
81 | "\n",
82 | "Params.TRAIN = True\n",
83 | "\n",
84 | "Params.RESUME_TRAINING = False\n",
85 | "\n",
86 | "Params.EAGER = False\n",
87 | "\n",
88 | "if Params.EAGER:\n",
89 | " tf.enable_eager_execution()"
90 | ]
91 | },
92 | {
93 | "cell_type": "markdown",
94 | "metadata": {},
95 | "source": [
96 | "### Importing libraries"
97 | ]
98 | },
99 | {
100 | "cell_type": "code",
101 | "execution_count": 2,
102 | "metadata": {},
103 | "outputs": [
104 | {
105 | "name": "stdout",
106 | "output_type": "stream",
107 | "text": [
108 | "WARNING:tensorflow:From /Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n",
109 | "Instructions for updating:\n",
110 | "Use the retry module or similar alternatives.\n",
111 | "1.7.0\n"
112 | ]
113 | }
114 | ],
115 | "source": [
116 | "import tensorflow as tf\n",
117 | "from tensorflow import data\n",
118 | "\n",
119 | "\n",
120 | "from tensorflow.contrib.learn.python.learn.utils import input_fn_utils\n",
121 | "from tensorflow_transform.beam.tft_beam_io import transform_fn_io\n",
122 | "from tensorflow_transform.tf_metadata import metadata_io\n",
123 | "from tensorflow_transform.tf_metadata import dataset_schema\n",
124 | "from tensorflow_transform.tf_metadata import dataset_metadata\n",
125 | "from tensorflow_transform.saved import saved_transform_io\n",
126 | "\n",
127 | "print tf.__version__"
128 | ]
129 | },
130 | {
131 | "cell_type": "markdown",
132 | "metadata": {},
133 | "source": [
134 | "## 1. Define Metadata"
135 | ]
136 | },
137 | {
138 | "cell_type": "code",
139 | "execution_count": 3,
140 | "metadata": {},
141 | "outputs": [
142 | {
143 | "name": "stdout",
144 | "output_type": "stream",
145 | "text": [
146 | "{u'source': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), u'title': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), u'weight': VarLenFeature(dtype=tf.float32), u'bow': VarLenFeature(dtype=tf.int64)}\n"
147 | ]
148 | }
149 | ],
150 | "source": [
151 | "RAW_HEADER = 'key,title,source'.split(',')\n",
152 | "RAW_DEFAULTS = [['NA'],['NA'],['NA']]\n",
153 | "TARGET_FEATURE_NAME = 'source'\n",
154 | "TARGET_LABELS = ['github', 'nytimes', 'techcrunch']\n",
155 | "TEXT_FEATURE_NAME = 'title'\n",
156 | "KEY_COLUMN = 'key'\n",
157 | "\n",
158 | "VOCAB_SIZE = 20000\n",
159 | "TRAIN_SIZE = 73124\n",
160 | "EVAL_SIZE = 23079\n",
161 | "\n",
162 | "DELIMITERS = '.,!?() '\n",
163 | "\n",
164 | "raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({\n",
165 | " KEY_COLUMN: dataset_schema.ColumnSchema(\n",
166 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
167 | " TEXT_FEATURE_NAME: dataset_schema.ColumnSchema(\n",
168 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
169 | " TARGET_FEATURE_NAME: dataset_schema.ColumnSchema(\n",
170 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n",
171 | "}))\n",
172 | "\n",
173 | "\n",
174 | "transformed_metadata = metadata_io.read_metadata(\n",
175 | " os.path.join(Params.TRANSFORM_ARTEFACTS_DIR,\"transformed_metadata\"))\n",
176 | "\n",
177 | "raw_feature_spec = raw_metadata.schema.as_feature_spec()\n",
178 | "transformed_feature_spec = transformed_metadata.schema.as_feature_spec()\n",
179 | "\n",
180 | "print transformed_feature_spec"
181 | ]
182 | },
183 | {
184 | "cell_type": "markdown",
185 | "metadata": {},
186 | "source": [
187 | "## 2. Define Input Function"
188 | ]
189 | },
190 | {
191 | "cell_type": "code",
192 | "execution_count": 4,
193 | "metadata": {},
194 | "outputs": [],
195 | "source": [
196 | "def parse_tf_example(tf_example):\n",
197 | " \n",
198 | " parsed_features = tf.parse_single_example(serialized=tf_example, features=transformed_feature_spec)\n",
199 | " target = parsed_features.pop(TARGET_FEATURE_NAME)\n",
200 | " \n",
201 | " return parsed_features, target\n",
202 | "\n",
203 | "\n",
204 | "def generate_tfrecords_input_fn(files_pattern, \n",
205 | " mode=tf.estimator.ModeKeys.EVAL, \n",
206 | " num_epochs=1, \n",
207 | " batch_size=200):\n",
208 | " \n",
209 | " def _input_fn():\n",
210 | " \n",
211 | " file_names = data.Dataset.list_files(files_pattern)\n",
212 | "\n",
213 | " if Params.EAGER:\n",
214 | " print file_names\n",
215 | "\n",
216 | " dataset = data.TFRecordDataset(file_names )\n",
217 | "\n",
218 | " dataset = dataset.apply(\n",
219 | " tf.contrib.data.shuffle_and_repeat(count=num_epochs,\n",
220 | " buffer_size=batch_size*2)\n",
221 | " )\n",
222 | "\n",
223 | " dataset = dataset.apply(\n",
224 | " tf.contrib.data.map_and_batch(parse_tf_example, \n",
225 | " batch_size=batch_size, \n",
226 | " num_parallel_batches=2)\n",
227 | " )\n",
228 | "\n",
229 | " datset = dataset.prefetch(batch_size)\n",
230 | "\n",
231 | " if Params.EAGER:\n",
232 | " return dataset\n",
233 | "\n",
234 | " iterator = dataset.make_one_shot_iterator()\n",
235 | " features, target = iterator.get_next()\n",
236 | " return features, target\n",
237 | " \n",
238 | " return _input_fn"
239 | ]
240 | },
241 | {
242 | "cell_type": "markdown",
243 | "metadata": {},
244 | "source": [
245 | "## 3. Create feature columns"
246 | ]
247 | },
248 | {
249 | "cell_type": "code",
250 | "execution_count": 5,
251 | "metadata": {},
252 | "outputs": [],
253 | "source": [
254 | "BOW_FEATURE_NAME = 'bow'\n",
255 | "TFIDF_FEATURE_NAME = 'weight'\n",
256 | "\n",
257 | "def create_feature_columns():\n",
258 | " \n",
259 | " # Get word indecies from bow\n",
260 | " bow = tf.feature_column.categorical_column_with_identity(\n",
261 | " BOW_FEATURE_NAME, num_buckets=VOCAB_SIZE + 1)\n",
262 | " \n",
263 | " # Add weight to the word indecies\n",
264 | " weight_bow = tf.feature_column.weighted_categorical_column(\n",
265 | " bow, TFIDF_FEATURE_NAME)\n",
266 | " \n",
267 | " # Convert to indicator \n",
268 | " weight_bow_indicators = tf.feature_column.indicator_column(weight_bow)\n",
269 | " \n",
270 | " return [weight_bow_indicators]"
271 | ]
272 | },
273 | {
274 | "cell_type": "markdown",
275 | "metadata": {},
276 | "source": [
277 | "## 4. Create a model using a premade DNNClassifer"
278 | ]
279 | },
280 | {
281 | "cell_type": "code",
282 | "execution_count": 6,
283 | "metadata": {},
284 | "outputs": [],
285 | "source": [
286 | "def create_estimator(hparams, run_config):\n",
287 | " \n",
288 | " feature_columns = create_feature_columns()\n",
289 | " \n",
290 | " optimizer = tf.train.AdamOptimizer(learning_rate=hparams.learning_rate)\n",
291 | " \n",
292 | " estimator = tf.estimator.DNNClassifier(\n",
293 | " feature_columns=feature_columns,\n",
294 | " n_classes =len(TARGET_LABELS),\n",
295 | " label_vocabulary=TARGET_LABELS,\n",
296 | " hidden_units=hparams.hidden_units,\n",
297 | " optimizer=optimizer,\n",
298 | " config=run_config\n",
299 | " )\n",
300 | " \n",
301 | " \n",
302 | " return estimator"
303 | ]
304 | },
305 | {
306 | "cell_type": "markdown",
307 | "metadata": {},
308 | "source": [
309 | "## 5. Setup Experiment"
310 | ]
311 | },
312 | {
313 | "cell_type": "markdown",
314 | "metadata": {},
315 | "source": [
316 | "### 5.1 HParams and RunConfig"
317 | ]
318 | },
319 | {
320 | "cell_type": "code",
321 | "execution_count": 7,
322 | "metadata": {},
323 | "outputs": [
324 | {
325 | "name": "stdout",
326 | "output_type": "stream",
327 | "text": [
328 | "[('batch_size', 1000), ('hidden_units', [64, 32]), ('learning_rate', 0.01), ('max_steps', 730), ('num_epochs', 10), ('trainable_embedding', False)]\n",
329 | "\n",
330 | "('Model Directory:', 'models/news/dnn_estimator_tfidf')\n",
331 | "('Dataset Size:', 73124)\n",
332 | "('Batch Size:', 1000)\n",
333 | "('Steps per Epoch:', 73)\n",
334 | "('Total Steps:', 730)\n"
335 | ]
336 | }
337 | ],
338 | "source": [
339 | "NUM_EPOCHS = 10\n",
340 | "BATCH_SIZE = 1000\n",
341 | "\n",
342 | "TOTAL_STEPS = (TRAIN_SIZE/BATCH_SIZE)*NUM_EPOCHS\n",
343 | "EVAL_EVERY_SEC = 60\n",
344 | "\n",
345 | "hparams = tf.contrib.training.HParams(\n",
346 | " num_epochs = NUM_EPOCHS,\n",
347 | " batch_size = BATCH_SIZE,\n",
348 | " learning_rate = 0.01,\n",
349 | " hidden_units=[64, 32],\n",
350 | " max_steps = TOTAL_STEPS,\n",
351 | "\n",
352 | ")\n",
353 | "\n",
354 | "MODEL_NAME = 'dnn_estimator_tfidf' \n",
355 | "model_dir = os.path.join(Params.MODELS_DIR, MODEL_NAME)\n",
356 | "\n",
357 | "run_config = tf.estimator.RunConfig(\n",
358 | " tf_random_seed=19830610,\n",
359 | " log_step_count_steps=1000,\n",
360 | " save_checkpoints_secs=EVAL_EVERY_SEC,\n",
361 | " keep_checkpoint_max=1,\n",
362 | " model_dir=model_dir\n",
363 | ")\n",
364 | "\n",
365 | "\n",
366 | "print(hparams)\n",
367 | "print(\"\")\n",
368 | "print(\"Model Directory:\", run_config.model_dir)\n",
369 | "print(\"Dataset Size:\", TRAIN_SIZE)\n",
370 | "print(\"Batch Size:\", BATCH_SIZE)\n",
371 | "print(\"Steps per Epoch:\",TRAIN_SIZE/BATCH_SIZE)\n",
372 | "print(\"Total Steps:\", TOTAL_STEPS)"
373 | ]
374 | },
375 | {
376 | "cell_type": "markdown",
377 | "metadata": {},
378 | "source": [
379 | "### 5.2 Serving function"
380 | ]
381 | },
382 | {
383 | "cell_type": "code",
384 | "execution_count": 8,
385 | "metadata": {},
386 | "outputs": [],
387 | "source": [
388 | "def generate_serving_input_fn():\n",
389 | " \n",
390 | " def _serving_fn():\n",
391 | " \n",
392 | " receiver_tensor = {\n",
393 | " 'title': tf.placeholder(dtype=tf.string, shape=[None])\n",
394 | " }\n",
395 | "\n",
396 | " _, transformed_features = (\n",
397 | " saved_transform_io.partially_apply_saved_transform(\n",
398 | " os.path.join(Params.TRANSFORM_ARTEFACTS_DIR, transform_fn_io.TRANSFORM_FN_DIR),\n",
399 | " receiver_tensor)\n",
400 | " )\n",
401 | " \n",
402 | " return tf.estimator.export.ServingInputReceiver(\n",
403 | " transformed_features, receiver_tensor)\n",
404 | " \n",
405 | " return _serving_fn"
406 | ]
407 | },
408 | {
409 | "cell_type": "markdown",
410 | "metadata": {},
411 | "source": [
412 | "### 5.3 TrainSpec & EvalSpec"
413 | ]
414 | },
415 | {
416 | "cell_type": "code",
417 | "execution_count": 9,
418 | "metadata": {},
419 | "outputs": [],
420 | "source": [
421 | "train_spec = tf.estimator.TrainSpec(\n",
422 | " input_fn = generate_tfrecords_input_fn(\n",
423 | " Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX+\"*\",\n",
424 | " mode = tf.estimator.ModeKeys.TRAIN,\n",
425 | " num_epochs=hparams.num_epochs,\n",
426 | " batch_size=hparams.batch_size\n",
427 | " ),\n",
428 | " max_steps=hparams.max_steps,\n",
429 | " hooks=None\n",
430 | ")\n",
431 | "\n",
432 | "eval_spec = tf.estimator.EvalSpec(\n",
433 | " input_fn = generate_tfrecords_input_fn(\n",
434 | " Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX+\"*\",\n",
435 | " mode=tf.estimator.ModeKeys.EVAL,\n",
436 | " num_epochs=1,\n",
437 | " batch_size=hparams.batch_size\n",
438 | " ),\n",
439 | " exporters=[tf.estimator.LatestExporter(\n",
440 | " name=\"estimate\", # the name of the folder in which the model will be exported to under export\n",
441 | " serving_input_receiver_fn=generate_serving_input_fn(),\n",
442 | " exports_to_keep=1,\n",
443 | " as_text=False)],\n",
444 | " steps=None,\n",
445 | " throttle_secs=EVAL_EVERY_SEC\n",
446 | ")"
447 | ]
448 | },
449 | {
450 | "cell_type": "markdown",
451 | "metadata": {},
452 | "source": [
453 | "## 6. Run experiment"
454 | ]
455 | },
456 | {
457 | "cell_type": "code",
458 | "execution_count": 10,
459 | "metadata": {},
460 | "outputs": [
461 | {
462 | "name": "stdout",
463 | "output_type": "stream",
464 | "text": [
465 | "Removing previous training artefacts...\n",
466 | "Experiment started at 16:13:21\n",
467 | ".......................................\n",
468 | "INFO:tensorflow:Using config: {'_save_checkpoints_secs': 60, '_session_config': None, '_keep_checkpoint_max': 1, '_tf_random_seed': 19830610, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': , '_model_dir': 'models/news/dnn_estimator_tfidf', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 1000, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}\n",
469 | "INFO:tensorflow:Running training and evaluation locally (non-distributed).\n",
470 | "INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 60 secs (eval_spec.throttle_secs) or training is finished.\n",
471 | "INFO:tensorflow:Calling model_fn.\n",
472 | "INFO:tensorflow:Done calling model_fn.\n",
473 | "INFO:tensorflow:Create CheckpointSaverHook.\n",
474 | "INFO:tensorflow:Graph was finalized.\n",
475 | "INFO:tensorflow:Running local_init_op.\n",
476 | "INFO:tensorflow:Done running local_init_op.\n",
477 | "INFO:tensorflow:Saving checkpoints for 1 into models/news/dnn_estimator_tfidf/model.ckpt.\n",
478 | "INFO:tensorflow:loss = 1098.7266, step = 1\n",
479 | "INFO:tensorflow:loss = 213.40088, step = 101 (15.307 sec)\n",
480 | "INFO:tensorflow:loss = 147.65674, step = 201 (13.971 sec)\n",
481 | "INFO:tensorflow:loss = 71.7646, step = 301 (15.121 sec)\n",
482 | "INFO:tensorflow:Saving checkpoints for 392 into models/news/dnn_estimator_tfidf/model.ckpt.\n",
483 | "INFO:tensorflow:Loss for final step: 26.048763.\n",
484 | "INFO:tensorflow:Calling model_fn.\n",
485 | "INFO:tensorflow:Done calling model_fn.\n",
486 | "INFO:tensorflow:Starting evaluation at 2018-05-14-16:14:22\n",
487 | "INFO:tensorflow:Graph was finalized.\n",
488 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n",
489 | "INFO:tensorflow:Running local_init_op.\n",
490 | "INFO:tensorflow:Done running local_init_op.\n",
491 | "INFO:tensorflow:Finished evaluation at 2018-05-14-16:14:25\n",
492 | "INFO:tensorflow:Saving dict for global step 392: accuracy = 0.8243858, average_loss = 0.94847244, global_step = 392, loss = 912.07477\n",
493 | "WARNING:tensorflow:Expected binary or unicode string, got type_url: \"type.googleapis.com/tensorflow.AssetFileDef\"\n",
494 | "value: \"\\n\\t\\n\\007Const:0\\022\\033vocab_string_to_int_uniques\"\n",
495 | "\n",
496 | "INFO:tensorflow:Calling model_fn.\n",
497 | "INFO:tensorflow:Done calling model_fn.\n",
498 | "INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']\n",
499 | "INFO:tensorflow:Signatures INCLUDED in export for Regress: None\n",
500 | "INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']\n",
501 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n",
502 | "INFO:tensorflow:Assets added to graph.\n",
503 | "INFO:tensorflow:Assets written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314465/assets\n",
504 | "INFO:tensorflow:SavedModel written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314465/saved_model.pb\n",
505 | "INFO:tensorflow:Calling model_fn.\n",
506 | "INFO:tensorflow:Done calling model_fn.\n",
507 | "INFO:tensorflow:Create CheckpointSaverHook.\n",
508 | "INFO:tensorflow:Graph was finalized.\n",
509 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n",
510 | "INFO:tensorflow:Running local_init_op.\n",
511 | "INFO:tensorflow:Done running local_init_op.\n",
512 | "INFO:tensorflow:Saving checkpoints for 393 into models/news/dnn_estimator_tfidf/model.ckpt.\n",
513 | "INFO:tensorflow:loss = 27.088547, step = 393\n",
514 | "INFO:tensorflow:loss = 2.9095829, step = 493 (13.979 sec)\n",
515 | "INFO:tensorflow:loss = 4.3351374, step = 593 (13.651 sec)\n",
516 | "INFO:tensorflow:loss = 11.017786, step = 693 (14.415 sec)\n",
517 | "INFO:tensorflow:Saving checkpoints for 730 into models/news/dnn_estimator_tfidf/model.ckpt.\n",
518 | "INFO:tensorflow:Loss for final step: 3.2552278.\n",
519 | "INFO:tensorflow:Calling model_fn.\n",
520 | "INFO:tensorflow:Done calling model_fn.\n",
521 | "INFO:tensorflow:Starting evaluation at 2018-05-14-16:15:15\n",
522 | "INFO:tensorflow:Graph was finalized.\n",
523 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-730\n",
524 | "INFO:tensorflow:Running local_init_op.\n",
525 | "INFO:tensorflow:Done running local_init_op.\n",
526 | "INFO:tensorflow:Finished evaluation at 2018-05-14-16:15:17\n",
527 | "INFO:tensorflow:Saving dict for global step 730: accuracy = 0.82416916, average_loss = 1.344607, global_step = 730, loss = 1293.0077\n",
528 | "WARNING:tensorflow:Expected binary or unicode string, got type_url: \"type.googleapis.com/tensorflow.AssetFileDef\"\n",
529 | "value: \"\\n\\t\\n\\007Const:0\\022\\033vocab_string_to_int_uniques\"\n",
530 | "\n",
531 | "INFO:tensorflow:Calling model_fn.\n",
532 | "INFO:tensorflow:Done calling model_fn.\n",
533 | "INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']\n",
534 | "INFO:tensorflow:Signatures INCLUDED in export for Regress: None\n",
535 | "INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']\n",
536 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-730\n",
537 | "INFO:tensorflow:Assets added to graph.\n",
538 | "INFO:tensorflow:Assets written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314518/assets\n",
539 | "INFO:tensorflow:SavedModel written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314518/saved_model.pb\n",
540 | ".......................................\n",
541 | "Experiment finished at 16:15:18\n",
542 | "\n",
543 | "Experiment elapsed time: 117.021302 seconds\n"
544 | ]
545 | }
546 | ],
547 | "source": [
548 | "from datetime import datetime\n",
549 | "import shutil\n",
550 | "\n",
551 | "if Params.TRAIN:\n",
552 | " if not Params.RESUME_TRAINING:\n",
553 | " print(\"Removing previous training artefacts...\")\n",
554 | " shutil.rmtree(model_dir, ignore_errors=True)\n",
555 | " else:\n",
556 | " print(\"Resuming training...\") \n",
557 | "\n",
558 | "\n",
559 | " tf.logging.set_verbosity(tf.logging.INFO)\n",
560 | "\n",
561 | " time_start = datetime.utcnow() \n",
562 | " print(\"Experiment started at {}\".format(time_start.strftime(\"%H:%M:%S\")))\n",
563 | " print(\".......................................\") \n",
564 | "\n",
565 | " estimator = create_estimator(hparams, run_config)\n",
566 | "\n",
567 | " tf.estimator.train_and_evaluate(\n",
568 | " estimator=estimator,\n",
569 | " train_spec=train_spec, \n",
570 | " eval_spec=eval_spec\n",
571 | " )\n",
572 | "\n",
573 | " time_end = datetime.utcnow() \n",
574 | " print(\".......................................\")\n",
575 | " print(\"Experiment finished at {}\".format(time_end.strftime(\"%H:%M:%S\")))\n",
576 | " print(\"\")\n",
577 | " time_elapsed = time_end - time_start\n",
578 | " print(\"Experiment elapsed time: {} seconds\".format(time_elapsed.total_seconds()))\n",
579 | "else:\n",
580 | " print \"Training was skipped!\""
581 | ]
582 | },
583 | {
584 | "cell_type": "markdown",
585 | "metadata": {},
586 | "source": [
587 | "## 7. Evaluate the model"
588 | ]
589 | },
590 | {
591 | "cell_type": "code",
592 | "execution_count": 11,
593 | "metadata": {},
594 | "outputs": [
595 | {
596 | "name": "stdout",
597 | "output_type": "stream",
598 | "text": [
599 | "############################################################################################\n",
600 | "# Train Measures: {'average_loss': 0.0037224626, 'accuracy': 0.99904275, 'global_step': 730, 'loss': 272.20135}\n",
601 | "############################################################################################\n",
602 | "\n",
603 | "############################################################################################\n",
604 | "# Eval Measures: {'average_loss': 1.3446056, 'accuracy': 0.82416916, 'global_step': 730, 'loss': 31032.152}\n",
605 | "############################################################################################\n"
606 | ]
607 | }
608 | ],
609 | "source": [
610 | "tf.logging.set_verbosity(tf.logging.ERROR)\n",
611 | "\n",
612 | "estimator = create_estimator(hparams, run_config)\n",
613 | "\n",
614 | "train_metrics = estimator.evaluate(\n",
615 | " input_fn = generate_tfrecords_input_fn(\n",
616 | " files_pattern= Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX+\"*\", \n",
617 | " mode= tf.estimator.ModeKeys.EVAL,\n",
618 | " batch_size= TRAIN_SIZE), \n",
619 | " steps=1\n",
620 | ")\n",
621 | "\n",
622 | "\n",
623 | "print(\"############################################################################################\")\n",
624 | "print(\"# Train Measures: {}\".format(train_metrics))\n",
625 | "print(\"############################################################################################\")\n",
626 | "\n",
627 | "eval_metrics = estimator.evaluate(\n",
628 | " input_fn=generate_tfrecords_input_fn(\n",
629 | " files_pattern= Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX+\"*\", \n",
630 | " mode= tf.estimator.ModeKeys.EVAL,\n",
631 | " batch_size= EVAL_SIZE), \n",
632 | " steps=1\n",
633 | ")\n",
634 | "print(\"\")\n",
635 | "print(\"############################################################################################\")\n",
636 | "print(\"# Eval Measures: {}\".format(eval_metrics))\n",
637 | "print(\"############################################################################################\")\n"
638 | ]
639 | },
640 | {
641 | "cell_type": "markdown",
642 | "metadata": {},
643 | "source": [
644 | "## 8. Use Saved Model for Predictions"
645 | ]
646 | },
647 | {
648 | "cell_type": "code",
649 | "execution_count": 12,
650 | "metadata": {},
651 | "outputs": [
652 | {
653 | "name": "stdout",
654 | "output_type": "stream",
655 | "text": [
656 | "models/news/dnn_estimator_tfidf/export/estimate/1526314518\n",
657 | "\n",
658 | "{u'probabilities': array([[0.96217114, 0.01375495, 0.02407398],\n",
659 | " [0.02322701, 0.39720485, 0.5795681 ],\n",
660 | " [0.03017025, 0.9552083 , 0.01462139]], dtype=float32), u'class_ids': array([[0],\n",
661 | " [2],\n",
662 | " [1]]), u'classes': array([['github'],\n",
663 | " ['techcrunch'],\n",
664 | " ['nytimes']], dtype=object), u'logits': array([[ 2.4457023, -1.8020908, -1.2423583],\n",
665 | " [-2.1229138, 0.7162221, 1.0940531],\n",
666 | " [-0.9709409, 2.4841323, -1.6953117]], dtype=float32)}\n"
667 | ]
668 | }
669 | ],
670 | "source": [
671 | "import os\n",
672 | "\n",
673 | "export_dir = model_dir +\"/export/estimate/\"\n",
674 | "saved_model_dir = os.path.join(export_dir, os.listdir(export_dir)[0])\n",
675 | "\n",
676 | "print(saved_model_dir)\n",
677 | "print(\"\")\n",
678 | "\n",
679 | "predictor_fn = tf.contrib.predictor.from_saved_model(\n",
680 | " export_dir = saved_model_dir,\n",
681 | " signature_def_key=\"predict\"\n",
682 | ")\n",
683 | "\n",
684 | "output = predictor_fn(\n",
685 | " {\n",
686 | " 'title':[\n",
687 | " 'Microsoft and Google are joining forces for a new AI framework',\n",
688 | " 'A new version of Python is mind blowing',\n",
689 | " 'EU is investigating new data privacy policies'\n",
690 | " ]\n",
691 | " \n",
692 | " }\n",
693 | ")\n",
694 | "print(output)"
695 | ]
696 | },
697 | {
698 | "cell_type": "code",
699 | "execution_count": null,
700 | "metadata": {},
701 | "outputs": [],
702 | "source": []
703 | }
704 | ],
705 | "metadata": {
706 | "kernelspec": {
707 | "display_name": "Python 2",
708 | "language": "python",
709 | "name": "python2"
710 | },
711 | "language_info": {
712 | "codemirror_mode": {
713 | "name": "ipython",
714 | "version": 2
715 | },
716 | "file_extension": ".py",
717 | "mimetype": "text/x-python",
718 | "name": "python",
719 | "nbconvert_exporter": "python",
720 | "pygments_lexer": "ipython2",
721 | "version": "2.7.10"
722 | }
723 | },
724 | "nbformat": 4,
725 | "nbformat_minor": 2
726 | }
727 |
--------------------------------------------------------------------------------
/08 - Text Analysis/data/sms-spam/n_words.tsv:
--------------------------------------------------------------------------------
1 | 11330
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # TensorFlow Estimator APIs Tutorials - TensorFlow v1.4
2 |
3 | ## The tutorials use the TF estimator APIs to cover:
4 |
5 | * Various ML tasks, currently covering:
6 | * Classification
7 | * Regression
8 | * Clustering (k-means)
9 | * Time-series Analysis (AR Models)
10 | * Dimensionality Reduction (Autoencoding)
11 | * Sequence Models (RNN and LSTMs)
12 | * Image Analysis (CNN for Image Classification)
13 | * Text Analysis (Text Classification with embeddings, CNN, and RNN)
14 | * How to use **canned estimators** to train ML models.
15 |
16 | * How to implement **custom estimators** (model_fn & EstimatorSpec).
17 |
18 | * A standard **metadata-driven** approach to build the model **feature_column**(s) including:
19 | * **numerical** features
20 | * **categorical** features with **vocabulary**,
21 |   * **categorical** features with **hash bucket**, and
22 | * **categorical** features with **identity**
23 |
24 | * Data **input pipelines** (input_fn) using:
25 | * tf.estimator.inputs.**pandas_input_fn**,
26 | * tf.train.**string_input_producer**, and
27 | * tf.data.**Dataset** APIs to read both **.csv** and **.tfrecords** (tf.example) data files
28 | * tf.contrib.timeseries.**RandomWindowInputFn** and **WholeDatasetInputFn** for time-series data
29 | * Feature **preprocessing** and **creation** as part of reading data (input_fn), for example, sin, sqrt, polynomial expansion, Fourier transform, log, boolean comparisons, Euclidean distance, custom formulas, etc.
30 |
31 | * A standard approach to prepare **wide** (sparse) and **deep** (dense) feature_column(s) for Wide and Deep **DNN Linear Combined Models**
32 |
33 | * The use of **normalizer_fn** in numeric_column() to **scale** the numeric features using pre-computed statistics (for Min-Max or Standard scaling)
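  | 
  |   For example, a minimal sketch of Min-Max scaling via **normalizer_fn**, assuming `min_x` and `max_x` are statistics pre-computed from the training data (the values below are purely illustrative):
  | 
  |   ```python
  |   import tensorflow as tf
  | 
  |   # pre-computed training-data statistics (illustrative values)
  |   min_x, max_x = 0.0, 100.0
  | 
  |   # numeric column scaled to [0, 1] at input time
  |   scaled_column = tf.feature_column.numeric_column(
  |       'x', normalizer_fn=lambda x: (x - min_x) / (max_x - min_x))
  |   ```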
34 |
35 | * The use of **weight_column** in the canned estimators, and in the loss metric in custom estimators.
36 |
37 | * Implicit **Feature Engineering** as part of defining feature_colum(s), including:
38 | * crossing,
39 | * clipping,
40 | * embedding,
41 | * indicators (encoding categorical features), and
42 | * bucketization
43 | * How to use the tf.contrib.learn.**Experiment** APIs to train, evaluate, and export models
44 |
45 | * How to use the tf.estimator.**train_and_evaluate** function (along with **TrainSpec** & **EvalSpec**) to train, evaluate, and export models
46 |
47 | * How to use **tf.train.exponential_decay** function as a learning rate scheduler
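  | 
  |   A minimal sketch of the decay schedule (the constants below are illustrative):
  | 
  |   ```python
  |   import tensorflow as tf
  | 
  |   global_step = tf.train.get_or_create_global_step()
  | 
  |   # learning rate shrinks by a factor of 0.96 every 1,000 steps
  |   learning_rate = tf.train.exponential_decay(
  |       learning_rate=0.1, global_step=global_step,
  |       decay_steps=1000, decay_rate=0.96, staircase=True)
  | 
  |   optimizer = tf.train.AdagradOptimizer(learning_rate)
  |   ```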
48 |
49 | * How to **serve** an exported model (export_savedmodel) using **csv** and **json** inputs
50 |
51 | ## Coming Soon:
52 | * Early-stopping implementation
53 | * DynamicRnnEstimator and the use of variable-length sequences
54 | * Collaborative Filtering for Recommendation Models
55 | * Text Analysis (Topic Models, Word/Doc embedding, etc.)
56 | * tf.Transform for preprocessing and feature engineering
57 | * Keras examples
58 |
59 |
60 |
61 |
--------------------------------------------------------------------------------
/images/exp-api2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/images/exp-api2.png
--------------------------------------------------------------------------------