├── .gitignore ├── 01 - Regression ├── -- TensorBoard.ipynb ├── 00.0 - TensorFlow Version Update.ipynb ├── 01.0 - Regression Data Generation.ipynb ├── 02.0 - TF Regression Model - Estimator APIs + Pandas.ipynb ├── 03.0 - TF Regression Model - Experiment APIs + CSV Files.ipynb ├── 04.0 - TF Regression Model - Dataset Input + JSON Serving.ipynb ├── 04.0 - TF Regression Model - Dataset Input.ipynb ├── 05.0 - TF Regression Model - Custom Estimator.ipynb ├── 06.0 - Convert CSV to TFRecords.ipynb ├── 07.0 - TF Regression Model - DNN Wide & Deep + estimator.train_and_evaluate.ipynb ├── 08.0 - TF Regression Example - Housing Price Estimation + Features Scaling.ipynb └── data │ ├── housingdata.csv │ ├── new-data.csv │ ├── new-data.json │ ├── test-data.csv │ ├── test-data.tfrecords │ ├── train-data.csv │ ├── train-data.tfrecords │ ├── valid-data.csv │ └── valid-data.tfrecords ├── 02 - Classification ├── -- TensorBoard.ipynb ├── 00.0 - TensorFlow Version Update.ipynb ├── 01.0 - Classification Data Generation.ipynb ├── 02.0 - Convert CSV to TFRecords.ipynb ├── 03.0 - TF Classification Model - DNN Wide & Deep + Train_And_Evaluate + Dataset + TFRecords.ipynb ├── 04.0 - TF Classification Model - Custom Estimator + Experiment + Dataset + CSV.ipynb ├── 05.0 - Classification Example - Census Income Prediction.ipynb ├── 06.0 - Classification Example - Census Income Prediction - Custom Estimator + Exponential Decay Learning Rate.ipynb └── data │ ├── adult.data.csv │ ├── adult.stats.csv │ ├── adult.test.csv │ ├── test-data.csv │ ├── test-data.tfrecords │ ├── train-data.csv │ ├── train-data.tfrecords │ ├── valid-data.csv │ └── valid-data.tfrecords ├── 03 - Clustering ├── 00.0 - TensorFlow Version Update.ipynb ├── 01.0 - Generate Data Points + SKLearn Clustering.ipynb ├── 02.0 - TF k-means - Estimator API.ipynb ├── 03.0 - TF k-means - Experiment API.ipynb └── data │ ├── new-data.csv │ ├── test-data.csv │ └── train-data.csv ├── 04 - Times Series ├── 00.0 - Generate Time Series Data.ipynb ├── 01.0 - TF ARRegressor - Estimator + Numpy.ipynb ├── 02.0 - TF ARRegressor - Experiment + CSV.ipynb └── data │ ├── test-data.csv │ ├── timeseries-multivariate.txt │ ├── timeseries-univariate.csv │ └── train-data.csv ├── 05 - Autoencoding ├── 01.0 - Generate Dataset with High-Dimensionality.ipynb ├── 02.0 - Dimensionality Reduction - Autoencoding + Custom Estimator.ipynb ├── 03.0 - Dimensionality Reduction - Autoencoding + Normalizer + XEntropy Loss.ipynb ├── 04.0 - Dimensionality Reduction - Autoencoding + Custom Estimator with MNIST.ipynb └── data │ └── data-01.csv ├── 06 - Sequence Models ├── 01 - RNN with LSTM - Predicting the Next Values - Single Pattern.ipynb ├── 02 - RNN with LSTM - Predicting the Next Values - Multiple Patterns.ipynb ├── 03 - RNN with LSTM - Sequence Classification.ipynb ├── TODO.txt └── data │ ├── seq01.test.csv │ └── seq01.train.csv ├── 07 - Image Analysis ├── 00.0 - TensorFlow Version Update.ipynb ├── 01.0 - CNN Example with CIFAR-10 dataset.ipynb ├── 02.0 - CNN Example with CIFAR-10 dataset using TFRecords.ipynb └── 03.0 - CNN Example with CIFAR-10 (Keras ver.).ipynb ├── 08 - Text Analysis ├── 01 - Text Classification - SMS Ham vs. Spam - Data Preparation.ipynb ├── 02 - Text Classification - SMS Ham vs. Spam - Document Embedding.ipynb ├── 03 - Text Classification - SMS Ham vs. Spam - Word Embeddings + CNN.ipynb ├── 04 - Text Classification - SMS Ham vs. 
Spam - Word Embeddings + LSTM.ipynb ├── 05 - Text Classification - Hacker News - End-to-End + TF-Hub Sentence Embedding.ipynb ├── 06 - Part_1 - Text Classification - Hacker News - Data Preprocessing with TFT.ipynb ├── 06 - Part_2 - Text Classification - Hacker News - DNNClassifier with TF-Hub Sentence Embedding.ipynb ├── 06 - Part_3 - Text Classification - Hacker News - Custom Estimator Word Embedding.ipynb ├── 06 - Part_4 - Text Classification - Hacker News - DNNClassifier with TF.IDF.ipynb └── data │ └── sms-spam │ ├── SMSSpamCollection │ ├── n_words.tsv │ ├── train-data.tsv │ ├── valid-data.tsv │ └── vocab_list.tsv ├── README.md └── images └── exp-api2.png /.gitignore: -------------------------------------------------------------------------------- 1 | 01 - Regression/trained_models 2 | 01 - Regression/.ipynb_checkpoints 3 | 01 - Regression/.DS_Store 4 | 02 - Classification/trained_models 5 | 02 - Classification/.ipynb_checkpoints 6 | 02 - Classification/.DS_Store 7 | 03 - Clustering/trained_models 8 | 03 - Clustering/.ipynb_checkpoints 9 | 03 - Clustering/.DS_Store 10 | 04 - Times Series/trained_models 11 | 04 - Times Series/.ipynb_checkpoints 12 | 04 - Times Series/.DS_Store 13 | 05 - Autoencoding/trained_models 14 | 05 - Autoencoding/.ipynb_checkpoints 15 | 05 - Autoencoding/.DS_Store 16 | .ipynb_checkpoints 17 | .DS_Store 18 | -------------------------------------------------------------------------------- /01 - Regression/-- TensorBoard.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "MODEL_NAME = 'reg-model-01'\n", 10 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n", 11 | "print(model_dir)" 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "## Start TensorBoard Process" 19 | ] 20 | }, 21 | { 22 | "cell_type": "code", 23 | "execution_count": null, 24 | "metadata": {}, 25 | "outputs": [], 26 | "source": [ 27 | "from google.datalab.ml import TensorBoard\n", 28 | "TensorBoard().start(model_dir)\n", 29 | "TensorBoard().list()" 30 | ] 31 | }, 32 | { 33 | "cell_type": "markdown", 34 | "metadata": {}, 35 | "source": [ 36 | "## Kill TensorBoard Process" 37 | ] 38 | }, 39 | { 40 | "cell_type": "code", 41 | "execution_count": null, 42 | "metadata": {}, 43 | "outputs": [], 44 | "source": [ 45 | "# to stop TensorBoard\n", 46 | "TensorBoard().stop(23002)\n", 47 | "print('stopped TensorBoard')\n", 48 | "TensorBoard().list()" 49 | ] 50 | } 51 | ], 52 | "metadata": { 53 | "kernelspec": { 54 | "display_name": "Python 3", 55 | "language": "python", 56 | "name": "python3" 57 | }, 58 | "language_info": { 59 | "codemirror_mode": { 60 | "name": "ipython", 61 | "version": 3 62 | }, 63 | "file_extension": ".py", 64 | "mimetype": "text/x-python", 65 | "name": "python", 66 | "nbconvert_exporter": "python", 67 | "pygments_lexer": "ipython3", 68 | "version": "3.6.1" 69 | } 70 | }, 71 | "nbformat": 4, 72 | "nbformat_minor": 2 73 | } 74 | -------------------------------------------------------------------------------- /01 - Regression/00.0 - TensorFlow Version Update.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Collecting tensorflow\n", 13 | " 
Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n", 14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n", 15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n", 16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 18 | "Requirement already up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 19 | "Collecting enum34>=1.1.6 (from tensorflow)\n", 20 | " Downloading enum34-1.1.6-py3-none-any.whl\n", 21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n", 27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n", 28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n", 29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n", 30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n", 31 | " Found existing installation: tensorflow 1.3.0\n", 32 | " Uninstalling tensorflow-1.3.0:\n", 33 | " Successfully uninstalled tensorflow-1.3.0\n", 34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "%%bash\n", 40 | "\n", 41 | "pip install -U tensorflow" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "name": "stderr", 51 | "output_type": "stream", 52 | "text": [ 53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 54 | " return f(*args, **kwds)\n" 55 | ] 56 | }, 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "1.4.0\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "import tensorflow as tf\n", 67 | "print(tf.__version__)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 77 | "source": [] 78 | } 79 | ], 80 | "metadata": { 81 | "kernelspec": { 82 | "display_name": "Python 3", 83 | "language": "python", 84 | "name": "python3" 85 | }, 86 | "language_info": { 87 | "codemirror_mode": { 88 | "name": "ipython", 89 | "version": 3 90 | }, 91 | "file_extension": ".py", 92 | "mimetype": "text/x-python", 93 | "name": "python", 94 | "nbconvert_exporter": "python", 95 | 
"pygments_lexer": "ipython3", 96 | "version": "3.6.1" 97 | } 98 | }, 99 | "nbformat": 4, 100 | "nbformat_minor": 2 101 | } 102 | -------------------------------------------------------------------------------- /01 - Regression/01.0 - Regression Data Generation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/cross_validation.py:41: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.\n", 13 | " \"This module will be removed in 0.20.\", DeprecationWarning)\n", 14 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/grid_search.py:42: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. This module will be removed in 0.20.\n", 15 | " DeprecationWarning)\n", 16 | "/Users/khalidsalama/anaconda/lib/python3.6/site-packages/sklearn/learning_curve.py:22: DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the functions are moved. This module will be removed in 0.20\n", 17 | " DeprecationWarning)\n" 18 | ] 19 | } 20 | ], 21 | "source": [ 22 | "import numpy as np\n", 23 | "import pandas as pd\n", 24 | "from sklearn import *\n", 25 | "import matplotlib.pyplot as plt\n", 26 | "%matplotlib inline" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": 2, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "sample_size = 5000" 38 | ] 39 | }, 40 | { 41 | "cell_type": "code", 42 | "execution_count": 3, 43 | "metadata": { 44 | "collapsed": true 45 | }, 46 | "outputs": [], 47 | "source": [ 48 | "\n", 49 | "data1,target1 = datasets.make_circles(n_samples=sample_size, factor=.1, noise=0.2)\n", 50 | "target1 = (3*data1[:,0])-(16*data1[:,1]) + (0.5*data1[:,0]*data1[:,1]) + np.random.normal(0,2,size=sample_size)\n", 51 | "\n", 52 | "\n", 53 | "data2,target2 = datasets.make_circles(n_samples=sample_size, factor=.5, noise=0.2)\n", 54 | "target2 = np.power(data2[:,0],2) + 10*np.power(data2[:,1],3) + (50*data2[:,0]*np.power(data2[:,1],2)) + np.random.normal(0,2,size=sample_size)\n", 55 | "\n", 56 | "data3,target3 = datasets.make_moons(n_samples=sample_size,noise=0.2)\n", 57 | "data3[:,0] = (2 * (data3[:, 0]-(-1))/(3))-1\n", 58 | "data3[:,1] = (2 * (data3[:, 1]-(-1))/(2))-1\n", 59 | "target3 = (50*data3[:,0]*np.sin(data3[:,1])) + (50*data3[:,1]*np.cos(data3[:,0]))\n", 60 | "\n", 61 | "data4,target4 = datasets.make_moons(n_samples=sample_size,noise=0.2)\n", 62 | "\n", 63 | "temp = np.copy(data4[:, 0])\n", 64 | "data4[:, 0] = data4[:, 1]\n", 65 | "data4[:, 1] = temp\n", 66 | "data4[:,0] = (2 * (data4[:, 0]-(-1))/(2))-1\n", 67 | "data4[:,1] = (2 * (data4[:, 1]-(-1))/(3))-1\n", 68 | "\n", 69 | "target4 = (30*data1[:,0])-(16*data1[:,1]) - (1.5*data1[:,0]*data1[:,1]) + np.random.normal(0,1,size=sample_size)" 70 | ] 71 | }, 72 | { 73 | "cell_type": "code", 74 | "execution_count": 4, 75 | "metadata": { 76 | "collapsed": true 77 | }, 78 | "outputs": 
[], 79 | "source": [ 80 | "data = np.concatenate((data1, data2, data3, data4), axis=0)\n", 81 | "target = np.concatenate((target1,target2,target3,target4),axis=0)\n", 82 | "alpha = np.concatenate((np.zeros(sample_size),np.ones(sample_size),np.zeros(sample_size),np.ones(sample_size)), axis=0)\n", 83 | "beta = np.concatenate((np.zeros(sample_size),np.zeros(sample_size),np.ones(sample_size),np.ones(sample_size)), axis=0)" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 5, 89 | "metadata": { 90 | "collapsed": true 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "data_frame = pd.DataFrame(data = data,columns=[\"x\",\"y\"])\n", 95 | "data_frame[\"alpha\"] = pd.Series(alpha).map(lambda v: 'ax01' if v==0 else 'ax02')\n", 96 | "data_frame[\"beta\"] = pd.Series(beta).map(lambda v: 'bx01' if v==0 else 'bx02')\n", 97 | "data_frame[\"target\"] = target" 98 | ] 99 | }, 100 | { 101 | "cell_type": "code", 102 | "execution_count": 6, 103 | "metadata": {}, 104 | "outputs": [ 105 | { 106 | "data": { 107 | "text/html": [ 108 | "
\n", 109 | "\n", 122 | "\n", 123 | " \n", 124 | " \n", 125 | " \n", 126 | " \n", 127 | " \n", 128 | " \n", 129 | " \n", 130 | " \n", 131 | " \n", 132 | " \n", 133 | " \n", 134 | " \n", 135 | " \n", 136 | " \n", 137 | " \n", 138 | " \n", 139 | " \n", 140 | " \n", 141 | " \n", 142 | " \n", 143 | " \n", 144 | " \n", 145 | " \n", 146 | " \n", 147 | " \n", 148 | " \n", 149 | " \n", 150 | " \n", 151 | " \n", 152 | " \n", 153 | " \n", 154 | " \n", 155 | " \n", 156 | " \n", 157 | " \n", 158 | " \n", 159 | " \n", 160 | " \n", 161 | " \n", 162 | " \n", 163 | " \n", 164 | " \n", 165 | " \n", 166 | " \n", 167 | " \n", 168 | " \n", 169 | " \n", 170 | " \n", 171 | " \n", 172 | " \n", 173 | " \n", 174 | " \n", 175 | " \n", 176 | " \n", 177 | " \n", 178 | " \n", 179 | " \n", 180 | " \n", 181 | "
xytarget
count20000.00000020000.00000020000.000000
mean0.0630320.0612921.326481
std0.5771480.57705117.741681
min-1.567981-1.578965-73.096282
25%-0.333928-0.334557-6.737629
50%0.0535080.0535260.417512
75%0.4771570.4756788.707335
max1.6175111.72412586.776134
\n", 182 | "
" 183 | ], 184 | "text/plain": [ 185 | " x y target\n", 186 | "count 20000.000000 20000.000000 20000.000000\n", 187 | "mean 0.063032 0.061292 1.326481\n", 188 | "std 0.577148 0.577051 17.741681\n", 189 | "min -1.567981 -1.578965 -73.096282\n", 190 | "25% -0.333928 -0.334557 -6.737629\n", 191 | "50% 0.053508 0.053526 0.417512\n", 192 | "75% 0.477157 0.475678 8.707335\n", 193 | "max 1.617511 1.724125 86.776134" 194 | ] 195 | }, 196 | "execution_count": 6, 197 | "metadata": {}, 198 | "output_type": "execute_result" 199 | } 200 | ], 201 | "source": [ 202 | "data_frame.describe()" 203 | ] 204 | }, 205 | { 206 | "cell_type": "code", 207 | "execution_count": 7, 208 | "metadata": {}, 209 | "outputs": [ 210 | { 211 | "name": "stdout", 212 | "output_type": "stream", 213 | "text": [ 214 | "12000\n", 215 | "3000\n", 216 | "5000\n" 217 | ] 218 | } 219 | ], 220 | "source": [ 221 | "distribution = ([0] * sample_size) + ([1] * sample_size) + ([2] * sample_size) + ([3] * sample_size)\n", 222 | "\n", 223 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.25, random_state=0)\n", 224 | "splits = list(splitter.split(X=data_frame.iloc[:,[0,1,2,3]],y=distribution))\n", 225 | "learn_index = splits[0][0]\n", 226 | "test_index = splits[0][1]\n", 227 | "\n", 228 | "learn_df = data_frame.iloc[learn_index,:]\n", 229 | "\n", 230 | "size2 = int(len(learn_df)/4)\n", 231 | "distribution2 = ([0] * size2) + ([1] * size2) + ([2] * size2) + ([3] * size2)\n", 232 | "\n", 233 | "\n", 234 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1, test_size=0.2, random_state=0)\n", 235 | "splits = list(splitter.split(X=learn_df.iloc[:,[0,1,2,3]],y=distribution2))\n", 236 | "train_index = splits[0][0]\n", 237 | "valid_index = splits[0][1]\n", 238 | "\n", 239 | "\n", 240 | "train_df = learn_df.iloc[train_index,:]\n", 241 | "print(len(train_df))\n", 242 | "\n", 243 | "valid_df = learn_df.iloc[valid_index,:]\n", 244 | "print(len(valid_df))\n", 245 | "\n", 246 | "test_df = data_frame.iloc[test_index,:]\n", 247 | "print(len(test_df))\n" 248 | ] 249 | }, 250 | { 251 | "cell_type": "code", 252 | "execution_count": 8, 253 | "metadata": { 254 | "collapsed": true 255 | }, 256 | "outputs": [], 257 | "source": [ 258 | "train_df.to_csv(path_or_buf=\"data/train-data.csv\", header=False, index=True)\n", 259 | "valid_df.to_csv(path_or_buf=\"data/valid-data.csv\", header=False, index=True)\n", 260 | "test_df.to_csv(path_or_buf=\"data/test-data.csv\", header=False, index=True)" 261 | ] 262 | }, 263 | { 264 | "cell_type": "code", 265 | "execution_count": null, 266 | "metadata": { 267 | "collapsed": true 268 | }, 269 | "outputs": [], 270 | "source": [] 271 | } 272 | ], 273 | "metadata": { 274 | "kernelspec": { 275 | "display_name": "Python 3", 276 | "language": "python", 277 | "name": "python3" 278 | }, 279 | "language_info": { 280 | "codemirror_mode": { 281 | "name": "ipython", 282 | "version": 3 283 | }, 284 | "file_extension": ".py", 285 | "mimetype": "text/x-python", 286 | "name": "python", 287 | "nbconvert_exporter": "python", 288 | "pygments_lexer": "ipython3", 289 | "version": "3.6.1" 290 | } 291 | }, 292 | "nbformat": 4, 293 | "nbformat_minor": 2 294 | } 295 | -------------------------------------------------------------------------------- /01 - Regression/02.0 - TF Regression Model - Estimator APIs + Pandas.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | 
{ 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 13 | " return f(*args, **kwds)\n" 14 | ] 15 | }, 16 | { 17 | "name": "stdout", 18 | "output_type": "stream", 19 | "text": [ 20 | "1.4.0\n" 21 | ] 22 | } 23 | ], 24 | "source": [ 25 | "import tensorflow as tf\n", 26 | "import pandas as pd\n", 27 | "import numpy as np\n", 28 | "import shutil\n", 29 | "import math\n", 30 | "import multiprocessing\n", 31 | "from datetime import datetime\n", 32 | "from tensorflow.python.feature_column import feature_column\n", 33 | "print(tf.__version__)" 34 | ] 35 | }, 36 | { 37 | "cell_type": "markdown", 38 | "metadata": {}, 39 | "source": [ 40 | "## Steps to use the TF Estimator APIs\n", 41 | "1. Define dataset **metadata**\n", 42 | "2. Define **data input function** to read the data from Pandas dataframe + **apply feature processing**\n", 43 | "3. Create TF **feature columns** based on metadata + **extended feature columns**\n", 44 | "4. Instantiate an **estimator** with the required **feature columns & parameters**\n", 45 | "5. **Train** estimator using training data\n", 46 | "6. **Evaluate** estimator using test data\n", 47 | "7. Perform **predictions**\n", 48 | "8. **Save & Serve** the estimator" 49 | ] 50 | }, 51 | { 52 | "cell_type": "code", 53 | "execution_count": 2, 54 | "metadata": { 55 | "collapsed": true 56 | }, 57 | "outputs": [], 58 | "source": [ 59 | "MODEL_NAME = 'reg-model-01'\n", 60 | "\n", 61 | "TRAIN_DATA_FILE = 'data/train-data.csv'\n", 62 | "VALID_DATA_FILE = 'data/valid-data.csv'\n", 63 | "TEST_DATA_FILE = 'data/test-data.csv'\n", 64 | "\n", 65 | "RESUME_TRAINING = False\n", 66 | "PROCESS_FEATURES = True\n", 67 | "MULTI_THREADING = False" 68 | ] 69 | }, 70 | { 71 | "cell_type": "markdown", 72 | "metadata": {}, 73 | "source": [ 74 | "## 1. 
Define Dataset Metadata\n", 75 | "* CSV file header and defaults\n", 76 | "* Numeric and categorical feature names\n", 77 | "* Target feature name\n", 78 | "* Unused columns" 79 | ] 80 | }, 81 | { 82 | "cell_type": "code", 83 | "execution_count": 3, 84 | "metadata": {}, 85 | "outputs": [ 86 | { 87 | "name": "stdout", 88 | "output_type": "stream", 89 | "text": [ 90 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n", 91 | "Numeric Features: ['x', 'y']\n", 92 | "Categorical Features: ['alpha', 'beta']\n", 93 | "Target: target\n", 94 | "Unused Features: ['key']\n" 95 | ] 96 | } 97 | ], 98 | "source": [ 99 | "HEADER = ['key','x','y','alpha','beta','target']\n", 100 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], [0.0]]\n", 101 | "\n", 102 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n", 103 | "\n", 104 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n", 105 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n", 106 | "\n", 107 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n", 108 | "\n", 109 | "TARGET_NAME = 'target'\n", 110 | "\n", 111 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n", 112 | "\n", 113 | "print(\"Header: {}\".format(HEADER))\n", 114 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n", 115 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n", 116 | "print(\"Target: {}\".format(TARGET_NAME))\n", 117 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))" 118 | ] 119 | }, 120 | { 121 | "cell_type": "markdown", 122 | "metadata": {}, 123 | "source": [ 124 | "## 2. Define Data Input Function\n", 125 | "* Input csv file name\n", 126 | "* Load pandas Dataframe\n", 127 | "* Apply feature processing\n", 128 | "* Return a function that returns (features, target) tensors" 129 | ] 130 | }, 131 | { 132 | "cell_type": "code", 133 | "execution_count": 4, 134 | "metadata": { 135 | "collapsed": true 136 | }, 137 | "outputs": [], 138 | "source": [ 139 | "def process_dataframe(dataset_df):\n", 140 | " \n", 141 | " dataset_df[\"x_2\"] = np.square(dataset_df['x'])\n", 142 | " dataset_df[\"y_2\"] = np.square(dataset_df['y'])\n", 143 | " dataset_df[\"xy\"] = dataset_df['x'] * dataset_df['y']\n", 144 | " dataset_df['dist_xy'] = np.sqrt(np.square(dataset_df['x']-dataset_df['y']))\n", 145 | " \n", 146 | " return dataset_df\n", 147 | "\n", 148 | "def generate_pandas_input_fn(file_name, mode=tf.estimator.ModeKeys.EVAL,\n", 149 | " skip_header_lines=0,\n", 150 | " num_epochs=1,\n", 151 | " batch_size=100):\n", 152 | "\n", 153 | " df_dataset = pd.read_csv(file_name, names=HEADER, skiprows=skip_header_lines)\n", 154 | " \n", 155 | " x = df_dataset[FEATURE_NAMES].copy()\n", 156 | " if PROCESS_FEATURES:\n", 157 | " x = process_dataframe(x)\n", 158 | " \n", 159 | " y = df_dataset[TARGET_NAME]\n", 160 | " \n", 161 | " shuffle = True if mode == tf.estimator.ModeKeys.TRAIN else False\n", 162 | " \n", 163 | " num_threads=1\n", 164 | " \n", 165 | " if MULTI_THREADING:\n", 166 | " num_threads=multiprocessing.cpu_count()\n", 167 | " num_epochs = int(num_epochs/num_threads) if mode == tf.estimator.ModeKeys.TRAIN else num_epochs\n", 168 | " \n", 169 | " pandas_input_fn = tf.estimator.inputs.pandas_input_fn(\n", 170 | " batch_size=batch_size,\n", 171 | " num_epochs= num_epochs,\n", 172 | " shuffle=shuffle,\n", 173 | " x=x,\n", 174 | " y=y,\n", 175 | " target_column=TARGET_NAME\n", 176 | " )\n", 177 | " \n", 178 | " 
print(\"\")\n", 179 | " print(\"* data input_fn:\")\n", 180 | " print(\"================\")\n", 181 | " print(\"Input file: {}\".format(file_name))\n", 182 | " print(\"Dataset size: {}\".format(len(df_dataset)))\n", 183 | " print(\"Batch size: {}\".format(batch_size))\n", 184 | " print(\"Epoch Count: {}\".format(num_epochs))\n", 185 | " print(\"Mode: {}\".format(mode))\n", 186 | " print(\"Thread Count: {}\".format(num_threads))\n", 187 | " print(\"Shuffle: {}\".format(shuffle))\n", 188 | " print(\"================\")\n", 189 | " print(\"\")\n", 190 | " \n", 191 | " return pandas_input_fn" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 5, 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "name": "stdout", 201 | "output_type": "stream", 202 | "text": [ 203 | "\n", 204 | "* data input_fn:\n", 205 | "================\n", 206 | "Input file: data/train-data.csv\n", 207 | "Dataset size: 12000\n", 208 | "Batch size: 100\n", 209 | "Epoch Count: 1\n", 210 | "Mode: eval\n", 211 | "Thread Count: 1\n", 212 | "Shuffle: False\n", 213 | "================\n", 214 | "\n", 215 | "Feature read from DataFrame: ['x', 'y', 'alpha', 'beta', 'x_2', 'y_2', 'xy', 'dist_xy']\n", 216 | "Target read from DataFrame: Tensor(\"fifo_queue_DequeueUpTo:9\", shape=(?,), dtype=float64)\n" 217 | ] 218 | } 219 | ], 220 | "source": [ 221 | "features, target = generate_pandas_input_fn(file_name=TRAIN_DATA_FILE)()\n", 222 | "print(\"Feature read from DataFrame: {}\".format(list(features.keys())))\n", 223 | "print(\"Target read from DataFrame: {}\".format(target))" 224 | ] 225 | }, 226 | { 227 | "cell_type": "markdown", 228 | "metadata": {}, 229 | "source": [ 230 | "## 3. Define Feature Columns\n", 231 | "The input numeric columns are assumed to be normalized (or have the same scale). Otherwise, a normlizer_fn, along with the normlisation params (mean, stdv or min, max) should be passed to tf.feature_column.numeric_column() constructor." 
232 | ] 233 | }, 234 | { 235 | "cell_type": "code", 236 | "execution_count": 6, 237 | "metadata": {}, 238 | "outputs": [ 239 | { 240 | "name": "stdout", 241 | "output_type": "stream", 242 | "text": [ 243 | "Feature Columns: {'x': _NumericColumn(key='x', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'y': _NumericColumn(key='y', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'x_2': _NumericColumn(key='x_2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'y_2': _NumericColumn(key='y_2', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'xy': _NumericColumn(key='xy', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'dist_xy': _NumericColumn(key='dist_xy', shape=(1,), default_value=None, dtype=tf.float32, normalizer_fn=None), 'alpha': _VocabularyListCategoricalColumn(key='alpha', vocabulary_list=('ax01', 'ax02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), 'beta': _VocabularyListCategoricalColumn(key='beta', vocabulary_list=('bx01', 'bx02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), 'alpha_X_beta': _CrossedColumn(keys=(_VocabularyListCategoricalColumn(key='alpha', vocabulary_list=('ax01', 'ax02'), dtype=tf.string, default_value=-1, num_oov_buckets=0), _VocabularyListCategoricalColumn(key='beta', vocabulary_list=('bx01', 'bx02'), dtype=tf.string, default_value=-1, num_oov_buckets=0)), hash_bucket_size=4, hash_key=None)}\n" 244 | ] 245 | } 246 | ], 247 | "source": [ 248 | "def get_feature_columns():\n", 249 | " \n", 250 | " \n", 251 | " all_numeric_feature_names = NUMERIC_FEATURE_NAMES\n", 252 | " \n", 253 | " CONSTRUCTED_NUMERIC_FEATURES_NAMES = ['x_2', 'y_2', 'xy', 'dist_xy']\n", 254 | " \n", 255 | " if PROCESS_FEATURES:\n", 256 | " all_numeric_feature_names += CONSTRUCTED_NUMERIC_FEATURES_NAMES\n", 257 | "\n", 258 | " numeric_columns = {feature_name: tf.feature_column.numeric_column(feature_name)\n", 259 | " for feature_name in all_numeric_feature_names}\n", 260 | "\n", 261 | " categorical_column_with_vocabulary = \\\n", 262 | " {item[0]: tf.feature_column.categorical_column_with_vocabulary_list(item[0], item[1])\n", 263 | " for item in CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.items()}\n", 264 | " \n", 265 | " feature_columns = {}\n", 266 | "\n", 267 | " if numeric_columns is not None:\n", 268 | " feature_columns.update(numeric_columns)\n", 269 | "\n", 270 | " if categorical_column_with_vocabulary is not None:\n", 271 | " feature_columns.update(categorical_column_with_vocabulary)\n", 272 | " \n", 273 | " # add extended features (crossing, bucektization, embedding)\n", 274 | " \n", 275 | " feature_columns['alpha_X_beta'] = tf.feature_column.crossed_column(\n", 276 | " [feature_columns['alpha'], feature_columns['beta']], 4)\n", 277 | " \n", 278 | " return feature_columns\n", 279 | "\n", 280 | "feature_columns = get_feature_columns()\n", 281 | "print(\"Feature Columns: {}\".format(feature_columns))" 282 | ] 283 | }, 284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "## 4. Create an Estimator" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "### a. 
284 | { 285 | "cell_type": "markdown", 286 | "metadata": {}, 287 | "source": [ 288 | "## 4. Create an Estimator" 289 | ] 290 | }, 291 | { 292 | "cell_type": "markdown", 293 | "metadata": {}, 294 | "source": [ 295 | "### a. Define an Estimator Creation Function\n", 296 | "\n", 297 | "* Get dense (numeric) columns from the feature columns\n", 298 | "* Convert categorical columns to indicator columns\n", 299 | "* Instantiate a DNNRegressor estimator given the **dense + indicator** feature columns + params" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 7, 305 | "metadata": { 306 | "collapsed": true 307 | }, 308 | "outputs": [], 309 | "source": [ 310 | "def create_estimator(run_config, hparams):\n", 311 | " \n", 312 | " feature_columns = list(get_feature_columns().values())\n", 313 | " \n", 314 | " dense_columns = list(\n", 315 | " filter(lambda column: isinstance(column, feature_column._NumericColumn),\n", 316 | " feature_columns\n", 317 | " )\n", 318 | " )\n", 319 | "\n", 320 | " categorical_columns = list(\n", 321 | " filter(lambda column: isinstance(column, feature_column._VocabularyListCategoricalColumn) |\n", 322 | " isinstance(column, feature_column._BucketizedColumn),\n", 323 | " feature_columns)\n", 324 | " )\n", 325 | "\n", 326 | " indicator_columns = list(\n", 327 | " map(lambda column: tf.feature_column.indicator_column(column),\n", 328 | " categorical_columns)\n", 329 | " )\n", 330 | " \n", 331 | " \n", 332 | " estimator_feature_columns = dense_columns + indicator_columns \n", 333 | " \n", 334 | " estimator = tf.estimator.DNNRegressor(\n", 335 | " \n", 336 | " feature_columns= estimator_feature_columns,\n", 337 | " hidden_units= hparams.hidden_units,\n", 338 | " \n", 339 | " optimizer= tf.train.AdamOptimizer(),\n", 340 | " activation_fn= tf.nn.elu,\n", 341 | " dropout= hparams.dropout_prob,\n", 342 | " \n", 343 | " config= run_config\n", 344 | " )\n", 345 | " \n", 346 | " print(\"\")\n", 347 | " print(\"Estimator Type: {}\".format(type(estimator)))\n", 348 | " print(\"\")\n", 349 | " \n", 350 | " return estimator" 351 | ] 352 | }, 353 | { 354 | "cell_type": "markdown", 355 | "metadata": {}, 356 | "source": [ 357 | "### b. Set hyper-parameter values (HParams)" 358 | ] 359 | }, 360 | { 361 | "cell_type": "code", 362 | "execution_count": 8, 363 | "metadata": {}, 364 | "outputs": [ 365 | { 366 | "name": "stdout", 367 | "output_type": "stream", 368 | "text": [ 369 | "Model directory: trained_models/reg-model-01\n", 370 | "Hyper-parameters: [('batch_size', 500), ('dropout_prob', 0.0), ('hidden_units', [8, 4]), ('num_epochs', 100)]\n" 371 | ] 372 | } 373 | ], 374 | "source": [ 375 | "hparams = tf.contrib.training.HParams(\n", 376 | " num_epochs = 100,\n", 377 | " batch_size = 500,\n", 378 | " hidden_units=[8, 4], \n", 379 | " dropout_prob = 0.0)\n", 380 | "\n", 381 | "\n", 382 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n", 383 | "\n", 384 | "run_config = tf.estimator.RunConfig().replace(model_dir=model_dir)\n", 385 | "print(\"Model directory: {}\".format(run_config.model_dir))\n", 386 | "print(\"Hyper-parameters: {}\".format(hparams))" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "### c. 
Instantiate the estimator " 394 | ] 395 | }, 396 | { 397 | "cell_type": "code", 398 | "execution_count": 9, 399 | "metadata": {}, 400 | "outputs": [ 401 | { 402 | "name": "stdout", 403 | "output_type": "stream", 404 | "text": [ 405 | "INFO:tensorflow:Using config: {'_model_dir': 'trained_models/reg-model-01', '_tf_random_seed': None, '_save_summary_steps': 100, '_save_checkpoints_steps': None, '_save_checkpoints_secs': 600, '_session_config': None, '_keep_checkpoint_max': 5, '_keep_checkpoint_every_n_hours': 10000, '_log_step_count_steps': 100, '_service': None, '_cluster_spec': , '_task_type': 'worker', '_task_id': 0, '_master': '', '_is_chief': True, '_num_ps_replicas': 0, '_num_worker_replicas': 1}\n", 406 | "\n", 407 | "Estimator Type: \n", 408 | "\n" 409 | ] 410 | } 411 | ], 412 | "source": [ 413 | "estimator = create_estimator(run_config, hparams)" 414 | ] 415 | }, 416 | { 417 | "cell_type": "markdown", 418 | "metadata": {}, 419 | "source": [ 420 | "## 5. Train the Estimator" 421 | ] 422 | }, 423 | { 424 | "cell_type": "code", 425 | "execution_count": 10, 426 | "metadata": {}, 427 | "outputs": [ 428 | { 429 | "name": "stdout", 430 | "output_type": "stream", 431 | "text": [ 432 | "\n", 433 | "* data input_fn:\n", 434 | "================\n", 435 | "Input file: data/train-data.csv\n", 436 | "Dataset size: 12000\n", 437 | "Batch size: 500\n", 438 | "Epoch Count: 100\n", 439 | "Mode: train\n", 440 | "Thread Count: 1\n", 441 | "Shuffle: True\n", 442 | "================\n", 443 | "\n", 444 | "Estimator training started at 19:19:12\n", 445 | ".......................................\n", 446 | "INFO:tensorflow:Create CheckpointSaverHook.\n", 447 | "INFO:tensorflow:Saving checkpoints for 1 into trained_models/reg-model-01/model.ckpt.\n", 448 | "INFO:tensorflow:loss = 179225.0, step = 1\n", 449 | "INFO:tensorflow:global_step/sec: 166.515\n", 450 | "INFO:tensorflow:loss = 124778.0, step = 101 (0.602 sec)\n", 451 | "INFO:tensorflow:global_step/sec: 182.042\n", 452 | "INFO:tensorflow:loss = 144432.0, step = 201 (0.550 sec)\n", 453 | "INFO:tensorflow:global_step/sec: 221.401\n", 454 | "INFO:tensorflow:loss = 167542.0, step = 301 (0.451 sec)\n", 455 | "INFO:tensorflow:global_step/sec: 208.414\n", 456 | "INFO:tensorflow:loss = 146349.0, step = 401 (0.480 sec)\n", 457 | "INFO:tensorflow:global_step/sec: 216.184\n", 458 | "INFO:tensorflow:loss = 148680.0, step = 501 (0.462 sec)\n", 459 | "INFO:tensorflow:global_step/sec: 217.155\n", 460 | "INFO:tensorflow:loss = 123907.0, step = 601 (0.460 sec)\n", 461 | "INFO:tensorflow:global_step/sec: 209.701\n", 462 | "INFO:tensorflow:loss = 113046.0, step = 701 (0.477 sec)\n", 463 | "INFO:tensorflow:global_step/sec: 168.637\n", 464 | "INFO:tensorflow:loss = 107878.0, step = 801 (0.594 sec)\n", 465 | "INFO:tensorflow:global_step/sec: 126.787\n", 466 | "INFO:tensorflow:loss = 118305.0, step = 901 (0.788 sec)\n", 467 | "INFO:tensorflow:global_step/sec: 138.261\n", 468 | "INFO:tensorflow:loss = 101507.0, step = 1001 (0.723 sec)\n", 469 | "INFO:tensorflow:global_step/sec: 162.629\n", 470 | "INFO:tensorflow:loss = 106166.0, step = 1101 (0.616 sec)\n", 471 | "INFO:tensorflow:global_step/sec: 210.706\n", 472 | "INFO:tensorflow:loss = 107934.0, step = 1201 (0.474 sec)\n", 473 | "INFO:tensorflow:global_step/sec: 175.23\n", 474 | "INFO:tensorflow:loss = 98094.9, step = 1301 (0.571 sec)\n", 475 | "INFO:tensorflow:global_step/sec: 176.572\n", 476 | "INFO:tensorflow:loss = 89144.2, step = 1401 (0.566 sec)\n", 477 | "INFO:tensorflow:global_step/sec: 177.678\n", 478 | 
"INFO:tensorflow:loss = 104465.0, step = 1501 (0.563 sec)\n", 479 | "INFO:tensorflow:global_step/sec: 183.081\n", 480 | "INFO:tensorflow:loss = 92220.2, step = 1601 (0.546 sec)\n", 481 | "INFO:tensorflow:global_step/sec: 218.108\n", 482 | "INFO:tensorflow:loss = 79086.9, step = 1701 (0.458 sec)\n", 483 | "INFO:tensorflow:global_step/sec: 138.97\n", 484 | "INFO:tensorflow:loss = 93577.3, step = 1801 (0.724 sec)\n", 485 | "INFO:tensorflow:global_step/sec: 145.418\n", 486 | "INFO:tensorflow:loss = 75269.3, step = 1901 (0.684 sec)\n", 487 | "INFO:tensorflow:global_step/sec: 181.944\n", 488 | "INFO:tensorflow:loss = 73518.7, step = 2001 (0.549 sec)\n", 489 | "INFO:tensorflow:global_step/sec: 165.012\n", 490 | "INFO:tensorflow:loss = 75916.3, step = 2101 (0.607 sec)\n", 491 | "INFO:tensorflow:global_step/sec: 130.054\n", 492 | "INFO:tensorflow:loss = 65138.1, step = 2201 (0.768 sec)\n", 493 | "INFO:tensorflow:global_step/sec: 128.839\n", 494 | "INFO:tensorflow:loss = 65868.5, step = 2301 (0.777 sec)\n", 495 | "INFO:tensorflow:Saving checkpoints for 2400 into trained_models/reg-model-01/model.ckpt.\n", 496 | "INFO:tensorflow:Loss for final step: 88071.1.\n", 497 | ".......................................\n", 498 | "Estimator training finished at 19:19:30\n", 499 | "\n", 500 | "Estimator training elapsed time: 17.686301 seconds\n" 501 | ] 502 | } 503 | ], 504 | "source": [ 505 | "train_input_fn = generate_pandas_input_fn(file_name= TRAIN_DATA_FILE, \n", 506 | " mode=tf.estimator.ModeKeys.TRAIN,\n", 507 | " num_epochs=hparams.num_epochs,\n", 508 | " batch_size=hparams.batch_size) \n", 509 | "\n", 510 | "if not RESUME_TRAINING:\n", 511 | " shutil.rmtree(model_dir, ignore_errors=True)\n", 512 | " \n", 513 | "tf.logging.set_verbosity(tf.logging.INFO)\n", 514 | "\n", 515 | "time_start = datetime.utcnow() \n", 516 | "print(\"Estimator training started at {}\".format(time_start.strftime(\"%H:%M:%S\")))\n", 517 | "print(\".......................................\")\n", 518 | "\n", 519 | "estimator.train(input_fn = train_input_fn)\n", 520 | "\n", 521 | "time_end = datetime.utcnow() \n", 522 | "print(\".......................................\")\n", 523 | "print(\"Estimator training finished at {}\".format(time_end.strftime(\"%H:%M:%S\")))\n", 524 | "print(\"\")\n", 525 | "time_elapsed = time_end - time_start\n", 526 | "print(\"Estimator training elapsed time: {} seconds\".format(time_elapsed.total_seconds()))\n" 527 | ] 528 | }, 529 | { 530 | "cell_type": "markdown", 531 | "metadata": {}, 532 | "source": [ 533 | "## 6. 
Evaluate the Model" 534 | ] 535 | }, 536 | { 537 | "cell_type": "code", 538 | "execution_count": 11, 539 | "metadata": {}, 540 | "outputs": [ 541 | { 542 | "name": "stdout", 543 | "output_type": "stream", 544 | "text": [ 545 | "\n", 546 | "* data input_fn:\n", 547 | "================\n", 548 | "Input file: data/test-data.csv\n", 549 | "Dataset size: 5000\n", 550 | "Batch size: 5000\n", 551 | "Epoch Count: 1\n", 552 | "Mode: eval\n", 553 | "Thread Count: 1\n", 554 | "Shuffle: False\n", 555 | "================\n", 556 | "\n", 557 | "INFO:tensorflow:Starting evaluation at 2017-11-14-19:19:30\n", 558 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n", 559 | "INFO:tensorflow:Finished evaluation at 2017-11-14-19:19:31\n", 560 | "INFO:tensorflow:Saving dict for global step 2400: average_loss = 164.862, global_step = 2400, loss = 824311.0\n", 561 | "\n", 562 | "{'average_loss': 164.86218, 'loss': 824310.88, 'global_step': 2400}\n", 563 | "\n", 564 | "RMSE: 12.83987\n" 565 | ] 566 | } 567 | ], 568 | "source": [ 569 | "TEST_SIZE = 5000\n", 570 | "\n", 571 | "test_input_fn = generate_pandas_input_fn(file_name=TEST_DATA_FILE, \n", 572 | " mode= tf.estimator.ModeKeys.EVAL,\n", 573 | " batch_size= TEST_SIZE)\n", 574 | "\n", 575 | "results = estimator.evaluate(input_fn=test_input_fn)\n", 576 | "print(\"\")\n", 577 | "print(results)\n", 578 | "rmse = round(math.sqrt(results[\"average_loss\"]),5)\n", 579 | "print(\"\")\n", 580 | "print(\"RMSE: {}\".format(rmse))" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "## 7. Prediction" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 12, 593 | "metadata": {}, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "\n", 600 | "* data input_fn:\n", 601 | "================\n", 602 | "Input file: data/test-data.csv\n", 603 | "Dataset size: 5000\n", 604 | "Batch size: 5\n", 605 | "Epoch Count: 1\n", 606 | "Mode: infer\n", 607 | "Thread Count: 1\n", 608 | "Shuffle: False\n", 609 | "================\n", 610 | "\n", 611 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n", 612 | "\n", 613 | "Predicted Values: [13.141397, -5.9562521, 11.541443, 3.8178449, 2.1242597]\n" 614 | ] 615 | } 616 | ], 617 | "source": [ 618 | "import itertools\n", 619 | "\n", 620 | "predict_input_fn = generate_pandas_input_fn(file_name=TEST_DATA_FILE, \n", 621 | " mode= tf.estimator.ModeKeys.PREDICT,\n", 622 | " batch_size= 5)\n", 623 | "\n", 624 | "predictions = estimator.predict(input_fn=predict_input_fn)\n", 625 | "values = list(map(lambda item: item[\"predictions\"][0],list(itertools.islice(predictions, 5))))\n", 626 | "print()\n", 627 | "print(\"Predicted Values: {}\".format(values))" 628 | ] 629 | }, 630 | { 631 | "cell_type": "markdown", 632 | "metadata": {}, 633 | "source": [ 634 | "## 8. Save & Serve the Model" 635 | ] 636 | }, 637 | { 638 | "cell_type": "markdown", 639 | "metadata": {}, 640 | "source": [ 641 | "### a. 
Define Serving Function\n", 642 | ] 643 | }, 644 | { 645 | "cell_type": "code", 646 | "execution_count": 1, 647 | "metadata": { 648 | "collapsed": true 649 | }, 650 | "outputs": [], 651 | "source": [ 652 | "def process_features(features):\n", 653 | " \n", 654 | " features[\"x_2\"] = tf.square(features['x'])\n", 655 | " features[\"y_2\"] = tf.square(features['y'])\n", 656 | " features[\"xy\"] = tf.multiply(features['x'], features['y'])\n", 657 | " features['dist_xy'] = tf.sqrt(tf.squared_difference(features['x'],features['y']))\n", 658 | " \n", 659 | " return features\n", 660 | "\n", 661 | "def csv_serving_input_fn():\n", 662 | " \n", 663 | " SERVING_HEADER = ['x','y','alpha','beta']\n", 664 | " SERVING_HEADER_DEFAULTS = [[0.0], [0.0], ['NA'], ['NA']]\n", 665 | "\n", 666 | " rows_string_tensor = tf.placeholder(dtype=tf.string,\n", 667 | " shape=[None],\n", 668 | " name='csv_rows')\n", 669 | " \n", 670 | " receiver_tensor = {'csv_rows': rows_string_tensor}\n", 671 | "\n", 672 | " row_columns = tf.expand_dims(rows_string_tensor, -1)\n", 673 | " columns = tf.decode_csv(row_columns, record_defaults=SERVING_HEADER_DEFAULTS)\n", 674 | " features = dict(zip(SERVING_HEADER, columns))\n", 675 | " \n", 676 | " if PROCESS_FEATURES:\n", 677 | " features = process_features(features)\n", 678 | "\n", 679 | " return tf.estimator.export.ServingInputReceiver(\n", 680 | " features, receiver_tensor)" 681 | ] 682 | },
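{ "cell_type": "markdown", "metadata": {}, "source": [ "A JSON-style serving function is a possible alternative to the CSV one above; this is a minimal sketch assuming the same four input features (notebook 04.0 covers JSON serving in full):\n", "```python\n", "def json_serving_input_fn():\n", "    # each input feature arrives as its own tensor instead of one CSV row string\n", "    receiver_tensor = {\n", "        'x': tf.placeholder(tf.float32, [None]),\n", "        'y': tf.placeholder(tf.float32, [None]),\n", "        'alpha': tf.placeholder(tf.string, [None]),\n", "        'beta': tf.placeholder(tf.string, [None])\n", "    }\n", "    features = process_features(dict(receiver_tensor))\n", "    return tf.estimator.export.ServingInputReceiver(features, receiver_tensor)\n", "```" ] },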
683 | { 684 | "cell_type": "markdown", 685 | "metadata": {}, 686 | "source": [ 687 | "### b. Export SavedModel" 688 | ] 689 | }, 690 | { 691 | "cell_type": "code", 692 | "execution_count": 31, 693 | "metadata": {}, 694 | "outputs": [ 695 | { 696 | "name": "stdout", 697 | "output_type": "stream", 698 | "text": [ 699 | "INFO:tensorflow:Restoring parameters from trained_models/reg-model-01/model.ckpt-2400\n", 700 | "INFO:tensorflow:Assets added to graph.\n", 701 | "INFO:tensorflow:No assets to write.\n", 702 | "INFO:tensorflow:SavedModel written to: b\"trained_models/reg-model-01/export/temp-b'1510688109'/saved_model.pbtxt\"\n" 703 | ] 704 | }, 705 | { 706 | "data": { 707 | "text/plain": [ 708 | "b'trained_models/reg-model-01/export/1510688109'" 709 | ] 710 | }, 711 | "execution_count": 31, 712 | "metadata": {}, 713 | "output_type": "execute_result" 714 | } 715 | ], 716 | "source": [ 717 | "export_dir = model_dir + \"/export\"\n", 718 | "\n", 719 | "estimator.export_savedmodel(\n", 720 | " export_dir_base = export_dir,\n", 721 | " serving_input_receiver_fn = csv_serving_input_fn,\n", 722 | " as_text=True\n", 723 | ")\n" 724 | ] 725 | }, 726 | { 727 | "cell_type": "markdown", 728 | "metadata": {}, 729 | "source": [ 730 | "### c. Serve the Saved Model" 731 | ] 732 | }, 733 | { 734 | "cell_type": "code", 735 | "execution_count": 35, 736 | "metadata": {}, 737 | "outputs": [ 738 | { 739 | "name": "stdout", 740 | "output_type": "stream", 741 | "text": [ 742 | "trained_models/reg-model-01/export/1510688109\n", 743 | "INFO:tensorflow:Restoring parameters from b'trained_models/reg-model-01/export/1510688109/variables/variables'\n", 744 | "{'predictions': array([[ 13.15929985],\n", 745 | " [-13.96904373]], dtype=float32)}\n" 746 | ] 747 | } 748 | ], 749 | "source": [ 750 | "import os\n", 751 | "\n", 752 | "saved_model_dir = export_dir + \"/\" + os.listdir(path=export_dir)[-1] \n", 753 | "\n", 754 | "print(saved_model_dir)\n", 755 | "\n", 756 | "predictor_fn = tf.contrib.predictor.from_saved_model(\n", 757 | " export_dir = saved_model_dir,\n", 758 | " signature_def_key=\"predict\"\n", 759 | ")\n", 760 | "\n", 761 | "output = predictor_fn({'csv_rows': [\"0.5,1,ax01,bx02\", \"-0.5,-1,ax02,bx02\"]})\n", 762 | "print(output)" 763 | ] 764 | }, 765 | { 766 | "cell_type": "markdown", 767 | "metadata": {}, 768 | "source": [ 769 | "## What can we improve?\n", 770 | "\n", 771 | "* **Use data files instead of DataFrames** - pandas dataframes need to fit in memory and are hard to distribute. Working with (sharded) training data files allows reading records in batches (so we can work with large datasets regardless of memory size) and supports distributed training (data parallelism).\n", 772 | "\n", 773 | "\n", 774 | "* **Use Experiment APIs** - The Experiment API knows how to invoke training and eval loops in a sensible fashion for local & distributed training.\n", 775 | "\n", 776 | "\n", 777 | "* **Early Stopping** - Use the validation set evaluation to stop the training and avoid overfitting.\n" 778 | ] 779 | }, 780 | { 781 | "cell_type": "code", 782 | "execution_count": null, 783 | "metadata": { 784 | "collapsed": true 785 | }, 786 | "outputs": [], 787 | "source": [] 788 | } 789 | ], 790 | "metadata": { 791 | "kernelspec": { 792 | "display_name": "Python 3", 793 | "language": "python", 794 | "name": "python3" 795 | }, 796 | "language_info": { 797 | "codemirror_mode": { 798 | "name": "ipython", 799 | "version": 3 800 | }, 801 | "file_extension": ".py", 802 | "mimetype": "text/x-python", 803 | "name": "python", 804 | "nbconvert_exporter": "python", 805 | "pygments_lexer": "ipython3", 806 | "version": "3.6.1" 807 | } 808 | }, 809 | "nbformat": 4, 810 | "nbformat_minor": 2 811 | } 812 | -------------------------------------------------------------------------------- /01 - Regression/06.0 - Convert CSV to TFRecords.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 13 | " return f(*args, **kwds)\n" 14 | ] 15 | }, 16 | { 17 | "name": "stdout", 18 | "output_type": "stream", 19 | "text": [ 20 | "1.4.0\n" 21 | ] 22 | } 23 | ], 24 | "source": [ 25 | "import tensorflow as tf\n", 26 | "import csv\n", 27 | "import os\n", 28 | "\n", 29 | "print(tf.__version__)" 30 | ] 31 | },
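{ "cell_type": "markdown", "metadata": {}, "source": [ "As a minimal sketch of how the records written by this notebook can be parsed back (serialized_example is a hypothetical variable standing for one record read from the files; the spec mirrors the schema defined below):\n", "```python\n", "feature_spec = {\n", "    'x': tf.FixedLenFeature([], tf.float32),\n", "    'y': tf.FixedLenFeature([], tf.float32),\n", "    'alpha': tf.FixedLenFeature([], tf.string),\n", "    'beta': tf.FixedLenFeature([], tf.string),\n", "    'target': tf.FixedLenFeature([], tf.float32)\n", "}\n", "parsed_features = tf.parse_single_example(serialized_example, feature_spec)\n", "```" ] },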
"source": [ 40 | "train_data_files = ['data/train-data.csv']\n", 41 | "valid_data_files = ['data/valid-data.csv']\n", 42 | "test_data_files = ['data/test-data.csv']" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "name": "stdout", 52 | "output_type": "stream", 53 | "text": [ 54 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n", 55 | "Numeric Features: ['x', 'y']\n", 56 | "Categorical Features: ['alpha', 'beta']\n", 57 | "Target: target\n", 58 | "Unused Features: ['key']\n" 59 | ] 60 | } 61 | ], 62 | "source": [ 63 | "HEADER = ['key','x','y','alpha','beta','target']\n", 64 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], [0.0]]\n", 65 | "\n", 66 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n", 67 | "\n", 68 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n", 69 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n", 70 | "\n", 71 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n", 72 | "\n", 73 | "TARGET_NAME = 'target'\n", 74 | "\n", 75 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n", 76 | "\n", 77 | "print(\"Header: {}\".format(HEADER))\n", 78 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n", 79 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n", 80 | "print(\"Target: {}\".format(TARGET_NAME))\n", 81 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))" 82 | ] 83 | }, 84 | { 85 | "cell_type": "code", 86 | "execution_count": 4, 87 | "metadata": { 88 | "collapsed": true 89 | }, 90 | "outputs": [], 91 | "source": [ 92 | "def create_csv_iterator(csv_file_path, skip_header):\n", 93 | " \n", 94 | " with tf.gfile.Open(csv_file_path) as csv_file:\n", 95 | " reader = csv.reader(csv_file)\n", 96 | " if skip_header: # Skip the header\n", 97 | " next(reader)\n", 98 | " for row in reader:\n", 99 | " yield row" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 5, 105 | "metadata": { 106 | "collapsed": true 107 | }, 108 | "outputs": [], 109 | "source": [ 110 | "def create_example(row):\n", 111 | " \"\"\"\n", 112 | " Returns a tensorflow.Example Protocol Buffer object.\n", 113 | " \"\"\"\n", 114 | " example = tf.train.Example()\n", 115 | "\n", 116 | " for i in range(len(HEADER)):\n", 117 | " \n", 118 | " feature_name = HEADER[i]\n", 119 | " feature_value = row[i]\n", 120 | " \n", 121 | " if feature_name in UNUSED_FEATURE_NAMES:\n", 122 | " continue\n", 123 | " \n", 124 | " if feature_name in NUMERIC_FEATURE_NAMES:\n", 125 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n", 126 | " \n", 127 | " elif feature_name in CATEGORICAL_FEATURE_NAMES:\n", 128 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n", 129 | " \n", 130 | "\n", 131 | " elif feature_name in TARGET_NAME:\n", 132 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n", 133 | "\n", 134 | " return example" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 6, 140 | "metadata": { 141 | "collapsed": true 142 | }, 143 | "outputs": [], 144 | "source": [ 145 | "def create_tfrecords_file(input_csv_file):\n", 146 | " \"\"\"\n", 147 | " Creates a TFRecords file for the given input data and\n", 148 | " example transofmration function\n", 149 | " \"\"\"\n", 150 | " output_tfrecord_file = 
input_csv_file.replace(\"csv\",\"tfrecords\")\n", 151 | " writer = tf.python_io.TFRecordWriter(output_tfrecord_file)\n", 152 | " \n", 153 | " print(\"Creating TFRecords file at\", output_tfrecord_file, \"...\")\n", 154 | " \n", 155 | " for i, row in enumerate(create_csv_iterator(input_csv_file, skip_header=False)):\n", 156 | " \n", 157 | " if len(row) == 0:\n", 158 | " continue\n", 159 | " \n", 160 | " example = create_example(row)\n", 161 | " content = example.SerializeToString()\n", 162 | " writer.write(content)\n", 163 | " \n", 164 | " writer.close()\n", 165 | " \n", 166 | " print(\"Finish Writing\", output_tfrecord_file)" 167 | ] 168 | }, 169 | { 170 | "cell_type": "code", 171 | "execution_count": 7, 172 | "metadata": {}, 173 | "outputs": [ 174 | { 175 | "name": "stdout", 176 | "output_type": "stream", 177 | "text": [ 178 | "Converting Training Data Files\n", 179 | "Creating TFRecords file at data/train-data.tfrecords ...\n", 180 | "Finish Writing data/train-data.tfrecords\n", 181 | "\n", 182 | "Converting Validation Data Files\n", 183 | "Creating TFRecords file at data/valid-data.tfrecords ...\n", 184 | "Finish Writing data/valid-data.tfrecords\n", 185 | "\n", 186 | "Converting Test Data Files\n", 187 | "Creating TFRecords file at data/test-data.tfrecords ...\n", 188 | "Finish Writing data/test-data.tfrecords\n" 189 | ] 190 | } 191 | ], 192 | "source": [ 193 | "print(\"Converting Training Data Files\")\n", 194 | "for input_csv_file in train_data_files:\n", 195 | " create_tfrecords_file(input_csv_file)\n", 196 | "print(\"\")\n", 197 | "\n", 198 | "print(\"Converting Validation Data Files\")\n", 199 | "for input_csv_file in valid_data_files:\n", 200 | " create_tfrecords_file(input_csv_file)\n", 201 | "print(\"\")\n", 202 | "\n", 203 | "print(\"Converting Test Data Files\")\n", 204 | "for input_csv_file in test_data_files:\n", 205 | " create_tfrecords_file(input_csv_file)" 206 | ] 207 | }, 208 | { 209 | "cell_type": "code", 210 | "execution_count": null, 211 | "metadata": { 212 | "collapsed": true 213 | }, 214 | "outputs": [], 215 | "source": [] 216 | } 217 | ], 218 | "metadata": { 219 | "kernelspec": { 220 | "display_name": "Python 3", 221 | "language": "python", 222 | "name": "python3" 223 | }, 224 | "language_info": { 225 | "codemirror_mode": { 226 | "name": "ipython", 227 | "version": 3 228 | }, 229 | "file_extension": ".py", 230 | "mimetype": "text/x-python", 231 | "name": "python", 232 | "nbconvert_exporter": "python", 233 | "pygments_lexer": "ipython3", 234 | "version": "3.6.1" 235 | } 236 | }, 237 | "nbformat": 4, 238 | "nbformat_minor": 2 239 | } 240 | -------------------------------------------------------------------------------- /01 - Regression/data/new-data.csv: -------------------------------------------------------------------------------- 1 | 1.3,-0.5,ax01,bx02 -------------------------------------------------------------------------------- /01 - Regression/data/new-data.json: -------------------------------------------------------------------------------- 1 | {"x": 1.3, "y": -0.5, "alpha": "ax01", "beta": "bx02"} -------------------------------------------------------------------------------- /01 - Regression/data/test-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/test-data.tfrecords -------------------------------------------------------------------------------- /01 - 
Regression/data/train-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/train-data.tfrecords -------------------------------------------------------------------------------- /01 - Regression/data/valid-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/01 - Regression/data/valid-data.tfrecords -------------------------------------------------------------------------------- /02 - Classification/-- TensorBoard.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "trained_models/class-model-01\n" 13 | ] 14 | } 15 | ], 16 | "source": [ 17 | "MODEL_NAME = 'class-model-01'\n", 18 | "model_dir = 'trained_models/{}'.format(MODEL_NAME)\n", 19 | "print(model_dir)" 20 | ] 21 | }, 22 | { 23 | "cell_type": "markdown", 24 | "metadata": {}, 25 | "source": [ 26 | "## Start TensorBoard Process" 27 | ] 28 | }, 29 | { 30 | "cell_type": "code", 31 | "execution_count": null, 32 | "metadata": { 33 | "collapsed": true 34 | }, 35 | "outputs": [], 36 | "source": [ 37 | "from google.datalab.ml import TensorBoard\n", 38 | "TensorBoard().start(model_dir)\n", 39 | "TensorBoard().list()" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "## Kill TensorBoard Process" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": { 53 | "collapsed": true 54 | }, 55 | "outputs": [], 56 | "source": [ 57 | "# to stop TensorBoard\n", 58 | "TensorBoard().stop(23002)\n", 59 | "print('stopped TensorBoard')\n", 60 | "TensorBoard().list()" 61 | ] 62 | } 63 | ], 64 | "metadata": { 65 | "kernelspec": { 66 | "display_name": "Python 3", 67 | "language": "python", 68 | "name": "python3" 69 | }, 70 | "language_info": { 71 | "codemirror_mode": { 72 | "name": "ipython", 73 | "version": 3 74 | }, 75 | "file_extension": ".py", 76 | "mimetype": "text/x-python", 77 | "name": "python", 78 | "nbconvert_exporter": "python", 79 | "pygments_lexer": "ipython3", 80 | "version": "3.6.1" 81 | } 82 | }, 83 | "nbformat": 4, 84 | "nbformat_minor": 2 85 | } 86 | -------------------------------------------------------------------------------- /02 - Classification/00.0 - TensorFlow Version Update.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Collecting tensorflow\n", 13 | " Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n", 14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n", 15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n", 16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 18 | "Requirement already 
up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 19 | "Collecting enum34>=1.1.6 (from tensorflow)\n", 20 | " Downloading enum34-1.1.6-py3-none-any.whl\n", 21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n", 27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n", 28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n", 29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n", 30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n", 31 | " Found existing installation: tensorflow 1.3.0\n", 32 | " Uninstalling tensorflow-1.3.0:\n", 33 | " Successfully uninstalled tensorflow-1.3.0\n", 34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "%%bash\n", 40 | "\n", 41 | "pip install -U tensorflow" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "name": "stderr", 51 | "output_type": "stream", 52 | "text": [ 53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 54 | " return f(*args, **kwds)\n" 55 | ] 56 | }, 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "1.4.0\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "import tensorflow as tf\n", 67 | "print(tf.__version__)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 77 | "source": [] 78 | } 79 | ], 80 | "metadata": { 81 | "kernelspec": { 82 | "display_name": "Python 3", 83 | "language": "python", 84 | "name": "python3" 85 | }, 86 | "language_info": { 87 | "codemirror_mode": { 88 | "name": "ipython", 89 | "version": 3 90 | }, 91 | "file_extension": ".py", 92 | "mimetype": "text/x-python", 93 | "name": "python", 94 | "nbconvert_exporter": "python", 95 | "pygments_lexer": "ipython3", 96 | "version": "3.6.1" 97 | } 98 | }, 99 | "nbformat": 4, 100 | "nbformat_minor": 2 101 | } 102 | -------------------------------------------------------------------------------- /02 - Classification/02.0 - Convert CSV to TFRecords.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stderr", 10 | "output_type": "stream", 11 | "text": [ 12 | 
"/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 13 | " return f(*args, **kwds)\n" 14 | ] 15 | }, 16 | { 17 | "name": "stdout", 18 | "output_type": "stream", 19 | "text": [ 20 | "1.4.0\n" 21 | ] 22 | } 23 | ], 24 | "source": [ 25 | "import tensorflow as tf\n", 26 | "import csv\n", 27 | "import os\n", 28 | "\n", 29 | "print(tf.__version__)" 30 | ] 31 | }, 32 | { 33 | "cell_type": "code", 34 | "execution_count": 2, 35 | "metadata": { 36 | "collapsed": true 37 | }, 38 | "outputs": [], 39 | "source": [ 40 | "train_data_files = ['data/train-data.csv']\n", 41 | "valid_data_files = ['data/valid-data.csv']\n", 42 | "test_data_files = ['data/test-data.csv']" 43 | ] 44 | }, 45 | { 46 | "cell_type": "code", 47 | "execution_count": 3, 48 | "metadata": {}, 49 | "outputs": [ 50 | { 51 | "name": "stdout", 52 | "output_type": "stream", 53 | "text": [ 54 | "Header: ['key', 'x', 'y', 'alpha', 'beta', 'target']\n", 55 | "Numeric Features: ['x', 'y']\n", 56 | "Categorical Features: ['alpha', 'beta']\n", 57 | "Target: target - labels: ['postive', 'negative']\n", 58 | "Unused Features: ['key']\n" 59 | ] 60 | } 61 | ], 62 | "source": [ 63 | "HEADER = ['key','x','y','alpha','beta','target']\n", 64 | "HEADER_DEFAULTS = [[0], [0.0], [0.0], ['NA'], ['NA'], ['NA']]\n", 65 | "\n", 66 | "NUMERIC_FEATURE_NAMES = ['x', 'y'] \n", 67 | "\n", 68 | "CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY = {'alpha':['ax01', 'ax02'], 'beta':['bx01', 'bx02']}\n", 69 | "CATEGORICAL_FEATURE_NAMES = list(CATEGORICAL_FEATURE_NAMES_WITH_VOCABULARY.keys())\n", 70 | "\n", 71 | "FEATURE_NAMES = NUMERIC_FEATURE_NAMES + CATEGORICAL_FEATURE_NAMES\n", 72 | "\n", 73 | "TARGET_NAME = 'target'\n", 74 | "\n", 75 | "TARGET_LABELS = ['postive', 'negative']\n", 76 | "\n", 77 | "UNUSED_FEATURE_NAMES = list(set(HEADER) - set(FEATURE_NAMES) - {TARGET_NAME})\n", 78 | "\n", 79 | "print(\"Header: {}\".format(HEADER))\n", 80 | "print(\"Numeric Features: {}\".format(NUMERIC_FEATURE_NAMES))\n", 81 | "print(\"Categorical Features: {}\".format(CATEGORICAL_FEATURE_NAMES))\n", 82 | "print(\"Target: {} - labels: {}\".format(TARGET_NAME, TARGET_LABELS))\n", 83 | "print(\"Unused Features: {}\".format(UNUSED_FEATURE_NAMES))" 84 | ] 85 | }, 86 | { 87 | "cell_type": "code", 88 | "execution_count": 4, 89 | "metadata": { 90 | "collapsed": true 91 | }, 92 | "outputs": [], 93 | "source": [ 94 | "def create_csv_iterator(csv_file_path, skip_header):\n", 95 | " \n", 96 | " with tf.gfile.Open(csv_file_path) as csv_file:\n", 97 | " reader = csv.reader(csv_file)\n", 98 | " if skip_header: # Skip the header\n", 99 | " next(reader)\n", 100 | " for row in reader:\n", 101 | " yield row" 102 | ] 103 | }, 104 | { 105 | "cell_type": "code", 106 | "execution_count": 5, 107 | "metadata": { 108 | "collapsed": true 109 | }, 110 | "outputs": [], 111 | "source": [ 112 | "def create_example(row):\n", 113 | " \"\"\"\n", 114 | " Returns a tensorflow.Example Protocol Buffer object.\n", 115 | " \"\"\"\n", 116 | " example = tf.train.Example()\n", 117 | "\n", 118 | " for i in range(len(HEADER)):\n", 119 | " \n", 120 | " feature_name = HEADER[i]\n", 121 | " feature_value = row[i]\n", 122 | " \n", 123 | " if feature_name in UNUSED_FEATURE_NAMES:\n", 124 | " continue\n", 125 | " \n", 126 | " if feature_name in NUMERIC_FEATURE_NAMES:\n", 127 | " example.features.feature[feature_name].float_list.value.extend([float(feature_value)])\n", 128 | " \n", 129 | 
" elif feature_name in CATEGORICAL_FEATURE_NAMES:\n", 130 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n", 131 | "\n", 132 | " elif feature_name in TARGET_NAME:\n", 133 | " example.features.feature[feature_name].bytes_list.value.extend([bytes(feature_value, 'utf-8')])\n", 134 | "\n", 135 | " return example" 136 | ] 137 | }, 138 | { 139 | "cell_type": "code", 140 | "execution_count": 6, 141 | "metadata": { 142 | "collapsed": true 143 | }, 144 | "outputs": [], 145 | "source": [ 146 | "def create_tfrecords_file(input_csv_file):\n", 147 | " \"\"\"\n", 148 | " Creates a TFRecords file for the given input data and\n", 149 | " example transofmration function\n", 150 | " \"\"\"\n", 151 | " output_tfrecord_file = input_csv_file.replace(\"csv\",\"tfrecords\")\n", 152 | " writer = tf.python_io.TFRecordWriter(output_tfrecord_file)\n", 153 | " \n", 154 | " print(\"Creating TFRecords file at\", output_tfrecord_file, \"...\")\n", 155 | " \n", 156 | " for i, row in enumerate(create_csv_iterator(input_csv_file, skip_header=False)):\n", 157 | " \n", 158 | " if len(row) == 0:\n", 159 | " continue\n", 160 | " \n", 161 | " example = create_example(row)\n", 162 | " content = example.SerializeToString()\n", 163 | " writer.write(content)\n", 164 | " \n", 165 | " writer.close()\n", 166 | " \n", 167 | " print(\"Finish Writing\", output_tfrecord_file)" 168 | ] 169 | }, 170 | { 171 | "cell_type": "code", 172 | "execution_count": 7, 173 | "metadata": {}, 174 | "outputs": [ 175 | { 176 | "name": "stdout", 177 | "output_type": "stream", 178 | "text": [ 179 | "Converting Training Data Files\n", 180 | "Creating TFRecords file at data/train-data.tfrecords ...\n", 181 | "Finish Writing data/train-data.tfrecords\n", 182 | "\n", 183 | "Converting Validation Data Files\n", 184 | "Creating TFRecords file at data/valid-data.tfrecords ...\n", 185 | "Finish Writing data/valid-data.tfrecords\n", 186 | "\n", 187 | "Converting Test Data Files\n", 188 | "Creating TFRecords file at data/test-data.tfrecords ...\n", 189 | "Finish Writing data/test-data.tfrecords\n" 190 | ] 191 | } 192 | ], 193 | "source": [ 194 | "print(\"Converting Training Data Files\")\n", 195 | "for input_csv_file in train_data_files:\n", 196 | " create_tfrecords_file(input_csv_file)\n", 197 | "print(\"\")\n", 198 | "\n", 199 | "print(\"Converting Validation Data Files\")\n", 200 | "for input_csv_file in valid_data_files:\n", 201 | " create_tfrecords_file(input_csv_file)\n", 202 | "print(\"\")\n", 203 | "\n", 204 | "print(\"Converting Test Data Files\")\n", 205 | "for input_csv_file in test_data_files:\n", 206 | " create_tfrecords_file(input_csv_file)" 207 | ] 208 | }, 209 | { 210 | "cell_type": "code", 211 | "execution_count": null, 212 | "metadata": { 213 | "collapsed": true 214 | }, 215 | "outputs": [], 216 | "source": [] 217 | } 218 | ], 219 | "metadata": { 220 | "kernelspec": { 221 | "display_name": "Python 3", 222 | "language": "python", 223 | "name": "python3" 224 | }, 225 | "language_info": { 226 | "codemirror_mode": { 227 | "name": "ipython", 228 | "version": 3 229 | }, 230 | "file_extension": ".py", 231 | "mimetype": "text/x-python", 232 | "name": "python", 233 | "nbconvert_exporter": "python", 234 | "pygments_lexer": "ipython3", 235 | "version": "3.6.1" 236 | } 237 | }, 238 | "nbformat": 4, 239 | "nbformat_minor": 2 240 | } 241 | -------------------------------------------------------------------------------- /02 - Classification/data/adult.stats.csv: 
-------------------------------------------------------------------------------- 1 | ,max,mean,min,stdv 2 | age,90,38.58164675532078,17,13.640432553581146 3 | fnlwgt,1484705,189778.36651208502,12285,105549.97769702235 4 | education_num,16,10.0806793403151,1,2.5727203320673406 5 | capital_gain,99999,1077.6488437087312,0,7385.292084839299 6 | capital_loss,4356,87.303829734959,0,402.96021864905896 7 | hours_per_week,99,40.437455852092995,1,12.347428681730811 8 | -------------------------------------------------------------------------------- /02 - Classification/data/test-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/test-data.tfrecords -------------------------------------------------------------------------------- /02 - Classification/data/train-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/train-data.tfrecords -------------------------------------------------------------------------------- /02 - Classification/data/valid-data.tfrecords: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/02 - Classification/data/valid-data.tfrecords -------------------------------------------------------------------------------- /03 - Clustering/00.0 - TensorFlow Version Update.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [ 8 | { 9 | "name": "stdout", 10 | "output_type": "stream", 11 | "text": [ 12 | "Collecting tensorflow\n", 13 | " Downloading tensorflow-1.4.0-cp36-cp36m-macosx_10_11_x86_64.whl (39.3MB)\n", 14 | "Collecting tensorflow-tensorboard<0.5.0,>=0.4.0rc1 (from tensorflow)\n", 15 | " Downloading tensorflow_tensorboard-0.4.0rc2-py3-none-any.whl (1.7MB)\n", 16 | "Requirement already up-to-date: protobuf>=3.3.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 17 | "Requirement already up-to-date: numpy>=1.12.1 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 18 | "Requirement already up-to-date: wheel>=0.26 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 19 | "Collecting enum34>=1.1.6 (from tensorflow)\n", 20 | " Downloading enum34-1.1.6-py3-none-any.whl\n", 21 | "Requirement already up-to-date: six>=1.10.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow)\n", 22 | "Requirement already up-to-date: werkzeug>=0.11.10 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 23 | "Requirement already up-to-date: html5lib==0.9999999 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 24 | "Requirement already up-to-date: markdown>=2.6.8 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 25 | "Requirement already up-to-date: bleach==1.5.0 in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from 
tensorflow-tensorboard<0.5.0,>=0.4.0rc1->tensorflow)\n", 26 | "Requirement already up-to-date: setuptools in /Users/khalidsalama/anaconda/lib/python3.6/site-packages (from protobuf>=3.3.0->tensorflow)\n", 27 | "Installing collected packages: tensorflow-tensorboard, enum34, tensorflow\n", 28 | " Found existing installation: tensorflow-tensorboard 0.1.8\n", 29 | " Uninstalling tensorflow-tensorboard-0.1.8:\n", 30 | " Successfully uninstalled tensorflow-tensorboard-0.1.8\n", 31 | " Found existing installation: tensorflow 1.3.0\n", 32 | " Uninstalling tensorflow-1.3.0:\n", 33 | " Successfully uninstalled tensorflow-1.3.0\n", 34 | "Successfully installed enum34-1.1.6 tensorflow-1.4.0 tensorflow-tensorboard-0.4.0rc2\n" 35 | ] 36 | } 37 | ], 38 | "source": [ 39 | "%%bash\n", 40 | "\n", 41 | "pip install -U tensorflow" 42 | ] 43 | }, 44 | { 45 | "cell_type": "code", 46 | "execution_count": 2, 47 | "metadata": {}, 48 | "outputs": [ 49 | { 50 | "name": "stderr", 51 | "output_type": "stream", 52 | "text": [ 53 | "/Users/khalidsalama/anaconda/lib/python3.6/importlib/_bootstrap.py:205: RuntimeWarning: compiletime version 3.5 of module 'tensorflow.python.framework.fast_tensor_util' does not match runtime version 3.6\n", 54 | " return f(*args, **kwds)\n" 55 | ] 56 | }, 57 | { 58 | "name": "stdout", 59 | "output_type": "stream", 60 | "text": [ 61 | "1.4.0\n" 62 | ] 63 | } 64 | ], 65 | "source": [ 66 | "import tensorflow as tf\n", 67 | "print(tf.__version__)" 68 | ] 69 | }, 70 | { 71 | "cell_type": "code", 72 | "execution_count": null, 73 | "metadata": { 74 | "collapsed": true 75 | }, 76 | "outputs": [], 77 | "source": [] 78 | } 79 | ], 80 | "metadata": { 81 | "kernelspec": { 82 | "display_name": "Python 3", 83 | "language": "python", 84 | "name": "python3" 85 | }, 86 | "language_info": { 87 | "codemirror_mode": { 88 | "name": "ipython", 89 | "version": 3 90 | }, 91 | "file_extension": ".py", 92 | "mimetype": "text/x-python", 93 | "name": "python", 94 | "nbconvert_exporter": "python", 95 | "pygments_lexer": "ipython3", 96 | "version": "3.6.1" 97 | } 98 | }, 99 | "nbformat": 4, 100 | "nbformat_minor": 2 101 | } 102 | -------------------------------------------------------------------------------- /03 - Clustering/data/new-data.csv: -------------------------------------------------------------------------------- 1 | 0.5,-0.3,7 -------------------------------------------------------------------------------- /04 - Times Series/data/test-data.csv: -------------------------------------------------------------------------------- 1 | time_index,value 2 | 700,4.062140225471643 3 | 701,3.1703847192297845 4 | 702,2.8296873454315192 5 | 703,3.6961238939396597 6 | 704,3.774688382600603 7 | 705,3.8527851934977764 8 | 706,3.0342060812112686 9 | 707,3.3306549645378025 10 | 708,3.6487243344222495 11 | 709,2.1335273551814233 12 | 710,3.683338770039675 13 | 711,3.4407180005651257 14 | 712,3.1847285264243808 15 | 713,2.8896834732269645 16 | 714,3.443554099017879 17 | 715,3.1628527640306943 18 | 716,3.477295109150082 19 | 717,3.36526258592279 20 | 718,3.122152506677769 21 | 719,3.348075456125274 22 | 720,2.4504905110212265 23 | 721,3.00950452903947 24 | 722,3.205798117619332 25 | 723,2.5846071060841815 26 | 724,2.3892445742499167 27 | 725,3.3945942547686103 28 | 726,2.561123153365352 29 | 727,2.035257638932893 30 | 728,2.9801074278502275 31 | 729,2.9562399361791156 32 | 730,2.1654708168278356 33 | 731,3.4468449142981705 34 | 732,2.6893807928426563 35 | 733,3.025794994419157 36 | 734,2.542869596532311 37 | 
735,2.9275771470778706 38 | 736,2.8204505932091055 39 | 737,3.527816758474815 40 | 738,2.197418634802284 41 | 739,2.554280646235888 42 | 740,2.4324240602338847 43 | 741,3.1271375891212405 44 | 742,2.2850209514914006 45 | 743,2.0776756899911613 46 | 744,2.529000935802995 47 | 745,3.297087742223073 48 | 746,2.2394742253878963 49 | 747,3.1367437479006797 50 | 748,2.3953147600203675 51 | 749,2.8848458913301296 52 | 750,2.9185911092297903 53 | 751,2.768126620869814 54 | 752,2.43488473055407 55 | 753,2.8870032425325123 56 | 754,3.317655820661928 57 | 755,2.1790416388446836 58 | 756,2.7702407610447577 59 | 757,2.554226484730687 60 | 758,2.8134188158141438 61 | 759,2.758781861045474 62 | 760,2.272104718154779 63 | 761,2.8103970647324372 64 | 762,2.8972594904941387 65 | 763,3.4002482934772478 66 | 764,3.3455711599757834 67 | 765,2.715918824573258 68 | 766,3.7061718277620113 69 | 767,3.0195081640399204 70 | 768,3.4891004444538325 71 | 769,2.9311254106642193 72 | 770,2.3379837346623598 73 | 771,2.5146941193432775 74 | 772,3.534476172205595 75 | 773,3.0003799070150836 76 | 774,2.8915136087990785 77 | 775,2.393552222327803 78 | 776,3.011905423311392 79 | 777,3.8801787347996632 80 | 778,3.2515228547754886 81 | 779,2.789501465000945 82 | 780,3.426429385272551 83 | 781,3.418712155395797 84 | 782,3.983713621739207 85 | 783,3.6345931864075576 86 | 784,3.028427715036325 87 | 785,3.6675103028495144 88 | 786,4.199142625113263 89 | 787,3.0825750004211327 90 | 788,3.3340944569219486 91 | 789,3.7900930100567076 92 | 790,3.9891451701449654 93 | 791,4.437402936216056 94 | 792,3.483434383801479 95 | 793,4.856432283156268 96 | 794,3.112032068064219 97 | 795,3.764822361311284 98 | 796,4.778499314027573 99 | 797,3.33724185989896 100 | 798,3.8058737331849453 101 | 799,3.9223811712262653 102 | 800,4.80546589113736 103 | 801,4.421552582453292 104 | 802,3.606081714628961 105 | 803,3.9941737176325596 106 | 804,4.662649705612334 107 | 805,4.018590914241019 108 | 806,3.5680466115701646 109 | 807,5.103635450598651 110 | 808,4.553832764926619 111 | 809,4.480087371204185 112 | 810,4.462603498918542 113 | 811,4.2137200426188075 114 | 812,4.189374217427936 115 | 813,4.044349362105051 116 | 814,3.3654308023514417 117 | 815,4.551988909577272 118 | 816,5.281251897092956 119 | 817,4.919655962013503 120 | 818,4.268853670537956 121 | 819,5.326461607549719 122 | 820,4.423531000313117 123 | 821,4.203178570982242 124 | 822,4.120263855677827 125 | 823,3.776759734748973 126 | 824,4.5429757684426 127 | 825,5.351165193685153 128 | 826,4.3428152492354775 129 | 827,5.394929077351952 130 | 828,5.218609727629257 131 | 829,4.9831655977115625 132 | 830,5.602842952189427 133 | 831,5.3664242391999775 134 | 832,5.14450210344502 135 | 833,5.014801223804789 136 | 834,5.404549894954248 137 | 835,4.611614806903722 138 | 836,5.91369549455372 139 | 837,5.575199712425203 140 | 838,4.551385680651797 141 | 839,5.696239295334581 142 | 840,5.673983860921718 143 | 841,5.131646815240395 144 | 842,4.7304053547507685 145 | 843,5.131704930574379 146 | 844,5.350692049974139 147 | 845,5.043122726463051 148 | 846,5.433980654640878 149 | 847,5.392811171818018 150 | 848,6.127902771533075 151 | 849,4.948801899758867 152 | 850,5.672670683819614 153 | 851,4.619344342691638 154 | 852,4.461927290385413 155 | 853,5.134271002568175 156 | 854,5.244183015774759 157 | 855,5.0454199444160315 158 | 856,5.63991663670262 159 | 857,5.444551179447414 160 | 858,5.358876256297602 161 | 859,6.300157516455733 162 | 860,5.521687291262919 163 | 861,6.482989918871226 164 | 862,4.452113139646457 
165 | 863,5.947519115228699 166 | 864,4.732843968683768 167 | 865,4.663305658213866 168 | 866,5.060828778618426 169 | 867,5.630137501067726 170 | 868,4.837622754661279 171 | 869,4.589984321432029 172 | 870,5.149519770472633 173 | 871,4.926183108085338 174 | 872,5.529322212911065 175 | 873,4.757430665280789 176 | 874,5.39173836256956 177 | 875,5.23465202505217 178 | 876,4.714170978848213 179 | 877,4.662839356640053 180 | 878,4.60819971256791 181 | 879,4.882721694617192 182 | 880,5.390915345465747 183 | 881,3.8287811359231476 184 | 882,4.905994302868104 185 | 883,5.1710621658328515 186 | 884,4.391188353483403 187 | 885,4.748422379069466 188 | 886,5.83622319255817 189 | 887,5.085489278108183 190 | 888,5.085950301210101 191 | 889,5.267403853016739 192 | 890,5.494662308615086 193 | 891,5.113984813073912 194 | 892,5.1585692571022514 195 | 893,4.4319546043214695 196 | 894,4.387326346526101 197 | 895,4.756366569062057 198 | 896,4.413291810154036 199 | 897,5.013658966087423 200 | 898,4.549167180263243 201 | 899,5.172728541719882 202 | 900,3.886746174651208 203 | 901,4.588569016237771 204 | 902,4.929271797376781 205 | 903,4.599656125469891 206 | 904,4.808639274005556 207 | 905,4.325581040660617 208 | 906,4.194580144654176 209 | 907,3.9974315940262444 210 | 908,4.715515557271075 211 | 909,4.1909689237542285 212 | 910,4.074666135679668 213 | 911,4.901169926148987 214 | 912,3.9552622873015917 215 | 913,3.5796376754546113 216 | 914,4.711809225431517 217 | 915,3.6796683690417753 218 | 916,3.5442679947216176 219 | 917,3.1421330422267806 220 | 918,3.756923255399997 221 | 919,4.221580470129694 222 | 920,3.7462046848740105 223 | 921,4.185030793915441 224 | 922,3.396145019184779 225 | 923,4.5190483220341156 226 | 924,4.397364432124365 227 | 925,4.187464540915996 228 | 926,3.9272795891536907 229 | 927,3.6197921717482333 230 | 928,4.032448297113617 231 | 929,4.220999672128967 232 | 930,4.257691213195231 233 | 931,4.154362183090802 234 | 932,4.263289039195564 235 | 933,4.223630570761886 236 | 934,3.5084926009711603 237 | 935,4.062713546013925 238 | 936,4.273967524010026 239 | 937,3.9695138571146784 240 | 938,3.8583311980482913 241 | 939,2.9247548632596514 242 | 940,3.854652554626871 243 | 941,3.234429271148347 244 | 942,3.044781194988175 245 | 943,3.656513334659649 246 | 944,3.4818997204930073 247 | 945,2.875852392487933 248 | 946,3.961998294895795 249 | 947,4.105299492259828 250 | 948,4.216647670087034 251 | 949,3.6508133990630003 252 | 950,3.8246399910006907 253 | 951,3.9922875618756573 254 | 952,3.588030199403815 255 | 953,4.384184812214926 256 | 954,3.5120901674831724 257 | 955,3.418442907989169 258 | 956,3.2977455863735003 259 | 957,2.752204501626123 260 | 958,3.6410521282086634 261 | 959,3.473679507191042 262 | 960,3.921614723961769 263 | 961,3.8925441023134137 264 | 962,3.4275589055043403 265 | 963,3.2211072262973106 266 | 964,3.109856818870202 267 | 965,4.680680857691393 268 | 966,4.546573184643018 269 | 967,3.306769027768169 270 | 968,3.981130047116258 271 | 969,3.8552929675600307 272 | 970,3.7797822835350914 273 | 971,4.427976023630056 274 | 972,4.202232494544553 275 | 973,5.007984438145008 276 | 974,4.489380804569116 277 | 975,4.378691233249306 278 | 976,3.844107079000932 279 | 977,3.2052757587187792 280 | 978,3.966125426862569 281 | 979,3.871352865158788 282 | 980,3.8431485227426725 283 | 981,3.657077938996487 284 | 982,4.356575779819392 285 | 983,3.991957980024272 286 | 984,3.937984423830078 287 | 985,4.673590913114282 288 | 986,5.608245991898564 289 | 987,4.549470674312809 290 | 
988,4.706222218782882 291 | 989,4.006099739650296 292 | 990,4.735846319022681 293 | 991,4.395454732392768 294 | 992,4.807044589998613 295 | 993,4.40877616086605 296 | 994,3.623780385854121 297 | 995,5.332176415583953 298 | 996,4.798612805906225 299 | 997,4.91261012214747 300 | 998,5.1126834613174434 301 | 999,5.6472138130982374 302 | -------------------------------------------------------------------------------- /04 - Times Series/data/timeseries-multivariate.txt: -------------------------------------------------------------------------------- 1 | 0,0.926906299771,1.99107237682,2.56546245685,3.07914768197,4.04839057867 2 | 1,0.108010001864,1.41645361423,2.1686839775,2.94963962176,4.1263503303 3 | 2,-0.800567600028,1.0172132907,1.96434754116,2.99885333086,4.04300485864 4 | 3,0.0607042871898,0.719540073421,1.9765012584,2.89265588817,4.0951014426 5 | 4,0.933712200629,0.28052120776,1.41018552514,2.69232603996,4.06481164223 6 | 5,-0.171730652974,0.260054421028,1.48770816369,2.62199129293,4.44572807842 7 | 6,-1.00180162933,0.333045158863,1.50006392277,2.88888309683,4.24755865606 8 | 7,0.0580061875336,0.688929398826,1.56543458772,2.99840358953,4.52726873347 9 | 8,0.764139447412,1.24704875327,1.77649279698,3.13578593851,4.63238922951 10 | 9,-0.230331874785,1.47903998963,2.03547545751,3.20624030377,4.77980005228 11 | 10,-1.03846045211,2.01133000781,2.31977503972,3.67951536251,5.09716775897 12 | 11,0.188643592253,2.23285349038,2.68338482249,3.49817168611,5.24928239634 13 | 12,0.91207302309,2.24244446841,2.71362604985,3.96332587625,5.37802271594 14 | 13,-0.296588665881,2.02594634141,3.07733910479,3.99698324956,5.56365901394 15 | 14,-0.959961476551,1.45078629833,3.18996420137,4.3763059609,5.65356015609 16 | 15,0.46313530679,1.01141441548,3.4980215948,4.20224896882,5.88842247449 17 | 16,0.929354125798,0.626635305936,3.70508262244,4.51791573544,5.73945973251 18 | 17,-0.519110731957,0.269249223148,3.39866823332,4.46802003061,5.82768174382 19 | 18,-0.924330981367,0.349602834684,3.21762413294,4.72803587499,5.94918925767 20 | 19,0.253239387885,0.345158023497,3.11071425333,4.79311566935,5.9489259713 21 | 20,0.637408390225,0.698996675371,3.25232492145,4.73814732384,5.9612010251 22 | 21,-0.407396859412,1.17456342803,2.49526823723,4.59323415742,5.82501686811 23 | 22,-0.967485452118,1.66655933642,2.47284606244,4.58316034754,5.88721406681 24 | 23,0.474480867904,1.95018556323,2.0228950072,4.48651142819,5.8255943735 25 | 24,1.04309652155,2.23519892356,1.91924131572,4.19094661783,5.87457348436 26 | 25,-0.517861513772,2.12501967336,1.70266619979,4.05280882887,5.72160912899 27 | 26,-0.945301585146,1.65464653549,1.81567174251,3.92309850635,5.58270493814 28 | 27,0.501153868974,1.40600764889,1.53991387719,3.72853247942,5.60169001727 29 | 28,0.972859524418,1.00344321868,1.5175642828,3.64092376655,5.10567722582 30 | 29,-0.70553406135,0.465306263885,1.7038540803,3.33236870312,5.09182481555 31 | 30,-0.946093634916,0.294539309453,1.88052827037,2.93011492669,4.97354922696 32 | 31,0.47922123231,0.308465865031,2.03445883031,2.90772899045,4.86241793548 33 | 32,0.754030014252,0.549752241167,2.46115815089,2.95063349534,4.71834614627 34 | 33,-0.64875949826,0.894615488148,2.5922463381,2.81269864022,4.43480095104 35 | 34,-0.757829951086,1.39123914261,2.69258079904,2.61834837315,4.36580046156 36 | 35,0.565653301088,1.72360022693,2.97794913834,2.80403840334,4.27327248459 37 | 36,0.867440092372,2.21100730052,3.38648090792,2.84057515729,4.12210169576 38 | 37,-0.894567758095,2.17549105818,3.45532493329,2.90446025717,4.00251740584 39 | 
38,-0.715442356893,2.15105389965,3.52041791902,3.03650393392,4.12809249577 40 | 39,0.80671703672,1.81504564517,3.60463324866,3.00747789871,3.98440762467 41 | 40,0.527014790142,1.31803513865,3.43842186337,3.3332594663,4.03232406566 42 | 41,-0.795936862129,0.847809114454,3.09875133548,3.52863155938,3.94883924909 43 | 42,-0.610245806946,0.425530441018,2.92581949152,3.77238736123,4.27287245021 44 | 43,0.611662279431,0.178432049837,2.48128214822,3.73212087883,4.17319013831 45 | 44,0.650866553108,0.220341648392,2.41694642022,4.2609098519,4.27271645905 46 | 45,-0.774156982023,0.632667602331,2.05474356052,4.32889204886,4.18029723271 47 | 46,-0.714058448409,0.924562377599,1.75706135146,4.52492718422,4.3972678094 48 | 47,0.889627293379,1.46207968841,1.78299357672,4.64466731095,4.56317887554 49 | 48,0.520140662861,1.8996333843,1.41377633823,4.48899091177,4.78805049769 50 | 49,-1.03816935616,2.08997002059,1.51218375351,4.84167764204,4.93026048606 51 | 50,-0.40772951362,2.30878972136,1.44144415128,4.76854460997,5.01538444629 52 | 51,0.792730684781,1.91367048509,1.58887384677,4.71739397335,5.25690012199 53 | 52,0.371311881576,1.67565079528,1.81688563053,4.60353107555,5.44265822961 54 | 53,-0.814398070371,1.13374634126,1.80328814859,4.72264252878,5.52674761122 55 | 54,-0.469017949323,0.601244136627,2.29690896736,4.49859178859,5.54126153454 56 | 55,0.871044371426,0.407597593794,2.7499112487,4.19060637761,5.57693767301 57 | 56,0.523764933017,0.247705192709,3.09002071379,4.02095509006,5.80510362182 58 | 57,-0.881326403531,0.31513103164,3.11358205718,3.96079100808,5.81000652365 59 | 58,-0.357928025339,0.486163915865,3.17884556771,3.72634990659,5.85693642011 60 | 59,0.853038779822,1.04218094475,3.45835384454,3.36703969978,5.9585988449 61 | 60,0.435311516013,1.59715085283,3.63313338588,3.11276729421,5.93643818229 62 | 61,-1.02703719138,1.92205832542,3.47606111735,3.06247155999,6.02106646259 63 | 62,-0.246661325557,2.14653802542,3.29446326567,2.89936259181,5.67531541272 64 | 63,1.02554736569,2.25943737733,3.07031591528,2.78176218013,5.78206328989 65 | 64,0.337814475969,2.07589147224,2.80356226089,2.55888206331,5.7094075496 66 | 65,-1.12023369929,1.25333011618,2.56497288445,2.77361359194,5.50799418376 67 | 66,-0.178980246554,1.11937139901,2.51598681313,2.91438309151,5.47469577206 68 | 67,0.97550951531,0.60553823137,2.11657741073,2.88081098981,5.37034999502 69 | 68,0.136653357206,0.365828836075,1.97386033165,3.13217903204,5.07254490219 70 | 69,-1.05607596951,0.153152115069,1.52110743825,3.01308794192,5.08902539125 71 | 70,-0.13095280331,0.337113974483,1.52703079853,3.16687131599,4.86649398514 72 | 71,1.07081057754,0.714247566736,1.53761382634,3.45151989484,4.75892309166 73 | 72,0.0153410376082,1.24631231847,1.61690939161,3.85481994498,4.35683752832 74 | 73,-0.912801257303,1.60791309476,1.8729264524,4.03037260012,4.36072588913 75 | 74,-0.0894895640338,2.02535207407,1.93484909619,4.09557485132,4.35327025188 76 | 75,0.978646999652,2.20085086625,2.09003440427,4.27542353033,4.1805058388 77 | 76,-0.113312642876,2.2444100761,2.50789248839,4.4151861502,4.03267168136 78 | 77,-1.00215099149,1.84305628445,2.61691237246,4.45425147595,3.81203553766 79 | 78,-0.0183234614205,1.49573923116,2.99308471214,4.71134960112,4.0273804959 80 | 79,1.0823738177,1.12211589848,3.27079386925,4.94288270502,4.01851068083 81 | 80,0.124370187893,0.616474412808,3.4284236674,4.76942168327,3.9749536483 82 | 81,-0.929423379352,0.290977090976,3.34131726136,4.78590392707,4.10190661656 83 | 
82,0.23766302648,0.155302052254,3.49779513794,4.64605656795,4.15571321107 84 | 83,1.03531486192,0.359702776204,3.4880725919,4.48167586667,4.21134561991 85 | 84,-0.261234571382,0.713877760378,3.42756426614,4.426443869,4.25208300527 86 | 85,-1.03572442277,1.25001113691,2.96908341113,4.25500915322,4.25723010649 87 | 86,0.380034261243,1.70543355622,2.73605932518,4.16703432307,4.63700400788 88 | 87,1.03734873488,1.97544410562,2.55586572141,3.84976673263,4.55282864289 89 | 88,-0.177344253372,2.22614526325,2.09565864891,3.77378097953,4.82577400298 90 | 89,-0.976821526892,2.18385079177,1.78522284118,3.67768223554,5.06302440873 91 | 90,0.264820472091,1.86981946157,1.50048403865,3.43619796921,5.05651761669 92 | 91,1.05642344868,1.47568646076,1.51347671977,3.20898518885,5.50149047462 93 | 92,-0.311607433358,1.04226467636,1.52089650905,3.02291865417,5.4889046232 94 | 93,-0.724285777937,0.553052311957,1.48573560173,2.7365973598,5.72549174225 95 | 94,0.519859192905,0.226520626591,1.61543723167,2.84102086852,5.69330622288 96 | 95,1.0323195039,0.260873217055,1.81913034804,2.83951143848,5.90325028086 97 | 96,-0.53285682538,0.387695521405,1.70935609313,2.57977050631,5.79579213161 98 | 97,-0.975127997215,0.920948771589,2.51292643636,2.71004616612,5.87016469227 99 | 98,0.540246804099,1.36445470181,2.61949412896,2.98482553485,6.02447664937 100 | 99,0.987764008058,1.85581989607,2.84685706149,2.94760204892,6.0212151724 -------------------------------------------------------------------------------- /04 - Times Series/data/timeseries-univariate.csv: -------------------------------------------------------------------------------- 1 | 1,-0.6656603714 2 | 2,-0.1164380359 3 | 3,0.7398626488 4 | 4,0.7368633029 5 | 5,0.2289480898 6 | 6,2.257073255 7 | 7,3.023457405 8 | 8,2.481161007 9 | 9,3.773638612 10 | 10,5.059257738 11 | 11,3.553186083 12 | 12,4.554486452 13 | 13,3.655475698 14 | 14,3.419647598 15 | 15,4.303376245 16 | 16,4.830153934 17 | 17,7.253057441 18 | 18,5.064802335 19 | 19,5.448082106 20 | 20,6.251301517 21 | 21,6.214335675 22 | 22,3.07021164 23 | 23,6.995487627 24 | 24,7.180942656 25 | 25,6.084876071 26 | 26,6.95580607 27 | 27,6.692312738 28 | 28,6.339959049 29 | 29,7.659013269 30 | 30,6.157071564 31 | 31,4.023661782 32 | 32,7.380555018 33 | 33,6.972155839 34 | 34,6.655956847 35 | 35,6.532594924 36 | 36,6.780524726 37 | 37,6.723407547 38 | 38,7.616777776 39 | 39,6.394157367 40 | 40,5.046574011 41 | 41,5.715326568 42 | 42,6.536737479 43 | 43,6.527307846 44 | 44,5.671954159 45 | 45,6.508512087 46 | 46,4.740656344 47 | 47,5.449062618 48 | 48,5.796110609 49 | 49,4.802213058 50 | 50,4.627081034 51 | 51,5.748934924 52 | 52,4.05776044 53 | 53,2.743057715 54 | 54,3.590052501 55 | 55,2.937786376 56 | 56,5.333221794 57 | 57,5.102383904 58 | 58,5.097946146 59 | 59,2.771776766 60 | 60,3.75493571 61 | 61,3.268329562 62 | 62,3.127887555 63 | 63,5.723894838 64 | 64,2.365351066 65 | 65,2.030890988 66 | 66,5.74385257 67 | 67,2.637874242 68 | 68,2.851492945 69 | 69,1.907194917 70 | 70,2.568816256 71 | 71,3.869259698 72 | 72,3.989917724 73 | 73,3.641515351 74 | 74,2.812911768 75 | 75,4.964828171 76 | 76,3.050937945 77 | 77,4.203046785 78 | 78,4.269162745 79 | 79,2.818643243 80 | 80,3.334928424 81 | 81,5.239741508 82 | 82,4.972880771 83 | 83,5.212782208 84 | 84,6.056729012 85 | 85,5.404247421 86 | 86,4.733521027 87 | 87,5.241044888 88 | 88,6.844720502 89 | 89,8.242617764 90 | 90,6.686818708 91 | 91,6.429035591 92 | 92,7.45926043 93 | 93,8.225717423 94 | 94,7.661722793 95 | 95,8.348721917 96 | 96,8.029228135 97 | 
97,9.780942864 98 | 98,9.755623978 99 | 99,9.149489124 100 | 100,8.947965351 101 | 101,9.176768019 102 | 102,8.768408716 103 | 103,10.39624874 104 | 104,10.39477408 105 | 105,11.63126076 106 | 106,11.8222078 107 | 107,13.60107691 108 | 108,14.54919169 109 | 109,12.63475358 110 | 110,13.77411599 111 | 111,14.45808191 112 | 112,13.27674112 113 | 113,16.00004992 114 | 114,13.04977221 115 | 115,14.65730048 116 | 116,14.76178039 117 | 117,14.62716229 118 | 118,16.20697047 119 | 119,14.79470608 120 | 120,16.70541749 121 | 121,15.8638474 122 | 122,15.63192699 123 | 123,17.20433954 124 | 124,16.29180965 125 | 125,16.93688521 126 | 126,16.07521662 127 | 127,18.33942893 128 | 128,15.62502668 129 | 129,16.81519558 130 | 130,16.86177911 131 | 131,19.18323671 132 | 132,16.68993279 133 | 133,16.52735528 134 | 134,15.22702085 135 | 135,16.13574242 136 | 136,16.08079964 137 | 137,17.16828833 138 | 138,16.09004409 139 | 139,16.92712829 140 | 140,15.54298161 141 | 141,16.03893798 142 | 142,15.38310389 143 | 143,16.18064645 144 | 144,16.22326501 145 | 145,17.1657127 146 | 146,14.87850136 147 | 147,12.80968507 148 | 148,16.25354113 149 | 149,15.14082073 150 | 150,15.79111348 151 | 151,14.02005588 152 | 152,14.32583767 153 | 153,13.87437546 154 | 154,14.47127314 155 | 155,14.29661188 156 | 156,14.68406313 157 | 157,15.84514503 158 | 158,13.89667867 159 | 159,13.58135083 160 | 160,14.26005818 161 | 161,13.3826131 162 | 162,12.85293827 163 | 163,11.06745237 164 | 164,14.08812275 165 | 165,13.05949205 166 | 166,12.18454971 167 | 167,13.01005879 168 | 168,12.45032762 169 | 169,12.20445297 170 | 170,14.39420173 171 | 171,13.49261191 172 | 172,14.91460871 173 | 173,15.97672915 174 | 174,13.96235436 175 | 175,13.77840615 176 | 176,14.39425289 177 | 177,14.31499272 178 | 178,14.37080989 179 | 179,15.34130707 180 | 180,13.42441434 181 | 181,14.54726137 182 | 182,12.51644144 183 | 183,15.36040785 184 | 184,14.52577002 185 | 185,15.90562887 186 | 186,15.12482026 187 | 187,15.55534424 188 | 188,12.22427756 189 | 189,15.11554898 190 | 190,14.23464612 191 | 191,16.52156964 192 | 192,18.14558077 193 | 193,16.51932129 194 | 194,16.88159194 195 | 195,18.08337828 196 | 196,18.70889734 197 | 197,20.97040748 198 | 198,18.98358689 199 | 199,20.76308391 200 | 200,19.81117586 201 | 201,20.24139919 202 | 202,20.78884634 203 | 203,19.92458806 204 | 204,21.60401889 205 | 205,23.30040897 206 | 206,22.2621713 207 | 207,21.24305034 208 | 208,22.07690632 209 | 209,21.78022193 210 | 210,22.94853418 211 | 211,23.72076264 212 | 212,24.12217213 213 | 213,23.04498673 214 | 214,23.8767225 215 | 215,26.52157498 216 | 216,26.24329682 217 | 217,24.83932457 218 | 218,25.66570111 219 | 219,25.61834475 220 | 220,24.41079934 221 | 221,25.31871793 222 | 222,26.7612452 223 | 223,27.00663389 224 | 224,27.86719501 225 | 225,24.87319457 226 | 226,27.85768696 227 | 227,25.70405436 228 | 228,26.11077958 229 | 229,28.11250875 230 | 230,27.6743468 231 | 231,27.19705336 232 | 232,28.08086799 233 | 233,26.19946123 234 | 234,27.32830376 235 | 235,25.98334256 236 | 236,26.71791978 237 | 237,26.67921906 238 | 238,26.25811051 239 | 239,26.64228363 240 | 240,26.20667398 241 | 241,26.39816025 242 | 242,24.83672957 243 | 243,24.27745854 244 | 244,26.10007483 245 | 245,25.67761738 246 | 246,25.91667268 247 | 247,27.57057095 248 | 248,25.68913621 249 | 249,24.92375989 250 | 250,25.5593706 251 | 251,25.14638402 252 | 252,26.46738639 253 | 253,24.55740644 254 | 254,23.5691458 255 | 255,24.07138538 256 | 256,24.94177528 257 | 257,22.33546227 258 | 258,22.32323763 259 | 
259,24.38075647 260 | 260,22.40754744 261 | 261,22.61183469 262 | 262,23.28658677 263 | 263,22.98637689 264 | 264,25.46468191 265 | 265,24.14497597 266 | 266,22.97023633 267 | 267,24.37831161 268 | 268,24.86418705 269 | 269,22.61185053 270 | 270,21.70979546 271 | 271,22.09389192 272 | 272,23.25882086 273 | 273,23.56494308 274 | 274,24.13181731 275 | 275,24.28160263 276 | 276,24.43623736 277 | 277,23.24956419 278 | 278,21.76696726 279 | 279,25.14997786 280 | 280,24.67520728 281 | 281,23.40400797 282 | 282,26.24489282 283 | 283,25.05952039 284 | 284,24.53922399 285 | 285,24.89917455 286 | 286,25.13438134 287 | 287,26.05220822 288 | 288,26.94133112 289 | 289,26.02788294 290 | 290,26.65909349 291 | 291,26.0832158 292 | 292,27.39946496 293 | 293,26.57973099 294 | 294,27.49867838 295 | 295,29.89834253 296 | 296,27.78403709 297 | 297,28.92405258 298 | 298,26.58518509 299 | 299,30.91291741 300 | 300,31.73949474 301 | 301,29.25173685 302 | 302,30.3747463 303 | 303,30.59695095 304 | 304,31.50757627 305 | 305,30.97036633 306 | 306,31.27177079 307 | 307,33.43369051 308 | 308,33.9848363 309 | 309,33.31775176 310 | 310,31.69164009 311 | 311,33.07897081 312 | 312,33.10849644 313 | 313,33.29428375 314 | 314,35.60397723 315 | 315,35.33614012 316 | 316,33.95701506 317 | 317,35.16914759 318 | 318,35.92430987 319 | 319,35.81820171 320 | 320,37.36378976 321 | 321,36.74459793 322 | 322,35.27569759 323 | 323,35.9767425 324 | 324,36.17811539 325 | 325,35.68567729 326 | 326,35.54212562 327 | 327,38.78114238 328 | 328,36.46819618 329 | 329,38.07352601 330 | 330,36.56662256 331 | 331,38.1938068 332 | 332,37.42919226 333 | 333,37.44666875 334 | 334,37.16795054 335 | 335,34.97440399 336 | 336,35.6174255 337 | 337,37.37634133 338 | 338,37.26137677 339 | 339,38.09726659 340 | 340,36.04071363 341 | 341,37.07494746 342 | 342,34.4281316 343 | 343,35.1959716 344 | 344,35.26041345 345 | 345,36.9398346 346 | 346,33.58933988 347 | 347,35.00075536 348 | 348,35.97807689 349 | 349,35.66631707 350 | 350,35.44925794 351 | 351,33.69565848 352 | 352,35.38969147 353 | 353,35.96432261 354 | 354,33.6956667 355 | 355,34.05230212 356 | 356,32.70536873 357 | 357,33.91009672 358 | 358,34.45606416 359 | 359,34.97972516 360 | 360,32.36260234 361 | 361,31.69621537 362 | 362,33.02307596 363 | 363,33.94445036 364 | 364,32.2763097 365 | 365,32.06228645 366 | 366,34.25956906 367 | 367,33.61620818 368 | 368,35.00141908 369 | 369,34.47493965 370 | 370,34.31576327 371 | 371,33.24772844 372 | 372,32.95185358 373 | 373,32.55224164 374 | 374,33.06560689 375 | 375,35.2082848 376 | 376,34.50372086 377 | 377,33.54922461 378 | 378,35.46287805 379 | 379,34.68829823 380 | 380,35.04640557 381 | 381,33.48711975 382 | 382,34.03264662 383 | 383,34.43296169 384 | 384,35.7571391 385 | 385,32.58466542 386 | 386,34.44295272 387 | 387,35.43369124 388 | 388,37.7196386 389 | 389,37.55863215 390 | 390,35.11245844 391 | 391,37.36667774 392 | 392,36.41904568 393 | 393,38.11951592 394 | 394,39.351325 395 | 395,38.87795167 396 | 396,38.8144378 397 | 397,38.96059714 398 | 398,39.95536453 399 | 399,39.78580611 400 | 400,40.70319964 401 | 401,41.32804151 402 | 402,42.79937243 403 | 403,38.43432481 404 | 404,42.12051726 405 | 405,42.50068551 406 | 406,43.89812523 407 | 407,42.18632495 408 | 408,43.99716859 409 | 409,43.67726129 410 | 410,42.98072384 411 | 411,43.59181621 412 | 412,44.98283057 413 | 413,42.17674627 414 | 414,46.49541908 415 | 415,45.58212027 416 | 416,42.7202171 417 | 417,45.66108535 418 | 418,45.03844556 419 | 419,44.96618253 420 | 420,45.0371585 421 | 
421,46.12237848 422 | 422,46.18891162 423 | 423,46.82075672 424 | 424,47.25058257 425 | 425,45.91853936 426 | 426,46.83241571 427 | 427,47.77383153 428 | 428,48.12984438 429 | 429,46.74042025 430 | 430,46.66834779 431 | 431,47.41473153 432 | 432,46.93101415 433 | 433,48.24438209 434 | 434,47.41007874 435 | 435,46.92607209 436 | 436,46.77346554 437 | 437,47.80447575 438 | 438,45.7000972 439 | 439,46.60252512 440 | 440,45.59290618 441 | 441,47.37025588 442 | 442,46.46333171 443 | 443,46.19762396 444 | 444,47.57763766 445 | 445,46.92624737 446 | 446,46.1536802 447 | 447,45.94947611 448 | 448,46.37457004 449 | 449,44.22344538 450 | 450,43.18937717 451 | 451,44.3387774 452 | 452,45.63204816 453 | 453,43.87816917 454 | 454,43.67301546 455 | 455,42.11959709 456 | 456,43.89387883 457 | 457,44.40734798 458 | 458,42.67367897 459 | 459,43.76501429 460 | 460,44.74698445 461 | 461,43.14500236 462 | 462,42.41214263 463 | 463,44.1631715 464 | 464,41.81378406 465 | 465,43.00929934 466 | 466,42.80360515 467 | 467,44.30252713 468 | 468,42.88123048 469 | 469,43.47049118 470 | 470,44.42168141 471 | 471,42.43276664 472 | 472,44.57582419 473 | 473,43.56138481 474 | 474,43.4549005 475 | 475,43.06396235 476 | 476,43.8737132 477 | 477,42.1428636 478 | 478,43.60856585 479 | 479,44.16778079 480 | 480,42.90474298 481 | 481,44.99882414 482 | 482,43.304605 483 | 483,44.4468626 484 | 484,45.49241923 485 | 485,44.46713555 486 | 486,46.27348465 487 | 487,45.76034556 488 | 488,45.37440079 489 | 489,46.19246701 490 | 490,48.28190231 491 | 491,47.81719203 492 | 492,47.23213374 493 | 493,48.03313818 494 | 494,46.73599653 495 | 495,47.12327054 496 | 496,48.58597108 497 | 497,48.6738899 498 | 498,48.52018743 499 | 499,48.50385022 500 | 500,50.17026668 -------------------------------------------------------------------------------- /04 - Times Series/data/train-data.csv: -------------------------------------------------------------------------------- 1 | time_index,value 2 | 0,-0.5070139274941298 3 | 1,0.1253712967775968 4 | 2,-0.12267575206840288 5 | 3,-0.16956387939957984 6 | 4,0.30393116333315534 7 | 5,-0.12859637797270826 8 | 6,0.3184790887830743 9 | 7,-0.42364367699352623 10 | 8,0.21185831866839666 11 | 9,-0.3894984396354476 12 | 10,0.780461773776706 13 | 11,0.21716827635006894 14 | 12,0.208066295332867 15 | 13,0.9023076086982981 16 | 14,0.37177403780424256 17 | 15,0.43073320317895214 18 | 16,1.23308015942016 19 | 17,0.9121301645172821 20 | 18,0.8702649415061833 21 | 19,1.3225444506952997 22 | 20,1.2849016700625 23 | 21,0.8053488682091177 24 | 22,0.7683377036246207 25 | 23,0.4057552748095914 26 | 24,0.978329292153113 27 | 25,0.696015936876089 28 | 26,1.5554789446672375 29 | 27,0.4781779876080657 30 | 28,0.14013906948820853 31 | 29,1.6449368085323504 32 | 30,1.6590749923600454 33 | 31,2.0141355497897404 34 | 32,1.2459591113581108 35 | 33,0.9793177817011796 36 | 34,0.241826654996384 37 | 35,1.7570742528353063 38 | 36,1.0695330784833672 39 | 37,1.5210907139820962 40 | 38,0.8710384662192763 41 | 39,1.4839397118303155 42 | 40,1.243793091688579 43 | 41,0.6848339518810963 44 | 42,1.6276559859206734 45 | 43,1.3116497622098806 46 | 44,1.3608905379061378 47 | 45,1.041190994021974 48 | 46,0.9799805971350682 49 | 47,1.2354969054174134 50 | 48,0.22235989417954904 51 | 49,1.1513923265108672 52 | 50,0.9396515278276432 53 | 51,0.30260959424652467 54 | 52,1.0056960398178687 55 | 53,1.6068853568408674 56 | 54,1.7676627305773898 57 | 55,0.9173150845957287 58 | 56,1.9939897894609664 59 | 57,0.7414664658496637 60 | 58,0.5332771867761734 61 | 
59,0.8338414219102095 62 | 60,0.8641193396342405 63 | 61,1.8207245814139355 64 | 62,1.443486437588053 65 | 63,1.674623601228037 66 | 64,1.443584875090209 67 | 65,1.308804473574395 68 | 66,1.733630056325742 69 | 67,0.8359910593520475 70 | 68,1.0980006203179862 71 | 69,0.7093105225204377 72 | 70,1.1191743713347615 73 | 71,1.150825963564253 74 | 72,2.3425419082645034 75 | 73,1.5345352246330584 76 | 74,1.6779068527914949 77 | 75,0.8676696369755783 78 | 76,0.7677086011436657 79 | 77,0.8998287612805649 80 | 78,0.6025577314257724 81 | 79,1.5358568541287414 82 | 80,1.7413454713205905 83 | 81,1.7294779805951772 84 | 82,0.24100658343511855 85 | 83,0.8087157551467282 86 | 84,1.5151550594728582 87 | 85,1.6630951453036857 88 | 86,0.6939581780591096 89 | 87,1.8563910702987192 90 | 88,0.8695925892718722 91 | 89,1.4075735259518103 92 | 90,0.813779511845681 93 | 91,1.0075753587561769 94 | 92,0.5362436479057621 95 | 93,1.208505762728457 96 | 94,0.39508516366335494 97 | 95,0.5387949889626957 98 | 96,0.06824975090018343 99 | 97,0.5515019585271188 100 | 98,0.9717347784462206 101 | 99,0.6453799423415064 102 | 100,0.9638253752187474 103 | 101,0.8253807454700055 104 | 102,0.765295849230056 105 | 103,-0.2490667183903401 106 | 104,0.1570243819306409 107 | 105,0.25567212153198093 108 | 106,0.13750761989296234 109 | 107,-0.20660746818522513 110 | 108,0.5933906620273861 111 | 109,0.5663378756409212 112 | 110,1.0593951296282083 113 | 111,0.2521124010736045 114 | 112,0.3260453790642569 115 | 113,0.1540775658086948 116 | 114,1.1988817491487544 117 | 115,0.2736270594678474 118 | 116,0.07319358529892356 119 | 117,0.2355506823573319 120 | 118,-0.3498579462123355 121 | 119,1.1746109061731147 122 | 120,-0.2454734313621706 123 | 121,0.03472511352169072 124 | 122,0.6452287293459306 125 | 123,-0.5196418251954463 126 | 124,0.27339202936903717 127 | 125,0.49343731462487284 128 | 126,1.0226849755300311 129 | 127,-0.19257056273006357 130 | 128,-0.47107762571031686 131 | 129,-0.23374598590110818 132 | 130,-0.007609464770713337 133 | 131,-0.3980476099888386 134 | 132,-0.5558966206457587 135 | 133,-0.1344657899523246 136 | 134,-0.6562174891480932 137 | 135,-0.9529234885623512 138 | 136,-0.1824689939763049 139 | 137,-0.4824704474604228 140 | 138,-0.9436448853093331 141 | 139,-0.3369041551721612 142 | 140,0.14497797127573497 143 | 141,0.016325582764854185 144 | 142,-0.19500561044644937 145 | 143,-0.9654489601806846 146 | 144,-0.9612848959159918 147 | 145,-0.162283592524345 148 | 146,0.22063804277118648 149 | 147,-0.7768224686464962 150 | 148,-0.5474822299406553 151 | 149,-0.22684463014547362 152 | 150,-0.05073639447563938 153 | 151,-0.3540337760171799 154 | 152,0.26733413075841495 155 | 153,-0.48318666001008803 156 | 154,1.0412721613305362 157 | 155,-0.3009441654464442 158 | 156,0.23672219675628858 159 | 157,-0.107098377724405 160 | 158,-0.4440316895674985 161 | 159,-0.24570790256824426 162 | 160,0.5943460949278447 163 | 161,-1.0682094133994264 164 | 162,-0.2680015981885515 165 | 163,0.033828877133002866 166 | 164,-0.28231626805343357 167 | 165,0.025170222611089033 168 | 166,-0.17207076283859424 169 | 167,0.2296022365944559 170 | 168,0.04598573095359981 171 | 169,0.8987253251768407 172 | 170,0.4586360956080101 173 | 171,0.42576578797620623 174 | 172,0.10791234233154612 175 | 173,-0.23875487135166396 176 | 174,-0.34084971403433867 177 | 175,0.6546440666166147 178 | 176,1.0435654514531552 179 | 177,0.6100905299960653 180 | 178,-0.42662090687008325 181 | 179,-0.7205534111701549 182 | 180,0.2370598496105042 183 | 181,0.7156811736776737 184 | 
182,0.09764433823741903 185 | 183,0.1713968836530575 186 | 184,1.1557269685335108 187 | 185,0.7253976794449961 188 | 186,0.26055050392723333 189 | 187,1.429007916594629 190 | 188,0.8915221745929881 191 | 189,1.849975707474486 192 | 190,1.2170415838605697 193 | 191,0.28227177326870756 194 | 192,0.3758340112512873 195 | 193,0.489395008190605 196 | 194,-0.07780361911193254 197 | 195,1.0592080566932727 198 | 196,1.260639592592383 199 | 197,0.8778403107793371 200 | 198,0.23693750056347074 201 | 199,1.3629839804917614 202 | 200,0.933989699638078 203 | 201,1.2559119044689113 204 | 202,1.562489659119579 205 | 203,1.4766387826355065 206 | 204,1.491247246204821 207 | 205,1.0330384259109189 208 | 206,1.2717659065426434 209 | 207,0.5450718100076353 210 | 208,1.588810638400759 211 | 209,1.180009407506983 212 | 210,1.2517809833903317 213 | 211,1.5428349841931654 214 | 212,0.9090514743496572 215 | 213,2.0636856127889924 216 | 214,1.7295023405620553 217 | 215,0.7207840153166203 218 | 216,1.329677130143393 219 | 217,1.7757506097740814 220 | 218,0.7162167349060784 221 | 219,2.2275539995837077 222 | 220,2.012786749166425 223 | 221,0.8684124735270828 224 | 222,1.7121799084141875 225 | 223,1.574730314175874 226 | 224,1.9539080144419492 227 | 225,1.0177157482426051 228 | 226,1.7317822328139219 229 | 227,1.931341421920829 230 | 228,2.3984024094583827 231 | 229,1.4395431414826998 232 | 230,2.014204161701857 233 | 231,2.8239742537290016 234 | 232,1.2932303643848833 235 | 233,1.9383163687374438 236 | 234,2.2300236863813026 237 | 235,2.110700974442951 238 | 236,1.9604749859151185 239 | 237,2.873056945604347 240 | 238,3.042481203367786 241 | 239,2.0349069225390064 242 | 240,2.0777500108680504 243 | 241,2.291484544878781 244 | 242,2.769883711814939 245 | 243,2.4088627771736624 246 | 244,1.9491837560161467 247 | 245,3.1833487056181706 248 | 246,2.2988579493188883 249 | 247,1.957192841259238 250 | 248,2.9068158400814905 251 | 249,2.3701889353165955 252 | 250,1.919831480345358 253 | 251,2.6692682753733843 254 | 252,2.3481555360095228 255 | 253,2.0611353817546756 256 | 254,2.084063946698618 257 | 255,2.5871870558846437 258 | 256,2.5349460436653226 259 | 257,2.1937121705238254 260 | 258,2.465205616564662 261 | 259,3.5011148068047655 262 | 260,1.0872350793182335 263 | 261,3.1172222909842504 264 | 262,2.2166159479532883 265 | 263,1.7705159676237796 266 | 264,3.1727664641419024 267 | 265,1.9892904101862803 268 | 266,1.5376910059503701 269 | 267,2.7220079745425036 270 | 268,2.2792294831645616 271 | 269,1.2867915515837316 272 | 270,1.7632027534734713 273 | 271,2.608122652864524 274 | 272,1.8349037986395822 275 | 273,2.5657744969319713 276 | 274,2.0294851497169994 277 | 275,2.701897623997814 278 | 276,2.136856139216165 279 | 277,3.0684468389668704 280 | 278,2.5880287951712986 281 | 279,2.09758044416676 282 | 280,1.3842149430365502 283 | 281,2.090777383464958 284 | 282,2.7737299554551322 285 | 283,1.677875405665155 286 | 284,2.5185996092876226 287 | 285,2.3979314455762495 288 | 286,0.7706116377140514 289 | 287,1.9068300017223707 290 | 288,1.6884109083823688 291 | 289,2.3211973311275136 292 | 290,2.074017902710044 293 | 291,1.8601854699639824 294 | 292,2.5770457442035712 295 | 293,1.1320215688173692 296 | 294,1.6661182755241826 297 | 295,1.5065410558928145 298 | 296,1.3504730246720247 299 | 297,1.4781447340715372 300 | 298,1.8287400728716516 301 | 299,2.4439413941638555 302 | 300,1.1335870239005752 303 | 301,1.1376121185076786 304 | 302,1.883419689823916 305 | 303,1.1643748038367192 306 | 304,0.9052621787229713 307 | 
305,1.49029893737825 308 | 306,1.504595817338015 309 | 307,0.5972730492321923 310 | 308,1.4200104774184505 311 | 309,1.7746909364603747 312 | 310,1.2224561324894654 313 | 311,0.9871804251175486 314 | 312,1.5178942988712019 315 | 313,1.6323865515785387 316 | 314,0.8180963782943355 317 | 315,1.179894094942133 318 | 316,0.7957333525885373 319 | 317,1.5596466944625096 320 | 318,1.4642504546959447 321 | 319,1.3051724059444467 322 | 320,1.620731615853558 323 | 321,0.34301128322455676 324 | 322,0.8935376468255153 325 | 323,0.7162418593271469 326 | 324,0.7935228855212434 327 | 325,0.715036157783857 328 | 326,0.47255776093126256 329 | 327,0.06053479272504858 330 | 328,1.1092282612381497 331 | 329,1.5854843894731405 332 | 330,0.7226338344053831 333 | 331,0.5415628726616049 334 | 332,0.769707956298005 335 | 333,1.0811826467816688 336 | 334,1.2324015386438218 337 | 335,0.1850672631368836 338 | 336,0.8329997477763121 339 | 337,1.2827590248617144 340 | 338,0.15409359872707973 341 | 339,0.7131220830375605 342 | 340,0.3538863693240233 343 | 341,0.4729187010748861 344 | 342,0.25167060592814694 345 | 343,0.4795340919745855 346 | 344,1.244545001156966 347 | 345,0.9070580938291402 348 | 346,-0.2474843556396955 349 | 347,0.9318249817242306 350 | 348,0.8691696762707389 351 | 349,0.10928398400582107 352 | 350,1.1954642151715116 353 | 351,1.484077554996629 354 | 352,0.40735836498626776 355 | 353,0.8741504310303659 356 | 354,1.3984054286233911 357 | 355,0.8676682893671924 358 | 356,1.1623868990365802 359 | 357,1.3360059694609663 360 | 358,0.5750144526389834 361 | 359,1.4273445427609106 362 | 360,0.7625919084249725 363 | 361,1.5300702740487817 364 | 362,1.1771428900727505 365 | 363,0.9815884070712058 366 | 364,1.7794529677664737 367 | 365,0.5096286014930141 368 | 366,0.8621047208445878 369 | 367,0.6572835372437593 370 | 368,0.7704329971617637 371 | 369,0.5021708574153725 372 | 370,0.9521055790991899 373 | 371,0.5124929945662163 374 | 372,0.24419637629300284 375 | 373,1.1672863895506451 376 | 374,0.7178411779625429 377 | 375,1.1888475010857138 378 | 376,1.2673469074966683 379 | 377,0.4135564113401964 380 | 378,1.2619200687245757 381 | 379,1.2847332369524564 382 | 380,1.0587556277731935 383 | 381,1.2057020854211569 384 | 382,1.9577257560566927 385 | 383,2.0900437646623122 386 | 384,0.9604295505152993 387 | 385,1.6910295543239444 388 | 386,2.4846342819242393 389 | 387,2.0867271577793156 390 | 388,1.2630580708033659 391 | 389,1.211468044794045 392 | 390,0.955653622356968 393 | 391,1.3890179730031247 394 | 392,2.44345181147831 395 | 393,1.204457284758076 396 | 394,2.845650935135721 397 | 395,1.5276002219192633 398 | 396,1.6151275384846175 399 | 397,2.5959934698396947 400 | 398,1.8126461999813244 401 | 399,2.8157807817978533 402 | 400,1.422968461929559 403 | 401,0.9105198075182803 404 | 402,1.9572272725379791 405 | 403,2.600692828025921 406 | 404,1.3992103054577312 407 | 405,3.018559318027152 408 | 406,1.5093470931108484 409 | 407,2.8971446400960468 410 | 408,1.6865911401470695 411 | 409,2.2332932663990315 412 | 410,2.6608879170904864 413 | 411,2.669532774586354 414 | 412,2.0251473312590482 415 | 413,2.523094088712269 416 | 414,2.911419909915979 417 | 415,2.340494856846694 418 | 416,2.545681180095806 419 | 417,3.099596750711708 420 | 418,2.136135468673287 421 | 419,2.1888107371201158 422 | 420,2.6488645103329107 423 | 421,3.1568399578858273 424 | 422,2.440864677932434 425 | 423,2.610809173659483 426 | 424,1.8178056091295223 427 | 425,2.9315735968273544 428 | 426,2.8184595352670607 429 | 427,2.069996262482516 430 | 
428,2.210078410725907 431 | 429,4.014375425597757 432 | 430,2.797331532036595 433 | 431,2.34882390480658 434 | 432,2.4007093792308676 435 | 433,3.2702296111523 436 | 434,3.4303221048765566 437 | 435,3.37858399500833 438 | 436,3.212384205955622 439 | 437,2.216321278261658 440 | 438,2.784889634345377 441 | 439,4.419156418444061 442 | 440,3.957440911283956 443 | 441,2.754337273745299 444 | 442,3.9590830385435067 445 | 443,2.9510304655840445 446 | 444,3.430818337970406 447 | 445,3.417164122127078 448 | 446,3.7727735339485235 449 | 447,2.193807925768478 450 | 448,3.3244896074740726 451 | 449,3.4043681191391784 452 | 450,3.6974548916638454 453 | 451,3.9523937610220634 454 | 452,3.5729878538357385 455 | 453,2.495321372584242 456 | 454,2.0219326537398943 457 | 455,2.6792841395141833 458 | 456,3.5211582944939983 459 | 457,2.9853510573663247 460 | 458,2.725114037495373 461 | 459,3.0731529711235157 462 | 460,2.8064542337639415 463 | 461,3.961218040102244 464 | 462,3.318125102556614 465 | 463,3.7284890757462783 466 | 464,3.4752958055141363 467 | 465,3.278428878730352 468 | 466,3.444283176951884 469 | 467,3.77325515759806 470 | 468,2.2907371586806886 471 | 469,2.938790653954773 472 | 470,3.8176787682406395 473 | 471,2.7734549949442497 474 | 472,3.5007835419150353 475 | 473,3.18851778210803 476 | 474,3.594162344250331 477 | 475,2.1631708225851556 478 | 476,3.876183487335109 479 | 477,3.238333572409265 480 | 478,3.0111588283739703 481 | 479,2.6859592974631026 482 | 480,3.543110508796691 483 | 481,3.3523951980962914 484 | 482,3.3045303739129706 485 | 483,2.1505872783687434 486 | 484,3.255545595834051 487 | 485,2.1834361711617376 488 | 486,3.016004454036089 489 | 487,2.4348048508379136 490 | 488,2.2764720072045774 491 | 489,2.13094695214868 492 | 490,2.4107915906463733 493 | 491,2.2028169203722907 494 | 492,2.5179343324986063 495 | 493,2.05351310068931 496 | 494,2.5003053592680518 497 | 495,2.218774657884143 498 | 496,1.644258843640977 499 | 497,2.062599169684491 500 | 498,2.3219602009985816 501 | 499,2.6096961732821673 502 | 500,2.3765700831025884 503 | 501,3.0287779735690266 504 | 502,2.3451050020349733 505 | 503,2.2730504246034657 506 | 504,1.7170030083608254 507 | 505,3.972820176853931 508 | 506,2.96081928143457 509 | 507,1.7440825856250106 510 | 508,2.283956392920651 511 | 509,3.017090874099012 512 | 510,1.73269060308032 513 | 511,2.4818036862008084 514 | 512,1.9178423638357804 515 | 513,2.008468538769001 516 | 514,1.4303512189916163 517 | 515,2.420028353972818 518 | 516,2.333694471544968 519 | 517,2.0358856878361853 520 | 518,1.930372332964022 521 | 519,2.772314114023144 522 | 520,2.6525270397365643 523 | 521,2.7874542842670094 524 | 522,1.4108282031162545 525 | 523,1.6868876057556874 526 | 524,1.5037689438264519 527 | 525,1.2325642370435734 528 | 526,0.7124232713063819 529 | 527,2.3800907644028957 530 | 528,1.3877233338461814 531 | 529,1.8462752581855768 532 | 530,1.6416133440083642 533 | 531,1.8126044092561209 534 | 532,2.0663554509247932 535 | 533,2.761008359626194 536 | 534,2.1577238764476894 537 | 535,2.017417635006672 538 | 536,1.601923991176038 539 | 537,1.7680351150614104 540 | 538,2.065200901662619 541 | 539,1.725351491022523 542 | 540,1.924858339794002 543 | 541,2.125758189878704 544 | 542,1.2301071586988084 545 | 543,1.709721540041509 546 | 544,1.5239738349384686 547 | 545,2.3385901731902794 548 | 546,2.5132419702994624 549 | 547,1.9750801909178817 550 | 548,0.12333314756948865 551 | 549,2.0991657257046445 552 | 550,1.9142082333554962 553 | 551,1.9309896520009284 554 | 
552,1.341544103330502 555 | 553,1.28049058809898 556 | 554,2.6218423637877235 557 | 555,2.1286009393550938 558 | 556,2.2438217064205723 559 | 557,1.456842576721939 560 | 558,2.4680883404735128 561 | 559,2.678058024740523 562 | 560,2.1697469640198386 563 | 561,1.9274790031063742 564 | 562,1.263900280529004 565 | 563,1.3976212029710853 566 | 564,0.7847085746264618 567 | 565,2.239433783516727 568 | 566,1.0804046348105025 569 | 567,2.0291971262278277 570 | 568,1.9031523291722041 571 | 569,1.594750755676741 572 | 570,2.095705429543913 573 | 571,2.1439876601219687 574 | 572,2.17447718714394 575 | 573,2.509210779721647 576 | 574,1.348754113695628 577 | 575,2.2650768647581687 578 | 576,2.469957066691454 579 | 577,1.9094084008513759 580 | 578,2.6907915613546765 581 | 579,2.4283581283856654 582 | 580,2.6198086506106506 583 | 581,2.824532088498242 584 | 582,2.144986257680162 585 | 583,2.6967709905140853 586 | 584,2.155027926934807 587 | 585,3.0763715554468307 588 | 586,2.2804956585199467 589 | 587,2.330030225288893 590 | 588,2.7283262483956863 591 | 589,1.9667832940469763 592 | 590,2.7629914605916728 593 | 591,3.347673260683096 594 | 592,2.7204991774409706 595 | 593,2.2213380726123417 596 | 594,3.295738484226204 597 | 595,3.0943834635313716 598 | 596,3.395245128818069 599 | 597,1.7801657319322999 600 | 598,3.006003815599795 601 | 599,3.7271303717241318 602 | 600,3.3687920925147727 603 | 601,3.494081235835588 604 | 602,2.6398630706074826 605 | 603,3.32437043035613 606 | 604,3.9716937039130134 607 | 605,3.0148970547587193 608 | 606,2.7587802972729034 609 | 607,3.541314808626614 610 | 608,3.419552004582584 611 | 609,4.096884215866331 612 | 610,2.5616750924741796 613 | 611,3.5639179882703873 614 | 612,3.4268808208312347 615 | 613,2.872363219761527 616 | 614,3.3855457293820366 617 | 615,2.93785729220192 618 | 616,3.1458155752845265 619 | 617,3.2465539652297237 620 | 618,2.946748408784293 621 | 619,3.7676191513221244 622 | 620,4.106623396844353 623 | 621,3.4644362851382864 624 | 622,4.2298715687484965 625 | 623,4.614724972380464 626 | 624,4.485952085461564 627 | 625,4.008892979961672 628 | 626,3.4117492445208164 629 | 627,3.6571908075816983 630 | 628,3.8408519603679254 631 | 629,3.431112244463793 632 | 630,4.556865657959409 633 | 631,4.237222933222556 634 | 632,3.222116818635072 635 | 633,3.576165574677214 636 | 634,3.9754856194853234 637 | 635,3.328213368589007 638 | 636,4.104693254466189 639 | 637,3.9691640998211914 640 | 638,3.3161168154225567 641 | 639,3.7508224414841997 642 | 640,4.434260513708335 643 | 641,3.2003371924917166 644 | 642,4.980128628132029 645 | 643,4.545334396893772 646 | 644,4.181800858792881 647 | 645,4.264018934480361 648 | 646,3.81587906099798 649 | 647,4.546549273705075 650 | 648,4.343850164247174 651 | 649,3.785030301942927 652 | 650,3.9667904424294553 653 | 651,4.832489508124424 654 | 652,3.5115344994617628 655 | 653,5.280271863593596 656 | 654,5.170105810177021 657 | 655,4.001193197013245 658 | 656,4.152953851268918 659 | 657,4.349360432568223 660 | 658,3.5909299965583887 661 | 659,4.734825589863275 662 | 660,3.893199952723647 663 | 661,5.383358186348112 664 | 662,3.861226525588445 665 | 663,3.8204241629880973 666 | 664,4.030051057138472 667 | 665,4.01900086168555 668 | 666,4.245729586883419 669 | 667,3.8258969209411626 670 | 668,4.640010552723009 671 | 669,4.283439814229849 672 | 670,4.4789429151128495 673 | 671,3.9050383247869966 674 | 672,4.440188192844067 675 | 673,3.9891542166926928 676 | 674,4.533468802275714 677 | 675,3.2600453449902833 678 | 676,4.435279876652234 679 | 
677,3.8421123752652178 680 | 678,3.81433230199235 681 | 679,4.095330470306205 682 | 680,3.7508912112831534 683 | 681,3.6067124664798547 684 | 682,3.773260762561308 685 | 683,3.9661995229630125 686 | 684,4.025269079939741 687 | 685,3.891316308095429 688 | 686,2.6268497228468517 689 | 687,3.9555450836012014 690 | 688,4.217561006995724 691 | 689,3.959901576095089 692 | 690,3.9814289938170444 693 | 691,3.4927129816373235 694 | 692,3.643282736855781 695 | 693,3.415009233378614 696 | 694,3.755798217824436 697 | 695,3.767404234589839 698 | 696,3.2622188273548947 699 | 697,3.7034220234951563 700 | 698,2.449142007600186 701 | 699,2.7817285578755713 702 | -------------------------------------------------------------------------------- /06 - Sequence Models/TODO.txt: -------------------------------------------------------------------------------- 1 | - RNN, LSTM, etc. 2 | - sequence classification 3 | - predicting next element in the sequence 4 | - time-series using RNN 5 | - ... -------------------------------------------------------------------------------- /06 - Sequence Models/data/seq01.test.csv: -------------------------------------------------------------------------------- 1 | 0.0,0.687785252292,1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771 2 | 0.687785252292,1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0 3 | 1.1510565163,1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229 4 | 1.2510565163,0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163 5 | 0.987785252292,0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163 6 | 0.5,0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229 7 | 0.0122147477075,-0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5 8 | -0.251056516295,-0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771 9 | -0.151056516295,0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837 10 | 
0.312214747708,1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837 11 | 1.0,1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771 12 | 1.68778525229,2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0 13 | 2.1510565163,2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229 14 | 2.2510565163,1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163 15 | 1.98778525229,1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163 16 | 1.5,1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229 17 | 1.01221474771,0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5 18 | 0.748943483705,0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771 19 | 0.848943483705,1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837 20 | 1.31221474771,2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837 21 | 2.0,2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771 22 | 2.68778525229,3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0 23 | 3.1510565163,3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229 24 | 
3.2510565163,2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163 25 | 2.98778525229,2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163 26 | 2.5,2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229 27 | 2.01221474771,1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5 28 | 1.7489434837,1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771 29 | 1.8489434837,2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837 30 | 2.31221474771,3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837 31 | 3.0,3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771 32 | 3.68778525229,4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0 33 | 4.1510565163,4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229 34 | 4.2510565163,3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163 35 | 3.98778525229,3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163 36 | 3.5,3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229 37 | 3.01221474771,2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5 38 | 2.7489434837,2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771 39 | 
2.8489434837,3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837 40 | 3.31221474771,4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837 41 | 4.0,4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771 42 | 4.68778525229,5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0 43 | 5.1510565163,5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229 44 | 5.2510565163,4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163 45 | 4.98778525229,4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163 46 | 4.5,4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229 47 | 4.01221474771,3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5 48 | 3.7489434837,3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771 49 | 3.8489434837,4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837 50 | 4.31221474771,5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837 51 | 5.0,5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771 52 | 5.68778525229,6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0 53 | 6.1510565163,6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229 54 | 
6.2510565163,5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163 55 | 5.98778525229,5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163 56 | 5.5,5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229 57 | 5.01221474771,4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5 58 | 4.7489434837,4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771 59 | 4.8489434837,5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837 60 | 5.31221474771,6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837 61 | 6.0,6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771 62 | 6.68778525229,7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0 63 | 7.1510565163,7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229 64 | 7.2510565163,6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163 65 | 6.98778525229,6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163 66 | 6.5,6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229 67 | 6.01221474771,5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5 68 | 5.7489434837,5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771 69 | 
5.8489434837,6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837 70 | 6.31221474771,7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837 71 | 7.0,7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771 72 | 7.68778525229,8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0 73 | 8.1510565163,8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229 74 | 8.2510565163,7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163 75 | 7.98778525229,7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163 76 | 7.5,7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229 77 | 7.01221474771,6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5 78 | 6.7489434837,6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771 79 | 6.8489434837,7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837 80 | 7.31221474771,8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837 81 | 8.0,8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771 82 | 8.68778525229,9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0 83 | 
9.1510565163,9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523 84 | 9.2510565163,8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163 85 | 8.98778525229,8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163 86 | 8.5,8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523 87 | 8.01221474771,7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5 88 | 7.7489434837,7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477 89 | 7.8489434837,8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837 90 | 8.31221474771,9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837 91 | 9.0,9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477 92 | 9.68778525229,10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0 93 | 10.1510565163,10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523 94 | 10.2510565163,9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163 95 | 9.98778525229,9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163 96 | 9.5,9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523 97 | 
9.01221474771,8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5 98 | 8.7489434837,8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477 99 | 8.8489434837,9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477,10.7489434837 100 | 9.31221474771,10.0,10.6877852523,11.1510565163,11.2510565163,10.9877852523,10.5,10.0122147477,9.7489434837,9.8489434837,10.3122147477,11.0,11.6877852523,12.1510565163,12.2510565163,11.9877852523,11.5,11.0122147477,10.7489434837,10.8489434837 101 | -------------------------------------------------------------------------------- /07 - Image Analysis/00.0 - TensorFlow Version Update.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "%%bash\n", 10 | "\n", 11 | "pip install -U tensorflow" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": null, 17 | "metadata": {}, 18 | "outputs": [], 19 | "source": [ 20 | "import tensorflow as tf\n", 21 | "print(tf.__version__)" 22 | ] 23 | } 24 | ], 25 | "metadata": { 26 | "kernelspec": { 27 | "display_name": "Python 2", 28 | "language": "python", 29 | "name": "python2" 30 | }, 31 | "language_info": { 32 | "codemirror_mode": { 33 | "name": "ipython", 34 | "version": 2 35 | }, 36 | "file_extension": ".py", 37 | "mimetype": "text/x-python", 38 | "name": "python", 39 | "nbconvert_exporter": "python", 40 | "pygments_lexer": "ipython2", 41 | "version": "2.7.13" 42 | } 43 | }, 44 | "nbformat": 4, 45 | "nbformat_minor": 2 46 | } 47 | -------------------------------------------------------------------------------- /08 - Text Analysis/01 - Text Classification - SMS Ham vs. Spam - Data Preparation.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## UCI SMS Spam Collection Dataset\n", 8 | "\n", 9 | "### Dataset URL: http://archive.ics.uci.edu/ml/datasets/SMS+Spam+Collection\n", 10 | "\n", 11 | "A set of labeled SMS messages + label (ham vs Spam)" 12 | ] 13 | }, 14 | { 15 | "cell_type": "code", 16 | "execution_count": 1, 17 | "metadata": { 18 | "collapsed": true 19 | }, 20 | "outputs": [], 21 | "source": [ 22 | "import pandas as pd\n", 23 | "import string\n", 24 | "import re\n", 25 | "from sklearn import model_selection" 26 | ] 27 | }, 28 | { 29 | "cell_type": "code", 30 | "execution_count": 2, 31 | "metadata": {}, 32 | "outputs": [ 33 | { 34 | "data": { 35 | "text/html": [ 36 | "
\n", 37 | "\n", 50 | "\n", 51 | " \n", 52 | " \n", 53 | " \n", 54 | " \n", 55 | " \n", 56 | " \n", 57 | " \n", 58 | " \n", 59 | " \n", 60 | " \n", 61 | " \n", 62 | " \n", 63 | " \n", 64 | " \n", 65 | " \n", 66 | " \n", 67 | " \n", 68 | " \n", 69 | " \n", 70 | " \n", 71 | " \n", 72 | " \n", 73 | " \n", 74 | " \n", 75 | " \n", 76 | " \n", 77 | " \n", 78 | " \n", 79 | " \n", 80 | " \n", 81 | " \n", 82 | " \n", 83 | " \n", 84 | " \n", 85 | "
classsms
0hamGo until jurong point, crazy.. Available only ...
1hamOk lar... Joking wif u oni...
2spamFree entry in 2 a wkly comp to win FA Cup fina...
3hamU dun say so early hor... U c already then say...
4hamNah I don't think he goes to usf, he lives aro...
\n", 86 | "
" 87 | ], 88 | "text/plain": [ 89 | " class sms\n", 90 | "0 ham Go until jurong point, crazy.. Available only ...\n", 91 | "1 ham Ok lar... Joking wif u oni...\n", 92 | "2 spam Free entry in 2 a wkly comp to win FA Cup fina...\n", 93 | "3 ham U dun say so early hor... U c already then say...\n", 94 | "4 ham Nah I don't think he goes to usf, he lives aro..." 95 | ] 96 | }, 97 | "execution_count": 2, 98 | "metadata": {}, 99 | "output_type": "execute_result" 100 | } 101 | ], 102 | "source": [ 103 | "DATASET_FILE = 'data/sms-spam/SMSSpamCollection'\n", 104 | "dataset = pd.read_csv(DATASET_FILE, sep='\\t', names=['class','sms'])\n", 105 | "dataset.head()" 106 | ] 107 | }, 108 | { 109 | "cell_type": "code", 110 | "execution_count": 3, 111 | "metadata": {}, 112 | "outputs": [ 113 | { 114 | "name": "stdout", 115 | "output_type": "stream", 116 | "text": [ 117 | "Dataset Size: 5572\n", 118 | "ham 4825\n", 119 | "spam 747\n", 120 | "Name: class, dtype: int64\n", 121 | "ham %: 86.59\n", 122 | "ham %: 13.41\n" 123 | ] 124 | } 125 | ], 126 | "source": [ 127 | "print(\"Dataset Size: {}\".format(len(dataset)))\n", 128 | "value_counts = dataset['class'].value_counts()\n", 129 | "print(value_counts)\n", 130 | "print(\"ham %: {}\".format(round(value_counts[0]/len(dataset)*100,2)))\n", 131 | "print(\"ham %: {}\".format(round(value_counts[1]/len(dataset)*100,2)))" 132 | ] 133 | }, 134 | { 135 | "cell_type": "markdown", 136 | "metadata": {}, 137 | "source": [ 138 | "## Create Training and Validation Datasets" 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": 4, 144 | "metadata": {}, 145 | "outputs": [ 146 | { 147 | "name": "stdout", 148 | "output_type": "stream", 149 | "text": [ 150 | "4179\n", 151 | "1393\n" 152 | ] 153 | } 154 | ], 155 | "source": [ 156 | "exclude = ['\\t', '\"']\n", 157 | "def clean_text(text):\n", 158 | " for c in exclude:\n", 159 | " text=text.replace(c,'')\n", 160 | " return text.lower().strip()\n", 161 | "\n", 162 | "sms_processed = list(map(lambda text: clean_text(text), \n", 163 | " dataset['sms'].values))\n", 164 | "\n", 165 | "dataset['sms'] = sms_processed\n", 166 | "\n", 167 | "splitter = model_selection.StratifiedShuffleSplit(n_splits=1,\n", 168 | " test_size=0.25, \n", 169 | " random_state=19850610)\n", 170 | "\n", 171 | "splits = list(splitter.split(X=dataset['sms'], y=dataset['class']))\n", 172 | "train_index = splits[0][0]\n", 173 | "valid_index = splits[0][1]\n", 174 | "\n", 175 | "train_df = dataset.loc[train_index,:]\n", 176 | "print(len(train_df))\n", 177 | "\n", 178 | "valid_df = dataset.loc[valid_index,:]\n", 179 | "print(len(valid_df))" 180 | ] 181 | }, 182 | { 183 | "cell_type": "code", 184 | "execution_count": 5, 185 | "metadata": {}, 186 | "outputs": [ 187 | { 188 | "name": "stdout", 189 | "output_type": "stream", 190 | "text": [ 191 | "Training Set\n", 192 | "ham 3619\n", 193 | "spam 560\n", 194 | "Name: class, dtype: int64\n", 195 | "ham %: 86.6\n", 196 | "ham %: 13.4\n", 197 | "\n", 198 | "Validation Set\n", 199 | "ham 1206\n", 200 | "spam 187\n", 201 | "Name: class, dtype: int64\n", 202 | "ham %: 86.58\n", 203 | "ham %: 13.42\n" 204 | ] 205 | } 206 | ], 207 | "source": [ 208 | "print(\"Training Set\")\n", 209 | "training_value_counts = train_df['class'].value_counts()\n", 210 | "print(training_value_counts)\n", 211 | "print(\"ham %: {}\".format(round(training_value_counts[0]/len(train_df)*100,2)))\n", 212 | "print(\"ham %: {}\".format(round(training_value_counts[1]/len(train_df)*100,2)))\n", 213 | "print(\"\")\n", 214 | 
"print(\"Validation Set\")\n", 215 | "validation_value_counts = valid_df['class'].value_counts()\n", 216 | "print(validation_value_counts)\n", 217 | "print(\"ham %: {}\".format(round(validation_value_counts[0]/len(valid_df)*100,2)))\n", 218 | "print(\"ham %: {}\".format(round(validation_value_counts[1]/len(valid_df)*100,2)))" 219 | ] 220 | }, 221 | { 222 | "cell_type": "markdown", 223 | "metadata": {}, 224 | "source": [ 225 | "## Save Training and Validation Datasets" 226 | ] 227 | }, 228 | { 229 | "cell_type": "code", 230 | "execution_count": 6, 231 | "metadata": { 232 | "collapsed": true 233 | }, 234 | "outputs": [], 235 | "source": [ 236 | "train_df.to_csv(\"data/sms-spam/train-data.tsv\", header=False, index=False, sep='\\t')\n", 237 | "valid_df.to_csv(\"data/sms-spam/valid-data.tsv\", header=False, index=False, sep='\\t')" 238 | ] 239 | }, 240 | { 241 | "cell_type": "code", 242 | "execution_count": 7, 243 | "metadata": {}, 244 | "outputs": [ 245 | { 246 | "data": { 247 | "text/html": [ 248 | "
\n", 249 | "\n", 262 | "\n", 263 | " \n", 264 | " \n", 265 | " \n", 266 | " \n", 267 | " \n", 268 | " \n", 269 | " \n", 270 | " \n", 271 | " \n", 272 | " \n", 273 | " \n", 274 | " \n", 275 | " \n", 276 | " \n", 277 | " \n", 278 | " \n", 279 | " \n", 280 | " \n", 281 | " \n", 282 | " \n", 283 | " \n", 284 | " \n", 285 | " \n", 286 | " \n", 287 | " \n", 288 | " \n", 289 | " \n", 290 | " \n", 291 | " \n", 292 | " \n", 293 | " \n", 294 | " \n", 295 | " \n", 296 | " \n", 297 | "
classsms
4174hamjust woke up. yeesh its late. but i didn't fal...
4175hamwhat do u reckon as need 2 arrange transport i...
4176spamfree entry into our £250 weekly competition ju...
4177spam-pls stop bootydelious (32/f) is inviting you ...
4178hamtell my bad character which u dnt lik in me. ...
\n", 298 | "
" 299 | ], 300 | "text/plain": [ 301 | " class sms\n", 302 | "4174 ham just woke up. yeesh its late. but i didn't fal...\n", 303 | "4175 ham what do u reckon as need 2 arrange transport i...\n", 304 | "4176 spam free entry into our £250 weekly competition ju...\n", 305 | "4177 spam -pls stop bootydelious (32/f) is inviting you ...\n", 306 | "4178 ham tell my bad character which u dnt lik in me. ..." 307 | ] 308 | }, 309 | "execution_count": 7, 310 | "metadata": {}, 311 | "output_type": "execute_result" 312 | } 313 | ], 314 | "source": [ 315 | "pd.read_csv(\"data/sms-spam/train-data.tsv\", sep='\\t', names=['class','sms']).tail()" 316 | ] 317 | }, 318 | { 319 | "cell_type": "code", 320 | "execution_count": 12, 321 | "metadata": {}, 322 | "outputs": [ 323 | { 324 | "data": { 325 | "text/html": [ 326 | "
\n", 327 | "\n", 340 | "\n", 341 | " \n", 342 | " \n", 343 | " \n", 344 | " \n", 345 | " \n", 346 | " \n", 347 | " \n", 348 | " \n", 349 | " \n", 350 | " \n", 351 | " \n", 352 | " \n", 353 | " \n", 354 | " \n", 355 | " \n", 356 | " \n", 357 | " \n", 358 | " \n", 359 | " \n", 360 | " \n", 361 | " \n", 362 | " \n", 363 | " \n", 364 | " \n", 365 | " \n", 366 | " \n", 367 | " \n", 368 | " \n", 369 | " \n", 370 | " \n", 371 | " \n", 372 | " \n", 373 | " \n", 374 | " \n", 375 | "
classsms
1387hamtrue dear..i sat to pray evening and felt so.s...
1388hamwhat will we do in the shower, baby?
1389hamwhere are you ? what are you doing ? are yuou ...
1390spamur cash-balance is currently 500 pounds - to m...
1391spamnot heard from u4 a while. call 4 rude chat pr...
\n", 376 | "
" 377 | ], 378 | "text/plain": [ 379 | " class sms\n", 380 | "1387 ham true dear..i sat to pray evening and felt so.s...\n", 381 | "1388 ham what will we do in the shower, baby?\n", 382 | "1389 ham where are you ? what are you doing ? are yuou ...\n", 383 | "1390 spam ur cash-balance is currently 500 pounds - to m...\n", 384 | "1391 spam not heard from u4 a while. call 4 rude chat pr..." 385 | ] 386 | }, 387 | "execution_count": 12, 388 | "metadata": {}, 389 | "output_type": "execute_result" 390 | } 391 | ], 392 | "source": [ 393 | "pd.read_csv(\"data/sms-spam/valid-data.tsv\", sep='\\t', names=['class','sms']).tail()" 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "## Calculate Vocabulary" 401 | ] 402 | }, 403 | { 404 | "cell_type": "code", 405 | "execution_count": 9, 406 | "metadata": { 407 | "collapsed": true 408 | }, 409 | "outputs": [], 410 | "source": [ 411 | "def get_vocab():\n", 412 | " vocab = set()\n", 413 | " for text in train_df['sms'].values:\n", 414 | " words = text.split(' ')\n", 415 | " word_set = set(words)\n", 416 | " vocab.update(word_set)\n", 417 | " \n", 418 | " vocab.remove('')\n", 419 | " return list(vocab)" 420 | ] 421 | }, 422 | { 423 | "cell_type": "code", 424 | "execution_count": 10, 425 | "metadata": {}, 426 | "outputs": [ 427 | { 428 | "name": "stdout", 429 | "output_type": "stream", 430 | "text": [ 431 | "11330\n" 432 | ] 433 | }, 434 | { 435 | "data": { 436 | "text/plain": [ 437 | "['child',\n", 438 | " 'place..',\n", 439 | " 'hi..i',\n", 440 | " 'oso?',\n", 441 | " 'home!',\n", 442 | " 'lasting',\n", 443 | " 'there..do',\n", 444 | " 'clock',\n", 445 | " 'advice',\n", 446 | " 'free...']" 447 | ] 448 | }, 449 | "execution_count": 10, 450 | "metadata": {}, 451 | "output_type": "execute_result" 452 | } 453 | ], 454 | "source": [ 455 | "vocab = get_vocab()\n", 456 | "print(len(vocab))\n", 457 | "vocab[10:20]" 458 | ] 459 | }, 460 | { 461 | "cell_type": "markdown", 462 | "metadata": {}, 463 | "source": [ 464 | "## Save Vocabulary" 465 | ] 466 | }, 467 | { 468 | "cell_type": "code", 469 | "execution_count": 11, 470 | "metadata": { 471 | "collapsed": true 472 | }, 473 | "outputs": [], 474 | "source": [ 475 | "PAD_WORD = '#=KS=#'\n", 476 | "\n", 477 | "with open('data/sms-spam/vocab_list.tsv', 'w') as file:\n", 478 | " file.write(\"{}\\n\".format(PAD_WORD))\n", 479 | " for word in vocab:\n", 480 | " file.write(\"{}\\n\".format(word))\n", 481 | " \n", 482 | "with open('data/sms-spam/n_words.tsv', 'w') as file:\n", 483 | " file.write(str(len(vocab)))" 484 | ] 485 | }, 486 | { 487 | "cell_type": "code", 488 | "execution_count": null, 489 | "metadata": { 490 | "collapsed": true 491 | }, 492 | "outputs": [], 493 | "source": [] 494 | } 495 | ], 496 | "metadata": { 497 | "kernelspec": { 498 | "display_name": "Python 3", 499 | "language": "python", 500 | "name": "python3" 501 | }, 502 | "language_info": { 503 | "codemirror_mode": { 504 | "name": "ipython", 505 | "version": 3 506 | }, 507 | "file_extension": ".py", 508 | "mimetype": "text/x-python", 509 | "name": "python", 510 | "nbconvert_exporter": "python", 511 | "pygments_lexer": "ipython3", 512 | "version": "3.6.1" 513 | } 514 | }, 515 | "nbformat": 4, 516 | "nbformat_minor": 2 517 | } 518 | -------------------------------------------------------------------------------- /08 - Text Analysis/06 - Part_1 - Text Classification - Hacker News - Data Preprocessing with TFT.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | 
"cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 1, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# %%bash\n", 10 | "\n", 11 | "# pip install tensorflow==1.7\n", 12 | "# pip install google-cloud-dataflow==2.3\n", 13 | "# pip install tensorflow-hub" 14 | ] 15 | }, 16 | { 17 | "cell_type": "markdown", 18 | "metadata": {}, 19 | "source": [ 20 | "# Text Classification using TensorFlow and Google Cloud - Part 1\n", 21 | "\n", 22 | "This [bigquery-public-data:hacker_news](https://cloud.google.com/bigquery/public-data/hacker-news) contains all stories and comments from Hacker News from its launch in 2006. Each story contains a story id, url, the title of the story, tthe author that made the post, when it was written, and the number of points the story received.\n", 23 | "\n", 24 | "The objective is, given the title of the story, we want to build an ML model that can predict the source of this story.\n", 25 | "\n", 26 | "## Data preparation with tf.Transform and DataFlow\n", 27 | "\n", 28 | "This notebook illustrates how to build a Beam pipeline using tf.transform to prepare ML 'train' and 'eval' datasets. \n", 29 | "The pipeline includes the following steps:\n", 30 | "1. Read data from BigQuery\n", 31 | "2. Extract and clean features from BQ rows\n", 32 | "3. Use tf.transfrom to process the text and produce the following features for each entry\n", 33 | " * title: Raw text - string\n", 34 | " * bow: Bag of word indecies - sparse vector of integers\n", 35 | " * weight: TF.IDF values - sparse vector of floats\n", 36 | " * source: target feature - string\n", 37 | "4. Save the data as .tfrecord files\n", 38 | " \n", 39 | "\n" 40 | ] 41 | }, 42 | { 43 | "cell_type": "markdown", 44 | "metadata": {}, 45 | "source": [ 46 | "### Setting Global Parameters" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": 2, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "import os\n", 56 | "\n", 57 | "class Params:\n", 58 | " pass\n", 59 | "\n", 60 | "# Set to run on GCP\n", 61 | "Params.GCP_PROJECT_ID = 'ksalama-gcp-playground'\n", 62 | "Params.REGION = 'europe-west1'\n", 63 | "Params.BUCKET = 'ksalama-gcs-cloudml'\n", 64 | "\n", 65 | "Params.PLATFORM = 'local' # local | GCP\n", 66 | "\n", 67 | "Params.DATA_DIR = 'data/news' if Params.PLATFORM == 'local' else 'gs://{}/data/news'.format(Params.BUCKET)\n", 68 | "\n", 69 | "Params.TRANSFORMED_DATA_DIR = os.path.join(Params.DATA_DIR, 'transformed')\n", 70 | "Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'train')\n", 71 | "Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'eval')\n", 72 | "\n", 73 | "Params.TEMP_DIR = os.path.join(Params.DATA_DIR, 'tmp')\n", 74 | "\n", 75 | "Params.MODELS_DIR = 'models/news' if Params.PLATFORM == 'local' else 'gs://{}/models/news'.format(Params.BUCKET)\n", 76 | "\n", 77 | "Params.TRANSFORM_ARTEFACTS_DIR = os.path.join(Params.MODELS_DIR,'transform')\n", 78 | "\n", 79 | "Params.TRANSFORM = True" 80 | ] 81 | }, 82 | { 83 | "cell_type": "markdown", 84 | "metadata": {}, 85 | "source": [ 86 | "### Importing libraries" 87 | ] 88 | }, 89 | { 90 | "cell_type": "code", 91 | "execution_count": 3, 92 | "metadata": {}, 93 | "outputs": [ 94 | { 95 | "name": "stdout", 96 | "output_type": "stream", 97 | "text": [ 98 | "WARNING:tensorflow:From /Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from 
tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n", 99 | "Instructions for updating:\n", 100 | "Use the retry module or similar alternatives.\n" 101 | ] 102 | } 103 | ], 104 | "source": [ 105 | "import apache_beam as beam\n", 106 | "\n", 107 | "import tensorflow as tf\n", 108 | "import tensorflow_transform as tft\n", 109 | "import tensorflow_transform.coders as tft_coders\n", 110 | "\n", 111 | "from tensorflow.contrib.learn.python.learn.utils import input_fn_utils\n", 112 | "\n", 113 | "from tensorflow_transform.beam import impl\n", 114 | "from tensorflow_transform.beam.tft_beam_io import transform_fn_io\n", 115 | "from tensorflow_transform.tf_metadata import metadata_io\n", 116 | "from tensorflow_transform.tf_metadata import dataset_schema\n", 117 | "from tensorflow_transform.tf_metadata import dataset_metadata\n", 118 | "from tensorflow_transform.saved import saved_transform_io" 119 | ] 120 | }, 121 | { 122 | "cell_type": "markdown", 123 | "metadata": {}, 124 | "source": [ 125 | "## 1. Source Query" 126 | ] 127 | }, 128 | { 129 | "cell_type": "code", 130 | "execution_count": 4, 131 | "metadata": {}, 132 | "outputs": [], 133 | "source": [ 134 | "bq_query = '''\n", 135 | "SELECT\n", 136 | " key,\n", 137 | " REGEXP_REPLACE(title, '[^a-zA-Z0-9 $.-]', ' ') AS title, \n", 138 | " source\n", 139 | "FROM\n", 140 | "(\n", 141 | " SELECT\n", 142 | " ARRAY_REVERSE(SPLIT(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.'))[OFFSET(1)] AS source,\n", 143 | " title,\n", 144 | " ABS(FARM_FINGERPRINT(title)) AS Key\n", 145 | " FROM\n", 146 | " `bigquery-public-data.hacker_news.stories`\n", 147 | " WHERE\n", 148 | " REGEXP_CONTAINS(REGEXP_EXTRACT(url, '.*://(.[^/]+)/'), '.com$')\n", 149 | " AND LENGTH(title) > 10\n", 150 | ")\n", 151 | "WHERE (source = 'github' OR source = 'nytimes' OR source = 'techcrunch')\n", 152 | "'''\n", 153 | "\n", 154 | "def get_source_query(step):\n", 155 | " \n", 156 | " if step == 'train':\n", 157 | " source_query = 'SELECT * FROM ({}) WHERE MOD(key,100) <= 75'.format(bq_query)\n", 158 | " else:\n", 159 | " source_query = 'SELECT * FROM ({}) WHERE MOD(key,100) > 75'.format(bq_query)\n", 160 | " \n", 161 | " return source_query" 162 | ] 163 | }, 164 | { 165 | "cell_type": "markdown", 166 | "metadata": {}, 167 | "source": [ 168 | "## 2. Raw metadata" 169 | ] 170 | }, 171 | { 172 | "cell_type": "code", 173 | "execution_count": 5, 174 | "metadata": {}, 175 | "outputs": [], 176 | "source": [ 177 | "RAW_HEADER = 'key,title,source'.split(',')\n", 178 | "RAW_DEFAULTS = [['NA'],['NA'],['NA']]\n", 179 | "TARGET_FEATURE_NAME = 'source'\n", 180 | "TARGET_LABELS = ['github', 'nytimes', 'techcrunch']\n", 181 | "TEXT_FEATURE_NAME = 'title'\n", 182 | "KEY_COLUMN = 'key'\n", 183 | "\n", 184 | "VOCAB_SIZE = 20000\n", 185 | "TRAIN_SIZE = 73124\n", 186 | "EVAL_SIZE = 23079\n", 187 | "\n", 188 | "DELIMITERS = '.,!?() '\n", 189 | "\n", 190 | "raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({\n", 191 | " KEY_COLUMN: dataset_schema.ColumnSchema(\n", 192 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 193 | " TEXT_FEATURE_NAME: dataset_schema.ColumnSchema(\n", 194 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 195 | " TARGET_FEATURE_NAME: dataset_schema.ColumnSchema(\n", 196 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 197 | "}))" 198 | ] 199 | }, 200 | { 201 | "cell_type": "markdown", 202 | "metadata": {}, 203 | "source": [ 204 | "## 3. 
Preprocessing functions" 205 | ] 206 | }, 207 | { 208 | "cell_type": "code", 209 | "execution_count": 6, 210 | "metadata": {}, 211 | "outputs": [], 212 | "source": [ 213 | "def get_features(bq_row):\n", 214 | " \n", 215 | " CSV_HEADER = 'key,title,source'.split(',')\n", 216 | " \n", 217 | " input_features = {}\n", 218 | " \n", 219 | " for feature_name in CSV_HEADER:\n", 220 | " input_features[feature_name] = str(bq_row[feature_name]).lower()\n", 221 | " \n", 222 | " return input_features\n", 223 | "\n", 224 | "\n", 225 | "def preprocessing_fn(input_features):\n", 226 | " \n", 227 | " text = input_features[TEXT_FEATURE_NAME]\n", 228 | "\n", 229 | " text_tokens = tf.string_split(text, DELIMITERS)\n", 230 | " text_tokens_indcies = tft.string_to_int(text_tokens, top_k=VOCAB_SIZE)\n", 231 | " bag_of_words_indices, text_weight = tft.tfidf(text_tokens_indcies, VOCAB_SIZE + 1)\n", 232 | " \n", 233 | " output_features = {}\n", 234 | " output_features[TEXT_FEATURE_NAME] = input_features[TEXT_FEATURE_NAME]\n", 235 | " output_features['bow'] = bag_of_words_indices\n", 236 | " output_features['weight'] = text_weight\n", 237 | " output_features[TARGET_FEATURE_NAME] = input_features[TARGET_FEATURE_NAME]\n", 238 | " \n", 239 | " return output_features" 240 | ] 241 | }, 242 | { 243 | "cell_type": "markdown", 244 | "metadata": {}, 245 | "source": [ 246 | "## 4. Beam Pipeline" 247 | ] 248 | }, 249 | { 250 | "cell_type": "code", 251 | "execution_count": 7, 252 | "metadata": {}, 253 | "outputs": [], 254 | "source": [ 255 | "import apache_beam as beam\n", 256 | "\n", 257 | "\n", 258 | "def run_pipeline(runner, opts):\n", 259 | " \n", 260 | " print(\"Sink train data files: {}\".format(Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX))\n", 261 | " print(\"Sink data files: {}\".format(Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX))\n", 262 | " print(\"Temporary directory: {}\".format(Params.TEMP_DIR))\n", 263 | " print(\"\")\n", 264 | " \n", 265 | " \n", 266 | " with beam.Pipeline(runner, options=opts) as pipeline:\n", 267 | " with impl.Context(Params.TEMP_DIR): \n", 268 | " \n", 269 | " ###### analyze & transform train #########################################################\n", 270 | " if(runner=='DirectRunner'):\n", 271 | " print(\"\")\n", 272 | " print(\"Transform training data....\")\n", 273 | " print(\"\")\n", 274 | " \n", 275 | " step = 'train'\n", 276 | " source_query = get_source_query(step)\n", 277 | " \n", 278 | " # Read raw train data from BQ and cleanup\n", 279 | " raw_train_data = (\n", 280 | " pipeline\n", 281 | " | '{} - Read Data from BigQuery'.format(step) >> beam.io.Read(beam.io.BigQuerySource(query=source_query, use_standard_sql=True))\n", 282 | " | '{} - Extract Features'.format(step) >> beam.Map(get_features)\n", 283 | " )\n", 284 | " \n", 285 | " # create a train dataset from the data and schema\n", 286 | " raw_train_dataset = (raw_train_data, raw_metadata)\n", 287 | " \n", 288 | " # analyze and transform raw_train_dataset to produced transformed_train_dataset and transform_fn\n", 289 | " transformed_train_dataset, transform_fn = (\n", 290 | " raw_train_dataset \n", 291 | " | '{} - Analyze & Transform'.format(step) >> impl.AnalyzeAndTransformDataset(preprocessing_fn)\n", 292 | " )\n", 293 | " \n", 294 | " # get data and schema separately from the transformed_train_dataset\n", 295 | " transformed_train_data, transformed_metadata = transformed_train_dataset\n", 296 | "\n", 297 | " # write transformed train data to sink\n", 298 | " _ = (\n", 299 | " transformed_train_data \n", 300 | " | '{} - Write 
Transformed Data as tfrecords'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n", 301 | " file_path_prefix=Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX,\n", 302 | " file_name_suffix=\".tfrecords\",\n", 303 | " num_shards=25,\n", 304 | " coder=tft_coders.example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))\n", 305 | " )\n", 306 | " \n", 307 | " \n", 308 | "# #### TEST write transformed AS TEXT train data to sink\n", 309 | "# _ = (\n", 310 | "# transformed_train_data \n", 311 | "# | '{} - Write Transformed Data as Text'.format(step) >> beam.io.textio.WriteToText(\n", 312 | "# file_path_prefix=Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX,\n", 313 | "# file_name_suffix=\".csv\")\n", 314 | "# )\n", 315 | "# ##################################################\n", 316 | "\n", 317 | "\n", 318 | " ###### transform eval ##################################################################\n", 319 | " \n", 320 | " if(runner=='DirectRunner'):\n", 321 | " print(\"\")\n", 322 | " print(\"Transform eval data....\")\n", 323 | " print(\"\")\n", 324 | " \n", 325 | " step = 'eval'\n", 326 | " source_query = get_source_query(step)\n", 327 | "\n", 328 | " # Read raw eval data from BQ and cleanup\n", 329 | " raw_eval_data = (\n", 330 | " pipeline\n", 331 | " | '{} - Read Data from BigQuery'.format(step) >> beam.io.Read(beam.io.BigQuerySource(query=source_query, use_standard_sql=True))\n", 332 | " | '{} - Extract Features'.format(step) >> beam.Map(get_features)\n", 333 | " )\n", 334 | " \n", 335 | " # create a eval dataset from the data and schema\n", 336 | " raw_eval_dataset = (raw_eval_data, raw_metadata)\n", 337 | " \n", 338 | " # transform eval data based on produced transform_fn (from analyzing train_data)\n", 339 | " transformed_eval_dataset = (\n", 340 | " (raw_eval_dataset, transform_fn) \n", 341 | " | '{} - Transform'.format(step) >> impl.TransformDataset()\n", 342 | " )\n", 343 | " \n", 344 | " # get data from the transformed_eval_dataset\n", 345 | " transformed_eval_data, _ = transformed_eval_dataset\n", 346 | " \n", 347 | " # write transformed eval data to sink\n", 348 | " _ = (\n", 349 | " transformed_eval_data \n", 350 | " | '{} - Write Transformed Data'.format(step) >> beam.io.tfrecordio.WriteToTFRecord(\n", 351 | " file_path_prefix=Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX,\n", 352 | " file_name_suffix=\".tfrecords\",\n", 353 | " num_shards=10,\n", 354 | " coder=tft_coders.example_proto_coder.ExampleProtoCoder(transformed_metadata.schema))\n", 355 | " )\n", 356 | " \n", 357 | " ###### write transformation metadata #######################################################\n", 358 | " if(runner=='DirectRunner'):\n", 359 | " print(\"\")\n", 360 | " print(\"Saving transformation artefacts ....\")\n", 361 | " print(\"\")\n", 362 | " \n", 363 | " # write transform_fn as tf.graph\n", 364 | " _ = (\n", 365 | " transform_fn \n", 366 | " | 'Write Transform Artefacts' >> transform_fn_io.WriteTransformFn(Params.TRANSFORM_ARTEFACTS_DIR)\n", 367 | " )\n", 368 | "\n", 369 | " if runner=='DataflowRunner':\n", 370 | " pipeline.run()" 371 | ] 372 | }, 373 | { 374 | "cell_type": "markdown", 375 | "metadata": {}, 376 | "source": [ 377 | "## 5. Run Pipeline" 378 | ] 379 | }, 380 | { 381 | "cell_type": "code", 382 | "execution_count": 8, 383 | "metadata": {}, 384 | "outputs": [ 385 | { 386 | "name": "stdout", 387 | "output_type": "stream", 388 | "text": [ 389 | "Launching DirectRunner job preprocess-hackernews-data-180514-115222 ... 
hang on\n", 390 | "Sink train data files: data/news/transformed/train\n", 391 | "Sink data files: data/news/transformed/eval\n", 392 | "Temporary directory: data/news/tmp\n", 393 | "\n", 394 | "\n", 395 | "Transform training data....\n", 396 | "\n", 397 | "\n", 398 | "Transform eval data....\n", 399 | "\n", 400 | "\n", 401 | "Saving transformation artefacts ....\n", 402 | "\n" 403 | ] 404 | }, 405 | { 406 | "name": "stderr", 407 | "output_type": "stream", 408 | "text": [ 409 | "/Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/apache_beam/runners/direct/direct_runner.py:337: DeprecationWarning: options is deprecated since First stable release.. References to .options will not be supported\n", 410 | " pipeline.replace_all(_get_transform_overrides(pipeline.options))\n", 411 | "WARNING:root:Dataset ksalama-gcp-playground:temp_dataset_151e64fa07a3490bae91dd844ce4b7da does not exist so we will create it as temporary with location=None\n", 412 | "WARNING:root:Dataset ksalama-gcp-playground:temp_dataset_f3701d6e27e14e068968a255f43c4b8c does not exist so we will create it as temporary with location=None\n" 413 | ] 414 | }, 415 | { 416 | "name": "stdout", 417 | "output_type": "stream", 418 | "text": [ 419 | "Pipline completed.\n" 420 | ] 421 | } 422 | ], 423 | "source": [ 424 | "from datetime import datetime\n", 425 | "import shutil\n", 426 | "\n", 427 | "job_name = 'preprocess-hackernews-data' + '-' + datetime.utcnow().strftime('%y%m%d-%H%M%S')\n", 428 | "\n", 429 | "options = {\n", 430 | " 'region': Params.REGION,\n", 431 | " 'staging_location': os.path.join(Params.TEMP_DIR, 'staging'),\n", 432 | " 'temp_location': Params.TEMP_DIR,\n", 433 | " 'job_name': job_name,\n", 434 | " 'project': Params.GCP_PROJECT_ID\n", 435 | "}\n", 436 | "\n", 437 | "tf.logging.set_verbosity(tf.logging.ERROR)\n", 438 | "\n", 439 | "opts = beam.pipeline.PipelineOptions(flags=[], **options)\n", 440 | "runner = 'DirectRunner' if Params.PLATFORM == 'local' else 'DirectRunner'\n", 441 | "\n", 442 | "if Params.TRANSFORM:\n", 443 | " \n", 444 | " if Params.PLATFORM == 'local':\n", 445 | " shutil.rmtree(Params.TRANSFORMED_DATA_DIR, ignore_errors=True)\n", 446 | " shutil.rmtree(Params.TRANSFORM_ARTEFACTS_DIR, ignore_errors=True)\n", 447 | " shutil.rmtree(Params.TEMP_DIR, ignore_errors=True)\n", 448 | " \n", 449 | " print 'Launching {} job {} ... 
hang on'.format(runner, job_name)\n", 450 | " \n", 451 | " run_pipeline(runner, opts)\n", 452 | " \n", 453 | " print \"Pipeline completed.\"\n", 454 | "else:\n", 455 | " print \"Transformation skipped!\"" 456 | ] 457 | }, 458 | { 459 | "cell_type": "code", 460 | "execution_count": 9, 461 | "metadata": {}, 462 | "outputs": [ 463 | { 464 | "name": "stdout", 465 | "output_type": "stream", 466 | "text": [ 467 | "** transformed data:\n", 468 | "eval-00000-of-00010.tfrecords\n", 469 | "eval-00001-of-00010.tfrecords\n", 470 | "eval-00002-of-00010.tfrecords\n", 471 | "eval-00003-of-00010.tfrecords\n", 472 | "eval-00004-of-00010.tfrecords\n", 473 | "eval-00005-of-00010.tfrecords\n", 474 | "eval-00006-of-00010.tfrecords\n", 475 | "eval-00007-of-00010.tfrecords\n", 476 | "eval-00008-of-00010.tfrecords\n", 477 | "eval-00009-of-00010.tfrecords\n", 478 | "train-00000-of-00025.tfrecords\n", 479 | "train-00001-of-00025.tfrecords\n", 480 | "train-00002-of-00025.tfrecords\n", 481 | "train-00003-of-00025.tfrecords\n", 482 | "train-00004-of-00025.tfrecords\n", 483 | "train-00005-of-00025.tfrecords\n", 484 | "train-00006-of-00025.tfrecords\n", 485 | "train-00007-of-00025.tfrecords\n", 486 | "train-00008-of-00025.tfrecords\n", 487 | "train-00009-of-00025.tfrecords\n", 488 | "train-00010-of-00025.tfrecords\n", 489 | "train-00011-of-00025.tfrecords\n", 490 | "train-00012-of-00025.tfrecords\n", 491 | "train-00013-of-00025.tfrecords\n", 492 | "train-00014-of-00025.tfrecords\n", 493 | "train-00015-of-00025.tfrecords\n", 494 | "train-00016-of-00025.tfrecords\n", 495 | "train-00017-of-00025.tfrecords\n", 496 | "train-00018-of-00025.tfrecords\n", 497 | "train-00019-of-00025.tfrecords\n", 498 | "train-00020-of-00025.tfrecords\n", 499 | "train-00021-of-00025.tfrecords\n", 500 | "train-00022-of-00025.tfrecords\n", 501 | "train-00023-of-00025.tfrecords\n", 502 | "train-00024-of-00025.tfrecords\n", 503 | "\n", 504 | "** transform artefacts:\n", 505 | "transform_fn\n", 506 | "transformed_metadata\n", 507 | "\n", 508 | "** transform assets:\n", 509 | "vocab_string_to_int_uniques\n", 510 | "\n", 511 | "the\n", 512 | "a\n", 513 | "to\n", 514 | "for\n", 515 | "in\n", 516 | "of\n", 517 | "and\n", 518 | "s\n", 519 | "on\n", 520 | "with\n" 521 | ] 522 | } 523 | ], 524 | "source": [ 525 | "%%bash\n", 526 | "\n", 527 | "echo \"** transformed data:\"\n", 528 | "ls data/news/transformed\n", 529 | "echo \"\"\n", 530 | "\n", 531 | "echo \"** transform artefacts:\"\n", 532 | "ls models/news/transform\n", 533 | "echo \"\"\n", 534 | "\n", 535 | "echo \"** transform assets:\"\n", 536 | "ls models/news/transform/transform_fn/assets\n", 537 | "echo \"\"\n", 538 | "\n", 539 | "head models/news/transform/transform_fn/assets/vocab_string_to_int_uniques" 540 | ] 541 | }, 542 | { 543 | "cell_type": "code", 544 | "execution_count": null, 545 | "metadata": {}, 546 | "outputs": [], 547 | "source": [] 548 | } 549 | ], 550 | "metadata": { 551 | "kernelspec": { 552 | "display_name": "Python 2", 553 | "language": "python", 554 | "name": "python2" 555 | }, 556 | "language_info": { 557 | "codemirror_mode": { 558 | "name": "ipython", 559 | "version": 2 560 | }, 561 | "file_extension": ".py", 562 | "mimetype": "text/x-python", 563 | "name": "python", 564 | "nbconvert_exporter": "python", 565 | "pygments_lexer": "ipython2", 566 | "version": "2.7.10" 567 | } 568 | }, 569 | "nbformat": 4, 570 | "nbformat_minor": 2 571 | } 572 | -------------------------------------------------------------------------------- /08 - Text Analysis/06 - Part_4 - Text Classification - 
Hacker News - DNNClassifier with TF.IDF.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": null, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "# %%bash\n", 10 | "\n", 11 | "# pip install tensorflow==1.7\n", 12 | "# pip install tensorflow-transform" 13 | ] 14 | }, 15 | { 16 | "cell_type": "markdown", 17 | "metadata": {}, 18 | "source": [ 19 | "# Text Classification using TensorFlow and Google Cloud - Part 4\n", 20 | "\n", 21 | "The [bigquery-public-data:hacker_news](https://cloud.google.com/bigquery/public-data/hacker-news) dataset contains all stories and comments from Hacker News since its launch in 2006. Each story includes a story id, url, the title of the story, the author who made the post, when it was written, and the number of points the story received.\n", 22 | "\n", 23 | "The objective is to build an ML model that, given the title of a story, predicts the source of that story.\n", 24 | "\n", 25 | "## TF DNNClassifier with TF.IDF Text Representation\n", 26 | "\n", 27 | "This notebook illustrates how to build a premade TF estimator, namely DNNClassifier, where the input text is represented by the TF.IDF weights computed during the preprocessing phase in Part 1. The overall steps are as follows:\n", 28 | "\n", 29 | "\n", 30 | "1. Define the metadata\n", 31 | "2. Define the data input function\n", 32 | "3. Create the feature columns (using the TF.IDF weights)\n", 33 | "4. Create the premade DNNClassifier estimator\n", 34 | "5. Set up the experiment\n", 35 | " * Hyper-parameters & RunConfig\n", 36 | " * Serving function (for the exported model)\n", 37 | " * TrainSpec & EvalSpec\n", 38 | "6. Run the experiment\n", 39 | "7. Evaluate the model\n", 40 | "8. 
Use SavedModel for prediction\n", 41 | " \n", 42 | "\n" 43 | ] 44 | }, 45 | { 46 | "cell_type": "markdown", 47 | "metadata": {}, 48 | "source": [ 49 | "### Setting Global Parameters" 50 | ] 51 | }, 52 | { 53 | "cell_type": "code", 54 | "execution_count": 1, 55 | "metadata": {}, 56 | "outputs": [], 57 | "source": [ 58 | "import os\n", 59 | "\n", 60 | "class Params:\n", 61 | " pass\n", 62 | "\n", 63 | "# Set to run on GCP\n", 64 | "Params.GCP_PROJECT_ID = 'ksalama-gcp-playground'\n", 65 | "Params.REGION = 'europe-west1'\n", 66 | "Params.BUCKET = 'ksalama-gcs-cloudml'\n", 67 | "\n", 68 | "Params.PLATFORM = 'local' # local | GCP\n", 69 | "\n", 70 | "Params.DATA_DIR = 'data/news' if Params.PLATFORM == 'local' else 'gs://{}/data/news'.format(Params.BUCKET)\n", 71 | "\n", 72 | "Params.TRANSFORMED_DATA_DIR = os.path.join(Params.DATA_DIR, 'transformed')\n", 73 | "Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'train')\n", 74 | "Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX = os.path.join(Params.TRANSFORMED_DATA_DIR, 'eval')\n", 75 | "\n", 76 | "Params.TEMP_DIR = os.path.join(Params.DATA_DIR, 'tmp')\n", 77 | "\n", 78 | "Params.MODELS_DIR = 'models/news' if Params.PLATFORM == 'local' else 'gs://{}/models/news'.format(Params.BUCKET)\n", 79 | "\n", 80 | "Params.TRANSFORM_ARTEFACTS_DIR = os.path.join(Params.MODELS_DIR,'transform')\n", 81 | "\n", 82 | "Params.TRAIN = True\n", 83 | "\n", 84 | "Params.RESUME_TRAINING = False\n", 85 | "\n", 86 | "Params.EAGER = False\n", 87 | "\n", 88 | "if Params.EAGER:\n", 89 | " import tensorflow as tf # tf is not imported until the 'Importing libraries' cell below\n", " tf.enable_eager_execution()" 90 | ] 91 | }, 92 | { 93 | "cell_type": "markdown", 94 | "metadata": {}, 95 | "source": [ 96 | "### Importing libraries" 97 | ] 98 | }, 99 | { 100 | "cell_type": "code", 101 | "execution_count": 2, 102 | "metadata": {}, 103 | "outputs": [ 104 | { 105 | "name": "stdout", 106 | "output_type": "stream", 107 | "text": [ 108 | "WARNING:tensorflow:From /Users/khalidsalama/Technology/python-venvs/py27-venv/lib/python2.7/site-packages/tensorflow/contrib/learn/python/learn/datasets/base.py:198: retry (from tensorflow.contrib.learn.python.learn.datasets.base) is deprecated and will be removed in a future version.\n", 109 | "Instructions for updating:\n", 110 | "Use the retry module or similar alternatives.\n", 111 | "1.7.0\n" 112 | ] 113 | } 114 | ], 115 | "source": [ 116 | "import tensorflow as tf\n", 117 | "from tensorflow import data\n", 118 | "\n", 119 | "\n", 120 | "from tensorflow.contrib.learn.python.learn.utils import input_fn_utils\n", 121 | "from tensorflow_transform.beam.tft_beam_io import transform_fn_io\n", 122 | "from tensorflow_transform.tf_metadata import metadata_io\n", 123 | "from tensorflow_transform.tf_metadata import dataset_schema\n", 124 | "from tensorflow_transform.tf_metadata import dataset_metadata\n", 125 | "from tensorflow_transform.saved import saved_transform_io\n", 126 | "\n", 127 | "print tf.__version__" 128 | ] 129 | }, 130 | { 131 | "cell_type": "markdown", 132 | "metadata": {}, 133 | "source": [ 134 | "## 1. 
Define Metadata" 135 | ] 136 | }, 137 | { 138 | "cell_type": "code", 139 | "execution_count": 3, 140 | "metadata": {}, 141 | "outputs": [ 142 | { 143 | "name": "stdout", 144 | "output_type": "stream", 145 | "text": [ 146 | "{u'source': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), u'title': FixedLenFeature(shape=[], dtype=tf.string, default_value=None), u'weight': VarLenFeature(dtype=tf.float32), u'bow': VarLenFeature(dtype=tf.int64)}\n" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "RAW_HEADER = 'key,title,source'.split(',')\n", 152 | "RAW_DEFAULTS = [['NA'],['NA'],['NA']]\n", 153 | "TARGET_FEATURE_NAME = 'source'\n", 154 | "TARGET_LABELS = ['github', 'nytimes', 'techcrunch']\n", 155 | "TEXT_FEATURE_NAME = 'title'\n", 156 | "KEY_COLUMN = 'key'\n", 157 | "\n", 158 | "VOCAB_SIZE = 20000\n", 159 | "TRAIN_SIZE = 73124\n", 160 | "EVAL_SIZE = 23079\n", 161 | "\n", 162 | "DELIMITERS = '.,!?() '\n", 163 | "\n", 164 | "raw_metadata = dataset_metadata.DatasetMetadata(dataset_schema.Schema({\n", 165 | " KEY_COLUMN: dataset_schema.ColumnSchema(\n", 166 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 167 | " TEXT_FEATURE_NAME: dataset_schema.ColumnSchema(\n", 168 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 169 | " TARGET_FEATURE_NAME: dataset_schema.ColumnSchema(\n", 170 | " tf.string, [], dataset_schema.FixedColumnRepresentation()),\n", 171 | "}))\n", 172 | "\n", 173 | "\n", 174 | "transformed_metadata = metadata_io.read_metadata(\n", 175 | " os.path.join(Params.TRANSFORM_ARTEFACTS_DIR,\"transformed_metadata\"))\n", 176 | "\n", 177 | "raw_feature_spec = raw_metadata.schema.as_feature_spec()\n", 178 | "transformed_feature_spec = transformed_metadata.schema.as_feature_spec()\n", 179 | "\n", 180 | "print transformed_feature_spec" 181 | ] 182 | }, 183 | { 184 | "cell_type": "markdown", 185 | "metadata": {}, 186 | "source": [ 187 | "## 2. 
Define Input Function" 188 | ] 189 | }, 190 | { 191 | "cell_type": "code", 192 | "execution_count": 4, 193 | "metadata": {}, 194 | "outputs": [], 195 | "source": [ 196 | "def parse_tf_example(tf_example):\n", 197 | " \n", 198 | " parsed_features = tf.parse_single_example(serialized=tf_example, features=transformed_feature_spec)\n", 199 | " target = parsed_features.pop(TARGET_FEATURE_NAME)\n", 200 | " \n", 201 | " return parsed_features, target\n", 202 | "\n", 203 | "\n", 204 | "def generate_tfrecords_input_fn(files_pattern, \n", 205 | " mode=tf.estimator.ModeKeys.EVAL, \n", 206 | " num_epochs=1, \n", 207 | " batch_size=200):\n", 208 | " \n", 209 | " def _input_fn():\n", 210 | " \n", 211 | " file_names = data.Dataset.list_files(files_pattern)\n", 212 | "\n", 213 | " if Params.EAGER:\n", 214 | " print file_names\n", 215 | "\n", 216 | " dataset = data.TFRecordDataset(file_names )\n", 217 | "\n", 218 | " dataset = dataset.apply(\n", 219 | " tf.contrib.data.shuffle_and_repeat(count=num_epochs,\n", 220 | " buffer_size=batch_size*2)\n", 221 | " )\n", 222 | "\n", 223 | " dataset = dataset.apply(\n", 224 | " tf.contrib.data.map_and_batch(parse_tf_example, \n", 225 | " batch_size=batch_size, \n", 226 | " num_parallel_batches=2)\n", 227 | " )\n", 228 | "\n", 229 | " datset = dataset.prefetch(batch_size)\n", 230 | "\n", 231 | " if Params.EAGER:\n", 232 | " return dataset\n", 233 | "\n", 234 | " iterator = dataset.make_one_shot_iterator()\n", 235 | " features, target = iterator.get_next()\n", 236 | " return features, target\n", 237 | " \n", 238 | " return _input_fn" 239 | ] 240 | }, 241 | { 242 | "cell_type": "markdown", 243 | "metadata": {}, 244 | "source": [ 245 | "## 3. Create feature columns" 246 | ] 247 | }, 248 | { 249 | "cell_type": "code", 250 | "execution_count": 5, 251 | "metadata": {}, 252 | "outputs": [], 253 | "source": [ 254 | "BOW_FEATURE_NAME = 'bow'\n", 255 | "TFIDF_FEATURE_NAME = 'weight'\n", 256 | "\n", 257 | "def create_feature_columns():\n", 258 | " \n", 259 | " # Get word indecies from bow\n", 260 | " bow = tf.feature_column.categorical_column_with_identity(\n", 261 | " BOW_FEATURE_NAME, num_buckets=VOCAB_SIZE + 1)\n", 262 | " \n", 263 | " # Add weight to the word indecies\n", 264 | " weight_bow = tf.feature_column.weighted_categorical_column(\n", 265 | " bow, TFIDF_FEATURE_NAME)\n", 266 | " \n", 267 | " # Convert to indicator \n", 268 | " weight_bow_indicators = tf.feature_column.indicator_column(weight_bow)\n", 269 | " \n", 270 | " return [weight_bow_indicators]" 271 | ] 272 | }, 273 | { 274 | "cell_type": "markdown", 275 | "metadata": {}, 276 | "source": [ 277 | "## 4. Create a model using a premade DNNClassifer" 278 | ] 279 | }, 280 | { 281 | "cell_type": "code", 282 | "execution_count": 6, 283 | "metadata": {}, 284 | "outputs": [], 285 | "source": [ 286 | "def create_estimator(hparams, run_config):\n", 287 | " \n", 288 | " feature_columns = create_feature_columns()\n", 289 | " \n", 290 | " optimizer = tf.train.AdamOptimizer(learning_rate=hparams.learning_rate)\n", 291 | " \n", 292 | " estimator = tf.estimator.DNNClassifier(\n", 293 | " feature_columns=feature_columns,\n", 294 | " n_classes =len(TARGET_LABELS),\n", 295 | " label_vocabulary=TARGET_LABELS,\n", 296 | " hidden_units=hparams.hidden_units,\n", 297 | " optimizer=optimizer,\n", 298 | " config=run_config\n", 299 | " )\n", 300 | " \n", 301 | " \n", 302 | " return estimator" 303 | ] 304 | }, 305 | { 306 | "cell_type": "markdown", 307 | "metadata": {}, 308 | "source": [ 309 | "## 5. 
Setup Experiment" 310 | ] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "### 5.1 HParams and RunConfig" 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "execution_count": 7, 322 | "metadata": {}, 323 | "outputs": [ 324 | { 325 | "name": "stdout", 326 | "output_type": "stream", 327 | "text": [ 328 | "[('batch_size', 1000), ('hidden_units', [64, 32]), ('learning_rate', 0.01), ('max_steps', 730), ('num_epochs', 10), ('trainable_embedding', False)]\n", 329 | "\n", 330 | "('Model Directory:', 'models/news/dnn_estimator_tfidf')\n", 331 | "('Dataset Size:', 73124)\n", 332 | "('Batch Size:', 1000)\n", 333 | "('Steps per Epoch:', 73)\n", 334 | "('Total Steps:', 730)\n" 335 | ] 336 | } 337 | ], 338 | "source": [ 339 | "NUM_EPOCHS = 10\n", 340 | "BATCH_SIZE = 1000\n", 341 | "\n", 342 | "TOTAL_STEPS = (TRAIN_SIZE/BATCH_SIZE)*NUM_EPOCHS\n", 343 | "EVAL_EVERY_SEC = 60\n", 344 | "\n", 345 | "hparams = tf.contrib.training.HParams(\n", 346 | " num_epochs = NUM_EPOCHS,\n", 347 | " batch_size = BATCH_SIZE,\n", 348 | " learning_rate = 0.01,\n", 349 | " hidden_units=[64, 32],\n", 350 | " max_steps = TOTAL_STEPS,\n", 351 | "\n", 352 | ")\n", 353 | "\n", 354 | "MODEL_NAME = 'dnn_estimator_tfidf' \n", 355 | "model_dir = os.path.join(Params.MODELS_DIR, MODEL_NAME)\n", 356 | "\n", 357 | "run_config = tf.estimator.RunConfig(\n", 358 | " tf_random_seed=19830610,\n", 359 | " log_step_count_steps=1000,\n", 360 | " save_checkpoints_secs=EVAL_EVERY_SEC,\n", 361 | " keep_checkpoint_max=1,\n", 362 | " model_dir=model_dir\n", 363 | ")\n", 364 | "\n", 365 | "\n", 366 | "print(hparams)\n", 367 | "print(\"\")\n", 368 | "print(\"Model Directory:\", run_config.model_dir)\n", 369 | "print(\"Dataset Size:\", TRAIN_SIZE)\n", 370 | "print(\"Batch Size:\", BATCH_SIZE)\n", 371 | "print(\"Steps per Epoch:\",TRAIN_SIZE/BATCH_SIZE)\n", 372 | "print(\"Total Steps:\", TOTAL_STEPS)" 373 | ] 374 | }, 375 | { 376 | "cell_type": "markdown", 377 | "metadata": {}, 378 | "source": [ 379 | "### 5.2 Serving function" 380 | ] 381 | }, 382 | { 383 | "cell_type": "code", 384 | "execution_count": 8, 385 | "metadata": {}, 386 | "outputs": [], 387 | "source": [ 388 | "def generate_serving_input_fn():\n", 389 | " \n", 390 | " def _serving_fn():\n", 391 | " \n", 392 | " receiver_tensor = {\n", 393 | " 'title': tf.placeholder(dtype=tf.string, shape=[None])\n", 394 | " }\n", 395 | "\n", 396 | " _, transformed_features = (\n", 397 | " saved_transform_io.partially_apply_saved_transform(\n", 398 | " os.path.join(Params.TRANSFORM_ARTEFACTS_DIR, transform_fn_io.TRANSFORM_FN_DIR),\n", 399 | " receiver_tensor)\n", 400 | " )\n", 401 | " \n", 402 | " return tf.estimator.export.ServingInputReceiver(\n", 403 | " transformed_features, receiver_tensor)\n", 404 | " \n", 405 | " return _serving_fn" 406 | ] 407 | }, 408 | { 409 | "cell_type": "markdown", 410 | "metadata": {}, 411 | "source": [ 412 | "### 5.3 TrainSpec & EvalSpec" 413 | ] 414 | }, 415 | { 416 | "cell_type": "code", 417 | "execution_count": 9, 418 | "metadata": {}, 419 | "outputs": [], 420 | "source": [ 421 | "train_spec = tf.estimator.TrainSpec(\n", 422 | " input_fn = generate_tfrecords_input_fn(\n", 423 | " Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX+\"*\",\n", 424 | " mode = tf.estimator.ModeKeys.TRAIN,\n", 425 | " num_epochs=hparams.num_epochs,\n", 426 | " batch_size=hparams.batch_size\n", 427 | " ),\n", 428 | " max_steps=hparams.max_steps,\n", 429 | " hooks=None\n", 430 | ")\n", 431 | "\n", 432 | "eval_spec = tf.estimator.EvalSpec(\n", 433 | " 
input_fn = generate_tfrecords_input_fn(\n", 434 | " Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX+\"*\",\n", 435 | " mode=tf.estimator.ModeKeys.EVAL,\n", 436 | " num_epochs=1,\n", 437 | " batch_size=hparams.batch_size\n", 438 | " ),\n", 439 | " exporters=[tf.estimator.LatestExporter(\n", 440 | " name=\"estimate\", # the name of the folder under 'export' to which the model is exported\n", 441 | " serving_input_receiver_fn=generate_serving_input_fn(),\n", 442 | " exports_to_keep=1,\n", 443 | " as_text=False)],\n", 444 | " steps=None,\n", 445 | " throttle_secs=EVAL_EVERY_SEC\n", 446 | ")" 447 | ] 448 | }, 449 | { 450 | "cell_type": "markdown", 451 | "metadata": {}, 452 | "source": [ 453 | "## 6. Run experiment" 454 | ] 455 | }, 456 | { 457 | "cell_type": "code", 458 | "execution_count": 10, 459 | "metadata": {}, 460 | "outputs": [ 461 | { 462 | "name": "stdout", 463 | "output_type": "stream", 464 | "text": [ 465 | "Removing previous training artefacts...\n", 466 | "Experiment started at 16:13:21\n", 467 | ".......................................\n", 468 | "INFO:tensorflow:Using config: {'_save_checkpoints_secs': 60, '_session_config': None, '_keep_checkpoint_max': 1, '_tf_random_seed': 19830610, '_task_type': 'worker', '_global_id_in_cluster': 0, '_is_chief': True, '_cluster_spec': , '_model_dir': 'models/news/dnn_estimator_tfidf', '_num_worker_replicas': 1, '_task_id': 0, '_log_step_count_steps': 1000, '_master': '', '_save_checkpoints_steps': None, '_keep_checkpoint_every_n_hours': 10000, '_evaluation_master': '', '_service': None, '_save_summary_steps': 100, '_num_ps_replicas': 0}\n", 469 | "INFO:tensorflow:Running training and evaluation locally (non-distributed).\n", 470 | "INFO:tensorflow:Start train and evaluate loop. The evaluate will happen after 60 secs (eval_spec.throttle_secs) or training is finished.\n", 471 | "INFO:tensorflow:Calling model_fn.\n", 472 | "INFO:tensorflow:Done calling model_fn.\n", 473 | "INFO:tensorflow:Create CheckpointSaverHook.\n", 474 | "INFO:tensorflow:Graph was finalized.\n", 475 | "INFO:tensorflow:Running local_init_op.\n", 476 | "INFO:tensorflow:Done running local_init_op.\n", 477 | "INFO:tensorflow:Saving checkpoints for 1 into models/news/dnn_estimator_tfidf/model.ckpt.\n", 478 | "INFO:tensorflow:loss = 1098.7266, step = 1\n", 479 | "INFO:tensorflow:loss = 213.40088, step = 101 (15.307 sec)\n", 480 | "INFO:tensorflow:loss = 147.65674, step = 201 (13.971 sec)\n", 481 | "INFO:tensorflow:loss = 71.7646, step = 301 (15.121 sec)\n", 482 | "INFO:tensorflow:Saving checkpoints for 392 into models/news/dnn_estimator_tfidf/model.ckpt.\n", 483 | "INFO:tensorflow:Loss for final step: 26.048763.\n", 484 | "INFO:tensorflow:Calling model_fn.\n", 485 | "INFO:tensorflow:Done calling model_fn.\n", 486 | "INFO:tensorflow:Starting evaluation at 2018-05-14-16:14:22\n", 487 | "INFO:tensorflow:Graph was finalized.\n", 488 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n", 489 | "INFO:tensorflow:Running local_init_op.\n", 490 | "INFO:tensorflow:Done running local_init_op.\n", 491 | "INFO:tensorflow:Finished evaluation at 2018-05-14-16:14:25\n", 492 | "INFO:tensorflow:Saving dict for global step 392: accuracy = 0.8243858, average_loss = 0.94847244, global_step = 392, loss = 912.07477\n", 493 | "WARNING:tensorflow:Expected binary or unicode string, got type_url: \"type.googleapis.com/tensorflow.AssetFileDef\"\n", 494 | "value: \"\\n\\t\\n\\007Const:0\\022\\033vocab_string_to_int_uniques\"\n", 495 | "\n", 496 | 
"INFO:tensorflow:Calling model_fn.\n", 497 | "INFO:tensorflow:Done calling model_fn.\n", 498 | "INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']\n", 499 | "INFO:tensorflow:Signatures INCLUDED in export for Regress: None\n", 500 | "INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']\n", 501 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n", 502 | "INFO:tensorflow:Assets added to graph.\n", 503 | "INFO:tensorflow:Assets written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314465/assets\n", 504 | "INFO:tensorflow:SavedModel written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314465/saved_model.pb\n", 505 | "INFO:tensorflow:Calling model_fn.\n", 506 | "INFO:tensorflow:Done calling model_fn.\n", 507 | "INFO:tensorflow:Create CheckpointSaverHook.\n", 508 | "INFO:tensorflow:Graph was finalized.\n", 509 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-392\n", 510 | "INFO:tensorflow:Running local_init_op.\n", 511 | "INFO:tensorflow:Done running local_init_op.\n", 512 | "INFO:tensorflow:Saving checkpoints for 393 into models/news/dnn_estimator_tfidf/model.ckpt.\n", 513 | "INFO:tensorflow:loss = 27.088547, step = 393\n", 514 | "INFO:tensorflow:loss = 2.9095829, step = 493 (13.979 sec)\n", 515 | "INFO:tensorflow:loss = 4.3351374, step = 593 (13.651 sec)\n", 516 | "INFO:tensorflow:loss = 11.017786, step = 693 (14.415 sec)\n", 517 | "INFO:tensorflow:Saving checkpoints for 730 into models/news/dnn_estimator_tfidf/model.ckpt.\n", 518 | "INFO:tensorflow:Loss for final step: 3.2552278.\n", 519 | "INFO:tensorflow:Calling model_fn.\n", 520 | "INFO:tensorflow:Done calling model_fn.\n", 521 | "INFO:tensorflow:Starting evaluation at 2018-05-14-16:15:15\n", 522 | "INFO:tensorflow:Graph was finalized.\n", 523 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-730\n", 524 | "INFO:tensorflow:Running local_init_op.\n", 525 | "INFO:tensorflow:Done running local_init_op.\n", 526 | "INFO:tensorflow:Finished evaluation at 2018-05-14-16:15:17\n", 527 | "INFO:tensorflow:Saving dict for global step 730: accuracy = 0.82416916, average_loss = 1.344607, global_step = 730, loss = 1293.0077\n", 528 | "WARNING:tensorflow:Expected binary or unicode string, got type_url: \"type.googleapis.com/tensorflow.AssetFileDef\"\n", 529 | "value: \"\\n\\t\\n\\007Const:0\\022\\033vocab_string_to_int_uniques\"\n", 530 | "\n", 531 | "INFO:tensorflow:Calling model_fn.\n", 532 | "INFO:tensorflow:Done calling model_fn.\n", 533 | "INFO:tensorflow:Signatures INCLUDED in export for Classify: ['serving_default', 'classification']\n", 534 | "INFO:tensorflow:Signatures INCLUDED in export for Regress: None\n", 535 | "INFO:tensorflow:Signatures INCLUDED in export for Predict: ['predict']\n", 536 | "INFO:tensorflow:Restoring parameters from models/news/dnn_estimator_tfidf/model.ckpt-730\n", 537 | "INFO:tensorflow:Assets added to graph.\n", 538 | "INFO:tensorflow:Assets written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314518/assets\n", 539 | "INFO:tensorflow:SavedModel written to: models/news/dnn_estimator_tfidf/export/estimate/temp-1526314518/saved_model.pb\n", 540 | ".......................................\n", 541 | "Experiment finished at 16:15:18\n", 542 | "\n", 543 | "Experiment elapsed time: 117.021302 seconds\n" 544 | ] 545 | } 546 | ], 547 | "source": [ 548 | "from datetime import datetime\n", 549 | "import 
shutil\n", 550 | "\n", 551 | "if Params.TRAIN:\n", 552 | " if not Params.RESUME_TRAINING:\n", 553 | " print(\"Removing previous training artefacts...\")\n", 554 | " shutil.rmtree(model_dir, ignore_errors=True)\n", 555 | " else:\n", 556 | " print(\"Resuming training...\") \n", 557 | "\n", 558 | "\n", 559 | " tf.logging.set_verbosity(tf.logging.INFO)\n", 560 | "\n", 561 | " time_start = datetime.utcnow() \n", 562 | " print(\"Experiment started at {}\".format(time_start.strftime(\"%H:%M:%S\")))\n", 563 | " print(\".......................................\") \n", 564 | "\n", 565 | " estimator = create_estimator(hparams, run_config)\n", 566 | "\n", 567 | " tf.estimator.train_and_evaluate(\n", 568 | " estimator=estimator,\n", 569 | " train_spec=train_spec, \n", 570 | " eval_spec=eval_spec\n", 571 | " )\n", 572 | "\n", 573 | " time_end = datetime.utcnow() \n", 574 | " print(\".......................................\")\n", 575 | " print(\"Experiment finished at {}\".format(time_end.strftime(\"%H:%M:%S\")))\n", 576 | " print(\"\")\n", 577 | " time_elapsed = time_end - time_start\n", 578 | " print(\"Experiment elapsed time: {} seconds\".format(time_elapsed.total_seconds()))\n", 579 | "else:\n", 580 | " print \"Training was skipped!\"" 581 | ] 582 | }, 583 | { 584 | "cell_type": "markdown", 585 | "metadata": {}, 586 | "source": [ 587 | "## 7. Evaluate the model" 588 | ] 589 | }, 590 | { 591 | "cell_type": "code", 592 | "execution_count": 11, 593 | "metadata": {}, 594 | "outputs": [ 595 | { 596 | "name": "stdout", 597 | "output_type": "stream", 598 | "text": [ 599 | "############################################################################################\n", 600 | "# Train Measures: {'average_loss': 0.0037224626, 'accuracy': 0.99904275, 'global_step': 730, 'loss': 272.20135}\n", 601 | "############################################################################################\n", 602 | "\n", 603 | "############################################################################################\n", 604 | "# Eval Measures: {'average_loss': 1.3446056, 'accuracy': 0.82416916, 'global_step': 730, 'loss': 31032.152}\n", 605 | "############################################################################################\n" 606 | ] 607 | } 608 | ], 609 | "source": [ 610 | "tf.logging.set_verbosity(tf.logging.ERROR)\n", 611 | "\n", 612 | "estimator = create_estimator(hparams, run_config)\n", 613 | "\n", 614 | "train_metrics = estimator.evaluate(\n", 615 | " input_fn = generate_tfrecords_input_fn(\n", 616 | " files_pattern= Params.TRANSFORMED_TRAIN_DATA_FILE_PREFIX+\"*\", \n", 617 | " mode= tf.estimator.ModeKeys.EVAL,\n", 618 | " batch_size= TRAIN_SIZE), \n", 619 | " steps=1\n", 620 | ")\n", 621 | "\n", 622 | "\n", 623 | "print(\"############################################################################################\")\n", 624 | "print(\"# Train Measures: {}\".format(train_metrics))\n", 625 | "print(\"############################################################################################\")\n", 626 | "\n", 627 | "eval_metrics = estimator.evaluate(\n", 628 | " input_fn=generate_tfrecords_input_fn(\n", 629 | " files_pattern= Params.TRANSFORMED_EVAL_DATA_FILE_PREFIX+\"*\", \n", 630 | " mode= tf.estimator.ModeKeys.EVAL,\n", 631 | " batch_size= EVAL_SIZE), \n", 632 | " steps=1\n", 633 | ")\n", 634 | "print(\"\")\n", 635 | "print(\"############################################################################################\")\n", 636 | "print(\"# Eval Measures: {}\".format(eval_metrics))\n", 637 | 
"print(\"############################################################################################\")\n" 638 | ] 639 | }, 640 | { 641 | "cell_type": "markdown", 642 | "metadata": {}, 643 | "source": [ 644 | "## 8. Use Saved Model for Predictions" 645 | ] 646 | }, 647 | { 648 | "cell_type": "code", 649 | "execution_count": 12, 650 | "metadata": {}, 651 | "outputs": [ 652 | { 653 | "name": "stdout", 654 | "output_type": "stream", 655 | "text": [ 656 | "models/news/dnn_estimator_tfidf/export/estimate/1526314518\n", 657 | "\n", 658 | "{u'probabilities': array([[0.96217114, 0.01375495, 0.02407398],\n", 659 | " [0.02322701, 0.39720485, 0.5795681 ],\n", 660 | " [0.03017025, 0.9552083 , 0.01462139]], dtype=float32), u'class_ids': array([[0],\n", 661 | " [2],\n", 662 | " [1]]), u'classes': array([['github'],\n", 663 | " ['techcrunch'],\n", 664 | " ['nytimes']], dtype=object), u'logits': array([[ 2.4457023, -1.8020908, -1.2423583],\n", 665 | " [-2.1229138, 0.7162221, 1.0940531],\n", 666 | " [-0.9709409, 2.4841323, -1.6953117]], dtype=float32)}\n" 667 | ] 668 | } 669 | ], 670 | "source": [ 671 | "import os\n", 672 | "\n", 673 | "export_dir = model_dir +\"/export/estimate/\"\n", 674 | "saved_model_dir = os.path.join(export_dir, os.listdir(export_dir)[0])\n", 675 | "\n", 676 | "print(saved_model_dir)\n", 677 | "print(\"\")\n", 678 | "\n", 679 | "predictor_fn = tf.contrib.predictor.from_saved_model(\n", 680 | " export_dir = saved_model_dir,\n", 681 | " signature_def_key=\"predict\"\n", 682 | ")\n", 683 | "\n", 684 | "output = predictor_fn(\n", 685 | " {\n", 686 | " 'title':[\n", 687 | " 'Microsoft and Google are joining forces for a new AI framework',\n", 688 | " 'A new version of Python is mind blowing',\n", 689 | " 'EU is investigating new data privacy policies'\n", 690 | " ]\n", 691 | " \n", 692 | " }\n", 693 | ")\n", 694 | "print(output)" 695 | ] 696 | }, 697 | { 698 | "cell_type": "code", 699 | "execution_count": null, 700 | "metadata": {}, 701 | "outputs": [], 702 | "source": [] 703 | } 704 | ], 705 | "metadata": { 706 | "kernelspec": { 707 | "display_name": "Python 2", 708 | "language": "python", 709 | "name": "python2" 710 | }, 711 | "language_info": { 712 | "codemirror_mode": { 713 | "name": "ipython", 714 | "version": 2 715 | }, 716 | "file_extension": ".py", 717 | "mimetype": "text/x-python", 718 | "name": "python", 719 | "nbconvert_exporter": "python", 720 | "pygments_lexer": "ipython2", 721 | "version": "2.7.10" 722 | } 723 | }, 724 | "nbformat": 4, 725 | "nbformat_minor": 2 726 | } 727 | -------------------------------------------------------------------------------- /08 - Text Analysis/data/sms-spam/n_words.tsv: -------------------------------------------------------------------------------- 1 | 11330 -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | # TensorFlow Estimator APIs Tutorials - TensorFlow v1.4 2 | 3 | ## The tutorials use the TF estimator APIs to cover: 4 | 5 | * Various ML tasks, currently covering: 6 | * Classification 7 | * Regression 8 | * Clustering (k-means) 9 | * Time-series Analysis (AR Models) 10 | * Dimensionality Reduction (Autoencoding) 11 | * Sequence Models (RNN and LSTMs) 12 | * Image Analysis (CNN for Image Classification) 13 | * Text Analysis (Text Classification with embeddings, CNN, and RNN) 14 | * How to use **canned estimators** to train ML models. 15 | 16 | * How to implement **custom estimators** (model_fn & EstimatorSpec). 
17 | 18 | * A standard **metadata-driven** approach to build the model **feature_column**(s) including: 19 | * **numerical** features 20 | * **categorical** features with **vocabulary**, 21 | * **categorical** features **hash bucket**, and 22 | * **categorical** features with **identity** 23 | 24 | * Data **input pipelines** (input_fn) using: 25 | * tf.estimator.inputs.**pandas_input_fn**, 26 | * tf.train.**string_input_producer**, and 27 | * tf.data.**Dataset** APIs to read both **.csv** and **.tfrecords** (tf.example) data files 28 | * tf.contrib.timeseries.**RandomWindowInputFn** and **WholeDatasetInputFn** for time-series data 29 | * Feature **preprocessing** and **creation** as part of reading data (input_fn), for example, sin, sqrt, polynomial expansion, Fourier transform, log, boolean comparisons, Euclidean distance, custom formulas, etc. 30 | 31 | * A standard approach to prepare **wide** (sparse) and **deep** (dense) feature_column(s) for Wide and Deep **DNN Linear Combined Models** 32 | 33 | * The use of **normalizer_fn** in numeric_column() to **scale** the numeric features using pre-computed statistics (for Min-Max or Standard scaling) 34 | 35 | * The use of **weight_column** in the canned estimators, and in the loss metric in custom estimators. 36 | 37 | * Implicit **Feature Engineering** as part of defining feature_column(s), including: 38 | * crossing, 39 | * clipping, 40 | * embedding, 41 | * indicators (encoding categorical features), and 42 | * bucketization 43 | * How to use the tf.contrib.learn.**Experiment** APIs to train, evaluate, and export models 44 | 45 | * How to use the tf.estimator.**train_and_evaluate** function (along with TrainSpec & EvalSpec) to train, evaluate, and export models 46 | 47 | * How to use the **tf.train.exponential_decay** function as a learning-rate scheduler 48 | 49 | * How to **serve** an exported model (export_savedmodel) using **csv** and **json** inputs 50 | ## Coming Soon: 51 | * Early-stopping implementation 52 | * DynamicRnnEstimator and the use of variable-length sequences 53 | * Collaborative Filtering for Recommendation Models 54 | * Text Analysis (Topic Models, Word/Doc embedding, etc.) 55 | * tf.Transform for preprocessing and feature engineering 56 | * Keras examples 57 | 58 | 59 | 60 | 61 | -------------------------------------------------------------------------------- /images/exp-api2.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/ksalama/tf-estimator-tutorials/cecfea0c378ebc8552941c9ebf8a530228dd845d/images/exp-api2.png --------------------------------------------------------------------------------