├── README.md ├── notebooks ├── hyperparameter_optimization_skopt.ipynb └── hypertuning_kerastuner.ipynb └── slides └── hyperparemeter_optimization.pdf /README.md: -------------------------------------------------------------------------------- 1 | # Data Science Demos 2 | ## About 3 | Repository containing data science concepts and ideas presented in Jupyter notebooks, 4 | write-ups, and sometimes presentation slides. 5 | 6 | Each concept is linked to the associated write-up on my blog hosted by Medium. 7 | The notebook is linked below each concept name. 8 | 9 | ## Installation 10 | Environments differ from notebook to notebook. At the beginning of each notebook, 11 | the package versions pertaining to the concept or idea are printed out for ease 12 | of reproducibility. 13 | 14 | ## Concepts 15 | - [Hyperparameter Optimization with Scikit-Learn, Scikit-Opt, and Keras](https://towardsdatascience.com/hyperparameter-optimization-with-scikit-learn-scikit-opt-and-keras-f13367f3e796) 16 | - [Notebook](https://github.com/lukenew2/ds-demos/blob/master/notebooks/hyper_parameter_optimization.ipynb) 17 | - [HyperTuning with KerasTuner and TensorFlow](https://towardsdatascience.com/hyperparameter-tuning-with-kerastuner-and-tensorflow-c4a4d690b31a) 18 | - [Notebook](https://github.com/lukenew2/ds-demos/blob/master/notebooks/hypertuning_kerastuner.ipynb) -------------------------------------------------------------------------------- /notebooks/hyperparameter_optimization_skopt.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "code", 5 | "execution_count": 49, 6 | "metadata": {}, 7 | "outputs": [], 8 | "source": [ 9 | "import os\n", 10 | "import matplotlib.pyplot as plt\n", 11 | "import numpy as np\n", 12 | "\n", 13 | "PROJECT_ROOT_DIR = \".\"\n", 14 | "IMAGES_PATH = os.path.join(PROJECT_ROOT_DIR, \"images\")\n", 15 | "os.makedirs(IMAGES_PATH, exist_ok=True)\n", 16 | "\n", 17 | "def save_fig(fig_id, 
tight_layout=True, fig_extension=\"png\", resolution=300):\n", 18 | " path = os.path.join(IMAGES_PATH, fig_id + \".\" + fig_extension)\n", 19 | " print(\"Saving figure\", fig_id)\n", 20 | " if tight_layout:\n", 21 | " plt.tight_layout()\n", 22 | " plt.savefig(path, format=fig_extension, dpi=resolution)" 23 | ] 24 | }, 25 | { 26 | "cell_type": "markdown", 27 | "metadata": {}, 28 | "source": [ 29 | "# Hyperparameter Optimization\n", 30 | "\n", 31 | "Hyperparameter optimization is often one of the final steps in a data science project. Once you have a shortlist of promising models you will want to fine-tune them so that they perform better on your particular dataset. \n", 32 | "\n", 33 | "In this notebook we will go over three techniques used to find optimal hyperparameters with examples on how to implement them on models in Scikit-Learn and then finally neural networks in Keras. The three techniques we will discuss are as follows:\n", 34 | "\n", 35 | "* Grid Search\n", 36 | "* Randomized Search\n", 37 | "* Bayesian Optimization" 38 | ] 39 | }, 40 | { 41 | "cell_type": "markdown", 42 | "metadata": {}, 43 | "source": [ 44 | "Let's start by loading the MNIST dataset. Keras has a number of functions to load popular datasets in keras.datasets. The dataset is already split for you between a training set and a test set." 
45 | ] 46 | }, 47 | { 48 | "cell_type": "code", 49 | "execution_count": 21, 50 | "metadata": {}, 51 | "outputs": [], 52 | "source": [ 53 | "import tensorflow as tf\n", 54 | "from tensorflow import keras" 55 | ] 56 | }, 57 | { 58 | "cell_type": "code", 59 | "execution_count": 94, 60 | "metadata": {}, 61 | "outputs": [], 62 | "source": [ 63 | "mnist = keras.datasets.mnist\n", 64 | "(X_train, y_train), (X_test, y_test) = mnist.load_data()" 65 | ] 66 | }, 67 | { 68 | "cell_type": "markdown", 69 | "metadata": {}, 70 | "source": [ 71 | "The dataset contains 60,000 grayscale images, each 28x28 pixels:" 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": 28, 77 | "metadata": {}, 78 | "outputs": [ 79 | { 80 | "data": { 81 | "text/plain": [ 82 | "(60000, 28, 28)" 83 | ] 84 | }, 85 | "execution_count": 28, 86 | "metadata": {}, 87 | "output_type": "execute_result" 88 | } 89 | ], 90 | "source": [ 91 | "X_train.shape" 92 | ] 93 | }, 94 | { 95 | "cell_type": "code", 96 | "execution_count": 33, 97 | "metadata": {}, 98 | "outputs": [ 99 | { 100 | "data": { 101 | "text/plain": [ 102 | "(10000, 28, 28)" 103 | ] 104 | }, 105 | "execution_count": 33, 106 | "metadata": {}, 107 | "output_type": "execute_result" 108 | } 109 | ], 110 | "source": [ 111 | "X_test.shape" 112 | ] 113 | }, 114 | { 115 | "cell_type": "markdown", 116 | "metadata": {}, 117 | "source": [ 118 | "Each pixel intensity is represented as a byte (0 to 255):" 119 | ] 120 | }, 121 | { 122 | "cell_type": "code", 123 | "execution_count": 29, 124 | "metadata": {}, 125 | "outputs": [ 126 | { 127 | "data": { 128 | "text/plain": [ 129 | "dtype('uint8')" 130 | ] 131 | }, 132 | "execution_count": 29, 133 | "metadata": {}, 134 | "output_type": "execute_result" 135 | } 136 | ], 137 | "source": [ 138 | "X_train.dtype" 139 | ] 140 | }, 141 | { 142 | "cell_type": "markdown", 143 | "metadata": {}, 144 | "source": [ 145 | "To give you a feel for the complexity of the classification task, the figure below shows a few 
images from the MNIST dataset:" 146 | ] 147 | }, 148 | { 149 | "cell_type": "code", 150 | "execution_count": 30, 151 | "metadata": {}, 152 | "outputs": [ 153 | { 154 | "name": "stdout", 155 | "output_type": "stream", 156 | "text": [ 157 | "Saving figure mnist_plot\n" 158 | ] 159 | }, 160 | { 161 | "data": { 162 | "image/png": "\n", 163 | "text/plain": [ 164 | "
" 165 | ] 166 | }, 167 | "metadata": { 168 | "needs_background": "light" 169 | }, 170 | "output_type": "display_data" 171 | } 172 | ], 173 | "source": [ 174 | "n_rows = 4\n", 175 | "n_cols = 10\n", 176 | "plt.figure(figsize=(n_cols * 1.2, n_rows * 1.2))\n", 177 | "for row in range(n_rows):\n", 178 | " for col in range(n_cols):\n", 179 | " index = n_cols * row + col\n", 180 | " plt.subplot(n_rows, n_cols, index + 1)\n", 181 | " plt.imshow(X_train[index], cmap=\"binary\", interpolation=\"nearest\")\n", 182 | " plt.axis('off')\n", 183 | " plt.title(y_train[index], fontsize=12)\n", 184 | "plt.subplots_adjust(wspace=0.2, hspace=0.5)\n", 185 | "save_fig(\"mnist_plot\")\n", 186 | "plt.show()" 187 | ] 188 | }, 189 | { 190 | "cell_type": "markdown", 191 | "metadata": {}, 192 | "source": [ 193 | "Reshape the dataset into a 2-dimensional array: 60,000 for the number of instances and 784 because 28 x 28 = 784:" 194 | ] 195 | }, 196 | { 197 | "cell_type": "code", 198 | "execution_count": 95, 199 | "metadata": {}, 200 | "outputs": [], 201 | "source": [ 202 | "X_train = X_train.reshape(60000, 784)\n", 203 | "X_test = X_test.reshape(10000, 784)" 204 | ] 205 | }, 206 | { 207 | "cell_type": "markdown", 208 | "metadata": {}, 209 | "source": [ 210 | "# Grid Search\n", 211 | "\n", 212 | "One option would be to fiddle around with the hyperparameters manually, until you find a great combination of hyperparameter values that optimize your performance metric. This would be very tedious work, and you may not have time to explore many combinations. \n", 213 | "\n", 214 | "Instead, you should get Scikit-Learn's ```GridSearchCV``` to do it for you. All you have to do is tell it which hyperparameters you want to experiment with and what values to try out, and it will use cross-validation to evaluate all the possible combinations of hyperparameter values. 
\n", 215 | "\n", 216 | "Let's work through an example where we use ```GridSearchCV``` to search for the best combination of hyperparameter values for a RandomForestClassifier." 217 | ] 218 | }, 219 | { 220 | "cell_type": "code", 221 | "execution_count": 39, 222 | "metadata": {}, 223 | "outputs": [ 224 | { 225 | "name": "stdout", 226 | "output_type": "stream", 227 | "text": [ 228 | "Fitting 5 folds for each of 32 candidates, totalling 160 fits\n" 229 | ] 230 | }, 231 | { 232 | "name": "stderr", 233 | "output_type": "stream", 234 | "text": [ 235 | "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n", 236 | "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 7.6min\n", 237 | "[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed: 43.6min finished\n" 238 | ] 239 | }, 240 | { 241 | "data": { 242 | "text/plain": [ 243 | "GridSearchCV(cv=5, error_score='raise-deprecating',\n", 244 | " estimator=RandomForestClassifier(bootstrap=True, class_weight=None,\n", 245 | " criterion='gini', max_depth=None,\n", 246 | " max_features='auto',\n", 247 | " max_leaf_nodes=None,\n", 248 | " min_impurity_decrease=0.0,\n", 249 | " min_impurity_split=None,\n", 250 | " min_samples_leaf=1,\n", 251 | " min_samples_split=2,\n", 252 | " min_weight_fraction_leaf=0.0,\n", 253 | " n_estimators='warn', n_jobs=None,\n", 254 | " oob_score=False,\n", 255 | " random_state=None, verbose=0,\n", 256 | " warm_start=False),\n", 257 | " iid='warn', n_jobs=-1,\n", 258 | " param_grid=[{'bootstrap': [True], 'max_depth': [6, 10],\n", 259 | " 'max_features': ['auto', 'sqrt'],\n", 260 | " 'min_samples_leaf': [3, 5],\n", 261 | " 'min_samples_split': [4, 6],\n", 262 | " 'n_estimators': [100, 350]}],\n", 263 | " pre_dispatch='2*n_jobs', refit=True, return_train_score=True,\n", 264 | " scoring='accuracy', verbose=True)" 265 | ] 266 | }, 267 | "execution_count": 39, 268 | "metadata": {}, 269 | "output_type": "execute_result" 270 | } 271 | ], 272 | "source": [ 273 | "from sklearn.ensemble import 
RandomForestClassifier\n", 274 | "from sklearn.model_selection import GridSearchCV\n", 275 | "\n", 276 | "param_grid = [{'bootstrap': [True],\n", 277 | " 'max_depth': [6, 10],\n", 278 | " 'max_features': ['auto', 'sqrt'],\n", 279 | " 'min_samples_leaf': [3, 5],\n", 280 | " 'min_samples_split': [4, 6],\n", 281 | " 'n_estimators': [100, 350]}\n", 282 | " ]\n", 283 | "\n", 284 | "forest_clf = RandomForestClassifier()\n", 285 | "\n", 286 | "forest_grid_search = GridSearchCV(forest_clf, param_grid, cv=5,\n", 287 | " scoring=\"accuracy\",\n", 288 | " return_train_score=True,\n", 289 | " verbose=True,\n", 290 | " n_jobs=-1)\n", 291 | "\n", 292 | "forest_grid_search.fit(X_train, y_train)" 293 | ] 294 | }, 295 | { 296 | "cell_type": "markdown", 297 | "metadata": {}, 298 | "source": [ 299 | "The `param_grid` tells Scikit-Learn to evaluate 1 x 2 x 2 x 2 x 2 x 2 = 32 combinations of the specified `bootstrap`, `max_depth`, `max_features`, `min_samples_leaf`, `min_samples_split` and `n_estimators` hyperparameter values. The grid search will train a RandomForestClassifier for each of these 32 combinations, and it will train each model 5 times (since we are using five-fold cross-validation). In other words, all in all, there will be 32 x 5 = 160 rounds of training! 
It may take a long time, but when it is done you can get the best combination of hyperparameters like this:" 300 | ] 301 | }, 302 | { 303 | "cell_type": "code", 304 | "execution_count": 41, 305 | "metadata": {}, 306 | "outputs": [ 307 | { 308 | "data": { 309 | "text/plain": [ 310 | "{'bootstrap': True,\n", 311 | " 'max_depth': 10,\n", 312 | " 'max_features': 'auto',\n", 313 | " 'min_samples_leaf': 3,\n", 314 | " 'min_samples_split': 4,\n", 315 | " 'n_estimators': 350}" 316 | ] 317 | }, 318 | "execution_count": 41, 319 | "metadata": {}, 320 | "output_type": "execute_result" 321 | } 322 | ], 323 | "source": [ 324 | "forest_grid_search.best_params_" 325 | ] 326 | }, 327 | { 328 | "cell_type": "markdown", 329 | "metadata": {}, 330 | "source": [ 331 | "Since n_estimators=350 and max_depth=10 are the maximum values that were evaluated, you should probably try searching again with higher values; the score may continue to improve.\n", 332 | "\n", 333 | "You can also get the best estimator directly:" 334 | ] 335 | }, 336 | { 337 | "cell_type": "code", 338 | "execution_count": 42, 339 | "metadata": {}, 340 | "outputs": [ 341 | { 342 | "data": { 343 | "text/plain": [ 344 | "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 345 | " max_depth=10, max_features='auto', max_leaf_nodes=None,\n", 346 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 347 | " min_samples_leaf=3, min_samples_split=4,\n", 348 | " min_weight_fraction_leaf=0.0, n_estimators=350,\n", 349 | " n_jobs=None, oob_score=False, random_state=None,\n", 350 | " verbose=0, warm_start=False)" 351 | ] 352 | }, 353 | "execution_count": 42, 354 | "metadata": {}, 355 | "output_type": "execute_result" 356 | } 357 | ], 358 | "source": [ 359 | "forest_grid_search.best_estimator_" 360 | ] 361 | }, 362 | { 363 | "cell_type": "markdown", 364 | "metadata": {}, 365 | "source": [ 366 | "And of course the evaluation score is also available:" 367 | ] 368 | }, 369 | { 370 | "cell_type": "code", 
371 | "execution_count": 58, 372 | "metadata": {}, 373 | "outputs": [ 374 | { 375 | "data": { 376 | "text/plain": [ 377 | "0.9459" 378 | ] 379 | }, 380 | "execution_count": 58, 381 | "metadata": {}, 382 | "output_type": "execute_result" 383 | } 384 | ], 385 | "source": [ 386 | "forest_grid_search.best_score_" 387 | ] 388 | }, 389 | { 390 | "cell_type": "markdown", 391 | "metadata": {}, 392 | "source": [ 393 | "Our best score here is an accuracy of 0.9459, which is not bad for such a small parameter grid." 394 | ] 395 | }, 396 | { 397 | "cell_type": "markdown", 398 | "metadata": {}, 399 | "source": [ 400 | "# Randomized Search\n", 401 | "\n", 402 | "The grid search approach is fine when you are exploring relatively few combinations, like in the previous example, but when the hyperparameter space is large, it is often preferable to use `RandomizedSearchCV` instead. This class can be used in much the same way as the `GridSearchCV` class, but instead of trying out all possible combinations, it evaluates a given number of random combinations by selecting a random value for each hyperparameter at every iteration. This approach has two main benefits:\n", 403 | "* If you let randomized search run for, say, 1,000 iterations, this approach can explore up to 1,000 different values for each hyperparameter (instead of just a few values per hyperparameter with the grid search approach). \n", 404 | "* Simply by setting the number of iterations, you have more control over the computing budget you want to allocate to hyperparameter search.\n", 405 | "\n", 406 | "Let's walk through the same example as before but instead use `RandomizedSearchCV`. 
Since we are using `RandomizedSearchCV` we can search a larger param space than we did with `GridSearchCV`:" 407 | ] 408 | }, 409 | { 410 | "cell_type": "code", 411 | "execution_count": 53, 412 | "metadata": {}, 413 | "outputs": [ 414 | { 415 | "name": "stdout", 416 | "output_type": "stream", 417 | "text": [ 418 | "Fitting 5 folds for each of 32 candidates, totalling 160 fits\n" 419 | ] 420 | }, 421 | { 422 | "name": "stderr", 423 | "output_type": "stream", 424 | "text": [ 425 | "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n", 426 | "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 14.6min\n", 427 | "[Parallel(n_jobs=-1)]: Done 160 out of 160 | elapsed: 81.1min finished\n" 428 | ] 429 | }, 430 | { 431 | "data": { 432 | "text/plain": [ 433 | "RandomizedSearchCV(cv=5, error_score='raise-deprecating',\n", 434 | " estimator=RandomForestClassifier(bootstrap=True,\n", 435 | " class_weight=None,\n", 436 | " criterion='gini',\n", 437 | " max_depth=None,\n", 438 | " max_features='auto',\n", 439 | " max_leaf_nodes=None,\n", 440 | " min_impurity_decrease=0.0,\n", 441 | " min_impurity_split=None,\n", 442 | " min_samples_leaf=1,\n", 443 | " min_samples_split=2,\n", 444 | " min_weight_fraction_leaf=0.0,\n", 445 | " n_estimators='warn',\n", 446 | " n_jobs=None,\n", 447 | " oob_sc...\n", 448 | " iid='warn', n_iter=32, n_jobs=-1,\n", 449 | " param_distributions={'bootstrap': [True],\n", 450 | " 'max_depth': [6, 8, 10, 12, 14],\n", 451 | " 'max_features': ['auto', 'sqrt',\n", 452 | " 'log2'],\n", 453 | " 'min_samples_leaf': [2, 3, 4],\n", 454 | " 'min_samples_split': [2, 3, 4, 5],\n", 455 | " 'n_estimators': [100, 200, 300, 400,\n", 456 | " 500, 600, 700, 800,\n", 457 | " 900, 1000]},\n", 458 | " pre_dispatch='2*n_jobs', random_state=42, refit=True,\n", 459 | " return_train_score=False, scoring='accuracy', verbose=True)" 460 | ] 461 | }, 462 | "execution_count": 53, 463 | "metadata": {}, 464 | "output_type": "execute_result" 465 | } 466 | ], 467 | 
"source": [ 468 | "from sklearn.model_selection import RandomizedSearchCV\n", 469 | "\n", 470 | "param_space = {\"bootstrap\": [True],\n", 471 | " \"max_depth\": [6, 8, 10, 12, 14],\n", 472 | " \"max_features\": ['auto', 'sqrt','log2'],\n", 473 | " \"min_samples_leaf\": [2, 3, 4],\n", 474 | " \"min_samples_split\": [2, 3, 4, 5],\n", 475 | " \"n_estimators\": [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]\n", 476 | "}\n", 477 | "\n", 478 | "forest_rand_search = RandomizedSearchCV(forest_clf, param_space, n_iter=32,\n", 479 | " scoring=\"accuracy\", verbose=True, cv=5,\n", 480 | " n_jobs=-1, random_state=42)\n", 481 | "\n", 482 | "forest_rand_search.fit(X_train, y_train)" 483 | ] 484 | }, 485 | { 486 | "cell_type": "markdown", 487 | "metadata": {}, 488 | "source": [ 489 | "Same as above we can see the best hyperparameters that were explored:" 490 | ] 491 | }, 492 | { 493 | "cell_type": "code", 494 | "execution_count": 54, 495 | "metadata": {}, 496 | "outputs": [ 497 | { 498 | "data": { 499 | "text/plain": [ 500 | "{'n_estimators': 300,\n", 501 | " 'min_samples_split': 4,\n", 502 | " 'min_samples_leaf': 2,\n", 503 | " 'max_features': 'sqrt',\n", 504 | " 'max_depth': 14,\n", 505 | " 'bootstrap': True}" 506 | ] 507 | }, 508 | "execution_count": 54, 509 | "metadata": {}, 510 | "output_type": "execute_result" 511 | } 512 | ], 513 | "source": [ 514 | "forest_rand_search.best_params_" 515 | ] 516 | }, 517 | { 518 | "cell_type": "markdown", 519 | "metadata": {}, 520 | "source": [ 521 | "Also the best estimator:" 522 | ] 523 | }, 524 | { 525 | "cell_type": "code", 526 | "execution_count": 55, 527 | "metadata": {}, 528 | "outputs": [ 529 | { 530 | "data": { 531 | "text/plain": [ 532 | "RandomForestClassifier(bootstrap=True, class_weight=None, criterion='gini',\n", 533 | " max_depth=14, max_features='sqrt', max_leaf_nodes=None,\n", 534 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 535 | " min_samples_leaf=2, min_samples_split=4,\n", 536 | " 
min_weight_fraction_leaf=0.0, n_estimators=300,\n", 537 | " n_jobs=None, oob_score=False, random_state=None,\n", 538 | " verbose=0, warm_start=False)" 539 | ] 540 | }, 541 | "execution_count": 55, 542 | "metadata": {}, 543 | "output_type": "execute_result" 544 | } 545 | ], 546 | "source": [ 547 | "forest_rand_search.best_estimator_" 548 | ] 549 | }, 550 | { 551 | "cell_type": "markdown", 552 | "metadata": {}, 553 | "source": [ 554 | "And check the best score:" 555 | ] 556 | }, 557 | { 558 | "cell_type": "code", 559 | "execution_count": 57, 560 | "metadata": {}, 561 | "outputs": [ 562 | { 563 | "data": { 564 | "text/plain": [ 565 | "0.9620666666666666" 566 | ] 567 | }, 568 | "execution_count": 57, 569 | "metadata": {}, 570 | "output_type": "execute_result" 571 | } 572 | ], 573 | "source": [ 574 | "forest_rand_search.best_score_" 575 | ] 576 | }, 577 | { 578 | "cell_type": "markdown", 579 | "metadata": {}, 580 | "source": [ 581 | "Our best accuracy was 0.9621, beating `GridSearchCV` by about 1.6 percentage points. As you can see, `RandomizedSearchCV` let us explore a larger hyperparameter space with the same number of fits (160), and in this case it found a better result than `GridSearchCV`, although this run took longer (81.1 versus 43.6 minutes). \n", 582 | "\n", 583 | "You can now save this model, evaluate it on the test set, and, if you are satisfied with its performance, deploy it to production. Using randomized search is not too hard, and it works well for many fairly simple problems. When training is slow, however (e.g., for more complex problems with larger datasets), this approach will only explore a tiny portion of the hyperparameter space. You can partially alleviate this problem by assisting the search process manually: first run a quick random search using wide ranges of hyperparameter values, then run another search using smaller ranges of values centered on the best ones found during the first run, and so on. This approach will hopefully zoom in on a good set of hyperparameters. 
However, it's very time consuming, and probably not the best use of your time." 584 | ] 585 | }, 586 | { 587 | "cell_type": "markdown", 588 | "metadata": {}, 589 | "source": [ 590 | "# Bayesian Optimization\n", 591 | "\n", 592 | "Fortunately, there are many techniques to explore a search space much more efficiently than randomly. Their core idea is simple: when a region of the space turns out to be good, it should be explored more. Such techniques take care of the \"zooming\" process for you and lead to much better solutions in much less time. \n", 593 | "\n", 594 | "One such technique is called Bayesian Optimization and we will use Scikit-Optimize (Skopt) [https://scikit-optimize.github.io/](https://scikit-optimize.github.io) to perform Bayesian Optimization. Skopt is a general-purpose optimization library that performs Bayesian Optimization with its class `BayesSearchCV` using an interface similar to `GridSearchCV`. \n", 595 | "\n", 596 | "If you don't have Skopt already installed go ahead and run the following line of code in your virtual environment:" 597 | ] 598 | }, 599 | { 600 | "cell_type": "code", 601 | "execution_count": 61, 602 | "metadata": {}, 603 | "outputs": [ 604 | { 605 | "name": "stdout", 606 | "output_type": "stream", 607 | "text": [ 608 | "Collecting scikit-optimize\n", 609 | "\u001b[?25l Downloading https://files.pythonhosted.org/packages/5c/87/310b52debfbc0cb79764e5770fa3f5c18f6f0754809ea9e2fc185e1b67d3/scikit_optimize-0.7.4-py2.py3-none-any.whl (80kB)\n", 610 | "\u001b[K |████████████████████████████████| 81kB 2.4MB/s eta 0:00:011\n", 611 | "\u001b[?25hCollecting pyaml>=16.9 (from scikit-optimize)\n", 612 | " Downloading https://files.pythonhosted.org/packages/15/c4/1310a054d33abc318426a956e7d6df0df76a6ddfa9c66f6310274fb75d42/pyaml-20.4.0-py2.py3-none-any.whl\n", 613 | "Requirement already satisfied: scipy>=0.18.0 in /anaconda3/envs/metis/lib/python3.7/site-packages (from scikit-optimize) (1.4.1)\n", 614 | "Requirement already satisfied: 
scikit-learn>=0.19.1 in /anaconda3/envs/metis/lib/python3.7/site-packages (from scikit-optimize) (0.21.3)\n", 615 | "Requirement already satisfied: joblib>=0.11 in /anaconda3/envs/metis/lib/python3.7/site-packages (from scikit-optimize) (0.13.2)\n", 616 | "Requirement already satisfied: numpy>=1.11.0 in /anaconda3/envs/metis/lib/python3.7/site-packages (from scikit-optimize) (1.19.0)\n", 617 | "Requirement already satisfied: PyYAML in /anaconda3/envs/metis/lib/python3.7/site-packages (from pyaml>=16.9->scikit-optimize) (5.1.2)\n", 618 | "Installing collected packages: pyaml, scikit-optimize\n", 619 | "Successfully installed pyaml-20.4.0 scikit-optimize-0.7.4\n" 620 | ] 621 | } 622 | ], 623 | "source": [ 624 | "!pip install scikit-optimize" 625 | ] 626 | }, 627 | { 628 | "cell_type": "markdown", 629 | "metadata": {}, 630 | "source": [ 631 | "There are only 2 main differences when performing Bayesian Optimization using Skopt's `BayesSearchCV`. First, when creating your search space you need to make each hyperparameter's space a probability distribution, as opposed to using lists as with `GridSearchCV`. Skopt makes this easy: import `Real`, `Categorical`, and `Integer` from `skopt.space`.\n", 632 | "\n", 633 | "* **Real**: Continuous hyperparameter space.\n", 634 | "* **Integer**: Discrete hyperparameter space.\n", 635 | "* **Categorical**: Categorical hyperparameter space.\n", 636 | "\n", 637 | "Below you can see examples of using both the `Categorical` and `Integer` classes. For categorical spaces, simply pass a list of the allowed values. For integer spaces, pass the minimum and maximum values you want `BayesSearchCV` to explore." 
638 | ] 639 | }, 640 | { 641 | "cell_type": "code", 642 | "execution_count": 64, 643 | "metadata": {}, 644 | "outputs": [ 645 | { 646 | "name": "stdout", 647 | "output_type": "stream", 648 | "text": [ 649 | "best score: 0.9336833333333333\n", 650 | "best score: 0.9495833333333333\n", 651 | "best score: 0.9495833333333333\n", 652 | "best score: 0.96815\n", 653 | "best score: 0.96815\n", 654 | "best score: 0.96815\n", 655 | "best score: 0.96815\n", 656 | "best score: 0.96815\n", 657 | "best score: 0.96815\n", 658 | "best score: 0.96815\n", 659 | "best score: 0.96815\n", 660 | "best score: 0.9697333333333333\n", 661 | "best score: 0.9697333333333333\n", 662 | "best score: 0.9697333333333333\n", 663 | "best score: 0.9697333333333333\n", 664 | "best score: 0.9697333333333333\n", 665 | "best score: 0.9697333333333333\n", 666 | "best score: 0.9697333333333333\n", 667 | "best score: 0.9697333333333333\n", 668 | "best score: 0.9697333333333333\n", 669 | "best score: 0.9697333333333333\n", 670 | "best score: 0.9697333333333333\n", 671 | "best score: 0.9697333333333333\n", 672 | "best score: 0.9697333333333333\n", 673 | "best score: 0.9697333333333333\n", 674 | "best score: 0.9697333333333333\n", 675 | "best score: 0.9697333333333333\n" 676 | ] 677 | }, 678 | { 679 | "name": "stderr", 680 | "output_type": "stream", 681 | "text": [ 682 | "//anaconda3/envs/metis/lib/python3.7/site-packages/skopt/optimizer/optimizer.py:409: UserWarning: The objective has been evaluated at this point before.\n", 683 | " warnings.warn(\"The objective has been evaluated \"\n" 684 | ] 685 | }, 686 | { 687 | "name": "stdout", 688 | "output_type": "stream", 689 | "text": [ 690 | "best score: 0.97\n", 691 | "best score: 0.97\n", 692 | "best score: 0.97\n", 693 | "best score: 0.97\n", 694 | "best score: 0.97\n" 695 | ] 696 | }, 697 | { 698 | "data": { 699 | "text/plain": [ 700 | "BayesSearchCV(cv=5, error_score='raise',\n", 701 | " estimator=RandomForestClassifier(bootstrap=True,\n", 702 | " 
class_weight=None,\n", 703 | " criterion='gini', max_depth=None,\n", 704 | " max_features='auto',\n", 705 | " max_leaf_nodes=None,\n", 706 | " min_impurity_decrease=0.0,\n", 707 | " min_impurity_split=None,\n", 708 | " min_samples_leaf=1,\n", 709 | " min_samples_split=2,\n", 710 | " min_weight_fraction_leaf=0.0,\n", 711 | " n_estimators='warn', n_jobs=None,\n", 712 | " oob_score=False,\n", 713 | " random_...\n", 714 | " 'max_depth': Integer(low=6, high=20, prior='uniform', transform='identity'),\n", 715 | " 'max_features': Categorical(categories=('auto', 'sqrt', 'log2'), prior=None),\n", 716 | " 'min_samples_leaf': Integer(low=2, high=10, prior='uniform', transform='identity'),\n", 717 | " 'min_samples_split': Integer(low=2, high=10, prior='uniform', transform='identity'),\n", 718 | " 'n_estimators': Integer(low=100, high=500, prior='uniform', transform='identity')},\n", 719 | " verbose=0)" 720 | ] 721 | }, 722 | "execution_count": 64, 723 | "metadata": {}, 724 | "output_type": "execute_result" 725 | } 726 | ], 727 | "source": [ 728 | "from skopt import BayesSearchCV\n", 729 | "from skopt.space import Real, Categorical, Integer\n", 730 | "\n", 731 | "search_space = {\"bootstrap\": Categorical([True, False]), # values for bootstrap can be either True or False\n", 732 | " \"max_depth\": Integer(6, 20), # values of max_depth are integers from 6 to 20\n", 733 | " \"max_features\": Categorical(['auto', 'sqrt','log2']), \n", 734 | " \"min_samples_leaf\": Integer(2, 10),\n", 735 | " \"min_samples_split\": Integer(2, 10),\n", 736 | " \"n_estimators\": Integer(100, 500)\n", 737 | " }\n", 738 | "\n", 739 | "def on_step(optim_result):\n", 740 | " \"\"\"\n", 741 | " Callback meant to view scores after\n", 742 | " each iteration while performing Bayesian\n", 743 | " Optimization in Skopt\"\"\"\n", 744 | " score = forest_bayes_search.best_score_\n", 745 | " print(\"best score: %s\" % score)\n", 746 | " if score >= 0.98:\n", 747 | " print('Interrupting!')\n", 748 | " return 
True\n", 749 | "\n", 750 | "forest_bayes_search = BayesSearchCV(forest_clf, search_space, n_iter=32, # specify how many iterations\n", 751 | " scoring=\"accuracy\", n_jobs=-1, cv=5)\n", 752 | "\n", 753 | "forest_bayes_search.fit(X_train, y_train, callback=on_step) # callback=on_step will print score after each iteration" 754 | ] 755 | }, 756 | { 757 | "cell_type": "markdown", 758 | "metadata": {}, 759 | "source": [ 760 | "Just like in Scikit-Learn we can view the best parameters:" 761 | ] 762 | }, 763 | { 764 | "cell_type": "code", 765 | "execution_count": 65, 766 | "metadata": {}, 767 | "outputs": [ 768 | { 769 | "data": { 770 | "text/plain": [ 771 | "OrderedDict([('bootstrap', False),\n", 772 | " ('max_depth', 20),\n", 773 | " ('max_features', 'sqrt'),\n", 774 | " ('min_samples_leaf', 2),\n", 775 | " ('min_samples_split', 2),\n", 776 | " ('n_estimators', 500)])" 777 | ] 778 | }, 779 | "execution_count": 65, 780 | "metadata": {}, 781 | "output_type": "execute_result" 782 | } 783 | ], 784 | "source": [ 785 | "forest_bayes_search.best_params_" 786 | ] 787 | }, 788 | { 789 | "cell_type": "markdown", 790 | "metadata": {}, 791 | "source": [ 792 | "And the best estimator:" 793 | ] 794 | }, 795 | { 796 | "cell_type": "code", 797 | "execution_count": 66, 798 | "metadata": {}, 799 | "outputs": [ 800 | { 801 | "data": { 802 | "text/plain": [ 803 | "RandomForestClassifier(bootstrap=False, class_weight=None, criterion='gini',\n", 804 | " max_depth=20, max_features='sqrt', max_leaf_nodes=None,\n", 805 | " min_impurity_decrease=0.0, min_impurity_split=None,\n", 806 | " min_samples_leaf=2, min_samples_split=2,\n", 807 | " min_weight_fraction_leaf=0.0, n_estimators=500,\n", 808 | " n_jobs=None, oob_score=False, random_state=None,\n", 809 | " verbose=0, warm_start=False)" 810 | ] 811 | }, 812 | "execution_count": 66, 813 | "metadata": {}, 814 | "output_type": "execute_result" 815 | } 816 | ], 817 | "source": [ 818 | "forest_bayes_search.best_estimator_" 819 | ] 820 | }, 821 | { 
822 | "cell_type": "markdown", 823 | "metadata": {}, 824 | "source": [ 825 | "And the best score:" 826 | ] 827 | }, 828 | { 829 | "cell_type": "code", 830 | "execution_count": 67, 831 | "metadata": {}, 832 | "outputs": [ 833 | { 834 | "data": { 835 | "text/plain": [ 836 | "0.97" 837 | ] 838 | }, 839 | "execution_count": 67, 840 | "metadata": {}, 841 | "output_type": "execute_result" 842 | } 843 | ], 844 | "source": [ 845 | "forest_bayes_search.best_score_" 846 | ] 847 | }, 848 | { 849 | "cell_type": "markdown", 850 | "metadata": {}, 851 | "source": [ 852 | "Bayesian Optimization improved our accuracy by roughly another 0.8 percentage points (from 0.9621 to 0.97) in the same number of iterations as Randomized Search. I hope this convinces you to step outside your comfort zone of `GridSearchCV` and `RandomizedSearchCV` and try something new like `BayesSearchCV` in your next project. Hyperparameter searching can be tedious, but there are tools that can do the tedious work for you." 853 | ] 854 | }, 855 | { 856 | "cell_type": "markdown", 857 | "metadata": {}, 858 | "source": [ 859 | "# Fine-Tuning Neural Network Hyperparameters\n", 860 | "\n", 861 | "The flexibility of neural networks is also one of their main drawbacks: there are many hyperparameters to tweak. Not only can you use any imaginable network architecture, but even in a simple MLP you can change the number of layers, the number of neurons per layer, the type of activation function to use in each layer, the weight initialization logic, and much more. It can be hard to know what combination of hyperparameters is the best for your task.\n", 862 | "\n", 863 | "One option is to simply try many combinations of hyperparameters and see which one works best on the validation set (or use K-fold cross-validation). For example, we can use `GridSearchCV` or `RandomizedSearchCV` to explore the hyperparameter space. To do this, we need to wrap our Keras models in objects that mimic regular Scikit-Learn classifiers. 
The first step is to create a function that will build and compile a Keras model, given a set of hyperparameters:" 864 | ] 865 | }, 866 | { 867 | "cell_type": "code", 868 | "execution_count": 68, 869 | "metadata": {}, 870 | "outputs": [], 871 | "source": [ 872 | "(X_train_full, y_train_full), (X_test, y_test) = keras.datasets.mnist.load_data()" 873 | ] 874 | }, 875 | { 876 | "cell_type": "markdown", 877 | "metadata": {}, 878 | "source": [ 879 | "Since we are going to train the neural network using gradient descent, we must scale the input features. For simplicity, we'll scale the pixel intensities down to the 0-1 range by dividing them by 255.0 (this also converts them to floats):" 880 | ] 881 | }, 882 | { 883 | "cell_type": "code", 884 | "execution_count": 69, 885 | "metadata": {}, 886 | "outputs": [], 887 | "source": [ 888 | "X_valid, X_train = X_train_full[:5000] / 255., X_train_full[5000:] / 255.\n", 889 | "y_valid, y_train = y_train_full[:5000], y_train_full[5000:]\n", 890 | "X_test = X_test / 255." 
891 | ] 892 | }, 893 | { 894 | "cell_type": "code", 895 | "execution_count": 84, 896 | "metadata": {}, 897 | "outputs": [], 898 | "source": [ 899 | "def build_model(n_hidden=1, n_neurons=30, learning_rate=3e-3, input_shape=[28, 28]):\n", 900 | "    model = keras.models.Sequential()\n", 901 | "    model.add(keras.layers.Flatten(input_shape=input_shape))\n", 902 | "    for layer in range(n_hidden):\n", 903 | "        model.add(keras.layers.Dense(n_neurons, activation=\"relu\"))\n", 904 | "    model.add(keras.layers.Dense(10, activation=\"softmax\"))\n", 905 | "    optimizer = keras.optimizers.SGD(learning_rate=learning_rate)\n", 906 | "    model.compile(loss=\"sparse_categorical_crossentropy\", optimizer=optimizer, metrics=[\"accuracy\"])\n", 907 | "    return model" 908 | ] 909 | }, 910 | { 911 | "cell_type": "markdown", 912 | "metadata": {}, 913 | "source": [ 914 | "This function creates a simple Sequential model for multiclass classification with the given input shape and the given number of hidden layers and neurons, and it compiles it using an SGD optimizer configured with the specified learning rate.\n", 915 | "\n", 916 | "Next, let's create a `KerasClassifier` based on this `build_model()` function:" 917 | ] 918 | }, 919 | { 920 | "cell_type": "code", 921 | "execution_count": 85, 922 | "metadata": {}, 923 | "outputs": [], 924 | "source": [ 925 | "keras_clf = keras.wrappers.scikit_learn.KerasClassifier(build_model)" 926 | ] 927 | }, 928 | { 929 | "cell_type": "markdown", 930 | "metadata": {}, 931 | "source": [ 932 | "The `KerasClassifier` object is a thin wrapper around the Keras model built using `build_model()`. 
This will allow us to use this object like a regular Scikit-Learn classifier: we can train it using its `fit()` method, then evaluate it using its `score()` method, and use it to make predictions using its `predict()` method.\n", 933 | "\n", 934 | "We don't want to train and evaluate a single model like this, though; we want to train many variants and see which one performs best on the validation set. Since there are many hyperparameters, it is preferable to use randomized search rather than grid search. Let's explore the number of hidden layers, the number of neurons, and the learning rate:" 935 | ] 936 | }, 937 | { 938 | "cell_type": "code", 939 | "execution_count": 86, 940 | "metadata": {}, 941 | "outputs": [ 942 | { 943 | "name": "stdout", 944 | "output_type": "stream", 945 | "text": [ 946 | "Fitting 5 folds for each of 20 candidates, totalling 100 fits\n" 947 | ] 948 | }, 949 | { 950 | "name": "stderr", 951 | "output_type": "stream", 952 | "text": [ 953 | "[Parallel(n_jobs=-1)]: Using backend LokyBackend with 8 concurrent workers.\n", 954 | "[Parallel(n_jobs=-1)]: Done 34 tasks | elapsed: 43.4min\n", 955 | "[Parallel(n_jobs=-1)]: Done 100 out of 100 | elapsed: 135.5min finished\n" 956 | ] 957 | }, 958 | { 959 | "name": "stdout", 960 | "output_type": "stream", 961 | "text": [ 962 | "Epoch 1/100\n", 963 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.5520 - accuracy: 0.8586 - val_loss: 0.3218 - val_accuracy: 0.9122\n", 964 | "Epoch 2/100\n", 965 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.2984 - accuracy: 0.9168 - val_loss: 0.2512 - val_accuracy: 0.9318\n", 966 | "Epoch 3/100\n", 967 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.2488 - accuracy: 0.9312 - val_loss: 0.2181 - val_accuracy: 0.9422\n", 968 | "Epoch 4/100\n", 969 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.2168 - accuracy: 0.9406 - val_loss: 0.1955 - val_accuracy: 0.9484\n", 970 | "Epoch 
5/100\n", 971 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1930 - accuracy: 0.9463 - val_loss: 0.1753 - val_accuracy: 0.9548\n", 972 | "Epoch 6/100\n", 973 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1744 - accuracy: 0.9521 - val_loss: 0.1608 - val_accuracy: 0.9590\n", 974 | "Epoch 7/100\n", 975 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1591 - accuracy: 0.9558 - val_loss: 0.1497 - val_accuracy: 0.9598\n", 976 | "Epoch 8/100\n", 977 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1463 - accuracy: 0.9588 - val_loss: 0.1387 - val_accuracy: 0.9642\n", 978 | "Epoch 9/100\n", 979 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1356 - accuracy: 0.9622 - val_loss: 0.1322 - val_accuracy: 0.9644\n", 980 | "Epoch 10/100\n", 981 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1264 - accuracy: 0.9647 - val_loss: 0.1260 - val_accuracy: 0.9680\n", 982 | "Epoch 11/100\n", 983 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1183 - accuracy: 0.9671 - val_loss: 0.1193 - val_accuracy: 0.9674\n", 984 | "Epoch 12/100\n", 985 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1111 - accuracy: 0.9690 - val_loss: 0.1128 - val_accuracy: 0.9694\n", 986 | "Epoch 13/100\n", 987 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.1047 - accuracy: 0.9710 - val_loss: 0.1092 - val_accuracy: 0.9708\n", 988 | "Epoch 14/100\n", 989 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0991 - accuracy: 0.9730 - val_loss: 0.1043 - val_accuracy: 0.9718\n", 990 | "Epoch 15/100\n", 991 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0936 - accuracy: 0.9746 - val_loss: 0.1047 - val_accuracy: 0.9706\n", 992 | "Epoch 16/100\n", 993 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0891 - accuracy: 0.9761 - val_loss: 0.0993 - val_accuracy: 0.9718\n", 
994 | "Epoch 17/100\n", 995 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0846 - accuracy: 0.9772 - val_loss: 0.0962 - val_accuracy: 0.9726\n", 996 | "Epoch 18/100\n", 997 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0807 - accuracy: 0.9783 - val_loss: 0.0941 - val_accuracy: 0.9730\n", 998 | "Epoch 19/100\n", 999 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0770 - accuracy: 0.9793 - val_loss: 0.0917 - val_accuracy: 0.9754\n", 1000 | "Epoch 20/100\n", 1001 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0737 - accuracy: 0.9801 - val_loss: 0.0883 - val_accuracy: 0.9742\n", 1002 | "Epoch 21/100\n", 1003 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0705 - accuracy: 0.9813 - val_loss: 0.0886 - val_accuracy: 0.9734\n", 1004 | "Epoch 22/100\n", 1005 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0676 - accuracy: 0.9821 - val_loss: 0.0846 - val_accuracy: 0.9768\n", 1006 | "Epoch 23/100\n", 1007 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0649 - accuracy: 0.9829 - val_loss: 0.0823 - val_accuracy: 0.9770\n", 1008 | "Epoch 24/100\n", 1009 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0621 - accuracy: 0.9838 - val_loss: 0.0827 - val_accuracy: 0.9768\n", 1010 | "Epoch 25/100\n", 1011 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0599 - accuracy: 0.9846 - val_loss: 0.0806 - val_accuracy: 0.9778\n", 1012 | "Epoch 26/100\n", 1013 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0576 - accuracy: 0.9853 - val_loss: 0.0787 - val_accuracy: 0.9786\n", 1014 | "Epoch 27/100\n", 1015 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0552 - accuracy: 0.9861 - val_loss: 0.0791 - val_accuracy: 0.9778\n", 1016 | "Epoch 28/100\n", 1017 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0533 - accuracy: 0.9867 - val_loss: 
0.0772 - val_accuracy: 0.9784\n", 1018 | "Epoch 29/100\n", 1019 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0515 - accuracy: 0.9870 - val_loss: 0.0755 - val_accuracy: 0.9792\n", 1020 | "Epoch 30/100\n", 1021 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0496 - accuracy: 0.9876 - val_loss: 0.0734 - val_accuracy: 0.9792\n", 1022 | "Epoch 31/100\n", 1023 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0479 - accuracy: 0.9884 - val_loss: 0.0742 - val_accuracy: 0.9776\n", 1024 | "Epoch 32/100\n", 1025 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0463 - accuracy: 0.9885 - val_loss: 0.0730 - val_accuracy: 0.9786\n", 1026 | "Epoch 33/100\n", 1027 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0448 - accuracy: 0.9888 - val_loss: 0.0723 - val_accuracy: 0.9782\n", 1028 | "Epoch 34/100\n", 1029 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0432 - accuracy: 0.9895 - val_loss: 0.0723 - val_accuracy: 0.9782\n", 1030 | "Epoch 35/100\n", 1031 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0418 - accuracy: 0.9896 - val_loss: 0.0712 - val_accuracy: 0.9792\n", 1032 | "Epoch 36/100\n", 1033 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0405 - accuracy: 0.9905 - val_loss: 0.0702 - val_accuracy: 0.9788\n", 1034 | "Epoch 37/100\n", 1035 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0392 - accuracy: 0.9905 - val_loss: 0.0698 - val_accuracy: 0.9798\n", 1036 | "Epoch 38/100\n", 1037 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0380 - accuracy: 0.9911 - val_loss: 0.0694 - val_accuracy: 0.9794\n", 1038 | "Epoch 39/100\n", 1039 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0368 - accuracy: 0.9915 - val_loss: 0.0693 - val_accuracy: 0.9792\n", 1040 | "Epoch 40/100\n", 1041 | "1719/1719 [==============================] - 2s 1ms/step - 
loss: 0.0356 - accuracy: 0.9917 - val_loss: 0.0695 - val_accuracy: 0.9784\n", 1042 | "Epoch 41/100\n", 1043 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0344 - accuracy: 0.9923 - val_loss: 0.0695 - val_accuracy: 0.9784\n", 1044 | "Epoch 42/100\n", 1045 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0336 - accuracy: 0.9922 - val_loss: 0.0693 - val_accuracy: 0.9786\n", 1046 | "Epoch 43/100\n", 1047 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0326 - accuracy: 0.9928 - val_loss: 0.0670 - val_accuracy: 0.9786\n", 1048 | "Epoch 44/100\n", 1049 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0316 - accuracy: 0.9933 - val_loss: 0.0680 - val_accuracy: 0.9790\n", 1050 | "Epoch 45/100\n", 1051 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0307 - accuracy: 0.9936 - val_loss: 0.0662 - val_accuracy: 0.9802\n", 1052 | "Epoch 46/100\n", 1053 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0298 - accuracy: 0.9936 - val_loss: 0.0676 - val_accuracy: 0.9790\n", 1054 | "Epoch 47/100\n", 1055 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0290 - accuracy: 0.9939 - val_loss: 0.0662 - val_accuracy: 0.9800\n", 1056 | "Epoch 48/100\n", 1057 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0281 - accuracy: 0.9945 - val_loss: 0.0666 - val_accuracy: 0.9796\n", 1058 | "Epoch 49/100\n", 1059 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0273 - accuracy: 0.9944 - val_loss: 0.0671 - val_accuracy: 0.9796\n", 1060 | "Epoch 50/100\n", 1061 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0265 - accuracy: 0.9947 - val_loss: 0.0654 - val_accuracy: 0.9794\n", 1062 | "Epoch 51/100\n", 1063 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0258 - accuracy: 0.9951 - val_loss: 0.0659 - val_accuracy: 0.9794\n", 1064 | "Epoch 52/100\n", 1065 | "1719/1719 
[==============================] - 2s 1ms/step - loss: 0.0251 - accuracy: 0.9951 - val_loss: 0.0646 - val_accuracy: 0.9802\n", 1066 | "Epoch 53/100\n", 1067 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0244 - accuracy: 0.9953 - val_loss: 0.0656 - val_accuracy: 0.9786\n", 1068 | "Epoch 54/100\n", 1069 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0237 - accuracy: 0.9957 - val_loss: 0.0655 - val_accuracy: 0.9796\n", 1070 | "Epoch 55/100\n", 1071 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0231 - accuracy: 0.9960 - val_loss: 0.0655 - val_accuracy: 0.9792\n", 1072 | "Epoch 56/100\n", 1073 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0225 - accuracy: 0.9959 - val_loss: 0.0647 - val_accuracy: 0.9802\n", 1074 | "Epoch 57/100\n", 1075 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0219 - accuracy: 0.9962 - val_loss: 0.0653 - val_accuracy: 0.9804\n", 1076 | "Epoch 58/100\n", 1077 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0213 - accuracy: 0.9965 - val_loss: 0.0647 - val_accuracy: 0.9806\n", 1078 | "Epoch 59/100\n", 1079 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0208 - accuracy: 0.9965 - val_loss: 0.0644 - val_accuracy: 0.9792\n", 1080 | "Epoch 60/100\n", 1081 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0203 - accuracy: 0.9966 - val_loss: 0.0646 - val_accuracy: 0.9806\n", 1082 | "Epoch 61/100\n", 1083 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0197 - accuracy: 0.9970 - val_loss: 0.0634 - val_accuracy: 0.9800\n", 1084 | "Epoch 62/100\n", 1085 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0192 - accuracy: 0.9971 - val_loss: 0.0641 - val_accuracy: 0.9812\n", 1086 | "Epoch 63/100\n", 1087 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0188 - accuracy: 0.9972 - val_loss: 0.0636 - val_accuracy: 0.9810\n", 
1088 | "Epoch 64/100\n", 1089 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0183 - accuracy: 0.9973 - val_loss: 0.0637 - val_accuracy: 0.9800\n", 1090 | "Epoch 65/100\n", 1091 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0179 - accuracy: 0.9975 - val_loss: 0.0639 - val_accuracy: 0.9802\n", 1092 | "Epoch 66/100\n", 1093 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0174 - accuracy: 0.9975 - val_loss: 0.0655 - val_accuracy: 0.9810\n", 1094 | "Epoch 67/100\n", 1095 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0170 - accuracy: 0.9977 - val_loss: 0.0634 - val_accuracy: 0.9814\n", 1096 | "Epoch 68/100\n", 1097 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0167 - accuracy: 0.9979 - val_loss: 0.0649 - val_accuracy: 0.9808\n", 1098 | "Epoch 69/100\n", 1099 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0163 - accuracy: 0.9978 - val_loss: 0.0639 - val_accuracy: 0.9810\n", 1100 | "Epoch 70/100\n", 1101 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0159 - accuracy: 0.9979 - val_loss: 0.0639 - val_accuracy: 0.9802\n", 1102 | "Epoch 71/100\n", 1103 | "1719/1719 [==============================] - 2s 1ms/step - loss: 0.0156 - accuracy: 0.9980 - val_loss: 0.0640 - val_accuracy: 0.9810\n" 1104 | ] 1105 | }, 1106 | { 1107 | "data": { 1108 | "text/plain": [ 1109 | "RandomizedSearchCV(cv=5, error_score='raise-deprecating',\n", 1110 | " estimator=,\n", 1111 | " iid='warn', n_iter=20, n_jobs=-1,\n", 1112 | " param_distributions={'learning_rate': ,\n", 1113 | " 'n_hidden': [1, 2, 3, 4],\n", 1114 | " 'n_neurons': array([ 30, 31, 32, 33, 34, 35, 36, 37, 38, 3...\n", 1115 | " 238, 239, 240, 241, 242, 243, 244, 245, 246, 247, 248, 249, 250,\n", 1116 | " 251, 252, 253, 254, 255, 256, 257, 258, 259, 260, 261, 262, 263,\n", 1117 | " 264, 265, 266, 267, 268, 269, 270, 271, 272, 273, 274, 275, 276,\n", 1118 | " 277, 278, 279, 280, 
281, 282, 283, 284, 285, 286, 287, 288, 289,\n", 1119 | " 290, 291, 292, 293, 294, 295, 296, 297, 298, 299])},\n", 1120 | " pre_dispatch='2*n_jobs', random_state=None, refit=True,\n", 1121 | " return_train_score=False, scoring='accuracy', verbose=True)" 1122 | ] 1123 | }, 1124 | "execution_count": 86, 1125 | "metadata": {}, 1126 | "output_type": "execute_result" 1127 | } 1128 | ], 1129 | "source": [ 1130 | "from scipy.stats import reciprocal\n", 1131 | "\n", 1132 | "keras_param_space = {\"n_hidden\": [1, 2, 3, 4],\n", 1133 | " \"n_neurons\": np.arange(30, 300),\n", 1134 | " \"learning_rate\": reciprocal(3e-4, 3e-2) \n", 1135 | "}\n", 1136 | "\n", 1137 | "keras_rand_search = RandomizedSearchCV(keras_clf, keras_param_space, n_iter=20, \n", 1138 | " cv=5, scoring=\"accuracy\", n_jobs=-1, verbose=True)\n", 1139 | "\n", 1140 | "keras_rand_search.fit(X_train, y_train, epochs=100,\n", 1141 | " validation_data=(X_valid, y_valid),\n", 1142 | " callbacks=[keras.callbacks.EarlyStopping(patience=10)])" 1143 | ] 1144 | }, 1145 | { 1146 | "cell_type": "code", 1147 | "execution_count": 87, 1148 | "metadata": {}, 1149 | "outputs": [ 1150 | { 1151 | "data": { 1152 | "text/plain": [ 1153 | "{'learning_rate': 0.015529337745078654, 'n_hidden': 1, 'n_neurons': 237}" 1154 | ] 1155 | }, 1156 | "execution_count": 87, 1157 | "metadata": {}, 1158 | "output_type": "execute_result" 1159 | } 1160 | ], 1161 | "source": [ 1162 | "keras_rand_search.best_params_" 1163 | ] 1164 | }, 1165 | { 1166 | "cell_type": "code", 1167 | "execution_count": 88, 1168 | "metadata": {}, 1169 | "outputs": [ 1170 | { 1171 | "data": { 1172 | "text/plain": [ 1173 | "" 1174 | ] 1175 | }, 1176 | "execution_count": 88, 1177 | "metadata": {}, 1178 | "output_type": "execute_result" 1179 | } 1180 | ], 1181 | "source": [ 1182 | "keras_rand_search.best_estimator_" 1183 | ] 1184 | }, 1185 | { 1186 | "cell_type": "code", 1187 | "execution_count": 90, 1188 | "metadata": {}, 1189 | "outputs": [ 1190 | { 1191 | "data": { 1192 | 
"text/plain": [ 1193 | "0.9752909090909091" 1194 | ] 1195 | }, 1196 | "execution_count": 90, 1197 | "metadata": {}, 1198 | "output_type": "execute_result" 1199 | } 1200 | ], 1201 | "source": [ 1202 | "keras_rand_search.best_score_" 1203 | ] 1204 | }, 1205 | { 1206 | "cell_type": "markdown", 1207 | "metadata": {}, 1208 | "source": [ 1209 | "Our accuracy increased by another 0.5 percentage points! The last step is to see how each model performed on the test set (see below)." 1210 | ] 1211 | }, 1212 | { 1213 | "cell_type": "markdown", 1214 | "metadata": {}, 1215 | "source": [ 1216 | "# Conclusions\n", 1217 | "\n", 1218 | "Hyperparameter tuning is still an active area of research, and new algorithms continue to be developed. But having a few basic algorithms in your back pocket can take a lot of the tedium out of searching for the best hyperparameters.\n", 1219 | "\n", 1220 | "Remember, randomized search is almost always preferable to grid search unless you have very few hyperparameters to explore. If you have a more complex problem with a larger dataset, you might want to turn to a technique that explores the search space much more efficiently, such as Bayesian Optimization." 
1221 | ] 1222 | }, 1223 | { 1224 | "cell_type": "code", 1225 | "execution_count": 99, 1226 | "metadata": {}, 1227 | "outputs": [ 1228 | { 1229 | "data": { 1230 | "text/plain": [ 1231 | "0.9486" 1232 | ] 1233 | }, 1234 | "execution_count": 99, 1235 | "metadata": {}, 1236 | "output_type": "execute_result" 1237 | } 1238 | ], 1239 | "source": [ 1240 | "forest_grid_search.score(X_test, y_test)" 1241 | ] 1242 | }, 1243 | { 1244 | "cell_type": "code", 1245 | "execution_count": 100, 1246 | "metadata": {}, 1247 | "outputs": [ 1248 | { 1249 | "data": { 1250 | "text/plain": [ 1251 | "0.9662" 1252 | ] 1253 | }, 1254 | "execution_count": 100, 1255 | "metadata": {}, 1256 | "output_type": "execute_result" 1257 | } 1258 | ], 1259 | "source": [ 1260 | "forest_rand_search.score(X_test, y_test)" 1261 | ] 1262 | }, 1263 | { 1264 | "cell_type": "code", 1265 | "execution_count": 101, 1266 | "metadata": {}, 1267 | "outputs": [ 1268 | { 1269 | "data": { 1270 | "text/plain": [ 1271 | "0.9721" 1272 | ] 1273 | }, 1274 | "execution_count": 101, 1275 | "metadata": {}, 1276 | "output_type": "execute_result" 1277 | } 1278 | ], 1279 | "source": [ 1280 | "forest_bayes_search.score(X_test, y_test)" 1281 | ] 1282 | }, 1283 | { 1284 | "cell_type": "code", 1285 | "execution_count": 102, 1286 | "metadata": {}, 1287 | "outputs": [ 1288 | { 1289 | "name": "stdout", 1290 | "output_type": "stream", 1291 | "text": [ 1292 | "WARNING:tensorflow:Model was constructed with shape (None, 28, 28) for input Tensor(\"flatten_input:0\", shape=(None, 28, 28), dtype=float32), but it was called on an input with incompatible shape (None, 784).\n" 1293 | ] 1294 | }, 1295 | { 1296 | "data": { 1297 | "text/plain": [ 1298 | "0.9773" 1299 | ] 1300 | }, 1301 | "execution_count": 102, 1302 | "metadata": {}, 1303 | "output_type": "execute_result" 1304 | } 1305 | ], 1306 | "source": [ 1307 | "keras_rand_search.score(X_test, y_test)" 1308 | ] 1309 | } 1310 | ], 1311 | "metadata": { 1312 | "kernelspec": { 1313 | "display_name": 
"Python 3", 1314 | "language": "python", 1315 | "name": "python3" 1316 | }, 1317 | "language_info": { 1318 | "codemirror_mode": { 1319 | "name": "ipython", 1320 | "version": 3 1321 | }, 1322 | "file_extension": ".py", 1323 | "mimetype": "text/x-python", 1324 | "name": "python", 1325 | "nbconvert_exporter": "python", 1326 | "pygments_lexer": "ipython3", 1327 | "version": "3.7.3" 1328 | }, 1329 | "toc": { 1330 | "base_numbering": 1, 1331 | "nav_menu": {}, 1332 | "number_sections": false, 1333 | "sideBar": true, 1334 | "skip_h1_title": false, 1335 | "title_cell": "Table of Contents", 1336 | "title_sidebar": "Contents", 1337 | "toc_cell": false, 1338 | "toc_position": {}, 1339 | "toc_section_display": true, 1340 | "toc_window_display": false 1341 | } 1342 | }, 1343 | "nbformat": 4, 1344 | "nbformat_minor": 2 1345 | } 1346 | -------------------------------------------------------------------------------- /slides/hyperparemeter_optimization.pdf: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/lukenew2/ds-demos/5ce0651c654039d1651aec8a48fb6d22c8cc18ab/slides/hyperparemeter_optimization.pdf --------------------------------------------------------------------------------