├── .ipynb_checkpoints
│   ├── Bayesian_Optimization-checkpoint.ipynb
│   └── Just Another Kernel Cookbook...-checkpoint.ipynb
├── Bayesian_Optimization.ipynb
├── Just Another Kernel Cookbook....ipynb
└── README.md

/.ipynb_checkpoints/Bayesian_Optimization-checkpoint.ipynb:
--------------------------------------------------------------------------------
{
 "metadata": {
  "name": "",
  "signature": "sha256:c370f9ddee94f2229f68aa3ae621b7f22fa232210349fd6eda79f0d4b9095da3"
 },
 "nbformat": 3,
 "nbformat_minor": 0,
 "worksheets": [
  {
   "cells": [
    {
     "cell_type": "heading",
     "level": 1,
     "metadata": {},
     "source": [
      "An IPython Notebook Tutorial on Bayesian Optimisation"
     ]
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The idea of this IP[y]:Notebook is to demonstrate some common Bayesian Optimisation techniques using python and the GPy toolbox. The tutorial follows Brochu et al. [1] very closely and demonstrates three types of acquisition functions, namely PI, EI and UCB. It was created only to supplement [1] and NOT as an alternative.\n",
      "\n",
      "In order to really get an intuition about what is happening in Bayesian Optimisation you are encouraged to play with the code, change the model to be optimised, tweak parameters and see what happens and why. Enjoy!"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "'''\n",
      "\n",
      "Details:\n",
      "    This is just a little bit of python\n",
      "    to demonstrate Bayesian Optimisation.\n",
      "    It is meant only to build intuition.\n",
      "\n",
      "Author:\n",
      "    Jack Fitzsimons,\n",
      "    Machine Learning Group,\n",
      "    Information Engineering (Robotics Research Group),\n",
      "    University of Oxford.\n",
      "    jack.fitzsimons@eng.ox.ac.uk\n",
      "\n",
      "References:\n",
      "[1] Brochu, Eric, Vlad M. Cora, and Nando De Freitas.\n",
      "    \"A tutorial on Bayesian optimization of expensive cost functions,\n",
      "    with application to active user modeling and hierarchical reinforcement learning.\"\n",
      "    arXiv preprint arXiv:1012.2599 (2010).\n",
      "\n",
      "[2] Srinivas, Niranjan, et al.\n",
      "    \"Gaussian process optimization in the bandit setting:\n",
      "    No regret and experimental design.\"\n",
      "    arXiv preprint arXiv:0912.3995 (2009).\n",
      "\n",
      "[3] Mockus, Jonas, Vytautas Tiesis, and Antanas Zilinskas.\n",
      "    \"The application of Bayesian methods for seeking the extremum.\"\n",
      "    Towards Global Optimization 2 (1978): 117-129.\n",
      "\n",
      "'''\n",
      "\n",
      "%pylab inline\n",
      "import numpy as np\n",
      "import scipy as sp\n",
      "import scipy.stats  # import the submodule explicitly so sp.stats.norm works below\n",
      "import pylab as pb\n",
      "import os\n",
      "import sys\n",
      "sys.path.append(os.getenv(\"HOME\") + '/Desktop/DPhil_GitHub/Libraries')\n",
      "import GPy\n",
      "print(os.path)"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's start by defining some function which we want to find the maximum of. I chose a mixture of two Gaussians, but any function could be placed in here. We treat the function as a black box, so imagine you have little clue about its form.\n",
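      "\n",
      "For reference, the black box implemented in the next cell is (just the code restated in math):\n",
      "\n",
      "$$f(x) = \\frac{1}{\\sqrt{2\\pi \\cdot 0.01}}\\, e^{-(x-0.3)^2/0.01} + \\frac{1}{\\sqrt{2\\pi \\cdot 0.03}}\\, e^{-(x-0.6)^2/0.03} - 1.$$\n",
      "\n",
      "(The exponents use $\\sigma^2$ rather than $2\\sigma^2$, so these are not normalised Gaussian densities -- which is fine, since we only care about the shape.)"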
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# We will examine a simple 1D example\n",
      "\n",
      "def func(x):\n",
      "    # Let's play around with a Gaussian mixture\n",
      "    # and pretend we don't know what it is\n",
      "    var_1 = 0.01\n",
      "    var_2 = 0.03\n",
      "\n",
      "    mean_1 = 0.3\n",
      "    mean_2 = 0.6\n",
      "\n",
      "    return ((1/np.sqrt(2*np.pi*var_1))*np.exp(-pow(x-mean_1,2)/var_1)) \\\n",
      "        + ((1/np.sqrt(2*np.pi*var_2))*np.exp(-pow(x-mean_2,2)/var_2)) - 1"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Next we will simply plot the function."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "x = np.linspace(0,1,200)\n",
      "y = func(x)\n",
      "\n",
      "plt.plot(x, y, '-r')\n",
      "\n",
      "xlabel('Input (x)')\n",
      "ylabel('Output (y)')\n",
      "title('Function to optimise (maximisation)')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Next we will use GPy to create a Gaussian Process (GP) which we will use to model our unknown function. In Bayesian Optimisation, we use attributes of the GP such as mean values and variances to intelligently choose successive sample points. The goal is to find the global maximum in as few samples as possible. This makes the approach very suitable for situations where the function is very costly to evaluate and traditional MCMC techniques would be inappropriate.\n",
      "\n",
      "The GP in this case is initialised with an RBF kernel with constant length scale and output variance. Of course these are rarely known in advance, and optimisation of the hyperparameters may be performed. I have also assumed that the function has zero noise, which may or may not be the case for various problems. Feel free to change this and see its effect on the optimisation.\n",
      "\n",
      "Unfortunately, GPy goes a little crazy when you initialise a GP with zero or one point, so I initialised it with three arbitrary points."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# The first issue with Bayesian Optimisation is hyperparameter estimation\n",
      "# I'll just put in some toy numbers though\n",
      "\n",
      "# When optimising a GP for a deterministic function we have zero noise\n",
      "# This is not the case in general optimisation problems\n",
      "noise = 0.0\n",
      "\n",
      "# length scale and output variance\n",
      "len_scale = 0.05\n",
      "o_var = 3.0\n",
      "\n",
      "# RBF (or Squared Exp.) kernel\n",
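      "#   k(x, x') = o_var * exp(-(x - x')^2 / (2 * len_scale^2))\n",
      "# (the standard RBF form: len_scale sets how quickly correlations decay,\n",
      "# o_var scales the prior variance of the function values)\n",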
      "kern_rbf = GPy.kern.rbf(input_dim=1, variance=o_var, lengthscale=len_scale)\n",
      "\n",
      "# Add noise if necessary (white noise)\n",
      "kern_noise = GPy.kern.white(input_dim=1, variance=noise)\n",
      "\n",
      "# Combine the kernels for the GP\n",
      "kern = kern_rbf + kern_noise\n",
      "\n",
      "# Initialise our search\n",
      "x_opt = np.array([0.1, 0.5, 0.8])\n",
      "y_opt = np.array([func(x_opt)])\n",
      "\n",
      "# Make the Gaussian Process using the GPy interface\n",
      "# (GPy expects (n, 1) column vectors, hence the reshape of x_opt)\n",
      "gp = GPy.models.GPRegression(x_opt[:, None], y_opt.T, kern)\n",
      "\n",
      "# There is no noise on our observations\n",
      "gp['noise'] = 0"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's check out what the initial GP looks like. Not much of an approximation of the maximum, eh?"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Plot limits [x_min, x_max, y_min, y_max]\n",
      "limits = [0,1,-4,4]\n",
      "\n",
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(111)\n",
      "gp.plot(ax=ax, plot_limits=[0,1])\n",
      "ax.axis(limits)\n",
      "ax.plot(x, y, '--r')\n",
      "\n",
      "xlabel('Input (x)')\n",
      "ylabel('Output (y)')\n",
      "title('Gaussian Process')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The Probability of Improvement (PI) acquisition function is conceptually very simple. If we remember that a GP defines a Gaussian over the output space for every input, PI simply chooses the point whose Gaussian places the largest probability mass above our current maximum. That's it!\n",
      "\n",
      "There is an epsilon value, but this only makes it look like the current maximum is slightly higher than it actually is. It was originally introduced to balance the exploration-exploitation trade-off, but it's not very effective at controlling this."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "## Prob. of Improvement Based Acquisition Functions\n",
      "# Including the Exploration-Exploitation tradeoff parameter\n",
      "\n",
      "mean = gp.predict(x[:,None])[0]\n",
      "var = gp.predict(x[:,None])[1]\n",
      "\n",
      "max_y = np.max(y_opt)\n",
      "\n",
      "# Probability of Improvement (PI)\n",
      "\n",
      "# trade-off parameter\n",
      "eps = 0.1\n",
      "\n",
      "PI = sp.stats.norm.cdf((mean - max_y - eps)/(np.sqrt(var)))\n",
      "\n",
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(111)\n",
      "ax.plot(x, PI, '-g')\n",
      "\n",
      "idx = np.where(PI==PI.max())[0]\n",
      "\n",
      "ax.plot(x[idx], PI[idx], 'go')\n",
      "ax.axvline(x=x[idx], color='green')\n",
      "\n",
      "xlabel('Input (x)')\n",
      "ylabel('Prob. of Impr. (PI)')\n",
      "title('Acquisition Function (PI)')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Now that we have seen what the acquisition function looks like for three points, let's see how the optimisation performs as it picks the next nine points.\n",
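      "\n",
      "As a reminder, the quantity maximised at each step (see [1]) is\n",
      "\n",
      "$$\\mathrm{PI}(x) = \\Phi\\left(\\frac{\\mu(x) - f(x^+) - \\epsilon}{\\sigma(x)}\\right),$$\n",
      "\n",
      "where $f(x^+)$ is the best value sampled so far and $\\Phi$ is the standard normal CDF -- exactly the `sp.stats.norm.cdf` expression in the loop below."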
" 254 | ] 255 | }, 256 | { 257 | "cell_type": "code", 258 | "collapsed": false, 259 | "input": [ 260 | "# Test Prob. of Improvement with 9 samples allowed\n", 261 | "\n", 262 | "# Local x and y that will be updated\n", 263 | "x_pi = np.copy(x_opt)\n", 264 | "y_pi = np.copy(y_opt)\n", 265 | "\n", 266 | "# trade-off parameter\n", 267 | "eps = 0.1\n", 268 | "\n", 269 | "for i in range(1,10):\n", 270 | " # Make the Gaussian Process using the GPy interface\n", 271 | " gp_pi = GPy.models.GPRegression(x_pi.T,y_pi.T,kern)\n", 272 | " \n", 273 | " # There is no noise on our observations\n", 274 | " gp_pi['noise'] = 0.0001\n", 275 | " \n", 276 | " mean = gp_pi.predict(x[:,None])[0]\n", 277 | " var = gp_pi.predict(x[:,None])[1]\n", 278 | "\n", 279 | " max_y = np.max(y_pi)\n", 280 | "\n", 281 | " var = var.clip(0)\n", 282 | " \n", 283 | " PI = sp.stats.norm.cdf((mean - max_y - eps)/(np.sqrt(var)))\n", 284 | " \n", 285 | " # Plot limits [x_min, x_max, y_min, y_max]\n", 286 | " limits = [0,1,-4,4]\n", 287 | "\n", 288 | " fig = plt.figure()\n", 289 | " ax = fig.add_subplot(111)\n", 290 | " gp_pi.plot(ax=ax, plot_limits=[0,1])\n", 291 | " ax.axis(limits)\n", 292 | " ax.plot(x, y, '--r')\n", 293 | "\n", 294 | "\n", 295 | " xlabel('Input (x)')\n", 296 | " ylabel('Output (y)')\n", 297 | " title('Gaussian Process')\n", 298 | " \n", 299 | " idx = np.where(PI==PI.max())[0]\n", 300 | " \n", 301 | " x_pi = np.append(x_pi, x[idx])\n", 302 | " y_pi = np.append(y_pi, y[idx])\n", 303 | " \n", 304 | " x_pi = x_pi[:,None].T\n", 305 | " y_pi = y_pi[:,None].T\n", 306 | " " 307 | ], 308 | "language": "python", 309 | "metadata": {}, 310 | "outputs": [] 311 | }, 312 | { 313 | "cell_type": "markdown", 314 | "metadata": {}, 315 | "source": [ 316 | "Mockus et al. [3] proposed the alternative of maximizing the expected improvement with respect to $f(x+)$, the predicted mean of the GP. This is known as the Expectation of Imporovement (EI) acquisition function. This is essentially a weighted combination of the CDF and PDF by the expecter increase and the uncertainty. The mixture of these helps balance the exploration-exploitation problem as new points will be selected from both unknown regions with high variance and known areas that we think are likely to contain maxima." 317 | ] 318 | }, 319 | { 320 | "cell_type": "code", 321 | "collapsed": false, 322 | "input": [ 323 | "## Expected Improvement Functions\n", 324 | "# Including the Exploration-Exploitation tradeoff parameter\n", 325 | "\n", 326 | "mean = gp.predict(x[:,None])[0]\n", 327 | "var = gp.predict(x[:,None])[1]\n", 328 | "\n", 329 | "max_y = np.max(y_opt)\n", 330 | "\n", 331 | "# Expected Improvement (EI)\n", 332 | "\n", 333 | "# trade-off parameter\n", 334 | "eps = 0.1\n", 335 | "\n", 336 | "offset = (mean - max_y - eps)\n", 337 | "sig = np.sqrt(var)\n", 338 | "\n", 339 | "EI = offset*sp.stats.norm.cdf(offset/sig) + sig*sp.stats.norm.pdf(offset/sig)\n", 340 | "EI[np.where( sig <= 0 )] = 0\n", 341 | "\n", 342 | "fig = plt.figure()\n", 343 | "ax = fig.add_subplot(111)\n", 344 | "ax.plot(x, EI, '-g')\n", 345 | "\n", 346 | "idx = np.where(EI==EI.max())[0]\n", 347 | "\n", 348 | "ax.plot(x[idx], EI[idx], 'go')\n", 349 | "ax.axvline(x=x[idx], color='green')\n", 350 | "\n", 351 | "xlabel('Input (x)')\n", 352 | "ylabel('Expected Improv. 
      "title('Acquisition Function (EI)')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's see how the EI version performs."
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Test Expected Improvement with 9 samples allowed\n",
      "\n",
      "# Local x and y that will be updated\n",
      "# (kept as row vectors so that .T gives the (n, 1) arrays GPy expects)\n",
      "x_ei = np.copy(x_opt)[None, :]\n",
      "y_ei = np.copy(y_opt)\n",
      "\n",
      "# trade-off parameter\n",
      "eps = 0.1\n",
      "\n",
      "for i in range(1,10):\n",
      "    # Make the Gaussian Process using the GPy interface\n",
      "    gp_ei = GPy.models.GPRegression(x_ei.T,y_ei.T,kern)\n",
      "\n",
      "    # Add a tiny observation noise for numerical stability\n",
      "    gp_ei['noise'] = 0.0001\n",
      "\n",
      "    mean = gp_ei.predict(x[:,None])[0]\n",
      "    var = gp_ei.predict(x[:,None])[1]\n",
      "\n",
      "    max_y = np.max(y_ei)\n",
      "\n",
      "    # clip tiny negative variances caused by rounding errors\n",
      "    var = var.clip(0)\n",
      "\n",
      "    # See previous section...\n",
      "    offset = (mean - max_y - eps)\n",
      "    sig = np.sqrt(var)\n",
      "\n",
      "    EI = offset*sp.stats.norm.cdf(offset/sig) + sig*sp.stats.norm.pdf(offset/sig)\n",
      "    EI[np.where( sig <= 0 )] = 0\n",
      "\n",
      "    # Plot limits [x_min, x_max, y_min, y_max]\n",
      "    limits = [0,1,-4,4]\n",
      "\n",
      "    fig = plt.figure()\n",
      "    ax = fig.add_subplot(111)\n",
      "    gp_ei.plot(ax=ax, plot_limits=[0,1])\n",
      "    ax.axis(limits)\n",
      "    ax.plot(x, y, '--r')\n",
      "\n",
      "    xlabel('Input (x)')\n",
      "    ylabel('Output (y)')\n",
      "    title('Gaussian Process')\n",
      "\n",
      "    idx = np.where(EI==EI.max())[0]\n",
      "\n",
      "    x_ei = np.append(x_ei, x[idx])\n",
      "    y_ei = np.append(y_ei, y[idx])\n",
      "\n",
      "    x_ei = x_ei[:,None].T\n",
      "    y_ei = y_ei[:,None].T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "The last acquisition function we will look at is referred to as the Upper Confidence Bound (UCB) acquisition function. This is quite an exploration-dominated algorithm, as it explicitly rewards uncertainty over the optimisation space. We pick a confidence level on the GP's predictive distribution, compute the corresponding upper bound at every point, and then simply sample the point with the highest bound.\n",
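      "\n",
      "Concretely, following [1] (p. 16) and [2], the rule implemented below is\n",
      "\n",
      "$$\\mathrm{UCB}(x) = \\mu(x) + \\sqrt{\\nu\\, \\tau_t}\\, \\sigma(x), \\qquad \\tau_t = 2 \\log\\left(\\frac{t^{d/2+2}\\, \\pi^2}{3\\delta}\\right),$$\n",
      "\n",
      "where $t$ is the iteration number, $d$ is the input dimension, and $\\nu > 0$, $\\delta \\in (0,1)$ are constants. With $\\nu = 1$ this choice carries the no-regret guarantees of [2]."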
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "## GP-UCB Upper Confidence Bound\n",
      "# Including the Exploration-Exploitation tradeoff parameter\n",
      "\n",
      "mean = gp.predict(x[:,None])[0]\n",
      "var = gp.predict(x[:,None])[1]\n",
      "\n",
      "max_y = np.max(y_opt)\n",
      "\n",
      "# GP-UCB Upper Confidence Bound (UCB)\n",
      "\n",
      "# parameters...\n",
      "# dimensionality of function space\n",
      "d = 1\n",
      "# confidence parameter v\n",
      "v = 1\n",
      "\n",
      "# See page 16 of [1]\n",
      "def tau(t):\n",
      "    # delta in (0,1)\n",
      "    delta = 0.1\n",
      "    # note the 2.0 to avoid integer division in python 2\n",
      "    return 2*np.log((pow(t, (d/2.0)+2))*(pow(np.pi,2))/(3*delta))\n",
      "\n",
      "sig = np.sqrt(var)\n",
      "\n",
      "# t = 1 for this first illustration\n",
      "UCB = mean + np.sqrt(v*tau(1))*sig\n",
      "\n",
      "fig = plt.figure()\n",
      "ax = fig.add_subplot(111)\n",
      "ax.plot(x, UCB, '-g')\n",
      "\n",
      "idx = np.where(UCB==UCB.max())[0]\n",
      "\n",
      "ax.plot(x[idx], UCB[idx], 'go')\n",
      "ax.axvline(x=x[idx], color='green')\n",
      "\n",
      "xlabel('Input (x)')\n",
      "ylabel('Upper Conf. Bound (UCB)')\n",
      "title('Acquisition Function (UCB)')"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    },
    {
     "cell_type": "markdown",
     "metadata": {},
     "source": [
      "Let's see how UCB deals with our cost function... (Be impressed!)"
     ]
    },
    {
     "cell_type": "code",
     "collapsed": false,
     "input": [
      "# Test GP-UCB with 9 samples allowed\n",
      "\n",
      "# Local x and y that will be updated\n",
      "# (kept as row vectors so that .T gives the (n, 1) arrays GPy expects)\n",
      "x_ucb = np.copy(x_opt)[None, :]\n",
      "y_ucb = np.copy(y_opt)\n",
      "\n",
      "# parameters...\n",
      "# dimensionality of function space\n",
      "d = 1\n",
      "# confidence parameter v\n",
      "v = 1\n",
      "\n",
      "for i in range(1,10):\n",
      "    # Make the Gaussian Process using the GPy interface\n",
      "    gp_ucb = GPy.models.GPRegression(x_ucb.T,y_ucb.T,kern)\n",
      "\n",
      "    # Add a tiny observation noise for numerical stability\n",
      "    gp_ucb['noise'] = 0.0001\n",
      "\n",
      "    mean = gp_ucb.predict(x[:,None])[0]\n",
      "    var = gp_ucb.predict(x[:,None])[1]\n",
      "\n",
      "    max_y = np.max(y_ucb)\n",
      "\n",
      "    # remove the computer rounding errors\n",
      "    var = var.clip(0)\n",
      "\n",
      "    # See previous section...\n",
      "    sig = np.sqrt(var)\n",
      "\n",
      "    # tau grows with the iteration number i, widening the bound over time\n",
      "    UCB = mean + np.sqrt(v*tau(i))*sig\n",
      "\n",
      "    # Plot limits [x_min, x_max, y_min, y_max]\n",
      "    limits = [0,1,-4,4]\n",
      "\n",
      "    fig = plt.figure()\n",
      "    ax = fig.add_subplot(111)\n",
      "    gp_ucb.plot(ax=ax, plot_limits=[0,1])\n",
      "    ax.axis(limits)\n",
      "    ax.plot(x, y, '--r')\n",
      "\n",
      "    xlabel('Input (x)')\n",
      "    ylabel('Output (y)')\n",
      "    title('Gaussian Process')\n",
      "\n",
      "    idx = np.where(UCB==UCB.max())[0]\n",
      "\n",
      "    x_ucb = np.append(x_ucb, x[idx])\n",
      "    y_ucb = np.append(y_ucb, y[idx])\n",
      "\n",
      "    x_ucb = x_ucb[:,None].T\n",
      "    y_ucb = y_ucb[:,None].T"
     ],
     "language": "python",
     "metadata": {},
     "outputs": []
    }
   ],
   "metadata": {}
  }
 ]
}
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
IPyNotebook_MachineLearning
===========================

This contains a number of IP[y]: Notebooks that hopefully shed some light on areas of Bayesian machine learning.
--------------------------------------------------------------------------------