├── CONTRIBUTORS.md
├── LICENSE.md
├── README.md
├── examples
│   ├── Turbo1.ipynb
│   └── TurboM.ipynb
├── requirements.txt
├── setup.py
└── turbo
    ├── __init__.py
    ├── gp.py
    ├── turbo_1.py
    ├── turbo_m.py
    └── utils.py
/CONTRIBUTORS.md: -------------------------------------------------------------------------------- 1 | Code written by: 2 | - David Eriksson 3 | -------------------------------------------------------------------------------- /LICENSE.md: -------------------------------------------------------------------------------- 1 | "License" shall mean the terms and conditions for use, reproduction, and distribution as defined by the text below. 2 | 3 | "You" (or "Your") shall mean an individual or Legal Entity exercising permissions granted by this License. 4 | 5 | "Legal Entity" shall mean the union of the acting entity and all other entities that control, are controlled by, or are under common control with that entity. For the purposes of this definition, "control" means (i) the power, direct or indirect, to cause the direction or management of such entity, whether by contract or otherwise, or (ii) ownership of fifty percent (50%) or more of the outstanding shares, or (iii) beneficial ownership of such entity. 6 | 7 | "Source" form shall mean the preferred form for making modifications, including but not limited to software source code, documentation source, and configuration files. 8 | 9 | "Object" form shall mean any form resulting from mechanical transformation or translation of a Source form, including but not limited to compiled object code, generated documentation, and conversions to other media types. 10 | 11 | "Work" shall mean the work of authorship, whether in Source or Object form, made available under this License. 12 | 13 | This License governs use of the accompanying Work, and your use of the Work constitutes acceptance of this License. 14 | 15 | You may use this Work for any non-commercial purpose, subject to the restrictions in this License.
Some purposes which can be non-commercial are teaching, academic research, and personal experimentation. You may also distribute this Work with books or other teaching materials, or publish the Work on websites, that are intended to teach the use of the Work. 16 | 17 | You may not use or distribute this Work, or any derivative works, outputs, or results from the Work, in any form for commercial purposes. Non-exhaustive examples of commercial purposes would be running business operations, licensing, leasing, or selling the Work, or distributing the Work for use with commercial products. 18 | 19 | You may modify this Work and distribute the modified Work for non-commercial purposes, however, you may not grant rights to the Work or derivative works that are broader than or in conflict with those provided by this License. For example, you may not distribute modifications of the Work under terms that would permit commercial use, or under terms that purport to require the Work or derivative works to be sublicensed to others. 20 | 21 | In return, we require that you agree: 22 | 23 | 1. Not to remove any copyright or other notices from the Work. 24 | 25 | 2. That if you distribute the Work in Source or Object form, you will include a verbatim copy of this License. 26 | 27 | 3. That if you distribute derivative works of the Work in Source form, you do so only under a license that includes all of the provisions of this License and is not in conflict with this License, and if you distribute derivative works of the Work solely in Object form you do so only under a license that complies with this License. 28 | 29 | 4. That if you have modified the Work or created derivative works from the Work, and distribute such modifications or derivative works, you will cause the modified files to carry prominent notices so that recipients know that they are not receiving the original Work. Such notices must state: (i) that you have changed the Work; and (ii) the date of any changes. 
30 | 31 | 5. If you publicly use the Work or any output or result of the Work, you will provide a notice with such use that provides any person who uses, views, accesses, interacts with, or is otherwise exposed to the Work (i) with information of the nature of the Work, (ii) with a link to the Work, and (iii) a notice that the Work is available under this License. 32 | 33 | 6. THAT THE WORK COMES "AS IS", WITH NO WARRANTIES. THIS MEANS NO EXPRESS, IMPLIED OR STATUTORY WARRANTY, INCLUDING WITHOUT LIMITATION, WARRANTIES OF MERCHANTABILITY OR FITNESS FOR A PARTICULAR PURPOSE OR ANY WARRANTY OF TITLE OR NON-INFRINGEMENT. ALSO, YOU MUST PASS THIS DISCLAIMER ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS. 34 | 35 | 7. THAT NEITHER UBER TECHNOLOGIES, INC. NOR ANY OF ITS AFFILIATES, SUPPLIERS, SUCCESSORS, NOR ASSIGNS WILL BE LIABLE FOR ANY DAMAGES RELATED TO THE WORK OR THIS LICENSE, INCLUDING DIRECT, INDIRECT, SPECIAL, CONSEQUENTIAL OR INCIDENTAL DAMAGES, TO THE MAXIMUM EXTENT THE LAW PERMITS, NO MATTER WHAT LEGAL THEORY IT IS BASED ON. ALSO, YOU MUST PASS THIS LIMITATION OF LIABILITY ON WHENEVER YOU DISTRIBUTE THE WORK OR DERIVATIVE WORKS. 36 | 37 | 8. That if you sue anyone over patents that you think may apply to the Work or anyone's use of the Work, your license to the Work ends automatically. 38 | 39 | 9. That your rights under the License end automatically if you breach it in any way. 40 | 41 | 10. Uber Technologies, Inc. reserves all rights not expressly granted to you in this License. 42 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ## Overview 2 | 3 | This is the code-release for the TuRBO algorithm from ***Scalable Global Optimization via Local Bayesian Optimization*** appearing in NeurIPS 2019. 
This implementation targets the noise-free case and may not work well if observations are noisy, since in that case the center of the trust region should be chosen based on the posterior mean. 4 | 5 | Note that TuRBO is a **minimization** algorithm, so make sure to reformulate any maximization problem as a minimization problem (for example, by flipping the sign of the objective). 6 | 7 | ## Benchmark functions 8 | 9 | ### Robot pushing 10 | The original code for the robot pushing problem is available at https://github.com/zi-w/Ensemble-Bayesian-Optimization. We made the following changes to the code when running our experiments: 11 | 12 | 1. We turned off the visualization, which speeds up the function evaluations. 13 | 2. We replaced all instances of ```np.random.normal(0, 0.01)``` with ```np.random.normal(0, 1e-6)``` in ```push_utils.py```. This makes the function close to noise-free. Another option is to average over several evaluations using the original code. 14 | 3. We flipped the sign of the objective function to turn this into a minimization problem. 15 | 16 | Dependencies: ```numpy```, ```pygame```, ```box2d-py``` 17 | 18 | ### Rover 19 | The original code for the rover problem is available at https://github.com/zi-w/Ensemble-Bayesian-Optimization. We used the large version of the problem, which has 60 dimensions. We flipped the sign of the objective function to turn this into a minimization problem. 20 | 21 | Dependencies: ```numpy```, ```scipy``` 22 | 23 | ### Lunar 24 | 25 | The lunar lander code is available in OpenAI Gym: https://github.com/openai/gym. The goal of the problem is to learn the parameter values of a controller for the lunar lander.
The controller we learn is a modification of the original heuristic controller and takes the form: 26 | 27 | ``` 28 | def heuristic_Controller(s, w): 29 | angle_targ = s[0] * w[0] + s[2] * w[1] 30 | if angle_targ > w[2]: 31 | angle_targ = w[2] 32 | if angle_targ < -w[2]: 33 | angle_targ = -w[2] 34 | hover_targ = w[3] * np.abs(s[0]) 35 | 36 | angle_todo = (angle_targ - s[4]) * w[4] - (s[5]) * w[5] 37 | hover_todo = (hover_targ - s[1]) * w[6] - (s[3]) * w[7] 38 | 39 | if s[6] or s[7]: 40 | angle_todo = w[8] 41 | hover_todo = -(s[3]) * w[9] 42 | 43 | a = 0 44 | if hover_todo > np.abs(angle_todo) and hover_todo > w[10]: 45 | a = 2 46 | elif angle_todo < -w[11]: 47 | a = 3 48 | elif angle_todo > +w[11]: 49 | a = 1 50 | return a 51 | ``` 52 | 53 | We use the constraints 0 <= w_i <= 2 for all parameters. We use ```INITIAL_RANDOM = 1500.0``` to make the problem more challenging. 54 | 55 | For more information about the logic behind this controller and how to integrate it with ```gym```, take a look at the original heuristic controller source code: https://github.com/openai/gym/blob/master/gym/envs/box2d/lunar_lander.py#L364 56 | 57 | Dependencies: ```gym```, ```box2d-py``` 58 | 59 | ### Cosmological constant 60 | The code for the cosmological constant problem is available here: https://ascl.net/1306.012. You need to follow the instructions and compile the FORTRAN code. This gives you an executable ```CAMB``` that you can call to run the simulation. 61 | 62 | The parameter names and bounds that we tune are the following: 63 | 64 | ``` 65 | ombh2: [0.01, 0.25] 66 | omch2: [0.01, 0.25] 67 | omnuh2: [0.01, 0.25] 68 | omk: [0.01, 0.25] 69 | hubble: [52.5, 100] 70 | temp_cmb: [2.7, 2.8] 71 | hefrac: [0.2, 0.3] 72 | mneu: [2.9, 3.09] 73 | scalar_amp: [1.5e-9, 2.6e-8] 74 | scalar_spec_ind: [0.72, 5] 75 | rf_fudge: [0, 100] 76 | rf_fudge_he: [0, 100] 77 | ``` 78 | 79 | ## Examples 80 | See the ```examples``` folder for two notebooks showing how to use TuRBO-1 and TuRBO-m.
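Because TuRBO only minimizes, a maximization problem can be handed to it by negating the objective, as was done for the benchmarks above. A minimal sketch, assuming the objective follows the same interface as the `Levy` class in the example notebooks (a `dim`, numpy arrays `lb` and `ub`, and a `__call__` that takes a 1-D numpy array); the `Negated` wrapper and the toy `reward` function are hypothetical and not part of this repository:

```python
import numpy as np


class Negated:
    """Hypothetical wrapper: exposes a maximization objective as a minimization one."""

    def __init__(self, g, lb, ub):
        self.g = g
        self.lb = np.asarray(lb, dtype=float)
        self.ub = np.asarray(ub, dtype=float)
        self.dim = len(self.lb)

    def __call__(self, x):
        x = np.asarray(x, dtype=float)
        assert x.ndim == 1 and len(x) == self.dim
        assert np.all(x >= self.lb) and np.all(x <= self.ub)
        # TuRBO minimizes, so return -g(x): the maximizer of g is the minimizer of -g
        return -self.g(x)


def reward(x):
    # Toy objective to maximize (hypothetical): peak value 1.0 at x = 0.5 in every coordinate
    return 1.0 - np.sum((x - 0.5) ** 2)


f = Negated(reward, lb=np.zeros(3), ub=np.ones(3))
print(f(np.full(3, 0.5)))  # -1.0
```

The wrapped object can then be passed as `f=f, lb=f.lb, ub=f.ub` to `Turbo1` or `TurboM`, as in the notebooks. The values in `fX` are then negated rewards, so flip the sign back when reporting the best result.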
81 | 82 | ## Citing us 83 | 84 | The final version of the paper is available at: http://papers.nips.cc/paper/8788-scalable-global-optimization-via-local-bayesian-optimization. 85 | 86 | ``` 87 | @inproceedings{eriksson2019scalable, 88 | title = {Scalable Global Optimization via Local {Bayesian} Optimization}, 89 | author = {Eriksson, David and Pearce, Michael and Gardner, Jacob and Turner, Ryan D and Poloczek, Matthias}, 90 | booktitle = {Advances in Neural Information Processing Systems}, 91 | pages = {5496--5507}, 92 | year = {2019}, 93 | url = {http://papers.nips.cc/paper/8788-scalable-global-optimization-via-local-bayesian-optimization.pdf}, 94 | } 95 | ``` 96 | -------------------------------------------------------------------------------- /examples/Turbo1.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Simple example of TuRBO-1" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from turbo import Turbo1\n", 17 | "import numpy as np\n", 18 | "import torch\n", 19 | "import math\n", 20 | "import matplotlib\n", 21 | "import matplotlib.pyplot as plt" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## Set up an optimization problem class" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "class Levy:\n", 38 | " def __init__(self, dim=10):\n", 39 | " self.dim = dim\n", 40 | " self.lb = -5 * np.ones(dim)\n", 41 | " self.ub = 10 * np.ones(dim)\n", 42 | " \n", 43 | " def __call__(self, x):\n", 44 | " assert len(x) == self.dim\n", 45 | " assert x.ndim == 1\n", 46 | " assert np.all(x <= self.ub) and np.all(x >= self.lb)\n", 47 | " w = 1 + (x - 1.0) / 4.0\n", 48 | " val = np.sin(np.pi * w[0]) ** 2 + \\\n", 49 | " 
np.sum((w[1:self.dim - 1] - 1) ** 2 * (1 + 10 * np.sin(np.pi * w[1:self.dim - 1] + 1) ** 2)) + \\\n", 50 | " (w[self.dim - 1] - 1) ** 2 * (1 + np.sin(2 * np.pi * w[self.dim - 1])**2)\n", 51 | " return val\n", 52 | "\n", 53 | "f = Levy(10)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## Create a Turbo optimizer instance" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "name": "stdout", 70 | "output_type": "stream", 71 | "text": [ 72 | "Using dtype = torch.float64 \n", 73 | "Using device = cpu\n" 74 | ] 75 | } 76 | ], 77 | "source": [ 78 | "turbo1 = Turbo1(\n", 79 | " f=f, # Handle to objective function\n", 80 | " lb=f.lb, # Numpy array specifying lower bounds\n", 81 | " ub=f.ub, # Numpy array specifying upper bounds\n", 82 | " n_init=20, # Number of initial points from a Latin hypercube design\n", 83 | " max_evals=1000, # Maximum number of evaluations\n", 84 | " batch_size=10, # Number of points in each batch\n", 85 | " verbose=True, # Print information from each batch\n", 86 | " use_ard=True, # Set to True to use ARD for the GP kernel\n", 87 | " max_cholesky_size=2000, # When we switch from Cholesky to Lanczos\n", 88 | " n_training_steps=50, # Number of steps of ADAM to learn the hypers\n", 89 | " min_cuda=1024, # Run on the CPU for small datasets\n", 90 | " device=\"cpu\", # \"cpu\" or \"cuda\"\n", 91 | " dtype=\"float64\", # float64 or float32\n", 92 | ")" 93 | ] 94 | }, 95 | { 96 | "cell_type": "markdown", 97 | "metadata": {}, 98 | "source": [ 99 | "# Run the optimization process" 100 | ] 101 | }, 102 | { 103 | "cell_type": "code", 104 | "execution_count": 4, 105 | "metadata": {}, 106 | "outputs": [ 107 | { 108 | "name": "stdout", 109 | "output_type": "stream", 110 | "text": [ 111 | "Starting from fbest = 20.98\n", 112 | "50) New best: 15.65\n", 113 | "80) New best: 11.27\n", 114 | "90) New best: 9.325\n", 115 | "100)
New best: 8.288\n", 116 | "110) New best: 6.944\n", 117 | "120) New best: 5.974\n", 118 | "140) New best: 5.951\n", 119 | "160) New best: 5.905\n", 120 | "170) New best: 5.905\n", 121 | "180) New best: 5.822\n", 122 | "190) New best: 5.785\n", 123 | "200) New best: 5.759\n", 124 | "220) New best: 5.738\n", 125 | "230) New best: 5.683\n", 126 | "240) Restarting with fbest = 5.683\n", 127 | "Starting from fbest = 32.5\n", 128 | "320) New best: 5.526\n", 129 | "330) New best: 3.95\n", 130 | "350) New best: 1.736\n", 131 | "370) New best: 1.229\n", 132 | "410) New best: 1.206\n", 133 | "420) New best: 1.193\n", 134 | "430) New best: 1.191\n", 135 | "440) New best: 1.163\n", 136 | "450) New best: 1.145\n", 137 | "460) New best: 1.06\n", 138 | "480) New best: 1.024\n", 139 | "490) New best: 1.01\n", 140 | "500) New best: 1.001\n", 141 | "530) Restarting with fbest = 1.001\n", 142 | "Starting from fbest = 12.85\n", 143 | "730) Restarting with fbest = 8.634\n", 144 | "Starting from fbest = 9.62\n", 145 | "890) Restarting with fbest = 5.87\n", 146 | "Starting from fbest = 25.71\n" 147 | ] 148 | } 149 | ], 150 | "source": [ 151 | "turbo1.optimize()" 152 | ] 153 | }, 154 | { 155 | "cell_type": "markdown", 156 | "metadata": {}, 157 | "source": [ 158 | "## Extract all evaluations from Turbo and print the best" 159 | ] 160 | }, 161 | { 162 | "cell_type": "code", 163 | "execution_count": 5, 164 | "metadata": {}, 165 | "outputs": [ 166 | { 167 | "name": "stdout", 168 | "output_type": "stream", 169 | "text": [ 170 | "Best value found:\n", 171 | "\tf(x) = 1.001\n", 172 | "Observed at:\n", 173 | "\tx = [-3.006 0.914 3.659 0.853 0.033 -0.203 1.199 0.812 -0.301 2.42 ]\n" 174 | ] 175 | } 176 | ], 177 | "source": [ 178 | "X = turbo1.X # Evaluated points\n", 179 | "fX = turbo1.fX # Observed values\n", 180 | "ind_best = np.argmin(fX)\n", 181 | "f_best, x_best = fX[ind_best], X[ind_best, :]\n", 182 | "\n", 183 | "print(\"Best value found:\\n\\tf(x) = %.3f\\nObserved at:\\n\\tx = %s\" % 
(f_best, np.around(x_best, 3)))" 184 | ] 185 | }, 186 | { 187 | "cell_type": "markdown", 188 | "metadata": {}, 189 | "source": [ 190 | "## Plot the progress\n", 191 | "Each restart of the trust region is independent and may converge to a different solution" 192 | ] 193 | }, 194 | { 195 | "cell_type": "code", 196 | "execution_count": 6, 197 | "metadata": {}, 198 | "outputs": [ 199 | { 200 | "data": { 201 | "image/png": "\n", 202 | "text/plain": [ 203 | "
" 204 | ] 205 | }, 206 | "metadata": { 207 | "needs_background": "light" 208 | }, 209 | "output_type": "display_data" 210 | } 211 | ], 212 | "source": [ 213 | "fig = plt.figure(figsize=(7, 5))\n", 214 | "matplotlib.rcParams.update({'font.size': 16})\n", 215 | "plt.plot(fX, 'b.', ms=10) # Plot all evaluated points as blue dots\n", 216 | "plt.plot(np.minimum.accumulate(fX), 'r', lw=3) # Plot cumulative minimum as a red line\n", 217 | "plt.xlim([0, len(fX)])\n", 218 | "plt.ylim([0, 30])\n", 219 | "plt.title(\"10D Levy function\")\n", 220 | "\n", 221 | "plt.tight_layout()\n", 222 | "plt.show()" 223 | ] 224 | }, 225 | { 226 | "cell_type": "code", 227 | "execution_count": null, 228 | "metadata": {}, 229 | "outputs": [], 230 | "source": [] 231 | } 232 | ], 233 | "metadata": { 234 | "kernelspec": { 235 | "display_name": "Python 3", 236 | "language": "python", 237 | "name": "python3" 238 | }, 239 | "language_info": { 240 | "codemirror_mode": { 241 | "name": "ipython", 242 | "version": 3 243 | }, 244 | "file_extension": ".py", 245 | "mimetype": "text/x-python", 246 | "name": "python", 247 | "nbconvert_exporter": "python", 248 | "pygments_lexer": "ipython3", 249 | "version": "3.6.8" 250 | } 251 | }, 252 | "nbformat": 4, 253 | "nbformat_minor": 2 254 | } 255 | -------------------------------------------------------------------------------- /examples/TurboM.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "## Simple example of TuRBO-m" 8 | ] 9 | }, 10 | { 11 | "cell_type": "code", 12 | "execution_count": 1, 13 | "metadata": {}, 14 | "outputs": [], 15 | "source": [ 16 | "from turbo import TurboM\n", 17 | "import numpy as np\n", 18 | "import torch\n", 19 | "import math\n", 20 | "import matplotlib\n", 21 | "import matplotlib.pyplot as plt" 22 | ] 23 | }, 24 | { 25 | "cell_type": "markdown", 26 | "metadata": {}, 27 | "source": [ 28 | "## Set up an 
optimization problem class" 29 | ] 30 | }, 31 | { 32 | "cell_type": "code", 33 | "execution_count": 2, 34 | "metadata": {}, 35 | "outputs": [], 36 | "source": [ 37 | "class Levy:\n", 38 | " def __init__(self, dim=10):\n", 39 | " self.dim = dim\n", 40 | " self.lb = -5 * np.ones(dim)\n", 41 | " self.ub = 10 * np.ones(dim)\n", 42 | " \n", 43 | " def __call__(self, x):\n", 44 | " assert len(x) == self.dim\n", 45 | " assert x.ndim == 1\n", 46 | " assert np.all(x <= self.ub) and np.all(x >= self.lb)\n", 47 | " w = 1 + (x - 1.0) / 4.0\n", 48 | " val = np.sin(np.pi * w[0]) ** 2 + \\\n", 49 | " np.sum((w[1:self.dim - 1] - 1) ** 2 * (1 + 10 * np.sin(np.pi * w[1:self.dim - 1] + 1) ** 2)) + \\\n", 50 | " (w[self.dim - 1] - 1) ** 2 * (1 + np.sin(2 * np.pi * w[self.dim - 1])**2)\n", 51 | " return val\n", 52 | "\n", 53 | "f = Levy(10)" 54 | ] 55 | }, 56 | { 57 | "cell_type": "markdown", 58 | "metadata": {}, 59 | "source": [ 60 | "## Create a Turbo optimizer instance" 61 | ] 62 | }, 63 | { 64 | "cell_type": "code", 65 | "execution_count": 3, 66 | "metadata": {}, 67 | "outputs": [ 68 | { 69 | "name": "stdout", 70 | "output_type": "stream", 71 | "text": [ 72 | "Using dtype = torch.float64 \n", 73 | "Using device = cpu\n" 74 | ] 75 | } 76 | ], 77 | "source": [ 78 | "turbo_m = TurboM(\n", 79 | " f=f, # Handle to objective function\n", 80 | " lb=f.lb, # Numpy array specifying lower bounds\n", 81 | " ub=f.ub, # Numpy array specifying upper bounds\n", 82 | " n_init=10, # Number of initial points from a symmetric Latin hypercube design\n", 83 | " max_evals=1000, # Maximum number of evaluations\n", 84 | " n_trust_regions=5, # Number of trust regions\n", 85 | " batch_size=10, # Number of points in each batch\n", 86 | " verbose=True, # Print information from each batch\n", 87 | " use_ard=True, # Set to True to use ARD for the GP kernel\n", 88 | " max_cholesky_size=2000, # When we switch from Cholesky to Lanczos\n", 89 | " n_training_steps=50, # Number of steps of ADAM to learn
the hypers\n", 90 | " min_cuda=1024, # Run on the CPU for small datasets\n", 91 | " device=\"cpu\", # \"cpu\" or \"cuda\"\n", 92 | " dtype=\"float64\", # float64 or float32\n", 93 | ")" 94 | ] 95 | }, 96 | { 97 | "cell_type": "markdown", 98 | "metadata": {}, 99 | "source": [ 100 | "# Run the optimization process" 101 | ] 102 | }, 103 | { 104 | "cell_type": "code", 105 | "execution_count": 4, 106 | "metadata": {}, 107 | "outputs": [ 108 | { 109 | "name": "stdout", 110 | "output_type": "stream", 111 | "text": [ 112 | "TR-0 starting from: 24.79\n", 113 | "TR-1 starting from: 20.77\n", 114 | "TR-2 starting from: 14.87\n", 115 | "TR-3 starting from: 27.97\n", 116 | "TR-4 starting from: 23.89\n", 117 | "80) New best @ TR-2: 12.43\n", 118 | "90) New best @ TR-2: 6.42\n", 119 | "110) New best @ TR-2: 5.467\n", 120 | "180) New best @ TR-2: 2.888\n", 121 | "230) New best @ TR-1: 1.944\n", 122 | "280) New best @ TR-1: 1.54\n", 123 | "310) New best @ TR-1: 1.052\n", 124 | "340) New best @ TR-1: 1.038\n", 125 | "390) New best @ TR-1: 0.9689\n", 126 | "410) New best @ TR-1: 0.877\n", 127 | "420) New best @ TR-1: 0.7794\n", 128 | "460) New best @ TR-1: 0.7509\n", 129 | "470) New best @ TR-1: 0.7264\n", 130 | "480) New best @ TR-1: 0.7238\n", 131 | "530) New best @ TR-1: 0.7044\n", 132 | "540) New best @ TR-1: 0.695\n", 133 | "550) New best @ TR-1: 0.6823\n", 134 | "560) New best @ TR-1: 0.6656\n", 135 | "590) New best @ TR-1: 0.6614\n", 136 | "600) New best @ TR-1: 0.6604\n", 137 | "640) TR-1 converged to: : 0.6604\n", 138 | "640) TR-1 is restarting from: : 23.66\n" 139 | ] 140 | } 141 | ], 142 | "source": [ 143 | "turbo_m.optimize()" 144 | ] 145 | }, 146 | { 147 | "cell_type": "markdown", 148 | "metadata": {}, 149 | "source": [ 150 | "## Extract all evaluations from Turbo and print the best" 151 | ] 152 | }, 153 | { 154 | "cell_type": "code", 155 | "execution_count": 5, 156 | "metadata": {}, 157 | "outputs": [ 158 | { 159 | "name": "stdout", 160 | "output_type": "stream", 161 | 
"text": [ 162 | "Best value found:\n", 163 | "\tf(x) = 0.660\n", 164 | "Observed at:\n", 165 | "\tx = [-2.968 1.072 0.173 0.973 3.698 0.883 0.946 0.872 0.006 0.927]\n" 166 | ] 167 | } 168 | ], 169 | "source": [ 170 | "X = turbo_m.X # Evaluated points\n", 171 | "fX = turbo_m.fX # Observed values\n", 172 | "ind_best = np.argmin(fX)\n", 173 | "f_best, x_best = fX[ind_best], X[ind_best, :]\n", 174 | "\n", 175 | "print(\"Best value found:\\n\\tf(x) = %.3f\\nObserved at:\\n\\tx = %s\" % (f_best, np.around(x_best, 3)))" 176 | ] 177 | }, 178 | { 179 | "cell_type": "markdown", 180 | "metadata": {}, 181 | "source": [ 182 | "## Plot the progress\n", 183 | "\n", 184 | "TuRBO-5 converges to a solution close to the global optimum" 185 | ] 186 | }, 187 | { 188 | "cell_type": "code", 189 | "execution_count": 6, 190 | "metadata": {}, 191 | "outputs": [ 192 | { 193 | "data": { 194 | "image/png": "\n", 195 | "text/plain": [ 196 | "
" 197 | ] 198 | }, 199 | "metadata": { 200 | "needs_background": "light" 201 | }, 202 | "output_type": "display_data" 203 | } 204 | ], 205 | "source": [ 206 | "fig = plt.figure(figsize=(7, 5))\n", 207 | "matplotlib.rcParams.update({'font.size': 16})\n", 208 | "plt.plot(fX, 'b.', ms=10) # Plot all evaluated points as blue dots\n", 209 | "plt.plot(np.minimum.accumulate(fX), 'r', lw=3) # Plot cumulative minimum as a red line\n", 210 | "plt.xlim([0, len(fX)])\n", 211 | "plt.ylim([0, 30])\n", 212 | "plt.title(\"10D Levy function\")\n", 213 | "\n", 214 | "plt.tight_layout()\n", 215 | "plt.show()" 216 | ] 217 | }, 218 | { 219 | "cell_type": "code", 220 | "execution_count": null, 221 | "metadata": {}, 222 | "outputs": [], 223 | "source": [] 224 | } 225 | ], 226 | "metadata": { 227 | "kernelspec": { 228 | "display_name": "Python 3", 229 | "language": "python", 230 | "name": "python3" 231 | }, 232 | "language_info": { 233 | "codemirror_mode": { 234 | "name": "ipython", 235 | "version": 3 236 | }, 237 | "file_extension": ".py", 238 | "mimetype": "text/x-python", 239 | "name": "python", 240 | "nbconvert_exporter": "python", 241 | "pygments_lexer": "ipython3", 242 | "version": "3.6.8" 243 | } 244 | }, 245 | "nbformat": 4, 246 | "nbformat_minor": 2 247 | } 248 | -------------------------------------------------------------------------------- /requirements.txt: -------------------------------------------------------------------------------- 1 | numpy==1.17.3 2 | torch==1.3.0 3 | gpytorch==0.3.6 4 | -------------------------------------------------------------------------------- /setup.py: -------------------------------------------------------------------------------- 1 | from setuptools import setup, find_packages 2 | 3 | setup( 4 | name="turbo", 5 | version="0.0.1", 6 | packages=find_packages(), 7 | install_requires=["numpy>=1.17.3", "torch>=1.3.0", "gpytorch>=0.3.6"], 8 | ) 9 | -------------------------------------------------------------------------------- /turbo/__init__.py: 
-------------------------------------------------------------------------------- 1 | from .turbo_1 import Turbo1 2 | from .turbo_m import TurboM 3 | -------------------------------------------------------------------------------- /turbo/gp.py: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # Copyright (c) 2019 Uber Technologies, Inc. # 3 | # # 4 | # Licensed under the Uber Non-Commercial License (the "License"); # 5 | # you may not use this file except in compliance with the License. # 6 | # You may obtain a copy of the License at the root directory of this project. # 7 | # # 8 | # See the License for the specific language governing permissions and # 9 | # limitations under the License. # 10 | ############################################################################### 11 | 12 | import math 13 | 14 | import gpytorch 15 | import numpy as np 16 | import torch 17 | from gpytorch.constraints.constraints import Interval 18 | from gpytorch.distributions import MultivariateNormal 19 | from gpytorch.kernels import MaternKernel, ScaleKernel 20 | from gpytorch.likelihoods import GaussianLikelihood 21 | from gpytorch.means import ConstantMean 22 | from gpytorch.mlls import ExactMarginalLogLikelihood 23 | from gpytorch.models import ExactGP 24 | 25 | 26 | # GP Model 27 | class GP(ExactGP): 28 | def __init__(self, train_x, train_y, likelihood, lengthscale_constraint, outputscale_constraint, ard_dims): 29 | super(GP, self).__init__(train_x, train_y, likelihood) 30 | self.ard_dims = ard_dims 31 | self.mean_module = ConstantMean() 32 | base_kernel = MaternKernel(lengthscale_constraint=lengthscale_constraint, ard_num_dims=ard_dims, nu=2.5) 33 | self.covar_module = ScaleKernel(base_kernel, outputscale_constraint=outputscale_constraint) 34 | 35 | def forward(self, x): 36 | mean_x = self.mean_module(x) 37 | covar_x = self.covar_module(x) 38 | return 
MultivariateNormal(mean_x, covar_x) 39 | 40 | 41 | def train_gp(train_x, train_y, use_ard, num_steps, hypers={}): 42 | """Fit a GP model where train_x is in [0, 1]^d and train_y is standardized.""" 43 | assert train_x.ndim == 2 44 | assert train_y.ndim == 1 45 | assert train_x.shape[0] == train_y.shape[0] 46 | 47 | # Create hyper parameter bounds 48 | noise_constraint = Interval(5e-4, 0.2) 49 | if use_ard: 50 | lengthscale_constraint = Interval(0.005, 2.0) 51 | else: 52 | lengthscale_constraint = Interval(0.005, math.sqrt(train_x.shape[1])) # [0.005, sqrt(dim)] 53 | outputscale_constraint = Interval(0.05, 20.0) 54 | 55 | # Create models 56 | likelihood = GaussianLikelihood(noise_constraint=noise_constraint).to(device=train_x.device, dtype=train_y.dtype) 57 | ard_dims = train_x.shape[1] if use_ard else None 58 | model = GP( 59 | train_x=train_x, 60 | train_y=train_y, 61 | likelihood=likelihood, 62 | lengthscale_constraint=lengthscale_constraint, 63 | outputscale_constraint=outputscale_constraint, 64 | ard_dims=ard_dims, 65 | ).to(device=train_x.device, dtype=train_x.dtype) 66 | 67 | # Find optimal model hyperparameters 68 | model.train() 69 | likelihood.train() 70 | 71 | # "Loss" for GPs - the marginal log likelihood 72 | mll = ExactMarginalLogLikelihood(likelihood, model) 73 | 74 | # Initialize model hypers 75 | if hypers: 76 | model.load_state_dict(hypers) 77 | else: 78 | hypers = {} 79 | hypers["covar_module.outputscale"] = 1.0 80 | hypers["covar_module.base_kernel.lengthscale"] = 0.5 81 | hypers["likelihood.noise"] = 0.005 82 | model.initialize(**hypers) 83 | 84 | # Use the adam optimizer 85 | optimizer = torch.optim.Adam([{"params": model.parameters()}], lr=0.1) 86 | 87 | for _ in range(num_steps): 88 | optimizer.zero_grad() 89 | output = model(train_x) 90 | loss = -mll(output, train_y) 91 | loss.backward() 92 | optimizer.step() 93 | 94 | # Switch to eval mode 95 | model.eval() 96 | likelihood.eval() 97 | 98 | return model 99 | 
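The model returned by `train_gp` is used in `turbo_1.py` to shape the trust region: in `_create_candidates`, the ARD lengthscales are rescaled so that their product is 1, and then they stretch a box of side `length` around the best point, clipped to the unit cube. A minimal numpy sketch of just that step, assuming the lengthscales have already been pulled out of the fitted model as a numpy array (the function name `trust_region_bounds` and the example numbers are hypothetical):

```python
import numpy as np


def trust_region_bounds(x_center, lengthscales, length):
    """Sketch of the TuRBO-1 trust-region box in [0, 1]^d (hypothetical helper).

    Follows the steps in Turbo1._create_candidates: normalize the ARD
    lengthscales so their product is 1, then scale by the side length.
    """
    weights = np.asarray(lengthscales, dtype=float).ravel()
    weights = weights / weights.mean()  # rescale first for numerical stability
    weights = weights / np.prod(np.power(weights, 1.0 / len(weights)))  # now prod(weights) == 1
    lb = np.clip(x_center - weights * length / 2.0, 0.0, 1.0)
    ub = np.clip(x_center + weights * length / 2.0, 0.0, 1.0)
    return lb, ub, weights


x_center = np.array([0.5, 0.5, 0.9])      # best point so far, already in the unit cube
lengthscales = np.array([0.2, 0.8, 0.4])  # made-up ARD lengthscales
lb, ub, weights = trust_region_bounds(x_center, lengthscales, length=0.8)
print(weights, weights.prod())
print(lb, ub)
```

Dimensions with longer lengthscales, where the GP expects the objective to vary slowly, get a wider side, and the product normalization keeps the box volume at `length ** d` before clipping. In `Turbo1`, `length` starts at `length_init = 0.8` and is doubled (up to `length_max = 1.6`) or halved by `_adjust_length`.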
-------------------------------------------------------------------------------- /turbo/turbo_1.py: -------------------------------------------------------------------------------- 1 | ############################################################################### 2 | # Copyright (c) 2019 Uber Technologies, Inc. # 3 | # # 4 | # Licensed under the Uber Non-Commercial License (the "License"); # 5 | # you may not use this file except in compliance with the License. # 6 | # You may obtain a copy of the License at the root directory of this project. # 7 | # # 8 | # See the License for the specific language governing permissions and # 9 | # limitations under the License. # 10 | ############################################################################### 11 | 12 | import math 13 | import sys 14 | from copy import deepcopy 15 | 16 | import gpytorch 17 | import numpy as np 18 | import torch 19 | from torch.quasirandom import SobolEngine 20 | 21 | from .gp import train_gp 22 | from .utils import from_unit_cube, latin_hypercube, to_unit_cube 23 | 24 | 25 | class Turbo1: 26 | """The TuRBO-1 algorithm. 27 | 28 | Parameters 29 | ---------- 30 | f : function handle 31 | lb : Lower variable bounds, numpy.array, shape (d,). 32 | ub : Upper variable bounds, numpy.array, shape (d,). 33 | n_init : Number of initial points (2*dim is recommended), int. 34 | max_evals : Total evaluation budget, int. 35 | batch_size : Number of points in each batch, int. 36 | verbose : If you want to print information about the optimization progress, bool. 37 | use_ard : If you want to use ARD for the GP kernel. 
38 | max_cholesky_size : Largest number of training points where we use Cholesky, int 39 | n_training_steps : Number of training steps for learning the GP hypers, int 40 | min_cuda : We use float64 on the CPU if we have fewer than this many datapoints, int 41 | device : Device to use for GP fitting ("cpu" or "cuda") 42 | dtype : Dtype to use for GP fitting ("float32" or "float64") 43 | 44 | Example usage: 45 | turbo1 = Turbo1(f=f, lb=lb, ub=ub, n_init=n_init, max_evals=max_evals) 46 | turbo1.optimize() # Run optimization 47 | X, fX = turbo1.X, turbo1.fX # Evaluated points and observed values 48 | """ 49 | 50 | def __init__( 51 | self, 52 | f, 53 | lb, 54 | ub, 55 | n_init, 56 | max_evals, 57 | batch_size=1, 58 | verbose=True, 59 | use_ard=True, 60 | max_cholesky_size=2000, 61 | n_training_steps=50, 62 | min_cuda=1024, 63 | device="cpu", 64 | dtype="float64", 65 | ): 66 | 67 | # Very basic input checks 68 | assert lb.ndim == 1 and ub.ndim == 1 69 | assert len(lb) == len(ub) 70 | assert np.all(ub > lb) 71 | assert max_evals > 0 and isinstance(max_evals, int) 72 | assert n_init > 0 and isinstance(n_init, int) 73 | assert batch_size > 0 and isinstance(batch_size, int) 74 | assert isinstance(verbose, bool) and isinstance(use_ard, bool) 75 | assert max_cholesky_size >= 0 and isinstance(max_cholesky_size, int) 76 | assert n_training_steps >= 30 and isinstance(n_training_steps, int) 77 | assert max_evals > n_init and max_evals > batch_size 78 | assert device == "cpu" or device == "cuda" 79 | assert dtype == "float32" or dtype == "float64" 80 | if device == "cuda": 81 | assert torch.cuda.is_available(), "can't use cuda if it's not available" 82 | 83 | # Save function information 84 | self.f = f 85 | self.dim = len(lb) 86 | self.lb = lb 87 | self.ub = ub 88 | 89 | # Settings 90 | self.n_init = n_init 91 | self.max_evals = max_evals 92 | self.batch_size = batch_size 93 | self.verbose = verbose 94 | self.use_ard = use_ard 95 | self.max_cholesky_size = max_cholesky_size 96 | self.n_training_steps = n_training_steps

        # Hyperparameters
        self.mean = np.zeros((0, 1))
        self.signal_var = np.zeros((0, 1))
        self.noise_var = np.zeros((0, 1))
        self.lengthscales = np.zeros((0, self.dim)) if self.use_ard else np.zeros((0, 1))

        # Tolerances and counters
        self.n_cand = min(100 * self.dim, 5000)
        self.failtol = np.ceil(np.max([4.0 / batch_size, self.dim / batch_size]))
        self.succtol = 3
        self.n_evals = 0

        # Trust region sizes
        self.length_min = 0.5 ** 7
        self.length_max = 1.6
        self.length_init = 0.8

        # Save the full history
        self.X = np.zeros((0, self.dim))
        self.fX = np.zeros((0, 1))

        # Device and dtype for GPyTorch
        self.min_cuda = min_cuda
        self.dtype = torch.float32 if dtype == "float32" else torch.float64
        self.device = torch.device("cuda") if device == "cuda" else torch.device("cpu")
        if self.verbose:
            print("Using dtype = %s \nUsing device = %s" % (self.dtype, self.device))
            sys.stdout.flush()

        # Initialize parameters
        self._restart()

    def _restart(self):
        self._X = []
        self._fX = []
        self.failcount = 0
        self.succcount = 0
        self.length = self.length_init

    def _adjust_length(self, fX_next):
        if np.min(fX_next) < np.min(self._fX) - 1e-3 * math.fabs(np.min(self._fX)):
            self.succcount += 1
            self.failcount = 0
        else:
            self.succcount = 0
            self.failcount += 1

        if self.succcount == self.succtol:  # Expand trust region
            self.length = min([2.0 * self.length, self.length_max])
            self.succcount = 0
        elif self.failcount == self.failtol:  # Shrink trust region
            self.length /= 2.0
            self.failcount = 0

    def _create_candidates(self, X, fX, length, n_training_steps, hypers):
        """Generate candidates assuming X has been scaled to [0,1]^d."""
        # Pick the center as the point with the smallest function value
        # NOTE: This may not be robust to noise, in which case the posterior mean of the GP can be used instead
        assert X.min() >= 0.0 and X.max() <= 1.0

        # Standardize function values.
        mu, sigma = np.median(fX), fX.std()
        sigma = 1.0 if sigma < 1e-6 else sigma
        fX = (deepcopy(fX) - mu) / sigma

        # Figure out what device we are running on
        if len(X) < self.min_cuda:
            device, dtype = torch.device("cpu"), torch.float64
        else:
            device, dtype = self.device, self.dtype

        # We use CG + Lanczos for training if we have enough data
        with gpytorch.settings.max_cholesky_size(self.max_cholesky_size):
            X_torch = torch.tensor(X).to(device=device, dtype=dtype)
            y_torch = torch.tensor(fX).to(device=device, dtype=dtype)
            gp = train_gp(
                train_x=X_torch, train_y=y_torch, use_ard=self.use_ard, num_steps=n_training_steps, hypers=hypers
            )

            # Save state dict
            hypers = gp.state_dict()

        # Create the trust region boundaries
        x_center = X[fX.argmin().item(), :][None, :]
        weights = gp.covar_module.base_kernel.lengthscale.cpu().detach().numpy().ravel()
        weights = weights / weights.mean()  # This will make the next line more stable
        weights = weights / np.prod(np.power(weights, 1.0 / len(weights)))  # We now have weights.prod() = 1
        lb = np.clip(x_center - weights * length / 2.0, 0.0, 1.0)
        ub = np.clip(x_center + weights * length / 2.0, 0.0, 1.0)

        # Draw a Sobol sequence in [lb, ub]
        seed = np.random.randint(int(1e6))
        sobol = SobolEngine(self.dim, scramble=True, seed=seed)
        pert = sobol.draw(self.n_cand).to(dtype=dtype, device=device).cpu().detach().numpy()
        pert = lb + (ub - lb) * pert

        # Create a perturbation mask
        prob_perturb = min(20.0 / self.dim, 1.0)
        mask = np.random.rand(self.n_cand, self.dim) <= prob_perturb
        ind = np.where(np.sum(mask, axis=1) == 0)[0]
        # randint excludes its upper bound, so use self.dim to allow every coordinate
        mask[ind, np.random.randint(0, self.dim, size=len(ind))] = 1

        # Create candidate points
        X_cand = x_center.copy() * np.ones((self.n_cand, self.dim))
        X_cand[mask] = pert[mask]

        # Figure out what device we are running on
        if len(X_cand) < self.min_cuda:
            device, dtype = torch.device("cpu"), torch.float64
        else:
            device, dtype = self.device, self.dtype

        # We may have to move the GP to a new device
        gp = gp.to(dtype=dtype, device=device)

        # We use Lanczos for sampling if we have enough data
        with torch.no_grad(), gpytorch.settings.max_cholesky_size(self.max_cholesky_size):
            X_cand_torch = torch.tensor(X_cand).to(device=device, dtype=dtype)
            y_cand = gp.likelihood(gp(X_cand_torch)).sample(torch.Size([self.batch_size])).t().cpu().detach().numpy()

        # Remove the torch variables
        del X_torch, y_torch, X_cand_torch, gp

        # De-standardize the sampled values
        y_cand = mu + sigma * y_cand

        return X_cand, y_cand, hypers

    def _select_candidates(self, X_cand, y_cand):
        """Select candidates."""
        X_next = np.ones((self.batch_size, self.dim))
        for i in range(self.batch_size):
            # Pick the best point and make sure we never pick it again
            indbest = np.argmin(y_cand[:, i])
            X_next[i, :] = deepcopy(X_cand[indbest, :])
            y_cand[indbest, :] = np.inf
        return X_next

    def optimize(self):
        """Run the full optimization process."""
        while self.n_evals < self.max_evals:
            if len(self._fX) > 0 and self.verbose:
                n_evals, fbest = self.n_evals, self._fX.min()
                print(f"{n_evals}) Restarting with fbest = {fbest:.4}")
                sys.stdout.flush()

            # Initialize parameters
            self._restart()

            # Generate and evaluate initial design points
            X_init = latin_hypercube(self.n_init, self.dim)
            X_init = from_unit_cube(X_init, self.lb, self.ub)
            fX_init = np.array([[self.f(x)] for x in X_init])

            # Update budget and set as initial data for this TR
            self.n_evals += self.n_init
            self._X = deepcopy(X_init)
            self._fX = deepcopy(fX_init)

            # Append data to the global history
            self.X = np.vstack((self.X, deepcopy(X_init)))
            self.fX = np.vstack((self.fX, deepcopy(fX_init)))

            if self.verbose:
                fbest = self._fX.min()
                print(f"Starting from fbest = {fbest:.4}")
                sys.stdout.flush()

            # Thompson sample to get next suggestions
            while self.n_evals < self.max_evals and self.length >= self.length_min:
                # Warp inputs
                X = to_unit_cube(deepcopy(self._X), self.lb, self.ub)

                # Get the function values (they are standardized inside _create_candidates)
                fX = deepcopy(self._fX).ravel()

                # Create the next batch
                X_cand, y_cand, _ = self._create_candidates(
                    X, fX, length=self.length, n_training_steps=self.n_training_steps, hypers={}
                )
                X_next = self._select_candidates(X_cand, y_cand)

                # Undo the warping
                X_next = from_unit_cube(X_next, self.lb, self.ub)

                # Evaluate batch
                fX_next = np.array([[self.f(x)] for x in X_next])

                # Update trust region
                self._adjust_length(fX_next)

                # Update budget and append data
                self.n_evals += self.batch_size
                self._X = np.vstack((self._X, X_next))
                self._fX = np.vstack((self._fX, fX_next))

                if self.verbose and fX_next.min() < self.fX.min():
                    n_evals, fbest = self.n_evals, fX_next.min()
                    print(f"{n_evals}) New best: {fbest:.4}")
                    sys.stdout.flush()

                # Append data to the global history
                self.X = np.vstack((self.X, deepcopy(X_next)))
                self.fX = np.vstack((self.fX, deepcopy(fX_next)))

--------------------------------------------------------------------------------
/turbo/turbo_m.py:
--------------------------------------------------------------------------------
###############################################################################
# Copyright (c) 2019 Uber Technologies, Inc.                                  #
#                                                                             #
# Licensed under the Uber Non-Commercial License (the "License");             #
# you may not use this file except in compliance with the License.            #
# You may obtain a copy of the License at the root directory of this project. #
#                                                                             #
# See the License for the specific language governing permissions and         #
# limitations under the License.                                              #
###############################################################################

import math
import sys
from copy import deepcopy

import gpytorch
import numpy as np
import torch

from .gp import train_gp
from .turbo_1 import Turbo1
from .utils import from_unit_cube, latin_hypercube, to_unit_cube


class TurboM(Turbo1):
    """The TuRBO-m algorithm.

    Parameters
    ----------
    f : function handle
    lb : Lower variable bounds, numpy.array, shape (d,).
    ub : Upper variable bounds, numpy.array, shape (d,).
    n_init : Number of initial points *FOR EACH TRUST REGION* (2*dim is recommended), int.
    max_evals : Total evaluation budget, int.
    n_trust_regions : Number of trust regions, int.
    batch_size : Number of points in each batch, int.
    verbose : If you want to print information about the optimization progress, bool.
    use_ard : If you want to use ARD for the GP kernel.
    max_cholesky_size : Largest number of training points where we use Cholesky, int
    n_training_steps : Number of training steps for learning the GP hypers, int
    min_cuda : We use float64 on the CPU if we have fewer than this many datapoints, int
    device : Device to use for GP fitting ("cpu" or "cuda")
    dtype : Dtype to use for GP fitting ("float32" or "float64")

    Example usage:
        turbo5 = TurboM(f=f, lb=lb, ub=ub, n_init=n_init, max_evals=max_evals, n_trust_regions=5)
        turbo5.optimize()  # Run optimization
        X, fX = turbo5.X, turbo5.fX  # Evaluated points
    """

    def __init__(
        self,
        f,
        lb,
        ub,
        n_init,
        max_evals,
        n_trust_regions,
        batch_size=1,
        verbose=True,
        use_ard=True,
        max_cholesky_size=2000,
        n_training_steps=50,
        min_cuda=1024,
        device="cpu",
        dtype="float64",
    ):
        self.n_trust_regions = n_trust_regions
        super().__init__(
            f=f,
            lb=lb,
            ub=ub,
            n_init=n_init,
            max_evals=max_evals,
            batch_size=batch_size,
            verbose=verbose,
            use_ard=use_ard,
            max_cholesky_size=max_cholesky_size,
            n_training_steps=n_training_steps,
            min_cuda=min_cuda,
            device=device,
            dtype=dtype,
        )

        self.succtol = 3
        self.failtol = max(5, self.dim)

        # Very basic input checks
        assert n_trust_regions > 1 and isinstance(n_trust_regions, int)
        assert max_evals > n_trust_regions * n_init, "Not enough evaluations for the initial design of every trust region"
        assert max_evals > batch_size, "Not enough evaluations to do a single batch"

        # Remember the hypers for trust regions we don't sample from
        self.hypers = [{} for _ in range(self.n_trust_regions)]

        # Initialize parameters
        self._restart()

    def _restart(self):
        self._idx = np.zeros((0, 1), dtype=int)  # Track what trust region proposed what using an index vector
        self.failcount = np.zeros(self.n_trust_regions, dtype=int)
        self.succcount = np.zeros(self.n_trust_regions, dtype=int)
        self.length = self.length_init * np.ones(self.n_trust_regions)

    def _adjust_length(self, fX_next, i):
        assert i >= 0 and i <= self.n_trust_regions - 1

        fX_min = self.fX[self._idx[:, 0] == i, 0].min()  # Target value
        if fX_next.min() < fX_min - 1e-3 * math.fabs(fX_min):
            self.succcount[i] += 1
            self.failcount[i] = 0
        else:
            self.succcount[i] = 0
            self.failcount[i] += len(fX_next)  # NOTE: Add size of the batch for this TR

        if self.succcount[i] == self.succtol:  # Expand trust region
            self.length[i] = min([2.0 * self.length[i], self.length_max])
            self.succcount[i] = 0
        elif self.failcount[i] >= self.failtol:  # Shrink trust region (we may have exceeded the failtol)
            self.length[i] /= 2.0
            self.failcount[i] = 0

    def _select_candidates(self, X_cand, y_cand):
        """Select candidates from samples from all trust regions."""
        assert X_cand.shape == (self.n_trust_regions, self.n_cand, self.dim)
        assert y_cand.shape == (self.n_trust_regions, self.n_cand, self.batch_size)
        assert X_cand.min() >= 0.0 and X_cand.max() <= 1.0 and np.all(np.isfinite(y_cand))

        X_next = np.zeros((self.batch_size, self.dim))
        idx_next = np.zeros((self.batch_size, 1), dtype=int)
        for k in range(self.batch_size):
            i, j = np.unravel_index(np.argmin(y_cand[:, :, k]), (self.n_trust_regions, self.n_cand))
            assert y_cand[:, :, k].min() == y_cand[i, j, k]
            X_next[k, :] = deepcopy(X_cand[i, j, :])
            idx_next[k, 0] = i
            assert np.isfinite(y_cand[i, j, k])  # Just to make sure we never select nan or inf

            # Make sure we never pick this point again
            y_cand[i, j, :] = np.inf

        return X_next, idx_next

    def optimize(self):
        """Run the full optimization process."""
        # Create initial points for each TR
        for i in range(self.n_trust_regions):
            X_init = latin_hypercube(self.n_init, self.dim)
            X_init = from_unit_cube(X_init, self.lb, self.ub)
            fX_init = np.array([[self.f(x)] for x in X_init])

            # Update budget and set as initial data for this TR
            self.X = np.vstack((self.X, X_init))
            self.fX = np.vstack((self.fX, fX_init))
            self._idx = np.vstack((self._idx, i * np.ones((self.n_init, 1), dtype=int)))
            self.n_evals += self.n_init

            if self.verbose:
                fbest = fX_init.min()
                print(f"TR-{i} starting from: {fbest:.4}")
                sys.stdout.flush()

        # Thompson sample to get next suggestions
        while self.n_evals < self.max_evals:

            # Generate candidates from each TR
            X_cand = np.zeros((self.n_trust_regions, self.n_cand, self.dim))
            y_cand = np.inf * np.ones((self.n_trust_regions, self.n_cand, self.batch_size))
            for i in range(self.n_trust_regions):
                idx = np.where(self._idx == i)[0]  # Extract all "active" indices

                # Get the active points for this TR and warp them to the unit cube
                X = deepcopy(self.X[idx, :])
                X = to_unit_cube(X, self.lb, self.ub)

                # Get the corresponding function values (standardized inside _create_candidates)
                fX = deepcopy(self.fX[idx, 0].ravel())

                # Don't retrain the model if the training data hasn't changed
                n_training_steps = 0 if self.hypers[i] else self.n_training_steps

                # Create new candidates
                X_cand[i, :, :], y_cand[i, :, :], self.hypers[i] = self._create_candidates(
                    X, fX, length=self.length[i], n_training_steps=n_training_steps, hypers=self.hypers[i]
                )

            # Select the next candidates
            X_next, idx_next = self._select_candidates(X_cand, y_cand)
            assert X_next.min() >= 0.0 and X_next.max() <= 1.0

            # Undo the warping
            X_next = from_unit_cube(X_next, self.lb, self.ub)

            # Evaluate batch
            fX_next = np.array([[self.f(x)] for x in X_next])

            # Update trust regions
            for i in range(self.n_trust_regions):
                idx_i = np.where(idx_next == i)[0]
                if len(idx_i) > 0:
                    self.hypers[i] = {}  # Remove model hypers
                    fX_i = fX_next[idx_i]

                    if self.verbose and fX_i.min() < self.fX.min() - 1e-3 * math.fabs(self.fX.min()):
                        n_evals, fbest = self.n_evals, fX_i.min()
                        print(f"{n_evals}) New best @ TR-{i}: {fbest:.4}")
                        sys.stdout.flush()
                    self._adjust_length(fX_i, i)

            # Update budget and append data
            self.n_evals += self.batch_size
            self.X = np.vstack((self.X, deepcopy(X_next)))
            self.fX = np.vstack((self.fX, deepcopy(fX_next)))
            self._idx = np.vstack((self._idx, deepcopy(idx_next)))

            # Check if any TR needs to be restarted
            for i in range(self.n_trust_regions):
                if self.length[i] < self.length_min:  # Restart trust region if converged
                    idx_i = self._idx[:, 0] == i

                    if self.verbose:
                        n_evals, fbest = self.n_evals, self.fX[idx_i, 0].min()
                        print(f"{n_evals}) TR-{i} converged to: {fbest:.4}")
                        sys.stdout.flush()

                    # Reset length and counters, remove old data from trust region
                    self.length[i] = self.length_init
                    self.succcount[i] = 0
                    self.failcount[i] = 0
                    self._idx[idx_i, 0] = -1  # Remove points from trust region
                    self.hypers[i] = {}  # Remove model hypers

                    # Create a new initial design
                    X_init = latin_hypercube(self.n_init, self.dim)
                    X_init = from_unit_cube(X_init, self.lb, self.ub)
                    fX_init = np.array([[self.f(x)] for x in X_init])

                    # Print progress
                    if self.verbose:
                        n_evals, fbest = self.n_evals, fX_init.min()
                        print(f"{n_evals}) TR-{i} is restarting from: {fbest:.4}")
                        sys.stdout.flush()

                    # Append data to the global history
                    self.X = np.vstack((self.X, X_init))
                    self.fX = np.vstack((self.fX, fX_init))
                    self._idx = np.vstack((self._idx, i * np.ones((self.n_init, 1), dtype=int)))
                    self.n_evals += self.n_init

--------------------------------------------------------------------------------
/turbo/utils.py:
--------------------------------------------------------------------------------
###############################################################################
# Copyright (c) 2019 Uber Technologies, Inc.                                  #
#                                                                             #
# Licensed under the Uber Non-Commercial License (the "License");             #
# you may not use this file except in compliance with the License.            #
# You may obtain a copy of the License at the root directory of this project. #
#                                                                             #
# See the License for the specific language governing permissions and         #
# limitations under the License.                                              #
###############################################################################

import numpy as np


def to_unit_cube(x, lb, ub):
    """Project to [0, 1]^d from hypercube with bounds lb and ub"""
    assert np.all(lb < ub) and lb.ndim == 1 and ub.ndim == 1 and x.ndim == 2
    xx = (x - lb) / (ub - lb)
    return xx


def from_unit_cube(x, lb, ub):
    """Project from [0, 1]^d to hypercube with bounds lb and ub"""
    assert np.all(lb < ub) and lb.ndim == 1 and ub.ndim == 1 and x.ndim == 2
    xx = x * (ub - lb) + lb
    return xx


def latin_hypercube(n_pts, dim):
    """Basic Latin hypercube implementation with center perturbation."""
    X = np.zeros((n_pts, dim))
    centers = (1.0 + 2.0 * np.arange(0.0, n_pts)) / float(2 * n_pts)
    for i in range(dim):  # Shuffle the center locations for each dimension.
        X[:, i] = centers[np.random.permutation(n_pts)]

    # Add some perturbations within each box
    pert = np.random.uniform(-1.0, 1.0, (n_pts, dim)) / float(2 * n_pts)
    X += pert
    return X

--------------------------------------------------------------------------------
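The trust-region step inside `Turbo1._create_candidates` (a lengthscale-weighted box around the incumbent, followed by a sparse perturbation mask) can be isolated from the GP for inspection. The NumPy sketch below is a hedged illustration and is not part of the repository: `trust_region_bounds` and `make_candidates` are hypothetical helper names, uniform draws from `numpy.random.default_rng` stand in for the scrambled Sobol sequence, and the lengthscales are passed in directly rather than read from a fitted GPyTorch kernel.

```python
import numpy as np


def trust_region_bounds(x_center, lengthscales, length):
    """Hypothetical helper: box of side length * w_i around x_center, clipped to [0, 1]^d.

    The weights w_i are the lengthscales rescaled so that prod(w_i) = 1,
    mirroring the normalization used in Turbo1._create_candidates.
    """
    w = np.asarray(lengthscales, dtype=float).ravel()
    w = w / w.mean()  # rescale first for numerical stability
    w = w / np.prod(np.power(w, 1.0 / len(w)))  # divide by the geometric mean
    lb = np.clip(x_center - w * length / 2.0, 0.0, 1.0)
    ub = np.clip(x_center + w * length / 2.0, 0.0, 1.0)
    return lb, ub, w


def make_candidates(x_center, lb, ub, n_cand, rng):
    """Hypothetical helper: perturb a random subset of the coordinates of x_center."""
    dim = x_center.shape[-1]
    # ASSUMPTION: uniform draws replace the scrambled Sobol sequence used by TuRBO
    pert = lb + (ub - lb) * rng.random((n_cand, dim))
    prob_perturb = min(20.0 / dim, 1.0)
    mask = rng.random((n_cand, dim)) <= prob_perturb
    # Rows with no perturbed coordinate get exactly one, drawn from all dim coordinates
    ind = np.where(mask.sum(axis=1) == 0)[0]
    mask[ind, rng.integers(0, dim, size=len(ind))] = True
    X_cand = np.tile(x_center, (n_cand, 1))  # unperturbed coordinates keep the center value
    X_cand[mask] = pert[mask]
    return X_cand, mask


rng = np.random.default_rng(0)
x_center = np.array([0.5, 0.5, 0.9])
lb, ub, w = trust_region_bounds(x_center, lengthscales=[0.2, 0.8, 1.6], length=0.8)
X_cand, mask = make_candidates(x_center, lb, ub, n_cand=100, rng=rng)
print(np.prod(w))  # close to 1.0 by construction
```

Because the weights have a geometric mean of 1, the unclipped box volume stays `length**d` however the lengthscales are scaled; only the aspect ratio of the box follows the ARD lengthscales.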