├── CMakeLists.txt
├── LICENSE
├── README.md
├── include
│   └── lbfgs.hpp
└── src
    └── lbfgs_example.cpp

/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 2.8.3)
2 | 
3 | project(lbfgs)
4 | 
5 | set(CMAKE_CXX_FLAGS "-std=c++11")
6 | set(CMAKE_BUILD_TYPE "Release")
7 | set(CMAKE_CXX_FLAGS_RELEASE "-O3 -Wall -fPIC")
8 | 
9 | find_package(Eigen3 REQUIRED)
10 | 
11 | include_directories(
12 |     include
13 |     ${EIGEN3_INCLUDE_DIRS}
14 | )
15 | 
16 | add_executable(${PROJECT_NAME} src/lbfgs_example.cpp)
17 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License
2 | 
3 | Copyright (c) 1990 Jorge Nocedal
4 | Copyright (c) 2007-2010 Naoaki Okazaki
5 | Copyright (c) 2020-2022 Zhepei Wang
6 | 
7 | Permission is hereby granted, free of charge, to any person obtaining a
8 | copy of this software and associated documentation files (the "Software"),
9 | to deal in the Software without restriction, including without limitation
10 | the rights to use, copy, modify, merge, publish, distribute, sublicense,
11 | and/or sell copies of the Software, and to permit persons to whom the
12 | Software is furnished to do so, subject to the following conditions:
13 | 
14 | The above copyright notice and this permission notice shall be included in
15 | all copies or substantial portions of the Software.
16 | 
17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
23 | THE SOFTWARE.
24 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LBFGS-Lite
2 | A header-only L-BFGS unconstrained optimizer.
3 | 
4 | ## 0. About
5 | 
6 | __LBFGS-Lite__ is a __C++__ [__header-only__](https://en.wikipedia.org/wiki/Header-only) library for __unconstrained optimization__. Many engineering considerations have been added for improved robustness compared to the original version by Nocedal.
7 | 
8 | ## 1. How to use
9 | 
10 | All explanations are detailed in the comments in "lbfgs.hpp". See "lbfgs_example.cpp" for the calling procedure (a minimal calling sketch is also given at the end of this README). You may need to install Eigen via "sudo apt install libeigen3-dev", because we have used Eigen for better performance since [ver. 2.1](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags). If you need a pure C-style lib without any external dependency, please refer to [ver. 0.9](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags).
11 | 
12 | ## 2. Features
13 | 
14 | - A single header file, "lbfgs.hpp", is all you need.
15 | 
16 | - The library implements the [Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Method](https://doi.org/10.1007/BF01589116) (L-BFGS).
17 | 
18 | - A __highly robust line search__ proposed by [Lewis and Overton](https://link.springer.com/article/10.1007/s10107-012-0514-2) has been employed since [ver. 2.1](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags).
19 | 
20 | - Both __smooth__ ([C2](https://en.wikipedia.org/wiki/Smoothness)) and __nonsmooth__ ([C0 but piecewise C2](https://en.wikipedia.org/wiki/Smoothness)) functions are supported.
21 | 
22 | - The __cautious update__ by [Li and Fukushima](https://epubs.siam.org/doi/pdf/10.1137/S1052623499354242) is employed for __global convergence__ in nonconvex cases.
23 | 
24 | - An __externally provided maximum step size__ is convenient for functions defined on bounded sets.
25 | 
26 | ## 3. Why this lib
27 | 
28 | - Our lib is well-organized and concise (about 800 lines).
29 | 
30 | - Many other libs use [Nocedal's zoom line search](https://link.springer.com/book/10.1007%2F978-0-387-40065-5), [More-Thuente line search](https://dl.acm.org/doi/abs/10.1145/192115.192132), or [Hager-Zhang line search](https://doi.org/10.1137/030601880). In truth, interpolation-based schemes bring many tunable parameters and make assumptions, to varying degrees, about the function's smoothness. Engineering practice tells us that these assumptions can fail under bad numerical conditions and on ill-shaped functions. Admittedly, such schemes slightly reduce the iteration count in some cases, but they also crash easily.
31 | 
32 | - Some other libs provide user-specified options for the [Armijo and weak/strong Wolfe conditions](https://en.wikipedia.org/wiki/Wolfe_conditions) without considering the positive definiteness (PD) of the approximated Hessian. This is misleading: if only the Armijo condition is satisfied, the PD property is not guaranteed, and the solver becomes unstable or prone to crashing. We make the weak Wolfe condition mandatory here, which suffices for the PD property.
33 | 
34 | - We use the [Lewis-Overton line search](https://link.springer.com/article/10.1007/s10107-012-0514-2) as the only scheme since ver. 2.1, from which version nonsmooth functions are supported. Other schemes either assume high orders of continuity, or enforce the strong Wolfe condition, which can never be fulfilled by nonsmooth functions. Moreover, the Lewis-Overton line search is widely adopted in many graphics applications involving optimization on a [scene](https://dl.acm.org/doi/abs/10.1145/2766929), [shape](https://dl.acm.org/doi/abs/10.1145/2897824.2925918), or [mesh](https://dl.acm.org/doi/abs/10.1145/3197517.3201303), showing its practical robustness.
35 | 
36 | - In our practice, the function/gradient evaluation counts are comparable with those of interpolation-based schemes; sometimes it even requires fewer function calls. If you insist on an interpolation-based scheme for a smooth, well-shaped cost function, we also provide [ver. 0.9](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags), where a More-Thuente line search is kept.
37 | 
38 | - Other schemes' global convergence on nonconvex functions is not guaranteed theoretically. We avoid the potential problems by employing the [cautious update](https://epubs.siam.org/doi/pdf/10.1137/S1052623499354242) scheme in our lib without additional computational overhead.
39 | 
40 | ## 4. Licence
41 | 
42 | LBFGS-Lite is modified from [the C version](https://github.com/chokkan/liblbfgs) by Okazaki, which is in turn based on [the original Fortran version](https://doi.org/10.1007/BF01589116) by Nocedal. Thus it is distributed under the terms of the MIT license, following the previous versions by Okazaki and by Nocedal. Please refer to the LICENSE file in the distribution.
43 | 
44 | ## 5. Maintenance
45 | 
46 | If you find any bug, please contact [Zhepei Wang](https://zhepeiwang.github.io/) ().
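For quick reference, the following minimal sketch shows the calling pattern. It mirrors "lbfgs_example.cpp"; the quadratic cost used here is only an illustrative assumption, not part of the library.

```cpp
#include "lbfgs.hpp"
#include <iostream>

// Illustrative cost: f(x) = ||x||^2 with gradient g(x) = 2x.
static double quadratic(void * /* instance */,
                        const Eigen::VectorXd &x,
                        Eigen::VectorXd &g)
{
    g = 2.0 * x;
    return x.squaredNorm();
}

int main()
{
    Eigen::VectorXd x = Eigen::VectorXd::Constant(8, 5.0); // initial guess x0
    double fmin;
    lbfgs::lbfgs_parameter_t params; // the defaults suit smooth costs

    // The step-bound and progress callbacks are optional (nullptr here).
    const int ret = lbfgs::lbfgs_optimize(x, fmin, quadratic,
                                          nullptr, nullptr, nullptr, params);

    std::cout << lbfgs::lbfgs_strerror(ret)
              << ", fmin = " << fmin << std::endl;
    return ret < 0 ? 1 : 0;
}
```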
47 | 
--------------------------------------------------------------------------------
/include/lbfgs.hpp:
--------------------------------------------------------------------------------
1 | #ifndef LBFGS_HPP
2 | #define LBFGS_HPP
3 | 
4 | #include <Eigen/Eigen>
5 | #include <cmath>
6 | #include <cstdlib>
7 | 
8 | namespace lbfgs
9 | {
10 |     // ----------------------- Data Type Part -----------------------
11 | 
12 |     /**
13 |      * L-BFGS optimization parameters.
14 |      */
15 |     struct lbfgs_parameter_t
16 |     {
17 |         /**
18 |          * The number of corrections to approximate the inverse hessian matrix.
19 |          * The L-BFGS routine stores the computation results of the previous m
20 |          * iterations to approximate the inverse hessian matrix of the current
21 |          * iteration. This parameter controls the size of the limited memories
22 |          * (corrections). The default value is 8. Values less than 3 are
23 |          * not recommended. Large values will result in excessive computing time.
24 |          */
25 |         int mem_size = 8;
26 | 
27 |         /**
28 |          * Epsilon for the grad convergence test. DO NOT USE IT in nonsmooth cases!
29 |          * Set it to 0.0 and use the past-delta-based test for nonsmooth functions.
30 |          * This parameter determines the accuracy with which the solution is to
31 |          * be found. A minimization terminates when
32 |          *     ||g(x)||_inf / max(1, ||x||_inf) < g_epsilon,
33 |          * where ||.||_inf is the infinity norm. The default value is 1.0e-5.
34 |          * This should be greater than 1.0e-6 in practice because L-BFGS does
35 |          * not directly reduce the first-order residual. It still needs the function
36 |          * value, which can be corrupted by machine_prec when ||g|| is small.
37 |          */
38 |         double g_epsilon = 1.0e-5;
39 | 
40 |         /**
41 |          * Distance for the delta-based convergence test.
42 |          * This parameter determines the distance, in iterations, to compute
43 |          * the rate of decrease of the cost function. If the value of this
44 |          * parameter is zero, the library does not perform the delta-based
45 |          * convergence test. The default value is 3.
46 |          */
47 |         int past = 3;
48 | 
49 |         /**
50 |          * Delta for the convergence test.
51 |          * This parameter determines the minimum rate of decrease of the
52 |          * cost function. The library stops iterations when the following
53 |          * condition is met:
54 |          *     |f' - f| / max(1, |f|) < delta,
55 |          * where f' is the cost value of past iterations ago, and f is
56 |          * the cost value of the current iteration.
57 |          * The default value is 1.0e-6.
58 |          */
59 |         double delta = 1.0e-6;
60 | 
61 |         /**
62 |          * The maximum number of iterations.
63 |          * The lbfgs_optimize() function terminates a minimization process with the
64 |          * ::LBFGSERR_MAXIMUMITERATION status code when the iteration count
65 |          * exceeds this parameter. Setting this parameter to zero continues a
66 |          * minimization process until convergence or an error. The default value
67 |          * is 0.
68 |          */
69 |         int max_iterations = 0;
70 | 
71 |         /**
72 |          * The maximum number of trials for the line search.
73 |          * This parameter controls the number of function and gradient evaluations
74 |          * per iteration for the line search routine. The default value is 64.
75 |          */
76 |         int max_linesearch = 64;
77 | 
78 |         /**
79 |          * The minimum step of the line search routine.
80 |          * The default value is 1.0e-20. This value need not be modified unless
81 |          * the exponents are too large for the machine being used, or unless the
82 |          * problem is extremely badly scaled (in which case the exponents should
83 |          * be increased).
84 |          */
85 |         double min_step = 1.0e-20;
86 | 
87 |         /**
88 |          * The maximum step of the line search.
89 |          * The default value is 1.0e+20. This value need not be modified unless
90 |          * the exponents are too large for the machine being used, or unless the
91 |          * problem is extremely badly scaled (in which case the exponents should
92 |          * be increased).
93 |          */
94 |         double max_step = 1.0e+20;
95 | 
96 |         /**
97 |          * A parameter to control the accuracy of the line search routine.
98 |          * The default value is 1.0e-4. This parameter should be greater
99 |          * than zero and smaller than 1.0.
100 |          */
101 |         double f_dec_coeff = 1.0e-4;
102 | 
103 |         /**
104 |          * A parameter to control the accuracy of the line search routine.
105 |          * The default value is 0.9. If the function and gradient
106 |          * evaluations are inexpensive with respect to the cost of the
107 |          * iteration (which is sometimes the case when solving very large
108 |          * problems), it may be advantageous to set this parameter to a small
109 |          * value. A typical small value is 0.1. This parameter should be
110 |          * greater than the f_dec_coeff parameter and smaller than 1.0.
111 |          */
112 |         double s_curv_coeff = 0.9;
113 | 
114 |         /**
115 |          * A parameter to ensure global convergence for nonconvex functions.
116 |          * The default value is 1.0e-6. The parameter performs the so-called
117 |          * cautious update for L-BFGS, especially when the convergence is
118 |          * not sufficient. The parameter must be positive but might as well
119 |          * be less than 1.0e-3 in practice.
120 |          */
121 |         double cautious_factor = 1.0e-6;
122 | 
123 |         /**
124 |          * The machine precision for floating-point values. The default is 1.0e-16.
125 |          * This parameter must be a positive value set by a client program to
126 |          * estimate the machine precision.
127 |          */
128 |         double machine_prec = 1.0e-16;
129 |     };
130 | 
131 |     /**
132 |      * Return values of lbfgs_optimize().
133 |      * Roughly speaking, a negative value indicates an error.
134 |      */
135 |     enum
136 |     {
137 |         /** L-BFGS reaches convergence. */
138 |         LBFGS_CONVERGENCE = 0,
139 |         /** L-BFGS satisfies stopping criteria. */
140 |         LBFGS_STOP,
141 |         /** The iteration has been canceled by the monitor callback. */
142 |         LBFGS_CANCELED,
143 | 
144 |         /** Unknown error. */
145 |         LBFGSERR_UNKNOWNERROR = -1024,
146 |         /** Invalid number of variables specified. */
147 |         LBFGSERR_INVALID_N,
148 |         /** Invalid parameter lbfgs_parameter_t::mem_size specified. */
149 |         LBFGSERR_INVALID_MEMSIZE,
150 |         /** Invalid parameter lbfgs_parameter_t::g_epsilon specified. */
151 |         LBFGSERR_INVALID_GEPSILON,
152 |         /** Invalid parameter lbfgs_parameter_t::past specified. */
153 |         LBFGSERR_INVALID_TESTPERIOD,
154 |         /** Invalid parameter lbfgs_parameter_t::delta specified. */
155 |         LBFGSERR_INVALID_DELTA,
156 |         /** Invalid parameter lbfgs_parameter_t::min_step specified. */
157 |         LBFGSERR_INVALID_MINSTEP,
158 |         /** Invalid parameter lbfgs_parameter_t::max_step specified. */
159 |         LBFGSERR_INVALID_MAXSTEP,
160 |         /** Invalid parameter lbfgs_parameter_t::f_dec_coeff specified. */
161 |         LBFGSERR_INVALID_FDECCOEFF,
162 |         /** Invalid parameter lbfgs_parameter_t::s_curv_coeff specified. */
163 |         LBFGSERR_INVALID_SCURVCOEFF,
164 |         /** Invalid parameter lbfgs_parameter_t::machine_prec specified. */
165 |         LBFGSERR_INVALID_MACHINEPREC,
166 |         /** Invalid parameter lbfgs_parameter_t::max_linesearch specified. */
167 |         LBFGSERR_INVALID_MAXLINESEARCH,
168 |         /** The function value became NaN or Inf. */
169 |         LBFGSERR_INVALID_FUNCVAL,
170 |         /** The line-search step became smaller than lbfgs_parameter_t::min_step. */
171 |         LBFGSERR_MINIMUMSTEP,
172 |         /** The line-search step became larger than lbfgs_parameter_t::max_step. */
173 |         LBFGSERR_MAXIMUMSTEP,
174 |         /** The line search reached the maximum number of trials; assumptions not satisfied or precision not achievable. */
175 |         LBFGSERR_MAXIMUMLINESEARCH,
176 |         /** The algorithm routine reaches the maximum number of iterations. */
177 |         LBFGSERR_MAXIMUMITERATION,
178 |         /** The relative search interval width shrank below lbfgs_parameter_t::machine_prec. */
179 |         LBFGSERR_WIDTHTOOSMALL,
180 |         /** A logic error (negative line-search step) occurred. */
181 |         LBFGSERR_INVALIDPARAMETERS,
182 |         /** The current search direction increases the cost function value. */
183 |         LBFGSERR_INCREASEGRADIENT,
184 |     };
185 | 
186 |     /**
187 |      * Callback interface to provide cost function and gradient evaluations.
188 |      *
189 |      * The lbfgs_optimize() function calls this function to obtain the values of the cost
190 |      * function and its gradient when needed. A client program must implement
191 |      * this function to evaluate the values of the cost function and its
192 |      * gradient, given the current values of the variables.
193 |      *
194 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
195 |      * @param x The current values of variables.
196 |      * @param g The gradient vector. The callback function must compute
197 |      *          the gradient values for the current variables.
198 |      * @retval double The value of the cost function for the current variables.
199 |      */
200 |     typedef double (*lbfgs_evaluate_t)(void *instance,
201 |                                        const Eigen::VectorXd &x,
202 |                                        Eigen::VectorXd &g);
203 | 
204 |     /**
205 |      * Callback interface to provide an upper bound at the beginning of the current line search.
206 |      *
207 |      * The lbfgs_optimize() function calls this function to obtain the value of the
208 |      * upper bound of the stepsize to search in, provided with the beginning values of
209 |      * the variables before the line search, and the current step vector (which can be the descent direction).
210 |      * A client program can implement this function for a more efficient line search. Any step
211 |      * larger than this bound should not be considered; for example, it may have a very large
212 |      * or even infinite function value. Note that the function value at the provided bound should be FINITE!
213 |      * If it is not used, just set it to nullptr.
214 |      *
215 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
216 |      * @param xp The values of the variables before the current line search.
217 |      * @param d The step vector. It can be the descent direction.
218 |      * @retval double The upper bound of the step in the current line search routine,
219 |      *                such that (stpbound * d) is the maximum reasonable step.
220 |      */
221 |     typedef double (*lbfgs_stepbound_t)(void *instance,
222 |                                         const Eigen::VectorXd &xp,
223 |                                         const Eigen::VectorXd &d);
224 | 
225 |     /**
226 |      * Callback interface to monitor the progress of the minimization process.
227 |      *
228 |      * The lbfgs_optimize() function calls this function for each iteration. By implementing
229 |      * this function, a client program can store or display the current progress
230 |      * of the minimization process. If it is not used, just set it to nullptr.
231 |      *
232 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
233 |      * @param x The current values of variables.
234 |      * @param g The current gradient values of variables.
235 |      * @param fx The current value of the cost function.
236 |      * @param step The line-search step used for this iteration.
237 |      * @param k The iteration count.
238 |      * @param ls The number of evaluations called for this iteration.
239 |      * @retval int Zero to continue the minimization process. Returning a
240 |      *             non-zero value will cancel the minimization process.
241 |      */
242 |     typedef int (*lbfgs_progress_t)(void *instance,
243 |                                     const Eigen::VectorXd &x,
244 |                                     const Eigen::VectorXd &g,
245 |                                     const double fx,
246 |                                     const double step,
247 |                                     const int k,
248 |                                     const int ls);
249 | 
250 |     /**
251 |      * Callback data struct
252 |      */
253 |     struct callback_data_t
254 |     {
255 |         void *instance = nullptr;
256 |         lbfgs_evaluate_t proc_evaluate = nullptr;
257 |         lbfgs_stepbound_t proc_stepbound = nullptr;
258 |         lbfgs_progress_t proc_progress = nullptr;
259 |     };
260 | 
261 |     // ----------------------- L-BFGS Part -----------------------
262 | 
263 |     /**
264 |      * Line search method for smooth or nonsmooth functions.
265 |      * This function performs a line search to find a point that satisfies
266 |      * both the Armijo condition and the weak Wolfe condition. It is
267 |      * as robust as the backtracking line search but further applies
268 |      * to continuous and piecewise smooth functions where the strong
269 |      * Wolfe condition usually does not hold.
270 |      *
271 |      * @see
272 |      * Adrian S. Lewis and Michael L. Overton. Nonsmooth optimization
273 |      * via quasi-Newton methods. Mathematical Programming, Vol 141,
274 |      * No 1, pp. 135-163, 2013.
275 |      */
276 |     inline int line_search_lewisoverton(Eigen::VectorXd &x,
277 |                                         double &f,
278 |                                         Eigen::VectorXd &g,
279 |                                         double &stp,
280 |                                         const Eigen::VectorXd &s,
281 |                                         const Eigen::VectorXd &xp,
282 |                                         const Eigen::VectorXd &gp,
283 |                                         const double stpmin,
284 |                                         const double stpmax,
285 |                                         const callback_data_t &cd,
286 |                                         const lbfgs_parameter_t &param)
287 |     {
288 |         int count = 0;
289 |         bool brackt = false, touched = false;
290 |         double finit, dginit, dgtest, dstest;
291 |         double mu = 0.0, nu = stpmax;
292 | 
293 |         /* Check the input parameters for errors. */
294 |         if (!(stp > 0.0))
295 |         {
296 |             return LBFGSERR_INVALIDPARAMETERS;
297 |         }
298 | 
299 |         /* Compute the initial gradient in the search direction. */
300 |         dginit = gp.dot(s);
301 | 
302 |         /* Make sure that s points to a descent direction. */
303 |         if (0.0 < dginit)
304 |         {
305 |             return LBFGSERR_INCREASEGRADIENT;
306 |         }
307 | 
308 |         /* The initial value of the cost function. */
309 |         finit = f;
310 |         dgtest = param.f_dec_coeff * dginit;
311 |         dstest = param.s_curv_coeff * dginit;
312 | 
313 |         while (true)
314 |         {
315 |             x = xp + stp * s;
316 | 
317 |             /* Evaluate the function and gradient values. */
318 |             f = cd.proc_evaluate(cd.instance, x, g);
319 |             ++count;
320 | 
321 |             /* Test for errors. */
322 |             if (std::isinf(f) || std::isnan(f))
323 |             {
324 |                 return LBFGSERR_INVALID_FUNCVAL;
325 |             }
326 |             /* Check the Armijo condition. */
327 |             if (f > finit + stp * dgtest)
328 |             {
329 |                 nu = stp;
330 |                 brackt = true;
331 |             }
332 |             else
333 |             {
334 |                 /* Check the weak Wolfe condition. */
335 |                 if (g.dot(s) < dstest)
336 |                 {
337 |                     mu = stp;
338 |                 }
339 |                 else
340 |                 {
341 |                     return count;
342 |                 }
343 |             }
344 |             if (param.max_linesearch <= count)
345 |             {
346 |                 /* Maximum number of iterations. */
347 |                 return LBFGSERR_MAXIMUMLINESEARCH;
348 |             }
349 |             if (brackt && (nu - mu) < param.machine_prec * nu)
350 |             {
351 |                 /* The relative interval width is below machine_prec. */
352 |                 return LBFGSERR_WIDTHTOOSMALL;
353 |             }
354 | 
355 |             if (brackt)
356 |             {
357 |                 stp = 0.5 * (mu + nu);
358 |             }
359 |             else
360 |             {
361 |                 stp *= 2.0;
362 |             }
363 | 
364 |             if (stp < stpmin)
365 |             {
366 |                 /* The step is the minimum value. */
367 |                 return LBFGSERR_MINIMUMSTEP;
368 |             }
369 |             if (stp > stpmax)
370 |             {
371 |                 if (touched)
372 |                 {
373 |                     /* The step is the maximum value. */
374 |                     return LBFGSERR_MAXIMUMSTEP;
375 |                 }
376 |                 else
377 |                 {
378 |                     /* The maximum value should be tried once. */
379 |                     touched = true;
380 |                     stp = stpmax;
381 |                 }
382 |             }
383 |         }
384 |     }
385 | 
386 |     /**
387 |      * Start an L-BFGS optimization.
388 |      * Assumptions: 1. f(x) is either C2 or C0 but piecewise C2;
389 |      *              2. f(x) is lower bounded;
390 |      *              3. f(x) has bounded level sets;
391 |      *              4. g(x) is either the gradient or a subgradient;
392 |      *              5. The gradient exists at the initial guess x0.
393 |      * A user must implement a function compatible with ::lbfgs_evaluate_t (evaluation
394 |      * callback) and pass the pointer to the callback function in the lbfgs_optimize()
395 |      * arguments. Similarly, a user can implement a function compatible with
396 |      * ::lbfgs_stepbound_t to provide an external upper bound for the stepsize, and
397 |      * ::lbfgs_progress_t (progress callback) to obtain the current progress
398 |      * (e.g., the variables, function value, and gradient) and to cancel the iteration
399 |      * process if necessary. Implementation of the stepbound and the progress callback
400 |      * is optional: a user can pass nullptr for either if it is not necessary.
401 |      *
402 |      *
403 |      * @param x The vector of decision variables.
404 |      *          THE INITIAL GUESS x0 SHOULD BE SET BEFORE THE CALL!
405 |      *          A client program can receive decision variables
406 |      *          through this vector, at which the cost and its
407 |      *          gradient are queried during minimization.
408 |      * @param f The reference to the variable that receives the final
409 |      *          value of the cost function for the variables.
410 |      * @param proc_evaluate The callback function to provide function f(x) and
411 |      *                      gradient g(x) evaluations given the current values of
412 |      *                      the variables x. A client program must implement a
413 |      *                      callback function compatible with lbfgs_evaluate_t
414 |      *                      and pass the pointer to the callback function.
415 |      * @param proc_stepbound The callback function to provide the value of the
416 |      *                       upper bound of the stepsize to search in, provided
417 |      *                       with the values of the variables before the
418 |      *                       line search, and the current step vector (which can be
419 |      *                       the negative gradient). A client program can implement
420 |      *                       this function for a more efficient line search. If it is
421 |      *                       not used, just set it to nullptr.
422 |      * @param proc_progress The callback function to receive the progress
423 |      *                      (the number of iterations, the current value of
424 |      *                      the cost function) of the minimization
425 |      *                      process. This argument can be set to nullptr if
426 |      *                      a progress report is unnecessary.
427 |      * @param instance A user data pointer for client programs. The callback
428 |      *                 functions will receive the value of this argument.
429 |      * @param param The parameters for L-BFGS optimization.
430 |      * @retval int The status code. This function returns a nonnegative
431 |      *             integer if the minimization process terminates without
432 |      *             an error. A negative integer indicates an error.
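     *
     * Since g(x) may be a subgradient (assumption 4 above), the evaluation
     * callback can return any valid subgradient where the cost is nonsmooth.
     * A hedged sketch for f(x) = ||x||_1, with an illustrative helper name:
     *
     *     double l1cost(void *instance,
     *                   const Eigen::VectorXd &x,
     *                   Eigen::VectorXd &g)
     *     {
     *         g = x.cwiseSign(); // a valid subgradient; zero is taken at kinks
     *         return x.lpNorm<1>();
     *     }
     *
     * For such nonsmooth costs, set g_epsilon to 0.0 and rely on the
     * past/delta-based test, as documented in lbfgs_parameter_t.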
433 |      */
434 |     inline int lbfgs_optimize(Eigen::VectorXd &x,
435 |                               double &f,
436 |                               lbfgs_evaluate_t proc_evaluate,
437 |                               lbfgs_stepbound_t proc_stepbound,
438 |                               lbfgs_progress_t proc_progress,
439 |                               void *instance,
440 |                               const lbfgs_parameter_t &param)
441 |     {
442 |         int ret, i, j, k, ls, end, bound;
443 |         double step, step_min, step_max, fx, ys, yy;
444 |         double gnorm_inf, xnorm_inf, beta, rate, cau;
445 | 
446 |         const int n = x.size();
447 |         const int m = param.mem_size;
448 | 
449 |         /* Check the input parameters for errors. */
450 |         if (n <= 0)
451 |         {
452 |             return LBFGSERR_INVALID_N;
453 |         }
454 |         if (m <= 0)
455 |         {
456 |             return LBFGSERR_INVALID_MEMSIZE;
457 |         }
458 |         if (param.g_epsilon < 0.0)
459 |         {
460 |             return LBFGSERR_INVALID_GEPSILON;
461 |         }
462 |         if (param.past < 0)
463 |         {
464 |             return LBFGSERR_INVALID_TESTPERIOD;
465 |         }
466 |         if (param.delta < 0.0)
467 |         {
468 |             return LBFGSERR_INVALID_DELTA;
469 |         }
470 |         if (param.min_step < 0.0)
471 |         {
472 |             return LBFGSERR_INVALID_MINSTEP;
473 |         }
474 |         if (param.max_step < param.min_step)
475 |         {
476 |             return LBFGSERR_INVALID_MAXSTEP;
477 |         }
478 |         if (!(param.f_dec_coeff > 0.0 &&
479 |               param.f_dec_coeff < 1.0))
480 |         {
481 |             return LBFGSERR_INVALID_FDECCOEFF;
482 |         }
483 |         if (!(param.s_curv_coeff < 1.0 &&
484 |               param.s_curv_coeff > param.f_dec_coeff))
485 |         {
486 |             return LBFGSERR_INVALID_SCURVCOEFF;
487 |         }
488 |         if (!(param.machine_prec > 0.0))
489 |         {
490 |             return LBFGSERR_INVALID_MACHINEPREC;
491 |         }
492 |         if (param.max_linesearch <= 0)
493 |         {
494 |             return LBFGSERR_INVALID_MAXLINESEARCH;
495 |         }
496 | 
497 |         /* Prepare intermediate variables. */
498 |         Eigen::VectorXd xp(n);
499 |         Eigen::VectorXd g(n);
500 |         Eigen::VectorXd gp(n);
501 |         Eigen::VectorXd d(n);
502 |         Eigen::VectorXd pf(std::max(1, param.past));
503 | 
504 |         /* Initialize the limited memory. */
505 |         Eigen::VectorXd lm_alpha = Eigen::VectorXd::Zero(m);
506 |         Eigen::MatrixXd lm_s = Eigen::MatrixXd::Zero(n, m);
507 |         Eigen::MatrixXd lm_y = Eigen::MatrixXd::Zero(n, m);
508 |         Eigen::VectorXd lm_ys = Eigen::VectorXd::Zero(m);
509 | 
510 |         /* Construct a callback data. */
511 |         callback_data_t cd;
512 |         cd.instance = instance;
513 |         cd.proc_evaluate = proc_evaluate;
514 |         cd.proc_stepbound = proc_stepbound;
515 |         cd.proc_progress = proc_progress;
516 | 
517 |         /* Evaluate the function value and its gradient. */
518 |         fx = cd.proc_evaluate(cd.instance, x, g);
519 | 
520 |         /* Store the initial value of the cost function. */
521 |         pf(0) = fx;
522 | 
523 |         /*
524 |             Compute the direction;
525 |             we assume the initial hessian matrix H_0 as the identity matrix.
526 |         */
527 |         d = -g;
528 | 
529 |         /*
530 |             Make sure that the initial variables are not a stationary point.
531 |         */
532 |         gnorm_inf = g.cwiseAbs().maxCoeff();
533 |         xnorm_inf = x.cwiseAbs().maxCoeff();
534 | 
535 |         if (gnorm_inf / std::max(1.0, xnorm_inf) <= param.g_epsilon)
536 |         {
537 |             /* The initial guess is already a stationary point. */
538 |             ret = LBFGS_CONVERGENCE;
539 |         }
540 |         else
541 |         {
542 |             /*
543 |                 Compute the initial step:
544 |             */
545 |             step = 1.0 / d.norm();
546 | 
547 |             k = 1;
548 |             end = 0;
549 |             bound = 0;
550 | 
551 |             while (true)
552 |             {
553 |                 /* Store the current position and gradient vectors. */
554 |                 xp = x;
555 |                 gp = g;
556 | 
557 |                 /* If the step bound can be provided dynamically, then apply it. */
558 |                 step_min = param.min_step;
559 |                 step_max = param.max_step;
560 |                 if (cd.proc_stepbound)
561 |                 {
562 |                     step_max = cd.proc_stepbound(cd.instance, xp, d);
563 |                     step_max = step_max < param.max_step ? step_max : param.max_step;
564 |                 }
565 |                 step = step < step_max ? step : 0.5 * step_max;
566 | 
567 |                 /* Search for an optimal step. */
568 |                 ls = line_search_lewisoverton(x, fx, g, step, d, xp, gp, step_min, step_max, cd, param);
569 | 
570 |                 if (ls < 0)
571 |                 {
572 |                     /* Revert to the previous point. */
573 |                     x = xp;
574 |                     g = gp;
575 |                     ret = ls;
576 |                     break;
577 |                 }
578 | 
579 |                 /* Report the progress. */
580 |                 if (cd.proc_progress)
581 |                 {
582 |                     if (cd.proc_progress(cd.instance, x, g, fx, step, k, ls))
583 |                     {
584 |                         ret = LBFGS_CANCELED;
585 |                         break;
586 |                     }
587 |                 }
588 | 
589 |                 /*
590 |                     Convergence test.
591 |                     The criterion is given by the following formula:
592 |                         ||g(x)||_inf / max(1, ||x||_inf) < g_epsilon
593 |                 */
594 |                 gnorm_inf = g.cwiseAbs().maxCoeff();
595 |                 xnorm_inf = x.cwiseAbs().maxCoeff();
596 |                 if (gnorm_inf / std::max(1.0, xnorm_inf) < param.g_epsilon)
597 |                 {
598 |                     /* Convergence. */
599 |                     ret = LBFGS_CONVERGENCE;
600 |                     break;
601 |                 }
602 | 
603 |                 /*
604 |                     Test for the stopping criterion.
605 |                     The criterion is given by the following formula:
606 |                         |f(past_x) - f(x)| / max(1, |f(x)|) < \delta.
607 |                 */
608 |                 if (0 < param.past)
609 |                 {
610 |                     /* We don't test the stopping criterion while k < past. */
611 |                     if (param.past <= k)
612 |                     {
613 |                         /* The stopping criterion. */
614 |                         rate = std::fabs(pf(k % param.past) - fx) / std::max(1.0, std::fabs(fx));
615 | 
616 |                         if (rate < param.delta)
617 |                         {
618 |                             ret = LBFGS_STOP;
619 |                             break;
620 |                         }
621 |                     }
622 | 
623 |                     /* Store the current value of the cost function. */
624 |                     pf(k % param.past) = fx;
625 |                 }
626 | 
627 |                 if (param.max_iterations != 0 && param.max_iterations <= k)
628 |                 {
629 |                     /* Maximum number of iterations. */
630 |                     ret = LBFGSERR_MAXIMUMITERATION;
631 |                     break;
632 |                 }
633 | 
634 |                 /* Count the iteration number. */
635 |                 ++k;
636 | 
637 |                 /*
638 |                     Update vectors s and y:
639 |                         s_{k+1} = x_{k+1} - x_{k} = step * d_{k}.
640 |                         y_{k+1} = g_{k+1} - g_{k}.
641 |                 */
642 |                 lm_s.col(end) = x - xp;
643 |                 lm_y.col(end) = g - gp;
644 | 
645 |                 /*
646 |                     Compute scalars ys and yy:
647 |                         ys = y^t \cdot s = 1 / \rho.
648 |                         yy = y^t \cdot y.
649 |                     Notice that yy is used for scaling the hessian matrix H_0 (Cholesky factor).
650 |                 */
651 |                 ys = lm_y.col(end).dot(lm_s.col(end));
652 |                 yy = lm_y.col(end).squaredNorm();
653 |                 lm_ys(end) = ys;
654 | 
655 |                 /* Compute the negative of the gradient. */
656 |                 d = -g;
657 | 
658 |                 /*
659 |                     The cautious update is performed here only as long as
660 |                         (y^t \cdot s) / ||s_{k+1}||^2 > \epsilon * ||g_{k}||^\alpha,
661 |                     where \epsilon is the cautious factor and a proposed value
662 |                     for \alpha is 1.
663 |                     This is not for enforcing the PD of the approximated Hessian,
664 |                     since ys > 0 is already ensured by the weak Wolfe condition.
665 |                     This is to ensure the global convergence as described in:
666 |                     Dong-Hui Li and Masao Fukushima. On the global convergence of
667 |                     the BFGS method for nonconvex unconstrained optimization problems.
668 |                     SIAM Journal on Optimization, Vol 11, No 4, pp. 1054-1064, 2001.
669 |                 */
670 |                 cau = lm_s.col(end).squaredNorm() * gp.norm() * param.cautious_factor;
671 | 
672 |                 if (ys > cau)
673 |                 {
674 |                     /*
675 |                         The recursive formula to compute dir = -(H \cdot g).
676 |                         This is described on page 779 of:
677 |                         Jorge Nocedal.
678 |                         Updating Quasi-Newton Matrices with Limited Storage.
679 |                         Mathematics of Computation, Vol. 35, No. 151,
680 |                         pp. 773--782, 1980.
681 |                     */
682 |                     ++bound;
683 |                     bound = m < bound ? m : bound;
684 |                     end = (end + 1) % m;
685 | 
686 |                     j = end;
687 |                     for (i = 0; i < bound; ++i)
688 |                     {
689 |                         j = (j + m - 1) % m; /* if (--j == -1) j = m-1; */
690 |                         /* \alpha_{j} = \rho_{j} s^{t}_{j} \cdot q_{k+1}. */
691 |                         lm_alpha(j) = lm_s.col(j).dot(d) / lm_ys(j);
692 |                         /* q_{i} = q_{i+1} - \alpha_{i} y_{i}. */
693 |                         d += (-lm_alpha(j)) * lm_y.col(j);
694 |                     }
695 | 
696 |                     d *= ys / yy;
697 | 
698 |                     for (i = 0; i < bound; ++i)
699 |                     {
700 |                         /* \beta_{j} = \rho_{j} y^t_{j} \cdot \gamma_{i}. */
701 |                         beta = lm_y.col(j).dot(d) / lm_ys(j);
702 |                         /* \gamma_{i+1} = \gamma_{i} + (\alpha_{j} - \beta_{j}) s_{j}. */
703 |                         d += (lm_alpha(j) - beta) * lm_s.col(j);
704 |                         j = (j + 1) % m; /* if (++j == m) j = 0; */
705 |                     }
706 |                 }
707 | 
708 |                 /* The search direction d is ready. We try step = 1 first. */
709 |                 step = 1.0;
710 |             }
711 |         }
712 | 
713 |         /* Return the final value of the cost function. */
714 |         f = fx;
715 | 
716 |         return ret;
717 |     }
718 | 
719 |     /**
720 |      * Get the string description of an lbfgs_optimize() return code.
721 |      *
722 |      * @param err A value returned by lbfgs_optimize().
723 |      */
724 |     inline const char *lbfgs_strerror(const int err)
725 |     {
726 |         switch (err)
727 |         {
728 |         case LBFGS_CONVERGENCE:
729 |             return "Success: reached convergence (g_epsilon).";
730 | 
731 |         case LBFGS_STOP:
732 |             return "Success: met stopping criteria (past f decrease less than delta).";
733 | 
734 |         case LBFGS_CANCELED:
735 |             return "The iteration has been canceled by the monitor callback.";
736 | 
737 |         case LBFGSERR_UNKNOWNERROR:
738 |             return "Unknown error.";
739 | 
740 |         case LBFGSERR_INVALID_N:
741 |             return "Invalid number of variables specified.";
742 | 
743 |         case LBFGSERR_INVALID_MEMSIZE:
744 |             return "Invalid parameter lbfgs_parameter_t::mem_size specified.";
745 | 
746 |         case LBFGSERR_INVALID_GEPSILON:
747 |             return "Invalid parameter lbfgs_parameter_t::g_epsilon specified.";
748 | 
749 |         case LBFGSERR_INVALID_TESTPERIOD:
750 |             return "Invalid parameter lbfgs_parameter_t::past specified.";
751 | 
752 |         case LBFGSERR_INVALID_DELTA:
753 |             return "Invalid parameter lbfgs_parameter_t::delta specified.";
754 | 
755 |         case LBFGSERR_INVALID_MINSTEP:
756 |             return "Invalid parameter lbfgs_parameter_t::min_step specified.";
757 | 
758 |         case LBFGSERR_INVALID_MAXSTEP:
759 |             return "Invalid parameter lbfgs_parameter_t::max_step specified.";
760 | 
761 |         case LBFGSERR_INVALID_FDECCOEFF:
762 |             return "Invalid parameter lbfgs_parameter_t::f_dec_coeff specified.";
763 | 
764 |         case LBFGSERR_INVALID_SCURVCOEFF:
765 |             return "Invalid parameter lbfgs_parameter_t::s_curv_coeff specified.";
766 | 
767 |         case LBFGSERR_INVALID_MACHINEPREC:
768 |             return "Invalid parameter lbfgs_parameter_t::machine_prec specified.";
769 | 
770 |         case LBFGSERR_INVALID_MAXLINESEARCH:
771 |             return "Invalid parameter lbfgs_parameter_t::max_linesearch specified.";
772 | 
773 |         case LBFGSERR_INVALID_FUNCVAL:
774 |             return "The function value became NaN or Inf.";
775 | 
776 |         case LBFGSERR_MINIMUMSTEP:
777 |             return "The line-search step became smaller than lbfgs_parameter_t::min_step.";
778 | 
779 |         case LBFGSERR_MAXIMUMSTEP:
780 |             return "The line-search step became larger than lbfgs_parameter_t::max_step.";
781 | 
782 |         case LBFGSERR_MAXIMUMLINESEARCH:
783 |             return "The line search reached the maximum number of trials; assumptions not satisfied or precision not achievable.";
784 | 
785 |         case LBFGSERR_MAXIMUMITERATION:
786 |             return "The algorithm routine reaches the maximum number of iterations.";
787 | 
788 |         case LBFGSERR_WIDTHTOOSMALL:
789 |             return "The relative search interval width shrank below lbfgs_parameter_t::machine_prec.";
790 | 
791 |         case LBFGSERR_INVALIDPARAMETERS:
792 |             return "A logic error (negative line-search step) occurred.";
793 | 
794 |         case LBFGSERR_INCREASEGRADIENT:
795 |             return "The current search direction increases the cost function value.";
796 | 
797 |         default:
798 |             return "(unknown)";
799 |         }
800 |     }
801 | 
802 | } // namespace lbfgs
803 | 
804 | #endif
805 | 
--------------------------------------------------------------------------------
/src/lbfgs_example.cpp:
--------------------------------------------------------------------------------
1 | #include "lbfgs.hpp"
2 | #include <Eigen/Eigen>
3 | #include <iostream>
4 | #include <iomanip>
5 | 
6 | class MinimizationExample
7 | {
8 | public:
9 |     int run(const int N)
10 |     {
11 |         double finalCost;
12 |         Eigen::VectorXd x(N);
13 | 
14 |         /* Set the initial guess */
15 |         for (int i = 0; i < N; i += 2)
16 |         {
17 |             x(i) = -1.2;
18 |             x(i + 1) = 1.0;
19 |         }
20 | 
21 |         /* Set the minimization parameters */
22 |         lbfgs::lbfgs_parameter_t params;
23 |         params.g_epsilon = 1.0e-8;
24 |         params.past = 3;
25 |         params.delta = 1.0e-8;
26 | 
27 |         /* Start minimization */
28 |         int ret = lbfgs::lbfgs_optimize(x,
29 |                                         finalCost,
30 |                                         costFunction,
31 |                                         nullptr,
32 |                                         monitorProgress,
33 |                                         this,
34 |                                         params);
35 | 
36 |         /* Report the result. */
37 |         std::cout << std::setprecision(4)
38 |                   << "================================" << std::endl
39 |                   << "L-BFGS Optimization Returned: " << ret << std::endl
40 |                   << "Minimized Cost: " << finalCost << std::endl
41 |                   << "Optimal Variables: " << std::endl
42 |                   << x.transpose() << std::endl;
43 | 
44 |         return ret;
45 |     }
46 | 
47 | private:
48 |     static double costFunction(void *instance,
49 |                                const Eigen::VectorXd &x,
50 |                                Eigen::VectorXd &g)
51 |     {
52 |         const int n = x.size();
53 |         double fx = 0.0;
54 |         for (int i = 0; i < n; i += 2)
55 |         {
56 |             const double t1 = 1.0 - x(i);
57 |             const double t2 = 10.0 * (x(i + 1) - x(i) * x(i));
58 |             g(i + 1) = 20.0 * t2;
59 |             g(i) = -2.0 * (x(i) * g(i + 1) + t1);
60 |             fx += t1 * t1 + t2 * t2;
61 |         }
62 |         return fx;
63 |     }
64 | 
65 |     static int monitorProgress(void *instance,
66 |                                const Eigen::VectorXd &x,
67 |                                const Eigen::VectorXd &g,
68 |                                const double fx,
69 |                                const double step,
70 |                                const int k,
71 |                                const int ls)
72 |     {
73 |         std::cout << std::setprecision(4)
74 |                   << "================================" << std::endl
75 |                   << "Iteration: " << k << std::endl
76 |                   << "Function Value: " << fx << std::endl
77 |                   << "Gradient Inf Norm: " << g.cwiseAbs().maxCoeff() << std::endl
78 |                   << "Variables: " << std::endl
79 |                   << x.transpose() << std::endl;
80 |         return 0;
81 |     }
82 | };
83 | 
84 | int main(int argc, char **argv)
85 | {
86 |     MinimizationExample example;
87 |     return example.run(200);
88 | }
89 | 
--------------------------------------------------------------------------------
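The example above passes nullptr for the step-bound callback. When the cost is only finite on a bounded set (see lbfgs_stepbound_t in "lbfgs.hpp"), a callback of the following shape can cap the trial step. This is a minimal sketch under stated assumptions: the box radius R, the 0.9 safety margin, and the function name are all illustrative, not part of the library.

```cpp
#include "lbfgs.hpp"
#include <algorithm>

// Sketch: bound the step so every trial point xp + stp * d stays inside the
// box |x_i| <= R, assuming xp starts strictly inside the box and the cost is
// finite there (the bound must have a FINITE function value).
static double boxStepBound(void * /* instance */,
                           const Eigen::VectorXd &xp,
                           const Eigen::VectorXd &d)
{
    const double R = 10.0;  // illustrative box radius
    double bound = 1.0e+20; // fall back to a huge bound if no wall is hit
    for (int i = 0; i < d.size(); ++i)
    {
        if (d(i) != 0.0)
        {
            // Step length along d at which coordinate i reaches its wall.
            const double wall = d(i) > 0.0 ? R : -R;
            bound = std::min(bound, (wall - xp(i)) / d(i));
        }
    }
    return 0.9 * bound; // stay strictly inside so the cost remains finite
}

// Usage: pass boxStepBound as the proc_stepbound argument of lbfgs_optimize,
// in place of the nullptr used in the example above.
```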