├── CMakeLists.txt
├── LICENSE
├── README.md
├── include
│   └── lbfgs.hpp
└── src
    └── lbfgs_example.cpp

/CMakeLists.txt:
--------------------------------------------------------------------------------
1 | cmake_minimum_required(VERSION 2.8.3)
2 | 
3 | project(lbfgs)
4 | 
5 | set(CMAKE_CXX_FLAGS "-std=c++11")
6 | set(CMAKE_BUILD_TYPE "Release")
7 | set(CMAKE_CXX_FLAGS_RELEASE "-O3 -Wall -fPIC")
8 | 
9 | find_package(Eigen3 REQUIRED)
10 | 
11 | include_directories(
12 |     include
13 |     ${EIGEN3_INCLUDE_DIRS}
14 | )
15 | 
16 | add_executable(${PROJECT_NAME} src/lbfgs_example.cpp)
17 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | The MIT License
2 | 
3 | Copyright (c) 1990 Jorge Nocedal
4 | Copyright (c) 2007-2010 Naoaki Okazaki
5 | Copyright (c) 2020-2022 Zhepei Wang
6 | 
7 | Permission is hereby granted, free of charge, to any person obtaining a
8 | copy of this software and associated documentation files (the "Software"),
9 | to deal in the Software without restriction, including without limitation
10 | the rights to use, copy, modify, merge, publish, distribute, sublicense,
11 | and/or sell copies of the Software, and to permit persons to whom the
12 | Software is furnished to do so, subject to the following conditions:
13 | 
14 | The above copyright notice and this permission notice shall be included in
15 | all copies or substantial portions of the Software.
16 | 
17 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
18 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
19 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
20 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
21 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
22 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
23 | THE SOFTWARE.
24 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # LBFGS-Lite
2 | A header-only L-BFGS unconstrained optimizer.
3 | 
4 | ## 0. About
5 | 
6 | __LBFGS-Lite__ is a __C++__ [__header-only__](https://en.wikipedia.org/wiki/Header-only) library for __unconstrained optimization__. Many engineering considerations have been added for improved robustness compared to the original version by Nocedal.
7 | 
8 | ## 1. How to use
9 | 
10 | All explanations are detailed in the comments in "lbfgs.hpp". See "lbfgs_example.cpp" for the calling procedure (a minimal calling sketch is also given at the end of this README). You may need to install Eigen via "sudo apt install libeigen3-dev", because we have used Eigen for better performance since [ver. 2.1](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags). If you need a pure C-style lib without any external dependency, please refer to [ver. 0.9](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags).
11 | 
12 | ## 2. Features
13 | 
14 | - A single header file, "lbfgs.hpp", is all you need.
15 | 
16 | - The library implements the [Limited-Memory Broyden-Fletcher-Goldfarb-Shanno Method](https://doi.org/10.1007/BF01589116) (L-BFGS).
17 | 
18 | - A __highly robust line search__ proposed by [Lewis and Overton](https://link.springer.com/article/10.1007/s10107-012-0514-2) has been employed since [ver. 2.1](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags).
19 | 
20 | - Both __smooth__ ([C2](https://en.wikipedia.org/wiki/Smoothness)) and __nonsmooth__ ([C0 but piecewise C2](https://en.wikipedia.org/wiki/Smoothness)) functions are supported.
21 | 
22 | - The __cautious update__ by [Li and Fukushima](https://epubs.siam.org/doi/pdf/10.1137/S1052623499354242) is employed for __global convergence__ in nonconvex cases.
23 | 
24 | - An __externally provided maximum step size__ is convenient for functions defined on bounded sets.
25 | 
26 | ## 3. Why this lib
27 | 
28 | - Our lib is well-organized and concise (about 800 lines).
29 | 
30 | - Many other libs use [Nocedal's zoom line search](https://link.springer.com/book/10.1007%2F978-0-387-40065-5), [More-Thuente line search](https://dl.acm.org/doi/abs/10.1145/192115.192132), or [Hager-Zhang line search](https://doi.org/10.1137/030601880). In truth, interpolation-based schemes bring many tunable parameters and make assumptions, to varying degrees, about the function's smoothness. Engineering practice tells us that these assumptions can fail under bad numerical conditions and on ill-shaped functions. Admittedly, such schemes slightly reduce the iteration count in some cases, but they also crash easily.
31 | 
32 | - Some other libs provide user-specified options for the [Armijo and weak/strong Wolfe conditions](https://en.wikipedia.org/wiki/Wolfe_conditions) without considering the positive definiteness (PD) of the approximated Hessian. This is misleading: if only the Armijo condition is satisfied, the PD property is not guaranteed, and the solver becomes unstable or prone to crashing. We make the weak Wolfe condition mandatory here, which suffices for the PD property.
33 | 
34 | - We use the [Lewis-Overton line search](https://link.springer.com/article/10.1007/s10107-012-0514-2) as the only scheme since ver. 2.1, from which version nonsmooth functions are supported. Other schemes either assume high orders of continuity, or enforce the strong Wolfe condition, which can never be fulfilled by nonsmooth functions. Moreover, the Lewis-Overton line search is widely adopted in many graphics applications involving optimization on a [scene](https://dl.acm.org/doi/abs/10.1145/2766929), [shape](https://dl.acm.org/doi/abs/10.1145/2897824.2925918), or [mesh](https://dl.acm.org/doi/abs/10.1145/3197517.3201303), showing its practical robustness.
35 | 
36 | - In our practice, the function/gradient evaluation counts are comparable with those of interpolation-based schemes; sometimes it even requires fewer function calls. If you insist on an interpolation-based scheme for a smooth, well-shaped cost function, we also provide [ver. 0.9](https://github.com/ZJU-FAST-Lab/LBFGS-Lite/tags), where a More-Thuente line search is kept.
37 | 
38 | - Other schemes' global convergence on nonconvex functions is not guaranteed theoretically. We avoid the potential problems by employing the [cautious update](https://epubs.siam.org/doi/pdf/10.1137/S1052623499354242) scheme in our lib without additional computational overhead.
39 | 
40 | ## 4. Licence
41 | 
42 | LBFGS-Lite is modified from [the C version](https://github.com/chokkan/liblbfgs) by Okazaki, which is in turn based on [the original Fortran version](https://doi.org/10.1007/BF01589116) by Nocedal. Thus it is distributed under the terms of the MIT license, following the previous versions by Okazaki and by Nocedal. Please refer to the LICENSE file in the distribution.
43 | 
44 | ## 5. Maintenance
45 | 
46 | If you find any bug, please contact [Zhepei Wang](https://zhepeiwang.github.io/) ().
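For quick reference, the following minimal sketch shows the calling pattern. It mirrors "lbfgs_example.cpp"; the quadratic cost used here is only an illustrative assumption, not part of the library.

```cpp
#include "lbfgs.hpp"
#include <iostream>

// Illustrative cost: f(x) = ||x||^2 with gradient g(x) = 2x.
static double quadratic(void * /* instance */,
                        const Eigen::VectorXd &x,
                        Eigen::VectorXd &g)
{
    g = 2.0 * x;
    return x.squaredNorm();
}

int main()
{
    Eigen::VectorXd x = Eigen::VectorXd::Constant(8, 5.0); // initial guess x0
    double fmin;
    lbfgs::lbfgs_parameter_t params; // the defaults suit smooth costs

    // The step-bound and progress callbacks are optional (nullptr here).
    const int ret = lbfgs::lbfgs_optimize(x, fmin, quadratic,
                                          nullptr, nullptr, nullptr, params);

    std::cout << lbfgs::lbfgs_strerror(ret)
              << ", fmin = " << fmin << std::endl;
    return ret < 0 ? 1 : 0;
}
```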
47 | 
--------------------------------------------------------------------------------
/include/lbfgs.hpp:
--------------------------------------------------------------------------------
1 | #ifndef LBFGS_HPP
2 | #define LBFGS_HPP
3 | 
4 | #include <Eigen/Eigen>
5 | #include <cmath>
6 | #include <cstdlib>
7 | 
8 | namespace lbfgs
9 | {
10 |     // ----------------------- Data Type Part -----------------------
11 | 
12 |     /**
13 |      * L-BFGS optimization parameters.
14 |      */
15 |     struct lbfgs_parameter_t
16 |     {
17 |         /**
18 |          * The number of corrections to approximate the inverse hessian matrix.
19 |          * The L-BFGS routine stores the computation results of the previous m
20 |          * iterations to approximate the inverse hessian matrix of the current
21 |          * iteration. This parameter controls the size of the limited memories
22 |          * (corrections). The default value is 8. Values less than 3 are
23 |          * not recommended. Large values will result in excessive computing time.
24 |          */
25 |         int mem_size = 8;
26 | 
27 |         /**
28 |          * Epsilon for the grad convergence test. DO NOT USE IT in nonsmooth cases!
29 |          * Set it to 0.0 and use the past-delta-based test for nonsmooth functions.
30 |          * This parameter determines the accuracy with which the solution is to
31 |          * be found. A minimization terminates when
32 |          *     ||g(x)||_inf / max(1, ||x||_inf) < g_epsilon,
33 |          * where ||.||_inf is the infinity norm. The default value is 1.0e-5.
34 |          * This should be greater than 1.0e-6 in practice because L-BFGS does
35 |          * not directly reduce the first-order residual. It still needs the function
36 |          * value, which can be corrupted by machine_prec when ||g|| is small.
37 |          */
38 |         double g_epsilon = 1.0e-5;
39 | 
40 |         /**
41 |          * Distance for the delta-based convergence test.
42 |          * This parameter determines the distance, in iterations, to compute
43 |          * the rate of decrease of the cost function. If the value of this
44 |          * parameter is zero, the library does not perform the delta-based
45 |          * convergence test. The default value is 3.
46 |          */
47 |         int past = 3;
48 | 
49 |         /**
50 |          * Delta for the convergence test.
51 |          * This parameter determines the minimum rate of decrease of the
52 |          * cost function. The library stops iterations when the following
53 |          * condition is met:
54 |          *     |f' - f| / max(1, |f|) < delta,
55 |          * where f' is the cost value of past iterations ago, and f is
56 |          * the cost value of the current iteration.
57 |          * The default value is 1.0e-6.
58 |          */
59 |         double delta = 1.0e-6;
60 | 
61 |         /**
62 |          * The maximum number of iterations.
63 |          * The lbfgs_optimize() function terminates a minimization process with the
64 |          * ::LBFGSERR_MAXIMUMITERATION status code when the iteration count
65 |          * exceeds this parameter. Setting this parameter to zero continues a
66 |          * minimization process until convergence or an error. The default value
67 |          * is 0.
68 |          */
69 |         int max_iterations = 0;
70 | 
71 |         /**
72 |          * The maximum number of trials for the line search.
73 |          * This parameter controls the number of function and gradient evaluations
74 |          * per iteration for the line search routine. The default value is 64.
75 |          */
76 |         int max_linesearch = 64;
77 | 
78 |         /**
79 |          * The minimum step of the line search routine.
80 |          * The default value is 1.0e-20. This value need not be modified unless
81 |          * the exponents are too large for the machine being used, or unless the
82 |          * problem is extremely badly scaled (in which case the exponents should
83 |          * be increased).
84 |          */
85 |         double min_step = 1.0e-20;
86 | 
87 |         /**
88 |          * The maximum step of the line search.
89 |          * The default value is 1.0e+20. This value need not be modified unless
90 |          * the exponents are too large for the machine being used, or unless the
91 |          * problem is extremely badly scaled (in which case the exponents should
92 |          * be increased).
93 |          */
94 |         double max_step = 1.0e+20;
95 | 
96 |         /**
97 |          * A parameter to control the accuracy of the line search routine.
98 |          * The default value is 1.0e-4. This parameter should be greater
99 |          * than zero and smaller than 1.0.
100 |          */
101 |         double f_dec_coeff = 1.0e-4;
102 | 
103 |         /**
104 |          * A parameter to control the accuracy of the line search routine.
105 |          * The default value is 0.9. If the function and gradient
106 |          * evaluations are inexpensive with respect to the cost of the
107 |          * iteration (which is sometimes the case when solving very large
108 |          * problems), it may be advantageous to set this parameter to a small
109 |          * value. A typical small value is 0.1. This parameter should be
110 |          * greater than the f_dec_coeff parameter and smaller than 1.0.
111 |          */
112 |         double s_curv_coeff = 0.9;
113 | 
114 |         /**
115 |          * A parameter to ensure global convergence for nonconvex functions.
116 |          * The default value is 1.0e-6. The parameter performs the so-called
117 |          * cautious update for L-BFGS, especially when the convergence is
118 |          * not sufficient. The parameter must be positive but might as well
119 |          * be less than 1.0e-3 in practice.
120 |          */
121 |         double cautious_factor = 1.0e-6;
122 | 
123 |         /**
124 |          * The machine precision for floating-point values. The default is 1.0e-16.
125 |          * This parameter must be a positive value set by a client program to
126 |          * estimate the machine precision.
127 |          */
128 |         double machine_prec = 1.0e-16;
129 |     };
130 | 
131 |     /**
132 |      * Return values of lbfgs_optimize().
133 |      * Roughly speaking, a negative value indicates an error.
134 |      */
135 |     enum
136 |     {
137 |         /** L-BFGS reaches convergence. */
138 |         LBFGS_CONVERGENCE = 0,
139 |         /** L-BFGS satisfies stopping criteria. */
140 |         LBFGS_STOP,
141 |         /** The iteration has been canceled by the monitor callback. */
142 |         LBFGS_CANCELED,
143 | 
144 |         /** Unknown error. */
145 |         LBFGSERR_UNKNOWNERROR = -1024,
146 |         /** Invalid number of variables specified. */
147 |         LBFGSERR_INVALID_N,
148 |         /** Invalid parameter lbfgs_parameter_t::mem_size specified. */
149 |         LBFGSERR_INVALID_MEMSIZE,
150 |         /** Invalid parameter lbfgs_parameter_t::g_epsilon specified. */
151 |         LBFGSERR_INVALID_GEPSILON,
152 |         /** Invalid parameter lbfgs_parameter_t::past specified. */
153 |         LBFGSERR_INVALID_TESTPERIOD,
154 |         /** Invalid parameter lbfgs_parameter_t::delta specified. */
155 |         LBFGSERR_INVALID_DELTA,
156 |         /** Invalid parameter lbfgs_parameter_t::min_step specified. */
157 |         LBFGSERR_INVALID_MINSTEP,
158 |         /** Invalid parameter lbfgs_parameter_t::max_step specified. */
159 |         LBFGSERR_INVALID_MAXSTEP,
160 |         /** Invalid parameter lbfgs_parameter_t::f_dec_coeff specified. */
161 |         LBFGSERR_INVALID_FDECCOEFF,
162 |         /** Invalid parameter lbfgs_parameter_t::s_curv_coeff specified. */
163 |         LBFGSERR_INVALID_SCURVCOEFF,
164 |         /** Invalid parameter lbfgs_parameter_t::machine_prec specified. */
165 |         LBFGSERR_INVALID_MACHINEPREC,
166 |         /** Invalid parameter lbfgs_parameter_t::max_linesearch specified. */
167 |         LBFGSERR_INVALID_MAXLINESEARCH,
168 |         /** The function value became NaN or Inf. */
169 |         LBFGSERR_INVALID_FUNCVAL,
170 |         /** The line-search step became smaller than lbfgs_parameter_t::min_step. */
171 |         LBFGSERR_MINIMUMSTEP,
172 |         /** The line-search step became larger than lbfgs_parameter_t::max_step. */
173 |         LBFGSERR_MAXIMUMSTEP,
174 |         /** The line search reached the maximum number of trials; assumptions not satisfied or precision not achievable. */
175 |         LBFGSERR_MAXIMUMLINESEARCH,
176 |         /** The algorithm routine reaches the maximum number of iterations. */
177 |         LBFGSERR_MAXIMUMITERATION,
178 |         /** The relative search interval width shrank below lbfgs_parameter_t::machine_prec. */
179 |         LBFGSERR_WIDTHTOOSMALL,
180 |         /** A logic error (negative line-search step) occurred. */
181 |         LBFGSERR_INVALIDPARAMETERS,
182 |         /** The current search direction increases the cost function value. */
183 |         LBFGSERR_INCREASEGRADIENT,
184 |     };
185 | 
186 |     /**
187 |      * Callback interface to provide cost function and gradient evaluations.
188 |      *
189 |      * The lbfgs_optimize() function calls this function to obtain the values of the cost
190 |      * function and its gradient when needed. A client program must implement
191 |      * this function to evaluate the values of the cost function and its
192 |      * gradient, given the current values of the variables.
193 |      *
194 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
195 |      * @param x The current values of variables.
196 |      * @param g The gradient vector. The callback function must compute
197 |      *          the gradient values for the current variables.
198 |      * @retval double The value of the cost function for the current variables.
199 |      */
200 |     typedef double (*lbfgs_evaluate_t)(void *instance,
201 |                                        const Eigen::VectorXd &x,
202 |                                        Eigen::VectorXd &g);
203 | 
204 |     /**
205 |      * Callback interface to provide an upper bound at the beginning of the current line search.
206 |      *
207 |      * The lbfgs_optimize() function calls this function to obtain the value of the
208 |      * upper bound of the stepsize to search in, provided with the beginning values of
209 |      * the variables before the line search, and the current step vector (which can be the descent direction).
210 |      * A client program can implement this function for a more efficient line search. Any step
211 |      * larger than this bound should not be considered; for example, it may have a very large
212 |      * or even infinite function value. Note that the function value at the provided bound should be FINITE!
213 |      * If it is not used, just set it to nullptr.
214 |      *
215 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
216 |      * @param xp The values of the variables before the current line search.
217 |      * @param d The step vector. It can be the descent direction.
218 |      * @retval double The upper bound of the step in the current line search routine,
219 |      *                such that (stpbound * d) is the maximum reasonable step.
220 |      */
221 |     typedef double (*lbfgs_stepbound_t)(void *instance,
222 |                                         const Eigen::VectorXd &xp,
223 |                                         const Eigen::VectorXd &d);
224 | 
225 |     /**
226 |      * Callback interface to monitor the progress of the minimization process.
227 |      *
228 |      * The lbfgs_optimize() function calls this function for each iteration. By implementing
229 |      * this function, a client program can store or display the current progress
230 |      * of the minimization process. If it is not used, just set it to nullptr.
231 |      *
232 |      * @param instance The user data sent to the lbfgs_optimize() function by the client.
233 |      * @param x The current values of variables.
234 |      * @param g The current gradient values of variables.
235 |      * @param fx The current value of the cost function.
236 |      * @param step The line-search step used for this iteration.
237 |      * @param k The iteration count.
238 |      * @param ls The number of evaluations called for this iteration.
239 |      * @retval int Zero to continue the minimization process. Returning a
240 |      *             non-zero value will cancel the minimization process.
241 |      */
242 |     typedef int (*lbfgs_progress_t)(void *instance,
243 |                                     const Eigen::VectorXd &x,
244 |                                     const Eigen::VectorXd &g,
245 |                                     const double fx,
246 |                                     const double step,
247 |                                     const int k,
248 |                                     const int ls);
249 | 
250 |     /**
251 |      * Callback data struct
252 |      */
253 |     struct callback_data_t
254 |     {
255 |         void *instance = nullptr;
256 |         lbfgs_evaluate_t proc_evaluate = nullptr;
257 |         lbfgs_stepbound_t proc_stepbound = nullptr;
258 |         lbfgs_progress_t proc_progress = nullptr;
259 |     };
260 | 
261 |     // ----------------------- L-BFGS Part -----------------------
262 | 
263 |     /**
264 |      * Line search method for smooth or nonsmooth functions.
265 |      * This function performs a line search to find a point that satisfies
266 |      * both the Armijo condition and the weak Wolfe condition. It is
267 |      * as robust as the backtracking line search but further applies
268 |      * to continuous and piecewise smooth functions where the strong
269 |      * Wolfe condition usually does not hold.
270 |      *
271 |      * @see
272 |      * Adrian S. Lewis and Michael L. Overton. Nonsmooth optimization
273 |      * via quasi-Newton methods. Mathematical Programming, Vol 141,
274 |      * No 1, pp. 135-163, 2013.
275 |      */
276 |     inline int line_search_lewisoverton(Eigen::VectorXd &x,
277 |                                         double &f,
278 |                                         Eigen::VectorXd &g,
279 |                                         double &stp,
280 |                                         const Eigen::VectorXd &s,
281 |                                         const Eigen::VectorXd &xp,
282 |                                         const Eigen::VectorXd &gp,
283 |                                         const double stpmin,
284 |                                         const double stpmax,
285 |                                         const callback_data_t &cd,
286 |                                         const lbfgs_parameter_t &param)
287 |     {
288 |         int count = 0;
289 |         bool brackt = false, touched = false;
290 |         double finit, dginit, dgtest, dstest;
291 |         double mu = 0.0, nu = stpmax;
292 | 
293 |         /* Check the input parameters for errors. */
294 |         if (!(stp > 0.0))
295 |         {
296 |             return LBFGSERR_INVALIDPARAMETERS;
297 |         }
298 | 
299 |         /* Compute the initial gradient in the search direction. */
300 |         dginit = gp.dot(s);
301 | 
302 |         /* Make sure that s points to a descent direction. */
303 |         if (0.0 < dginit)
304 |         {
305 |             return LBFGSERR_INCREASEGRADIENT;
306 |         }
307 | 
308 |         /* The initial value of the cost function. */
309 |         finit = f;
310 |         dgtest = param.f_dec_coeff * dginit;
311 |         dstest = param.s_curv_coeff * dginit;
312 | 
313 |         while (true)
314 |         {
315 |             x = xp + stp * s;
316 | 
317 |             /* Evaluate the function and gradient values. */
318 |             f = cd.proc_evaluate(cd.instance, x, g);
319 |             ++count;
320 | 
321 |             /* Test for errors. */
322 |             if (std::isinf(f) || std::isnan(f))
323 |             {
324 |                 return LBFGSERR_INVALID_FUNCVAL;
325 |             }
326 |             /* Check the Armijo condition. */
327 |             if (f > finit + stp * dgtest)
328 |             {
329 |                 nu = stp;
330 |                 brackt = true;
331 |             }
332 |             else
333 |             {
334 |                 /* Check the weak Wolfe condition. */
335 |                 if (g.dot(s) < dstest)
336 |                 {
337 |                     mu = stp;
338 |                 }
339 |                 else
340 |                 {
341 |                     return count;
342 |                 }
343 |             }
344 |             if (param.max_linesearch <= count)
345 |             {
346 |                 /* Maximum number of iterations. */
347 |                 return LBFGSERR_MAXIMUMLINESEARCH;
348 |             }
349 |             if (brackt && (nu - mu) < param.machine_prec * nu)
350 |             {
351 |                 /* The relative interval width is below machine_prec. */
352 |                 return LBFGSERR_WIDTHTOOSMALL;
353 |             }
354 | 
355 |             if (brackt)
356 |             {
357 |                 stp = 0.5 * (mu + nu);
358 |             }
359 |             else
360 |             {
361 |                 stp *= 2.0;
362 |             }
363 | 
364 |             if (stp < stpmin)
365 |             {
366 |                 /* The step is the minimum value. */
367 |                 return LBFGSERR_MINIMUMSTEP;
368 |             }
369 |             if (stp > stpmax)
370 |             {
371 |                 if (touched)
372 |                 {
373 |                     /* The step is the maximum value. */
374 |                     return LBFGSERR_MAXIMUMSTEP;
375 |                 }
376 |                 else
377 |                 {
378 |                     /* The maximum value should be tried once. */
379 |                     touched = true;
380 |                     stp = stpmax;
381 |                 }
382 |             }
383 |         }
384 |     }
385 | 
386 |     /**
387 |      * Start an L-BFGS optimization.
388 |      * Assumptions: 1. f(x) is either C2 or C0 but piecewise C2;
389 |      *              2. f(x) is lower bounded;
390 |      *              3. f(x) has bounded level sets;
391 |      *              4. g(x) is either the gradient or a subgradient;
392 |      *              5. The gradient exists at the initial guess x0.
393 |      * A user must implement a function compatible with ::lbfgs_evaluate_t (evaluation
394 |      * callback) and pass the pointer to the callback function in the lbfgs_optimize()
395 |      * arguments. Similarly, a user can implement a function compatible with
396 |      * ::lbfgs_stepbound_t to provide an external upper bound for the stepsize, and
397 |      * ::lbfgs_progress_t (progress callback) to obtain the current progress
398 |      * (e.g., the variables, function value, and gradient) and to cancel the iteration
399 |      * process if necessary. Implementation of the stepbound and the progress callback
400 |      * is optional: a user can pass nullptr for either if it is not necessary.
401 |      *
402 |      *
403 |      * @param x The vector of decision variables.
404 |      *          THE INITIAL GUESS x0 SHOULD BE SET BEFORE THE CALL!
405 |      *          A client program can receive decision variables
406 |      *          through this vector, at which the cost and its
407 |      *          gradient are queried during minimization.
408 |      * @param f The reference to the variable that receives the final
409 |      *          value of the cost function for the variables.
410 |      * @param proc_evaluate The callback function to provide function f(x) and
411 |      *                      gradient g(x) evaluations given the current values of
412 |      *                      the variables x. A client program must implement a
413 |      *                      callback function compatible with lbfgs_evaluate_t
414 |      *                      and pass the pointer to the callback function.
415 |      * @param proc_stepbound The callback function to provide the value of the
416 |      *                       upper bound of the stepsize to search in, provided
417 |      *                       with the values of the variables before the
418 |      *                       line search, and the current step vector (which can be
419 |      *                       the negative gradient). A client program can implement
420 |      *                       this function for a more efficient line search. If it is
421 |      *                       not used, just set it to nullptr.
422 |      * @param proc_progress The callback function to receive the progress
423 |      *                      (the number of iterations, the current value of
424 |      *                      the cost function) of the minimization
425 |      *                      process. This argument can be set to nullptr if
426 |      *                      a progress report is unnecessary.
427 |      * @param instance A user data pointer for client programs. The callback
428 |      *                 functions will receive the value of this argument.
429 |      * @param param The parameters for L-BFGS optimization.
430 |      * @retval int The status code. This function returns a nonnegative
431 |      *             integer if the minimization process terminates without
432 |      *             an error. A negative integer indicates an error.
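     *
     * Since g(x) may be a subgradient (assumption 4 above), the evaluation
     * callback can return any valid subgradient where the cost is nonsmooth.
     * A hedged sketch for f(x) = ||x||_1, with an illustrative helper name:
     *
     *     double l1cost(void *instance,
     *                   const Eigen::VectorXd &x,
     *                   Eigen::VectorXd &g)
     *     {
     *         g = x.cwiseSign(); // a valid subgradient; zero is taken at kinks
     *         return x.lpNorm<1>();
     *     }
     *
     * For such nonsmooth costs, set g_epsilon to 0.0 and rely on the
     * past/delta-based test, as documented in lbfgs_parameter_t.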
433 |      */
434 |     inline int lbfgs_optimize(Eigen::VectorXd &x,
435 |                               double &f,
436 |                               lbfgs_evaluate_t proc_evaluate,
437 |                               lbfgs_stepbound_t proc_stepbound,
438 |                               lbfgs_progress_t proc_progress,
439 |                               void *instance,
440 |                               const lbfgs_parameter_t &param)
441 |     {
442 |         int ret, i, j, k, ls, end, bound;
443 |         double step, step_min, step_max, fx, ys, yy;
444 |         double gnorm_inf, xnorm_inf, beta, rate, cau;
445 | 
446 |         const int n = x.size();
447 |         const int m = param.mem_size;
448 | 
449 |         /* Check the input parameters for errors. */
450 |         if (n <= 0)
451 |         {
452 |             return LBFGSERR_INVALID_N;
453 |         }
454 |         if (m <= 0)
455 |         {
456 |             return LBFGSERR_INVALID_MEMSIZE;
457 |         }
458 |         if (param.g_epsilon < 0.0)
459 |         {
460 |             return LBFGSERR_INVALID_GEPSILON;
461 |         }
462 |         if (param.past < 0)
463 |         {
464 |             return LBFGSERR_INVALID_TESTPERIOD;
465 |         }
466 |         if (param.delta < 0.0)
467 |         {
468 |             return LBFGSERR_INVALID_DELTA;
469 |         }
470 |         if (param.min_step < 0.0)
471 |         {
472 |             return LBFGSERR_INVALID_MINSTEP;
473 |         }
474 |         if (param.max_step < param.min_step)
475 |         {
476 |             return LBFGSERR_INVALID_MAXSTEP;
477 |         }
478 |         if (!(param.f_dec_coeff > 0.0 &&
479 |               param.f_dec_coeff < 1.0))
480 |         {
481 |             return LBFGSERR_INVALID_FDECCOEFF;
482 |         }
483 |         if (!(param.s_curv_coeff < 1.0 &&
484 |               param.s_curv_coeff > param.f_dec_coeff))
485 |         {
486 |             return LBFGSERR_INVALID_SCURVCOEFF;
487 |         }
488 |         if (!(param.machine_prec > 0.0))
489 |         {
490 |             return LBFGSERR_INVALID_MACHINEPREC;
491 |         }
492 |         if (param.max_linesearch <= 0)
493 |         {
494 |             return LBFGSERR_INVALID_MAXLINESEARCH;
495 |         }
496 | 
497 |         /* Prepare intermediate variables. */
498 |         Eigen::VectorXd xp(n);
499 |         Eigen::VectorXd g(n);
500 |         Eigen::VectorXd gp(n);
501 |         Eigen::VectorXd d(n);
502 |         Eigen::VectorXd pf(std::max(1, param.past));
503 | 
504 |         /* Initialize the limited memory. */
505 |         Eigen::VectorXd lm_alpha = Eigen::VectorXd::Zero(m);
506 |         Eigen::MatrixXd lm_s = Eigen::MatrixXd::Zero(n, m);
507 |         Eigen::MatrixXd lm_y = Eigen::MatrixXd::Zero(n, m);
508 |         Eigen::VectorXd lm_ys = Eigen::VectorXd::Zero(m);
509 | 
510 |         /* Construct a callback data. */
511 |         callback_data_t cd;
512 |         cd.instance = instance;
513 |         cd.proc_evaluate = proc_evaluate;
514 |         cd.proc_stepbound = proc_stepbound;
515 |         cd.proc_progress = proc_progress;
516 | 
517 |         /* Evaluate the function value and its gradient. */
518 |         fx = cd.proc_evaluate(cd.instance, x, g);
519 | 
520 |         /* Store the initial value of the cost function. */
521 |         pf(0) = fx;
522 | 
523 |         /*
524 |             Compute the direction;
525 |             we assume the initial hessian matrix H_0 as the identity matrix.
526 |         */
527 |         d = -g;
528 | 
529 |         /*
530 |             Make sure that the initial variables are not a stationary point.
531 |         */
532 |         gnorm_inf = g.cwiseAbs().maxCoeff();
533 |         xnorm_inf = x.cwiseAbs().maxCoeff();
534 | 
535 |         if (gnorm_inf / std::max(1.0, xnorm_inf) <= param.g_epsilon)
536 |         {
537 |             /* The initial guess is already a stationary point. */
538 |             ret = LBFGS_CONVERGENCE;
539 |         }
540 |         else
541 |         {
542 |             /*
543 |                 Compute the initial step:
544 |             */
545 |             step = 1.0 / d.norm();
546 | 
547 |             k = 1;
548 |             end = 0;
549 |             bound = 0;
550 | 
551 |             while (true)
552 |             {
553 |                 /* Store the current position and gradient vectors. */
554 |                 xp = x;
555 |                 gp = g;
556 | 
557 |                 /* If the step bound can be provided dynamically, then apply it. */
558 |                 step_min = param.min_step;
559 |                 step_max = param.max_step;
560 |                 if (cd.proc_stepbound)
561 |                 {
562 |                     step_max = cd.proc_stepbound(cd.instance, xp, d);
563 |                     step_max = step_max < param.max_step ? step_max : param.max_step;
564 |                 }
565 |                 step = step < step_max ? step : 0.5 * step_max;
566 | 
567 |                 /* Search for an optimal step. */
568 |                 ls = line_search_lewisoverton(x, fx, g, step, d, xp, gp, step_min, step_max, cd, param);
569 | 
570 |                 if (ls < 0)
571 |                 {
572 |                     /* Revert to the previous point. */
573 |                     x = xp;
574 |                     g = gp;
575 |                     ret = ls;
576 |                     break;
577 |                 }
578 | 
579 |                 /* Report the progress. */
580 |                 if (cd.proc_progress)
581 |                 {
582 |                     if (cd.proc_progress(cd.instance, x, g, fx, step, k, ls))
583 |                     {
584 |                         ret = LBFGS_CANCELED;
585 |                         break;
586 |                     }
587 |                 }
588 | 
589 |                 /*
590 |                     Convergence test.
591 |                     The criterion is given by the following formula:
592 |                         ||g(x)||_inf / max(1, ||x||_inf) < g_epsilon
593 |                 */
594 |                 gnorm_inf = g.cwiseAbs().maxCoeff();
595 |                 xnorm_inf = x.cwiseAbs().maxCoeff();
596 |                 if (gnorm_inf / std::max(1.0, xnorm_inf) < param.g_epsilon)
597 |                 {
598 |                     /* Convergence. */
599 |                     ret = LBFGS_CONVERGENCE;
600 |                     break;
601 |                 }
602 | 
603 |                 /*
604 |                     Test for the stopping criterion.
605 |                     The criterion is given by the following formula:
606 |                         |f(past_x) - f(x)| / max(1, |f(x)|) < \delta.
607 |                 */
608 |                 if (0 < param.past)
609 |                 {
610 |                     /* We don't test the stopping criterion while k < past. */
611 |                     if (param.past <= k)
612 |                     {
613 |                         /* The stopping criterion. */
614 |                         rate = std::fabs(pf(k % param.past) - fx) / std::max(1.0, std::fabs(fx));
615 | 
616 |                         if (rate < param.delta)
617 |                         {
618 |                             ret = LBFGS_STOP;
619 |                             break;
620 |                         }
621 |                     }
622 | 
623 |                     /* Store the current value of the cost function. */
624 |                     pf(k % param.past) = fx;
625 |                 }
626 | 
627 |                 if (param.max_iterations != 0 && param.max_iterations <= k)
628 |                 {
629 |                     /* Maximum number of iterations. */
630 |                     ret = LBFGSERR_MAXIMUMITERATION;
631 |                     break;
632 |                 }
633 | 
634 |                 /* Count the iteration number. */
635 |                 ++k;
636 | 
637 |                 /*
638 |                     Update vectors s and y:
639 |                         s_{k+1} = x_{k+1} - x_{k} = step * d_{k}.
640 |                         y_{k+1} = g_{k+1} - g_{k}.
641 |                 */
642 |                 lm_s.col(end) = x - xp;
643 |                 lm_y.col(end) = g - gp;
644 | 
645 |                 /*
646 |                     Compute scalars ys and yy:
647 |                         ys = y^t \cdot s = 1 / \rho.
648 |                         yy = y^t \cdot y.
649 |                     Notice that yy is used for scaling the hessian matrix H_0 (Cholesky factor).
650 |                 */
651 |                 ys = lm_y.col(end).dot(lm_s.col(end));
652 |                 yy = lm_y.col(end).squaredNorm();
653 |                 lm_ys(end) = ys;
654 | 
655 |                 /* Compute the negative of the gradient. */
656 |                 d = -g;
657 | 
658 |                 /*
659 |                     The cautious update is performed here only as long as
660 |                         (y^t \cdot s) / ||s_{k+1}||^2 > \epsilon * ||g_{k}||^\alpha,
661 |                     where \epsilon is the cautious factor and a proposed value
662 |                     for \alpha is 1.
663 |                     This is not for enforcing the PD of the approximated Hessian,
664 |                     since ys > 0 is already ensured by the weak Wolfe condition.
665 |                     This is to ensure the global convergence as described in:
666 |                     Dong-Hui Li and Masao Fukushima. On the global convergence of
667 |                     the BFGS method for nonconvex unconstrained optimization problems.
668 |                     SIAM Journal on Optimization, Vol 11, No 4, pp. 1054-1064, 2001.
669 |                 */
670 |                 cau = lm_s.col(end).squaredNorm() * gp.norm() * param.cautious_factor;
671 | 
672 |                 if (ys > cau)
673 |                 {
674 |                     /*
675 |                         The recursive formula to compute dir = -(H \cdot g).
676 |                         This is described on page 779 of:
677 |                         Jorge Nocedal.
678 |                         Updating Quasi-Newton Matrices with Limited Storage.
679 |                         Mathematics of Computation, Vol. 35, No. 151,
680 |                         pp. 773--782, 1980.
681 |                     */
682 |                     ++bound;
683 |                     bound = m < bound ? m : bound;
684 |                     end = (end + 1) % m;
685 | 
686 |                     j = end;
687 |                     for (i = 0; i < bound; ++i)
688 |                     {
689 |                         j = (j + m - 1) % m; /* if (--j == -1) j = m-1; */
690 |                         /* \alpha_{j} = \rho_{j} s^{t}_{j} \cdot q_{k+1}. */
691 |                         lm_alpha(j) = lm_s.col(j).dot(d) / lm_ys(j);
692 |                         /* q_{i} = q_{i+1} - \alpha_{i} y_{i}. */
693 |                         d += (-lm_alpha(j)) * lm_y.col(j);
694 |                     }
695 | 
696 |                     d *= ys / yy;
697 | 
698 |                     for (i = 0; i < bound; ++i)
699 |                     {
700 |                         /* \beta_{j} = \rho_{j} y^t_{j} \cdot \gamma_{i}. */
701 |                         beta = lm_y.col(j).dot(d) / lm_ys(j);
702 |                         /* \gamma_{i+1} = \gamma_{i} + (\alpha_{j} - \beta_{j}) s_{j}. */
703 |                         d += (lm_alpha(j) - beta) * lm_s.col(j);
704 |                         j = (j + 1) % m; /* if (++j == m) j = 0; */
705 |                     }
706 |                 }
707 | 
708 |                 /* The search direction d is ready. We try step = 1 first. */
709 |                 step = 1.0;
710 |             }
711 |         }
712 | 
713 |         /* Return the final value of the cost function. */
714 |         f = fx;
715 | 
716 |         return ret;
717 |     }
718 | 
719 |     /**
720 |      * Get the string description of an lbfgs_optimize() return code.
721 |      *
722 |      * @param err A value returned by lbfgs_optimize().
723 |      */
724 |     inline const char *lbfgs_strerror(const int err)
725 |     {
726 |         switch (err)
727 |         {
728 |         case LBFGS_CONVERGENCE:
729 |             return "Success: reached convergence (g_epsilon).";
730 | 
731 |         case LBFGS_STOP:
732 |             return "Success: met stopping criteria (past f decrease less than delta).";
733 | 
734 |         case LBFGS_CANCELED:
735 |             return "The iteration has been canceled by the monitor callback.";
736 | 
737 |         case LBFGSERR_UNKNOWNERROR:
738 |             return "Unknown error.";
739 | 
740 |         case LBFGSERR_INVALID_N:
741 |             return "Invalid number of variables specified.";
742 | 
743 |         case LBFGSERR_INVALID_MEMSIZE:
744 |             return "Invalid parameter lbfgs_parameter_t::mem_size specified.";
745 | 
746 |         case LBFGSERR_INVALID_GEPSILON:
747 |             return "Invalid parameter lbfgs_parameter_t::g_epsilon specified.";
748 | 
749 |         case LBFGSERR_INVALID_TESTPERIOD:
750 |             return "Invalid parameter lbfgs_parameter_t::past specified.";
751 | 
752 |         case LBFGSERR_INVALID_DELTA:
753 |             return "Invalid parameter lbfgs_parameter_t::delta specified.";
754 | 
755 |         case LBFGSERR_INVALID_MINSTEP:
756 |             return "Invalid parameter lbfgs_parameter_t::min_step specified.";
757 | 
758 |         case LBFGSERR_INVALID_MAXSTEP:
759 |             return "Invalid parameter lbfgs_parameter_t::max_step specified.";
760 | 
761 |         case LBFGSERR_INVALID_FDECCOEFF:
762 |             return "Invalid parameter lbfgs_parameter_t::f_dec_coeff specified.";
763 | 
764 |         case LBFGSERR_INVALID_SCURVCOEFF:
765 |             return "Invalid parameter lbfgs_parameter_t::s_curv_coeff specified.";
766 | 
767 |         case LBFGSERR_INVALID_MACHINEPREC:
768 |             return "Invalid parameter lbfgs_parameter_t::machine_prec specified.";
769 | 
770 |         case LBFGSERR_INVALID_MAXLINESEARCH:
771 |             return "Invalid parameter lbfgs_parameter_t::max_linesearch specified.";
772 | 
773 |         case LBFGSERR_INVALID_FUNCVAL:
774 |             return "The function value became NaN or Inf.";
775 | 
776 |         case LBFGSERR_MINIMUMSTEP:
777 |             return "The line-search step became smaller than lbfgs_parameter_t::min_step.";
778 | 
779 |         case LBFGSERR_MAXIMUMSTEP:
780 |             return "The line-search step became larger than lbfgs_parameter_t::max_step.";
781 | 
782 |         case LBFGSERR_MAXIMUMLINESEARCH:
783 |             return "The line search reached the maximum number of trials; assumptions not satisfied or precision not achievable.";
784 | 
785 |         case LBFGSERR_MAXIMUMITERATION:
786 |             return "The algorithm routine reaches the maximum number of iterations.";
787 | 
788 |         case LBFGSERR_WIDTHTOOSMALL:
789 |             return "The relative search interval width shrank below lbfgs_parameter_t::machine_prec.";
790 | 
791 |         case LBFGSERR_INVALIDPARAMETERS:
792 |             return "A logic error (negative line-search step) occurred.";
793 | 
794 |         case LBFGSERR_INCREASEGRADIENT:
795 |             return "The current search direction increases the cost function value.";
796 | 
797 |         default:
798 |             return "(unknown)";
799 |         }
800 |     }
801 | 
802 | } // namespace lbfgs
803 | 
804 | #endif
805 | 
--------------------------------------------------------------------------------
/src/lbfgs_example.cpp:
--------------------------------------------------------------------------------
1 | #include "lbfgs.hpp"
2 | #include <Eigen/Eigen>
3 | #include <iostream>
4 | #include <iomanip>
5 | 
6 | class MinimizationExample
7 | {
8 | public:
9 |     int run(const int N)
10 |     {
11 |         double finalCost;
12 |         Eigen::VectorXd x(N);
13 | 
14 |         /* Set the initial guess */
15 |         for (int i = 0; i < N; i += 2)
16 |         {
17 |             x(i) = -1.2;
18 |             x(i + 1) = 1.0;
19 |         }
20 | 
21 |         /* Set the minimization parameters */
22 |         lbfgs::lbfgs_parameter_t params;
23 |         params.g_epsilon = 1.0e-8;
24 |         params.past = 3;
25 |         params.delta = 1.0e-8;
26 | 
27 |         /* Start minimization */
28 |         int ret = lbfgs::lbfgs_optimize(x,
29 |                                         finalCost,
30 |                                         costFunction,
31 |                                         nullptr,
32 |                                         monitorProgress,
33 |                                         this,
34 |                                         params);
35 | 
36 |         /* Report the result. */
37 |         std::cout << std::setprecision(4)
38 |                   << "================================" << std::endl
39 |                   << "L-BFGS Optimization Returned: " << ret << std::endl
40 |                   << "Minimized Cost: " << finalCost << std::endl
41 |                   << "Optimal Variables: " << std::endl
42 |                   << x.transpose() << std::endl;
43 | 
44 |         return ret;
45 |     }
46 | 
47 | private:
48 |     static double costFunction(void *instance,
49 |                                const Eigen::VectorXd &x,
50 |                                Eigen::VectorXd &g)
51 |     {
52 |         const int n = x.size();
53 |         double fx = 0.0;
54 |         for (int i = 0; i < n; i += 2)
55 |         {
56 |             const double t1 = 1.0 - x(i);
57 |             const double t2 = 10.0 * (x(i + 1) - x(i) * x(i));
58 |             g(i + 1) = 20.0 * t2;
59 |             g(i) = -2.0 * (x(i) * g(i + 1) + t1);
60 |             fx += t1 * t1 + t2 * t2;
61 |         }
62 |         return fx;
63 |     }
64 | 
65 |     static int monitorProgress(void *instance,
66 |                                const Eigen::VectorXd &x,
67 |                                const Eigen::VectorXd &g,
68 |                                const double fx,
69 |                                const double step,
70 |                                const int k,
71 |                                const int ls)
72 |     {
73 |         std::cout << std::setprecision(4)
74 |                   << "================================" << std::endl
75 |                   << "Iteration: " << k << std::endl
76 |                   << "Function Value: " << fx << std::endl
77 |                   << "Gradient Inf Norm: " << g.cwiseAbs().maxCoeff() << std::endl
78 |                   << "Variables: " << std::endl
79 |                   << x.transpose() << std::endl;
80 |         return 0;
81 |     }
82 | };
83 | 
84 | int main(int argc, char **argv)
85 | {
86 |     MinimizationExample example;
87 |     return example.run(200);
88 | }
89 | 
--------------------------------------------------------------------------------
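The example above passes nullptr for the step-bound callback. When the cost is only finite on a bounded set (see lbfgs_stepbound_t in "lbfgs.hpp"), a callback of the following shape can cap the trial step. This is a minimal sketch under stated assumptions: the box radius R, the 0.9 safety margin, and the function name are all illustrative, not part of the library.

```cpp
#include "lbfgs.hpp"
#include <algorithm>

// Sketch: bound the step so every trial point xp + stp * d stays inside the
// box |x_i| <= R, assuming xp starts strictly inside the box and the cost is
// finite there (the bound must have a FINITE function value).
static double boxStepBound(void * /* instance */,
                           const Eigen::VectorXd &xp,
                           const Eigen::VectorXd &d)
{
    const double R = 10.0;  // illustrative box radius
    double bound = 1.0e+20; // fall back to a huge bound if no wall is hit
    for (int i = 0; i < d.size(); ++i)
    {
        if (d(i) != 0.0)
        {
            // Step length along d at which coordinate i reaches its wall.
            const double wall = d(i) > 0.0 ? R : -R;
            bound = std::min(bound, (wall - xp(i)) / d(i));
        }
    }
    return 0.9 * bound; // stay strictly inside so the cost remains finite
}

// Usage: pass boxStepBound as the proc_stepbound argument of lbfgs_optimize,
// in place of the nullptr used in the example above.
```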