├── LICENSE
├── README.md
├── __init__.py
├── clr_callback.py
├── clr_callback_tests.ipynb
└── images
    ├── cifar.png
    ├── cycle.png
    ├── exp_range.png
    ├── exp_rangeDiag.png
    ├── iteration.png
    ├── lrtest.png
    ├── reset.png
    ├── triangular.png
    ├── triangular2.png
    ├── triangular2Diag.png
    └── triangularDiag.png


/LICENSE:
--------------------------------------------------------------------------------
 1 | MIT License
 2 | 
 3 | Copyright (c) 2017 Bradley Kenstler
 4 | 
 5 | Permission is hereby granted, free of charge, to any person obtaining a copy
 6 | of this software and associated documentation files (the "Software"), to deal
 7 | in the Software without restriction, including without limitation the rights
 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
 9 | copies of the Software, and to permit persons to whom the Software is
10 | furnished to do so, subject to the following conditions:
11 | 
12 | The above copyright notice and this permission notice shall be included in all
13 | copies or substantial portions of the Software.
14 | 
15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
21 | SOFTWARE.
22 | 


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
  1 | # Cyclical Learning Rate (CLR)
  2 | ![Alt text](images/triangularDiag.png?raw=true "Title")
  3 | 
  4 | This repository includes a Keras callback to be used in training that allows implementation of cyclical learning rate policies, as detailed in Leslie Smith's paper [Cyclical Learning Rates for Training Neural Networks
  5 | arXiv:1506.01186v4](https://arxiv.org/abs/1506.01186 "Title").
  6 | 
  7 | A cyclical learning rate is a policy of learning rate adjustment that increases the learning rate off a base value in a cyclical nature. Typically the frequency of the cycle is constant, but the amplitude is often scaled dynamically at either each cycle or each mini-batch iteration.
  8 | 
  9 | ## Why CLR
 10 | <img src="images/cifar.png" width="500" height="300" />
 11 | 
 12 | The author demonstrates how CLR policies can provide quicker converge for some neural network tasks and architectures.
 13 | One example from the paper compares validation accuracy for classification on the CIFAR-10 dataset. In this specific example, the author used a `triangular2` clr policy (detailed below). With clr, their model reached 81.4% validation accuracy in only 25,000 iterations compared to 70,000 iterations with standard hyperparameter settings.
 14 | 
 15 | One reason this approach may work well is because increasing the learning rate is an effective way of escaping saddle points. By cycling the learning rate, we're guaranteeing that such an increase will take place if we end up in a saddle point. 
 16 | 
 17 | ## CyclicLR()
 18 | 
 19 | The purpose of this class is to not only provide an easy implementation of CLR for Keras, but to enable easy experimentation with policies not explored in the original paper.
 20 | 
 21 | `clr_callback.py` contains the callback class `CyclicLR()`.
 22 | 
 23 | This class includes 3 built-in CLR policies, `'triangular'`, `'triangular2'`, and `'exp_range'`, as detailed in the original paper. It also allows for custom amplitude scaling functions, enabling easy experimentation.
 24 | 
 25 | Arguments for this class include:
 26 | * `base_lr`: initial learning rate, which is the lower boundary in the cycle. This overrides optimizer `lr`. Default 0.001.
 27 | * `max_lr`: upper boundary in the cycle. Functionally, it defines the cycle amplitude (`max_lr` - `base_lr`). The lr at any cycle is the sum of `base_lr` and some scaling of the amplitude; therefore `max_lr` may not actually be reached depending on scaling function. Default 0.006.
 28 | * `step_size`: number of training iterations per half cycle. Authors suggest setting `step_size = (2-8) x (training iterations in epoch)`. Default 2000.
 29 | * `mode`: one of `{'triangular', 'triangular2', 'exp_range'}`. Values correspond to policies detailed below. If `scale_fn` is not `None`, this argument is ignored. Default `'triangular'`.
 30 | * `gamma`: constant in `'exp_range'` scaling function, `gamma^(cycle iterations)`. Default 1.
 31 | * `scale_fn`: Custom scaling policy defined by a single argument lambda function, where `0 <= scale_fn(x) <= 1` for all `x >= 0`. `mode` parameter is ignored when this argument is used. Default `None`.
 32 | * `scale_mode`: `{'cycle', 'iterations'}`. Defines whether `scale_fn` is evaluated on cycle number or cycle iterations (training iterations since start of cycle). Default is `'cycle'`.
 33 | 
 34 | **NOTE: `base_lr` overrides `optimizer.lr`**
 35 | 
 36 | The general structure of the policy algorithm is:
 37 | 
 38 | ```python
 39 | cycle = np.floor(1+iterations/(2*step_size))
 40 | x = np.abs(iterations/step_size - 2*cycle + 1)
 41 | lr= base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*scale_fn(x)
 42 | ```
 43 | where `x` is either `iterations` or `cycle`, depending on `scale_mode`.
 44 | 
 45 | `CyclicLR()` can be used with any optimizer in Keras.
 46 | 
 47 | ### Syncing cycle and training iterations
 48 | 
 49 | The author points out that the best accuracies are typically attained by ending with the base learning rate. Therefore it's recommended to make sure your training finishes at the end of the cycle.
 50 | 
 51 | # Policies
 52 | 
 53 | ## triangular
 54 | 
 55 | ![Alt text](images/triangularDiag.png?raw=true "Title")
 56 | 
 57 | 
 58 | This method is a simple triangular cycle.
 59 | 
 60 | Basic algorithm:
 61 | 
 62 | ```python
 63 | cycle = np.floor(1+iterations/(2*step_size))
 64 | x = np.abs(iterations/step_size - 2*cycle + 1)
 65 | lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))
 66 | ```
 67 | 
 68 | Default triangular clr policy example:
 69 | ```python
 70 |     clr = CyclicLR(base_lr=0.001, max_lr=0.006,
 71 |                         step_size=2000.)
 72 |     model.fit(X_train, Y_train, callbacks=[clr])
 73 | ``` 
 74 | 
 75 | Results:
 76 | 
 77 | ![Alt text](images/triangular.png?raw=true "Title")
 78 | 
 79 | ## triangular2
 80 | 
 81 | ![Alt text](images/triangular2Diag.png?raw=true "Title")
 82 | 
 83 | This method is a triangular cycle that decreases the cycle amplitude by half after each period, while keeping the base lr constant. This is an example of scaling on cycle number.
 84 | 
 85 | Basic algorithm:
 86 | 
 87 | ```python
 88 | cycle = np.floor(1+iterations/(2*step_size))
 89 | x = np.abs(iterations/step_size - 2*cycle + 1)
 90 | lr = base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))/float(2**(cycle-1))
 91 | ```
 92 | 
 93 | Default triangular clr policy example:
 94 | 
 95 | ```python
 96 |     clr = CyclicLR(base_lr=0.001, max_lr=0.006,
 97 |                         step_size=2000., mode='triangular2')
 98 |     model.fit(X_train, Y_train, callbacks=[clr])
 99 | ``` 
100 | 
101 | Results:
102 | 
103 | ![Alt text](images/triangular2.png?raw=true "Title")
104 | 
105 | ## exp_range
106 | 
107 | ![Alt text](images/exp_rangeDiag.png?raw=true "Title")
108 | 
109 | This method is a triangular cycle that scales the cycle amplitude by a factor `gamma**(iterations)`, while keeping the base lr constant. This is an example of scaling on iteration.
110 | 
111 | Basic algorithm:
112 | 
113 | ```python
114 | cycle = np.floor(1+iterations/(2*step_size))
115 | x = np.abs(iterations/step_size - 2*cycle + 1)
116 | lr= base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*gamma**(iterations)
117 | ```
118 | 
119 | Default triangular clr policy example:
120 | 
121 | ```python
122 |     clr = CyclicLR(base_lr=0.001, max_lr=0.006,
123 |                         step_size=2000., mode='exp_range',
124 |                         gamma=0.99994)
125 |     model.fit(X_train, Y_train, callbacks=[clr])
126 | ``` 
127 | 
128 | Results:
129 | 
130 | ![Alt text](images/exp_range.png?raw=true "Title")
131 | 
132 | ## Custom Cycle-Policy
133 | 
134 | This method is a triangular cycle that scales the cycle amplitude sinusoidally. This is an example of scaling on cycle.
135 | 
136 | Basic algorithm:
137 | 
138 | ```python
139 | cycle = np.floor(1+iterations/(2*step_size))
140 | x = np.abs(iterations/step_size - 2*cycle + 1)
141 | lr= base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*0.5*(1+np.sin(cycle*np.pi/2.))
142 | ```
143 | 
144 | Default custom cycle-policy example:
145 | ```python
146 |     clr_fn = lambda x: 0.5*(1+np.sin(x*np.pi/2.))
147 |     clr = CyclicLR(base_lr=0.001, max_lr=0.006,
148 |                         step_size=2000., scale_fn=clr_fn,
149 |                         scale_mode='cycle')
150 |     model.fit(X_train, Y_train, callbacks=[clr])
151 | ``` 
152 | 
153 | Results:
154 | 
155 | ![Alt text](images/cycle.png?raw=true "Title")
156 | 
157 | ## Custom Iteration-Policy
158 | 
159 | This method is a triangular cycle that scales the cycle amplitude as a function of the cycle iterations. This is an example of scaling on iteration.
160 | 
161 | Basic algorithm:
162 | 
163 | ```python
164 | cycle = np.floor(1+iterations/(2*step_size))
165 | x = np.abs(iterations/step_size - 2*cycle + 1)
166 | lr= base_lr + (max_lr-base_lr)*np.maximum(0, (1-x))*1/(5**(iterations*0.0001))
167 | ```
168 | 
169 | Default custom cycle-policy example:
170 | ```python
171 |     clr_fn = lambda x: 1/(5**(x*0.0001))
172 |     clr = CyclicLR(base_lr=0.001, max_lr=0.006,
173 |                         step_size=2000., scale_fn=clr_fn,
174 |                         scale_mode='iterations')
175 |     model.fit(X_train, Y_train, callbacks=[clr])
176 | ``` 
177 | 
178 | Results:
179 | 
180 | ![Alt text](images/iteration.png?raw=true "Title")
181 | 
182 | This result highlights one of the key differences between scaling on cycle vs scaling on iteration. When you scale on cycle, the absolute change in learning rate from one iteration to the next is always constant in a cycle. Scaling on iteration alters the absolute change at every iteration; in this particular case, the absolute change is monotonically decreasing. This results in the curvature between peaks.
183 | 
184 | # Additional Information
185 | 
186 | ## Changing/resetting Cycle
187 | 
188 | During training, you may wish to adjust your cycle parameters: 
189 | 
190 | ```python
191 | clr._reset(new_base_lr,
192 |            new_max_lr,
193 |            new_step_size)
194 | ```
195 | Calling `_reset()` allows you to start a new cycle w/ new parameters. 
196 | 
197 | `_reset()` also sets the cycle iteration count to zero. If you are using a policy with dynamic amplitude scaling, this ensures the scaling function is reset.
198 | 
199 | If an argument is not not included in the function call, then the corresponding parameter is unchanged in the new cycle. As a consequence, calling 
200 | 
201 | ```python
202 | clr._reset()
203 | ```
204 | 
205 | simply resets the original cycle.
206 | 
207 | ## History
208 | 
209 | `CyclicLR()` keeps track of learning rates, loss, metrics and more in the `history` attribute dict. This generated many of the plots above.
210 | 
211 | Note: iterations in the history is the running training iterations; it is distinct from the cycle iterations and does not reset. This allows you to plot your learning rates over training iterations, even after you change/reset the cycle.
212 | 
213 | Example:
214 | 
215 | ![Alt text](images/reset.png?raw=true "Title")
216 | 
217 | ## Choosing a suitable base_lr/max_lr (LR Range Test)
218 | <img src="images/lrtest.png" width="500" height="300" />
219 | 
220 | The author offers a simple approach to determining the boundaries of your cycle by increasing the learning rate over a number of epochs and observing the results. They refer to this as an "LR range test."
221 | 
222 | An LR range test can be done using the `triangular` policy; simply set `base_lr` and `max_lr` to define the entire range you wish to test over, and set `step_size` to be the total number of iterations in the number of epochs you wish to test on. This linearly increases the learning rate at each iteration over the range desired.
223 | 
224 | The author suggests choosing `base_lr` and `max_lr` by plotting accuracy vs. learning rate. Choose `base_lr` to be the learning rate where accuracy starts to increase, and choose `max_lr` to be the learning rate where accuracy starts to slow, oscillate, or fall (the elbow). In the example above, Smith chose 0.001 and 0.006 as `base_lr` and `max_lr` respectively.
225 | 
226 | ### Plotting Accuracy vs. Learning Rate
227 | In order to plot accuracy vs learning rate, you can use the `.history` attribute to get the learning rates and accuracy at each iteration.
228 | 
229 | ```python
230 | model.fit(X, Y, callbacks=[clr])
231 | h = clr.history
232 | lr = h['lr']
233 | acc = h['acc']
234 | ```
235 | 
236 | ## Order of learning rate augmentation
237 | Note that the clr callback updates the learning rate prior to any further learning rate adjustments as called for in a given optimizer.
238 | 
239 | ## Functionality Test
240 | 
241 | clr_callback_tests.ipynb contains tests demonstrating desired behavior of optimizers.
242 | 


--------------------------------------------------------------------------------
/__init__.py:
--------------------------------------------------------------------------------
1 | from .clr_callback import CyclicLR


--------------------------------------------------------------------------------
/clr_callback.py:
--------------------------------------------------------------------------------
  1 | from tensorflow.keras.callbacks import *
  2 | from tensorflow.keras import backend as K
  3 | import numpy as np
  4 | 
  5 | class CyclicLR(Callback):
  6 |     """This callback implements a cyclical learning rate policy (CLR).
  7 |     The method cycles the learning rate between two boundaries with
  8 |     some constant frequency, as detailed in this paper (https://arxiv.org/abs/1506.01186).
  9 |     The amplitude of the cycle can be scaled on a per-iteration or 
 10 |     per-cycle basis.
 11 |     This class has three built-in policies, as put forth in the paper.
 12 |     "triangular":
 13 |         A basic triangular cycle w/ no amplitude scaling.
 14 |     "triangular2":
 15 |         A basic triangular cycle that scales initial amplitude by half each cycle.
 16 |     "exp_range":
 17 |         A cycle that scales initial amplitude by gamma**(cycle iterations) at each 
 18 |         cycle iteration.
 19 |     For more detail, please see paper.
 20 |     
 21 |     # Example
 22 |         ```python
 23 |             clr = CyclicLR(base_lr=0.001, max_lr=0.006,
 24 |                                 step_size=2000., mode='triangular')
 25 |             model.fit(X_train, Y_train, callbacks=[clr])
 26 |         ```
 27 |     
 28 |     Class also supports custom scaling functions:
 29 |         ```python
 30 |             clr_fn = lambda x: 0.5*(1+np.sin(x*np.pi/2.))
 31 |             clr = CyclicLR(base_lr=0.001, max_lr=0.006,
 32 |                                 step_size=2000., scale_fn=clr_fn,
 33 |                                 scale_mode='cycle')
 34 |             model.fit(X_train, Y_train, callbacks=[clr])
 35 |         ```    
 36 |     # Arguments
 37 |         base_lr: initial learning rate which is the
 38 |             lower boundary in the cycle.
 39 |         max_lr: upper boundary in the cycle. Functionally,
 40 |             it defines the cycle amplitude (max_lr - base_lr).
 41 |             The lr at any cycle is the sum of base_lr
 42 |             and some scaling of the amplitude; therefore 
 43 |             max_lr may not actually be reached depending on
 44 |             scaling function.
 45 |         step_size: number of training iterations per
 46 |             half cycle. Authors suggest setting step_size
 47 |             2-8 x training iterations in epoch.
 48 |         mode: one of {triangular, triangular2, exp_range}.
 49 |             Default 'triangular'.
 50 |             Values correspond to policies detailed above.
 51 |             If scale_fn is not None, this argument is ignored.
 52 |         gamma: constant in 'exp_range' scaling function:
 53 |             gamma**(cycle iterations)
 54 |         scale_fn: Custom scaling policy defined by a single
 55 |             argument lambda function, where 
 56 |             0 <= scale_fn(x) <= 1 for all x >= 0.
 57 |             mode paramater is ignored 
 58 |         scale_mode: {'cycle', 'iterations'}.
 59 |             Defines whether scale_fn is evaluated on 
 60 |             cycle number or cycle iterations (training
 61 |             iterations since start of cycle). Default is 'cycle'.
 62 |     """
 63 | 
 64 |     def __init__(self, base_lr=0.001, max_lr=0.006, step_size=2000., mode='triangular',
 65 |                  gamma=1., scale_fn=None, scale_mode='cycle'):
 66 |         super(CyclicLR, self).__init__()
 67 | 
 68 |         self.base_lr = base_lr
 69 |         self.max_lr = max_lr
 70 |         self.step_size = step_size
 71 |         self.mode = mode
 72 |         self.gamma = gamma
 73 |         if scale_fn == None:
 74 |             if self.mode == 'triangular':
 75 |                 self.scale_fn = lambda x: 1.
 76 |                 self.scale_mode = 'cycle'
 77 |             elif self.mode == 'triangular2':
 78 |                 self.scale_fn = lambda x: 1/(2.**(x-1))
 79 |                 self.scale_mode = 'cycle'
 80 |             elif self.mode == 'exp_range':
 81 |                 self.scale_fn = lambda x: gamma**(x)
 82 |                 self.scale_mode = 'iterations'
 83 |         else:
 84 |             self.scale_fn = scale_fn
 85 |             self.scale_mode = scale_mode
 86 |         self.clr_iterations = 0.
 87 |         self.trn_iterations = 0.
 88 |         self.history = {}
 89 | 
 90 |         self._reset()
 91 | 
 92 |     def _reset(self, new_base_lr=None, new_max_lr=None,
 93 |                new_step_size=None):
 94 |         """Resets cycle iterations.
 95 |         Optional boundary/step size adjustment.
 96 |         """
 97 |         if new_base_lr != None:
 98 |             self.base_lr = new_base_lr
 99 |         if new_max_lr != None:
100 |             self.max_lr = new_max_lr
101 |         if new_step_size != None:
102 |             self.step_size = new_step_size
103 |         self.clr_iterations = 0.
104 |         
105 |     def clr(self):
106 |         cycle = np.floor(1+self.clr_iterations/(2*self.step_size))
107 |         x = np.abs(self.clr_iterations/self.step_size - 2*cycle + 1)
108 |         if self.scale_mode == 'cycle':
109 |             return self.base_lr + (self.max_lr-self.base_lr)*np.maximum(0, (1-x))*self.scale_fn(cycle)
110 |         else:
111 |             return self.base_lr + (self.max_lr-self.base_lr)*np.maximum(0, (1-x))*self.scale_fn(self.clr_iterations)
112 |         
113 |     def on_train_begin(self, logs={}):
114 |         logs = logs or {}
115 | 
116 |         if self.clr_iterations == 0:
117 |             K.set_value(self.model.optimizer.lr, self.base_lr)
118 |         else:
119 |             K.set_value(self.model.optimizer.lr, self.clr())        
120 |             
121 |     def on_batch_end(self, epoch, logs=None):
122 |         
123 |         logs = logs or {}
124 |         self.trn_iterations += 1
125 |         self.clr_iterations += 1
126 | 
127 |         self.history.setdefault('lr', []).append(K.get_value(self.model.optimizer.lr))
128 |         self.history.setdefault('iterations', []).append(self.trn_iterations)
129 | 
130 |         for k, v in logs.items():
131 |             self.history.setdefault(k, []).append(v)
132 |         
133 |         K.set_value(self.model.optimizer.lr, self.clr())
134 | 


--------------------------------------------------------------------------------
/images/cifar.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/cifar.png


--------------------------------------------------------------------------------
/images/cycle.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/cycle.png


--------------------------------------------------------------------------------
/images/exp_range.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/exp_range.png


--------------------------------------------------------------------------------
/images/exp_rangeDiag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/exp_rangeDiag.png


--------------------------------------------------------------------------------
/images/iteration.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/iteration.png


--------------------------------------------------------------------------------
/images/lrtest.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/lrtest.png


--------------------------------------------------------------------------------
/images/reset.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/reset.png


--------------------------------------------------------------------------------
/images/triangular.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/triangular.png


--------------------------------------------------------------------------------
/images/triangular2.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/triangular2.png


--------------------------------------------------------------------------------
/images/triangular2Diag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/triangular2Diag.png


--------------------------------------------------------------------------------
/images/triangularDiag.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/bckenstler/CLR/968cf5a49577f470d55fc17676fcfb6b11447d11/images/triangularDiag.png


--------------------------------------------------------------------------------