├── SGD_tutorial.pdf
├── README.md
├── sgd_demo.py
├── LICENSE
└── SGD.py


/SGD_tutorial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CU-UQ/SGD/HEAD/SGD_tutorial.pdf


--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
 1 | # SGD
 2 | Implementation of Stochastic Gradient Descent algorithms in Python (GNU GPLv3)  
 3 | If you find this code useful please cite the article:  
 4 | ### Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach ###  
 5 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)  
 6 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278.  
 7 | https://doi.org/10.1007/s00158-020-02599-z  
 8 | 
 9 | ### BibTeX entry: ### 
10 | @article{de2020topology,  
11 |   title={Topology optimization under uncertainty using a stochastic gradient-based approach},   
12 |   author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},   
13 |   journal={Structural and Multidisciplinary Optimization},  
14 |   volume={62},  
15 |   number={5},   
16 |   pages={2255--2278},   
17 |   year={2020},  
18 |   publisher={Springer}  
19 | } 
20 | 
21 | Download the SGD module from https://github.com/CU-UQ/SGD.  
22 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.  
23 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747).  
24 | Please report any bugs to Subhayan.De@colorado.edu
25 | ### Website: www.subhayande.com
26 | 
27 | Required packages: numpy, time  
28 | 
29 | This module implements:  
30 |   (i) Stochastic Gradient Descent,   
31 |   (ii) SGD with Momentum,  
32 |   (iii) NAG,  
33 |   (iv) AdaGrad,  
34 |   (iv) RMSprop,  
35 |   (vi) Adam,  
36 |   (vii) Adamax,  
37 |   (viii) Adadelta,  
38 |   (ix) Nadam,  
39 |   (x) SAG,   
40 |   (xi) minibatch SGD,  
41 |   (xii) SVRG.  
42 | 
43 | *NOTE*: Currently, the stopping conditions are maximum number of iteration and 2nd norm of gradient vector is smaller than a tolerance value. Only, time-delay and exponential learning schedules are implemented.
44 | 
45 | Download this file and use *import SGD as sgd* to use the algorithms.  
46 | See *sgd_demo.py* for an example.  
47 | 
48 | 


--------------------------------------------------------------------------------
/sgd_demo.py:
--------------------------------------------------------------------------------
  1 | #!/usr/bin/env python3
  2 | # -*- coding: utf-8 -*-
  3 | """
  4 | -------------------------------------------------------------------------------
  5 | If you find this code useful please cite the article:
  6 | Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach
  7 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)
  8 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278. https://doi.org/10.1007/s00158-020-02599-z
  9 | BibTeX entry:
 10 | @article{de2020topology,
 11 | title={Topology optimization under uncertainty using a stochastic gradient-based approach},
 12 | author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},
 13 | journal={Structural and Multidisciplinary Optimization},
 14 | volume={62},
 15 | number={5},
 16 | pages={2255--2278},
 17 | year={2020},
 18 | publisher={Springer}
 19 | }
 20 | Download the SGD module from https://github.com/CU-UQ/SGD.
 21 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.
 22 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747). 
 23 | Please report any bugs to Subhayan.De@colorado.edu
 24 | Website: www.subhayande.com
 25 | -------------------------------------------------------------------------------
 26 | This file uses a linear regression example to show the use of  StochasticGradientDescent module.
 27 | Available classes:
 28 |     (1) Stochastic gradient descent
 29 |     (2) SGD with momentum
 30 |     (3) Nesterov accelerated SGD
 31 |     (4) AdaGrad
 32 |     (5) RMSprop
 33 |     (6) Adam
 34 |     (7) Adamax
 35 |     (8) Adadelta
 36 |     (9) Nadam
 37 |     (10) Stochastic average gradient
 38 |     (11) Mini-batch stochastic gradient descent
 39 |     (12) SVRG
 40 |     
 41 | Copyright (C) 2019  Subhayan De
 42 | 
 43 |     This program is free software: you can redistribute it and/or modify
 44 |     it under the terms of the GNU General Public License as published by
 45 |     the Free Software Foundation, either version 3 of the License, or
 46 |     (at your option) any later version.
 47 | 
 48 |     This program is distributed in the hope that it will be useful,
 49 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
 50 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 51 |     GNU General Public License for more details.
 52 | 
 53 |     You should have received a copy of the GNU General Public License
 54 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
 55 |     
 56 | Created on Mon Jul  9 21:19:43 2018
 57 | @author: Subhayan De (email: Subhayan.De@colorado.edu)
 58 | """
 59 | # import matplotlib and numpy packages
 60 | import matplotlib
 61 | import numpy as np
 62 | import matplotlib.pyplot as plt
 63 | import matplotlib.cm as cm
 64 | 
 65 | # import the algorithm classes from the SGD module
 66 | import SGD as sgd
 67 | 
 68 | def main():
 69 |     # Generate data
 70 |     np.random.seed(0)
 71 |     n = 1000
 72 |     X = 2.0*np.random.rand(n,1)
 73 |     
 74 |     # parameters
 75 |     w1 = 3.0
 76 |     w2 = 4.5
 77 |     # noisy data
 78 |     y = w1 + w2 * X + np.random.randn(n,1)
 79 |     
 80 |     X_b = np.c_[np.ones((n,1)), X] # add 1 to each instance
 81 |     # save data and x to files to be used later to calculate objectives and gradients
 82 |     np.savetxt('test1_data.txt',y)
 83 |     np.savetxt('test1_x.txt',X_b)
 84 |     
 85 |     # select the algorithm to run
 86 |     # acceptable terms: SGD, SGDmomentum, SGDnesterov, AdaGrad, RMSprop, Adam, Adamax, Adadelta, Nadam, minibatchSGD, SAG, SVRG
 87 |     alg = 'Adam'
 88 |     
 89 |     # initial parameter
 90 |     w10 = 2.0
 91 |     w20 = 0.5
 92 |     theta = np.array([w10, w20])
 93 |     R = objFun(theta) # initial objective
 94 |     it = 0 # set iteration counter to 0
 95 |     maxIt = 2500 # maximum iteration
 96 |     dR = gradFun(theta) # initial gradient
 97 |     if alg == 'SGD':
 98 |         # Stochastic Gradient Descent
 99 |         eta = 0.0025 # learning rate
100 |         opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
101 |         opt.performIter() # perform iterations
102 |         thetaHist = opt.getParamHist()
103 |     elif alg == 'SGDmomentum':
104 |         # Stochastic Gradient Descent with momentum
105 |         eta = 0.001 # learning rate
106 |         opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun, momentum = 0.9) # initialize
107 |         opt.performIter() # perform iterations
108 |         thetaHist = opt.getParamHist()    
109 |     elif alg == 'SGDnesterov':
110 |         # Stochastic Gradient Descent with Nesterov momentum
111 |         eta = 0.001 # learning rate
112 |         opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun, momentum = 0.9,nesterov = True) # initialize
113 |         opt.performIter() # perform iterations
114 |         thetaHist = opt.getParamHist()
115 |     elif alg == 'AdaGrad':
116 |         # AdaGrad
117 |         eta = 0.25 # learning rate
118 |         opt = sgd.AdaGrad(gradHist=0.0,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
119 |         opt.performIter() # perform iterations
120 |         thetaHist = opt.getParamHist()
121 |     elif alg == 'RMSprop':
122 |         # RMSprop
123 |         eta = 0.9 # learning rate
124 |         opt = sgd.RMSprop(gradHist=0.0,rho=0.1,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
125 |         opt.performIter() # perform iterations
126 |         thetaHist = opt.getParamHist()
127 |     elif alg == 'Adam':
128 |         # Adam
129 |         eta = 0.025 # learning rate
130 |         opt = sgd.Adam(m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
131 |         opt.performIter() # perform iterations
132 |         thetaHist = opt.getParamHist()
133 |     elif alg == 'Adamax':
134 |         # Adamax
135 |         eta = 0.025 # learning rate
136 |         opt = sgd.Adamax(m = 0.0,u = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
137 |         opt.performIter() # perform iterations
138 |         thetaHist = opt.getParamHist()
139 |     elif alg == 'Adadelta':
140 |         # Adadelta
141 |         eta = 1.0 # learning rate
142 |         opt = sgd.Adadelta(gradHist=0.0,updateHist=0.0,rho=0.99,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
143 |         opt.performIter() # perform iterations
144 |         thetaHist = opt.getParamHist()
145 |     elif alg == 'Nadam':
146 |         # Nadam
147 |         eta = 0.01# learning rate
148 |         opt = sgd.Nadam(m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
149 |         opt.performIter() # perform iterations
150 |         thetaHist = opt.getParamHist()
151 |     elif alg == 'minibatchSGD':
152 |         # mini batch stochastic gradient descent
153 |         eta = 0.025 # learning rate
154 |         opt = sgd.minibatchSGD(nSamples = 10,nTotSamples = n,newGrad = 0.0,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun) # initialize
155 |         opt.performIter() # perform iterations
156 |         thetaHist = opt.getParamHist()
157 |     elif alg == 'SAG':
158 |         # stochastic average gradient descent
159 |         eta = 0.0025 # learning rate
160 |         opt = sgd.SAG(nSamples = 20,nTotSamples= n, obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun) # initialize
161 |         opt.performIter() # perform iterations
162 |         thetaHist = opt.getParamHist()
163 |     elif alg == 'SVRG':
164 |         # stochastic variance reduced gradient descent
165 |         eta = 0.004
166 |         opt = sgd.SVRG(nTotSamples = n, innerIter = 10, outerIter = 200, option = 1,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun)
167 |         opt.performOuterIter()
168 |         thetaHist = opt.getParamHist()
169 |     else:
170 |         raise ValueError('No such algorithm is in the module.\n Please use one of the following options:\nSGD, SGDmomentum, SGDnesterov, AdaGrad, RMSprop, Adam, Adamax, Adadelta, Nadam, minibatchSGD, SAG, SVRG')
171 |         
172 |     
173 |     # Plot the results
174 |     matplotlib.rcParams['xtick.direction'] = 'out'
175 |     matplotlib.rcParams['ytick.direction'] = 'out'
176 |     delta = 0.025
177 |     w1 = np.arange(-2.0, 10.0, delta)
178 |     w2 = np.arange(-2.0, 10.0, delta)
179 |     Xx, Yy = np.meshgrid(w1, w2)
180 |     nx = np.shape(Xx)
181 |     Z = np.zeros(nx)
182 |     for i in range(nx[0]):
183 |         for j in range(nx[1]):
184 |             Z[i,j] = (np.linalg.norm(y - Xx[i,j]-Yy[i,j]*X,2))**2/n
185 |             
186 |     plt.figure()
187 |     levels = np.arange(0, 40, 4)
188 |     CS = plt.contour(Xx, Yy, Z, levels,origin='lower',
189 |                  linewidths=2,
190 |                  extent=(-2, 10, -2, 10))
191 |     #plt.clabel(CS, inline=1, fontsize=10)
192 |     # Thicken the zero contour.
193 |     zc = CS.collections[6]
194 |     plt.setp(zc, linewidth=4)
195 | 
196 |     plt.clabel(CS, levels[1::2],  # label every second level
197 |            inline=1,
198 |            fmt='%1.1f',
199 |            fontsize=10)
200 |     im = plt.imshow(Z, interpolation='bilinear', origin='lower', cmap=cm.Wistia, extent=(-2, 10, -2, 10))
201 | 
202 |     # make a colorbar
203 |     plt.colorbar(im, shrink=0.8, extend='both')
204 |     plt.plot(thetaHist[0,:], thetaHist[1,:],'r.',linewidth = 6)
205 |     titl = opt.alg+' with a learning rate '+str(eta)
206 |     plt.title(titl)
207 |     return opt
208 | 
209 | def objFun(param):
210 |     # objective function
211 |     y = np.loadtxt('test1_data.txt')
212 |     X_b = np.loadtxt('test1_x.txt')
213 |     n = np.size(y)
214 |     yprime = X_b.dot(param)
215 |     obj = np.sum(np.multiply(y-yprime,y-yprime))/n
216 |     return obj
217 | 
218 | def gradFun(param):
219 |     # gradient function
220 |     y = np.loadtxt('test1_data.txt')
221 |     X_b = np.loadtxt('test1_x.txt')
222 |     n = np.size(y)
223 |     nprime = np.random.randint(n)
224 |     xi = X_b[nprime:nprime+1]
225 |     yi = y[nprime:nprime+1]
226 |     grad = 2.0 * xi.T.dot(xi.dot(param) - yi)
227 |     return grad
228 | 
229 | def batchGradFun(param,nBatch):
230 |     # batch gradient function
231 |     y = np.loadtxt('test1_data.txt')
232 |     X_b = np.loadtxt('test1_x.txt')
233 |     n = np.size(y)
234 |     nParam = np.size(param)
235 |     batchGrad = np.zeros((nParam,nBatch))
236 |     nprime = np.random.choice(range(n), nBatch, replace = False)
237 |     for i in range(nBatch):
238 |         xi = X_b[nprime[i]:nprime[i]+1]
239 |         yi = y[nprime[i]:nprime[i]+1]
240 |         batchGrad[:,i] = 2.0 * xi.T.dot(xi.dot(param) - yi)
241 |     return batchGrad,nprime
242 | 
243 | if __name__ == "__main__":
244 |     opt = main()
245 | 


--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
  1 |                     GNU GENERAL PUBLIC LICENSE
  2 |                        Version 3, 29 June 2007
  3 | 
  4 |  Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
  5 |  Everyone is permitted to copy and distribute verbatim copies
  6 |  of this license document, but changing it is not allowed.
  7 | 
  8 |                             Preamble
  9 | 
 10 |   The GNU General Public License is a free, copyleft license for
 11 | software and other kinds of works.
 12 | 
 13 |   The licenses for most software and other practical works are designed
 14 | to take away your freedom to share and change the works.  By contrast,
 15 | the GNU General Public License is intended to guarantee your freedom to
 16 | share and change all versions of a program--to make sure it remains free
 17 | software for all its users.  We, the Free Software Foundation, use the
 18 | GNU General Public License for most of our software; it applies also to
 19 | any other work released this way by its authors.  You can apply it to
 20 | your programs, too.
 21 | 
 22 |   When we speak of free software, we are referring to freedom, not
 23 | price.  Our General Public Licenses are designed to make sure that you
 24 | have the freedom to distribute copies of free software (and charge for
 25 | them if you wish), that you receive source code or can get it if you
 26 | want it, that you can change the software or use pieces of it in new
 27 | free programs, and that you know you can do these things.
 28 | 
 29 |   To protect your rights, we need to prevent others from denying you
 30 | these rights or asking you to surrender the rights.  Therefore, you have
 31 | certain responsibilities if you distribute copies of the software, or if
 32 | you modify it: responsibilities to respect the freedom of others.
 33 | 
 34 |   For example, if you distribute copies of such a program, whether
 35 | gratis or for a fee, you must pass on to the recipients the same
 36 | freedoms that you received.  You must make sure that they, too, receive
 37 | or can get the source code.  And you must show them these terms so they
 38 | know their rights.
 39 | 
 40 |   Developers that use the GNU GPL protect your rights with two steps:
 41 | (1) assert copyright on the software, and (2) offer you this License
 42 | giving you legal permission to copy, distribute and/or modify it.
 43 | 
 44 |   For the developers' and authors' protection, the GPL clearly explains
 45 | that there is no warranty for this free software.  For both users' and
 46 | authors' sake, the GPL requires that modified versions be marked as
 47 | changed, so that their problems will not be attributed erroneously to
 48 | authors of previous versions.
 49 | 
 50 |   Some devices are designed to deny users access to install or run
 51 | modified versions of the software inside them, although the manufacturer
 52 | can do so.  This is fundamentally incompatible with the aim of
 53 | protecting users' freedom to change the software.  The systematic
 54 | pattern of such abuse occurs in the area of products for individuals to
 55 | use, which is precisely where it is most unacceptable.  Therefore, we
 56 | have designed this version of the GPL to prohibit the practice for those
 57 | products.  If such problems arise substantially in other domains, we
 58 | stand ready to extend this provision to those domains in future versions
 59 | of the GPL, as needed to protect the freedom of users.
 60 | 
 61 |   Finally, every program is threatened constantly by software patents.
 62 | States should not allow patents to restrict development and use of
 63 | software on general-purpose computers, but in those that do, we wish to
 64 | avoid the special danger that patents applied to a free program could
 65 | make it effectively proprietary.  To prevent this, the GPL assures that
 66 | patents cannot be used to render the program non-free.
 67 | 
 68 |   The precise terms and conditions for copying, distribution and
 69 | modification follow.
 70 | 
 71 |                        TERMS AND CONDITIONS
 72 | 
 73 |   0. Definitions.
 74 | 
 75 |   "This License" refers to version 3 of the GNU General Public License.
 76 | 
 77 |   "Copyright" also means copyright-like laws that apply to other kinds of
 78 | works, such as semiconductor masks.
 79 | 
 80 |   "The Program" refers to any copyrightable work licensed under this
 81 | License.  Each licensee is addressed as "you".  "Licensees" and
 82 | "recipients" may be individuals or organizations.
 83 | 
 84 |   To "modify" a work means to copy from or adapt all or part of the work
 85 | in a fashion requiring copyright permission, other than the making of an
 86 | exact copy.  The resulting work is called a "modified version" of the
 87 | earlier work or a work "based on" the earlier work.
 88 | 
 89 |   A "covered work" means either the unmodified Program or a work based
 90 | on the Program.
 91 | 
 92 |   To "propagate" a work means to do anything with it that, without
 93 | permission, would make you directly or secondarily liable for
 94 | infringement under applicable copyright law, except executing it on a
 95 | computer or modifying a private copy.  Propagation includes copying,
 96 | distribution (with or without modification), making available to the
 97 | public, and in some countries other activities as well.
 98 | 
 99 |   To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies.  Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 | 
103 |   An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License.  If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 | 
112 |   1. Source Code.
113 | 
114 |   The "source code" for a work means the preferred form of the work
115 | for making modifications to it.  "Object code" means any non-source
116 | form of a work.
117 | 
118 |   A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 | 
123 |   The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form.  A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 | 
134 |   The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities.  However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work.  For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 | 
147 |   The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 | 
151 |   The Corresponding Source for a work in source code form is that
152 | same work.
153 | 
154 |   2. Basic Permissions.
155 | 
156 |   All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met.  This License explicitly affirms your unlimited
159 | permission to run the unmodified Program.  The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work.  This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 | 
164 |   You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force.  You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright.  Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 | 
175 |   Conveying under any other circumstances is permitted solely under
176 | the conditions stated below.  Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 | 
179 |   3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 | 
181 |   No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 | 
187 |   When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 | 
195 |   4. Conveying Verbatim Copies.
196 | 
197 |   You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 | 
205 |   You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 | 
208 |   5. Conveying Modified Source Versions.
209 | 
210 |   You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 | 
214 |     a) The work must carry prominent notices stating that you modified
215 |     it, and giving a relevant date.
216 | 
217 |     b) The work must carry prominent notices stating that it is
218 |     released under this License and any conditions added under section
219 |     7.  This requirement modifies the requirement in section 4 to
220 |     "keep intact all notices".
221 | 
222 |     c) You must license the entire work, as a whole, under this
223 |     License to anyone who comes into possession of a copy.  This
224 |     License will therefore apply, along with any applicable section 7
225 |     additional terms, to the whole of the work, and all its parts,
226 |     regardless of how they are packaged.  This License gives no
227 |     permission to license the work in any other way, but it does not
228 |     invalidate such permission if you have separately received it.
229 | 
230 |     d) If the work has interactive user interfaces, each must display
231 |     Appropriate Legal Notices; however, if the Program has interactive
232 |     interfaces that do not display Appropriate Legal Notices, your
233 |     work need not make them do so.
234 | 
235 |   A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit.  Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 | 
245 |   6. Conveying Non-Source Forms.
246 | 
247 |   You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 | 
252 |     a) Convey the object code in, or embodied in, a physical product
253 |     (including a physical distribution medium), accompanied by the
254 |     Corresponding Source fixed on a durable physical medium
255 |     customarily used for software interchange.
256 | 
257 |     b) Convey the object code in, or embodied in, a physical product
258 |     (including a physical distribution medium), accompanied by a
259 |     written offer, valid for at least three years and valid for as
260 |     long as you offer spare parts or customer support for that product
261 |     model, to give anyone who possesses the object code either (1) a
262 |     copy of the Corresponding Source for all the software in the
263 |     product that is covered by this License, on a durable physical
264 |     medium customarily used for software interchange, for a price no
265 |     more than your reasonable cost of physically performing this
266 |     conveying of source, or (2) access to copy the
267 |     Corresponding Source from a network server at no charge.
268 | 
269 |     c) Convey individual copies of the object code with a copy of the
270 |     written offer to provide the Corresponding Source.  This
271 |     alternative is allowed only occasionally and noncommercially, and
272 |     only if you received the object code with such an offer, in accord
273 |     with subsection 6b.
274 | 
275 |     d) Convey the object code by offering access from a designated
276 |     place (gratis or for a charge), and offer equivalent access to the
277 |     Corresponding Source in the same way through the same place at no
278 |     further charge.  You need not require recipients to copy the
279 |     Corresponding Source along with the object code.  If the place to
280 |     copy the object code is a network server, the Corresponding Source
281 |     may be on a different server (operated by you or a third party)
282 |     that supports equivalent copying facilities, provided you maintain
283 |     clear directions next to the object code saying where to find the
284 |     Corresponding Source.  Regardless of what server hosts the
285 |     Corresponding Source, you remain obligated to ensure that it is
286 |     available for as long as needed to satisfy these requirements.
287 | 
288 |     e) Convey the object code using peer-to-peer transmission, provided
289 |     you inform other peers where the object code and Corresponding
290 |     Source of the work are being offered to the general public at no
291 |     charge under subsection 6d.
292 | 
293 |   A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 | 
297 |   A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling.  In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage.  For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product.  A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 | 
310 |   "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source.  The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 | 
318 |   If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information.  But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 | 
329 |   The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed.  Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 | 
337 |   Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 | 
343 |   7. Additional Terms.
344 | 
345 |   "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law.  If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 | 
354 |   When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it.  (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.)  You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 | 
361 |   Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 | 
365 |     a) Disclaiming warranty or limiting liability differently from the
366 |     terms of sections 15 and 16 of this License; or
367 | 
368 |     b) Requiring preservation of specified reasonable legal notices or
369 |     author attributions in that material or in the Appropriate Legal
370 |     Notices displayed by works containing it; or
371 | 
372 |     c) Prohibiting misrepresentation of the origin of that material, or
373 |     requiring that modified versions of such material be marked in
374 |     reasonable ways as different from the original version; or
375 | 
376 |     d) Limiting the use for publicity purposes of names of licensors or
377 |     authors of the material; or
378 | 
379 |     e) Declining to grant rights under trademark law for use of some
380 |     trade names, trademarks, or service marks; or
381 | 
382 |     f) Requiring indemnification of licensors and authors of that
383 |     material by anyone who conveys the material (or modified versions of
384 |     it) with contractual assumptions of liability to the recipient, for
385 |     any liability that these contractual assumptions directly impose on
386 |     those licensors and authors.
387 | 
388 |   All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10.  If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term.  If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 | 
398 |   If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 | 
403 |   Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 | 
407 |   8. Termination.
408 | 
409 |   You may not propagate or modify a covered work except as expressly
410 | provided under this License.  Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 | 
415 |   However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 | 
422 |   Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 | 
429 |   Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License.  If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 | 
435 |   9. Acceptance Not Required for Having Copies.
436 | 
437 |   You are not required to accept this License in order to receive or
438 | run a copy of the Program.  Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance.  However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work.  These actions infringe copyright if you do
443 | not accept this License.  Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 | 
446 |   10. Automatic Licensing of Downstream Recipients.
447 | 
448 |   Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License.  You are not responsible
451 | for enforcing compliance by third parties with this License.
452 | 
453 |   An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations.  If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 | 
463 |   You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License.  For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 | 
471 |   11. Patents.
472 | 
473 |   A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based.  The
475 | work thus licensed is called the contributor's "contributor version".
476 | 
477 |   A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version.  For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 | 
487 |   Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 | 
492 |   In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement).  To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 | 
499 |   If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients.  "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 | 
513 |   If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 | 
521 |   A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License.  You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 | 
536 |   Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 | 
540 |   12. No Surrender of Others' Freedom.
541 | 
542 |   If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License.  If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all.  For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 | 
552 |   13. Use with the GNU Affero General Public License.
553 | 
554 |   Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work.  The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 | 
563 |   14. Revised Versions of this License.
564 | 
565 |   The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time.  Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 | 
570 |   Each version is given a distinguishing version number.  If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation.  If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 | 
579 |   If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 | 
584 |   Later license versions may give you additional or different
585 | permissions.  However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 | 
589 |   15. Disclaimer of Warranty.
590 | 
591 |   THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW.  EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE.  THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU.  SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 | 
600 |   16. Limitation of Liability.
601 | 
602 |   IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 | 
612 |   17. Interpretation of Sections 15 and 16.
613 | 
614 |   If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 | 
621 |                      END OF TERMS AND CONDITIONS
622 | 
623 |             How to Apply These Terms to Your New Programs
624 | 
625 |   If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 | 
629 |   To do so, attach the following notices to the program.  It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 | 
634 |     <one line to give the program's name and a brief idea of what it does.>
635 |     Copyright (C) <year>  <name of author>
636 | 
637 |     This program is free software: you can redistribute it and/or modify
638 |     it under the terms of the GNU General Public License as published by
639 |     the Free Software Foundation, either version 3 of the License, or
640 |     (at your option) any later version.
641 | 
642 |     This program is distributed in the hope that it will be useful,
643 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
644 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
645 |     GNU General Public License for more details.
646 | 
647 |     You should have received a copy of the GNU General Public License
648 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
649 | 
650 | Also add information on how to contact you by electronic and paper mail.
651 | 
652 |   If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 | 
655 |     <program>  Copyright (C) <year>  <name of author>
656 |     This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 |     This is free software, and you are welcome to redistribute it
658 |     under certain conditions; type `show c' for details.
659 | 
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License.  Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 | 
664 |   You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 | 
669 |   The GNU General Public License does not permit incorporating your program
670 | into proprietary programs.  If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library.  If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License.  But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 | 


--------------------------------------------------------------------------------
/SGD.py:
--------------------------------------------------------------------------------
   1 | #!/usr/bin/env python3
   2 | # -*- coding: utf-8 -*-
   3 | """ 
   4 | -------------------------------------------------------------------------------
   5 | If you find this code useful please cite the article:
   6 | 
   7 | Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach
   8 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)
   9 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278. 
  10 | https://doi.org/10.1007/s00158-020-02599-z
  11 | 
  12 | BibTeX entry:
  13 | @article{de2020topology,
  14 | title={Topology optimization under uncertainty using a stochastic gradient-based approach},
  15 | author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},
  16 | journal={Structural and Multidisciplinary Optimization},
  17 | volume={62},
  18 | number={5},
  19 | pages={2255--2278},
  20 | year={2020},
  21 | publisher={Springer}
  22 | }
  23 | 
  24 | Download the SGD module from https://github.com/CU-UQ/SGD.
  25 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.
  26 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747). 
  27 | Please report any bugs to Subhayan.De@colorado.edu
  28 | Website: www.subhayande.com
  29 | -------------------------------------------------------------------------------
  30 | 
  31 | This is the class file that implements: 
  32 | (i) Stochastic Gradient Descent, 
  33 | (ii) SGD with Momentum,
  34 | (iii) NAG,
  35 | (iv) AdaGrad, 
  36 | (iv) RMSprop,
  37 | (vi) Adam, 
  38 | (vii) Adamax,
  39 | (viii) Adadelta,
  40 | (ix) Nadam, 
  41 | (x) SAG, 
  42 | (xi) minibatch SGD, 
  43 | (xii) SVRG.
  44 | 
  45 | NOTE: Currently, the stopping conditions are maximum number of iteration and 2nd norm of gradient vector 
  46 | and time-delay and exponential learnong schedules are implemented.
  47 | 
  48 | Copyright (C) 2019  Subhayan De
  49 | 
  50 |     This program is free software: you can redistribute it and/or modify
  51 |     it under the terms of the GNU General Public License as published by
  52 |     the Free Software Foundation, either version 3 of the License, or
  53 |     (at your option) any later version.
  54 | 
  55 |     This program is distributed in the hope that it will be useful,
  56 |     but WITHOUT ANY WARRANTY; without even the implied warranty of
  57 |     MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  58 |     GNU General Public License for more details.
  59 | 
  60 |     You should have received a copy of the GNU General Public License
  61 |     along with this program.  If not, see <https://www.gnu.org/licenses/>.
  62 | 
  63 | Created on Sat Jun 30 01:04:28 2018
  64 | @author: Subhayan De 
  65 | 
  66 | Report any bugs to Subhayan.De@colorado.edu
  67 | 
  68 | Author's note:  add kSGD, 2nd order methods
  69 | """
  70 | 
  71 | import numpy as np
  72 | import time
  73 | 
  74 | # Print iterations progress
  75 | def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█'):
  76 |     """
  77 |         Call in a loop to create terminal progress bar
  78 |         parameters:
  79 |         iteration   - Required  : current iteration (Int)
  80 |         total       - Required  : total iterations (Int)
  81 |         prefix      - Optional  : prefix string (Str)
  82 |         suffix      - Optional  : suffix string (Str)
  83 |         decimals    - Optional  : positive number of decimals in percent complete (Int)
  84 |         length      - Optional  : character length of bar (Int)
  85 |         fill        - Optional  : bar fill character (Str)
  86 |         """
  87 |     percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
  88 |     filledLength = int(length * iteration // total)
  89 |     bar = fill * filledLength + '-' * (length - filledLength)
  90 |     print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = '\r')
  91 |     # Print New Line on Complete
  92 |     if iteration == total:
  93 |         print()
  94 | 
  95 | 
  96 | class SGD(object):
  97 |     """ 
  98 |     ==============================================================================
  99 |     |                     Stochastic Gradient Descent class                      |
 100 |     ==============================================================================
 101 |     Initialization:
 102 |         sgd = SGD(obj, grad, eta, param, iter, maxIter, objFun, gradFun, 
 103 |                   lowerBound, upperBound, stopGrad, momentum, nesterov, 
 104 |                   learnSched, lrParam)
 105 |         
 106 |     NOTE: To perform just one iteration provide either grad or gradFn. 
 107 |           obj  or objFn are optional.
 108 |     ==============================================================================
 109 |     Attributes:
 110 |         obj:        objective (optional input)
 111 |         grad:       Gradient information 
 112 |                     (array of dimension nParam-by-1, optional input)
 113 |         eta:        learning rate ( = 1.0, default)
 114 |         param:      the parameter vector (array of dimension nParam-by-1)
 115 |         nParam:     number of parameters
 116 |         iter:       iteration number
 117 |         maxIter:    maximum iteration number (optional, default = 1)
 118 |         objFun:     function handle to evaluate the objective 
 119 |                     (not required for maxit = 1 )
 120 |         gradFun:    function handle to evaluate the gradient 
 121 |                     (not required for maxit = 1 )
 122 |         lowerBound: lower bound for the parameters (optional input)
 123 |         upperBound: upper bound for the parameters (optional input)
 124 |         paramHist:  parameter evolution history
 125 |         stopGrad:   stopping criterion based on 2-norm of gradient vector
 126 |         momentum:   momentum parameter (default = 0)
 127 |         nesterov:   set to True if Nesterov momentum equation to be used 
 128 |                     (default = False)
 129 |         learnSched: learning schedule (constant, exponential or time-based, 
 130 |                                        default = constant)
 131 |         lrParam:    learning schedule parameter (default =0.1)
 132 |         alg:        algorithm used
 133 |         __version__:version of the code
 134 |     ==============================================================================
 135 |     Methods:
 136 |      Public:
 137 |         getParam:       returns the parameter values
 138 |         getObj:         returns the current objective value
 139 |         getGrad:        returns the current gradient information
 140 |         update:         perform a single iteration
 141 |         performIter:    perform maxIter number of iterations
 142 |         getParamHist:   returns parameter update history
 143 |      Private:
 144 |         __init___:          initialization
 145 |         evaluateObjFn:      evaluates the objective function
 146 |         evaluateGradFn:     evaluates the gradients
 147 |         satisfyBounds:      satisfies the parameter bounds
 148 |         learningSchedule:   learning schedule
 149 |         stopCrit:           check stopping criteria
 150 |     ==============================================================================
 151 |     Reference: Bottou, Léon, Frank E. Curtis, and Jorge Nocedal. 
 152 |     "Optimization methods for large-scale machine learning." 
 153 |     SIAM Review 60.2 (2018): 223-311.
 154 |     ==============================================================================
 155 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 156 |     ==============================================================================
 157 |     """ 
 158 |     def __init__(self,**kwargs):
 159 |         allowed_kwargs = {'obj', 'grad', 'param', 'eta', 'iter', 'maxiter', 'objFun', 'gradFun', 'lowerBound', 'upperBound', 'oldGrad', 'stopGrad', 'momentum', 'nesterov','learnSched', 'lrParam'}
 160 |         for k in kwargs:
 161 |             if k not in allowed_kwargs:
 162 |                 raise TypeError('Unexpected keyword argument passed to optimizer at: ' + str(k))
 163 | 
 164 |         self.__dict__.update(kwargs)
 165 |         self.nParam = np.size(self.param)
 166 |         # Checks and setting default values
 167 |         # Iteration numbers
 168 |         if hasattr(self,'iter') == False:
 169 |             self.iter = 0 # set the iteration number
 170 |         self.currentIter = self.iter
 171 |         # stopping criteria
 172 |         # max iteration no.
 173 |         if hasattr(self,'maxiter') == False:
 174 |             self.maxiter = 1 # set the default max iteration number
 175 |         # minimum gradient
 176 |         if hasattr(self,'stopGrad') == False:
 177 |             self.stopGrad = 1e-6
 178 |         # Parameter values
 179 |         if hasattr(self,'param') == False:
 180 |             raise ValueError('Parameter vector is missing')
 181 |         # Gradient information
 182 |         if hasattr(self,'grad') == False:
 183 |             print('No gradient information provided at iteration: 1')
 184 |             if hasattr(self,'gradFun') == False:
 185 |                 raise ValueError('Please provide the gradient function')
 186 |         elif np.size(self.grad) != self.nParam:
 187 |             raise ValueError('Gradient dimension mismatch')
 188 |         if self.maxiter > 1 and hasattr(self,'gradFun') == False:
 189 |             raise ValueError('Please provide the gradient function')
 190 |         # Objective values
 191 |         if hasattr(self,'objFun') == False and self.maxiter > 1:
 192 |             raise ValueError('Please provide the objective function')
 193 |         if hasattr(self,'obj') == False:
 194 |             self.obj = np.array([])
 195 |             if hasattr(self,'objFun'):
 196 |                 self.evaluateObjFn(self)
 197 |         else:
 198 |             self.obj = np.array([self.obj])
 199 |         # Learning rate
 200 |         if hasattr(self,'eta') == False:
 201 |             self.eta = 1.0
 202 |             print('*NOTE: No learning rate provided, assumed as 1.0')
 203 |         else:
 204 |             print('Learning rate = ',self.eta,'\n')    
 205 |         if hasattr(self,'lowerBound') == False:
 206 |             self.lowerBound = -np.inf*np.ones(self.nParam)
 207 |         elif np.size(self.lowerBound) == 1:
 208 |             self.lowerBound = self.lowerBound*np.ones(self.nParam)
 209 |         else:
 210 |             raise ValueError('parameter lower bound dimension mismatch')
 211 |         # Set the upper bounds
 212 |         if hasattr(self,'upperBound') == False:
 213 |             self.upperBound = np.inf*np.ones(self.nParam)
 214 |         elif np.size(self.upperBound) == 1:
 215 |             self.upperBound = self.upperBound*np.ones(self.nParam)
 216 |         else:
 217 |             raise ValueError('parameter upper bound dimension mismatch')
 218 |         # Momentum
 219 |         #self.alg = 'SGD with Momentum'
 220 |         if hasattr(self,'alg') == False:
 221 |             self.alg = 'SGD+momentum'
 222 |             if hasattr(self,'momentum') == False:
 223 |                 self.alg = 'SGD'
 224 |                 self.momentum = 0.0;
 225 |         self.paramHist = np.reshape(self.param,(2,1))
 226 |         self.__version__ = '0.0.1'
 227 |         self.stop = False
 228 |         self.updateParam = np.zeros(self.nParam)
 229 |         # Nesterov momentum
 230 |         if hasattr(self, 'nesterov'):
 231 |             if self.nesterov == True:
 232 |                 self.alg = 'SGD+Nesterov momentum'
 233 |                 if hasattr(self,'gradFun') == False:
 234 |                     raise ValueError('provide gradient function information with Nesterov')
 235 |         else:
 236 |             self.nesterov = False
 237 |         # learning schedule
 238 |         if hasattr(self,'learnSched') == False:
 239 |             self.learnSched = 'constant'
 240 |         elif self.learnSched != 'exponential' and self.learnSched != 'time-based':
 241 |             print('no such learning schedule in this module\nSet to constant')
 242 |             self.learnSched = 'constant'
 243 |         elif hasattr(self,'lrParam') == False:
 244 |             self.lrParam = 0.1
 245 |         print('Learning schedule: ',self.learnSched)
 246 |             
 247 |         
 248 |     def __version__(self):
 249 |         """
 250 |         version of the code
 251 |         """
 252 |         print(self.__version__)
 253 |         
 254 |     def getParam(self):
 255 |         """
 256 |         To get the next parameter values
 257 |         """
 258 |         print(self.nParam,'parameters have been updated!\n')
 259 |         return self.param
 260 |     
 261 |     def getObj(self):
 262 |         """
 263 |         To get the current objective (if possible)
 264 |         """
 265 |         self.evaluateObjFn()
 266 |         return self.obj
 267 |     
 268 |     def getGrad(self):
 269 |         """
 270 |         To get the gradients
 271 |         """
 272 |         return self.grad
 273 |     
 274 |     def getParamHist(self):
 275 |         """
 276 |         To get parameter history
 277 |         """
 278 |         return self.paramHist
 279 |     
 280 |     def evaluateObjFn(self):
 281 |         """
 282 |         This evalutes the objective function
 283 |         objFun should be a function handle with input: param, output: objective
 284 |         """
 285 |         if not self.obj.any():
 286 |             print('No objective information provided to SGD')
 287 |         else:
 288 |             self.obj = np.append(self.obj,self.objFun(self.param))
 289 |             #print('Current objective value: ', self.obj[self.currentIter],'\n')
 290 |     
 291 |     def evaluateGradFn(self):
 292 |         """
 293 |         This evalutes the gradient function for i-th data point, where i in [0, n]
 294 |         gradFun should be a function handle with input: param, output: gradient
 295 |         """
 296 |         self.grad = self.gradFun(self.param)
 297 |         
 298 |     def satisfyBounds(self):
 299 |         """
 300 |         This satisfies the parameter bounds (if any)
 301 |         """
 302 |         # Set the lower bounds
 303 |         #print(self.lowerBound)
 304 |         
 305 |         # Satisfy the bounds
 306 |         for i in range(self.nParam):
 307 |             if self.param[i] > self.upperBound[i]:
 308 |                 self.param[i] = self.upperBound[i]
 309 |             elif self.param[i] < self.lowerBound[i]:
 310 |                 self.param[i] = self.lowerBound[i]
 311 |                 
 312 |     def update(self):
 313 |         """
 314 |         Perform one iteration of SGD
 315 |         """
 316 |         # Perform one iteration of SGD
 317 |         SGD.learningSchedule(self)
 318 |         if self.nesterov == True:
 319 |             grdnt = self.gradFun(self.param - self.momentum*self.updateParam)
 320 |             self.updateParam = self.updateParam*self.momentum + self.etaCurrent*grdnt
 321 |         else:
 322 |             self.updateParam = self.updateParam*self.momentum + self.etaCurrent*self.grad
 323 |         self.param=self.param - self.updateParam
 324 |         #self.param=self.param - self.eta*self.grad
 325 |         # satisfy the parameter bounds
 326 |         SGD.satisfyBounds(self)
 327 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
 328 |         #print('One iteration of Stochatsic Gradient Descent has been performed successfully!\n')
 329 |         
 330 |     def performIter(self):
 331 |         """
 332 |         Performs all the iterations of SGD
 333 |         """
 334 |         SGD.printAlg(self)
 335 |         # initialize progress bar
 336 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
 337 |         self.t = time.clock()
 338 |         for i in range(self.iter,self.maxiter,1):
 339 |             if self.stop == True:
 340 |                 break
 341 |         #print('iteration', i+1, 'out of', self.maxiter)
 342 |             self.update()
 343 |             self.currentIter = i+1
 344 |             # print progress bar
 345 |             SGD.printProgress(self)
 346 |             # Update the objective and gradient
 347 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
 348 |                 SGD.evaluateObjFn(self)
 349 |                 SGD.evaluateGradFn(self)
 350 |                 SGD.stopCrit(self)
 351 |     
 352 |     def stopCrit(self):
 353 |         """
 354 |         Checks stopping criteria
 355 |         """
 356 |         if self.grad.ndim >1:
 357 |             self.avgGrad = np.mean(self.grad,axis =1)
 358 |             if np.linalg.norm(self.avgGrad)<self.stopGrad:
 359 |                 self.stop = True
 360 |         elif np.linalg.norm(self.grad)<self.stopGrad:
 361 |             self.stop = True
 362 |             
 363 |     def learningSchedule(self):
 364 |         """
 365 |         creates a learning schedule for SGD
 366 |         """
 367 |         if self.learnSched == 'constant':
 368 |             self.etaCurrent =self.eta # no change        
 369 |         elif self.learnSched == 'exponential':
 370 |             self.etaCurrent = self.eta*np.exp(-self.lrParam*self.currentIter)
 371 |             print(self.etaCurrent)
 372 |         elif self.learnSched == 'time-based':
 373 |             self.etaCurrent = self.eta/(1.0+self.lrParam*self.currentIter)
 374 |     
 375 |     def printAlg(self):
 376 |         """
 377 |         prints algorithm
 378 |         """
 379 |         print('\nAlgorithm: ',self.alg,'\n')
 380 |         
 381 |     def printProgress(self):
 382 |         # Update Progress Bar
 383 |         if hasattr(self,'outerIter'):
 384 |             printProgressBar(self.currentIter, self.outerIter, prefix = self.alg, suffix = ('Complete: Time Elapsed = '+str(np.around(time.clock()-self.t,decimals=2))+'s'+', Objective = '+str(np.around(self.obj[self.currentIter-1],decimals=6))+'    '), length = 25)
 385 |         else:
 386 |             printProgressBar(self.currentIter, self.maxiter, prefix = self.alg, suffix = ('Complete: Time Elapsed = '+str(np.around(time.clock()-self.t,decimals=2))+'s'+', Objective = '+str(np.around(self.obj[self.currentIter-1],decimals=6))+'    '), length = 25)
 387 | 
 388 | 
 389 | class AdaGrad(SGD):
 390 |     """
 391 |     ==============================================================================
 392 |     |                Adaptive Subgradient Method (AdaGrad) class                 |
 393 |     |               derived class from Stochastic Gradient Descent               |
 394 |     ==============================================================================
 395 |     Initialization:
 396 |         adg = AdaGrad(gradHist, obj, grad, eta, param, 
 397 |                       iter, maxIter, objFun, gradFun, lowerBound, upperBound)
 398 |         
 399 |     NOTE: gradHist:     historical information of gradients 
 400 |                         (array of dimension nparam-by-1).
 401 |                         This should equal to zero for 1st iteration
 402 |     ==============================================================================
 403 |     Attributes: 
 404 |         obj:            Initial objective value (optional input)
 405 |         grad:           Gradient information (array of dimension nParam-by-1)
 406 |         eta:            learning rate ( = 1.0, default)
 407 |         param:          the parameter vector (array of dimension nParam-by-1)
 408 |         nParam:         number of parameters
 409 |         gradHist:       sum of gradient history (see the algorithm)
 410 |         epsilon:        square-root of machine-precision 
 411 |                         (required to avoid division by zero)
 412 |         iter:           iteration number (optional input)
 413 |         maxIter:        maximum iteration number (optional input, default = 1)
 414 |         objFun:         function handle to evaluate the objective 
 415 |                         (not required for maxit = 1 )
 416 |         gradFun:        function handle to evaluate the gradient 
 417 |                         (not required for maxit = 1 )
 418 |         lowerBound:     lower bound for the parameters (optional input)
 419 |         upperBound:     upper bound for the parameters (optional input)
 420 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
 421 |                         (default 10^-6)
 422 |         alg:            algorithm used
 423 |         __version__:    version of the code
 424 |     ==============================================================================
 425 |     Methods:
 426 |      Public:
 427 |         performIter:performs all the iterations inside a for loop
 428 |         getGradHist:returns gradient history (default is zero)
 429 |         Inherited:
 430 |             getParam:       returns the parameter values
 431 |             getObj:         returns the current objective value
 432 |             getGrad:        returns the current gradient information
 433 |             getParamHist:   returns parameter update history
 434 | 
 435 |      Private: (should not be called outside this class file)
 436 |         __init__:       initialization
 437 |         update:         performs one iteration of AdaGrad
 438 |         Inherited:
 439 |             evaluateObjFn:      evaluates the objective function
 440 |             evaluateGradFn:     evaluates the gradients
 441 |             satisfyBounds:      satisfies the parameter bounds
 442 |             learningSchedule:   learning schedule
 443 |             stopCrit:           check stopping criteria
 444 |     ==============================================================================
 445 |     Reference: Duchi, John, Elad Hazan, and Yoram Singer. 
 446 |     "Adaptive subgradient methods for online learning and stochastic optimization." 
 447 |     Journal of Machine Learning Research 12.Jul (2011): 2121-2159.
 448 |     ==============================================================================
 449 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 450 |     ==============================================================================
 451 |     """
 452 |     def __init__(self,gradHist=0.0,**kwargs):
 453 |     #def __init__(self,grad,learningRate,param,nParam,gradHist):
 454 |         """ Initialize the AdaGrad class object. 
 455 |             This can be used to perform one iteration of AdaGrad. 
 456 |         """
 457 |         self.alg = 'AdaGrad'
 458 |         SGD.printAlg(self)
 459 |         #SGD.__init__(self,grad,learningRate,param,nParam)
 460 |         SGD.__init__(self,**kwargs)
 461 |         self.epsilon=np.finfo(float).eps # The machine precision
 462 |         if np.sum(gradHist) != 0.0:
 463 |             self.gradHist=np.reshape(gradHist,(self.nParam))
 464 |         else:
 465 |             self.gradHist = np.zeros(self.nParam)
 466 |         
 467 |     def update(self):
 468 |         """
 469 |         Perform one iteration of AdaGrad
 470 |         """
 471 |         SGD.learningSchedule(self)
 472 |         self.gradHist += np.multiply(self.grad,self.grad); # Sum of gradient history
 473 |         # Perform one iteration of AdaGrad
 474 |         self.param=self.param - np.divide((self.etaCurrent*self.grad),(np.sqrt(self.gradHist)+self.epsilon))
 475 |         # satisfy the parameter bounds
 476 |         SGD.satisfyBounds(self)
 477 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
 478 |         #print('One iteration of AdaGrad has been performed successfully!\n')
 479 |         
 480 |     def performIter(self):
 481 |         """
 482 |         Performs all the iterations of AdaGrad
 483 |         """
 484 |         
 485 |         # initialize progress bar
 486 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
 487 |         self.t = time.clock()
 488 |         for i in range(self.iter,self.maxiter,1):
 489 |             if self.stop == True:
 490 |                 break
 491 |             #print('iteration', i+1, 'out of', self.maxiter)
 492 |             self.update()
 493 |             self.currentIter = i+1
 494 |             # print progress bar
 495 |             SGD.printProgress(self)
 496 |             # Update the objective and gradient
 497 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
 498 |                 SGD.evaluateObjFn(self)
 499 |                 SGD.evaluateGradFn(self)
 500 |                 SGD.stopCrit(self)
 501 |                 
 502 |     def getGradHist(self):
 503 |         """
 504 |         Returns accumulated gradient history
 505 |         """
 506 |         return self.gradHist
 507 |     
 508 | class RMSprop(SGD):
 509 |     """
 510 |     ==============================================================================
 511 |     |                               RMSprop class                                |
 512 |     |               derived class from Stochastic Gradient Descent               |
 513 |     ==============================================================================
 514 |     Initialization:
 515 |         rp = RMSprop(gradHist, updatehist, rho, obj, grad, eta, param, 
 516 |                        iter, maxIter, objFun, gradFun, lowerBound, upperBound)
 517 |         NOTE: gradHist: historical information of gradients 
 518 |                         (array of dimension nparam-by-1)
 519 |                         this should equal to zero for 1st iteration
 520 |     ==============================================================================
 521 |     Attributes: 
 522 |         grad:           Gradient information (array of dimension nParam-by-1)
 523 |         eta:            learning rate = 1 by default
 524 |         param:          the parameter vector (array of dimension nParam-by-1)
 525 |         nParam:         number of parameters
 526 |         gradHist:       gradient history accumulator (see the algorithm)
 527 |         epsilon:        square-root of machine-precision 
 528 |                         (required to avoid division by zero)
 529 |         rho:            exponential decay rate (0.95 may be a good choice)
 530 |         iter:           iteration number (optional)
 531 |         maxIter:        maximum iteration number (optional input, default = 1)
 532 |         objFun:         function handle to evaluate the objective 
 533 |                         (not required for maxit = 1 )
 534 |         gradFun:        function handle to evaluate the gradient 
 535 |                         (not required for maxit = 1 )
 536 |         lowerBound:     lower bound for the parameters (optional input)
 537 |         upperBound:     upper bound for the parameters (optional input)
 538 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
 539 |                         (default 10^-6)
 540 |         alg:            algorithm used
 541 |         __version__:    version of the code
 542 |     ==============================================================================
 543 |     Methods:
 544 |      Public:
 545 |         performIter:performs all the iterations inside a for loop
 546 |         getGradHist:returns gradient history (default is zero)
 547 |         Inherited:
 548 |             getParam:       returns the parameter values
 549 |             getObj:         returns the current objective value
 550 |             getGrad:        returns the current gradient information
 551 |             getParamHist:   returns parameter update history
 552 |      Private: (should not be called outside this class file)
 553 |         __init__:       initialization
 554 |         update:         performs one iteration of Adadelta
 555 |         Inherited:
 556 |             evaluateObjFn:      evaluates the objective function
 557 |             evaluateGradFn:     evaluates the gradients
 558 |             satisfyBounds:      satisfies the parameter bounds
 559 |             learningSchedule:   learning schedule
 560 |             stopCrit:           check stopping criteria
 561 |     ==============================================================================
 562 |     Reference: Geoffrey 	Hinton 
 563 |     "rmsprop: Divide the gradient by a running average of its recent magnitude." 
 564 |     http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
 565 |     ==============================================================================
 566 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 567 |     ==============================================================================
 568 |     """
 569 |     def __init__(self,gradHist=0.0,rho=0.9,**kwargs):
 570 |         """ Initialize the Adadelta class object. 
 571 |             This can be used to perform one iteration of Adadelta. 
 572 |         """
 573 |         self.alg = 'RMSprop'
 574 |         SGD.printAlg(self)
 575 |         SGD.__init__(self,**kwargs)
 576 |         self.epsilon=np.finfo(float).eps # The machine precision
 577 |         # Initialize gradient history
 578 |         if np.sum(gradHist) != 0.0:
 579 |             if np.size(gradHist) != self.nParam:
 580 |                 raise ValueError('Gradient history dimension mismatch')
 581 |             else:
 582 |                 self.gradHist=np.reshape(gradHist,(self.nParam))
 583 |         else:
 584 |             self.gradHist = np.zeros(self.nParam)
 585 |         # Initialize rho
 586 |         self.rho = rho
 587 |         
 588 |     def update(self):
 589 |         """
 590 |         Perform one iteration of RMSprop
 591 |         """
 592 |         # update gradient history acccumulator
 593 |         SGD.learningSchedule(self)
 594 |         self.gradHist+=self.rho*self.gradHist+(1.0-self.rho)*np.multiply(self.grad,self.grad); # Sum of gradient history
 595 |         # Perform one iteration of RMSprop
 596 |         RMSg = np.sqrt(self.gradHist)+self.epsilon
 597 |         updateParam = ((np.divide(self.grad,RMSg)))
 598 |         self.param=self.param-self.etaCurrent*updateParam
 599 |         SGD.satisfyBounds(self)
 600 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
 601 |         #print('One iteration of RMSprop has been performed successfully!\n')
 602 |         
 603 |     def performIter(self):
 604 |         """
 605 |         Performs all the iterations of RMSprop
 606 |         """
 607 |         # initialize progress bar
 608 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
 609 |         self.t = time.clock()
 610 |         for i in range(self.iter,self.maxiter,1):
 611 |             if self.stop == True:
 612 |                 break
 613 |             #print('iteration', i+1, 'out of', self.maxiter)
 614 |             self.update()
 615 |             self.currentIter = i+1
 616 |             # print progress bar
 617 |             SGD.printProgress(self)
 618 |             # Update the objective and gradient
 619 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
 620 |                 SGD.evaluateObjFn(self)
 621 |                 SGD.evaluateGradFn(self)
 622 |                 SGD.stopCrit(self)
 623 |         
 624 |     def getGradHist(self):
 625 |         """
 626 |         This returns the gradient history
 627 |         """
 628 |         return self.gradHist
 629 |     
 630 | class Adam(SGD):
 631 |     """
 632 |     ==============================================================================
 633 |     |                   Adaptive moment estimation (Adam) class                  |
 634 |     |               derived class from Stochastic Gradient Descent               |
 635 |     ==============================================================================
 636 |     Initialization:
 637 |         adm = Adam(m, v, beta1, beta2, obj, grad, eta, param, 
 638 |                    iter, maxIter, objFun, gradFun, lowerBound, upperBound)
 639 | 
 640 |     ==============================================================================
 641 |     Attributes: 
 642 |         grad:           Gradient information (array of dimension nParam-by-1)
 643 |         eta:            learning rate 
 644 |         param:          the parameter vector (array of dimension nParam-by-1)
 645 |         nParam:         number of parameters
 646 |         beta1, beta2:   exponential decay rates in [0,1) 
 647 |                         (default beta1 = 0.9, beta2 = 0.999)
 648 |         m:              First moment (array of dimension nParam-by-1)
 649 |         v:              Second raw moment (array of dimension nParam-by-1)
 650 |         epsilon:        square-root of machine-precision 
 651 |                         (required to avoid division by zero)
 652 |         iter:           iteration number
 653 |         maxIter:        maximum iteration number (optional input, default = 1)
 654 |         objFun:         function handle to evaluate the objective 
 655 |                         (not required for maxit = 1 )
 656 |         gradFun:        function handle to evaluate the gradient 
 657 |                         (not required for maxit = 1 )
 658 |         lowerBound:     lower bound for the parameters (optional input)
 659 |         upperBound:     upper bound for the parameters (optional input)
 660 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
 661 |                         (default 10^-6)
 662 |         alg:            algorithm used
 663 |         __version__:    version of the code
 664 |     ==============================================================================
 665 |     Methods:
 666 |      Public:
 667 |         performIter:    performs all the iterations inside a for loop
 668 |         getGradHist:    returns gradient history (default is zero)
 669 |         getMoments:     returns history of moments
 670 |         Inherited:
 671 |             getParam:       returns the parameter values
 672 |             getObj:         returns the current objective value
 673 |             getGrad:        returns the current gradient information
 674 |             getParamHist:   returns parameter update history
 675 |      Private: (should not be called outside this class file)
 676 |         __init__:       initialization
 677 |         update:         performs one iteration of Adam
 678 |         Inherited:
 679 |             evaluateObjFn:      evaluates the objective function
 680 |             evaluateGradFn:     evaluates the gradients
 681 |             satisfyBounds:      satisfies the parameter bounds
 682 |             learningSchedule:   learning schedule
 683 |             stopCrit:           check stopping criteria
 684 |     ==============================================================================
 685 |     Reference: Kingma, Diederik P., and Jimmy Ba. 
 686 |     "Adam: A method for stochastic optimization." 
 687 |     arXiv preprint arXiv:1412.6980 (2014).
 688 |     ==============================================================================
 689 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 690 |     ==============================================================================
 691 |     """
 692 |     def __init__(self,m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.99,**kwargs):
 693 | #    def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
 694 |         """ Initialize the adagrad class object. 
 695 |         This can be used to perform one iteration of Adam. 
 696 |         """
 697 |         self.alg = 'Adam'
 698 |         SGD.printAlg(self)
 699 |         self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
 700 |         self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggetion)
 701 |         self.epsilon=np.finfo(float).eps # The machine precision
 702 |         SGD.__init__(self,**kwargs)
 703 |         # Initialize first moment
 704 |         if np.sum(m) != 0.0:
 705 |             if np.size(m) != self.nParam:
 706 |                 raise ValueError('First moment dimension mismatch')
 707 |             else:
 708 |                 self.m=np.reshape(m,(self.nParam))
 709 |         else:
 710 |             self.m = np.zeros(self.nParam)
 711 |         # Initialize second raw moment
 712 |         if np.sum(v) != 0.0:
 713 |             if np.size(v) != self.nParam:
 714 |                 raise ValueError('Second raw moment dimension mismatch')
 715 |             else:
 716 |                 self.v=np.reshape(v,(self.nParam))
 717 |         else:
 718 |             self.v = np.zeros(self.nParam)
 719 |         
 720 |     def update(self):
 721 |         """ Perform one iteration of Adam
 722 |         """
 723 |         SGD.learningSchedule(self)
 724 |         # Moment updates
 725 |         self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
 726 |         self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
 727 |         #print(self.mHat)
 728 |         self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
 729 |         self.vHat = self.v/(1.0-self.beta2**(self.currentIter+1)) # Compute bias-corrected second moment estimate
 730 |         # Parameter updates
 731 |         self.param = self.param - np.divide((self.etaCurrent*self.mHat),(np.sqrt(self.vHat))+self.epsilon)
 732 |         SGD.satisfyBounds(self)
 733 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
 734 |         #print('One iteration of Adam has been performed successfully!\n')   
 735 |         
 736 |     def performIter(self):
 737 |         """
 738 |         Performs all the iterations of Adam
 739 |         """
 740 |         # initialize progress bar
 741 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
 742 |         self.t = time.clock()
 743 |         for i in range(self.iter,self.maxiter,1):
 744 |             if self.stop == True:
 745 |                 break
 746 |             #print('iteration', i+1, 'out of', self.maxiter)
 747 |             self.update()
 748 |             self.currentIter = i+1
 749 |             # print progress bar
 750 |             SGD.printProgress(self)
 751 |             # Update the objective and gradient
 752 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
 753 |                 SGD.evaluateObjFn(self)
 754 |                 SGD.evaluateGradFn(self)
 755 |                 SGD.stopCrit(self)
 756 |         
 757 |     def getMoments(self):
 758 |         """
 759 |         This returns the updated moments
 760 |         """
 761 |         return self.m, self.v
 762 |     
 763 | class Adamax(SGD):
 764 |     """
 765 |     ==============================================================================
 766 |     |                  Adaptive moment estimation (Adamax) class                 |
 767 |     |               derived class from Stochastic Gradient Descent               |
 768 |     ==============================================================================
 769 |     Initialization:
 770 |         admx = Adamax(m, v, beta1, beta2, obj, grad, eta, param, 
 771 |                    iter, maxIter, objFun, gradFun, lowerBound, upperBound)
 772 | 
 773 |     ==============================================================================
 774 |     Attributes: (all private)
 775 |         grad:           Gradient information (array of dimension nParam-by-1)
 776 |         eta:            learning rate 
 777 |         param:          the parameter vector (array of dimension nParam-by-1)
 778 |         nParam:         number of parameters
 779 |         beta1, beta2:   exponential decay rates in [0,1) 
 780 |                         (default beta1 = 0.9, beta2 = 0.999)
 781 |         m:              First moment (array of dimension nParam-by-1)
 782 |         u:              infinity norm constrained second moment 
 783 |                         (array of dimension nParam-by-1)
 784 |         epsilon:        square-root of machine-precision 
 785 |                         (required to avoid division by zero)
 786 |         iter:           iteration number
 787 |         maxIter:        maximum iteration number (optional input, default = 1)
 788 |         objFun:         function handle to evaluate the objective 
 789 |                         (not required for maxit = 1 )
 790 |         gradFun:        function handle to evaluate the gradient 
 791 |                         (not required for maxit = 1 )
 792 |         lowerBound:     lower bound for the parameters (optional input)
 793 |         upperBound:     upper bound for the parameters (optional input)
 794 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
 795 |                         (default 10^-6)
 796 |         alg:            algorithm used
 797 |         __version__:    version of the code
 798 |     ==============================================================================
 799 |     Methods:
 800 |      Public:
 801 |         performIter:    performs all the iterations inside a for loop
 802 |         getGradHist:    returns gradient history (default is zero)
 803 |         getMoments:     returns history of moments
 804 |         Inherited:
 805 |             getParam:       returns the parameter values
 806 |             getObj:         returns the current objective value
 807 |             getGrad:        returns the current gradient information
 808 |             getParamHist:   returns parameter update history
 809 |      Private: (should not be called outside this class file)
 810 |         __init__:       initialization
 811 |         update:         performs one iteration of Adam
 812 |         Inherited:
 813 |             evaluateObjFn:      evaluates the objective function
 814 |             evaluateGradFn:     evaluates the gradients
 815 |             satisfyBounds:      satisfies the parameter bounds
 816 |             learningSchedule:   learning schedule
 817 |             stopCrit:           check stopping criteria
 818 |     ==============================================================================
 819 |     Reference: Kingma, Diederik P., and Jimmy Ba. 
 820 |     "Adam: A method for stochastic optimization." 
 821 |     arXiv preprint arXiv:1412.6980 (2014).
 822 |     ==============================================================================
 823 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 824 |     ==============================================================================
 825 |     """
 826 |     def __init__(self,m = 0.0,u = 0.0,beta1 = 0.9,beta2 = 0.99,**kwargs):
 827 | #    def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
 828 |         """ Initialize the adagrad class object. 
 829 |         This can be used to perform one iteration of Adamax. 
 830 |         """
 831 |         self.alg = 'Adamax'
 832 |         SGD.printAlg(self)
 833 |         self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
 834 |         self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggetion)
 835 |         self.epsilon=np.finfo(float).eps # The machine precision
 836 |         SGD.__init__(self,**kwargs)
 837 |         # Initialize first moment
 838 |         if np.sum(m) != 0.0:
 839 |             if np.size(m) != self.nParam:
 840 |                 raise ValueError('First moment dimension mismatch')
 841 |             else:
 842 |                 self.m=np.reshape(m,(self.nParam))
 843 |         else:
 844 |             self.m = np.zeros(self.nParam)
 845 |         # Initialize second raw moment
 846 |         if np.sum(u) != 0.0:
 847 |             if np.size(u) != self.nParam:
 848 |                 raise ValueError('Second raw moment dimension mismatch')
 849 |             else:
 850 |                 self.u=np.reshape(u,(self.nParam))
 851 |         else:
 852 |             self.u = np.zeros(self.nParam)
 853 |         
 854 |     def update(self):
 855 |         """ Perform one iteration of Adamax
 856 |         """
 857 |         SGD.learningSchedule(self)
 858 |         # Moment updates
 859 |         self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
 860 |         self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
 861 |         self.u = np.maximum(self.beta2*self.u,np.abs(self.grad))
 862 | #        self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
 863 |         # Parameter updates
 864 |         self.param = self.param - np.divide((self.etaCurrent*self.mHat),self.u)
 865 |         SGD.satisfyBounds(self)
 866 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
 867 |         #print('One iteration of Adamax has been performed successfully!\n')   
 868 |         
 869 |     def performIter(self):
 870 |         """
 871 |         Performs all the iterations of Adamax
 872 |         """
 873 |         # initialize progress bar
 874 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
 875 |         self.t = time.clock()
 876 |         for i in range(self.iter,self.maxiter,1):
 877 |             if self.stop == True:
 878 |                 break
 879 |             #print('iteration', i+1, 'out of', self.maxiter)
 880 |             self.update()
 881 |             self.currentIter = i+1
 882 |             # print progress bar
 883 |             SGD.printProgress(self)
 884 |             # Update the objective and gradient
 885 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
 886 |                 SGD.evaluateObjFn(self)
 887 |                 SGD.evaluateGradFn(self)
 888 |                 SGD.stopCrit(self)
 889 |         
 890 |     def getMoments(self):
 891 |         """
 892 |         This returns the updated moments
 893 |         """
 894 |         return self.m, self.v
 895 |         
 896 | class Adadelta(SGD):
 897 |     """
 898 |     ==============================================================================
 899 |     |                               ADADELTA class                               |
 900 |     |               derived class from Stochastic Gradient Descent               |
 901 |     ==============================================================================
 902 |     Initialization:
 903 |         add = Adadelta(gradHist, updatehist, rho, obj, grad, eta, param, 
 904 |                        iter, maxIter, objFun, gradFun, lowerBound, upperBound)
 905 |         NOTE: gradHist: historical information of gradients 
 906 |                         (array of dimension nparam-by-1)
 907 |                         this should equal to zero for 1st iteration
 908 |     ==============================================================================
 909 |     Attributes: (all private)
 910 |         grad:           Gradient information (array of dimension nParam-by-1)
 911 |         eta:            learning rate = 1 by default
 912 |         param:          the parameter vector (array of dimension nParam-by-1)
 913 |         nParam:         number of parameters
 914 |         gradHist:       gradient history accumulator (see the algorithm)
 915 |         updateHist:     parameter update history accumulator
 916 |         epsilon:        square-root of machine-precision 
 917 |                         (required to avoid division by zero)
 918 |         rho:            exponential decay rate (0.95 may be a good choice)
 919 |         iter:           iteration number (optional)
 920 |         maxIter:        maximum iteration number (optional input, default = 1)
 921 |         objFun:         function handle to evaluate the objective 
 922 |                         (not required for maxit = 1 )
 923 |         gradFun:        function handle to evaluate the gradient 
 924 |                         (not required for maxit = 1 )
 925 |         lowerBound:     lower bound for the parameters (optional input)
 926 |         upperBound:     upper bound for the parameters (optional input)
 927 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
 928 |                         (default 10^-6)
 929 |         alg:            algorithm used
 930 |         __version__:    version of the code
 931 |     ==============================================================================
 932 |     Methods:
 933 |      Public:
 934 |         performIter:performs all the iterations inside a for loop
 935 |         getGradHist:returns gradient history (default is zero)
 936 |         Inherited:
 937 |             getParam:       returns the parameter values
 938 |             getObj:         returns the current objective value
 939 |             getGrad:        returns the current gradient information
 940 |             getParamHist:   returns parameter update history
 941 |      Private: (should not be called outside this class file)
 942 |         __init__:       initialization
 943 |         update:         performs one iteration of Adadelta
 944 |         Inherited:
 945 |             evaluateObjFn:      evaluates the objective function
 946 |             evaluateGradFn:     evaluates the gradients
 947 |             satisfyBounds:      satisfies the parameter bounds
 948 |             learningSchedule:   learning schedule
 949 |             stopCrit:           check stopping criteria
 950 |     ==============================================================================
 951 |     Reference: Zeiler, Matthew D. 
 952 |     "Adadelta: an adaptive learning rate method." 
 953 |     arXiv preprint arXiv:1212.5701 (2012).
 954 |     ==============================================================================
 955 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
 956 |     ==============================================================================
 957 |     """
 958 |     def __init__(self,gradHist=0.0,updateHist=0.0,rho=0.95,**kwargs):
 959 |         """ Initialize the Adadelta class object. 
 960 |             This can be used to perform one iteration of Adadelta. 
 961 |         """
 962 |         self.alg = 'Adadelta'
 963 |         SGD.printAlg(self)
 964 |         SGD.__init__(self,**kwargs)
 965 |         self.epsilon=np.finfo(float).eps # The machine precision
 966 |         # Initialize gradient history
 967 |         if np.sum(gradHist) != 0.0:
 968 |             if np.size(gradHist) != self.nParam:
 969 |                 raise ValueError('Gradient history dimension mismatch')
 970 |             else:
 971 |                 self.gradHist=np.reshape(gradHist,(self.nParam))
 972 |         else:
 973 |             self.gradHist = np.zeros(self.nParam)
 974 |         # Initialize parameter history
 975 |         if np.sum(updateHist) != 0.0:
 976 |             if np.size(updateHist) != self.nParam:
 977 |                 raise ValueError('Gradient history dimension mismatch')
 978 |             else:
 979 |                 self.updateHist=np.reshape(updateHist,(self.nParam))
 980 |         else:
 981 |             self.updateHist = np.zeros(self.nParam)
 982 |         # Initialize rho
 983 |         self.rho = rho
 984 |         # Set eta to 1.0
 985 |         if self.eta!=1.0:
 986 |             print('Learning rate = ',self.eta,'!= 1.0\nSo, the learning rate is set to 1.0\n')
 987 |         self.eta = 1.0
 988 |         
 989 |     def update(self):
 990 |         """
 991 |         Perform one iteration of Adadelta
 992 |         """
 993 |         self.epsilon = 1e-6
 994 |         if self.currentIter<200:
 995 |             self.epsilon = 0.1
 996 |         else:
 997 |             self.epsilon = 1e-6
 998 |         SGD.learningSchedule(self)
 999 |         # update gradient history acccumulator
1000 |         self.gradHist+=self.rho*self.gradHist+(1.0-self.rho)*np.multiply(self.grad,self.grad); # Sum of gradient history
1001 |         # Perform one iteration of Adadelta
1002 |         RMSdx = np.sqrt(self.updateHist)+self.epsilon
1003 |         RMSg = np.sqrt(self.gradHist)+self.epsilon
1004 |         updateParam = np.multiply((np.divide(RMSdx,RMSg)),self.grad)
1005 |         self.param=self.param-self.etaCurrent*updateParam
1006 |         SGD.satisfyBounds(self)
1007 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
1008 |         #print('One iteration of Adadelta has been performed successfully!\n')
1009 |         # update parameter history accumulator
1010 |         self.updateHist = self.rho*self.updateHist+(1.0-self.rho)*np.multiply(updateParam,updateParam)
1011 |         
1012 |     def performIter(self):
1013 |         """
1014 |         Performs all the iterations of Adadelta
1015 |         """
1016 |         # initialize progress bar
1017 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1018 |         self.t = time.clock()
1019 |         for i in range(self.iter,self.maxiter,1):
1020 |             if self.stop == True:
1021 |                 break
1022 |             #print('iteration', i+1, 'out of', self.maxiter)
1023 |             self.update()
1024 |             self.currentIter = i+1
1025 |             # print progress bar
1026 |             SGD.printProgress(self)
1027 |             # Update the objective and gradient
1028 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1029 |                 SGD.evaluateObjFn(self)
1030 |                 SGD.evaluateGradFn(self)
1031 |                 SGD.stopCrit(self)
1032 |         
1033 |     def getGradHist(self):
1034 |         """
1035 |         This returns the gradient history
1036 |         """
1037 |         return self.gradHist
1038 |     
1039 |     def getUpdateHist(self):
1040 |         """
1041 |         This returns the parameter update history
1042 |         """
1043 |         self.updateHist
1044 |         
1045 | class Nadam(SGD):
1046 |     """
1047 |     ==============================================================================
1048 |     |         Nesterov-accelerated Adaptive moment estimation (Nadam) class      |
1049 |     |               derived class from Stochastic Gradient Descent               |
1050 |     ==============================================================================
1051 |     Initialization:
1052 |         nadm = Nadam(m, v, beta1, beta2, obj, grad, eta, param, iter, 
1053 |                      maxIter, objFun, gradFun, lowerBound, upperBound)
1054 | 
1055 |     ==============================================================================
1056 |     Attributes: (all private)
1057 |         grad:           Gradient information (array of dimension nParam-by-1)
1058 |         eta:            learning rate 
1059 |         param:          the parameter vector (array of dimension nParam-by-1)
1060 |         nParam:         number of parameters
1061 |         beta1, beta2:   exponential decay rates in [0,1) 
1062 |                         (default beta1 = 0.9, beta2 = 0.999)
1063 |         m:              First moment (array of dimension nParam-by-1)
1064 |         v:              Second raw moment (array of dimension nParam-by-1)
1065 |         epsilon:        square-root of machine-precision 
1066 |                         (required to avoid division by zero)
1067 |         iter:           iteration number
1068 |         maxIter:        maximum iteration number (optional input, default = 1)
1069 |         objFun:         function handle to evaluate the objective 
1070 |                         (not required for maxit = 1 )
1071 |         gradFun:        function handle to evaluate the gradient 
1072 |                         (not required for maxit = 1 )
1073 |         lowerBound:     lower bound for the parameters (optional input)
1074 |         upperBound:     upper bound for the parameters (optional input)
1075 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
1076 |                         (default 10^-6)
1077 |         alg:            algorithm used
1078 |         __version__:    version of the code
1079 |     ==============================================================================
1080 |     Methods:
1081 |      Public:
1082 |         performIter:    performs all the iterations inside a for loop
1083 |         getGradHist:    returns gradient history (default is zero)
1084 |         getMoments:     returns history of moments
1085 |         Inherited:
1086 |             getParam:       returns the parameter values
1087 |             getObj:         returns the current objective value
1088 |             getGrad:        returns the current gradient information
1089 |             getParamHist:   returns parameter update history
1090 |      Private: (should not be called outside this class file)
1091 |         __init__:       initialization
1092 |         update:         performs one iteration of Adam
1093 |         Inherited:
1094 |             evaluateObjFn:      evaluates the objective function
1095 |             evaluateGradFn:     evaluates the gradients
1096 |             satisfyBounds:      satisfies the parameter bounds
1097 |             learningSchedule:   learning schedule
1098 |             stopCrit:           check stopping criteria
1099 |     ==============================================================================
1100 |     Reference: Timothy Dozat. 
1101 |       "Incorporating Nesterov Momentum into Adam". 
1102 |        ICLR Workshop, (1):2013–2016, 2016.
1103 |     ==============================================================================
1104 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1105 |     ==============================================================================
1106 |     """
1107 |     def __init__(self,m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.99,**kwargs):
1108 | #    def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
1109 |         """ Initialize the adagrad class object. 
1110 |         This can be used to perform one iteration of Adam. 
1111 |         """
1112 |         self.alg = 'Nadam'
1113 |         SGD.printAlg(self)
1114 |         self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
1115 |         self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggetion)
1116 |         self.epsilon=np.finfo(float).eps # The machine precision
1117 |         SGD.__init__(self,**kwargs)
1118 |         # Initialize first moment
1119 |         if np.sum(m) != 0.0:
1120 |             if np.size(m) != self.nParam:
1121 |                 raise ValueError('First moment dimension mismatch')
1122 |             else:
1123 |                 self.m=np.reshape(m,(self.nParam))
1124 |         else:
1125 |             self.m = np.zeros(self.nParam)
1126 |         # Initialize second raw moment
1127 |         if np.sum(v) != 0.0:
1128 |             if np.size(v) != self.nParam:
1129 |                 raise ValueError('Second raw moment dimension mismatch')
1130 |             else:
1131 |                 self.v=np.reshape(v,(self.nParam))
1132 |         else:
1133 |             self.v = np.zeros(self.nParam)
1134 |         
1135 |         
1136 |     def update(self):
1137 |         """ 
1138 |         Perform one iteration of Nadam
1139 |         """
1140 |         SGD.learningSchedule(self)
1141 |         # Moment updates
1142 |         self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
1143 |         self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
1144 |         self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
1145 |         self.vHat = self.v/(1.0-self.beta2**(self.currentIter+1)) # Compute bias-corrected second moment estimate
1146 |         # Parameter updates
1147 |         mHat2 = self.beta1*self.mHat+(1.0-self.beta1)*self.grad/(1.0-self.beta1**(self.currentIter+1))
1148 |         self.param = self.param - np.divide((self.etaCurrent*mHat2),(np.sqrt(self.vHat))+self.epsilon)
1149 |         SGD.satisfyBounds(self)
1150 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
1151 |         #print('One iteration of Nadam has been performed successfully!\n')   
1152 |         
1153 |     def performIter(self):
1154 |         """
1155 |         Performs all the iterations of Nadam
1156 |         """
1157 |         # initialize progress bar
1158 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1159 |         self.t = time.clock()
1160 |         for i in range(self.iter,self.maxiter,1):
1161 |             if self.stop == True:
1162 |                 break
1163 |             #print('iteration', i+1, 'out of', self.maxiter)
1164 |             self.update()
1165 |             self.currentIter = i+1
1166 |             # print progress bar
1167 |             SGD.printProgress(self)
1168 |             # Update the objective and gradient
1169 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1170 |                 SGD.evaluateObjFn(self)
1171 |                 SGD.evaluateGradFn(self)
1172 |                 SGD.stopCrit(self)
1173 |         
1174 |     def getMoments(self):
1175 |         """
1176 |         This returns the updated moments
1177 |         """
1178 |         return self.m, self.v
1179 |         
1180 | class SAG(SGD):
1181 |     """
1182 |     ==============================================================================
1183 |     |                   Stochastic Average Gradient (SAG) class                  |
1184 |     |               derived class from Stochastic Gradient Descent               |
1185 |     ==============================================================================
1186 |     Initialization:
1187 |         sag = SAG(nSamples, nTotSamples, fullGrad = 0.0, obj, grad, eta, param, 
1188 |                   iter, maxIter, objFun, gradFun, lowerBound, upperBound)
1189 | 
1190 |     ==============================================================================
1191 |     Attributes: (all private)
1192 |         fullGrad:           Full gradient information 
1193 |                         (array of dimension nParam-by-nTotSamples)
1194 |         eta:            learning rate 
1195 |         param:          the parameter vector (array of dimension nParam-by-1)
1196 |         nParam:         number of parameters
1197 |         nTotSamples:    total number of samples
1198 |         nSamples:       number of gradients updated at each iteration
1199 |         iter:           iteration number (optional)
1200 |         maxIter:        maximum iteration number (optional input, default = 1)
1201 |         objFun:         function handle to evaluate the objective 
1202 |                         (not required for maxit = 1 )
1203 |         gradFun:        function handle to evaluate the gradient 
1204 |                         (not required for maxit = 1 )
1205 |         lowerBound:     lower bound for the parameters (optional input)
1206 |         upperBound:     upper bound for the parameters (optional input)
1207 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
1208 |                         (default 10^-6)
1209 |         learnSched:     learning schedule (constant, exponential or time-based, 
1210 |                                        default = constant)
1211 |         lrParam:        learning schedule parameter (default =0.1)
1212 |         alg:            algorithm used
1213 |         __version__:    version of the code
1214 |     ==============================================================================
1215 |     Methods:
1216 |      Public:
1217 |         performIter:performs all the iterations inside a for loop
1218 |         getGradHist:returns gradient history (default is zero)
1219 |         Inherited:
1220 |             getParam:       returns the parameter values
1221 |             getObj:         returns the current objective value
1222 |             getGrad:        returns the current gradient information
1223 |             getParamHist:   returns parameter update history
1224 |      Private: (should not be called outside this class file)
1225 |         __init__:       initialization
1226 |         update:         performs one iteration of SAG
1227 |         Inherited:
1228 |             evaluateObjFn:      evaluates the objective function
1229 |             evaluateGradFn:     evaluates the gradients
1230 |             satisfyBounds:      satisfies the parameter bounds
1231 |             learningSchedule:   learning schedule
1232 |             stopCrit:           check stopping criteria
1233 |     ==============================================================================
1234 |     Reference: Roux, Nicolas L., Mark Schmidt, and Francis R. Bach. 
1235 |     "A stochastic gradient method with an exponential convergence rate 
1236 |      for finite training sets." 
1237 |     Advances in neural information processing systems. 2012.
1238 |     ==============================================================================
1239 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1240 |     ==============================================================================
1241 |     """
1242 |     def __init__(self,nSamples,nTotSamples,fullGrad =0.0,**kwargs):
1243 |         """ Initialize the SAG class object. 
1244 |             This can be used to perform one iteration of SAG. 
1245 |         """
1246 |         self.alg = 'SAG'
1247 |         SGD.printAlg(self)
1248 |         grad = fullGrad
1249 |         SGD.__init__(self,**kwargs)
1250 |         # Assign total number of samples
1251 |         if type(nTotSamples) != int:
1252 |             raise TypeError('nSamples not an integer value')
1253 |         else:
1254 |             self.nTotSamples = nTotSamples
1255 |         # Assign number of samples to be replaced at each iteration
1256 |         if type(nSamples) != int:
1257 |             raise TypeError('nSamples not an integer value')
1258 |         else:
1259 |             self.nSamples = nSamples
1260 |         # Initialize gradients
1261 |         if np.sum(fullGrad) != 0:
1262 |             if np.size(fullGrad)/nTotSamples != self.nParam:
1263 |                 raise ValueError('Full gradient dimension mismatch')
1264 |             else:
1265 |                 fullGrad = np.reshape(fullGrad,(self.nParam,nTotSamples))
1266 |         else:
1267 |             self.fullGrad = np.zeros((self.nParam,self.nTotSamples))
1268 |             try: 
1269 |                 self.gradFun
1270 |             except NameError: 
1271 |                 print('Please provide gradient function name')
1272 |             self.fullGrad, nprime = self.gradFun(self.param,self.nTotSamples)
1273 |         self.grad = self.fullGrad
1274 | 
1275 |     def update(self):
1276 |         """
1277 |         Perform one iteration of SAG
1278 |         """
1279 |         if hasattr(self,'gradFun'):
1280 |             batchGrad,nprime = self.gradFun(self.param,self.nSamples)
1281 |         else:
1282 |             nprime = np.random.choice(range(self.nTotSamples), self.nSamples, replace = False)
1283 |             batchGrad = self.fullGrad[:,nprime]
1284 |         # Perform one iteration of SAG
1285 |         for i in range(self.nSamples):
1286 |             #self.evaluateGradFn()
1287 |             self.fullGrad[:,nprime[i]] = batchGrad[:,i]
1288 |         
1289 |         SGD.learningSchedule(self)
1290 |         self.param=self.param-self.etaCurrent*np.mean(self.fullGrad,1)
1291 |         #print(np.mean(self.fullGrad,1),self.param)
1292 |         SGD.satisfyBounds(self)
1293 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
1294 |         #print('One iteration of SAG has been performed successfully!\n')
1295 |         
1296 |     def performIter(self):
1297 |         """
1298 |         Performs all the iterations of SAG
1299 |         """
1300 |         # initialize progress bar
1301 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1302 |         self.t = time.clock()
1303 |         for i in range(self.iter,self.maxiter,1):
1304 |             if self.stop == True:
1305 |                 break
1306 |             #print('iteration', i+1, 'out of', self.maxiter)
1307 |             self.update()
1308 |             self.currentIter = i+1
1309 |             # print progress bar
1310 |             SGD.printProgress(self)
1311 |             # Update the objective and gradient
1312 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1313 |                 SGD.evaluateObjFn(self)
1314 |                 SGD.stopCrit(self)
1315 |                 
1316 |                 
1317 | class minibatchSGD(SGD):
1318 |     """
1319 |     ==============================================================================
1320 |     |                           minibatch SGD class                              |
1321 |     |               derived class from Stochastic Gradient Descent               |
1322 |     ==============================================================================
1323 |     Initialization: 
1324 |         mbsgd = minibatchSGD(nSamples, nTotSamples,newGrad = 0.0,
1325 |                               obj, grad, eta, param, iter, maxiter, 
1326 |                               objFun, gradFun, lowerBound, upperBound)
1327 |         
1328 |     ==============================================================================
1329 |     Attributes:
1330 |         alg:            minibatchSGD
1331 |         eta:            learning rate 
1332 |         param:          the parameter vector (array of dimension nParam-by-1)
1333 |         nParam:         number of parameters
1334 |         newGrad:        gradient information 
1335 |                         (array of dimension nParam-by-nSamples)
1336 |         nSamples:       number of gradients updated at each iteration
1337 |         iter:           iteration number (optional)
1338 |         maxIter:        maximum iteration number (optional input, default = 1)
1339 |         objFun:         function handle to evaluate the objective 
1340 |                         (not required for maxit = 1 )
1341 |         gradFun:        function handle to evaluate the gradient 
1342 |                         (not required for maxit = 1 )
1343 |         lowerBound:     lower bound for the parameters (optionalinput)
1344 |         upperBound:     upper bound for the parameters (optional input)
1345 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
1346 |                         (default 10^-6)
1347 |         learnSched:     learning schedule (constant, exponential or time-based, 
1348 |                                        default = constant)
1349 |         lrParam:        learning schedule parameter (default =0.1)
1350 |         alg:            algorithm used
1351 |         __version__:    version of the code
1352 |     ==============================================================================
1353 |     Methods:
1354 |      Public:
1355 |         performIter:        performs all the iterations inside a for loop
1356 |         getGradHist:        returns gradient history (default is zero)
1357 |         Inherited:
1358 |             getParam:       returns the parameter values
1359 |             getObj:         returns the current objective value
1360 |             getGrad:        returns the current gradient information
1361 |             getParamHist:   returns parameter update history
1362 |      Private: (should not be called outside this class file)
1363 |         __init__:       initialization
1364 |         update:         performs one iteration of minibatch SGD
1365 |         Inherited:
1366 |             evaluateObjFn:      evaluates the objective function
1367 |             evaluateGradFn:     evaluates the gradients
1368 |             satisfyBounds:      satisfies the parameter bounds
1369 |             learningSchedule:   learning schedule
1370 |             stopCrit:           check stopping criteria
1371 |     ==============================================================================
1372 |     Reference: 
1373 |     ==============================================================================
1374 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1375 |     ==============================================================================
1376 |     """
1377 |     def __init__(self,nSamples,nTotSamples = np.inf,newGrad = 0.0,**kwargs):
1378 |         """ Initialize the minibatch SGD class object. 
1379 |             This can be used to perform one iteration of minibatch SGD. 
1380 |         """
1381 |         self.alg = 'minibatchSGD'
1382 |         SGD.printAlg(self)
1383 |         self.grad = newGrad
1384 |         SGD.__init__(self,**kwargs)
1385 |         # Assign number of samples used at each iteration
1386 |         if type(nSamples) != int:
1387 |             raise TypeError('nSamples not an integer value')
1388 |         else:
1389 |             self.nSamples = nSamples
1390 |         # Total number of samples
1391 |         if type(nTotSamples) != int:
1392 |             raise TypeError('nTotSamples not an integer value')
1393 |         else:
1394 |             self.nTotSamples = nTotSamples
1395 |         # Check for total number of samples
1396 |         if nTotSamples < nSamples:
1397 |             print('nTotSamples can not be smaller that nSamples\n')
1398 |             print('nTotSamples = nSamples is set\n')
1399 |             print('NOTE: performing a batch gradient descent')
1400 |         elif nTotSamples == nSamples:
1401 |             print('NOTE: performing a batch gradient descent')
1402 |         elif nTotSamples < np.inf:
1403 |             print('NOTE: performing a minibatch SGD with ', nSamples/nTotSamples*100, '% of total samples')
1404 |         else:
1405 |             print('NOTE: performing a minibatch SGD with ', nSamples, ' samples')
1406 |         # Initialize new gradients
1407 |         if np.sum(newGrad) != 0.0:
1408 |             if np.size(newGrad)/nSamples != self.nParam:
1409 |                 raise ValueError('New gradient dimension mismatch')
1410 |             else:
1411 |                 self.newGrad=np.reshape(newGrad,(self.nParam))
1412 |         else:
1413 |             self.newGrad = np.zeros((self.nParam,self.nSamples))
1414 |             try: 
1415 |                 self.gradFun
1416 |             except NameError: 
1417 |                 print('Please provide gradient function name')
1418 |             self.newGrad, nprime = self.gradFun(self.param,self.nSamples)
1419 | 
1420 |     def update(self):
1421 |         """
1422 |         Perform one iteration of minibatch SGD
1423 |         """
1424 |         SGD.learningSchedule(self)
1425 |         if self.maxiter>1:
1426 |             self.newGrad,nprime = self.gradFun(self.param,self.nSamples)
1427 |         # Perform one iteration of minibatch SGD
1428 |         self.param=self.param-self.etaCurrent*np.mean(self.newGrad,1)
1429 |         SGD.satisfyBounds(self)
1430 |         self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
1431 |         #print('One iteration of minibatch SGD has been performed successfully!\n')
1432 |         
1433 |     def performIter(self):
1434 |         """
1435 |         Performs all the iterations of minibatch SGD
1436 |         """
1437 |         # initialize progress bar
1438 |         printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1439 |         self.t = time.clock()
1440 |         for i in range(self.iter,self.maxiter,1):
1441 |             if self.stop == True:
1442 |                 break
1443 |             #print('iteration', i+1, 'out of', self.maxiter)
1444 |             self.update()
1445 |             self.currentIter = i+1
1446 |             # print progress bar
1447 |             SGD.printProgress(self)
1448 |             # Update the objective and gradient
1449 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1450 |                 SGD.evaluateObjFn(self)       
1451 |                 SGD.stopCrit(self)
1452 |         
1453 | class SVRG(SGD):
1454 |     """
1455 |     ==============================================================================
1456 |     |              Stochastic variance reduced gradient (SVRG) class             |
1457 |     |               derived class from Stochastic Gradient Descent               |
1458 |     ==============================================================================
1459 |     Initialization:
1460 |         opt = SVRG(nTotSamples, innerIter = 10, outerIter = 200, option = 1,obj, 
1461 |         grad, eta, param, iter, maxiter, objFun, gradFun)
1462 |         
1463 |     NOTE: option = 1 or 2 as suggested in the reference paper.
1464 |     ==============================================================================
1465 |     Attributes:
1466 |         alg:            SVRG
1467 |         eta:            learning rate 
1468 |         param:          the parameter vector (array of dimension nParam-by-1)
1469 |         nParam:         number of parameters
1470 |         fullGrad:       Full gradient information 
1471 |                         (array of dimension nParam-by-nTotSamples)
1472 |         nTotSamples:    total number of samples
1473 |         innerIter:      inner iteration
1474 |         outerIter:      outer iteration
1475 |         iter:           iteration number (optional input)
1476 |         maxIter:        maximum iteration number 
1477 |                         (optional, default = innerIter*outerIter)
1478 |         objFun:         function handle to evaluate the objective 
1479 |                         (not required for maxit = 1 )
1480 |         gradFun:        function handle to evaluate the gradient 
1481 |                         (not required for maxit = 1 )
1482 |         mu:             average gradient in the outer iteration
1483 |         paramBest:      best estimate of the param in the oter iteration
1484 |         lowerBound:     lower bound for the parameters (optional input)
1485 |         upperBound:     upper bound for the parameters (optional input)
1486 |         stopGrad:       stopping criterion based on 2-norm of gradient vector
1487 |                         (default 10^-6)
1488 |         alg:            algorithm used
1489 |         __version__:    version of the code
1490 |     ==============================================================================
1491 |     Methods:
1492 |      Public:
1493 |         performOuterIter:   performs all the iterations inside a for loop
1494 |         getGradHist:        returns gradient history (default is zero)
1495 |         Inherited:
1496 |             getParam:       returns the parameter values
1497 |             getObj:         returns the current objective value
1498 |             getGrad:        returns the current gradient information
1499 |             getParamHist:   returns parameter update history
1500 |      Private: (should not be called outside this class file)
1501 |         __init__:           initialization
1502 |         innerUpdate:        performs inner iterations of SVRG
1503 |         Inherited:
1504 |             evaluateObjFn:      evaluates the objective function
1505 |             evaluateGradFn:     evaluates the gradients
1506 |             satisfyBounds:      satisfies the parameter bounds
1507 |             learningSchedule:   learning schedule
1508 |             stopCrit:           check stopping criteria
1509 |     ==============================================================================
1510 |     Reference: Johnson, Rie, and Tong Zhang. 
1511 |     "Accelerating stochastic gradient descent using predictive variance reduction." 
1512 |     Advances in neural information processing systems. 2013.
1513 |     ==============================================================================
1514 |     written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1515 |     ==============================================================================
1516 |     """
1517 |     def __init__(self,nTotSamples, innerIter = 10, outerIter = 200, option = 1, **kwargs):
1518 |         """ Initialize the SVRG class object. 
1519 |             This can be used to perform one iteration of SVRG. 
1520 |         """
1521 |         self.alg = 'SVRG'
1522 |         SGD.printAlg(self)
1523 |         SGD.__init__(self,**kwargs)
1524 |         self.nTotSamples = nTotSamples
1525 |         # Check inner iteration and outer iteration values
1526 |         if innerIter*outerIter > self.maxiter:
1527 |             self.maxiter = innerIter*outerIter
1528 |             print('Maximum iteration number is set to ',self.maxiter)
1529 |         self.innerIter = innerIter
1530 |         self.outerIter = outerIter
1531 |         self.paramBest = self.param
1532 |         # Initialize gradients
1533 |         try: 
1534 |             self.gradFun
1535 |         except NameError: 
1536 |             print('Please provide gradient function name')
1537 |         self.fullGrad, nprime = self.gradFun(self.param,self.nTotSamples)
1538 |         self.grad = self.fullGrad        
1539 |         self.mu = np.mean(self.grad,1)
1540 |         self.option = option
1541 |         
1542 |     def innerUpdate(self):
1543 |         """
1544 |         Perform inner iterations of SVRG
1545 |         """
1546 |         for i in range(self.innerIter):
1547 |             SGD.learningSchedule(self)
1548 |             it = np.random.randint(self.nTotSamples)
1549 |             bestParamGrad, notNeeded = self.gradFun(self.paramBest,1)
1550 |             bestParamGrad = np.reshape(bestParamGrad,(2))
1551 |             self.param = self.param - self.etaCurrent*(self.grad[:,it]-bestParamGrad+self.mu)
1552 |             SGD.satisfyBounds(self)
1553 |             self.paramHist = np.append(self.paramHist,np.reshape(self.param,(2,1)), axis = 1)
1554 |         if self.option == 1:
1555 |             
1556 |             self.paramBest = self.param
1557 |         else:
1558 |             ind = np.random.randint(low = self.totIter, high = self.totIter+self.innerIter)
1559 |             self.paramBest = self.paramHist[:,ind]
1560 |         
1561 |     def performOuterIter(self):
1562 |         """
1563 |         Performs all the iterations of SVRG
1564 |         """
1565 |         # initialize progress bar
1566 |         printProgressBar(0, self.outerIter, prefix = self.alg, suffix = 'Complete', length = 25)
1567 |         self.t = time.clock()
1568 |         self.totIter = 0
1569 |         for i in range(self.iter,self.outerIter,1):
1570 |             if self.stop == True:
1571 |                 break
1572 |             #print('Outer iteration', i+1, ' of', self.outerIter, ' (inner iteration = ', self.innerIter,')')
1573 |             self.innerUpdate()
1574 |             self.totIter = self.totIter + (i+1)*self.innerIter
1575 |             self.currentIter = i+1
1576 |             # print progress bar
1577 |             SGD.printProgress(self)
1578 |             self.grad, notNeeded = self.gradFun(self.paramBest,self.nTotSamples)
1579 |             self.mu = np.mean(self.grad,1)
1580 |             # Update the objective and gradient
1581 |             if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1582 |                 SGD.evaluateObjFn(self)  
1583 |                 SGD.stopCrit(self)
1584 | 


--------------------------------------------------------------------------------