├── SGD_tutorial.pdf
├── README.md
├── sgd_demo.py
├── LICENSE
└── SGD.py
/SGD_tutorial.pdf:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/CU-UQ/SGD/HEAD/SGD_tutorial.pdf
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # SGD
2 | Implementation of Stochastic Gradient Descent algorithms in Python (GNU GPLv3)
3 | If you find this code useful please cite the article:
4 | ### Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach ###
5 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)
6 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278.
7 | https://doi.org/10.1007/s00158-020-02599-z
8 |
9 | ### BibTeX entry: ###
10 | @article{de2020topology,
11 | title={Topology optimization under uncertainty using a stochastic gradient-based approach},
12 | author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},
13 | journal={Structural and Multidisciplinary Optimization},
14 | volume={62},
15 | number={5},
16 | pages={2255--2278},
17 | year={2020},
18 | publisher={Springer}
19 | }
20 |
21 | Download the SGD module from https://github.com/CU-UQ/SGD.
22 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.
23 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747).
24 | Please report any bugs to Subhayan.De@colorado.edu
25 | ### Website: www.subhayande.com
26 |
27 | Required packages: numpy (the standard-library time module is also used)
28 |
29 | This module implements:
30 | (i) Stochastic Gradient Descent,
31 | (ii) SGD with Momentum,
32 | (iii) NAG,
33 | (iv) AdaGrad,
34 | (v) RMSprop,
35 | (vi) Adam,
36 | (vii) Adamax,
37 | (viii) Adadelta,
38 | (ix) Nadam,
39 | (x) SAG,
40 | (xi) minibatch SGD,
41 | (xii) SVRG.
42 |
43 | *NOTE*: Currently, the stopping conditions are (i) reaching the maximum number of iterations and (ii) the 2-norm of the gradient vector falling below a tolerance value. Only time-delay and exponential learning-rate schedules are implemented.
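
The two decay rules typically take the following form (a sketch only, not the module's internal code; the exact parameterization in *SGD.py* is controlled by its *learnSched* and *lrParam* arguments and may differ in detail, with *lrParam* playing the role of the decay constant k):

```python
import numpy as np

def learning_rate(eta0, t, schedule='time-based', lrParam=0.1):
    """Illustrative learning-rate schedules (not the module's internal code)."""
    if schedule == 'time-based':     # time-delay decay: eta_t = eta0 / (1 + k*t)
        return eta0 / (1.0 + lrParam * t)
    if schedule == 'exponential':    # exponential decay: eta_t = eta0 * exp(-k*t)
        return eta0 * np.exp(-lrParam * t)
    return eta0                      # constant schedule
```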
44 |
45 | Download this file and use *import SGD as sgd* to use the algorithms.
46 | See *sgd_demo.py* for an example.
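
A minimal usage sketch mirroring the call pattern in *sgd_demo.py*; the toy *objFun*/*gradFun* below are stand-ins for a user-supplied (typically stochastic) objective and gradient:

```python
import numpy as np
import SGD as sgd

# toy quadratic problem standing in for a real objective/gradient pair
def objFun(theta):
    return np.sum((theta - 1.0)**2)

def gradFun(theta):
    return 2.0*(theta - 1.0)

theta = np.array([2.0, 0.5])                      # initial parameters
opt = sgd.SGD(obj=objFun(theta), grad=gradFun(theta), eta=0.0025,
              param=theta, iter=0, maxiter=2500,
              objFun=objFun, gradFun=gradFun)
opt.performIter()                                 # run the iterations
thetaHist = opt.getParamHist()                    # parameter update history
```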
47 |
48 |
--------------------------------------------------------------------------------
/sgd_demo.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | -------------------------------------------------------------------------------
5 | If you find this code useful please cite the article:
6 | Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach
7 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)
8 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278. https://doi.org/10.1007/s00158-020-02599-z
9 | BibTeX entry:
10 | @article{de2020topology,
11 | title={Topology optimization under uncertainty using a stochastic gradient-based approach},
12 | author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},
13 | journal={Structural and Multidisciplinary Optimization},
14 | volume={62},
15 | number={5},
16 | pages={2255--2278},
17 | year={2020},
18 | publisher={Springer}
19 | }
20 | Download the SGD module from https://github.com/CU-UQ/SGD.
21 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.
22 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747).
23 | Please report any bugs to Subhayan.De@colorado.edu
24 | Website: www.subhayande.com
25 | -------------------------------------------------------------------------------
26 | This file uses a linear regression example to show the use of StochasticGradientDescent module.
27 | Available classes:
28 | (1) Stochastic gradient descent
29 | (2) SGD with momentum
30 | (3) Nesterov accelerated SGD
31 | (4) AdaGrad
32 | (5) RMSprop
33 | (6) Adam
34 | (7) Adamax
35 | (8) Adadelta
36 | (9) Nadam
37 | (10) Stochastic average gradient
38 | (11) Mini-batch stochastic gradient descent
39 | (12) SVRG
40 |
41 | Copyright (C) 2019 Subhayan De
42 |
43 | This program is free software: you can redistribute it and/or modify
44 | it under the terms of the GNU General Public License as published by
45 | the Free Software Foundation, either version 3 of the License, or
46 | (at your option) any later version.
47 |
48 | This program is distributed in the hope that it will be useful,
49 | but WITHOUT ANY WARRANTY; without even the implied warranty of
50 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
51 | GNU General Public License for more details.
52 |
53 | You should have received a copy of the GNU General Public License
54 | along with this program. If not, see <https://www.gnu.org/licenses/>.
55 |
56 | Created on Mon Jul 9 21:19:43 2018
57 | @author: Subhayan De (email: Subhayan.De@colorado.edu)
58 | """
59 | # import matplotlib and numpy packages
60 | import matplotlib
61 | import numpy as np
62 | import matplotlib.pyplot as plt
63 | import matplotlib.cm as cm
64 |
65 | # import the algorithm classes from the SGD module
66 | import SGD as sgd
67 |
68 | def main():
69 | # Generate data
70 | np.random.seed(0)
71 | n = 1000
72 | X = 2.0*np.random.rand(n,1)
73 |
74 | # parameters
75 | w1 = 3.0
76 | w2 = 4.5
77 | # noisy data
78 | y = w1 + w2 * X + np.random.randn(n,1)
79 |
80 | X_b = np.c_[np.ones((n,1)), X] # add x0 = 1 (intercept term) to each instance
81 | # save data and x to files to be used later to calculate objectives and gradients
82 | np.savetxt('test1_data.txt',y)
83 | np.savetxt('test1_x.txt',X_b)
84 |
85 | # select the algorithm to run
86 | # acceptable terms: SGD, SGDmomentum, SGDnesterov, AdaGrad, RMSprop, Adam, Adamax, Adadelta, Nadam, minibatchSGD, SAG, SVRG
87 | alg = 'Adam'
88 |
89 | # initial parameter
90 | w10 = 2.0
91 | w20 = 0.5
92 | theta = np.array([w10, w20])
93 | R = objFun(theta) # initial objective
94 | it = 0 # set iteration counter to 0
95 | maxIt = 2500 # maximum iteration
96 | dR = gradFun(theta) # initial gradient
97 | if alg == 'SGD':
98 | # Stochastic Gradient Descent
99 | eta = 0.0025 # learning rate
100 | opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
101 | opt.performIter() # perform iterations
102 | thetaHist = opt.getParamHist()
103 | elif alg == 'SGDmomentum':
104 | # Stochastic Gradient Descent with momentum
105 | eta = 0.001 # learning rate
106 | opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun, momentum = 0.9) # initialize
107 | opt.performIter() # perform iterations
108 | thetaHist = opt.getParamHist()
109 | elif alg == 'SGDnesterov':
110 | # Stochastic Gradient Descent with Nesterov momentum
111 | eta = 0.001 # learning rate
112 | opt = sgd.SGD(obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun, momentum = 0.9,nesterov = True) # initialize
113 | opt.performIter() # perform iterations
114 | thetaHist = opt.getParamHist()
115 | elif alg == 'AdaGrad':
116 | # AdaGrad
117 | eta = 0.25 # learning rate
118 | opt = sgd.AdaGrad(gradHist=0.0,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
119 | opt.performIter() # perform iterations
120 | thetaHist = opt.getParamHist()
121 | elif alg == 'RMSprop':
122 | # RMSprop
123 | eta = 0.9 # learning rate
124 | opt = sgd.RMSprop(gradHist=0.0,rho=0.1,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
125 | opt.performIter() # perform iterations
126 | thetaHist = opt.getParamHist()
127 | elif alg == 'Adam':
128 | # Adam
129 | eta = 0.025 # learning rate
130 | opt = sgd.Adam(m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
131 | opt.performIter() # perform iterations
132 | thetaHist = opt.getParamHist()
133 | elif alg == 'Adamax':
134 | # Adamax
135 | eta = 0.025 # learning rate
136 | opt = sgd.Adamax(m = 0.0,u = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
137 | opt.performIter() # perform iterations
138 | thetaHist = opt.getParamHist()
139 | elif alg == 'Adadelta':
140 | # Adadelta
141 | eta = 1.0 # learning rate
142 | opt = sgd.Adadelta(gradHist=0.0,updateHist=0.0,rho=0.99,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
143 | opt.performIter() # perform iterations
144 | thetaHist = opt.getParamHist()
145 | elif alg == 'Nadam':
146 | # Nadam
147 | eta = 0.01# learning rate
148 | opt = sgd.Nadam(m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=gradFun) # initialize
149 | opt.performIter() # perform iterations
150 | thetaHist = opt.getParamHist()
151 | elif alg == 'minibatchSGD':
152 | # mini batch stochastic gradient descent
153 | eta = 0.025 # learning rate
154 | opt = sgd.minibatchSGD(nSamples = 10,nTotSamples = n,newGrad = 0.0,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun) # initialize
155 | opt.performIter() # perform iterations
156 | thetaHist = opt.getParamHist()
157 | elif alg == 'SAG':
158 | # stochastic average gradient descent
159 | eta = 0.0025 # learning rate
160 | opt = sgd.SAG(nSamples = 20,nTotSamples= n, obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun) # initialize
161 | opt.performIter() # perform iterations
162 | thetaHist = opt.getParamHist()
163 | elif alg == 'SVRG':
164 | # stochastic variance reduced gradient descent
165 | eta = 0.004
166 | opt = sgd.SVRG(nTotSamples = n, innerIter = 10, outerIter = 200, option = 1,obj = R, grad = dR, eta = eta, param = theta, iter = it, maxiter=maxIt, objFun=objFun, gradFun=batchGradFun)
167 | opt.performOuterIter()
168 | thetaHist = opt.getParamHist()
169 | else:
170 | raise ValueError('No such algorithm is in the module.\n Please use one of the following options:\nSGD, SGDmomentum, SGDnesterov, AdaGrad, RMSprop, Adam, Adamax, Adadelta, Nadam, minibatchSGD, SAG, SVRG')
171 |
172 |
173 | # Plot the results
174 | matplotlib.rcParams['xtick.direction'] = 'out'
175 | matplotlib.rcParams['ytick.direction'] = 'out'
176 | delta = 0.025
177 | w1 = np.arange(-2.0, 10.0, delta)
178 | w2 = np.arange(-2.0, 10.0, delta)
179 | Xx, Yy = np.meshgrid(w1, w2)
180 | nx = np.shape(Xx)
181 | Z = np.zeros(nx)
182 | for i in range(nx[0]):
183 | for j in range(nx[1]):
184 | Z[i,j] = (np.linalg.norm(y - Xx[i,j]-Yy[i,j]*X,2))**2/n
185 |
186 | plt.figure()
187 | levels = np.arange(0, 40, 4)
188 | CS = plt.contour(Xx, Yy, Z, levels,origin='lower',
189 | linewidths=2,
190 | extent=(-2, 10, -2, 10))
191 | #plt.clabel(CS, inline=1, fontsize=10)
192 | # Thicken one of the contour lines (level index 6) for emphasis.
193 | zc = CS.collections[6]
194 | plt.setp(zc, linewidth=4)
195 |
196 | plt.clabel(CS, levels[1::2], # label every second level
197 | inline=1,
198 | fmt='%1.1f',
199 | fontsize=10)
200 | im = plt.imshow(Z, interpolation='bilinear', origin='lower', cmap=cm.Wistia, extent=(-2, 10, -2, 10))
201 |
202 | # make a colorbar
203 | plt.colorbar(im, shrink=0.8, extend='both')
204 | plt.plot(thetaHist[0,:], thetaHist[1,:],'r.',linewidth = 6)
205 | titl = opt.alg+' with a learning rate '+str(eta)
206 | plt.title(titl)
207 | return opt
208 |
209 | def objFun(param):
210 | # objective function
211 | y = np.loadtxt('test1_data.txt')
212 | X_b = np.loadtxt('test1_x.txt')
213 | n = np.size(y)
214 | yprime = X_b.dot(param)
215 | obj = np.sum(np.multiply(y-yprime,y-yprime))/n
216 | return obj
217 |
218 | def gradFun(param):
219 | # gradient function
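# The stochastic gradient draws one random sample i and returns
#     2 * x_i^T (x_i . theta - y_i),
# the gradient of that single sample's squared error; its expectation over
# the random index equals the gradient of the mean-squared-error objective
# computed in objFun.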
220 | y = np.loadtxt('test1_data.txt')
221 | X_b = np.loadtxt('test1_x.txt')
222 | n = np.size(y)
223 | nprime = np.random.randint(n)
224 | xi = X_b[nprime:nprime+1]
225 | yi = y[nprime:nprime+1]
226 | grad = 2.0 * xi.T.dot(xi.dot(param) - yi)
227 | return grad
228 |
229 | def batchGradFun(param,nBatch):
230 | # batch gradient function
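# Returns an nParam-by-nBatch matrix whose columns are per-sample gradients
# for nBatch indices drawn without replacement, together with those indices;
# the mini-batch algorithms in SGD.py are expected to combine (e.g. average)
# these columns internally.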
231 | y = np.loadtxt('test1_data.txt')
232 | X_b = np.loadtxt('test1_x.txt')
233 | n = np.size(y)
234 | nParam = np.size(param)
235 | batchGrad = np.zeros((nParam,nBatch))
236 | nprime = np.random.choice(range(n), nBatch, replace = False)
237 | for i in range(nBatch):
238 | xi = X_b[nprime[i]:nprime[i]+1]
239 | yi = y[nprime[i]:nprime[i]+1]
240 | batchGrad[:,i] = 2.0 * xi.T.dot(xi.dot(param) - yi)
241 | return batchGrad,nprime
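
# Optional sanity check (not part of the original demo): the closed-form
# least-squares fit can be compared against the final SGD iterate. The helper
# name below is illustrative and assumes the data files written by main() exist.
def exactLeastSquares():
    # exact solution of min ||y - X_b*theta||^2 via numpy's least-squares solver
    y = np.loadtxt('test1_data.txt')
    X_b = np.loadtxt('test1_x.txt')
    thetaExact, _, _, _ = np.linalg.lstsq(X_b, y, rcond=None)
    return thetaExact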
242 |
243 | if __name__ == "__main__":
244 | opt = main()
245 |
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | GNU GENERAL PUBLIC LICENSE
2 | Version 3, 29 June 2007
3 |
4 | Copyright (C) 2007 Free Software Foundation, Inc. <https://fsf.org/>
5 | Everyone is permitted to copy and distribute verbatim copies
6 | of this license document, but changing it is not allowed.
7 |
8 | Preamble
9 |
10 | The GNU General Public License is a free, copyleft license for
11 | software and other kinds of works.
12 |
13 | The licenses for most software and other practical works are designed
14 | to take away your freedom to share and change the works. By contrast,
15 | the GNU General Public License is intended to guarantee your freedom to
16 | share and change all versions of a program--to make sure it remains free
17 | software for all its users. We, the Free Software Foundation, use the
18 | GNU General Public License for most of our software; it applies also to
19 | any other work released this way by its authors. You can apply it to
20 | your programs, too.
21 |
22 | When we speak of free software, we are referring to freedom, not
23 | price. Our General Public Licenses are designed to make sure that you
24 | have the freedom to distribute copies of free software (and charge for
25 | them if you wish), that you receive source code or can get it if you
26 | want it, that you can change the software or use pieces of it in new
27 | free programs, and that you know you can do these things.
28 |
29 | To protect your rights, we need to prevent others from denying you
30 | these rights or asking you to surrender the rights. Therefore, you have
31 | certain responsibilities if you distribute copies of the software, or if
32 | you modify it: responsibilities to respect the freedom of others.
33 |
34 | For example, if you distribute copies of such a program, whether
35 | gratis or for a fee, you must pass on to the recipients the same
36 | freedoms that you received. You must make sure that they, too, receive
37 | or can get the source code. And you must show them these terms so they
38 | know their rights.
39 |
40 | Developers that use the GNU GPL protect your rights with two steps:
41 | (1) assert copyright on the software, and (2) offer you this License
42 | giving you legal permission to copy, distribute and/or modify it.
43 |
44 | For the developers' and authors' protection, the GPL clearly explains
45 | that there is no warranty for this free software. For both users' and
46 | authors' sake, the GPL requires that modified versions be marked as
47 | changed, so that their problems will not be attributed erroneously to
48 | authors of previous versions.
49 |
50 | Some devices are designed to deny users access to install or run
51 | modified versions of the software inside them, although the manufacturer
52 | can do so. This is fundamentally incompatible with the aim of
53 | protecting users' freedom to change the software. The systematic
54 | pattern of such abuse occurs in the area of products for individuals to
55 | use, which is precisely where it is most unacceptable. Therefore, we
56 | have designed this version of the GPL to prohibit the practice for those
57 | products. If such problems arise substantially in other domains, we
58 | stand ready to extend this provision to those domains in future versions
59 | of the GPL, as needed to protect the freedom of users.
60 |
61 | Finally, every program is threatened constantly by software patents.
62 | States should not allow patents to restrict development and use of
63 | software on general-purpose computers, but in those that do, we wish to
64 | avoid the special danger that patents applied to a free program could
65 | make it effectively proprietary. To prevent this, the GPL assures that
66 | patents cannot be used to render the program non-free.
67 |
68 | The precise terms and conditions for copying, distribution and
69 | modification follow.
70 |
71 | TERMS AND CONDITIONS
72 |
73 | 0. Definitions.
74 |
75 | "This License" refers to version 3 of the GNU General Public License.
76 |
77 | "Copyright" also means copyright-like laws that apply to other kinds of
78 | works, such as semiconductor masks.
79 |
80 | "The Program" refers to any copyrightable work licensed under this
81 | License. Each licensee is addressed as "you". "Licensees" and
82 | "recipients" may be individuals or organizations.
83 |
84 | To "modify" a work means to copy from or adapt all or part of the work
85 | in a fashion requiring copyright permission, other than the making of an
86 | exact copy. The resulting work is called a "modified version" of the
87 | earlier work or a work "based on" the earlier work.
88 |
89 | A "covered work" means either the unmodified Program or a work based
90 | on the Program.
91 |
92 | To "propagate" a work means to do anything with it that, without
93 | permission, would make you directly or secondarily liable for
94 | infringement under applicable copyright law, except executing it on a
95 | computer or modifying a private copy. Propagation includes copying,
96 | distribution (with or without modification), making available to the
97 | public, and in some countries other activities as well.
98 |
99 | To "convey" a work means any kind of propagation that enables other
100 | parties to make or receive copies. Mere interaction with a user through
101 | a computer network, with no transfer of a copy, is not conveying.
102 |
103 | An interactive user interface displays "Appropriate Legal Notices"
104 | to the extent that it includes a convenient and prominently visible
105 | feature that (1) displays an appropriate copyright notice, and (2)
106 | tells the user that there is no warranty for the work (except to the
107 | extent that warranties are provided), that licensees may convey the
108 | work under this License, and how to view a copy of this License. If
109 | the interface presents a list of user commands or options, such as a
110 | menu, a prominent item in the list meets this criterion.
111 |
112 | 1. Source Code.
113 |
114 | The "source code" for a work means the preferred form of the work
115 | for making modifications to it. "Object code" means any non-source
116 | form of a work.
117 |
118 | A "Standard Interface" means an interface that either is an official
119 | standard defined by a recognized standards body, or, in the case of
120 | interfaces specified for a particular programming language, one that
121 | is widely used among developers working in that language.
122 |
123 | The "System Libraries" of an executable work include anything, other
124 | than the work as a whole, that (a) is included in the normal form of
125 | packaging a Major Component, but which is not part of that Major
126 | Component, and (b) serves only to enable use of the work with that
127 | Major Component, or to implement a Standard Interface for which an
128 | implementation is available to the public in source code form. A
129 | "Major Component", in this context, means a major essential component
130 | (kernel, window system, and so on) of the specific operating system
131 | (if any) on which the executable work runs, or a compiler used to
132 | produce the work, or an object code interpreter used to run it.
133 |
134 | The "Corresponding Source" for a work in object code form means all
135 | the source code needed to generate, install, and (for an executable
136 | work) run the object code and to modify the work, including scripts to
137 | control those activities. However, it does not include the work's
138 | System Libraries, or general-purpose tools or generally available free
139 | programs which are used unmodified in performing those activities but
140 | which are not part of the work. For example, Corresponding Source
141 | includes interface definition files associated with source files for
142 | the work, and the source code for shared libraries and dynamically
143 | linked subprograms that the work is specifically designed to require,
144 | such as by intimate data communication or control flow between those
145 | subprograms and other parts of the work.
146 |
147 | The Corresponding Source need not include anything that users
148 | can regenerate automatically from other parts of the Corresponding
149 | Source.
150 |
151 | The Corresponding Source for a work in source code form is that
152 | same work.
153 |
154 | 2. Basic Permissions.
155 |
156 | All rights granted under this License are granted for the term of
157 | copyright on the Program, and are irrevocable provided the stated
158 | conditions are met. This License explicitly affirms your unlimited
159 | permission to run the unmodified Program. The output from running a
160 | covered work is covered by this License only if the output, given its
161 | content, constitutes a covered work. This License acknowledges your
162 | rights of fair use or other equivalent, as provided by copyright law.
163 |
164 | You may make, run and propagate covered works that you do not
165 | convey, without conditions so long as your license otherwise remains
166 | in force. You may convey covered works to others for the sole purpose
167 | of having them make modifications exclusively for you, or provide you
168 | with facilities for running those works, provided that you comply with
169 | the terms of this License in conveying all material for which you do
170 | not control copyright. Those thus making or running the covered works
171 | for you must do so exclusively on your behalf, under your direction
172 | and control, on terms that prohibit them from making any copies of
173 | your copyrighted material outside their relationship with you.
174 |
175 | Conveying under any other circumstances is permitted solely under
176 | the conditions stated below. Sublicensing is not allowed; section 10
177 | makes it unnecessary.
178 |
179 | 3. Protecting Users' Legal Rights From Anti-Circumvention Law.
180 |
181 | No covered work shall be deemed part of an effective technological
182 | measure under any applicable law fulfilling obligations under article
183 | 11 of the WIPO copyright treaty adopted on 20 December 1996, or
184 | similar laws prohibiting or restricting circumvention of such
185 | measures.
186 |
187 | When you convey a covered work, you waive any legal power to forbid
188 | circumvention of technological measures to the extent such circumvention
189 | is effected by exercising rights under this License with respect to
190 | the covered work, and you disclaim any intention to limit operation or
191 | modification of the work as a means of enforcing, against the work's
192 | users, your or third parties' legal rights to forbid circumvention of
193 | technological measures.
194 |
195 | 4. Conveying Verbatim Copies.
196 |
197 | You may convey verbatim copies of the Program's source code as you
198 | receive it, in any medium, provided that you conspicuously and
199 | appropriately publish on each copy an appropriate copyright notice;
200 | keep intact all notices stating that this License and any
201 | non-permissive terms added in accord with section 7 apply to the code;
202 | keep intact all notices of the absence of any warranty; and give all
203 | recipients a copy of this License along with the Program.
204 |
205 | You may charge any price or no price for each copy that you convey,
206 | and you may offer support or warranty protection for a fee.
207 |
208 | 5. Conveying Modified Source Versions.
209 |
210 | You may convey a work based on the Program, or the modifications to
211 | produce it from the Program, in the form of source code under the
212 | terms of section 4, provided that you also meet all of these conditions:
213 |
214 | a) The work must carry prominent notices stating that you modified
215 | it, and giving a relevant date.
216 |
217 | b) The work must carry prominent notices stating that it is
218 | released under this License and any conditions added under section
219 | 7. This requirement modifies the requirement in section 4 to
220 | "keep intact all notices".
221 |
222 | c) You must license the entire work, as a whole, under this
223 | License to anyone who comes into possession of a copy. This
224 | License will therefore apply, along with any applicable section 7
225 | additional terms, to the whole of the work, and all its parts,
226 | regardless of how they are packaged. This License gives no
227 | permission to license the work in any other way, but it does not
228 | invalidate such permission if you have separately received it.
229 |
230 | d) If the work has interactive user interfaces, each must display
231 | Appropriate Legal Notices; however, if the Program has interactive
232 | interfaces that do not display Appropriate Legal Notices, your
233 | work need not make them do so.
234 |
235 | A compilation of a covered work with other separate and independent
236 | works, which are not by their nature extensions of the covered work,
237 | and which are not combined with it such as to form a larger program,
238 | in or on a volume of a storage or distribution medium, is called an
239 | "aggregate" if the compilation and its resulting copyright are not
240 | used to limit the access or legal rights of the compilation's users
241 | beyond what the individual works permit. Inclusion of a covered work
242 | in an aggregate does not cause this License to apply to the other
243 | parts of the aggregate.
244 |
245 | 6. Conveying Non-Source Forms.
246 |
247 | You may convey a covered work in object code form under the terms
248 | of sections 4 and 5, provided that you also convey the
249 | machine-readable Corresponding Source under the terms of this License,
250 | in one of these ways:
251 |
252 | a) Convey the object code in, or embodied in, a physical product
253 | (including a physical distribution medium), accompanied by the
254 | Corresponding Source fixed on a durable physical medium
255 | customarily used for software interchange.
256 |
257 | b) Convey the object code in, or embodied in, a physical product
258 | (including a physical distribution medium), accompanied by a
259 | written offer, valid for at least three years and valid for as
260 | long as you offer spare parts or customer support for that product
261 | model, to give anyone who possesses the object code either (1) a
262 | copy of the Corresponding Source for all the software in the
263 | product that is covered by this License, on a durable physical
264 | medium customarily used for software interchange, for a price no
265 | more than your reasonable cost of physically performing this
266 | conveying of source, or (2) access to copy the
267 | Corresponding Source from a network server at no charge.
268 |
269 | c) Convey individual copies of the object code with a copy of the
270 | written offer to provide the Corresponding Source. This
271 | alternative is allowed only occasionally and noncommercially, and
272 | only if you received the object code with such an offer, in accord
273 | with subsection 6b.
274 |
275 | d) Convey the object code by offering access from a designated
276 | place (gratis or for a charge), and offer equivalent access to the
277 | Corresponding Source in the same way through the same place at no
278 | further charge. You need not require recipients to copy the
279 | Corresponding Source along with the object code. If the place to
280 | copy the object code is a network server, the Corresponding Source
281 | may be on a different server (operated by you or a third party)
282 | that supports equivalent copying facilities, provided you maintain
283 | clear directions next to the object code saying where to find the
284 | Corresponding Source. Regardless of what server hosts the
285 | Corresponding Source, you remain obligated to ensure that it is
286 | available for as long as needed to satisfy these requirements.
287 |
288 | e) Convey the object code using peer-to-peer transmission, provided
289 | you inform other peers where the object code and Corresponding
290 | Source of the work are being offered to the general public at no
291 | charge under subsection 6d.
292 |
293 | A separable portion of the object code, whose source code is excluded
294 | from the Corresponding Source as a System Library, need not be
295 | included in conveying the object code work.
296 |
297 | A "User Product" is either (1) a "consumer product", which means any
298 | tangible personal property which is normally used for personal, family,
299 | or household purposes, or (2) anything designed or sold for incorporation
300 | into a dwelling. In determining whether a product is a consumer product,
301 | doubtful cases shall be resolved in favor of coverage. For a particular
302 | product received by a particular user, "normally used" refers to a
303 | typical or common use of that class of product, regardless of the status
304 | of the particular user or of the way in which the particular user
305 | actually uses, or expects or is expected to use, the product. A product
306 | is a consumer product regardless of whether the product has substantial
307 | commercial, industrial or non-consumer uses, unless such uses represent
308 | the only significant mode of use of the product.
309 |
310 | "Installation Information" for a User Product means any methods,
311 | procedures, authorization keys, or other information required to install
312 | and execute modified versions of a covered work in that User Product from
313 | a modified version of its Corresponding Source. The information must
314 | suffice to ensure that the continued functioning of the modified object
315 | code is in no case prevented or interfered with solely because
316 | modification has been made.
317 |
318 | If you convey an object code work under this section in, or with, or
319 | specifically for use in, a User Product, and the conveying occurs as
320 | part of a transaction in which the right of possession and use of the
321 | User Product is transferred to the recipient in perpetuity or for a
322 | fixed term (regardless of how the transaction is characterized), the
323 | Corresponding Source conveyed under this section must be accompanied
324 | by the Installation Information. But this requirement does not apply
325 | if neither you nor any third party retains the ability to install
326 | modified object code on the User Product (for example, the work has
327 | been installed in ROM).
328 |
329 | The requirement to provide Installation Information does not include a
330 | requirement to continue to provide support service, warranty, or updates
331 | for a work that has been modified or installed by the recipient, or for
332 | the User Product in which it has been modified or installed. Access to a
333 | network may be denied when the modification itself materially and
334 | adversely affects the operation of the network or violates the rules and
335 | protocols for communication across the network.
336 |
337 | Corresponding Source conveyed, and Installation Information provided,
338 | in accord with this section must be in a format that is publicly
339 | documented (and with an implementation available to the public in
340 | source code form), and must require no special password or key for
341 | unpacking, reading or copying.
342 |
343 | 7. Additional Terms.
344 |
345 | "Additional permissions" are terms that supplement the terms of this
346 | License by making exceptions from one or more of its conditions.
347 | Additional permissions that are applicable to the entire Program shall
348 | be treated as though they were included in this License, to the extent
349 | that they are valid under applicable law. If additional permissions
350 | apply only to part of the Program, that part may be used separately
351 | under those permissions, but the entire Program remains governed by
352 | this License without regard to the additional permissions.
353 |
354 | When you convey a copy of a covered work, you may at your option
355 | remove any additional permissions from that copy, or from any part of
356 | it. (Additional permissions may be written to require their own
357 | removal in certain cases when you modify the work.) You may place
358 | additional permissions on material, added by you to a covered work,
359 | for which you have or can give appropriate copyright permission.
360 |
361 | Notwithstanding any other provision of this License, for material you
362 | add to a covered work, you may (if authorized by the copyright holders of
363 | that material) supplement the terms of this License with terms:
364 |
365 | a) Disclaiming warranty or limiting liability differently from the
366 | terms of sections 15 and 16 of this License; or
367 |
368 | b) Requiring preservation of specified reasonable legal notices or
369 | author attributions in that material or in the Appropriate Legal
370 | Notices displayed by works containing it; or
371 |
372 | c) Prohibiting misrepresentation of the origin of that material, or
373 | requiring that modified versions of such material be marked in
374 | reasonable ways as different from the original version; or
375 |
376 | d) Limiting the use for publicity purposes of names of licensors or
377 | authors of the material; or
378 |
379 | e) Declining to grant rights under trademark law for use of some
380 | trade names, trademarks, or service marks; or
381 |
382 | f) Requiring indemnification of licensors and authors of that
383 | material by anyone who conveys the material (or modified versions of
384 | it) with contractual assumptions of liability to the recipient, for
385 | any liability that these contractual assumptions directly impose on
386 | those licensors and authors.
387 |
388 | All other non-permissive additional terms are considered "further
389 | restrictions" within the meaning of section 10. If the Program as you
390 | received it, or any part of it, contains a notice stating that it is
391 | governed by this License along with a term that is a further
392 | restriction, you may remove that term. If a license document contains
393 | a further restriction but permits relicensing or conveying under this
394 | License, you may add to a covered work material governed by the terms
395 | of that license document, provided that the further restriction does
396 | not survive such relicensing or conveying.
397 |
398 | If you add terms to a covered work in accord with this section, you
399 | must place, in the relevant source files, a statement of the
400 | additional terms that apply to those files, or a notice indicating
401 | where to find the applicable terms.
402 |
403 | Additional terms, permissive or non-permissive, may be stated in the
404 | form of a separately written license, or stated as exceptions;
405 | the above requirements apply either way.
406 |
407 | 8. Termination.
408 |
409 | You may not propagate or modify a covered work except as expressly
410 | provided under this License. Any attempt otherwise to propagate or
411 | modify it is void, and will automatically terminate your rights under
412 | this License (including any patent licenses granted under the third
413 | paragraph of section 11).
414 |
415 | However, if you cease all violation of this License, then your
416 | license from a particular copyright holder is reinstated (a)
417 | provisionally, unless and until the copyright holder explicitly and
418 | finally terminates your license, and (b) permanently, if the copyright
419 | holder fails to notify you of the violation by some reasonable means
420 | prior to 60 days after the cessation.
421 |
422 | Moreover, your license from a particular copyright holder is
423 | reinstated permanently if the copyright holder notifies you of the
424 | violation by some reasonable means, this is the first time you have
425 | received notice of violation of this License (for any work) from that
426 | copyright holder, and you cure the violation prior to 30 days after
427 | your receipt of the notice.
428 |
429 | Termination of your rights under this section does not terminate the
430 | licenses of parties who have received copies or rights from you under
431 | this License. If your rights have been terminated and not permanently
432 | reinstated, you do not qualify to receive new licenses for the same
433 | material under section 10.
434 |
435 | 9. Acceptance Not Required for Having Copies.
436 |
437 | You are not required to accept this License in order to receive or
438 | run a copy of the Program. Ancillary propagation of a covered work
439 | occurring solely as a consequence of using peer-to-peer transmission
440 | to receive a copy likewise does not require acceptance. However,
441 | nothing other than this License grants you permission to propagate or
442 | modify any covered work. These actions infringe copyright if you do
443 | not accept this License. Therefore, by modifying or propagating a
444 | covered work, you indicate your acceptance of this License to do so.
445 |
446 | 10. Automatic Licensing of Downstream Recipients.
447 |
448 | Each time you convey a covered work, the recipient automatically
449 | receives a license from the original licensors, to run, modify and
450 | propagate that work, subject to this License. You are not responsible
451 | for enforcing compliance by third parties with this License.
452 |
453 | An "entity transaction" is a transaction transferring control of an
454 | organization, or substantially all assets of one, or subdividing an
455 | organization, or merging organizations. If propagation of a covered
456 | work results from an entity transaction, each party to that
457 | transaction who receives a copy of the work also receives whatever
458 | licenses to the work the party's predecessor in interest had or could
459 | give under the previous paragraph, plus a right to possession of the
460 | Corresponding Source of the work from the predecessor in interest, if
461 | the predecessor has it or can get it with reasonable efforts.
462 |
463 | You may not impose any further restrictions on the exercise of the
464 | rights granted or affirmed under this License. For example, you may
465 | not impose a license fee, royalty, or other charge for exercise of
466 | rights granted under this License, and you may not initiate litigation
467 | (including a cross-claim or counterclaim in a lawsuit) alleging that
468 | any patent claim is infringed by making, using, selling, offering for
469 | sale, or importing the Program or any portion of it.
470 |
471 | 11. Patents.
472 |
473 | A "contributor" is a copyright holder who authorizes use under this
474 | License of the Program or a work on which the Program is based. The
475 | work thus licensed is called the contributor's "contributor version".
476 |
477 | A contributor's "essential patent claims" are all patent claims
478 | owned or controlled by the contributor, whether already acquired or
479 | hereafter acquired, that would be infringed by some manner, permitted
480 | by this License, of making, using, or selling its contributor version,
481 | but do not include claims that would be infringed only as a
482 | consequence of further modification of the contributor version. For
483 | purposes of this definition, "control" includes the right to grant
484 | patent sublicenses in a manner consistent with the requirements of
485 | this License.
486 |
487 | Each contributor grants you a non-exclusive, worldwide, royalty-free
488 | patent license under the contributor's essential patent claims, to
489 | make, use, sell, offer for sale, import and otherwise run, modify and
490 | propagate the contents of its contributor version.
491 |
492 | In the following three paragraphs, a "patent license" is any express
493 | agreement or commitment, however denominated, not to enforce a patent
494 | (such as an express permission to practice a patent or covenant not to
495 | sue for patent infringement). To "grant" such a patent license to a
496 | party means to make such an agreement or commitment not to enforce a
497 | patent against the party.
498 |
499 | If you convey a covered work, knowingly relying on a patent license,
500 | and the Corresponding Source of the work is not available for anyone
501 | to copy, free of charge and under the terms of this License, through a
502 | publicly available network server or other readily accessible means,
503 | then you must either (1) cause the Corresponding Source to be so
504 | available, or (2) arrange to deprive yourself of the benefit of the
505 | patent license for this particular work, or (3) arrange, in a manner
506 | consistent with the requirements of this License, to extend the patent
507 | license to downstream recipients. "Knowingly relying" means you have
508 | actual knowledge that, but for the patent license, your conveying the
509 | covered work in a country, or your recipient's use of the covered work
510 | in a country, would infringe one or more identifiable patents in that
511 | country that you have reason to believe are valid.
512 |
513 | If, pursuant to or in connection with a single transaction or
514 | arrangement, you convey, or propagate by procuring conveyance of, a
515 | covered work, and grant a patent license to some of the parties
516 | receiving the covered work authorizing them to use, propagate, modify
517 | or convey a specific copy of the covered work, then the patent license
518 | you grant is automatically extended to all recipients of the covered
519 | work and works based on it.
520 |
521 | A patent license is "discriminatory" if it does not include within
522 | the scope of its coverage, prohibits the exercise of, or is
523 | conditioned on the non-exercise of one or more of the rights that are
524 | specifically granted under this License. You may not convey a covered
525 | work if you are a party to an arrangement with a third party that is
526 | in the business of distributing software, under which you make payment
527 | to the third party based on the extent of your activity of conveying
528 | the work, and under which the third party grants, to any of the
529 | parties who would receive the covered work from you, a discriminatory
530 | patent license (a) in connection with copies of the covered work
531 | conveyed by you (or copies made from those copies), or (b) primarily
532 | for and in connection with specific products or compilations that
533 | contain the covered work, unless you entered into that arrangement,
534 | or that patent license was granted, prior to 28 March 2007.
535 |
536 | Nothing in this License shall be construed as excluding or limiting
537 | any implied license or other defenses to infringement that may
538 | otherwise be available to you under applicable patent law.
539 |
540 | 12. No Surrender of Others' Freedom.
541 |
542 | If conditions are imposed on you (whether by court order, agreement or
543 | otherwise) that contradict the conditions of this License, they do not
544 | excuse you from the conditions of this License. If you cannot convey a
545 | covered work so as to satisfy simultaneously your obligations under this
546 | License and any other pertinent obligations, then as a consequence you may
547 | not convey it at all. For example, if you agree to terms that obligate you
548 | to collect a royalty for further conveying from those to whom you convey
549 | the Program, the only way you could satisfy both those terms and this
550 | License would be to refrain entirely from conveying the Program.
551 |
552 | 13. Use with the GNU Affero General Public License.
553 |
554 | Notwithstanding any other provision of this License, you have
555 | permission to link or combine any covered work with a work licensed
556 | under version 3 of the GNU Affero General Public License into a single
557 | combined work, and to convey the resulting work. The terms of this
558 | License will continue to apply to the part which is the covered work,
559 | but the special requirements of the GNU Affero General Public License,
560 | section 13, concerning interaction through a network will apply to the
561 | combination as such.
562 |
563 | 14. Revised Versions of this License.
564 |
565 | The Free Software Foundation may publish revised and/or new versions of
566 | the GNU General Public License from time to time. Such new versions will
567 | be similar in spirit to the present version, but may differ in detail to
568 | address new problems or concerns.
569 |
570 | Each version is given a distinguishing version number. If the
571 | Program specifies that a certain numbered version of the GNU General
572 | Public License "or any later version" applies to it, you have the
573 | option of following the terms and conditions either of that numbered
574 | version or of any later version published by the Free Software
575 | Foundation. If the Program does not specify a version number of the
576 | GNU General Public License, you may choose any version ever published
577 | by the Free Software Foundation.
578 |
579 | If the Program specifies that a proxy can decide which future
580 | versions of the GNU General Public License can be used, that proxy's
581 | public statement of acceptance of a version permanently authorizes you
582 | to choose that version for the Program.
583 |
584 | Later license versions may give you additional or different
585 | permissions. However, no additional obligations are imposed on any
586 | author or copyright holder as a result of your choosing to follow a
587 | later version.
588 |
589 | 15. Disclaimer of Warranty.
590 |
591 | THERE IS NO WARRANTY FOR THE PROGRAM, TO THE EXTENT PERMITTED BY
592 | APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT
593 | HOLDERS AND/OR OTHER PARTIES PROVIDE THE PROGRAM "AS IS" WITHOUT WARRANTY
594 | OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO,
595 | THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
596 | PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE PROGRAM
597 | IS WITH YOU. SHOULD THE PROGRAM PROVE DEFECTIVE, YOU ASSUME THE COST OF
598 | ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
599 |
600 | 16. Limitation of Liability.
601 |
602 | IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING
603 | WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MODIFIES AND/OR CONVEYS
604 | THE PROGRAM AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY
605 | GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE
606 | USE OR INABILITY TO USE THE PROGRAM (INCLUDING BUT NOT LIMITED TO LOSS OF
607 | DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD
608 | PARTIES OR A FAILURE OF THE PROGRAM TO OPERATE WITH ANY OTHER PROGRAMS),
609 | EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF
610 | SUCH DAMAGES.
611 |
612 | 17. Interpretation of Sections 15 and 16.
613 |
614 | If the disclaimer of warranty and limitation of liability provided
615 | above cannot be given local legal effect according to their terms,
616 | reviewing courts shall apply local law that most closely approximates
617 | an absolute waiver of all civil liability in connection with the
618 | Program, unless a warranty or assumption of liability accompanies a
619 | copy of the Program in return for a fee.
620 |
621 | END OF TERMS AND CONDITIONS
622 |
623 | How to Apply These Terms to Your New Programs
624 |
625 | If you develop a new program, and you want it to be of the greatest
626 | possible use to the public, the best way to achieve this is to make it
627 | free software which everyone can redistribute and change under these terms.
628 |
629 | To do so, attach the following notices to the program. It is safest
630 | to attach them to the start of each source file to most effectively
631 | state the exclusion of warranty; and each file should have at least
632 | the "copyright" line and a pointer to where the full notice is found.
633 |
634 | <one line to give the program's name and a brief idea of what it does.>
635 | Copyright (C) <year>  <name of author>
636 |
637 | This program is free software: you can redistribute it and/or modify
638 | it under the terms of the GNU General Public License as published by
639 | the Free Software Foundation, either version 3 of the License, or
640 | (at your option) any later version.
641 |
642 | This program is distributed in the hope that it will be useful,
643 | but WITHOUT ANY WARRANTY; without even the implied warranty of
644 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
645 | GNU General Public License for more details.
646 |
647 | You should have received a copy of the GNU General Public License
648 | along with this program. If not, see <https://www.gnu.org/licenses/>.
649 |
650 | Also add information on how to contact you by electronic and paper mail.
651 |
652 | If the program does terminal interaction, make it output a short
653 | notice like this when it starts in an interactive mode:
654 |
655 | <program>  Copyright (C) <year>  <name of author>
656 | This program comes with ABSOLUTELY NO WARRANTY; for details type `show w'.
657 | This is free software, and you are welcome to redistribute it
658 | under certain conditions; type `show c' for details.
659 |
660 | The hypothetical commands `show w' and `show c' should show the appropriate
661 | parts of the General Public License. Of course, your program's commands
662 | might be different; for a GUI interface, you would use an "about box".
663 |
664 | You should also get your employer (if you work as a programmer) or school,
665 | if any, to sign a "copyright disclaimer" for the program, if necessary.
666 | For more information on this, and how to apply and follow the GNU GPL, see
667 | <https://www.gnu.org/licenses/>.
668 |
669 | The GNU General Public License does not permit incorporating your program
670 | into proprietary programs. If your program is a subroutine library, you
671 | may consider it more useful to permit linking proprietary applications with
672 | the library. If this is what you want to do, use the GNU Lesser General
673 | Public License instead of this License. But first, please read
674 | <https://www.gnu.org/licenses/why-not-lgpl.html>.
675 |
--------------------------------------------------------------------------------
/SGD.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python3
2 | # -*- coding: utf-8 -*-
3 | """
4 | -------------------------------------------------------------------------------
5 | If you find this code useful please cite the article:
6 |
7 | Topology Optimization under Uncertainty using a Stochastic Gradient-based Approach
8 | Subhayan De, Jerrad Hampton, Kurt Maute, and Alireza Doostan (2020)
9 | Structural and Multidisciplinary Optimization, 62(5), 2255-2278.
10 | https://doi.org/10.1007/s00158-020-02599-z
11 |
12 | BibTeX entry:
13 | @article{de2020topology,
14 | title={Topology optimization under uncertainty using a stochastic gradient-based approach},
15 | author={De, Subhayan and Hampton, Jerrad and Maute, Kurt and Doostan, Alireza},
16 | journal={Structural and Multidisciplinary Optimization},
17 | volume={62},
18 | number={5},
19 | pages={2255--2278},
20 | year={2020},
21 | publisher={Springer}
22 | }
23 |
24 | Download the SGD module from https://github.com/CU-UQ/SGD.
25 | See the demo https://github.com/CU-UQ/SGD/blob/master/sgd_demo.py for an example of the implementation.
26 | For a description of the algorithms, see De et al (2020) (https://doi.org/10.1007/s00158-020-02599-z) and Ruder (2016) (https://arxiv.org/abs/1609.04747).
27 | Please report any bugs to Subhayan.De@colorado.edu
28 | Website: www.subhayande.com
29 | -------------------------------------------------------------------------------
30 |
31 | This is the class file that implements:
32 | (i) Stochastic Gradient Descent,
33 | (ii) SGD with Momentum,
34 | (iii) NAG,
35 | (iv) AdaGrad,
36 | (v) RMSprop,
37 | (vi) Adam,
38 | (vii) Adamax,
39 | (viii) Adadelta,
40 | (ix) Nadam,
41 | (x) SAG,
42 | (xi) minibatch SGD,
43 | (xii) SVRG.
44 |
45 | NOTE: Currently, the stopping conditions are reaching the maximum number of iterations and the 2-norm of the
46 | gradient vector falling below a tolerance value; only time-delay and exponential learning-rate schedules are implemented.
47 |
48 | Copyright (C) 2019 Subhayan De
49 |
50 | This program is free software: you can redistribute it and/or modify
51 | it under the terms of the GNU General Public License as published by
52 | the Free Software Foundation, either version 3 of the License, or
53 | (at your option) any later version.
54 |
55 | This program is distributed in the hope that it will be useful,
56 | but WITHOUT ANY WARRANTY; without even the implied warranty of
57 | MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
58 | GNU General Public License for more details.
59 |
60 | You should have received a copy of the GNU General Public License
61 | along with this program. If not, see <https://www.gnu.org/licenses/>.
62 |
63 | Created on Sat Jun 30 01:04:28 2018
64 | @author: Subhayan De
65 |
66 | Report any bugs to Subhayan.De@colorado.edu
67 |
68 | Author's note: add kSGD, 2nd order methods
69 | """
70 |
71 | import numpy as np
72 | import time
73 |
74 | # Print iterations progress
75 | def printProgressBar (iteration, total, prefix = '', suffix = '', decimals = 1, length = 100, fill = '█'):
76 | """
77 | Call in a loop to create terminal progress bar
78 | parameters:
79 | iteration - Required : current iteration (Int)
80 | total - Required : total iterations (Int)
81 | prefix - Optional : prefix string (Str)
82 | suffix - Optional : suffix string (Str)
83 | decimals - Optional : positive number of decimals in percent complete (Int)
84 | length - Optional : character length of bar (Int)
85 | fill - Optional : bar fill character (Str)
86 | """
87 | percent = ("{0:." + str(decimals) + "f}").format(100 * (iteration / float(total)))
88 | filledLength = int(length * iteration // total)
89 | bar = fill * filledLength + '-' * (length - filledLength)
90 | print('\r%s |%s| %s%% %s' % (prefix, bar, percent, suffix), end = '\r')
91 | # Print New Line on Complete
92 | if iteration == total:
93 | print()
94 |
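# Example call (illustrative): update a 50-character bar inside an iteration loop.
#   for i in range(1, total + 1):
#       printProgressBar(i, total, prefix='Progress:', suffix='Complete', length=50)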
95 |
96 | class SGD(object):
97 | """
98 | ==============================================================================
99 | | Stochastic Gradient Descent class |
100 | ==============================================================================
101 | Initialization:
102 | sgd = SGD(obj, grad, eta, param, iter, maxIter, objFun, gradFun,
103 | lowerBound, upperBound, stopGrad, momentum, nesterov,
104 | learnSched, lrParam)
105 |
106 | NOTE: To perform just one iteration provide either grad or gradFn.
107 | obj or objFn are optional.
108 | ==============================================================================
109 | Attributes:
110 | obj: objective (optional input)
111 | grad: Gradient information
112 | (array of dimension nParam-by-1, optional input)
113 | eta: learning rate ( = 1.0, default)
114 | param: the parameter vector (array of dimension nParam-by-1)
115 | nParam: number of parameters
116 | iter: iteration number
117 | maxIter: maximum iteration number (optional, default = 1)
118 | objFun: function handle to evaluate the objective
119 | (not required for maxIter = 1)
120 | gradFun: function handle to evaluate the gradient
121 | (not required for maxIter = 1)
122 | lowerBound: lower bound for the parameters (optional input)
123 | upperBound: upper bound for the parameters (optional input)
124 | paramHist: parameter evolution history
125 | stopGrad: stopping criterion based on 2-norm of gradient vector
126 | momentum: momentum parameter (default = 0)
127 | nesterov: set to True if Nesterov momentum equation to be used
128 | (default = False)
129 | learnSched: learning schedule (constant, exponential or time-based,
130 | default = constant)
131 | lrParam: learning schedule parameter (default =0.1)
132 | alg: algorithm used
133 | __version__:version of the code
134 | ==============================================================================
135 | Methods:
136 | Public:
137 | getParam: returns the parameter values
138 | getObj: returns the current objective value
139 | getGrad: returns the current gradient information
140 | update: perform a single iteration
141 | performIter: perform maxIter number of iterations
142 | getParamHist: returns parameter update history
143 | Private:
144 | __init___: initialization
145 | evaluateObjFn: evaluates the objective function
146 | evaluateGradFn: evaluates the gradients
147 | satisfyBounds: satisfies the parameter bounds
148 | learningSchedule: learning schedule
149 | stopCrit: check stopping criteria
150 | ==============================================================================
151 | Reference: Bottou, Léon, Frank E. Curtis, and Jorge Nocedal.
152 | "Optimization methods for large-scale machine learning."
153 | SIAM Review 60.2 (2018): 223-311.
154 | ==============================================================================
155 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
156 | ==============================================================================
157 | """
158 | def __init__(self,**kwargs):
159 | allowed_kwargs = {'obj', 'grad', 'param', 'eta', 'iter', 'maxiter', 'objFun', 'gradFun', 'lowerBound', 'upperBound', 'oldGrad', 'stopGrad', 'momentum', 'nesterov','learnSched', 'lrParam'}
160 | for k in kwargs:
161 | if k not in allowed_kwargs:
162 | raise TypeError('Unexpected keyword argument passed to optimizer at: ' + str(k))
163 |
164 | self.__dict__.update(kwargs)
165 | self.nParam = np.size(self.param)
166 | # Checks and setting default values
167 | # Iteration numbers
168 | if hasattr(self,'iter') == False:
169 | self.iter = 0 # set the iteration number
170 | self.currentIter = self.iter
171 | # stopping criteria
172 | # max iteration no.
173 | if hasattr(self,'maxiter') == False:
174 | self.maxiter = 1 # set the default max iteration number
175 | # minimum gradient
176 | if hasattr(self,'stopGrad') == False:
177 | self.stopGrad = 1e-6
178 | # Parameter values
179 | if hasattr(self,'param') == False:
180 | raise ValueError('Parameter vector is missing')
181 | # Gradient information
182 | if hasattr(self,'grad') == False:
183 | print('No gradient information provided at iteration: 1')
184 | if hasattr(self,'gradFun') == False:
185 | raise ValueError('Please provide the gradient function')
186 | elif np.size(self.grad) != self.nParam:
187 | raise ValueError('Gradient dimension mismatch')
188 | if self.maxiter > 1 and hasattr(self,'gradFun') == False:
189 | raise ValueError('Please provide the gradient function')
190 | # Objective values
191 | if hasattr(self,'objFun') == False and self.maxiter > 1:
192 | raise ValueError('Please provide the objective function')
193 | if hasattr(self,'obj') == False:
194 | self.obj = np.array([])
195 | if hasattr(self,'objFun'):
196 | self.evaluateObjFn()
197 | else:
198 | self.obj = np.array([self.obj])
199 | # Learning rate
200 | if hasattr(self,'eta') == False:
201 | self.eta = 1.0
202 | print('*NOTE: No learning rate provided, assumed as 1.0')
203 | else:
204 | print('Learning rate = ',self.eta,'\n')
205 | if hasattr(self,'lowerBound') == False:
206 | self.lowerBound = -np.inf*np.ones(self.nParam)
207 | elif np.size(self.lowerBound) == 1:
208 | self.lowerBound = self.lowerBound*np.ones(self.nParam)
209 | elif np.size(self.lowerBound) != self.nParam:
210 | raise ValueError('parameter lower bound dimension mismatch')
211 | # Set the upper bounds
212 | if hasattr(self,'upperBound') == False:
213 | self.upperBound = np.inf*np.ones(self.nParam)
214 | elif np.size(self.upperBound) == 1:
215 | self.upperBound = self.upperBound*np.ones(self.nParam)
216 | elif np.size(self.upperBound) != self.nParam:
217 | raise ValueError('parameter upper bound dimension mismatch')
218 | # Momentum
219 | #self.alg = 'SGD with Momentum'
220 | if hasattr(self,'alg') == False:
221 | self.alg = 'SGD+momentum'
222 | if hasattr(self,'momentum') == False:
223 | self.alg = 'SGD'
224 | self.momentum = 0.0
225 | self.paramHist = np.reshape(self.param,(self.nParam,1))
226 | self.__version__ = '0.0.1'
227 | self.stop = False
228 | self.updateParam = np.zeros(self.nParam)
229 | # Nesterov momentum
230 | if hasattr(self, 'nesterov'):
231 | if self.nesterov == True:
232 | self.alg = 'SGD+Nesterov momentum'
233 | if hasattr(self,'gradFun') == False:
234 | raise ValueError('provide gradient function information with Nesterov')
235 | else:
236 | self.nesterov = False
237 | # learning schedule
238 | if hasattr(self,'learnSched') == False:
239 | self.learnSched = 'constant'
240 | elif self.learnSched != 'exponential' and self.learnSched != 'time-based':
241 | print('no such learning schedule in this module\nSet to constant')
242 | self.learnSched = 'constant'
243 | elif hasattr(self,'lrParam') == False:
244 | self.lrParam = 0.1
245 | print('Learning schedule: ',self.learnSched)
246 |
247 |
248 | def __version__(self):
249 | """
250 | version of the code
251 | """
252 | print(self.__version__)
253 |
254 | def getParam(self):
255 | """
256 | To get the next parameter values
257 | """
258 | print(self.nParam,'parameters have been updated!\n')
259 | return self.param
260 |
261 | def getObj(self):
262 | """
263 | To get the current objective (if possible)
264 | """
265 | self.evaluateObjFn()
266 | return self.obj
267 |
268 | def getGrad(self):
269 | """
270 | To get the gradients
271 | """
272 | return self.grad
273 |
274 | def getParamHist(self):
275 | """
276 | To get parameter history
277 | """
278 | return self.paramHist
279 |
280 | def evaluateObjFn(self):
281 | """
282 | This evaluates the objective function
283 | objFun should be a function handle with input: param, output: objective
284 | """
285 | if hasattr(self,'objFun'):
286 | self.obj = np.append(self.obj,self.objFun(self.param))
287 | else:
288 | print('No objective information provided to SGD')
289 | #print('Current objective value: ', self.obj[self.currentIter],'\n')
290 |
291 | def evaluateGradFn(self):
292 | """
293 | This evaluates the gradient function at the current parameter values
294 | gradFun should be a function handle with input: param, output: gradient
295 | """
296 | self.grad = self.gradFun(self.param)
297 |
298 | def satisfyBounds(self):
299 | """
300 | This satisfies the parameter bounds (if any)
301 | """
302 | # Set the lower bounds
303 | #print(self.lowerBound)
304 |
305 | # Satisfy the bounds
306 | for i in range(self.nParam):
307 | if self.param[i] > self.upperBound[i]:
308 | self.param[i] = self.upperBound[i]
309 | elif self.param[i] < self.lowerBound[i]:
310 | self.param[i] = self.lowerBound[i]
311 |
312 | def update(self):
313 | """
314 | Perform one iteration of SGD
315 | """
316 | # Perform one iteration of SGD
317 | SGD.learningSchedule(self)
318 | if self.nesterov == True:
319 | grdnt = self.gradFun(self.param - self.momentum*self.updateParam)
320 | self.updateParam = self.updateParam*self.momentum + self.etaCurrent*grdnt
321 | else:
322 | self.updateParam = self.updateParam*self.momentum + self.etaCurrent*self.grad
323 | self.param=self.param - self.updateParam
324 | #self.param=self.param - self.eta*self.grad
325 | # satisfy the parameter bounds
326 | SGD.satisfyBounds(self)
327 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
328 | #print('One iteration of Stochatsic Gradient Descent has been performed successfully!\n')
329 |
330 | def performIter(self):
331 | """
332 | Performs all the iterations of SGD
333 | """
334 | SGD.printAlg(self)
335 | # initialize progress bar
336 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
337 | self.t = time.perf_counter() # time.clock() was removed in Python 3.8
338 | for i in range(self.iter,self.maxiter,1):
339 | if self.stop == True:
340 | break
341 | #print('iteration', i+1, 'out of', self.maxiter)
342 | self.update()
343 | self.currentIter = i+1
344 | # print progress bar
345 | SGD.printProgress(self)
346 | # Update the objective and gradient
347 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
348 | SGD.evaluateObjFn(self)
349 | SGD.evaluateGradFn(self)
350 | SGD.stopCrit(self)
351 |
352 | def stopCrit(self):
353 | """
354 | Checks stopping criteria
355 | """
356 | if self.grad.ndim >1:
357 | self.avgGrad = np.mean(self.grad,axis =1)
358 | if np.linalg.norm(self.avgGrad) < self.stopGrad:
[source lines 359-496 of SGD.py are missing from this dump]
497 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
498 | SGD.evaluateObjFn(self)
499 | SGD.evaluateGradFn(self)
500 | SGD.stopCrit(self)
501 |
502 | def getGradHist(self):
503 | """
504 | Returns accumulated gradient history
505 | """
506 | return self.gradHist
507 |
508 | class RMSprop(SGD):
509 | """
510 | ==============================================================================
511 | | RMSprop class |
512 | | derived class from Stochastic Gradient Descent |
513 | ==============================================================================
514 | Initialization:
515 | rp = RMSprop(gradHist, updatehist, rho, obj, grad, eta, param,
516 | iter, maxIter, objFun, gradFun, lowerBound, upperBound)
517 | NOTE: gradHist: historical information of gradients
518 | (array of dimension nparam-by-1)
519 | this should be equal to zero for the 1st iteration
520 | ==============================================================================
521 | Attributes:
522 | grad: Gradient information (array of dimension nParam-by-1)
523 | eta: learning rate = 1 by default
524 | param: the parameter vector (array of dimension nParam-by-1)
525 | nParam: number of parameters
526 | gradHist: gradient history accumulator (see the algorithm)
527 | epsilon: square-root of machine-precision
528 | (required to avoid division by zero)
529 | rho: exponential decay rate (0.95 may be a good choice)
530 | iter: iteration number (optional)
531 | maxIter: maximum iteration number (optional input, default = 1)
532 | objFun: function handle to evaluate the objective
533 | (not required for maxit = 1 )
534 | gradFun: function handle to evaluate the gradient
535 | (not required for maxit = 1 )
536 | lowerBound: lower bound for the parameters (optional input)
537 | upperBound: upper bound for the parameters (optional input)
538 | stopGrad: stopping criterion based on 2-norm of gradient vector
539 | (default 10^-6)
540 | alg: algorithm used
541 | __version__: version of the code
542 | ==============================================================================
543 | Methods:
544 | Public:
545 | performIter:performs all the iterations inside a for loop
546 | getGradHist:returns gradient history (default is zero)
547 | Inherited:
548 | getParam: returns the parameter values
549 | getObj: returns the current objective value
550 | getGrad: returns the current gradient information
551 | getParamHist: returns parameter update history
552 | Private: (should not be called outside this class file)
553 | __init__: initialization
554 | update: performs one iteration of Adadelta
555 | Inherited:
556 | evaluateObjFn: evaluates the objective function
557 | evaluateGradFn: evaluates the gradients
558 | satisfyBounds: satisfies the parameter bounds
559 | learningSchedule: learning schedule
560 | stopCrit: check stopping criteria
561 | ==============================================================================
562 | Reference: Geoffrey Hinton
563 | "rmsprop: Divide the gradient by a running average of its recent magnitude."
564 | http://www.cs.toronto.edu/~tijmen/csc321/slides/lecture_slides_lec6.pdf.
565 | ==============================================================================
566 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
567 | ==============================================================================
568 | """
569 | def __init__(self,gradHist=0.0,rho=0.9,**kwargs):
570 | """ Initialize the Adadelta class object.
571 | This can be used to perform one iteration of Adadelta.
572 | """
573 | self.alg = 'RMSprop'
574 | SGD.printAlg(self)
575 | SGD.__init__(self,**kwargs)
576 | self.epsilon=np.finfo(float).eps # The machine precision
577 | # Initialize gradient history
578 | if np.sum(gradHist) != 0.0:
579 | if np.size(gradHist) != self.nParam:
580 | raise ValueError('Gradient history dimension mismatch')
581 | else:
582 | self.gradHist=np.reshape(gradHist,(self.nParam))
583 | else:
584 | self.gradHist = np.zeros(self.nParam)
585 | # Initialize rho
586 | self.rho = rho
587 |
588 | def update(self):
589 | """
590 | Perform one iteration of RMSprop
591 | """
592 | # update the gradient history accumulator
593 | SGD.learningSchedule(self)
594 | self.gradHist = self.rho*self.gradHist+(1.0-self.rho)*np.multiply(self.grad,self.grad) # exponentially decaying average of squared gradients
595 | # Perform one iteration of RMSprop
596 | RMSg = np.sqrt(self.gradHist)+self.epsilon
597 | updateParam = ((np.divide(self.grad,RMSg)))
598 | self.param=self.param-self.etaCurrent*updateParam
599 | SGD.satisfyBounds(self)
600 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
601 | #print('One iteration of RMSprop has been performed successfully!\n')
602 |
603 | def performIter(self):
604 | """
605 | Performs all the iterations of RMSprop
606 | """
607 | # initialize progress bar
608 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
609 | self.t = time.perf_counter()
610 | for i in range(self.iter,self.maxiter,1):
611 | if self.stop == True:
612 | break
613 | #print('iteration', i+1, 'out of', self.maxiter)
614 | self.update()
615 | self.currentIter = i+1
616 | # print progress bar
617 | SGD.printProgress(self)
618 | # Update the objective and gradient
619 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
620 | SGD.evaluateObjFn(self)
621 | SGD.evaluateGradFn(self)
622 | SGD.stopCrit(self)
623 |
624 | def getGradHist(self):
625 | """
626 | This returns the gradient history
627 | """
628 | return self.gradHist
629 |
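# Illustrative sketch (not part of the original module): RMSprop on an assumed
# shifted quadratic, with the decay rate rho passed explicitly; handles and
# settings are placeholders.
def _rmsprop_usage_sketch():
    objFun = lambda x: np.sum((x - 1.0)**2)
    gradFun = lambda x: 2.0*(x - 1.0)
    x0 = np.array([4.0, -3.0])
    rp = RMSprop(rho=0.9, param=x0, obj=objFun(x0), grad=gradFun(x0),
                 eta=0.05, maxiter=300, objFun=objFun, gradFun=gradFun)
    rp.performIter()
    return rp.getParam(), rp.getGradHist()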
630 | class Adam(SGD):
631 | """
632 | ==============================================================================
633 | | Adaptive moment estimation (Adam) class |
634 | | derived class from Stochastic Gradient Descent |
635 | ==============================================================================
636 | Initialization:
637 | adm = Adam(m, v, beta1, beta2, obj, grad, eta, param,
638 | iter, maxIter, objFun, gradFun, lowerBound, upperBound)
639 |
640 | ==============================================================================
641 | Attributes:
642 | grad: Gradient information (array of dimension nParam-by-1)
643 | eta: learning rate
644 | param: the parameter vector (array of dimension nParam-by-1)
645 | nParam: number of parameters
646 | beta1, beta2: exponential decay rates in [0,1)
647 | (default beta1 = 0.9, beta2 = 0.999)
648 | m: First moment (array of dimension nParam-by-1)
649 | v: Second raw moment (array of dimension nParam-by-1)
650 | epsilon: square-root of machine-precision
651 | (required to avoid division by zero)
652 | iter: iteration number
653 | maxIter: maximum iteration number (optional input, default = 1)
654 | objFun: function handle to evaluate the objective
655 | (not required for maxit = 1 )
656 | gradFun: function handle to evaluate the gradient
657 | (not required for maxit = 1 )
658 | lowerBound: lower bound for the parameters (optional input)
659 | upperBound: upper bound for the parameters (optional input)
660 | stopGrad: stopping criterion based on 2-norm of gradient vector
661 | (default 10^-6)
662 | alg: algorithm used
663 | __version__: version of the code
664 | ==============================================================================
665 | Methods:
666 | Public:
667 | performIter: performs all the iterations inside a for loop
668 | getGradHist: returns gradient history (default is zero)
669 | getMoments: returns history of moments
670 | Inherited:
671 | getParam: returns the parameter values
672 | getObj: returns the current objective value
673 | getGrad: returns the current gradient information
674 | getParamHist: returns parameter update history
675 | Private: (should not be called outside this class file)
676 | __init__: initialization
677 | update: performs one iteration of Adam
678 | Inherited:
679 | evaluateObjFn: evaluates the objective function
680 | evaluateGradFn: evaluates the gradients
681 | satisfyBounds: satisfies the parameter bounds
682 | learningSchedule: learning schedule
683 | stopCrit: check stopping criteria
684 | ==============================================================================
685 | Reference: Kingma, Diederik P., and Jimmy Ba.
686 | "Adam: A method for stochastic optimization."
687 | arXiv preprint arXiv:1412.6980 (2014).
688 | ==============================================================================
689 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
690 | ==============================================================================
691 | """
692 | def __init__(self,m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,**kwargs):
693 | # def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
694 | """ Initialize the Adam class object.
695 | This can be used to perform one iteration of Adam.
696 | """
697 | self.alg = 'Adam'
698 | SGD.printAlg(self)
699 | self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
700 | self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggestion)
701 | self.epsilon=np.finfo(float).eps # The machine precision
702 | SGD.__init__(self,**kwargs)
703 | # Initialize first moment
704 | if np.sum(m) != 0.0:
705 | if np.size(m) != self.nParam:
706 | raise ValueError('First moment dimension mismatch')
707 | else:
708 | self.m=np.reshape(m,(self.nParam))
709 | else:
710 | self.m = np.zeros(self.nParam)
711 | # Initialize second raw moment
712 | if np.sum(v) != 0.0:
713 | if np.size(v) != self.nParam:
714 | raise ValueError('Second raw moment dimension mismatch')
715 | else:
716 | self.v=np.reshape(v,(self.nParam))
717 | else:
718 | self.v = np.zeros(self.nParam)
719 |
720 | def update(self):
721 | """ Perform one iteration of Adam
722 | """
723 | SGD.learningSchedule(self)
724 | # Moment updates
725 | self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
726 | self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
727 | #print(self.mHat)
728 | self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
729 | self.vHat = self.v/(1.0-self.beta2**(self.currentIter+1)) # Compute bias-corrected second moment estimate
730 | # Parameter updates
731 | self.param = self.param - np.divide((self.etaCurrent*self.mHat),(np.sqrt(self.vHat))+self.epsilon)
732 | SGD.satisfyBounds(self)
733 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
734 | #print('One iteration of Adam has been performed successfully!\n')
735 |
736 | def performIter(self):
737 | """
738 | Performs all the iterations of Adam
739 | """
740 | # initialize progress bar
741 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
742 | self.t = time.perf_counter()
743 | for i in range(self.iter,self.maxiter,1):
744 | if self.stop == True:
745 | break
746 | #print('iteration', i+1, 'out of', self.maxiter)
747 | self.update()
748 | self.currentIter = i+1
749 | # print progress bar
750 | SGD.printProgress(self)
751 | # Update the objective and gradient
752 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
753 | SGD.evaluateObjFn(self)
754 | SGD.evaluateGradFn(self)
755 | SGD.stopCrit(self)
756 |
757 | def getMoments(self):
758 | """
759 | This returns the updated moments
760 | """
761 | return self.m, self.v
762 |
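# Illustrative sketch (not part of the original module): Adam with the usual
# decay rates beta1 = 0.9 and beta2 = 0.999 on an assumed convex toy problem;
# the handles and step size are placeholders.
def _adam_usage_sketch():
    objFun = lambda x: np.sum(x**2) + x[0]*x[1]
    gradFun = lambda x: np.array([2.0*x[0] + x[1], 2.0*x[1] + x[0]])
    x0 = np.array([2.0, -1.0])
    adm = Adam(beta1=0.9, beta2=0.999, param=x0, obj=objFun(x0), grad=gradFun(x0),
               eta=0.1, maxiter=500, objFun=objFun, gradFun=gradFun)
    adm.performIter()
    return adm.getParam(), adm.getMoments()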
763 | class Adamax(SGD):
764 | """
765 | ==============================================================================
766 | | Adaptive moment estimation (Adamax) class |
767 | | derived class from Stochastic Gradient Descent |
768 | ==============================================================================
769 | Initialization:
770 | admx = Adamax(m, v, beta1, beta2, obj, grad, eta, param,
771 | iter, maxIter, objFun, gradFun, lowerBound, upperBound)
772 |
773 | ==============================================================================
774 | Attributes: (all private)
775 | grad: Gradient information (array of dimension nParam-by-1)
776 | eta: learning rate
777 | param: the parameter vector (array of dimension nParam-by-1)
778 | nParam: number of parameters
779 | beta1, beta2: exponential decay rates in [0,1)
780 | (default beta1 = 0.9, beta2 = 0.999)
781 | m: First moment (array of dimension nParam-by-1)
782 | u: infinity norm constrained second moment
783 | (array of dimension nParam-by-1)
784 | epsilon: square-root of machine-precision
785 | (required to avoid division by zero)
786 | iter: iteration number
787 | maxIter: maximum iteration number (optional input, default = 1)
788 | objFun: function handle to evaluate the objective
789 | (not required for maxit = 1 )
790 | gradFun: function handle to evaluate the gradient
791 | (not required for maxit = 1 )
792 | lowerBound: lower bound for the parameters (optional input)
793 | upperBound: upper bound for the parameters (optional input)
794 | stopGrad: stopping criterion based on 2-norm of gradient vector
795 | (default 10^-6)
796 | alg: algorithm used
797 | __version__: version of the code
798 | ==============================================================================
799 | Methods:
800 | Public:
801 | performIter: performs all the iterations inside a for loop
802 | getGradHist: returns gradient history (default is zero)
803 | getMoments: returns history of moments
804 | Inherited:
805 | getParam: returns the parameter values
806 | getObj: returns the current objective value
807 | getGrad: returns the current gradient information
808 | getParamHist: returns parameter update history
809 | Private: (should not be called outside this class file)
810 | __init__: initialization
811 | update: performs one iteration of Adam
812 | Inherited:
813 | evaluateObjFn: evaluates the objective function
814 | evaluateGradFn: evaluates the gradients
815 | satisfyBounds: satisfies the parameter bounds
816 | learningSchedule: learning schedule
817 | stopCrit: check stopping criteria
818 | ==============================================================================
819 | Reference: Kingma, Diederik P., and Jimmy Ba.
820 | "Adam: A method for stochastic optimization."
821 | arXiv preprint arXiv:1412.6980 (2014).
822 | ==============================================================================
823 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
824 | ==============================================================================
825 | """
826 | def __init__(self,m = 0.0,u = 0.0,beta1 = 0.9,beta2 = 0.999,**kwargs):
827 | # def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
828 | """ Initialize the Adamax class object.
829 | This can be used to perform one iteration of Adamax.
830 | """
831 | self.alg = 'Adamax'
832 | SGD.printAlg(self)
833 | self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
834 | self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggestion)
835 | self.epsilon=np.finfo(float).eps # The machine precision
836 | SGD.__init__(self,**kwargs)
837 | # Initialize first moment
838 | if np.sum(m) != 0.0:
839 | if np.size(m) != self.nParam:
840 | raise ValueError('First moment dimension mismatch')
841 | else:
842 | self.m=np.reshape(m,(self.nParam))
843 | else:
844 | self.m = np.zeros(self.nParam)
845 | # Initialize second raw moment
846 | if np.sum(u) != 0.0:
847 | if np.size(u) != self.nParam:
848 | raise ValueError('Second raw moment dimension mismatch')
849 | else:
850 | self.u=np.reshape(u,(self.nParam))
851 | else:
852 | self.u = np.zeros(self.nParam)
853 |
854 | def update(self):
855 | """ Perform one iteration of Adamax
856 | """
857 | SGD.learningSchedule(self)
858 | # Moment updates
859 | self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
860 | self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
861 | self.u = np.maximum(self.beta2*self.u,np.abs(self.grad))
862 | # self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
863 | # Parameter updates
864 | self.param = self.param - np.divide((self.etaCurrent*self.mHat),self.u)
865 | SGD.satisfyBounds(self)
866 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
867 | #print('One iteration of Adamax has been performed successfully!\n')
868 |
869 | def performIter(self):
870 | """
871 | Performs all the iterations of Adamax
872 | """
873 | # initialize progress bar
874 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
875 | self.t = time.perf_counter()
876 | for i in range(self.iter,self.maxiter,1):
877 | if self.stop == True:
878 | break
879 | #print('iteration', i+1, 'out of', self.maxiter)
880 | self.update()
881 | self.currentIter = i+1
882 | # print progress bar
883 | SGD.printProgress(self)
884 | # Update the objective and gradient
885 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
886 | SGD.evaluateObjFn(self)
887 | SGD.evaluateGradFn(self)
888 | SGD.stopCrit(self)
889 |
890 | def getMoments(self):
891 | """
892 | This returns the updated moments
893 | """
894 | return self.m, self.u
895 |
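# Illustrative sketch (not part of the original module): Adamax is called the
# same way as Adam; internally the second raw moment is replaced by the
# infinity-norm accumulator u. The default handles below are assumptions.
def _adamax_usage_sketch(objFun=lambda x: np.sum(x**2),
                         gradFun=lambda x: 2.0*x,
                         x0=np.array([1.0, -1.0])):
    admx = Adamax(beta1=0.9, beta2=0.999, param=x0, obj=objFun(x0),
                  grad=gradFun(x0), eta=0.1, maxiter=400,
                  objFun=objFun, gradFun=gradFun)
    admx.performIter()
    return admx.getParam()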
896 | class Adadelta(SGD):
897 | """
898 | ==============================================================================
899 | | ADADELTA class |
900 | | derived class from Stochastic Gradient Descent |
901 | ==============================================================================
902 | Initialization:
903 | add = Adadelta(gradHist, updatehist, rho, obj, grad, eta, param,
904 | iter, maxIter, objFun, gradFun, lowerBound, upperBound)
905 | NOTE: gradHist: historical information of gradients
906 | (array of dimension nparam-by-1)
907 | this should be equal to zero for the 1st iteration
908 | ==============================================================================
909 | Attributes: (all private)
910 | grad: Gradient information (array of dimension nParam-by-1)
911 | eta: learning rate = 1 by default
912 | param: the parameter vector (array of dimension nParam-by-1)
913 | nParam: number of parameters
914 | gradHist: gradient history accumulator (see the algorithm)
915 | updateHist: parameter update history accumulator
916 | epsilon: square-root of machine-precision
917 | (required to avoid division by zero)
918 | rho: exponential decay rate (0.95 may be a good choice)
919 | iter: iteration number (optional)
920 | maxIter: maximum iteration number (optional input, default = 1)
921 | objFun: function handle to evaluate the objective
922 | (not required for maxit = 1 )
923 | gradFun: function handle to evaluate the gradient
924 | (not required for maxit = 1 )
925 | lowerBound: lower bound for the parameters (optional input)
926 | upperBound: upper bound for the parameters (optional input)
927 | stopGrad: stopping criterion based on 2-norm of gradient vector
928 | (default 10^-6)
929 | alg: algorithm used
930 | __version__: version of the code
931 | ==============================================================================
932 | Methods:
933 | Public:
934 | performIter:performs all the iterations inside a for loop
935 | getGradHist:returns gradient history (default is zero)
936 | Inherited:
937 | getParam: returns the parameter values
938 | getObj: returns the current objective value
939 | getGrad: returns the current gradient information
940 | getParamHist: returns parameter update history
941 | Private: (should not be called outside this class file)
942 | __init__: initialization
943 | update: performs one iteration of Adadelta
944 | Inherited:
945 | evaluateObjFn: evaluates the objective function
946 | evaluateGradFn: evaluates the gradients
947 | satisfyBounds: satisfies the parameter bounds
948 | learningSchedule: learning schedule
949 | stopCrit: check stopping criteria
950 | ==============================================================================
951 | Reference: Zeiler, Matthew D.
952 | "Adadelta: an adaptive learning rate method."
953 | arXiv preprint arXiv:1212.5701 (2012).
954 | ==============================================================================
955 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
956 | ==============================================================================
957 | """
958 | def __init__(self,gradHist=0.0,updateHist=0.0,rho=0.95,**kwargs):
959 | """ Initialize the Adadelta class object.
960 | This can be used to perform one iteration of Adadelta.
961 | """
962 | self.alg = 'Adadelta'
963 | SGD.printAlg(self)
964 | SGD.__init__(self,**kwargs)
965 | self.epsilon=np.finfo(float).eps # The machine precision
966 | # Initialize gradient history
967 | if np.sum(gradHist) != 0.0:
968 | if np.size(gradHist) != self.nParam:
969 | raise ValueError('Gradient history dimension mismatch')
970 | else:
971 | self.gradHist=np.reshape(gradHist,(self.nParam))
972 | else:
973 | self.gradHist = np.zeros(self.nParam)
974 | # Initialize parameter history
975 | if np.sum(updateHist) != 0.0:
976 | if np.size(updateHist) != self.nParam:
977 | raise ValueError('Update history dimension mismatch')
978 | else:
979 | self.updateHist=np.reshape(updateHist,(self.nParam))
980 | else:
981 | self.updateHist = np.zeros(self.nParam)
982 | # Initialize rho
983 | self.rho = rho
984 | # Set eta to 1.0
985 | if self.eta!=1.0:
986 | print('Learning rate = ',self.eta,'!= 1.0\nSo, the learning rate is set to 1.0\n')
987 | self.eta = 1.0
988 |
989 | def update(self):
990 | """
991 | Perform one iteration of Adadelta
992 | """
993 | # use a larger epsilon during the early iterations, then tighten it
994 | if self.currentIter < 200:
995 | self.epsilon = 0.1
996 | else:
997 | self.epsilon = 1e-6
998 | SGD.learningSchedule(self)
999 | # update the gradient history accumulator
1000 | self.gradHist = self.rho*self.gradHist+(1.0-self.rho)*np.multiply(self.grad,self.grad) # exponentially decaying average of squared gradients
1001 | # Perform one iteration of Adadelta
1002 | RMSdx = np.sqrt(self.updateHist)+self.epsilon
1003 | RMSg = np.sqrt(self.gradHist)+self.epsilon
1004 | updateParam = np.multiply((np.divide(RMSdx,RMSg)),self.grad)
1005 | self.param=self.param-self.etaCurrent*updateParam
1006 | SGD.satisfyBounds(self)
1007 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
1008 | #print('One iteration of Adadelta has been performed successfully!\n')
1009 | # update parameter history accumulator
1010 | self.updateHist = self.rho*self.updateHist+(1.0-self.rho)*np.multiply(updateParam,updateParam)
1011 |
1012 | def performIter(self):
1013 | """
1014 | Performs all the iterations of Adadelta
1015 | """
1016 | # initialize progress bar
1017 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1018 | self.t = time.perf_counter()
1019 | for i in range(self.iter,self.maxiter,1):
1020 | if self.stop == True:
1021 | break
1022 | #print('iteration', i+1, 'out of', self.maxiter)
1023 | self.update()
1024 | self.currentIter = i+1
1025 | # print progress bar
1026 | SGD.printProgress(self)
1027 | # Update the objective and gradient
1028 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1029 | SGD.evaluateObjFn(self)
1030 | SGD.evaluateGradFn(self)
1031 | SGD.stopCrit(self)
1032 |
1033 | def getGradHist(self):
1034 | """
1035 | This returns the gradient history
1036 | """
1037 | return self.gradHist
1038 |
1039 | def getUpdateHist(self):
1040 | """
1041 | This returns the parameter update history
1042 | """
1043 | return self.updateHist
1044 |
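# Illustrative sketch (not part of the original module): Adadelta maintains
# running averages of squared gradients and squared updates and forces the
# learning rate to 1.0, so no eta is passed; the handles are assumptions.
def _adadelta_usage_sketch():
    objFun = lambda x: 0.5*np.sum(x**2)
    gradFun = lambda x: x
    x0 = np.array([3.0, -2.0])
    add = Adadelta(rho=0.95, param=x0, obj=objFun(x0), grad=gradFun(x0),
                   maxiter=500, objFun=objFun, gradFun=gradFun)
    add.performIter()
    return add.getParam(), add.getUpdateHist()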
1045 | class Nadam(SGD):
1046 | """
1047 | ==============================================================================
1048 | | Nesterov-accelerated Adaptive moment estimation (Nadam) class |
1049 | | derived class from Stochastic Gradient Descent |
1050 | ==============================================================================
1051 | Initialization:
1052 | nadm = Nadam(m, v, beta1, beta2, obj, grad, eta, param, iter,
1053 | maxIter, objFun, gradFun, lowerBound, upperBound)
1054 |
1055 | ==============================================================================
1056 | Attributes: (all private)
1057 | grad: Gradient information (array of dimension nParam-by-1)
1058 | eta: learning rate
1059 | param: the parameter vector (array of dimension nParam-by-1)
1060 | nParam: number of parameters
1061 | beta1, beta2: exponential decay rates in [0,1)
1062 | (default beta1 = 0.9, beta2 = 0.999)
1063 | m: First moment (array of dimension nParam-by-1)
1064 | v: Second raw moment (array of dimension nParam-by-1)
1065 | epsilon: square-root of machine-precision
1066 | (required to avoid division by zero)
1067 | iter: iteration number
1068 | maxIter: maximum iteration number (optional input, default = 1)
1069 | objFun: function handle to evaluate the objective
1070 | (not required for maxit = 1 )
1071 | gradFun: function handle to evaluate the gradient
1072 | (not required for maxit = 1 )
1073 | lowerBound: lower bound for the parameters (optional input)
1074 | upperBound: upper bound for the parameters (optional input)
1075 | stopGrad: stopping criterion based on 2-norm of gradient vector
1076 | (default 10^-6)
1077 | alg: algorithm used
1078 | __version__: version of the code
1079 | ==============================================================================
1080 | Methods:
1081 | Public:
1082 | performIter: performs all the iterations inside a for loop
1083 | getGradHist: returns gradient history (default is zero)
1084 | getMoments: returns history of moments
1085 | Inherited:
1086 | getParam: returns the parameter values
1087 | getObj: returns the current objective value
1088 | getGrad: returns the current gradient information
1089 | getParamHist: returns parameter update history
1090 | Private: (should not be called outside this class file)
1091 | __init__: initialization
1092 | update: performs one iteration of Adam
1093 | Inherited:
1094 | evaluateObjFn: evaluates the objective function
1095 | evaluateGradFn: evaluates the gradients
1096 | satisfyBounds: satisfies the parameter bounds
1097 | learningSchedule: learning schedule
1098 | stopCrit: check stopping criteria
1099 | ==============================================================================
1100 | Reference: Timothy Dozat.
1101 | "Incorporating Nesterov Momentum into Adam".
1102 | ICLR Workshop, (1):2013–2016, 2016.
1103 | ==============================================================================
1104 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1105 | ==============================================================================
1106 | """
1107 | def __init__(self,m = 0.0,v = 0.0,beta1 = 0.9,beta2 = 0.999,**kwargs):
1108 | # def __init__(self,grad,learningRate,parameters,numParam,gradHist,beta1,beta2):
1109 | """ Initialize the Nadam class object.
1110 | This can be used to perform one iteration of Nadam.
1111 | """
1112 | self.alg = 'Nadam'
1113 | SGD.printAlg(self)
1114 | self.beta1 = beta1 # decay rate (beta1 = 0.9 is a good suggestion)
1115 | self.beta2 = beta2 # decay rate (beta2 = 0.999 is a good suggestion)
1116 | self.epsilon=np.finfo(float).eps # The machine precision
1117 | SGD.__init__(self,**kwargs)
1118 | # Initialize first moment
1119 | if np.sum(m) != 0.0:
1120 | if np.size(m) != self.nParam:
1121 | raise ValueError('First moment dimension mismatch')
1122 | else:
1123 | self.m=np.reshape(m,(self.nParam))
1124 | else:
1125 | self.m = np.zeros(self.nParam)
1126 | # Initialize second raw moment
1127 | if np.sum(v) != 0.0:
1128 | if np.size(v) != self.nParam:
1129 | raise ValueError('Second raw moment dimension mismatch')
1130 | else:
1131 | self.v=np.reshape(v,(self.nParam))
1132 | else:
1133 | self.v = np.zeros(self.nParam)
1134 |
1135 |
1136 | def update(self):
1137 | """
1138 | Perform one iteration of Nadam
1139 | """
1140 | SGD.learningSchedule(self)
1141 | # Moment updates
1142 | self.m = self.beta1*self.m + (1.0-self.beta1)*self.grad # Update biased first moment estimate
1143 | self.mHat = self.m/(1.0-self.beta1**(self.currentIter+1)) # Compute bias-corrected first moment estimate
1144 | self.v = self.beta2*self.v + (1.0-self.beta2)*np.multiply(self.grad,self.grad) # Update biased second moment estimate
1145 | self.vHat = self.v/(1.0-self.beta2**(self.currentIter+1)) # Compute bias-corrected second moment estimate
1146 | # Parameter updates
1147 | mHat2 = self.beta1*self.mHat+(1.0-self.beta1)*self.grad/(1.0-self.beta1**(self.currentIter+1))
1148 | self.param = self.param - np.divide((self.etaCurrent*mHat2),(np.sqrt(self.vHat))+self.epsilon)
1149 | SGD.satisfyBounds(self)
1150 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
1151 | #print('One iteration of Nadam has been performed successfully!\n')
1152 |
1153 | def performIter(self):
1154 | """
1155 | Performs all the iterations of Nadam
1156 | """
1157 | # initialize progress bar
1158 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1159 | self.t = time.perf_counter()
1160 | for i in range(self.iter,self.maxiter,1):
1161 | if self.stop == True:
1162 | break
1163 | #print('iteration', i+1, 'out of', self.maxiter)
1164 | self.update()
1165 | self.currentIter = i+1
1166 | # print progress bar
1167 | SGD.printProgress(self)
1168 | # Update the objective and gradient
1169 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1170 | SGD.evaluateObjFn(self)
1171 | SGD.evaluateGradFn(self)
1172 | SGD.stopCrit(self)
1173 |
1174 | def getMoments(self):
1175 | """
1176 | This returns the updated moments
1177 | """
1178 | return self.m, self.v
1179 |
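# Illustrative sketch (not part of the original module): Nadam is constructed
# exactly like Adam; the Nesterov-style correction happens inside update().
# The default handles below are assumptions.
def _nadam_usage_sketch(objFun=lambda x: np.sum(x**2),
                        gradFun=lambda x: 2.0*x,
                        x0=np.array([2.0, 0.5])):
    nadm = Nadam(beta1=0.9, beta2=0.999, param=x0, obj=objFun(x0),
                 grad=gradFun(x0), eta=0.05, maxiter=400,
                 objFun=objFun, gradFun=gradFun)
    nadm.performIter()
    return nadm.getParam()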
1180 | class SAG(SGD):
1181 | """
1182 | ==============================================================================
1183 | | Stochastic Average Gradient (SAG) class |
1184 | | derived class from Stochastic Gradient Descent |
1185 | ==============================================================================
1186 | Initialization:
1187 | sag = SAG(nSamples, nTotSamples, fullGrad = 0.0, obj, grad, eta, param,
1188 | iter, maxIter, objFun, gradFun, lowerBound, upperBound)
1189 |
1190 | ==============================================================================
1191 | Attributes: (all private)
1192 | fullGrad: Full gradient information
1193 | (array of dimension nParam-by-nTotSamples)
1194 | eta: learning rate
1195 | param: the parameter vector (array of dimension nParam-by-1)
1196 | nParam: number of parameters
1197 | nTotSamples: total number of samples
1198 | nSamples: number of gradients updated at each iteration
1199 | iter: iteration number (optional)
1200 | maxIter: maximum iteration number (optional input, default = 1)
1201 | objFun: function handle to evaluate the objective
1202 | (not required for maxit = 1 )
1203 | gradFun: function handle to evaluate the gradient
1204 | (not required for maxit = 1 )
1205 | lowerBound: lower bound for the parameters (optional input)
1206 | upperBound: upper bound for the parameters (optional input)
1207 | stopGrad: stopping criterion based on 2-norm of gradient vector
1208 | (default 10^-6)
1209 | learnSched: learning schedule (constant, exponential or time-based,
1210 | default = constant)
1211 | lrParam: learning schedule parameter (default =0.1)
1212 | alg: algorithm used
1213 | __version__: version of the code
1214 | ==============================================================================
1215 | Methods:
1216 | Public:
1217 | performIter:performs all the iterations inside a for loop
1218 | getGradHist:returns gradient history (default is zero)
1219 | Inherited:
1220 | getParam: returns the parameter values
1221 | getObj: returns the current objective value
1222 | getGrad: returns the current gradient information
1223 | getParamHist: returns parameter update history
1224 | Private: (should not be called outside this class file)
1225 | __init__: initialization
1226 | update: performs one iteration of SAG
1227 | Inherited:
1228 | evaluateObjFn: evaluates the objective function
1229 | evaluateGradFn: evaluates the gradients
1230 | satisfyBounds: satisfies the parameter bounds
1231 | learningSchedule: learning schedule
1232 | stopCrit: check stopping criteria
1233 | ==============================================================================
1234 | Reference: Roux, Nicolas L., Mark Schmidt, and Francis R. Bach.
1235 | "A stochastic gradient method with an exponential convergence rate
1236 | for finite training sets."
1237 | Advances in neural information processing systems. 2012.
1238 | ==============================================================================
1239 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1240 | ==============================================================================
1241 | """
1242 | def __init__(self,nSamples,nTotSamples,fullGrad =0.0,**kwargs):
1243 | """ Initialize the SAG class object.
1244 | This can be used to perform one iteration of SAG.
1245 | """
1246 | self.alg = 'SAG'
1247 | SGD.printAlg(self)
1248 | grad = fullGrad
1249 | SGD.__init__(self,**kwargs)
1250 | # Assign total number of samples
1251 | if type(nTotSamples) != int:
1252 | raise TypeError('nTotSamples not an integer value')
1253 | else:
1254 | self.nTotSamples = nTotSamples
1255 | # Assign number of samples to be replaced at each iteration
1256 | if type(nSamples) != int:
1257 | raise TypeError('nSamples not an integer value')
1258 | else:
1259 | self.nSamples = nSamples
1260 | # Initialize gradients
1261 | if np.sum(fullGrad) != 0:
1262 | if np.size(fullGrad)/nTotSamples != self.nParam:
1263 | raise ValueError('Full gradient dimension mismatch')
1264 | else:
1265 | self.fullGrad = np.reshape(fullGrad,(self.nParam,self.nTotSamples))
1266 | else:
1267 | self.fullGrad = np.zeros((self.nParam,self.nTotSamples))
1268 | try:
1269 | self.gradFun
1270 | except AttributeError:
1271 | raise ValueError('Please provide the gradient function')
1272 | self.fullGrad, nprime = self.gradFun(self.param,self.nTotSamples)
1273 | self.grad = self.fullGrad
1274 |
1275 | def update(self):
1276 | """
1277 | Perform one iteration of SAG
1278 | """
1279 | if hasattr(self,'gradFun'):
1280 | batchGrad,nprime = self.gradFun(self.param,self.nSamples)
1281 | else:
1282 | nprime = np.random.choice(range(self.nTotSamples), self.nSamples, replace = False)
1283 | batchGrad = self.fullGrad[:,nprime]
1284 | # Perform one iteration of SAG
1285 | for i in range(self.nSamples):
1286 | #self.evaluateGradFn()
1287 | self.fullGrad[:,nprime[i]] = batchGrad[:,i]
1288 |
1289 | SGD.learningSchedule(self)
1290 | self.param=self.param-self.etaCurrent*np.mean(self.fullGrad,1)
1291 | #print(np.mean(self.fullGrad,1),self.param)
1292 | SGD.satisfyBounds(self)
1293 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
1294 | #print('One iteration of SAG has been performed successfully!\n')
1295 |
1296 | def performIter(self):
1297 | """
1298 | Performs all the iterations of SAG
1299 | """
1300 | # initialize progress bar
1301 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1302 | self.t = time.perf_counter()
1303 | for i in range(self.iter,self.maxiter,1):
1304 | if self.stop == True:
1305 | break
1306 | #print('iteration', i+1, 'out of', self.maxiter)
1307 | self.update()
1308 | self.currentIter = i+1
1309 | # print progress bar
1310 | SGD.printProgress(self)
1311 | # Update the objective and gradient
1312 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1313 | SGD.evaluateObjFn(self)
1314 | SGD.stopCrit(self)
1315 |
1316 |
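# Illustrative sketch (not part of the original module): for SAG the gradient
# handle takes (param, n) and returns per-sample gradients together with the
# sampled indices. The random least-squares data below are placeholders.
def _sag_usage_sketch(nTot=100, nBatch=5):
    rng = np.random.default_rng(0)
    A = rng.standard_normal((nTot, 2))
    b = rng.standard_normal(nTot)
    objFun = lambda x: np.mean((A.dot(x) - b)**2)
    def gradFun(x, n):
        idx = np.random.choice(nTot, n, replace=False)
        g = 2.0*A[idx, :].T*(A[idx, :].dot(x) - b[idx])  # nParam-by-n per-sample gradients
        return g, idx
    sag = SAG(nBatch, nTot, param=np.zeros(2), eta=0.05, maxiter=200,
              objFun=objFun, gradFun=gradFun)
    sag.performIter()
    return sag.getParam()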
1317 | class minibatchSGD(SGD):
1318 | """
1319 | ==============================================================================
1320 | | minibatch SGD class |
1321 | | derived class from Stochastic Gradient Descent |
1322 | ==============================================================================
1323 | Initialization:
1324 | mbsgd = minibatchSGD(nSamples, nTotSamples,newGrad = 0.0,
1325 | obj, grad, eta, param, iter, maxiter,
1326 | objFun, gradFun, lowerBound, upperBound)
1327 |
1328 | ==============================================================================
1329 | Attributes:
1330 | alg: minibatchSGD
1331 | eta: learning rate
1332 | param: the parameter vector (array of dimension nParam-by-1)
1333 | nParam: number of parameters
1334 | newGrad: gradient information
1335 | (array of dimension nParam-by-nSamples)
1336 | nSamples: number of gradients updated at each iteration
1337 | iter: iteration number (optional)
1338 | maxIter: maximum iteration number (optional input, default = 1)
1339 | objFun: function handle to evaluate the objective
1340 | (not required for maxit = 1 )
1341 | gradFun: function handle to evaluate the gradient
1342 | (not required for maxit = 1 )
1343 | lowerBound: lower bound for the parameters (optionalinput)
1344 | upperBound: upper bound for the parameters (optional input)
1345 | stopGrad: stopping criterion based on 2-norm of gradient vector
1346 | (default 10^-6)
1347 | learnSched: learning schedule (constant, exponential or time-based,
1348 | default = constant)
1349 | lrParam: learning schedule parameter (default =0.1)
1350 | alg: algorithm used
1351 | __version__: version of the code
1352 | ==============================================================================
1353 | Methods:
1354 | Public:
1355 | performIter: performs all the iterations inside a for loop
1356 | getGradHist: returns gradient history (default is zero)
1357 | Inherited:
1358 | getParam: returns the parameter values
1359 | getObj: returns the current objective value
1360 | getGrad: returns the current gradient information
1361 | getParamHist: returns parameter update history
1362 | Private: (should not be called outside this class file)
1363 | __init__: initialization
1364 | update: performs one iteration of minibatch SGD
1365 | Inherited:
1366 | evaluateObjFn: evaluates the objective function
1367 | evaluateGradFn: evaluates the gradients
1368 | satisfyBounds: satisfies the parameter bounds
1369 | learningSchedule: learning schedule
1370 | stopCrit: check stopping criteria
1371 | ==============================================================================
1372 | Reference:
1373 | ==============================================================================
1374 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1375 | ==============================================================================
1376 | """
1377 | def __init__(self,nSamples,nTotSamples = np.inf,newGrad = 0.0,**kwargs):
1378 | """ Initialize the minibatch SGD class object.
1379 | This can be used to perform one iteration of minibatch SGD.
1380 | """
1381 | self.alg = 'minibatchSGD'
1382 | SGD.printAlg(self)
1383 | self.grad = newGrad
1384 | SGD.__init__(self,**kwargs)
1385 | # Assign number of samples used at each iteration
1386 | if type(nSamples) != int:
1387 | raise TypeError('nSamples not an integer value')
1388 | else:
1389 | self.nSamples = nSamples
1390 | # Total number of samples
1391 | if nTotSamples != np.inf and type(nTotSamples) != int:
1392 | raise TypeError('nTotSamples not an integer value')
1393 | else:
1394 | self.nTotSamples = nTotSamples
1395 | # Check for total number of samples
1396 | if nTotSamples < nSamples:
1397 | print('nTotSamples cannot be smaller than nSamples\n')
1398 | print('nTotSamples = nSamples is set\n'); nTotSamples = nSamples; self.nTotSamples = nSamples
1399 | print('NOTE: performing a batch gradient descent')
1400 | elif nTotSamples == nSamples:
1401 | print('NOTE: performing a batch gradient descent')
1402 | elif nTotSamples < np.inf:
1403 | print('NOTE: performing a minibatch SGD with ', nSamples/nTotSamples*100, '% of total samples')
1404 | else:
1405 | print('NOTE: performing a minibatch SGD with ', nSamples, ' samples')
1406 | # Initialize new gradients
1407 | if np.sum(newGrad) != 0.0:
1408 | if np.size(newGrad)/nSamples != self.nParam:
1409 | raise ValueError('New gradient dimension mismatch')
1410 | else:
1411 | self.newGrad=np.reshape(newGrad,(self.nParam,self.nSamples))
1412 | else:
1413 | self.newGrad = np.zeros((self.nParam,self.nSamples))
1414 | try:
1415 | self.gradFun
1416 | except AttributeError:
1417 | raise ValueError('Please provide the gradient function')
1418 | self.newGrad, nprime = self.gradFun(self.param,self.nSamples)
1419 |
1420 | def update(self):
1421 | """
1422 | Perform one iteration of minibatch SGD
1423 | """
1424 | SGD.learningSchedule(self)
1425 | if self.maxiter>1:
1426 | self.newGrad,nprime = self.gradFun(self.param,self.nSamples)
1427 | # Perform one iteration of minibatch SGD
1428 | self.param=self.param-self.etaCurrent*np.mean(self.newGrad,1)
1429 | SGD.satisfyBounds(self)
1430 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
1431 | #print('One iteration of minibatch SGD has been performed successfully!\n')
1432 |
1433 | def performIter(self):
1434 | """
1435 | Performs all the iterations of minibatch SGD
1436 | """
1437 | # initialize progress bar
1438 | printProgressBar(0, self.maxiter, prefix = self.alg, suffix = 'Complete', length = 25)
1439 | self.t = time.perf_counter()
1440 | for i in range(self.iter,self.maxiter,1):
1441 | if self.stop == True:
1442 | break
1443 | #print('iteration', i+1, 'out of', self.maxiter)
1444 | self.update()
1445 | self.currentIter = i+1
1446 | # print progress bar
1447 | SGD.printProgress(self)
1448 | # Update the objective and gradient
1449 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1450 | SGD.evaluateObjFn(self)
1451 | SGD.stopCrit(self)
1452 |
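# Illustrative sketch (not part of the original module): minibatch SGD with the
# same (param, n) -> (gradients, indices) handle convention as SAG; a mean
# gradient is passed as grad so the base-class dimension check is satisfied.
# Data and settings are placeholders.
def _minibatch_sgd_usage_sketch(nTot=200, nBatch=10):
    rng = np.random.default_rng(1)
    A = rng.standard_normal((nTot, 2))
    b = A.dot(np.array([1.0, -2.0])) + 0.1*rng.standard_normal(nTot)
    objFun = lambda x: np.mean((A.dot(x) - b)**2)
    def gradFun(x, n):
        idx = np.random.choice(nTot, n, replace=False)
        return 2.0*A[idx, :].T*(A[idx, :].dot(x) - b[idx]), idx
    x0 = np.zeros(2)
    g0, _ = gradFun(x0, nBatch)
    mb = minibatchSGD(nBatch, nTot, newGrad=g0, param=x0, grad=np.mean(g0, 1),
                      eta=0.02, maxiter=300, objFun=objFun, gradFun=gradFun)
    mb.performIter()
    return mb.getParam()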
1453 | class SVRG(SGD):
1454 | """
1455 | ==============================================================================
1456 | | Stochastic variance reduced gradient (SVRG) class |
1457 | | derived class from Stochastic Gradient Descent |
1458 | ==============================================================================
1459 | Initialization:
1460 | opt = SVRG(nTotSamples, innerIter = 10, outerIter = 200, option = 1,obj,
1461 | grad, eta, param, iter, maxiter, objFun, gradFun)
1462 |
1463 | NOTE: option = 1 or 2 as suggested in the reference paper.
1464 | ==============================================================================
1465 | Attributes:
1466 | alg: SVRG
1467 | eta: learning rate
1468 | param: the parameter vector (array of dimension nParam-by-1)
1469 | nParam: number of parameters
1470 | fullGrad: Full gradient information
1471 | (array of dimension nParam-by-nTotSamples)
1472 | nTotSamples: total number of samples
1473 | innerIter: inner iteration
1474 | outerIter: outer iteration
1475 | iter: iteration number (optional input)
1476 | maxIter: maximum iteration number
1477 | (optional, default = innerIter*outerIter)
1478 | objFun: function handle to evaluate the objective
1479 | (not required for maxit = 1 )
1480 | gradFun: function handle to evaluate the gradient
1481 | (not required for maxit = 1 )
1482 | mu: average gradient in the outer iteration
1483 | paramBest: best estimate of the param in the outer iteration
1484 | lowerBound: lower bound for the parameters (optional input)
1485 | upperBound: upper bound for the parameters (optional input)
1486 | stopGrad: stopping criterion based on 2-norm of gradient vector
1487 | (default 10^-6)
1488 | alg: algorithm used
1489 | __version__: version of the code
1490 | ==============================================================================
1491 | Methods:
1492 | Public:
1493 | performOuterIter: performs all the iterations inside a for loop
1494 | getGradHist: returns gradient history (default is zero)
1495 | Inherited:
1496 | getParam: returns the parameter values
1497 | getObj: returns the current objective value
1498 | getGrad: returns the current gradient information
1499 | getParamHist: returns parameter update history
1500 | Private: (should not be called outside this class file)
1501 | __init__: initialization
1502 | innerUpdate: performs inner iterations of SVRG
1503 | Inherited:
1504 | evaluateObjFn: evaluates the objective function
1505 | evaluateGradFn: evaluates the gradients
1506 | satisfyBounds: satisfies the parameter bounds
1507 | learningSchedule: learning schedule
1508 | stopCrit: check stopping criteria
1509 | ==============================================================================
1510 | Reference: Johnson, Rie, and Tong Zhang.
1511 | "Accelerating stochastic gradient descent using predictive variance reduction."
1512 | Advances in neural information processing systems. 2013.
1513 | ==============================================================================
1514 | written by Subhayan De (email: Subhayan.De@colorado.edu), July, 2018.
1515 | ==============================================================================
1516 | """
1517 | def __init__(self,nTotSamples, innerIter = 10, outerIter = 200, option = 1, **kwargs):
1518 | """ Initialize the SVRG class object.
1519 | This can be used to perform one iteration of SVRG.
1520 | """
1521 | self.alg = 'SVRG'
1522 | SGD.printAlg(self)
1523 | SGD.__init__(self,**kwargs)
1524 | self.nTotSamples = nTotSamples
1525 | # Check inner iteration and outer iteration values
1526 | if innerIter*outerIter > self.maxiter:
1527 | self.maxiter = innerIter*outerIter
1528 | print('Maximum iteration number is set to ',self.maxiter)
1529 | self.innerIter = innerIter
1530 | self.outerIter = outerIter
1531 | self.paramBest = self.param
1532 | # Initialize gradients
1533 | try:
1534 | self.gradFun
1535 | except AttributeError:
1536 | raise ValueError('Please provide the gradient function')
1537 | self.fullGrad, nprime = self.gradFun(self.param,self.nTotSamples)
1538 | self.grad = self.fullGrad
1539 | self.mu = np.mean(self.grad,1)
1540 | self.option = option
1541 |
1542 | def innerUpdate(self):
1543 | """
1544 | Perform inner iterations of SVRG
1545 | """
1546 | for i in range(self.innerIter):
1547 | SGD.learningSchedule(self)
1548 | it = np.random.randint(self.nTotSamples)
1549 | bestParamGrad, notNeeded = self.gradFun(self.paramBest,1)
1550 | bestParamGrad = np.reshape(bestParamGrad,(self.nParam))
1551 | self.param = self.param - self.etaCurrent*(self.grad[:,it]-bestParamGrad+self.mu)
1552 | SGD.satisfyBounds(self)
1553 | self.paramHist = np.append(self.paramHist,np.reshape(self.param,(self.nParam,1)), axis = 1)
1554 | if self.option == 1:
1555 |
1556 | self.paramBest = self.param
1557 | else:
1558 | ind = np.random.randint(low = self.totIter, high = self.totIter+self.innerIter)
1559 | self.paramBest = self.paramHist[:,ind]
1560 |
1561 | def performOuterIter(self):
1562 | """
1563 | Performs all the iterations of SVRG
1564 | """
1565 | # initialize progress bar
1566 | printProgressBar(0, self.outerIter, prefix = self.alg, suffix = 'Complete', length = 25)
1567 | self.t = time.perf_counter()
1568 | self.totIter = 0
1569 | for i in range(self.iter,self.outerIter,1):
1570 | if self.stop == True:
1571 | break
1572 | #print('Outer iteration', i+1, ' of', self.outerIter, ' (inner iteration = ', self.innerIter,')')
1573 | self.innerUpdate()
1574 | self.totIter = self.totIter + self.innerIter
1575 | self.currentIter = i+1
1576 | # print progress bar
1577 | SGD.printProgress(self)
1578 | self.grad, notNeeded = self.gradFun(self.paramBest,self.nTotSamples)
1579 | self.mu = np.mean(self.grad,1)
1580 | # Update the objective and gradient
1581 | if self.maxiter > 1: # since objFun and gradFun are optional for 1 iteration
1582 | SGD.evaluateObjFn(self)
1583 | SGD.stopCrit(self)
1584 |
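# Illustrative sketch (not part of the original module): SVRG with the same
# (param, n) -> (gradients, indices) handle convention; option 1 keeps the last
# inner iterate as the snapshot. Data, step size and iteration counts are
# placeholders chosen only for demonstration.
def _svrg_usage_sketch(nTot=100):
    rng = np.random.default_rng(2)
    A = rng.standard_normal((nTot, 2))
    b = A.dot(np.array([0.5, 1.5])) + 0.05*rng.standard_normal(nTot)
    objFun = lambda x: np.mean((A.dot(x) - b)**2)
    def gradFun(x, n):
        idx = np.random.choice(nTot, n, replace=False)
        return 2.0*A[idx, :].T*(A[idx, :].dot(x) - b[idx]), idx
    svrg = SVRG(nTot, innerIter=10, outerIter=50, option=1, param=np.zeros(2),
                eta=0.02, objFun=objFun, gradFun=gradFun)
    svrg.performOuterIter()
    return svrg.getParam()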
--------------------------------------------------------------------------------