--------------------------------------------------------------------------------
/man/hpar.Rd:
--------------------------------------------------------------------------------
1 | \name{hpar}
2 | \alias{hpar}
3 | \title{Deep Neural Net parameters and hyperparameters}
4 | \description{
   5 | List of Neural Network parameters and hyperparameters to train with gradient descent or particle swarm optimization\cr
   6 | Not mandatory (the list is preset and all arguments are initialized with default values), but it is advisable to adjust some important arguments for performance reasons (including processing time)
7 | }
8 | \arguments{
9 |
  10 | \item{modexec}{ \sQuote{trainwgrad} (the default value) to train with gradient descent (suitable for all data volumes)\cr
  11 | \sQuote{trainwpso} to train using Particle Swarm Optimization, where each particle represents a set of neural network weights (CAUTION: suitable for small data volumes, time consuming for medium to large data volumes)}
12 |
  13 | \emph{Below, arguments specific to the \sQuote{trainwgrad} execution mode}
14 |
  15 | \item{learningrate}{ learning rate alpha (default value 0.001)\cr
16 | #tuning priority 1}
17 |
18 | \item{beta1}{ see below}
  19 | \item{beta2}{ \sQuote{Momentum} if beta1 is different from 0 and beta2 equals 0\cr
  20 | \sQuote{RMSprop} if beta1 equals 0 and beta2 is different from 0\cr
  21 | \sQuote{adam optimization} if beta1 and beta2 are both different from 0 (the default)\cr
  22 | (default values: beta1 equal to 0.9 and beta2 equal to 0.999)\cr
23 | #tuning priority 2}
24 |
25 | \item{lrdecayrate}{ learning rate decay value (default value 0, no learning rate decay, 1e-6 should be a good value to start with)\cr
26 | #tuning priority 4}
27 |
28 | \item{chkgradevery}{ epoch interval to run gradient check function (default value 0, for debug only)}
29 |
  30 | \item{chkgradepsilon}{ epsilon value for derivative calculations and threshold test in gradient check function (default 0.0000001)}
31 |
  32 | \emph{Below, arguments specific to the \sQuote{trainwpso} execution mode}
33 |
34 | \item{psoxxx}{ see \link{pso} for PSO specific arguments details}
35 |
  36 | \item{costcustformul}{ custom cost formula (default \sQuote{}, no custom cost function)\cr
  37 | standard input variables: yhat (prediction), y (actual target value)\cr
  38 | custom input variables: any variable declared in hpar may be used via the alias mydl (i.e. hpar = list(foo = 1.5) is available in the custom cost formula as mydl$foo)\cr
  39 | result: the formula must assign the cost to J\cr
  40 | see the \sQuote{automl_train_manual} example using a Mean Absolute Percentage Error cost function, and the example below\cr
  41 | nb: the X and Y matrices used as input to the automl_train or automl_train_manual functions are transposed (features in rows and cases in columns)}
42 |
43 | \emph{Below arguments for both execution modes}
44 |
  45 | \item{numiterations}{ number of training epochs (default value 50)\cr
46 | #tuning priority 1}
47 |
  48 | \item{seed}{ seed for reproducibility (default 4)}
49 |
  50 | \item{minibatchsize}{ mini batch size, 2 to the power 0 for stochastic gradient descent (default 2 to the power 5)\cr
51 | #tuning priority 3}
52 |
  53 | \item{layersshape}{ number of nodes per layer; each node count initializes a hidden layer\cr
  54 | the output layer node count may be left at 0, it will then be set automatically from the Y matrix shape\cr
  55 | default value, one hidden layer with 10 nodes: c(10, 0)\cr
56 | #tuning priority 4}
57 |
  58 | \item{layersacttype}{ activation function for each layer; \sQuote{linear} for no activation, or \sQuote{sigmoid}, \sQuote{relu}, \sQuote{reluleaky}, \sQuote{tanh} or \sQuote{softmax} (\sQuote{softmax} for the output layer is supported only in trainwpso execution mode)\cr
  59 | the output layer activation function may be left as \sQuote{}; default value \sQuote{linear} for regression, \sQuote{sigmoid} for classification\cr
60 | nb: layersacttype parameter vector must have same length as layersshape parameter vector\cr
61 | default value c(\sQuote{relu}, \sQuote{})\cr
62 | #tuning priority 4}
63 |
  64 | \item{layersdropoprob}{ drop out probability for each layer, continuous value from 0 to less than 1 (gives the proportion of weight matrix values to drop out randomly)\cr
65 | nb: layersdropoprob parameter vector must have same length as layersshape parameter vector\cr
66 | default value no drop out: c(0, 0)\cr
67 | #tuning priority for regularization}
68 |
69 | \item{printcostevery}{ epoch interval to test and print costs (train and cross validation cost: default value 10, for 1 test every 10 epochs)}
70 |
71 | \item{testcvsize}{ size of cross validation sample, 0 for no cross validation sample (default 10, for 10 percent)}
72 |
  73 | \item{testgainunder}{ threshold to stop training when the gain in the last train or cross validation cost is smaller than this value, 0 for no stop test (default 0.000001)}
74 |
75 | \item{costtype}{ cost type function name \sQuote{mse} or \sQuote{crossentropy} or \sQuote{custom}\cr
76 | \sQuote{mse} for Mean Squared Error, set automatically for continuous target type (\sQuote{mape} Mean Absolute Percentage Error may be specified)\cr
77 | \sQuote{crossentropy} set automatically for binary target type\cr
78 | \sQuote{custom} set automatically if \sQuote{costcustformul} different from \sQuote{}
79 | }
80 |
81 | \item{lambda}{ regularization term added to cost function (default value 0, no regularization)
82 | }
83 |
84 | \item{batchnor_mom}{ batch normalization momentum for j and B (default 0, no batch normalization, may be set to 0.9 for deep neural net)
85 | }
86 |
  87 | \item{epsil}{ epsilon, a small value used to avoid division by 0 or log(0) in the cost function, etc. (default value 1e-12)\cr
88 | }
89 |
  90 | \item{verbose}{ whether to display the costs and the shapes (default TRUE)}
91 |
92 | \emph{back to \link{automl_train}, \link{automl_train_manual}}
93 | }
94 |
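\examples{
## Illustrative sketch only (not run): passing an hpar list with a custom
## MAPE-style cost formula to automl_train_manual; the exact formula string
## is an assumption (y must not contain zeros for a plain MAPE) and the
## other values are arbitrary, not recommended settings.
\dontrun{
data(iris)
xmat <- cbind(iris[, 2:4], as.numeric(iris$Species))
ymat <- iris[, 1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
    hpar = list(modexec = 'trainwpso',
                numiterations = 30,
                costcustformul = 'J = mean(abs((y - yhat) / y))'))
}
}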
95 | \seealso{
  96 | Deep Learning specialization from Andrew Ng on Coursera
97 | }
98 |
--------------------------------------------------------------------------------
/docs/pkgdown.css:
--------------------------------------------------------------------------------
1 | /* Sticky footer */
2 |
3 | /**
4 | * Basic idea: https://philipwalton.github.io/solved-by-flexbox/demos/sticky-footer/
5 | * Details: https://github.com/philipwalton/solved-by-flexbox/blob/master/assets/css/components/site.css
6 | *
7 | * .Site -> body > .container
8 | * .Site-content -> body > .container .row
9 | * .footer -> footer
10 | *
11 | * Key idea seems to be to ensure that .container and __all its parents__
12 | * have height set to 100%
13 | *
14 | */
15 |
16 | html, body {
17 | height: 100%;
18 | }
19 |
20 | body > .container {
21 | display: flex;
22 | height: 100%;
23 | flex-direction: column;
24 | }
25 |
26 | body > .container .row {
27 | flex: 1 0 auto;
28 | }
29 |
30 | footer {
31 | margin-top: 45px;
32 | padding: 35px 0 36px;
33 | border-top: 1px solid #e5e5e5;
34 | color: #666;
35 | display: flex;
36 | flex-shrink: 0;
37 | }
38 | footer p {
39 | margin-bottom: 0;
40 | }
41 | footer div {
42 | flex: 1;
43 | }
44 | footer .pkgdown {
45 | text-align: right;
46 | }
47 | footer p {
48 | margin-bottom: 0;
49 | }
50 |
51 | img.icon {
52 | float: right;
53 | }
54 |
55 | img {
56 | max-width: 100%;
57 | }
58 |
59 | /* Fix bug in bootstrap (only seen in firefox) */
60 | summary {
61 | display: list-item;
62 | }
63 |
64 | /* Typographic tweaking ---------------------------------*/
65 |
66 | .contents .page-header {
67 | margin-top: calc(-60px + 1em);
68 | }
69 |
70 | /* Section anchors ---------------------------------*/
71 |
72 | a.anchor {
73 | margin-left: -30px;
74 | display:inline-block;
75 | width: 30px;
76 | height: 30px;
77 | visibility: hidden;
78 |
79 | background-image: url(./link.svg);
80 | background-repeat: no-repeat;
81 | background-size: 20px 20px;
82 | background-position: center center;
83 | }
84 |
85 | .hasAnchor:hover a.anchor {
86 | visibility: visible;
87 | }
88 |
89 | @media (max-width: 767px) {
90 | .hasAnchor:hover a.anchor {
91 | visibility: hidden;
92 | }
93 | }
94 |
95 |
96 | /* Fixes for fixed navbar --------------------------*/
97 |
98 | .contents h1, .contents h2, .contents h3, .contents h4 {
99 | padding-top: 60px;
100 | margin-top: -40px;
101 | }
102 |
103 | /* Sidebar --------------------------*/
104 |
105 | #sidebar {
106 | margin-top: 30px;
107 | position: -webkit-sticky;
108 | position: sticky;
109 | top: 70px;
110 | }
111 | #sidebar h2 {
112 | font-size: 1.5em;
113 | margin-top: 1em;
114 | }
115 |
116 | #sidebar h2:first-child {
117 | margin-top: 0;
118 | }
119 |
120 | #sidebar .list-unstyled li {
121 | margin-bottom: 0.5em;
122 | }
123 |
124 | .orcid {
125 | height: 16px;
126 | /* margins are required by official ORCID trademark and display guidelines */
127 | margin-left:4px;
128 | margin-right:4px;
129 | vertical-align: middle;
130 | }
131 |
132 | /* Reference index & topics ----------------------------------------------- */
133 |
134 | .ref-index th {font-weight: normal;}
135 |
136 | .ref-index td {vertical-align: top;}
137 | .ref-index .icon {width: 40px;}
138 | .ref-index .alias {width: 40%;}
139 | .ref-index-icons .alias {width: calc(40% - 40px);}
140 | .ref-index .title {width: 60%;}
141 |
142 | .ref-arguments th {text-align: right; padding-right: 10px;}
143 | .ref-arguments th, .ref-arguments td {vertical-align: top;}
144 | .ref-arguments .name {width: 20%;}
145 | .ref-arguments .desc {width: 80%;}
146 |
147 | /* Nice scrolling for wide elements --------------------------------------- */
148 |
149 | table {
150 | display: block;
151 | overflow: auto;
152 | }
153 |
154 | /* Syntax highlighting ---------------------------------------------------- */
155 |
156 | pre {
157 | word-wrap: normal;
158 | word-break: normal;
159 | border: 1px solid #eee;
160 | }
161 |
162 | pre, code {
163 | background-color: #f8f8f8;
164 | color: #333;
165 | }
166 |
167 | pre code {
168 | overflow: auto;
169 | word-wrap: normal;
170 | white-space: pre;
171 | }
172 |
173 | pre .img {
174 | margin: 5px 0;
175 | }
176 |
177 | pre .img img {
178 | background-color: #fff;
179 | display: block;
180 | height: auto;
181 | }
182 |
183 | code a, pre a {
184 | color: #375f84;
185 | }
186 |
187 | a.sourceLine:hover {
188 | text-decoration: none;
189 | }
190 |
191 | .fl {color: #1514b5;}
192 | .fu {color: #000000;} /* function */
193 | .ch,.st {color: #036a07;} /* string */
194 | .kw {color: #264D66;} /* keyword */
195 | .co {color: #888888;} /* comment */
196 |
197 | .message { color: black; font-weight: bolder;}
198 | .error { color: orange; font-weight: bolder;}
199 | .warning { color: #6A0366; font-weight: bolder;}
200 |
201 | /* Clipboard --------------------------*/
202 |
203 | .hasCopyButton {
204 | position: relative;
205 | }
206 |
207 | .btn-copy-ex {
208 | position: absolute;
209 | right: 0;
210 | top: 0;
211 | visibility: hidden;
212 | }
213 |
214 | .hasCopyButton:hover button.btn-copy-ex {
215 | visibility: visible;
216 | }
217 |
218 | /* headroom.js ------------------------ */
219 |
220 | .headroom {
221 | will-change: transform;
222 | transition: transform 200ms linear;
223 | }
224 | .headroom--pinned {
225 | transform: translateY(0%);
226 | }
227 | .headroom--unpinned {
228 | transform: translateY(-100%);
229 | }
230 |
231 | /* mark.js ----------------------------*/
232 |
233 | mark {
234 | background-color: rgba(255, 255, 51, 0.5);
235 | border-bottom: 2px solid rgba(255, 153, 51, 0.3);
236 | padding: 1px;
237 | }
238 |
239 | /* vertical spacing after htmlwidgets */
240 | .html-widget {
241 | margin-bottom: 10px;
242 | }
243 |
244 | /* fontawesome ------------------------ */
245 |
246 | .fab {
247 | font-family: "Font Awesome 5 Brands" !important;
248 | }
249 |
250 | /* don't display links in code chunks when printing */
251 | /* source: https://stackoverflow.com/a/10781533 */
252 | @media print {
253 | code a:link:after, code a:visited:after {
254 | content: "";
255 | }
256 | }
257 |
--------------------------------------------------------------------------------
/man/autopar.Rd:
--------------------------------------------------------------------------------
1 | \name{autopar}
2 | \alias{autopar}
3 | \title{parameters for automatic hyperparameters optimization}
4 | \description{
   5 | List of parameters to allow multi deep neural network automatic hyperparameter tuning with Particle Swarm Optimization\cr
   6 | Not mandatory (the list is preset and all arguments are initialized with default values), but it is advisable to adjust some important arguments for performance reasons (including processing time); see the example below
7 | }
8 | \arguments{
9 | \item{psopartpopsize}{ number of particles in swarm, the main argument that should be tuned (default value 8, which is quite low)\cr
10 | #tuning priority 1}
11 |
12 | \item{psoxxx}{ see \link{pso} for other PSO specific arguments details}
13 |
  14 | \item{numiterations}{ number of convergence steps between particles (hyperparameter sets), default value 3\cr
15 | #tuning priority 1}
16 |
  17 | \item{auto_modexec}{ if \sQuote{TRUE} the type of neural net optimization will be randomly chosen between \sQuote{trainwgrad} and \sQuote{trainwpso} for each particle\cr
  18 | default value is \sQuote{FALSE} (i.e. the default value of the \sQuote{modexec} argument in the \link{automl_train_manual} function, currently \sQuote{trainwgrad}, which is better suited to large data volumes)\cr
  19 | the value can be forced if defined in the \link{hpar} list}
20 |
  21 | \item{auto_runtype}{ if \sQuote{2steps} the following 2 steps will be run automatically (default value is \sQuote{normal}):\cr
  22 | 1st overfitting, the goal is performance\cr
  23 | 2nd regularization, the goal is generalization\cr
  24 | nb: \sQuote{overfitting} or \sQuote{regularization} may be specified directly to avoid the 2-step run\cr}
25 |
26 | \item{auto_minibatchsize}{ see below}
27 | \item{auto_minibatchsize_min}{ see below}
  28 | \item{auto_minibatchsize_max}{ \sQuote{auto_minibatchsize} default value \sQuote{TRUE} for automatic adjustment of the \sQuote{minibatchsize} argument in the \link{automl_train_manual} function\cr
  29 | the minimum and maximum values for \sQuote{minibatchsize} correspond to 2 to the power of the value (default 0 for \sQuote{auto_minibatchsize_min} and 9 for \sQuote{auto_minibatchsize_max})}
30 |
31 | \item{auto_learningrate}{ see below}
32 | \item{auto_learningrate_min}{ see below}
  33 | \item{auto_learningrate_max}{ \sQuote{auto_learningrate} default value \sQuote{TRUE} for automatic adjustment of the \sQuote{learningrate} argument in the \link{automl_train_manual} function\cr
  34 | the minimum and maximum values for \sQuote{learningrate} correspond to 10 to the power of the (negative) value (default -5 for \sQuote{auto_learningrate_min} and -2 for \sQuote{auto_learningrate_max})}
35 |
36 | \item{auto_beta1}{ see below}
37 | \item{auto_beta2}{ \sQuote{auto_beta1} and \sQuote{auto_beta2} default value \sQuote{TRUE} for automatic adjustment of \sQuote{beta1} and \sQuote{beta2} argument in \link{automl_train_manual} function}
38 |
39 | \item{auto_psopartpopsize}{ see below}
40 | \item{auto_psopartpopsize_min}{ see below}
  41 | \item{auto_psopartpopsize_max}{ \sQuote{auto_psopartpopsize} default value \sQuote{TRUE} for automatic adjustment of the \sQuote{psopartpopsize} argument in the \link{automl_train_manual} function (concerns only \sQuote{modexec} set to \sQuote{trainwpso})\cr
  42 | the minimum and maximum values for \sQuote{psopartpopsize} (default 2 for \sQuote{auto_psopartpopsize_min} and 50 for \sQuote{auto_psopartpopsize_max})}
43 |
44 | \item{auto_lambda}{ see below}
45 | \item{auto_lambda_min}{ see below}
  46 | \item{auto_lambda_max}{ \sQuote{auto_lambda} default value \sQuote{FALSE} for automatic adjustment of the \sQuote{lambda} regularization argument in the \link{automl_train_manual} function\cr
  47 | the minimum and maximum values for \sQuote{lambda} correspond to 10 to the power of the value (default -2 for \sQuote{auto_lambda_min} and 4 for \sQuote{auto_lambda_max})}
48 |
49 | \item{auto_psovelocitymaxratio}{ see below}
50 | \item{auto_psovelocitymaxratio_min}{ see below}
51 | \item{auto_psovelocitymaxratio_max}{ \sQuote{auto_psovelocitymaxratio} default value \sQuote{TRUE} for automatic adjustment of \sQuote{psovelocitymaxratio} PSO velocity max ratio argument in \link{automl_train_manual} function\cr
52 | the minimum and maximum value for \sQuote{psovelocitymaxratio}; default 0.01 for \sQuote{auto_psovelocitymaxratio_min} and 0.5 for \sQuote{auto_psovelocitymaxratio_max}}
53 |
54 | \item{auto_layers}{ see below (\sQuote{auto_layers} default value \sQuote{TRUE} for automatic adjustment of layers shape in \link{automl_train_manual} function)}
  55 | \item{auto_layers_min}{ (linked to \sQuote{auto_layers} above, set \link{hpar} \sQuote{layersshape} and \sQuote{layersacttype}) the minimum number of hidden layers (default 1, which corresponds to no hidden layer)}
56 | \item{auto_layers_max}{ (linked to \sQuote{auto_layers} above, set \link{hpar} \sQuote{layersshape} and \sQuote{layersacttype}) the maximum number of hidden layers (default 2)}
57 | \item{auto_layersnodes_min}{ (linked to \sQuote{auto_layers} above, set \link{hpar} \sQuote{layersshape} and \sQuote{layersacttype}) the minimum number of nodes per layer (default 3)}
58 | \item{auto_layersnodes_max}{ (linked to \sQuote{auto_layers} above, set \link{hpar} \sQuote{layersshape} and \sQuote{layersacttype}) the maximum number of nodes per layer (default 33)}
59 |
60 | \item{auto_layersdropo}{ see below}
61 | \item{auto_layersdropoprob_min}{ see below}
  62 | \item{auto_layersdropoprob_max}{ \sQuote{auto_layersdropo} default value \sQuote{FALSE} for automatic adjustment of \link{hpar} \sQuote{layersdropoprob} in the \link{automl_train_manual} function\cr
  63 | the minimum and maximum values for \sQuote{layersdropoprob}; default 0.05 for \sQuote{auto_layersdropoprob_min} and 0.75 for \sQuote{auto_layersdropoprob_max}}
64 |
  65 | \item{seed}{ seed for reproducibility (default 4)}
66 |
67 | \item{nbcores}{ number of cores used to parallelize particles optimization, not available on Windows (default 1, automatically reduced if not enough cores)}
68 |
  69 | \item{verbose}{ whether to display the costs at each iteration for each particle (default TRUE)}
70 |
  71 | \item{subtimelimit}{ time limit in seconds for sub-models, to avoid waiting too long for a specific particle to finish its training (default 3600)}
72 |
73 | \emph{back to \link{automl_train}}
74 | }
75 |
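\examples{
## Illustrative sketch only (not run): passing an autopar list to
## automl_train; the values are deliberately small to keep the run short
## and are assumptions, not recommended settings.
\dontrun{
data(iris)
xmat <- as.matrix(cbind(iris[, 2:4], as.numeric(iris$Species)))
ymat <- iris[, 1]
amlmodel <- automl_train(Xref = xmat, Yref = ymat,
    autopar = list(psopartpopsize = 8,
                   numiterations = 3,
                   nbcores = 1))
}
}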
--------------------------------------------------------------------------------
/docs/index.html:
--------------------------------------------------------------------------------
Deep Learning with Metaheuristic • automl
The automl package fits anything from simple regression to highly customizable deep neural networks, either with gradient descent or with a metaheuristic, using automatic hyperparameter tuning and custom cost functions. A mix inspired by common Deep Learning tricks and Particle Swarm Optimization.
pso: list of parameters and hyperparameters for Particle Swarm Optimization

Arguments

psopartpopsize
number of particles in swarm (discrete value)
(‘autopar’ context: default value 8, which means that 8 different neural net hyperparameter sets will be tested)
(‘hpar’ context: default value 50, which means that 50 neural net weight sets will be tested)
#tuning priority 1

psovarvalmin
minimum value for particles positions (default value -10)

psovarvalmax
maximum value for particles positions (default value 10)

psovelocitymaxratio
ratio applied to limit velocities (continuous value between 0 and 1, default value 0.2)

psoinertiadampratio
inertia damp ratio (continuous value between 0 and 1, default value 1, equivalent to OFF)
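For illustration only, a hedged sketch of how these PSO settings could be passed in both contexts (xmat and ymat are assumed to be defined as in the vignette examples further below; values are arbitrary, not recommendations):

```{r}
# autopar context: PSO explores neural net hyperparameter sets
amlmodel <- automl_train(Xref = xmat, Yref = ymat,
                         autopar = list(psopartpopsize = 8,
                                        psovelocitymaxratio = 0.2))
# hpar context: PSO optimizes the network weights (modexec = 'trainwpso')
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(modexec = 'trainwpso',
                                            psopartpopsize = 50))
```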
automl_train: the multi deep neural network automatic train function (several deep neural networks are trained with automatic hyperparameter tuning; the best model is kept).
This function launches the automl_train_manual function, passing it parameters for each particle at each convergence step.
Arguments

Xref
inputs matrix or data.frame (containing numerical values only)

Yref
target matrix or data.frame (containing numerical values only)

autopar
list of parameters for hyperparameters optimization, see autopar section
Not mandatory (the list is preset and all arguments are initialized with default values) but it is advisable to adjust some important arguments for performance reasons (including processing time)

hpar
list of parameters and hyperparameters for Deep Neural Network, see hpar section
Not mandatory (the list is preset and all arguments are initialized with default values) but it is advisable to adjust some important arguments for performance reasons (including processing time)

mdlref
model trained with automl_train to start training with saved hpar and autopar (not the model itself)
nb: manually entered parameters above override loaded ones
hpar and autopar reference entries: see /man/hpar.Rd and /man/autopar.Rd above.
This document is intended to answer the following questions: why and how automl, and how to use it
38 |
  39 | The automl package provides:
  40 | -the latest Deep Learning tricks (those who have taken Andrew Ng’s MOOC on Coursera will be in familiar territory)
  41 | -hyperparameter autotuning with a metaheuristic (PSO)
  42 | -experimental stuff and more to come (you’re welcome as a coauthor!)
43 |
44 |
0.2 Why & how automl
45 |
0.2.1 Deep Learning existing frameworks, disadvantages
46 |
Deploying and maintaining most Deep Learning frameworks means: Python...
  47 | The R language is so simple to install and maintain in production environments that it is obvious to use a pure R based package for deep learning!
48 |
49 |
0.2.2 Neural Network - Deep Learning, disadvantages
50 |
Disadvantages:
  51 | 1st disadvantage: you have to manually test different combinations of parameters (number of layers, nodes, activation functions, etc.) and then also manually tune the training hyperparameters (learning rate, momentum, mini batch size, etc.)
  52 | 2nd disadvantage: for those who are not mathematicians, calculating the derivative of a new cost or activation function may be an issue.
53 |
54 |
55 |
56 |
0.2.3 Metaheuristic - PSO, benefits
57 |
The Particle Swarm Optimization algorithm is a great and simple one.
  58 | In a few words, the first step consists in randomly throwing a set of particles into a space, and the next steps consist in discovering the best solution while converging (a minimal sketch follows below).
59 |
  60 | the video tutorial from Yarpiz is a great resource
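To make the idea concrete, here is a minimal, self-contained PSO sketch in R. It is illustrative only, not the package's internal implementation; the objective function, swarm size and coefficients are arbitrary textbook-style choices.

```{r}
# Minimal PSO sketch (illustrative only): minimize a toy 2-D function
f <- function(x) sum((x - 3)^2)            # toy objective, minimum at c(3, 3)
npart <- 20; ndim <- 2; niter <- 50
w <- 0.7; c1 <- 1.5; c2 <- 1.5             # inertia and acceleration coefficients
set.seed(4)
pos <- matrix(runif(npart * ndim, -10, 10), npart, ndim)  # particles thrown randomly
vel <- matrix(0, npart, ndim)
pbest <- pos                               # each particle's best known position
pbestval <- apply(pos, 1, f)
gbest <- pbest[which.min(pbestval), ]      # the swarm's best known position
for (i in seq_len(niter)) {
  gbmat <- matrix(gbest, npart, ndim, byrow = TRUE)
  # velocity: inertia + pull toward personal best + pull toward swarm best
  vel <- w * vel +
         c1 * runif(npart * ndim) * (pbest - pos) +
         c2 * runif(npart * ndim) * (gbmat - pos)
  pos <- pos + vel
  val <- apply(pos, 1, f)
  better <- val < pbestval
  pbest[better, ] <- pos[better, ]
  pbestval[better] <- val[better]
  gbest <- pbest[which.min(pbestval), ]
}
gbest                                       # ends up close to c(3, 3)
```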
61 |
62 |
0.2.4 Birth of automl package
63 |
  63 | The automl package was born from the idea of using the PSO metaheuristic to address the disadvantages identified above.
  64 | And last but not least: use R and R only :-)
  65 | 3 functions are available:
  66 | - automl_train_manual: the manual mode to train a model
  67 | - automl_train: the automatic mode to train a model
  68 | - automl_predict: the prediction function to apply a trained model to new data
69 |
70 |
0.2.5 Mix 1: hyperparameters tuning with PSO
71 |
Mix 1 consists in using the PSO algorithm to optimize the hyperparameters: each particle corresponds to a set of hyperparameters.
72 | The automl_train function was made to do that.
73 |
74 |
75 |
76 |
0.2.6 Mix 2: PSO instead of gradient descent
77 |
Mix 2 is experimental; it consists in using the PSO algorithm to optimize the weights of the neural network in place of gradient descent: each particle corresponds to a set of neural network weight matrices.
  78 | The automl_train_manual function does that too (a minimal sketch follows below).
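Purely as an illustration, a hedged sketch of Mix 2 on the same small regression used below (the hyperparameter values are arbitrary, not recommendations):

```{r}
# Sketch only: train with PSO instead of gradient descent
data(iris)
xmat <- cbind(iris[,2:4], as.numeric(iris$Species))
ymat <- iris[,1]
amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
                                hpar = list(modexec = 'trainwpso',
                                            numiterations = 30,
                                            psopartpopsize = 50))
```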
79 |
80 |
81 |
82 |
0.3 First steps: How to
83 |
For those who will laugh at seeing deep learning with one hidden layer and the Iris data set of 150 records, I will say: you’re perfectly right :-)
84 | The goal at this stage is simply to take the first steps
85 |
86 |
87 |
0.3.1 fit a regression model manually (hard way)
88 |
Subject: predict Sepal.Length given other Iris parameters
  89 | 1st with gradient descent and default hyperparameter values for learning rate (0.001) and mini batch size (32)
90 |
91 |
92 | ```{r}
93 | data(iris)
94 | xmat <- cbind(iris[,2:4], as.numeric(iris$Species))
95 | ymat <- iris[,1]
96 | amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat)
97 | ```
98 | ```{r}
99 | res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
100 | colnames(res) <- c('actual', 'predict')
101 | head(res)
102 | ```
103 | :-[] no pain, no gain ...
104 | After some manual fine tuning on learning rate, mini batch size and iterations number (epochs):
105 | ```{r}
106 | data(iris)
107 | xmat <- cbind(iris[,2:4], as.numeric(iris$Species))
108 | ymat <- iris[,1]
109 | amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
110 | hpar = list(learningrate = 0.01,
111 | minibatchsize = 2^2,
112 | numiterations = 30))
113 | ```
114 | ```{r}
115 | res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
116 | colnames(res) <- c('actual', 'predict')
117 | head(res)
118 | ```
119 | Better result, but with human efforts!
120 |
121 |
122 |
0.3.2 fit a regression model automatically (easy way, Mix 1)
123 | Same subject: predict Sepal.Length given other Iris parameters
124 | ```{r}
125 | data(iris)
126 | xmat <- as.matrix(cbind(iris[,2:4], as.numeric(iris$Species)))
127 | ymat <- iris[,1]
128 | start.time <- Sys.time()
129 | amlmodel <- automl_train(Xref = xmat, Yref = ymat,
130 | autopar = list(psopartpopsize = 15,
131 | numiterations = 5,
132 | auto_layers_max = 1,
133 | nbcores = 4))
134 | end.time <- Sys.time()
 135 | cat(paste('time elapsed:', end.time - start.time, '\n'))
136 | ```
137 | ```{r}
138 | res <- cbind(ymat, automl_predict(model = amlmodel, X = xmat))
139 | colnames(res) <- c('actual', 'predict')
140 | head(res)
141 | ```
 142 | It’s even better, with no human effort but machine time
 143 | Windows users won’t benefit from parallelization; the function uses the parallel package included with base R...
144 |
145 |
146 |
0.3.3 fit a regression model experimentally (experimental way, Mix 2)
215 | Same subject: predict Species given other Iris parameters
216 | 1st example: with gradient descent and 2 hidden layers containing 10 nodes, with various activation functions for hidden layers
217 | ```{r}
218 | data(iris)
 219 | xmat <- iris[,1:4]
220 | lab2pred <- levels(iris$Species)
221 | lghlab <- length(lab2pred)
222 | iris$Species <- as.numeric(iris$Species)
223 | ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
224 | ymat <- (ymat == as.numeric(iris$Species)) + 0
225 | amlmodel <- automl_train_manual(
226 | Xref = xmat, Yref = ymat,
227 | hpar = list(
228 | layersshape = c(10, 10, 0),
229 | layersacttype = c('tanh', 'relu', ''),
230 | layersdropoprob = c(0, 0, 0)))
231 | ```
 232 | nb: the last activation type may be left blank (it will be set automatically)
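To check the fit, a hedged sketch of applying automl_predict and mapping the per-class outputs back to Species labels (taking the column with the highest score is an assumption about the post-processing, not a prescription from the package):

```{r}
pred <- automl_predict(model = amlmodel, X = xmat)
predlab <- lab2pred[apply(pred, 1, which.max)]    # predicted Species labels
actuallab <- lab2pred[apply(ymat, 1, which.max)]  # actual Species labels
table(actual = actuallab, predicted = predlab)
```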
233 |
234 | 2nd example: with gradient descent and no hidden layer (logistic regression)
235 | ```{r}
236 | data(iris)
 237 | xmat <- iris[,1:4]
238 | lab2pred <- levels(iris$Species)
239 | lghlab <- length(lab2pred)
240 | iris$Species <- as.numeric(iris$Species)
241 | ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
242 | ymat <- (ymat == as.numeric(iris$Species)) + 0
243 | amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
244 | hpar = list(layersshape = c(0),
245 | layersacttype = c('sigmoid'),
246 | layersdropoprob = c(0)))
247 | ```
 248 | We save the model to continue training later (see the next section)
249 | ```{r}
250 | amlmodelsaved <- amlmodel
251 | ```
252 |
253 |
254 |
0.3.7 continue training on saved model (fine tuning ...)
255 | Subject: continue training on saved model (model saved above in last section)
256 | ```{r}
257 | amlmodel <- automl_train_manual(Xref = xmat, Yref = ymat,
258 | hpar = list(numiterations = 100,
259 | psopartpopsize = 50),
260 | mdlref = amlmodelsaved)
261 | ```
262 | We can see the error continuing to decrease from last training
263 | The training continued with the same parameters, but notice that we were able to change the number of iterations
264 |
265 |
0.3.8 use the 2 steps automatic approach
266 | Same subject: predict Species given other Iris parameters
267 | Let’s try the automatic approach in 2 steps with the same Logistic Regression architecture;
 268 | the 1st step’s goal is performance (overfitting)
 269 | the 2nd step’s goal is robustness (regularization)
270 | ```{r}
271 | data(iris)
272 | xmat = iris[,1:4]
273 | lab2pred <- levels(iris$Species)
274 | lghlab <- length(lab2pred)
275 | iris$Species <- as.numeric(iris$Species)
276 | ymat <- matrix(seq(from = 1, to = lghlab, by = 1), nrow(xmat), lghlab, byrow = TRUE)
277 | ymat <- (ymat == as.numeric(iris$Species)) + 0
278 | amlmodel <- automl_train(Xref = xmat, Yref = ymat,
279 | hpar = list(layersshape = c(0),
280 | layersacttype = c('sigmoid'),
281 | layersdropoprob = c(0)),
282 | autopar = list(auto_runtype = '2steps'))
283 | ```
 284 | Compared to the last runs (in the previous sections above), the difference between train and cross validation errors is much smaller
285 | Automatically :-)
286 |
287 |
0.4 ToDo List idea
288 |
- refactor the code to be object oriented
 289 | - manage transfer learning from existing frameworks
290 | - implement CNN
291 | - implement RNN
292 | - ...
293 |
 294 | -> I won't do it alone, let's create a team!
295 | https://aboulaboul.github.io/automl
296 | https://github.com/aboulaboul/automl