├── .gitignore
├── LICENSE
├── README.md
├── minimize.py
├── expr_cifar10_ZLIN_normhid_nolinb_dropout.py
└── expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.npy
3 | *.png
4 | *.out
5 | *.log
6 | 
7 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, Zhouhan LIN
2 | All rights reserved.
3 | 
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 | 
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 | 
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 | 
14 | * Neither the name of zlinnet nor the names of its
15 | contributors may be used to endorse or promote products derived from
16 | this software without specific prior written permission.
17 | 
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 | 
29 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # zlinnet
2 | This repo provides an implementation of the Z-Lin network. It should reproduce the results on permutation-invariant CIFAR-10 reported in the paper:
3 | - Zhouhan Lin, Roland Memisevic, Kishore Konda, [How far can we go without convolution: Improving fully-connected networks](http://arxiv.org/pdf/1511.02580v1.pdf), arXiv preprint arXiv:1511.02580 (2015).
4 | 
5 | ## Setup
6 | - Download the dependency. First you need to download NeuroBricks, the super-light framework used in this repo; it currently does not cover recurrent nets and is not documented. There are far more mature and successful frameworks available, like [blocks](https://github.com/mila-udem/blocks) and [lasagne](https://github.com/Lasagne/Lasagne). Execute the following line to download the source:
7 | 
8 |     git clone https://github.com/hantek/NeuroBricks.git
9 | 
10 | - (Optional) In case anything changes in NeuroBricks in the future, checking out the snapshot from the time this repo was published should ensure everything works fine. The code may still work without that snapshot, but check it out just in case:
11 | 
12 |     git checkout 191704feb5de67ab2815d5891dd633b9f2d04afb
13 | 
14 | - Add the path where you store NeuroBricks to Python's module search path, for example as shown below.
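For example, assuming the clone lives at /path/to/NeuroBricks (a placeholder path; point it at whichever directory actually contains dataset.py, layer.py, model.py, etc.), a minimal way to make it importable is to prepend it to sys.path before the other imports in the experiment scripts, or equivalently to add it to your PYTHONPATH:

    import sys
    # placeholder location of the NeuroBricks clone; adjust it to the directory
    # that actually contains dataset.py, layer.py, model.py, ...
    sys.path.insert(0, "/path/to/NeuroBricks")

    from dataset import CIFAR10   # should now resolve without an ImportError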
15 | 
16 | - Specify the data path to wherever you store your CIFAR-10 dataset. At line 56 of both .py scripts, modify the line to:
17 | 
18 |     cifar10_data = CIFAR10(folderpath="/path/to/your/cifar-10-batches-py/folder")
19 | 
20 | Now you should be able to run the code with no problem.
21 | 
22 | ## Permutation invariant CIFAR-10
23 | 
24 | Execute the following command in your terminal:
25 | 
26 |     python expr_cifar10_ZLIN_normhid_nolinb_dropout.py
27 | 
28 | It should reach an accuracy of around 69.62% in the end.
29 | 
30 | ## CIFAR-10 with deformations
31 | 
32 | It is basically the same model but trained with data augmentation, so it is still a feed-forward, fully-connected network. Type
33 | 
34 |     python expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py
35 | 
36 | to execute it. You can expect a 78.62% accuracy after the training process finishes.
37 | 
38 | 
39 | 
--------------------------------------------------------------------------------
/minimize.py:
--------------------------------------------------------------------------------
1 | #This program is distributed WITHOUT ANY WARRANTY; without even the implied
2 | #warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
3 | #LICENSE file for more details.
4 | #
5 | #
6 | #This file contains a Python version of Carl Rasmussen's Matlab-function
7 | #minimize.m
8 | #
9 | #minimize.m is copyright (C) 1999 - 2006, Carl Edward Rasmussen.
10 | #Python adaptation by Roland Memisevic 2008.
11 | #
12 | #
13 | #The following is the original copyright notice that comes with the
14 | #function minimize.m
15 | #(from http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/Copyright):
16 | #
17 | #
18 | #"(C) Copyright 1999 - 2006, Carl Edward Rasmussen
19 | #
20 | #Permission is granted for anyone to copy, use, or modify these
21 | #programs and accompanying documents for purposes of research or
22 | #education, provided this copyright notice is retained, and note is
23 | #made of any changes that have been made.
24 | #
25 | #These programs and documents are distributed without any warranty,
26 | #express or implied. As the programs were written for research
27 | #purposes only, they have not been tested to the degree that would be
28 | #advisable in any important application. All use of these programs is
29 | #entirely at the user's own risk."
30 | 
31 | 
32 | """minimize.py
33 | 
34 | This module contains a function 'minimize' that performs unconstrained
35 | gradient based optimization using nonlinear conjugate gradients.
36 | 
37 | The function is a straightforward Python-translation of Carl Rasmussen's
38 | Matlab-function minimize.m
39 | 
40 | """
41 | 
42 | 
43 | from numpy import dot, isinf, isnan, any, sqrt, isreal, real, nan, inf
44 | 
45 | def minimize(X, f, grad, args, maxnumlinesearch=None, maxnumfuneval=None, red=1.0, verbose=True):
46 |     INT = 0.1;# don't reevaluate within 0.1 of the limit of the current bracket
47 |     EXT = 3.0; # extrapolate maximum 3 times the current step-size
48 |     MAX = 20; # max 20 function evaluations per line search
49 |     RATIO = 10; # maximum allowed slope ratio
50 |     SIG = 0.1;RHO = SIG/2;# SIG and RHO are the constants controlling the Wolfe-
51 |     #Powell conditions. SIG is the maximum allowed absolute ratio between
52 |     #previous and new slopes (derivatives in the search direction), thus setting
53 |     #SIG to low (positive) values forces higher precision in the line-searches.
54 |     #RHO is the minimum allowed fraction of the expected (from the slope at the
55 |     #initial point in the linesearch). Constants must satisfy 0 < RHO < SIG < 1.
56 |     #Tuning of SIG (depending on the nature of the function to be optimized) may
57 |     #speed up the minimization; it is probably not worth playing much with RHO.
58 | 
59 |     SMALL = 10.**-16 #minimize.m uses matlab's realmin
60 | 
61 |     if maxnumlinesearch == None:
62 |         if maxnumfuneval == None:
63 |             raise ValueError("Specify maxnumlinesearch or maxnumfuneval")
64 |         else:
65 |             S = 'Function evaluation'
66 |             length = maxnumfuneval
67 |     else:
68 |         if maxnumfuneval != None:
69 |             raise ValueError("Specify either maxnumlinesearch or maxnumfuneval (not both)")
70 |         else:
71 |             S = 'Linesearch'
72 |             length = maxnumlinesearch
73 | 
74 |     i = 0                                    # zero the run length counter
75 |     ls_failed = 0                            # no previous line search has failed
76 |     f0 = f(X, *args)                         # get function value and gradient
77 |     df0 = grad(X, *args)
78 |     fX = [f0]
79 |     i = i + (length<0)                       # count epochs?!
80 |     s = -df0; d0 = -dot(s,s)                 # initial search direction (steepest) and slope
81 |     x3 = red/(1.0-d0)                        # initial step is red/(|s|+1)
82 | 
83 |     while i < abs(length):                   # while not finished
84 |         i = i + (length>0)                   # count iterations?!
85 | 
86 |         X0 = X; F0 = f0; dF0 = df0           # make a copy of current values
87 |         if length>0:
88 |             M = MAX
89 |         else:
90 |             M = min(MAX, -length-i)
91 |         while 1:                             # keep extrapolating as long as necessary
92 |             x2 = 0; f2 = f0; d2 = d0; f3 = f0; df3 = df0
93 |             success = 0
94 |             while (not success) and (M > 0):
95 |                 try:
96 |                     M = M - 1; i = i + (length<0)  # count epochs?!
97 |                     f3 = f(X+x3*s, *args)
98 |                     df3 = grad(X+x3*s, *args)
99 |                     if isnan(f3) or isinf(f3) or any(isnan(df3)+isinf(df3)):
100 |                         print "error"
101 |                         return
102 |                     success = 1
103 |                 except:                      # catch any error which occurred in f
104 |                     x3 = (x2+x3)/2           # bisect and try again
105 |             if f3 < F0:
106 |                 X0 = X+x3*s; F0 = f3; dF0 = df3  # keep best values
107 |             d3 = dot(df3,s)                  # new slope
108 |             if d3 > SIG*d0 or f3 > f0+x3*RHO*d0 or M == 0:
109 |                 # are we done extrapolating?
110 |                 break
111 |             x1 = x2; f1 = f2; d1 = d2        # move point 2 to point 1
112 |             x2 = x3; f2 = f3; d2 = d3        # move point 3 to point 2
113 |             A = 6*(f1-f2)+3*(d2+d1)*(x2-x1)  # make cubic extrapolation
114 |             B = 3*(f2-f1)-(2*d1+d2)*(x2-x1)
115 |             Z = B+sqrt(complex(B*B-A*d1*(x2-x1)))
116 |             if Z != 0.0:
117 |                 x3 = x1-d1*(x2-x1)**2/Z      # num. error possible, ok!
118 |             else:
119 |                 x3 = inf
120 |             if (not isreal(x3)) or isnan(x3) or isinf(x3) or (x3 < 0):
121 |                 # num prob | wrong sign?
122 |                 x3 = x2*EXT                  # extrapolate maximum amount
123 |             elif x3 > x2*EXT:                # new point beyond extrapolation limit?
124 |                 x3 = x2*EXT                  # extrapolate maximum amount
125 |             elif x3 < x2+INT*(x2-x1):        # new point too close to previous point?
126 |                 x3 = x2+INT*(x2-x1)
127 |             x3 = real(x3)
128 | 
129 |         while (abs(d3) > -SIG*d0 or f3 > f0+x3*RHO*d0) and M > 0:
130 |             # keep interpolating
131 |             if (d3 > 0) or (f3 > f0+x3*RHO*d0):  # choose subinterval
132 |                 x4 = x3; f4 = f3; d4 = d3    # move point 3 to point 4
133 |             else:
134 |                 x2 = x3; f2 = f3; d2 = d3    # move point 3 to point 2
135 |             if f4 > f0:
136 |                 x3 = x2-(0.5*d2*(x4-x2)**2)/(f4-f2-d2*(x4-x2))
137 |                 # quadratic interpolation
138 |             else:
139 |                 A = 6*(f2-f4)/(x4-x2)+3*(d4+d2)  # cubic interpolation
140 |                 B = 3*(f4-f2)-(2*d2+d4)*(x4-x2)
141 |                 if A != 0:
142 |                     x3=x2+(sqrt(B*B-A*d2*(x4-x2)**2)-B)/A
143 |                     # num. error possible, ok!
144 | else: 145 | x3 = inf 146 | if isnan(x3) or isinf(x3): 147 | x3 = (x2+x4)/2 # if we had a numerical problem then bisect 148 | x3 = max(min(x3, x4-INT*(x4-x2)),x2+INT*(x4-x2)) 149 | # don't accept too close 150 | f3 = f(X+x3*s, *args) 151 | df3 = grad(X+x3*s, *args) 152 | if f3 < F0: 153 | X0 = X+x3*s; F0 = f3; dF0 = df3 # keep best values 154 | M = M - 1; i = i + (length<0) # count epochs?! 155 | d3 = dot(df3,s) # new slope 156 | 157 | if abs(d3) < -SIG*d0 and f3 < f0+x3*RHO*d0: # if line search succeeded 158 | X = X+x3*s; f0 = f3; fX.append(f0) # update variables 159 | if verbose: print '%s %6i; Value %4.6e\r' % (S, i, f0) 160 | s = (dot(df3,df3)-dot(df0,df3))/dot(df0,df0)*s - df3 161 | # Polack-Ribiere CG direction 162 | df0 = df3 # swap derivatives 163 | d3 = d0; d0 = dot(df0,s) 164 | if d0 > 0: # new slope must be negative 165 | s = -df0; d0 = -dot(s,s) # otherwise use steepest direction 166 | x3 = x3 * min(RATIO, d3/(d0-SMALL)) # slope ratio but max RATIO 167 | ls_failed = 0 # this line search did not fail 168 | else: 169 | X = X0; f0 = F0; df0 = dF0 # restore best point so far 170 | if ls_failed or (i>abs(length)):# line search failed twice in a row 171 | break # or we ran out of time, so we give up 172 | s = -df0; d0 = -dot(s,s) # try steepest 173 | x3 = 1/(1-d0) 174 | ls_failed = 1 # this line search failed 175 | if verbose: print "\n" 176 | return X, fX, i 177 | 178 | -------------------------------------------------------------------------------- /expr_cifar10_ZLIN_normhid_nolinb_dropout.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy 4 | import theano 5 | import theano.tensor as T 6 | from theano.tensor.shared_randomstreams import RandomStreams 7 | import cPickle 8 | 9 | from dataset import CIFAR10 10 | from layer import StackedLayer 11 | from classifier import LogisticRegression 12 | from model import ClassicalAutoencoder, ZerobiasAutoencoder, LinearAutoencoder 13 | from preprocess import SubtractMeanAndNormalizeH, PCA 14 | from train import GraddescentMinibatch, Dropout 15 | from params import save_params, load_params, set_params, get_params 16 | 17 | from minimize import minimize 18 | 19 | 20 | ####################### 21 | # SET SUPER PARAMETER # 22 | ####################### 23 | 24 | pca_retain = 800 25 | hid_layer_sizes = [4000, 1000, 4000, 1000, 4000, 1000, 4000] 26 | batchsize = 100 27 | zae_threshold=1. 28 | 29 | momentum = 0.9 30 | pretrain_lr_zae = 1e-3 31 | pretrain_lr_lin = 1e-4 32 | weightdecay = 0.001 33 | pretrain_epc = 600 34 | 35 | logreg_lr = 0.5 36 | logreg_epc = 1000 37 | 38 | finetune_lr = 5e-3 39 | finetune_epc = 1000 40 | 41 | print " " 42 | print "pca_retain =", pca_retain 43 | print "hid_layer_sizes =", hid_layer_sizes 44 | print "batchsize =", batchsize 45 | print "zae_threshold =", zae_threshold 46 | print "momentum =", momentum 47 | print "pretrain, zae: lr = %f, epc = %d" % (pretrain_lr_zae, pretrain_epc) 48 | print "pretrain, lin: lr = %f, epc = %d, wd = %.3f" % (pretrain_lr_lin, pretrain_epc, weightdecay) 49 | print "logistic regression: lr = %f, epc = %d" % (logreg_lr, logreg_epc) 50 | print "finetune: lr = %f, epc = %d" % (finetune_lr, finetune_epc) 51 | 52 | ############# 53 | # LOAD DATA # 54 | ############# 55 | 56 | cifar10_data = CIFAR10() 57 | train_x, train_y = cifar10_data.get_train_set() 58 | test_x, test_y = cifar10_data.get_test_set() 59 | 60 | print "\n... 
pre-processing" 61 | preprocess_model = SubtractMeanAndNormalizeH(train_x.shape[1]) 62 | map_fun = theano.function([preprocess_model.varin], preprocess_model.output()) 63 | 64 | pca_obj = PCA() 65 | pca_obj.fit(map_fun(train_x), retain=pca_retain, whiten=True) 66 | preprocess_model = preprocess_model + pca_obj.forward_layer 67 | preprocess_function = theano.function([preprocess_model.varin], preprocess_model.output()) 68 | train_x = preprocess_function(train_x) 69 | test_x = preprocess_function(test_x) 70 | 71 | feature_num = train_x.shape[0] * train_x.shape[1] 72 | 73 | train_x = theano.shared(value=train_x, name='train_x', borrow=True) 74 | train_y = theano.shared(value=train_y, name='train_y', borrow=True) 75 | test_x = theano.shared(value=test_x, name='test_x', borrow=True) 76 | test_y = theano.shared(value=test_y, name='test_y', borrow=True) 77 | print "Done." 78 | 79 | ######################### 80 | # BUILD PRE-TRAIN MODEL # 81 | ######################### 82 | 83 | print "... building pre-train model" 84 | npy_rng = numpy.random.RandomState(123) 85 | model = ZerobiasAutoencoder( 86 | train_x.get_value().shape[1], hid_layer_sizes[0], 87 | init_w = theano.shared( 88 | value=0.01 * train_x.get_value()[:hid_layer_sizes[0], :].T, 89 | name='w_zae_0', 90 | borrow=True 91 | ), 92 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 93 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[0] 94 | ) + LinearAutoencoder( 95 | hid_layer_sizes[0], hid_layer_sizes[1], 96 | init_w = theano.shared( 97 | value=numpy.tile( 98 | 0.01 * train_x.get_value(), 99 | (hid_layer_sizes[0] * hid_layer_sizes[1] / feature_num + 1, 1) 100 | ).flatten()[:(hid_layer_sizes[0] * hid_layer_sizes[1])].reshape( 101 | hid_layer_sizes[0], hid_layer_sizes[1] 102 | ), 103 | name='w_ae_1', 104 | borrow=True 105 | ), 106 | vistype = 'real', tie=True, npy_rng=npy_rng 107 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[1] 108 | ) + ZerobiasAutoencoder( 109 | hid_layer_sizes[1], hid_layer_sizes[2], 110 | init_w = theano.shared( 111 | value=numpy.tile( 112 | 0.01 * train_x.get_value(), 113 | (hid_layer_sizes[1] * hid_layer_sizes[2] / feature_num + 1, 1) 114 | ).flatten()[:(hid_layer_sizes[1] * hid_layer_sizes[2])].reshape( 115 | hid_layer_sizes[1], hid_layer_sizes[2] 116 | ), 117 | name='w_zae_2', 118 | borrow=True 119 | ), 120 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 121 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[2] 122 | ) + LinearAutoencoder( 123 | hid_layer_sizes[2], hid_layer_sizes[3], 124 | init_w = theano.shared( 125 | value=numpy.tile( 126 | 0.01 * train_x.get_value(), 127 | (hid_layer_sizes[2] * hid_layer_sizes[3] / feature_num + 1, 1) 128 | ).flatten()[:(hid_layer_sizes[2] * hid_layer_sizes[3])].reshape( 129 | hid_layer_sizes[2], hid_layer_sizes[3] 130 | ), 131 | name='w_ae_3', 132 | borrow=True 133 | ), 134 | vistype = 'real', tie=True, npy_rng=npy_rng 135 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[3] 136 | ) + ZerobiasAutoencoder( 137 | hid_layer_sizes[3], hid_layer_sizes[4], 138 | init_w = theano.shared( 139 | value=numpy.tile( 140 | 0.01 * train_x.get_value(), 141 | (hid_layer_sizes[3] * hid_layer_sizes[4] / feature_num + 1, 1) 142 | ).flatten()[:(hid_layer_sizes[3] * hid_layer_sizes[4])].reshape( 143 | hid_layer_sizes[3], hid_layer_sizes[4] 144 | ), 145 | name='w_zae_4', 146 | borrow=True 147 | ), 148 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 149 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[4] 150 | ) + LinearAutoencoder( 151 | 
hid_layer_sizes[4], hid_layer_sizes[5], 152 | init_w = theano.shared( 153 | value=numpy.tile( 154 | 0.01 * train_x.get_value(), 155 | (hid_layer_sizes[4] * hid_layer_sizes[5] / feature_num + 1, 1) 156 | ).flatten()[:(hid_layer_sizes[4] * hid_layer_sizes[5])].reshape( 157 | hid_layer_sizes[4], hid_layer_sizes[5] 158 | ), 159 | name='w_ae_5', 160 | borrow=True 161 | ), 162 | vistype = 'real', tie=True, npy_rng=npy_rng 163 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[5] 164 | ) + ZerobiasAutoencoder( 165 | hid_layer_sizes[5], hid_layer_sizes[6], 166 | init_w = theano.shared( 167 | value=numpy.tile( 168 | 0.01 * train_x.get_value(), 169 | (hid_layer_sizes[5] * hid_layer_sizes[6] / feature_num + 1, 1) 170 | ).flatten()[:(hid_layer_sizes[5] * hid_layer_sizes[6])].reshape( 171 | hid_layer_sizes[5], hid_layer_sizes[6] 172 | ), 173 | name='w_zae_6', 174 | borrow=True 175 | ), 176 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 177 | ) 178 | model.models_stack[2].params = [model.models_stack[2].w] 179 | model.models_stack[2].params_private = [model.models_stack[2].w, model.models_stack[2].bT] 180 | model.models_stack[6].params = [model.models_stack[6].w] 181 | model.models_stack[6].params_private = [model.models_stack[6].w, model.models_stack[6].bT] 182 | model.models_stack[10].params = [model.models_stack[10].w] 183 | model.models_stack[10].params_private = [model.models_stack[10].w, model.models_stack[10].bT] 184 | 185 | model.print_layer() 186 | print "Done." 187 | 188 | ############# 189 | # PRE-TRAIN # 190 | ############# 191 | 192 | theano_rng = RandomStreams(123) 193 | for i in range(0, len(model.models_stack), 2): 194 | if (i + 2) % 4 == 0: 195 | model.models_stack[i-2].threshold = 0. 196 | model.models_stack[i-1].varin = model.models_stack[i-2].output() 197 | 198 | print "\n\nPre-training layer %d:" % i 199 | layer_dropout = Dropout(model.models_stack[i], droprates=[0.2, 0.5], theano_rng=theano_rng).dropout_model 200 | layer_dropout.varin = model.models_stack[i].varin 201 | 202 | if (i + 2) % 4 == 0: 203 | model.models_stack[i-2].threshold = 0. 204 | pretrain_lr = pretrain_lr_lin 205 | layer_cost = layer_dropout.cost() + layer_dropout.weightdecay(weightdecay) 206 | else: 207 | pretrain_lr = pretrain_lr_zae 208 | layer_cost = layer_dropout.cost() 209 | 210 | trainer = GraddescentMinibatch( 211 | varin=model.varin, data=train_x, 212 | cost=layer_cost, 213 | params=layer_dropout.params_private, 214 | supervised=False, 215 | batchsize=batchsize, learningrate=pretrain_lr, momentum=momentum, 216 | rng=npy_rng 217 | ) 218 | 219 | prev_cost = numpy.inf 220 | patience = 0 221 | for epoch in xrange(pretrain_epc): 222 | cost = trainer.epoch() 223 | if prev_cost <= cost: 224 | patience += 1 225 | if patience > 10: 226 | patience = 0 227 | trainer.set_learningrate(0.9 * trainer.learningrate) 228 | if trainer.learningrate < 1e-10: 229 | break 230 | prev_cost = cost 231 | save_params(model, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_normhid_nolinb_cae1_dropout.npy') 232 | print "Done." 233 | 234 | 235 | ######################### 236 | # BUILD FINE-TUNE MODEL # 237 | ######################### 238 | 239 | print "\n\n... building fine-tune model -- contraction 1" 240 | for imodel in model.models_stack: 241 | imodel.threshold = 0. 
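# NOTE: the loop right above lowers every stacked layer's threshold to 0 before the
# logistic-regression output layer is attached; presumably this relaxes the zero-bias
# gating used during pre-training so that the whole stack acts as a plain feed-forward
# network in the supervised stage (cf. the Z-Lin paper cited in the README).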
242 | model_ft = model + LogisticRegression( 243 | hid_layer_sizes[-1], 10, npy_rng=npy_rng 244 | ) 245 | model_ft.print_layer() 246 | 247 | train_set_error_rate = theano.function( 248 | [], 249 | T.mean(T.neq(model_ft.models_stack[-1].predict(), train_y)), 250 | givens = {model_ft.varin : train_x}, 251 | ) 252 | test_set_error_rate = theano.function( 253 | [], 254 | T.mean(T.neq(model_ft.models_stack[-1].predict(), test_y)), 255 | givens = {model_ft.varin : test_x}, 256 | ) 257 | print "Done." 258 | 259 | print "... training with conjugate gradient: minimize.py" 260 | fun_cost = theano.function( 261 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 262 | model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay) 263 | ) 264 | def return_cost(test_params, input_x, truth_y): 265 | tmp = get_params(model_ft.models_stack[-1]) 266 | set_params(model_ft.models_stack[-1], test_params) 267 | result = fun_cost(input_x, truth_y) 268 | set_params(model_ft.models_stack[-1], tmp) 269 | return result 270 | 271 | fun_grad = theano.function( 272 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 273 | T.grad(model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay), 274 | model_ft.models_stack[-1].params) 275 | ) 276 | def return_grad(test_params, input_x, truth_y): 277 | tmp = get_params(model_ft.models_stack[-1]) 278 | set_params(model_ft.models_stack[-1], test_params) 279 | result = numpy.concatenate([numpy.array(i).flatten() for i in fun_grad(input_x, truth_y)]) 280 | set_params(model_ft.models_stack[-1], tmp) 281 | return result 282 | p, g, numlinesearches = minimize( 283 | get_params(model_ft.models_stack[-1]), return_cost, return_grad, 284 | (train_x.get_value(), train_y.get_value()), logreg_epc, verbose=False 285 | ) 286 | set_params(model_ft.models_stack[-1], p) 287 | save_params(model_ft, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_10_normhid_nolinb_cae1_dropout.npy') 288 | print "***error rate: train: %f, test: %f" % ( 289 | train_set_error_rate(), test_set_error_rate() 290 | ) 291 | 292 | ############# 293 | # FINE-TUNE # 294 | ############# 295 | 296 | """ 297 | print "\n\n... fine-tuning the whole network" 298 | truth = T.lmatrix('truth') 299 | trainer = GraddescentMinibatch( 300 | varin=model_ft.varin, data=train_x, 301 | truth=model_ft.models_stack[-1].vartruth, truth_data=train_y, 302 | supervised=True, 303 | cost=model_ft.models_stack[-1].cost(), 304 | params=model.params, 305 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 306 | rng=npy_rng 307 | ) 308 | 309 | prev_cost = numpy.inf 310 | for epoch in xrange(finetune_epc): 311 | cost = trainer.epoch() 312 | if epoch % 100 == 0 and epoch != 0: # prev_cost <= cost: 313 | trainer.set_learningrate(trainer.learningrate*0.8) 314 | if epoch % 50 == 0: 315 | print "***error rate: train: %f, test: %f" % ( 316 | train_set_error_rate(), test_set_error_rate() 317 | ) 318 | prev_cost = cost 319 | print "Done." 320 | """ 321 | 322 | 323 | 324 | print "\n\n... 
fine-tuning the whole network, with dropout" 325 | theano_rng = RandomStreams(123) 326 | dropout_ft = Dropout(model_ft, droprates=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], theano_rng=theano_rng).dropout_model 327 | dropout_ft.print_layer() 328 | 329 | trainer = GraddescentMinibatch( 330 | varin=dropout_ft.varin, data=train_x, 331 | truth=dropout_ft.models_stack[-1].vartruth, truth_data=train_y, 332 | supervised=True, 333 | cost=dropout_ft.models_stack[-1].cost(), 334 | params=dropout_ft.params, 335 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 336 | rng=npy_rng 337 | ) 338 | 339 | prev_cost = numpy.inf 340 | patience = 0 341 | for epoch in xrange(1000): 342 | cost = trainer.epoch() 343 | if prev_cost <= cost: 344 | patience += 1 345 | if patience > 5: 346 | patience = 0 347 | trainer.set_learningrate(trainer.learningrate * 0.9) 348 | if trainer.learningrate < 1e-10: 349 | break 350 | print "***error rate: train: %f, test: %f" % (train_set_error_rate(), test_set_error_rate()) 351 | prev_cost = cost 352 | print "Done." 353 | 354 | print "***FINAL error rate, train: %f, test: %f" % ( 355 | train_set_error_rate(), test_set_error_rate() 356 | ) 357 | save_params(model_ft, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_10_normhid_nolinb_cae1_dropout_dpft.npy') 358 | -------------------------------------------------------------------------------- /expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy 4 | from scipy import ndimage 5 | import theano 6 | import theano.tensor as T 7 | from theano.tensor.shared_randomstreams import RandomStreams 8 | import cPickle 9 | 10 | from dataset import CIFAR10 11 | from layer import StackedLayer 12 | from classifier import LogisticRegression 13 | from model import ClassicalAutoencoder, ZerobiasAutoencoder 14 | from preprocess import SubtractMeanAndNormalizeH, PCA 15 | from train import GraddescentMinibatch, Dropout 16 | from params import save_params, load_params, set_params, get_params 17 | 18 | from minimize import minimize 19 | 20 | ####################### 21 | # SET SUPER PARAMETER # 22 | ####################### 23 | 24 | pca_retain = 800 25 | hid_layer_sizes = [4000, 1000, 4000] 26 | batchsize = 100 27 | zae_threshold=1. 28 | 29 | momentum = 0.9 30 | pretrain_lr_zae = 1e-3 31 | pretrain_lr_lin = 1e-4 32 | weightdecay = 1.0 33 | pretrain_epc = 800 34 | 35 | logreg_lr = 0.5 36 | logreg_epc = 1000 37 | 38 | finetune_lr = 5e-3 39 | finetune_epc = 1000 40 | 41 | print " " 42 | print "pca_retain =", pca_retain 43 | print "hid_layer_sizes =", hid_layer_sizes 44 | print "batchsize =", batchsize 45 | print "zae_threshold =", zae_threshold 46 | print "momentum =", momentum 47 | print "pretrain, zae: lr = %f, epc = %d" % (pretrain_lr_zae, pretrain_epc) 48 | print "pretrain, lin: lr = %f, epc = %d, wd = %.3f" % (pretrain_lr_lin, pretrain_epc, weightdecay) 49 | print "logistic regression: lr = %f, epc = %d" % (logreg_lr, logreg_epc) 50 | print "finetune: lr = %f, epc = %d" % (finetune_lr, finetune_epc) 51 | 52 | ############# 53 | # LOAD DATA # 54 | ############# 55 | 56 | cifar10_data = CIFAR10() 57 | train_x, train_y = cifar10_data.get_train_set() 58 | test_x, test_y = cifar10_data.get_test_set() 59 | 60 | print "\n... 
pre-processing" 61 | preprocess_model = SubtractMeanAndNormalizeH(train_x.shape[1]) 62 | map_fun = theano.function([preprocess_model.varin], preprocess_model.output()) 63 | 64 | pca_obj = PCA() 65 | pca_obj.fit(map_fun(train_x), retain=pca_retain, whiten=True) 66 | preprocess_model = preprocess_model + pca_obj.forward_layer 67 | preprocess_function = theano.function([preprocess_model.varin], preprocess_model.output()) 68 | 69 | pcamapping = theano.function([pca_obj.forward_layer.varin], pca_obj.forward_layer.output()) 70 | pcaback = theano.function([pca_obj.backward_layer.varin], pca_obj.backward_layer.output()) 71 | 72 | train_x = preprocess_function(train_x) 73 | test_x = preprocess_function(test_x) 74 | 75 | feature_num = train_x.shape[0] * train_x.shape[1] 76 | 77 | train_x = theano.shared(value=train_x, name='train_x', borrow=True) 78 | train_y = theano.shared(value=train_y, name='train_y', borrow=True) 79 | test_x = theano.shared(value=test_x, name='test_x', borrow=True) 80 | test_y = theano.shared(value=test_y, name='test_y', borrow=True) 81 | print "Done." 82 | 83 | ######################### 84 | # BUILD PRE-TRAIN MODEL # 85 | ######################### 86 | 87 | print "... building pre-train model" 88 | npy_rng = numpy.random.RandomState(123) 89 | model = ZerobiasAutoencoder( 90 | train_x.get_value().shape[1], hid_layer_sizes[0], 91 | init_w = theano.shared( 92 | value=0.01 * train_x.get_value()[:hid_layer_sizes[0], :].T, 93 | name='w_zae_0', 94 | borrow=True 95 | ), 96 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 97 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[0] 98 | ) + ClassicalAutoencoder( 99 | hid_layer_sizes[0], hid_layer_sizes[1], 100 | init_w = theano.shared( 101 | value=numpy.tile( 102 | 0.01 * train_x.get_value(), 103 | (hid_layer_sizes[0] * hid_layer_sizes[1] / feature_num + 1, 1) 104 | ).flatten()[:(hid_layer_sizes[0] * hid_layer_sizes[1])].reshape( 105 | hid_layer_sizes[0], hid_layer_sizes[1] 106 | ), 107 | name='w_ae_1', 108 | borrow=True 109 | ), 110 | vistype = 'real', tie=True, npy_rng=npy_rng 111 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[1] 112 | ) + ZerobiasAutoencoder( 113 | hid_layer_sizes[1], hid_layer_sizes[2], 114 | init_w = theano.shared( 115 | value=numpy.tile( 116 | 0.01 * train_x.get_value(), 117 | (hid_layer_sizes[1] * hid_layer_sizes[2] / feature_num + 1, 1) 118 | ).flatten()[:(hid_layer_sizes[1] * hid_layer_sizes[2])].reshape( 119 | hid_layer_sizes[1], hid_layer_sizes[2] 120 | ), 121 | name='w_zae_2', 122 | borrow=True 123 | ), 124 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 125 | ) 126 | model.models_stack[2].params = [model.models_stack[2].w] 127 | model.models_stack[2].params_private = [model.models_stack[2].w, model.models_stack[2].bT] 128 | 129 | model.print_layer() 130 | print "Done." 131 | 132 | ############# 133 | # PRE-TRAIN # 134 | ############# 135 | 136 | theano_rng = RandomStreams(123) 137 | for i in range(0, len(model.models_stack), 2): 138 | if (i + 2) % 4 == 0: 139 | model.models_stack[i-2].threshold = 0. 140 | model.models_stack[i-1].varin = model.models_stack[i-2].output() 141 | 142 | print "\n\nPre-training layer %d:" % i 143 | layer_dropout = Dropout(model.models_stack[i], droprates=[0.2, 0.5], theano_rng=theano_rng).dropout_model 144 | layer_dropout.varin = model.models_stack[i].varin 145 | 146 | if (i + 2) % 4 == 0: 147 | model.models_stack[i-2].threshold = 0. 
148 | pretrain_lr = pretrain_lr_lin 149 | layer_cost = layer_dropout.cost() + layer_dropout.contraction(weightdecay) 150 | else: 151 | pretrain_lr = pretrain_lr_zae 152 | layer_cost = layer_dropout.cost() 153 | 154 | trainer = GraddescentMinibatch( 155 | varin=model.varin, data=train_x, 156 | cost=layer_cost, 157 | params=layer_dropout.params_private, 158 | supervised=False, 159 | batchsize=batchsize, learningrate=pretrain_lr, momentum=momentum, 160 | rng=npy_rng 161 | ) 162 | 163 | prev_cost = numpy.inf 164 | patience = 0 165 | origin_x = train_x.get_value() 166 | reshaped_x = pcaback(origin_x).reshape((50000, 32, 32, 3), order='F') 167 | for epoch in xrange(pretrain_epc): 168 | cost = 0. 169 | # original data 170 | cost += trainer.epoch() 171 | 172 | # data augmentation: horizontal flip 173 | flipped_x = reshaped_x[:, ::-1, :, :].reshape((50000, 3072), order='F') 174 | train_x.set_value(pcamapping(flipped_x)) 175 | cost += trainer.epoch() 176 | 177 | # random rotation 178 | movies = numpy.zeros(reshaped_x.shape, dtype=theano.config.floatX) 179 | for i, im in enumerate(reshaped_x): 180 | angle_delta = (npy_rng.vonmises(0.0, 1.0)/(4 * numpy.pi)) * 180 181 | movies[i] = ndimage.rotate(im, angle_delta, reshape=False, mode='wrap') 182 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 183 | cost += trainer.epoch() 184 | 185 | # random shift 186 | for i, im in enumerate(reshaped_x): 187 | shifts = (npy_rng.randint(-4, 4), npy_rng.randint(-4, 4), 0) 188 | movies[i] = ndimage.interpolation.shift(im, shifts, mode='reflect') 189 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 190 | cost += trainer.epoch() 191 | 192 | cost /= 4. 193 | train_x.set_value(origin_x) 194 | 195 | if prev_cost <= cost: 196 | patience += 1 197 | if patience > 10: 198 | patience = 0 199 | trainer.set_learningrate(0.9 * trainer.learningrate) 200 | if trainer.learningrate < 1e-10: 201 | break 202 | prev_cost = cost 203 | save_params(model, 'ZLIN_4000_1000_4000_normhid_nolinb_cae1_dtagmt2_dropout.npy') 204 | print "Done." 205 | 206 | 207 | ######################### 208 | # BUILD FINE-TUNE MODEL # 209 | ######################### 210 | 211 | print "\n\n... building fine-tune model -- contraction 1" 212 | for imodel in model.models_stack: 213 | imodel.threshold = 0. 214 | model_ft = model + LogisticRegression( 215 | hid_layer_sizes[-1], 10, npy_rng=npy_rng 216 | ) 217 | model_ft.print_layer() 218 | 219 | train_set_error_rate = theano.function( 220 | [], 221 | T.mean(T.neq(model_ft.models_stack[-1].predict(), train_y)), 222 | givens = {model_ft.varin : train_x}, 223 | ) 224 | test_set_error_rate = theano.function( 225 | [], 226 | T.mean(T.neq(model_ft.models_stack[-1].predict(), test_y)), 227 | givens = {model_ft.varin : test_x}, 228 | ) 229 | print "Done." 230 | 231 | print "... 
training with conjugate gradient: minimize.py" 232 | fun_cost = theano.function( 233 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 234 | model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay) 235 | ) 236 | def return_cost(test_params, input_x, truth_y): 237 | tmp = get_params(model_ft.models_stack[-1]) 238 | set_params(model_ft.models_stack[-1], test_params) 239 | result = fun_cost(input_x, truth_y) 240 | set_params(model_ft.models_stack[-1], tmp) 241 | return result 242 | 243 | fun_grad = theano.function( 244 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 245 | T.grad(model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay), 246 | model_ft.models_stack[-1].params) 247 | ) 248 | def return_grad(test_params, input_x, truth_y): 249 | tmp = get_params(model_ft.models_stack[-1]) 250 | set_params(model_ft.models_stack[-1], test_params) 251 | result = numpy.concatenate([numpy.array(i).flatten() for i in fun_grad(input_x, truth_y)]) 252 | set_params(model_ft.models_stack[-1], tmp) 253 | return result 254 | p, g, numlinesearches = minimize( 255 | get_params(model_ft.models_stack[-1]), return_cost, return_grad, 256 | (train_x.get_value(), train_y.get_value()), logreg_epc, verbose=False 257 | ) 258 | set_params(model_ft.models_stack[-1], p) 259 | save_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout.npy') 260 | 261 | load_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout.npy') 262 | print "***error rate: train: %f, test: %f" % ( 263 | train_set_error_rate(), test_set_error_rate() 264 | ) 265 | 266 | ############# 267 | # FINE-TUNE # 268 | ############# 269 | 270 | """ 271 | print "\n\n... fine-tuning the whole network" 272 | truth = T.lmatrix('truth') 273 | trainer = GraddescentMinibatch( 274 | varin=model_ft.varin, data=train_x, 275 | truth=model_ft.models_stack[-1].vartruth, truth_data=train_y, 276 | supervised=True, 277 | cost=model_ft.models_stack[-1].cost(), 278 | params=model.params, 279 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 280 | rng=npy_rng 281 | ) 282 | 283 | prev_cost = numpy.inf 284 | for epoch in xrange(finetune_epc): 285 | cost = trainer.epoch() 286 | if epoch % 100 == 0 and epoch != 0: # prev_cost <= cost: 287 | trainer.set_learningrate(trainer.learningrate*0.8) 288 | if epoch % 50 == 0: 289 | print "***error rate: train: %f, test: %f" % ( 290 | train_set_error_rate(), test_set_error_rate() 291 | ) 292 | prev_cost = cost 293 | print "Done." 294 | """ 295 | 296 | 297 | 298 | print "\n\n... fine-tuning the whole network, with dropout" 299 | theano_rng = RandomStreams(123) 300 | dropout_ft = Dropout(model_ft, droprates=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], theano_rng=theano_rng).dropout_model 301 | dropout_ft.print_layer() 302 | 303 | trainer = GraddescentMinibatch( 304 | varin=dropout_ft.varin, data=train_x, 305 | truth=dropout_ft.models_stack[-1].vartruth, truth_data=train_y, 306 | supervised=True, 307 | cost=dropout_ft.models_stack[-1].cost(), 308 | params=dropout_ft.params, 309 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 310 | rng=npy_rng 311 | ) 312 | 313 | origin_x = train_x.get_value() 314 | reshaped_x = pcaback(origin_x).reshape((50000, 32, 32, 3), order='F') 315 | prev_cost = numpy.inf 316 | patience = 0 317 | for epoch in xrange(1000): 318 | cost = 0. 
319 | # original data 320 | cost += trainer.epoch() 321 | 322 | # data augmentation: horizontal flip 323 | flipped_x = reshaped_x[:, ::-1, :, :].reshape((50000, 3072), order='F') 324 | train_x.set_value(pcamapping(flipped_x)) 325 | cost += trainer.epoch() 326 | 327 | # random rotation 328 | movies = numpy.zeros(reshaped_x.shape, dtype=theano.config.floatX) 329 | for i, im in enumerate(reshaped_x): 330 | angle_delta = (npy_rng.vonmises(0.0, 1.0)/(4 * numpy.pi)) * 180 331 | movies[i] = ndimage.rotate(im, angle_delta, reshape=False, mode='wrap') 332 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 333 | cost += trainer.epoch() 334 | 335 | # random shift 336 | for i, im in enumerate(reshaped_x): 337 | shifts = (npy_rng.randint(-4, 4), npy_rng.randint(-4, 4), 0) 338 | movies[i] = ndimage.interpolation.shift(im, shifts, mode='reflect') 339 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 340 | cost += trainer.epoch() 341 | 342 | cost /= 4. 343 | train_x.set_value(origin_x) 344 | 345 | if prev_cost <= cost: 346 | patience += 1 347 | if patience > 5: 348 | patience = 0 349 | trainer.set_learningrate(0.9 * trainer.learningrate) 350 | if trainer.learningrate < 1e-10: 351 | break 352 | print "***error rate: train: %f, test: %f" % (train_set_error_rate(), test_set_error_rate()) 353 | prev_cost = cost 354 | print "Done." 355 | 356 | print "***FINAL error rate, train: %f, test: %f" % ( 357 | train_set_error_rate(), test_set_error_rate() 358 | ) 359 | save_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout_dpft.npy') 360 | --------------------------------------------------------------------------------
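A side note on the minimize.py interface used by both scripts: they flatten the top classifier's parameters with get_params, then hand minimize a cost callback and a gradient callback (return_cost / return_grad) together with the training data as extra arguments. Below is a minimal sketch of that calling convention on a toy quadratic, independent of Theano and CIFAR-10; everything except minimize itself (toy_cost, toy_grad, A, b, x0) is a made-up name for illustration, and it assumes the same Python 2 environment the rest of the repo targets.

    import numpy
    from minimize import minimize

    A = numpy.diag([1., 10.])          # toy quadratic cost 0.5 * x'Ax - b'x
    b = numpy.array([1., -2.])

    def toy_cost(x, A, b):             # plays the role of return_cost(params, ...)
        return 0.5 * numpy.dot(x, numpy.dot(A, x)) - numpy.dot(b, x)

    def toy_grad(x, A, b):             # plays the role of return_grad(params, ...)
        return numpy.dot(A, x) - b

    x0 = numpy.zeros(2)
    # maxnumlinesearch caps the number of line searches; the scripts pass logreg_epc here
    x_opt, costs, n_linesearches = minimize(x0, toy_cost, toy_grad, (A, b),
                                            maxnumlinesearch=50, verbose=False)
    print x_opt                        # should approach the solution of A x = b, i.e. [1., -0.2]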