├── .gitignore
├── LICENSE
├── README.md
├── minimize.py
├── expr_cifar10_ZLIN_normhid_nolinb_dropout.py
└── expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py
/.gitignore:
--------------------------------------------------------------------------------
1 | *.pyc
2 | *.npy
3 | *.png
4 | *.out
5 | *.log
6 | 
7 | 
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
1 | Copyright (c) 2015, Zhouhan LIN
2 | All rights reserved.
3 | 
4 | Redistribution and use in source and binary forms, with or without
5 | modification, are permitted provided that the following conditions are met:
6 | 
7 | * Redistributions of source code must retain the above copyright notice, this
8 | list of conditions and the following disclaimer.
9 | 
10 | * Redistributions in binary form must reproduce the above copyright notice,
11 | this list of conditions and the following disclaimer in the documentation
12 | and/or other materials provided with the distribution.
13 | 
14 | * Neither the name of zlinnet nor the names of its
15 | contributors may be used to endorse or promote products derived from
16 | this software without specific prior written permission.
17 | 
18 | THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
19 | AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
20 | IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
21 | DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE
22 | FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL
23 | DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR
24 | SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER
25 | CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY,
26 | OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
27 | OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
28 | 
29 | 
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
1 | # zlinnet
2 | This repo provides an implementation of the Z-Lin network. It should reproduce the results on permutation-invariant CIFAR-10 reported in the paper:
3 | - Zhouhan Lin, Roland Memisevic, Kishore Konda, [How far can we go without convolution: Improving fully-connected networks](http://arxiv.org/pdf/1511.02580v1.pdf), arXiv preprint arXiv:1511.02580 (2015).
4 | 
5 | ## Setup
6 | - Download the dependency. First you need to download NeuroBricks, the super-light framework used in this repo; it currently does not cover recurrent nets and is not documented. There are far more mature and successful frameworks available, like [blocks](https://github.com/mila-udem/blocks) and [lasagne](https://github.com/Lasagne/Lasagne). Execute the following line to download the source:
7 | 
8 |     git clone https://github.com/hantek/NeuroBricks.git
9 | 
10 | - (Optional) In case anything changes in NeuroBricks in the future, checking out the snapshot from the time this repo was published should ensure everything works fine. The code may still work without that snapshot, but check it out just in case:
11 | 
12 |     git checkout 191704feb5de67ab2815d5891dd633b9f2d04afb
13 | 
14 | - Add the path where you store NeuroBricks to Python's module search path, for example as shown below.
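For example, assuming the clone lives at /path/to/NeuroBricks (a placeholder path; point it at whichever directory actually contains dataset.py, layer.py, model.py, etc.), a minimal way to make it importable is to prepend it to sys.path before the other imports in the experiment scripts, or equivalently to add it to your PYTHONPATH:

    import sys
    # placeholder location of the NeuroBricks clone; adjust it to the directory
    # that actually contains dataset.py, layer.py, model.py, ...
    sys.path.insert(0, "/path/to/NeuroBricks")

    from dataset import CIFAR10   # should now resolve without an ImportError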
15 | 
16 | - Specify the data path to wherever you store your CIFAR-10 dataset. At line 56 of both .py scripts, modify the line to:
17 | 
18 |     cifar10_data = CIFAR10(folderpath="/path/to/your/cifar-10-batches-py/folder")
19 | 
20 | Now you should be able to run the code with no problem.
21 | 
22 | ## Permutation invariant CIFAR-10
23 | 
24 | Execute the following command in your terminal:
25 | 
26 |     python expr_cifar10_ZLIN_normhid_nolinb_dropout.py
27 | 
28 | It should reach an accuracy of around 69.62% in the end.
29 | 
30 | ## CIFAR-10 with deformations
31 | 
32 | It is basically the same model but trained with data augmentation, so it is still a feed-forward, fully-connected network. Type
33 | 
34 |     python expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py
35 | 
36 | to execute it. You can expect a 78.62% accuracy after the training process finishes.
37 | 
38 | 
39 | 
--------------------------------------------------------------------------------
/minimize.py:
--------------------------------------------------------------------------------
1 | #This program is distributed WITHOUT ANY WARRANTY; without even the implied
2 | #warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
3 | #LICENSE file for more details.
4 | #
5 | #
6 | #This file contains a Python version of Carl Rasmussen's Matlab-function
7 | #minimize.m
8 | #
9 | #minimize.m is copyright (C) 1999 - 2006, Carl Edward Rasmussen.
10 | #Python adaptation by Roland Memisevic 2008.
11 | #
12 | #
13 | #The following is the original copyright notice that comes with the
14 | #function minimize.m
15 | #(from http://www.kyb.tuebingen.mpg.de/bs/people/carl/code/minimize/Copyright):
16 | #
17 | #
18 | #"(C) Copyright 1999 - 2006, Carl Edward Rasmussen
19 | #
20 | #Permission is granted for anyone to copy, use, or modify these
21 | #programs and accompanying documents for purposes of research or
22 | #education, provided this copyright notice is retained, and note is
23 | #made of any changes that have been made.
24 | #
25 | #These programs and documents are distributed without any warranty,
26 | #express or implied. As the programs were written for research
27 | #purposes only, they have not been tested to the degree that would be
28 | #advisable in any important application. All use of these programs is
29 | #entirely at the user's own risk."
30 | 
31 | 
32 | """minimize.py
33 | 
34 | This module contains a function 'minimize' that performs unconstrained
35 | gradient based optimization using nonlinear conjugate gradients.
36 | 
37 | The function is a straightforward Python-translation of Carl Rasmussen's
38 | Matlab-function minimize.m
39 | 
40 | """
41 | 
42 | 
43 | from numpy import dot, isinf, isnan, any, sqrt, isreal, real, nan, inf
44 | 
45 | def minimize(X, f, grad, args, maxnumlinesearch=None, maxnumfuneval=None, red=1.0, verbose=True):
46 |     INT = 0.1;# don't reevaluate within 0.1 of the limit of the current bracket
47 |     EXT = 3.0; # extrapolate maximum 3 times the current step-size
48 |     MAX = 20; # max 20 function evaluations per line search
49 |     RATIO = 10; # maximum allowed slope ratio
50 |     SIG = 0.1;RHO = SIG/2;# SIG and RHO are the constants controlling the Wolfe-
51 |     #Powell conditions. SIG is the maximum allowed absolute ratio between
52 |     #previous and new slopes (derivatives in the search direction), thus setting
53 |     #SIG to low (positive) values forces higher precision in the line-searches.
54 |     #RHO is the minimum allowed fraction of the expected (from the slope at the
55 |     #initial point in the linesearch). Constants must satisfy 0 < RHO < SIG < 1.
56 |     #Tuning of SIG (depending on the nature of the function to be optimized) may
57 |     #speed up the minimization; it is probably not worth playing much with RHO.
58 | 
59 |     SMALL = 10.**-16 #minimize.m uses matlab's realmin
60 | 
61 |     if maxnumlinesearch == None:
62 |         if maxnumfuneval == None:
63 |             raise ValueError("Specify maxnumlinesearch or maxnumfuneval")
64 |         else:
65 |             S = 'Function evaluation'
66 |             length = maxnumfuneval
67 |     else:
68 |         if maxnumfuneval != None:
69 |             raise ValueError("Specify either maxnumlinesearch or maxnumfuneval (not both)")
70 |         else:
71 |             S = 'Linesearch'
72 |             length = maxnumlinesearch
73 | 
74 |     i = 0                                    # zero the run length counter
75 |     ls_failed = 0                            # no previous line search has failed
76 |     f0 = f(X, *args)                         # get function value and gradient
77 |     df0 = grad(X, *args)
78 |     fX = [f0]
79 |     i = i + (length<0)                       # count epochs?!
80 |     s = -df0; d0 = -dot(s,s)                 # initial search direction (steepest) and slope
81 |     x3 = red/(1.0-d0)                        # initial step is red/(|s|+1)
82 | 
83 |     while i < abs(length):                   # while not finished
84 |         i = i + (length>0)                   # count iterations?!
85 | 
86 |         X0 = X; F0 = f0; dF0 = df0           # make a copy of current values
87 |         if length>0:
88 |             M = MAX
89 |         else:
90 |             M = min(MAX, -length-i)
91 |         while 1:                             # keep extrapolating as long as necessary
92 |             x2 = 0; f2 = f0; d2 = d0; f3 = f0; df3 = df0
93 |             success = 0
94 |             while (not success) and (M > 0):
95 |                 try:
96 |                     M = M - 1; i = i + (length<0)  # count epochs?!
97 |                     f3 = f(X+x3*s, *args)
98 |                     df3 = grad(X+x3*s, *args)
99 |                     if isnan(f3) or isinf(f3) or any(isnan(df3)+isinf(df3)):
100 |                         print "error"
101 |                         return
102 |                     success = 1
103 |                 except:                      # catch any error which occurred in f
104 |                     x3 = (x2+x3)/2           # bisect and try again
105 |             if f3 < F0:
106 |                 X0 = X+x3*s; F0 = f3; dF0 = df3  # keep best values
107 |             d3 = dot(df3,s)                  # new slope
108 |             if d3 > SIG*d0 or f3 > f0+x3*RHO*d0 or M == 0:
109 |                 # are we done extrapolating?
110 |                 break
111 |             x1 = x2; f1 = f2; d1 = d2        # move point 2 to point 1
112 |             x2 = x3; f2 = f3; d2 = d3        # move point 3 to point 2
113 |             A = 6*(f1-f2)+3*(d2+d1)*(x2-x1)  # make cubic extrapolation
114 |             B = 3*(f2-f1)-(2*d1+d2)*(x2-x1)
115 |             Z = B+sqrt(complex(B*B-A*d1*(x2-x1)))
116 |             if Z != 0.0:
117 |                 x3 = x1-d1*(x2-x1)**2/Z      # num. error possible, ok!
118 |             else:
119 |                 x3 = inf
120 |             if (not isreal(x3)) or isnan(x3) or isinf(x3) or (x3 < 0):
121 |                 # num prob | wrong sign?
122 |                 x3 = x2*EXT                  # extrapolate maximum amount
123 |             elif x3 > x2*EXT:                # new point beyond extrapolation limit?
124 |                 x3 = x2*EXT                  # extrapolate maximum amount
125 |             elif x3 < x2+INT*(x2-x1):        # new point too close to previous point?
126 |                 x3 = x2+INT*(x2-x1)
127 |             x3 = real(x3)
128 | 
129 |         while (abs(d3) > -SIG*d0 or f3 > f0+x3*RHO*d0) and M > 0:
130 |             # keep interpolating
131 |             if (d3 > 0) or (f3 > f0+x3*RHO*d0):  # choose subinterval
132 |                 x4 = x3; f4 = f3; d4 = d3    # move point 3 to point 4
133 |             else:
134 |                 x2 = x3; f2 = f3; d2 = d3    # move point 3 to point 2
135 |             if f4 > f0:
136 |                 x3 = x2-(0.5*d2*(x4-x2)**2)/(f4-f2-d2*(x4-x2))
137 |                 # quadratic interpolation
138 |             else:
139 |                 A = 6*(f2-f4)/(x4-x2)+3*(d4+d2)  # cubic interpolation
140 |                 B = 3*(f4-f2)-(2*d2+d4)*(x4-x2)
141 |                 if A != 0:
142 |                     x3=x2+(sqrt(B*B-A*d2*(x4-x2)**2)-B)/A
143 |                     # num. error possible, ok!
144 | else: 145 | x3 = inf 146 | if isnan(x3) or isinf(x3): 147 | x3 = (x2+x4)/2 # if we had a numerical problem then bisect 148 | x3 = max(min(x3, x4-INT*(x4-x2)),x2+INT*(x4-x2)) 149 | # don't accept too close 150 | f3 = f(X+x3*s, *args) 151 | df3 = grad(X+x3*s, *args) 152 | if f3 < F0: 153 | X0 = X+x3*s; F0 = f3; dF0 = df3 # keep best values 154 | M = M - 1; i = i + (length<0) # count epochs?! 155 | d3 = dot(df3,s) # new slope 156 | 157 | if abs(d3) < -SIG*d0 and f3 < f0+x3*RHO*d0: # if line search succeeded 158 | X = X+x3*s; f0 = f3; fX.append(f0) # update variables 159 | if verbose: print '%s %6i; Value %4.6e\r' % (S, i, f0) 160 | s = (dot(df3,df3)-dot(df0,df3))/dot(df0,df0)*s - df3 161 | # Polack-Ribiere CG direction 162 | df0 = df3 # swap derivatives 163 | d3 = d0; d0 = dot(df0,s) 164 | if d0 > 0: # new slope must be negative 165 | s = -df0; d0 = -dot(s,s) # otherwise use steepest direction 166 | x3 = x3 * min(RATIO, d3/(d0-SMALL)) # slope ratio but max RATIO 167 | ls_failed = 0 # this line search did not fail 168 | else: 169 | X = X0; f0 = F0; df0 = dF0 # restore best point so far 170 | if ls_failed or (i>abs(length)):# line search failed twice in a row 171 | break # or we ran out of time, so we give up 172 | s = -df0; d0 = -dot(s,s) # try steepest 173 | x3 = 1/(1-d0) 174 | ls_failed = 1 # this line search failed 175 | if verbose: print "\n" 176 | return X, fX, i 177 | 178 | -------------------------------------------------------------------------------- /expr_cifar10_ZLIN_normhid_nolinb_dropout.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy 4 | import theano 5 | import theano.tensor as T 6 | from theano.tensor.shared_randomstreams import RandomStreams 7 | import cPickle 8 | 9 | from dataset import CIFAR10 10 | from layer import StackedLayer 11 | from classifier import LogisticRegression 12 | from model import ClassicalAutoencoder, ZerobiasAutoencoder, LinearAutoencoder 13 | from preprocess import SubtractMeanAndNormalizeH, PCA 14 | from train import GraddescentMinibatch, Dropout 15 | from params import save_params, load_params, set_params, get_params 16 | 17 | from minimize import minimize 18 | 19 | 20 | ####################### 21 | # SET SUPER PARAMETER # 22 | ####################### 23 | 24 | pca_retain = 800 25 | hid_layer_sizes = [4000, 1000, 4000, 1000, 4000, 1000, 4000] 26 | batchsize = 100 27 | zae_threshold=1. 28 | 29 | momentum = 0.9 30 | pretrain_lr_zae = 1e-3 31 | pretrain_lr_lin = 1e-4 32 | weightdecay = 0.001 33 | pretrain_epc = 600 34 | 35 | logreg_lr = 0.5 36 | logreg_epc = 1000 37 | 38 | finetune_lr = 5e-3 39 | finetune_epc = 1000 40 | 41 | print " " 42 | print "pca_retain =", pca_retain 43 | print "hid_layer_sizes =", hid_layer_sizes 44 | print "batchsize =", batchsize 45 | print "zae_threshold =", zae_threshold 46 | print "momentum =", momentum 47 | print "pretrain, zae: lr = %f, epc = %d" % (pretrain_lr_zae, pretrain_epc) 48 | print "pretrain, lin: lr = %f, epc = %d, wd = %.3f" % (pretrain_lr_lin, pretrain_epc, weightdecay) 49 | print "logistic regression: lr = %f, epc = %d" % (logreg_lr, logreg_epc) 50 | print "finetune: lr = %f, epc = %d" % (finetune_lr, finetune_epc) 51 | 52 | ############# 53 | # LOAD DATA # 54 | ############# 55 | 56 | cifar10_data = CIFAR10() 57 | train_x, train_y = cifar10_data.get_train_set() 58 | test_x, test_y = cifar10_data.get_test_set() 59 | 60 | print "\n... 
pre-processing" 61 | preprocess_model = SubtractMeanAndNormalizeH(train_x.shape[1]) 62 | map_fun = theano.function([preprocess_model.varin], preprocess_model.output()) 63 | 64 | pca_obj = PCA() 65 | pca_obj.fit(map_fun(train_x), retain=pca_retain, whiten=True) 66 | preprocess_model = preprocess_model + pca_obj.forward_layer 67 | preprocess_function = theano.function([preprocess_model.varin], preprocess_model.output()) 68 | train_x = preprocess_function(train_x) 69 | test_x = preprocess_function(test_x) 70 | 71 | feature_num = train_x.shape[0] * train_x.shape[1] 72 | 73 | train_x = theano.shared(value=train_x, name='train_x', borrow=True) 74 | train_y = theano.shared(value=train_y, name='train_y', borrow=True) 75 | test_x = theano.shared(value=test_x, name='test_x', borrow=True) 76 | test_y = theano.shared(value=test_y, name='test_y', borrow=True) 77 | print "Done." 78 | 79 | ######################### 80 | # BUILD PRE-TRAIN MODEL # 81 | ######################### 82 | 83 | print "... building pre-train model" 84 | npy_rng = numpy.random.RandomState(123) 85 | model = ZerobiasAutoencoder( 86 | train_x.get_value().shape[1], hid_layer_sizes[0], 87 | init_w = theano.shared( 88 | value=0.01 * train_x.get_value()[:hid_layer_sizes[0], :].T, 89 | name='w_zae_0', 90 | borrow=True 91 | ), 92 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 93 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[0] 94 | ) + LinearAutoencoder( 95 | hid_layer_sizes[0], hid_layer_sizes[1], 96 | init_w = theano.shared( 97 | value=numpy.tile( 98 | 0.01 * train_x.get_value(), 99 | (hid_layer_sizes[0] * hid_layer_sizes[1] / feature_num + 1, 1) 100 | ).flatten()[:(hid_layer_sizes[0] * hid_layer_sizes[1])].reshape( 101 | hid_layer_sizes[0], hid_layer_sizes[1] 102 | ), 103 | name='w_ae_1', 104 | borrow=True 105 | ), 106 | vistype = 'real', tie=True, npy_rng=npy_rng 107 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[1] 108 | ) + ZerobiasAutoencoder( 109 | hid_layer_sizes[1], hid_layer_sizes[2], 110 | init_w = theano.shared( 111 | value=numpy.tile( 112 | 0.01 * train_x.get_value(), 113 | (hid_layer_sizes[1] * hid_layer_sizes[2] / feature_num + 1, 1) 114 | ).flatten()[:(hid_layer_sizes[1] * hid_layer_sizes[2])].reshape( 115 | hid_layer_sizes[1], hid_layer_sizes[2] 116 | ), 117 | name='w_zae_2', 118 | borrow=True 119 | ), 120 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 121 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[2] 122 | ) + LinearAutoencoder( 123 | hid_layer_sizes[2], hid_layer_sizes[3], 124 | init_w = theano.shared( 125 | value=numpy.tile( 126 | 0.01 * train_x.get_value(), 127 | (hid_layer_sizes[2] * hid_layer_sizes[3] / feature_num + 1, 1) 128 | ).flatten()[:(hid_layer_sizes[2] * hid_layer_sizes[3])].reshape( 129 | hid_layer_sizes[2], hid_layer_sizes[3] 130 | ), 131 | name='w_ae_3', 132 | borrow=True 133 | ), 134 | vistype = 'real', tie=True, npy_rng=npy_rng 135 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[3] 136 | ) + ZerobiasAutoencoder( 137 | hid_layer_sizes[3], hid_layer_sizes[4], 138 | init_w = theano.shared( 139 | value=numpy.tile( 140 | 0.01 * train_x.get_value(), 141 | (hid_layer_sizes[3] * hid_layer_sizes[4] / feature_num + 1, 1) 142 | ).flatten()[:(hid_layer_sizes[3] * hid_layer_sizes[4])].reshape( 143 | hid_layer_sizes[3], hid_layer_sizes[4] 144 | ), 145 | name='w_zae_4', 146 | borrow=True 147 | ), 148 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 149 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[4] 150 | ) + LinearAutoencoder( 151 | 
hid_layer_sizes[4], hid_layer_sizes[5], 152 | init_w = theano.shared( 153 | value=numpy.tile( 154 | 0.01 * train_x.get_value(), 155 | (hid_layer_sizes[4] * hid_layer_sizes[5] / feature_num + 1, 1) 156 | ).flatten()[:(hid_layer_sizes[4] * hid_layer_sizes[5])].reshape( 157 | hid_layer_sizes[4], hid_layer_sizes[5] 158 | ), 159 | name='w_ae_5', 160 | borrow=True 161 | ), 162 | vistype = 'real', tie=True, npy_rng=npy_rng 163 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[5] 164 | ) + ZerobiasAutoencoder( 165 | hid_layer_sizes[5], hid_layer_sizes[6], 166 | init_w = theano.shared( 167 | value=numpy.tile( 168 | 0.01 * train_x.get_value(), 169 | (hid_layer_sizes[5] * hid_layer_sizes[6] / feature_num + 1, 1) 170 | ).flatten()[:(hid_layer_sizes[5] * hid_layer_sizes[6])].reshape( 171 | hid_layer_sizes[5], hid_layer_sizes[6] 172 | ), 173 | name='w_zae_6', 174 | borrow=True 175 | ), 176 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 177 | ) 178 | model.models_stack[2].params = [model.models_stack[2].w] 179 | model.models_stack[2].params_private = [model.models_stack[2].w, model.models_stack[2].bT] 180 | model.models_stack[6].params = [model.models_stack[6].w] 181 | model.models_stack[6].params_private = [model.models_stack[6].w, model.models_stack[6].bT] 182 | model.models_stack[10].params = [model.models_stack[10].w] 183 | model.models_stack[10].params_private = [model.models_stack[10].w, model.models_stack[10].bT] 184 | 185 | model.print_layer() 186 | print "Done." 187 | 188 | ############# 189 | # PRE-TRAIN # 190 | ############# 191 | 192 | theano_rng = RandomStreams(123) 193 | for i in range(0, len(model.models_stack), 2): 194 | if (i + 2) % 4 == 0: 195 | model.models_stack[i-2].threshold = 0. 196 | model.models_stack[i-1].varin = model.models_stack[i-2].output() 197 | 198 | print "\n\nPre-training layer %d:" % i 199 | layer_dropout = Dropout(model.models_stack[i], droprates=[0.2, 0.5], theano_rng=theano_rng).dropout_model 200 | layer_dropout.varin = model.models_stack[i].varin 201 | 202 | if (i + 2) % 4 == 0: 203 | model.models_stack[i-2].threshold = 0. 204 | pretrain_lr = pretrain_lr_lin 205 | layer_cost = layer_dropout.cost() + layer_dropout.weightdecay(weightdecay) 206 | else: 207 | pretrain_lr = pretrain_lr_zae 208 | layer_cost = layer_dropout.cost() 209 | 210 | trainer = GraddescentMinibatch( 211 | varin=model.varin, data=train_x, 212 | cost=layer_cost, 213 | params=layer_dropout.params_private, 214 | supervised=False, 215 | batchsize=batchsize, learningrate=pretrain_lr, momentum=momentum, 216 | rng=npy_rng 217 | ) 218 | 219 | prev_cost = numpy.inf 220 | patience = 0 221 | for epoch in xrange(pretrain_epc): 222 | cost = trainer.epoch() 223 | if prev_cost <= cost: 224 | patience += 1 225 | if patience > 10: 226 | patience = 0 227 | trainer.set_learningrate(0.9 * trainer.learningrate) 228 | if trainer.learningrate < 1e-10: 229 | break 230 | prev_cost = cost 231 | save_params(model, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_normhid_nolinb_cae1_dropout.npy') 232 | print "Done." 233 | 234 | 235 | ######################### 236 | # BUILD FINE-TUNE MODEL # 237 | ######################### 238 | 239 | print "\n\n... building fine-tune model -- contraction 1" 240 | for imodel in model.models_stack: 241 | imodel.threshold = 0. 
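# NOTE: the loop right above lowers every stacked layer's threshold to 0 before the
# logistic-regression output layer is attached; presumably this relaxes the zero-bias
# gating used during pre-training so that the whole stack acts as a plain feed-forward
# network in the supervised stage (cf. the Z-Lin paper cited in the README).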
242 | model_ft = model + LogisticRegression( 243 | hid_layer_sizes[-1], 10, npy_rng=npy_rng 244 | ) 245 | model_ft.print_layer() 246 | 247 | train_set_error_rate = theano.function( 248 | [], 249 | T.mean(T.neq(model_ft.models_stack[-1].predict(), train_y)), 250 | givens = {model_ft.varin : train_x}, 251 | ) 252 | test_set_error_rate = theano.function( 253 | [], 254 | T.mean(T.neq(model_ft.models_stack[-1].predict(), test_y)), 255 | givens = {model_ft.varin : test_x}, 256 | ) 257 | print "Done." 258 | 259 | print "... training with conjugate gradient: minimize.py" 260 | fun_cost = theano.function( 261 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 262 | model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay) 263 | ) 264 | def return_cost(test_params, input_x, truth_y): 265 | tmp = get_params(model_ft.models_stack[-1]) 266 | set_params(model_ft.models_stack[-1], test_params) 267 | result = fun_cost(input_x, truth_y) 268 | set_params(model_ft.models_stack[-1], tmp) 269 | return result 270 | 271 | fun_grad = theano.function( 272 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 273 | T.grad(model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay), 274 | model_ft.models_stack[-1].params) 275 | ) 276 | def return_grad(test_params, input_x, truth_y): 277 | tmp = get_params(model_ft.models_stack[-1]) 278 | set_params(model_ft.models_stack[-1], test_params) 279 | result = numpy.concatenate([numpy.array(i).flatten() for i in fun_grad(input_x, truth_y)]) 280 | set_params(model_ft.models_stack[-1], tmp) 281 | return result 282 | p, g, numlinesearches = minimize( 283 | get_params(model_ft.models_stack[-1]), return_cost, return_grad, 284 | (train_x.get_value(), train_y.get_value()), logreg_epc, verbose=False 285 | ) 286 | set_params(model_ft.models_stack[-1], p) 287 | save_params(model_ft, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_10_normhid_nolinb_cae1_dropout.npy') 288 | print "***error rate: train: %f, test: %f" % ( 289 | train_set_error_rate(), test_set_error_rate() 290 | ) 291 | 292 | ############# 293 | # FINE-TUNE # 294 | ############# 295 | 296 | """ 297 | print "\n\n... fine-tuning the whole network" 298 | truth = T.lmatrix('truth') 299 | trainer = GraddescentMinibatch( 300 | varin=model_ft.varin, data=train_x, 301 | truth=model_ft.models_stack[-1].vartruth, truth_data=train_y, 302 | supervised=True, 303 | cost=model_ft.models_stack[-1].cost(), 304 | params=model.params, 305 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 306 | rng=npy_rng 307 | ) 308 | 309 | prev_cost = numpy.inf 310 | for epoch in xrange(finetune_epc): 311 | cost = trainer.epoch() 312 | if epoch % 100 == 0 and epoch != 0: # prev_cost <= cost: 313 | trainer.set_learningrate(trainer.learningrate*0.8) 314 | if epoch % 50 == 0: 315 | print "***error rate: train: %f, test: %f" % ( 316 | train_set_error_rate(), test_set_error_rate() 317 | ) 318 | prev_cost = cost 319 | print "Done." 320 | """ 321 | 322 | 323 | 324 | print "\n\n... 
fine-tuning the whole network, with dropout" 325 | theano_rng = RandomStreams(123) 326 | dropout_ft = Dropout(model_ft, droprates=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1], theano_rng=theano_rng).dropout_model 327 | dropout_ft.print_layer() 328 | 329 | trainer = GraddescentMinibatch( 330 | varin=dropout_ft.varin, data=train_x, 331 | truth=dropout_ft.models_stack[-1].vartruth, truth_data=train_y, 332 | supervised=True, 333 | cost=dropout_ft.models_stack[-1].cost(), 334 | params=dropout_ft.params, 335 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 336 | rng=npy_rng 337 | ) 338 | 339 | prev_cost = numpy.inf 340 | patience = 0 341 | for epoch in xrange(1000): 342 | cost = trainer.epoch() 343 | if prev_cost <= cost: 344 | patience += 1 345 | if patience > 5: 346 | patience = 0 347 | trainer.set_learningrate(trainer.learningrate * 0.9) 348 | if trainer.learningrate < 1e-10: 349 | break 350 | print "***error rate: train: %f, test: %f" % (train_set_error_rate(), test_set_error_rate()) 351 | prev_cost = cost 352 | print "Done." 353 | 354 | print "***FINAL error rate, train: %f, test: %f" % ( 355 | train_set_error_rate(), test_set_error_rate() 356 | ) 357 | save_params(model_ft, 'ZLIN_4000_1000_4000_1000_4000_1000_4000_10_normhid_nolinb_cae1_dropout_dpft.npy') 358 | -------------------------------------------------------------------------------- /expr_cifar10_ZLIN_normhid_nolinb_dtagmt_dropout.py: -------------------------------------------------------------------------------- 1 | import os 2 | import sys 3 | import numpy 4 | from scipy import ndimage 5 | import theano 6 | import theano.tensor as T 7 | from theano.tensor.shared_randomstreams import RandomStreams 8 | import cPickle 9 | 10 | from dataset import CIFAR10 11 | from layer import StackedLayer 12 | from classifier import LogisticRegression 13 | from model import ClassicalAutoencoder, ZerobiasAutoencoder 14 | from preprocess import SubtractMeanAndNormalizeH, PCA 15 | from train import GraddescentMinibatch, Dropout 16 | from params import save_params, load_params, set_params, get_params 17 | 18 | from minimize import minimize 19 | 20 | ####################### 21 | # SET SUPER PARAMETER # 22 | ####################### 23 | 24 | pca_retain = 800 25 | hid_layer_sizes = [4000, 1000, 4000] 26 | batchsize = 100 27 | zae_threshold=1. 28 | 29 | momentum = 0.9 30 | pretrain_lr_zae = 1e-3 31 | pretrain_lr_lin = 1e-4 32 | weightdecay = 1.0 33 | pretrain_epc = 800 34 | 35 | logreg_lr = 0.5 36 | logreg_epc = 1000 37 | 38 | finetune_lr = 5e-3 39 | finetune_epc = 1000 40 | 41 | print " " 42 | print "pca_retain =", pca_retain 43 | print "hid_layer_sizes =", hid_layer_sizes 44 | print "batchsize =", batchsize 45 | print "zae_threshold =", zae_threshold 46 | print "momentum =", momentum 47 | print "pretrain, zae: lr = %f, epc = %d" % (pretrain_lr_zae, pretrain_epc) 48 | print "pretrain, lin: lr = %f, epc = %d, wd = %.3f" % (pretrain_lr_lin, pretrain_epc, weightdecay) 49 | print "logistic regression: lr = %f, epc = %d" % (logreg_lr, logreg_epc) 50 | print "finetune: lr = %f, epc = %d" % (finetune_lr, finetune_epc) 51 | 52 | ############# 53 | # LOAD DATA # 54 | ############# 55 | 56 | cifar10_data = CIFAR10() 57 | train_x, train_y = cifar10_data.get_train_set() 58 | test_x, test_y = cifar10_data.get_test_set() 59 | 60 | print "\n... 
pre-processing" 61 | preprocess_model = SubtractMeanAndNormalizeH(train_x.shape[1]) 62 | map_fun = theano.function([preprocess_model.varin], preprocess_model.output()) 63 | 64 | pca_obj = PCA() 65 | pca_obj.fit(map_fun(train_x), retain=pca_retain, whiten=True) 66 | preprocess_model = preprocess_model + pca_obj.forward_layer 67 | preprocess_function = theano.function([preprocess_model.varin], preprocess_model.output()) 68 | 69 | pcamapping = theano.function([pca_obj.forward_layer.varin], pca_obj.forward_layer.output()) 70 | pcaback = theano.function([pca_obj.backward_layer.varin], pca_obj.backward_layer.output()) 71 | 72 | train_x = preprocess_function(train_x) 73 | test_x = preprocess_function(test_x) 74 | 75 | feature_num = train_x.shape[0] * train_x.shape[1] 76 | 77 | train_x = theano.shared(value=train_x, name='train_x', borrow=True) 78 | train_y = theano.shared(value=train_y, name='train_y', borrow=True) 79 | test_x = theano.shared(value=test_x, name='test_x', borrow=True) 80 | test_y = theano.shared(value=test_y, name='test_y', borrow=True) 81 | print "Done." 82 | 83 | ######################### 84 | # BUILD PRE-TRAIN MODEL # 85 | ######################### 86 | 87 | print "... building pre-train model" 88 | npy_rng = numpy.random.RandomState(123) 89 | model = ZerobiasAutoencoder( 90 | train_x.get_value().shape[1], hid_layer_sizes[0], 91 | init_w = theano.shared( 92 | value=0.01 * train_x.get_value()[:hid_layer_sizes[0], :].T, 93 | name='w_zae_0', 94 | borrow=True 95 | ), 96 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 97 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[0] 98 | ) + ClassicalAutoencoder( 99 | hid_layer_sizes[0], hid_layer_sizes[1], 100 | init_w = theano.shared( 101 | value=numpy.tile( 102 | 0.01 * train_x.get_value(), 103 | (hid_layer_sizes[0] * hid_layer_sizes[1] / feature_num + 1, 1) 104 | ).flatten()[:(hid_layer_sizes[0] * hid_layer_sizes[1])].reshape( 105 | hid_layer_sizes[0], hid_layer_sizes[1] 106 | ), 107 | name='w_ae_1', 108 | borrow=True 109 | ), 110 | vistype = 'real', tie=True, npy_rng=npy_rng 111 | ) + SubtractMeanAndNormalizeH(hid_layer_sizes[1] 112 | ) + ZerobiasAutoencoder( 113 | hid_layer_sizes[1], hid_layer_sizes[2], 114 | init_w = theano.shared( 115 | value=numpy.tile( 116 | 0.01 * train_x.get_value(), 117 | (hid_layer_sizes[1] * hid_layer_sizes[2] / feature_num + 1, 1) 118 | ).flatten()[:(hid_layer_sizes[1] * hid_layer_sizes[2])].reshape( 119 | hid_layer_sizes[1], hid_layer_sizes[2] 120 | ), 121 | name='w_zae_2', 122 | borrow=True 123 | ), 124 | threshold=zae_threshold, vistype='real', tie=True, npy_rng=npy_rng 125 | ) 126 | model.models_stack[2].params = [model.models_stack[2].w] 127 | model.models_stack[2].params_private = [model.models_stack[2].w, model.models_stack[2].bT] 128 | 129 | model.print_layer() 130 | print "Done." 131 | 132 | ############# 133 | # PRE-TRAIN # 134 | ############# 135 | 136 | theano_rng = RandomStreams(123) 137 | for i in range(0, len(model.models_stack), 2): 138 | if (i + 2) % 4 == 0: 139 | model.models_stack[i-2].threshold = 0. 140 | model.models_stack[i-1].varin = model.models_stack[i-2].output() 141 | 142 | print "\n\nPre-training layer %d:" % i 143 | layer_dropout = Dropout(model.models_stack[i], droprates=[0.2, 0.5], theano_rng=theano_rng).dropout_model 144 | layer_dropout.varin = model.models_stack[i].varin 145 | 146 | if (i + 2) % 4 == 0: 147 | model.models_stack[i-2].threshold = 0. 
148 | pretrain_lr = pretrain_lr_lin 149 | layer_cost = layer_dropout.cost() + layer_dropout.contraction(weightdecay) 150 | else: 151 | pretrain_lr = pretrain_lr_zae 152 | layer_cost = layer_dropout.cost() 153 | 154 | trainer = GraddescentMinibatch( 155 | varin=model.varin, data=train_x, 156 | cost=layer_cost, 157 | params=layer_dropout.params_private, 158 | supervised=False, 159 | batchsize=batchsize, learningrate=pretrain_lr, momentum=momentum, 160 | rng=npy_rng 161 | ) 162 | 163 | prev_cost = numpy.inf 164 | patience = 0 165 | origin_x = train_x.get_value() 166 | reshaped_x = pcaback(origin_x).reshape((50000, 32, 32, 3), order='F') 167 | for epoch in xrange(pretrain_epc): 168 | cost = 0. 169 | # original data 170 | cost += trainer.epoch() 171 | 172 | # data augmentation: horizontal flip 173 | flipped_x = reshaped_x[:, ::-1, :, :].reshape((50000, 3072), order='F') 174 | train_x.set_value(pcamapping(flipped_x)) 175 | cost += trainer.epoch() 176 | 177 | # random rotation 178 | movies = numpy.zeros(reshaped_x.shape, dtype=theano.config.floatX) 179 | for i, im in enumerate(reshaped_x): 180 | angle_delta = (npy_rng.vonmises(0.0, 1.0)/(4 * numpy.pi)) * 180 181 | movies[i] = ndimage.rotate(im, angle_delta, reshape=False, mode='wrap') 182 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 183 | cost += trainer.epoch() 184 | 185 | # random shift 186 | for i, im in enumerate(reshaped_x): 187 | shifts = (npy_rng.randint(-4, 4), npy_rng.randint(-4, 4), 0) 188 | movies[i] = ndimage.interpolation.shift(im, shifts, mode='reflect') 189 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 190 | cost += trainer.epoch() 191 | 192 | cost /= 4. 193 | train_x.set_value(origin_x) 194 | 195 | if prev_cost <= cost: 196 | patience += 1 197 | if patience > 10: 198 | patience = 0 199 | trainer.set_learningrate(0.9 * trainer.learningrate) 200 | if trainer.learningrate < 1e-10: 201 | break 202 | prev_cost = cost 203 | save_params(model, 'ZLIN_4000_1000_4000_normhid_nolinb_cae1_dtagmt2_dropout.npy') 204 | print "Done." 205 | 206 | 207 | ######################### 208 | # BUILD FINE-TUNE MODEL # 209 | ######################### 210 | 211 | print "\n\n... building fine-tune model -- contraction 1" 212 | for imodel in model.models_stack: 213 | imodel.threshold = 0. 214 | model_ft = model + LogisticRegression( 215 | hid_layer_sizes[-1], 10, npy_rng=npy_rng 216 | ) 217 | model_ft.print_layer() 218 | 219 | train_set_error_rate = theano.function( 220 | [], 221 | T.mean(T.neq(model_ft.models_stack[-1].predict(), train_y)), 222 | givens = {model_ft.varin : train_x}, 223 | ) 224 | test_set_error_rate = theano.function( 225 | [], 226 | T.mean(T.neq(model_ft.models_stack[-1].predict(), test_y)), 227 | givens = {model_ft.varin : test_x}, 228 | ) 229 | print "Done." 230 | 231 | print "... 
training with conjugate gradient: minimize.py" 232 | fun_cost = theano.function( 233 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 234 | model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay) 235 | ) 236 | def return_cost(test_params, input_x, truth_y): 237 | tmp = get_params(model_ft.models_stack[-1]) 238 | set_params(model_ft.models_stack[-1], test_params) 239 | result = fun_cost(input_x, truth_y) 240 | set_params(model_ft.models_stack[-1], tmp) 241 | return result 242 | 243 | fun_grad = theano.function( 244 | [model_ft.varin, model_ft.models_stack[-1].vartruth], 245 | T.grad(model_ft.models_stack[-1].cost() + model_ft.models_stack[-1].weightdecay(weightdecay), 246 | model_ft.models_stack[-1].params) 247 | ) 248 | def return_grad(test_params, input_x, truth_y): 249 | tmp = get_params(model_ft.models_stack[-1]) 250 | set_params(model_ft.models_stack[-1], test_params) 251 | result = numpy.concatenate([numpy.array(i).flatten() for i in fun_grad(input_x, truth_y)]) 252 | set_params(model_ft.models_stack[-1], tmp) 253 | return result 254 | p, g, numlinesearches = minimize( 255 | get_params(model_ft.models_stack[-1]), return_cost, return_grad, 256 | (train_x.get_value(), train_y.get_value()), logreg_epc, verbose=False 257 | ) 258 | set_params(model_ft.models_stack[-1], p) 259 | save_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout.npy') 260 | 261 | load_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout.npy') 262 | print "***error rate: train: %f, test: %f" % ( 263 | train_set_error_rate(), test_set_error_rate() 264 | ) 265 | 266 | ############# 267 | # FINE-TUNE # 268 | ############# 269 | 270 | """ 271 | print "\n\n... fine-tuning the whole network" 272 | truth = T.lmatrix('truth') 273 | trainer = GraddescentMinibatch( 274 | varin=model_ft.varin, data=train_x, 275 | truth=model_ft.models_stack[-1].vartruth, truth_data=train_y, 276 | supervised=True, 277 | cost=model_ft.models_stack[-1].cost(), 278 | params=model.params, 279 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 280 | rng=npy_rng 281 | ) 282 | 283 | prev_cost = numpy.inf 284 | for epoch in xrange(finetune_epc): 285 | cost = trainer.epoch() 286 | if epoch % 100 == 0 and epoch != 0: # prev_cost <= cost: 287 | trainer.set_learningrate(trainer.learningrate*0.8) 288 | if epoch % 50 == 0: 289 | print "***error rate: train: %f, test: %f" % ( 290 | train_set_error_rate(), test_set_error_rate() 291 | ) 292 | prev_cost = cost 293 | print "Done." 294 | """ 295 | 296 | 297 | 298 | print "\n\n... fine-tuning the whole network, with dropout" 299 | theano_rng = RandomStreams(123) 300 | dropout_ft = Dropout(model_ft, droprates=[0.1, 0.1, 0.1, 0.1, 0.1, 0.1], theano_rng=theano_rng).dropout_model 301 | dropout_ft.print_layer() 302 | 303 | trainer = GraddescentMinibatch( 304 | varin=dropout_ft.varin, data=train_x, 305 | truth=dropout_ft.models_stack[-1].vartruth, truth_data=train_y, 306 | supervised=True, 307 | cost=dropout_ft.models_stack[-1].cost(), 308 | params=dropout_ft.params, 309 | batchsize=batchsize, learningrate=finetune_lr, momentum=momentum, 310 | rng=npy_rng 311 | ) 312 | 313 | origin_x = train_x.get_value() 314 | reshaped_x = pcaback(origin_x).reshape((50000, 32, 32, 3), order='F') 315 | prev_cost = numpy.inf 316 | patience = 0 317 | for epoch in xrange(1000): 318 | cost = 0. 
319 | # original data 320 | cost += trainer.epoch() 321 | 322 | # data augmentation: horizontal flip 323 | flipped_x = reshaped_x[:, ::-1, :, :].reshape((50000, 3072), order='F') 324 | train_x.set_value(pcamapping(flipped_x)) 325 | cost += trainer.epoch() 326 | 327 | # random rotation 328 | movies = numpy.zeros(reshaped_x.shape, dtype=theano.config.floatX) 329 | for i, im in enumerate(reshaped_x): 330 | angle_delta = (npy_rng.vonmises(0.0, 1.0)/(4 * numpy.pi)) * 180 331 | movies[i] = ndimage.rotate(im, angle_delta, reshape=False, mode='wrap') 332 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 333 | cost += trainer.epoch() 334 | 335 | # random shift 336 | for i, im in enumerate(reshaped_x): 337 | shifts = (npy_rng.randint(-4, 4), npy_rng.randint(-4, 4), 0) 338 | movies[i] = ndimage.interpolation.shift(im, shifts, mode='reflect') 339 | train_x.set_value(pcamapping(movies.reshape((50000, 3072), order='F'))) 340 | cost += trainer.epoch() 341 | 342 | cost /= 4. 343 | train_x.set_value(origin_x) 344 | 345 | if prev_cost <= cost: 346 | patience += 1 347 | if patience > 5: 348 | patience = 0 349 | trainer.set_learningrate(0.9 * trainer.learningrate) 350 | if trainer.learningrate < 1e-10: 351 | break 352 | print "***error rate: train: %f, test: %f" % (train_set_error_rate(), test_set_error_rate()) 353 | prev_cost = cost 354 | print "Done." 355 | 356 | print "***FINAL error rate, train: %f, test: %f" % ( 357 | train_set_error_rate(), test_set_error_rate() 358 | ) 359 | save_params(model_ft, 'ZLIN_4000_1000_4000_10_normhid_nolinb_cae1_dtagmt2_dropout_dpft.npy') 360 | --------------------------------------------------------------------------------
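A side note on the minimize.py interface used by both scripts: they flatten the top classifier's parameters with get_params, then hand minimize a cost callback and a gradient callback (return_cost / return_grad) together with the training data as extra arguments. Below is a minimal sketch of that calling convention on a toy quadratic, independent of Theano and CIFAR-10; everything except minimize itself (toy_cost, toy_grad, A, b, x0) is a made-up name for illustration, and it assumes the same Python 2 environment the rest of the repo targets.

    import numpy
    from minimize import minimize

    A = numpy.diag([1., 10.])          # toy quadratic cost 0.5 * x'Ax - b'x
    b = numpy.array([1., -2.])

    def toy_cost(x, A, b):             # plays the role of return_cost(params, ...)
        return 0.5 * numpy.dot(x, numpy.dot(A, x)) - numpy.dot(b, x)

    def toy_grad(x, A, b):             # plays the role of return_grad(params, ...)
        return numpy.dot(A, x) - b

    x0 = numpy.zeros(2)
    # maxnumlinesearch caps the number of line searches; the scripts pass logreg_epc here
    x_opt, costs, n_linesearches = minimize(x0, toy_cost, toy_grad, (A, b),
                                            maxnumlinesearch=50, verbose=False)
    print x_opt                        # should approach the solution of A x = b, i.e. [1., -0.2]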