├── README.md
├── dl
│   ├── genmlp.py
│   ├── logistic_sgd.py
│   └── pickler.py
├── scripts
│   └── process.sh
└── utils
    └── src
        ├── Annotations.cpp
        ├── Annotations.h
        ├── DataStream.py
        ├── GazeTracker.cpp
        ├── GazeTracker.h
        ├── Globals.cpp
        ├── Globals.h
        ├── Makefile
        ├── Preprocess.cpp
        ├── Preprocess.h
        ├── annotator.cpp
        ├── classify.cpp
        └── xmlToIDX.cpp

/README.md:
--------------------------------------------------------------------------------
1 | DeepLearning
2 | ============
3 |
4 | Code to build MLP models for outdoor head orientation tracking
5 |
6 | The following is the set of directories with a description of what is
7 | in each.
8 |
9 | ### dl
10 | The directory with the Python files to train and test an MLP.
11 |
12 | The file *genmlp.py* is based on the mlp.py that is part of the Theano
13 | documentation. It is more general-purpose in that one can configure
14 | a network of arbitrary depth and number of nodes per layer. It also
15 | implements a sliding window for training that enables one to train
16 | data sets of arbitrary size on limited GPU memory.
17 |
18 | The file *logistic_sgd.py* comes with a nice reporting function that
19 | builds a matrix of classification results on the test set, showing
20 | the number of correctly classified frames and the distribution
21 | of the incorrectly classified frames across all classes.
22 |
23 | The file *pickler.py* has a number of helper methods that can be used to
24 | build files with the data that conform to the Theano input format.
25 | The file takes as input files in the MNIST IDX format. It can be used
26 | to chunk data sets into multiple sets of files, one for training, one
27 | for validation, and the last for test.
28 |
29 | ### utils
30 | The directory with C++ code that can be used to generate datasets in
31 | the MNIST IDX format from labeled data. The labels correspond to a
32 | partition of the space in front of a driver in a car, with the
33 | following values:
34 |
35 | 1. Driver window
36 | 2. Left of center
37 | 3. Straight ahead
38 | 4. Right of center
39 | 5. Passenger window
40 |
41 | Given a video of the driver, an annotation file for that video has the
42 | following format:
43 |
44 |
45 |
46 |
47 | 1
48 | 0,0
49 | 9
50 | 1
51 | 4
52 |
53 |
54 | 2
55 | 0,0
56 | 9
57 | 1
58 | 4
59 |
60 | ...
61 | ...
62 |
63 |
64 | where the directory is expected to contain frames from the video with
65 | filenames of the form *'frame_%d.png'%frameNumber*. Each video frame is a
66 | 640x480 image file, with the zone indicating the class, the status indicating
67 | the car status, and the intersection indicating the type of intersection.
68 | For the purposes of building the data sets, we only use the zone information
69 | at this point. The center is expected to be the rough center of the location
70 | of the face in each frame.
71 |
72 | The pre-processing that is done on the images is as follows:
73 |
74 | 1. A Region of Interest (ROI) of configurable size (Globals.cpp) is picked
75 | around the image center.
76 | 2. A histogram equalization followed by edge detection is performed.
77 | 3. DC suppression using a sigmoid is then applied.
78 | 4. A Gaussian window function is applied around the center.
79 | 5. The image is scaled and a vector generated from the image matrix in
80 | row-major order.
81 |
82 | ### Build
83 |
84 | Run *make mode=opt* in *utils/src* to build optimized executables. The dependency is *OpenCV*. This builds everything and places the executables in an *install* directory under DeepLearning.
85 |
86 | #### Data generation
87 | To generate data sets, use the following commands:
88 |
89 | xmlToIDX
90 |     -o outputFileNameSuffix
91 |     -r trainFraction
92 |     -v validFraction
93 |     -status carStatus
94 |     -inter intersection
95 |     [-d directory]+
96 |     [-b binaryThreshold]
97 |     [-usebins]
98 |     [-h for usage]
99 |
100 | If the outputFileNameSuffix is ubyte, then run the following command to generate pickled numpy arrays from the IDX data sets:
101 |
102 | python pickler.py data-train-ubyte label-train-ubyte data-valid-ubyte label-valid-ubyte data-test-ubyte label-test-ubyte gaze_data.pkl
103 |
104 | which will generate sets of training, validation, and test files with the prefix *gaze_data.pkl*. The number of files generated in each set depends on the chunking size used in pickler.py. The data is broken up into chunks and one file is generated per chunk; for example, the set of test files will be *gaze_data.pkl_test_%d.gz*, with the integer argument in range(numberOfChunks). The first command builds the IDX format data sets. The second converts them into a numpy array of tuples, with each tuple being an array of data points and an array of labels. We have one tuple for the training data, one for validation, and one for test.
105 |
106 | The options to xmlToIDX are as follows:
107 |
108 | * -o is the suffix to use for all generated files
109 | * -r is the training fraction in the interval [0, 1)
110 | * -v is the validation fraction in the interval (0, 1)
111 | * -usebins is used to bin the data based on their labels. We generate as many
112 | data points per label as min_{l \in labels} |D_l|, where D_l is the set of data
113 | points with label l; in other words, we pick as many data points as the
114 | cardinality of the smallest set of data points across all labels. This is to
115 | prevent our network from being biased toward class label 3, which is straight
116 | ahead. A large fraction of the frames have the driver facing straight ahead,
117 | which causes an enormous bias during training without binning.
118 | * -d a directory of images for training. An annotation file called
119 | annotations.xml is expected to be present in each such directory.
120 | * -b is used to specify a binary threshold for generating image pixel data as binary values: all pixel values above the threshold are treated as 1 and the rest as 0.
121 | * -status is used to pick only those frames whose car status annotation matches what follows this flag
122 | * -intersection is used to pick only those frames whose intersection annotation matches what follows this flag
123 |
124 | The second command builds the tuples of numpy arrays as required by the
125 | Theano-based trainer. It takes as input the training, validation,
126 | and test data and label files, together with the prefix to use for the
127 | generated file names.
128 |
129 | ### Training and classification
130 |
131 | Training and classification can be done using genmlp.py. The following
132 | command will train a network and generate a report with the validation
133 | error rate, the test error rate, and the distribution of the number of
134 | frames across all classes together with the expected number of frames
135 | per class.
136 |
137 | THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python genmlp.py
138 |     (-d datasetDir | -f datasetFileName)
139 |     [-p prefix]
140 |     [-b batchSize]
141 |     [-nt nTrainFiles]
142 |     [-nv nValidFiles]
143 |     [-ns nTestFiles]
144 |     [-i inputLength]
145 |     [-o numClasses]
146 |     [-gen modelFileName]
147 |     [-l [nLayer1Size, nLayer2Size, ...]]
148 |     [-classify]
149 |     [-useparams paramsFileName]
150 |     [-h help]
151 |
152 | The options are:
153 |
154 | * -d the directory that contains the data sets
155 | * -f a single file that contains the complete pickled data. This is useful when the data sets are small enough to be pickled into one file
156 | * -p the file name prefix for the files that hold the data sets
157 | * -nt the number of training files
158 | * -nv the number of validation files
159 | * -ns the number of test files
160 | * -l the configuration of the hidden layers in the network: there are as many
161 | hidden layers as there are comma-separated elements, and each element gives
162 | the size of the corresponding hidden layer
163 | * -o the number of labels
164 | * -i the input dimension of the data
165 | * -gen to generate the trained model for use outside Theano. This is written as a text
166 | file. We also generate a pickled file called params.gz in the training data
167 | set directory that contains the numpy weights and biases of all hidden layers
168 | and the final logistic layer.
169 |
170 | For questions please send mail to: vishwa.raman@west.cmu.edu
171 |
172 | Thanks for looking.
173 |
174 |
175 |
--------------------------------------------------------------------------------
/dl/genmlp.py:
--------------------------------------------------------------------------------
1 | """
2 | genmlp.py
3 | This file contains an implementation for the training of a configurable
4 | multi-layer perceptron. It is a generalized implementation of mlp.py
5 | from the Theano DeepLearningTutorials. The specific enhancements include:
6 |
7 | 1. A command-line configurable network
8 | 2. The ability to generate the model as a pickled file and as a text file
9 | 3. A sliding window implementation that handles large data sets on any
10 | GPU by selectively loading data into available GPU memory
11 | 4. A reporting infrastructure that shows the expected vs. classified
12 | classes for all test data points, tracking not only how many data
13 | points were misclassified but also their distribution over classes
14 |
15 | The training proceeds in the following manner:
16 |
17 | 1. The first hidden layer is paired with a logistic layer and the
18 | parameters are trained
19 | 2. For all subsequent hidden layers, the following training steps
20 | are followed:
21 |    a. Drop the parameters of the logistic layer, but retain the
22 |    parameter values of all hidden layers trained so far
23 |    b. Add the next hidden layer and the logistic layer
24 |    c. Train the parameters of the newly added hidden layer and the
25 |    logistic layer
26 | 3. A final pass that includes all parameters is optional and is
27 | done in the main function
28 |
29 | The model is the set of all weights and biases from the first to the
30 | last hidden layer, together with the logistic regressor.
31 |
32 | References:
33 |
34 | - textbooks: "Pattern Recognition and Machine Learning" -
35 | Christopher M.
Bishop, section 5 36 | 37 | """ 38 | __docformat__ = 'restructedtext en' 39 | 40 | 41 | import cPickle 42 | import gzip 43 | import os 44 | import sys 45 | import time 46 | 47 | import numpy 48 | 49 | import theano 50 | import theano.tensor as T 51 | from pickler import getLists, getPickledLists, getPickledList 52 | 53 | from logistic_sgd import LogisticRegression, load_data 54 | 55 | class HiddenLayer(object): 56 | def __init__(self, rng, input, n_in, n_out, W=None, b=None, 57 | activation=T.tanh): 58 | """ 59 | Typical hidden layer of a MLP: units are fully-connected and have 60 | sigmoidal activation function. Weight matrix W is of shape (n_in,n_out) 61 | and the bias vector b is of shape (n_out,). 62 | 63 | NOTE : The nonlinearity used here is tanh 64 | 65 | Hidden unit activation is given by: tanh(dot(input,W) + b) 66 | 67 | :type rng: numpy.random.RandomState 68 | :param rng: a random number generator used to initialize weights 69 | 70 | :type input: theano.tensor.dmatrix 71 | :param input: a symbolic tensor of shape (n_examples, n_in) 72 | 73 | :type n_in: int 74 | :param n_in: dimensionality of input 75 | 76 | :type n_out: int 77 | :param n_out: number of hidden units 78 | 79 | :type activation: theano.Op or function 80 | :param activation: Non linearity to be applied in the hidden 81 | layer 82 | """ 83 | self.input = input 84 | 85 | # `W` is initialized with `W_values` which is uniformely sampled 86 | # from sqrt(-6./(n_in+n_hidden)) and sqrt(6./(n_in+n_hidden)) 87 | # for tanh activation function 88 | # the output of uniform if converted using asarray to dtype 89 | # theano.config.floatX so that the code is runable on GPU 90 | # Note : optimal initialization of weights is dependent on the 91 | # activation function used (among other things). 92 | # For example, results presented in [Xavier10] suggest that you 93 | # should use 4 times larger initial weights for sigmoid 94 | # compared to tanh 95 | # We have no info for other function, so we use the same as 96 | # tanh. 97 | if W is None: 98 | W_values = numpy.asarray(rng.uniform( 99 | low=-numpy.sqrt(6. / (n_in + n_out)), 100 | high=numpy.sqrt(6. / (n_in + n_out)), 101 | size=(n_in, n_out)), dtype=theano.config.floatX) 102 | if activation == theano.tensor.nnet.sigmoid: 103 | W_values *= 4 104 | 105 | W = theano.shared(value=W_values, name='W%d'%n_out, borrow=True) 106 | 107 | if b is None: 108 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX) 109 | b = theano.shared(value=b_values, name='b%d'%n_out, borrow=True) 110 | 111 | self.W = W 112 | self.b = b 113 | 114 | self.lin_output = T.dot(input, self.W) + self.b 115 | self.output = (self.lin_output if activation is None 116 | else activation(self.lin_output)) 117 | # parameters of the model 118 | self.params = [self.W, self.b] 119 | 120 | class MLP(object): 121 | """Multi-Layer Perceptron Class 122 | 123 | A multilayer perceptron is a feedforward artificial neural network model 124 | that has one layer or more of hidden units and nonlinear activations. 125 | Intermediate layers usually have as activation function tanh or the 126 | sigmoid function (defined here by a ``SigmoidalLayer`` class) while the 127 | top layer is a softmax layer (defined here by a ``LogisticRegression`` 128 | class). 
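    An illustrative construction (a sketch only: the symbolic input, the layer
    sizes, and the None parameter placeholders below are hypothetical, and the
    module-level imports of numpy and theano.tensor as T are assumed):

        rng = numpy.random.RandomState(1234)
        x = T.matrix('x')                  # a minibatch of rasterized images
        classifier = MLP(rng=rng, input=x,
                         n_in=100 * 100,   # input dimension, e.g. 100x100 images
                         n_out=5,          # number of classes (gaze zones)
                         layers=[500, 200],                # two hidden layers
                         weights=[None, None, None],       # len(layers) + 1 entries:
                         biases=[None, None, None])        # hidden layers + logistic layer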
129 | """ 130 | 131 | def __init__(self, rng, input, n_in, n_out, layers, weights, biases, 132 | includeAllParams = False, activation = T.tanh): 133 | """Initialize the parameters for the multilayer perceptron 134 | 135 | :type rng: numpy.random.RandomState 136 | :param rng: a random number generator used to initialize weights 137 | 138 | :type input: theano.tensor.TensorType 139 | :param input: symbolic variable that describes the input of the 140 | architecture (one minibatch) 141 | 142 | :type layers: numpy.array 143 | :param layers: the number of layers and the number of hidden units 144 | per layer 145 | 146 | :type weights: numpy.array of layer weights 147 | :param weights: the weights for each layer and the logistic layer 148 | (any or all elements may be None) 149 | 150 | :type biases: numpy.array of layer biases 151 | :param biases: the biases for each layer and the logistic layer 152 | (any or all elements may be None) 153 | 154 | :type includeAllParams: boolean 155 | :param includeAllParams: flag used to indicate that we want to 156 | include all parameters during training as opposed to just the 157 | top hidden layer and the logistic layer 158 | """ 159 | 160 | print 'building MLP' 161 | 162 | # initialize hidden layers 163 | self.hiddenLayers = [] 164 | 165 | # build hidden layers 166 | for i in range(len(layers)): 167 | if (i == 0): 168 | print 'Layer %d. n_in = %d, n_out = %d'%(i + 1, n_in, layers[i]) 169 | self.hiddenLayers.append(HiddenLayer(rng=rng, input=input, 170 | n_in=n_in, n_out=layers[i], 171 | activation=activation, 172 | W=weights[i], b=biases[i])) 173 | else: 174 | print 'Layer %d. n_in = %d, n_out = %d'%(i + 1, layers[i - 1], layers[i]) 175 | self.hiddenLayers.append(HiddenLayer(rng=rng, input=self.hiddenLayers[i - 1].output, 176 | n_in=layers[i - 1], n_out=layers[i], 177 | activation=activation, 178 | W=weights[i], b=biases[i])) 179 | 180 | lastHiddenLayer = self.hiddenLayers[-1] 181 | 182 | # The logistic regression layer gets as input the hidden units 183 | # of the last hidden layer 184 | self.logRegressionLayer = LogisticRegression( 185 | input=lastHiddenLayer.output, 186 | n_in=layers[-1], 187 | n_out=n_out, 188 | W=weights[-1], b=biases[-1]) 189 | 190 | # The MLP implemented here is called Progressive MLP as it learns parameters 191 | # of hidden layers one at a time, re-using the results for all prior 192 | # layers from previous learning iterations 193 | 194 | # L1 norm ; one regularization option is to enforce L1 norm to 195 | # be small 196 | if (includeAllParams == True): 197 | self.L1 = sum([abs(self.hiddenLayers[i].W.sum()) for i in xrange(len(layers))]) + \ 198 | abs(self.logRegressionLayer.W).sum() 199 | else: 200 | self.L1 = abs(lastHiddenLayer.W).sum() + abs(self.logRegressionLayer.W).sum() 201 | 202 | # square of L2 norm ; one regularization option is to enforce 203 | # square of L2 norm to be small 204 | if (includeAllParams == True): 205 | self.L2_sqr = sum([(self.hiddenLayers[i].W ** 2).sum() for i in xrange(len(layers))]) + \ 206 | (self.logRegressionLayer.W ** 2).sum() 207 | else: 208 | self.L2_sqr = (lastHiddenLayer.W ** 2).sum() + (self.logRegressionLayer.W ** 2).sum() 209 | 210 | # negative log likelihood of the MLP is given by the negative 211 | # log likelihood of the output of the model, computed in the 212 | # logistic regression layer 213 | self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood 214 | 215 | # same holds for the function computing the number of errors 216 | self.errors = 
self.logRegressionLayer.errors 217 | 218 | # the error report function gives more detailed info on how well we 219 | # did on the training data. It generates for each class the number of 220 | # data points that were correctly classified, the number of data points 221 | # that were expected to be classified as belonging to the class, and 222 | # the numbers of data points that were incorrectly classifier with their 223 | # distribution over all the unexpected classes 224 | self.errorReport = self.logRegressionLayer.errorReport 225 | 226 | # the parameters of the model are the parameters of the final two layers it is 227 | # made out of, that is the last hidden layer that was added followed by the 228 | # logistic layer 229 | self.params = self.logRegressionLayer.params 230 | if (includeAllParams == True): 231 | for i in range(len(layers)): 232 | self.params = self.params + self.hiddenLayers[i].params 233 | else: 234 | self.params = self.params + lastHiddenLayer.params 235 | 236 | class ProgressiveMLP(object): 237 | """Training class for MLP 238 | 239 | The class implements methods to train an MLP, test the models generated 240 | against the held-out (validation) set and apply the model on the test 241 | data when there is an improvement in error rates over the validation set. 242 | The class implements a sliding window over the training data to ensure 243 | that at any given point in time, we only have as much data in GPU memory 244 | as the determined by the capacity of the GPU memory. 245 | 246 | The class also provides a classify method that can be used to classify 247 | a given data set using parameters learned from the training cycle. This 248 | can be used to target the model to different test sets. 249 | """ 250 | 251 | def __init__(self, n_in, n_out, layers, weights=None, biases=None, nBatchSize = 20, 252 | nWindowSize = 64000, includeAllParams = False): 253 | """Initialize the parameters for the multilayer perceptron trainer 254 | 255 | :type n_in: integer 256 | :param n_in: the number of inputs 257 | 258 | :type n_out: integer 259 | :param n_out: the number of outputs (classes) 260 | 261 | :type layers: numpy.array 262 | :param layers: the number of layers and the number of hidden units 263 | per layer 264 | 265 | :type weights: numpy.array of layer weights 266 | :param weights: the weights for each layer and the logistic layer 267 | (any or all elements may be None) 268 | 269 | :type biases: numpy.array of layer biases 270 | :param biases: the biases for each layer and the logistic layer 271 | (any or all elements may be None) 272 | 273 | :type nBatchSize: integer 274 | :param nBatchSize: the size of each minibatch in data points 275 | 276 | :type nWindowSize: integer 277 | :param nWindowSize: the size of the sliding window over the training 278 | data. 
This can be picked based on the size of the GPU memory 279 | """ 280 | 281 | self.classifier = None 282 | self.nSharedLen = nWindowSize 283 | self.batchSize = nBatchSize 284 | self.datasets = None 285 | self.n_in = n_in 286 | 287 | # allocate symbolic variables for the data 288 | self.index = T.lscalar('index') # index to minibatch 289 | self.x = T.matrix('x') # the data is presented as rasterized images 290 | self.y = T.ivector('y') # the labels are presented as 1D vector of 291 | # [int] labels 292 | 293 | rng = numpy.random.RandomState(1234) 294 | 295 | # construct the MLP class 296 | self.classifier = MLP(rng=rng, 297 | input = self.x, 298 | n_in = n_in, 299 | n_out = n_out, 300 | layers = layers, 301 | weights = weights, 302 | biases = biases, 303 | includeAllParams = includeAllParams) 304 | 305 | def initializeSharedData(self, data_xy, length, borrow=True): 306 | """ 307 | setup shared data for use on the GPU 308 | 309 | We allocate a numpy array that is as large as length and make it shared. 310 | All subsequent computations on the GPU use the sharedData_x and 311 | sharedData_y arrays. The length is configurable and should be so chosen 312 | that we can load as many elements in the available GPU memory 313 | """ 314 | data_x, data_y = data_xy 315 | sharedData_x = theano.shared(numpy.asarray(data_x[:length], 316 | dtype=theano.config.floatX), 317 | borrow=borrow) 318 | sharedData_y = theano.shared(numpy.asarray(data_y[:length], 319 | dtype=theano.config.floatX), 320 | borrow=borrow) 321 | return sharedData_x, sharedData_y 322 | 323 | def getNumberOfSplitBatches(self): 324 | """ 325 | Given a size of the arrays that are stored in the GPU memory, this 326 | method returns the number of batches that can be accomodated in 327 | that size 328 | """ 329 | return self.nSharedLen / self.batchSize 330 | 331 | def getWindowData(self, data_xy, miniBatchIndex): 332 | """ 333 | method used to return a chunk of data from the data_xy that is as big 334 | as the size of our sliding window, based on a miniBatchIndex. The 335 | miniBatchIndex will range over all the minibatches in data_xy. 336 | Given a miniBatchIndex, we determine which data chunk contains 337 | that miniBatchIndex and return that chunk from data_xy 338 | """ 339 | data_x, data_y = data_xy 340 | index = miniBatchIndex / self.getNumberOfSplitBatches() 341 | # print ' Returning data_xy[%d].'%(index) 342 | return data_x[index], data_y[index] 343 | 344 | def splitList(self, data_xy): 345 | """ 346 | method used to split data_xy into chunks, where each chunk is as big 347 | as the window size (self.nSharedLen) 348 | """ 349 | data_x, data_y = data_xy 350 | 351 | print 'in split. %d %d %d'%(len(data_x), len(data_y), len(data_x[0])) 352 | split_x = numpy.split(data_x, range(0, len(data_x), self.nSharedLen)) 353 | split_y = numpy.split(data_y, range(0, len(data_y), self.nSharedLen)) 354 | 355 | return split_x[1:], split_y[1:] 356 | 357 | def getNumberOfBatches(self, data_xy): 358 | """ 359 | method used to get the total number of batches in data_xy. The 360 | method expects an array of chunks, walks each chunk and accumulates 361 | the number of batches in that chunk 362 | """ 363 | data_x, data_y = data_xy 364 | nBatches = 0 365 | for i in range(0, len(data_x)): 366 | nBatches = nBatches + len(data_x[i]) / self.batchSize 367 | return nBatches 368 | 369 | def loadDataSets(self, datasets, datasetFileName, datasetDirectory, prefix, 370 | nTrainFiles, nValidFiles, nTestFiles): 371 | """ 372 | method to load the data sets. 
The data sets are expected to either be in 373 | a single pickled file that contains training, validation, and test sets 374 | as an array of tuples, where each tuple is two arrays, one for the data 375 | and the other for the labels. Please refer to the MNIST data format for 376 | more on this. We follow the same format here. 377 | 378 | :type datasets: numpy.array of tuples 379 | :param datasets: an array of tuples one each for training, validation and test 380 | 381 | :type datasetFileName: string 382 | :param datasetFileName: the name of a pickled file that contains all the data 383 | May be None in which case we expect that the datasetDirectory points to 384 | the location of the data. 385 | 386 | :type datasetDirectory: string 387 | :param datasetDirectory: location of the data 388 | 389 | :type prefix: string 390 | :param prefix: the filename prefix to use for the data files. The filenames 391 | are composed using the prefix and one of _train_, _valid_, or _test_ 392 | followed by an index ranging from 0 to the number of files for each. 393 | The number of training files, validation files and test files are also 394 | passed as additional arguments to this method. 395 | 396 | NOTE: We propose both mechanisms to load data as for smaller data sets 397 | the pickler can generate a single pickled file, but for large data sets 398 | we need to chunk the data up and pickle each chunk separately. This is 399 | due to a limitation in cPickle that cannot handle very large files 400 | 401 | """ 402 | if datasets is None: 403 | # Load the dataset 404 | if (datasetFileName is not None): 405 | f = gzip.open(datasetFileName, 'rb') 406 | trainSet, validSet, testSet = cPickle.load(f) 407 | f.close() 408 | else: 409 | trainSet = getPickledList(datasetDirectory, prefix + '_train_', nTrainFiles) 410 | validSet = getPickledList(datasetDirectory, prefix + '_valid_', nValidFiles) 411 | testSet = getPickledList(datasetDirectory, prefix + '_test_', nTestFiles) 412 | self.datasets = (trainSet, validSet, testSet) 413 | else: 414 | self.datasets = datasets 415 | 416 | def classify(self, learningRate = 0.01, L1_reg = 0.00, L2_reg = 0.0001, n_epochs = 1000, 417 | datasetFileName = None, 418 | datasetDirectory = None, 419 | prefix = 'gaze_data.pkl', 420 | nTrainFiles = 1, 421 | nValidFiles = 1, 422 | nTestFiles = 1, 423 | datasets = None, 424 | batchSize = 20): 425 | """ 426 | method used to classify a given test set against a model that is expected 427 | to have been loaded. 
The method is akin to a subset of the training method 428 | in that it simply computes the validation loss, test loss, and reports 429 | the test error, for a given model 430 | """ 431 | 432 | if self.datasets is None: 433 | self.loadDataSets(datasets, datasetFileName, datasetDirectory, prefix, 434 | nTrainFiles, nValidFiles, nTestFiles) 435 | 436 | self.batchSize = batchSize 437 | 438 | validSet = self.datasets[1] 439 | testSet = self.datasets[2] 440 | 441 | validSet_x, validSet_y = self.initializeSharedData(self.datasets[1],len(validSet[0])) 442 | testSet_x, testSet_y = self.initializeSharedData(self.datasets[2], len(testSet[0])) 443 | 444 | # compute number of minibatches for validation and testing 445 | nValidBatches = len(validSet[0]) / self.batchSize 446 | nTestBatches = len(testSet[0]) / self.batchSize 447 | 448 | print nValidBatches 449 | print nTestBatches 450 | 451 | # compiling a Theano function that computes the mistakes that are made 452 | # by the model on a minibatch 453 | test_model = theano.function(inputs=[self.index], 454 | outputs=self.classifier.errors(self.y), 455 | givens={ 456 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize], 457 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize], 458 | 'int32')}) 459 | 460 | validate_model = theano.function(inputs=[self.index], 461 | outputs=self.classifier.errors(self.y), 462 | givens={ 463 | self.x: validSet_x[self.index * batchSize:(self.index + 1) * batchSize], 464 | self.y: T.cast(validSet_y[self.index * batchSize:(self.index + 1) * batchSize], 465 | 'int32')}) 466 | 467 | # error reporting function that computes the overall rate of misclassification 468 | # by class 469 | error_model = theano.function(inputs=[self.index], 470 | outputs=self.classifier.errorReport(self.y, batchSize), 471 | givens={ 472 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize], 473 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize], 474 | 'int32')}) 475 | 476 | validationLosses = [validate_model(i) for i 477 | in xrange(nValidBatches)] 478 | validationLoss = numpy.mean(validationLosses) 479 | 480 | # test it on the test set 481 | testLosses = [test_model(i) for i 482 | in xrange(nTestBatches)] 483 | testScore = numpy.mean(testLosses) 484 | 485 | print(('Best validation score of %f %% with test performance %f %%') % 486 | (validationLoss * 100., testScore * 100.)) 487 | print('Classification errors by class') 488 | error_mat = [error_model(i) for i in xrange(nTestBatches)] 489 | class_errors = error_mat[0] 490 | for i in xrange(len(error_mat) - 1): 491 | class_errors = numpy.add(class_errors, error_mat[i + 1]) 492 | print class_errors 493 | 494 | def train(self, learningRate = 0.01, L1_reg = 0.00, L2_reg = 0.0001, n_epochs = 1000, 495 | datasetFileName = None, 496 | datasetDirectory = None, 497 | prefix = 'gaze_data.pkl', 498 | nTrainFiles = 1, 499 | nValidFiles = 1, 500 | nTestFiles = 1, 501 | datasets = None, 502 | batchSize = 20): 503 | """ 504 | method that trains the MLP 505 | 506 | The training data is accessed through the sliding window. For each 507 | epoch we walk through the training mini batches and compute a cost 508 | and update the model. Based on the validation frequency, the model 509 | is checked against the validation set and if there is an 510 | improvement at least as much as the improvement threshold, we check 511 | the model against the test set. 
Other pieces of code do things 512 | such as termination based on patience 513 | """ 514 | 515 | if self.datasets is None: 516 | self.loadDataSets(datasets, datasetFileName, datasetDirectory, prefix, 517 | nTrainFiles, nValidFiles, nTestFiles) 518 | 519 | # compute the size of the window we would like to use 520 | self.nSharedLen = batchSize * 2000 521 | self.batchSize = batchSize 522 | 523 | trainSet_x, trainSet_y = self.initializeSharedData(self.datasets[0], self.nSharedLen) 524 | 525 | validSet = self.datasets[1] 526 | testSet = self.datasets[2] 527 | 528 | validSet_x, validSet_y = self.initializeSharedData(self.datasets[1],len(validSet[0])) 529 | testSet_x, testSet_y = self.initializeSharedData(self.datasets[2], len(testSet[0])) 530 | 531 | trainSet = self.splitList(self.datasets[0]) 532 | 533 | # compute number of minibatches for training, validation and testing 534 | nTrainBatches = self.getNumberOfBatches(trainSet) 535 | nSplitTrainBatches = self.getNumberOfSplitBatches() 536 | nValidBatches = len(validSet[0]) / self.batchSize 537 | nTestBatches = len(testSet[0]) / self.batchSize 538 | 539 | print nTrainBatches 540 | print nSplitTrainBatches 541 | print nValidBatches 542 | print nTestBatches 543 | 544 | print '... building the model' 545 | 546 | # the cost we minimize during training is the negative log likelihood of 547 | # the model plus the regularization terms (L1 and L2); cost is expressed 548 | # here symbolically 549 | cost = self.classifier.negative_log_likelihood(self.y) \ 550 | + L1_reg * self.classifier.L1 \ 551 | + L2_reg * self.classifier.L2_sqr 552 | 553 | # compiling a Theano function that computes the mistakes that are made 554 | # by the model on a minibatch 555 | test_model = theano.function(inputs=[self.index], 556 | outputs=self.classifier.errors(self.y), 557 | givens={ 558 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize], 559 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize], 560 | 'int32')}) 561 | 562 | validate_model = theano.function(inputs=[self.index], 563 | outputs=self.classifier.errors(self.y), 564 | givens={ 565 | self.x: validSet_x[self.index * batchSize:(self.index + 1) * batchSize], 566 | self.y: T.cast(validSet_y[self.index * batchSize:(self.index + 1) * batchSize], 567 | 'int32')}) 568 | 569 | # error reporting function that computes the overall rate of misclassification 570 | # by class 571 | error_model = theano.function(inputs=[self.index], 572 | outputs=self.classifier.errorReport(self.y, batchSize), 573 | givens={ 574 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize], 575 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize], 576 | 'int32')}) 577 | 578 | # compute the gradient of cost with respect to theta (sotred in params) 579 | # the resulting gradients will be stored in a list gparams 580 | gparams = [] 581 | for param in self.classifier.params: 582 | gparam = T.grad(cost, param) 583 | gparams.append(gparam) 584 | 585 | # specify how to update the parameters of the model as a dictionary 586 | updates = {} 587 | # given two list the zip A = [ a1,a2,a3,a4] and B = [b1,b2,b3,b4] of 588 | # same length, zip generates a list C of same size, where each element 589 | # is a pair formed from the two lists : 590 | # C = [ (a1,b1), (a2,b2), (a3,b3) , (a4,b4) ] 591 | for param, gparam in zip(self.classifier.params, gparams): 592 | updates[param] = param - learningRate * gparam 593 | 594 | # compiling a Theano function `train_model` that returns the 
cost, but 595 | # in the same time updates the parameter of the model based on the rules 596 | # defined in `updates` 597 | train_model = theano.function(inputs=[self.index], outputs=cost, 598 | updates=updates, 599 | givens={ 600 | self.x: trainSet_x[self.index * batchSize:(self.index + 1) * batchSize], 601 | self.y: T.cast(trainSet_y[self.index * batchSize:(self.index + 1) * batchSize], 602 | 'int32')}) 603 | 604 | print '... training' 605 | 606 | # early-stopping parameters 607 | patience = 20000 # look as this many examples regardless 608 | patience_increase = 2 # wait this much longer when a new best is 609 | # found 610 | improvement_threshold = 0.995 # a relative improvement of this much is 611 | # considered significant 612 | validationFrequency = min(nTrainBatches, patience / 2) 613 | # go through this many 614 | # minibatche before checking the network 615 | # on the validation set; in this case we 616 | # check every epoch 617 | 618 | bestParams = None 619 | bestValidationLoss = numpy.inf 620 | bestIter = 0 621 | testScore = 0. 622 | startTime = time.clock() 623 | 624 | epoch = 0 625 | doneLooping = False 626 | 627 | while (epoch < n_epochs) and (not doneLooping): 628 | epoch = epoch + 1 629 | for minibatchIndex in xrange(nTrainBatches): 630 | 631 | actualMiniBatchIndex = minibatchIndex % nSplitTrainBatches 632 | # print ' actualMiniBatchIndex = %d. miniBatchIndex = %d'\ 633 | # %(actualMiniBatchIndex, minibatchIndex) 634 | if (actualMiniBatchIndex == 0): 635 | data_x, data_y = self.getWindowData(trainSet, minibatchIndex) 636 | # print ' Update. data_x[0][0] = %f, data_y[0] = %d.'%(data_x[0][0], data_y[0]) 637 | trainSet_x.set_value(data_x, borrow=True) 638 | trainSet_y.set_value(numpy.asarray(data_y, 639 | dtype=theano.config.floatX), 640 | borrow=True) 641 | 642 | minibatchAvgCost = train_model(actualMiniBatchIndex) 643 | # iteration number 644 | iter = epoch * nTrainBatches + minibatchIndex 645 | 646 | if (iter + 1) % validationFrequency == 0: 647 | # compute zero-one loss on validation set 648 | validationLosses = [validate_model(i) for i 649 | in xrange(nValidBatches)] 650 | thisValidationLoss = numpy.mean(validationLosses) 651 | 652 | print('epoch %i, minibatch %i/%i, validation error %f %%' % 653 | (epoch, minibatchIndex + 1, nTrainBatches, 654 | thisValidationLoss * 100.)) 655 | 656 | # if we got the best validation score until now 657 | if thisValidationLoss < bestValidationLoss: 658 | #improve patience if loss improvement is good enough 659 | if thisValidationLoss < bestValidationLoss * \ 660 | improvement_threshold: 661 | patience = max(patience, iter * patience_increase) 662 | 663 | bestValidationLoss = thisValidationLoss 664 | bestIter = iter 665 | 666 | # test it on the test set 667 | testLosses = [test_model(i) for i 668 | in xrange(nTestBatches)] 669 | testScore = numpy.mean(testLosses) 670 | 671 | print((' epoch %i, minibatch %i/%i, test error of ' 672 | 'best model %f %%') % 673 | (epoch, minibatchIndex + 1, nTrainBatches, 674 | testScore * 100.)) 675 | 676 | if patience <= iter: 677 | doneLooping = True 678 | break 679 | 680 | endTime = time.clock() 681 | print(('Optimization complete. 
Best validation score of %f %% ' 682 | 'obtained at iteration %i, with test performance %f %%') % 683 | (bestValidationLoss * 100., bestIter, testScore * 100.)) 684 | print('Classification errors by class') 685 | error_mat = [error_model(i) for i in xrange(nTestBatches)] 686 | class_errors = error_mat[0] 687 | for i in xrange(len(error_mat) - 1): 688 | class_errors = numpy.add(class_errors, error_mat[i + 1]) 689 | print class_errors 690 | print >> sys.stderr, ('The code for file ' + 691 | os.path.split(__file__)[1] + 692 | ' ran for %.2fm' % ((endTime - startTime) / 60.)) 693 | 694 | if __name__ == '__main__': 695 | datasetDirectory = None 696 | datasetFileName = None 697 | batchSize = 20 698 | nTrainFiles = 12 699 | nValidFiles = 1 700 | nTestFiles = 1 701 | layers = None 702 | inputs = 1000 703 | outputs = 5 704 | doClassify = False 705 | useParamsFromFile = False 706 | paramsFileName = None 707 | genModels = False 708 | modelsFileName = None 709 | prefix = 'gaze_data.pkl' 710 | 711 | for i in range(len(sys.argv)): 712 | if sys.argv[i] == '-d': 713 | datasetDirectory = sys.argv[i + 1] 714 | elif sys.argv[i] == '-f': 715 | datasetDirectory = None 716 | datasetFileName = sys.argv[i + 1] 717 | elif sys.argv[i] == '-b': 718 | batchSize = int(sys.argv[i + 1]) 719 | elif sys.argv[i] == '-nt': 720 | nTrainFiles = int(sys.argv[i + 1]) 721 | elif sys.argv[i] == '-nv': 722 | nValidFiles = int(sys.argv[i + 1]) 723 | elif sys.argv[i] == '-ns': 724 | nTestFiles = int(sys.argv[i + 1]) 725 | elif sys.argv[i] == '-p': 726 | prefix = sys.argv[i + 1] 727 | elif sys.argv[i] == '-l': 728 | l = sys.argv[i + 1] 729 | li = l.split(',') 730 | layers = numpy.array(li, dtype=numpy.int64) 731 | elif sys.argv[i] == '-o': 732 | outputs = int(sys.argv[i + 1]) 733 | elif sys.argv[i] == '-i': 734 | inputs = int(sys.argv[i + 1]) 735 | elif sys.argv[i] == '-classify': 736 | doClassify = True 737 | elif sys.argv[i] == '-gen': 738 | genModels = True 739 | modelsFileName = sys.argv[i + 1] 740 | elif sys.argv[i] == '-useparams': 741 | useParamsFromFile = True 742 | paramsFileName = sys.argv[i + 1] 743 | elif sys.argv[i] == '-h': 744 | print('Usage: mlp.py (-d datasetDir | -f datasetFileName) [-p prefix] [-b batchSize]' + 745 | '[-nt nTrainFiles] [-nv nValidFiles] [-ns nTestFiles]' + 746 | '[-i inputLength] [-o numClasses] [-gen modelFileName]'+ 747 | '[-l [nLayer1Size, nLayer2Size, ...]] [-classify] [-useparams paramsFileName]' + 748 | '[-h help]') 749 | sys.exit() 750 | 751 | if (doClassify == False): 752 | if (paramsFileName is not None): 753 | print 'loading parameters from ' + paramsFileName 754 | paramsFileHandle = gzip.open(paramsFileName, 'rb') 755 | params = cPickle.load(paramsFileHandle) 756 | weights, biases = params 757 | else: 758 | weights = [] 759 | biases = [] 760 | # + 1 for the logistic layer 761 | for i in range(len(layers) + 1): 762 | weights.append(None) 763 | biases.append(None) 764 | 765 | # initialize datasets 766 | datasets = None 767 | 768 | l = [] 769 | for i in xrange(len(layers)): 770 | W = [] 771 | b = [] 772 | l.append(layers[i]) 773 | 774 | for j in xrange(i): 775 | W.append(weights[j]) 776 | b.append(biases[j]) 777 | # One for the final hidden layer and another for the logistic layer 778 | W.extend([None, None]) 779 | b.extend([None, None]) 780 | 781 | mlp = ProgressiveMLP(n_in = inputs, n_out = outputs, layers = l, 782 | weights = W, biases = b) 783 | if (datasets is not None): 784 | mlp.train(datasets = datasets) 785 | else: 786 | mlp.train( 787 | datasetDirectory = datasetDirectory, 788 | 
datasetFileName = datasetFileName, 789 | prefix = prefix, 790 | batchSize = batchSize, 791 | nTrainFiles = nTrainFiles, 792 | nValidFiles = nValidFiles, 793 | nTestFiles = nTestFiles) 794 | 795 | weights = [] 796 | biases = [] 797 | for i in range(len(l)): 798 | weights.append(mlp.classifier.hiddenLayers[i].W) 799 | biases.append(mlp.classifier.hiddenLayers[i].b) 800 | datasets = mlp.datasets 801 | 802 | # final pass where we include all parameters 803 | weights.append(mlp.classifier.logRegressionLayer.W) 804 | biases.append(mlp.classifier.logRegressionLayer.b) 805 | mlp = ProgressiveMLP(n_in = inputs, n_out = outputs, layers = l, 806 | weights = weights, biases = biases, 807 | includeAllParams = True) 808 | mlp.train(datasets = datasets) 809 | 810 | # if we want the models written out then 811 | if (genModels == True): 812 | # first write out the models as numpy arrays using the pickler 813 | paramsFileName = datasetDirectory + '/params.gz' 814 | weights = [] 815 | biases = [] 816 | for i in xrange(len(mlp.classifier.hiddenLayers)): 817 | weights.append(mlp.classifier.hiddenLayers[i].W) 818 | biases.append(mlp.classifier.hiddenLayers[i].b) 819 | weights.append(mlp.classifier.logRegressionLayer.W) 820 | biases.append(mlp.classifier.logRegressionLayer.b) 821 | 822 | params = (weights, biases) 823 | paramsFileHandle = gzip.open(paramsFileName, 'wb') 824 | cPickle.dump(params, paramsFileHandle) 825 | 826 | # now write out the models as a text file for loading outside Python 827 | mfp = open(modelsFileName, 'wb') 828 | for i in xrange(len(mlp.classifier.hiddenLayers) + 1): 829 | W = weights[i].get_value() 830 | b = biases[i].get_value() 831 | mfp.write('%d,%d\n'%(W.shape[0], W.shape[1])) 832 | for j in xrange(W.shape[0]): 833 | for k in xrange(W.shape[1]): 834 | mfp.write('%f'%W[j][k]) 835 | mfp.write(',') 836 | mfp.write('\n') 837 | mfp.write('%d\n'%b.shape[0]) 838 | for j in xrange(b.shape[0]): 839 | mfp.write('%f'%b[j]) 840 | mfp.write(',') 841 | mfp.write('\n') 842 | mfp.close() 843 | else: 844 | # classify mode 845 | if (paramsFileName == ''): 846 | print 'Parameters file is required for sequential MLP' 847 | sys.exit() 848 | 849 | print 'loading parameters from ' + paramsFileName 850 | paramsFileHandle = gzip.open(paramsFileName, 'rb') 851 | params = cPickle.load(paramsFileHandle) 852 | weights, biases = params 853 | 854 | mlp = ProgressiveMLP(n_in=inputs, n_out=outputs, layers=layers, 855 | weights=weights, biases=biases) 856 | mlp.classify( 857 | datasetDirectory=datasetDirectory, 858 | datasetFileName=datasetFileName, 859 | prefix=prefix, 860 | batchSize=batchSize, 861 | nTrainFiles=nTrainFiles, 862 | nValidFiles=nValidFiles, 863 | nTestFiles=nTestFiles) 864 | 865 | -------------------------------------------------------------------------------- /dl/logistic_sgd.py: -------------------------------------------------------------------------------- 1 | """ 2 | This tutorial introduces logistic regression using Theano and stochastic 3 | gradient descent. 4 | 5 | Logistic regression is a probabilistic, linear classifier. It is parametrized 6 | by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is 7 | done by projecting data points onto a set of hyperplanes, the distance to 8 | which is used to determine a class membership probability. 9 | 10 | Mathematically, this can be written as: 11 | 12 | .. 
math:: 13 | P(Y=i|x, W,b) &= softmax_i(W x + b) \\ 14 | &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}} 15 | 16 | 17 | The output of the model or prediction is then done by taking the argmax of 18 | the vector whose i'th element is P(Y=i|x). 19 | 20 | .. math:: 21 | 22 | y_{pred} = argmax_i P(Y=i|x,W,b) 23 | 24 | 25 | This tutorial presents a stochastic gradient descent optimization method 26 | suitable for large datasets, and a conjugate gradient optimization method 27 | that is suitable for smaller datasets. 28 | 29 | 30 | References: 31 | 32 | - textbooks: "Pattern Recognition and Machine Learning" - 33 | Christopher M. Bishop, section 4.3.2 34 | 35 | """ 36 | __docformat__ = 'restructedtext en' 37 | 38 | import cPickle 39 | import gzip 40 | import os 41 | import sys 42 | import time 43 | import math 44 | 45 | import numpy 46 | 47 | import theano 48 | import theano.tensor as T 49 | 50 | 51 | class LogisticRegression(object): 52 | """Multi-class Logistic Regression Class 53 | 54 | The logistic regression is fully described by a weight matrix :math:`W` 55 | and bias vector :math:`b`. Classification is done by projecting data 56 | points onto a set of hyperplanes, the distance to which is used to 57 | determine a class membership probability. 58 | """ 59 | 60 | def __init__(self, input, n_in, n_out, W=None, b=None): 61 | """ Initialize the parameters of the logistic regression 62 | 63 | :type input: theano.tensor.TensorType 64 | :param input: symbolic variable that describes the input of the 65 | architecture (one minibatch) 66 | 67 | :type n_in: int 68 | :param n_in: number of input units, the dimension of the space in 69 | which the datapoints lie 70 | 71 | :type n_out: int 72 | :param n_out: number of output units, the dimension of the space in 73 | which the labels lie 74 | 75 | """ 76 | 77 | self.n_out = n_out 78 | 79 | if W is None: 80 | # initialize with 0 the weights W as a matrix of shape (n_in, n_out) 81 | self.W = theano.shared(value=numpy.zeros((n_in, n_out), 82 | dtype=theano.config.floatX), 83 | name='W', borrow=True) 84 | else: 85 | self.W = W 86 | 87 | if b is None: 88 | # initialize the baises b as a vector of n_out 0s 89 | self.b = theano.shared(value=numpy.zeros((n_out,), 90 | dtype=theano.config.floatX), 91 | name='b', borrow=True) 92 | else: 93 | self.b = b 94 | 95 | # compute vector of class-membership probabilities in symbolic form 96 | self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b) 97 | 98 | # compute prediction as class whose probability is maximal in 99 | # symbolic form 100 | self.y_pred = T.argmax(self.p_y_given_x, axis=1) 101 | 102 | # parameters of the model 103 | self.params = [self.W, self.b] 104 | 105 | def negative_log_likelihood(self, y): 106 | """Return the mean of the negative log-likelihood of the prediction 107 | of this model under a given target distribution. 108 | 109 | .. 
math:: 110 | 111 | \frac{1}{|\mathcal{D}|} \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) = 112 | \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\ 113 | \ell (\theta=\{W,b\}, \mathcal{D}) 114 | 115 | :type y: theano.tensor.TensorType 116 | :param y: corresponds to a vector that gives for each example the 117 | correct label 118 | 119 | Note: we use the mean instead of the sum so that 120 | the learning rate is less dependent on the batch size 121 | """ 122 | # y.shape[0] is (symbolically) the number of rows in y, i.e., 123 | # number of examples (call it n) in the minibatch 124 | # T.arange(y.shape[0]) is a symbolic vector which will contain 125 | # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of 126 | # Log-Probabilities (call it LP) with one row per example and 127 | # one column per class LP[T.arange(y.shape[0]),y] is a vector 128 | # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ..., 129 | # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is 130 | # the mean (across minibatch examples) of the elements in v, 131 | # i.e., the mean log-likelihood across the minibatch. 132 | return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y]) 133 | 134 | def errors(self, y): 135 | """Return a float representing the number of errors in the minibatch 136 | over the total number of examples of the minibatch ; zero one 137 | loss over the size of the minibatch 138 | 139 | :type y: theano.tensor.TensorType 140 | :param y: corresponds to a vector that gives for each example the 141 | correct label 142 | """ 143 | 144 | # check if y has same dimension of y_pred 145 | if y.ndim != self.y_pred.ndim: 146 | raise TypeError('y should have the same shape as self.y_pred', 147 | ('y', target.type, 'y_pred', self.y_pred.type)) 148 | # check if y is of the correct datatype 149 | if y.dtype.startswith('int'): 150 | # the T.neq operator returns a vector of 0s and 1s, where 1 151 | # represents a mistake in prediction 152 | return T.mean(T.neq(self.y_pred, y)) 153 | else: 154 | raise NotImplementedError() 155 | 156 | def errorReport(self, y, n): 157 | # compute error rate by class 158 | # check if y has same dimension of y_pred 159 | if y.ndim != self.y_pred.ndim: 160 | raise TypeError('y should have the same shape as self.y_pred', 161 | ('y', target.type, 'y_pred', self.y_pred.type)) 162 | # check if y is of the correct datatype 163 | if y.dtype.startswith('int'): 164 | c = numpy.zeros((self.n_out, self.n_out + 1), dtype=numpy.int64) 165 | counts = T.as_tensor_variable(c) 166 | classVector = numpy.zeros(n) 167 | for i in xrange(self.n_out): 168 | othersVector = numpy.zeros(n) 169 | for j in xrange(self.n_out): 170 | counts = theano.tensor.basic.set_subtensor( 171 | counts[i, j], 172 | T.sum(T.and_(T.eq(self.y_pred, othersVector), 173 | T.eq(y, classVector)))) 174 | othersVector = othersVector + 1 175 | counts = theano.tensor.basic.set_subtensor( 176 | counts[i, self.n_out], 177 | T.sum(T.eq(y, classVector))) 178 | classVector = classVector + 1 179 | return counts 180 | else: 181 | raise NotImplementedError() 182 | 183 | def load_data(dataset): 184 | ''' Loads the dataset 185 | 186 | :type dataset: string 187 | :param dataset: the path to the dataset (here GAZE) 188 | ''' 189 | 190 | ############# 191 | # LOAD DATA # 192 | ############# 193 | 194 | print '... 
loading data' 195 | 196 | # Load the dataset 197 | f = gzip.open(dataset, 'rb') 198 | train_set, valid_set, test_set = cPickle.load(f) 199 | f.close() 200 | #train_set, valid_set, test_set format: tuple(input, target) 201 | #input is an numpy.ndarray of 2 dimensions (a matrix) 202 | #witch row's correspond to an example. target is a 203 | #numpy.ndarray of 1 dimensions (vector)) that have the same length as 204 | #the number of rows in the input. It should give the target 205 | #target to the example with the same index in the input. 206 | 207 | def shared_dataset(data_xy, borrow=True): 208 | """ Function that loads the dataset into shared variables 209 | 210 | The reason we store our dataset in shared variables is to allow 211 | Theano to copy it into the GPU memory (when code is run on GPU). 212 | Since copying data into the GPU is slow, copying a minibatch everytime 213 | is needed (the default behaviour if the data is not in a shared 214 | variable) would lead to a large decrease in performance. 215 | """ 216 | data_x, data_y = data_xy 217 | shared_x = theano.shared(numpy.asarray(data_x, 218 | dtype=theano.config.floatX), 219 | borrow=borrow) 220 | shared_y = theano.shared(numpy.asarray(data_y, 221 | dtype=theano.config.floatX), 222 | borrow=borrow) 223 | # When storing data on the GPU it has to be stored as floats 224 | # therefore we will store the labels as ``floatX`` as well 225 | # (``shared_y`` does exactly that). But during our computations 226 | # we need them as ints (we use labels as index, and if they are 227 | # floats it doesn't make sense) therefore instead of returning 228 | # ``shared_y`` we will have to cast it to int. This little hack 229 | # lets ous get around this issue 230 | return shared_x, T.cast(shared_y, 'int32') 231 | 232 | test_set_x, test_set_y = shared_dataset(test_set) 233 | valid_set_x, valid_set_y = shared_dataset(valid_set) 234 | train_set_x, train_set_y = shared_dataset(train_set) 235 | 236 | rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y), 237 | (test_set_x, test_set_y)] 238 | return rval 239 | 240 | 241 | def sgd_optimization_gaze(learning_rate=0.13, n_epochs=1000, 242 | dataset='/home/vishwa/work/dbn/train_utils/data/gaze_data.pkl.gz', 243 | batch_size=400): 244 | """ 245 | Demonstrate stochastic gradient descent optimization of a log-linear 246 | model 247 | 248 | This is demonstrated on GAZE data. 249 | 250 | :type learning_rate: float 251 | :param learning_rate: learning rate used (factor for the stochastic 252 | gradient) 253 | 254 | :type n_epochs: int 255 | :param n_epochs: maximal number of epochs to run the optimizer 256 | 257 | :type dataset: string 258 | :param dataset: the path of the GAZE dataset file 259 | 260 | """ 261 | datasets = load_data(dataset) 262 | 263 | train_set_x, train_set_y = datasets[0] 264 | valid_set_x, valid_set_y = datasets[1] 265 | test_set_x, test_set_y = datasets[2] 266 | 267 | # compute number of minibatches for training, validation and testing 268 | n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size 269 | n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size 270 | n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size 271 | 272 | ###################### 273 | # BUILD ACTUAL MODEL # 274 | ###################### 275 | print '... 
building the model' 276 | 277 | # allocate symbolic variables for the data 278 | index = T.lscalar() # index to a [mini]batch 279 | x = T.matrix('x') # the data is presented as rasterized images 280 | y = T.ivector('y') # the labels are presented as 1D vector of 281 | # [int] labels 282 | 283 | # construct the logistic regression class 284 | # Each GAZE image has size 100*100 285 | classifier = LogisticRegression(input=x, n_in=100 * 100, n_out=5) 286 | 287 | # the cost we minimize during training is the negative log likelihood of 288 | # the model in symbolic format 289 | cost = classifier.negative_log_likelihood(y) 290 | 291 | # compiling a Theano function that computes the mistakes that are made by 292 | # the model on a minibatch 293 | test_model = theano.function(inputs=[index], 294 | outputs=classifier.errors(y), 295 | givens={ 296 | x: test_set_x[index * batch_size: (index + 1) * batch_size], 297 | y: test_set_y[index * batch_size: (index + 1) * batch_size]}) 298 | 299 | validate_model = theano.function(inputs=[index], 300 | outputs=classifier.errors(y), 301 | givens={ 302 | x: valid_set_x[index * batch_size:(index + 1) * batch_size], 303 | y: valid_set_y[index * batch_size:(index + 1) * batch_size]}) 304 | 305 | # compute the gradient of cost with respect to theta = (W,b) 306 | g_W = T.grad(cost=cost, wrt=classifier.W) 307 | g_b = T.grad(cost=cost, wrt=classifier.b) 308 | 309 | # specify how to update the parameters of the model as a dictionary 310 | updates = {classifier.W: classifier.W - learning_rate * g_W, 311 | classifier.b: classifier.b - learning_rate * g_b} 312 | 313 | # compiling a Theano function `train_model` that returns the cost, but in 314 | # the same time updates the parameter of the model based on the rules 315 | # defined in `updates` 316 | train_model = theano.function(inputs=[index], 317 | outputs=cost, 318 | updates=updates, 319 | givens={ 320 | x: train_set_x[index * batch_size:(index + 1) * batch_size], 321 | y: train_set_y[index * batch_size:(index + 1) * batch_size]}) 322 | 323 | ############### 324 | # TRAIN MODEL # 325 | ############### 326 | print '... training the model' 327 | # early-stopping parameters 328 | patience = 5000 # look as this many examples regardless 329 | patience_increase = 2 # wait this much longer when a new best is 330 | # found 331 | improvement_threshold = 0.995 # a relative improvement of this much is 332 | # considered significant 333 | validation_frequency = min(n_train_batches, patience / 2) 334 | # go through this many 335 | # minibatche before checking the network 336 | # on the validation set; in this case we 337 | # check every epoch 338 | 339 | best_params = None 340 | best_validation_loss = numpy.inf 341 | test_score = 0. 
342 | start_time = time.clock() 343 | 344 | done_looping = False 345 | epoch = 0 346 | while (epoch < n_epochs) and (not done_looping): 347 | epoch = epoch + 1 348 | for minibatch_index in xrange(n_train_batches): 349 | 350 | minibatch_avg_cost = train_model(minibatch_index) 351 | # iteration number 352 | iter = epoch * n_train_batches + minibatch_index 353 | 354 | if (iter + 1) % validation_frequency == 0: 355 | # compute zero-one loss on validation set 356 | validation_losses = [validate_model(i) 357 | for i in xrange(n_valid_batches)] 358 | this_validation_loss = numpy.mean(validation_losses) 359 | 360 | print('epoch %i, minibatch %i/%i, validation error %f %%' % \ 361 | (epoch, minibatch_index + 1, n_train_batches, 362 | this_validation_loss * 100.)) 363 | 364 | # if we got the best validation score until now 365 | if this_validation_loss < best_validation_loss: 366 | #improve patience if loss improvement is good enough 367 | if this_validation_loss < best_validation_loss * \ 368 | improvement_threshold: 369 | patience = max(patience, iter * patience_increase) 370 | 371 | best_validation_loss = this_validation_loss 372 | # test it on the test set 373 | 374 | test_losses = [test_model(i) 375 | for i in xrange(n_test_batches)] 376 | test_score = numpy.mean(test_losses) 377 | 378 | print((' epoch %i, minibatch %i/%i, test error of best' 379 | ' model %f %%') % 380 | (epoch, minibatch_index + 1, n_train_batches, 381 | test_score * 100.)) 382 | 383 | if patience <= iter: 384 | done_looping = True 385 | break 386 | 387 | end_time = time.clock() 388 | print(('Optimization complete with best validation score of %f %%,' 389 | 'with test performance %f %%') % 390 | (best_validation_loss * 100., test_score * 100.)) 391 | print 'The code run for %d epochs, with %f epochs/sec' % ( 392 | epoch, 1. * epoch / (end_time - start_time)) 393 | print >> sys.stderr, ('The code for file ' + 394 | os.path.split(__file__)[1] + 395 | ' ran for %.1fs' % ((end_time - start_time))) 396 | 397 | if __name__ == '__main__': 398 | sgd_optimization_gaze() 399 | -------------------------------------------------------------------------------- /dl/pickler.py: -------------------------------------------------------------------------------- 1 | """ 2 | pickler.py 3 | This file implements a pickler for the image data. The input files are 4 | expected to be in the MNIST dataset format. The output is a set of files 5 | in the Theano data format. 6 | 7 | Please refer to the MNIST documentation for the input format. 8 | The output format consists of a numpy array of tuples, the first is the 9 | training data, the second is the validation data and the third is the 10 | test data. Each tuple consists of an array of image vectors, where each 11 | vector is a floating point pixel value for image pixels stored in 12 | row-major, and an array of labels for each image vector. 13 | 14 | Since cPickle has limits on the sizes of the arrays that it can process 15 | in memory, we generate chunks of the arrays in files named as follows, 16 | 17 | _train_%d.gz for the training data 18 | _valid_%d.gz for the validation data 19 | _test_%d.gz for the test data 20 | 21 | The prefix we use is typically gaze_data_pkl. 22 | 23 | We set an arbitrary limit of 8000 datapoints and labels per file. 
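For example, with the output prefix gaze_data.pkl and roughly 20,000 training
data points, the training data would be written to gaze_data.pkl_train_0.gz,
gaze_data.pkl_train_1.gz, and gaze_data.pkl_train_2.gz (three chunks of at
most 8000 points each); the validation and test sets are chunked the same way.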
24 | """ 25 | __docformat__ = 'restructedtext en' 26 | 27 | import datetime, shutil, glob, sys, os, thread 28 | import time, math, re, unicodedata, struct 29 | import cPickle, gzip 30 | import numpy 31 | 32 | global trainDataFile 33 | global trainLabelFile 34 | global validDataFile 35 | global validLabelFile 36 | global testDataFile 37 | global testLabelFile 38 | 39 | def readMagic(fileHandle): 40 | fileHandle.seek(0) 41 | return struct.unpack('i', fileHandle.read(4)) 42 | 43 | def readLength(fileHandle): 44 | fileHandle.seek(4) 45 | return struct.unpack('i', fileHandle.read(4)) 46 | 47 | def readWidth(fileHandle): 48 | fileHandle.seek(8) 49 | return struct.unpack('i', fileHandle.read(4)) 50 | 51 | def readHeight(fileHandle): 52 | fileHandle.seek(12) 53 | return struct.unpack('i', fileHandle.read(4)) 54 | 55 | def readByte(fileHandle): 56 | return struct.unpack('B', fileHandle.read(1)) 57 | 58 | def loadData(dataFile, labelFile): 59 | """ 60 | method used to load the input file that is expected to be in the MNIST 61 | data format 62 | """ 63 | dataFileHandle = open(dataFile, 'rb') 64 | labelFileHandle = open(labelFile, 'rb') 65 | length = readLength(dataFileHandle)[0] 66 | width = readWidth(dataFileHandle)[0] 67 | height = readHeight(dataFileHandle)[0] 68 | 69 | data = numpy.zeros((length, width * height), numpy.float32) 70 | labels = numpy.zeros((length), numpy.int64) 71 | 72 | dataFileHandle.seek(16) 73 | labelFileHandle.seek(8) 74 | labelFmt = '' 75 | for i in range(length): 76 | labelFmt += 'B' 77 | labelBytes = numpy.int64(struct.unpack(labelFmt, labelFileHandle.read(length))) 78 | dataFmt = '' 79 | for i in range(width * height): 80 | dataFmt += 'B' 81 | for i in range(length): 82 | dataBytes = numpy.float32(struct.unpack(dataFmt,\ 83 | dataFileHandle.read(width * height))) 84 | data[i] = numpy.divide(dataBytes, 255.0) 85 | labels[i] = labelBytes[i] - 1 86 | # print(data[i]) 87 | # print(labels[i]) 88 | sys.stdout.write('.') 89 | sys.stdout.write('\n') 90 | dataFileHandle.close() 91 | labelFileHandle.close() 92 | return (data, labels) 93 | 94 | def getLists(directory, suffix): 95 | """ 96 | helper method used within the Theano code to load data from MNIST 97 | to Theano format, when the entire data can be encapsulated in a 98 | single file 99 | """ 100 | dataFileName = directory + '/data-train-' + suffix 101 | labelFileName = directory + '/label-train-' + suffix 102 | trainList = loadData(dataFileName, labelFileName) 103 | 104 | dataFileName = directory + '/data-valid-' + suffix 105 | labelFileName = directory + '/label-valid-' + suffix 106 | validList = loadData(dataFileName, labelFileName) 107 | 108 | dataFileName = directory + '/data-test-' + suffix 109 | labelFileName = directory + '/label-test-' + suffix 110 | testList = loadData(dataFileName, labelFileName) 111 | 112 | return (trainList, validList, testList) 113 | 114 | def getPickledLists(directory, prefix, nFiles): 115 | """ 116 | method used to load data from the MNIST data format to Theano data 117 | format when just the training data is chunked into multiple files, 118 | but the validation and test is in a single file 119 | """ 120 | for i in xrange(nFiles): 121 | dataFileName = directory + '/' + prefix + ('_train_%d.gz'%i) 122 | f = gzip.open(dataFileName, 'rb') 123 | trainList = cPickle.load(f) 124 | x, y = trainList 125 | if (i == 0): 126 | data_x = x 127 | data_y = y 128 | else: 129 | data_x = numpy.concatenate((data_x, x)) 130 | data_y = numpy.concatenate((data_y, y)) 131 | f.close() 132 | 133 | trainList = (data_x, 
data_y) 134 | 135 | f = gzip.open(directory + '/' + prefix + '.gz') 136 | validList, testList = cPickle.load(f) 137 | f.close() 138 | 139 | return (trainList, validList, testList) 140 | 141 | def getPickledList(directory, prefix, nFiles): 142 | """ 143 | method that is used to load a particular set of data files, 144 | either training, validation or test. This method expects as 145 | inputs the directory where the files are stored, the prefix 146 | to use and the number of files that contain the chunked 147 | data sets. 148 | """ 149 | for i in xrange(nFiles): 150 | dataFileName = directory + '/' + prefix + ('%d.gz'%i) 151 | f = gzip.open(dataFileName, 'rb') 152 | dataList = cPickle.load(f) 153 | x, y = dataList 154 | if (i == 0): 155 | data_x = x 156 | data_y = y 157 | else: 158 | data_x = numpy.concatenate((data_x, x)) 159 | data_y = numpy.concatenate((data_y, y)) 160 | f.close() 161 | 162 | return (data_x, data_y) 163 | 164 | def pickleMeThis(outFile): 165 | """ 166 | method used to pickle data in the Theano format. Since the pickler 167 | has limits on data size, we chunk the data arrays into 8000 168 | (arbitrary) elements per chunk and store them as pickled files. 169 | """ 170 | global trainDataFile 171 | global trainLabelFile 172 | global validDataFile 173 | global validLabelFile 174 | global testDataFile 175 | global testLabelFile 176 | 177 | trainList = loadData(trainDataFile, trainLabelFile) 178 | validList = loadData(validDataFile, validLabelFile) 179 | testList = loadData(testDataFile, testLabelFile) 180 | 181 | # checking data that was loaded 182 | print("length of trainList[0] = %d"%len(trainList[0])) 183 | print("length of trainList[1] = %d"%len(trainList[1])) 184 | print("length of trainList[0][1] = %d"%len(trainList[0][1])) 185 | print("label for trainList[0][1] = %d"%(trainList[1][1])) 186 | 187 | print("length of validList[0] = %d"%len(validList[0])) 188 | print("length of validList[1] = %d"%len(validList[1])) 189 | print("length of validList[0][1] = %d"%len(validList[0][1])) 190 | print("label for validList[0][1] = %d"%(validList[1][1])) 191 | 192 | print("length of testList[0] = %d"%len(testList[0])) 193 | print("length of testList[1] = %d"%len(testList[1])) 194 | print("length of testList[0][1] = %d"%len(testList[0][1])) 195 | print("label for testList[0][1] = %d"%(testList[1][1])) 196 | 197 | index = 0 198 | data_x, data_y = trainList 199 | for i in range(0, len(trainList[0]), 8000): 200 | trainFileHandle = gzip.open(outFile + ('_train_%d.gz'%index), 'wb') 201 | lb = i 202 | rb = min(lb + 8000, len(trainList[0])) 203 | fragment = (data_x[lb:rb], data_y[lb:rb]) 204 | cPickle.dump(fragment, trainFileHandle) 205 | trainFileHandle.close() 206 | index = index + 1 207 | 208 | index = 0 209 | data_x, data_y = validList 210 | for i in range(0, len(validList[0]), 8000): 211 | validFileHandle = gzip.open(outFile + ('_valid_%d.gz'%index), 'wb') 212 | lb = i 213 | rb = min(lb + 8000, len(validList[0])) 214 | fragment = (data_x[lb:rb], data_y[lb:rb]) 215 | cPickle.dump(fragment, validFileHandle) 216 | validFileHandle.close() 217 | index = index + 1 218 | 219 | index = 0 220 | data_x, data_y = testList 221 | for i in range(0, len(testList[0]), 8000): 222 | testFileHandle = gzip.open(outFile + ('_test_%d.gz'%index), 'wb') 223 | lb = i 224 | rb = min(lb + 8000, len(testList[0])) 225 | fragment = (data_x[lb:rb], data_y[lb:rb]) 226 | cPickle.dump(fragment, testFileHandle) 227 | testFileHandle.close() 228 | index = index + 1 229 | 230 | if __name__ == "__main__": 231 | global 
trainDataFile 232 | global trainLabelFile 233 | global validDataFile 234 | global validLabelFile 235 | global testDataFile 236 | global testLabelFile 237 | 238 | trainDataFile = sys.argv[1] 239 | trainLabelFile = sys.argv[2] 240 | validDataFile = sys.argv[3] 241 | validLabelFile = sys.argv[4] 242 | testDataFile = sys.argv[5] 243 | testLabelFile = sys.argv[6] 244 | outFile = sys.argv[7] 245 | 246 | pickleMeThis(outFile) 247 | 248 | -------------------------------------------------------------------------------- /scripts/process.sh: -------------------------------------------------------------------------------- 1 | #!/bin/sh 2 | 3 | /home/vishwa/work/dbn/train_utils/install/bin/xmlToIDX -o ubyte -r 0.80 -v 0.05 -usebins -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-17-05-43-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-14-55-42-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-11-10-26-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-11-11-00-50-2012 4 | 5 | python pickler.py data-train-ubyte label-train-ubyte data-valid-ubyte label-valid-ubyte data-test-ubyte label-test-ubyte gaze_data.pkl 6 | 7 | -------------------------------------------------------------------------------- /utils/src/Annotations.cpp: -------------------------------------------------------------------------------- 1 | // Annotations.cpp 2 | // This file contains the implementation of class Annotations. 3 | // This class is used to read an annotations XML doc and provide an 4 | // iterator to the frame annotations. 5 | 6 | #include 7 | #include 8 | #include 9 | #include 10 | #include 11 | #include 12 | 13 | #include 14 | #include 15 | #include 16 | #include 17 | 18 | #include "Annotations.h" 19 | 20 | // Construction and desctruction 21 | 22 | Annotations::Annotations() { 23 | framesDirectory = ""; 24 | center.x = Globals::imgWidth / 2; 25 | center.y = Globals::imgHeight / 2; 26 | useBins = false; 27 | } 28 | 29 | Annotations::~Annotations() 30 | { 31 | // delete annotations 32 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) 33 | delete frameAnnotations[i]; 34 | } 35 | 36 | // getData 37 | // This function takes as input a string and returns an annotation tag 38 | // corresponding to the annotation. It fills the CvPoint object with 39 | // the data read from that line 40 | 41 | Annotations::Tag Annotations::getData(string str, CvPoint& point) { 42 | const char* token = strtok((char*)str.c_str(), " <>"); 43 | if (token) { 44 | if (!strcmp(token, "/frame")) 45 | return EndFrame; 46 | else if (!strcmp(token, "annotations")) { 47 | token = strtok(NULL, " <>\""); 48 | if (token && !strncmp(token, "dir=", 4)) { 49 | token = strtok(NULL, " <>\""); 50 | if (!token) { 51 | string err = "Annotations::getData. Malformed annotations.xml. No directory name."; 52 | throw err; 53 | } 54 | framesDirectory = token; 55 | } 56 | token = strtok(NULL, " <>\""); 57 | if (token && !strncmp(token, "center=", 7)) { 58 | token = strtok(NULL, " <>\""); 59 | if (!token) { 60 | string err = "Annotations::getData. Malformed annotations.xml. No center."; 61 | throw err; 62 | } 63 | char* chP = (char*)strchr(token, ','); 64 | if (!chP) { 65 | string err = "Annotations::getData. Malformed annotations.xml. No center."; 66 | throw err; 67 | } 68 | *chP = 0; 69 | chP++; 70 | center.x = atoi(token); 71 | center.y = atoi(chP); 72 | } 73 | return Root; 74 | } else if (!strcmp(token, "frame")) 75 | return Frame; 76 | else if (!strcmp(token, "frameNumber")) { 77 | token = strtok(NULL, " <>"); 78 | point.x = (token)? 
atoi(token) : 0; 79 | return FrameNumber; 80 | } else if (!strcmp(token, "zone")) { 81 | token = strtok(NULL, " <>"); 82 | point.x = (token)? atoi(token) : 0; 83 | /* 84 | if (point.x == 2) 85 | point.x = 1; 86 | else if (point.x == 3) 87 | point.x = 2; 88 | else if (point.x == 4 || point.x == 5) 89 | point.x = 3; 90 | */ 91 | return Orientation; 92 | } else if (!strcmp(token, "status")) { 93 | token = strtok(NULL, " <>"); 94 | point.x = (token)? atoi(token) : 0; 95 | return CarStatus; 96 | } else if (!strcmp(token, "intersection")) { 97 | token = strtok(NULL, " <>"); 98 | point.x = (token)? atoi(token) : 0; 99 | return IntersectionType; 100 | } else if (token[0] != '/') { 101 | string tag = token; 102 | 103 | token = strtok(NULL, " <>"); 104 | const char* field = strtok((char*)token, ","); 105 | point.y = (field)? atoi(field) : 0; 106 | field = strtok(NULL, ","); 107 | point.x = (field)? atoi(field) : 0; 108 | 109 | if (tag == "face") { 110 | return Face; 111 | } 112 | } 113 | } 114 | return Ignore; 115 | } 116 | 117 | // trimEnds 118 | // The following method is called to trim the ends of sections of frames that 119 | // are labelled by sector. The number of frames to trim at either end is 120 | // specified through parameter nTrim. We do this to remove badly labelled 121 | // frames at the ends of sections. Since the labelling is contiguous, we 122 | // are seeing wrongly labelled images at the ends of each section of frames 123 | // by sector. This is particularly acute for sector 3 of which we have 124 | // an enormous number 125 | 126 | void Annotations::trimEnds(int sectorToTrim, int nTrim) { 127 | vector fas; 128 | int prevSector = Annotations::UnknownSector; 129 | unsigned int i = 0; 130 | while (i < frameAnnotations.size()) { 131 | FrameAnnotation* fa = frameAnnotations[i]; 132 | 133 | int sector = fa->getSector(); 134 | if (prevSector != sector && sector == sectorToTrim) { 135 | unsigned int start = i; 136 | unsigned int end = i; 137 | for (unsigned int j = i; j < frameAnnotations.size(); j++) { 138 | fa = frameAnnotations[j]; 139 | if (fa->getSector() != sectorToTrim) { 140 | end = j - 1; 141 | break; 142 | } 143 | } 144 | // we now have the indices for the range of frames with 145 | // sector equal to sectorToTrim 146 | for (unsigned int j = start + nTrim; j < end - nTrim; j++) { 147 | fa = frameAnnotations[j]; 148 | fas.push_back(new FrameAnnotation(fa)); 149 | } 150 | i = end; 151 | } else { 152 | fas.push_back(new FrameAnnotation(fa)); 153 | } 154 | i++; 155 | prevSector = sector; 156 | } 157 | // now delete and copy frames into frameAnnotations 158 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) 159 | delete frameAnnotations[i]; 160 | frameAnnotations.clear(); 161 | for (unsigned int i = 0; i < fas.size(); i++) 162 | frameAnnotations.push_back(fas[i]); 163 | } 164 | 165 | // readAnnotations 166 | // The following method reads an XML file and populates the annotations vector 167 | 168 | void Annotations::readAnnotations(string& filename) { 169 | ifstream file; 170 | 171 | file.open((const char*)filename.c_str()); 172 | if (file.good()) { 173 | string line; 174 | int nFrame = 0; 175 | int sector = UnknownSector; 176 | int status = UnknownStatus; 177 | int intersection = UnknownIntersection; 178 | 179 | CvPoint temp, face; 180 | face.x = face.y = 0; 181 | 182 | getline(file, line); // ignore the first line 183 | 184 | while (!file.eof()) { 185 | getline(file, line); 186 | Tag tag = getData(line, temp); 187 | switch (tag) { 188 | case FrameNumber: { 189 | nFrame = 
temp.x; 190 | break; 191 | } 192 | case Orientation: { 193 | sector = temp.x; 194 | break; 195 | } 196 | case CarStatus: { 197 | status = temp.x; 198 | break; 199 | } 200 | case IntersectionType: { 201 | intersection = temp.x; 202 | break; 203 | } 204 | case Face: { 205 | face.x = temp.x; 206 | face.y = temp.y; 207 | break; 208 | } 209 | case EndFrame: { 210 | if (face.x && face.y) { 211 | FrameAnnotation* annotation = 212 | new FrameAnnotation(nFrame, face, sector, status, intersection); 213 | frameAnnotations.push_back(annotation); 214 | face.x = face.y = 0; 215 | } else { 216 | FrameAnnotation* annotation = 217 | new FrameAnnotation(nFrame, center /* face */, sector, status, intersection); 218 | frameAnnotations.push_back(annotation); 219 | } 220 | break; 221 | } 222 | default: { 223 | continue; 224 | } 225 | } 226 | } 227 | } 228 | } 229 | 230 | // createBins 231 | // The following method is used to create bins for the five gaze sectors, with 232 | // the same number of frames per sector. Since data is typically disproportionately 233 | // skewed towards straigh-ahead, the models are over-trained for that sector 234 | 235 | void Annotations::createBins() { 236 | // first set the useBins flag so that we return the binned annotations on 237 | // subsequent calls for annotations 238 | useBins = true; 239 | 240 | int nBins = Globals::numZones; 241 | 242 | // create a counter for each bin and reset it 243 | int count[nBins]; 244 | for (int i = 0; i < nBins; i++) { 245 | vector* bin = new vector(); 246 | bins.push_back(bin); 247 | count[i] = 0; 248 | } 249 | 250 | // now iterate over all annotations and place them in their bin based 251 | // on their zone 252 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) { 253 | FrameAnnotation* fa = frameAnnotations[i]; 254 | int index = fa->getSector() - 1; 255 | if (index < 0 || index > 4) 256 | continue; 257 | 258 | bins[index]->push_back(fa); 259 | count[index]++; 260 | } 261 | 262 | // Get the smallest bin size 263 | int sampleSize = INT_MAX; 264 | for (int i = 0; i < nBins; i++) 265 | if (sampleSize > count[i] && count[i]) 266 | sampleSize = count[i]; 267 | 268 | cout << "Creating " << sampleSize << " frames for each sector" << endl; 269 | 270 | // We now create a set of sampleSize * nBins frame annotations in the unif 271 | // vector 272 | for (int i = 0; i < nBins; i++) { 273 | // shuffle images in each bin before picking the first sampleSize images 274 | random_shuffle(bins[i]->begin(), bins[i]->end()); 275 | 276 | // for now pick the first sampleSize elements from each bin 277 | for (int j = 0; j < sampleSize; j++) { 278 | if (bins[i]->size()) { 279 | unif.push_back(bins[i]->back()); 280 | bins[i]->pop_back(); 281 | } 282 | } 283 | // Now that we are done with bin i, destroy it 284 | delete bins[i]; 285 | } 286 | 287 | // final shuffle of the collection of all images 288 | random_shuffle(unif.begin(), unif.end()); 289 | 290 | cout << "Pulled " << unif.size() << " frames from the dataset" << endl; 291 | } 292 | -------------------------------------------------------------------------------- /utils/src/Annotations.h: -------------------------------------------------------------------------------- 1 | #ifndef __ANNOTATIONS_H 2 | #define __ANNOTATIONS_H 3 | 4 | // Annotations.h 5 | // This file contains the definition of class Annotations. It provides an interface 6 | // to read and access the frames stored in an annotations file. 
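//
// For reference, the tags handled by Annotations::getData imply an
// annotations.xml of roughly the following shape (an illustrative sketch
// reconstructed from the parser, not a formal schema; the dir and center
// attributes are optional):
//
//   <annotations dir="/path/to/frames" center="320,240">
//     <frame>
//       <frameNumber>1</frameNumber>
//       <face>0,0</face>
//       <zone>3</zone>
//       <status>6</status>
//       <intersection>4</intersection>
//     </frame>
//     ...
//   </annotations>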
7 | 8 | #include 9 | #include 10 | 11 | #include "Globals.h" 12 | 13 | // openCV stuff 14 | #include 15 | #include 16 | 17 | using namespace std; 18 | 19 | // forward declaration 20 | class FrameAnnotation; 21 | 22 | class Annotations { 23 | private: 24 | // the directory containing the frames for a given annotations.xml file 25 | string framesDirectory; 26 | 27 | // the center of the face in sector 3 (straight ahead) 28 | CvPoint center; 29 | 30 | // use bins flag 31 | bool useBins; 32 | 33 | // the set of all annotations 34 | vector frameAnnotations; 35 | 36 | // bins for annotations. We want to only return annotations that are uniformly 37 | // distributed across the image regions within which the LOIs occur. This 38 | // ensures that the training is unbiased 39 | vector* > bins; 40 | 41 | // the set of annotations after binning. We will have min(bin sizes) * number of bins 42 | // annotations that will be placed in this vector 43 | vector unif; 44 | 45 | public: 46 | enum Tag { 47 | Root, 48 | Frame, 49 | FrameNumber, 50 | Face, 51 | Orientation, 52 | CarStatus, 53 | IntersectionType, 54 | EndFrame, 55 | Ignore 56 | }; 57 | enum Sector { 58 | DriverWindow = 1, 59 | LeftOfCenter, 60 | Center, 61 | RightOfCenter, 62 | PassengerWindow, 63 | CoPilot, 64 | OverLeftShoulder, 65 | OverRightShoulder, 66 | UnknownSector = 9 67 | }; 68 | enum Status { 69 | stationaryIgnore = 1, 70 | stationaryParked, 71 | stationaryAtIntersection, 72 | movingAtIntersection, 73 | movingInCarPark, 74 | movingOnStreet, 75 | UnknownStatus = 7 76 | }; 77 | enum Intersection { 78 | FourWay = 1, 79 | TJunction, 80 | CarParkExit, 81 | UnknownIntersection = 4 82 | }; 83 | 84 | Annotations(); 85 | ~Annotations(); 86 | 87 | void readAnnotations(string& filename); 88 | void trimEnds(int sector, int nTrim); 89 | void createBins(); 90 | CvPoint& getCenter() { return center; } 91 | string getFramesDirectory() { return framesDirectory; } 92 | 93 | vector& getFrameAnnotations() { 94 | return useBins? 
unif : frameAnnotations; 95 | } 96 | 97 | private: 98 | Annotations::Tag getData(string str, CvPoint& point); 99 | }; 100 | 101 | // Frame annotations class 102 | 103 | class FrameAnnotation { 104 | private: 105 | int nFrame; 106 | CvPoint face; 107 | int sector; 108 | int status; 109 | int intersection; 110 | 111 | public: 112 | FrameAnnotation() { 113 | nFrame = 0; 114 | face.x = face.y = 0; 115 | sector = Annotations::UnknownSector; 116 | status = Annotations::UnknownStatus; 117 | intersection = Annotations::UnknownIntersection; 118 | } 119 | FrameAnnotation(FrameAnnotation& fa) { 120 | nFrame = fa.nFrame; 121 | face.x = fa.face.x; 122 | face.y = fa.face.y; 123 | sector = fa.sector; 124 | status = fa.status; 125 | intersection = fa.intersection; 126 | } 127 | FrameAnnotation(int frame, CvPoint& f, int s, int st, int intx) { 128 | nFrame = frame; 129 | face.x = f.x; face.y = f.y; 130 | sector = s; 131 | status = st; 132 | intersection = intx; 133 | } 134 | FrameAnnotation(FrameAnnotation* fa) { 135 | nFrame = fa->getFrameNumber(); 136 | setFace(fa->getFace()); 137 | sector = fa->getSector(); 138 | status = fa->getStatus(); 139 | intersection = fa->getIntersection(); 140 | } 141 | 142 | int getFrameNumber() { return nFrame; } 143 | CvPoint& getFace() { return face; } 144 | int getSector() { return sector; } 145 | int getStatus() { return status; } 146 | int getIntersection() { return intersection; } 147 | 148 | CvPoint& getLOI(Annotations::Tag tag) { 149 | switch (tag) { 150 | case Annotations::Face: 151 | return getFace(); 152 | default: { 153 | string err = "FrameAnnotation::getLOI. Unknown tag."; 154 | throw(err); 155 | } 156 | } 157 | } 158 | 159 | // set functions 160 | void setFace(CvPoint& point) { 161 | face.x = point.x; 162 | face.y = point.y; 163 | } 164 | 165 | // print functions 166 | void print() { 167 | cout << "Frame: (" << nFrame << ") "; 168 | cout << "Face: (" << face.x << ", " << face.y << ") "; 169 | cout << "Sector: (" << sector << ") "; 170 | cout << "Status: (" << status << ") "; 171 | cout << "Intersection: (" << intersection << ")" << endl; 172 | } 173 | }; 174 | 175 | #endif 176 | -------------------------------------------------------------------------------- /utils/src/DataStream.py: -------------------------------------------------------------------------------- 1 | # DataStream.py 2 | 3 | """ 4 | This is a smaller version of the DataStream module containing only (part of) the 5 | function to read synchronized_data_streams.xml files 6 | 7 | Example: 8 | d = read_xml('/data01/processed_for_annotation/CESAR_May-Tue-29-18-55-16-2012/ 9 | synchronized_data_streams.xml') 10 | print d['gaze'][0] #(video_filename, frame_number, age_of_sample) 11 | """ 12 | 13 | import sys, os 14 | import unicodedata 15 | 16 | from subprocess import call 17 | from xml.dom.minidom import * 18 | 19 | sector = {'Driver Window': 1, 'Left of Center': 2, 'Center': 3, 'Right of Center': 4, 20 | 'Passenger Window':5, 'Copilot':6, 'OverRightShoulder':7, 'OverLeftShoulder':8, 21 | 'Other': 9}; 22 | status = {'stationary-Ignore': 1, 'stationary-Parked': 2, 'stationary-AtIntersection': 3, 23 | 'moving-AtInterSection': 4, 'moving-InCarPark': 5, 'moving-OnStreet': 6, 24 | 'Other': 7}; 25 | intersection = {'4way': 1, 'Tjunction': 2, 'CarParkExit': 3, 'other': 4}; 26 | 27 | #sector = {'driverWindow': 1, 'leftOfCenter': 2, 'center': 3, 'rightOfCenter': 4, 28 | # 'passengerWindow':5, 'coPilot':6, 'overRightShoulder':7, 'overLeftShoulder':8, 29 | # 'Other': 9}; 30 | #status = {'sIgnore': 1, 'sParked': 
2, 'sAtIntersection': 3, 31 | # 'mAtInterSection': 4, 'mCarPark': 5, 'mOnStreet': 6, 32 | # 'Other': 7}; 33 | #intersection = {'4way': 1, 'Tjunction': 2, 'carParkExit': 3, 'other': 4}; 34 | 35 | annotations = {} 36 | 37 | def read_xml(xml_filename): 38 | """ 39 | Read in a synchronized_data_streams.xml file, 40 | 41 | Return a dict of dicts. Return_value[stream_name][t] = 42 | (video_filename, frame_number, age) 43 | """ 44 | print 'reading xml DataStream' 45 | doc=parse(xml_filename) 46 | header = doc.childNodes[0].childNodes[1] 47 | sync_points = [n for n in doc.childNodes[0].childNodes if n.nodeName == 'sync_point'] 48 | 49 | data_types = dict() 50 | paced_time_data_maps = dict() 51 | ages = dict() 52 | stream_identifiers = [n for n in header.childNodes if n.nodeName == 'stream'] 53 | stream_names = [] 54 | 55 | for s in stream_identifiers: 56 | name = s.getAttribute('sensor_name') 57 | data_type = s.getAttribute('data_type') 58 | if not data_type == 'video_file_reference:frame_number': 59 | continue 60 | stream_names.append(name) 61 | data_types[name] = data_type 62 | paced_time_data_maps[name] = dict() 63 | ages[name] = dict() 64 | print 'stream names:', stream_names 65 | 66 | for p in sync_points: 67 | t = float(p.getAttribute('timestamp')) 68 | for sample in [n for n in p.childNodes if n.nodeName == 'sample']: 69 | stream_name = sample.getAttribute('stream') 70 | if not stream_name in stream_names: 71 | continue 72 | data = sample.getAttribute('data') 73 | #print 'DATA:', data 74 | video_filename = data.split(':')[0] 75 | frame_number = int(data.split(':')[1]) 76 | age = sample.getAttribute('age') 77 | paced_time_data_maps[stream_name][t] = (video_filename, frame_number, age) 78 | 79 | ans = dict() 80 | for s in stream_names: 81 | ans[s] = paced_time_data_maps[s] 82 | 83 | return ans 84 | 85 | def find(lb, ub, lIndex, rIndex, stream, keys): 86 | key = keys[lIndex] 87 | value = stream[key] 88 | key = key + float(unicode.decode(value[2])) 89 | if (key >= lb and key <= ub): 90 | return lIndex 91 | key = keys[rIndex] 92 | value = stream[key] 93 | key = key + float(unicode.decode(value[2])) 94 | if (key >= lb and key <= ub): 95 | return rIndex 96 | if (lIndex == rIndex): 97 | return None 98 | index = (lIndex + rIndex) / 2 99 | key = keys[index] 100 | value = stream[key] 101 | key = key + float(unicode.decode(value[2])) 102 | if (key >= lb and key <= ub): 103 | return index 104 | if (key < lb): 105 | return find(lb, ub, index + 1, rIndex, stream, keys) 106 | return find(lb, ub, lIndex, index - 1, stream, keys) 107 | 108 | def insertAnnotation(key, value, ann): 109 | # print '%f = %s'%(key, value) 110 | aType, aVal = ann 111 | f, frame, delta = value 112 | fileName = unicode.decode(f) 113 | annotationKey = '%s_%0*d'%(fileName, 4, frame) 114 | if annotationKey in annotations: 115 | annotations[annotationKey][aType] = aVal 116 | else: 117 | annotations[annotationKey] = {} 118 | annotations[annotationKey]['timestamp'] = key 119 | annotations[annotationKey]['sector'] = 9 120 | annotations[annotationKey]['status'] = 7 121 | annotations[annotationKey]['intersection'] = 4 122 | annotations[annotationKey][aType] = aVal 123 | 124 | # Returns a list of frames that are included in a given time range. Since the 125 | # data stream is sorted by timestamp. The data stream keys are timestamps that 126 | # may or may not fall within the range passed to this function. 
We search through 127 | # the sequence of gaze data using binary search 128 | 129 | def getIncludedFrames(lb, ub, ann, stream, keys): 130 | length = len(keys) 131 | # find the first key in stream that falls within the range requested 132 | index = find(lb, ub, 0, length - 1, stream, keys) 133 | if index is not None: 134 | key = keys[index] 135 | value = stream[key] 136 | key = key + float(unicode.decode(value[2])) 137 | insertAnnotation(key, value, ann) 138 | i = index 139 | while i > 0: 140 | i = i - 1 141 | key = keys[i] 142 | value = stream[key] 143 | key = key + float(unicode.decode(value[2])) 144 | if key >= lb and key <= ub: 145 | insertAnnotation(key, value, ann) 146 | else: 147 | break 148 | i = index 149 | while i < length - 1: 150 | i = i + 1 151 | key = keys[i] 152 | value = stream[key] 153 | key = key + float(unicode.decode(value[2])) 154 | if key >= lb and key <= ub: 155 | insertAnnotation(key, value, ann) 156 | else: 157 | break 158 | 159 | def getAnnotation(token): 160 | try: 161 | key, value = 'sector', sector[token] 162 | except KeyError: 163 | try: 164 | key, value = 'status', status[token] 165 | except KeyError: 166 | key, value = 'intersection', intersection[token] 167 | return key, value 168 | 169 | def read_vcode(fileName): 170 | f = open(fileName, 'r') 171 | # ignore the first four lines 172 | for i in range(4): 173 | line = f.readline() 174 | line = f.readline() 175 | data = {} 176 | while line: 177 | tokens = line.split(',') 178 | lb = float(tokens[0]) 179 | rb = lb + float(tokens[1]) 180 | try: 181 | data[lb] = (rb, getAnnotation(tokens[2])) 182 | except (KeyError): 183 | pass 184 | line = f.readline() 185 | return data 186 | 187 | def blowUpVideo(fileName): 188 | call(["/bin/mkdir", "-p", "_temp"]) 189 | call(["/bin/rm", "-f", "_temp/*.bmp"]) 190 | call(["/bin/rm", "-f", "/run/shm/gaze*"]) 191 | imageNames = '_temp/frame_%d.bmp' 192 | avibz2 = '/run/shm/gaze.avi.bz2' 193 | call(["/bin/cp", "-f", fileName + '.bz2', avibz2]) 194 | call(["bunzip2", "-f", avibz2]) 195 | call(["ffmpeg", "-i", "/run/shm/gaze.avi", "-sameq", imageNames]) 196 | 197 | def buildAnnotations(keys, datasetDir, prefix, outputDir): 198 | destFrame = 1 199 | currentFileName = "" 200 | call(["mkdir", "-p", outputDir]) 201 | annotationsFileName = outputDir + '/annotations.xml' 202 | annotationsFile = open(annotationsFileName, 'w') 203 | annotationsFile.write('\n') 204 | annotationsFile.write('\n') 205 | lenOfKeys = len(keys) 206 | for i in range(lenOfKeys): 207 | aKey = keys[i] 208 | value = annotations[aKey] 209 | # ignore the first 10 and last 10 frames for sector 3. The problem 210 | # is that we have a disproportionate number of sector 3 frames. 211 | # The end points of these sections of the video have frames that 212 | # are labelled either 2 or 4 in the annotations, which generate 213 | # a large number of badly labelled sector 2 and 4 frames, not to 214 | # mention incorrect section 3 labellings. 
By ignoring the first 10 215 | # and last 10 frames we hope to have better labelled frames 216 | if (value['sector'] == 3 and (i < 10 or i > (lenOfKeys - 10))): 217 | continue 218 | fileName, frameStr = aKey.split('avi_') 219 | sourceFrame = int(frameStr) 220 | fileName = datasetDir + '/' + prefix + '/' + fileName + 'avi' 221 | tokens = aKey.split('-') 222 | # number = int(tokens[-1].split('.')[0]) 223 | if (fileName != currentFileName): 224 | print 'building annotations for %s'%aKey 225 | currentFileName = fileName 226 | blowUpVideo(currentFileName) 227 | sourceFileName = '_temp/frame_%d.bmp'%(sourceFrame + 1) 228 | destFileName = outputDir + ('/frame_%d.png'%destFrame) 229 | call(["convert", sourceFileName, destFileName]) 230 | annotationsFile.write('  <frame>\n') 231 | annotationsFile.write('    <frameNumber>%d</frameNumber>\n'%destFrame) 232 | annotationsFile.write('    <face>0,0</face>\n') 233 | annotationsFile.write('    <zone>%d</zone>\n'%value['sector']) 234 | annotationsFile.write('    <status>%d</status>\n'%value['status']) 235 | annotationsFile.write('    <intersection>%d</intersection>\n'%value['intersection']) 236 | annotationsFile.write('  </frame>\n') 237 | destFrame = destFrame + 1 238 | annotationsFile.write('</annotations>\n') 239 | annotationsFile.close() 240 | 241 | if __name__ == '__main__': 242 | if (len(sys.argv) < 4): 243 | print 'Usage: python DataStream.py <vcodeFile> <datasetDir> <outputDir>' 244 | sys.exit() 245 | 246 | vcodeFileName = sys.argv[1] 247 | datasetDir = sys.argv[2] 248 | outputDir = sys.argv[3] 249 | 250 | vcodeData = read_vcode(vcodeFileName) 251 | 252 | prefix = vcodeFileName[:vcodeFileName.index('-stitched')] 253 | summaryFileName = datasetDir + '/' + prefix + '/synchronized_data_streams.xml' 254 | 255 | stream = read_xml(summaryFileName) 256 | keys = stream['gaze'].keys() 257 | keys.sort() 258 | 259 | for key, value in sorted(vcodeData.iteritems()): 260 | # print ('%f = %s'%(key, value)) 261 | getIncludedFrames(key, value[0], value[1], stream['gaze'], keys) 262 | 263 | keys = annotations.keys() 264 | keys.sort() 265 | 266 | buildAnnotations(keys, datasetDir, prefix, outputDir) 267 | -------------------------------------------------------------------------------- /utils/src/GazeTracker.cpp: -------------------------------------------------------------------------------- 1 | // GazeTracker.cpp 2 | // File that contains the definition of the methods of class GazeTracker. It provides 3 | // all the functionality needed for in-car gaze tracking. It uses the Filter, Location, 4 | // Trainer and Classifier classes underneath 5 | 6 | #include "GazeTracker.h" 7 | 8 | #ifdef SINGLETHREADED 9 | #define fftw_init_threads() ; 10 | #define fftw_plan_with_nthreads(a) ; 11 | #define fftw_cleanup_threads() ; 12 | #endif 13 | 14 | // static member initialization 15 | CvSize GazeTracker::roiSize; 16 | Trainer::KernelType GazeTracker::kernelType = Trainer::Polynomial; 17 | 18 | // Class construction and destruction 19 | 20 | GazeTracker::GazeTracker(string outputDir, bool online) { 21 | isOnline = online; 22 | 23 | // fftw3 initialization to use openmp. These functions should be called 24 | // once at application scope before any other fftw functions are called 25 | fftw_init_threads(); 26 | fftw_plan_with_nthreads(1); 27 | 28 | // check if the output directory exists, or else bail 29 | DIR* dir; 30 | dir = opendir(outputDir.c_str()); 31 | if (dir == NULL) { 32 | string err = "GazeTracker::GazeTracker. The directory " + outputDir + 33 | " does not exist.
Bailing out."; 34 | throw (err); 35 | } 36 | closedir(dir); 37 | 38 | // compute full path names 39 | char fullPath[PATH_MAX + 1]; 40 | outputDirectory = realpath((const char*)outputDir.c_str(), fullPath); 41 | 42 | char* path = getenv("SVM_PATH"); 43 | if (!path) { 44 | string err = "GazeTracker::GazeTracker. The SVM_PATH environment variable is not set"; 45 | throw (err); 46 | } 47 | svmPath = path; 48 | classifier = 0; 49 | 50 | roiSize.width = Globals::roiWidth; 51 | roiSize.height = Globals::roiHeight; 52 | 53 | // we cannot initialize these extractors at this point because 54 | // we don't at this point what this object is going to be used 55 | // for. It could be for creating offline filters or it could 56 | // be for online tracking 57 | leftEyeExtractor = rightEyeExtractor = noseExtractor = 0; 58 | 59 | faceCenter.x = faceCenter.y = 0; 60 | 61 | // now read the config file and update state specific to the current 62 | // classification task 63 | readConfiguration(); 64 | } 65 | 66 | GazeTracker::~GazeTracker() { 67 | if (classifier) delete classifier; 68 | 69 | delete leftEyeExtractor; 70 | delete rightEyeExtractor; 71 | delete noseExtractor; 72 | 73 | // final cleanup of all fftw thread data 74 | fftw_cleanup_threads(); 75 | } 76 | 77 | // addFrameSet 78 | // Method used to add frame sets in the training data. These sets are the 79 | // directories containing training samples. Each directory is expected to 80 | // contain an annotations file with annotated LOIs 81 | 82 | void GazeTracker::addFrameSet(string directory) { 83 | // compute full path names 84 | char fullPath[PATH_MAX + 1]; 85 | string framesDirectory = realpath((const char*)directory.c_str(), fullPath); 86 | 87 | frameSetDirectories.push_back(framesDirectory); 88 | } 89 | 90 | // getWindowCenter 91 | // Method used to compute the center of the window we apply to a given 92 | // frame during LOI extraction 93 | 94 | CvPoint GazeTracker::getWindowCenter() { 95 | CvPoint windowCenter; 96 | 97 | windowCenter.x = Globals::roiWidth / 2; 98 | windowCenter.y = Globals::roiHeight / 2; 99 | 100 | return windowCenter; 101 | } 102 | 103 | // updateWindowCenter 104 | // Method used to update the min and max co-ordinates of the window center 105 | // by walking through all face annotations 106 | 107 | void GazeTracker::updateWindowCenter(string trainingDirectory, 108 | int& minX, int& maxX, int& minY, int& maxY) { 109 | Annotations annotations; 110 | 111 | // first capture the mapping from file names to locations of interest 112 | string locationsFileName = trainingDirectory + "/" + 113 | Globals::annotationsFileName; 114 | annotations.readAnnotations(locationsFileName); 115 | 116 | // now get the set of all annotations 117 | vector& frameAnnotations = annotations.getFrameAnnotations(); 118 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) { 119 | FrameAnnotation* fa = frameAnnotations[i]; 120 | CvPoint& faceLocation = fa->getLOI(Annotations::Face); 121 | if (minX > faceLocation.x) 122 | minX = faceLocation.x; 123 | if (maxX < faceLocation.x) 124 | maxX = faceLocation.x; 125 | 126 | if (minY > faceLocation.y) 127 | minY = faceLocation.y; 128 | if (maxY < faceLocation.y) 129 | maxY = faceLocation.y; 130 | } 131 | } 132 | 133 | // computeWindowCenter 134 | // Method used to compute a window center for face annotations. This method 135 | // reverts to getWindowCenter for all annotations when not in training mode. 
136 | // For in filter training mode, for all annotations other than the face, we 137 | // simply return the result of getWindowCenter, but for the face we use the 138 | // midpoint of extremal x and y co-ordinates 139 | 140 | CvPoint GazeTracker::computeWindowCenter(string trainingDirectory) { 141 | if (trainingDirectory == "" && 142 | !frameSetDirectories.size() && !trainingSetDirectories.size()) 143 | return getWindowCenter(); 144 | 145 | // iterate over all frameset directories and pick face annotations 146 | // to compute the average of the extremal face locations 147 | int minX = INT_MAX; 148 | int maxX = INT_MIN; 149 | int minY = INT_MAX; 150 | int maxY = INT_MIN; 151 | 152 | if (trainingDirectory == "") { 153 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++) 154 | updateWindowCenter(frameSetDirectories[i], minX, maxX, minY, maxY); 155 | for (unsigned int i = 0; i < trainingSetDirectories.size(); i++) 156 | updateWindowCenter(trainingSetDirectories[i], minX, maxX, minY, maxY); 157 | } else { 158 | updateWindowCenter(trainingDirectory, minX, maxX, minY, maxY); 159 | } 160 | 161 | // now take the average of the min and max co-ordinate values 162 | CvPoint windowCenter; 163 | 164 | windowCenter.x = (minX + maxX) / 2; 165 | windowCenter.y = (minY + maxY) / 2; 166 | 167 | cout << "Face windowCenter = " << windowCenter.x << ", " << windowCenter.y << endl; 168 | 169 | return windowCenter; 170 | } 171 | 172 | // createFilters 173 | // Method used to create filters. This method will iterate over all the 174 | // training frames directories, add them to each filter we want to create, 175 | // create those filters and save them 176 | 177 | void GazeTracker::createFilters() { 178 | // get the center of the window function 179 | CvPoint windowCenter = getWindowCenter(); 180 | 181 | // create filters 182 | Filter* leftEyeFilter = new Filter(outputDirectory, Annotations::LeftEye, roiSize, 183 | Globals::gaussianWidth /* gaussian spread */, 184 | windowCenter, roiFunction); 185 | Filter* rightEyeFilter = new Filter(outputDirectory, Annotations::RightEye, roiSize, 186 | Globals::gaussianWidth /* gaussian spread */, 187 | windowCenter, roiFunction); 188 | Filter* noseFilter = new Filter(outputDirectory, Annotations::Nose, roiSize, 189 | Globals::gaussianWidth /* gaussian spread */, 190 | windowCenter, roiFunction); 191 | 192 | //leftEyeFilter->setAffineTransforms(); 193 | //rightEyeFilter->setAffineTransforms(); 194 | //noseFilter->setAffineTransforms(); 195 | 196 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++) { 197 | cout << "Adding left eye annotations..." << endl; 198 | leftEyeFilter->addTrainingSet(frameSetDirectories[i]); 199 | 200 | cout << "Adding right eye annotations..." << endl; 201 | rightEyeFilter->addTrainingSet(frameSetDirectories[i]); 202 | 203 | cout << "Adding nose annotations..." << endl; 204 | noseFilter->addTrainingSet(frameSetDirectories[i]); 205 | } 206 | 207 | #pragma omp parallel sections num_threads(4) 208 | { 209 | #pragma omp section 210 | { 211 | cout << "Creating left eye filter..." << endl; 212 | leftEyeFilter->create(); 213 | leftEyeFilter->save(); 214 | } 215 | 216 | #pragma omp section 217 | { 218 | cout << "Creating right eye filter..." << endl; 219 | rightEyeFilter->create(); 220 | rightEyeFilter->save(); 221 | } 222 | 223 | #pragma omp section 224 | { 225 | cout << "Creating nose filter..." 
<< endl; 226 | noseFilter->create(); 227 | noseFilter->save(); 228 | } 229 | } 230 | 231 | delete leftEyeFilter; 232 | delete rightEyeFilter; 233 | delete noseFilter; 234 | } 235 | 236 | // addTrainingSet 237 | // Method used to add directories with training data for SVM models. Each directory is 238 | // expected to contain an annotations file with annotated LOIs 239 | 240 | void GazeTracker::addTrainingSet(string directory) { 241 | // compute full path names 242 | char fullPath[PATH_MAX + 1]; 243 | string trainingDirectory = realpath((const char*)directory.c_str(), fullPath); 244 | 245 | trainingSetDirectories.push_back(trainingDirectory); 246 | } 247 | 248 | // train 249 | // Method used to train the SVM classifier. The trainer expects to find an 250 | // annotations file in the output directory that contains all the annotations that 251 | // were applied during filter training. All these annotations are used to create 252 | // models using the SVM Light trainer 253 | 254 | void GazeTracker::train() { 255 | if (!leftEyeExtractor) { 256 | // get the center of the window we want to apply 257 | CvPoint windowCenter = getWindowCenter(); 258 | leftEyeExtractor = new Location(outputDirectory, Annotations::LeftEye, 259 | windowCenter); 260 | rightEyeExtractor = new Location(outputDirectory, Annotations::RightEye, 261 | windowCenter); 262 | noseExtractor = new Location(outputDirectory, Annotations::Nose, 263 | windowCenter); 264 | } 265 | 266 | Trainer trainer(outputDirectory, kernelType, 267 | leftEyeExtractor, rightEyeExtractor, noseExtractor, 268 | roiFunction, svmPath); 269 | 270 | // add training sets 271 | for (unsigned int i = 0; i < trainingSetDirectories.size(); i++) 272 | trainer.addTrainingSet(trainingSetDirectories[i]); 273 | 274 | // generate models 275 | trainer.generate(); 276 | } 277 | 278 | // getZone 279 | // Method used to get the gaze zone given an image. If the classifier object is as 280 | // yet not created, we first create it here. 
If the online flag is set, then 281 | // we create online filters and then initialize the location extractors with those 282 | // filters, else we use the offline filters that are expected to have been 283 | // generated before this function is called 284 | 285 | int GazeTracker::getZone(IplImage* image, double& confidence, FrameAnnotation& fa) { 286 | if (!classifier) 287 | createClassifier(); 288 | 289 | fa.setFace(faceCenter); 290 | return classifier->getZone(image, confidence, fa); 291 | } 292 | 293 | // getFilterAccuracy 294 | // Method used to compute the error for a filter identified by xml tag for the 295 | // annotations in a given directory 296 | 297 | double GazeTracker::getFilterAccuracy(string trainingDirectory, Annotations::Tag xmlTag, 298 | Classifier::ErrorType errorType) { 299 | if (!classifier) 300 | createClassifier(); 301 | 302 | return classifier->getFilterError(trainingDirectory, xmlTag, errorType); 303 | } 304 | 305 | // getClassifierAccuracy 306 | // Method to get the classifier accuracy 307 | 308 | pair GazeTracker::getClassifierAccuracy(string trainingDirectory) { 309 | if (!classifier) 310 | createClassifier(); 311 | 312 | return classifier->getError(trainingDirectory); 313 | } 314 | 315 | // createClassifier 316 | // Method used to create a classifier object 317 | 318 | void GazeTracker::createClassifier() { 319 | // create location extractors 320 | if (isOnline) { 321 | // create new online filters and use them as filters in the location 322 | // extractors for all subsequent images 323 | 324 | CvPoint windowCenter = getWindowCenter(); 325 | 326 | // left eye filter and extractor 327 | Filter* filter = new OnlineFilter(outputDirectory, Annotations::LeftEye, roiSize, 328 | Globals::gaussianWidth /* gaussian spread */, 329 | Globals::learningRate, windowCenter); 330 | leftEyeExtractor = new Location(filter); 331 | 332 | // right eye filter and extractor 333 | filter = new OnlineFilter(outputDirectory, Annotations::RightEye, roiSize, 334 | Globals::gaussianWidth /* gaussian spread */, 335 | Globals::learningRate, windowCenter); 336 | rightEyeExtractor = new Location(filter); 337 | 338 | // nose filter and extractor 339 | filter = new OnlineFilter(outputDirectory, Annotations::Nose, roiSize, 340 | Globals::gaussianWidth /* gaussian spread */, 341 | Globals::learningRate, windowCenter); 342 | noseExtractor = new Location(filter); 343 | } else { 344 | if (!leftEyeExtractor) { 345 | CvPoint windowCenter = getWindowCenter(); 346 | leftEyeExtractor = new Location(outputDirectory, Annotations::LeftEye, 347 | windowCenter); 348 | rightEyeExtractor = new Location(outputDirectory, Annotations::RightEye, 349 | windowCenter); 350 | noseExtractor = new Location(outputDirectory, Annotations::Nose, 351 | windowCenter); 352 | } 353 | } 354 | 355 | // now create the classifier 356 | classifier = new Classifier(outputDirectory, kernelType, 357 | leftEyeExtractor, rightEyeExtractor, 358 | noseExtractor, roiFunction); 359 | } 360 | 361 | // showAnnotations 362 | // Method used to show annotations in the training set 363 | 364 | void GazeTracker::showAnnotations() { 365 | if (!frameSetDirectories.size()) 366 | return; 367 | 368 | string wName = "Annotations"; 369 | cvNamedWindow((const char*)wName.c_str(), CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 370 | 371 | // initialize font and add text 372 | CvFont font; 373 | cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 1.0, 1.0, 0, 1, CV_AA); 374 | 375 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++) { 376 | Annotations annotations; 
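    // Note: each frame-set directory is expected to contain an
    // annotations.xml file (Globals::annotationsFileName); the frames it
    // refers to are read as frame_<n>.png from the directory given by the
    // file's dir attribute (see getFramesDirectory below).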
377 | 378 | // first capture the mapping from file names to locations of interest 379 | string locationsFileName = frameSetDirectories[i] + "/" + 380 | Globals::annotationsFileName; 381 | annotations.readAnnotations(locationsFileName); 382 | 383 | string framesDirectory = annotations.getFramesDirectory(); 384 | string prefix = framesDirectory + "/frame_"; 385 | 386 | // now get the set of all annotations 387 | vector& frameAnnotations = annotations.getFrameAnnotations(); 388 | for (unsigned int j = 0; j < frameAnnotations.size(); j++) { 389 | FrameAnnotation* fa = frameAnnotations[j]; 390 | 391 | char buffer[256]; 392 | sprintf(buffer, "%d.png", fa->getFrameNumber()); 393 | string filename = prefix + buffer; 394 | IplImage* image = cvLoadImage((const char*)filename.c_str()); 395 | 396 | CvPoint& faceLocation = fa->getLOI(Annotations::Face); 397 | if (!faceLocation.x && !faceLocation.y) 398 | continue; 399 | 400 | cvCircle(image, fa->getLOI(Annotations::LeftEye), 5, cvScalar(0, 255, 255, 0), 2, 8, 0); 401 | cvCircle(image, fa->getLOI(Annotations::RightEye), 5, cvScalar(255, 255, 0, 0), 2, 8, 0); 402 | cvCircle(image, fa->getLOI(Annotations::Nose), 5, cvScalar(255, 0, 255, 0), 2, 8, 0); 403 | 404 | sprintf(buffer, "%d", fa->getZone()); 405 | cvPutText(image, buffer, cvPoint(580, 440), &font, cvScalar(255, 255, 255, 0)); 406 | 407 | cvShowImage((const char*)wName.c_str(), image); 408 | char c = cvWaitKey(); 409 | if (c != 'c') { 410 | cvReleaseImage(&image); 411 | break; 412 | } 413 | 414 | cvReleaseImage(&image); 415 | } 416 | } 417 | } 418 | 419 | // readConfiguration 420 | // Function used to read the config.xml file in the models directory. The file 421 | // contains the configuration that is specific to the location of the driver 422 | // with respect to the camera and will grow to other pieces of information 423 | // eventually 424 | 425 | void GazeTracker::readConfiguration() { 426 | string fileName = outputDirectory + '/' + Globals::configFileName; 427 | 428 | ifstream file; 429 | 430 | file.open((const char*)fileName.c_str()); 431 | if (file.good()) { 432 | string line; 433 | 434 | getline(file, line); // ignore the first line 435 | while (!file.eof()) { 436 | getline(file, line); 437 | if (line.find("center") != string::npos) { 438 | getline(file, line); 439 | const char* token = strtok((char*)line.c_str(), "<>/x"); 440 | if (!token) { 441 | string err = "GazeTracker::readConfiguration. Malformed config file for center"; 442 | throw (err); 443 | } 444 | faceCenter.x = atoi(token); 445 | getline(file, line); 446 | token = strtok((char*)line.c_str(), "<>/y"); 447 | if (!token) { 448 | string err = "GazeTracker::readConfiguration. Malformed config file for center"; 449 | throw (err); 450 | } 451 | faceCenter.y = atoi(token); 452 | } 453 | } 454 | } 455 | } 456 | 457 | // roiFunction 458 | // We support the use of LOIs identified to cull images to smaller regions of interest 459 | // (ROI) for use in locating future LOIs. This function is passed to the constructors 460 | // of the filter and classifier classes. Those classes in turn call this function when 461 | // they need a culled image. The input parameters are the original image, a frame 462 | // annotation object that is annotated with all the LOIs that we have found before this 463 | // function gets called. The offset parameter is an output parameter that contains the 464 | // offset of the ROI within the image. 
The function returns a culled image object 465 | 466 | IplImage* GazeTracker::roiFunction(IplImage* image, FrameAnnotation& fa, 467 | CvPoint& offset, Annotations::Tag xmlTag) { 468 | offset.x = 0; 469 | offset.y = 0; 470 | 471 | CvPoint& location = fa.getLOI(xmlTag); 472 | offset.y = location.y - (Globals::roiHeight / 2); 473 | offset.x = location.x - (Globals::roiWidth / 2); 474 | 475 | // now check if the roi overflows the image boundary. If it does then 476 | // we move it so that it is contained within the image boundary 477 | if (offset.x + Globals::roiWidth > Globals::imgWidth) 478 | offset.x = Globals::imgWidth - Globals::roiWidth; 479 | if (offset.x < 0) 480 | offset.x = 0; 481 | if (offset.y + Globals::roiHeight > Globals::imgHeight) 482 | offset.y = Globals::imgHeight - Globals::roiHeight; 483 | if (offset.y < 0) 484 | offset.y = 0; 485 | 486 | cvSetImageROI(image, cvRect(offset.x, offset.y, Globals::roiWidth, Globals::roiHeight)); 487 | IplImage* roi = cvCreateImage(cvGetSize(image), image->depth, image->nChannels); 488 | cvCopy(image, roi); 489 | cvResetImageROI(image); 490 | 491 | return roi; 492 | } 493 | -------------------------------------------------------------------------------- /utils/src/GazeTracker.h: -------------------------------------------------------------------------------- 1 | // GazeTracker.h 2 | // File that contains the definition of class GazeTracker. This class is used to 3 | // perform the following operations, 4 | // 1. Learn filters for LOIs 5 | // 2. Apply a filter to an image and get the co-ordinates of the LOI 6 | // 3. Train SVM models for given data 7 | // 4. Apply SVM models to classify gaze zones 8 | 9 | #ifndef __GAZETRACKER_H 10 | #define __GAZETRACKER_H 11 | 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | 23 | #include "Filter.h" 24 | #include "OnlineFilter.h" 25 | #include "Location.h" 26 | #include "Annotations.h" 27 | #include "Trainer.h" 28 | #include "Classifier.h" 29 | 30 | class GazeTracker { 31 | public: 32 | static Trainer::KernelType kernelType; // The SVM Kernel type 33 | 34 | static CvSize roiSize; // ROI size 35 | 36 | string outputDirectory; // The directory for all generated info 37 | 38 | // Construction and destruction 39 | GazeTracker(string outputDirectory, bool online); 40 | virtual ~GazeTracker(); 41 | 42 | // create filters 43 | void addFrameSet(string directory); 44 | void createFilters(); 45 | 46 | // train models for SVM 47 | void addTrainingSet(string directory); 48 | void train(); 49 | 50 | // get the gaze zone given an image 51 | int getZone(IplImage* image, double& confidence, FrameAnnotation& fa); 52 | 53 | // get error for a given filter used by the gaze tracker 54 | double getFilterAccuracy(string trainingDirectory, Annotations::Tag xmlTag, 55 | Classifier::ErrorType errorType); 56 | // get classification error 57 | pair getClassifierAccuracy(string trainingDirectory); 58 | 59 | // show annotations 60 | void showAnnotations(); 61 | 62 | // Function used to compute an image ROI based on partial frame annotations 63 | // As we recognize LOIs, we use those location to potentially cull the 64 | // input image and use a reduced ROI for subsequent recognition 65 | static IplImage* roiFunction(IplImage* image, FrameAnnotation& fa, 66 | CvPoint& offset, Annotations::Tag xmlTag); 67 | 68 | private: 69 | bool isOnline; // true when using online filters 70 | string svmPath; 71 | 72 | // the center of the face for 
classification 73 | CvPoint faceCenter; 74 | 75 | // The classifier 76 | Classifier* classifier; 77 | 78 | // The frame sets in the training data for filter generation 79 | vector frameSetDirectories; 80 | 81 | // The training set directories for SVM model generation 82 | vector trainingSetDirectories; 83 | 84 | // location extractors 85 | Location* leftEyeExtractor; 86 | Location* rightEyeExtractor; 87 | Location* noseExtractor; 88 | 89 | // create a classifier 90 | void createClassifier(); 91 | 92 | // window center functions 93 | CvPoint getWindowCenter(); 94 | CvPoint computeWindowCenter(string trainingDirectory = ""); 95 | void updateWindowCenter(string trainingDirectory, 96 | int& minX, int& maxX, int& minY, int& maxY); 97 | void readConfiguration(); 98 | }; 99 | 100 | #endif 101 | -------------------------------------------------------------------------------- /utils/src/Globals.cpp: -------------------------------------------------------------------------------- 1 | // Globals.cpp 2 | // File that defines all the constants used across all modules 3 | 4 | #include "Globals.h" 5 | 6 | int Globals::imgWidth = 640; 7 | int Globals::imgHeight = 480; 8 | int Globals::roiWidth = 500; 9 | int Globals::roiHeight = 350; 10 | int Globals::maxDistance = 100; 11 | int Globals::maxAngle = 180; 12 | int Globals::maxArea = 200; 13 | int Globals::binWidth = 10; 14 | int Globals::gaussianWidth = 21; 15 | int Globals::psrWidth = 30; 16 | int Globals::nPastLocations = 5; 17 | int Globals::noseDrop = 70; 18 | 19 | int Globals::smallBufferSize = 32; 20 | int Globals::midBufferSize = 256; 21 | int Globals::largeBufferSize = 1024; 22 | int Globals::nSequenceLength = 600; 23 | 24 | unsigned Globals::numZones = 5; 25 | 26 | double Globals::learningRate = 0.125; 27 | double Globals::initialGaussianScale = 0.5; 28 | double Globals::windowXScale = 30; 29 | double Globals::windowYScale = 25; 30 | 31 | string Globals::annotationsFileName = "annotations.xml"; 32 | string Globals::modelNamePrefix = "zone_"; 33 | string Globals::faceFilter = "MOSSE_Face"; 34 | string Globals::leftEyeFilter = "MOSSE_LeftEye"; 35 | string Globals::rightEyeFilter = "MOSSE_RightEye"; 36 | string Globals::noseFilter = "MOSSE_Nose"; 37 | 38 | string Globals::paramsFileName = "parameters.xml"; 39 | string Globals::configFileName = "config.xml"; 40 | -------------------------------------------------------------------------------- /utils/src/Globals.h: -------------------------------------------------------------------------------- 1 | // Globals.h 2 | // File that contains the set of all global constants and typedefs. 
We define a 3 | // class with static members for each constant that will be used at application 4 | // scope 5 | 6 | #ifndef __GLOBALS_H 7 | #define __GLOBALS_H 8 | 9 | #include 10 | 11 | // The standard OpenCV headers 12 | #include 13 | #include 14 | 15 | // The OMP stuff 16 | #include 17 | 18 | // fftw3 stuff 19 | #include 20 | 21 | using namespace std; 22 | 23 | class Globals { 24 | public: 25 | Globals() { } 26 | ~Globals() { } 27 | 28 | static int imgWidth; // default image width that we handle 29 | static int imgHeight; // image height 30 | static int roiWidth; // roi width 31 | static int roiHeight; // roi height 32 | static int maxDistance; // max pixel distance for normalization 33 | static int maxAngle; // max angle for normalization 34 | static int maxArea; // max area of the L, R and N triangle 35 | static int binWidth; // the bin width for binning annotations 36 | static int gaussianWidth; // width of the gaussian 37 | static int psrWidth; // width of window to compute PSR 38 | static int nPastLocations; // number of past locations for smoothing 39 | static int noseDrop; // approx. drop below the eyes for the nose 40 | 41 | static int smallBufferSize; // small stack buffer size 42 | static int midBufferSize; // mid stack buffer size 43 | static int largeBufferSize; // large stack buffer size 44 | static int nSequenceLength; // the length of frame sequences we need 45 | 46 | static unsigned numZones; // number of zones 47 | 48 | static double learningRate; // the learning rate for online filters 49 | static double initialGaussianScale; // the gaussian scale for the face filter 50 | 51 | // the window function is computed as image width times the x scale and 52 | // the image height times the y scale 53 | static double windowXScale; // the X scale factor for the window function 54 | static double windowYScale; // the Y scale factor for the window function 55 | 56 | static string annotationsFileName; // the name of the annotations file 57 | static string modelNamePrefix; // prefix for SVM model names 58 | static string faceFilter; // name of the face filter 59 | static string leftEyeFilter; // name of left eye filter 60 | static string rightEyeFilter; // name of right eye filter 61 | static string noseFilter; // name of nose filter 62 | static string paramsFileName; // name of parameters file 63 | static string configFileName; // name of the config file 64 | 65 | static void setRoiSize(CvSize size) { 66 | roiWidth = size.width; 67 | roiHeight = size.height; 68 | } 69 | }; 70 | 71 | #endif 72 | -------------------------------------------------------------------------------- /utils/src/Makefile: -------------------------------------------------------------------------------- 1 | # Makefile for ADM 2 | 3 | CC = g++ 4 | AR = ar 5 | LD = ld 6 | RANLIB = ranlib 7 | RM = /bin/rm 8 | MKDIR = /bin/mkdir 9 | CP = /bin/cp 10 | RM = /bin/rm 11 | 12 | #makedepend flags 13 | DFLAGS = 14 | 15 | #Compiler flags 16 | #if mode variable is empty, setting debug build mode 17 | ifeq ($(mode),opt) 18 | CFLAGS = -Wall -O3 -fPIC -shared -fopenmp 19 | BUILD_DIR = ../build/src.opt 20 | else 21 | mode = debug 22 | CFLAGS = -g -Wall -fPIC -shared -DSINGLETHREADED 23 | BUILD_DIR = ../build/src 24 | endif 25 | 26 | CFILES = Globals.cpp Preprocess.cpp Annotations.cpp xmlToIDX.cpp classify.cpp 27 | 28 | OFILES = Globals.o Preprocess.o Annotations.o xmlToIDX.o classify.o 29 | 30 | TRAIN_DATA_OFILES = $(BUILD_DIR)/Globals.o $(BUILD_DIR)/Preprocess.o $(BUILD_DIR)/Annotations.o 31 | 32 | INSTALL_DIR = 
../../install/lib 33 | HEADER_DIR = ../../install/include 34 | BIN_DIR = ../../install/bin 35 | 36 | TRAIN_DATA_LIB = $(INSTALL_DIR)/libdata.a 37 | XML_TO_IDX = $(BIN_DIR)/xmlToIDX 38 | CLASSIFY = $(BIN_DIR)/classify 39 | TRAIN_DATA_INCLUDE = Annotations.h Preprocess.h 40 | 41 | OUT = $(TRAIN_DATA_LIB) $(XML_TO_IDX) 42 | 43 | INCLUDES = -I ./ `pkg-config opencv --cflags` `pkg-config fftw3 --cflags` 44 | LIBS = `pkg-config opencv --cflags --libs` `pkg-config fftw3 --cflags --libs` 45 | 46 | OBJS := $(patsubst %.cpp, $(BUILD_DIR)/%.o, $(filter %.cpp,$(CFILES))) 47 | 48 | #OBJS = $(patsubst %,$(BUILD_DIR)/%,$(OFILES)) 49 | 50 | .phony:all header 51 | 52 | all: information $(OUT) 53 | 54 | information: 55 | ifneq ($(mode),opt) 56 | ifneq ($(mode),debug) 57 | @echo "Invalid build mode." 58 | @echo "Please use 'make mode=opt' or 'make mode=debug'" 59 | @exit 1 60 | endif 61 | endif 62 | @echo "Building on "$(mode)" mode" 63 | @echo ".........................." 64 | 65 | $(BUILD_DIR)/%.o: %.cpp $(TRAIN_DATA_INCLUDE) 66 | $(MKDIR) -p $(BUILD_DIR) 67 | $(CC) -c $(INCLUDES) -o $@ $< $(CFLAGS) 68 | 69 | $(OUT): $(OBJS) 70 | $(MKDIR) -p $(INSTALL_DIR) 71 | $(MKDIR) -p $(BIN_DIR) 72 | $(MKDIR) -p $(HEADER_DIR) 73 | $(AR) rcs $(TRAIN_DATA_LIB) $(TRAIN_DATA_OFILES) 74 | $(RANLIB) $(TRAIN_DATA_LIB) 75 | $(CP) -p $(TRAIN_DATA_INCLUDE) $(HEADER_DIR) 76 | $(CC) -o $(XML_TO_IDX) $(BUILD_DIR)/xmlToIDX.o -L$(INSTALL_DIR) -ldata $(LIBS) 77 | $(CC) -o $(CLASSIFY) $(BUILD_DIR)/classify.o -L$(INSTALL_DIR) -ldata $(LIBS) 78 | @echo train_utils finished 79 | 80 | header: 81 | $(MKDIR) -p $(HEADER_DIR) 82 | $(CP) -p $(TRAIN_DATA_INCLUDE) $(HEADER_DIR) 83 | @echo header finished 84 | 85 | depend: 86 | makedepend -- $(DFLAGS) -- $(CFILES) 87 | 88 | .PHONY: clean 89 | 90 | clean: 91 | $(RM) -f $(BUILD_DIR)/*.o $(OUT) 92 | 93 | -------------------------------------------------------------------------------- /utils/src/Preprocess.cpp: -------------------------------------------------------------------------------- 1 | // Preprocess.cpp 2 | // This file contains the implementation of class Preprocess. 3 | // This class is used to generate training and test sets from directories of 4 | // labelled or unlabelled images. 
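// Usage sketch (illustrative only, not part of the original source; the directory
// name below is hypothetical). This mirrors how xmlToIDX.cpp drives the class,
// with roiFunction being the ROI callback defined there:
//
//   map<int, bool> statusFilter, intersectionFilter;   // empty maps => no filtering
//   CvPoint center = cvPoint(Globals::roiWidth / 2, Globals::roiHeight / 2);
//   CvSize size = cvSize(Globals::roiWidth, Globals::roiHeight);
//   Preprocess preprocess("ubyte", size, 0.3 /* scale */, center,
//                         statusFilter, intersectionFilter, roiFunction);
//   preprocess.addTrainingSet("/data/drive_01");
//   preprocess.generate(0.7 /* training */, 0.2 /* validation */);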
5 | 6 | #include "Preprocess.h" 7 | 8 | // union to enable accessing bytes of an integer 9 | typedef union { 10 | unsigned int i; 11 | unsigned char u[4]; 12 | } IntBytesT; 13 | 14 | // Class construction and destruction 15 | 16 | Preprocess::Preprocess(string output, CvSize size, double scale, CvPoint& center, 17 | map<int, bool>& sFilter, map<int, bool>& iFilter, 18 | roiFnT roiFn, bool bins, 19 | bool inBinFmt, double binThreshold) { 20 | outputFileName = output; 21 | useBins = bins; 22 | roiFunction = roiFn; 23 | inBinaryFormat = inBinFmt; 24 | binaryThreshold = binThreshold; 25 | imgSize.height = size.height; 26 | imgSize.width = size.width; 27 | scaleFactor = scale; 28 | 29 | map<int, bool>::iterator it; 30 | for (it = sFilter.begin(); it != sFilter.end(); it++) { 31 | int key = (*it).first; 32 | bool value = (*it).second; 33 | statusFilter[key] = value; 34 | } 35 | for (it = iFilter.begin(); it != iFilter.end(); it++) { 36 | int key = (*it).first; 37 | bool value = (*it).second; 38 | intersectionFilter[key] = value; 39 | } 40 | 41 | nSamples = 0; 42 | doAffineTransforms = false; 43 | 44 | length = imgSize.height * imgSize.width; 45 | 46 | // allocate real complex vectors for use during filter creation or update 47 | realImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 48 | tempImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 49 | imageBuffer = (double*)fftw_malloc(sizeof(double) * length); 50 | 51 | windowCenter.x = center.x; 52 | windowCenter.y = center.y; 53 | 54 | // Now compute a window around the center 55 | double xSpread = imgSize.width * Globals::windowXScale; 56 | double ySpread = imgSize.height * Globals::windowYScale; 57 | window = createWindow(windowCenter, xSpread, ySpread); 58 | 59 | // cvNamedWindow("window", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 60 | } 61 | 62 | Preprocess::Preprocess(CvSize size, double scale, CvPoint& center, 63 | roiFnT roiFn) { 64 | roiFunction = roiFn; 65 | imgSize.height = size.height; 66 | imgSize.width = size.width; 67 | scaleFactor = scale; 68 | 69 | nSamples = 0; 70 | doAffineTransforms = false; 71 | 72 | length = imgSize.height * imgSize.width; 73 | 74 | // allocate real complex vectors for use during filter creation or update 75 | realImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 76 | tempImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 77 | imageBuffer = (double*)fftw_malloc(sizeof(double) * length); 78 | 79 | windowCenter.x = center.x; 80 | windowCenter.y = center.y; 81 | 82 | // Now compute a window around the center 83 | double xSpread = imgSize.width * Globals::windowXScale; 84 | double ySpread = imgSize.height * Globals::windowYScale; 85 | window = createWindow(windowCenter, xSpread, ySpread); 86 | 87 | // cvNamedWindow("window", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 88 | } 89 | 90 | Preprocess::~Preprocess() { 91 | cvReleaseImage(&realImg); 92 | cvReleaseImage(&tempImg); 93 | fftw_free(imageBuffer); 94 | fftw_free(window); 95 | 96 | for (unsigned int i = 0; i < fileToAnnotations.size(); i++) 97 | delete fileToAnnotations[i].second; 98 | for (unsigned int i = 0; i < fileToAnnotationsForValidation.size(); i++) 99 | delete fileToAnnotationsForValidation[i].second; 100 | for (unsigned int i = 0; i < fileToAnnotationsForTest.size(); i++) 101 | delete fileToAnnotationsForTest[i].second; 102 | } 103 | 104 | // addTrainingSet 105 | // Method used to add directories that contain images. For training data, 106 | // the directory is also expected to contain labels for each image.
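// Example (hypothetical path): a call such as
//   preprocess.addTrainingSet("/data/drive_01");
// expects /data/drive_01/annotations.xml to exist; the frames directory named in
// that file must contain each referenced frame as frame_<frameNumber>.png.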
107 | 108 | void Preprocess::addTrainingSet(string trainingDirectory) { 109 | Annotations annotations; 110 | 111 | // first capture the mapping from file names to locations of interest 112 | string locationsFileName = trainingDirectory + "/" + Globals::annotationsFileName; 113 | annotations.readAnnotations(locationsFileName); 114 | 115 | // trim ends 116 | annotations.trimEnds(Annotations::Center, 20 /* nTrim */); 117 | annotations.trimEnds(Annotations::DriverWindow, 10 /* nTrim */); 118 | annotations.trimEnds(Annotations::LeftOfCenter, 10 /* nTrim */); 119 | annotations.trimEnds(Annotations::RightOfCenter, 10 /* nTrim */); 120 | annotations.trimEnds(Annotations::PassengerWindow, 10 /* nTrim */); 121 | 122 | // if we want to pick the same number of frames from each sector, then 123 | // we create bins. This will pick as many frames for each sector as the 124 | // smallest number of frames we have across all sectors 125 | if (useBins) 126 | annotations.createBins(); 127 | 128 | // get the frames directory 129 | string framesDirectory = annotations.getFramesDirectory(); 130 | 131 | // get the window center 132 | CvPoint& center = annotations.getCenter(); 133 | 134 | // collect the number of transitions from each sector to other sectors 135 | unsigned int transitions[Globals::numZones][Globals::numZones]; 136 | for (unsigned int i = 0; i < Globals::numZones; i++) 137 | for (unsigned int j = 0; j < Globals::numZones; j++) 138 | transitions[i][j] = 0; 139 | 140 | FrameAnnotation* prev = 0; 141 | 142 | // now get the set of all annotations 143 | vector& frameAnnotations = annotations.getFrameAnnotations(); 144 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) { 145 | FrameAnnotation* fa = frameAnnotations[i]; 146 | 147 | CvPoint faceCenter = fa->getFace(); 148 | if (!faceCenter.x && !faceCenter.y) 149 | fa->setFace(center); 150 | 151 | // collect transition counts 152 | if (prev) { 153 | unsigned int prevSector = prev->getSector(); 154 | unsigned int sector = fa->getSector(); 155 | 156 | if (prevSector > 0 && prevSector <= 5 && sector > 0 && sector <= 5) 157 | transitions[prevSector - 1][sector - 1]++; 158 | } 159 | 160 | // now check if we have status and/or intersection filters. If yes, then 161 | // pick only those frames that match the filter 162 | if (statusFilter.size() && statusFilter.find(fa->getStatus()) == statusFilter.end()) 163 | continue; 164 | if (intersectionFilter.size() && 165 | intersectionFilter.find(fa->getIntersection()) == intersectionFilter.end()) 166 | continue; 167 | 168 | // compose filename and update map 169 | char buffer[256]; 170 | sprintf(buffer, "frame_%d.png", fa->getFrameNumber()); 171 | string simpleName = buffer; 172 | string fileName = framesDirectory + "/" + simpleName; 173 | fileToAnnotations.push_back(make_pair(fileName, new FrameAnnotation(*fa))); 174 | 175 | prev = fa; 176 | } 177 | 178 | // report transition counts 179 | cout << "Transitions" << endl; 180 | for (unsigned int i = 0; i < Globals::numZones; i++) { 181 | for (unsigned int j = 0; j < Globals::numZones; j++) 182 | cout << transitions[i][j] << "\t"; 183 | cout << endl; 184 | } 185 | } 186 | 187 | // addTestSet 188 | // Method used to add annotation files for validation and test. The method 189 | // takes as input an annotation file name and the fraction of the images that 190 | // should be used for validation in that set. The remainder are treated as 191 | // part of the test set. 
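// Example (hypothetical path): addTestSet("/data/drive_02/annotations.xml", 0.2, 0.3)
// sends the first 20% of the shuffled frames in that file to the additional
// validation pool and the next 30% to the additional test pool.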
192 | 193 | void Preprocess::addTestSet(string annotationsFileName, 194 | double validationFraction, double testFraction) { 195 | Annotations annotations; 196 | 197 | // first capture the mapping from file names to locations of interest 198 | annotations.readAnnotations(annotationsFileName); 199 | 200 | // get the frames directory 201 | string framesDirectory = annotations.getFramesDirectory(); 202 | 203 | // get the window center 204 | CvPoint& center = annotations.getCenter(); 205 | 206 | // now get the set of all annotations 207 | vector& frameAnnotations = annotations.getFrameAnnotations(); 208 | random_shuffle(frameAnnotations.begin(), frameAnnotations.end()); 209 | 210 | unsigned int nValidationLength = validationFraction * frameAnnotations.size(); 211 | unsigned int nTestLength = testFraction * frameAnnotations.size(); 212 | 213 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) { 214 | FrameAnnotation* fa = frameAnnotations[i]; 215 | 216 | CvPoint faceCenter = fa->getFace(); 217 | if (!faceCenter.x && !faceCenter.y) 218 | fa->setFace(center); 219 | 220 | // compose filename and update map 221 | char buffer[256]; 222 | sprintf(buffer, "frame_%d.png", fa->getFrameNumber()); 223 | string simpleName = buffer; 224 | string fileName = framesDirectory + "/" + simpleName; 225 | if (i < nValidationLength) 226 | fileToAnnotationsForValidation.push_back(make_pair(fileName, new FrameAnnotation(*fa))); 227 | else if (i < nValidationLength + nTestLength) 228 | fileToAnnotationsForTest.push_back(make_pair(fileName, new FrameAnnotation(*fa))); 229 | } 230 | } 231 | 232 | // The following method is used to open a data file and a label file given 233 | // as input a simpleName. It generates the preambles in these files based 234 | // on the MNIST data format 235 | 236 | void Preprocess::openFiles(string simpleName) { 237 | // now open the data file and labels file and create their respective preambles 238 | string dataFileName = "data-" + simpleName; 239 | dataFile.open(dataFileName.c_str(), ofstream::binary); 240 | if (!dataFile.good()) { 241 | string err = "Preprocess::generate. Cannot open " + dataFileName + 242 | " for write"; 243 | throw (err); 244 | } 245 | // The magic number 0x00000803 is used for data, where the 0x08 246 | // is for unsigned byte valued data and the 0x03 is for the number 247 | // of dimensions 248 | IntBytesT ib; 249 | ib.i = 8; 250 | ib.i <<= 16; 251 | ib.i |= 3; 252 | dataFile.write((char*)&(ib.u), 4); 253 | ib.i = 0; 254 | dataFile.write((char*)&(ib.u), 4); // for the number of samples 255 | ib.i = (unsigned int)imgSize.width * scaleFactor; 256 | dataFile.write((char*)&(ib.u), 4); 257 | ib.i = (unsigned int)imgSize.height * scaleFactor; 258 | dataFile.write((char*)&(ib.u), 4); 259 | 260 | string labelFileName = "label-" + simpleName; 261 | labelFile.open(labelFileName.c_str(), ofstream::binary); 262 | if (!labelFile.good()) { 263 | string err = "Preprocess::generate. Cannot open " + labelFileName + 264 | " for write"; 265 | throw (err); 266 | } 267 | // The magic number 0x00000803 is used for data, where the 0x08 268 | // is for unsigned byte valued data and the 0x03 is for the number 269 | // of dimensions 270 | ib.i = 8; 271 | ib.i <<= 16; 272 | ib.i |= 1; 273 | labelFile.write((char*)&(ib.u), 4); 274 | ib.i = 0; 275 | labelFile.write((char*)&(ib.u), 4); // for the number of samples 276 | } 277 | 278 | // The following method closes the data file and the label file. It re-writes 279 | // the number of samples contained in these files first before closing. 
The 280 | // number of samples is not available at the time the files are created, as 281 | // we may choose to do affine transforms increasing the number of images 282 | // written over and above the number of images we get through the annotation 283 | // files 284 | 285 | void Preprocess::closeFiles(int nSamples) { 286 | // now write out the total number of samples that were written to file 287 | // and close both the data and label files 288 | IntBytesT ib; 289 | ib.i = nSamples; 290 | 291 | dataFile.seekp(4, ios_base::beg); 292 | dataFile.write((char*)&(ib.u), 4); 293 | dataFile.close(); 294 | 295 | labelFile.seekp(4, ios_base::beg); 296 | labelFile.write((char*)&(ib.u), 4); 297 | labelFile.close(); 298 | } 299 | 300 | // Method to do the actual image data and sector writing into data and label 301 | // files. We use a randomized access into the set of all files accumulated 302 | // to generate the number of samples required based on a user specified 303 | // percentage 304 | 305 | void Preprocess::generate(int startIndex, int nSamples, string simpleName, 306 | bool doWrite, 307 | vector >* additionalPairs) { 308 | // open data and label files 309 | openFiles(simpleName); 310 | 311 | int nSectorFrames[Globals::numZones]; 312 | for (unsigned int i = 0; i < Globals::numZones; i++) 313 | nSectorFrames[i] = 0; 314 | 315 | // if we want to write out test annotations then open a file for that 316 | // purpose 317 | ofstream annotationsFile; 318 | string dirName; 319 | if (doWrite) { 320 | // get absolute path of the input directory 321 | char fullPath[PATH_MAX + 1]; 322 | string fullPathName = realpath("./", fullPath); 323 | 324 | string fileName = fullPathName + "/annotations.xml"; 325 | dirName = string(fullPath) + "/_files"; 326 | annotationsFile.open(fileName.c_str()); 327 | mkdir(dirName.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH); 328 | if (annotationsFile.good()) { 329 | annotationsFile << "" << endl; 330 | annotationsFile << "" << endl; 331 | } 332 | } 333 | 334 | // write content 335 | int samples = 0; 336 | for (int i = startIndex; i < startIndex + nSamples; i++) { 337 | string fileName = fileToAnnotations[i].first; 338 | FrameAnnotation* fa = fileToAnnotations[i].second; 339 | 340 | int sector = fa->getSector(); 341 | if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow) 342 | continue; 343 | 344 | if (doWrite) { 345 | annotationsFile << " " << endl; 346 | annotationsFile << " " << samples + 1 << 347 | "" << endl; 348 | annotationsFile << " " << fa->getFace().y << "," << 349 | fa->getFace().x << "" << endl; 350 | annotationsFile << " " << fa->getSector() << "" << endl; 351 | annotationsFile << " " << fa->getStatus() << "" << endl; 352 | annotationsFile << " " << fa->getIntersection() << 353 | "" << endl; 354 | annotationsFile << " " << endl; 355 | 356 | char buffer[256]; 357 | sprintf(buffer, "/frame_%d.png", samples + 1); 358 | string destFileName = dirName + string(buffer); 359 | string command = "/bin/cp " + fileName + " " + destFileName; 360 | static_cast(system(command.c_str())); 361 | } 362 | 363 | nSectorFrames[fa->getSector() - 1]++; 364 | 365 | // update files with image and label data 366 | update(fileName, fa, samples); 367 | } 368 | // now add any frames from the additional pairs if any 369 | if (additionalPairs) { 370 | for (unsigned int i = 0; i < additionalPairs->size(); i++) { 371 | string fileName = (*additionalPairs)[i].first; 372 | FrameAnnotation* fa = (*additionalPairs)[i].second; 373 | 374 | int sector = fa->getSector(); 375 | 
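// skip any frame whose sector annotation falls outside the five valid zones
// (DriverWindow through PassengerWindow); such frames are neither written nor counted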
if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow) 376 | continue; 377 | 378 | if (doWrite) { 379 | annotationsFile << " " << endl; 380 | annotationsFile << " " << samples + 1 << 381 | "" << endl; 382 | annotationsFile << " " << fa->getFace().y << "," << 383 | fa->getFace().x << "" << endl; 384 | annotationsFile << " " << fa->getSector() << "" << endl; 385 | annotationsFile << " " << fa->getStatus() << "" << endl; 386 | annotationsFile << " " << fa->getIntersection() << 387 | "" << endl; 388 | annotationsFile << " " << endl; 389 | 390 | char buffer[256]; 391 | sprintf(buffer, "/frame_%d.png", samples + 1); 392 | string destFileName = dirName + string(buffer); 393 | string command = "/bin/cp " + fileName + " " + destFileName; 394 | static_cast(system(command.c_str())); 395 | } 396 | 397 | nSectorFrames[fa->getSector() - 1]++; 398 | 399 | // update files with image and label data 400 | update(fileName, fa, samples); 401 | } 402 | } 403 | 404 | if (doWrite) { 405 | annotationsFile << "" << endl; 406 | annotationsFile.close(); 407 | } 408 | 409 | // close data and label files 410 | closeFiles(samples); 411 | 412 | // info 413 | cout << "For " << simpleName << " generated the following frames by sector" << endl; 414 | cout << "[" << nSectorFrames[0] << ", " << nSectorFrames[1] << ", " << 415 | nSectorFrames[2] << ", " << nSectorFrames[3] << ", " << nSectorFrames[4] << "]" << 416 | endl; 417 | } 418 | 419 | // Method that generates three sets of files; training, validation and test. 420 | // The user specified training and validation percentages are used with the 421 | // remaining images being treated as test images 422 | 423 | void Preprocess::generate(double training, double validation, bool doWriteTests) { 424 | // first shuffle the vector of image frames 425 | random_shuffle(fileToAnnotations.begin(), fileToAnnotations.end()); 426 | 427 | // compute training, validation, and testing required samples 428 | int len = fileToAnnotations.size(); 429 | int nTrainingSamples = len * training; 430 | int nValidationSamples = len * validation; 431 | int nTestingSamples = len - (nTrainingSamples + nValidationSamples); 432 | 433 | // generate training files 434 | string simpleName = "train-" + outputFileName; 435 | generate(0 /* startIndex */, nTrainingSamples, simpleName); 436 | 437 | // generate validation files 438 | simpleName = "valid-" + outputFileName; 439 | generate(nTrainingSamples, nValidationSamples, simpleName, 440 | false /* doWrite */, &fileToAnnotationsForValidation); 441 | 442 | // we don't want affine transforms for validation and test sets 443 | doAffineTransforms = false; 444 | 445 | // the remainder of the images are test images 446 | simpleName = "test-" + outputFileName; 447 | generate(nTrainingSamples + nValidationSamples, nTestingSamples, simpleName, 448 | doWriteTests, &fileToAnnotationsForTest); 449 | } 450 | 451 | // Method to do the actual image data and sector writing into data and label 452 | // files. 
We pick Globals::nSequenceLength chunks based on the chunk indices vector, 453 | // the starting chunk index and the number of chunks desired 454 | 455 | void Preprocess::generateSequences(int startIndex, int nChunks, 456 | vector& indices, string simpleName) { 457 | // open data and label files 458 | openFiles(simpleName); 459 | 460 | int nSectorFrames[Globals::numZones]; 461 | for (unsigned int i = 0; i < Globals::numZones; i++) 462 | nSectorFrames[i] = 0; 463 | 464 | // if we want to write out test annotations then open a file for that 465 | // purpose 466 | ofstream annotationsFile; 467 | string dirName; 468 | 469 | // collect the number of transitions from each sector to other sectors 470 | unsigned int transitions[Globals::numZones][Globals::numZones]; 471 | for (unsigned int i = 0; i < Globals::numZones; i++) 472 | for (unsigned int j = 0; j < Globals::numZones; j++) 473 | transitions[i][j] = 0; 474 | 475 | FrameAnnotation* prev = 0; 476 | 477 | // write content 478 | int samples = 0; 479 | for (int i = startIndex; i < startIndex + nChunks; i++) 480 | for (int j = 0; j < Globals::nSequenceLength; j++) { 481 | int offset = indices[i] * Globals::nSequenceLength + j; 482 | string fileName = fileToAnnotations[offset].first; 483 | FrameAnnotation* fa = fileToAnnotations[offset].second; 484 | 485 | int sector = fa->getSector(); 486 | if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow) 487 | continue; 488 | 489 | // collect transition counts 490 | if (prev) { 491 | unsigned int prevSector = prev->getSector(); 492 | 493 | if (prevSector > 0 && prevSector <= 5) 494 | transitions[prevSector - 1][sector - 1]++; 495 | } 496 | 497 | nSectorFrames[fa->getSector() - 1]++; 498 | 499 | // update files with image and label data 500 | update(fileName, fa, samples); 501 | 502 | prev = fa; 503 | } 504 | 505 | // close data and label files 506 | closeFiles(samples); 507 | 508 | // report transition counts 509 | cout << "Transitions" << endl; 510 | for (unsigned int i = 0; i < Globals::numZones; i++) { 511 | for (unsigned int j = 0; j < Globals::numZones; j++) 512 | cout << transitions[i][j] << "\t"; 513 | cout << endl; 514 | } 515 | 516 | // info 517 | cout << "For " << simpleName << " generated the following frames by sector" << endl; 518 | cout << "[" << nSectorFrames[0] << ", " << nSectorFrames[1] << ", " << 519 | nSectorFrames[2] << ", " << nSectorFrames[3] << ", " << nSectorFrames[4] << "]" << 520 | endl; 521 | } 522 | 523 | // Method that generates three sets of files; training, validation and test. 524 | // The user specified training and validation percentages are used with the 525 | // remaining images being treated as test images. 
This method, unlike the one 526 | // above, will preserve frame sequences and will not randomize 527 | 528 | void Preprocess::generateSequences(double training, double validation) { 529 | // compute training, validation, and testing required samples 530 | int len = fileToAnnotations.size(); 531 | int nTrainingSamples = len * training; 532 | int nValidationSamples = len * validation; 533 | int nTestingSamples = len - (nTrainingSamples + nValidationSamples); 534 | 535 | int chunks = len / Globals::nSequenceLength; 536 | vector indices; 537 | for (int i = 0; i < chunks; i++) 538 | indices.push_back(i); 539 | random_shuffle(indices.begin(), indices.end()); 540 | 541 | int nValidationChunks = nValidationSamples / Globals::nSequenceLength; 542 | int nTestChunks = nTestingSamples / Globals::nSequenceLength; 543 | 544 | // pick the first nValidationChunks from indices for validation samples 545 | cout << "Validation frames" << endl; 546 | string simpleName = "valid-" + outputFileName; 547 | generateSequences(0 /* startIndex */, nValidationChunks, indices, simpleName); 548 | 549 | // generate test files 550 | cout << "Test frames" << endl; 551 | simpleName = "test-" + outputFileName; 552 | generateSequences(nValidationChunks, nTestChunks, indices, simpleName); 553 | 554 | // generate training files 555 | // since we want to preserve as much sequentiality in the data as possible, 556 | // we sort the remaining chunk indices and then use them to generate 557 | // training data. This way, contiguous chunks will contribute to increasing 558 | // sequential correlation 559 | cout << "Train frames" << endl; 560 | sort(indices.begin() + nValidationChunks + nTestChunks, indices.end()); 561 | simpleName = "train-" + outputFileName; 562 | generateSequences(nValidationChunks + nTestChunks, 563 | chunks - (nValidationChunks + nTestChunks), 564 | indices, simpleName); 565 | } 566 | 567 | // update 568 | // Method used to update terms used to create a filter from an image file 569 | // and a location of interest. This method is called by the method addTrainingSet 570 | // for each image file in the test set and a location of interest for that 571 | // image 572 | 573 | void Preprocess::update(string filename, FrameAnnotation* fa, int& samples) { 574 | IplImage* image = cvLoadImage(filename.c_str()); 575 | if (!image) { 576 | string err = "Preprocess::update. Cannot load file " + filename + "."; 577 | throw (err); 578 | } 579 | 580 | // generate affine transforms if requested 581 | vector& imgLocPairs = getAffineTransforms(image, fa->getFace()); 582 | 583 | for (unsigned int i = 0; i < imgLocPairs.size(); i++) { 584 | image = imgLocPairs[i].first; 585 | CvPoint& location = imgLocPairs[i].second; 586 | 587 | CvPoint offset; 588 | offset.x = offset.y = 0; 589 | fa->setFace(location); 590 | if (roiFunction) { 591 | IplImage* roi = roiFunction(image, *fa, offset, Annotations::Face); 592 | image = roi; 593 | } 594 | 595 | // compute size and length of the image data 596 | CvSize size = cvGetSize(image); 597 | 598 | // check consistency 599 | if (imgSize.height != size.height || imgSize.width != size.width) { 600 | char buffer[32]; 601 | sprintf(buffer, "(%d, %d).", imgSize.height, imgSize.width); 602 | string err = "Preprocess::update. Inconsistent image sizes. 
Expecting" + string(buffer); 603 | throw (err); 604 | } 605 | 606 | // preprocess 607 | double* preImage = preprocessImage(image); 608 | IplImage* processedImage = cvCreateImage(imgSize, IPL_DEPTH_8U, 1); 609 | int step = processedImage->widthStep; 610 | unsigned char* imageData = (unsigned char*)processedImage->imageData; 611 | for (int i = 0; i < imgSize.height; i++) { 612 | for (int j = 0; j < imgSize.width; j++) { 613 | if (preImage[i * imgSize.width + j] > 1) 614 | cout << "(" << i << ", " << j << ") = " << preImage[i * imgSize.width + j] << endl; 615 | double d = preImage[i * imgSize.width + j]; 616 | d = (inBinaryFormat)? ((d >= binaryThreshold)? 255 : 0) : d * 255; 617 | unsigned char c = (unsigned char)d; 618 | (*imageData++) = c; 619 | } 620 | imageData += step / sizeof(unsigned char) - imgSize.width; 621 | } 622 | // cvNamedWindow("grayScaleImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 623 | // cvShowImage("grayScaleImage", processedImage); 624 | // cvWaitKey(1); 625 | 626 | CvSize scaledSize; 627 | scaledSize.width = (unsigned int)imgSize.width * scaleFactor; 628 | scaledSize.height = (unsigned int)imgSize.height * scaleFactor; 629 | 630 | IplImage* scaledImage = cvCreateImage(scaledSize, processedImage->depth, 631 | processedImage->nChannels); 632 | cvResize(processedImage, scaledImage); 633 | // cvNamedWindow("scaledImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 634 | // cvShowImage("scaledImage", scaledImage); 635 | // cvWaitKey(); 636 | cvReleaseImage(&processedImage); 637 | 638 | samples++; 639 | unsigned char buffer[16]; 640 | buffer[0] = (unsigned char)fa->getSector(); 641 | labelFile.write((char*)buffer, 1); 642 | /* 643 | // write out other bits of annotated information 644 | unsigned char status = (unsigned char)fa->getStatus(); 645 | bitset<8> statusBits(status); 646 | for (unsigned int i = 0; i < statusBits.size(); i++) { 647 | if (statusBits[i]) 648 | buffer[i] = 255; 649 | else 650 | buffer[i] = 0; 651 | } 652 | dataFile.write((char*)buffer, 8); 653 | 654 | unsigned char intersection = (unsigned char)fa->getIntersection(); 655 | bitset<8> intxBits(intersection); 656 | for (unsigned int i = 0; i < intxBits.size(); i++) { 657 | if (intxBits[i]) 658 | buffer[i] = 255; 659 | else 660 | buffer[i] = 0; 661 | } 662 | dataFile.write((char*)buffer, 8); 663 | */ 664 | imageData = (unsigned char*)scaledImage->imageData; 665 | for (int i = 0; i < scaledSize.height; i++) { 666 | for (int j = 0; j < scaledSize.width; j++) { 667 | buffer[0] = (*imageData++); 668 | dataFile.write((char*)buffer, 1); 669 | } 670 | imageData += step / sizeof(unsigned char) - scaledSize.width; 671 | } 672 | cvReleaseImage(&scaledImage); 673 | 674 | if (roiFunction) 675 | cvReleaseImage(&image); 676 | } 677 | 678 | destroyAffineTransforms(imgLocPairs); 679 | } 680 | 681 | // generateImageVector 682 | // Method used to create an image vector for classification. 
We carve out an ROI 683 | // using the ROI function, preprocess the ROI image, scale the preprocessed image 684 | // and return it as an array of doubles 685 | 686 | IplImage* Preprocess::generateImageVector(IplImage* image) { 687 | FrameAnnotation fa; 688 | CvPoint offset; 689 | offset.x = offset.y = 0; 690 | fa.setFace(windowCenter); 691 | if (roiFunction) { 692 | IplImage* roi = roiFunction(image, fa, offset, Annotations::Face); 693 | image = roi; 694 | } 695 | 696 | // cvNamedWindow("Image", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 697 | // cvShowImage("Image", image); 698 | 699 | // compute size and length of the image data 700 | CvSize size = cvGetSize(image); 701 | 702 | // check consistency 703 | if (imgSize.height != size.height || imgSize.width != size.width) { 704 | char buffer[32]; 705 | sprintf(buffer, "(%d, %d).", imgSize.height, imgSize.width); 706 | string err = "Preprocess::update. Inconsistent image sizes. Expecting" + string(buffer); 707 | throw (err); 708 | } 709 | 710 | // preprocess 711 | double* preImage = preprocessImage(image); 712 | IplImage* processedImage = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 713 | int step = processedImage->widthStep; 714 | double* imageData = (double*)processedImage->imageData; 715 | for (int i = 0; i < imgSize.height; i++) { 716 | for (int j = 0; j < imgSize.width; j++) { 717 | if (preImage[i * imgSize.width + j] > 1) 718 | cout << "(" << i << ", " << j << ") = " << preImage[i * imgSize.width + j] << endl; 719 | double d = preImage[i * imgSize.width + j]; 720 | (*imageData++) = d; 721 | } 722 | imageData += step / sizeof(double) - imgSize.width; 723 | } 724 | 725 | CvSize scaledSize; 726 | scaledSize.width = (unsigned int)imgSize.width * scaleFactor; 727 | scaledSize.height = (unsigned int)imgSize.height * scaleFactor; 728 | 729 | IplImage* scaledImage = cvCreateImage(scaledSize, processedImage->depth, 730 | processedImage->nChannels); 731 | cvResize(processedImage, scaledImage); 732 | cvNamedWindow("scaledImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 733 | cvShowImage("scaledImage", scaledImage); 734 | cvWaitKey(); 735 | cvReleaseImage(&processedImage); 736 | 737 | size.height = 1; 738 | size.width = scaledSize.width * scaledSize.height; 739 | IplImage* result = cvCreateImage(size, IPL_DEPTH_64F, 1); 740 | double* resultData = (double*)result->imageData; 741 | imageData = (double*)scaledImage->imageData; 742 | step = scaledImage->widthStep; 743 | for (int i = 0; i < scaledSize.height; i++) { 744 | for (int j = 0; j < scaledSize.width; j++) { 745 | (*resultData++) = (*imageData++); 746 | } 747 | imageData += step / sizeof(double) - scaledSize.width; 748 | } 749 | 750 | if (roiFunction) 751 | cvReleaseImage(&image); 752 | cvReleaseImage(&scaledImage); 753 | 754 | return result; 755 | } 756 | 757 | // preprocessImage 758 | // Method that preprocesses an image that has already been loaded for a 759 | // subsequent application of the filter. The method returns a preprocessed 760 | // image which can then be used on a subsequent call to apply or update. 761 | 762 | double* Preprocess::preprocessImage(IplImage* inputImg) { 763 | // we take the complex image and preprocess it here 764 | if (!inputImg) { 765 | string err = "Preprocess::preprocessImage. Call setImage with a valid image."; 766 | throw (err); 767 | } 768 | 769 | bool releaseImage = false; 770 | IplImage* image = 0; 771 | 772 | // First check if the image is in grayscale. If not, we first convert it 773 | // into grayscale. 
The input image is replaced with its grayscale version 774 | if ((inputImg->nChannels != 1 && strcmp(inputImg->colorModel, "GRAY"))) { 775 | image = cvCreateImage(imgSize, IPL_DEPTH_8U, 1); 776 | cvCvtColor(inputImg, image, CV_BGR2GRAY); 777 | releaseImage = true; 778 | } else 779 | image = inputImg; 780 | 781 | // now do histogram equalization 782 | cvEqualizeHist(image, image); 783 | 784 | // edge detection 785 | cvCanny(image, image, 120, 200, 3); 786 | 787 | // We follow preprocessing steps here as outlined in Bolme 2009 788 | // First populate a real image from the grayscale image 789 | double scale = 1.0 / 255.0; 790 | cvConvertScale(image, realImg, scale, 0.0); 791 | /* 792 | // compute image inversion 793 | int step = realImg->widthStep; 794 | double* imageData = (double*)realImg->imageData; 795 | for (int i = 0; i < imgSize.height; i++) { 796 | for (int j = 0; j < imgSize.width; j++) { 797 | *(imageData) = 1 - *(imageData); 798 | imageData++; 799 | } 800 | imageData += step / sizeof(double) - imgSize.width; 801 | } 802 | */ 803 | // suppress DC 804 | cvDCT(realImg, tempImg, CV_DXT_FORWARD); 805 | int step = tempImg->widthStep; 806 | double* imageData = (double*)tempImg->imageData; 807 | for (int i = 0; i < imgSize.height; i++) { 808 | for (int j = 0; j < imgSize.width; j++) { 809 | double sigmoid = (1 / (1 + (exp(-(i * imgSize.width + j))))); 810 | *(imageData) = *(imageData) * sigmoid; 811 | imageData++; 812 | } 813 | imageData += step / sizeof(double) - imgSize.width; 814 | } 815 | cvSet2D(tempImg, 0, 0, cvScalar(0)); 816 | cvDCT(tempImg, realImg, CV_DXT_INVERSE); 817 | 818 | double min, max; 819 | cvMinMaxLoc(realImg, &min, &max, NULL, NULL); 820 | if (min < 0) 821 | cvAddS(realImg, cvScalar(-min), realImg, NULL); 822 | else 823 | cvAddS(realImg, cvScalar(min), realImg, NULL); 824 | 825 | cvMinMaxLoc(realImg, &min, &max, NULL, NULL); 826 | scale = 1.0 / max; 827 | cvConvertScale(realImg, realImg, scale, 0); 828 | 829 | // Apply the window 830 | applyWindow(realImg, window, imageBuffer); 831 | 832 | /* double* destImageData = imageBuffer; 833 | double* srcImageData = (double*)realImg->imageData; 834 | for (int i = 0; i < imgSize.height; i++) { 835 | for (int j = 0; j < imgSize.width; j++) { 836 | (*destImageData) = (*srcImageData); 837 | srcImageData++; destImageData++; 838 | } 839 | srcImageData += step / sizeof(double) - imgSize.width; 840 | }*/ 841 | // showRealImage("preprocessedImage", imageBuffer); 842 | 843 | if (releaseImage) 844 | cvReleaseImage(&image); 845 | 846 | return imageBuffer; 847 | } 848 | 849 | // getAffineTransforms 850 | // Method to generate a set of affine transformations of a given image. The 851 | // image is rotated and translated to perturb the LOI around the given location. 
852 | // The method returns a vector of images that have been perturbed with small 853 | // affine transforms 854 | 855 | vector<ImgLocPairT>& Preprocess::getAffineTransforms(IplImage* image, CvPoint& location) { 856 | // first check if affine transformations are needed, if not then simply 857 | // push the input images to the vector of transformed images and return 858 | transformedImages.push_back(make_pair(image, location)); 859 | if (!doAffineTransforms) 860 | return transformedImages; 861 | 862 | // Setup unchanging data sets used for transformations 863 | // for rotation 864 | Mat imageMat(image); 865 | Point2f center(imageMat.cols / 2.0F, imageMat.rows / 2.0F); 866 | 867 | CvSize size = cvGetSize(image); 868 | 869 | // for translation 870 | Mat translationMat = getRotationMatrix2D(center, 0, 1.0); 871 | translationMat.at<double>(0, 0) = 1; 872 | translationMat.at<double>(0, 1) = 0; 873 | translationMat.at<double>(1, 0) = 0; 874 | translationMat.at<double>(1, 1) = 1; 875 | 876 | // perform a set of translations of each rotated image 877 | Mat src(image); 878 | for (double xdist = -20; xdist <= 20; xdist += 10) { 879 | if (xdist == 0) continue; 880 | 881 | translationMat.at<double>(0, 2) = xdist; 882 | translationMat.at<double>(1, 2) = 0; 883 | IplImage* translatedImage = cvCloneImage(image); 884 | Mat dest(translatedImage); 885 | warpAffine(src, dest, translationMat, src.size()); 886 | 887 | CvPoint translatedLocation; 888 | translatedLocation.x = location.x + xdist; 889 | translatedLocation.y = location.y; 890 | 891 | // check if the translated location is out of bounds with respect 892 | // to the image window. Do not add those images to the set 893 | if (translatedLocation.x < 0 || translatedLocation.x > size.width || 894 | translatedLocation.y < 0 || translatedLocation.y > size.height) { 895 | cvReleaseImage(&translatedImage); 896 | continue; 897 | } 898 | 899 | pair<IplImage*, CvPoint> p = make_pair(translatedImage, translatedLocation); 900 | transformedImages.push_back(p); 901 | } 902 | 903 | return transformedImages; 904 | } 905 | 906 | // destroyAffineTransforms 907 | // Method to destroy the images generated using affine transforms 908 | 909 | void Preprocess::destroyAffineTransforms(vector<ImgLocPairT>& imgLocPairs) { 910 | for (unsigned int i = 0; i < imgLocPairs.size(); i++) 911 | cvReleaseImage(&(imgLocPairs[i].first)); 912 | imgLocPairs.clear(); 913 | } 914 | 915 | // createWindow 916 | // Method to create a window around a given location to drop the value of 917 | // the pixels all around the area of interest in the image. The size of the 918 | // field is the same as the filter size. The spread parameters are used to 919 | // define the spread of the hot spot on the image plane. Pixels close to 920 | // location have the highest values and those beyond the spread rapidly go 921 | // to zero 922 | 923 | double* Preprocess::createWindow(CvPoint& location, double xSpread, double ySpread) { 924 | // Linear space vector.
Create a meshgrid with x and y axes values 925 | // that stradle the location 926 | 927 | double xspacer[imgSize.width]; 928 | double yspacer[imgSize.height]; 929 | 930 | int lx = location.x; 931 | double left = -lx; 932 | for (int i = 0; i < imgSize.width; i++) { 933 | xspacer[i] = left; 934 | left += 1.0; 935 | } 936 | int ly = location.y; 937 | double top = -ly; 938 | for (int i = 0; i < imgSize.height; i++) { 939 | yspacer[i] = top; 940 | top += 1.0; 941 | } 942 | 943 | // Mesh grid 944 | double x[imgSize.height][imgSize.width]; 945 | double y[imgSize.height][imgSize.width]; 946 | 947 | for (int i = 0; i < imgSize.height; i++) { 948 | for (int j = 0; j < imgSize.width; j++) { 949 | x[i][j] = xspacer[j]; 950 | y[i][j] = yspacer[i]; 951 | } 952 | } 953 | 954 | // create a gaussian as big as the image 955 | double gaussian[imgSize.height][imgSize.width]; 956 | 957 | double det = xSpread * ySpread; 958 | 959 | for (int i = 0; i < imgSize.height; i++) { 960 | for (int j = 0; j < imgSize.width; j++) { 961 | // using just the gaussian kernel 962 | double X = x[i][j] * x[i][j]; 963 | double Y = y[i][j] * y[i][j]; 964 | gaussian[i][j] = exp(-((X * ySpread + Y * xSpread) / det)); 965 | } 966 | } 967 | 968 | double* window = (double*)fftw_malloc(sizeof(double) * length); 969 | 970 | // now initialize a real array as large as the image array with the values 971 | // of the gaussian 972 | for (int i = 0; i < length; i++) 973 | window[i] = 0; 974 | for (int i = 0; i < imgSize.height; i++) { 975 | for (int j = 0; j < imgSize.width; j++) { 976 | window[i * imgSize.width + j] = gaussian[i][j]; 977 | } 978 | } 979 | 980 | // showRealImage("__window", window); 981 | 982 | return window; 983 | } 984 | 985 | // applyWindow 986 | // Method used to apply a window function to a real image. It takes as input 987 | // the real image source and a window as a 2d real array. The result is stored in 988 | // the third parameter, the source and the destination can be the same. The step 989 | // is expected to match in the src and dest images 990 | 991 | void Preprocess::applyWindow(IplImage* src, double* window, double* dest) { 992 | int step = src->widthStep; 993 | 994 | double* destImageData = dest; 995 | double* srcImageData = (double*)src->imageData; 996 | for (int i = 0; i < imgSize.height; i++) { 997 | for (int j = 0; j < imgSize.width; j++) { 998 | (*destImageData) = (*srcImageData) * window[i * imgSize.width + j]; 999 | srcImageData++; destImageData++; 1000 | } 1001 | srcImageData += step / sizeof(double) - imgSize.width; 1002 | } 1003 | } 1004 | 1005 | -------------------------------------------------------------------------------- /utils/src/Preprocess.h: -------------------------------------------------------------------------------- 1 | #ifndef __PREPROCESS_H 2 | #define __PREPROCESS_H 3 | 4 | // Preprocess.h 5 | // This file contains the definition of the Preprocess class. 
It generates 6 | // data in the IDX file format, similar to the MNIST dataset 7 | 8 | #include 9 | #include 10 | #include 11 | #include 12 | #include 13 | #include 14 | #include 15 | #include 16 | #include 17 | #include 18 | #include 19 | #include 20 | #include 21 | #include 22 | #include 23 | 24 | // The standard OpenCV headers 25 | #include 26 | #include 27 | #include 28 | 29 | #include "Globals.h" 30 | #include "Annotations.h" 31 | 32 | using namespace std; 33 | using namespace cv; 34 | 35 | #include "Annotations.h" 36 | 37 | // the ROI extraction function pointer type 38 | typedef IplImage* (*roiFnT)(IplImage*, FrameAnnotation&, 39 | CvPoint&, Annotations::Tag xmlTag); 40 | 41 | // an image and location pair type 42 | typedef pair ImgLocPairT; 43 | 44 | class Preprocess { 45 | protected: 46 | string outputFileName; // the name of the file for preprocessed data 47 | CvSize imgSize; // the size of the input image 48 | int length; // length of the image array 49 | roiFnT roiFunction; // an optional function to get image ROI 50 | ofstream dataFile; // the output file handle for data 51 | ofstream labelFile; // the output file handle for labels 52 | unsigned int nSamples; // the number of samples written 53 | double scaleFactor; // the scale factor for the final image data 54 | bool inBinaryFormat; // flag to generate pixels in {0, 1} instead 55 | // of in [0, 1] 56 | double binaryThreshold; // the threshold for choosing 0 or 1 in binary format 57 | bool useBins; // whether or not to bin frames based on sector 58 | 59 | // filters used to pick images that match specific car status or intersection type 60 | map statusFilter; 61 | map intersectionFilter; 62 | 63 | // map used to associate an image filename with its frame annotation data 64 | vector > fileToAnnotations; 65 | 66 | // map used to associate an image filename with its frame annotations data for 67 | // additional validation and test data 68 | vector > fileToAnnotationsForValidation; 69 | vector > fileToAnnotationsForTest; 70 | 71 | public: 72 | // Constructor used to construct a filter 73 | Preprocess(string output, CvSize size, double scale, CvPoint& windowCenter, 74 | map& statusFilter, 75 | map& intersectionFilter, 76 | roiFnT roiFunction = 0, bool useBins = false, 77 | bool inBinaryFormat = false, double binaryThreshold = 0.5); 78 | Preprocess(CvSize size, double scale, CvPoint& windowCenter, 79 | roiFnT roiFunction = 0); 80 | 81 | virtual ~Preprocess(); 82 | 83 | // main methods 84 | virtual void addTrainingSet(string trainingDirectory); 85 | virtual void addTestSet(string annotationsFileName, 86 | double validationFraction, double testFraction); 87 | virtual void setAffineTransforms() { 88 | doAffineTransforms = true; 89 | } 90 | 91 | // method to generate the output files with percentages of images to 92 | // be used for training, validation and testing 93 | virtual void generate(double training, double validation, bool doWriteTests = false); 94 | 95 | // method to generate the output files with percentages of images to 96 | // be used for training, validation and testing. The difference here 97 | // is that we have to preserve the sequencing of image frames. 
We do 98 | // not randomize frames but instead pick sub-sequences to cover the 99 | // desired partition 100 | virtual void generateSequences(double training, double validation); 101 | 102 | // method to preprocess images 103 | virtual double* preprocessImage(IplImage* image); 104 | 105 | // method to generate an image vector after preprocessing 106 | virtual IplImage* generateImageVector(IplImage* image); 107 | 108 | CvSize getSize() { return imgSize; } 109 | 110 | // test methods 111 | static void showImage(string window, IplImage* image) { 112 | cvNamedWindow((const char*)window.c_str(), CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 113 | cvShowImage((const char*)window.c_str(), image); 114 | cvWaitKey(1); 115 | } 116 | void showRealImage(string window, double* data) { 117 | IplImage* temp = cvCreateImage(imgSize, IPL_DEPTH_64F, 1); 118 | int step = temp->widthStep; 119 | double* imageData = (double*)temp->imageData; 120 | for (int i = 0; i < imgSize.height; i++) { 121 | for (int j = 0; j < imgSize.width; j++) { 122 | (*imageData++) = data[i * imgSize.width + j]; 123 | } 124 | imageData += step /sizeof(double) - imgSize.width; 125 | } 126 | cvNamedWindow((const char*)window.c_str(), CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE); 127 | cvShowImage((const char*)window.c_str(), temp); 128 | cvWaitKey(1); 129 | cvReleaseImage(&temp); 130 | } 131 | 132 | protected: 133 | void openFiles(string simpleName); 134 | void closeFiles(int nSamples); 135 | void generate(int startIndex, int nSamples, string simpleName, bool doWriteTests = false, 136 | vector >* additionalPairs = 0); 137 | void generateSequences(int startIndex, int nChunks, 138 | vector& indices, string simpleName); 139 | void update(string filename, FrameAnnotation* fa, int& samples); 140 | vector& getAffineTransforms(IplImage* image, CvPoint& location); 141 | void destroyAffineTransforms(vector& imgLocPairs); 142 | double* createWindow(CvPoint& location, double xSpread, double ySpread); 143 | void applyWindow(IplImage* src, double* window, double* dest); 144 | 145 | // if the following is set then for each update operation during 146 | // filter generation, we take a set of affine transformations of 147 | // each image and update the filter using the original image and 148 | // the affine transformations of the image 149 | bool doAffineTransforms; 150 | 151 | // used to store a set of images after doing small perturbations of the 152 | // images using affine transformations for filter update 153 | vector transformedImages; 154 | 155 | double* window; 156 | IplImage* realImg; 157 | IplImage* tempImg; 158 | double* imageBuffer; 159 | 160 | CvPoint windowCenter; 161 | }; 162 | 163 | #endif // __PREPROCESS_H 164 | -------------------------------------------------------------------------------- /utils/src/annotator.cpp: -------------------------------------------------------------------------------- 1 | #include 2 | #include 3 | #include 4 | #include 5 | #include 6 | #include 7 | #include 8 | 9 | class MyWidget : public QWidget 10 | { 11 | public: 12 | MyWidget(QWidget *parent = 0); 13 | }; 14 | 15 | MyWidget::MyWidget(QWidget *parent) 16 | : QWidget(parent) { 17 | QPushButton *quit = new QPushButton(tr("Quit")); 18 | quit->setFont(QFont("Times", 18, QFont::Bold)); 19 | 20 | QLCDNumber *lcd = new QLCDNumber(2); 21 | lcd->setSegmentStyle(QLCDNumber::Filled); 22 | 23 | QSlider *slider = new QSlider(Qt::Horizontal); 24 | slider->setRange(0, 99); 25 | slider->setValue(0); 26 | 27 | connect(quit, SIGNAL(clicked()), qApp, SLOT(quit())); 28 | connect(slider, 
SIGNAL(valueChanged(int)), 29 | lcd, SLOT(display(int))); 30 | 31 | QVBoxLayout *layout = new QVBoxLayout; 32 | layout->addWidget(quit); 33 | layout->addWidget(lcd); 34 | layout->addWidget(slider); 35 | setLayout(layout); 36 | } 37 | 38 | int main(int argc, char *argv[]) { 39 | QApplication app(argc, argv); 40 | MyWidget widget; 41 | widget.show(); 42 | return app.exec(); 43 | } 44 | -------------------------------------------------------------------------------- /utils/src/classify.cpp: -------------------------------------------------------------------------------- 1 | // test.cpp 2 | // Code that tests the various pieces of functionality we have for gaze tracking 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #include "Preprocess.h" 11 | 12 | using namespace std; 13 | 14 | vector g_weights; 15 | vector g_biases; 16 | vector g_results; 17 | 18 | // roiFunction 19 | // We support the use of LOIs identified to cull images to smaller regions of interest 20 | // (ROI) for use in locating future LOIs. This function is passed to the constructors 21 | // of the filter and classifier classes. Those classes in turn call this function when 22 | // they need a culled image. The input parameters are the original image, a frame 23 | // annotation object that is annotated with all the LOIs that we have found before this 24 | // function gets called. The offset parameter is an output parameter that contains the 25 | // offset of the ROI within the image. The function returns a culled image object 26 | 27 | IplImage* roiFunction(IplImage* image, FrameAnnotation& fa, 28 | CvPoint& offset, Annotations::Tag xmlTag) { 29 | offset.x = 0; 30 | offset.y = 0; 31 | 32 | CvPoint& location = fa.getLOI(xmlTag); 33 | offset.y = location.y - (Globals::roiHeight / 2); 34 | offset.x = location.x - (Globals::roiWidth / 2); 35 | 36 | // now check if the roi overflows the image boundary. If it does then 37 | // we move it so that it is contained within the image boundary 38 | if (offset.x + Globals::roiWidth > Globals::imgWidth) 39 | offset.x = Globals::imgWidth - Globals::roiWidth; 40 | if (offset.x < 0) 41 | offset.x = 0; 42 | if (offset.y + Globals::roiHeight > Globals::imgHeight) 43 | offset.y = Globals::imgHeight - Globals::roiHeight; 44 | if (offset.y < 0) 45 | offset.y = 0; 46 | 47 | cvSetImageROI(image, cvRect(offset.x, offset.y, Globals::roiWidth, Globals::roiHeight)); 48 | IplImage* roi = cvCreateImage(cvGetSize(image), image->depth, image->nChannels); 49 | cvCopy(image, roi); 50 | cvResetImageROI(image); 51 | 52 | return roi; 53 | } 54 | 55 | void loadModel(string modelFileName) { 56 | ifstream ifs; 57 | string line; 58 | int layer = 1; 59 | 60 | ifs.open(modelFileName.c_str()); 61 | while (ifs.good()) { 62 | getline(ifs, line); 63 | if (line == "") break; 64 | 65 | char* str = (char*)line.c_str(); 66 | if (strchr(str, ',')) { 67 | int rows, cols; 68 | sscanf(str, "%d,%d", &rows, &cols); 69 | cout << "Layer " << layer << ". (" << rows << ", " << cols << ")." 
<< endl; 70 | CvSize size; 71 | size.height = rows; 72 | size.width = cols; 73 | IplImage* W = cvCreateImage(size, IPL_DEPTH_64F, 1); 74 | double* imageData = (double*)W->imageData; 75 | int step = W->widthStep; 76 | for (int i = 0; i < size.height; i++) { 77 | getline(ifs, line); 78 | str = (char*)line.c_str(); 79 | char* token = strtok(str, ","); 80 | for (int j = 0; j < size.width; j++) { 81 | (*imageData++) = atof(token); 82 | token = strtok(NULL, ","); 83 | } 84 | imageData += step / sizeof(double) - size.width; 85 | } 86 | g_weights.push_back(W); 87 | 88 | size.height = 1; 89 | size.width = cols; 90 | IplImage* result = cvCreateImage(size, IPL_DEPTH_64F, 1); 91 | g_results.push_back(result); 92 | } else { 93 | int elems; 94 | sscanf(str, "%d", &elems); 95 | cout << "Layer " << layer << ". Bias length (" << elems << ")." << endl; 96 | 97 | CvSize size; 98 | size.height = 1; 99 | size.width = elems; 100 | IplImage* b = cvCreateImage(size, IPL_DEPTH_64F, 1); 101 | double* imageData = (double*)b->imageData; 102 | int step = b->widthStep; 103 | for (int i = 0; i < size.height; i++) { 104 | getline(ifs, line); 105 | str = (char*)line.c_str(); 106 | char* token = strtok(str, ","); 107 | for (int j = 0; j < size.width; j++) { 108 | (*imageData++) = atof(token); 109 | token = strtok(NULL, ","); 110 | } 111 | imageData += step / sizeof(double) - size.width; 112 | } 113 | g_biases.push_back(b); 114 | layer++; 115 | } 116 | } 117 | } 118 | 119 | int classify(IplImage* image) { 120 | IplImage* input = image; 121 | int nLayers = g_weights.size(); 122 | for (int i = 0; i < nLayers; i++) { 123 | IplImage* W = g_weights[i]; 124 | IplImage* b = g_biases[i]; 125 | IplImage* result = g_results[i]; 126 | 127 | cvMatMulAdd(input, W, b, result); 128 | CvSize size = cvGetSize(result); 129 | double* imageData = (double*)result->imageData; 130 | for (int j = 0; j < size.width; j++) { 131 | if (i == nLayers - 1) 132 | *imageData = exp(*imageData); 133 | else 134 | *imageData = tanh(*imageData); 135 | imageData++; 136 | } 137 | if (i == nLayers - 1) { 138 | CvScalar sum = cvSum(result); 139 | double scale = 1.0 / sum.val[0]; 140 | cvConvertScale(result, result, scale, 0.0); 141 | } 142 | input = result; 143 | } 144 | 145 | double min, max; 146 | CvPoint minIndex, maxIndex; 147 | cvMinMaxLoc(input, &min, &max, &minIndex, &maxIndex); 148 | // cout << "Min = " << min << ", Max = " << max << endl; 149 | // cout << "Min index = (" << minIndex.x << ", " << minIndex.y << ")" << endl; 150 | // cout << "Max index = (" << maxIndex.x << ", " << maxIndex.y << ")" << endl; 151 | return maxIndex.x + 1; 152 | } 153 | 154 | int main(int argc, char** argv) { 155 | string modelsFileName = ""; 156 | string imageFileName = ""; 157 | string imageDirectory = ""; 158 | int x = Globals::imgWidth / 2; 159 | int y = Globals::imgHeight / 2; 160 | 161 | for (int i = 1; i < argc; i++) { 162 | if (!strcmp(argv[i], "-i")) 163 | imageFileName = argv[i + 1]; 164 | else if (!strcmp(argv[i], "-d")) 165 | imageDirectory = argv[i + 1]; 166 | else if (!strcmp(argv[i], "-m")) 167 | modelsFileName = argv[i + 1]; 168 | else if (!strcmp(argv[i], "-c")) { 169 | char* str = argv[i + 1]; 170 | char* token = strtok(str, "(),"); 171 | if (token) 172 | x = atoi(token); 173 | token = strtok(NULL, "(),"); 174 | if (token) 175 | y = atoi(token); 176 | } 177 | else if (!strcmp(argv[i], "-h")) { 178 | cout << 179 | "Usage: classify -i -m " << endl; 180 | return 0; 181 | } 182 | } 183 | 184 | if (modelsFileName == "") { 185 | cout << 186 | "Usage: classify -i -m " 
<< endl; 187 | return -1; 188 | } 189 | 190 | CvPoint center; 191 | center.x = x; 192 | center.y = y; 193 | 194 | CvSize size; 195 | size.width = Globals::roiWidth; 196 | size.height = Globals::roiHeight; 197 | 198 | double scale = 0.3; 199 | 200 | try { 201 | loadModel(modelsFileName); 202 | 203 | if (imageFileName != "") { 204 | Preprocess preprocess(size, scale, center, roiFunction); 205 | IplImage* image = cvLoadImage(imageFileName.c_str()); 206 | IplImage* imageVector = preprocess.generateImageVector(image); 207 | 208 | cout << "Sector " << classify(imageVector) << endl; 209 | cvReleaseImage(&image); 210 | cvReleaseImage(&imageVector); 211 | } else if (imageDirectory != "") { 212 | int counts[5][6]; 213 | for (int i = 0; i < 5; i++) 214 | for (int j = 0; j < 6; j++) 215 | counts[i][j] = 0; 216 | 217 | string annotationsFileName = imageDirectory + "/annotations.xml"; 218 | Annotations annotations; 219 | annotations.readAnnotations(annotationsFileName); 220 | CvPoint& center = annotations.getCenter(); 221 | 222 | Preprocess preprocess(size, scale, center, roiFunction); 223 | vector& frameAnnotations = annotations.getFrameAnnotations(); 224 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) { 225 | FrameAnnotation* fa = frameAnnotations[i]; 226 | fa->setFace(center); 227 | 228 | int expectedZone = fa->getSector(); 229 | counts[expectedZone - 1][5]++; 230 | 231 | // compose filename and update map 232 | char buffer[256]; 233 | sprintf(buffer, "frame_%d.png", fa->getFrameNumber()); 234 | string simpleName = buffer; 235 | string fileName = imageDirectory + "/" + simpleName; 236 | IplImage* image = cvLoadImage(fileName.c_str()); 237 | IplImage* imageVector = preprocess.generateImageVector(image); 238 | 239 | int zone = classify(imageVector); 240 | if (expectedZone == zone) 241 | counts[zone - 1][zone - 1]++; 242 | else 243 | counts[expectedZone - 1][zone - 1]++; 244 | 245 | cvReleaseImage(&image); 246 | cvReleaseImage(&imageVector); 247 | } 248 | cout << "Errors by class" << endl; 249 | for (int i = 0; i < 5; i++) { 250 | for (int j = 0; j < 6; j++) 251 | cout << counts[i][j] << "\t"; 252 | cout << endl; 253 | } 254 | } 255 | } catch (string err) { 256 | cout << err << endl; 257 | } 258 | 259 | return 0; 260 | } 261 | -------------------------------------------------------------------------------- /utils/src/xmlToIDX.cpp: -------------------------------------------------------------------------------- 1 | // test.cpp 2 | // Code that tests the various pieces of functionality we have for gaze tracking 3 | 4 | #include 5 | #include 6 | #include 7 | #include 8 | #include 9 | 10 | #include "Preprocess.h" 11 | 12 | using namespace std; 13 | 14 | // roiFunction 15 | // We support the use of LOIs identified to cull images to smaller regions of interest 16 | // (ROI) for use in locating future LOIs. This function is passed to the constructors 17 | // of the filter and classifier classes. Those classes in turn call this function when 18 | // they need a culled image. The input parameters are the original image, a frame 19 | // annotation object that is annotated with all the LOIs that we have found before this 20 | // function gets called. The offset parameter is an output parameter that contains the 21 | // offset of the ROI within the image. 
The function returns a culled image object 22 | 23 | IplImage* roiFunction(IplImage* image, FrameAnnotation& fa, 24 | CvPoint& offset, Annotations::Tag xmlTag) { 25 | offset.x = 0; 26 | offset.y = 0; 27 | 28 | CvPoint& location = fa.getLOI(xmlTag); 29 | offset.y = location.y - (Globals::roiHeight / 2); 30 | offset.x = location.x - (Globals::roiWidth / 2); 31 | 32 | // now check if the roi overflows the image boundary. If it does then 33 | // we move it so that it is contained within the image boundary 34 | if (offset.x + Globals::roiWidth > Globals::imgWidth) 35 | offset.x = Globals::imgWidth - Globals::roiWidth; 36 | if (offset.x < 0) 37 | offset.x = 0; 38 | if (offset.y + Globals::roiHeight > Globals::imgHeight) 39 | offset.y = Globals::imgHeight - Globals::roiHeight; 40 | if (offset.y < 0) 41 | offset.y = 0; 42 | 43 | cvSetImageROI(image, cvRect(offset.x, offset.y, Globals::roiWidth, Globals::roiHeight)); 44 | IplImage* roi = cvCreateImage(cvGetSize(image), image->depth, image->nChannels); 45 | cvCopy(image, roi); 46 | cvResetImageROI(image); 47 | 48 | return roi; 49 | } 50 | 51 | int main(int argc, char** argv) { 52 | string outputFileName = ""; 53 | vector dataDirs; 54 | vector testAnnotations; 55 | double trainingFraction = 0; 56 | double validationFraction = 0; 57 | double additionalValidationFraction = 0; 58 | double additionalTestFraction = 0; 59 | map statusFilter; 60 | map intersectionFilter; 61 | double binaryThreshold = 1; 62 | bool inBinaryFormat = false; 63 | bool useBins = false; 64 | 65 | for (int i = 1; i < argc; i++) { 66 | if (!strcmp(argv[i], "-o")) 67 | outputFileName = argv[i + 1]; 68 | else if (!strcmp(argv[i], "-r")) 69 | trainingFraction = atof(argv[i + 1]); 70 | else if (!strcmp(argv[i], "-v")) 71 | validationFraction = atof(argv[i + 1]); 72 | else if (!strcmp(argv[i], "-d")) 73 | dataDirs.push_back(argv[i + 1]); 74 | else if (!strcmp(argv[i], "-f")) 75 | testAnnotations.push_back(argv[i + 1]); 76 | else if (!strcmp(argv[i], "-status")) 77 | statusFilter[atoi(argv[i + 1])] = true; 78 | else if (!strcmp(argv[i], "-inter")) 79 | intersectionFilter[atoi(argv[i + 1])] = true; 80 | else if (!strcmp(argv[i], "-usebins")) 81 | useBins = true; 82 | else if (!strcmp(argv[i], "-av")) 83 | additionalValidationFraction = atof(argv[i + 1]); 84 | else if (!strcmp(argv[i], "-at")) 85 | additionalTestFraction = atof(argv[i + 1]); 86 | else if (!strcmp(argv[i], "-b")) { 87 | binaryThreshold = atof(argv[i + 1]); 88 | inBinaryFormat = true; 89 | } else if (!strcmp(argv[i], "-h")) { 90 | cout << 91 | "Usage: xmlToIDX -o -r -v " << 92 | " -status -inter [-d ]+" << 93 | " [-b binaryThreshold] [-usebins] [-f testAnnotationFileName]" << 94 | " [-av ]" << 95 | " [-at ] [-h for usage]" << endl; 96 | return 0; 97 | } 98 | } 99 | 100 | if (outputFileName == "") { 101 | cout << 102 | "Usage: xmlToIDX -o -r -v " << 103 | " -status -inter [-d ]+" << 104 | " [-b binaryThreshold] [-usebins] [-f testAnnotationFileName]" << 105 | " [-av ]" << 106 | " [-at ] [-h for usage]" << endl; 107 | return -1; 108 | } 109 | 110 | CvPoint center; 111 | center.x = Globals::roiWidth / 2; 112 | center.y = Globals::roiHeight / 2; 113 | 114 | CvSize size; 115 | size.width = Globals::roiWidth; 116 | size.height = Globals::roiHeight; 117 | 118 | double scale = 0.3; 119 | 120 | try { 121 | Preprocess preprocess(outputFileName, size, scale, center, 122 | statusFilter, intersectionFilter, 123 | roiFunction, useBins, inBinaryFormat, binaryThreshold); 124 | 125 | for (unsigned int i = 0; i < dataDirs.size(); i++) 126 | 
preprocess.addTrainingSet(dataDirs[i]); 127 | for (unsigned int i = 0; i < testAnnotations.size(); i++) 128 | preprocess.addTestSet(testAnnotations[i], additionalValidationFraction, 129 | additionalTestFraction); 130 | 131 | if (useBins) 132 | preprocess.generate(trainingFraction, validationFraction); 133 | else 134 | preprocess.generateSequences(trainingFraction, validationFraction); 135 | } catch (string err) { 136 | cout << err << endl; 137 | } 138 | 139 | return 0; 140 | } 141 | --------------------------------------------------------------------------------
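Note on the generated IDX files: Preprocess::openFiles writes a 16-byte header into each data file (a magic value, a sample count that closeFiles patches in afterwards, then the scaled width and height) and an 8-byte header into each label file. Each field is written through the native-endian IntBytesT union, so a reader should use the same convention rather than assume the big-endian layout of the original MNIST files. The sketch below is illustrative only and not part of the repository (file and program names are hypothetical); it reads such a data header back on a machine with the same endianness as the writer.

    // readIdxHeader.cpp -- illustrative sketch that reads back the header written
    // by Preprocess::openFiles into a data-<suffix> file
    #include <fstream>
    #include <iostream>

    int main(int argc, char** argv) {
      if (argc < 2) {
        std::cerr << "usage: readIdxHeader <data file>" << std::endl;
        return 1;
      }
      std::ifstream in(argv[1], std::ios::binary);
      // field order follows openFiles: magic, sample count (patched by closeFiles),
      // scaled width, scaled height
      unsigned int fields[4];
      in.read(reinterpret_cast<char*>(fields), sizeof(fields));
      if (!in) {
        std::cerr << "could not read a full 16-byte header" << std::endl;
        return 1;
      }
      std::cout << "magic   0x" << std::hex << fields[0] << std::dec << std::endl;
      std::cout << "samples " << fields[1] << std::endl;
      std::cout << "width   " << fields[2] << std::endl;
      std::cout << "height  " << fields[3] << std::endl;
      return 0;
    }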