├── README.md
├── dl
│   ├── genmlp.py
│   ├── logistic_sgd.py
│   └── pickler.py
├── scripts
│   └── process.sh
└── utils
    └── src
        ├── Annotations.cpp
        ├── Annotations.h
        ├── DataStream.py
        ├── GazeTracker.cpp
        ├── GazeTracker.h
        ├── Globals.cpp
        ├── Globals.h
        ├── Makefile
        ├── Preprocess.cpp
        ├── Preprocess.h
        ├── annotator.cpp
        ├── classify.cpp
        └── xmlToIDX.cpp
/README.md:
--------------------------------------------------------------------------------
1 | DeepLearning
2 | ============
3 |
4 | Code to build MLP models for outdoor head orientation tracking
5 |
6 | The following is the set of directories with a description of what is
7 | in each.
8 |
9 | ### dl
10 | The directory with the python files to train and test an MLP.
11 |
12 | The file *genmlp.py* is based on the mlp.py that is part of the Theano
13 | documentation. It is more general purpose in that one can configure
14 | a network of arbitrary depth and number of nodes per layer. It also
15 | implements a sliding window for training that makes it possible to train
16 | on data sets of arbitrary size with limited GPU memory.
17 |
18 | The file *logistic_sgd.py* comes with a reporting function that builds a
19 | matrix of classification results on the test set, showing for each class
20 | the number of correctly classified frames and the distribution of the
21 | incorrectly classified frames across the other classes.
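
For a five-class problem the report is a 5x6 matrix (see *errorReport* in
logistic_sgd.py): row *i* describes the test frames whose true class is *i*,

    row i:  [ n_i0  n_i1  n_i2  n_i3  n_i4  N_i ]

where *n_ij* is the number of frames of class *i* that were classified as
class *j* (so *n_ii* is the correctly classified count) and the last column
*N_i* is the total number of test frames of class *i*.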
22 |
23 | The file *pickler.py* has a number of helper methods that can be used to
24 | build data files that conform to the Theano input format.
25 | The file takes as input files in the MNIST IDX format. It can be used
26 | to chunk data sets into multiple sets of files: one for training, one
27 | for validation, and one for test.
28 |
29 | ### utils
30 | The directory with C++ code that can be used to generate datasets in
31 | the MNIST IDX format from labeled data. The labels correspond to a
32 | partition of the space in front of a driver in a car, with the
33 | following values,
34 |
35 | 1. Driver window
36 | 2. Left of center
37 | 3. Straight ahead
38 | 4. Right of center
39 | 5. Passenger window
40 |
41 | Given a video of the driver, an annotation file for that video has the
42 | following format,
43 |
44 |     <annotations>
45 |       <frame>
46 |         <frameNumber>1</frameNumber>
47 |         <center>0,0</center>
48 |         <zone>9</zone>
49 |         <status>1</status>
50 |         <intersection>4</intersection>
51 |       </frame>
52 |       <frame>
53 |         <frameNumber>2</frameNumber>
54 |         <center>0,0</center>
55 |         <zone>9</zone>
56 |         <status>1</status>
57 |         <intersection>4</intersection>
58 |       </frame>
59 |       ...
60 |       ...
61 |     </annotations>
62 |
63 |
64 | where the directory is expected to contain frames from the video with
65 | filenames of the form *'frame_%d.png'%frameNumber*. Each video frame is a
66 | 640x480 image file; the zone indicates the class, the status indicates
67 | the car status, and the intersection indicates the type of intersection.
68 | For the purposes of building the data sets, we only use the zone information
69 | at this point. The center is expected to be the rough center of the location
70 | of the face in each frame.
71 |
72 | The pre-processing done on each image is as follows (a sketch in code follows the list),
73 |
74 | 1. A Region of Interest (ROI) of configurable size (Globals.cpp) is picked
75 | around the image center.
76 | 2. A histogram equalization followed by edge detection is performed.
77 | 3. A DC suppression using a sigmoid is then done.
78 | 4. A Gaussian window function is applied around the center.
79 | 5. The image is scaled and a vector is generated from the image matrix in
80 |    row-major order.
81 |
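The sketch below restates these steps in Python with OpenCV purely for
illustration; the actual implementation is the C++ code in Preprocess.cpp, and
the ROI size, the use of a Sobel gradient as the edge detector, the sigmoid
constants, the Gaussian sigma, and the output scale are all assumptions here
(the real values are configured in Globals.cpp).

    import cv2
    import numpy as np

    def preprocess(frame, center, roiSize=128, outSize=100):
        # assumed sizes; the real ROI and output sizes live in Globals.cpp
        cx, cy = center
        half = roiSize // 2
        # 1. Region of Interest around the annotated face center
        roi = frame[max(cy - half, 0):cy + half, max(cx - half, 0):cx + half]
        gray = cv2.cvtColor(roi, cv2.COLOR_BGR2GRAY)
        # 2. Histogram equalization followed by edge detection
        #    (a Sobel gradient magnitude stands in for the edge detector)
        eq = cv2.equalizeHist(gray)
        gx = cv2.Sobel(eq, cv2.CV_32F, 1, 0)
        gy = cv2.Sobel(eq, cv2.CV_32F, 0, 1)
        mag = cv2.magnitude(gx, gy)
        mag = mag / (mag.max() + 1e-6)
        # 3. DC suppression with a sigmoid: low responses are squashed toward 0
        suppressed = 1.0 / (1.0 + np.exp(-10.0 * (mag - 0.5)))
        # 4. Gaussian window around the center to de-emphasize the borders
        rows, cols = suppressed.shape
        window = cv2.getGaussianKernel(rows, rows / 4.0) * \
                 cv2.getGaussianKernel(cols, cols / 4.0).T
        windowed = suppressed * (window / window.max())
        # 5. Scale and flatten to a row-major vector
        scaled = cv2.resize(windowed, (outSize, outSize))
        return scaled.flatten().astype(np.float32)
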
82 | ### Build
83 |
84 | Do a *make mode=opt* in utils/src to build optimized executables. The only dependency is *OpenCV*. This builds everything and places the executables in an *install* directory under DeepLearning.
85 |
86 | #### Data generation
87 | To generate data sets, use the following commands,
88 |
89 |     xmlToIDX
90 |       -o <outputFileNameSuffix>
91 |       -r <trainingFraction>
92 |       -v <validationFraction>
93 |       -status <carStatus>
94 |       -inter <intersectionType>
95 |       [-d <directoryOfImages>]+
96 |       [-b binaryThreshold]
97 |       [-usebins]
98 |       [-h for usage]
99 |
100 | If the outputFileNameSuffix is ubyte, then run the following command to generate pickled numpy arrays from the IDX data sets,
101 |
102 |     python pickler.py data-train-ubyte label-train-ubyte data-valid-ubyte label-valid-ubyte data-test-ubyte label-test-ubyte gaze_data.pkl
103 |
104 | which will generate sets of training, validation, and test files with the prefix *gaze_data.pkl*. The number of files generated in each set will depend on the chunking size used in pickler.py. The data is broken up into chunks and files are generated one per chunk; as an example the set of test files will be *gaze_data.pkl_test_%d.gz*, with the integer argument in range(numberOfChunks). The first command builds the IDX format data sets. The second converts them into a numpy array of tuples, with each tuple being an array of data points and an array of labels. We have one tuple for the training data, one for validation, and one for test.
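
Each generated chunk can be inspected directly. The snippet below is a minimal
sketch that assumes, as the loaders in pickler.py suggest, that every chunk
holds a single (data, labels) tuple of numpy arrays; the file name is just the
first test chunk from the naming scheme above.

    import gzip, cPickle

    # load one test chunk: data is a matrix of flattened images, labels a vector
    f = gzip.open('gaze_data.pkl_test_0.gz', 'rb')
    data, labels = cPickle.load(f)
    f.close()
    print data.shape, labels.shape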
105 |
106 | The options to xmlToIDX are as follows,
107 |
108 | * -o is the suffix to use for all generated files
109 | * -r is the training fraction in the interval [0, 1)
110 | * -v is the validation fraction in the interval (0, 1)
111 | * -usebins is used to bin the data based on their labels. We generate as many
112 | data points as min_{l \in labels} |D_l|, where D_l is the set of data
113 | points with label l; in other words we pick as many data points as the
114 | cardinality of the smallest set of data points across all labels. This is to
115 | prevent our network from being biased to class label 3, which is straight
116 | ahead. A large fraction of the frames have the driver facing straight ahead,
117 | which causes an enormous bias during training without binning.
118 | * -d a directory of images for training. An annotation file called
119 | annotations.xml is expected to be present in each such directory.
120 | * -b specifies a binary threshold used to binarize the image pixel data: all pixel values above the threshold are treated as 1 and the rest as 0.
121 | * -status is used to pick only those frames that have a car status annotation that matches what follows this flag
122 | * -intersection is used to pick only those frames that have the intersection annotation that matches what follows this flag
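
For example (the directory names and the status/intersection values below are
purely illustrative), a run that builds binned ubyte data sets from two
annotated directories might look like,

    xmlToIDX -o ubyte -r 0.8 -v 0.1 -status 1 -inter 1 -usebins \
        -d /data/drive1 -d /data/drive2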
123 |
124 | The second command builds the tuples of numpy arrays as required by the
125 | Theano based trainer. This one takes as input the training, validation,
126 | and test data and label files with the prefix to use for the generated
127 | file names.
128 |
129 | ### Training and classification
130 |
131 | Training and classification can be done using genmlp.py. The following
132 | command will train a network and generate a report with the validation
133 | error rate, test error rate, and the distribution of the numbers of
134 | frames across all classes together with the expected number of frames
135 | per class.
136 |
137 |     THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python genmlp.py
138 |       (-d datasetDir | -f datasetFileName)
139 |       [-p prefix]
140 |       [-b batchSize]
141 |       [-nt nTrainFiles]
142 |       [-nv nValidFiles]
143 |       [-ns nTestFiles]
144 |       [-i inputLength]
145 |       [-o numClasses]
146 |       [-gen modelFileName]
147 |       [-l [nLayer1Size, nLayer2Size, ...]]
148 |       [-classify]
149 |       [-useparams paramsFileName]
150 |       [-h help]
151 |
152 | The options are,
153 |
154 | * -d the directory that contains the data sets
155 | * -f a single file that contains the complete pickled data. This is useful when the data sets are small enough to be pickled into one file
156 | * -p the file name prefix for the files names that hold the data sets
157 | * -nt the number of training files
158 | * -nv the number of validation files
159 | * -ns the number of test files
160 | * -l the configuration of the hidden layers in the network: one hidden
161 | layer per comma-separated element, with each element giving the size of
162 | that hidden layer
163 | * -o the number of labels
164 | * -i the input dimension of the data
165 | * -gen to generate the trained model for use outside Theano. The model is
166 | written as a text file. We also generate a pickled file called params.gz in
167 | the training data set directory that contains the numpy weights and biases
168 | of all hidden layers and the final logistic layer.
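
As an example (the directory, layer sizes, and file counts are illustrative),
training a two-hidden-layer network on 100x100 images of the five gaze classes
could be run as,

    THEANO_FLAGS=mode=FAST_RUN,device=gpu,floatX=float32 python genmlp.py \
        -d ./data -p gaze_data.pkl -b 20 -nt 12 -nv 1 -ns 1 \
        -i 10000 -o 5 -l 500,100 -gen gaze_model.txt

To classify a data set with a previously trained model, add *-classify* and
*-useparams* pointing at the params.gz file that a training run writes into
the data set directory when *-gen* is given.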
169 |
170 | For questions please send mail to: vishwa.raman@west.cmu.edu
171 |
172 | Thanks for looking.
173 |
174 |
175 |
--------------------------------------------------------------------------------
/dl/genmlp.py:
--------------------------------------------------------------------------------
1 | """
2 | genmlp.py
3 | This file contains an implementation for the training of a configurable
4 | multi-layer perceptron. It is a generalized implementation of mlp.py
5 | from the theano DeepLearningTutorials. The specific enhancements include,
6 |
7 | 1. Command line configurable network
8 | 2. Ability to generate the model as a pickled file and as a text file
9 | 3. A sliding window implementation to handle large data sets on any
10 |    GPU by selectively loading data into available GPU memory
11 | 4. A reporting infrastructure that shows the expected vs. classified
12 |    classes for all test data points, tracking not only how many data
13 |    points were misclassified but also their distribution over classes
14 |
15 | The training proceeds in the following manner,
16 |
17 | 1. The first hidden layer is paired with a logistic layer and the
18 | parameters are trained
19 | 2. For all subsequent hidden layers, the following training steps
20 | are followed,
21 | a. Drop the parameters of the logistic layer, but retain the
22 | parameter values of all hidden layers trained so far
23 | b. Add the next hidden layer and the logistic layer
24 | c. Train the parameters of the newly added hidden layer and the
25 | logistic layer
26 | 3. A final pass that includes all parameters is optional and is
27 | being done in the main function
28 |
29 | The model is the values of all weights and biases from the first to the
30 | last hidden layer and the logistic regressor.
31 |
32 | References:
33 |
34 | - textbooks: "Pattern Recognition and Machine Learning" -
35 | Christopher M. Bishop, section 5
36 |
37 | """
38 | __docformat__ = 'restructedtext en'
39 |
40 |
41 | import cPickle
42 | import gzip
43 | import os
44 | import sys
45 | import time
46 |
47 | import numpy
48 |
49 | import theano
50 | import theano.tensor as T
51 | from pickler import getLists, getPickledLists, getPickledList
52 |
53 | from logistic_sgd import LogisticRegression, load_data
54 |
55 | class HiddenLayer(object):
56 | def __init__(self, rng, input, n_in, n_out, W=None, b=None,
57 | activation=T.tanh):
58 | """
59 | Typical hidden layer of a MLP: units are fully-connected and have
60 | sigmoidal activation function. Weight matrix W is of shape (n_in,n_out)
61 | and the bias vector b is of shape (n_out,).
62 |
63 | NOTE : The nonlinearity used here is tanh
64 |
65 | Hidden unit activation is given by: tanh(dot(input,W) + b)
66 |
67 | :type rng: numpy.random.RandomState
68 | :param rng: a random number generator used to initialize weights
69 |
70 | :type input: theano.tensor.dmatrix
71 | :param input: a symbolic tensor of shape (n_examples, n_in)
72 |
73 | :type n_in: int
74 | :param n_in: dimensionality of input
75 |
76 | :type n_out: int
77 | :param n_out: number of hidden units
78 |
79 | :type activation: theano.Op or function
80 | :param activation: Non linearity to be applied in the hidden
81 | layer
82 | """
83 | self.input = input
84 |
85 |         # `W` is initialized with `W_values` which is uniformly sampled
86 |         # from -sqrt(6./(n_in+n_out)) to sqrt(6./(n_in+n_out))
87 |         # for the tanh activation function
88 |         # the output of uniform is converted using asarray to dtype
89 |         # theano.config.floatX so that the code is runnable on GPU
90 |         # Note : optimal initialization of weights is dependent on the
91 |         #        activation function used (among other things).
92 |         #        For example, results presented in [Xavier10] suggest that you
93 |         #        should use 4 times larger initial weights for sigmoid
94 |         #        compared to tanh
95 |         #        We have no info for other functions, so we use the same as
96 |         #        tanh.
97 | if W is None:
98 | W_values = numpy.asarray(rng.uniform(
99 | low=-numpy.sqrt(6. / (n_in + n_out)),
100 | high=numpy.sqrt(6. / (n_in + n_out)),
101 | size=(n_in, n_out)), dtype=theano.config.floatX)
102 | if activation == theano.tensor.nnet.sigmoid:
103 | W_values *= 4
104 |
105 | W = theano.shared(value=W_values, name='W%d'%n_out, borrow=True)
106 |
107 | if b is None:
108 | b_values = numpy.zeros((n_out,), dtype=theano.config.floatX)
109 | b = theano.shared(value=b_values, name='b%d'%n_out, borrow=True)
110 |
111 | self.W = W
112 | self.b = b
113 |
114 | self.lin_output = T.dot(input, self.W) + self.b
115 | self.output = (self.lin_output if activation is None
116 | else activation(self.lin_output))
117 | # parameters of the model
118 | self.params = [self.W, self.b]
119 |
120 | class MLP(object):
121 | """Multi-Layer Perceptron Class
122 |
123 | A multilayer perceptron is a feedforward artificial neural network model
124 | that has one layer or more of hidden units and nonlinear activations.
125 | Intermediate layers usually have as activation function tanh or the
126 | sigmoid function (defined here by a ``SigmoidalLayer`` class) while the
127 | top layer is a softmax layer (defined here by a ``LogisticRegression``
128 | class).
129 | """
130 |
131 | def __init__(self, rng, input, n_in, n_out, layers, weights, biases,
132 | includeAllParams = False, activation = T.tanh):
133 | """Initialize the parameters for the multilayer perceptron
134 |
135 | :type rng: numpy.random.RandomState
136 | :param rng: a random number generator used to initialize weights
137 |
138 | :type input: theano.tensor.TensorType
139 | :param input: symbolic variable that describes the input of the
140 | architecture (one minibatch)
141 |
142 | :type layers: numpy.array
143 | :param layers: the number of layers and the number of hidden units
144 | per layer
145 |
146 | :type weights: numpy.array of layer weights
147 | :param weights: the weights for each layer and the logistic layer
148 | (any or all elements may be None)
149 |
150 | :type biases: numpy.array of layer biases
151 | :param biases: the biases for each layer and the logistic layer
152 | (any or all elements may be None)
153 |
154 | :type includeAllParams: boolean
155 | :param includeAllParams: flag used to indicate that we want to
156 | include all parameters during training as opposed to just the
157 | top hidden layer and the logistic layer
158 | """
159 |
160 | print 'building MLP'
161 |
162 | # initialize hidden layers
163 | self.hiddenLayers = []
164 |
165 | # build hidden layers
166 | for i in range(len(layers)):
167 | if (i == 0):
168 | print 'Layer %d. n_in = %d, n_out = %d'%(i + 1, n_in, layers[i])
169 | self.hiddenLayers.append(HiddenLayer(rng=rng, input=input,
170 | n_in=n_in, n_out=layers[i],
171 | activation=activation,
172 | W=weights[i], b=biases[i]))
173 | else:
174 | print 'Layer %d. n_in = %d, n_out = %d'%(i + 1, layers[i - 1], layers[i])
175 | self.hiddenLayers.append(HiddenLayer(rng=rng, input=self.hiddenLayers[i - 1].output,
176 | n_in=layers[i - 1], n_out=layers[i],
177 | activation=activation,
178 | W=weights[i], b=biases[i]))
179 |
180 | lastHiddenLayer = self.hiddenLayers[-1]
181 |
182 | # The logistic regression layer gets as input the hidden units
183 | # of the last hidden layer
184 | self.logRegressionLayer = LogisticRegression(
185 | input=lastHiddenLayer.output,
186 | n_in=layers[-1],
187 | n_out=n_out,
188 | W=weights[-1], b=biases[-1])
189 |
190 | # The MLP implemented here is called Progressive MLP as it learns parameters
191 | # of hidden layers one at a time, re-using the results for all prior
192 | # layers from previous learning iterations
193 |
194 | # L1 norm ; one regularization option is to enforce L1 norm to
195 | # be small
196 | if (includeAllParams == True):
197 |             self.L1 = sum([abs(self.hiddenLayers[i].W).sum() for i in xrange(len(layers))]) + \
198 | abs(self.logRegressionLayer.W).sum()
199 | else:
200 | self.L1 = abs(lastHiddenLayer.W).sum() + abs(self.logRegressionLayer.W).sum()
201 |
202 | # square of L2 norm ; one regularization option is to enforce
203 | # square of L2 norm to be small
204 | if (includeAllParams == True):
205 | self.L2_sqr = sum([(self.hiddenLayers[i].W ** 2).sum() for i in xrange(len(layers))]) + \
206 | (self.logRegressionLayer.W ** 2).sum()
207 | else:
208 | self.L2_sqr = (lastHiddenLayer.W ** 2).sum() + (self.logRegressionLayer.W ** 2).sum()
209 |
210 | # negative log likelihood of the MLP is given by the negative
211 | # log likelihood of the output of the model, computed in the
212 | # logistic regression layer
213 | self.negative_log_likelihood = self.logRegressionLayer.negative_log_likelihood
214 |
215 | # same holds for the function computing the number of errors
216 | self.errors = self.logRegressionLayer.errors
217 |
218 | # the error report function gives more detailed info on how well we
219 | # did on the training data. It generates for each class the number of
220 | # data points that were correctly classified, the number of data points
221 | # that were expected to be classified as belonging to the class, and
222 |         # the numbers of data points that were incorrectly classified with their
223 | # distribution over all the unexpected classes
224 | self.errorReport = self.logRegressionLayer.errorReport
225 |
226 | # the parameters of the model are the parameters of the final two layers it is
227 | # made out of, that is the last hidden layer that was added followed by the
228 | # logistic layer
229 | self.params = self.logRegressionLayer.params
230 | if (includeAllParams == True):
231 | for i in range(len(layers)):
232 | self.params = self.params + self.hiddenLayers[i].params
233 | else:
234 | self.params = self.params + lastHiddenLayer.params
235 |
236 | class ProgressiveMLP(object):
237 | """Training class for MLP
238 |
239 | The class implements methods to train an MLP, test the models generated
240 | against the held-out (validation) set and apply the model on the test
241 | data when there is an improvement in error rates over the validation set.
242 | The class implements a sliding window over the training data to ensure
243 | that at any given point in time, we only have as much data in GPU memory
244 |     as determined by the capacity of the GPU memory.
245 |
246 | The class also provides a classify method that can be used to classify
247 | a given data set using parameters learned from the training cycle. This
248 | can be used to target the model to different test sets.
249 | """
250 |
251 | def __init__(self, n_in, n_out, layers, weights=None, biases=None, nBatchSize = 20,
252 | nWindowSize = 64000, includeAllParams = False):
253 | """Initialize the parameters for the multilayer perceptron trainer
254 |
255 | :type n_in: integer
256 | :param n_in: the number of inputs
257 |
258 | :type n_out: integer
259 | :param n_out: the number of outputs (classes)
260 |
261 | :type layers: numpy.array
262 | :param layers: the number of layers and the number of hidden units
263 | per layer
264 |
265 | :type weights: numpy.array of layer weights
266 | :param weights: the weights for each layer and the logistic layer
267 | (any or all elements may be None)
268 |
269 | :type biases: numpy.array of layer biases
270 | :param biases: the biases for each layer and the logistic layer
271 | (any or all elements may be None)
272 |
273 | :type nBatchSize: integer
274 | :param nBatchSize: the size of each minibatch in data points
275 |
276 | :type nWindowSize: integer
277 | :param nWindowSize: the size of the sliding window over the training
278 | data. This can be picked based on the size of the GPU memory
279 | """
280 |
281 | self.classifier = None
282 | self.nSharedLen = nWindowSize
283 | self.batchSize = nBatchSize
284 | self.datasets = None
285 | self.n_in = n_in
286 |
287 | # allocate symbolic variables for the data
288 | self.index = T.lscalar('index') # index to minibatch
289 | self.x = T.matrix('x') # the data is presented as rasterized images
290 | self.y = T.ivector('y') # the labels are presented as 1D vector of
291 | # [int] labels
292 |
293 | rng = numpy.random.RandomState(1234)
294 |
295 | # construct the MLP class
296 | self.classifier = MLP(rng=rng,
297 | input = self.x,
298 | n_in = n_in,
299 | n_out = n_out,
300 | layers = layers,
301 | weights = weights,
302 | biases = biases,
303 | includeAllParams = includeAllParams)
304 |
305 | def initializeSharedData(self, data_xy, length, borrow=True):
306 | """
307 | setup shared data for use on the GPU
308 |
309 | We allocate a numpy array that is as large as length and make it shared.
310 | All subsequent computations on the GPU use the sharedData_x and
311 | sharedData_y arrays. The length is configurable and should be so chosen
312 | that we can load as many elements in the available GPU memory
313 | """
314 | data_x, data_y = data_xy
315 | sharedData_x = theano.shared(numpy.asarray(data_x[:length],
316 | dtype=theano.config.floatX),
317 | borrow=borrow)
318 | sharedData_y = theano.shared(numpy.asarray(data_y[:length],
319 | dtype=theano.config.floatX),
320 | borrow=borrow)
321 | return sharedData_x, sharedData_y
322 |
323 | def getNumberOfSplitBatches(self):
324 | """
325 | Given a size of the arrays that are stored in the GPU memory, this
326 |         method returns the number of batches that can be accommodated in
327 | that size
328 | """
329 | return self.nSharedLen / self.batchSize
330 |
331 | def getWindowData(self, data_xy, miniBatchIndex):
332 | """
333 | method used to return a chunk of data from the data_xy that is as big
334 | as the size of our sliding window, based on a miniBatchIndex. The
335 | miniBatchIndex will range over all the minibatches in data_xy.
336 | Given a miniBatchIndex, we determine which data chunk contains
337 | that miniBatchIndex and return that chunk from data_xy
338 | """
339 | data_x, data_y = data_xy
340 | index = miniBatchIndex / self.getNumberOfSplitBatches()
341 | # print ' Returning data_xy[%d].'%(index)
342 | return data_x[index], data_y[index]
343 |
344 | def splitList(self, data_xy):
345 | """
346 | method used to split data_xy into chunks, where each chunk is as big
347 | as the window size (self.nSharedLen)
348 | """
349 | data_x, data_y = data_xy
350 |
351 | print 'in split. %d %d %d'%(len(data_x), len(data_y), len(data_x[0]))
352 | split_x = numpy.split(data_x, range(0, len(data_x), self.nSharedLen))
353 | split_y = numpy.split(data_y, range(0, len(data_y), self.nSharedLen))
354 |
355 | return split_x[1:], split_y[1:]
356 |
357 | def getNumberOfBatches(self, data_xy):
358 | """
359 | method used to get the total number of batches in data_xy. The
360 | method expects an array of chunks, walks each chunk and accumulates
361 | the number of batches in that chunk
362 | """
363 | data_x, data_y = data_xy
364 | nBatches = 0
365 | for i in range(0, len(data_x)):
366 | nBatches = nBatches + len(data_x[i]) / self.batchSize
367 | return nBatches
368 |
369 | def loadDataSets(self, datasets, datasetFileName, datasetDirectory, prefix,
370 | nTrainFiles, nValidFiles, nTestFiles):
371 | """
372 | method to load the data sets. The data sets are expected to either be in
373 | a single pickled file that contains training, validation, and test sets
374 | as an array of tuples, where each tuple is two arrays, one for the data
375 | and the other for the labels. Please refer to the MNIST data format for
376 | more on this. We follow the same format here.
377 |
378 | :type datasets: numpy.array of tuples
379 | :param datasets: an array of tuples one each for training, validation and test
380 |
381 | :type datasetFileName: string
382 | :param datasetFileName: the name of a pickled file that contains all the data
383 | May be None in which case we expect that the datasetDirectory points to
384 | the location of the data.
385 |
386 | :type datasetDirectory: string
387 | :param datasetDirectory: location of the data
388 |
389 | :type prefix: string
390 | :param prefix: the filename prefix to use for the data files. The filenames
391 | are composed using the prefix and one of _train_, _valid_, or _test_
392 | followed by an index ranging from 0 to the number of files for each.
393 | The number of training files, validation files and test files are also
394 | passed as additional arguments to this method.
395 |
396 | NOTE: We propose both mechanisms to load data as for smaller data sets
397 | the pickler can generate a single pickled file, but for large data sets
398 | we need to chunk the data up and pickle each chunk separately. This is
399 | due to a limitation in cPickle that cannot handle very large files
400 |
401 | """
402 | if datasets is None:
403 | # Load the dataset
404 | if (datasetFileName is not None):
405 | f = gzip.open(datasetFileName, 'rb')
406 | trainSet, validSet, testSet = cPickle.load(f)
407 | f.close()
408 | else:
409 | trainSet = getPickledList(datasetDirectory, prefix + '_train_', nTrainFiles)
410 | validSet = getPickledList(datasetDirectory, prefix + '_valid_', nValidFiles)
411 | testSet = getPickledList(datasetDirectory, prefix + '_test_', nTestFiles)
412 | self.datasets = (trainSet, validSet, testSet)
413 | else:
414 | self.datasets = datasets
415 |
416 | def classify(self, learningRate = 0.01, L1_reg = 0.00, L2_reg = 0.0001, n_epochs = 1000,
417 | datasetFileName = None,
418 | datasetDirectory = None,
419 | prefix = 'gaze_data.pkl',
420 | nTrainFiles = 1,
421 | nValidFiles = 1,
422 | nTestFiles = 1,
423 | datasets = None,
424 | batchSize = 20):
425 | """
426 | method used to classify a given test set against a model that is expected
427 | to have been loaded. The method is akin to a subset of the training method
428 | in that it simply computes the validation loss, test loss, and reports
429 | the test error, for a given model
430 | """
431 |
432 | if self.datasets is None:
433 | self.loadDataSets(datasets, datasetFileName, datasetDirectory, prefix,
434 | nTrainFiles, nValidFiles, nTestFiles)
435 |
436 | self.batchSize = batchSize
437 |
438 | validSet = self.datasets[1]
439 | testSet = self.datasets[2]
440 |
441 | validSet_x, validSet_y = self.initializeSharedData(self.datasets[1],len(validSet[0]))
442 | testSet_x, testSet_y = self.initializeSharedData(self.datasets[2], len(testSet[0]))
443 |
444 | # compute number of minibatches for validation and testing
445 | nValidBatches = len(validSet[0]) / self.batchSize
446 | nTestBatches = len(testSet[0]) / self.batchSize
447 |
448 | print nValidBatches
449 | print nTestBatches
450 |
451 | # compiling a Theano function that computes the mistakes that are made
452 | # by the model on a minibatch
453 | test_model = theano.function(inputs=[self.index],
454 | outputs=self.classifier.errors(self.y),
455 | givens={
456 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize],
457 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize],
458 | 'int32')})
459 |
460 | validate_model = theano.function(inputs=[self.index],
461 | outputs=self.classifier.errors(self.y),
462 | givens={
463 | self.x: validSet_x[self.index * batchSize:(self.index + 1) * batchSize],
464 | self.y: T.cast(validSet_y[self.index * batchSize:(self.index + 1) * batchSize],
465 | 'int32')})
466 |
467 | # error reporting function that computes the overall rate of misclassification
468 | # by class
469 | error_model = theano.function(inputs=[self.index],
470 | outputs=self.classifier.errorReport(self.y, batchSize),
471 | givens={
472 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize],
473 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize],
474 | 'int32')})
475 |
476 | validationLosses = [validate_model(i) for i
477 | in xrange(nValidBatches)]
478 | validationLoss = numpy.mean(validationLosses)
479 |
480 | # test it on the test set
481 | testLosses = [test_model(i) for i
482 | in xrange(nTestBatches)]
483 | testScore = numpy.mean(testLosses)
484 |
485 | print(('Best validation score of %f %% with test performance %f %%') %
486 | (validationLoss * 100., testScore * 100.))
487 | print('Classification errors by class')
488 | error_mat = [error_model(i) for i in xrange(nTestBatches)]
489 | class_errors = error_mat[0]
490 | for i in xrange(len(error_mat) - 1):
491 | class_errors = numpy.add(class_errors, error_mat[i + 1])
492 | print class_errors
493 |
494 | def train(self, learningRate = 0.01, L1_reg = 0.00, L2_reg = 0.0001, n_epochs = 1000,
495 | datasetFileName = None,
496 | datasetDirectory = None,
497 | prefix = 'gaze_data.pkl',
498 | nTrainFiles = 1,
499 | nValidFiles = 1,
500 | nTestFiles = 1,
501 | datasets = None,
502 | batchSize = 20):
503 | """
504 | method that trains the MLP
505 |
506 | The training data is accessed through the sliding window. For each
507 | epoch we walk through the training mini batches and compute a cost
508 | and update the model. Based on the validation frequency, the model
509 | is checked against the validation set and if there is an
510 | improvement at least as much as the improvement threshold, we check
511 | the model against the test set. Other pieces of code do things
512 | such as termination based on patience
513 | """
514 |
515 | if self.datasets is None:
516 | self.loadDataSets(datasets, datasetFileName, datasetDirectory, prefix,
517 | nTrainFiles, nValidFiles, nTestFiles)
518 |
519 | # compute the size of the window we would like to use
520 | self.nSharedLen = batchSize * 2000
521 | self.batchSize = batchSize
522 |
523 | trainSet_x, trainSet_y = self.initializeSharedData(self.datasets[0], self.nSharedLen)
524 |
525 | validSet = self.datasets[1]
526 | testSet = self.datasets[2]
527 |
528 | validSet_x, validSet_y = self.initializeSharedData(self.datasets[1],len(validSet[0]))
529 | testSet_x, testSet_y = self.initializeSharedData(self.datasets[2], len(testSet[0]))
530 |
531 | trainSet = self.splitList(self.datasets[0])
532 |
533 | # compute number of minibatches for training, validation and testing
534 | nTrainBatches = self.getNumberOfBatches(trainSet)
535 | nSplitTrainBatches = self.getNumberOfSplitBatches()
536 | nValidBatches = len(validSet[0]) / self.batchSize
537 | nTestBatches = len(testSet[0]) / self.batchSize
538 |
539 | print nTrainBatches
540 | print nSplitTrainBatches
541 | print nValidBatches
542 | print nTestBatches
543 |
544 | print '... building the model'
545 |
546 | # the cost we minimize during training is the negative log likelihood of
547 | # the model plus the regularization terms (L1 and L2); cost is expressed
548 | # here symbolically
549 | cost = self.classifier.negative_log_likelihood(self.y) \
550 | + L1_reg * self.classifier.L1 \
551 | + L2_reg * self.classifier.L2_sqr
552 |
553 | # compiling a Theano function that computes the mistakes that are made
554 | # by the model on a minibatch
555 | test_model = theano.function(inputs=[self.index],
556 | outputs=self.classifier.errors(self.y),
557 | givens={
558 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize],
559 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize],
560 | 'int32')})
561 |
562 | validate_model = theano.function(inputs=[self.index],
563 | outputs=self.classifier.errors(self.y),
564 | givens={
565 | self.x: validSet_x[self.index * batchSize:(self.index + 1) * batchSize],
566 | self.y: T.cast(validSet_y[self.index * batchSize:(self.index + 1) * batchSize],
567 | 'int32')})
568 |
569 | # error reporting function that computes the overall rate of misclassification
570 | # by class
571 | error_model = theano.function(inputs=[self.index],
572 | outputs=self.classifier.errorReport(self.y, batchSize),
573 | givens={
574 | self.x: testSet_x[self.index * batchSize:(self.index + 1) * batchSize],
575 | self.y: T.cast(testSet_y[self.index * batchSize:(self.index + 1) * batchSize],
576 | 'int32')})
577 |
578 |         # compute the gradient of cost with respect to theta (stored in params)
579 | # the resulting gradients will be stored in a list gparams
580 | gparams = []
581 | for param in self.classifier.params:
582 | gparam = T.grad(cost, param)
583 | gparams.append(gparam)
584 |
585 | # specify how to update the parameters of the model as a dictionary
586 | updates = {}
587 | # given two list the zip A = [ a1,a2,a3,a4] and B = [b1,b2,b3,b4] of
588 | # same length, zip generates a list C of same size, where each element
589 | # is a pair formed from the two lists :
590 | # C = [ (a1,b1), (a2,b2), (a3,b3) , (a4,b4) ]
591 | for param, gparam in zip(self.classifier.params, gparams):
592 | updates[param] = param - learningRate * gparam
593 |
594 | # compiling a Theano function `train_model` that returns the cost, but
595 | # in the same time updates the parameter of the model based on the rules
596 | # defined in `updates`
597 | train_model = theano.function(inputs=[self.index], outputs=cost,
598 | updates=updates,
599 | givens={
600 | self.x: trainSet_x[self.index * batchSize:(self.index + 1) * batchSize],
601 | self.y: T.cast(trainSet_y[self.index * batchSize:(self.index + 1) * batchSize],
602 | 'int32')})
603 |
604 | print '... training'
605 |
606 | # early-stopping parameters
607 |         patience = 20000  # look at this many examples regardless
608 | patience_increase = 2 # wait this much longer when a new best is
609 | # found
610 | improvement_threshold = 0.995 # a relative improvement of this much is
611 | # considered significant
612 | validationFrequency = min(nTrainBatches, patience / 2)
613 | # go through this many
614 | # minibatche before checking the network
615 | # on the validation set; in this case we
616 | # check every epoch
617 |
618 | bestParams = None
619 | bestValidationLoss = numpy.inf
620 | bestIter = 0
621 | testScore = 0.
622 | startTime = time.clock()
623 |
624 | epoch = 0
625 | doneLooping = False
626 |
627 | while (epoch < n_epochs) and (not doneLooping):
628 | epoch = epoch + 1
629 | for minibatchIndex in xrange(nTrainBatches):
630 |
631 | actualMiniBatchIndex = minibatchIndex % nSplitTrainBatches
632 | # print ' actualMiniBatchIndex = %d. miniBatchIndex = %d'\
633 | # %(actualMiniBatchIndex, minibatchIndex)
634 | if (actualMiniBatchIndex == 0):
635 | data_x, data_y = self.getWindowData(trainSet, minibatchIndex)
636 | # print ' Update. data_x[0][0] = %f, data_y[0] = %d.'%(data_x[0][0], data_y[0])
637 | trainSet_x.set_value(data_x, borrow=True)
638 | trainSet_y.set_value(numpy.asarray(data_y,
639 | dtype=theano.config.floatX),
640 | borrow=True)
641 |
642 | minibatchAvgCost = train_model(actualMiniBatchIndex)
643 | # iteration number
644 | iter = epoch * nTrainBatches + minibatchIndex
645 |
646 | if (iter + 1) % validationFrequency == 0:
647 | # compute zero-one loss on validation set
648 | validationLosses = [validate_model(i) for i
649 | in xrange(nValidBatches)]
650 | thisValidationLoss = numpy.mean(validationLosses)
651 |
652 | print('epoch %i, minibatch %i/%i, validation error %f %%' %
653 | (epoch, minibatchIndex + 1, nTrainBatches,
654 | thisValidationLoss * 100.))
655 |
656 | # if we got the best validation score until now
657 | if thisValidationLoss < bestValidationLoss:
658 | #improve patience if loss improvement is good enough
659 | if thisValidationLoss < bestValidationLoss * \
660 | improvement_threshold:
661 | patience = max(patience, iter * patience_increase)
662 |
663 | bestValidationLoss = thisValidationLoss
664 | bestIter = iter
665 |
666 | # test it on the test set
667 | testLosses = [test_model(i) for i
668 | in xrange(nTestBatches)]
669 | testScore = numpy.mean(testLosses)
670 |
671 | print((' epoch %i, minibatch %i/%i, test error of '
672 | 'best model %f %%') %
673 | (epoch, minibatchIndex + 1, nTrainBatches,
674 | testScore * 100.))
675 |
676 | if patience <= iter:
677 | doneLooping = True
678 | break
679 |
680 | endTime = time.clock()
681 | print(('Optimization complete. Best validation score of %f %% '
682 | 'obtained at iteration %i, with test performance %f %%') %
683 | (bestValidationLoss * 100., bestIter, testScore * 100.))
684 | print('Classification errors by class')
685 | error_mat = [error_model(i) for i in xrange(nTestBatches)]
686 | class_errors = error_mat[0]
687 | for i in xrange(len(error_mat) - 1):
688 | class_errors = numpy.add(class_errors, error_mat[i + 1])
689 | print class_errors
690 | print >> sys.stderr, ('The code for file ' +
691 | os.path.split(__file__)[1] +
692 | ' ran for %.2fm' % ((endTime - startTime) / 60.))
693 |
694 | if __name__ == '__main__':
695 | datasetDirectory = None
696 | datasetFileName = None
697 | batchSize = 20
698 | nTrainFiles = 12
699 | nValidFiles = 1
700 | nTestFiles = 1
701 | layers = None
702 | inputs = 1000
703 | outputs = 5
704 | doClassify = False
705 | useParamsFromFile = False
706 | paramsFileName = None
707 | genModels = False
708 | modelsFileName = None
709 | prefix = 'gaze_data.pkl'
710 |
711 | for i in range(len(sys.argv)):
712 | if sys.argv[i] == '-d':
713 | datasetDirectory = sys.argv[i + 1]
714 | elif sys.argv[i] == '-f':
715 | datasetDirectory = None
716 | datasetFileName = sys.argv[i + 1]
717 | elif sys.argv[i] == '-b':
718 | batchSize = int(sys.argv[i + 1])
719 | elif sys.argv[i] == '-nt':
720 | nTrainFiles = int(sys.argv[i + 1])
721 | elif sys.argv[i] == '-nv':
722 | nValidFiles = int(sys.argv[i + 1])
723 | elif sys.argv[i] == '-ns':
724 | nTestFiles = int(sys.argv[i + 1])
725 | elif sys.argv[i] == '-p':
726 | prefix = sys.argv[i + 1]
727 | elif sys.argv[i] == '-l':
728 | l = sys.argv[i + 1]
729 | li = l.split(',')
730 | layers = numpy.array(li, dtype=numpy.int64)
731 | elif sys.argv[i] == '-o':
732 | outputs = int(sys.argv[i + 1])
733 | elif sys.argv[i] == '-i':
734 | inputs = int(sys.argv[i + 1])
735 | elif sys.argv[i] == '-classify':
736 | doClassify = True
737 | elif sys.argv[i] == '-gen':
738 | genModels = True
739 | modelsFileName = sys.argv[i + 1]
740 | elif sys.argv[i] == '-useparams':
741 | useParamsFromFile = True
742 | paramsFileName = sys.argv[i + 1]
743 | elif sys.argv[i] == '-h':
744 | print('Usage: mlp.py (-d datasetDir | -f datasetFileName) [-p prefix] [-b batchSize]' +
745 | '[-nt nTrainFiles] [-nv nValidFiles] [-ns nTestFiles]' +
746 | '[-i inputLength] [-o numClasses] [-gen modelFileName]'+
747 | '[-l [nLayer1Size, nLayer2Size, ...]] [-classify] [-useparams paramsFileName]' +
748 | '[-h help]')
749 | sys.exit()
750 |
751 | if (doClassify == False):
752 | if (paramsFileName is not None):
753 | print 'loading parameters from ' + paramsFileName
754 | paramsFileHandle = gzip.open(paramsFileName, 'rb')
755 | params = cPickle.load(paramsFileHandle)
756 | weights, biases = params
757 | else:
758 | weights = []
759 | biases = []
760 | # + 1 for the logistic layer
761 | for i in range(len(layers) + 1):
762 | weights.append(None)
763 | biases.append(None)
764 |
765 | # initialize datasets
766 | datasets = None
767 |
768 | l = []
769 | for i in xrange(len(layers)):
770 | W = []
771 | b = []
772 | l.append(layers[i])
773 |
774 | for j in xrange(i):
775 | W.append(weights[j])
776 | b.append(biases[j])
777 | # One for the final hidden layer and another for the logistic layer
778 | W.extend([None, None])
779 | b.extend([None, None])
780 |
781 | mlp = ProgressiveMLP(n_in = inputs, n_out = outputs, layers = l,
782 | weights = W, biases = b)
783 | if (datasets is not None):
784 | mlp.train(datasets = datasets)
785 | else:
786 | mlp.train(
787 | datasetDirectory = datasetDirectory,
788 | datasetFileName = datasetFileName,
789 | prefix = prefix,
790 | batchSize = batchSize,
791 | nTrainFiles = nTrainFiles,
792 | nValidFiles = nValidFiles,
793 | nTestFiles = nTestFiles)
794 |
795 | weights = []
796 | biases = []
797 | for i in range(len(l)):
798 | weights.append(mlp.classifier.hiddenLayers[i].W)
799 | biases.append(mlp.classifier.hiddenLayers[i].b)
800 | datasets = mlp.datasets
801 |
802 | # final pass where we include all parameters
803 | weights.append(mlp.classifier.logRegressionLayer.W)
804 | biases.append(mlp.classifier.logRegressionLayer.b)
805 | mlp = ProgressiveMLP(n_in = inputs, n_out = outputs, layers = l,
806 | weights = weights, biases = biases,
807 | includeAllParams = True)
808 | mlp.train(datasets = datasets)
809 |
810 | # if we want the models written out then
811 | if (genModels == True):
812 | # first write out the models as numpy arrays using the pickler
813 | paramsFileName = datasetDirectory + '/params.gz'
814 | weights = []
815 | biases = []
816 | for i in xrange(len(mlp.classifier.hiddenLayers)):
817 | weights.append(mlp.classifier.hiddenLayers[i].W)
818 | biases.append(mlp.classifier.hiddenLayers[i].b)
819 | weights.append(mlp.classifier.logRegressionLayer.W)
820 | biases.append(mlp.classifier.logRegressionLayer.b)
821 |
822 | params = (weights, biases)
823 | paramsFileHandle = gzip.open(paramsFileName, 'wb')
824 | cPickle.dump(params, paramsFileHandle)
825 |
826 | # now write out the models as a text file for loading outside Python
827 | mfp = open(modelsFileName, 'wb')
828 | for i in xrange(len(mlp.classifier.hiddenLayers) + 1):
829 | W = weights[i].get_value()
830 | b = biases[i].get_value()
831 | mfp.write('%d,%d\n'%(W.shape[0], W.shape[1]))
832 | for j in xrange(W.shape[0]):
833 | for k in xrange(W.shape[1]):
834 | mfp.write('%f'%W[j][k])
835 | mfp.write(',')
836 | mfp.write('\n')
837 | mfp.write('%d\n'%b.shape[0])
838 | for j in xrange(b.shape[0]):
839 | mfp.write('%f'%b[j])
840 | mfp.write(',')
841 | mfp.write('\n')
842 | mfp.close()
843 | else:
844 | # classify mode
845 |         if (paramsFileName is None):
846 | print 'Parameters file is required for sequential MLP'
847 | sys.exit()
848 |
849 | print 'loading parameters from ' + paramsFileName
850 | paramsFileHandle = gzip.open(paramsFileName, 'rb')
851 | params = cPickle.load(paramsFileHandle)
852 | weights, biases = params
853 |
854 | mlp = ProgressiveMLP(n_in=inputs, n_out=outputs, layers=layers,
855 | weights=weights, biases=biases)
856 | mlp.classify(
857 | datasetDirectory=datasetDirectory,
858 | datasetFileName=datasetFileName,
859 | prefix=prefix,
860 | batchSize=batchSize,
861 | nTrainFiles=nTrainFiles,
862 | nValidFiles=nValidFiles,
863 | nTestFiles=nTestFiles)
864 |
865 |
--------------------------------------------------------------------------------
/dl/logistic_sgd.py:
--------------------------------------------------------------------------------
1 | """
2 | This tutorial introduces logistic regression using Theano and stochastic
3 | gradient descent.
4 |
5 | Logistic regression is a probabilistic, linear classifier. It is parametrized
6 | by a weight matrix :math:`W` and a bias vector :math:`b`. Classification is
7 | done by projecting data points onto a set of hyperplanes, the distance to
8 | which is used to determine a class membership probability.
9 |
10 | Mathematically, this can be written as:
11 |
12 | .. math::
13 | P(Y=i|x, W,b) &= softmax_i(W x + b) \\
14 | &= \frac {e^{W_i x + b_i}} {\sum_j e^{W_j x + b_j}}
15 |
16 |
17 | The output of the model or prediction is then done by taking the argmax of
18 | the vector whose i'th element is P(Y=i|x).
19 |
20 | .. math::
21 |
22 | y_{pred} = argmax_i P(Y=i|x,W,b)
23 |
24 |
25 | This tutorial presents a stochastic gradient descent optimization method
26 | suitable for large datasets, and a conjugate gradient optimization method
27 | that is suitable for smaller datasets.
28 |
29 |
30 | References:
31 |
32 | - textbooks: "Pattern Recognition and Machine Learning" -
33 | Christopher M. Bishop, section 4.3.2
34 |
35 | """
36 | __docformat__ = 'restructedtext en'
37 |
38 | import cPickle
39 | import gzip
40 | import os
41 | import sys
42 | import time
43 | import math
44 |
45 | import numpy
46 |
47 | import theano
48 | import theano.tensor as T
49 |
50 |
51 | class LogisticRegression(object):
52 | """Multi-class Logistic Regression Class
53 |
54 | The logistic regression is fully described by a weight matrix :math:`W`
55 | and bias vector :math:`b`. Classification is done by projecting data
56 | points onto a set of hyperplanes, the distance to which is used to
57 | determine a class membership probability.
58 | """
59 |
60 | def __init__(self, input, n_in, n_out, W=None, b=None):
61 | """ Initialize the parameters of the logistic regression
62 |
63 | :type input: theano.tensor.TensorType
64 | :param input: symbolic variable that describes the input of the
65 | architecture (one minibatch)
66 |
67 | :type n_in: int
68 | :param n_in: number of input units, the dimension of the space in
69 | which the datapoints lie
70 |
71 | :type n_out: int
72 | :param n_out: number of output units, the dimension of the space in
73 | which the labels lie
74 |
75 | """
76 |
77 | self.n_out = n_out
78 |
79 | if W is None:
80 | # initialize with 0 the weights W as a matrix of shape (n_in, n_out)
81 | self.W = theano.shared(value=numpy.zeros((n_in, n_out),
82 | dtype=theano.config.floatX),
83 | name='W', borrow=True)
84 | else:
85 | self.W = W
86 |
87 | if b is None:
88 |             # initialize the biases b as a vector of n_out 0s
89 | self.b = theano.shared(value=numpy.zeros((n_out,),
90 | dtype=theano.config.floatX),
91 | name='b', borrow=True)
92 | else:
93 | self.b = b
94 |
95 | # compute vector of class-membership probabilities in symbolic form
96 | self.p_y_given_x = T.nnet.softmax(T.dot(input, self.W) + self.b)
97 |
98 | # compute prediction as class whose probability is maximal in
99 | # symbolic form
100 | self.y_pred = T.argmax(self.p_y_given_x, axis=1)
101 |
102 | # parameters of the model
103 | self.params = [self.W, self.b]
104 |
105 | def negative_log_likelihood(self, y):
106 | """Return the mean of the negative log-likelihood of the prediction
107 | of this model under a given target distribution.
108 |
109 | .. math::
110 |
111 |         \mathcal{L} (\theta=\{W,b\}, \mathcal{D}) =
112 |             \frac{1}{|\mathcal{D}|} \sum_{i=0}^{|\mathcal{D}|} \log(P(Y=y^{(i)}|x^{(i)}, W,b)) \\
113 |         \ell (\theta=\{W,b\}, \mathcal{D}) = - \mathcal{L} (\theta=\{W,b\}, \mathcal{D})
114 |
115 | :type y: theano.tensor.TensorType
116 | :param y: corresponds to a vector that gives for each example the
117 | correct label
118 |
119 | Note: we use the mean instead of the sum so that
120 | the learning rate is less dependent on the batch size
121 | """
122 | # y.shape[0] is (symbolically) the number of rows in y, i.e.,
123 | # number of examples (call it n) in the minibatch
124 | # T.arange(y.shape[0]) is a symbolic vector which will contain
125 | # [0,1,2,... n-1] T.log(self.p_y_given_x) is a matrix of
126 | # Log-Probabilities (call it LP) with one row per example and
127 | # one column per class LP[T.arange(y.shape[0]),y] is a vector
128 | # v containing [LP[0,y[0]], LP[1,y[1]], LP[2,y[2]], ...,
129 | # LP[n-1,y[n-1]]] and T.mean(LP[T.arange(y.shape[0]),y]) is
130 | # the mean (across minibatch examples) of the elements in v,
131 | # i.e., the mean log-likelihood across the minibatch.
132 | return -T.mean(T.log(self.p_y_given_x)[T.arange(y.shape[0]), y])
133 |
134 | def errors(self, y):
135 | """Return a float representing the number of errors in the minibatch
136 | over the total number of examples of the minibatch ; zero one
137 | loss over the size of the minibatch
138 |
139 | :type y: theano.tensor.TensorType
140 | :param y: corresponds to a vector that gives for each example the
141 | correct label
142 | """
143 |
144 | # check if y has same dimension of y_pred
145 | if y.ndim != self.y_pred.ndim:
146 | raise TypeError('y should have the same shape as self.y_pred',
147 |                 ('y', y.type, 'y_pred', self.y_pred.type))
148 | # check if y is of the correct datatype
149 | if y.dtype.startswith('int'):
150 | # the T.neq operator returns a vector of 0s and 1s, where 1
151 | # represents a mistake in prediction
152 | return T.mean(T.neq(self.y_pred, y))
153 | else:
154 | raise NotImplementedError()
155 |
156 | def errorReport(self, y, n):
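        """Return an n_out x (n_out + 1) matrix of classification counts.

        Row i corresponds to true class i: column j (for j < n_out) counts the
        test points of class i that were classified as class j, and the last
        column holds the total number of points whose true class is i.

        :type y: theano.tensor.TensorType
        :param y: vector of correct labels for the minibatch

        :type n: int
        :param n: the minibatch size (used to size the comparison vectors)
        """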
157 | # compute error rate by class
158 | # check if y has same dimension of y_pred
159 | if y.ndim != self.y_pred.ndim:
160 | raise TypeError('y should have the same shape as self.y_pred',
161 |                 ('y', y.type, 'y_pred', self.y_pred.type))
162 | # check if y is of the correct datatype
163 | if y.dtype.startswith('int'):
164 | c = numpy.zeros((self.n_out, self.n_out + 1), dtype=numpy.int64)
165 | counts = T.as_tensor_variable(c)
166 | classVector = numpy.zeros(n)
167 | for i in xrange(self.n_out):
168 | othersVector = numpy.zeros(n)
169 | for j in xrange(self.n_out):
170 | counts = theano.tensor.basic.set_subtensor(
171 | counts[i, j],
172 | T.sum(T.and_(T.eq(self.y_pred, othersVector),
173 | T.eq(y, classVector))))
174 | othersVector = othersVector + 1
175 | counts = theano.tensor.basic.set_subtensor(
176 | counts[i, self.n_out],
177 | T.sum(T.eq(y, classVector)))
178 | classVector = classVector + 1
179 | return counts
180 | else:
181 | raise NotImplementedError()
182 |
183 | def load_data(dataset):
184 | ''' Loads the dataset
185 |
186 | :type dataset: string
187 | :param dataset: the path to the dataset (here GAZE)
188 | '''
189 |
190 | #############
191 | # LOAD DATA #
192 | #############
193 |
194 | print '... loading data'
195 |
196 | # Load the dataset
197 | f = gzip.open(dataset, 'rb')
198 | train_set, valid_set, test_set = cPickle.load(f)
199 | f.close()
200 | #train_set, valid_set, test_set format: tuple(input, target)
201 |     #input is a numpy.ndarray of 2 dimensions (a matrix)
202 |     #in which each row corresponds to an example. target is a
203 |     #numpy.ndarray of 1 dimension (a vector) that has the same length as
204 |     #the number of rows in the input. It gives the target
205 |     #to the example with the same index in the input.
206 |
207 | def shared_dataset(data_xy, borrow=True):
208 | """ Function that loads the dataset into shared variables
209 |
210 | The reason we store our dataset in shared variables is to allow
211 | Theano to copy it into the GPU memory (when code is run on GPU).
212 |         Since copying data into the GPU is slow, copying a minibatch every time
213 |         one is needed (the default behaviour if the data is not in a shared
214 | variable) would lead to a large decrease in performance.
215 | """
216 | data_x, data_y = data_xy
217 | shared_x = theano.shared(numpy.asarray(data_x,
218 | dtype=theano.config.floatX),
219 | borrow=borrow)
220 | shared_y = theano.shared(numpy.asarray(data_y,
221 | dtype=theano.config.floatX),
222 | borrow=borrow)
223 | # When storing data on the GPU it has to be stored as floats
224 | # therefore we will store the labels as ``floatX`` as well
225 | # (``shared_y`` does exactly that). But during our computations
226 | # we need them as ints (we use labels as index, and if they are
227 | # floats it doesn't make sense) therefore instead of returning
228 | # ``shared_y`` we will have to cast it to int. This little hack
229 | # lets ous get around this issue
230 | return shared_x, T.cast(shared_y, 'int32')
231 |
232 | test_set_x, test_set_y = shared_dataset(test_set)
233 | valid_set_x, valid_set_y = shared_dataset(valid_set)
234 | train_set_x, train_set_y = shared_dataset(train_set)
235 |
236 | rval = [(train_set_x, train_set_y), (valid_set_x, valid_set_y),
237 | (test_set_x, test_set_y)]
238 | return rval
239 |
240 |
241 | def sgd_optimization_gaze(learning_rate=0.13, n_epochs=1000,
242 | dataset='/home/vishwa/work/dbn/train_utils/data/gaze_data.pkl.gz',
243 | batch_size=400):
244 | """
245 | Demonstrate stochastic gradient descent optimization of a log-linear
246 | model
247 |
248 | This is demonstrated on GAZE data.
249 |
250 | :type learning_rate: float
251 | :param learning_rate: learning rate used (factor for the stochastic
252 | gradient)
253 |
254 | :type n_epochs: int
255 | :param n_epochs: maximal number of epochs to run the optimizer
256 |
257 | :type dataset: string
258 | :param dataset: the path of the GAZE dataset file
259 |
260 | """
261 | datasets = load_data(dataset)
262 |
263 | train_set_x, train_set_y = datasets[0]
264 | valid_set_x, valid_set_y = datasets[1]
265 | test_set_x, test_set_y = datasets[2]
266 |
267 | # compute number of minibatches for training, validation and testing
268 | n_train_batches = train_set_x.get_value(borrow=True).shape[0] / batch_size
269 | n_valid_batches = valid_set_x.get_value(borrow=True).shape[0] / batch_size
270 | n_test_batches = test_set_x.get_value(borrow=True).shape[0] / batch_size
271 |
272 | ######################
273 | # BUILD ACTUAL MODEL #
274 | ######################
275 | print '... building the model'
276 |
277 | # allocate symbolic variables for the data
278 | index = T.lscalar() # index to a [mini]batch
279 | x = T.matrix('x') # the data is presented as rasterized images
280 | y = T.ivector('y') # the labels are presented as 1D vector of
281 | # [int] labels
282 |
283 | # construct the logistic regression class
284 | # Each GAZE image has size 100*100
285 | classifier = LogisticRegression(input=x, n_in=100 * 100, n_out=5)
286 |
287 | # the cost we minimize during training is the negative log likelihood of
288 | # the model in symbolic format
289 | cost = classifier.negative_log_likelihood(y)
290 |
291 | # compiling a Theano function that computes the mistakes that are made by
292 | # the model on a minibatch
293 | test_model = theano.function(inputs=[index],
294 | outputs=classifier.errors(y),
295 | givens={
296 | x: test_set_x[index * batch_size: (index + 1) * batch_size],
297 | y: test_set_y[index * batch_size: (index + 1) * batch_size]})
298 |
299 | validate_model = theano.function(inputs=[index],
300 | outputs=classifier.errors(y),
301 | givens={
302 | x: valid_set_x[index * batch_size:(index + 1) * batch_size],
303 | y: valid_set_y[index * batch_size:(index + 1) * batch_size]})
304 |
305 | # compute the gradient of cost with respect to theta = (W,b)
306 | g_W = T.grad(cost=cost, wrt=classifier.W)
307 | g_b = T.grad(cost=cost, wrt=classifier.b)
308 |
309 | # specify how to update the parameters of the model as a dictionary
310 | updates = {classifier.W: classifier.W - learning_rate * g_W,
311 | classifier.b: classifier.b - learning_rate * g_b}
312 |
313 | # compiling a Theano function `train_model` that returns the cost, but in
314 | # the same time updates the parameter of the model based on the rules
315 | # defined in `updates`
316 | train_model = theano.function(inputs=[index],
317 | outputs=cost,
318 | updates=updates,
319 | givens={
320 | x: train_set_x[index * batch_size:(index + 1) * batch_size],
321 | y: train_set_y[index * batch_size:(index + 1) * batch_size]})
322 |
323 | ###############
324 | # TRAIN MODEL #
325 | ###############
326 | print '... training the model'
327 | # early-stopping parameters
328 |     patience = 5000  # look at this many examples regardless
329 | patience_increase = 2 # wait this much longer when a new best is
330 | # found
331 | improvement_threshold = 0.995 # a relative improvement of this much is
332 | # considered significant
333 | validation_frequency = min(n_train_batches, patience / 2)
334 | # go through this many
335 | # minibatche before checking the network
336 | # on the validation set; in this case we
337 | # check every epoch
338 |
339 | best_params = None
340 | best_validation_loss = numpy.inf
341 | test_score = 0.
342 | start_time = time.clock()
343 |
344 | done_looping = False
345 | epoch = 0
346 | while (epoch < n_epochs) and (not done_looping):
347 | epoch = epoch + 1
348 | for minibatch_index in xrange(n_train_batches):
349 |
350 | minibatch_avg_cost = train_model(minibatch_index)
351 | # iteration number
352 | iter = epoch * n_train_batches + minibatch_index
353 |
354 | if (iter + 1) % validation_frequency == 0:
355 | # compute zero-one loss on validation set
356 | validation_losses = [validate_model(i)
357 | for i in xrange(n_valid_batches)]
358 | this_validation_loss = numpy.mean(validation_losses)
359 |
360 | print('epoch %i, minibatch %i/%i, validation error %f %%' % \
361 | (epoch, minibatch_index + 1, n_train_batches,
362 | this_validation_loss * 100.))
363 |
364 | # if we got the best validation score until now
365 | if this_validation_loss < best_validation_loss:
366 | #improve patience if loss improvement is good enough
367 | if this_validation_loss < best_validation_loss * \
368 | improvement_threshold:
369 | patience = max(patience, iter * patience_increase)
370 |
371 | best_validation_loss = this_validation_loss
372 | # test it on the test set
373 |
374 | test_losses = [test_model(i)
375 | for i in xrange(n_test_batches)]
376 | test_score = numpy.mean(test_losses)
377 |
378 | print((' epoch %i, minibatch %i/%i, test error of best'
379 | ' model %f %%') %
380 | (epoch, minibatch_index + 1, n_train_batches,
381 | test_score * 100.))
382 |
383 | if patience <= iter:
384 | done_looping = True
385 | break
386 |
387 | end_time = time.clock()
388 | print(('Optimization complete with best validation score of %f %%,'
389 | 'with test performance %f %%') %
390 | (best_validation_loss * 100., test_score * 100.))
391 |     print 'The code ran for %d epochs, with %f epochs/sec' % (
392 | epoch, 1. * epoch / (end_time - start_time))
393 | print >> sys.stderr, ('The code for file ' +
394 | os.path.split(__file__)[1] +
395 | ' ran for %.1fs' % ((end_time - start_time)))
396 |
397 | if __name__ == '__main__':
398 | sgd_optimization_gaze()
399 |
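The early-stopping logic above is driven entirely by the patience, patience_increase, and improvement_threshold settings. As a reading aid, the same patience rule is restated below as a minimal, self-contained sketch; validate is a placeholder callable and nothing here is imported from this repository.

    import numpy

    def train_with_patience(validate, n_train_batches, n_epochs=1000,
                            patience=5000, patience_increase=2,
                            improvement_threshold=0.995):
        # check the validation set once per epoch (or more often if patience is small)
        validation_frequency = min(n_train_batches, patience // 2)
        best_validation_loss = numpy.inf
        done_looping = False
        epoch = 0
        while epoch < n_epochs and not done_looping:
            epoch += 1
            for minibatch_index in range(n_train_batches):
                it = (epoch - 1) * n_train_batches + minibatch_index
                if (it + 1) % validation_frequency == 0:
                    this_loss = validate()
                    if this_loss < best_validation_loss * improvement_threshold:
                        # a significant improvement: allow training to run longer
                        patience = max(patience, it * patience_increase)
                    if this_loss < best_validation_loss:
                        best_validation_loss = this_loss
                if patience <= it:
                    done_looping = True   # patience exhausted: stop training
                    break
        return best_validation_loss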
--------------------------------------------------------------------------------
/dl/pickler.py:
--------------------------------------------------------------------------------
1 | """
2 | pickler.py
3 | This file implements a pickler for the image data. The input files are
4 | expected to be in the MNIST dataset format. The output is a set of files
5 | in the Theano data format.
6 |
7 | Please refer to the MNIST documentation for the input format.
8 | The output format consists of three tuples: the first holds the training
9 | data, the second the validation data, and the third the test data. Each
10 | tuple consists of an array of image vectors, where each vector holds the
11 | floating-point pixel values of one image stored in row-major order, and
12 | an array of labels, one per image vector.
13 |
14 | Since cPickle has limits on the sizes of the arrays that it can process
15 | in memory, we generate chunks of the arrays in files named as follows,
16 |
17 | _train_%d.gz for the training data
18 | _valid_%d.gz for the validation data
19 | _test_%d.gz for the test data
20 |
21 | The prefix we use is typically gaze_data_pkl.
22 |
23 | We set an arbitrary limit of 8000 datapoints and labels per file.
24 | """
25 | __docformat__ = 'restructuredtext en'
26 |
27 | import datetime, shutil, glob, sys, os, thread
28 | import time, math, re, unicodedata, struct
29 | import cPickle, gzip
30 | import numpy
31 |
32 | global trainDataFile
33 | global trainLabelFile
34 | global validDataFile
35 | global validLabelFile
36 | global testDataFile
37 | global testLabelFile
38 |
39 | def readMagic(fileHandle):
40 | fileHandle.seek(0)
41 | return struct.unpack('i', fileHandle.read(4))
42 |
43 | def readLength(fileHandle):
44 | fileHandle.seek(4)
45 | return struct.unpack('i', fileHandle.read(4))
46 |
47 | def readWidth(fileHandle):
48 | fileHandle.seek(8)
49 | return struct.unpack('i', fileHandle.read(4))
50 |
51 | def readHeight(fileHandle):
52 | fileHandle.seek(12)
53 | return struct.unpack('i', fileHandle.read(4))
54 |
55 | def readByte(fileHandle):
56 | return struct.unpack('B', fileHandle.read(1))
57 |
58 | def loadData(dataFile, labelFile):
59 | """
60 | method used to load the input file that is expected to be in the MNIST
61 | data format
62 | """
63 | dataFileHandle = open(dataFile, 'rb')
64 | labelFileHandle = open(labelFile, 'rb')
65 | length = readLength(dataFileHandle)[0]
66 | width = readWidth(dataFileHandle)[0]
67 | height = readHeight(dataFileHandle)[0]
68 |
69 | data = numpy.zeros((length, width * height), numpy.float32)
70 | labels = numpy.zeros((length), numpy.int64)
71 |
72 | dataFileHandle.seek(16)
73 | labelFileHandle.seek(8)
74 | labelFmt = ''
75 | for i in range(length):
76 | labelFmt += 'B'
77 | labelBytes = numpy.int64(struct.unpack(labelFmt, labelFileHandle.read(length)))
78 | dataFmt = ''
79 | for i in range(width * height):
80 | dataFmt += 'B'
81 | for i in range(length):
82 | dataBytes = numpy.float32(struct.unpack(dataFmt,\
83 | dataFileHandle.read(width * height)))
84 | data[i] = numpy.divide(dataBytes, 255.0)
85 | labels[i] = labelBytes[i] - 1
86 | # print(data[i])
87 | # print(labels[i])
88 | sys.stdout.write('.')
89 | sys.stdout.write('\n')
90 | dataFileHandle.close()
91 | labelFileHandle.close()
92 | return (data, labels)
93 |
94 | def getLists(directory, suffix):
95 | """
96 | helper method used within the Theano code to load data from MNIST
97 | to Theano format, when the entire data can be encapsulated in a
98 | single file
99 | """
100 | dataFileName = directory + '/data-train-' + suffix
101 | labelFileName = directory + '/label-train-' + suffix
102 | trainList = loadData(dataFileName, labelFileName)
103 |
104 | dataFileName = directory + '/data-valid-' + suffix
105 | labelFileName = directory + '/label-valid-' + suffix
106 | validList = loadData(dataFileName, labelFileName)
107 |
108 | dataFileName = directory + '/data-test-' + suffix
109 | labelFileName = directory + '/label-test-' + suffix
110 | testList = loadData(dataFileName, labelFileName)
111 |
112 | return (trainList, validList, testList)
113 |
114 | def getPickledLists(directory, prefix, nFiles):
115 | """
116 | method used to load data from the MNIST data format to Theano data
117 | format when just the training data is chunked into multiple files,
118 | but the validation and test is in a single file
119 | """
120 | for i in xrange(nFiles):
121 | dataFileName = directory + '/' + prefix + ('_train_%d.gz'%i)
122 | f = gzip.open(dataFileName, 'rb')
123 | trainList = cPickle.load(f)
124 | x, y = trainList
125 | if (i == 0):
126 | data_x = x
127 | data_y = y
128 | else:
129 | data_x = numpy.concatenate((data_x, x))
130 | data_y = numpy.concatenate((data_y, y))
131 | f.close()
132 |
133 | trainList = (data_x, data_y)
134 |
135 | f = gzip.open(directory + '/' + prefix + '.gz')
136 | validList, testList = cPickle.load(f)
137 | f.close()
138 |
139 | return (trainList, validList, testList)
140 |
141 | def getPickledList(directory, prefix, nFiles):
142 | """
143 | method that is used to load a particular set of data files,
144 | either training, validation or test. This method expects as
145 | inputs the directory where the files are stored, the prefix
146 | to use and the number of files that contain the chunked
147 | data sets.
148 | """
149 | for i in xrange(nFiles):
150 | dataFileName = directory + '/' + prefix + ('%d.gz'%i)
151 | f = gzip.open(dataFileName, 'rb')
152 | dataList = cPickle.load(f)
153 | x, y = dataList
154 | if (i == 0):
155 | data_x = x
156 | data_y = y
157 | else:
158 | data_x = numpy.concatenate((data_x, x))
159 | data_y = numpy.concatenate((data_y, y))
160 | f.close()
161 |
162 | return (data_x, data_y)
163 |
164 | def pickleMeThis(outFile):
165 | """
166 | method used to pickle data in the Theano format. Since the pickler
167 | has limits on data size, we chunk the data arrays into 8000
168 | (arbitrary) elements per chunk and store them as pickled files.
169 | """
170 | global trainDataFile
171 | global trainLabelFile
172 | global validDataFile
173 | global validLabelFile
174 | global testDataFile
175 | global testLabelFile
176 |
177 | trainList = loadData(trainDataFile, trainLabelFile)
178 | validList = loadData(validDataFile, validLabelFile)
179 | testList = loadData(testDataFile, testLabelFile)
180 |
181 | # checking data that was loaded
182 | print("length of trainList[0] = %d"%len(trainList[0]))
183 | print("length of trainList[1] = %d"%len(trainList[1]))
184 | print("length of trainList[0][1] = %d"%len(trainList[0][1]))
185 | print("label for trainList[0][1] = %d"%(trainList[1][1]))
186 |
187 | print("length of validList[0] = %d"%len(validList[0]))
188 | print("length of validList[1] = %d"%len(validList[1]))
189 | print("length of validList[0][1] = %d"%len(validList[0][1]))
190 | print("label for validList[0][1] = %d"%(validList[1][1]))
191 |
192 | print("length of testList[0] = %d"%len(testList[0]))
193 | print("length of testList[1] = %d"%len(testList[1]))
194 | print("length of testList[0][1] = %d"%len(testList[0][1]))
195 | print("label for testList[0][1] = %d"%(testList[1][1]))
196 |
197 | index = 0
198 | data_x, data_y = trainList
199 | for i in range(0, len(trainList[0]), 8000):
200 | trainFileHandle = gzip.open(outFile + ('_train_%d.gz'%index), 'wb')
201 | lb = i
202 | rb = min(lb + 8000, len(trainList[0]))
203 | fragment = (data_x[lb:rb], data_y[lb:rb])
204 | cPickle.dump(fragment, trainFileHandle)
205 | trainFileHandle.close()
206 | index = index + 1
207 |
208 | index = 0
209 | data_x, data_y = validList
210 | for i in range(0, len(validList[0]), 8000):
211 | validFileHandle = gzip.open(outFile + ('_valid_%d.gz'%index), 'wb')
212 | lb = i
213 | rb = min(lb + 8000, len(validList[0]))
214 | fragment = (data_x[lb:rb], data_y[lb:rb])
215 | cPickle.dump(fragment, validFileHandle)
216 | validFileHandle.close()
217 | index = index + 1
218 |
219 | index = 0
220 | data_x, data_y = testList
221 | for i in range(0, len(testList[0]), 8000):
222 | testFileHandle = gzip.open(outFile + ('_test_%d.gz'%index), 'wb')
223 | lb = i
224 | rb = min(lb + 8000, len(testList[0]))
225 | fragment = (data_x[lb:rb], data_y[lb:rb])
226 | cPickle.dump(fragment, testFileHandle)
227 | testFileHandle.close()
228 | index = index + 1
229 |
230 | if __name__ == "__main__":
231 | global trainDataFile
232 | global trainLabelFile
233 | global validDataFile
234 | global validLabelFile
235 | global testDataFile
236 | global testLabelFile
237 |
238 | trainDataFile = sys.argv[1]
239 | trainLabelFile = sys.argv[2]
240 | validDataFile = sys.argv[3]
241 | validLabelFile = sys.argv[4]
242 | testDataFile = sys.argv[5]
243 | testLabelFile = sys.argv[6]
244 | outFile = sys.argv[7]
245 |
246 | pickleMeThis(outFile)
247 |
248 |
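A usage sketch for the helpers above: the directory, prefix, and chunk counts below are placeholders rather than values taken from this repository, but the file-name convention matches what pickleMeThis writes (outFile + '_train_%d.gz', and so on).

    import pickler

    # With outFile='gaze_data.pkl', pickleMeThis writes gaze_data.pkl_train_0.gz,
    # gaze_data.pkl_train_1.gz, ... so the prefix passed to getPickledList must
    # include the '_train_' part. The chunk counts (3, 1, 1) are placeholders.
    train_x, train_y = pickler.getPickledList('.', 'gaze_data.pkl_train_', 3)
    valid_x, valid_y = pickler.getPickledList('.', 'gaze_data.pkl_valid_', 1)
    test_x,  test_y  = pickler.getPickledList('.', 'gaze_data.pkl_test_',  1)

    # each data array has shape (nImages, width * height), each label array (nImages,)
    print train_x.shape, train_y.shape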
--------------------------------------------------------------------------------
/scripts/process.sh:
--------------------------------------------------------------------------------
1 | #!/bin/sh
2 |
3 | /home/vishwa/work/dbn/train_utils/install/bin/xmlToIDX -o ubyte -r 0.80 -v 0.05 -usebins -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-17-05-43-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-14-55-42-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-25-11-10-26-2012 -d /media/CESAR-EXT02/VCode/CESAR_May-Fri-11-11-00-50-2012
4 |
5 | python pickler.py data-train-ubyte label-train-ubyte data-valid-ubyte label-valid-ubyte data-test-ubyte label-test-ubyte gaze_data.pkl
6 |
7 |
--------------------------------------------------------------------------------
/utils/src/Annotations.cpp:
--------------------------------------------------------------------------------
1 | // Annotations.cpp
2 | // This file contains the implementation of class Annotations.
3 | // This class is used to read an annotations XML doc and provide an
4 | // iterator to the frame annotations.
5 |
6 | #include <stdio.h>
7 | #include <stdlib.h>
8 | #include <string.h>
9 | #include <limits.h>
10 | #include <iostream>
11 | #include <fstream>
12 |
13 | #include <string>
14 | #include <vector>
15 | #include <map>
16 | #include <algorithm>
17 |
18 | #include "Annotations.h"
19 |
20 | // Construction and destruction
21 |
22 | Annotations::Annotations() {
23 | framesDirectory = "";
24 | center.x = Globals::imgWidth / 2;
25 | center.y = Globals::imgHeight / 2;
26 | useBins = false;
27 | }
28 |
29 | Annotations::~Annotations()
30 | {
31 | // delete annotations
32 | for (unsigned int i = 0; i < frameAnnotations.size(); i++)
33 | delete frameAnnotations[i];
34 | }
35 |
36 | // getData
37 | // This function takes as input a string and returns an annotation tag
38 | // corresponding to the annotation. It fills the CvPoint object with
39 | // the data read from that line
40 |
41 | Annotations::Tag Annotations::getData(string str, CvPoint& point) {
42 | const char* token = strtok((char*)str.c_str(), " <>");
43 | if (token) {
44 | if (!strcmp(token, "/frame"))
45 | return EndFrame;
46 | else if (!strcmp(token, "annotations")) {
47 | token = strtok(NULL, " <>\"");
48 | if (token && !strncmp(token, "dir=", 4)) {
49 | token = strtok(NULL, " <>\"");
50 | if (!token) {
51 | string err = "Annotations::getData. Malformed annotations.xml. No directory name.";
52 | throw err;
53 | }
54 | framesDirectory = token;
55 | }
56 | token = strtok(NULL, " <>\"");
57 | if (token && !strncmp(token, "center=", 7)) {
58 | token = strtok(NULL, " <>\"");
59 | if (!token) {
60 | string err = "Annotations::getData. Malformed annotations.xml. No center.";
61 | throw err;
62 | }
63 | char* chP = (char*)strchr(token, ',');
64 | if (!chP) {
65 | string err = "Annotations::getData. Malformed annotations.xml. No center.";
66 | throw err;
67 | }
68 | *chP = 0;
69 | chP++;
70 | center.x = atoi(token);
71 | center.y = atoi(chP);
72 | }
73 | return Root;
74 | } else if (!strcmp(token, "frame"))
75 | return Frame;
76 | else if (!strcmp(token, "frameNumber")) {
77 | token = strtok(NULL, " <>");
78 | point.x = (token)? atoi(token) : 0;
79 | return FrameNumber;
80 | } else if (!strcmp(token, "zone")) {
81 | token = strtok(NULL, " <>");
82 | point.x = (token)? atoi(token) : 0;
83 | /*
84 | if (point.x == 2)
85 | point.x = 1;
86 | else if (point.x == 3)
87 | point.x = 2;
88 | else if (point.x == 4 || point.x == 5)
89 | point.x = 3;
90 | */
91 | return Orientation;
92 | } else if (!strcmp(token, "status")) {
93 | token = strtok(NULL, " <>");
94 | point.x = (token)? atoi(token) : 0;
95 | return CarStatus;
96 | } else if (!strcmp(token, "intersection")) {
97 | token = strtok(NULL, " <>");
98 | point.x = (token)? atoi(token) : 0;
99 | return IntersectionType;
100 | } else if (token[0] != '/') {
101 | string tag = token;
102 |
103 | token = strtok(NULL, " <>");
104 | const char* field = strtok((char*)token, ",");
105 | point.y = (field)? atoi(field) : 0;
106 | field = strtok(NULL, ",");
107 | point.x = (field)? atoi(field) : 0;
108 |
109 | if (tag == "face") {
110 | return Face;
111 | }
112 | }
113 | }
114 | return Ignore;
115 | }
116 |
117 | // trimEnds
118 | // The following method is called to trim the ends of sections of frames that
119 | // are labelled by sector. The number of frames to trim at either end is
120 | // specified through parameter nTrim. We do this to remove badly labelled
121 | // frames at the ends of sections. Since the labelling is contiguous, we
122 | // are seeing wrongly labelled images at the ends of each section of frames
123 | // by sector. This is particularly acute for sector 3 of which we have
124 | // an enormous number
125 |
126 | void Annotations::trimEnds(int sectorToTrim, int nTrim) {
127 |   vector<FrameAnnotation*> fas;
128 | int prevSector = Annotations::UnknownSector;
129 | unsigned int i = 0;
130 | while (i < frameAnnotations.size()) {
131 | FrameAnnotation* fa = frameAnnotations[i];
132 |
133 | int sector = fa->getSector();
134 | if (prevSector != sector && sector == sectorToTrim) {
135 | unsigned int start = i;
136 | unsigned int end = i;
137 | for (unsigned int j = i; j < frameAnnotations.size(); j++) {
138 | fa = frameAnnotations[j];
139 | if (fa->getSector() != sectorToTrim) {
140 | end = j - 1;
141 | break;
142 | }
143 | }
144 | // we now have the indices for the range of frames with
145 | // sector equal to sectorToTrim
146 | for (unsigned int j = start + nTrim; j < end - nTrim; j++) {
147 | fa = frameAnnotations[j];
148 | fas.push_back(new FrameAnnotation(fa));
149 | }
150 | i = end;
151 | } else {
152 | fas.push_back(new FrameAnnotation(fa));
153 | }
154 | i++;
155 | prevSector = sector;
156 | }
157 | // now delete and copy frames into frameAnnotations
158 | for (unsigned int i = 0; i < frameAnnotations.size(); i++)
159 | delete frameAnnotations[i];
160 | frameAnnotations.clear();
161 | for (unsigned int i = 0; i < fas.size(); i++)
162 | frameAnnotations.push_back(fas[i]);
163 | }
164 |
165 | // readAnnotations
166 | // The following method reads an XML file and populates the annotations vector
167 |
168 | void Annotations::readAnnotations(string& filename) {
169 | ifstream file;
170 |
171 | file.open((const char*)filename.c_str());
172 | if (file.good()) {
173 | string line;
174 | int nFrame = 0;
175 | int sector = UnknownSector;
176 | int status = UnknownStatus;
177 | int intersection = UnknownIntersection;
178 |
179 | CvPoint temp, face;
180 | face.x = face.y = 0;
181 |
182 | getline(file, line); // ignore the first line
183 |
184 | while (!file.eof()) {
185 | getline(file, line);
186 | Tag tag = getData(line, temp);
187 | switch (tag) {
188 | case FrameNumber: {
189 | nFrame = temp.x;
190 | break;
191 | }
192 | case Orientation: {
193 | sector = temp.x;
194 | break;
195 | }
196 | case CarStatus: {
197 | status = temp.x;
198 | break;
199 | }
200 | case IntersectionType: {
201 | intersection = temp.x;
202 | break;
203 | }
204 | case Face: {
205 | face.x = temp.x;
206 | face.y = temp.y;
207 | break;
208 | }
209 | case EndFrame: {
210 | if (face.x && face.y) {
211 | FrameAnnotation* annotation =
212 | new FrameAnnotation(nFrame, face, sector, status, intersection);
213 | frameAnnotations.push_back(annotation);
214 | face.x = face.y = 0;
215 | } else {
216 | FrameAnnotation* annotation =
217 | new FrameAnnotation(nFrame, center /* face */, sector, status, intersection);
218 | frameAnnotations.push_back(annotation);
219 | }
220 | break;
221 | }
222 | default: {
223 | continue;
224 | }
225 | }
226 | }
227 | }
228 | }
229 |
230 | // createBins
231 | // The following method is used to create bins for the five gaze sectors, with
232 | // the same number of frames per sector. Since data is typically disproportionately
233 | // skewed towards straight ahead, the models would otherwise be over-trained for that sector
234 |
235 | void Annotations::createBins() {
236 | // first set the useBins flag so that we return the binned annotations on
237 | // subsequent calls for annotations
238 | useBins = true;
239 |
240 | int nBins = Globals::numZones;
241 |
242 | // create a counter for each bin and reset it
243 | int count[nBins];
244 | for (int i = 0; i < nBins; i++) {
245 |     vector<FrameAnnotation*>* bin = new vector<FrameAnnotation*>();
246 | bins.push_back(bin);
247 | count[i] = 0;
248 | }
249 |
250 | // now iterate over all annotations and place them in their bin based
251 | // on their zone
252 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) {
253 | FrameAnnotation* fa = frameAnnotations[i];
254 | int index = fa->getSector() - 1;
255 | if (index < 0 || index > 4)
256 | continue;
257 |
258 | bins[index]->push_back(fa);
259 | count[index]++;
260 | }
261 |
262 | // Get the smallest bin size
263 | int sampleSize = INT_MAX;
264 | for (int i = 0; i < nBins; i++)
265 | if (sampleSize > count[i] && count[i])
266 | sampleSize = count[i];
267 |
268 | cout << "Creating " << sampleSize << " frames for each sector" << endl;
269 |
270 | // We now create a set of sampleSize * nBins frame annotations in the unif
271 | // vector
272 | for (int i = 0; i < nBins; i++) {
273 | // shuffle images in each bin before picking the first sampleSize images
274 | random_shuffle(bins[i]->begin(), bins[i]->end());
275 |
276 | // for now pick the first sampleSize elements from each bin
277 | for (int j = 0; j < sampleSize; j++) {
278 | if (bins[i]->size()) {
279 | unif.push_back(bins[i]->back());
280 | bins[i]->pop_back();
281 | }
282 | }
283 | // Now that we are done with bin i, destroy it
284 | delete bins[i];
285 | }
286 |
287 | // final shuffle of the collection of all images
288 | random_shuffle(unif.begin(), unif.end());
289 |
290 | cout << "Pulled " << unif.size() << " frames from the dataset" << endl;
291 | }
292 |
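createBins balances the training data by shuffling each sector's frames and capping every sector at the size of the smallest non-empty bin. For readers on the Python side of the pipeline, the same idea is restated in the short sketch below; it is illustrative only, and the C++ above is what actually runs.

    import random

    def balance_by_sector(frames, num_sectors=5):
        """frames: list of (sector, frame) pairs with sector in 1..num_sectors."""
        bins = [[] for _ in range(num_sectors)]
        for sector, frame in frames:
            if 1 <= sector <= num_sectors:
                bins[sector - 1].append((sector, frame))
        # cap every sector at the size of the smallest non-empty bin
        sample_size = min(len(b) for b in bins if b)
        balanced = []
        for b in bins:
            random.shuffle(b)
            balanced.extend(b[:sample_size])
        random.shuffle(balanced)
        return balanced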
--------------------------------------------------------------------------------
/utils/src/Annotations.h:
--------------------------------------------------------------------------------
1 | #ifndef __ANNOTATIONS_H
2 | #define __ANNOTATIONS_H
3 |
4 | // Annotations.h
5 | // This file contains the definition of class Annotations. It provides an interface
6 | // to read and access the frames stored in an annotations file.
7 |
8 | #include <string>
9 | #include <vector>
10 |
11 | #include "Globals.h"
12 |
13 | // openCV stuff
14 | #include <opencv/cv.h>
15 | #include <opencv/highgui.h>
16 |
17 | using namespace std;
18 |
19 | // forward declaration
20 | class FrameAnnotation;
21 |
22 | class Annotations {
23 | private:
24 | // the directory containing the frames for a given annotations.xml file
25 | string framesDirectory;
26 |
27 | // the center of the face in sector 3 (straight ahead)
28 | CvPoint center;
29 |
30 | // use bins flag
31 | bool useBins;
32 |
33 | // the set of all annotations
34 |   vector<FrameAnnotation*> frameAnnotations;
35 |
36 | // bins for annotations. We want to only return annotations that are uniformly
37 | // distributed across the image regions within which the LOIs occur. This
38 | // ensures that the training is unbiased
39 |   vector<vector<FrameAnnotation*>* > bins;
40 |
41 | // the set of annotations after binning. We will have min(bin sizes) * number of bins
42 | // annotations that will be placed in this vector
43 |   vector<FrameAnnotation*> unif;
44 |
45 | public:
46 | enum Tag {
47 | Root,
48 | Frame,
49 | FrameNumber,
50 | Face,
51 | Orientation,
52 | CarStatus,
53 | IntersectionType,
54 | EndFrame,
55 | Ignore
56 | };
57 | enum Sector {
58 | DriverWindow = 1,
59 | LeftOfCenter,
60 | Center,
61 | RightOfCenter,
62 | PassengerWindow,
63 | CoPilot,
64 | OverLeftShoulder,
65 | OverRightShoulder,
66 | UnknownSector = 9
67 | };
68 | enum Status {
69 | stationaryIgnore = 1,
70 | stationaryParked,
71 | stationaryAtIntersection,
72 | movingAtIntersection,
73 | movingInCarPark,
74 | movingOnStreet,
75 | UnknownStatus = 7
76 | };
77 | enum Intersection {
78 | FourWay = 1,
79 | TJunction,
80 | CarParkExit,
81 | UnknownIntersection = 4
82 | };
83 |
84 | Annotations();
85 | ~Annotations();
86 |
87 | void readAnnotations(string& filename);
88 | void trimEnds(int sector, int nTrim);
89 | void createBins();
90 | CvPoint& getCenter() { return center; }
91 | string getFramesDirectory() { return framesDirectory; }
92 |
93 |   vector<FrameAnnotation*>& getFrameAnnotations() {
94 | return useBins? unif : frameAnnotations;
95 | }
96 |
97 | private:
98 | Annotations::Tag getData(string str, CvPoint& point);
99 | };
100 |
101 | // Frame annotations class
102 |
103 | class FrameAnnotation {
104 | private:
105 | int nFrame;
106 | CvPoint face;
107 | int sector;
108 | int status;
109 | int intersection;
110 |
111 | public:
112 | FrameAnnotation() {
113 | nFrame = 0;
114 | face.x = face.y = 0;
115 | sector = Annotations::UnknownSector;
116 | status = Annotations::UnknownStatus;
117 | intersection = Annotations::UnknownIntersection;
118 | }
119 | FrameAnnotation(FrameAnnotation& fa) {
120 | nFrame = fa.nFrame;
121 | face.x = fa.face.x;
122 | face.y = fa.face.y;
123 | sector = fa.sector;
124 | status = fa.status;
125 | intersection = fa.intersection;
126 | }
127 | FrameAnnotation(int frame, CvPoint& f, int s, int st, int intx) {
128 | nFrame = frame;
129 | face.x = f.x; face.y = f.y;
130 | sector = s;
131 | status = st;
132 | intersection = intx;
133 | }
134 | FrameAnnotation(FrameAnnotation* fa) {
135 | nFrame = fa->getFrameNumber();
136 | setFace(fa->getFace());
137 | sector = fa->getSector();
138 | status = fa->getStatus();
139 | intersection = fa->getIntersection();
140 | }
141 |
142 | int getFrameNumber() { return nFrame; }
143 | CvPoint& getFace() { return face; }
144 | int getSector() { return sector; }
145 | int getStatus() { return status; }
146 | int getIntersection() { return intersection; }
147 |
148 | CvPoint& getLOI(Annotations::Tag tag) {
149 | switch (tag) {
150 | case Annotations::Face:
151 | return getFace();
152 | default: {
153 | string err = "FrameAnnotation::getLOI. Unknown tag.";
154 | throw(err);
155 | }
156 | }
157 | }
158 |
159 | // set functions
160 | void setFace(CvPoint& point) {
161 | face.x = point.x;
162 | face.y = point.y;
163 | }
164 |
165 | // print functions
166 | void print() {
167 | cout << "Frame: (" << nFrame << ") ";
168 | cout << "Face: (" << face.x << ", " << face.y << ") ";
169 | cout << "Sector: (" << sector << ") ";
170 | cout << "Status: (" << status << ") ";
171 | cout << "Intersection: (" << intersection << ")" << endl;
172 | }
173 | };
174 |
175 | #endif
176 |
--------------------------------------------------------------------------------
/utils/src/DataStream.py:
--------------------------------------------------------------------------------
1 | # DataStream.py
2 |
3 | """
4 | This is a smaller version of the DataStream module containing only (part of) the
5 | function to read synchronized_data_streams.xml files
6 |
7 | Example:
8 | d = read_xml('/data01/processed_for_annotation/CESAR_May-Tue-29-18-55-16-2012/
9 | synchronized_data_streams.xml')
10 | print d['gaze'][0] #(video_filename, frame_number, age_of_sample)
11 | """
12 |
13 | import sys, os
14 | import unicodedata
15 |
16 | from subprocess import call
17 | from xml.dom.minidom import *
18 |
19 | sector = {'Driver Window': 1, 'Left of Center': 2, 'Center': 3, 'Right of Center': 4,
20 | 'Passenger Window':5, 'Copilot':6, 'OverRightShoulder':7, 'OverLeftShoulder':8,
21 | 'Other': 9};
22 | status = {'stationary-Ignore': 1, 'stationary-Parked': 2, 'stationary-AtIntersection': 3,
23 | 'moving-AtInterSection': 4, 'moving-InCarPark': 5, 'moving-OnStreet': 6,
24 | 'Other': 7};
25 | intersection = {'4way': 1, 'Tjunction': 2, 'CarParkExit': 3, 'other': 4};
26 |
27 | #sector = {'driverWindow': 1, 'leftOfCenter': 2, 'center': 3, 'rightOfCenter': 4,
28 | # 'passengerWindow':5, 'coPilot':6, 'overRightShoulder':7, 'overLeftShoulder':8,
29 | # 'Other': 9};
30 | #status = {'sIgnore': 1, 'sParked': 2, 'sAtIntersection': 3,
31 | # 'mAtInterSection': 4, 'mCarPark': 5, 'mOnStreet': 6,
32 | # 'Other': 7};
33 | #intersection = {'4way': 1, 'Tjunction': 2, 'carParkExit': 3, 'other': 4};
34 |
35 | annotations = {}
36 |
37 | def read_xml(xml_filename):
38 | """
39 | Read in a synchronized_data_streams.xml file,
40 |
41 | Return a dict of dicts. Return_value[stream_name][t] =
42 | (video_filename, frame_number, age)
43 | """
44 | print 'reading xml DataStream'
45 | doc=parse(xml_filename)
46 | header = doc.childNodes[0].childNodes[1]
47 | sync_points = [n for n in doc.childNodes[0].childNodes if n.nodeName == 'sync_point']
48 |
49 | data_types = dict()
50 | paced_time_data_maps = dict()
51 | ages = dict()
52 | stream_identifiers = [n for n in header.childNodes if n.nodeName == 'stream']
53 | stream_names = []
54 |
55 | for s in stream_identifiers:
56 | name = s.getAttribute('sensor_name')
57 | data_type = s.getAttribute('data_type')
58 | if not data_type == 'video_file_reference:frame_number':
59 | continue
60 | stream_names.append(name)
61 | data_types[name] = data_type
62 | paced_time_data_maps[name] = dict()
63 | ages[name] = dict()
64 | print 'stream names:', stream_names
65 |
66 | for p in sync_points:
67 | t = float(p.getAttribute('timestamp'))
68 | for sample in [n for n in p.childNodes if n.nodeName == 'sample']:
69 | stream_name = sample.getAttribute('stream')
70 | if not stream_name in stream_names:
71 | continue
72 | data = sample.getAttribute('data')
73 | #print 'DATA:', data
74 | video_filename = data.split(':')[0]
75 | frame_number = int(data.split(':')[1])
76 | age = sample.getAttribute('age')
77 | paced_time_data_maps[stream_name][t] = (video_filename, frame_number, age)
78 |
79 | ans = dict()
80 | for s in stream_names:
81 | ans[s] = paced_time_data_maps[s]
82 |
83 | return ans
84 |
85 | def find(lb, ub, lIndex, rIndex, stream, keys):
86 | key = keys[lIndex]
87 | value = stream[key]
88 | key = key + float(unicode.decode(value[2]))
89 | if (key >= lb and key <= ub):
90 | return lIndex
91 | key = keys[rIndex]
92 | value = stream[key]
93 | key = key + float(unicode.decode(value[2]))
94 | if (key >= lb and key <= ub):
95 | return rIndex
96 | if (lIndex == rIndex):
97 | return None
98 | index = (lIndex + rIndex) / 2
99 | key = keys[index]
100 | value = stream[key]
101 | key = key + float(unicode.decode(value[2]))
102 | if (key >= lb and key <= ub):
103 | return index
104 | if (key < lb):
105 | return find(lb, ub, index + 1, rIndex, stream, keys)
106 | return find(lb, ub, lIndex, index - 1, stream, keys)
107 |
108 | def insertAnnotation(key, value, ann):
109 | # print '%f = %s'%(key, value)
110 | aType, aVal = ann
111 | f, frame, delta = value
112 | fileName = unicode.decode(f)
113 | annotationKey = '%s_%0*d'%(fileName, 4, frame)
114 | if annotationKey in annotations:
115 | annotations[annotationKey][aType] = aVal
116 | else:
117 | annotations[annotationKey] = {}
118 | annotations[annotationKey]['timestamp'] = key
119 | annotations[annotationKey]['sector'] = 9
120 | annotations[annotationKey]['status'] = 7
121 | annotations[annotationKey]['intersection'] = 4
122 | annotations[annotationKey][aType] = aVal
123 |
124 | # Finds the frames that fall within a given time range and records them in the
125 | # global annotations dict. The data stream is sorted by timestamp; its keys are
126 | # timestamps that may or may not fall within the requested range, so we search
127 | # through the sequence of gaze data using binary search
128 |
129 | def getIncludedFrames(lb, ub, ann, stream, keys):
130 | length = len(keys)
131 | # find the first key in stream that falls within the range requested
132 | index = find(lb, ub, 0, length - 1, stream, keys)
133 | if index is not None:
134 | key = keys[index]
135 | value = stream[key]
136 | key = key + float(unicode.decode(value[2]))
137 | insertAnnotation(key, value, ann)
138 | i = index
139 | while i > 0:
140 | i = i - 1
141 | key = keys[i]
142 | value = stream[key]
143 | key = key + float(unicode.decode(value[2]))
144 | if key >= lb and key <= ub:
145 | insertAnnotation(key, value, ann)
146 | else:
147 | break
148 | i = index
149 | while i < length - 1:
150 | i = i + 1
151 | key = keys[i]
152 | value = stream[key]
153 | key = key + float(unicode.decode(value[2]))
154 | if key >= lb and key <= ub:
155 | insertAnnotation(key, value, ann)
156 | else:
157 | break
158 |
159 | def getAnnotation(token):
160 | try:
161 | key, value = 'sector', sector[token]
162 | except KeyError:
163 | try:
164 | key, value = 'status', status[token]
165 | except KeyError:
166 | key, value = 'intersection', intersection[token]
167 | return key, value
168 |
169 | def read_vcode(fileName):
170 | f = open(fileName, 'r')
171 | # ignore the first four lines
172 | for i in range(4):
173 | line = f.readline()
174 | line = f.readline()
175 | data = {}
176 | while line:
177 | tokens = line.split(',')
178 | lb = float(tokens[0])
179 | rb = lb + float(tokens[1])
180 | try:
181 | data[lb] = (rb, getAnnotation(tokens[2]))
182 | except (KeyError):
183 | pass
184 | line = f.readline()
185 | return data
186 |
187 | def blowUpVideo(fileName):
188 | call(["/bin/mkdir", "-p", "_temp"])
189 | call(["/bin/rm", "-f", "_temp/*.bmp"])
190 | call(["/bin/rm", "-f", "/run/shm/gaze*"])
191 | imageNames = '_temp/frame_%d.bmp'
192 | avibz2 = '/run/shm/gaze.avi.bz2'
193 | call(["/bin/cp", "-f", fileName + '.bz2', avibz2])
194 | call(["bunzip2", "-f", avibz2])
195 | call(["ffmpeg", "-i", "/run/shm/gaze.avi", "-sameq", imageNames])
196 |
197 | def buildAnnotations(keys, datasetDir, prefix, outputDir):
198 | destFrame = 1
199 | currentFileName = ""
200 | call(["mkdir", "-p", outputDir])
201 | annotationsFileName = outputDir + '/annotations.xml'
202 | annotationsFile = open(annotationsFileName, 'w')
203 |     annotationsFile.write('<?xml version="1.0"?>\n')
204 |     annotationsFile.write('<annotations dir="%s">\n'%outputDir)
205 | lenOfKeys = len(keys)
206 | for i in range(lenOfKeys):
207 | aKey = keys[i]
208 | value = annotations[aKey]
209 | # ignore the first 10 and last 10 frames for sector 3. The problem
210 | # is that we have a disproportionate number of sector 3 frames.
211 | # The end points of these sections of the video have frames that
212 | # are labelled either 2 or 4 in the annotations, which generate
213 | # a large number of badly labelled sector 2 and 4 frames, not to
214 | # mention incorrect section 3 labellings. By ignoring the first 10
215 | # and last 10 frames we hope to have better labelled frames
216 |         if (value['sector'] == 3 and (i < 10 or i > (lenOfKeys - 10))):
217 |             continue
218 | fileName, frameStr = aKey.split('avi_')
219 | sourceFrame = int(frameStr)
220 | fileName = datasetDir + '/' + prefix + '/' + fileName + 'avi'
221 | tokens = aKey.split('-')
222 | # number = int(tokens[-1].split('.')[0])
223 | if (fileName != currentFileName):
224 | print 'building annotations for %s'%aKey
225 | currentFileName = fileName
226 | blowUpVideo(currentFileName)
227 | sourceFileName = '_temp/frame_%d.bmp'%(sourceFrame + 1)
228 | destFileName = outputDir + ('/frame_%d.png'%destFrame)
229 | call(["convert", sourceFileName, destFileName])
230 |         annotationsFile.write('  <frame>\n')
231 |         annotationsFile.write('    <frameNumber>%d</frameNumber>\n'%destFrame)
232 |         annotationsFile.write('    <face>0,0</face>\n')
233 |         annotationsFile.write('    <zone>%d</zone>\n'%value['sector'])
234 |         annotationsFile.write('    <status>%d</status>\n'%value['status'])
235 |         annotationsFile.write('    <intersection>%d</intersection>\n'%value['intersection'])
236 |         annotationsFile.write('  </frame>\n')
237 | destFrame = destFrame + 1
238 |     annotationsFile.write('</annotations>\n')
239 | annotationsFile.close()
240 |
241 | if __name__ == '__main__':
242 |     if (len(sys.argv) < 4):
243 |         print 'Usage: python DataStream.py <vcode file> <dataset dir> <output dir>'
244 | sys.exit()
245 |
246 | vcodeFileName = sys.argv[1]
247 | datasetDir = sys.argv[2]
248 | outputDir = sys.argv[3]
249 |
250 | vcodeData = read_vcode(vcodeFileName)
251 |
252 | prefix = vcodeFileName[:vcodeFileName.index('-stitched')]
253 | summaryFileName = datasetDir + '/' + prefix + '/synchronized_data_streams.xml'
254 |
255 | stream = read_xml(summaryFileName)
256 | keys = stream['gaze'].keys()
257 | keys.sort()
258 |
259 | for key, value in sorted(vcodeData.iteritems()):
260 | # print ('%f = %s'%(key, value))
261 | getIncludedFrames(key, value[0], value[1], stream['gaze'], keys)
262 |
263 | keys = annotations.keys()
264 | keys.sort()
265 |
266 | buildAnnotations(keys, datasetDir, prefix, outputDir)
267 |
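The lookup implemented by find() and getIncludedFrames hinges on an 'effective' timestamp: each sync-point key is offset by its sample's age before being tested against the VCode interval [lb, ub]. The sketch below restates that test with plain floats and a linear scan standing in for the binary search; it is illustrative and not part of this module.

    def effective_time(timestamp, sample):
        # sample is (video_filename, frame_number, age), as returned by read_xml
        video_filename, frame_number, age = sample
        return timestamp + float(age)

    def frames_in_range(lb, ub, stream):
        """stream: dict mapping timestamp -> (video_filename, frame_number, age)."""
        hits = []
        for t in sorted(stream):
            if lb <= effective_time(t, stream[t]) <= ub:
                hits.append(stream[t])
        return hits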
--------------------------------------------------------------------------------
/utils/src/GazeTracker.cpp:
--------------------------------------------------------------------------------
1 | // GazeTracker.cpp
2 | // File that contains the definition of the methods of class GazeTracker. It provides
3 | // all the functionality needed for in car gaze tracking. It uses the Filter, Location,
4 | // Trainer and Classifier classes underneath
5 |
6 | #include "GazeTracker.h"
7 |
8 | #ifdef SINGLETHREADED
9 | #define fftw_init_threads() ;
10 | #define fftw_plan_with_nthreads(a) ;
11 | #define fftw_cleanup_threads() ;
12 | #endif
13 |
14 | // static member initialization
15 | CvSize GazeTracker::roiSize;
16 | Trainer::KernelType GazeTracker::kernelType = Trainer::Polynomial;
17 |
18 | // Class construction and destruction
19 |
20 | GazeTracker::GazeTracker(string outputDir, bool online) {
21 | isOnline = online;
22 |
23 | // fftw3 initialization to use openmp. These functions should be called
24 | // once at application scope before any other fftw functions are called
25 | fftw_init_threads();
26 | fftw_plan_with_nthreads(1);
27 |
28 | // check if the output directory exists, or else bail
29 | DIR* dir;
30 | dir = opendir(outputDir.c_str());
31 | if (dir == NULL) {
32 | string err = "GazeTracker::GazeTracker. The directory " + outputDir +
33 | " does not exist. Bailing out.";
34 | throw (err);
35 | }
36 | closedir(dir);
37 |
38 | // compute full path names
39 | char fullPath[PATH_MAX + 1];
40 | outputDirectory = realpath((const char*)outputDir.c_str(), fullPath);
41 |
42 | char* path = getenv("SVM_PATH");
43 | if (!path) {
44 | string err = "GazeTracker::GazeTracker. The SVM_PATH environment variable is not set";
45 | throw (err);
46 | }
47 | svmPath = path;
48 | classifier = 0;
49 |
50 | roiSize.width = Globals::roiWidth;
51 | roiSize.height = Globals::roiHeight;
52 |
53 | // we cannot initialize these extractors at this point because
54 |   // we don't know at this point what this object is going to be used
55 | // for. It could be for creating offline filters or it could
56 | // be for online tracking
57 | leftEyeExtractor = rightEyeExtractor = noseExtractor = 0;
58 |
59 | faceCenter.x = faceCenter.y = 0;
60 |
61 | // now read the config file and update state specific to the current
62 | // classification task
63 | readConfiguration();
64 | }
65 |
66 | GazeTracker::~GazeTracker() {
67 | if (classifier) delete classifier;
68 |
69 | delete leftEyeExtractor;
70 | delete rightEyeExtractor;
71 | delete noseExtractor;
72 |
73 | // final cleanup of all fftw thread data
74 | fftw_cleanup_threads();
75 | }
76 |
77 | // addFrameSet
78 | // Method used to add frame sets in the training data. These sets are the
79 | // directories containing training samples. Each directory is expected to
80 | // contain an annotations file with annotated LOIs
81 |
82 | void GazeTracker::addFrameSet(string directory) {
83 | // compute full path names
84 | char fullPath[PATH_MAX + 1];
85 | string framesDirectory = realpath((const char*)directory.c_str(), fullPath);
86 |
87 | frameSetDirectories.push_back(framesDirectory);
88 | }
89 |
90 | // getWindowCenter
91 | // Method used to compute the center of the window we apply to a given
92 | // frame during LOI extraction
93 |
94 | CvPoint GazeTracker::getWindowCenter() {
95 | CvPoint windowCenter;
96 |
97 | windowCenter.x = Globals::roiWidth / 2;
98 | windowCenter.y = Globals::roiHeight / 2;
99 |
100 | return windowCenter;
101 | }
102 |
103 | // updateWindowCenter
104 | // Method used to update the min and max co-ordinates of the window center
105 | // by walking through all face annotations
106 |
107 | void GazeTracker::updateWindowCenter(string trainingDirectory,
108 | int& minX, int& maxX, int& minY, int& maxY) {
109 | Annotations annotations;
110 |
111 | // first capture the mapping from file names to locations of interest
112 | string locationsFileName = trainingDirectory + "/" +
113 | Globals::annotationsFileName;
114 | annotations.readAnnotations(locationsFileName);
115 |
116 | // now get the set of all annotations
117 |   vector<FrameAnnotation*>& frameAnnotations = annotations.getFrameAnnotations();
118 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) {
119 | FrameAnnotation* fa = frameAnnotations[i];
120 | CvPoint& faceLocation = fa->getLOI(Annotations::Face);
121 | if (minX > faceLocation.x)
122 | minX = faceLocation.x;
123 | if (maxX < faceLocation.x)
124 | maxX = faceLocation.x;
125 |
126 | if (minY > faceLocation.y)
127 | minY = faceLocation.y;
128 | if (maxY < faceLocation.y)
129 | maxY = faceLocation.y;
130 | }
131 | }
132 |
133 | // computeWindowCenter
134 | // Method used to compute a window center for face annotations. This method
135 | // reverts to getWindowCenter for all annotations when not in training mode.
136 | // In filter training mode, for all annotations other than the face, we
137 | // simply return the result of getWindowCenter, but for the face we use the
138 | // midpoint of extremal x and y co-ordinates
139 |
140 | CvPoint GazeTracker::computeWindowCenter(string trainingDirectory) {
141 | if (trainingDirectory == "" &&
142 | !frameSetDirectories.size() && !trainingSetDirectories.size())
143 | return getWindowCenter();
144 |
145 | // iterate over all frameset directories and pick face annotations
146 | // to compute the average of the extremal face locations
147 | int minX = INT_MAX;
148 | int maxX = INT_MIN;
149 | int minY = INT_MAX;
150 | int maxY = INT_MIN;
151 |
152 | if (trainingDirectory == "") {
153 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++)
154 | updateWindowCenter(frameSetDirectories[i], minX, maxX, minY, maxY);
155 | for (unsigned int i = 0; i < trainingSetDirectories.size(); i++)
156 | updateWindowCenter(trainingSetDirectories[i], minX, maxX, minY, maxY);
157 | } else {
158 | updateWindowCenter(trainingDirectory, minX, maxX, minY, maxY);
159 | }
160 |
161 | // now take the average of the min and max co-ordinate values
162 | CvPoint windowCenter;
163 |
164 | windowCenter.x = (minX + maxX) / 2;
165 | windowCenter.y = (minY + maxY) / 2;
166 |
167 | cout << "Face windowCenter = " << windowCenter.x << ", " << windowCenter.y << endl;
168 |
169 | return windowCenter;
170 | }
171 |
172 | // createFilters
173 | // Method used to create filters. This method will iterate over all the
174 | // training frames directories, add them to each filter we want to create,
175 | // create those filters and save them
176 |
177 | void GazeTracker::createFilters() {
178 | // get the center of the window function
179 | CvPoint windowCenter = getWindowCenter();
180 |
181 | // create filters
182 | Filter* leftEyeFilter = new Filter(outputDirectory, Annotations::LeftEye, roiSize,
183 | Globals::gaussianWidth /* gaussian spread */,
184 | windowCenter, roiFunction);
185 | Filter* rightEyeFilter = new Filter(outputDirectory, Annotations::RightEye, roiSize,
186 | Globals::gaussianWidth /* gaussian spread */,
187 | windowCenter, roiFunction);
188 | Filter* noseFilter = new Filter(outputDirectory, Annotations::Nose, roiSize,
189 | Globals::gaussianWidth /* gaussian spread */,
190 | windowCenter, roiFunction);
191 |
192 | //leftEyeFilter->setAffineTransforms();
193 | //rightEyeFilter->setAffineTransforms();
194 | //noseFilter->setAffineTransforms();
195 |
196 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++) {
197 | cout << "Adding left eye annotations..." << endl;
198 | leftEyeFilter->addTrainingSet(frameSetDirectories[i]);
199 |
200 | cout << "Adding right eye annotations..." << endl;
201 | rightEyeFilter->addTrainingSet(frameSetDirectories[i]);
202 |
203 | cout << "Adding nose annotations..." << endl;
204 | noseFilter->addTrainingSet(frameSetDirectories[i]);
205 | }
206 |
207 | #pragma omp parallel sections num_threads(4)
208 | {
209 | #pragma omp section
210 | {
211 | cout << "Creating left eye filter..." << endl;
212 | leftEyeFilter->create();
213 | leftEyeFilter->save();
214 | }
215 |
216 | #pragma omp section
217 | {
218 | cout << "Creating right eye filter..." << endl;
219 | rightEyeFilter->create();
220 | rightEyeFilter->save();
221 | }
222 |
223 | #pragma omp section
224 | {
225 | cout << "Creating nose filter..." << endl;
226 | noseFilter->create();
227 | noseFilter->save();
228 | }
229 | }
230 |
231 | delete leftEyeFilter;
232 | delete rightEyeFilter;
233 | delete noseFilter;
234 | }
235 |
236 | // addTrainingSet
237 | // Method used to add directories with training data for SVM models. Each directory is
238 | // expected to contain an annotations file with annotated LOIs
239 |
240 | void GazeTracker::addTrainingSet(string directory) {
241 | // compute full path names
242 | char fullPath[PATH_MAX + 1];
243 | string trainingDirectory = realpath((const char*)directory.c_str(), fullPath);
244 |
245 | trainingSetDirectories.push_back(trainingDirectory);
246 | }
247 |
248 | // train
249 | // Method used to train the SVM classifier. The trainer expects to find an
250 | // annotations file in the output directory that contains all the annotations that
251 | // were applied during filter training. All these annotations are used to create
252 | // models using the SVM Light trainer
253 |
254 | void GazeTracker::train() {
255 | if (!leftEyeExtractor) {
256 | // get the center of the window we want to apply
257 | CvPoint windowCenter = getWindowCenter();
258 | leftEyeExtractor = new Location(outputDirectory, Annotations::LeftEye,
259 | windowCenter);
260 | rightEyeExtractor = new Location(outputDirectory, Annotations::RightEye,
261 | windowCenter);
262 | noseExtractor = new Location(outputDirectory, Annotations::Nose,
263 | windowCenter);
264 | }
265 |
266 | Trainer trainer(outputDirectory, kernelType,
267 | leftEyeExtractor, rightEyeExtractor, noseExtractor,
268 | roiFunction, svmPath);
269 |
270 | // add training sets
271 | for (unsigned int i = 0; i < trainingSetDirectories.size(); i++)
272 | trainer.addTrainingSet(trainingSetDirectories[i]);
273 |
274 | // generate models
275 | trainer.generate();
276 | }
277 |
278 | // getZone
279 | // Method used to get the gaze zone given an image. If the classifier object is as
280 | // yet not created, we first create it here. If the online flag is set, then
281 | // we create online filters and then initialize the location extractors with those
282 | // filters, else we use the offline filters that are expected to have been
283 | // generated before this function is called
284 |
285 | int GazeTracker::getZone(IplImage* image, double& confidence, FrameAnnotation& fa) {
286 | if (!classifier)
287 | createClassifier();
288 |
289 | fa.setFace(faceCenter);
290 | return classifier->getZone(image, confidence, fa);
291 | }
292 |
293 | // getFilterAccuracy
294 | // Method used to compute the error for a filter identified by xml tag for the
295 | // annotations in a given directory
296 |
297 | double GazeTracker::getFilterAccuracy(string trainingDirectory, Annotations::Tag xmlTag,
298 | Classifier::ErrorType errorType) {
299 | if (!classifier)
300 | createClassifier();
301 |
302 | return classifier->getFilterError(trainingDirectory, xmlTag, errorType);
303 | }
304 |
305 | // getClassifierAccuracy
306 | // Method to get the classifier accuracy
307 |
308 | pair<double, double> GazeTracker::getClassifierAccuracy(string trainingDirectory) {
309 | if (!classifier)
310 | createClassifier();
311 |
312 | return classifier->getError(trainingDirectory);
313 | }
314 |
315 | // createClassifier
316 | // Method used to create a classifier object
317 |
318 | void GazeTracker::createClassifier() {
319 | // create location extractors
320 | if (isOnline) {
321 | // create new online filters and use them as filters in the location
322 | // extractors for all subsequent images
323 |
324 | CvPoint windowCenter = getWindowCenter();
325 |
326 | // left eye filter and extractor
327 | Filter* filter = new OnlineFilter(outputDirectory, Annotations::LeftEye, roiSize,
328 | Globals::gaussianWidth /* gaussian spread */,
329 | Globals::learningRate, windowCenter);
330 | leftEyeExtractor = new Location(filter);
331 |
332 | // right eye filter and extractor
333 | filter = new OnlineFilter(outputDirectory, Annotations::RightEye, roiSize,
334 | Globals::gaussianWidth /* gaussian spread */,
335 | Globals::learningRate, windowCenter);
336 | rightEyeExtractor = new Location(filter);
337 |
338 | // nose filter and extractor
339 | filter = new OnlineFilter(outputDirectory, Annotations::Nose, roiSize,
340 | Globals::gaussianWidth /* gaussian spread */,
341 | Globals::learningRate, windowCenter);
342 | noseExtractor = new Location(filter);
343 | } else {
344 | if (!leftEyeExtractor) {
345 | CvPoint windowCenter = getWindowCenter();
346 | leftEyeExtractor = new Location(outputDirectory, Annotations::LeftEye,
347 | windowCenter);
348 | rightEyeExtractor = new Location(outputDirectory, Annotations::RightEye,
349 | windowCenter);
350 | noseExtractor = new Location(outputDirectory, Annotations::Nose,
351 | windowCenter);
352 | }
353 | }
354 |
355 | // now create the classifier
356 | classifier = new Classifier(outputDirectory, kernelType,
357 | leftEyeExtractor, rightEyeExtractor,
358 | noseExtractor, roiFunction);
359 | }
360 |
361 | // showAnnotations
362 | // Method used to show annotations in the training set
363 |
364 | void GazeTracker::showAnnotations() {
365 | if (!frameSetDirectories.size())
366 | return;
367 |
368 | string wName = "Annotations";
369 | cvNamedWindow((const char*)wName.c_str(), CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
370 |
371 | // initialize font and add text
372 | CvFont font;
373 | cvInitFont(&font, CV_FONT_HERSHEY_SIMPLEX, 1.0, 1.0, 0, 1, CV_AA);
374 |
375 | for (unsigned int i = 0; i < frameSetDirectories.size(); i++) {
376 | Annotations annotations;
377 |
378 | // first capture the mapping from file names to locations of interest
379 | string locationsFileName = frameSetDirectories[i] + "/" +
380 | Globals::annotationsFileName;
381 | annotations.readAnnotations(locationsFileName);
382 |
383 | string framesDirectory = annotations.getFramesDirectory();
384 | string prefix = framesDirectory + "/frame_";
385 |
386 | // now get the set of all annotations
387 |     vector<FrameAnnotation*>& frameAnnotations = annotations.getFrameAnnotations();
388 | for (unsigned int j = 0; j < frameAnnotations.size(); j++) {
389 | FrameAnnotation* fa = frameAnnotations[j];
390 |
391 | char buffer[256];
392 | sprintf(buffer, "%d.png", fa->getFrameNumber());
393 | string filename = prefix + buffer;
394 | IplImage* image = cvLoadImage((const char*)filename.c_str());
395 |
396 | CvPoint& faceLocation = fa->getLOI(Annotations::Face);
397 | if (!faceLocation.x && !faceLocation.y)
398 | continue;
399 |
400 | cvCircle(image, fa->getLOI(Annotations::LeftEye), 5, cvScalar(0, 255, 255, 0), 2, 8, 0);
401 | cvCircle(image, fa->getLOI(Annotations::RightEye), 5, cvScalar(255, 255, 0, 0), 2, 8, 0);
402 | cvCircle(image, fa->getLOI(Annotations::Nose), 5, cvScalar(255, 0, 255, 0), 2, 8, 0);
403 |
404 | sprintf(buffer, "%d", fa->getZone());
405 | cvPutText(image, buffer, cvPoint(580, 440), &font, cvScalar(255, 255, 255, 0));
406 |
407 | cvShowImage((const char*)wName.c_str(), image);
408 | char c = cvWaitKey();
409 | if (c != 'c') {
410 | cvReleaseImage(&image);
411 | break;
412 | }
413 |
414 | cvReleaseImage(&image);
415 | }
416 | }
417 | }
418 |
419 | // readConfiguration
420 | // Function used to read the config.xml file in the models directory. The file
421 | // contains the configuration that is specific to the location of the driver
422 | // with respect to the camera and will grow to include other pieces of information
423 | // eventually
424 |
425 | void GazeTracker::readConfiguration() {
426 | string fileName = outputDirectory + '/' + Globals::configFileName;
427 |
428 | ifstream file;
429 |
430 | file.open((const char*)fileName.c_str());
431 | if (file.good()) {
432 | string line;
433 |
434 | getline(file, line); // ignore the first line
435 | while (!file.eof()) {
436 | getline(file, line);
437 | if (line.find("center") != string::npos) {
438 | getline(file, line);
439 | const char* token = strtok((char*)line.c_str(), "<>/x");
440 | if (!token) {
441 | string err = "GazeTracker::readConfiguration. Malformed config file for center";
442 | throw (err);
443 | }
444 | faceCenter.x = atoi(token);
445 | getline(file, line);
446 | token = strtok((char*)line.c_str(), "<>/y");
447 | if (!token) {
448 | string err = "GazeTracker::readConfiguration. Malformed config file for center";
449 | throw (err);
450 | }
451 | faceCenter.y = atoi(token);
452 | }
453 | }
454 | }
455 | }
456 |
457 | // roiFunction
458 | // We support the use of LOIs identified to cull images to smaller regions of interest
459 | // (ROI) for use in locating future LOIs. This function is passed to the constructors
460 | // of the filter and classifier classes. Those classes in turn call this function when
461 | // they need a culled image. The input parameters are the original image, a frame
462 | // annotation object that is annotated with all the LOIs that we have found before this
463 | // function gets called. The offset parameter is an output parameter that contains the
464 | // offset of the ROI within the image. The function returns a culled image object
465 |
466 | IplImage* GazeTracker::roiFunction(IplImage* image, FrameAnnotation& fa,
467 | CvPoint& offset, Annotations::Tag xmlTag) {
468 | offset.x = 0;
469 | offset.y = 0;
470 |
471 | CvPoint& location = fa.getLOI(xmlTag);
472 | offset.y = location.y - (Globals::roiHeight / 2);
473 | offset.x = location.x - (Globals::roiWidth / 2);
474 |
475 | // now check if the roi overflows the image boundary. If it does then
476 | // we move it so that it is contained within the image boundary
477 | if (offset.x + Globals::roiWidth > Globals::imgWidth)
478 | offset.x = Globals::imgWidth - Globals::roiWidth;
479 | if (offset.x < 0)
480 | offset.x = 0;
481 | if (offset.y + Globals::roiHeight > Globals::imgHeight)
482 | offset.y = Globals::imgHeight - Globals::roiHeight;
483 | if (offset.y < 0)
484 | offset.y = 0;
485 |
486 | cvSetImageROI(image, cvRect(offset.x, offset.y, Globals::roiWidth, Globals::roiHeight));
487 | IplImage* roi = cvCreateImage(cvGetSize(image), image->depth, image->nChannels);
488 | cvCopy(image, roi);
489 | cvResetImageROI(image);
490 |
491 | return roi;
492 | }
493 |
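roiFunction centres a window of Globals::roiWidth by Globals::roiHeight on the located LOI and then shifts it so that it never crosses the image boundary. The same clamping logic is shown below as a small Python sketch for illustration; the default sizes are the values from Globals.cpp.

    def clamp_roi(loi_x, loi_y, img_w=640, img_h=480, roi_w=500, roi_h=350):
        # start with the window centred on the location of interest
        x = loi_x - roi_w // 2
        y = loi_y - roi_h // 2
        # shift the window back inside the image if it overflows a boundary
        x = min(x, img_w - roi_w)
        x = max(x, 0)
        y = min(y, img_h - roi_h)
        y = max(y, 0)
        return x, y   # top-left corner of the ROI within the image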
--------------------------------------------------------------------------------
/utils/src/GazeTracker.h:
--------------------------------------------------------------------------------
1 | // GazeTracker.h
2 | // File that contains the definition of class GazeTracker. This class is used to
3 | // perform the following operations,
4 | // 1. Learn filters for LOIs
5 | // 2. Apply a filter to an image and get the co-ordinates of the LOI
6 | // 3. Train SVM models for given data
7 | // 4. Apply SVM models to classify gaze zones
8 |
9 | #ifndef __GAZETRACKER_H
10 | #define __GAZETRACKER_H
11 |
12 | #include <stdio.h>
13 | #include <stdlib.h>
14 | #include <string.h>
15 | #include <limits.h>
16 | #include <dirent.h>
17 | #include <iostream>
18 | #include <fstream>
19 | #include <string>
20 | #include <vector>
21 | #include <utility>
22 |
23 | #include "Filter.h"
24 | #include "OnlineFilter.h"
25 | #include "Location.h"
26 | #include "Annotations.h"
27 | #include "Trainer.h"
28 | #include "Classifier.h"
29 |
30 | class GazeTracker {
31 | public:
32 | static Trainer::KernelType kernelType; // The SVM Kernel type
33 |
34 | static CvSize roiSize; // ROI size
35 |
36 | string outputDirectory; // The directory for all generated info
37 |
38 | // Construction and destruction
39 | GazeTracker(string outputDirectory, bool online);
40 | virtual ~GazeTracker();
41 |
42 | // create filters
43 | void addFrameSet(string directory);
44 | void createFilters();
45 |
46 | // train models for SVM
47 | void addTrainingSet(string directory);
48 | void train();
49 |
50 | // get the gaze zone given an image
51 | int getZone(IplImage* image, double& confidence, FrameAnnotation& fa);
52 |
53 | // get error for a given filter used by the gaze tracker
54 | double getFilterAccuracy(string trainingDirectory, Annotations::Tag xmlTag,
55 | Classifier::ErrorType errorType);
56 | // get classification error
57 |   pair<double, double> getClassifierAccuracy(string trainingDirectory);
58 |
59 | // show annotations
60 | void showAnnotations();
61 |
62 | // Function used to compute an image ROI based on partial frame annotations
63 | // As we recognize LOIs, we use those location to potentially cull the
64 | // input image and use a reduced ROI for subsequent recognition
65 | static IplImage* roiFunction(IplImage* image, FrameAnnotation& fa,
66 | CvPoint& offset, Annotations::Tag xmlTag);
67 |
68 | private:
69 | bool isOnline; // true when using online filters
70 | string svmPath;
71 |
72 | // the center of the face for classification
73 | CvPoint faceCenter;
74 |
75 | // The classifier
76 | Classifier* classifier;
77 |
78 | // The frame sets in the training data for filter generation
79 | vector frameSetDirectories;
80 |
81 | // The training set directories for SVM model generation
82 | vector trainingSetDirectories;
83 |
84 | // location extractors
85 | Location* leftEyeExtractor;
86 | Location* rightEyeExtractor;
87 | Location* noseExtractor;
88 |
89 | // create a classifier
90 | void createClassifier();
91 |
92 | // window center functions
93 | CvPoint getWindowCenter();
94 | CvPoint computeWindowCenter(string trainingDirectory = "");
95 | void updateWindowCenter(string trainingDirectory,
96 | int& minX, int& maxX, int& minY, int& maxY);
97 | void readConfiguration();
98 | };
99 |
100 | #endif
101 |
--------------------------------------------------------------------------------
/utils/src/Globals.cpp:
--------------------------------------------------------------------------------
1 | // Globals.cpp
2 | // File that defines all the constants used across all modules
3 |
4 | #include "Globals.h"
5 |
6 | int Globals::imgWidth = 640;
7 | int Globals::imgHeight = 480;
8 | int Globals::roiWidth = 500;
9 | int Globals::roiHeight = 350;
10 | int Globals::maxDistance = 100;
11 | int Globals::maxAngle = 180;
12 | int Globals::maxArea = 200;
13 | int Globals::binWidth = 10;
14 | int Globals::gaussianWidth = 21;
15 | int Globals::psrWidth = 30;
16 | int Globals::nPastLocations = 5;
17 | int Globals::noseDrop = 70;
18 |
19 | int Globals::smallBufferSize = 32;
20 | int Globals::midBufferSize = 256;
21 | int Globals::largeBufferSize = 1024;
22 | int Globals::nSequenceLength = 600;
23 |
24 | unsigned Globals::numZones = 5;
25 |
26 | double Globals::learningRate = 0.125;
27 | double Globals::initialGaussianScale = 0.5;
28 | double Globals::windowXScale = 30;
29 | double Globals::windowYScale = 25;
30 |
31 | string Globals::annotationsFileName = "annotations.xml";
32 | string Globals::modelNamePrefix = "zone_";
33 | string Globals::faceFilter = "MOSSE_Face";
34 | string Globals::leftEyeFilter = "MOSSE_LeftEye";
35 | string Globals::rightEyeFilter = "MOSSE_RightEye";
36 | string Globals::noseFilter = "MOSSE_Nose";
37 |
38 | string Globals::paramsFileName = "parameters.xml";
39 | string Globals::configFileName = "config.xml";
40 |
--------------------------------------------------------------------------------
/utils/src/Globals.h:
--------------------------------------------------------------------------------
1 | // Globals.h
2 | // File that contains the set of all global constants and typedefs. We define a
3 | // class with static members for each constant that will be used at application
4 | // scope
5 |
6 | #ifndef __GLOBALS_H
7 | #define __GLOBALS_H
8 |
9 | #include <string>
10 |
11 | // The standard OpenCV headers
12 | #include <opencv/cv.h>
13 | #include <opencv/highgui.h>
14 |
15 | // The OMP stuff
16 | #include <omp.h>
17 |
18 | // fftw3 stuff
19 | #include <fftw3.h>
20 |
21 | using namespace std;
22 |
23 | class Globals {
24 | public:
25 | Globals() { }
26 | ~Globals() { }
27 |
28 | static int imgWidth; // default image width that we handle
29 | static int imgHeight; // image height
30 | static int roiWidth; // roi width
31 | static int roiHeight; // roi height
32 | static int maxDistance; // max pixel distance for normalization
33 | static int maxAngle; // max angle for normalization
34 | static int maxArea; // max area of the L, R and N triangle
35 | static int binWidth; // the bin width for binning annotations
36 | static int gaussianWidth; // width of the gaussian
37 | static int psrWidth; // width of window to compute PSR
38 | static int nPastLocations; // number of past locations for smoothing
39 | static int noseDrop; // approx. drop below the eyes for the nose
40 |
41 | static int smallBufferSize; // small stack buffer size
42 | static int midBufferSize; // mid stack buffer size
43 | static int largeBufferSize; // large stack buffer size
44 | static int nSequenceLength; // the length of frame sequences we need
45 |
46 | static unsigned numZones; // number of zones
47 |
48 | static double learningRate; // the learning rate for online filters
49 | static double initialGaussianScale; // the gaussian scale for the face filter
50 |
51 | // the window function is computed as image width times the x scale and
52 | // the image height times the y scale
53 | static double windowXScale; // the X scale factor for the window function
54 | static double windowYScale; // the Y scale factor for the window function
55 |
56 | static string annotationsFileName; // the name of the annotations file
57 | static string modelNamePrefix; // prefix for SVM model names
58 | static string faceFilter; // name of the face filter
59 | static string leftEyeFilter; // name of left eye filter
60 | static string rightEyeFilter; // name of right eye filter
61 | static string noseFilter; // name of nose filter
62 | static string paramsFileName; // name of parameters file
63 | static string configFileName; // name of the config file
64 |
65 | static void setRoiSize(CvSize size) {
66 | roiWidth = size.width;
67 | roiHeight = size.height;
68 | }
69 | };
70 |
71 | #endif
72 |
--------------------------------------------------------------------------------
/utils/src/Makefile:
--------------------------------------------------------------------------------
1 | # Makefile for ADM
2 |
3 | CC = g++
4 | AR = ar
5 | LD = ld
6 | RANLIB = ranlib
7 | RM = /bin/rm
8 | MKDIR = /bin/mkdir
9 | CP = /bin/cp
10 | RM = /bin/rm
11 |
12 | #makedepend flags
13 | DFLAGS =
14 |
15 | #Compiler flags
16 | # If the mode variable is empty, default to a debug build
17 | ifeq ($(mode),opt)
18 | CFLAGS = -Wall -O3 -fPIC -shared -fopenmp
19 | BUILD_DIR = ../build/src.opt
20 | else
21 | mode = debug
22 | CFLAGS = -g -Wall -fPIC -shared -DSINGLETHREADED
23 | BUILD_DIR = ../build/src
24 | endif
25 |
26 | CFILES = Globals.cpp Preprocess.cpp Annotations.cpp xmlToIDX.cpp classify.cpp
27 |
28 | OFILES = Globals.o Preprocess.o Annotations.o xmlToIDX.o classify.o
29 |
30 | TRAIN_DATA_OFILES = $(BUILD_DIR)/Globals.o $(BUILD_DIR)/Preprocess.o $(BUILD_DIR)/Annotations.o
31 |
32 | INSTALL_DIR = ../../install/lib
33 | HEADER_DIR = ../../install/include
34 | BIN_DIR = ../../install/bin
35 |
36 | TRAIN_DATA_LIB = $(INSTALL_DIR)/libdata.a
37 | XML_TO_IDX = $(BIN_DIR)/xmlToIDX
38 | CLASSIFY = $(BIN_DIR)/classify
39 | TRAIN_DATA_INCLUDE = Annotations.h Preprocess.h
40 |
41 | OUT = $(TRAIN_DATA_LIB) $(XML_TO_IDX)
42 |
43 | INCLUDES = -I ./ `pkg-config opencv --cflags` `pkg-config fftw3 --cflags`
44 | LIBS = `pkg-config opencv --cflags --libs` `pkg-config fftw3 --cflags --libs`
45 |
46 | OBJS := $(patsubst %.cpp, $(BUILD_DIR)/%.o, $(filter %.cpp,$(CFILES)))
47 |
48 | #OBJS = $(patsubst %,$(BUILD_DIR)/%,$(OFILES))
49 |
50 | .PHONY: all header
51 |
52 | all: information $(OUT)
53 |
54 | information:
55 | ifneq ($(mode),opt)
56 | ifneq ($(mode),debug)
57 | @echo "Invalid build mode."
58 | @echo "Please use 'make mode=opt' or 'make mode=debug'"
59 | @exit 1
60 | endif
61 | endif
62 | @echo "Building on "$(mode)" mode"
63 | @echo ".........................."
64 |
65 | $(BUILD_DIR)/%.o: %.cpp $(TRAIN_DATA_INCLUDE)
66 | $(MKDIR) -p $(BUILD_DIR)
67 | $(CC) -c $(INCLUDES) -o $@ $< $(CFLAGS)
68 |
69 | $(OUT): $(OBJS)
70 | $(MKDIR) -p $(INSTALL_DIR)
71 | $(MKDIR) -p $(BIN_DIR)
72 | $(MKDIR) -p $(HEADER_DIR)
73 | $(AR) rcs $(TRAIN_DATA_LIB) $(TRAIN_DATA_OFILES)
74 | $(RANLIB) $(TRAIN_DATA_LIB)
75 | $(CP) -p $(TRAIN_DATA_INCLUDE) $(HEADER_DIR)
76 | $(CC) -o $(XML_TO_IDX) $(BUILD_DIR)/xmlToIDX.o -L$(INSTALL_DIR) -ldata $(LIBS)
77 | $(CC) -o $(CLASSIFY) $(BUILD_DIR)/classify.o -L$(INSTALL_DIR) -ldata $(LIBS)
78 | @echo train_utils finished
79 |
80 | header:
81 | $(MKDIR) -p $(HEADER_DIR)
82 | $(CP) -p $(TRAIN_DATA_INCLUDE) $(HEADER_DIR)
83 | @echo header finished
84 |
85 | depend:
86 | makedepend -- $(DFLAGS) -- $(CFILES)
87 |
88 | .PHONY: clean
89 |
90 | clean:
91 | $(RM) -f $(BUILD_DIR)/*.o $(OUT)
92 |
93 |
--------------------------------------------------------------------------------
/utils/src/Preprocess.cpp:
--------------------------------------------------------------------------------
1 | // Preprocess.cpp
2 | // This file contains the implementation of class Preprocess.
3 | // This class is used to generate training and test sets from directories of
4 | // labelled or unlabelled images.
5 |
6 | #include "Preprocess.h"
7 |
8 | // union to enable accessing bytes of an integer
9 | typedef union {
10 | unsigned int i;
11 | unsigned char u[4];
12 | } IntBytesT;
13 |
14 | // Class construction and destruction
15 |
16 | Preprocess::Preprocess(string output, CvSize size, double scale, CvPoint& center,
17 | map<int, bool>& sFilter, map<int, bool>& iFilter,
18 | roiFnT roiFn, bool bins,
19 | bool inBinFmt, double binThreshold) {
20 | outputFileName = output;
21 | useBins = bins;
22 | roiFunction = roiFn;
23 | inBinaryFormat = inBinFmt;
24 | binaryThreshold = binThreshold;
25 | imgSize.height = size.height;
26 | imgSize.width = size.width;
27 | scaleFactor = scale;
28 |
29 | map<int, bool>::iterator it;
30 | for (it = sFilter.begin(); it != sFilter.end(); it++) {
31 | int key = (*it).first;
32 | bool value = (*it).second;
33 | statusFilter[key] = value;
34 | }
35 | for (it = iFilter.begin(); it != iFilter.end(); it++) {
36 | int key = (*it).first;
37 | bool value = (*it).second;
38 | intersectionFilter[key] = value;
39 | }
40 |
41 | nSamples = 0;
42 | doAffineTransforms = false;
43 |
44 | length = imgSize.height * imgSize.width;
45 |
46 | // allocate real complex vectors for use during filter creation or update
47 | realImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1);
48 | tempImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1);
49 | imageBuffer = (double*)fftw_malloc(sizeof(double) * length);
50 |
51 | windowCenter.x = center.x;
52 | windowCenter.y = center.y;
53 |
54 | // Now compute a window around the center
55 | double xSpread = imgSize.width * Globals::windowXScale;
56 | double ySpread = imgSize.height * Globals::windowYScale;
57 | window = createWindow(windowCenter, xSpread, ySpread);
58 |
59 | // cvNamedWindow("window", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
60 | }
61 |
62 | Preprocess::Preprocess(CvSize size, double scale, CvPoint& center,
63 | roiFnT roiFn) {
64 | roiFunction = roiFn;
65 | imgSize.height = size.height;
66 | imgSize.width = size.width;
67 | scaleFactor = scale;
68 |
69 | nSamples = 0;
70 | doAffineTransforms = false;
71 |
72 | length = imgSize.height * imgSize.width;
73 |
74 | // allocate real complex vectors for use during filter creation or update
75 | realImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1);
76 | tempImg = cvCreateImage(imgSize, IPL_DEPTH_64F, 1);
77 | imageBuffer = (double*)fftw_malloc(sizeof(double) * length);
78 |
79 | windowCenter.x = center.x;
80 | windowCenter.y = center.y;
81 |
82 | // Now compute a window around the center
83 | double xSpread = imgSize.width * Globals::windowXScale;
84 | double ySpread = imgSize.height * Globals::windowYScale;
85 | window = createWindow(windowCenter, xSpread, ySpread);
86 |
87 | // cvNamedWindow("window", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
88 | }
89 |
90 | Preprocess::~Preprocess() {
91 | cvReleaseImage(&realImg);
92 | cvReleaseImage(&tempImg);
93 | fftw_free(imageBuffer);
94 | fftw_free(window);
95 |
96 | for (unsigned int i = 0; i < fileToAnnotations.size(); i++)
97 | delete fileToAnnotations[i].second;
98 | for (unsigned int i = 0; i < fileToAnnotationsForValidation.size(); i++)
99 | delete fileToAnnotationsForValidation[i].second;
100 | for (unsigned int i = 0; i < fileToAnnotationsForTest.size(); i++)
101 | delete fileToAnnotationsForTest[i].second;
102 | }
103 |
104 | // addTrainingSet
105 | // Method used to add directories that contain images. For training data,
106 | // the directory is also expected to contain labels for each image.
107 |
108 | void Preprocess::addTrainingSet(string trainingDirectory) {
109 | Annotations annotations;
110 |
111 | // first capture the mapping from file names to locations of interest
112 | string locationsFileName = trainingDirectory + "/" + Globals::annotationsFileName;
113 | annotations.readAnnotations(locationsFileName);
114 |
115 | // trim ends
116 | annotations.trimEnds(Annotations::Center, 20 /* nTrim */);
117 | annotations.trimEnds(Annotations::DriverWindow, 10 /* nTrim */);
118 | annotations.trimEnds(Annotations::LeftOfCenter, 10 /* nTrim */);
119 | annotations.trimEnds(Annotations::RightOfCenter, 10 /* nTrim */);
120 | annotations.trimEnds(Annotations::PassengerWindow, 10 /* nTrim */);
121 |
122 | // if we want to pick the same number of frames from each sector, then
123 | // we create bins. This will pick as many frames for each sector as the
124 | // smallest number of frames we have across all sectors
125 | if (useBins)
126 | annotations.createBins();
127 |
128 | // get the frames directory
129 | string framesDirectory = annotations.getFramesDirectory();
130 |
131 | // get the window center
132 | CvPoint& center = annotations.getCenter();
133 |
134 | // collect the number of transitions from each sector to other sectors
135 | unsigned int transitions[Globals::numZones][Globals::numZones];
136 | for (unsigned int i = 0; i < Globals::numZones; i++)
137 | for (unsigned int j = 0; j < Globals::numZones; j++)
138 | transitions[i][j] = 0;
139 |
140 | FrameAnnotation* prev = 0;
141 |
142 | // now get the set of all annotations
143 | vector<FrameAnnotation*>& frameAnnotations = annotations.getFrameAnnotations();
144 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) {
145 | FrameAnnotation* fa = frameAnnotations[i];
146 |
147 | CvPoint faceCenter = fa->getFace();
148 | if (!faceCenter.x && !faceCenter.y)
149 | fa->setFace(center);
150 |
151 | // collect transition counts
152 | if (prev) {
153 | unsigned int prevSector = prev->getSector();
154 | unsigned int sector = fa->getSector();
155 |
156 | if (prevSector > 0 && prevSector <= 5 && sector > 0 && sector <= 5)
157 | transitions[prevSector - 1][sector - 1]++;
158 | }
159 |
160 | // now check if we have status and/or intersection filters. If yes, then
161 | // pick only those frames that match the filter
162 | if (statusFilter.size() && statusFilter.find(fa->getStatus()) == statusFilter.end())
163 | continue;
164 | if (intersectionFilter.size() &&
165 | intersectionFilter.find(fa->getIntersection()) == intersectionFilter.end())
166 | continue;
167 |
168 | // compose filename and update map
169 | char buffer[256];
170 | sprintf(buffer, "frame_%d.png", fa->getFrameNumber());
171 | string simpleName = buffer;
172 | string fileName = framesDirectory + "/" + simpleName;
173 | fileToAnnotations.push_back(make_pair(fileName, new FrameAnnotation(*fa)));
174 |
175 | prev = fa;
176 | }
177 |
178 | // report transition counts
179 | cout << "Transitions" << endl;
180 | for (unsigned int i = 0; i < Globals::numZones; i++) {
181 | for (unsigned int j = 0; j < Globals::numZones; j++)
182 | cout << transitions[i][j] << "\t";
183 | cout << endl;
184 | }
185 | }
186 |
187 | // addTestSet
188 | // Method used to add annotation files for validation and test. The method
189 | // takes as input an annotation file name and the fractions of the images in
190 | // that set to use for validation and for test. The frames are shuffled and
191 | // any images beyond those fractions are not used.
192 |
193 | void Preprocess::addTestSet(string annotationsFileName,
194 | double validationFraction, double testFraction) {
195 | Annotations annotations;
196 |
197 | // first capture the mapping from file names to locations of interest
198 | annotations.readAnnotations(annotationsFileName);
199 |
200 | // get the frames directory
201 | string framesDirectory = annotations.getFramesDirectory();
202 |
203 | // get the window center
204 | CvPoint& center = annotations.getCenter();
205 |
206 | // now get the set of all annotations
207 | vector<FrameAnnotation*>& frameAnnotations = annotations.getFrameAnnotations();
208 | random_shuffle(frameAnnotations.begin(), frameAnnotations.end());
209 |
210 | unsigned int nValidationLength = validationFraction * frameAnnotations.size();
211 | unsigned int nTestLength = testFraction * frameAnnotations.size();
212 |
213 | for (unsigned int i = 0; i < frameAnnotations.size(); i++) {
214 | FrameAnnotation* fa = frameAnnotations[i];
215 |
216 | CvPoint faceCenter = fa->getFace();
217 | if (!faceCenter.x && !faceCenter.y)
218 | fa->setFace(center);
219 |
220 | // compose filename and update map
221 | char buffer[256];
222 | sprintf(buffer, "frame_%d.png", fa->getFrameNumber());
223 | string simpleName = buffer;
224 | string fileName = framesDirectory + "/" + simpleName;
225 | if (i < nValidationLength)
226 | fileToAnnotationsForValidation.push_back(make_pair(fileName, new FrameAnnotation(*fa)));
227 | else if (i < nValidationLength + nTestLength)
228 | fileToAnnotationsForTest.push_back(make_pair(fileName, new FrameAnnotation(*fa)));
229 | }
230 | }
231 |
232 | // The following method is used to open a data file and a label file given
233 | // as input a simpleName. It generates the preambles in these files based
234 | // on the MNIST data format
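// The data-file preamble is four 4-byte integers written in host byte order:
// the magic number, a placeholder for the sample count (filled in later by
// closeFiles), the scaled image width, and the scaled image height. The
// label-file preamble contains only the magic number and the sample count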
235 |
236 | void Preprocess::openFiles(string simpleName) {
237 | // now open the data file and labels file and create their respective preambles
238 | string dataFileName = "data-" + simpleName;
239 | dataFile.open(dataFileName.c_str(), ofstream::binary);
240 | if (!dataFile.good()) {
241 | string err = "Preprocess::generate. Cannot open " + dataFileName +
242 | " for write";
243 | throw (err);
244 | }
245 | // The magic number 0x00000803 is used for data, where the 0x08
246 | // is for unsigned byte valued data and the 0x03 is for the number
247 | // of dimensions
248 | IntBytesT ib;
249 | ib.i = 8;
250 | ib.i <<= 16;
251 | ib.i |= 3;
252 | dataFile.write((char*)&(ib.u), 4);
253 | ib.i = 0;
254 | dataFile.write((char*)&(ib.u), 4); // for the number of samples
255 | ib.i = (unsigned int)imgSize.width * scaleFactor;
256 | dataFile.write((char*)&(ib.u), 4);
257 | ib.i = (unsigned int)imgSize.height * scaleFactor;
258 | dataFile.write((char*)&(ib.u), 4);
259 |
260 | string labelFileName = "label-" + simpleName;
261 | labelFile.open(labelFileName.c_str(), ofstream::binary);
262 | if (!labelFile.good()) {
263 | string err = "Preprocess::generate. Cannot open " + labelFileName +
264 | " for write";
265 | throw (err);
266 | }
267 | // The magic number 0x00000801 is used for labels, where the 0x08
268 | // is for unsigned byte valued data and the 0x01 is for the number
269 | // of dimensions
270 | ib.i = 8;
271 | ib.i <<= 16;
272 | ib.i |= 1;
273 | labelFile.write((char*)&(ib.u), 4);
274 | ib.i = 0;
275 | labelFile.write((char*)&(ib.u), 4); // for the number of samples
276 | }
277 |
278 | // The following method closes the data file and the label file. It re-writes
279 | // the number of samples contained in these files first before closing. The
280 | // number of samples is not available at the time the files are created, as
281 | // we may choose to do affine transforms increasing the number of images
282 | // written over and above the number of images we get through the annotation
283 | // files
284 |
285 | void Preprocess::closeFiles(int nSamples) {
286 | // now write out the total number of samples that were written to file
287 | // and close both the data and label files
288 | IntBytesT ib;
289 | ib.i = nSamples;
290 |
291 | dataFile.seekp(4, ios_base::beg);
292 | dataFile.write((char*)&(ib.u), 4);
293 | dataFile.close();
294 |
295 | labelFile.seekp(4, ios_base::beg);
296 | labelFile.write((char*)&(ib.u), 4);
297 | labelFile.close();
298 | }
299 |
300 | // Method to do the actual image data and sector writing into data and label
301 | // files. We walk a contiguous slice of the (already shuffled) set of
302 | // accumulated files to generate the number of samples required, based on
303 | // user-specified percentages
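// When doWrite is set, each selected frame is also copied into a _files
// directory under the current working directory and an annotations.xml
// describing the copied frames is written, so the generated test set can be
// used as a stand-alone annotated set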
304 |
305 | void Preprocess::generate(int startIndex, int nSamples, string simpleName,
306 | bool doWrite,
307 | vector<pair<string, FrameAnnotation*> >* additionalPairs) {
308 | // open data and label files
309 | openFiles(simpleName);
310 |
311 | int nSectorFrames[Globals::numZones];
312 | for (unsigned int i = 0; i < Globals::numZones; i++)
313 | nSectorFrames[i] = 0;
314 |
315 | // if we want to write out test annotations then open a file for that
316 | // purpose
317 | ofstream annotationsFile;
318 | string dirName;
319 | if (doWrite) {
320 | // get absolute path of the input directory
321 | char fullPath[PATH_MAX + 1];
322 | string fullPathName = realpath("./", fullPath);
323 |
324 | string fileName = fullPathName + "/annotations.xml";
325 | dirName = string(fullPath) + "/_files";
326 | annotationsFile.open(fileName.c_str());
327 | mkdir(dirName.c_str(), S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH);
328 | if (annotationsFile.good()) {
329 | annotationsFile << "<?xml version=\"1.0\"?>" << endl;
330 | annotationsFile << "<annotations>" << endl;
331 | }
332 | }
333 |
334 | // write content
335 | int samples = 0;
336 | for (int i = startIndex; i < startIndex + nSamples; i++) {
337 | string fileName = fileToAnnotations[i].first;
338 | FrameAnnotation* fa = fileToAnnotations[i].second;
339 |
340 | int sector = fa->getSector();
341 | if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow)
342 | continue;
343 |
344 | if (doWrite) {
345 | annotationsFile << "  <frame>" << endl;
346 | annotationsFile << "    <frameNumber>" << samples + 1 <<
347 | "</frameNumber>" << endl;
348 | annotationsFile << "    <center>" << fa->getFace().y << "," <<
349 | fa->getFace().x << "</center>" << endl;
350 | annotationsFile << "    <zone>" << fa->getSector() << "</zone>" << endl;
351 | annotationsFile << "    <status>" << fa->getStatus() << "</status>" << endl;
352 | annotationsFile << "    <intersection>" << fa->getIntersection() <<
353 | "</intersection>" << endl;
354 | annotationsFile << "  </frame>" << endl;
355 |
356 | char buffer[256];
357 | sprintf(buffer, "/frame_%d.png", samples + 1);
358 | string destFileName = dirName + string(buffer);
359 | string command = "/bin/cp " + fileName + " " + destFileName;
360 | static_cast<void>(system(command.c_str()));
361 | }
362 |
363 | nSectorFrames[fa->getSector() - 1]++;
364 |
365 | // update files with image and label data
366 | update(fileName, fa, samples);
367 | }
368 | // now add any frames from the additional pairs if any
369 | if (additionalPairs) {
370 | for (unsigned int i = 0; i < additionalPairs->size(); i++) {
371 | string fileName = (*additionalPairs)[i].first;
372 | FrameAnnotation* fa = (*additionalPairs)[i].second;
373 |
374 | int sector = fa->getSector();
375 | if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow)
376 | continue;
377 |
378 | if (doWrite) {
379 | annotationsFile << "  <frame>" << endl;
380 | annotationsFile << "    <frameNumber>" << samples + 1 <<
381 | "</frameNumber>" << endl;
382 | annotationsFile << "    <center>" << fa->getFace().y << "," <<
383 | fa->getFace().x << "</center>" << endl;
384 | annotationsFile << "    <zone>" << fa->getSector() << "</zone>" << endl;
385 | annotationsFile << "    <status>" << fa->getStatus() << "</status>" << endl;
386 | annotationsFile << "    <intersection>" << fa->getIntersection() <<
387 | "</intersection>" << endl;
388 | annotationsFile << "  </frame>" << endl;
389 |
390 | char buffer[256];
391 | sprintf(buffer, "/frame_%d.png", samples + 1);
392 | string destFileName = dirName + string(buffer);
393 | string command = "/bin/cp " + fileName + " " + destFileName;
394 | static_cast<void>(system(command.c_str()));
395 | }
396 |
397 | nSectorFrames[fa->getSector() - 1]++;
398 |
399 | // update files with image and label data
400 | update(fileName, fa, samples);
401 | }
402 | }
403 |
404 | if (doWrite) {
405 | annotationsFile << "</annotations>" << endl;
406 | annotationsFile.close();
407 | }
408 |
409 | // close data and label files
410 | closeFiles(samples);
411 |
412 | // info
413 | cout << "For " << simpleName << " generated the following frames by sector" << endl;
414 | cout << "[" << nSectorFrames[0] << ", " << nSectorFrames[1] << ", " <<
415 | nSectorFrames[2] << ", " << nSectorFrames[3] << ", " << nSectorFrames[4] << "]" <<
416 | endl;
417 | }
418 |
419 | // Method that generates three sets of files; training, validation and test.
420 | // The user specified training and validation percentages are used with the
421 | // remaining images being treated as test images
422 |
423 | void Preprocess::generate(double training, double validation, bool doWriteTests) {
424 | // first shuffle the vector of image frames
425 | random_shuffle(fileToAnnotations.begin(), fileToAnnotations.end());
426 |
427 | // compute training, validation, and testing required samples
428 | int len = fileToAnnotations.size();
429 | int nTrainingSamples = len * training;
430 | int nValidationSamples = len * validation;
431 | int nTestingSamples = len - (nTrainingSamples + nValidationSamples);
432 |
433 | // generate training files
434 | string simpleName = "train-" + outputFileName;
435 | generate(0 /* startIndex */, nTrainingSamples, simpleName);
436 |
437 | // generate validation files
438 | simpleName = "valid-" + outputFileName;
439 | generate(nTrainingSamples, nValidationSamples, simpleName,
440 | false /* doWrite */, &fileToAnnotationsForValidation);
441 |
442 | // we don't want affine transforms for validation and test sets
443 | doAffineTransforms = false;
444 |
445 | // the remainder of the images are test images
446 | simpleName = "test-" + outputFileName;
447 | generate(nTrainingSamples + nValidationSamples, nTestingSamples, simpleName,
448 | doWriteTests, &fileToAnnotationsForTest);
449 | }
450 |
451 | // Method to do the actual image data and sector writing into data and label
452 | // files. We pick chunks of Globals::nSequenceLength consecutive frames based on
453 | // the chunk indices vector, the starting chunk index, and the number of chunks desired
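// Chunk k covers offsets [k * Globals::nSequenceLength, (k + 1) * Globals::nSequenceLength)
// of fileToAnnotations, so frame order within each chunk is preserved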
454 |
455 | void Preprocess::generateSequences(int startIndex, int nChunks,
456 | vector<int>& indices, string simpleName) {
457 | // open data and label files
458 | openFiles(simpleName);
459 |
460 | int nSectorFrames[Globals::numZones];
461 | for (unsigned int i = 0; i < Globals::numZones; i++)
462 | nSectorFrames[i] = 0;
463 |
464 | // if we want to write out test annotations then open a file for that
465 | // purpose
466 | ofstream annotationsFile;
467 | string dirName;
468 |
469 | // collect the number of transitions from each sector to other sectors
470 | unsigned int transitions[Globals::numZones][Globals::numZones];
471 | for (unsigned int i = 0; i < Globals::numZones; i++)
472 | for (unsigned int j = 0; j < Globals::numZones; j++)
473 | transitions[i][j] = 0;
474 |
475 | FrameAnnotation* prev = 0;
476 |
477 | // write content
478 | int samples = 0;
479 | for (int i = startIndex; i < startIndex + nChunks; i++)
480 | for (int j = 0; j < Globals::nSequenceLength; j++) {
481 | int offset = indices[i] * Globals::nSequenceLength + j;
482 | string fileName = fileToAnnotations[offset].first;
483 | FrameAnnotation* fa = fileToAnnotations[offset].second;
484 |
485 | int sector = fa->getSector();
486 | if (sector < Annotations::DriverWindow || sector > Annotations::PassengerWindow)
487 | continue;
488 |
489 | // collect transition counts
490 | if (prev) {
491 | unsigned int prevSector = prev->getSector();
492 |
493 | if (prevSector > 0 && prevSector <= 5)
494 | transitions[prevSector - 1][sector - 1]++;
495 | }
496 |
497 | nSectorFrames[fa->getSector() - 1]++;
498 |
499 | // update files with image and label data
500 | update(fileName, fa, samples);
501 |
502 | prev = fa;
503 | }
504 |
505 | // close data and label files
506 | closeFiles(samples);
507 |
508 | // report transition counts
509 | cout << "Transitions" << endl;
510 | for (unsigned int i = 0; i < Globals::numZones; i++) {
511 | for (unsigned int j = 0; j < Globals::numZones; j++)
512 | cout << transitions[i][j] << "\t";
513 | cout << endl;
514 | }
515 |
516 | // info
517 | cout << "For " << simpleName << " generated the following frames by sector" << endl;
518 | cout << "[" << nSectorFrames[0] << ", " << nSectorFrames[1] << ", " <<
519 | nSectorFrames[2] << ", " << nSectorFrames[3] << ", " << nSectorFrames[4] << "]" <<
520 | endl;
521 | }
522 |
523 | // Method that generates three sets of files; training, validation and test.
524 | // The user specified training and validation percentages are used with the
525 | // remaining images being treated as test images. This method, unlike the one
526 | // above, will preserve frame sequences and will not randomize
527 |
528 | void Preprocess::generateSequences(double training, double validation) {
529 | // compute training, validation, and testing required samples
530 | int len = fileToAnnotations.size();
531 | int nTrainingSamples = len * training;
532 | int nValidationSamples = len * validation;
533 | int nTestingSamples = len - (nTrainingSamples + nValidationSamples);
534 |
535 | int chunks = len / Globals::nSequenceLength;
536 | vector<int> indices;
537 | for (int i = 0; i < chunks; i++)
538 | indices.push_back(i);
539 | random_shuffle(indices.begin(), indices.end());
540 |
541 | int nValidationChunks = nValidationSamples / Globals::nSequenceLength;
542 | int nTestChunks = nTestingSamples / Globals::nSequenceLength;
543 |
544 | // pick the first nValidationChunks from indices for validation samples
545 | cout << "Validation frames" << endl;
546 | string simpleName = "valid-" + outputFileName;
547 | generateSequences(0 /* startIndex */, nValidationChunks, indices, simpleName);
548 |
549 | // generate test files
550 | cout << "Test frames" << endl;
551 | simpleName = "test-" + outputFileName;
552 | generateSequences(nValidationChunks, nTestChunks, indices, simpleName);
553 |
554 | // generate training files
555 | // since we want to preserve as much sequentiality in the data as possible,
556 | // we sort the remaining chunk indices and then use them to generate
557 | // training data. This way, contiguous chunks will contribute to increasing
558 | // sequential correlation
559 | cout << "Train frames" << endl;
560 | sort(indices.begin() + nValidationChunks + nTestChunks, indices.end());
561 | simpleName = "train-" + outputFileName;
562 | generateSequences(nValidationChunks + nTestChunks,
563 | chunks - (nValidationChunks + nTestChunks),
564 | indices, simpleName);
565 | }
566 |
567 | // update
568 | // Method used to write the image and label data for a single annotated frame
569 | // into the data and label files. It is called by the generate methods for each
570 | // image file and its frame annotation, optionally applying affine transforms
571 | // and an ROI crop first
572 |
573 | void Preprocess::update(string filename, FrameAnnotation* fa, int& samples) {
574 | IplImage* image = cvLoadImage(filename.c_str());
575 | if (!image) {
576 | string err = "Preprocess::update. Cannot load file " + filename + ".";
577 | throw (err);
578 | }
579 |
580 | // generate affine transforms if requested
581 | vector<pair<IplImage*, CvPoint> >& imgLocPairs = getAffineTransforms(image, fa->getFace());
582 |
583 | for (unsigned int i = 0; i < imgLocPairs.size(); i++) {
584 | image = imgLocPairs[i].first;
585 | CvPoint& location = imgLocPairs[i].second;
586 |
587 | CvPoint offset;
588 | offset.x = offset.y = 0;
589 | fa->setFace(location);
590 | if (roiFunction) {
591 | IplImage* roi = roiFunction(image, *fa, offset, Annotations::Face);
592 | image = roi;
593 | }
594 |
595 | // compute size and length of the image data
596 | CvSize size = cvGetSize(image);
597 |
598 | // check consistency
599 | if (imgSize.height != size.height || imgSize.width != size.width) {
600 | char buffer[32];
601 | sprintf(buffer, "(%d, %d).", imgSize.height, imgSize.width);
602 | string err = "Preprocess::update. Inconsistent image sizes. Expecting " + string(buffer);
603 | throw (err);
604 | }
605 |
606 | // preprocess
607 | double* preImage = preprocessImage(image);
608 | IplImage* processedImage = cvCreateImage(imgSize, IPL_DEPTH_8U, 1);
609 | int step = processedImage->widthStep;
610 | unsigned char* imageData = (unsigned char*)processedImage->imageData;
611 | for (int i = 0; i < imgSize.height; i++) {
612 | for (int j = 0; j < imgSize.width; j++) {
613 | if (preImage[i * imgSize.width + j] > 1)
614 | cout << "(" << i << ", " << j << ") = " << preImage[i * imgSize.width + j] << endl;
615 | double d = preImage[i * imgSize.width + j];
616 | d = (inBinaryFormat)? ((d >= binaryThreshold)? 255 : 0) : d * 255;
617 | unsigned char c = (unsigned char)d;
618 | (*imageData++) = c;
619 | }
620 | imageData += step / sizeof(unsigned char) - imgSize.width;
621 | }
622 | // cvNamedWindow("grayScaleImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
623 | // cvShowImage("grayScaleImage", processedImage);
624 | // cvWaitKey(1);
625 |
626 | CvSize scaledSize;
627 | scaledSize.width = (unsigned int)imgSize.width * scaleFactor;
628 | scaledSize.height = (unsigned int)imgSize.height * scaleFactor;
629 |
630 | IplImage* scaledImage = cvCreateImage(scaledSize, processedImage->depth,
631 | processedImage->nChannels);
632 | cvResize(processedImage, scaledImage);
633 | // cvNamedWindow("scaledImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
634 | // cvShowImage("scaledImage", scaledImage);
635 | // cvWaitKey();
636 | cvReleaseImage(&processedImage);
637 |
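// count this sample, write its sector as a single label byte, and then write
// the scaled image pixels in row-major order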
638 | samples++;
639 | unsigned char buffer[16];
640 | buffer[0] = (unsigned char)fa->getSector();
641 | labelFile.write((char*)buffer, 1);
642 | /*
643 | // write out other bits of annotated information
644 | unsigned char status = (unsigned char)fa->getStatus();
645 | bitset<8> statusBits(status);
646 | for (unsigned int i = 0; i < statusBits.size(); i++) {
647 | if (statusBits[i])
648 | buffer[i] = 255;
649 | else
650 | buffer[i] = 0;
651 | }
652 | dataFile.write((char*)buffer, 8);
653 |
654 | unsigned char intersection = (unsigned char)fa->getIntersection();
655 | bitset<8> intxBits(intersection);
656 | for (unsigned int i = 0; i < intxBits.size(); i++) {
657 | if (intxBits[i])
658 | buffer[i] = 255;
659 | else
660 | buffer[i] = 0;
661 | }
662 | dataFile.write((char*)buffer, 8);
663 | */
664 | imageData = (unsigned char*)scaledImage->imageData;
665 | for (int i = 0; i < scaledSize.height; i++) {
666 | for (int j = 0; j < scaledSize.width; j++) {
667 | buffer[0] = (*imageData++);
668 | dataFile.write((char*)buffer, 1);
669 | }
670 | imageData += scaledImage->widthStep / sizeof(unsigned char) - scaledSize.width;
671 | }
672 | cvReleaseImage(&scaledImage);
673 |
674 | if (roiFunction)
675 | cvReleaseImage(&image);
676 | }
677 |
678 | destroyAffineTransforms(imgLocPairs);
679 | }
680 |
681 | // generateImageVector
682 | // Method used to create an image vector for classification. We carve out an ROI
683 | // using the ROI function, preprocess the ROI image, scale the preprocessed image
684 | // and return it as an array of doubles
685 |
686 | IplImage* Preprocess::generateImageVector(IplImage* image) {
687 | FrameAnnotation fa;
688 | CvPoint offset;
689 | offset.x = offset.y = 0;
690 | fa.setFace(windowCenter);
691 | if (roiFunction) {
692 | IplImage* roi = roiFunction(image, fa, offset, Annotations::Face);
693 | image = roi;
694 | }
695 |
696 | // cvNamedWindow("Image", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
697 | // cvShowImage("Image", image);
698 |
699 | // compute size and length of the image data
700 | CvSize size = cvGetSize(image);
701 |
702 | // check consistency
703 | if (imgSize.height != size.height || imgSize.width != size.width) {
704 | char buffer[32];
705 | sprintf(buffer, "(%d, %d).", imgSize.height, imgSize.width);
706 | string err = "Preprocess::generateImageVector. Inconsistent image sizes. Expecting " + string(buffer);
707 | throw (err);
708 | }
709 |
710 | // preprocess
711 | double* preImage = preprocessImage(image);
712 | IplImage* processedImage = cvCreateImage(imgSize, IPL_DEPTH_64F, 1);
713 | int step = processedImage->widthStep;
714 | double* imageData = (double*)processedImage->imageData;
715 | for (int i = 0; i < imgSize.height; i++) {
716 | for (int j = 0; j < imgSize.width; j++) {
717 | if (preImage[i * imgSize.width + j] > 1)
718 | cout << "(" << i << ", " << j << ") = " << preImage[i * imgSize.width + j] << endl;
719 | double d = preImage[i * imgSize.width + j];
720 | (*imageData++) = d;
721 | }
722 | imageData += step / sizeof(double) - imgSize.width;
723 | }
724 |
725 | CvSize scaledSize;
726 | scaledSize.width = (unsigned int)imgSize.width * scaleFactor;
727 | scaledSize.height = (unsigned int)imgSize.height * scaleFactor;
728 |
729 | IplImage* scaledImage = cvCreateImage(scaledSize, processedImage->depth,
730 | processedImage->nChannels);
731 | cvResize(processedImage, scaledImage);
732 | // cvNamedWindow("scaledImage", CV_WINDOW_NORMAL | CV_WINDOW_AUTOSIZE);
733 | // cvShowImage("scaledImage", scaledImage);
734 | // cvWaitKey();
735 | cvReleaseImage(&processedImage);
736 |
737 | size.height = 1;
738 | size.width = scaledSize.width * scaledSize.height;
739 | IplImage* result = cvCreateImage(size, IPL_DEPTH_64F, 1);
740 | double* resultData = (double*)result->imageData;
741 | imageData = (double*)scaledImage->imageData;
742 | step = scaledImage->widthStep;
743 | for (int i = 0; i < scaledSize.height; i++) {
744 | for (int j = 0; j < scaledSize.width; j++) {
745 | (*resultData++) = (*imageData++);
746 | }
747 | imageData += step / sizeof(double) - scaledSize.width;
748 | }
749 |
750 | if (roiFunction)
751 | cvReleaseImage(&image);
752 | cvReleaseImage(&scaledImage);
753 |
754 | return result;
755 | }
756 |
757 | // preprocessImage
758 | // Method that preprocesses an image that has already been loaded for a
759 | // subsequent application of the filter. The method returns a preprocessed
760 | // image which can then be used on a subsequent call to apply or update.
761 |
762 | double* Preprocess::preprocessImage(IplImage* inputImg) {
763 | // we take the complex image and preprocess it here
764 | if (!inputImg) {
765 | string err = "Preprocess::preprocessImage. Call setImage with a valid image.";
766 | throw (err);
767 | }
768 |
769 | bool releaseImage = false;
770 | IplImage* image = 0;
771 |
772 | // First check if the image is in grayscale. If not, we first convert it
773 | // into grayscale. The input image is replaced with its grayscale version
774 | if ((inputImg->nChannels != 1 && strcmp(inputImg->colorModel, "GRAY"))) {
775 | image = cvCreateImage(imgSize, IPL_DEPTH_8U, 1);
776 | cvCvtColor(inputImg, image, CV_BGR2GRAY);
777 | releaseImage = true;
778 | } else
779 | image = inputImg;
780 |
781 | // now do histogram equalization
782 | cvEqualizeHist(image, image);
783 |
784 | // edge detection
785 | cvCanny(image, image, 120, 200, 3);
786 |
787 | // We follow preprocessing steps here as outlined in Bolme 2009
788 | // First populate a real image from the grayscale image
789 | double scale = 1.0 / 255.0;
790 | cvConvertScale(image, realImg, scale, 0.0);
791 | /*
792 | // compute image inversion
793 | int step = realImg->widthStep;
794 | double* imageData = (double*)realImg->imageData;
795 | for (int i = 0; i < imgSize.height; i++) {
796 | for (int j = 0; j < imgSize.width; j++) {
797 | *(imageData) = 1 - *(imageData);
798 | imageData++;
799 | }
800 | imageData += step / sizeof(double) - imgSize.width;
801 | }
802 | */
803 | // suppress DC
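// each DCT coefficient is scaled by a sigmoid of its row-major index, which
// attenuates the first few low-index coefficients; the (0, 0) DC term is then
// zeroed explicitly before the inverse DCT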
804 | cvDCT(realImg, tempImg, CV_DXT_FORWARD);
805 | int step = tempImg->widthStep;
806 | double* imageData = (double*)tempImg->imageData;
807 | for (int i = 0; i < imgSize.height; i++) {
808 | for (int j = 0; j < imgSize.width; j++) {
809 | double sigmoid = (1 / (1 + (exp(-(i * imgSize.width + j)))));
810 | *(imageData) = *(imageData) * sigmoid;
811 | imageData++;
812 | }
813 | imageData += step / sizeof(double) - imgSize.width;
814 | }
815 | cvSet2D(tempImg, 0, 0, cvScalar(0));
816 | cvDCT(tempImg, realImg, CV_DXT_INVERSE);
817 |
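// shift the result so that all values are non-negative, then rescale so that
// the maximum value is 1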
818 | double min, max;
819 | cvMinMaxLoc(realImg, &min, &max, NULL, NULL);
820 | if (min < 0)
821 | cvAddS(realImg, cvScalar(-min), realImg, NULL);
822 | else
823 | cvAddS(realImg, cvScalar(min), realImg, NULL);
824 |
825 | cvMinMaxLoc(realImg, &min, &max, NULL, NULL);
826 | scale = 1.0 / max;
827 | cvConvertScale(realImg, realImg, scale, 0);
828 |
829 | // Apply the window
830 | applyWindow(realImg, window, imageBuffer);
831 |
832 | /* double* destImageData = imageBuffer;
833 | double* srcImageData = (double*)realImg->imageData;
834 | for (int i = 0; i < imgSize.height; i++) {
835 | for (int j = 0; j < imgSize.width; j++) {
836 | (*destImageData) = (*srcImageData);
837 | srcImageData++; destImageData++;
838 | }
839 | srcImageData += step / sizeof(double) - imgSize.width;
840 | }*/
841 | // showRealImage("preprocessedImage", imageBuffer);
842 |
843 | if (releaseImage)
844 | cvReleaseImage(&image);
845 |
846 | return imageBuffer;
847 | }
848 |
849 | // getAffineTransforms
850 | // Method to generate a set of affine transformations of a given image. The
851 | // image is translated horizontally by small offsets to perturb the LOI around
852 | // the given location. The method returns a vector of images that have been
853 | // perturbed with these small translations
854 |
855 | vector<pair<IplImage*, CvPoint> >& Preprocess::getAffineTransforms(IplImage* image, CvPoint& location) {
856 | // first check if affine transformations are needed, if not then simply
857 | // push the input images to the vector of transformed images and return
858 | transformedImages.push_back(make_pair(image, location));
859 | if (!doAffineTransforms)
860 | return transformedImages;
861 |
862 | // Setup unchanging data sets used for transformations
863 | // for rotation
864 | Mat imageMat(image);
865 | Point2f center(imageMat.cols / 2.0F, imageMat.rows / 2.0F);
866 |
867 | CvSize size = cvGetSize(image);
868 |
869 | // for translation
870 | Mat translationMat = getRotationMatrix2D(center, 0, 1.0);
871 | translationMat.at<double>(0, 0) = 1;
872 | translationMat.at<double>(0, 1) = 0;
873 | translationMat.at<double>(1, 0) = 0;
874 | translationMat.at<double>(1, 1) = 1;
875 |
876 | // perform a set of translations of each rotated image
877 | Mat src(image);
878 | for (double xdist = -20; xdist <= 20; xdist += 10) {
879 | if (xdist == 0) continue;
880 |
881 | translationMat.at<double>(0, 2) = xdist;
882 | translationMat.at<double>(1, 2) = 0;
883 | IplImage* translatedImage = cvCloneImage(image);
884 | Mat dest(translatedImage);
885 | warpAffine(src, dest, translationMat, src.size());
886 |
887 | CvPoint translatedLocation;
888 | translatedLocation.x = location.x + xdist;
889 | translatedLocation.y = location.y;
890 |
891 | // check if the translated location is out of bounds with respect
892 | // to the image window. Do not add those images to the set
893 | if (translatedLocation.x < 0 || translatedLocation.x > size.width ||
894 | translatedLocation.y < 0 || translatedLocation.y > size.height) {
895 | cvReleaseImage(&translatedImage);
896 | continue;
897 | }
898 |
899 | pair<IplImage*, CvPoint> p = make_pair(translatedImage, translatedLocation);
900 | transformedImages.push_back(p);
901 | }
902 |
903 | return transformedImages;
904 | }
905 |
906 | // destroyAffineTransforms
907 | // Method to destroy the images generated using affine transforms
908 |
909 | void Preprocess::destroyAffineTransforms(vector<pair<IplImage*, CvPoint> >& imgLocPairs) {
910 | for (unsigned int i = 0; i < imgLocPairs.size(); i++)
911 | cvReleaseImage(&(imgLocPairs[i].first));
912 | imgLocPairs.clear();
913 | }
914 |
915 | // createWindow
916 | // Method to create a window around a given location to drop the value of
917 | // the pixels all around the area of interest in the image. The size of the
918 | // field is the same as the filter size. The spread parameters are used to
919 | // define the spread of the hot spot on the image plane. Pixels close to
920 | // location have the highest values and those beyond the spread rapidly go
921 | // to zero
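// In closed form, the window value at pixel (x, y) is
//   w(x, y) = exp(-((x - location.x)^2 / xSpread + (y - location.y)^2 / ySpread))
// a separable Gaussian-style falloff centered on the given location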
922 |
923 | double* Preprocess::createWindow(CvPoint& location, double xSpread, double ySpread) {
924 | // Linear space vector. Create a meshgrid with x and y axes values
925 | // that stradle the location
926 |
927 | double xspacer[imgSize.width];
928 | double yspacer[imgSize.height];
929 |
930 | int lx = location.x;
931 | double left = -lx;
932 | for (int i = 0; i < imgSize.width; i++) {
933 | xspacer[i] = left;
934 | left += 1.0;
935 | }
936 | int ly = location.y;
937 | double top = -ly;
938 | for (int i = 0; i < imgSize.height; i++) {
939 | yspacer[i] = top;
940 | top += 1.0;
941 | }
942 |
943 | // Mesh grid
944 | double x[imgSize.height][imgSize.width];
945 | double y[imgSize.height][imgSize.width];
946 |
947 | for (int i = 0; i < imgSize.height; i++) {
948 | for (int j = 0; j < imgSize.width; j++) {
949 | x[i][j] = xspacer[j];
950 | y[i][j] = yspacer[i];
951 | }
952 | }
953 |
954 | // create a gaussian as big as the image
955 | double gaussian[imgSize.height][imgSize.width];
956 |
957 | double det = xSpread * ySpread;
958 |
959 | for (int i = 0; i < imgSize.height; i++) {
960 | for (int j = 0; j < imgSize.width; j++) {
961 | // using just the gaussian kernel
962 | double X = x[i][j] * x[i][j];
963 | double Y = y[i][j] * y[i][j];
964 | gaussian[i][j] = exp(-((X * ySpread + Y * xSpread) / det));
965 | }
966 | }
967 |
968 | double* window = (double*)fftw_malloc(sizeof(double) * length);
969 |
970 | // now initialize a real array as large as the image array with the values
971 | // of the gaussian
972 | for (int i = 0; i < length; i++)
973 | window[i] = 0;
974 | for (int i = 0; i < imgSize.height; i++) {
975 | for (int j = 0; j < imgSize.width; j++) {
976 | window[i * imgSize.width + j] = gaussian[i][j];
977 | }
978 | }
979 |
980 | // showRealImage("__window", window);
981 |
982 | return window;
983 | }
984 |
985 | // applyWindow
986 | // Method used to apply a window function to a real image. It takes as input
987 | // the real image source and a window as a 2d real array. The result is stored in
988 | // the third parameter, the source and the destination can be the same. The step
989 | // is expected to match in the src and dest images
990 |
991 | void Preprocess::applyWindow(IplImage* src, double* window, double* dest) {
992 | int step = src->widthStep;
993 |
994 | double* destImageData = dest;
995 | double* srcImageData = (double*)src->imageData;
996 | for (int i = 0; i < imgSize.height; i++) {
997 | for (int j = 0; j < imgSize.width; j++) {
998 | (*destImageData) = (*srcImageData) * window[i * imgSize.width + j];
999 | srcImageData++; destImageData++;
1000 | }
1001 | srcImageData += step / sizeof(double) - imgSize.width;
1002 | }
1003 | }
1004 |
1005 |
--------------------------------------------------------------------------------
/utils/src/Preprocess.h:
--------------------------------------------------------------------------------
1 | #ifndef __PREPROCESS_H
2 | #define __PREPROCESS_H
3 |
4 | // Preprocess.h
5 | // This file contains the definition of the Preprocess class. It generates
6 | // data in the IDX file format, similar to the MNIST dataset
7 |
8 | #include
9 | #include
10 | #include
11 | #include
12 | #include
13 | #include
14 | #include
15 | #include
16 | #include
17 | #include
18 | #include
19 | #include
20 | #include
21 | #include
22 | #include