├── README.md ├── cbof_paper ├── README.md ├── mnist_demo.py └── model │ ├── __init__.py │ ├── base_learner.py │ ├── cbof.py │ ├── cnn.py │ ├── cnn_feat.py │ ├── datasets.py │ └── nbof.py ├── datasets ├── __init__.py └── mnist.py ├── mnist_example.py └── models ├── __init__.py ├── bof.py └── learner_base.py /README.md: -------------------------------------------------------------------------------- 1 | # Bag-of-Features Pooling for Deep Convolutional Neural Networks 2 | 3 | **IMPORTANT: Given the uncertain future of theano, we also provide a [*keras*-based implementation](https://github.com/passalis/keras_cbof) of the proposed method.** 4 | 5 | In this repository we provide an efficient and simple re-implementation of the [Bag-of-Features Pooling method for Deep Convolutional Neural Networks](https://arxiv.org/abs/1707.08105) using the Lasagne framework. The provided Lasagne layer can be used in any Lasagne-based model. The distance between the extracted feature vectors and the codebook is calculated using convolutional layers (exploiting that the squared distance can be calculated using three inner products, i.e., ||x-y||^2 = x^2 + y^2 - 2xy), significantly speeding up training and testing (a short numerical sketch of this identity is included below, after the citation). 6 | 7 | We provide an example of using the proposed method in mnist_example.py and we compare BoF pooling to plain SPP pooling. The proposed method can both increase the classification performance and provide better scale invariance, as shown below (the classification error on the MNIST dataset is reported): 8 | 9 | 10 | | Model | Scale = 1 | Scale = 0.8 | Scale = 0.7 | 11 | | ------------- | --------- | --------- | --------- | 12 | | SPP | 0.68 % | 4.08 % | 36.78 % | 13 | | BoF Pooling | **0.54 %** | **1.40 %** | **17.60 %** | 14 | 15 | Note that this is not the implementation used for conducting the experiments in our [paper](https://arxiv.org/abs/1707.08105). The original (slower, but more flexible) implementation can be found in [cbof_paper](cbof_paper). 16 | 17 | If you use this code in your work, please cite the following paper: 18 | 19 |
20 | @InProceedings{cbof_iccv,
21 | author = {Passalis, Nikolaos and Tefas, Anastasios},
22 | title = {Bag-of-Features Pooling for Deep Convolutional Neural Networks},
23 | booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
24 | year = {2017}
25 | }
26 | 
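For clarity, here is a minimal NumPy sketch (not part of the repository; names and shapes are illustrative) of the identity mentioned above: the squared Euclidean distance between a feature vector and a codeword expands into three inner products, so all feature-to-codeword distances can be obtained from a single matrix product (implemented with a 1x1 convolution in the actual layer).

```python
import numpy as np

rng = np.random.RandomState(0)
X = rng.rand(100, 64)   # feature vectors extracted from a convolutional feature map
V = rng.rand(16, 64)    # codebook with 16 codewords

# Direct computation of all pairwise Euclidean distances
direct = np.sqrt(((X[:, None, :] - V[None, :, :]) ** 2).sum(-1))

# Same result via three inner products: ||x-y||^2 = <x,x> + <y,y> - 2<x,y>
xx = (X ** 2).sum(axis=1)[:, None]   # (100, 1)
vv = (V ** 2).sum(axis=1)[None, :]   # (1, 16)
xv = X @ V.T                         # (100, 16), computed as a 1x1 convolution in the layer
fast = np.sqrt(np.maximum(xx + vv - 2 * xv, 0))

print(np.allclose(direct, fast))     # True
```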
27 | 28 | 29 | ### Acknowledgment 30 | This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 731667 (MULTIDRONE). This publication reflects the authors’ views only. The European Commission is not responsible for any use that may be made of the information it contains. 31 | -------------------------------------------------------------------------------- /cbof_paper/README.md: -------------------------------------------------------------------------------- 1 | # Bag-of-Features Pooling for Deep Convolutional Neural Networks 2 | 3 | This implementation is based on the one used for conducting the experiments in the [Bag-of-Features Pooling method for Deep Convolutional Neural Networks](https://arxiv.org/abs/1707.08105) paper. It is slower than the *lasagne*-based implementation that we provide in the [main repository](). However, it is also more flexible, e.g., it allows for using separate codebooks for each spatial region (a small sketch of this idea is given at the end of this README). 4 | 5 | Note that the obtained results might slightly vary due to the non-deterministic behaviour of the libraries (CUDA) used for the GPU calculations and the clustering algorithm used for the initialization of the codebook. For the results reported here, we explicitly avoided using non-deterministic algorithms during the optimization. To do so, you can add the following to the *.theanorc* configuration file: 6 | 7 |
 8 | [dnn.conv]
 9 | algo_bwd_filter=deterministic
10 | algo_bwd_data=deterministic
11 | 
12 | 13 | After using this configuration and fixing the seeds, the following results should be obtained (classification error on MNIST for 28 x 28 and 20 x 20 test images): 14 | 15 | 16 | | Model | 28 x 28 | 20 x 20 | 17 | | ------------- | --------- | --------- | 18 | | CNN | 0.56 % | - | 19 | | GMP | 0.78 % | 3.31 % | 20 | | SPP | 0.55 % | 1.49 % | 21 | | CBoF (64, 1) | **0.47 %** | **0.99 %** | 22 | 23 | 24 | If you use this code in your work, please cite the following paper: 25 | 26 |
27 | @InProceedings{cbof_iccv,
28 | author = {Passalis, Nikolaos and Tefas, Anastasios},
29 | title = {Bag-of-Features Pooling for Deep Convolutional Neural Networks},
30 | booktitle = {Proceedings of the IEEE International Conference on Computer Vision},
31 | year = {2017}
32 | }
33 | 
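The following is a minimal NumPy sketch (not part of this implementation; shapes, names and parameter values are illustrative) of the idea of using a separate codebook per spatial region: the feature map is split into a 2 x 2 grid, each region is soft-quantized against its own codebook, and the resulting histograms are concatenated before being fed to the classifier.

```python
import numpy as np

def soft_histogram(features, codebook, sigma):
    """Soft-quantize feature vectors (n_feats, d) against a codebook (k, d)."""
    dist = np.sqrt(((features[:, None, :] - codebook[None, :, :]) ** 2).sum(-1))  # (n_feats, k)
    sim = np.exp(-dist * sigma)                         # RBF-like memberships
    membership = sim / sim.sum(axis=1, keepdims=True)   # softmax over the codewords
    return membership.mean(axis=0)                      # (k,) histogram of the region

rng = np.random.RandomState(0)
feature_map = rng.rand(8, 8, 64)                        # (x, y, n_filters) output of a conv layer
codebooks = [rng.rand(16, 64) for _ in range(4)]        # one 16-codeword codebook per region

pivot = feature_map.shape[0] // 2
regions = [feature_map[:pivot, :pivot], feature_map[:pivot, pivot:],
           feature_map[pivot:, :pivot], feature_map[pivot:, pivot:]]
histograms = [soft_histogram(r.reshape(-1, 64), V, sigma=10.0)
              for r, V in zip(regions, codebooks)]
representation = np.concatenate(histograms)             # fed to the MLP classifier
print(representation.shape)                             # (64,) = 4 regions x 16 codewords
```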
34 | 35 | -------------------------------------------------------------------------------- /cbof_paper/mnist_demo.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import sklearn.utils 3 | 4 | from model.cbof import CBoF 5 | from model.cnn import CNN_Simple 6 | from model.datasets import load_mnist, resize_mnist_data 7 | 8 | 9 | # Set the path to mnist.pkl.gz before running the code 10 | # Download mnist.pkl.gz from http://www.iro.umontreal.ca/~lisa/deep/data/mnist/mnist.pkl.gz 11 | 12 | def run_demo_mnist(model='cbof', n_iters=50, seed=1, ): 13 | 14 | # Load mnist data 15 | train_data, valid_data, test_data, train_labels, valid_labels, test_labels = load_mnist( 16 | dataset='/home/nick/Data/Datasets/mnist.pkl.gz') 17 | 18 | if model != 'plain': 19 | train_data_20 = resize_mnist_data(train_data, 20, 20) 20 | train_data_24 = resize_mnist_data(train_data, 24, 24) 21 | train_data_32 = resize_mnist_data(train_data, 32, 32) 22 | train_data_36 = resize_mnist_data(train_data, 36, 36) 23 | test_data_20 = resize_mnist_data(test_data, 20, 20) 24 | 25 | # Set seeds for reproducibility 26 | sklearn.utils.check_random_state(seed) 27 | np.random.seed(seed) 28 | 29 | eta = 0.0001 30 | if model == 'cbof': 31 | cnn = CBoF(learning_rate=eta, n_classes=10, bof_layer=(1, True, 64), hidden_neurons=(1000,)) 32 | cnn.init_bof(train_data[:50000, :]) 33 | elif model == 'spp': 34 | cnn = CNN_Simple(learning_rate=eta, hidden_neurons=(1000,), n_classes=10, use_spatial_pooling=True, 35 | pool_dims=[1, 2]) 36 | elif model == 'gmp': 37 | cnn = CNN_Simple(learning_rate=eta, hidden_neurons=(1000,), n_classes=10, use_spatial_pooling=True, 38 | pool_dims=[1]) 39 | elif model == 'plain': 40 | cnn = CNN_Simple(learning_rate=eta, hidden_neurons=(1000,), n_classes=10, use_spatial_pooling=False) 41 | 42 | best_valid, test_acc, best_iter = 0, 0, 0 43 | 44 | for i in range(n_iters): 45 | 46 | if model != 'plain': 47 | cnn.train_model(train_data_20, train_labels, batch_size=64) 48 | cnn.train_model(train_data_24, train_labels, batch_size=64) 49 | cnn.train_model(train_data_32, train_labels, batch_size=64) 50 | cnn.train_model(train_data_36, train_labels, batch_size=64) 51 | loss = cnn.train_model(train_data, train_labels, batch_size=64) 52 | print("Iter: ", i, ", loss: ", loss) 53 | 54 | # Get validation accuracy 55 | valid_acc = cnn.test_model(valid_data, valid_labels) 56 | if valid_acc > best_valid: 57 | best_valid = valid_acc 58 | best_iter = i 59 | # Test the model! 
60 | test_acc = cnn.test_model(test_data, test_labels) 61 | test_acc_20 = 0 62 | if model != 'plain': 63 | test_acc_20 = cnn.test_model(test_data_20, test_labels) 64 | print("New validation best found, valid acc = ", valid_acc, " iter = ", i) 65 | print(test_acc, test_acc_20) 66 | 67 | print("Evaluated model = ", model) 68 | print("Best err = ", 100 - test_acc, "% found @ iter = ", best_iter) 69 | if model != 'plain': 70 | print("Err (20x20): ", 100 - test_acc_20) 71 | 72 | 73 | run_demo_mnist(model='plain') 74 | run_demo_mnist(model='gmp') 75 | run_demo_mnist(model='spp') 76 | run_demo_mnist(model='cbof') 77 | 78 | -------------------------------------------------------------------------------- /cbof_paper/model/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /cbof_paper/model/base_learner.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from tqdm import tqdm 3 | 4 | 5 | class Base_Learner: 6 | def __init__(self): 7 | self.train_fn = None 8 | self.train_mlp_fn = None 9 | self.test_fn = None 10 | 11 | def train_model(self, train_data, train_labels, batch_size=32, pre_train=False): 12 | """ 13 | Trains the model 14 | :param train_data: 15 | :param train_labels: 16 | :param batch_size: 17 | :param pre_train: 18 | :return: 19 | """ 20 | n_batches = int(np.floor(train_data.shape[0] / batch_size)) 21 | loss = 0 22 | for i in tqdm(range(n_batches)): 23 | cur_data = train_data[i * batch_size:(i + 1) * batch_size, :] 24 | cur_labels = train_labels[i * batch_size:(i + 1) * batch_size] 25 | if pre_train: 26 | cur_loss = self.train_mlp_fn(np.float32(cur_data), np.int32(cur_labels)) 27 | else: 28 | cur_loss = self.train_fn(np.float32(cur_data), np.int32(cur_labels)) 29 | loss += cur_loss * batch_size 30 | 31 | if n_batches * batch_size < train_data.shape[0]: 32 | cur_data = train_data[n_batches * batch_size:, :] 33 | cur_labels = train_labels[n_batches * batch_size:] 34 | if pre_train: 35 | cur_loss = self.train_mlp_fn(np.float32(cur_data), np.int32(cur_labels)) 36 | else: 37 | cur_loss = self.train_fn(np.float32(cur_data), np.int32(cur_labels)) 38 | loss += cur_loss * train_data.shape[0] 39 | loss = loss / float(train_data.shape[0]) 40 | return loss 41 | 42 | def test_model(self, test_data, test_labels, batch_size=32): 43 | """ 44 | Predicts the labels and returns the accuracy and the precision 45 | :param test_data: 46 | :param test_labels: 47 | :param batch_size: 48 | :return: 49 | """ 50 | labels = np.zeros((0,)) 51 | n_batches = int(np.floor(test_data.shape[0] / batch_size)) 52 | 53 | for i in range(n_batches): 54 | cur_data = test_data[i * batch_size:(i + 1) * batch_size, :] 55 | labels = np.hstack((labels, self.test_fn(np.float32(cur_data)))) 56 | 57 | if n_batches * batch_size < test_data.shape[0]: 58 | cur_data = test_data[n_batches * batch_size:, :] 59 | labels = np.hstack((labels, self.test_fn(np.float32(cur_data)))) 60 | 61 | return 100 * np.mean(test_labels == labels) 62 | -------------------------------------------------------------------------------- /cbof_paper/model/cbof.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | import theano 3 | import theano.tensor as T 4 | from model.nbof import CBoF_Input_Layer 5 | from model.base_learner import Base_Learner 6 | from model.cnn_feat import CNN_Feature_Extractor 7 | 8 | 9 | class 
CBoF(Base_Learner): 10 | def __init__(self, n_classes=10, learning_rate=0.00001, bof_layer=(4, 0, 128), hidden_neurons=(1000,), 11 | dropout=(0.5,), feature_dropout=0, g=0.1): 12 | 13 | Base_Learner.__init__(self) 14 | 15 | input_var = T.ftensor4('inputs') 16 | target_var = T.ivector('targets') 17 | 18 | # Create the CNN feature extractor 19 | self.cnn_layer = CNN_Feature_Extractor(input_var, size=None) 20 | 21 | # Create the BoF layer 22 | (cnn_layer_id, spatial_level, n_codewords) = bof_layer 23 | self.bof_layer = CBoF_Input_Layer(input_var, self.cnn_layer, cnn_layer_id, level=spatial_level, 24 | n_codewords=n_codewords, g=g, pyramid=False) 25 | features = self.bof_layer.fused_features 26 | n_size_features = self.bof_layer.features_size 27 | 28 | # Create an output MLP 29 | network = lasagne.layers.InputLayer(shape=(None, n_size_features), input_var=features) 30 | if feature_dropout > 0: 31 | network = lasagne.layers.DropoutLayer(network, p=feature_dropout) 32 | for n, drop_rate in zip(hidden_neurons, dropout): 33 | network = lasagne.layers.DenseLayer(network, num_units=n, nonlinearity=lasagne.nonlinearities.elu, 34 | W=lasagne.init.Orthogonal()) 35 | network = lasagne.layers.DropoutLayer(network, p=drop_rate) 36 | 37 | network = lasagne.layers.DenseLayer(network, num_units=n_classes, 38 | nonlinearity=lasagne.nonlinearities.softmax, 39 | W=lasagne.init.Normal(std=1)) 40 | # Get network loss 41 | self.prediction_train = lasagne.layers.get_output(network, deterministic=False) 42 | loss = lasagne.objectives.categorical_crossentropy(self.prediction_train, target_var).mean() 43 | 44 | # Define training rules 45 | params_mlp = lasagne.layers.get_all_params(network, trainable=True) 46 | updates_mlp = lasagne.updates.adam(loss, params_mlp, learning_rate=learning_rate) 47 | updates = lasagne.updates.adam(loss, params_mlp, learning_rate=learning_rate) 48 | updates.update(lasagne.updates.adam(loss, self.cnn_layer.layer_params[cnn_layer_id], 49 | learning_rate=learning_rate)) 50 | updates.update(lasagne.updates.adam(loss, self.bof_layer.V, learning_rate=learning_rate)) 51 | updates.update(lasagne.updates.adam(loss, self.bof_layer.sigma, learning_rate=learning_rate)) 52 | 53 | # Define testing/validation 54 | prediction_test = lasagne.layers.get_output(network, deterministic=True) 55 | 56 | # Compile functions 57 | self.train_fn = theano.function([input_var, target_var], loss, updates=updates) 58 | self.train_mlp_fn = theano.function([input_var, target_var], loss, updates=updates_mlp) 59 | self.test_fn = theano.function([input_var], T.argmax(prediction_test, axis=1)) 60 | 61 | # Get the output of the bof module 62 | self.get_features_fn = theano.function([input_var], features) 63 | 64 | def init_bof(self, data): 65 | """ 66 | Initializes the BoF layer using k-means 67 | :param data: 68 | :return: 69 | """ 70 | self.bof_layer.initialize(data) 71 | -------------------------------------------------------------------------------- /cbof_paper/model/cnn.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | import lasagne.layers.dnn 3 | import theano 4 | import theano.tensor as T 5 | from model.cnn_feat import CNN_Feature_Extractor 6 | from model.base_learner import Base_Learner 7 | 8 | 9 | class CNN_Simple(Base_Learner): 10 | """ 11 | Implements the baseline models (CNN and SPP) 12 | """ 13 | 14 | def __init__(self, learning_rate=0.0001, hidden_neurons=(1000,), dropout=(0.5,), feature_dropout=0.5, n_classes=15, 15 | use_spatial_pooling=False, 
pool_dims=[2, 1], size=28): 16 | 17 | Base_Learner.__init__(self) 18 | 19 | input_var = T.ftensor4('inputs') 20 | target_var = T.ivector('targets') 21 | 22 | if use_spatial_pooling: 23 | size = None 24 | 25 | # Create the CNN feature extractor 26 | self.cnn_layer = CNN_Feature_Extractor(input_var, size=size, pool_size=[(2, 2), ()]) 27 | network = self.cnn_layer.networks[-1] 28 | cnn_params = self.cnn_layer.layer_params[-1] 29 | 30 | # Add spatial pooling layer, if needed 31 | if use_spatial_pooling: 32 | # network = lasagne.layers.Conv2DLayer(network, num_filters=64, filter_size=(1,1), 33 | # nonlinearity=lasagne.nonlinearities.rectify, 34 | # W=lasagne.init.GlorotUniform()) 35 | network = lasagne.layers.dnn.SpatialPyramidPoolingDNNLayer(network, pool_dims=pool_dims) 36 | else: 37 | # otherwise, add a regular 2x2 pooling layer 38 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2)) 39 | 40 | if feature_dropout > 0: 41 | network = lasagne.layers.DropoutLayer(network, p=feature_dropout) 42 | 43 | params_mlp = [] 44 | for n, drop_rate in zip(hidden_neurons, dropout): 45 | network = lasagne.layers.DenseLayer(network, num_units=n, nonlinearity=lasagne.nonlinearities.elu, 46 | W=lasagne.init.Orthogonal()) 47 | params_mlp.append(network.W) 48 | params_mlp.append(network.b) 49 | network = lasagne.layers.DropoutLayer(network, p=drop_rate) 50 | 51 | network = lasagne.layers.DenseLayer(network, num_units=n_classes, 52 | nonlinearity=lasagne.nonlinearities.softmax) 53 | params_mlp.append(network.W) 54 | params_mlp.append(network.b) 55 | 56 | # Get network loss 57 | prediction_train = lasagne.layers.get_output(network, deterministic=False) 58 | loss = lasagne.objectives.categorical_crossentropy(prediction_train, target_var).mean() 59 | 60 | # Define training rules 61 | updates_mlp = lasagne.updates.adam(loss, params_mlp, learning_rate=learning_rate) 62 | updates = lasagne.updates.adam(loss, params_mlp, learning_rate=learning_rate) 63 | updates.update(lasagne.updates.adam(loss, cnn_params, learning_rate=learning_rate)) 64 | 65 | # Define testing/validation 66 | prediction_test = lasagne.layers.get_output(network, deterministic=True) 67 | test_loss = lasagne.objectives.categorical_crossentropy(prediction_test, target_var).mean() 68 | test_acc = T.mean(T.eq(T.argmax(prediction_test, axis=1), target_var), dtype='float32') 69 | 70 | # Compile functions 71 | self.train_fn = theano.function([input_var, target_var], loss, updates=updates) 72 | self.test_fn = theano.function([input_var], T.argmax(prediction_test, axis=1)) 73 | self.val_fn = theano.function([input_var, target_var], [test_loss, test_acc]) 74 | 75 | self.train_mlp_fn = theano.function([input_var, target_var], loss, updates=updates_mlp) 76 | -------------------------------------------------------------------------------- /cbof_paper/model/cnn_feat.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | 3 | class Base_CNN_Feature_Extractor: 4 | def __init__(self): 5 | # Features extracted from each layer 6 | self.layer_features = [] 7 | 8 | # Feature dimension per layer 9 | self.features_dim = [] 10 | 11 | # Cumulative parameters for each layer 12 | self.layer_params = [] 13 | 14 | # Lasagne network reference for each layer 15 | self.networks = [] 16 | 17 | def get_features(self, layer): 18 | """ 19 | Returns all the feature vectors of a layer 20 | :param layer: the layer from which to extract the feature vectors 21 | :return: the feature vectors 22 | """ 23 | 24 | features = 
self.layer_features[layer] 25 | features = features.reshape((features.shape[0], features.shape[1] * features.shape[2], features.shape[3])) 26 | return features 27 | 28 | def get_spatial_features(self, layer, i, level=1): 29 | """ 30 | Returns the features of the i-th region of the layer (only 2x2 segmentation is supported) 31 | :param layer: the layer from which to extract the features 32 | :param i: the region of the layer to extract the features 33 | :return: the feature vectors 34 | """ 35 | # This function assumes a square image input 36 | pivot = self.layer_features[layer].shape[1] // 2 37 | if level == 1: 38 | if i == 0: 39 | features = self.layer_features[layer][:, :pivot, :pivot, :] 40 | elif i == 1: 41 | features = self.layer_features[layer][:, :pivot, pivot:, :] 42 | elif i == 2: 43 | features = self.layer_features[layer][:, pivot:, :pivot, :] 44 | elif i == 3: 45 | features = self.layer_features[layer][:, pivot:, pivot:, :] 46 | else: 47 | print("Wrong region number") 48 | assert False 49 | else: 50 | print("Only spatial levels 1 and 2 are supported, got ", level) 51 | assert False 52 | 53 | features = features.reshape((features.shape[0], features.shape[1] * features.shape[2], features.shape[3])) 54 | return features 55 | 56 | 57 | class CNN_Feature_Extractor(Base_CNN_Feature_Extractor): 58 | """ 59 | Implements a simple convolutional feature extractor 60 | """ 61 | 62 | def __init__(self, input_var, size=28, channels=1, n_filters=[32, 64], filters_size=[(5, 5), (5, 5)], 63 | pool_size=[(2, 2), ()]): 64 | """ 65 | Defines a set of convolutional layer that extracts features 66 | :param input_var: input var of the network 67 | :param size: image input size (set to None to allow images of arbitrary size) 68 | :param channels: number of channels in the image 69 | :param n_filters: number of filters in each layer 70 | :param filters_size: size of the filters in each layer 71 | :param pool_size: pool size in each layer 72 | """ 73 | # Input 74 | network = lasagne.layers.InputLayer(shape=(None, channels, size, size), input_var=input_var) 75 | 76 | # Store the dimensionality of each feature vector (for use by the next layers) 77 | self.features_dim = [] 78 | # Store the features of each convolutional layer 79 | self.layer_features = [] 80 | self.layer_params = [] 81 | self.networks = [] 82 | 83 | # Define the layers 84 | for n, size, pool in zip(n_filters, filters_size, pool_size): 85 | network = lasagne.layers.Conv2DLayer(network, num_filters=n, filter_size=size, 86 | nonlinearity=lasagne.nonlinearities.rectify, 87 | W=lasagne.init.GlorotUniform()) 88 | self.features_dim.append(n) 89 | if pool: 90 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=pool) 91 | 92 | # Save the output of each layer (after reordering the dimensions: n_samples, n_vectors, n_feats) 93 | self.layer_features.append(lasagne.layers.get_output(network, deterministic=True).transpose((0, 2, 3, 1))) 94 | self.layer_params.append(lasagne.layers.get_all_params(network, trainable=True)) 95 | self.networks.append(network) 96 | -------------------------------------------------------------------------------- /cbof_paper/model/datasets.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | def load_mnist(dataset='/home/nick/Data/Datasets/mnist.pkl.gz'): 5 | """ 6 | Loads the mnist dataset 7 | :return: 8 | """ 9 | import gzip 10 | import pickle 11 | 12 | with gzip.open(dataset, 'rb') as f: 13 | try: 14 | train_set, valid_set, test_set = pickle.load(f, 
encoding='latin1') 15 | except: 16 | train_set, valid_set, test_set = pickle.load(f) 17 | train_data = train_set[0].reshape((-1, 1, 28, 28)) 18 | valid_data = valid_set[0].reshape((-1, 1, 28, 28)) 19 | test_data = test_set[0].reshape((-1, 1, 28, 28)) 20 | return train_data, valid_data, test_data, train_set[1], valid_set[1], test_set[1] 21 | 22 | 23 | def resize_mnist_data(images, new_size_a, new_size_b=None): 24 | """ 25 | Resizes a set of images 26 | :param images: 27 | :param new_size: 28 | :return: 29 | """ 30 | from skimage.transform import resize 31 | 32 | if new_size_b is None: 33 | new_size_b = new_size_a 34 | 35 | resized_data = np.zeros((images.shape[0], 1, new_size_a, new_size_b)) 36 | for i in range(len(images)): 37 | resized_data[i, 0, :, :] = resize(images[i, 0, :, :], (new_size_a, new_size_b)) 38 | return np.float32(resized_data) 39 | 40 | 41 | 42 | 43 | -------------------------------------------------------------------------------- /cbof_paper/model/nbof.py: -------------------------------------------------------------------------------- 1 | import theano 2 | import numpy as np 3 | from sklearn.preprocessing import normalize as feature_normalizer 4 | import theano.tensor as T 5 | import theano.gradient 6 | import sklearn.cluster as cluster 7 | 8 | floatX = theano.config.floatX 9 | 10 | 11 | class NBoFInputLayer: 12 | """ 13 | Defines a Neural BoF input layer 14 | """ 15 | 16 | def __init__(self, g=0.1, feature_dimension=89, n_codewords=16): 17 | """ 18 | Initializes the Neural BoF object 19 | :param g: defines the softness of the quantization 20 | :param feature_dimension: dimension of the feature vectors 21 | :param n_codewords: number of codewords / RBF neurons to be used 22 | """ 23 | 24 | self.Nk = n_codewords 25 | self.D = feature_dimension 26 | 27 | # RBF-centers / codewords 28 | V = np.random.rand(self.Nk, self.D) 29 | self.V = theano.shared(value=V.astype(dtype=floatX), name='V', borrow=True) 30 | sigma = np.ones((self.Nk,)) / g 31 | self.sigma = theano.shared(value=sigma.astype(dtype=floatX), name='sigma', borrow=True) 32 | self.params = [self.V, self.sigma] 33 | 34 | # Tensor of input objects (n_objects, n_features, self.D) 35 | self.X = T.tensor3(name='X', dtype=floatX) 36 | 37 | # Feature matrix of an object (n_features, self.D) 38 | self.x = T.matrix(name='x', dtype=floatX) 39 | 40 | # Encode a set of objects 41 | """ 42 | Note that the number of features per object is fixed and same for all objects. 43 | The code can be easily extended by defining a feature vector mask, allowing for a variable number of feature 44 | vectors for each object (or alternatively separately encoding each object). 45 | """ 46 | self.encode_objects_theano = theano.function(inputs=[self.X], outputs=self.sym_histograms(self.X)) 47 | 48 | # Encodes only one object with an arbitrary number of features 49 | self.encode_object_theano = theano.function(inputs=[self.x], outputs=self.sym_histogram(self.x)) 50 | 51 | def sym_histogram(self, X): 52 | """ 53 | Computes a soft-quantized histogram of a set of feature vectors (X is a matrix). 
54 | :param X: matrix of feature vectors 55 | :return: 56 | """ 57 | distances = symbolic_distance_matrix(X, self.V) 58 | membership = T.nnet.softmax(-distances * self.sigma) 59 | histogram = T.mean(membership, axis=0) 60 | return histogram 61 | 62 | def sym_histograms(self, X): 63 | """ 64 | Encodes a set of objects (X is a tensor3) 65 | :param X: tensor3 containing the feature vectors for each object 66 | :return: 67 | """ 68 | histograms, updates = theano.map(self.sym_histogram, X) 69 | return histograms 70 | 71 | def initialize_dictionary(self, X, max_iter=100, redo=5, n_samples=50000, normalize=False): 72 | """ 73 | Samples some feature vectors from X and learns an initial dictionary 74 | :param X: list of objects 75 | :param max_iter: maximum k-means iters 76 | :param redo: number of times to repeat k-means clustering 77 | :param n_samples: number of feature vectors to sample from the objects 78 | :param normalize: use l_2 norm normalization for the feature vectors 79 | """ 80 | 81 | # Sample only a small number of feature vectors from each object 82 | samples_per_object = int(np.ceil(n_samples / len(X))) 83 | 84 | features = None 85 | print("Sampling feature vectors...") 86 | for i in (range(len(X))): 87 | idx = np.random.permutation(X[i].shape[0])[:samples_per_object + 1] 88 | cur_features = X[i][idx, :] 89 | if features is None: 90 | features = cur_features 91 | else: 92 | features = np.vstack((features, cur_features)) 93 | 94 | print("Clustering feature vectors...") 95 | features = np.float64(features) 96 | if normalize: 97 | features = feature_normalizer(features) 98 | 99 | V = cluster.k_means(features, n_clusters=self.Nk, max_iter=max_iter, n_init=redo) 100 | self.V.set_value(np.asarray(V[0], dtype=theano.config.floatX)) 101 | 102 | 103 | def symbolic_distance_matrix(A, B): 104 | """ 105 | Defines the symbolic matrix that contains the distances between the vectors of A and B 106 | :param A: 107 | :param B: 108 | :return: 109 | """ 110 | aa = T.sum(A * A, axis=1) 111 | bb = T.sum(B * B, axis=1) 112 | AB = T.dot(A, T.transpose(B)) 113 | 114 | AA = T.transpose(T.tile(aa, (bb.shape[0], 1))) 115 | BB = T.tile(bb, (aa.shape[0], 1)) 116 | 117 | D = AA + BB - 2 * AB 118 | D = T.maximum(D, 0) 119 | D = T.sqrt(D) 120 | return D 121 | 122 | 123 | class CBoF_Input_Layer: 124 | def __init__(self, input, cnn, layer, level=1, pyramid=False, g=0.1, n_codewords=16): 125 | """ 126 | Defines a CBoF layer for use with convolutional feature extractors 127 | :param input: symbolic input variable 128 | :param cnn: the cnn input model 129 | :param layer: the convolutional layer (id) to use 130 | :param spatial: if set to True, spatial pyramid is used 131 | :param g: the BoF softness variable 132 | :param n_codewords: number of codewords for each BoF unit 133 | """ 134 | self.bof = [] 135 | self.features = [] 136 | self.V = [] 137 | self.sigma = [] 138 | self.get_features = [] 139 | self.n = 1 140 | 141 | self.features_size = 0 142 | if level == 0 or pyramid: 143 | # Create the BoF object 144 | self.bof.append(NBoFInputLayer(g=g, feature_dimension=cnn.features_dim[layer], n_codewords=n_codewords)) 145 | self.V.append(self.bof[0].V) 146 | self.sigma.append(self.bof[0].sigma) 147 | # Extract the representation 148 | self.features.append(self.bof[0].sym_histograms(cnn.get_features(layer))) 149 | # Compile functions for extracting feature vectors 150 | self.get_features.append(theano.function([input], cnn.get_features(layer))) 151 | # Fuse the extracted representations 152 | self.fused_features = 
self.features[0] 153 | # Calculate length 154 | self.features_size += n_codewords 155 | if level == 1: 156 | for i in range(4 ** level): 157 | # Create the BoF object 158 | self.bof.append(NBoFInputLayer(g=g, feature_dimension=cnn.features_dim[layer], n_codewords=n_codewords)) 159 | self.V.append(self.bof[i].V) 160 | self.sigma.append(self.bof[i].sigma) 161 | # Extract the representation 162 | self.features.append(self.bof[i].sym_histograms(cnn.get_spatial_features(layer, i, level))) 163 | # Compile functions for extracting feature vectors 164 | self.get_features.append(theano.function([input], cnn.get_spatial_features(layer, i, level))) 165 | # Fuse the extracted representations 166 | self.fused_features = T.concatenate(tuple(self.features), axis=1) 167 | # Calculate length 168 | self.features_size += n_codewords * (4 ** level) 169 | 170 | def initialize(self, data, max_iter=100, redo=5, n_samples=50000, normalize=False): 171 | """ 172 | Initializes each of the spatial BoF layers in the CBoF layer 173 | :param data: input samples 174 | :param max_iter: max number of iterations for the k-means algorithm 175 | :param redo: number to redo the clustering 176 | :param n_samples: number of vectors to sample for clustering 177 | :param normalize: use l_2 norm normalization for the feature vectors 178 | :return: 179 | """ 180 | 181 | for i in range(len(self.bof)): 182 | features = [] 183 | for x in data: 184 | x_in = x.reshape((1, x.shape[0], x.shape[1], x.shape[2])) 185 | cur_features = self.get_features[i](np.float32(x_in)) 186 | features.append(cur_features) 187 | features = np.asarray(features) 188 | features = features.reshape((features.shape[0], features.shape[2], features.shape[3])) 189 | self.bof[i].initialize_dictionary(features, max_iter=max_iter, redo=redo, n_samples=n_samples, 190 | normalize=normalize) 191 | -------------------------------------------------------------------------------- /datasets/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /datasets/mnist.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import numpy as np 4 | from keras.datasets import mnist 5 | 6 | def load_mnist(): 7 | (X_train, y_train), (X_test, y_test) = mnist.load_data() 8 | X_train, X_test = X_train/255.0, X_test/255.0 9 | 10 | # Keep some validation data 11 | X_train, X_val = X_train[:-5000], X_train[-5000:] 12 | y_train, y_val = y_train[:-5000], y_train[-5000:] 13 | 14 | X_train = X_train.reshape(X_train.shape[0], 1, 28, 28) 15 | X_val = X_val.reshape(X_val.shape[0], 1, 28, 28) 16 | X_test = X_test.reshape(X_test.shape[0], 1, 28, 28) 17 | 18 | return np.float32(X_train), y_train, np.float32(X_val), y_val, np.float32(X_test), y_test 19 | -------------------------------------------------------------------------------- /mnist_example.py: -------------------------------------------------------------------------------- 1 | import lasagne 2 | import theano 3 | import theano.tensor as T 4 | import numpy as np 5 | from datasets.mnist import load_mnist 6 | from models.bof import CBoF_Layer 7 | from models.learner_base import LearnerBase 8 | 9 | 10 | class LeNeT_Model(LearnerBase): 11 | def __init__(self, pooling='spp', spatial_level=1, n_codewords=64, learning_rate=0.001): 12 | self.initializers = [] 13 | 14 | input_var = T.ftensor4('input_var') 15 | target_var = T.ivector('targets') 16 | 17 | network = 
lasagne.layers.InputLayer(shape=(None, 1, None, None), input_var=input_var) 18 | network = lasagne.layers.Conv2DLayer(network, num_filters=32, filter_size=(5, 5), 19 | nonlinearity=lasagne.nonlinearities.rectify) 20 | network = lasagne.layers.MaxPool2DLayer(network, pool_size=(2, 2)) 21 | network = lasagne.layers.Conv2DLayer(network, num_filters=64, filter_size=(5, 5), 22 | nonlinearity=lasagne.nonlinearities.rectify) 23 | if pooling == 'spp': 24 | network = lasagne.layers.SpatialPyramidPoolingLayer(network, pool_dims=[1, 2]) 25 | elif pooling == 'bof': 26 | network = CBoF_Layer(network, input_var=input_var, initializers=self.initializers, n_codewords=n_codewords, 27 | spatial_level=spatial_level) 28 | 29 | network = lasagne.layers.dropout(network, p=.5) 30 | network = lasagne.layers.DenseLayer(network, num_units=1000, nonlinearity=lasagne.nonlinearities.elu) 31 | network = lasagne.layers.dropout(network, p=.5) 32 | network = lasagne.layers.DenseLayer(network, num_units=10, nonlinearity=lasagne.nonlinearities.softmax) 33 | self.network = network 34 | 35 | train_prediction = lasagne.layers.get_output(network, deterministic=False) 36 | test_prediction = lasagne.layers.get_output(network, deterministic=True) 37 | loss = lasagne.objectives.categorical_crossentropy(train_prediction, target_var).mean() 38 | 39 | self.params = lasagne.layers.get_all_params(network, trainable=True) 40 | updates = lasagne.updates.adam(loss, self.params, learning_rate=learning_rate) 41 | 42 | self.train_fn = theano.function([input_var, target_var], loss, updates=updates) 43 | self.test_fn = theano.function([input_var], T.argmax(test_prediction, axis=1)) 44 | 45 | print "Model Compiled!" 46 | 47 | def initialize_model(self, data, n_samples=50000): 48 | for initializer in self.initializers: 49 | initializer(data, n_samples=n_samples) 50 | print "Model initialized!" 
51 | 52 | 53 | if __name__ == '__main__': 54 | np.random.seed(12345) 55 | 56 | X_train, y_train, X_val, y_val, X_test, y_test = load_mnist() 57 | 58 | for pool_type in ['bof', 'spp']: 59 | model = LeNeT_Model(pooling=pool_type) 60 | 61 | if pool_type == 'bof': 62 | model.initialize_model(X_train, n_samples=50000) 63 | 64 | model.train_model(X_train, y_train, validation_data=X_val, validation_labels=y_val, n_iters=50, batch_size=256) 65 | 66 | print "Evaluated model = ", pool_type 67 | print "Error = ", (1 - model.test_model(X_test, y_test)) * 100 68 | print "Error (0.7 scale) = ", (1 - model.test_model(X_test, y_test, scale=0.7)) * 100 69 | print "Error (0.8 scale) = ", (1 - model.test_model(X_test, y_test, scale=0.8)) * 100 70 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- 1 | 2 | -------------------------------------------------------------------------------- /models/bof.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | import theano 3 | import theano.tensor as T 4 | import lasagne 5 | from sklearn.cluster import KMeans 6 | from sklearn.metrics.pairwise import pairwise_distances 7 | 8 | 9 | class CBoF_Layer(lasagne.layers.Layer): 10 | """ 11 | Lasagne implementation of the CBoF Pooling Layer 12 | """ 13 | 14 | def __init__(self, incoming, n_codewords=24, V=lasagne.init.Normal(0.1), gamma=lasagne.init.Constant(0.1), 15 | eps=0.00001, input_var=None, initializers=None, spatial_level=1, **kwargs): 16 | """ 17 | Creates a BoF layer 18 | 19 | :param incoming: 20 | :param n_codewords: number of codewords 21 | :param V: initializer used for the codebook 22 | :param gamma: initializer used for the scaling factors 23 | :param eps: epsilon used to ensure numerical stability 24 | :param input_var: input_var of the model (used to compile a function that extract the features fed to layer) 25 | :param initializers: 26 | :param spatial_level: 0 (no spatial segmentation), 1 (first spatial level) 27 | :param pooling_type: either 'mean' or 'max' 28 | :param kwargs: 29 | """ 30 | super(CBoF_Layer, self).__init__(incoming, **kwargs) 31 | 32 | self.n_codewords = n_codewords 33 | self.spatial_level = spatial_level 34 | n_filters = self.input_shape[1] 35 | self.eps = eps 36 | 37 | # Create parameters 38 | self.V = self.add_param(V, (n_codewords, n_filters, 1, 1), name='V') 39 | self.gamma = self.add_param(gamma, (1, n_codewords, 1, 1), name='gamma') 40 | 41 | # Make gammas broadcastable 42 | self.gamma = T.addbroadcast(self.gamma, 0, 2, 3) 43 | 44 | # Compile function used for feature extraction 45 | if input_var is not None: 46 | self.features_fn = theano.function([input_var], lasagne.layers.get_output(incoming, deterministic=True)) 47 | 48 | if initializers is not None: 49 | initializers.append(self.initialize_layer) 50 | 51 | def get_output_for(self, input, **kwargs): 52 | distances = conv_pairwise_distance(input, self.V) 53 | similarities = T.exp(-distances / T.abs_(self.gamma)) 54 | norm = T.sum(similarities, 1).reshape((similarities.shape[0], 1, similarities.shape[2], similarities.shape[3])) 55 | membership = similarities / (norm + self.eps) 56 | 57 | histogram = T.mean(membership, axis=(2, 3)) 58 | if self.spatial_level == 1: 59 | pivot1, pivot2 = membership.shape[2] / 2, membership.shape[3] / 2 60 | h1 = T.mean(membership[:, :, :pivot1, :pivot2], axis=(2, 3)) 61 | h2 = T.mean(membership[:, :, :pivot1, 
pivot2:], axis=(2, 3)) 62 | h3 = T.mean(membership[:, :, pivot1:, :pivot2], axis=(2, 3)) 63 | h4 = T.mean(membership[:, :, pivot1:, pivot2:], axis=(2, 3)) 64 | # Pyramid is not used in the paper 65 | # histogram = T.horizontal_stack(h1, h2, h3, h4) 66 | histogram = T.horizontal_stack(histogram, h1, h2, h3, h4) 67 | return histogram 68 | 69 | def get_output_shape_for(self, input_shape): 70 | if self.spatial_level == 1: 71 | return (input_shape[0], 5 * self.n_codewords) 72 | return (input_shape[0], self.n_codewords) 73 | 74 | def initialize_layer(self, data, n_samples=10000): 75 | """ 76 | Initializes the layer using k-means (sigma is set to the mean pairwise distance) 77 | :param data: data 78 | :param n_samples: n_samples to keep for initializing the model 79 | :return: 80 | """ 81 | if self.features_fn is None: 82 | assert False 83 | 84 | idx = np.arange(data.shape[0]) 85 | np.random.shuffle(idx) 86 | 87 | features = [] 88 | for i in range(idx.shape[0]): 89 | feats = self.features_fn([data[idx[i]]]) 90 | feats = feats.transpose((0, 2, 3, 1)) 91 | feats = feats.reshape((-1, feats.shape[-1])) 92 | features.extend(feats) 93 | if len(features) > n_samples: 94 | break 95 | features = np.asarray(features) 96 | 97 | kmeans = KMeans(n_clusters=self.n_codewords, n_jobs=4, n_init=5) 98 | kmeans.fit(features) 99 | V = kmeans.cluster_centers_.copy() 100 | 101 | # Initialize gamma 102 | mean_distance = np.sum(pairwise_distances(V)) / (self.n_codewords * (self.n_codewords - 1)) 103 | self.gamma.set_value(self.gamma.get_value() * np.float32(mean_distance)) 104 | 105 | # Initialize codebook 106 | V = V.reshape((V.shape[0], V.shape[1], 1, 1)) 107 | self.V.set_value(np.float32(V)) 108 | 109 | 110 | def conv_pairwise_distance(feature_maps, codebook): 111 | """ 112 | Calculates the pairwise distances between the feature maps (n_samples, filters, x, y) 113 | :param feature_maps: 114 | :param codebook: 115 | :return: 116 | """ 117 | x_square = T.sum(feature_maps ** 2, axis=1) # n_samples, filters, x, y 118 | x_square = x_square.reshape((x_square.shape[0], 1, x_square.shape[1], x_square.shape[2])) 119 | x_square = T.addbroadcast(x_square, 1) 120 | 121 | y_square = T.sum(codebook ** 2, axis=1) 122 | y_square = y_square.reshape((1, y_square.shape[0], y_square.shape[1], y_square.shape[2])) 123 | y_square = T.addbroadcast(y_square, 0, 2, 3) 124 | 125 | inner_product = T.nnet.conv2d(feature_maps, codebook) 126 | dist = x_square + y_square - 2 * inner_product 127 | dist = T.sqrt(T.maximum(dist, 0)) 128 | return dist 129 | -------------------------------------------------------------------------------- /models/learner_base.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | from time import time 3 | from sklearn.metrics import accuracy_score 4 | import time 5 | import cv2 6 | 7 | 8 | class LearnerBase(): 9 | def __init__(self): 10 | self.train_fn = None 11 | self.test_fn = None 12 | self.lr = None 13 | 14 | self.best_param_values = [] 15 | self.params = [] 16 | 17 | def save_validation_parameters(self): 18 | """ 19 | Saves the best parameters found during the validation 20 | """ 21 | self.best_param_values = [] 22 | for i, param in enumerate(self.params): 23 | self.best_param_values.append(param.get_value()) 24 | 25 | def restore_validation_parameters(self): 26 | """ 27 | Restores the best parameters 28 | """ 29 | if len(self.best_param_values) > 0: 30 | for i, param in enumerate(self.params): 31 | param.set_value(self.best_param_values[i]) 32 | 33 | def 
train_model(self, data, labels, batch_size=32, n_iters=10, validation_data=None, validation_labels=None): 34 | 35 | loss = [] 36 | idx = np.arange(data.shape[0]) 37 | best_val_acc = 0 38 | 39 | for i in range(n_iters): 40 | np.random.shuffle(idx) 41 | cur_loss = 0 42 | n_batches = data.shape[0] / batch_size 43 | start_time = time.time() 44 | 45 | # Iterate mini-batches 46 | for j in range(n_batches): 47 | cur_idx = np.sort(idx[j * batch_size:(j + 1) * batch_size]) 48 | cur_data = data[cur_idx] 49 | cur_labels = labels[cur_idx] 50 | cur_loss += self.train_fn(cur_data, cur_labels) * cur_data.shape[0] 51 | 52 | # Last batch 53 | if n_batches * batch_size < data.shape[0]: 54 | # for cur_scale in scales: 55 | cur_idx = np.sort(idx[n_batches * batch_size:]) 56 | cur_data = data[cur_idx] 57 | cur_labels = labels[cur_idx] 58 | cur_loss += self.train_fn(cur_data, cur_labels) * cur_data.shape[0] 59 | 60 | loss.append(cur_loss / float(data.shape[0])) 61 | elapsed_time = time.time() - start_time 62 | 63 | print "Epoch %d loss = %5.4f, cur_time: %6.1f s time_left: %8.1f s" % \ 64 | (i + 1, loss[-1], elapsed_time, (n_iters - i) * elapsed_time) 65 | 66 | if validation_data is not None: 67 | val_acc = self.test_model(validation_data, validation_labels) 68 | if val_acc > best_val_acc: 69 | best_val_acc = val_acc 70 | print "New best found!", val_acc 71 | self.save_validation_parameters() 72 | 73 | if validation_data is not None: 74 | self.restore_validation_parameters() 75 | 76 | return loss 77 | 78 | def test_model(self, data, labels, scale=1, batch_size=100): 79 | """ 80 | 81 | :param data: images for testing 82 | :param labels: classes of the images 83 | :param scale: the scale used for the testing (only if global pooling/cbof is used) 84 | :param batch_size: batch size to be used for the testing 85 | :return: 86 | """ 87 | predicted_labels = [] 88 | 89 | # Resize images if needed 90 | if scale != 1: 91 | img_size = [int(x * scale) for x in data.shape[2:]] 92 | new_data = np.zeros((data.shape[0], data.shape[1], img_size[0], img_size[1])) 93 | for k in range(data.shape[0]): 94 | new_data[k] = resize_image(data[k], img_size) 95 | data = np.float32(new_data) 96 | 97 | n_batches = data.shape[0] / batch_size 98 | 99 | # Iterate mini-batches 100 | for j in range(n_batches): 101 | cur_data = data[j * batch_size:(j + 1) * batch_size] 102 | predicted_labels.extend(self.test_fn(cur_data)) 103 | # Last batch 104 | if n_batches * batch_size < data.shape[0]: 105 | cur_data = data[n_batches * batch_size:] 106 | predicted_labels.extend(self.test_fn(cur_data)) 107 | predicted_labels = np.asarray(predicted_labels) 108 | 109 | acc = accuracy_score(labels, predicted_labels) 110 | return acc 111 | 112 | 113 | def resize_image(img, size): 114 | img = img.transpose((1, 2, 0)) 115 | img = cv2.resize(img, (size[0], size[1])) 116 | 117 | if len(img.shape) == 2: 118 | img = img.reshape((1, img.shape[0], img.shape[1])) 119 | else: 120 | img = img.transpose((2, 0, 1)) 121 | return img 122 | --------------------------------------------------------------------------------