├── .gitignore
├── LICENSE
├── README.md
├── data_providers
│   ├── __init__.py
│   ├── base_provider.py
│   ├── cifar.py
│   ├── downloader.py
│   ├── svhn.py
│   └── utils.py
├── models
│   ├── __init__.py
│   └── dense_net.py
├── requirements
│   ├── base.txt
│   ├── cpu.txt
│   └── gpu.txt
└── run_dense_net.py
/.gitignore:
--------------------------------------------------------------------------------
.env/
.venv/
__pycache__
*.pyc
saves/
logs/
TODO
runscript.sh
--------------------------------------------------------------------------------
/LICENSE:
--------------------------------------------------------------------------------
MIT License

Copyright (c) 2017 Illarion

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.
--------------------------------------------------------------------------------
/README.md:
--------------------------------------------------------------------------------
# Stochastic Delta Rule implementation using DenseNet in TensorFlow


## THIS REPOSITORY IS NO LONGER IN USE. THE NEW SDR REPOSITORY CAN BE FOUND [HERE](https://github.com/noahfl/sdr-densenet-pytorch).

**NOTE:** This repository is based on [Illarion Khlestov's DenseNet implementation](https://github.com/ikhlestov/vision_networks/ "ikhlestov/vision_networks/"). Check out his blog post about implementing DenseNet in TensorFlow [here](https://medium.com/@illarionkhlestov/notes-on-the-implementation-densenet-in-tensorflow-beeda9dd1504#.55qu3tfqm).


---------------------------------------------------------------------------------------


Check out @lifeiteng's [results from implementing SDR with WaveNet](https://twitter.com/FeitengLi/status/1029166830844227584).


**UPDATE:** Due to a bug found by @basveeling, which has now been corrected, the testing errors are being recalculated. Here are the preliminary results, which I will continue to update as new results come out. "-----" indicates results that have not yet been redone.
|Model type            |Depth  |C10              |C100              |
|:---------------------|:------|:----------------|:-----------------|
|DenseNet(*k* = 12)    |40     |-----(-----)     |-----(-----)      |
|DenseNet(*k* = 12)    |100    |**-----**(-----) |**-----**(-----)  |
|DenseNet-BC(*k* = 12) |100    |-----(-----)     |-----(-----)      |


This repository holds the code for the paper

'Dropout is a special case of the stochastic delta rule: faster and more accurate deep learning' (submitted to NIPS; on [arXiv](https://arxiv.org/abs/1808.03578))

[Noah Frazier-Logue](https://www.linkedin.com/in/noah-frazier-logue-1524b796/), [Stephen Jose Hanson](http://nwkpsych.rutgers.edu/~jose/)

Stochastic Delta Rule (SDR) is a weight update mechanism that assigns each weight a standard deviation that changes as a function of the gradients at every training iteration. At the beginning of each training iteration, the weights are re-initialized by sampling from a normal distribution bounded by their standard deviations. Over the course of training, the standard deviations converge towards zero as the network becomes more confident about what the value of each weight should be. For a more detailed description of the method and its properties, have a look at the paper [link here].


Two types of [Densely Connected Convolutional Networks](https://arxiv.org/abs/1608.06993) (DenseNets) are available:

- DenseNet - without bottleneck layers
- DenseNet-BC - with bottleneck layers

Each model can be tested on the following datasets:

- CIFAR-10
- CIFAR-10+ (with data augmentation)
- CIFAR-100
- CIFAR-100+ (with data augmentation)
- SVHN

The number of layers and blocks, the growth rate, image normalization, and other training parameters can be changed through shell arguments or inside the source code.

## Usage

Example run:

```
python run_dense_net.py --depth=40 --train --test --dataset=C10 --sdr
```

This run uses SDR instead of dropout. To use dropout, run something like

```
python run_dense_net.py --depth=40 --train --test --dataset=C10 --keep_prob=0.8
```

where `keep_prob` is the probability (in this case 80%) that a neuron is *kept* during dropout.

**NOTE:** the `--sdr` argument will override the `--keep_prob` argument. For example:

```
python run_dense_net.py --depth=40 --train --test --dataset=C10 --keep_prob=0.8 --sdr
```

will use SDR and not dropout.


List all available options:

```
python run_dense_net.py --help
```

There are also many [other implementations](https://github.com/liuzhuang13/DenseNet) that may be useful.

Citation:

```
@article{Huang2016Densely,
   author = {Huang, Gao and Liu, Zhuang and Weinberger, Kilian Q.},
   title = {Densely Connected Convolutional Networks},
   journal = {arXiv preprint arXiv:1608.06993},
   year = {2016}
}
```

**KNOWN ISSUES**

- the model will not save because the graph definition is larger than 2 GB

If you see anything wrong, feel free to open an issue!
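For intuition, here is the update itself in plain NumPy. This is a minimal sketch of the rule described above, mirroring the `beta`/`zeta` update in `_build_graph_sdr` inside `models/dense_net.py`; the function name, `grad_fn`, and `lr` are placeholders for illustration, not part of this repository's API.

```python
import numpy as np

def sdr_step(w, sd, grad_fn, lr=0.1, beta=0.1, zeta=0.01):
    """One illustrative SDR update for a single weight array."""
    # Re-initialize the weights by sampling around their current means,
    # bounded by the per-weight standard deviations.
    w_sampled = np.random.normal(loc=w, scale=sd)
    # Gradient of the loss, evaluated at the sampled weights.
    grad = grad_fn(w_sampled)
    # Each standard deviation tracks |beta * gradient|; the factor zeta < 1
    # makes the deviations decay towards zero as training progresses.
    sd = zeta * (np.abs(beta * grad) + sd)
    # Ordinary gradient step on the weight means (the actual code uses
    # Nesterov momentum rather than plain SGD).
    w = w_sampled - lr * grad
    return w, sd
```

With the values quoted below (`beta` = 0.1, `zeta` = 0.01), the noise injected into a weight grows with the magnitude of its recent gradients and otherwise decays geometrically, which is what drives the standard deviations towards zero.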
## Results from SDR paper

This table shows the CIFAR results reported in the paper. Parameters are the same as those used in the paper, except for a batch size of 100 and 100 training epochs. SDR's beta value was 0.1 and its zeta value was 0.01. The augmented datasets were not tested because dropout was not used on them in the original paper; however, they may be added in the future (as will the SVHN results and results with higher layer counts).

|Model type            |Depth  |C10              |C100              |
|:---------------------|:------|:----------------|:-----------------|
|DenseNet(*k* = 12)    |40     |2.256(5.160)     |09.36(22.60)      |
|DenseNet(*k* = 12)    |100    |**1.360**(3.820) |**05.16**(11.06)  |
|DenseNet-BC(*k* = 12) |100    |2.520(6.340)     |11.12(25.08)      |


### Epochs to error rate

The tables below show the number of training epochs required to reach a training error of 15, 10, and 5, respectively. For example, the dropout version of DenseNet-40 on CIFAR-10 took 8 epochs to reach a training error of 15, 16 epochs to reach a training error of 10, and 94 epochs to reach a training error of 5. In contrast, the SDR version of DenseNet-40 on CIFAR-10 took 5 epochs to reach a training error of 15, 5 epochs to reach a training error of 10, and 15 epochs to reach a training error of 5. The best result for each value, across both dropout and SDR, is bolded.


#### Dropout

|Model type            |Depth  |C10             |C100             |
|:---------------------|:------|:---------------|:----------------|
|DenseNet(*k* = 12)    |40     |8 \ 16 \ 94     |95 \ -- \ --     |
|DenseNet(*k* = 12)    |100    |8 \ 13 \ 25     |28 \ 60 \ --     |
|DenseNet-BC(*k* = 12) |100    |10 \ 25 \ --    |-- \ -- \ --     |

#### SDR

|Model type            |Depth  |C10                     |C100                       |
|:---------------------|:------|:-----------------------|:--------------------------|
|DenseNet(*k* = 12)    |40     |**5** \ **8** \ **15**  |27 \ 48 \ --               |
|DenseNet(*k* = 12)    |100    |6 \ 9 \ **15**          |**17** \ **21** \ **52**   |
|DenseNet-BC(*k* = 12) |100    |**5** \ **8** \ 17      |31 \ 87 \ --               |

Comparison to original DenseNet implementation with dropout
--------

Test results on various datasets. Per-channel image normalization was used. Results reported in the paper are given in parentheses. For the CIFAR+ datasets, image normalization was performed before augmentation, which may give slightly lower results than those reported in the paper.

|Model type            |Depth  |C10         |C10+       |C100          |C100+       |
|:---------------------|:------|:-----------|:----------|:-------------|:-----------|
|DenseNet(*k* = 12)    |40     |6.67(7.00)  |5.44(5.24) |27.44(27.55)  |25.62(24.42)|
|DenseNet-BC(*k* = 12) |100    |5.54(5.92)  |4.87(4.51) |24.88(24.15)  |22.85(22.27)|


Difference compared to the [original](https://github.com/liuzhuang13/DenseNet) implementation
---------------------------------------------------------
The existing model should use hyperparameters identical to the original code.

Dependencies
------------

- The model was tested with Python 3.4.3+ and Python 3.5.2, with and without CUDA.
- The model should work as expected with TensorFlow >= 0.10 FOR DROPOUT ONLY. SDR was added in a development environment running TensorFlow 1.7, so it may require TensorFlow 1.0+.

The repo ships with requirements files, so the easiest way to install all dependencies is to run:

- in case of CPU usage: `pip install -r requirements/cpu.txt`
- in case of GPU usage: `pip install -r requirements/gpu.txt`
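If you would rather drive training from Python than through the command line, the pieces wire together roughly as below. This is a hedged sketch assembled from the constructors shown later in this listing (`data_providers/utils.py` and `models/dense_net.py`); the thread counts, `keep_prob`, momentum, and the normalization/shuffle choices are illustrative assumptions rather than recommended settings, and `run_dense_net.py` remains the supported entry point.

```python
from data_providers.utils import get_data_provider_by_name
from models.dense_net import DenseNet

# The first five values mirror train_params_cifar in run_dense_net.py; the
# provider-specific keys (validation, shuffle, normalization) are assumptions
# based on the docstrings in data_providers/cifar.py.
train_params = {
    'batch_size': 100,
    'n_epochs': 100,
    'initial_learning_rate': 0.1,
    'reduce_lr_epoch_1': 150,
    'reduce_lr_epoch_2': 225,
    'validation_set': True,
    'validation_split': None,       # validation set is a copy of the test set
    'shuffle': 'every_epoch',
    'normalization': 'by_chanels',  # spelling as used by the data providers
}

data_provider = get_data_provider_by_name('C10', train_params)
model = DenseNet(
    data_provider=data_provider,
    growth_rate=12, depth=40, total_blocks=3,
    keep_prob=0.8,                  # ignored when use_sdr=True
    num_inter_threads=1, num_intra_threads=4,
    weight_decay=1e-4, nesterov_momentum=0.9,
    model_type='DenseNet', dataset='C10',
    should_save_logs=True, should_save_model=True,
    use_sdr=True, no_histograms=True)

model.train_all_epochs(train_params)
loss, accuracy = model.test(data_provider.test, batch_size=100)
print("Test: loss=%f, accuracy=%f" % (loss, accuracy))
```

Passing `use_sdr=True` routes graph construction through `_build_graph_sdr`, where the `beta`/`zeta` update sketched above is implemented; with `use_sdr=False` the model falls back to the standard dropout graph from the original implementation.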
165 | 166 | -------------------------------------------------------------------------------- /data_providers/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/noahfl/densenet-sdr/6e627ad48985b76391d7566858a67287d05ed01f/data_providers/__init__.py -------------------------------------------------------------------------------- /data_providers/base_provider.py: -------------------------------------------------------------------------------- 1 | import numpy as np 2 | 3 | 4 | class DataSet: 5 | """Class to represent some dataset: train, validation, test""" 6 | @property 7 | def num_examples(self): 8 | """Return qtty of examples in dataset""" 9 | raise NotImplementedError 10 | 11 | def next_batch(self, batch_size): 12 | """Return batch of required size of data, labels""" 13 | raise NotImplementedError 14 | 15 | 16 | class ImagesDataSet(DataSet): 17 | """Dataset for images that provide some often used methods""" 18 | 19 | def _measure_mean_and_std(self): 20 | # for every channel in image 21 | means = [] 22 | stds = [] 23 | # for every channel in image(assume this is last dimension) 24 | for ch in range(self.images.shape[-1]): 25 | means.append(np.mean(self.images[:, :, :, ch])) 26 | stds.append(np.std(self.images[:, :, :, ch])) 27 | self._means = means 28 | self._stds = stds 29 | 30 | @property 31 | def images_means(self): 32 | if not hasattr(self, '_means'): 33 | self._measure_mean_and_std() 34 | return self._means 35 | 36 | @property 37 | def images_stds(self): 38 | if not hasattr(self, '_stds'): 39 | self._measure_mean_and_std() 40 | return self._stds 41 | 42 | def shuffle_images_and_labels(self, images, labels): 43 | rand_indexes = np.random.permutation(images.shape[0]) 44 | shuffled_images = images[rand_indexes] 45 | shuffled_labels = labels[rand_indexes] 46 | return shuffled_images, shuffled_labels 47 | 48 | def normalize_images(self, images, normalization_type): 49 | """ 50 | Args: 51 | images: numpy 4D array 52 | normalization_type: `str`, available choices: 53 | - divide_255 54 | - divide_256 55 | - by_chanels 56 | """ 57 | if normalization_type == 'divide_255': 58 | images = images / 255 59 | elif normalization_type == 'divide_256': 60 | images = images / 256 61 | elif normalization_type == 'by_chanels': 62 | images = images.astype('float64') 63 | # for every channel in image(assume this is last dimension) 64 | for i in range(images.shape[-1]): 65 | images[:, :, :, i] = ((images[:, :, :, i] - self.images_means[i]) / 66 | self.images_stds[i]) 67 | else: 68 | raise Exception("Unknown type of normalization") 69 | return images 70 | 71 | def normalize_all_images_by_chanels(self, initial_images): 72 | new_images = np.zeros(initial_images.shape) 73 | for i in range(initial_images.shape[0]): 74 | new_images[i] = self.normalize_image_by_chanel(initial_images[i]) 75 | return new_images 76 | 77 | def normalize_image_by_chanel(self, image): 78 | new_image = np.zeros(image.shape) 79 | for chanel in range(3): 80 | mean = np.mean(image[:, :, chanel]) 81 | std = np.std(image[:, :, chanel]) 82 | new_image[:, :, chanel] = (image[:, :, chanel] - mean) / std 83 | return new_image 84 | 85 | 86 | class DataProvider: 87 | @property 88 | def data_shape(self): 89 | """Return shape as python list of one data entry""" 90 | raise NotImplementedError 91 | 92 | @property 93 | def n_classes(self): 94 | """Return `int` of num classes""" 95 | raise NotImplementedError 96 | 97 | def labels_to_one_hot(self, labels): 98 | """Convert 1D array 
of labels to one hot representation 99 | 100 | Args: 101 | labels: 1D numpy array 102 | """ 103 | new_labels = np.zeros((labels.shape[0], self.n_classes)) 104 | new_labels[range(labels.shape[0]), labels] = np.ones(labels.shape) 105 | return new_labels 106 | 107 | def labels_from_one_hot(self, labels): 108 | """Convert 2D array of labels to 1D class based representation 109 | 110 | Args: 111 | labels: 2D numpy array 112 | """ 113 | return np.argmax(labels, axis=1) 114 | -------------------------------------------------------------------------------- /data_providers/cifar.py: -------------------------------------------------------------------------------- 1 | import tempfile 2 | import os 3 | import pickle 4 | import random 5 | 6 | import numpy as np 7 | 8 | 9 | from .base_provider import ImagesDataSet, DataProvider 10 | from .downloader import download_data_url 11 | 12 | 13 | def augment_image(image, pad): 14 | """Perform zero padding, randomly crop image to original size, 15 | maybe mirror horizontally""" 16 | flip = random.getrandbits(1) 17 | if flip: 18 | image = image[:, ::-1, :] 19 | init_shape = image.shape 20 | new_shape = [init_shape[0] + pad * 2, 21 | init_shape[1] + pad * 2, 22 | init_shape[2]] 23 | zeros_padded = np.zeros(new_shape) 24 | zeros_padded[pad:init_shape[0] + pad, pad:init_shape[1] + pad, :] = image 25 | # randomly crop to original size 26 | init_x = np.random.randint(0, pad * 2) 27 | init_y = np.random.randint(0, pad * 2) 28 | cropped = zeros_padded[ 29 | init_x: init_x + init_shape[0], 30 | init_y: init_y + init_shape[1], 31 | :] 32 | return cropped 33 | 34 | 35 | def augment_all_images(initial_images, pad): 36 | new_images = np.zeros(initial_images.shape) 37 | for i in range(initial_images.shape[0]): 38 | new_images[i] = augment_image(initial_images[i], pad=4) 39 | return new_images 40 | 41 | 42 | class CifarDataSet(ImagesDataSet): 43 | def __init__(self, images, labels, n_classes, shuffle, normalization, 44 | augmentation): 45 | """ 46 | Args: 47 | images: 4D numpy array 48 | labels: 2D or 1D numpy array 49 | n_classes: `int`, number of cifar classes - 10 or 100 50 | shuffle: `str` or None 51 | None: no any shuffling 52 | once_prior_train: shuffle train data only once prior train 53 | every_epoch: shuffle train data prior every epoch 54 | normalization: `str` or None 55 | None: no any normalization 56 | divide_255: divide all pixels by 255 57 | divide_256: divide all pixels by 256 58 | by_chanels: substract mean of every chanel and divide each 59 | chanel data by it's standart deviation 60 | augmentation: `bool` 61 | """ 62 | if shuffle is None: 63 | self.shuffle_every_epoch = False 64 | elif shuffle == 'once_prior_train': 65 | self.shuffle_every_epoch = False 66 | images, labels = self.shuffle_images_and_labels(images, labels) 67 | elif shuffle == 'every_epoch': 68 | self.shuffle_every_epoch = True 69 | else: 70 | raise Exception("Unknown type of shuffling") 71 | 72 | self.images = images 73 | self.labels = labels 74 | self.n_classes = n_classes 75 | self.augmentation = augmentation 76 | self.normalization = normalization 77 | self.images = self.normalize_images(images, self.normalization) 78 | self.start_new_epoch() 79 | 80 | def start_new_epoch(self): 81 | self._batch_counter = 0 82 | if self.shuffle_every_epoch: 83 | images, labels = self.shuffle_images_and_labels( 84 | self.images, self.labels) 85 | else: 86 | images, labels = self.images, self.labels 87 | if self.augmentation: 88 | images = augment_all_images(images, pad=4) 89 | self.epoch_images = images 90 
| self.epoch_labels = labels 91 | 92 | @property 93 | def num_examples(self): 94 | return self.labels.shape[0] 95 | 96 | def next_batch(self, batch_size): 97 | start = self._batch_counter * batch_size 98 | end = (self._batch_counter + 1) * batch_size 99 | self._batch_counter += 1 100 | images_slice = self.epoch_images[start: end] 101 | labels_slice = self.epoch_labels[start: end] 102 | if images_slice.shape[0] != batch_size: 103 | self.start_new_epoch() 104 | return self.next_batch(batch_size) 105 | else: 106 | return images_slice, labels_slice 107 | 108 | 109 | class CifarDataProvider(DataProvider): 110 | """Abstract class for cifar readers""" 111 | 112 | def __init__(self, save_path=None, validation_set=None, 113 | validation_split=None, shuffle=None, normalization=None, 114 | one_hot=True, **kwargs): 115 | """ 116 | Args: 117 | save_path: `str` 118 | validation_set: `bool`. 119 | validation_split: `float` or None 120 | float: chunk of `train set` will be marked as `validation set`. 121 | None: if 'validation set' == True, `validation set` will be 122 | copy of `test set` 123 | shuffle: `str` or None 124 | None: no any shuffling 125 | once_prior_train: shuffle train data only once prior train 126 | every_epoch: shuffle train data prior every epoch 127 | normalization: `str` or None 128 | None: no any normalization 129 | divide_255: divide all pixels by 255 130 | divide_256: divide all pixels by 256 131 | by_chanels: substract mean of every chanel and divide each 132 | chanel data by it's standart deviation 133 | one_hot: `bool`, return lasels one hot encoded 134 | """ 135 | self.batch_size = kwargs['batch_size'] 136 | self._save_path = save_path 137 | self.one_hot = one_hot 138 | download_data_url(self.data_url, self.save_path) 139 | train_fnames, test_fnames = self.get_filenames(self.save_path) 140 | 141 | # add train and validations datasets 142 | images, labels = self.read_cifar(train_fnames) 143 | if validation_set is not None and validation_split is not None: 144 | split_idx = int(images.shape[0] * (1 - validation_split)) 145 | self.train = CifarDataSet( 146 | images=images[:split_idx], labels=labels[:split_idx], 147 | n_classes=self.n_classes, shuffle=shuffle, 148 | normalization=normalization, 149 | augmentation=self.data_augmentation) 150 | self.validation = CifarDataSet( 151 | images=images[split_idx:], labels=labels[split_idx:], 152 | n_classes=self.n_classes, shuffle=shuffle, 153 | normalization=normalization, 154 | augmentation=self.data_augmentation) 155 | else: 156 | self.train = CifarDataSet( 157 | images=images, labels=labels, 158 | n_classes=self.n_classes, shuffle=shuffle, 159 | normalization=normalization, 160 | augmentation=self.data_augmentation) 161 | 162 | # add test set 163 | images, labels = self.read_cifar(test_fnames) 164 | self.test = CifarDataSet( 165 | images=images, labels=labels, 166 | shuffle=None, n_classes=self.n_classes, 167 | normalization=normalization, 168 | augmentation=False) 169 | 170 | if validation_set and not validation_split: 171 | self.validation = self.test 172 | 173 | @property 174 | def save_path(self): 175 | if self._save_path is None: 176 | self._save_path = os.path.join( 177 | tempfile.gettempdir(), 'cifar%d' % self.n_classes) 178 | return self._save_path 179 | 180 | @property 181 | def data_url(self): 182 | """Return url for downloaded data depends on cifar class""" 183 | data_url = ('http://www.cs.toronto.edu/' 184 | '~kriz/cifar-%d-python.tar.gz' % self.n_classes) 185 | return data_url 186 | 187 | @property 188 | def 
data_shape(self): 189 | return (32, 32, 3) 190 | 191 | @property 192 | def n_classes(self): 193 | return self._n_classes 194 | 195 | def get_filenames(self, save_path): 196 | """Return two lists of train and test filenames for dataset""" 197 | raise NotImplementedError 198 | 199 | def read_cifar(self, filenames): 200 | if self.n_classes == 10: 201 | labels_key = b'labels' 202 | elif self.n_classes == 100: 203 | labels_key = b'fine_labels' 204 | 205 | images_res = [] 206 | labels_res = [] 207 | for fname in filenames: 208 | with open(fname, 'rb') as f: 209 | images_and_labels = pickle.load(f, encoding='bytes') 210 | images = images_and_labels[b'data'] 211 | images = images.reshape(-1, 3, 32, 32) 212 | images = images.swapaxes(1, 3).swapaxes(1, 2) 213 | images_res.append(images) 214 | labels_res.append(images_and_labels[labels_key]) 215 | images_res = np.vstack(images_res) 216 | labels_res = np.hstack(labels_res) 217 | if self.one_hot: 218 | labels_res = self.labels_to_one_hot(labels_res) 219 | return images_res, labels_res 220 | 221 | 222 | class Cifar10DataProvider(CifarDataProvider): 223 | _n_classes = 10 224 | data_augmentation = False 225 | 226 | def get_filenames(self, save_path): 227 | sub_save_path = os.path.join(save_path, 'cifar-10-batches-py') 228 | train_filenames = [ 229 | os.path.join( 230 | sub_save_path, 231 | 'data_batch_%d' % i) for i in range(1, 6)] 232 | test_filenames = [os.path.join(sub_save_path, 'test_batch')] 233 | return train_filenames, test_filenames 234 | 235 | 236 | class Cifar100DataProvider(CifarDataProvider): 237 | _n_classes = 100 238 | data_augmentation = False 239 | 240 | def get_filenames(self, save_path): 241 | sub_save_path = os.path.join(save_path, 'cifar-100-python') 242 | train_filenames = [os.path.join(sub_save_path, 'train')] 243 | test_filenames = [os.path.join(sub_save_path, 'test')] 244 | return train_filenames, test_filenames 245 | 246 | 247 | class Cifar10AugmentedDataProvider(Cifar10DataProvider): 248 | _n_classes = 10 249 | data_augmentation = True 250 | 251 | 252 | class Cifar100AugmentedDataProvider(Cifar100DataProvider): 253 | _n_classes = 100 254 | data_augmentation = True 255 | 256 | 257 | if __name__ == '__main__': 258 | # some sanity checks for Cifar data providers 259 | import matplotlib.pyplot as plt 260 | 261 | # plot some CIFAR10 images with classes 262 | def plot_images_labels(images, labels, axes, main_label, classes): 263 | plt.text(0, 1.5, main_label, ha='center', va='top', 264 | transform=axes[len(axes) // 2].transAxes) 265 | for image, label, axe in zip(images, labels, axes): 266 | axe.imshow(image) 267 | axe.set_title(classes[np.argmax(label)]) 268 | axe.set_axis_off() 269 | 270 | cifar_10_idx_to_class = ['airplane', 'automobile', 'bird', 'cat', 'deer', 271 | 'dog', 'frog', 'horse', 'ship', 'truck'] 272 | c10_provider = Cifar10DataProvider( 273 | validation_set=True) 274 | assert c10_provider._n_classes == 10 275 | assert c10_provider.train.labels.shape[-1] == 10 276 | assert len(c10_provider.train.labels.shape) == 2 277 | assert np.all(c10_provider.validation.images == c10_provider.test.images) 278 | assert c10_provider.train.images.shape[0] == 50000 279 | assert c10_provider.test.images.shape[0] == 10000 280 | 281 | # test split on validation dataset 282 | c10_provider = Cifar10DataProvider( 283 | one_hot=False, validation_set=True, validation_split=0.1) 284 | assert len(c10_provider.train.labels.shape) == 1 285 | assert not np.all( 286 | c10_provider.validation.images == c10_provider.test.images) 287 | assert 
c10_provider.train.images.shape[0] == 45000 288 | assert c10_provider.validation.images.shape[0] == 5000 289 | assert c10_provider.test.images.shape[0] == 10000 290 | 291 | # test shuffling 292 | c10_provider_not_shuffled = Cifar10DataProvider(shuffle=None) 293 | c10_provider_shuffled = Cifar10DataProvider(shuffle='once_prior_train') 294 | assert not np.all( 295 | c10_provider_not_shuffled.train.images != c10_provider_shuffled.train.images) 296 | assert np.all( 297 | c10_provider_not_shuffled.test.images == c10_provider_shuffled.test.images) 298 | 299 | n_plots = 10 300 | fig, axes = plt.subplots(nrows=4, ncols=n_plots) 301 | plot_images_labels( 302 | c10_provider_not_shuffled.train.images[:n_plots], 303 | c10_provider_not_shuffled.train.labels[:n_plots], 304 | axes[0], 305 | 'Original dataset', 306 | cifar_10_idx_to_class) 307 | dataset = Cifar10DataProvider(normalization='divide_256') 308 | plot_images_labels( 309 | dataset.train.images[:n_plots], 310 | dataset.train.labels[:n_plots], 311 | axes[1], 312 | 'Original dataset normalized dividing by 256', 313 | cifar_10_idx_to_class) 314 | dataset = Cifar10DataProvider(normalization='by_chanels') 315 | plot_images_labels( 316 | dataset.train.images[:n_plots], 317 | dataset.train.labels[:n_plots], 318 | axes[2], 319 | 'Original dataset normalized by mean/std at every channel', 320 | cifar_10_idx_to_class) 321 | plot_images_labels( 322 | c10_provider_shuffled.train.images[:n_plots], 323 | c10_provider_shuffled.train.labels[:n_plots], 324 | axes[3], 325 | 'Shuffled dataset', 326 | cifar_10_idx_to_class) 327 | plt.show() 328 | 329 | text_classes_file = os.path.join( 330 | os.path.dirname(__file__), 'cifar_100_classes.txt') 331 | with open('/tmp/cifar100/cifar-100-python/meta', 'rb') as f: 332 | cifar_100_meta = pickle.load(f, encoding='bytes') 333 | cifar_100_idx_to_class = cifar_100_meta[b'fine_label_names'] 334 | 335 | c100_provider_not_shuffled = Cifar100DataProvider(shuffle=None) 336 | assert c100_provider_not_shuffled.train.labels.shape[-1] == 100 337 | c100_provider_shuffled = Cifar100DataProvider(shuffle='once_prior_train') 338 | 339 | n_plots = 15 340 | fig, axes = plt.subplots(nrows=2, ncols=n_plots) 341 | plot_images_labels( 342 | c100_provider_not_shuffled.train.images[:n_plots], 343 | c100_provider_not_shuffled.train.labels[:n_plots], 344 | axes[0], 345 | 'Original dataset', 346 | cifar_100_idx_to_class) 347 | 348 | plot_images_labels( 349 | c100_provider_shuffled.train.images[:n_plots], 350 | c100_provider_shuffled.train.labels[:n_plots], 351 | axes[1], 352 | 'Shuffled dataset', 353 | cifar_100_idx_to_class) 354 | plt.show() 355 | -------------------------------------------------------------------------------- /data_providers/downloader.py: -------------------------------------------------------------------------------- 1 | import sys 2 | import os 3 | import urllib.request 4 | import tarfile 5 | import zipfile 6 | 7 | 8 | def report_download_progress(count, block_size, total_size): 9 | pct_complete = float(count * block_size) / total_size 10 | msg = "\r {0:.1%} already downloaded".format(pct_complete) 11 | sys.stdout.write(msg) 12 | sys.stdout.flush() 13 | 14 | 15 | def download_data_url(url, download_dir): 16 | filename = url.split('/')[-1] 17 | file_path = os.path.join(download_dir, filename) 18 | 19 | if not os.path.exists(file_path): 20 | os.makedirs(download_dir, exist_ok=True) 21 | 22 | print("Download %s to %s" % (url, file_path)) 23 | file_path, _ = urllib.request.urlretrieve( 24 | url=url, 25 | filename=file_path, 26 | 
reporthook=report_download_progress) 27 | 28 | print("\nExtracting files") 29 | if file_path.endswith(".zip"): 30 | zipfile.ZipFile(file=file_path, mode="r").extractall(download_dir) 31 | elif file_path.endswith((".tar.gz", ".tgz")): 32 | tarfile.open(name=file_path, mode="r:gz").extractall(download_dir) 33 | -------------------------------------------------------------------------------- /data_providers/svhn.py: -------------------------------------------------------------------------------- 1 | import tempfile 2 | import os 3 | import scipy.io 4 | 5 | import numpy as np 6 | 7 | from .base_provider import ImagesDataSet, DataProvider 8 | from .downloader import download_data_url 9 | 10 | 11 | class SVHNDataSet(ImagesDataSet): 12 | n_classes = 10 13 | 14 | def __init__(self, images, labels, shuffle, normalization): 15 | """ 16 | Args: 17 | images: 4D numpy array 18 | labels: 2D or 1D numpy array 19 | shuffle: `bool`, should shuffle data or not 20 | normalization: `str` or None 21 | None: no any normalization 22 | divide_255: divide all pixels by 255 23 | divide_256: divide all pixels by 256 24 | by_chanels: substract mean of every chanel and divide each 25 | chanel data by it's standart deviation 26 | """ 27 | self.shuffle = shuffle 28 | self.images = images 29 | self.labels = labels 30 | self.normalization = normalization 31 | self.start_new_epoch() 32 | 33 | def start_new_epoch(self): 34 | self._batch_counter = 0 35 | if self.shuffle: 36 | self.images, self.labels = self.shuffle_images_and_labels( 37 | self.images, self.labels) 38 | 39 | @property 40 | def num_examples(self): 41 | return self.labels.shape[0] 42 | 43 | def next_batch(self, batch_size): 44 | start = self._batch_counter * batch_size 45 | end = (self._batch_counter + 1) * batch_size 46 | self._batch_counter += 1 47 | images_slice = self.images[start: end] 48 | labels_slice = self.labels[start: end] 49 | # due to memory error it should be done inside batch 50 | if self.normalization is not None: 51 | images_slice = self.normalize_images( 52 | images_slice, self.normalization) 53 | if images_slice.shape[0] != batch_size: 54 | self.start_new_epoch() 55 | return self.next_batch(batch_size) 56 | else: 57 | return images_slice, labels_slice 58 | 59 | 60 | class SVHNDataProvider(DataProvider): 61 | def __init__(self, save_path=None, validation_set=None, 62 | validation_split=None, shuffle=False, normalization=None, 63 | one_hot=True, **kwargs): 64 | """ 65 | Args: 66 | save_path: `str` 67 | validation_set: `bool`. 68 | validation_split: `int` or None 69 | float: chunk of `train set` will be marked as `validation set`. 
70 | None: if 'validation set' == True, `validation set` will be 71 | copy of `test set` 72 | shuffle: `bool`, should shuffle data or not 73 | normalization: `str` or None 74 | None: no any normalization 75 | divide_255: divide all pixels by 255 76 | divide_256: divide all pixels by 256 77 | by_chanels: substract mean of every chanel and divide each 78 | chanel data by it's standart deviation 79 | one_hot: `bool`, return lasels one hot encoded 80 | """ 81 | self._save_path = save_path 82 | self.batch_size = kwargs['batch_size'] 83 | train_images = [] 84 | train_labels = [] 85 | 86 | for part in ['train', 'extra']: 87 | images, labels = self.get_images_and_labels(part, one_hot) 88 | train_images.append(images) 89 | train_labels.append(labels) 90 | train_images = np.vstack(train_images) 91 | if one_hot: 92 | train_labels = np.vstack(train_labels) 93 | else: 94 | train_labels = np.hstack(train_labels) 95 | if validation_set and validation_split: 96 | rand_indexes = np.random.permutation(train_images.shape[0]) 97 | valid_indexes = rand_indexes[:validation_split] 98 | train_indexes = rand_indexes[:validation_split] 99 | valid_images = train_images[valid_indexes] 100 | valid_labels = train_labels[valid_indexes] 101 | train_images = train_images[train_indexes] 102 | train_labels = train_labels[train_indexes] 103 | self.validation = SVHNDataSet( 104 | valid_images, valid_labels, shuffle, normalization) 105 | 106 | self.train = SVHNDataSet( 107 | train_images, train_labels, shuffle, normalization) 108 | 109 | test_images, test_labels = self.get_images_and_labels('test', one_hot) 110 | self.test = SVHNDataSet(test_images, test_labels, False, normalization) 111 | 112 | if validation_set and not validation_split: 113 | self.validation = self.test 114 | 115 | def get_images_and_labels(self, name_part, one_hot=False): 116 | url = self.data_url + name_part + '_32x32.mat' 117 | download_data_url(url, self.save_path) 118 | filename = os.path.join(self.save_path, name_part + '_32x32.mat') 119 | data = scipy.io.loadmat(filename) 120 | images = data['X'].transpose(3, 0, 1, 2) 121 | labels = data['y'].reshape((-1)) 122 | labels[labels == 10] = 0 123 | if one_hot: 124 | labels = self.labels_to_one_hot(labels) 125 | return images, labels 126 | 127 | @property 128 | def n_classes(self): 129 | return 10 130 | 131 | @property 132 | def save_path(self): 133 | if self._save_path is None: 134 | self._save_path = os.path.join(tempfile.gettempdir(), 'svhn') 135 | return self._save_path 136 | 137 | @property 138 | def data_url(self): 139 | return "http://ufldl.stanford.edu/housenumbers/" 140 | 141 | @property 142 | def data_shape(self): 143 | return (32, 32, 3) 144 | 145 | 146 | if __name__ == '__main__': 147 | # WARNING: this test will require about 5 GB of RAM 148 | import matplotlib.pyplot as plt 149 | 150 | def plot_images_labels(images, labels, axes, main_label): 151 | plt.text(0, 1.5, main_label, ha='center', va='top', 152 | transform=axes[len(axes) // 2].transAxes) 153 | for image, label, axe in zip(images, labels, axes): 154 | axe.imshow(image) 155 | axe.set_title(np.argmax(label)) 156 | axe.set_axis_off() 157 | 158 | n_plots = 10 159 | fig, axes = plt.subplots(nrows=2, ncols=n_plots) 160 | 161 | dataset = SVHNDataProvider() 162 | plot_images_labels( 163 | dataset.train.images[:n_plots], 164 | dataset.train.labels[:n_plots], 165 | axes[0], 166 | 'Original dataset') 167 | 168 | dataset = SVHNDataProvider(shuffle=True) 169 | plot_images_labels( 170 | dataset.train.images[:n_plots], 171 | 
dataset.train.labels[:n_plots], 172 | axes[1], 173 | 'Shuffled dataset') 174 | 175 | plt.show() 176 | -------------------------------------------------------------------------------- /data_providers/utils.py: -------------------------------------------------------------------------------- 1 | from .cifar import Cifar10DataProvider, Cifar100DataProvider, \ 2 | Cifar10AugmentedDataProvider, Cifar100AugmentedDataProvider 3 | from .svhn import SVHNDataProvider 4 | 5 | 6 | def get_data_provider_by_name(name, train_params): 7 | """Return required data provider class""" 8 | if name == 'C10': 9 | return Cifar10DataProvider(**train_params) 10 | if name == 'C10+': 11 | return Cifar10AugmentedDataProvider(**train_params) 12 | if name == 'C100': 13 | return Cifar100DataProvider(**train_params) 14 | if name == 'C100+': 15 | return Cifar100AugmentedDataProvider(**train_params) 16 | if name == 'SVHN': 17 | return SVHNDataProvider(**train_params) 18 | else: 19 | print("Sorry, data provider for `%s` dataset " 20 | "was not implemented yet" % name) 21 | exit() 22 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/noahfl/densenet-sdr/6e627ad48985b76391d7566858a67287d05ed01f/models/__init__.py -------------------------------------------------------------------------------- /models/dense_net.py: -------------------------------------------------------------------------------- 1 | import os 2 | import time 3 | import shutil 4 | from datetime import timedelta 5 | 6 | import numpy as np 7 | import tensorflow as tf 8 | 9 | 10 | TF_VERSION = float('.'.join(tf.__version__.split('.')[:2])) 11 | 12 | 13 | class DenseNet: 14 | def __init__(self, data_provider, growth_rate, depth, 15 | total_blocks, keep_prob, num_inter_threads, num_intra_threads, 16 | weight_decay, nesterov_momentum, model_type, dataset, 17 | should_save_logs, should_save_model, 18 | renew_logs=False, 19 | reduction=1.0, 20 | bc_mode=False, 21 | **kwargs): 22 | """ 23 | Class to implement networks from this paper 24 | https://arxiv.org/pdf/1611.05552.pdf 25 | 26 | Args: 27 | data_provider: Class, that have all required data sets 28 | growth_rate: `int`, variable from paper 29 | depth: `int`, variable from paper 30 | total_blocks: `int`, paper value == 3 31 | keep_prob: `float`, keep probability for dropout. If keep_prob = 1 32 | dropout will be disables 33 | weight_decay: `float`, weight decay for L2 loss, paper = 1e-4 34 | nesterov_momentum: `float`, momentum for Nesterov optimizer 35 | model_type: `str`, 'DenseNet' or 'DenseNet-BC'. Should model use 36 | bottle neck connections or not. 37 | dataset: `str`, dataset name 38 | should_save_logs: `bool`, should logs be saved or not 39 | should_save_model: `bool`, should model be saved or not 40 | renew_logs: `bool`, remove previous logs for current model 41 | reduction: `float`, reduction Theta at transition layer for 42 | DenseNets with bottleneck layers. See paragraph 'Compression' 43 | https://arxiv.org/pdf/1608.06993v3.pdf#4 44 | bc_mode: `bool`, should we use bottleneck layers and features 45 | reduction or not. 
46 | """ 47 | self.data_provider = data_provider 48 | self.batch_size = self.data_provider.batch_size 49 | self.data_shape = data_provider.data_shape 50 | self.n_classes = data_provider.n_classes 51 | self.depth = depth 52 | self.growth_rate = growth_rate 53 | self.num_inter_threads = num_inter_threads 54 | self.num_intra_threads = num_intra_threads 55 | # how many features will be received after first convolution 56 | # value the same as in the original Torch code 57 | self.first_output_features = growth_rate * 2 58 | self.total_blocks = total_blocks 59 | self.layers_per_block = (depth - (total_blocks + 1)) // total_blocks 60 | self.bc_mode = bc_mode 61 | # compression rate at the transition layers 62 | self.reduction = reduction 63 | if not bc_mode: 64 | print("Build %s model with %d blocks, " 65 | "%d composite layers each." % ( 66 | model_type, self.total_blocks, self.layers_per_block)) 67 | if bc_mode: 68 | self.layers_per_block = self.layers_per_block // 2 69 | print("Build %s model with %d blocks, " 70 | "%d bottleneck layers and %d composite layers each." % ( 71 | model_type, self.total_blocks, self.layers_per_block, 72 | self.layers_per_block)) 73 | print("Reduction at transition layers: %.1f" % self.reduction) 74 | 75 | self.keep_prob = keep_prob 76 | self.weight_decay = weight_decay 77 | self.nesterov_momentum = nesterov_momentum 78 | self.model_type = model_type 79 | self.dataset_name = dataset 80 | self.should_save_logs = should_save_logs 81 | self.should_save_model = should_save_model 82 | self.renew_logs = renew_logs 83 | self.batches_step = 0 84 | #self.alpha = 0.05 85 | self.beta = 0.1 86 | self.zeta = 0.01 87 | self.use_sdr = kwargs['use_sdr'] 88 | self.no_histograms = kwargs['no_histograms'] 89 | self._define_inputs() 90 | if self.use_sdr: 91 | self._build_graph_sdr() 92 | else: 93 | self._build_graph() 94 | self._initialize_session() 95 | self._count_trainable_params() 96 | 97 | def _initialize_session(self): 98 | """Initialize session, variables, saver""" 99 | config = tf.ConfigProto() 100 | 101 | # Specify the CPU inter and Intra threads used by MKL 102 | config.intra_op_parallelism_threads = self.num_intra_threads 103 | config.inter_op_parallelism_threads = self.num_inter_threads 104 | 105 | # restrict model GPU memory utilization to min required 106 | config.gpu_options.allow_growth = True 107 | self.sess = tf.Session(config=config) 108 | tf_ver = int(tf.__version__.split('.')[1]) 109 | 110 | with tf.variable_scope("summaries"): 111 | 112 | if TF_VERSION <= 0.10: 113 | self.sess.run(tf.initialize_all_variables()) 114 | logswriter = tf.train.SummaryWriter 115 | else: 116 | self.sess.run(tf.global_variables_initializer()) 117 | logswriter = tf.summary.FileWriter 118 | self.saver = tf.train.Saver() 119 | self.summary_writer = logswriter(self.logs_path) 120 | self.summary_writer.add_graph(self.sess.graph) 121 | 122 | def _count_trainable_params(self): 123 | total_parameters = 0 124 | for variable in tf.trainable_variables(): 125 | shape = variable.get_shape() 126 | variable_parametes = 1 127 | for dim in shape: 128 | variable_parametes *= dim.value 129 | total_parameters += variable_parametes 130 | print("Total training params: %.1fM" % (total_parameters / 1e6)) 131 | 132 | @property 133 | def save_path(self): 134 | try: 135 | save_path = self._save_path 136 | except AttributeError: 137 | save_path = 'saves/%s' % self.model_identifier 138 | os.makedirs(save_path, exist_ok=True) 139 | save_path = os.path.join(save_path, 'model.chkpt') 140 | self._save_path = save_path 
141 | return save_path 142 | 143 | @property 144 | def logs_path(self): 145 | try: 146 | logs_path = self._logs_path 147 | except AttributeError: 148 | current_time = time.strftime("%Y_%m_%d_%H%M%S", time.gmtime()) 149 | logs_path = 'logs/%s/%s' % (self.model_identifier, current_time) 150 | if self.renew_logs: 151 | shutil.rmtree(logs_path, ignore_errors=True) 152 | os.makedirs(logs_path, exist_ok=True) 153 | self._logs_path = logs_path 154 | return logs_path 155 | 156 | @property 157 | def model_identifier(self): 158 | if self.use_sdr: 159 | return "{}-SDR_growth_rate={}_depth={}_dataset_{}".format( 160 | self.model_type, self.growth_rate, self.depth, self.dataset_name) 161 | else: 162 | return "{}_growth_rate={}_depth={}_dataset_{}".format( 163 | self.model_type, self.growth_rate, self.depth, self.dataset_name) 164 | 165 | def save_model(self, global_step=None): 166 | self.saver.save(self.sess, self.save_path, global_step=global_step) 167 | 168 | def load_model(self): 169 | try: 170 | self.saver.restore(self.sess, self.save_path) 171 | except Exception as e: 172 | raise IOError("Failed to to load model " 173 | "from save path: %s" % self.save_path) 174 | self.saver.restore(self.sess, self.save_path) 175 | print("Successfully load model from save path: %s" % self.save_path) 176 | 177 | def log_loss_accuracy(self, loss, accuracy, epoch, prefix, 178 | should_print=True): 179 | with tf.variable_scope("summaries"): 180 | 181 | if should_print: 182 | print("mean cross_entropy: %f, mean accuracy: %f" % ( 183 | loss, accuracy)) 184 | summary = tf.Summary(value=[ 185 | tf.Summary.Value( 186 | tag='loss_%s' % prefix, simple_value=float(loss)), 187 | tf.Summary.Value( 188 | tag='accuracy_%s' % prefix, simple_value=float(accuracy)) 189 | ]) 190 | self.summary_writer.add_summary(summary, epoch) 191 | 192 | def add_histograms(self, tensor): 193 | with tf.variable_scope("summaries"): 194 | tf.summary.histogram(tensor.name + "_hist", tensor) 195 | 196 | 197 | def _define_inputs(self): 198 | shape = [None] 199 | shape.extend(self.data_shape) 200 | self.images = tf.placeholder( 201 | tf.float32, 202 | shape=shape, 203 | name='input_images') 204 | self.labels = tf.placeholder( 205 | tf.float32, 206 | shape=[None, self.n_classes], 207 | name='labels') 208 | self.learning_rate = tf.placeholder( 209 | tf.float32, 210 | shape=[], 211 | name='learning_rate') 212 | self.is_training = tf.placeholder(tf.bool, shape=[]) 213 | 214 | #def _define_inputs(self): 215 | # shape = [self.batch_size] 216 | # shape.extend(self.data_shape) 217 | 218 | # self.images = tf.get_variable('input_images', 219 | # shape=shape, initializer=tf.zeros_initializer(dtype=tf.float32)) 220 | 221 | # #self.images = tf.placeholder( 222 | # # tf.float32, 223 | # # shape=shape, 224 | # # name='input_images') 225 | # #self.labels = tf.placeholder( 226 | # # tf.float32, 227 | # # shape=[None, self.n_classes], 228 | # # name='labels') 229 | # #labels_zeros = tf.zeros([None, self.n_classes], dtype=tf.float32) 230 | # self.labels = tf.get_variable('labels', 231 | # shape=[self.batch_size, self.n_classes], 232 | # initializer=tf.zeros_initializer(dtype=tf.float32)) 233 | 234 | # self.learning_rate = tf.constant(0.1, dtype=tf.float32) 235 | # #self.learning_rate = tf.placeholder( 236 | # # tf.float32, 237 | # # shape=[], 238 | # # name='learning_rate') 239 | # #self.is_training = tf.placeholder(tf.bool, shape=[]) 240 | # #tf.assign(self.is_training, True) 241 | 242 | def composite_function(self, _input, out_features, kernel_size=3): 243 | """Function 
from paper H_l that performs: 244 | - batch normalization 245 | - ReLU nonlinearity 246 | - convolution with required kernel 247 | - dropout, if required 248 | """ 249 | with tf.variable_scope("composite_function"): 250 | # BN 251 | output = self.batch_norm(_input) 252 | # ReLU 253 | output = tf.nn.relu(output) 254 | # convolution 255 | output = self.conv2d( 256 | output, out_features=out_features, kernel_size=kernel_size) 257 | # dropout(in case of training and in case it is no 1.0) 258 | output = self.dropout(output) 259 | return output 260 | 261 | def bottleneck(self, _input, out_features): 262 | with tf.variable_scope("bottleneck"): 263 | output = self.batch_norm(_input) 264 | output = tf.nn.relu(output) 265 | inter_features = out_features * 4 266 | output = self.conv2d( 267 | output, out_features=inter_features, kernel_size=1, 268 | padding='VALID') 269 | output = self.dropout(output) 270 | return output 271 | 272 | def add_internal_layer(self, _input, growth_rate): 273 | """Perform H_l composite function for the layer and after concatenate 274 | input with output from composite function. 275 | """ 276 | # call composite function with 3x3 kernel 277 | if not self.bc_mode: 278 | comp_out = self.composite_function( 279 | _input, out_features=growth_rate, kernel_size=3) 280 | elif self.bc_mode: 281 | bottleneck_out = self.bottleneck(_input, out_features=growth_rate) 282 | comp_out = self.composite_function( 283 | bottleneck_out, out_features=growth_rate, kernel_size=3) 284 | # concatenate _input with out from composite function 285 | if TF_VERSION >= 1.0: 286 | output = tf.concat(axis=3, values=(_input, comp_out)) 287 | else: 288 | output = tf.concat(3, (_input, comp_out)) 289 | return output 290 | 291 | def add_block(self, _input, growth_rate, layers_per_block): 292 | """Add N H_l internal layers""" 293 | output = _input 294 | for layer in range(layers_per_block): 295 | with tf.variable_scope("layer_%d" % layer): 296 | output = self.add_internal_layer(output, growth_rate) 297 | return output 298 | 299 | def transition_layer(self, _input): 300 | """Call H_l composite function with 1x1 kernel and after average 301 | pooling 302 | """ 303 | # call composite function with 1x1 kernel 304 | out_features = int(int(_input.get_shape()[-1]) * self.reduction) 305 | output = self.composite_function( 306 | _input, out_features=out_features, kernel_size=1) 307 | # run average pooling 308 | output = self.avg_pool(output, k=2) 309 | return output 310 | 311 | def transition_layer_to_classes(self, _input): 312 | """This is last transition to get probabilities by classes. 
It perform: 313 | - batch normalization 314 | - ReLU nonlinearity 315 | - wide average pooling 316 | - FC layer multiplication 317 | """ 318 | # BN 319 | output = self.batch_norm(_input) 320 | # ReLU 321 | output = tf.nn.relu(output) 322 | # average pooling 323 | last_pool_kernel = int(output.get_shape()[-2]) 324 | output = self.avg_pool(output, k=last_pool_kernel) 325 | # FC 326 | features_total = int(output.get_shape()[-1]) 327 | output = tf.reshape(output, [-1, features_total]) 328 | W = self.weight_variable_xavier( 329 | [features_total, self.n_classes], name='W') 330 | bias = self.bias_variable([self.n_classes]) 331 | logits = tf.matmul(output, W) + bias 332 | return logits 333 | 334 | def conv2d(self, _input, out_features, kernel_size, 335 | strides=[1, 1, 1, 1], padding='SAME'): 336 | in_features = int(_input.get_shape()[-1]) 337 | kernel = self.weight_variable_msra( 338 | [kernel_size, kernel_size, in_features, out_features], 339 | name='kernel') 340 | output = tf.nn.conv2d(_input, kernel, strides, padding) 341 | return output 342 | 343 | def avg_pool(self, _input, k): 344 | ksize = [1, k, k, 1] 345 | strides = [1, k, k, 1] 346 | padding = 'VALID' 347 | output = tf.nn.avg_pool(_input, ksize, strides, padding) 348 | return output 349 | 350 | def batch_norm(self, _input): 351 | output = tf.contrib.layers.batch_norm( 352 | _input, scale=True, is_training=self.is_training, 353 | updates_collections=None) 354 | return output 355 | 356 | def dropout(self, _input): 357 | if self.keep_prob < 1: 358 | output = tf.cond( 359 | self.is_training, 360 | lambda: tf.nn.dropout(_input, self.keep_prob), 361 | lambda: _input 362 | ) 363 | else: 364 | output = _input 365 | return output 366 | 367 | def weight_variable_msra(self, shape, name): 368 | return tf.get_variable( 369 | name=name, 370 | shape=shape, 371 | initializer=tf.contrib.layers.variance_scaling_initializer()) 372 | 373 | def weight_variable_xavier(self, shape, name): 374 | return tf.get_variable( 375 | name, 376 | shape=shape, 377 | initializer=tf.contrib.layers.xavier_initializer()) 378 | 379 | def bias_variable(self, shape, name='bias'): 380 | initial = tf.constant(0.0, shape=shape) 381 | return tf.get_variable(name, initializer=initial) 382 | 383 | def _build_graph(self): 384 | growth_rate = self.growth_rate 385 | layers_per_block = self.layers_per_block 386 | # first - initial 3 x 3 conv to first_output_features 387 | with tf.variable_scope("Initial_convolution"): 388 | output = self.conv2d( 389 | self.images, 390 | out_features=self.first_output_features, 391 | kernel_size=3) 392 | 393 | # add N required blocks 394 | for block in range(self.total_blocks): 395 | with tf.variable_scope("Block_%d" % block): 396 | output = self.add_block(output, growth_rate, layers_per_block) 397 | # last block exist without transition layer 398 | if block != self.total_blocks - 1: 399 | with tf.variable_scope("Transition_after_block_%d" % block): 400 | output = self.transition_layer(output) 401 | 402 | with tf.variable_scope("Transition_to_classes"): 403 | logits = self.transition_layer_to_classes(output) 404 | prediction = tf.nn.softmax(logits) 405 | 406 | 407 | 408 | # Losses 409 | cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits( 410 | logits=logits, labels=self.labels)) 411 | self.cross_entropy = cross_entropy 412 | l2_loss = tf.add_n( 413 | [tf.nn.l2_loss(var) for var in tf.trainable_variables()]) 414 | 415 | # optimizer and train step 416 | optimizer = tf.train.MomentumOptimizer( 417 | self.learning_rate, 
self.nesterov_momentum, use_nesterov=True) 418 | self.optimizer = optimizer 419 | self.train_step = self.optimizer.minimize( 420 | cross_entropy + l2_loss * self.weight_decay) 421 | 422 | correct_prediction = tf.equal( 423 | tf.argmax(prediction, 1), 424 | tf.argmax(self.labels, 1)) 425 | self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 426 | 427 | 428 | def _build_graph_sdr(self): 429 | growth_rate = self.growth_rate 430 | layers_per_block = self.layers_per_block 431 | # first - initial 3 x 3 conv to first_output_features 432 | with tf.variable_scope("Initial_convolution"): 433 | output = self.conv2d( 434 | self.images, 435 | out_features=self.first_output_features, 436 | kernel_size=3) 437 | tf.summary.histogram('pre_gradients_weights_' + output.name, output) 438 | # add N required blocks 439 | for block in range(self.total_blocks): 440 | with tf.variable_scope("Block_%d" % block): 441 | output = self.add_block(output, growth_rate, layers_per_block) 442 | tf.summary.histogram('pre_gradients_weights_' + 443 | output.name + str(block), output) 444 | # last block exist without transition layer 445 | if block != self.total_blocks - 1: 446 | with tf.variable_scope("Transition_after_block_%d" % block): 447 | output = self.transition_layer(output) 448 | tf.summary.histogram('pre_gradients_weights_' + 449 | output.name + str(block), output) 450 | 451 | with tf.variable_scope("Transition_to_classes"): 452 | logits = self.transition_layer_to_classes(output) 453 | tf.summary.histogram('pre_gradients_weights_' + logits.name, logits) 454 | self.add_histograms(logits) 455 | 456 | prediction = tf.nn.softmax(logits) 457 | 458 | with tf.variable_scope("means_sd") as m_sd: 459 | sds_ = [tf.get_variable("sds_" + str(k), 460 | initializer=tf.random_uniform(v.shape, minval=0, maxval=0), trainable=False) 461 | for k, v in enumerate(tf.trainable_variables())] 462 | #trains = [v for v in tf.trainable_variables()] 463 | self.apply_ = [] 464 | for k in range(len(tf.trainable_variables())): 465 | dist = tf.distributions.Normal( 466 | loc=tf.trainable_variables()[k], scale=sds_[k]) 467 | new_trainable = tf.reshape(dist.sample([1]), 468 | tf.trainable_variables()[k].shape) 469 | 470 | self.apply_.append(tf.assign(tf.trainable_variables()[k], 471 | new_trainable)) 472 | 473 | # Losses 474 | cross_entropy = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits( 475 | logits=logits, labels=self.labels)) 476 | self.cross_entropy = cross_entropy 477 | 478 | l2_loss = tf.add_n( 479 | [tf.nn.l2_loss(var) for var in tf.trainable_variables()]) 480 | 481 | 482 | optimizer = tf.train.MomentumOptimizer( 483 | self.learning_rate, self.nesterov_momentum, use_nesterov=True) 484 | self.optimizer = optimizer 485 | 486 | grads_and_vars = self.optimizer.compute_gradients( 487 | cross_entropy + l2_loss * self.weight_decay) 488 | 489 | 490 | vars_with_grad = [v for g, v in grads_and_vars if g is not None] 491 | if not vars_with_grad: 492 | raise ValueError( 493 | "No gradients provided for any variable, check your graph for ops" 494 | " that do not support gradients, between variables %s and loss %s." 
% 495 | ([str(v) for _, v in grads_and_vars], l2_loss)) 496 | 497 | self.sd_asn = [] 498 | with tf.variable_scope(m_sd.original_name_scope): 499 | for k, (g, v) in enumerate(grads_and_vars): 500 | 501 | sd_tmp = tf.multiply(tf.constant( 502 | self.zeta, dtype=tf.float32), tf.add( 503 | tf.abs(tf.multiply(tf.constant(self.beta, 504 | dtype=tf.float32), g)), sds_[k])) 505 | self.sd_asn.append(tf.assign(sds_[k], sd_tmp)) 506 | 507 | for var in tf.trainable_variables(): 508 | tf.summary.histogram('post_sdr_weights_' + var.name, var) 509 | 510 | self.train_step = self.optimizer.apply_gradients( 511 | grads_and_vars) 512 | 513 | self.summaries = tf.summary.merge_all() 514 | 515 | correct_prediction = tf.equal( 516 | tf.argmax(prediction, 1), 517 | tf.argmax(self.labels, 1)) 518 | self.accuracy = tf.reduce_mean(tf.cast(correct_prediction, tf.float32)) 519 | 520 | 521 | def input_pipeline(self, batch_size, data, test=False): 522 | 523 | inputs, labels = data.images, data.labels 524 | # min_after_dequeue defines how big a buffer we will randomly sample 525 | # from -- bigger means better shuffling but slower start up and more 526 | # memory used. 527 | # capacity must be larger than min_after_dequeue and the amount larger 528 | # determines the maximum we will prefetch. Recommendation: 529 | # min_after_dequeue + (num_threads + a small safety margin) * batch_size 530 | min_after_dequeue = 1000 531 | capacity = min_after_dequeue + 3 * batch_size 532 | 533 | if test: 534 | example_batch, label_batch = tf.train.batch( 535 | [inputs, labels], batch_size=batch_size, capacity=capacity, 536 | enqueue_many=True) 537 | else: 538 | example_batch, label_batch = tf.train.shuffle_batch( 539 | [inputs, labels], batch_size=batch_size, num_threads=3, capacity=capacity, 540 | min_after_dequeue=min_after_dequeue, enqueue_many=True) 541 | return example_batch, label_batch 542 | 543 | def train_all_epochs(self, train_params): 544 | n_epochs = train_params['n_epochs'] 545 | learning_rate = train_params['initial_learning_rate'] 546 | batch_size = train_params['batch_size'] 547 | reduce_lr_epoch_1 = train_params['reduce_lr_epoch_1'] 548 | reduce_lr_epoch_2 = train_params['reduce_lr_epoch_2'] 549 | total_start_time = time.time() 550 | self._initialize_session() 551 | self._count_trainable_params() 552 | 553 | for epoch in range(1, n_epochs + 1): 554 | print("\n", '-' * 30, "Train epoch: %d" % epoch, '-' * 30, '\n') 555 | start_time = time.time() 556 | if epoch == reduce_lr_epoch_1 or epoch == reduce_lr_epoch_2: 557 | learning_rate = learning_rate / 10 558 | print("Decrease learning rate, new lr = %f" % learning_rate) 559 | 560 | print("Training...") 561 | if self.use_sdr: 562 | loss, acc = self.train_one_epoch_sdr( 563 | self.data_provider.train, batch_size, learning_rate) 564 | else: 565 | loss, acc = self.train_one_epoch( 566 | self.data_provider.train, batch_size, learning_rate) 567 | 568 | if self.should_save_logs: 569 | self.log_loss_accuracy(loss, acc, epoch, prefix='train') 570 | 571 | if train_params.get('validation_set', False): 572 | print("Validation...") 573 | loss, acc = self.test( 574 | self.data_provider.validation, batch_size) 575 | if self.should_save_logs: 576 | self.log_loss_accuracy(loss, acc, epoch, prefix='valid') 577 | 578 | time_per_epoch = time.time() - start_time 579 | seconds_left = int((n_epochs - epoch) * time_per_epoch) 580 | print("Time per epoch: %s, Est. 
complete in: %s" % ( 581 | str(timedelta(seconds=time_per_epoch)), 582 | str(timedelta(seconds=seconds_left)))) 583 | 584 | if self.should_save_model: 585 | self.save_model() 586 | 587 | total_training_time = time.time() - total_start_time 588 | print("\nTotal training time: %s" % str(timedelta( 589 | seconds=total_training_time))) 590 | 591 | def train_one_epoch(self, data, batch_size, learning_rate): 592 | num_examples = data.num_examples 593 | total_loss = [] 594 | total_accuracy = [] 595 | 596 | for i in range(num_examples // batch_size): 597 | batch = data.next_batch(batch_size) 598 | images, labels = batch 599 | feed_dict = { 600 | self.images: images, 601 | self.labels: labels, 602 | self.learning_rate: learning_rate, 603 | self.is_training: True, 604 | } 605 | result = self.sess.run([self.train_step, self.cross_entropy, self.accuracy], feed_dict=feed_dict) 606 | _, loss, accuracy = result 607 | #print("Iteration %d: loss=%f, accuracy=%f" % (i, loss, accuracy)) 608 | total_loss.append(loss) 609 | total_accuracy.append(accuracy) 610 | if self.should_save_logs: 611 | self.batches_step += 1 612 | self.log_loss_accuracy( 613 | loss, accuracy, self.batches_step, prefix='per_batch', 614 | should_print=False) 615 | mean_loss = np.mean(total_loss) 616 | mean_accuracy = np.mean(total_accuracy) 617 | return mean_loss, mean_accuracy 618 | 619 | 620 | def train_one_epoch_sdr(self, data, batch_size, learning_rate): 621 | #coord = tf.train.Coordinator() 622 | #threads = tf.train.start_queue_runners(sess=self.sess, coord=coord) 623 | num_examples = data.num_examples 624 | total_loss = [] 625 | total_accuracy = [] 626 | #self.is_training = tf.constant(True, dtype=tf.bool) 627 | 628 | for i in range(num_examples // batch_size): 629 | batch = data.next_batch(batch_size) 630 | images, labels = batch 631 | feed_dict = { 632 | self.images: images, 633 | self.labels: labels, 634 | self.learning_rate: learning_rate, 635 | self.is_training: True, 636 | } 637 | sess_list1 = self.apply_ + self.sd_asn 638 | sess_list2 = [self.train_step, self.cross_entropy, self.accuracy] 639 | #sess_list2 = [self.cross_entropy, self.accuracy] 640 | result1 = self.sess.run(sess_list1, feed_dict = feed_dict) 641 | result2 = self.sess.run(sess_list2, feed_dict = feed_dict) 642 | #record histograms, etc. 
every epoch 643 | if (not self.no_histograms) and i % (num_examples // batch_size) - 1 == 0: 644 | summ_ = self.sess.run(self.summaries, feed_dict=feed_dict) 645 | self.summary_writer.add_summary(summ_) 646 | _, loss, accuracy = result2 647 | #print("Iteration %d: loss=%f, accuracy=%f" % (i, loss, accuracy)) 648 | total_loss.append(loss) 649 | total_accuracy.append(accuracy) 650 | if self.should_save_logs: 651 | self.batches_step += 1 652 | self.log_loss_accuracy( 653 | loss, accuracy, self.batches_step, prefix='per_batch', 654 | should_print=False) 655 | 656 | mean_loss = np.mean(total_loss) 657 | mean_accuracy = np.mean(total_accuracy) 658 | return mean_loss, mean_accuracy 659 | 660 | 661 | def test(self, data, batch_size): 662 | num_examples = data.num_examples 663 | total_loss = [] 664 | total_accuracy = [] 665 | for i in range(num_examples // batch_size): 666 | batch = data.next_batch(batch_size) 667 | feed_dict = { 668 | self.images: batch[0], 669 | self.labels: batch[1], 670 | self.is_training: False, 671 | } 672 | fetches = [self.cross_entropy, self.accuracy] 673 | loss, accuracy = self.sess.run(fetches, feed_dict=feed_dict) 674 | total_loss.append(loss) 675 | total_accuracy.append(accuracy) 676 | mean_loss = np.mean(total_loss) 677 | mean_accuracy = np.mean(total_accuracy) 678 | return mean_loss, mean_accuracy 679 | # def test(self, data, batch_size): 680 | # images, labels = self.input_pipeline(batch_size, 681 | # data, test=True) 682 | # self.is_training = tf.constant(False, dtype=tf.bool) 683 | # train_images = self.images 684 | # train_labels = self.labels 685 | # self.images = tf.cast(images, tf.float32) 686 | # self.labels = tf.cast(labels, tf.float32) 687 | # 688 | # coord = tf.train.Coordinator() 689 | # threads = tf.train.start_queue_runners(sess=self.sess, coord=coord) 690 | # 691 | # num_examples = data.num_examples 692 | # total_loss = [] 693 | # total_accuracy = [] 694 | # for i in range(num_examples // batch_size): 695 | # loss, accuracy = self.sess.run([self.cross_entropy, self.accuracy]) 696 | # total_loss.append(loss) 697 | # total_accuracy.append(accuracy) 698 | # mean_loss = np.mean(total_loss) 699 | # mean_accuracy = np.mean(total_accuracy) 700 | # self.images = train_images 701 | # self.labels = train_labels 702 | # return mean_loss, mean_accuracy 703 | -------------------------------------------------------------------------------- /requirements/base.txt: -------------------------------------------------------------------------------- 1 | ipdb 2 | ipython 3 | matplotlib 4 | numpy 5 | Pillow 6 | scipy 7 | -------------------------------------------------------------------------------- /requirements/cpu.txt: -------------------------------------------------------------------------------- 1 | -r base.txt 2 | tensorflow>=0.10.0 3 | -------------------------------------------------------------------------------- /requirements/gpu.txt: -------------------------------------------------------------------------------- 1 | -r base.txt 2 | tensorflow-gpu>=0.10.0 3 | -------------------------------------------------------------------------------- /run_dense_net.py: -------------------------------------------------------------------------------- 1 | import argparse 2 | 3 | from models.dense_net import DenseNet 4 | from data_providers.utils import get_data_provider_by_name 5 | 6 | train_params_cifar = { 7 | 'batch_size': 100, 8 | 'n_epochs': 100, 9 | 'initial_learning_rate': 0.1, 10 | 'reduce_lr_epoch_1': 150, # epochs * 0.5 11 | 'reduce_lr_epoch_2': 225, # epochs * 0.75 
12 |     'validation_set': True,
13 |     'validation_split': None,  # None or float
14 |     'shuffle': 'every_epoch',  # None, once_prior_train, every_epoch
15 |     'normalization': 'by_chanels',  # None, divide_256, divide_255, by_chanels
16 | }
17 | 
18 | train_params_svhn = {
19 |     'batch_size': 100,
20 |     'n_epochs': 20,
21 |     'initial_learning_rate': 0.1,
22 |     'reduce_lr_epoch_1': 20,
23 |     'reduce_lr_epoch_2': 30,
24 |     'validation_set': True,
25 |     'validation_split': None,  # you may set it to 6000 as in the paper
26 |     'shuffle': True,  # shuffle dataset every epoch or not
27 |     'normalization': 'divide_255',
28 | }
29 | 
30 | 
31 | def get_train_params_by_name(name):
32 |     if name in ['C10', 'C10+', 'C100', 'C100+']:
33 |         return train_params_cifar
34 |     if name == 'SVHN':
35 |         return train_params_svhn
36 | 
37 | 
38 | if __name__ == '__main__':
39 |     parser = argparse.ArgumentParser()
40 |     parser.add_argument(
41 |         '--train', action='store_true',
42 |         help='Train the model')
43 |     parser.add_argument(
44 |         '--test', action='store_true',
45 |         help='Test the model on the chosen dataset if a pretrained model exists. '
46 |              'If provided together with the `--train` flag, testing will be '
47 |              'performed right after training.')
48 |     parser.add_argument(
49 |         '--model_type', '-m', type=str, choices=['DenseNet', 'DenseNet-BC'],
50 |         default='DenseNet',
51 |         help='What type of model to use')
52 |     parser.add_argument(
53 |         '--growth_rate', '-k', type=int, choices=[2, 12, 24, 40],
54 |         default=12,
55 |         help='Growth rate for every layer, '
56 |              'choices restricted to those used in the paper')
57 |     parser.add_argument(
58 |         '--depth', '-d', type=int, choices=[40, 100, 129, 154, 190, 250],
59 |         default=40,
60 |         help='Depth of the whole network, restricted to the choices used in the paper')
61 |     parser.add_argument(
62 |         '--dataset', '-ds', type=str,
63 |         choices=['C10', 'C10+', 'C100', 'C100+', 'SVHN'],
64 |         default='C10',
65 |         help='What dataset should be used')
66 |     parser.add_argument(
67 |         '--total_blocks', '-tb', type=int, default=3, metavar='',
68 |         help='Total number of blocks of layers (default: %(default)s)')
69 |     parser.add_argument(
70 |         '--keep_prob', '-kp', type=float, metavar='',
71 |         help="Keep probability for dropout.")
72 |     parser.add_argument(
73 |         '--weight_decay', '-wd', type=float, default=1e-4, metavar='',
74 |         help='Weight decay for optimizer (default: %(default)s)')
75 |     parser.add_argument(
76 |         '--nesterov_momentum', '-nm', type=float, default=0.9, metavar='',
77 |         help='Nesterov momentum (default: %(default)s)')
78 |     parser.add_argument(
79 |         '--reduction', '-red', type=float, default=0.5, metavar='',
80 |         help='Reduction (theta) at transition layers for DenseNet-BC models')
81 | 
82 |     parser.add_argument(
83 |         '--logs', dest='should_save_logs', action='store_true',
84 |         help='Write TensorFlow logs')
85 |     parser.add_argument(
86 |         '--no-logs', dest='should_save_logs', action='store_false',
87 |         help='Do not write TensorFlow logs')
88 |     parser.set_defaults(should_save_logs=True)
89 | 
90 |     parser.add_argument(
91 |         '--saves', dest='should_save_model', action='store_true',
92 |         help='Save model during training')
93 |     parser.add_argument(
94 |         '--no-saves', dest='should_save_model', action='store_false',
95 |         help='Do not save model during training')
96 |     parser.set_defaults(should_save_model=True)
97 | 
98 |     parser.add_argument(
99 |         '--renew-logs', dest='renew_logs', action='store_true',
100 |         help='Erase previous logs for the model if they exist.')
101 |     parser.add_argument(
102 |         '--not-renew-logs', dest='renew_logs', action='store_false',
103 |         help='Do not erase previous logs for the model if they exist.')
104 |     parser.set_defaults(renew_logs=True)
105 | 
106 |     parser.add_argument(
107 |         '--num_inter_threads', '-inter', type=int, default=1, metavar='',
108 |         help='Number of inter-op parallelism threads for inference / test')
109 |     parser.add_argument(
110 |         '--num_intra_threads', '-intra', type=int, default=128, metavar='',
111 |         help='Number of intra-op parallelism threads for inference / test')
112 | 
113 |     parser.add_argument(
114 |         '--sdr', dest='use_sdr', action='store_true',
115 |         help='Use Stochastic Delta Rule instead of dropout (overrides --keep_prob).')
116 |     parser.set_defaults(use_sdr=False)
117 |     parser.add_argument(
118 |         '--no_histograms', dest='no_histograms', action='store_true',
119 |         help='Omit histograms from logs to save disk space and speed up training')
120 |     parser.set_defaults(no_histograms=False)
121 | 
122 |     args = parser.parse_args()
123 | 
124 |     if not args.keep_prob:
125 |         if args.dataset in ['C10', 'C100', 'SVHN']:
126 |             args.keep_prob = 0.8
127 |         else:
128 |             args.keep_prob = 1.0
129 | 
130 |     if args.use_sdr:
131 |         args.keep_prob = 1.0
132 |     if args.model_type == 'DenseNet':
133 |         args.bc_mode = False
134 |         args.reduction = 1.0
135 |     elif args.model_type == 'DenseNet-BC':
136 |         args.bc_mode = True
137 | 
138 |     model_params = vars(args)
139 | 
140 |     if not args.train and not args.test:
141 |         print("You should train or test your network. Please check params.")
142 |         exit()
143 | 
144 |     # default params related to the chosen dataset / architecture
145 |     train_params = get_train_params_by_name(args.dataset)
146 |     print("Params:")
147 |     for k, v in model_params.items():
148 |         print("\t%s: %s" % (k, v))
149 |     print("Train params:")
150 |     for k, v in train_params.items():
151 |         print("\t%s: %s" % (k, v))
152 | 
153 |     print("Prepare training data...")
154 |     data_provider = get_data_provider_by_name(args.dataset, train_params)
155 |     print("Initialize the model...")
156 |     model = DenseNet(data_provider=data_provider, **model_params)
157 |     if args.train:
158 |         print("Data provider train images: ", data_provider.train.num_examples)
159 |         model.train_all_epochs(train_params)
160 |     if args.test:
161 |         if not args.train:
162 |             model.load_model()
163 |         print("Data provider test images: ", data_provider.test.num_examples)
164 |         print("Testing...")
165 |         loss, accuracy = model.test(data_provider.test, batch_size=200)
166 |         print("mean cross_entropy: %f, mean accuracy: %f" % (loss, accuracy))
167 | 
--------------------------------------------------------------------------------
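For readers skimming `models/dense_net.py`, the SDR machinery in `_build_graph_sdr` and `train_one_epoch_sdr` boils down to two per-batch steps: every trainable tensor is resampled from a normal distribution centred on its current value (the `apply_` assign ops), and each per-weight standard deviation is then refreshed as `sigma <- zeta * (beta * |grad| + sigma)` (the `sd_asn` assign ops), so the injected noise shrinks as training converges. The snippet below is a minimal NumPy sketch of that update on a toy one-parameter problem. It is not part of the repository; the names `sdr_step` and `grad_fn` are illustrative, and the exact op ordering differs slightly from the two `sess.run` calls used in the real training loop.

```
import numpy as np

rng = np.random.default_rng(0)


def sdr_step(w, sigma, grad_fn, lr=0.1, beta=0.1, zeta=0.01):
    """One illustrative SDR step for a single weight tensor.

    Mirrors the two groups of assign ops in dense_net.py:
      * resample the weights from N(w, sigma)            (cf. the `apply_` ops)
      * update sigma <- zeta * (beta * |grad| + sigma)   (cf. the `sd_asn` ops)
    followed by an ordinary gradient step.
    """
    w_sampled = rng.normal(loc=w, scale=sigma)   # scale may be 0 on the first step
    g = grad_fn(w_sampled)                       # gradient at the sampled weights
    sigma_new = zeta * (beta * np.abs(g) + sigma)
    w_new = w_sampled - lr * g
    return w_new, sigma_new


# Toy usage: minimise (w - 3)^2 with SDR noise on the single weight.
w = np.zeros(1)
sigma = np.zeros(1)   # the repository also initialises the std-devs to zero
for _ in range(200):
    w, sigma = sdr_step(w, sigma, grad_fn=lambda v: 2.0 * (v - 3.0))
print(w, sigma)       # w approaches 3 while sigma decays towards zero
```

In the full model the same pair of assignments is built for every tensor returned by `tf.trainable_variables()`, which is why `train_one_epoch_sdr` runs `self.apply_ + self.sd_asn` before the optimizer's `train_step`.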