├── .gitignore ├── LICENSE ├── README.md ├── __init__.py ├── fcn16_downsampled.png ├── fcn16_upsampled.png ├── fcn16_vgg.py ├── fcn32_downsampled.png ├── fcn32_upsampled.png ├── fcn32_vgg.py ├── fcn8_downsampled.png ├── fcn8_upsampled.png ├── fcn8_vgg.py ├── loss.py ├── requirements.txt ├── test_data └── tabby_cat.png ├── test_fcn16_vgg.py ├── test_fcn32_vgg.py ├── test_fcn8_vgg.py └── utils.py /.gitignore: -------------------------------------------------------------------------------- 1 | # Byte-compiled / optimized / DLL files 2 | __pycache__/ 3 | *.py[cod] 4 | *$py.class 5 | *.npy 6 | 7 | # C extensions 8 | *.so 9 | 10 | # Distribution / packaging 11 | .Python 12 | env/ 13 | build/ 14 | develop-eggs/ 15 | dist/ 16 | downloads/ 17 | eggs/ 18 | .eggs/ 19 | lib/ 20 | lib64/ 21 | parts/ 22 | sdist/ 23 | var/ 24 | *.egg-info/ 25 | .installed.cfg 26 | *.egg 27 | 28 | # PyInstaller 29 | # Usually these files are written by a python script from a template 30 | # before PyInstaller builds the exe, so as to inject date/other infos into it. 31 | *.manifest 32 | *.spec 33 | 34 | # Installer logs 35 | pip-log.txt 36 | pip-delete-this-directory.txt 37 | 38 | # Unit test / coverage reports 39 | htmlcov/ 40 | .tox/ 41 | .coverage 42 | .coverage.* 43 | .cache 44 | nosetests.xml 45 | coverage.xml 46 | *,cover 47 | .hypothesis/ 48 | 49 | # Translations 50 | *.mo 51 | *.pot 52 | 53 | # Django stuff: 54 | *.log 55 | 56 | # Sphinx documentation 57 | docs/_build/ 58 | 59 | # PyBuilder 60 | target/ 61 | 62 | #Ipython Notebook 63 | .ipynb_checkpoints 64 | 65 | # IDE 66 | .idea 67 | -------------------------------------------------------------------------------- /LICENSE: -------------------------------------------------------------------------------- 1 | The MIT License (MIT) 2 | 3 | Copyright (c) 2016 Marvin Teichmann 4 | 5 | Permission is hereby granted, free of charge, to any person obtaining a copy 6 | of this software and associated documentation files (the "Software"), to deal 7 | in the Software without restriction, including without limitation the rights 8 | to use, copy, modify, merge, publish, distribute, sublicense, and/or sell 9 | copies of the Software, and to permit persons to whom the Software is 10 | furnished to do so, subject to the following conditions: 11 | 12 | The above copyright notice and this permission notice shall be included in all 13 | copies or substantial portions of the Software. 14 | 15 | THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR 16 | IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, 17 | FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE 18 | AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER 19 | LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, 20 | OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE 21 | SOFTWARE. 22 | -------------------------------------------------------------------------------- /README.md: -------------------------------------------------------------------------------- 1 | ### Update 2 | 3 | An example on how to integrate this code into your own semantic segmentation pipeline can be found in my [KittiSeg](https://github.com/MarvinTeichmann/KittiSeg) project repository. 4 | 5 | # tensorflow-fcn 6 | This is a one file Tensorflow implementation of [Fully Convolutional Networks](http://arxiv.org/abs/1411.4038) in Tensorflow. The code can easily be integrated in your semantic segmentation pipeline. 
The network can be applied directly or finetuned to perform semantic segmentation using tensorflow training code.
7 | 
8 | Deconvolution layers are initialized as bilinear upsampling. Conv and FCN layer weights are initialized from the VGG weights, which are read with a plain numpy load. No Caffe or Caffe-Tensorflow is required to run this. **The .npy file for [VGG16] has to be downloaded before using this network.** You can find the file here: ftp://mi.eng.cam.ac.uk/pub/mttt2/models/vgg16.npy
9 | 
10 | No Pascal VOC finetuning was applied to the weights. The model is meant to be finetuned on your own data. The model can be applied to an image directly (see `test_fcn32_vgg.py`) but the result will be rather coarse.
11 | 
12 | ## Requirements
13 | 
14 | In addition to tensorflow the following packages are required:
15 | 
16 | numpy
17 | scipy
18 | pillow
19 | matplotlib
20 | 
21 | Those packages can be installed by running `pip install -r requirements.txt` or `pip install numpy scipy pillow matplotlib`.
22 | 
23 | ### Tensorflow 1.0rc
24 | 
25 | This code requires `Tensorflow Version >= 1.0rc` to run. If you want to use an older version, you can try commit `bf9400c6303826e1c25bf09a3b032e51cef57e3b`. That commit has been tested with the pip versions `0.12`, `0.11` and `0.10`.
26 | 
27 | Tensorflow 1.0 comes with a large number of breaking api changes. If you are currently running an older tensorflow version, I would suggest creating a new `virtualenv` and installing 1.0rc using:
28 | 
29 | ```bash
30 | export TF_BINARY_URL=https://storage.googleapis.com/tensorflow/linux/gpu/tensorflow_gpu-1.0.0rc0-cp27-none-linux_x86_64.whl
31 | pip install --upgrade $TF_BINARY_URL
32 | ```
33 | 
34 | The above commands install the linux version with gpu support. For other versions follow the instructions [here](https://www.tensorflow.org/versions/r1.0/get_started/os_setup).
35 | 
36 | ## Usage
37 | 
38 | Run `python test_fcn32_vgg.py` to test the implementation.
39 | 
40 | Use this to build the FCN object for finetuning:
41 | 
42 | ```
43 | vgg_fcn = fcn32_vgg.FCN32VGG()
44 | vgg_fcn.build(images, train=True, num_classes=num_classes, random_init_fc8=True)
45 | ```
46 | `images` is a tensor with shape `[None, h, w, 3]`, where `h` and `w` can have arbitrary size.
47 | > Trick: the tensor can be a placeholder, a variable or even a constant.
48 | 
49 | Be aware that `num_classes` influences the way `score_fr` (the original `fc8` layer) is initialized. For finetuning I recommend using the option `random_init_fc8=True`.
50 | 
51 | ### Training
52 | 
53 | Example code for training can be found in the [KittiSeg](https://github.com/MarvinTeichmann/KittiSeg) project repository.
54 | 
55 | ### Finetuning and training
56 | 
57 | For training, build the graph using `vgg_fcn.build(images, train=True, num_classes=num_classes)`, where `images` is a queue yielding image batches. Use a softmax cross entropy loss function on top of the output of `vgg_fcn.upscore`. An implementation of the loss function can be found in `loss.py`.
58 | 
59 | To train the graph you need an input producer and a training script. Have a look at [TensorVision](https://github.com/TensorVision/TensorVision/blob/9db59e2f23755a17ddbae558f21ae371a07f1a83/tensorvision/train.py) to see how to build those.
60 | 
61 | I had success finetuning the network using the Adam optimizer with a learning rate of `1e-6`.
62 | 
63 | ## Content
64 | 
65 | Currently the following models are provided:
66 | 
67 | - FCN32
68 | - FCN16
69 | - FCN8
70 | 
71 | ## Remark
72 | 
73 | The deconv layer of tensorflow allows you to provide an output shape, as the minimal sketch below shows.
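For illustration, here is a minimal sketch of that mechanism (tensor names and sizes are made up for this example; in this repo the actual upsampling lives in `_upscore_layer`):

```python
import tensorflow as tf

num_classes = 20
# Illustrative placeholders, not variables from this repo.
score = tf.placeholder(tf.float32, [None, None, None, num_classes])
bgr = tf.placeholder(tf.float32, [None, None, None, 3])

# Request an output with the spatial size of the input image directly,
# instead of upsampling to a larger size and cropping afterwards.
in_shape = tf.shape(bgr)
output_shape = tf.stack([in_shape[0], in_shape[1], in_shape[2], num_classes])
up_filter = tf.ones([64, 64, num_classes, num_classes])  # the real code uses a bilinear init
upscore = tf.nn.conv2d_transpose(score, up_filter, output_shape,
                                 strides=[1, 32, 32, 1], padding='SAME')
```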
The crop layer of the original implementation is therefore not needed.
74 | 
75 | I have slightly altered the naming of the upscore layer.
76 | 
77 | #### Field of View
78 | 
79 | The receptive field (also known as `field of view`) of the provided model is obtained by propagating the 7x7 kernel of `fc6` back through the five pooling stages (each doubling the field) and the 3x3 convolution stacks in between (each convolution adding 2 at its scale):
80 | 
81 | `( ( ( ( ( 7 ) * 2 + 6 ) * 2 + 6 ) * 2 + 6 ) * 2 + 4 ) * 2 + 4 = 404`
82 | 
83 | ## Predecessors
84 | 
85 | Weights were generated using [Caffe to Tensorflow](https://github.com/ethereon/caffe-tensorflow). The VGG implementation is based on [tensorflow-vgg16](https://github.com/ry/tensorflow-vgg16) and numpy loading is based on [tensorflow-vgg](https://github.com/machrisaa/tensorflow-vgg). You do not need any of the above cited code to run the model, nor do you need Caffe.
86 | 
87 | ## Install
88 | 
89 | Installing matplotlib from pip requires the packages `libpng-dev`, `libjpeg8-dev`, `libfreetype6-dev` and `pkg-config` to be installed. On Debian, Linux Mint and Ubuntu systems type:
90 | 
91 | `sudo apt-get install libpng-dev libjpeg8-dev libfreetype6-dev pkg-config`
92 | `pip install -r requirements.txt` 93 | 94 | ## TODO 95 | 96 | - Provide finetuned FCN weights. 97 | - Provide general training code 98 | -------------------------------------------------------------------------------- /__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/__init__.py -------------------------------------------------------------------------------- /fcn16_downsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn16_downsampled.png -------------------------------------------------------------------------------- /fcn16_upsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn16_upsampled.png -------------------------------------------------------------------------------- /fcn16_vgg.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import logging 7 | from math import ceil 8 | import sys 9 | 10 | import numpy as np 11 | import tensorflow as tf 12 | 13 | VGG_MEAN = [103.939, 116.779, 123.68] 14 | 15 | 16 | class FCN16VGG: 17 | 18 | def __init__(self, vgg16_npy_path=None): 19 | if vgg16_npy_path is None: 20 | path = sys.modules[self.__class__.__module__].__file__ 21 | # print path 22 | path = os.path.abspath(os.path.join(path, os.pardir)) 23 | # print path 24 | path = os.path.join(path, "vgg16.npy") 25 | vgg16_npy_path = path 26 | logging.info("Load npy file from '%s'.", vgg16_npy_path) 27 | if not os.path.isfile(vgg16_npy_path): 28 | logging.error(("File '%s' not found. Download it from " 29 | "ftp://mi.eng.cam.ac.uk/pub/mttt2/" 30 | "models/vgg16.npy"), vgg16_npy_path) 31 | sys.exit(1) 32 | 33 | self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item() 34 | self.wd = 5e-4 35 | print("npy file loaded") 36 | 37 | def build(self, rgb, train=False, num_classes=20, random_init_fc8=False, 38 | debug=False): 39 | """ 40 | Build the VGG model using loaded weights 41 | Parameters 42 | ---------- 43 | rgb: image batch tensor 44 | Image in rgb shap. Scaled to Intervall [0, 255] 45 | train: bool 46 | Whether to build train or inference graph 47 | num_classes: int 48 | How many classes should be predicted (by fc8) 49 | random_init_fc8 : bool 50 | Whether to initialize fc8 layer randomly. 51 | Finetuning is required in this case. 52 | debug: bool 53 | Whether to print additional Debug Information. 
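        A minimal call sketch (the tensor `images` and the class count
        are illustrative):

            vgg_fcn = FCN16VGG()
            vgg_fcn.build(images, train=True, num_classes=2,
                          random_init_fc8=True)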
54 | """ 55 | # Convert RGB to BGR 56 | 57 | with tf.name_scope('Processing'): 58 | # rgb = tf.image.convert_image_dtype(rgb, tf.float32) 59 | red, green, blue = tf.split(rgb, 3, 3) 60 | # assert red.get_shape().as_list()[1:] == [224, 224, 1] 61 | # assert green.get_shape().as_list()[1:] == [224, 224, 1] 62 | # assert blue.get_shape().as_list()[1:] == [224, 224, 1] 63 | bgr = tf.concat([ 64 | blue - VGG_MEAN[0], 65 | green - VGG_MEAN[1], 66 | red - VGG_MEAN[2]], axis=3) 67 | 68 | if debug: 69 | bgr = tf.Print(bgr, [tf.shape(bgr)], 70 | message='Shape of input image: ', 71 | summarize=4, first_n=1) 72 | 73 | self.conv1_1 = self._conv_layer(bgr, "conv1_1") 74 | self.conv1_2 = self._conv_layer(self.conv1_1, "conv1_2") 75 | self.pool1 = self._max_pool(self.conv1_2, 'pool1', debug) 76 | 77 | self.conv2_1 = self._conv_layer(self.pool1, "conv2_1") 78 | self.conv2_2 = self._conv_layer(self.conv2_1, "conv2_2") 79 | self.pool2 = self._max_pool(self.conv2_2, 'pool2', debug) 80 | 81 | self.conv3_1 = self._conv_layer(self.pool2, "conv3_1") 82 | self.conv3_2 = self._conv_layer(self.conv3_1, "conv3_2") 83 | self.conv3_3 = self._conv_layer(self.conv3_2, "conv3_3") 84 | self.pool3 = self._max_pool(self.conv3_3, 'pool3', debug) 85 | 86 | self.conv4_1 = self._conv_layer(self.pool3, "conv4_1") 87 | self.conv4_2 = self._conv_layer(self.conv4_1, "conv4_2") 88 | self.conv4_3 = self._conv_layer(self.conv4_2, "conv4_3") 89 | self.pool4 = self._max_pool(self.conv4_3, 'pool4', debug) 90 | 91 | self.conv5_1 = self._conv_layer(self.pool4, "conv5_1") 92 | self.conv5_2 = self._conv_layer(self.conv5_1, "conv5_2") 93 | 94 | self.conv5_3 = self._conv_layer(self.conv5_2, "conv5_3") 95 | self.pool5 = self._max_pool(self.conv5_3, 'pool5', debug) 96 | 97 | self.fc6 = self._fc_layer(self.pool5, "fc6") 98 | 99 | if train: 100 | self.fc6 = tf.nn.dropout(self.fc6, 0.5) 101 | 102 | self.fc7 = self._fc_layer(self.fc6, "fc7") 103 | if train: 104 | self.fc7 = tf.nn.dropout(self.fc7, 0.5) 105 | 106 | if random_init_fc8: 107 | self.score_fr = self._score_layer(self.fc7, "score_fr", 108 | num_classes) 109 | else: 110 | self.score_fr = self._fc_layer(self.fc7, "score_fr", 111 | num_classes=num_classes, 112 | relu=False) 113 | 114 | self.pred = tf.argmax(self.score_fr, dimension=3) 115 | 116 | self.upscore2 = self._upscore_layer(self.score_fr, 117 | shape=tf.shape(self.pool4), 118 | num_classes=num_classes, 119 | debug=debug, name='upscore2', 120 | ksize=4, stride=2) 121 | 122 | self.score_pool4 = self._score_layer(self.pool4, "score_pool4", 123 | num_classes=num_classes) 124 | 125 | self.fuse_pool4 = tf.add(self.upscore2, self.score_pool4) 126 | 127 | self.upscore32 = self._upscore_layer(self.fuse_pool4, 128 | shape=tf.shape(bgr), 129 | num_classes=num_classes, 130 | debug=debug, name='upscore32', 131 | ksize=32, stride=16) 132 | 133 | self.pred_up = tf.argmax(self.upscore32, dimension=3) 134 | 135 | def _max_pool(self, bottom, name, debug): 136 | pool = tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], 137 | padding='SAME', name=name) 138 | 139 | if debug: 140 | pool = tf.Print(pool, [tf.shape(pool)], 141 | message='Shape of %s' % name, 142 | summarize=4, first_n=1) 143 | return pool 144 | 145 | def _conv_layer(self, bottom, name): 146 | with tf.variable_scope(name) as scope: 147 | filt = self.get_conv_filter(name) 148 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 149 | 150 | conv_biases = self.get_bias(name) 151 | bias = tf.nn.bias_add(conv, conv_biases) 152 | 153 | relu = tf.nn.relu(bias) 154 | # Add 
summary to Tensorboard 155 | _activation_summary(relu) 156 | return relu 157 | 158 | def _fc_layer(self, bottom, name, num_classes=None, 159 | relu=True, debug=False): 160 | with tf.variable_scope(name) as scope: 161 | shape = bottom.get_shape().as_list() 162 | 163 | if name == 'fc6': 164 | filt = self.get_fc_weight_reshape(name, [7, 7, 512, 4096]) 165 | elif name == 'score_fr': 166 | name = 'fc8' # Name of score_fr layer in VGG Model 167 | filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 1000], 168 | num_classes=num_classes) 169 | else: 170 | filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 4096]) 171 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 172 | conv_biases = self.get_bias(name, num_classes=num_classes) 173 | bias = tf.nn.bias_add(conv, conv_biases) 174 | 175 | if relu: 176 | bias = tf.nn.relu(bias) 177 | _activation_summary(bias) 178 | 179 | if debug: 180 | bias = tf.Print(bias, [tf.shape(bias)], 181 | message='Shape of %s' % name, 182 | summarize=4, first_n=1) 183 | return bias 184 | 185 | def _score_layer(self, bottom, name, num_classes): 186 | with tf.variable_scope(name) as scope: 187 | # get number of input channels 188 | in_features = bottom.get_shape()[3].value 189 | shape = [1, 1, in_features, num_classes] 190 | # He initialization Sheme 191 | if name == "score_fr": 192 | num_input = in_features 193 | stddev = (2 / num_input)**0.5 194 | elif name == "score_pool4": 195 | stddev = 0.001 196 | # Apply convolution 197 | w_decay = self.wd 198 | weights = self._variable_with_weight_decay(shape, stddev, w_decay) 199 | conv = tf.nn.conv2d(bottom, weights, [1, 1, 1, 1], padding='SAME') 200 | # Apply bias 201 | conv_biases = self._bias_variable([num_classes], constant=0.0) 202 | bias = tf.nn.bias_add(conv, conv_biases) 203 | 204 | _activation_summary(bias) 205 | 206 | return bias 207 | 208 | def _upscore_layer(self, bottom, shape, 209 | num_classes, name, debug, 210 | ksize=4, stride=2): 211 | strides = [1, stride, stride, 1] 212 | with tf.variable_scope(name): 213 | in_features = bottom.get_shape()[3].value 214 | 215 | if shape is None: 216 | # Compute shape out of Bottom 217 | in_shape = tf.shape(bottom) 218 | 219 | h = ((in_shape[1] - 1) * stride) + 1 220 | w = ((in_shape[2] - 1) * stride) + 1 221 | new_shape = [in_shape[0], h, w, num_classes] 222 | else: 223 | new_shape = [shape[0], shape[1], shape[2], num_classes] 224 | output_shape = tf.stack(new_shape) 225 | 226 | logging.debug("Layer: %s, Fan-in: %d" % (name, in_features)) 227 | f_shape = [ksize, ksize, num_classes, in_features] 228 | 229 | # create 230 | num_input = ksize * ksize * in_features / stride 231 | stddev = (2 / num_input)**0.5 232 | 233 | weights = self.get_deconv_filter(f_shape) 234 | deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape, 235 | strides=strides, padding='SAME') 236 | 237 | if debug: 238 | deconv = tf.Print(deconv, [tf.shape(deconv)], 239 | message='Shape of %s' % name, 240 | summarize=4, first_n=1) 241 | 242 | _activation_summary(deconv) 243 | return deconv 244 | 245 | def get_deconv_filter(self, f_shape): 246 | width = f_shape[0] 247 | height = f_shape[1] 248 | f = ceil(width/2.0) 249 | c = (2 * f - 1 - f % 2) / (2.0 * f) 250 | bilinear = np.zeros([f_shape[0], f_shape[1]]) 251 | for x in range(width): 252 | for y in range(height): 253 | value = (1 - abs(x / f - c)) * (1 - abs(y / f - c)) 254 | bilinear[x, y] = value 255 | weights = np.zeros(f_shape) 256 | for i in range(f_shape[2]): 257 | weights[:, :, i, i] = bilinear 258 | 259 | init = 
tf.constant_initializer(value=weights, 260 | dtype=tf.float32) 261 | return tf.get_variable(name="up_filter", initializer=init, 262 | shape=weights.shape) 263 | 264 | def get_conv_filter(self, name): 265 | init = tf.constant_initializer(value=self.data_dict[name][0], 266 | dtype=tf.float32) 267 | shape = self.data_dict[name][0].shape 268 | print('Layer name: %s' % name) 269 | print('Layer shape: %s' % str(shape)) 270 | var = tf.get_variable(name="filter", initializer=init, shape=shape) 271 | if not tf.get_variable_scope().reuse: 272 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 273 | name='weight_loss') 274 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 275 | weight_decay) 276 | return var 277 | 278 | def get_bias(self, name, num_classes=None): 279 | bias_wights = self.data_dict[name][1] 280 | shape = self.data_dict[name][1].shape 281 | if name == 'fc8': 282 | bias_wights = self._bias_reshape(bias_wights, shape[0], 283 | num_classes) 284 | shape = [num_classes] 285 | init = tf.constant_initializer(value=bias_wights, 286 | dtype=tf.float32) 287 | return tf.get_variable(name="biases", initializer=init, shape=shape) 288 | 289 | def get_fc_weight(self, name): 290 | init = tf.constant_initializer(value=self.data_dict[name][0], 291 | dtype=tf.float32) 292 | shape = self.data_dict[name][0].shape 293 | var = tf.get_variable(name="weights", initializer=init, shape=shape) 294 | if not tf.get_variable_scope().reuse: 295 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 296 | name='weight_loss') 297 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 298 | weight_decay) 299 | return var 300 | 301 | def _bias_reshape(self, bweight, num_orig, num_new): 302 | """ Build bias weights for filter produces with `_summary_reshape` 303 | 304 | """ 305 | n_averaged_elements = num_orig//num_new 306 | avg_bweight = np.zeros(num_new) 307 | for i in range(0, num_orig, n_averaged_elements): 308 | start_idx = i 309 | end_idx = start_idx + n_averaged_elements 310 | avg_idx = start_idx//n_averaged_elements 311 | if avg_idx == num_new: 312 | break 313 | avg_bweight[avg_idx] = np.mean(bweight[start_idx:end_idx]) 314 | return avg_bweight 315 | 316 | def _summary_reshape(self, fweight, shape, num_new): 317 | """ Produce weights for a reduced fully-connected layer. 318 | 319 | FC8 of VGG produces 1000 classes. Most semantic segmentation 320 | task require much less classes. This reshapes the original weights 321 | to be used in a fully-convolutional layer which produces num_new 322 | classes. To archive this the average (mean) of n adjanced classes is 323 | taken. 324 | 325 | Consider reordering fweight, to perserve semantic meaning of the 326 | weights. 327 | 328 | Args: 329 | fweight: original weights 330 | shape: shape of the desired fully-convolutional layer 331 | num_new: number of new classes 332 | 333 | 334 | Returns: 335 | Filter weights for `num_new` classes. 
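        Example (illustrative): with `shape = [1, 1, 4096, 1000]` and
        `num_new = 20`, each new class filter becomes the mean of 50
        adjacent original class filters.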
336 | """ 337 | num_orig = shape[3] 338 | shape[3] = num_new 339 | assert(num_new < num_orig) 340 | n_averaged_elements = num_orig//num_new 341 | avg_fweight = np.zeros(shape) 342 | for i in range(0, num_orig, n_averaged_elements): 343 | start_idx = i 344 | end_idx = start_idx + n_averaged_elements 345 | avg_idx = start_idx//n_averaged_elements 346 | if avg_idx == num_new: 347 | break 348 | avg_fweight[:, :, :, avg_idx] = np.mean( 349 | fweight[:, :, :, start_idx:end_idx], axis=3) 350 | return avg_fweight 351 | 352 | def _variable_with_weight_decay(self, shape, stddev, wd): 353 | """Helper to create an initialized Variable with weight decay. 354 | 355 | Note that the Variable is initialized with a truncated normal 356 | distribution. 357 | A weight decay is added only if one is specified. 358 | 359 | Args: 360 | name: name of the variable 361 | shape: list of ints 362 | stddev: standard deviation of a truncated Gaussian 363 | wd: add L2Loss weight decay multiplied by this float. If None, weight 364 | decay is not added for this Variable. 365 | 366 | Returns: 367 | Variable Tensor 368 | """ 369 | 370 | initializer = tf.truncated_normal_initializer(stddev=stddev) 371 | var = tf.get_variable('weights', shape=shape, 372 | initializer=initializer) 373 | 374 | if wd and (not tf.get_variable_scope().reuse): 375 | weight_decay = tf.multiply( 376 | tf.nn.l2_loss(var), wd, name='weight_loss') 377 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 378 | weight_decay) 379 | return var 380 | 381 | def _bias_variable(self, shape, constant=0.0): 382 | initializer = tf.constant_initializer(constant) 383 | return tf.get_variable(name='biases', shape=shape, 384 | initializer=initializer) 385 | 386 | def get_fc_weight_reshape(self, name, shape, num_classes=None): 387 | print('Layer name: %s' % name) 388 | print('Layer shape: %s' % shape) 389 | weights = self.data_dict[name][0] 390 | weights = weights.reshape(shape) 391 | if num_classes is not None: 392 | weights = self._summary_reshape(weights, shape, 393 | num_new=num_classes) 394 | init = tf.constant_initializer(value=weights, 395 | dtype=tf.float32) 396 | return tf.get_variable(name="weights", initializer=init, shape=shape) 397 | 398 | 399 | def _activation_summary(x): 400 | """Helper to create summaries for activations. 401 | 402 | Creates a summary that provides a histogram of activations. 403 | Creates a summary that measure the sparsity of activations. 404 | 405 | Args: 406 | x: Tensor 407 | Returns: 408 | nothing 409 | """ 410 | # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training 411 | # session. This helps the clarity of presentation on tensorboard. 
412 | tensor_name = x.op.name 413 | # tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name) 414 | tf.summary.histogram(tensor_name + '/activations', x) 415 | tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x)) 416 | -------------------------------------------------------------------------------- /fcn32_downsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn32_downsampled.png -------------------------------------------------------------------------------- /fcn32_upsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn32_upsampled.png -------------------------------------------------------------------------------- /fcn32_vgg.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import logging 7 | from math import ceil 8 | import sys 9 | 10 | import numpy as np 11 | import tensorflow as tf 12 | 13 | VGG_MEAN = [103.939, 116.779, 123.68] 14 | 15 | 16 | class FCN32VGG: 17 | 18 | def __init__(self, vgg16_npy_path=None): 19 | if vgg16_npy_path is None: 20 | path = sys.modules[self.__class__.__module__].__file__ 21 | # print path 22 | path = os.path.abspath(os.path.join(path, os.pardir)) 23 | # print path 24 | path = os.path.join(path, "vgg16.npy") 25 | vgg16_npy_path = path 26 | logging.info("Load npy file from '%s'.", vgg16_npy_path) 27 | if not os.path.isfile(vgg16_npy_path): 28 | logging.error(("File '%s' not found. Download it from " 29 | "ftp://mi.eng.cam.ac.uk/pub/mttt2/" 30 | "models/vgg16.npy"), vgg16_npy_path) 31 | sys.exit(1) 32 | 33 | self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item() 34 | self.wd = 5e-4 35 | print("npy file loaded") 36 | 37 | def build(self, rgb, train=False, num_classes=20, random_init_fc8=False, 38 | debug=False): 39 | """ 40 | Build the VGG model using loaded weights 41 | Parameters 42 | ---------- 43 | rgb: image batch tensor 44 | Image in rgb shap. Scaled to Intervall [0, 255] 45 | train: bool 46 | Whether to build train or inference graph 47 | num_classes: int 48 | How many classes should be predicted (by fc8) 49 | random_init_fc8 : bool 50 | Whether to initialize fc8 layer randomly. 51 | Finetuning is required in this case. 52 | debug: bool 53 | Whether to print additional Debug Information. 
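        A minimal inference sketch (mirroring `test_fcn32_vgg.py`;
        `batch_images` is illustrative):

            vgg_fcn = FCN32VGG()
            vgg_fcn.build(batch_images, debug=True)
            # vgg_fcn.pred_up then holds the upsampled per-pixel predictions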
54 | """ 55 | # Convert RGB to BGR 56 | 57 | with tf.name_scope('Processing'): 58 | 59 | red, green, blue = tf.split(rgb, 3, 3) 60 | # assert red.get_shape().as_list()[1:] == [224, 224, 1] 61 | # assert green.get_shape().as_list()[1:] == [224, 224, 1] 62 | # assert blue.get_shape().as_list()[1:] == [224, 224, 1] 63 | bgr = tf.concat([ 64 | blue - VGG_MEAN[0], 65 | green - VGG_MEAN[1], 66 | red - VGG_MEAN[2], 67 | ], 3) 68 | 69 | if debug: 70 | bgr = tf.Print(bgr, [tf.shape(bgr)], 71 | message='Shape of input image: ', 72 | summarize=4, first_n=1) 73 | 74 | self.conv1_1 = self._conv_layer(bgr, "conv1_1") 75 | self.conv1_2 = self._conv_layer(self.conv1_1, "conv1_2") 76 | self.pool1 = self._max_pool(self.conv1_2, 'pool1', debug) 77 | 78 | self.conv2_1 = self._conv_layer(self.pool1, "conv2_1") 79 | self.conv2_2 = self._conv_layer(self.conv2_1, "conv2_2") 80 | self.pool2 = self._max_pool(self.conv2_2, 'pool2', debug) 81 | 82 | self.conv3_1 = self._conv_layer(self.pool2, "conv3_1") 83 | self.conv3_2 = self._conv_layer(self.conv3_1, "conv3_2") 84 | self.conv3_3 = self._conv_layer(self.conv3_2, "conv3_3") 85 | self.pool3 = self._max_pool(self.conv3_3, 'pool3', debug) 86 | 87 | self.conv4_1 = self._conv_layer(self.pool3, "conv4_1") 88 | self.conv4_2 = self._conv_layer(self.conv4_1, "conv4_2") 89 | self.conv4_3 = self._conv_layer(self.conv4_2, "conv4_3") 90 | self.pool4 = self._max_pool(self.conv4_3, 'pool4', debug) 91 | 92 | self.conv5_1 = self._conv_layer(self.pool4, "conv5_1") 93 | self.conv5_2 = self._conv_layer(self.conv5_1, "conv5_2") 94 | self.conv5_3 = self._conv_layer(self.conv5_2, "conv5_3") 95 | self.pool5 = self._max_pool(self.conv5_3, 'pool5', debug) 96 | 97 | self.fc6 = self._fc_layer(self.pool5, "fc6") 98 | 99 | if train: 100 | self.fc6 = tf.nn.dropout(self.fc6, 0.5) 101 | 102 | self.fc7 = self._fc_layer(self.fc6, "fc7") 103 | if train: 104 | self.fc7 = tf.nn.dropout(self.fc7, 0.5) 105 | 106 | if random_init_fc8: 107 | self.score_fr = self._score_layer(self.fc7, "score_fr", 108 | num_classes) 109 | else: 110 | self.score_fr = self._fc_layer(self.fc7, "score_fr", 111 | num_classes=num_classes, 112 | relu=False) 113 | 114 | self.pred = tf.argmax(self.score_fr, dimension=3) 115 | 116 | self.upscore = self._upscore_layer(self.score_fr, shape=tf.shape(bgr), 117 | num_classes=num_classes, 118 | debug=debug, 119 | name='up', ksize=64, stride=32) 120 | 121 | self.pred_up = tf.argmax(self.upscore, dimension=3) 122 | 123 | def _max_pool(self, bottom, name, debug): 124 | pool = tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], 125 | padding='SAME', name=name) 126 | 127 | if debug: 128 | pool = tf.Print(pool, [tf.shape(pool)], 129 | message='Shape of %s' % name, 130 | summarize=4, first_n=1) 131 | return pool 132 | 133 | def _conv_layer(self, bottom, name): 134 | with tf.variable_scope(name) as scope: 135 | filt = self.get_conv_filter(name) 136 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 137 | 138 | conv_biases = self.get_bias(name) 139 | bias = tf.nn.bias_add(conv, conv_biases) 140 | 141 | relu = tf.nn.relu(bias) 142 | # Add summary to Tensorboard 143 | _activation_summary(relu) 144 | return relu 145 | 146 | def _fc_layer(self, bottom, name, num_classes=None, 147 | relu=True, debug=False): 148 | with tf.variable_scope(name) as scope: 149 | shape = bottom.get_shape().as_list() 150 | 151 | if name == 'fc6': 152 | filt = self.get_fc_weight_reshape(name, [7, 7, 512, 4096]) 153 | elif name == 'score_fr': 154 | name = 'fc8' # Name of score_fr layer in VGG Model 155 
| filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 1000], 156 | num_classes=num_classes) 157 | else: 158 | filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 4096]) 159 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 160 | conv_biases = self.get_bias(name, num_classes=num_classes) 161 | bias = tf.nn.bias_add(conv, conv_biases) 162 | 163 | if relu: 164 | bias = tf.nn.relu(bias) 165 | _activation_summary(bias) 166 | 167 | if debug: 168 | bias = tf.Print(bias, [tf.shape(bias)], 169 | message='Shape of %s' % name, 170 | summarize=4, first_n=1) 171 | return bias 172 | 173 | def _score_layer(self, bottom, name, num_classes): 174 | with tf.variable_scope(name) as scope: 175 | # get number of input channels 176 | in_features = bottom.get_shape()[3].value 177 | shape = [1, 1, in_features, num_classes] 178 | # He initialization Sheme 179 | num_input = in_features 180 | stddev = (2 / num_input)**0.5 181 | # Apply convolution 182 | w_decay = self.wd 183 | weights = self._variable_with_weight_decay(shape, stddev, w_decay) 184 | conv = tf.nn.conv2d(bottom, weights, [1, 1, 1, 1], padding='SAME') 185 | # Apply bias 186 | conv_biases = self._bias_variable([num_classes], constant=0.0) 187 | bias = tf.nn.bias_add(conv, conv_biases) 188 | 189 | _activation_summary(bias) 190 | 191 | return bias 192 | 193 | def _upscore_layer(self, bottom, shape, 194 | num_classes, name, debug, 195 | ksize=4, stride=2): 196 | strides = [1, stride, stride, 1] 197 | with tf.variable_scope(name): 198 | in_features = bottom.get_shape()[3].value 199 | 200 | if shape is None: 201 | # Compute shape out of Bottom 202 | in_shape = tf.shape(bottom) 203 | 204 | h = ((in_shape[1] - 1) * stride) + 1 205 | w = ((in_shape[2] - 1) * stride) + 1 206 | new_shape = [in_shape[0], h, w, num_classes] 207 | else: 208 | new_shape = [shape[0], shape[1], shape[2], num_classes] 209 | output_shape = tf.stack(new_shape) 210 | 211 | logging.debug("Layer: %s, Fan-in: %d" % (name, in_features)) 212 | f_shape = [ksize, ksize, num_classes, in_features] 213 | 214 | # create 215 | num_input = ksize * ksize * in_features / stride 216 | stddev = (2 / num_input)**0.5 217 | 218 | weights = self.get_deconv_filter(f_shape) 219 | deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape, 220 | strides=strides, padding='SAME') 221 | 222 | if debug: 223 | deconv = tf.Print(deconv, [tf.shape(deconv)], 224 | message='Shape of %s' % name, 225 | summarize=4, first_n=1) 226 | 227 | _activation_summary(deconv) 228 | return deconv 229 | 230 | def get_deconv_filter(self, f_shape): 231 | width = f_shape[0] 232 | height = f_shape[1] 233 | f = ceil(width/2.0) 234 | c = (2 * f - 1 - f % 2) / (2.0 * f) 235 | bilinear = np.zeros([f_shape[0], f_shape[1]]) 236 | for x in range(width): 237 | for y in range(height): 238 | value = (1 - abs(x / f - c)) * (1 - abs(y / f - c)) 239 | bilinear[x, y] = value 240 | weights = np.zeros(f_shape) 241 | for i in range(f_shape[2]): 242 | weights[:, :, i, i] = bilinear 243 | 244 | init = tf.constant_initializer(value=weights, 245 | dtype=tf.float32) 246 | return tf.get_variable(name="up_filter", initializer=init, 247 | shape=weights.shape) 248 | 249 | def get_conv_filter(self, name): 250 | init = tf.constant_initializer(value=self.data_dict[name][0], 251 | dtype=tf.float32) 252 | shape = self.data_dict[name][0].shape 253 | print('Layer name: %s' % name) 254 | print('Layer shape: %s' % str(shape)) 255 | var = tf.get_variable(name="filter", initializer=init, shape=shape) 256 | if not tf.get_variable_scope().reuse: 257 | 
weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 258 | name='weight_loss') 259 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 260 | weight_decay) 261 | return var 262 | 263 | def get_bias(self, name, num_classes=None): 264 | bias_wights = self.data_dict[name][1] 265 | shape = self.data_dict[name][1].shape 266 | if name == 'fc8': 267 | bias_wights = self._bias_reshape(bias_wights, shape[0], 268 | num_classes) 269 | shape = [num_classes] 270 | init = tf.constant_initializer(value=bias_wights, 271 | dtype=tf.float32) 272 | return tf.get_variable(name="biases", initializer=init, shape=shape) 273 | 274 | def get_fc_weight(self, name): 275 | init = tf.constant_initializer(value=self.data_dict[name][0], 276 | dtype=tf.float32) 277 | shape = self.data_dict[name][0].shape 278 | var = tf.get_variable(name="weights", initializer=init, shape=shape) 279 | if not tf.get_variable_scope().reuse: 280 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 281 | name='weight_loss') 282 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 283 | weight_decay) 284 | return var 285 | 286 | def _bias_reshape(self, bweight, num_orig, num_new): 287 | """ Build bias weights for filter produces with `_summary_reshape` 288 | 289 | """ 290 | n_averaged_elements = num_orig//num_new 291 | avg_bweight = np.zeros(num_new) 292 | for i in range(0, num_orig, n_averaged_elements): 293 | start_idx = i 294 | end_idx = start_idx + n_averaged_elements 295 | avg_idx = start_idx//n_averaged_elements 296 | if avg_idx == num_new: 297 | break 298 | avg_bweight[avg_idx] = np.mean(bweight[start_idx:end_idx]) 299 | return avg_bweight 300 | 301 | def _summary_reshape(self, fweight, shape, num_new): 302 | """ Produce weights for a reduced fully-connected layer. 303 | 304 | FC8 of VGG produces 1000 classes. Most semantic segmentation 305 | task require much less classes. This reshapes the original weights 306 | to be used in a fully-convolutional layer which produces num_new 307 | classes. To archive this the average (mean) of n adjanced classes is 308 | taken. 309 | 310 | Consider reordering fweight, to perserve semantic meaning of the 311 | weights. 312 | 313 | Args: 314 | fweight: original weights 315 | shape: shape of the desired fully-convolutional layer 316 | num_new: number of new classes 317 | 318 | 319 | Returns: 320 | Filter weights for `num_new` classes. 321 | """ 322 | num_orig = shape[3] 323 | shape[3] = num_new 324 | assert(num_new < num_orig) 325 | n_averaged_elements = num_orig//num_new 326 | avg_fweight = np.zeros(shape) 327 | for i in range(0, num_orig, n_averaged_elements): 328 | start_idx = i 329 | end_idx = start_idx + n_averaged_elements 330 | avg_idx = start_idx//n_averaged_elements 331 | if avg_idx == num_new: 332 | break 333 | avg_fweight[:, :, :, avg_idx] = np.mean( 334 | fweight[:, :, :, start_idx:end_idx], axis=3) 335 | return avg_fweight 336 | 337 | def _variable_with_weight_decay(self, shape, stddev, wd): 338 | """Helper to create an initialized Variable with weight decay. 339 | 340 | Note that the Variable is initialized with a truncated normal 341 | distribution. 342 | A weight decay is added only if one is specified. 343 | 344 | Args: 345 | name: name of the variable 346 | shape: list of ints 347 | stddev: standard deviation of a truncated Gaussian 348 | wd: add L2Loss weight decay multiplied by this float. If None, weight 349 | decay is not added for this Variable. 
350 | 351 | Returns: 352 | Variable Tensor 353 | """ 354 | 355 | initializer = tf.truncated_normal_initializer(stddev=stddev) 356 | var = tf.get_variable('weights', shape=shape, 357 | initializer=initializer) 358 | 359 | if wd and (not tf.get_variable_scope().reuse): 360 | weight_decay = tf.multiply( 361 | tf.nn.l2_loss(var), wd, name='weight_loss') 362 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 363 | weight_decay) 364 | return var 365 | 366 | def _bias_variable(self, shape, constant=0.0): 367 | initializer = tf.constant_initializer(constant) 368 | return tf.get_variable(name='biases', shape=shape, 369 | initializer=initializer) 370 | 371 | def get_fc_weight_reshape(self, name, shape, num_classes=None): 372 | print('Layer name: %s' % name) 373 | print('Layer shape: %s' % shape) 374 | weights = self.data_dict[name][0] 375 | weights = weights.reshape(shape) 376 | if num_classes is not None: 377 | weights = self._summary_reshape(weights, shape, 378 | num_new=num_classes) 379 | init = tf.constant_initializer(value=weights, 380 | dtype=tf.float32) 381 | return tf.get_variable(name="weights", initializer=init, shape=shape) 382 | 383 | 384 | def _activation_summary(x): 385 | """Helper to create summaries for activations. 386 | 387 | Creates a summary that provides a histogram of activations. 388 | Creates a summary that measure the sparsity of activations. 389 | 390 | Args: 391 | x: Tensor 392 | Returns: 393 | nothing 394 | """ 395 | # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training 396 | # session. This helps the clarity of presentation on tensorboard. 397 | tensor_name = x.op.name 398 | # tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name) 399 | tf.summary.histogram(tensor_name + '/activations', x) 400 | tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x)) 401 | -------------------------------------------------------------------------------- /fcn8_downsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn8_downsampled.png -------------------------------------------------------------------------------- /fcn8_upsampled.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/fcn8_upsampled.png -------------------------------------------------------------------------------- /fcn8_vgg.py: -------------------------------------------------------------------------------- 1 | from __future__ import absolute_import 2 | from __future__ import division 3 | from __future__ import print_function 4 | 5 | import os 6 | import logging 7 | from math import ceil 8 | import sys 9 | 10 | import numpy as np 11 | import tensorflow as tf 12 | 13 | VGG_MEAN = [103.939, 116.779, 123.68] 14 | 15 | 16 | class FCN8VGG: 17 | 18 | def __init__(self, vgg16_npy_path=None): 19 | if vgg16_npy_path is None: 20 | path = sys.modules[self.__class__.__module__].__file__ 21 | # print path 22 | path = os.path.abspath(os.path.join(path, os.pardir)) 23 | # print path 24 | path = os.path.join(path, "vgg16.npy") 25 | vgg16_npy_path = path 26 | logging.info("Load npy file from '%s'.", vgg16_npy_path) 27 | if not os.path.isfile(vgg16_npy_path): 28 | logging.error(("File '%s' not found. 
Download it from " 29 | "ftp://mi.eng.cam.ac.uk/pub/mttt2/" 30 | "models/vgg16.npy"), vgg16_npy_path) 31 | sys.exit(1) 32 | 33 | self.data_dict = np.load(vgg16_npy_path, encoding='latin1').item() 34 | self.wd = 5e-4 35 | print("npy file loaded") 36 | 37 | def build(self, rgb, train=False, num_classes=20, random_init_fc8=False, 38 | debug=False, use_dilated=False): 39 | """ 40 | Build the VGG model using loaded weights 41 | Parameters 42 | ---------- 43 | rgb: image batch tensor 44 | Image in rgb shap. Scaled to Intervall [0, 255] 45 | train: bool 46 | Whether to build train or inference graph 47 | num_classes: int 48 | How many classes should be predicted (by fc8) 49 | random_init_fc8 : bool 50 | Whether to initialize fc8 layer randomly. 51 | Finetuning is required in this case. 52 | debug: bool 53 | Whether to print additional Debug Information. 54 | """ 55 | # Convert RGB to BGR 56 | 57 | with tf.name_scope('Processing'): 58 | 59 | red, green, blue = tf.split(rgb, 3, 3) 60 | # assert red.get_shape().as_list()[1:] == [224, 224, 1] 61 | # assert green.get_shape().as_list()[1:] == [224, 224, 1] 62 | # assert blue.get_shape().as_list()[1:] == [224, 224, 1] 63 | bgr = tf.concat([ 64 | blue - VGG_MEAN[0], 65 | green - VGG_MEAN[1], 66 | red - VGG_MEAN[2], 67 | ], 3) 68 | 69 | if debug: 70 | bgr = tf.Print(bgr, [tf.shape(bgr)], 71 | message='Shape of input image: ', 72 | summarize=4, first_n=1) 73 | 74 | self.conv1_1 = self._conv_layer(bgr, "conv1_1") 75 | self.conv1_2 = self._conv_layer(self.conv1_1, "conv1_2") 76 | self.pool1 = self._max_pool(self.conv1_2, 'pool1', debug) 77 | 78 | self.conv2_1 = self._conv_layer(self.pool1, "conv2_1") 79 | self.conv2_2 = self._conv_layer(self.conv2_1, "conv2_2") 80 | self.pool2 = self._max_pool(self.conv2_2, 'pool2', debug) 81 | 82 | self.conv3_1 = self._conv_layer(self.pool2, "conv3_1") 83 | self.conv3_2 = self._conv_layer(self.conv3_1, "conv3_2") 84 | self.conv3_3 = self._conv_layer(self.conv3_2, "conv3_3") 85 | self.pool3 = self._max_pool(self.conv3_3, 'pool3', debug) 86 | 87 | self.conv4_1 = self._conv_layer(self.pool3, "conv4_1") 88 | self.conv4_2 = self._conv_layer(self.conv4_1, "conv4_2") 89 | self.conv4_3 = self._conv_layer(self.conv4_2, "conv4_3") 90 | 91 | if use_dilated: 92 | pad = [[0, 0], [0, 0]] 93 | self.pool4 = tf.nn.max_pool(self.conv4_3, ksize=[1, 2, 2, 1], 94 | strides=[1, 1, 1, 1], 95 | padding='SAME', name='pool4') 96 | self.pool4 = tf.space_to_batch(self.pool4, 97 | paddings=pad, block_size=2) 98 | else: 99 | self.pool4 = self._max_pool(self.conv4_3, 'pool4', debug) 100 | 101 | self.conv5_1 = self._conv_layer(self.pool4, "conv5_1") 102 | self.conv5_2 = self._conv_layer(self.conv5_1, "conv5_2") 103 | self.conv5_3 = self._conv_layer(self.conv5_2, "conv5_3") 104 | if use_dilated: 105 | pad = [[0, 0], [0, 0]] 106 | self.pool5 = tf.nn.max_pool(self.conv5_3, ksize=[1, 2, 2, 1], 107 | strides=[1, 1, 1, 1], 108 | padding='SAME', name='pool5') 109 | self.pool5 = tf.space_to_batch(self.pool5, 110 | paddings=pad, block_size=2) 111 | else: 112 | self.pool5 = self._max_pool(self.conv5_3, 'pool5', debug) 113 | 114 | self.fc6 = self._fc_layer(self.pool5, "fc6") 115 | 116 | if train: 117 | self.fc6 = tf.nn.dropout(self.fc6, 0.5) 118 | 119 | self.fc7 = self._fc_layer(self.fc6, "fc7") 120 | if train: 121 | self.fc7 = tf.nn.dropout(self.fc7, 0.5) 122 | 123 | if use_dilated: 124 | self.pool5 = tf.batch_to_space(self.pool5, crops=pad, block_size=2) 125 | self.pool5 = tf.batch_to_space(self.pool5, crops=pad, block_size=2) 126 | self.fc7 = 
tf.batch_to_space(self.fc7, crops=pad, block_size=2) 127 | self.fc7 = tf.batch_to_space(self.fc7, crops=pad, block_size=2) 128 | return 129 | 130 | if random_init_fc8: 131 | self.score_fr = self._score_layer(self.fc7, "score_fr", 132 | num_classes) 133 | else: 134 | self.score_fr = self._fc_layer(self.fc7, "score_fr", 135 | num_classes=num_classes, 136 | relu=False) 137 | 138 | self.pred = tf.argmax(self.score_fr, dimension=3) 139 | 140 | self.upscore2 = self._upscore_layer(self.score_fr, 141 | shape=tf.shape(self.pool4), 142 | num_classes=num_classes, 143 | debug=debug, name='upscore2', 144 | ksize=4, stride=2) 145 | self.score_pool4 = self._score_layer(self.pool4, "score_pool4", 146 | num_classes=num_classes) 147 | self.fuse_pool4 = tf.add(self.upscore2, self.score_pool4) 148 | 149 | self.upscore4 = self._upscore_layer(self.fuse_pool4, 150 | shape=tf.shape(self.pool3), 151 | num_classes=num_classes, 152 | debug=debug, name='upscore4', 153 | ksize=4, stride=2) 154 | self.score_pool3 = self._score_layer(self.pool3, "score_pool3", 155 | num_classes=num_classes) 156 | self.fuse_pool3 = tf.add(self.upscore4, self.score_pool3) 157 | 158 | self.upscore32 = self._upscore_layer(self.fuse_pool3, 159 | shape=tf.shape(bgr), 160 | num_classes=num_classes, 161 | debug=debug, name='upscore32', 162 | ksize=16, stride=8) 163 | 164 | self.pred_up = tf.argmax(self.upscore32, dimension=3) 165 | 166 | def _max_pool(self, bottom, name, debug): 167 | pool = tf.nn.max_pool(bottom, ksize=[1, 2, 2, 1], strides=[1, 2, 2, 1], 168 | padding='SAME', name=name) 169 | 170 | if debug: 171 | pool = tf.Print(pool, [tf.shape(pool)], 172 | message='Shape of %s' % name, 173 | summarize=4, first_n=1) 174 | return pool 175 | 176 | def _conv_layer(self, bottom, name): 177 | with tf.variable_scope(name) as scope: 178 | filt = self.get_conv_filter(name) 179 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 180 | 181 | conv_biases = self.get_bias(name) 182 | bias = tf.nn.bias_add(conv, conv_biases) 183 | 184 | relu = tf.nn.relu(bias) 185 | # Add summary to Tensorboard 186 | _activation_summary(relu) 187 | return relu 188 | 189 | def _fc_layer(self, bottom, name, num_classes=None, 190 | relu=True, debug=False): 191 | with tf.variable_scope(name) as scope: 192 | shape = bottom.get_shape().as_list() 193 | 194 | if name == 'fc6': 195 | filt = self.get_fc_weight_reshape(name, [7, 7, 512, 4096]) 196 | elif name == 'score_fr': 197 | name = 'fc8' # Name of score_fr layer in VGG Model 198 | filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 1000], 199 | num_classes=num_classes) 200 | else: 201 | filt = self.get_fc_weight_reshape(name, [1, 1, 4096, 4096]) 202 | 203 | self._add_wd_and_summary(filt, self.wd, "fc_wlosses") 204 | 205 | conv = tf.nn.conv2d(bottom, filt, [1, 1, 1, 1], padding='SAME') 206 | conv_biases = self.get_bias(name, num_classes=num_classes) 207 | bias = tf.nn.bias_add(conv, conv_biases) 208 | 209 | if relu: 210 | bias = tf.nn.relu(bias) 211 | _activation_summary(bias) 212 | 213 | if debug: 214 | bias = tf.Print(bias, [tf.shape(bias)], 215 | message='Shape of %s' % name, 216 | summarize=4, first_n=1) 217 | return bias 218 | 219 | def _score_layer(self, bottom, name, num_classes): 220 | with tf.variable_scope(name) as scope: 221 | # get number of input channels 222 | in_features = bottom.get_shape()[3].value 223 | shape = [1, 1, in_features, num_classes] 224 | # He initialization Sheme 225 | if name == "score_fr": 226 | num_input = in_features 227 | stddev = (2 / num_input)**0.5 228 | elif name == 
"score_pool4": 229 | stddev = 0.001 230 | elif name == "score_pool3": 231 | stddev = 0.0001 232 | # Apply convolution 233 | w_decay = self.wd 234 | 235 | weights = self._variable_with_weight_decay(shape, stddev, w_decay, 236 | decoder=True) 237 | conv = tf.nn.conv2d(bottom, weights, [1, 1, 1, 1], padding='SAME') 238 | # Apply bias 239 | conv_biases = self._bias_variable([num_classes], constant=0.0) 240 | bias = tf.nn.bias_add(conv, conv_biases) 241 | 242 | _activation_summary(bias) 243 | 244 | return bias 245 | 246 | def _upscore_layer(self, bottom, shape, 247 | num_classes, name, debug, 248 | ksize=4, stride=2): 249 | strides = [1, stride, stride, 1] 250 | with tf.variable_scope(name): 251 | in_features = bottom.get_shape()[3].value 252 | 253 | if shape is None: 254 | # Compute shape out of Bottom 255 | in_shape = tf.shape(bottom) 256 | 257 | h = ((in_shape[1] - 1) * stride) + 1 258 | w = ((in_shape[2] - 1) * stride) + 1 259 | new_shape = [in_shape[0], h, w, num_classes] 260 | else: 261 | new_shape = [shape[0], shape[1], shape[2], num_classes] 262 | output_shape = tf.stack(new_shape) 263 | 264 | logging.debug("Layer: %s, Fan-in: %d" % (name, in_features)) 265 | f_shape = [ksize, ksize, num_classes, in_features] 266 | 267 | # create 268 | num_input = ksize * ksize * in_features / stride 269 | stddev = (2 / num_input)**0.5 270 | 271 | weights = self.get_deconv_filter(f_shape) 272 | self._add_wd_and_summary(weights, self.wd, "fc_wlosses") 273 | deconv = tf.nn.conv2d_transpose(bottom, weights, output_shape, 274 | strides=strides, padding='SAME') 275 | 276 | if debug: 277 | deconv = tf.Print(deconv, [tf.shape(deconv)], 278 | message='Shape of %s' % name, 279 | summarize=4, first_n=1) 280 | 281 | _activation_summary(deconv) 282 | return deconv 283 | 284 | def get_deconv_filter(self, f_shape): 285 | width = f_shape[0] 286 | height = f_shape[1] 287 | f = ceil(width/2.0) 288 | c = (2 * f - 1 - f % 2) / (2.0 * f) 289 | bilinear = np.zeros([f_shape[0], f_shape[1]]) 290 | for x in range(width): 291 | for y in range(height): 292 | value = (1 - abs(x / f - c)) * (1 - abs(y / f - c)) 293 | bilinear[x, y] = value 294 | weights = np.zeros(f_shape) 295 | for i in range(f_shape[2]): 296 | weights[:, :, i, i] = bilinear 297 | 298 | init = tf.constant_initializer(value=weights, 299 | dtype=tf.float32) 300 | var = tf.get_variable(name="up_filter", initializer=init, 301 | shape=weights.shape) 302 | return var 303 | 304 | def get_conv_filter(self, name): 305 | init = tf.constant_initializer(value=self.data_dict[name][0], 306 | dtype=tf.float32) 307 | shape = self.data_dict[name][0].shape 308 | print('Layer name: %s' % name) 309 | print('Layer shape: %s' % str(shape)) 310 | var = tf.get_variable(name="filter", initializer=init, shape=shape) 311 | if not tf.get_variable_scope().reuse: 312 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 313 | name='weight_loss') 314 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 315 | weight_decay) 316 | _variable_summaries(var) 317 | return var 318 | 319 | def get_bias(self, name, num_classes=None): 320 | bias_wights = self.data_dict[name][1] 321 | shape = self.data_dict[name][1].shape 322 | if name == 'fc8': 323 | bias_wights = self._bias_reshape(bias_wights, shape[0], 324 | num_classes) 325 | shape = [num_classes] 326 | init = tf.constant_initializer(value=bias_wights, 327 | dtype=tf.float32) 328 | var = tf.get_variable(name="biases", initializer=init, shape=shape) 329 | _variable_summaries(var) 330 | return var 331 | 332 | def get_fc_weight(self, name): 
333 | init = tf.constant_initializer(value=self.data_dict[name][0], 334 | dtype=tf.float32) 335 | shape = self.data_dict[name][0].shape 336 | var = tf.get_variable(name="weights", initializer=init, shape=shape) 337 | if not tf.get_variable_scope().reuse: 338 | weight_decay = tf.multiply(tf.nn.l2_loss(var), self.wd, 339 | name='weight_loss') 340 | tf.add_to_collection(tf.GraphKeys.REGULARIZATION_LOSSES, 341 | weight_decay) 342 | _variable_summaries(var) 343 | return var 344 | 345 | def _bias_reshape(self, bweight, num_orig, num_new): 346 | """ Build bias weights for filter produces with `_summary_reshape` 347 | 348 | """ 349 | n_averaged_elements = num_orig//num_new 350 | avg_bweight = np.zeros(num_new) 351 | for i in range(0, num_orig, n_averaged_elements): 352 | start_idx = i 353 | end_idx = start_idx + n_averaged_elements 354 | avg_idx = start_idx//n_averaged_elements 355 | if avg_idx == num_new: 356 | break 357 | avg_bweight[avg_idx] = np.mean(bweight[start_idx:end_idx]) 358 | return avg_bweight 359 | 360 | def _summary_reshape(self, fweight, shape, num_new): 361 | """ Produce weights for a reduced fully-connected layer. 362 | 363 | FC8 of VGG produces 1000 classes. Most semantic segmentation 364 | task require much less classes. This reshapes the original weights 365 | to be used in a fully-convolutional layer which produces num_new 366 | classes. To archive this the average (mean) of n adjanced classes is 367 | taken. 368 | 369 | Consider reordering fweight, to perserve semantic meaning of the 370 | weights. 371 | 372 | Args: 373 | fweight: original weights 374 | shape: shape of the desired fully-convolutional layer 375 | num_new: number of new classes 376 | 377 | 378 | Returns: 379 | Filter weights for `num_new` classes. 380 | """ 381 | num_orig = shape[3] 382 | shape[3] = num_new 383 | assert(num_new < num_orig) 384 | n_averaged_elements = num_orig//num_new 385 | avg_fweight = np.zeros(shape) 386 | for i in range(0, num_orig, n_averaged_elements): 387 | start_idx = i 388 | end_idx = start_idx + n_averaged_elements 389 | avg_idx = start_idx//n_averaged_elements 390 | if avg_idx == num_new: 391 | break 392 | avg_fweight[:, :, :, avg_idx] = np.mean( 393 | fweight[:, :, :, start_idx:end_idx], axis=3) 394 | return avg_fweight 395 | 396 | def _variable_with_weight_decay(self, shape, stddev, wd, decoder=False): 397 | """Helper to create an initialized Variable with weight decay. 398 | 399 | Note that the Variable is initialized with a truncated normal 400 | distribution. 401 | A weight decay is added only if one is specified. 402 | 403 | Args: 404 | name: name of the variable 405 | shape: list of ints 406 | stddev: standard deviation of a truncated Gaussian 407 | wd: add L2Loss weight decay multiplied by this float. If None, weight 408 | decay is not added for this Variable. 
409 | 
410 |         Returns:
411 |           Variable Tensor
412 |         """
413 | 
414 |         initializer = tf.truncated_normal_initializer(stddev=stddev)
415 |         var = tf.get_variable('weights', shape=shape,
416 |                               initializer=initializer)
417 | 
418 |         collection_name = tf.GraphKeys.REGULARIZATION_LOSSES
419 |         if wd and (not tf.get_variable_scope().reuse):
420 |             weight_decay = tf.multiply(
421 |                 tf.nn.l2_loss(var), wd, name='weight_loss')
422 |             tf.add_to_collection(collection_name, weight_decay)
423 |         _variable_summaries(var)
424 |         return var
425 | 
426 |     def _add_wd_and_summary(self, var, wd, collection_name=None):
427 |         if collection_name is None:
428 |             collection_name = tf.GraphKeys.REGULARIZATION_LOSSES
429 |         if wd and (not tf.get_variable_scope().reuse):
430 |             weight_decay = tf.multiply(
431 |                 tf.nn.l2_loss(var), wd, name='weight_loss')
432 |             tf.add_to_collection(collection_name, weight_decay)
433 |         _variable_summaries(var)
434 |         return var
435 | 
436 |     def _bias_variable(self, shape, constant=0.0):
437 |         initializer = tf.constant_initializer(constant)
438 |         var = tf.get_variable(name='biases', shape=shape,
439 |                               initializer=initializer)
440 |         _variable_summaries(var)
441 |         return var
442 | 
443 |     def get_fc_weight_reshape(self, name, shape, num_classes=None):
444 |         print('Layer name: %s' % name)
445 |         print('Layer shape: %s' % shape)
446 |         weights = self.data_dict[name][0]
447 |         weights = weights.reshape(shape)
448 |         if num_classes is not None:
449 |             weights = self._summary_reshape(weights, shape,
450 |                                             num_new=num_classes)
451 |         init = tf.constant_initializer(value=weights,
452 |                                        dtype=tf.float32)
453 |         var = tf.get_variable(name="weights", initializer=init, shape=shape)
454 |         return var
455 | 
456 | 
457 | def _activation_summary(x):
458 |     """Helper to create summaries for activations.
459 | 
460 |     Creates a summary that provides a histogram of activations.
461 |     Creates a summary that measures the sparsity of activations.
462 | 
463 |     Args:
464 |       x: Tensor
465 |     Returns:
466 |       nothing
467 |     """
468 |     # Remove 'tower_[0-9]/' from the name in case this is a multi-GPU training
469 |     # session. This helps the clarity of presentation on tensorboard.
470 |     tensor_name = x.op.name
471 |     # tensor_name = re.sub('%s_[0-9]*/' % TOWER_NAME, '', x.op.name)
472 |     tf.summary.histogram(tensor_name + '/activations', x)
473 |     tf.summary.scalar(tensor_name + '/sparsity', tf.nn.zero_fraction(x))
474 | 
475 | 
476 | def _variable_summaries(var):
477 |     """Attach a lot of summaries to a Tensor."""
478 |     if not tf.get_variable_scope().reuse:
479 |         name = var.op.name
480 |         logging.info("Creating Summary for: %s" % name)
481 |         with tf.name_scope('summaries'):
482 |             mean = tf.reduce_mean(var)
483 |             tf.summary.scalar(name + '/mean', mean)
484 |             with tf.name_scope('stddev'):
485 |                 stddev = tf.sqrt(tf.reduce_mean(tf.square(var - mean)))
486 |             tf.summary.scalar(name + '/stddev', stddev)
487 |             tf.summary.scalar(name + '/max', tf.reduce_max(var))
488 |             tf.summary.scalar(name + '/min', tf.reduce_min(var))
489 |             tf.summary.histogram(name, var)
490 | 
--------------------------------------------------------------------------------
/loss.py:
--------------------------------------------------------------------------------
1 | """This module provides a softmax cross entropy loss for training FCN.
2 | 
3 | In order to train the network, first build the model and then apply
4 | `vgg_fcn.upscore` to the loss. The loss function can be used in combination
5 | with any optimizer (e.g. Adam) to finetune the whole model.
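A minimal usage sketch (assuming the graph has been built and `labels` is a
one-hot tensor of the same shape as the logits; names are illustrative):

    total_loss = loss(vgg_fcn.upscore, labels, num_classes=20)
    train_op = tf.train.AdamOptimizer(1e-6).minimize(total_loss)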
14 | """
15 | 
16 | from __future__ import absolute_import
17 | from __future__ import division
18 | from __future__ import print_function
19 | 
20 | import tensorflow as tf
21 | 
22 | 
23 | def loss(logits, labels, num_classes, head=None):
24 |     """Calculate the loss from the logits and the labels.
25 | 
26 |     Args:
27 |         logits: tensor, float - [batch_size, height, width, num_classes].
28 |             Use vgg_fcn.upscore as logits.
29 |         labels: Labels tensor, int32 - [batch_size, height, width, num_classes].
30 |             The ground truth of your data.
31 |         head: numpy array - [num_classes]
32 |             Optional weighting of the loss of each class;
33 |             can be used to prioritize some classes.
34 | 
35 |     Returns:
36 |         loss: Loss tensor of type float.
37 |     """
38 |     with tf.name_scope('loss'):
39 |         logits = tf.reshape(logits, (-1, num_classes))
40 |         epsilon = tf.constant(value=1e-4)
41 |         labels = tf.to_float(tf.reshape(labels, (-1, num_classes)))
42 | 
43 |         softmax = tf.nn.softmax(logits) + epsilon
44 | 
45 |         if head is not None:
46 |             cross_entropy = -tf.reduce_sum(tf.multiply(labels * tf.log(softmax),
47 |                                            head), reduction_indices=[1])
48 |         else:
49 |             cross_entropy = -tf.reduce_sum(
50 |                 labels * tf.log(softmax), reduction_indices=[1])
51 | 
52 |         cross_entropy_mean = tf.reduce_mean(cross_entropy,
53 |                                             name='xentropy_mean')
54 |         tf.add_to_collection('losses', cross_entropy_mean)
55 | 
56 |         loss = tf.add_n(tf.get_collection('losses'), name='total_loss')
57 |     return loss
58 | 
--------------------------------------------------------------------------------
/requirements.txt:
--------------------------------------------------------------------------------
1 | matplotlib>=1.5.1
2 | numpy>=1.11.1
3 | Pillow>=3.3.0
4 | scipy>=0.17.1
--------------------------------------------------------------------------------
/test_data/tabby_cat.png:
--------------------------------------------------------------------------------
https://raw.githubusercontent.com/MarvinTeichmann/tensorflow-fcn/83a828382f7eaeda584357a56094c78d9fa13013/test_data/tabby_cat.png
--------------------------------------------------------------------------------
/test_fcn16_vgg.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | import os
4 | import scipy as scp
5 | import scipy.misc
6 | 
7 | import numpy as np
8 | import logging
9 | import tensorflow as tf
10 | import sys
11 | 
12 | import fcn16_vgg
13 | import utils
14 | 
15 | logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
16 |                     level=logging.INFO,
17 |                     stream=sys.stdout)
18 | 
19 | from tensorflow.python.framework import ops
20 | 
21 | img1 = scp.misc.imread("./test_data/tabby_cat.png")
22 | 
23 | with tf.Session() as sess:
24 |     images = tf.placeholder("float")
25 |     feed_dict = {images: img1}
26 |     batch_images = tf.expand_dims(images, 0)
27 | 
28 |     vgg_fcn = fcn16_vgg.FCN16VGG()
29 |     with tf.name_scope("content_vgg"):
30 |         vgg_fcn.build(batch_images, debug=True)
31 | 
32 |     print('Finished building Network.')
33 | 
34 |     logging.warning("Score weights are initialized randomly.")
35 |     logging.warning("Do not expect meaningful results.")
36 | 
37 |     logging.info("Start initializing variables.")
38 | 
39 |     init = tf.global_variables_initializer()
40 |     sess.run(init)
41 | 
42 |     print('Running the Network')
43 |     tensors = [vgg_fcn.pred, vgg_fcn.pred_up]
44 |     down, up = sess.run(tensors, feed_dict=feed_dict)
45 | 
46 |     down_color = utils.color_image(down[0])
47 |     up_color = utils.color_image(up[0])
48 | 
49 |     scp.misc.imsave('fcn16_downsampled.png', down_color)
50 |     scp.misc.imsave('fcn16_upsampled.png', up_color)
51 | 
--------------------------------------------------------------------------------
/test_fcn32_vgg.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | import os
4 | import scipy as scp
5 | import scipy.misc
6 | 
7 | import numpy as np
8 | import tensorflow as tf
9 | 
10 | import fcn32_vgg
11 | import utils
12 | 
13 | from tensorflow.python.framework import ops
14 | 
15 | img1 = scp.misc.imread("./test_data/tabby_cat.png")
16 | 
17 | with tf.Session() as sess:
18 |     images = tf.placeholder("float")
19 |     feed_dict = {images: img1}
20 |     batch_images = tf.expand_dims(images, 0)
21 | 
22 |     vgg_fcn = fcn32_vgg.FCN32VGG()
23 |     with tf.name_scope("content_vgg"):
24 |         vgg_fcn.build(batch_images, debug=True)
25 | 
26 |     print('Finished building Network.')
27 | 
28 |     init = tf.global_variables_initializer()
29 |     sess.run(init)
30 | 
31 |     print('Running the Network')
32 |     tensors = [vgg_fcn.pred, vgg_fcn.pred_up]
33 |     down, up = sess.run(tensors, feed_dict=feed_dict)
34 | 
35 |     down_color = utils.color_image(down[0])
36 |     up_color = utils.color_image(up[0])
37 | 
38 |     scp.misc.imsave('fcn32_downsampled.png', down_color)
39 |     scp.misc.imsave('fcn32_upsampled.png', up_color)
40 | 
--------------------------------------------------------------------------------
/test_fcn8_vgg.py:
--------------------------------------------------------------------------------
1 | #!/usr/bin/env python
2 | 
3 | import os
4 | import scipy as scp
5 | import scipy.misc
6 | 
7 | import numpy as np
8 | import logging
9 | import tensorflow as tf
10 | import sys
11 | 
12 | import fcn8_vgg
13 | import utils
14 | 
15 | logging.basicConfig(format='%(asctime)s %(levelname)s %(message)s',
16 |                     level=logging.INFO,
17 |                     stream=sys.stdout)
18 | 
19 | from tensorflow.python.framework import ops
20 | 
21 | img1 = scp.misc.imread("./test_data/tabby_cat.png")
22 | 
23 | with tf.Session() as sess:
24 |     images = tf.placeholder("float")
25 |     feed_dict = {images: img1}
26 |     batch_images = tf.expand_dims(images, 0)
27 | 
28 |     vgg_fcn = fcn8_vgg.FCN8VGG()
29 |     with tf.name_scope("content_vgg"):
30 |         vgg_fcn.build(batch_images, debug=True)
31 | 
32 |     print('Finished building Network.')
33 | 
34 |     logging.warning("Score weights are initialized randomly.")
35 |     logging.warning("Do not expect meaningful results.")
36 | 
37 |     logging.info("Start initializing variables.")
38 | 
39 |     init = tf.global_variables_initializer()
40 |     sess.run(init)
41 | 
42 |     print('Running the Network')
43 |     tensors = [vgg_fcn.pred, vgg_fcn.pred_up]
44 |     down, up = sess.run(tensors, feed_dict=feed_dict)
45 | 
46 |     down_color = utils.color_image(down[0])
47 |     up_color = utils.color_image(up[0])
48 | 
49 |     scp.misc.imsave('fcn8_downsampled.png', down_color)
50 |     scp.misc.imsave('fcn8_upsampled.png', up_color)
51 | 
--------------------------------------------------------------------------------
/utils.py:
--------------------------------------------------------------------------------
1 | import numpy as np
2 | 
3 | 
4 | def color_image(image, num_classes=20):
5 |     import matplotlib as mpl
6 |     import matplotlib.cm
7 |     norm = mpl.colors.Normalize(vmin=0., vmax=num_classes)
8 |     mycm = mpl.cm.get_cmap('Set1')
9 |     return mycm(norm(image))
10 | 
--------------------------------------------------------------------------------