├── README.md ├── Weed Mapping.ipynb ├── builders ├── __init__.py ├── frontend_builder.py └── model_builder.py ├── frontends ├── __init__.py ├── conv_blocks.py ├── inception_utils.py ├── inception_v4.py ├── mobilenet_base.py ├── mobilenet_v2.py ├── resnet_utils.py ├── resnet_v1.py ├── resnet_v2.py └── se_resnext.py ├── iou_vs_epochs.png ├── models ├── AdapNet.py ├── BiSeNet.py ├── DDSC.py ├── DeepLabV3.py ├── DeepLabV3_plus.py ├── DenseASPP.py ├── Encoder_Decoder.py ├── FC_DenseNet_Tiramisu.py ├── FRRN.py ├── GCN.py ├── ICNet.py ├── MobileUNet.py ├── PSPNet.py ├── RefineNet.py ├── __init__.py └── custom_model.py ├── predict.py ├── test.py ├── train.py └── utils ├── __init__.py ├── get_pretrained_checkpoints.py ├── helpers.py └── utils.py /README.md: -------------------------------------------------------------------------------- 1 | ![banner cnns ppgcc ufsc](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/VC-lapix.png) 2 | 3 | # Weed-Mapping 4 | Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds using Convolutional Neural Networks 5 | 6 | 7 | ![alt-text-10](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/results2.png) 8 | 9 | ## Description 10 | This repository serves as a Weed Mapping Semantic Segmentation Suite. The goal is to make it easy to implement, train, and test new Semantic Segmentation models! 11 | 12 | It is based on the now-deprecated code repo at: https://github.com/GeorgeSeif/Semantic-Segmentation-Suite. We did not duplicate the whole repo and its data here; only the code necessary for our Weed Mapping application is included, modified and extended where needed. 13 | 14 | - The institutional code mirror repository for this work is at: https://codigos.ufsc.br/lapix/Weed-Mapping 15 | 16 | We also added a Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas. The datasets we employed in our experiments are here: 17 | 18 | - http://www.lapix.ufsc.br/weed-mapping-sugar-cane (Large Sugar Cane Field – Northern Brazil - contains weeds) 19 | - http://www.lapix.ufsc.br/crop-rows-sugar-cane (Large Sugar Cane Field – Northern Brazil - contains only well-behaved crops) 20 | 21 | This code repo is complete with the following: 22 | 23 | - Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas 24 | - Training and testing modes 25 | - Data augmentation 26 | - Several state-of-the-art models. Easily **plug and play** with different models 27 | - Able to use **any other** dataset besides our own 28 | - Evaluation including precision, recall, f1 score, average accuracy, per-class accuracy, and mean IoU 29 | - Plotting of loss function and accuracy over epochs 30 | 31 | **Any suggestions to improve this repository, including any new segmentation models you would like to see, are welcome!** 32 | 33 | ## Frontends 34 | 35 | The following feature extraction models are currently made available: 36 | 37 | - [MobileNetV2](https://arxiv.org/abs/1801.04381), [ResNet50/101/152](https://arxiv.org/abs/1512.03385), and [InceptionV4](https://arxiv.org/abs/1602.07261) 38 | 39 | ## Models 40 | 41 | The following segmentation models are currently made available: 42 | 43 | - [Encoder-Decoder based on SegNet](https://arxiv.org/abs/1511.00561). This network uses a VGG-style encoder-decoder, where the upsampling in the decoder is done using transposed convolutions (a minimal sketch of one such upsampling step follows this item).
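The following is a minimal, illustrative sketch (not the code in `models/Encoder_Decoder.py`) of what one transposed-convolution upsampling step looks like in the `tf.contrib.slim` style used throughout this repository; the function name `decoder_upsample_block` and the filter sizes are invented for this example.

```python
import tensorflow as tf

slim = tf.contrib.slim

def decoder_upsample_block(net, num_filters):
    # Double the spatial resolution with a stride-2 transposed convolution,
    # then refine the upsampled feature map with a regular 3x3 convolution.
    net = slim.conv2d_transpose(net, num_filters, kernel_size=[3, 3], stride=[2, 2])
    net = slim.conv2d(net, num_filters, kernel_size=[3, 3])
    return net
```

In the skip-connection variant described in the next item, encoder feature maps are additionally added to the decoder features around such upsampling steps.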
44 | 45 | - [Encoder-Decoder with skip connections based on SegNet](https://arxiv.org/abs/1511.00561). This network uses a VGG-style encoder-decoder, where the upsampling in the decoder is done using transposed convolutions. In addition, it employs additive skip connections from the encoder to the decoder. 46 | 47 | - [Mobile UNet for Semantic Segmentation](https://arxiv.org/abs/1704.04861). Combines the ideas of MobileNets' depthwise separable convolutions with UNet to build a high-speed, low-parameter Semantic Segmentation model. 48 | 49 | - [Pyramid Scene Parsing Network](https://arxiv.org/abs/1612.01105). Exploits global context information through different-region-based context aggregation, applied through a pyramid pooling module together with the proposed pyramid scene parsing network (PSPNet). **Note that the original PSPNet uses a ResNet with dilated convolutions, but the one in this repository uses only a regular ResNet.** 50 | 51 | - [The One Hundred Layers Tiramisu: Fully Convolutional DenseNets for Semantic Segmentation](https://arxiv.org/abs/1611.09326). Uses a downsampling-upsampling style encoder-decoder network. Each stage, i.e. between the pooling layers, uses dense blocks. In addition, it uses concatenated skip connections from the encoder to the decoder. In the code, this is the FC-DenseNet model. 52 | 53 | - [Rethinking Atrous Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1706.05587). This is the DeepLabV3 network. Uses Atrous Spatial Pyramid Pooling to capture multi-scale context by using multiple atrous rates. This creates a large receptive field. 54 | 55 | - [RefineNet: Multi-Path Refinement Networks for High-Resolution Semantic Segmentation](https://arxiv.org/abs/1611.06612). A multi-path refinement network that explicitly exploits all the information available along the down-sampling process to enable high-resolution prediction using long-range residual connections. In this way, the deeper layers that capture high-level semantic features can be directly refined using fine-grained features from earlier convolutions. 56 | 57 | - [Full-Resolution Residual Networks for Semantic Segmentation in Street Scenes](https://arxiv.org/abs/1611.08323). Combines multi-scale context with pixel-level accuracy by using two processing streams within the network. The residual stream carries information at the full image resolution, enabling precise adherence to segment boundaries. The pooling stream undergoes a sequence of pooling operations 58 | to obtain robust features for recognition. The two streams are coupled at the full image resolution using residuals. In the code, this is the FRRN model. 59 | 60 | - [Large Kernel Matters -- Improve Semantic Segmentation by Global Convolutional Network](https://arxiv.org/abs/1703.02719). Proposes a Global Convolutional Network to address both the classification and localization issues for semantic segmentation. Uses large separable kernels to expand the receptive field, plus a boundary refinement block to further improve localization performance near boundaries. 61 | 62 | - [AdapNet: Adaptive Semantic Segmentation in Adverse Environmental Conditions](http://ais.informatik.uni-freiburg.de/publications/papers/valada17icra.pdf). Modifies the ResNet50 architecture by performing the lower-resolution processing using a multi-scale strategy with atrous convolutions. This is a slightly modified version using bilinear upscaling instead of transposed convolutions, as I found it gave better results. A short illustrative sketch of parallel atrous-convolution branches, as used by this model and by DeepLabV3, follows below.
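The sketch below illustrates the idea of parallel atrous (dilated) convolution branches mentioned in the DeepLabV3 and AdapNet entries above. It is a simplified illustration only, not the implementation in `models/DeepLabV3.py`; the function name `simple_atrous_pyramid` and the chosen rates are assumptions made for this example.

```python
import tensorflow as tf

slim = tf.contrib.slim

def simple_atrous_pyramid(net, num_filters=256, rates=(6, 12, 18)):
    # A 1x1 branch plus several 3x3 branches with increasing dilation rates;
    # concatenating them mixes context from several receptive-field sizes
    # without reducing the spatial resolution of the feature map.
    branches = [slim.conv2d(net, num_filters, [1, 1])]
    for rate in rates:
        branches.append(slim.conv2d(net, num_filters, [3, 3], rate=rate))
    return slim.conv2d(tf.concat(branches, axis=-1), num_filters, [1, 1])
```

Stacking such branches densely (DenseASPP, below) or applying them inside the backbone (AdapNet) are variations on the same idea.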
63 | 64 | - [ICNet for Real-Time Semantic Segmentation on High-Resolution Images](https://arxiv.org/abs/1704.08545). Proposes a compressed-PSPNet-based image cascade network (ICNet) that incorporates multi-resolution branches under proper label guidance to address the challenge of real-time inference. Most of the processing is done at low resolution for high speed, and the multi-scale auxiliary loss helps to get an accurate model. **Note that for this model, I have implemented the network but have not integrated its training yet** 65 | 66 | - [Encoder-Decoder with Atrous Separable Convolution for Semantic Image Segmentation](https://arxiv.org/abs/1802.02611). This is the DeepLabV3+ network, which adds a Decoder module on top of the regular DeepLabV3 model. 67 | 68 | - [DenseASPP for Semantic Segmentation in Street Scenes](http://openaccess.thecvf.com/content_cvpr_2018/html/Yang_DenseASPP_for_Semantic_CVPR_2018_paper.html). Combines many different scales using dilated convolutions with dense connections. 69 | 70 | - [Dense Decoder Shortcut Connections for Single-Pass Semantic Segmentation](http://openaccess.thecvf.com/content_cvpr_2018/html/Bilinski_Dense_Decoder_Shortcut_CVPR_2018_paper.html). Dense Decoder Shortcut Connections using dense connectivity in the decoder stage of the segmentation model. **Note: this network takes a bit of extra time to load due to the construction of the ResNeXt blocks** 71 | 72 | - [BiSeNet: Bilateral Segmentation Network for Real-time Semantic Segmentation](https://arxiv.org/abs/1808.00897). BiSeNet uses a Spatial Path with a small stride to preserve the spatial information and generate high-resolution features, while a parallel Context Path with a fast downsampling strategy obtains a sufficient receptive field. 73 | 74 | - Or make your own and plug and play! 75 | 76 | 77 | ## Files and Directories 78 | 79 | - **Weed Mapping.ipynb**: Jupyter Notebook with the whole high-level code necessary for training and predicting crop rows and weed areas 80 | 81 | - **train.py:** Training on the dataset of your choice. Default is CamVid 82 | 83 | - **test.py:** Testing on the dataset of your choice. Default is CamVid 84 | 85 | - **predict.py:** Use your newly trained model to run a prediction on a single image 86 | 87 | - **helpers.py:** Quick helper functions for data preparation and visualization 88 | 89 | - **utils.py:** Utilities for printing, debugging, testing, and evaluation 90 | 91 | - **models:** Folder containing all model files. Use this to build your models, or use a pre-built one 92 | 93 | - **CamVid:** The CamVid dataset for Semantic Segmentation as a test bed.
This is the 32-class version 94 | 95 | - **checkpoints:** Checkpoint files for each epoch during training 96 | 97 | - **Test:** Test results including images, per-class accuracies, precision, recall, and f1 score 98 | 99 | 100 | ## Installation 101 | This project has the following dependencies: 102 | 103 | - Numpy `sudo pip install numpy` 104 | 105 | - OpenCV Python `sudo apt-get install python-opencv` 106 | 107 | - TensorFlow `sudo pip install --upgrade tensorflow-gpu` 108 | 109 | ## Usage 110 | The only thing you have to do to get started is set up the folders in the following structure: 111 | 112 | ├── "dataset_name" 113 | | ├── train 114 | | ├── train_labels 115 | | ├── val 116 | | ├── val_labels 117 | | ├── test 118 | | ├── test_labels 119 | 120 | Put a text file under the dataset directory called "class_dict.csv" which contains the list of classes along with the R, G, B colour labels to visualize the segmentation results. This kind of dictionary is usually supplied with the dataset. Here is an example for the **Weed Mapping dataset**: 121 | 122 | ``` 123 | name,r,g,b 124 | SugarCane,0,255,0 125 | Soil,255,0,0 126 | Invasive,255,255,0 127 | ``` 128 | 129 | **Note:** If you are using any of the networks that rely on a pre-trained ResNet, then you will need to download the pre-trained weights using the provided script. These are currently: PSPNet, RefineNet, DeepLabV3, DeepLabV3+, GCN. 130 | 131 | Then you can simply run `train.py`! Check out the optional command line arguments: 132 | 133 | ``` 134 | usage: train.py [-h] [--num_epochs NUM_EPOCHS] 135 | [--checkpoint_step CHECKPOINT_STEP] 136 | [--validation_step VALIDATION_STEP] [--image IMAGE] 137 | [--continue_training CONTINUE_TRAINING] [--dataset DATASET] 138 | [--crop_height CROP_HEIGHT] [--crop_width CROP_WIDTH] 139 | [--batch_size BATCH_SIZE] [--num_val_images NUM_VAL_IMAGES] 140 | [--h_flip H_FLIP] [--v_flip V_FLIP] [--brightness BRIGHTNESS] 141 | [--rotation ROTATION] [--model MODEL] [--frontend FRONTEND] 142 | 143 | optional arguments: 144 | -h, --help show this help message and exit 145 | --num_epochs NUM_EPOCHS 146 | Number of epochs to train for 147 | --checkpoint_step CHECKPOINT_STEP 148 | How often to save checkpoints (epochs) 149 | --validation_step VALIDATION_STEP 150 | How often to perform validation (epochs) 151 | --image IMAGE The image you want to predict on. Only valid in 152 | "predict" mode. 153 | --continue_training CONTINUE_TRAINING 154 | Whether to continue training from a checkpoint 155 | --dataset DATASET Dataset you are using. 156 | --crop_height CROP_HEIGHT 157 | Height of cropped input image to network 158 | --crop_width CROP_WIDTH 159 | Width of cropped input image to network 160 | --batch_size BATCH_SIZE 161 | Number of images in each batch 162 | --num_val_images NUM_VAL_IMAGES 163 | The number of images to use for validation 164 | --h_flip H_FLIP Whether to randomly flip the image horizontally for 165 | data augmentation 166 | --v_flip V_FLIP Whether to randomly flip the image vertically for data 167 | augmentation 168 | --brightness BRIGHTNESS 169 | Whether to randomly change the image brightness for 170 | data augmentation. Specifies the max brightness change 171 | as a factor between 0.0 and 1.0. For example, 0.1 172 | represents a max brightness change of 10% (+-). 173 | --rotation ROTATION Whether to randomly rotate the image for data 174 | augmentation. Specifies the max rotation angle in 175 | degrees. 176 | --model MODEL The model you are using.
See model_builder.py for 177 | supported models 178 | --frontend FRONTEND The frontend you are using. See frontend_builder.py 179 | for supported models 180 | 181 | ``` 182 | 183 | 184 | 185 | ## Acknowledgements 186 | This work was the result of a collaborative effort of a team of engaged researchers: 187 | - Alexandre Monteiro 188 | - Paulo Cesar Pereira Junior 189 | - Antonio Carlos Sobieranski 190 | - Rafael da Luz Ribeiro 191 | 192 | 193 | ## Citing this Git 194 | 195 | 196 | ```tex 197 | @misc{WeedMappingCode2019, 198 | author = {Monteiro, A.A.O. and von Wangenheim, A.}, 199 | title = {Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds}, 200 | year = {2019}, 201 | publisher = {GitHub}, 202 | journal = {GitHub repository}, 203 | howpublished = {\url{https://github.com/awangenh/Weed-Mapping}} 204 | } 205 | ``` 206 | 207 | ![banner Creative Commons INCoD UFSC](http://www.lapix.ufsc.br/wp-content/uploads/2019/05/cc.png) 208 | 209 | -------------------------------------------------------------------------------- /Weed Mapping.ipynb: -------------------------------------------------------------------------------- 1 | { 2 | "cells": [ 3 | { 4 | "cell_type": "markdown", 5 | "metadata": {}, 6 | "source": [ 7 | "![banner cnns ppgcc ufsc](http://www.lapix.ufsc.br/wp-content/uploads/2019/06/VC-lapix.png)\n", 8 | "\n", 9 | "# Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds\n", 10 | "\n", 11 | "Notebook for Weed Mapping in Aerial Images through Identification and Segmentation of Crop Rows and Weeds using Convolutional Neural Networks " 12 | ] 13 | }, 14 | { 15 | "cell_type": "markdown", 16 | "metadata": {}, 17 | "source": [ 18 | "(Badges: Open in Colab, Creative Commons, Jupyter, Python)" 19 | ] 20 | }, 21 | { 22 | "cell_type": "markdown", 23 | "metadata": {}, 24 | "source": [ 25 | "## Initializations and general instructions\n", 26 | "\n", 27 | "Networks for semantic segmentation classify objects in images and are able to associate individual pixels of the images with the object class they represent, performing in practice a segmentation of the image according to the semantics of the object to which each individual pixel is associated.\n", 28 | "\n", 29 | "In this work we use our own dataset, containing RGB images of a sugarcane plantation, applied to four CNN models deployed in this repository, which was adapted from the now-deprecated code at: https://github.com/GeorgeSeif/Semantic-Segmentation-Suite by George Seif. This notebook assumes that you are using Google Colab. If not, please see the installation and usage instructions described in our repository at: \n", 30 | " - https://github.com/awangenh/Weed-Mapping or\n", 31 | " - https://codigos.ufsc.br/lapix/Weed-Mapping\n", 32 | "\n", 33 | "We used the models: **SegNet, UNet, FRRN and PSPNet**. Some ground truths and respective results are shown in the first figure of the repo.\n", 34 | "\n", 35 | "Everything you need to know about training, testing and making predictions on your dataset is explained in this notebook and in this repository, depending on which platform you are using. 
If you use Google Colab you don't need to install the TensorFlow framework.\n", 36 | "\n", 37 | "## Setting up your Dataset\n", 38 | "The first thing you need to accomplish is to organize the structure of the folders of your data as explained in the \"**Usage**\" part of the repository.\n", 39 | "Do not forget to edit the text file \"*class_dict.csv*\" with the classes specific to your data.\n", 40 | "\n", 41 | "Observe that our dataset was stored in a folder called *Dataset_ArticleBackground*. The code below reflects this. You will have to adapt the code to your environment.\n", 42 | "\n", 43 | "After that, you just need to upload the content to Google Drive.\n", 44 | "\n", 45 | "## Mounting your data:\n", 46 | "Next you need to define the place where all the scripts available in the repository and also your dataset are stored:\n" 47 | ] 48 | }, 49 | { 50 | "cell_type": "code", 51 | "execution_count": null, 52 | "metadata": {}, 53 | "outputs": [], 54 | "source": [ 55 | "# Code to mount Google Drive\n", 56 | "import os\n", 57 | "from google.colab import drive\n", 58 | "drive.mount('/content/drive')" 59 | ] 60 | }, 61 | { 62 | "cell_type": "markdown", 63 | "metadata": {}, 64 | "source": [ 65 | "# Check the processor\n", 66 | "\n", 67 | "To use the available GPU go to:\n", 68 | "\n", 69 | "Edit >> Notebook settings >> choose the Runtime type and GPU as Hardware accelerator.\n", 70 | "\n", 71 | "The code below is for you to check the version of the GPU being used." 72 | ] 73 | }, 74 | { 75 | "cell_type": "code", 76 | "execution_count": null, 77 | "metadata": {}, 78 | "outputs": [], 79 | "source": [ 80 | "!/opt/bin/nvidia-smi\n", 81 | "!nvcc --version" 82 | ] 83 | }, 84 | { 85 | "cell_type": "markdown", 86 | "metadata": {}, 87 | "source": [ 88 | "\n", 89 | "# Train the model\n", 90 | "\n", 91 | "Access the directory where you mounted your project and call the script to run the training of the model:\n", 92 | "\n", 93 | "In our work this is **train_balancing_metrics.py**.\n", 94 | "\n", 95 | "You also need to pass some parameters. 
We used the following:\n", 96 | "\n", 97 | "\n", 98 | "\n", 99 | "* num_epochs = 200\n", 100 | "\n", 101 | "* dataset = \"The folder where our dataset is located\"\n", 102 | "\n", 103 | "* num_val_images = 44, the number of images in our validation set\n", 104 | "\n", 105 | "* h_flip and v_flip = True, to use these data augmentation operations\n", 106 | "\n", 107 | "* model = \"FRRN-B\", or any other model chosen\n", 108 | "\n", 109 | "* batch_size = 3 (worked for us!)\n", 110 | "\n", 111 | "* continue_training = False, to start training from the beginning\n", 112 | "\n", 113 | "In the repository mentioned above, there is an explanation of all the parameters that can be used.\n", 114 | "\n" 115 | ] 116 | }, 117 | { 118 | "cell_type": "code", 119 | "execution_count": null, 120 | "metadata": {}, 121 | "outputs": [], 122 | "source": [ 123 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 124 | "\n", 125 | "!python train_balancing_metrics.py --num_epochs=200 --dataset=\"Dataset_ArticleBackground\" --num_val_images=44 --h_flip=True --v_flip=True --model=\"DeepLabV3\" --batch_size=3 --continue_training=False" 126 | ] 127 | }, 128 | { 129 | "cell_type": "markdown", 130 | "metadata": {}, 131 | "source": [ 132 | "# Test the model\n", 133 | "\n", 134 | "Here is the code to test your model on your test set.\n", 135 | "\n", 136 | "Call the test script (**test.py**) and pass the parameters.\n", 137 | "\n", 138 | "The **checkpoint_path** is the path where the weights for that trained model are located." 139 | ] 140 | }, 141 | { 142 | "cell_type": "code", 143 | "execution_count": null, 144 | "metadata": {}, 145 | "outputs": [], 146 | "source": [ 147 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 148 | "\n", 149 | "!python test.py --dataset=\"Dataset_ArticleBackground\" --model=\"FRRN-B\" --checkpoint_path='checkpoints/latest_model_FRRN-B_Dataset_ArticleBackground.ckpt' " 150 | ] 151 | }, 152 | { 153 | "cell_type": "markdown", 154 | "metadata": {}, 155 | "source": [ 156 | "# Make a Prediction\n", 157 | "\n", 158 | "This code is used when you want to make a prediction for new single images.\n", 159 | "\n", 160 | "Call **predict.py** with the correct parameters." 
161 | ] 162 | }, 163 | { 164 | "cell_type": "code", 165 | "execution_count": null, 166 | "metadata": {}, 167 | "outputs": [], 168 | "source": [ 169 | "%cd /content/drive/My\\ Drive/DeepLearning/Semantic-Segmentation-Suite-master/\n", 170 | "\n", 171 | "!python predict.py --dataset=\"Dataset_ArticleBackground\" --model=\"FRRN-B\" --checkpoint_path='checkpoints/latest_model_FRRN-B_Dataset_ArticleBackground.ckpt' --crop_height=512 --crop_width=512 --image=\"Dataset_ArticleBackground/test/115.png\"" 172 | ] 173 | }, 174 | { 175 | "cell_type": "markdown", 176 | "metadata": {}, 177 | "source": [ 178 | "![banner Creative Commons INCoD UFSC](http://www.lapix.ufsc.br/wp-content/uploads/2019/05/cc.png)" 179 | ] 180 | } 181 | ], 182 | "metadata": { 183 | "kernelspec": { 184 | "display_name": "Python 3", 185 | "language": "python", 186 | "name": "python3" 187 | }, 188 | "language_info": { 189 | "codemirror_mode": { 190 | "name": "ipython", 191 | "version": 3 192 | }, 193 | "file_extension": ".py", 194 | "mimetype": "text/x-python", 195 | "name": "python", 196 | "nbconvert_exporter": "python", 197 | "pygments_lexer": "ipython3", 198 | "version": "3.7.1" 199 | }, 200 | "varInspector": { 201 | "cols": { 202 | "lenName": "20", 203 | "lenType": "20", 204 | "lenVar": "60" 205 | }, 206 | "kernels_config": { 207 | "python": { 208 | "delete_cmd_postfix": "", 209 | "delete_cmd_prefix": "del ", 210 | "library": "var_list.py", 211 | "varRefreshCmd": "print(var_dic_list())" 212 | }, 213 | "r": { 214 | "delete_cmd_postfix": ") ", 215 | "delete_cmd_prefix": "rm(", 216 | "library": "var_list.r", 217 | "varRefreshCmd": "cat(var_dic_list()) " 218 | } 219 | }, 220 | "types_to_exclude": [ 221 | "module", 222 | "function", 223 | "builtin_function_or_method", 224 | "instance", 225 | "_Feature" 226 | ], 227 | "window_display": false 228 | } 229 | }, 230 | "nbformat": 4, 231 | "nbformat_minor": 2 232 | } 233 | -------------------------------------------------------------------------------- /builders/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/builders/__init__.py -------------------------------------------------------------------------------- /builders/frontend_builder.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from frontends import resnet_v2 4 | from frontends import mobilenet_v2 5 | from frontends import inception_v4 6 | import os 7 | 8 | 9 | def build_frontend(inputs, frontend, is_training=True, pretrained_dir="models"): 10 | if frontend == 'ResNet50': 11 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 12 | logits, end_points = resnet_v2.resnet_v2_50(inputs, is_training=is_training, scope='resnet_v2_50') 13 | frontend_scope='resnet_v2_50' 14 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_50.ckpt'), var_list=slim.get_model_variables('resnet_v2_50'), ignore_missing_vars=True) 15 | elif frontend == 'ResNet101': 16 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 17 | logits, end_points = resnet_v2.resnet_v2_101(inputs, is_training=is_training, scope='resnet_v2_101') 18 | frontend_scope='resnet_v2_101' 19 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_101.ckpt'), var_list=slim.get_model_variables('resnet_v2_101'), ignore_missing_vars=True) 20 | elif frontend == 
'ResNet152': 21 | with slim.arg_scope(resnet_v2.resnet_arg_scope()): 22 | logits, end_points = resnet_v2.resnet_v2_152(inputs, is_training=is_training, scope='resnet_v2_152') 23 | frontend_scope='resnet_v2_152' 24 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'resnet_v2_152.ckpt'), var_list=slim.get_model_variables('resnet_v2_152'), ignore_missing_vars=True) 25 | elif frontend == 'MobileNetV2': 26 | with slim.arg_scope(mobilenet_v2.training_scope()): 27 | logits, end_points = mobilenet_v2.mobilenet(inputs, is_training=is_training, scope='mobilenet_v2', base_only=True) 28 | frontend_scope='mobilenet_v2' 29 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'mobilenet_v2.ckpt'), var_list=slim.get_model_variables('mobilenet_v2'), ignore_missing_vars=True) 30 | elif frontend == 'InceptionV4': 31 | with slim.arg_scope(inception_v4.inception_v4_arg_scope()): 32 | logits, end_points = inception_v4.inception_v4(inputs, is_training=is_training, scope='inception_v4') 33 | frontend_scope='inception_v4' 34 | init_fn = slim.assign_from_checkpoint_fn(model_path=os.path.join(pretrained_dir, 'inception_v4.ckpt'), var_list=slim.get_model_variables('inception_v4'), ignore_missing_vars=True) 35 | else: 36 | raise ValueError("Unsupported frontend model '%s'. This function only supports ResNet50, ResNet101, ResNet152, MobileNetV2, and InceptionV4" % (frontend)) 37 | 38 | return logits, end_points, frontend_scope, init_fn -------------------------------------------------------------------------------- /builders/model_builder.py: -------------------------------------------------------------------------------- 1 | import sys, os 2 | import tensorflow as tf 3 | import subprocess 4 | 5 | sys.path.append("models") 6 | from models.FC_DenseNet_Tiramisu import build_fc_densenet 7 | from models.Encoder_Decoder import build_encoder_decoder 8 | from models.RefineNet import build_refinenet 9 | from models.FRRN import build_frrn 10 | from models.MobileUNet import build_mobile_unet 11 | from models.PSPNet import build_pspnet 12 | from models.GCN import build_gcn 13 | from models.DeepLabV3 import build_deeplabv3 14 | from models.DeepLabV3_plus import build_deeplabv3_plus 15 | from models.AdapNet import build_adaptnet 16 | from models.custom_model import build_custom 17 | from models.DenseASPP import build_dense_aspp 18 | from models.DDSC import build_ddsc 19 | from models.BiSeNet import build_bisenet 20 | 21 | SUPPORTED_MODELS = ["FC-DenseNet56", "FC-DenseNet67", "FC-DenseNet103", "Encoder-Decoder", "Encoder-Decoder-Skip", "RefineNet", 22 | "FRRN-A", "FRRN-B", "MobileUNet", "MobileUNet-Skip", "PSPNet", "GCN", "DeepLabV3", "DeepLabV3_plus", "AdapNet", 23 | "DenseASPP", "DDSC", "BiSeNet", "custom"] 24 | 25 | SUPPORTED_FRONTENDS = ["ResNet50", "ResNet101", "ResNet152", "MobileNetV2", "InceptionV4"] 26 | 27 | def download_checkpoints(model_name): 28 | subprocess.check_output(["python", "utils/get_pretrained_checkpoints.py", "--model=" + model_name]) 29 | 30 | 31 | 32 | def build_model(model_name, net_input, num_classes, crop_width, crop_height, frontend="ResNet101", is_training=True): 33 | # Get the selected model. 34 | # Some of them require pre-trained ResNet 35 | 36 | print("Preparing the model ...") 37 | 38 | if model_name not in SUPPORTED_MODELS: 39 | raise ValueError("The model you selected is not supported. 
The following models are currently supported: {0}".format(SUPPORTED_MODELS)) 40 | 41 | if frontend not in SUPPORTED_FRONTENDS: 42 | raise ValueError("The frontend you selected is not supported. The following models are currently supported: {0}".format(SUPPORTED_FRONTENDS)) 43 | 44 | if "ResNet50" == frontend and not os.path.isfile("models/resnet_v2_50.ckpt"): 45 | download_checkpoints("ResNet50") 46 | if "ResNet101" == frontend and not os.path.isfile("models/resnet_v2_101.ckpt"): 47 | download_checkpoints("ResNet101") 48 | if "ResNet152" == frontend and not os.path.isfile("models/resnet_v2_152.ckpt"): 49 | download_checkpoints("ResNet152") 50 | if "MobileNetV2" == frontend and not os.path.isfile("models/mobilenet_v2.ckpt.data-00000-of-00001"): 51 | download_checkpoints("MobileNetV2") 52 | if "InceptionV4" == frontend and not os.path.isfile("models/inception_v4.ckpt"): 53 | download_checkpoints("InceptionV4") 54 | 55 | network = None 56 | init_fn = None 57 | if model_name == "FC-DenseNet56" or model_name == "FC-DenseNet67" or model_name == "FC-DenseNet103": 58 | network = build_fc_densenet(net_input, preset_model = model_name, num_classes=num_classes) 59 | elif model_name == "RefineNet": 60 | # RefineNet requires pre-trained ResNet weights 61 | network, init_fn = build_refinenet(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 62 | elif model_name == "FRRN-A" or model_name == "FRRN-B": 63 | network = build_frrn(net_input, preset_model = model_name, num_classes=num_classes) 64 | elif model_name == "Encoder-Decoder" or model_name == "Encoder-Decoder-Skip": 65 | network = build_encoder_decoder(net_input, preset_model = model_name, num_classes=num_classes) 66 | elif model_name == "MobileUNet" or model_name == "MobileUNet-Skip": 67 | network = build_mobile_unet(net_input, preset_model = model_name, num_classes=num_classes) 68 | elif model_name == "PSPNet": 69 | # Image size is required for PSPNet 70 | # PSPNet requires pre-trained ResNet weights 71 | network, init_fn = build_pspnet(net_input, label_size=[crop_height, crop_width], preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 72 | elif model_name == "GCN": 73 | # GCN requires pre-trained ResNet weights 74 | network, init_fn = build_gcn(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 75 | elif model_name == "DeepLabV3": 76 | # DeepLabV requires pre-trained ResNet weights 77 | network, init_fn = build_deeplabv3(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 78 | elif model_name == "DeepLabV3_plus": 79 | # DeepLabV3+ requires pre-trained ResNet weights 80 | network, init_fn = build_deeplabv3_plus(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 81 | elif model_name == "DenseASPP": 82 | # DenseASPP requires pre-trained ResNet weights 83 | network, init_fn = build_dense_aspp(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 84 | elif model_name == "DDSC": 85 | # DDSC requires pre-trained ResNet weights 86 | network, init_fn = build_ddsc(net_input, preset_model = model_name, frontend=frontend, num_classes=num_classes, is_training=is_training) 87 | elif model_name == "BiSeNet": 88 | # BiSeNet requires pre-trained ResNet weights 89 | network, init_fn = build_bisenet(net_input, preset_model = model_name, 
frontend=frontend, num_classes=num_classes, is_training=is_training) 90 | elif model_name == "AdapNet": 91 | network = build_adaptnet(net_input, num_classes=num_classes) 92 | elif model_name == "custom": 93 | network = build_custom(net_input, num_classes) 94 | else: 95 | raise ValueError("Error: the model %d is not available. Try checking which models are available using the command python main.py --help") 96 | 97 | return network, init_fn -------------------------------------------------------------------------------- /frontends/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/frontends/__init__.py -------------------------------------------------------------------------------- /frontends/conv_blocks.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Convolution blocks for mobilenet.""" 16 | import contextlib 17 | import functools 18 | 19 | import tensorflow as tf 20 | 21 | slim = tf.contrib.slim 22 | 23 | 24 | def _fixed_padding(inputs, kernel_size, rate=1): 25 | """Pads the input along the spatial dimensions independently of input size. 26 | 27 | Pads the input such that if it was used in a convolution with 'VALID' padding, 28 | the output would have the same dimensions as if the unpadded input was used 29 | in a convolution with 'SAME' padding. 30 | 31 | Args: 32 | inputs: A tensor of size [batch, height_in, width_in, channels]. 33 | kernel_size: The kernel to be used in the conv2d or max_pool2d operation. 34 | rate: An integer, rate for atrous convolution. 35 | 36 | Returns: 37 | output: A tensor of size [batch, height_out, width_out, channels] with the 38 | input, either intact (if kernel_size == 1) or padded (if kernel_size > 1). 39 | """ 40 | kernel_size_effective = [kernel_size[0] + (kernel_size[0] - 1) * (rate - 1), 41 | kernel_size[0] + (kernel_size[0] - 1) * (rate - 1)] 42 | pad_total = [kernel_size_effective[0] - 1, kernel_size_effective[1] - 1] 43 | pad_beg = [pad_total[0] // 2, pad_total[1] // 2] 44 | pad_end = [pad_total[0] - pad_beg[0], pad_total[1] - pad_beg[1]] 45 | padded_inputs = tf.pad(inputs, [[0, 0], [pad_beg[0], pad_end[0]], 46 | [pad_beg[1], pad_end[1]], [0, 0]]) 47 | return padded_inputs 48 | 49 | 50 | def _make_divisible(v, divisor, min_value=None): 51 | if min_value is None: 52 | min_value = divisor 53 | new_v = max(min_value, int(v + divisor / 2) // divisor * divisor) 54 | # Make sure that round down does not go down by more than 10%. 
55 | if new_v < 0.9 * v: 56 | new_v += divisor 57 | return new_v 58 | 59 | 60 | def _split_divisible(num, num_ways, divisible_by=8): 61 | """Evenly splits num, num_ways so each piece is a multiple of divisible_by.""" 62 | assert num % divisible_by == 0 63 | assert num / num_ways >= divisible_by 64 | # Note: want to round down, we adjust each split to match the total. 65 | base = num // num_ways // divisible_by * divisible_by 66 | result = [] 67 | accumulated = 0 68 | for i in range(num_ways): 69 | r = base 70 | while accumulated + r < num * (i + 1) / num_ways: 71 | r += divisible_by 72 | result.append(r) 73 | accumulated += r 74 | assert accumulated == num 75 | return result 76 | 77 | 78 | @contextlib.contextmanager 79 | def _v1_compatible_scope_naming(scope): 80 | if scope is None: # Create uniqified separable blocks. 81 | with tf.variable_scope(None, default_name='separable') as s, \ 82 | tf.name_scope(s.original_name_scope): 83 | yield '' 84 | else: 85 | # We use scope_depthwise, scope_pointwise for compatibility with V1 ckpts. 86 | # which provide numbered scopes. 87 | scope += '_' 88 | yield scope 89 | 90 | 91 | @slim.add_arg_scope 92 | def split_separable_conv2d(input_tensor, 93 | num_outputs, 94 | scope=None, 95 | normalizer_fn=None, 96 | stride=1, 97 | rate=1, 98 | endpoints=None, 99 | use_explicit_padding=False): 100 | """Separable mobilenet V1 style convolution. 101 | 102 | Depthwise convolution, with default non-linearity, 103 | followed by 1x1 depthwise convolution. This is similar to 104 | slim.separable_conv2d, but differs in tha it applies batch 105 | normalization and non-linearity to depthwise. This matches 106 | the basic building of Mobilenet Paper 107 | (https://arxiv.org/abs/1704.04861) 108 | 109 | Args: 110 | input_tensor: input 111 | num_outputs: number of outputs 112 | scope: optional name of the scope. Note if provided it will use 113 | scope_depthwise for deptwhise, and scope_pointwise for pointwise. 114 | normalizer_fn: which normalizer function to use for depthwise/pointwise 115 | stride: stride 116 | rate: output rate (also known as dilation rate) 117 | endpoints: optional, if provided, will export additional tensors to it. 118 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 119 | inputs so that the output dimensions are the same as if 'SAME' padding 120 | were used. 
121 | 122 | Returns: 123 | output tesnor 124 | """ 125 | 126 | with _v1_compatible_scope_naming(scope) as scope: 127 | dw_scope = scope + 'depthwise' 128 | endpoints = endpoints if endpoints is not None else {} 129 | kernel_size = [3, 3] 130 | padding = 'SAME' 131 | if use_explicit_padding: 132 | padding = 'VALID' 133 | input_tensor = _fixed_padding(input_tensor, kernel_size, rate) 134 | net = slim.separable_conv2d( 135 | input_tensor, 136 | None, 137 | kernel_size, 138 | depth_multiplier=1, 139 | stride=stride, 140 | rate=rate, 141 | normalizer_fn=normalizer_fn, 142 | padding=padding, 143 | scope=dw_scope) 144 | 145 | endpoints[dw_scope] = net 146 | 147 | pw_scope = scope + 'pointwise' 148 | net = slim.conv2d( 149 | net, 150 | num_outputs, [1, 1], 151 | stride=1, 152 | normalizer_fn=normalizer_fn, 153 | scope=pw_scope) 154 | endpoints[pw_scope] = net 155 | return net 156 | 157 | 158 | def expand_input_by_factor(n, divisible_by=8): 159 | return lambda num_inputs, **_: _make_divisible(num_inputs * n, divisible_by) 160 | 161 | 162 | @slim.add_arg_scope 163 | def expanded_conv(input_tensor, 164 | num_outputs, 165 | expansion_size=expand_input_by_factor(6), 166 | stride=1, 167 | rate=1, 168 | kernel_size=(3, 3), 169 | residual=True, 170 | normalizer_fn=None, 171 | project_activation_fn=tf.identity, 172 | split_projection=1, 173 | split_expansion=1, 174 | expansion_transform=None, 175 | depthwise_location='expansion', 176 | depthwise_channel_multiplier=1, 177 | endpoints=None, 178 | use_explicit_padding=False, 179 | padding='SAME', 180 | scope=None): 181 | """Depthwise Convolution Block with expansion. 182 | 183 | Builds a composite convolution that has the following structure 184 | expansion (1x1) -> depthwise (kernel_size) -> projection (1x1) 185 | 186 | Args: 187 | input_tensor: input 188 | num_outputs: number of outputs in the final layer. 189 | expansion_size: the size of expansion, could be a constant or a callable. 190 | If latter it will be provided 'num_inputs' as an input. For forward 191 | compatibility it should accept arbitrary keyword arguments. 192 | Default will expand the input by factor of 6. 193 | stride: depthwise stride 194 | rate: depthwise rate 195 | kernel_size: depthwise kernel 196 | residual: whether to include residual connection between input 197 | and output. 198 | normalizer_fn: batchnorm or otherwise 199 | project_activation_fn: activation function for the project layer 200 | split_projection: how many ways to split projection operator 201 | (that is conv expansion->bottleneck) 202 | split_expansion: how many ways to split expansion op 203 | (that is conv bottleneck->expansion) ops will keep depth divisible 204 | by this value. 205 | expansion_transform: Optional function that takes expansion 206 | as a single input and returns output. 207 | depthwise_location: where to put depthwise covnvolutions supported 208 | values None, 'input', 'output', 'expansion' 209 | depthwise_channel_multiplier: depthwise channel multiplier: 210 | each input will replicated (with different filters) 211 | that many times. So if input had c channels, 212 | output will have c x depthwise_channel_multpilier. 213 | endpoints: An optional dictionary into which intermediate endpoints are 214 | placed. The keys "expansion_output", "depthwise_output", 215 | "projection_output" and "expansion_transform" are always populated, even 216 | if the corresponding functions are not invoked. 
217 | use_explicit_padding: Use 'VALID' padding for convolutions, but prepad 218 | inputs so that the output dimensions are the same as if 'SAME' padding 219 | were used. 220 | padding: Padding type to use if `use_explicit_padding` is not set. 221 | scope: optional scope. 222 | 223 | Returns: 224 | Tensor of depth num_outputs 225 | 226 | Raises: 227 | TypeError: on inval 228 | """ 229 | with tf.variable_scope(scope, default_name='expanded_conv') as s, \ 230 | tf.name_scope(s.original_name_scope): 231 | prev_depth = input_tensor.get_shape().as_list()[3] 232 | if depthwise_location not in [None, 'input', 'output', 'expansion']: 233 | raise TypeError('%r is unknown value for depthwise_location' % 234 | depthwise_location) 235 | if use_explicit_padding: 236 | if padding != 'SAME': 237 | raise TypeError('`use_explicit_padding` should only be used with ' 238 | '"SAME" padding.') 239 | padding = 'VALID' 240 | depthwise_func = functools.partial( 241 | slim.separable_conv2d, 242 | num_outputs=None, 243 | kernel_size=kernel_size, 244 | depth_multiplier=depthwise_channel_multiplier, 245 | stride=stride, 246 | rate=rate, 247 | normalizer_fn=normalizer_fn, 248 | padding=padding, 249 | scope='depthwise') 250 | # b1 -> b2 * r -> b2 251 | # i -> (o * r) (bottleneck) -> o 252 | input_tensor = tf.identity(input_tensor, 'input') 253 | net = input_tensor 254 | 255 | if depthwise_location == 'input': 256 | if use_explicit_padding: 257 | net = _fixed_padding(net, kernel_size, rate) 258 | net = depthwise_func(net, activation_fn=None) 259 | 260 | if callable(expansion_size): 261 | inner_size = expansion_size(num_inputs=prev_depth) 262 | else: 263 | inner_size = expansion_size 264 | 265 | if inner_size > net.shape[3]: 266 | net = split_conv( 267 | net, 268 | inner_size, 269 | num_ways=split_expansion, 270 | scope='expand', 271 | stride=1, 272 | normalizer_fn=normalizer_fn) 273 | net = tf.identity(net, 'expansion_output') 274 | if endpoints is not None: 275 | endpoints['expansion_output'] = net 276 | 277 | if depthwise_location == 'expansion': 278 | if use_explicit_padding: 279 | net = _fixed_padding(net, kernel_size, rate) 280 | net = depthwise_func(net) 281 | 282 | net = tf.identity(net, name='depthwise_output') 283 | if endpoints is not None: 284 | endpoints['depthwise_output'] = net 285 | if expansion_transform: 286 | net = expansion_transform(expansion_tensor=net, input_tensor=input_tensor) 287 | # Note in contrast with expansion, we always have 288 | # projection to produce the desired output size. 
289 | net = split_conv( 290 | net, 291 | num_outputs, 292 | num_ways=split_projection, 293 | stride=1, 294 | scope='project', 295 | normalizer_fn=normalizer_fn, 296 | activation_fn=project_activation_fn) 297 | if endpoints is not None: 298 | endpoints['projection_output'] = net 299 | if depthwise_location == 'output': 300 | if use_explicit_padding: 301 | net = _fixed_padding(net, kernel_size, rate) 302 | net = depthwise_func(net, activation_fn=None) 303 | 304 | if callable(residual): # custom residual 305 | net = residual(input_tensor=input_tensor, output_tensor=net) 306 | elif (residual and 307 | # stride check enforces that we don't add residuals when spatial 308 | # dimensions are None 309 | stride == 1 and 310 | # Depth matches 311 | net.get_shape().as_list()[3] == 312 | input_tensor.get_shape().as_list()[3]): 313 | net += input_tensor 314 | return tf.identity(net, name='output') 315 | 316 | 317 | def split_conv(input_tensor, 318 | num_outputs, 319 | num_ways, 320 | scope, 321 | divisible_by=8, 322 | **kwargs): 323 | """Creates a split convolution. 324 | 325 | Split convolution splits the input and output into 326 | 'num_blocks' blocks of approximately the same size each, 327 | and only connects $i$-th input to $i$ output. 328 | 329 | Args: 330 | input_tensor: input tensor 331 | num_outputs: number of output filters 332 | num_ways: num blocks to split by. 333 | scope: scope for all the operators. 334 | divisible_by: make sure that every part is divisiable by this. 335 | **kwargs: will be passed directly into conv2d operator 336 | Returns: 337 | tensor 338 | """ 339 | b = input_tensor.get_shape().as_list()[3] 340 | 341 | if num_ways == 1 or min(b // num_ways, 342 | num_outputs // num_ways) < divisible_by: 343 | # Don't do any splitting if we end up with less than 8 filters 344 | # on either side. 345 | return slim.conv2d(input_tensor, num_outputs, [1, 1], scope=scope, **kwargs) 346 | 347 | outs = [] 348 | input_splits = _split_divisible(b, num_ways, divisible_by=divisible_by) 349 | output_splits = _split_divisible( 350 | num_outputs, num_ways, divisible_by=divisible_by) 351 | inputs = tf.split(input_tensor, input_splits, axis=3, name='split_' + scope) 352 | base = scope 353 | for i, (input_tensor, out_size) in enumerate(zip(inputs, output_splits)): 354 | scope = base + '_part_%d' % (i,) 355 | n = slim.conv2d(input_tensor, out_size, [1, 1], scope=scope, **kwargs) 356 | n = tf.identity(n, scope + '_output') 357 | outs.append(n) 358 | return tf.concat(outs, 3, name=scope + '_concat') -------------------------------------------------------------------------------- /frontends/inception_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains common code shared by all inception models. 
16 | 17 | Usage of arg scope: 18 | with slim.arg_scope(inception_arg_scope()): 19 | logits, end_points = inception.inception_v3(images, num_classes, 20 | is_training=is_training) 21 | 22 | """ 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import tensorflow as tf 28 | 29 | slim = tf.contrib.slim 30 | 31 | 32 | def inception_arg_scope(weight_decay=0.00004, 33 | use_batch_norm=True, 34 | batch_norm_decay=0.9997, 35 | batch_norm_epsilon=0.001, 36 | activation_fn=tf.nn.relu, 37 | batch_norm_updates_collections=tf.GraphKeys.UPDATE_OPS): 38 | """Defines the default arg scope for inception models. 39 | 40 | Args: 41 | weight_decay: The weight decay to use for regularizing the model. 42 | use_batch_norm: "If `True`, batch_norm is applied after each convolution. 43 | batch_norm_decay: Decay for batch norm moving average. 44 | batch_norm_epsilon: Small float added to variance to avoid dividing by zero 45 | in batch norm. 46 | activation_fn: Activation function for conv2d. 47 | batch_norm_updates_collections: Collection for the update ops for 48 | batch norm. 49 | 50 | Returns: 51 | An `arg_scope` to use for the inception models. 52 | """ 53 | batch_norm_params = { 54 | # Decay for the moving averages. 55 | 'decay': batch_norm_decay, 56 | # epsilon to prevent 0s in variance. 57 | 'epsilon': batch_norm_epsilon, 58 | # collection containing update_ops. 59 | 'updates_collections': batch_norm_updates_collections, 60 | # use fused batch norm if possible. 61 | 'fused': None, 62 | } 63 | if use_batch_norm: 64 | normalizer_fn = slim.batch_norm 65 | normalizer_params = batch_norm_params 66 | else: 67 | normalizer_fn = None 68 | normalizer_params = {} 69 | # Set weight_decay for weights in Conv and FC layers. 70 | with slim.arg_scope([slim.conv2d, slim.fully_connected], 71 | weights_regularizer=slim.l2_regularizer(weight_decay)): 72 | with slim.arg_scope( 73 | [slim.conv2d], 74 | weights_initializer=slim.variance_scaling_initializer(), 75 | activation_fn=activation_fn, 76 | normalizer_fn=normalizer_fn, 77 | normalizer_params=normalizer_params) as sc: 78 | return sc -------------------------------------------------------------------------------- /frontends/mobilenet_v2.py: -------------------------------------------------------------------------------- 1 | # Copyright 2018 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Implementation of Mobilenet V2. 16 | 17 | Architecture: https://arxiv.org/abs/1801.04381 18 | 19 | The base model gives 72.2% accuracy on ImageNet, with 300MMadds, 20 | 3.4 M parameters. 
21 | """ 22 | 23 | from __future__ import absolute_import 24 | from __future__ import division 25 | from __future__ import print_function 26 | 27 | import copy 28 | import functools 29 | 30 | import tensorflow as tf 31 | 32 | from frontends import conv_blocks as ops 33 | from frontends import mobilenet_base as lib 34 | 35 | slim = tf.contrib.slim 36 | op = lib.op 37 | 38 | expand_input = ops.expand_input_by_factor 39 | 40 | # pyformat: disable 41 | # Architecture: https://arxiv.org/abs/1801.04381 42 | V2_DEF = dict( 43 | defaults={ 44 | # Note: these parameters of batch norm affect the architecture 45 | # that's why they are here and not in training_scope. 46 | (slim.batch_norm,): {'center': True, 'scale': True}, 47 | (slim.conv2d, slim.fully_connected, slim.separable_conv2d): { 48 | 'normalizer_fn': slim.batch_norm, 'activation_fn': tf.nn.relu6 49 | }, 50 | (ops.expanded_conv,): { 51 | 'expansion_size': expand_input(6), 52 | 'split_expansion': 1, 53 | 'normalizer_fn': slim.batch_norm, 54 | 'residual': True 55 | }, 56 | (slim.conv2d, slim.separable_conv2d): {'padding': 'SAME'} 57 | }, 58 | spec=[ 59 | op(slim.conv2d, stride=2, num_outputs=32, kernel_size=[3, 3]), 60 | op(ops.expanded_conv, 61 | expansion_size=expand_input(1, divisible_by=1), 62 | num_outputs=16), 63 | op(ops.expanded_conv, stride=2, num_outputs=24), 64 | op(ops.expanded_conv, stride=1, num_outputs=24), 65 | op(ops.expanded_conv, stride=2, num_outputs=32), 66 | op(ops.expanded_conv, stride=1, num_outputs=32), 67 | op(ops.expanded_conv, stride=1, num_outputs=32), 68 | op(ops.expanded_conv, stride=2, num_outputs=64), 69 | op(ops.expanded_conv, stride=1, num_outputs=64), 70 | op(ops.expanded_conv, stride=1, num_outputs=64), 71 | op(ops.expanded_conv, stride=1, num_outputs=64), 72 | op(ops.expanded_conv, stride=1, num_outputs=96), 73 | op(ops.expanded_conv, stride=1, num_outputs=96), 74 | op(ops.expanded_conv, stride=1, num_outputs=96), 75 | op(ops.expanded_conv, stride=2, num_outputs=160), 76 | op(ops.expanded_conv, stride=1, num_outputs=160), 77 | op(ops.expanded_conv, stride=1, num_outputs=160), 78 | op(ops.expanded_conv, stride=1, num_outputs=320), 79 | op(slim.conv2d, stride=1, kernel_size=[1, 1], num_outputs=1280) 80 | ], 81 | ) 82 | # pyformat: enable 83 | 84 | 85 | @slim.add_arg_scope 86 | def mobilenet(input_tensor, 87 | num_classes=1001, 88 | depth_multiplier=1.0, 89 | scope='MobilenetV2', 90 | conv_defs=None, 91 | finegrain_classification_mode=False, 92 | min_depth=None, 93 | divisible_by=None, 94 | **kwargs): 95 | """Creates mobilenet V2 network. 96 | 97 | Inference mode is created by default. To create training use training_scope 98 | below. 99 | 100 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 101 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 102 | 103 | Args: 104 | input_tensor: The input tensor 105 | num_classes: number of classes 106 | depth_multiplier: The multiplier applied to scale number of 107 | channels in each layer. Note: this is called depth multiplier in the 108 | paper but the name is kept for consistency with slim's model builder. 109 | scope: Scope of the operator 110 | conv_defs: Allows to override default conv def. 111 | finegrain_classification_mode: When set to True, the model 112 | will keep the last layer large even for small multipliers. Following 113 | https://arxiv.org/abs/1801.04381 114 | suggests that it improves performance for ImageNet-type of problems. 115 | *Note* ignored if final_endpoint makes the builder exit earlier. 
116 | min_depth: If provided, will ensure that all layers will have that 117 | many channels after application of depth multiplier. 118 | divisible_by: If provided will ensure that all layers # channels 119 | will be divisible by this number. 120 | **kwargs: passed directly to mobilenet.mobilenet: 121 | prediction_fn- what prediction function to use. 122 | reuse-: whether to reuse variables (if reuse set to true, scope 123 | must be given). 124 | Returns: 125 | logits/endpoints pair 126 | 127 | Raises: 128 | ValueError: On invalid arguments 129 | """ 130 | if conv_defs is None: 131 | conv_defs = V2_DEF 132 | if 'multiplier' in kwargs: 133 | raise ValueError('mobilenetv2 doesn\'t support generic ' 134 | 'multiplier parameter use "depth_multiplier" instead.') 135 | if finegrain_classification_mode: 136 | conv_defs = copy.deepcopy(conv_defs) 137 | if depth_multiplier < 1: 138 | conv_defs['spec'][-1].params['num_outputs'] /= depth_multiplier 139 | 140 | depth_args = {} 141 | # NB: do not set depth_args unless they are provided to avoid overriding 142 | # whatever default depth_multiplier might have thanks to arg_scope. 143 | if min_depth is not None: 144 | depth_args['min_depth'] = min_depth 145 | if divisible_by is not None: 146 | depth_args['divisible_by'] = divisible_by 147 | 148 | with slim.arg_scope((lib.depth_multiplier,), **depth_args): 149 | return lib.mobilenet( 150 | input_tensor, 151 | num_classes=num_classes, 152 | conv_defs=conv_defs, 153 | scope=scope, 154 | multiplier=depth_multiplier, 155 | **kwargs) 156 | 157 | 158 | def wrapped_partial(func, *args, **kwargs): 159 | partial_func = functools.partial(func, *args, **kwargs) 160 | functools.update_wrapper(partial_func, func) 161 | return partial_func 162 | 163 | 164 | # Wrappers for mobilenet v2 with depth-multipliers. Be noticed that 165 | # 'finegrain_classification_mode' is set to True, which means the embedding 166 | # layer will not be shrinked when given a depth-multiplier < 1.0. 167 | mobilenet_v2_140 = wrapped_partial(mobilenet, depth_multiplier=1.4) 168 | mobilenet_v2_050 = wrapped_partial(mobilenet, depth_multiplier=0.50, 169 | finegrain_classification_mode=True) 170 | mobilenet_v2_035 = wrapped_partial(mobilenet, depth_multiplier=0.35, 171 | finegrain_classification_mode=True) 172 | 173 | 174 | @slim.add_arg_scope 175 | def mobilenet_base(input_tensor, depth_multiplier=1.0, **kwargs): 176 | """Creates base of the mobilenet (no pooling and no logits) .""" 177 | return mobilenet(input_tensor, 178 | depth_multiplier=depth_multiplier, 179 | base_only=True, **kwargs) 180 | 181 | 182 | def training_scope(**kwargs): 183 | """Defines MobilenetV2 training scope. 184 | 185 | Usage: 186 | with tf.contrib.slim.arg_scope(mobilenet_v2.training_scope()): 187 | logits, endpoints = mobilenet_v2.mobilenet(input_tensor) 188 | 189 | with slim. 190 | 191 | Args: 192 | **kwargs: Passed to mobilenet.training_scope. The following parameters 193 | are supported: 194 | weight_decay- The weight decay to use for regularizing the model. 195 | stddev- Standard deviation for initialization, if negative uses xavier. 196 | dropout_keep_prob- dropout keep probability 197 | bn_decay- decay for the batch norm moving averages. 198 | 199 | Returns: 200 | An `arg_scope` to use for the mobilenet v2 model. 
201 | """ 202 | return lib.training_scope(**kwargs) 203 | 204 | 205 | __all__ = ['training_scope', 'mobilenet_base', 'mobilenet', 'V2_DEF'] -------------------------------------------------------------------------------- /frontends/resnet_utils.py: -------------------------------------------------------------------------------- 1 | # Copyright 2016 The TensorFlow Authors. All Rights Reserved. 2 | # 3 | # Licensed under the Apache License, Version 2.0 (the "License"); 4 | # you may not use this file except in compliance with the License. 5 | # You may obtain a copy of the License at 6 | # 7 | # http://www.apache.org/licenses/LICENSE-2.0 8 | # 9 | # Unless required by applicable law or agreed to in writing, software 10 | # distributed under the License is distributed on an "AS IS" BASIS, 11 | # WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 12 | # See the License for the specific language governing permissions and 13 | # limitations under the License. 14 | # ============================================================================== 15 | """Contains building blocks for various versions of Residual Networks. 16 | 17 | Residual networks (ResNets) were proposed in: 18 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 19 | Deep Residual Learning for Image Recognition. arXiv:1512.03385, 2015 20 | 21 | More variants were introduced in: 22 | Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun 23 | Identity Mappings in Deep Residual Networks. arXiv: 1603.05027, 2016 24 | 25 | We can obtain different ResNet variants by changing the network depth, width, 26 | and form of residual unit. This module implements the infrastructure for 27 | building them. Concrete ResNet units and full ResNet networks are implemented in 28 | the accompanying resnet_v1.py and resnet_v2.py modules. 29 | 30 | Compared to https://github.com/KaimingHe/deep-residual-networks, in the current 31 | implementation we subsample the output activations in the last residual unit of 32 | each block, instead of subsampling the input activations in the first residual 33 | unit of each block. The two implementations give identical results but our 34 | implementation is more memory efficient. 35 | """ 36 | from __future__ import absolute_import 37 | from __future__ import division 38 | from __future__ import print_function 39 | 40 | import collections 41 | import tensorflow as tf 42 | 43 | slim = tf.contrib.slim 44 | 45 | 46 | class Block(collections.namedtuple('Block', ['scope', 'unit_fn', 'args'])): 47 | """A named tuple describing a ResNet block. 48 | 49 | Its parts are: 50 | scope: The scope of the `Block`. 51 | unit_fn: The ResNet unit function which takes as input a `Tensor` and 52 | returns another `Tensor` with the output of the ResNet unit. 53 | args: A list of length equal to the number of units in the `Block`. The list 54 | contains one (depth, depth_bottleneck, stride) tuple for each unit in the 55 | block to serve as argument to unit_fn. 56 | """ 57 | 58 | 59 | def subsample(inputs, factor, scope=None): 60 | """Subsamples the input along the spatial dimensions. 61 | 62 | Args: 63 | inputs: A `Tensor` of size [batch, height_in, width_in, channels]. 64 | factor: The subsampling factor. 65 | scope: Optional variable_scope. 66 | 67 | Returns: 68 | output: A `Tensor` of size [batch, height_out, width_out, channels] with the 69 | input, either intact (if factor == 1) or subsampled (if factor > 1). 
70 | """ 71 | if factor == 1: 72 | return inputs 73 | else: 74 | return slim.max_pool2d(inputs, [1, 1], stride=factor, scope=scope) 75 | 76 | 77 | def conv2d_same(inputs, num_outputs, kernel_size, stride, rate=1, scope=None): 78 | """Strided 2-D convolution with 'SAME' padding. 79 | 80 | When stride > 1, then we do explicit zero-padding, followed by conv2d with 81 | 'VALID' padding. 82 | 83 | Note that 84 | 85 | net = conv2d_same(inputs, num_outputs, 3, stride=stride) 86 | 87 | is equivalent to 88 | 89 | net = slim.conv2d(inputs, num_outputs, 3, stride=1, padding='SAME') 90 | net = subsample(net, factor=stride) 91 | 92 | whereas 93 | 94 | net = slim.conv2d(inputs, num_outputs, 3, stride=stride, padding='SAME') 95 | 96 | is different when the input's height or width is even, which is why we add the 97 | current function. For more details, see ResnetUtilsTest.testConv2DSameEven(). 98 | 99 | Args: 100 | inputs: A 4-D tensor of size [batch, height_in, width_in, channels]. 101 | num_outputs: An integer, the number of output filters. 102 | kernel_size: An int with the kernel_size of the filters. 103 | stride: An integer, the output stride. 104 | rate: An integer, rate for atrous convolution. 105 | scope: Scope. 106 | 107 | Returns: 108 | output: A 4-D tensor of size [batch, height_out, width_out, channels] with 109 | the convolution output. 110 | """ 111 | if stride == 1: 112 | return slim.conv2d(inputs, num_outputs, kernel_size, stride=1, rate=rate, 113 | padding='SAME', scope=scope) 114 | else: 115 | kernel_size_effective = kernel_size + (kernel_size - 1) * (rate - 1) 116 | pad_total = kernel_size_effective - 1 117 | pad_beg = pad_total // 2 118 | pad_end = pad_total - pad_beg 119 | inputs = tf.pad(inputs, 120 | [[0, 0], [pad_beg, pad_end], [pad_beg, pad_end], [0, 0]]) 121 | return slim.conv2d(inputs, num_outputs, kernel_size, stride=stride, 122 | rate=rate, padding='VALID', scope=scope) 123 | 124 | 125 | @slim.add_arg_scope 126 | def stack_blocks_dense(net, blocks, multi_grid, output_stride=None, 127 | outputs_collections=None): 128 | """Stacks ResNet `Blocks` and controls output feature density. 129 | 130 | First, this function creates scopes for the ResNet in the form of 131 | 'block_name/unit_1', 'block_name/unit_2', etc. 132 | 133 | Second, this function allows the user to explicitly control the ResNet 134 | output_stride, which is the ratio of the input to output spatial resolution. 135 | This is useful for dense prediction tasks such as semantic segmentation or 136 | object detection. 137 | 138 | Most ResNets consist of 4 ResNet blocks and subsample the activations by a 139 | factor of 2 when transitioning between consecutive ResNet blocks. This results 140 | to a nominal ResNet output_stride equal to 8. If we set the output_stride to 141 | half the nominal network stride (e.g., output_stride=4), then we compute 142 | responses twice. 143 | 144 | Control of the output feature density is implemented by atrous convolution. 145 | 146 | Args: 147 | net: A `Tensor` of size [batch, height, width, channels]. 148 | blocks: A list of length equal to the number of ResNet `Blocks`. Each 149 | element is a ResNet `Block` object describing the units in the `Block`. 150 | output_stride: If `None`, then the output will be computed at the nominal 151 | network stride. If output_stride is not `None`, it specifies the requested 152 | ratio of input to output spatial resolution, which needs to be equal to 153 | the product of unit strides from the start up to some level of the ResNet. 
154 | For example, if the ResNet employs units with strides 1, 2, 1, 3, 4, 1, 155 | then valid values for the output_stride are 1, 2, 6, 24 or None (which 156 | is equivalent to output_stride=24). 157 | outputs_collections: Collection to add the ResNet block outputs. 158 | 159 | Returns: 160 | net: Output tensor with stride equal to the specified output_stride. 161 | 162 | Raises: 163 | ValueError: If the target output_stride is not valid. 164 | """ 165 | # The current_stride variable keeps track of the effective stride of the 166 | # activations. This allows us to invoke atrous convolution whenever applying 167 | # the next residual unit would result in the activations having stride larger 168 | # than the target output_stride. 169 | current_stride = 1 170 | 171 | # The atrous convolution rate parameter. 172 | rate = 1 173 | 174 | for block in blocks: 175 | with tf.variable_scope(block.scope, 'block', [net]) as sc: 176 | for i, unit in enumerate(block.args): 177 | if output_stride is not None and current_stride > output_stride: 178 | raise ValueError('The target output_stride cannot be reached.') 179 | 180 | with tf.variable_scope('unit_%d' % (i + 1), values=[net]): 181 | # If we have reached the target output_stride, then we need to employ 182 | # atrous convolution with stride=1 and multiply the atrous rate by the 183 | # current unit's stride for use in subsequent layers. 184 | if output_stride is not None and current_stride == output_stride: 185 | # Only uses atrous convolutions with multi-graid rates in the last (block4) block 186 | if block.scope == "block4": 187 | net = block.unit_fn(net, rate=rate * multi_grid[i], **dict(unit, stride=1)) 188 | else: 189 | net = block.unit_fn(net, rate=rate, **dict(unit, stride=1)) 190 | rate *= unit.get('stride', 1) 191 | else: 192 | net = block.unit_fn(net, rate=1, **unit) 193 | current_stride *= unit.get('stride', 1) 194 | net = slim.utils.collect_named_outputs(outputs_collections, sc.name, net) 195 | 196 | if output_stride is not None and current_stride != output_stride: 197 | raise ValueError('The target output_stride cannot be reached.') 198 | 199 | return net 200 | 201 | 202 | def resnet_arg_scope(weight_decay=0.0001, 203 | is_training=True, 204 | batch_norm_decay=0.997, 205 | batch_norm_epsilon=1e-5, 206 | batch_norm_scale=True, 207 | activation_fn=tf.nn.relu, 208 | use_batch_norm=True): 209 | """Defines the default ResNet arg scope. 210 | 211 | TODO(gpapan): The batch-normalization related default values above are 212 | appropriate for use in conjunction with the reference ResNet models 213 | released at https://github.com/KaimingHe/deep-residual-networks. When 214 | training ResNets from scratch, they might need to be tuned. 215 | 216 | Args: 217 | weight_decay: The weight decay to use for regularizing the model. 218 | batch_norm_decay: The moving average decay when estimating layer activation 219 | statistics in batch normalization. 220 | batch_norm_epsilon: Small constant to prevent division by zero when 221 | normalizing activations by their variance in batch normalization. 222 | batch_norm_scale: If True, uses an explicit `gamma` multiplier to scale the 223 | activations in the batch normalization layer. 224 | activation_fn: The activation function which is used in ResNet. 225 | use_batch_norm: Whether or not to use batch normalization. 226 | 227 | Returns: 228 | An `arg_scope` to use for the resnet models. 
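  Usage (an illustrative sketch; it assumes the resnet_v1 module from this
  repository has been imported alongside these utilities):

    with slim.arg_scope(resnet_arg_scope(weight_decay=1e-4, is_training=True)):
      net, end_points = resnet_v1.resnet_v1_50(inputs, output_stride=16)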
229 | """ 230 | batch_norm_params = { 231 | 'decay': batch_norm_decay, 232 | 'epsilon': batch_norm_epsilon, 233 | 'scale': batch_norm_scale, 234 | 'updates_collections': None, 235 | 'is_training': is_training, 236 | 'fused': True, # Use fused batch norm if possible. 237 | } 238 | 239 | with slim.arg_scope( 240 | [slim.conv2d], 241 | weights_regularizer=slim.l2_regularizer(weight_decay), 242 | weights_initializer=slim.variance_scaling_initializer(), 243 | activation_fn=activation_fn, 244 | normalizer_fn=slim.batch_norm if use_batch_norm else None, 245 | normalizer_params=batch_norm_params): 246 | with slim.arg_scope([slim.batch_norm], **batch_norm_params): 247 | # The following implies padding='SAME' for pool1, which makes feature 248 | # alignment easier for dense prediction tasks. This is also used in 249 | # https://github.com/facebook/fb.resnet.torch. However the accompanying 250 | # code of 'Deep Residual Learning for Image Recognition' uses 251 | # padding='VALID' for pool1. You can switch to that choice by setting 252 | # slim.arg_scope([slim.max_pool2d], padding='VALID'). 253 | with slim.arg_scope([slim.max_pool2d], padding='SAME') as arg_sc: 254 | return arg_sc -------------------------------------------------------------------------------- /frontends/resnet_v1.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from frontends import resnet_utils 4 | 5 | resnet_arg_scope = resnet_utils.resnet_arg_scope 6 | 7 | @slim.add_arg_scope 8 | def bottleneck(inputs, depth, depth_bottleneck, stride, rate=1, 9 | outputs_collections=None, scope=None): 10 | """Bottleneck residual unit variant with BN after convolutions. 11 | This is the original residual unit proposed in [1]. See Fig. 1(a) of [2] for 12 | its definition. Note that we use here the bottleneck variant which has an 13 | extra bottleneck layer. 14 | When putting together two consecutive ResNet blocks that use this unit, one 15 | should use stride = 2 in the last unit of the first block. 16 | Args: 17 | inputs: A tensor of size [batch, height, width, channels]. 18 | depth: The depth of the ResNet unit output. 19 | depth_bottleneck: The depth of the bottleneck layers. 20 | stride: The ResNet unit's stride. Determines the amount of downsampling of 21 | the units output compared to its input. 22 | rate: An integer, rate for atrous convolution. 23 | outputs_collections: Collection to add the ResNet unit output. 24 | scope: Optional variable_scope. 25 | Returns: 26 | The ResNet unit's output. 
27 | """ 28 | with tf.variable_scope(scope, 'bottleneck_v1', [inputs]) as sc: 29 | depth_in = slim.utils.last_dimension(inputs.get_shape(), min_rank=4) 30 | if depth == depth_in: 31 | shortcut = resnet_utils.subsample(inputs, stride, 'shortcut') 32 | else: 33 | shortcut = slim.conv2d(inputs, depth, [1, 1], stride=stride, 34 | activation_fn=None, scope='shortcut') 35 | residual = slim.conv2d(inputs, depth_bottleneck, [1, 1], stride=1, 36 | scope='conv1') 37 | residual = resnet_utils.conv2d_same(residual, depth_bottleneck, 3, stride, 38 | rate=rate, scope='conv2') 39 | residual = slim.conv2d(residual, depth, [1, 1], stride=1, 40 | activation_fn=None, scope='conv3') 41 | 42 | output = tf.nn.relu(shortcut + residual) 43 | 44 | return slim.utils.collect_named_outputs(outputs_collections, 45 | sc.original_name_scope, 46 | output) 47 | 48 | 49 | def resnet_v1(inputs, 50 | blocks, 51 | num_classes=None, 52 | is_training=True, 53 | global_pool=True, 54 | output_stride=None, 55 | include_root_block=True, 56 | spatial_squeeze=True, 57 | reuse=None, 58 | scope=None): 59 | """Generator for v1 ResNet models. 60 | 61 | This function generates a family of ResNet v1 models. See the resnet_v1_*() 62 | methods for specific model instantiations, obtained by selecting different 63 | block instantiations that produce ResNets of various depths. 64 | 65 | Training for image classification on Imagenet is usually done with [224, 224] 66 | inputs, resulting in [7, 7] feature maps at the output of the last ResNet 67 | block for the ResNets defined in [1] that have nominal stride equal to 32. 68 | However, for dense prediction tasks we advise that one uses inputs with 69 | spatial dimensions that are multiples of 32 plus 1, e.g., [321, 321]. In 70 | this case the feature maps at the ResNet output will have spatial shape 71 | [(height - 1) / output_stride + 1, (width - 1) / output_stride + 1] 72 | and corners exactly aligned with the input image corners, which greatly 73 | facilitates alignment of the features to the image. Using as input [225, 225] 74 | images results in [8, 8] feature maps at the output of the last ResNet block. 75 | 76 | For dense prediction tasks, the ResNet needs to run in fully-convolutional 77 | (FCN) mode and global_pool needs to be set to False. The ResNets in [1, 2] all 78 | have nominal stride equal to 32 and a good choice in FCN mode is to use 79 | output_stride=16 in order to increase the density of the computed features at 80 | small computational and memory overhead, cf. http://arxiv.org/abs/1606.00915. 81 | 82 | Args: 83 | inputs: A tensor of size [batch, height_in, width_in, channels]. 84 | blocks: A list of length equal to the number of ResNet blocks. Each element 85 | is a resnet_utils.Block object describing the units in the block. 86 | num_classes: Number of predicted classes for classification tasks. If None 87 | we return the features before the logit layer. 88 | is_training: whether is training or not. 89 | global_pool: If True, we perform global average pooling before computing the 90 | logits. Set to True for image classification, False for dense prediction. 91 | output_stride: If None, then the output will be computed at the nominal 92 | network stride. If output_stride is not None, it specifies the requested 93 | ratio of input to output spatial resolution. 94 | include_root_block: If True, include the initial convolution followed by 95 | max-pooling, if False excludes it. 
96 | spatial_squeeze: if True, logits is of shape [B, C], if false logits is 97 | of shape [B, 1, 1, C], where B is batch_size and C is number of classes. 98 | reuse: whether or not the network and its variables should be reused. To be 99 | able to reuse 'scope' must be given. 100 | scope: Optional variable_scope. 101 | 102 | Returns: 103 | net: A rank-4 tensor of size [batch, height_out, width_out, channels_out]. 104 | If global_pool is False, then height_out and width_out are reduced by a 105 | factor of output_stride compared to the respective height_in and width_in, 106 | else both height_out and width_out equal one. If num_classes is None, then 107 | net is the output of the last ResNet block, potentially after global 108 | average pooling. If num_classes is not None, net contains the pre-softmax 109 | activations. 110 | end_points: A dictionary from components of the network to the corresponding 111 | activation. 112 | 113 | Raises: 114 | ValueError: If the target output_stride is not valid. 115 | """ 116 | with tf.variable_scope(scope, 'resnet_v1', [inputs], reuse=reuse) as sc: 117 | end_points_collection = sc.name + '_end_points' 118 | with slim.arg_scope([slim.conv2d, bottleneck, 119 | resnet_utils.stack_blocks_dense], 120 | outputs_collections=end_points_collection): 121 | with slim.arg_scope([slim.batch_norm], is_training=is_training): 122 | net = inputs 123 | if include_root_block: 124 | if output_stride is not None: 125 | if output_stride % 4 != 0: 126 | raise ValueError('The output_stride needs to be a multiple of 4.') 127 | output_stride /= 4 128 | net = resnet_utils.conv2d_same(net, 64, 7, stride=2, scope='conv1') 129 | net = slim.max_pool2d(net, [3, 3], stride=2, scope='pool1') 130 | 131 | net = slim.utils.collect_named_outputs(end_points_collection, 'pool2', net) 132 | 133 | net = resnet_utils.stack_blocks_dense(net, blocks, output_stride) 134 | end_points = slim.utils.convert_collection_to_dict(end_points_collection) 135 | 136 | end_points['pool3'] = end_points[scope + '/block1'] 137 | end_points['pool4'] = end_points[scope + '/block2'] 138 | end_points['pool5'] = net 139 | return net, end_points 140 | 141 | 142 | resnet_v1.default_image_size = 224 143 | 144 | def resnet_v1_50(inputs, 145 | num_classes=None, 146 | is_training=True, 147 | global_pool=True, 148 | output_stride=None, 149 | spatial_squeeze=True, 150 | reuse=None, 151 | scope='resnet_v1_50'): 152 | """ResNet-50 model of [1]. See resnet_v1() for arg and return description.""" 153 | blocks = [ 154 | resnet_utils.Block( 155 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 156 | resnet_utils.Block( 157 | 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), 158 | resnet_utils.Block( 159 | 'block3', bottleneck, [(1024, 256, 1)] * 5 + [(1024, 256, 2)]), 160 | resnet_utils.Block( 161 | 'block4', bottleneck, [(2048, 512, 1)] * 3) 162 | ] 163 | return resnet_v1(inputs, blocks, num_classes, is_training, 164 | global_pool=global_pool, output_stride=output_stride, 165 | include_root_block=True, spatial_squeeze=spatial_squeeze, 166 | reuse=reuse, scope=scope) 167 | 168 | 169 | resnet_v1_50.default_image_size = resnet_v1.default_image_size 170 | 171 | 172 | def resnet_v1_101(inputs, 173 | num_classes=None, 174 | is_training=True, 175 | global_pool=True, 176 | output_stride=None, 177 | spatial_squeeze=True, 178 | reuse=None, 179 | scope='resnet_v1_101'): 180 | """ResNet-101 model of [1]. 
See resnet_v1() for arg and return description.""" 181 | blocks = [ 182 | resnet_utils.Block( 183 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 184 | resnet_utils.Block( 185 | 'block2', bottleneck, [(512, 128, 1)] * 3 + [(512, 128, 2)]), 186 | resnet_utils.Block( 187 | 'block3', bottleneck, [(1024, 256, 1)] * 22 + [(1024, 256, 2)]), 188 | resnet_utils.Block( 189 | 'block4', bottleneck, [(2048, 512, 1)] * 3) 190 | ] 191 | return resnet_v1(inputs, blocks, num_classes, is_training, 192 | global_pool=global_pool, output_stride=output_stride, 193 | include_root_block=True, spatial_squeeze=spatial_squeeze, 194 | reuse=reuse, scope=scope) 195 | 196 | 197 | resnet_v1_101.default_image_size = resnet_v1.default_image_size 198 | 199 | 200 | def resnet_v1_152(inputs, 201 | num_classes=None, 202 | is_training=True, 203 | global_pool=True, 204 | output_stride=None, 205 | spatial_squeeze=True, 206 | reuse=None, 207 | scope='resnet_v1_152'): 208 | """ResNet-152 model of [1]. See resnet_v1() for arg and return description.""" 209 | blocks = [ 210 | resnet_utils.Block( 211 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 212 | resnet_utils.Block( 213 | 'block2', bottleneck, [(512, 128, 1)] * 7 + [(512, 128, 2)]), 214 | resnet_utils.Block( 215 | 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), 216 | resnet_utils.Block( 217 | 'block4', bottleneck, [(2048, 512, 1)] * 3)] 218 | return resnet_v1(inputs, blocks, num_classes, is_training, 219 | global_pool=global_pool, output_stride=output_stride, 220 | include_root_block=True, spatial_squeeze=spatial_squeeze, 221 | reuse=reuse, scope=scope) 222 | 223 | 224 | resnet_v1_152.default_image_size = resnet_v1.default_image_size 225 | 226 | 227 | def resnet_v1_200(inputs, 228 | num_classes=None, 229 | is_training=True, 230 | global_pool=True, 231 | output_stride=None, 232 | spatial_squeeze=True, 233 | reuse=None, 234 | scope='resnet_v1_200'): 235 | """ResNet-200 model of [2]. 
See resnet_v1() for arg and return description.""" 236 | blocks = [ 237 | resnet_utils.Block( 238 | 'block1', bottleneck, [(256, 64, 1)] * 2 + [(256, 64, 2)]), 239 | resnet_utils.Block( 240 | 'block2', bottleneck, [(512, 128, 1)] * 23 + [(512, 128, 2)]), 241 | resnet_utils.Block( 242 | 'block3', bottleneck, [(1024, 256, 1)] * 35 + [(1024, 256, 2)]), 243 | resnet_utils.Block( 244 | 'block4', bottleneck, [(2048, 512, 1)] * 3)] 245 | return resnet_v1(inputs, blocks, num_classes, is_training, 246 | global_pool=global_pool, output_stride=output_stride, 247 | include_root_block=True, spatial_squeeze=spatial_squeeze, 248 | reuse=reuse, scope=scope) 249 | 250 | 251 | resnet_v1_200.default_image_size = resnet_v1.default_image_size 252 | 253 | 254 | if __name__ == '__main__': 255 | input = tf.placeholder(tf.float32, shape=(None, 224, 224, 3), name='input') 256 | with slim.arg_scope(resnet_arg_scope()) as sc: 257 | logits = resnet_v1_50(input) -------------------------------------------------------------------------------- /frontends/se_resnext.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | 3 | import math 4 | 5 | USE_FUSED_BN = True 6 | BN_EPSILON = 9.999999747378752e-06 7 | BN_MOMENTUM = 0.99 8 | 9 | VAR_LIST = [] 10 | 11 | # input image order: BGR, range [0-255] 12 | # mean_value: 104, 117, 123 13 | # only subtract mean is used 14 | def constant_xavier_initializer(shape, group, dtype=tf.float32, uniform=True): 15 | """Initializer function.""" 16 | if not dtype.is_floating: 17 | raise TypeError('Cannot create initializer for non-floating point type.') 18 | # Estimating fan_in and fan_out is not possible to do perfectly, but we try. 19 | # This is the right thing for matrix multiply and convolutions. 20 | if shape: 21 | fan_in = float(shape[-2]) if len(shape) > 1 else float(shape[-1]) 22 | fan_out = float(shape[-1])/group 23 | else: 24 | fan_in = 1.0 25 | fan_out = 1.0 26 | for dim in shape[:-2]: 27 | fan_in *= float(dim) 28 | fan_out *= float(dim) 29 | 30 | # Average number of inputs and output connections. 31 | n = (fan_in + fan_out) / 2.0 32 | if uniform: 33 | # To get stddev = math.sqrt(factor / n) need to adjust for uniform. 34 | limit = math.sqrt(3.0 * 1.0 / n) 35 | return tf.random_uniform(shape, -limit, limit, dtype, seed=None) 36 | else: 37 | # To get stddev = math.sqrt(factor / n) need to adjust for truncated. 38 | trunc_stddev = math.sqrt(1.3 * 1.0 / n) 39 | return tf.truncated_normal(shape, 0.0, trunc_stddev, dtype, seed=None) 40 | 41 | # for root block, use dummy input_filters, e.g. 
128 rather than 64 for the first block 42 | def se_bottleneck_block(inputs, input_filters, name_prefix, is_training, group, data_format='channels_last', need_reduce=True, is_root=False, reduced_scale=16): 43 | bn_axis = -1 if data_format == 'channels_last' else 1 44 | strides_to_use = 1 45 | residuals = inputs 46 | if need_reduce: 47 | strides_to_use = 1 if is_root else 2 48 | proj_mapping = tf.layers.conv2d(inputs, input_filters, (1, 1), use_bias=False, 49 | name=name_prefix + '_1x1_proj', strides=(strides_to_use, strides_to_use), 50 | padding='valid', data_format=data_format, activation=None, 51 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 52 | bias_initializer=tf.zeros_initializer()) 53 | residuals = tf.layers.batch_normalization(proj_mapping, momentum=BN_MOMENTUM, 54 | name=name_prefix + '_1x1_proj/bn', axis=bn_axis, 55 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 56 | 57 | reduced_inputs = tf.layers.conv2d(inputs, input_filters // 2, (1, 1), use_bias=False, 58 | name=name_prefix + '_1x1_reduce', strides=(1, 1), 59 | padding='valid', data_format=data_format, activation=None, 60 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 61 | bias_initializer=tf.zeros_initializer()) 62 | reduced_inputs_bn = tf.layers.batch_normalization(reduced_inputs, momentum=BN_MOMENTUM, 63 | name=name_prefix + '_1x1_reduce/bn', axis=bn_axis, 64 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 65 | reduced_inputs_relu = tf.nn.relu(reduced_inputs_bn, name=name_prefix + '_1x1_reduce/relu') 66 | 67 | if data_format == 'channels_first': 68 | reduced_inputs_relu = tf.pad(reduced_inputs_relu, paddings = [[0, 0], [0, 0], [1, 1], [1, 1]]) 69 | weight_shape = [3, 3, reduced_inputs_relu.get_shape().as_list()[1]//group, input_filters // 2] 70 | weight_ = tf.Variable(constant_xavier_initializer(weight_shape, group=group, dtype=tf.float32), trainable=is_training, name=name_prefix + '_3x3/kernel') 71 | weight_groups = tf.split(weight_, num_or_size_splits=group, axis=-1, name=name_prefix + '_weight_split') 72 | xs = tf.split(reduced_inputs_relu, num_or_size_splits=group, axis=1, name=name_prefix + '_inputs_split') 73 | else: 74 | reduced_inputs_relu = tf.pad(reduced_inputs_relu, paddings = [[0, 0], [1, 1], [1, 1], [0, 0]]) 75 | weight_shape = [3, 3, reduced_inputs_relu.get_shape().as_list()[-1]//group, input_filters // 2] 76 | weight_ = tf.Variable(constant_xavier_initializer(weight_shape, group=group, dtype=tf.float32), trainable=is_training, name=name_prefix + '_3x3/kernel') 77 | weight_groups = tf.split(weight_, num_or_size_splits=group, axis=-1, name=name_prefix + '_weight_split') 78 | xs = tf.split(reduced_inputs_relu, num_or_size_splits=group, axis=-1, name=name_prefix + '_inputs_split') 79 | 80 | convolved = [tf.nn.convolution(x, weight, padding='VALID', strides=[strides_to_use, strides_to_use], name=name_prefix + '_group_conv', 81 | data_format=('NCHW' if data_format == 'channels_first' else 'NHWC')) for (x, weight) in zip(xs, weight_groups)] 82 | 83 | if data_format == 'channels_first': 84 | conv3_inputs = tf.concat(convolved, axis=1, name=name_prefix + '_concat') 85 | else: 86 | conv3_inputs = tf.concat(convolved, axis=-1, name=name_prefix + '_concat') 87 | 88 | conv3_inputs_bn = tf.layers.batch_normalization(conv3_inputs, momentum=BN_MOMENTUM, name=name_prefix + '_3x3/bn', 89 | axis=bn_axis, epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 90 | conv3_inputs_relu = tf.nn.relu(conv3_inputs_bn, 
name=name_prefix + '_3x3/relu') 91 | 92 | 93 | increase_inputs = tf.layers.conv2d(conv3_inputs_relu, input_filters, (1, 1), use_bias=False, 94 | name=name_prefix + '_1x1_increase', strides=(1, 1), 95 | padding='valid', data_format=data_format, activation=None, 96 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 97 | bias_initializer=tf.zeros_initializer()) 98 | increase_inputs_bn = tf.layers.batch_normalization(increase_inputs, momentum=BN_MOMENTUM, 99 | name=name_prefix + '_1x1_increase/bn', axis=bn_axis, 100 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 101 | 102 | if data_format == 'channels_first': 103 | pooled_inputs = tf.reduce_mean(increase_inputs_bn, [2, 3], name=name_prefix + '_global_pool', keep_dims=True) 104 | else: 105 | pooled_inputs = tf.reduce_mean(increase_inputs_bn, [1, 2], name=name_prefix + '_global_pool', keep_dims=True) 106 | 107 | down_inputs = tf.layers.conv2d(pooled_inputs, input_filters // reduced_scale, (1, 1), use_bias=True, 108 | name=name_prefix + '_1x1_down', strides=(1, 1), 109 | padding='valid', data_format=data_format, activation=None, 110 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 111 | bias_initializer=tf.zeros_initializer()) 112 | down_inputs_relu = tf.nn.relu(down_inputs, name=name_prefix + '_1x1_down/relu') 113 | 114 | up_inputs = tf.layers.conv2d(down_inputs_relu, input_filters, (1, 1), use_bias=True, 115 | name=name_prefix + '_1x1_up', strides=(1, 1), 116 | padding='valid', data_format=data_format, activation=None, 117 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 118 | bias_initializer=tf.zeros_initializer()) 119 | prob_outputs = tf.nn.sigmoid(up_inputs, name=name_prefix + '_prob') 120 | 121 | rescaled_feat = tf.multiply(prob_outputs, increase_inputs_bn, name=name_prefix + '_mul') 122 | pre_act = tf.add(residuals, rescaled_feat, name=name_prefix + '_add') 123 | return tf.nn.relu(pre_act, name=name_prefix + '/relu') 124 | #return tf.nn.relu(residuals + prob_outputs * increase_inputs_bn, name=name_prefix + '/relu') 125 | 126 | def se_resnext(input_image, scope, is_training = False, group=16, data_format='channels_last', net_depth=50): 127 | end_points = dict() 128 | 129 | bn_axis = -1 if data_format == 'channels_last' else 1 130 | # the input image should in BGR order, note that this is not the common case in Tensorflow 131 | # convert from RGB to BGR 132 | if data_format == 'channels_last': 133 | image_channels = tf.unstack(input_image, axis=-1) 134 | swaped_input_image = tf.stack([image_channels[2], image_channels[1], image_channels[0]], axis=-1) 135 | else: 136 | image_channels = tf.unstack(input_image, axis=1) 137 | swaped_input_image = tf.stack([image_channels[2], image_channels[1], image_channels[0]], axis=1) 138 | #swaped_input_image = input_image 139 | 140 | if net_depth not in [50, 101]: 141 | raise TypeError('Only ResNeXt50 or ResNeXt101 are currently supported.') 142 | input_depth = [256, 512, 1024, 2048] # the input depth of the the first block is dummy input 143 | num_units = [3, 4, 6, 3] if net_depth==50 else [3, 4, 23, 3] 144 | 145 | block_name_prefix = ['conv2_{}', 'conv3_{}', 'conv4_{}', 'conv5_{}'] 146 | 147 | if data_format == 'channels_first': 148 | swaped_input_image = tf.pad(swaped_input_image, paddings = [[0, 0], [0, 0], [3, 3], [3, 3]]) 149 | else: 150 | swaped_input_image = tf.pad(swaped_input_image, paddings = [[0, 0], [3, 3], [3, 3], [0, 0]]) 151 | 152 | inputs_features = tf.layers.conv2d(swaped_input_image, input_depth[0]//4, (7, 7), use_bias=False, 
153 | name='conv1/7x7_s2', strides=(2, 2), 154 | padding='valid', data_format=data_format, activation=None, 155 | kernel_initializer=tf.contrib.layers.xavier_initializer(), 156 | bias_initializer=tf.zeros_initializer()) 157 | VAR_LIST.append('conv1/7x7_s2') 158 | 159 | inputs_features = tf.layers.batch_normalization(inputs_features, momentum=BN_MOMENTUM, 160 | name='conv1/7x7_s2/bn', axis=bn_axis, 161 | epsilon=BN_EPSILON, training=is_training, reuse=None, fused=USE_FUSED_BN) 162 | inputs_features = tf.nn.relu(inputs_features, name='conv1/relu_7x7_s2') 163 | 164 | inputs_features = tf.layers.max_pooling2d(inputs_features, [3, 3], [2, 2], padding='same', data_format=data_format, name='pool1/3x3_s2') 165 | 166 | is_root = True 167 | for ind, num_unit in enumerate(num_units): 168 | need_reduce = True 169 | for unit_index in range(1, num_unit+1): 170 | inputs_features = se_bottleneck_block(inputs_features, input_depth[ind], block_name_prefix[ind].format(unit_index), is_training=is_training, group=group, data_format=data_format, need_reduce=need_reduce, is_root=is_root) 171 | need_reduce = False 172 | end_points['pool' + str(ind)] = inputs_features 173 | is_root = False 174 | 175 | if data_format == 'channels_first': 176 | pooled_inputs = tf.reduce_mean(inputs_features, [2, 3], name='pool5/7x7_s1', keep_dims=True) 177 | else: 178 | pooled_inputs = tf.reduce_mean(inputs_features, [1, 2], name='pool5/7x7_s1', keep_dims=True) 179 | 180 | pooled_inputs = tf.layers.flatten(pooled_inputs) 181 | 182 | # logits_output = tf.layers.dense(pooled_inputs, num_classes, 183 | # kernel_initializer=tf.contrib.layers.xavier_initializer(), 184 | # bias_initializer=tf.zeros_initializer(), use_bias=True) 185 | 186 | logits_output = None 187 | 188 | return logits_output, end_points, VAR_LIST 189 | -------------------------------------------------------------------------------- /iou_vs_epochs.png: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/iou_vs_epochs.png -------------------------------------------------------------------------------- /models/AdapNet.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | import numpy as np 6 | from frontends import resnet_v2 7 | import os, sys 8 | 9 | 10 | def Upsampling(inputs,scale): 11 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 12 | 13 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3], stride=1): 14 | """ 15 | Basic conv block for Encoder-Decoder 16 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 17 | """ 18 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 19 | net = slim.conv2d(net, n_filters, kernel_size, stride=stride, activation_fn=None, normalizer_fn=None) 20 | return net 21 | 22 | def ResNetBlock_1(inputs, filters_1, filters_2): 23 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 24 | net = slim.conv2d(net, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 25 | 26 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 27 | net = slim.conv2d(net, filters_1, [3, 3], activation_fn=None, normalizer_fn=None) 28 | 29 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 30 | net = slim.conv2d(net, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 31 | 32 | net = tf.add(inputs, net) 33 | 34 | return net 35 | 36 
| def ResNetBlock_2(inputs, filters_1, filters_2, s=1): 37 | net_1 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 38 | net_1 = slim.conv2d(net_1, filters_1, [1, 1], stride=s, activation_fn=None, normalizer_fn=None) 39 | 40 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 41 | net_1 = slim.conv2d(net_1, filters_1, [3, 3], activation_fn=None, normalizer_fn=None) 42 | 43 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 44 | net_1 = slim.conv2d(net_1, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 45 | 46 | net_2 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 47 | net_2 = slim.conv2d(net_2, filters_2, [1, 1], stride=s, activation_fn=None, normalizer_fn=None) 48 | 49 | net = tf.add(net_1, net_2) 50 | 51 | return net 52 | 53 | 54 | def MultiscaleBlock_1(inputs, filters_1, filters_2, filters_3, p, d): 55 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 56 | net = slim.conv2d(net, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 57 | 58 | scale_1 = tf.nn.relu(slim.batch_norm(net, fused=True)) 59 | scale_1 = slim.conv2d(scale_1, filters_3 // 2, [3, 3], rate=p, activation_fn=None, normalizer_fn=None) 60 | scale_2 = tf.nn.relu(slim.batch_norm(net, fused=True)) 61 | scale_2 = slim.conv2d(scale_2, filters_3 // 2, [3, 3], rate=d, activation_fn=None, normalizer_fn=None) 62 | net = tf.concat((scale_1, scale_2), axis=-1) 63 | 64 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 65 | net = slim.conv2d(net, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 66 | 67 | net = tf.add(inputs, net) 68 | 69 | return net 70 | 71 | 72 | def MultiscaleBlock_2(inputs, filters_1, filters_2, filters_3, p, d): 73 | net_1 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 74 | net_1 = slim.conv2d(net_1, filters_1, [1, 1], activation_fn=None, normalizer_fn=None) 75 | 76 | scale_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 77 | scale_1 = slim.conv2d(scale_1, filters_3 // 2, [3, 3], rate=p, activation_fn=None, normalizer_fn=None) 78 | scale_2 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 79 | scale_2 = slim.conv2d(scale_2, filters_3 // 2, [3, 3], rate=d, activation_fn=None, normalizer_fn=None) 80 | net_1 = tf.concat((scale_1, scale_2), axis=-1) 81 | 82 | net_1 = tf.nn.relu(slim.batch_norm(net_1, fused=True)) 83 | net_1 = slim.conv2d(net_1, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 84 | 85 | net_2 = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 86 | net_2 = slim.conv2d(net_2, filters_2, [1, 1], activation_fn=None, normalizer_fn=None) 87 | 88 | net = tf.add(net_1, net_2) 89 | 90 | return net 91 | 92 | 93 | 94 | 95 | 96 | 97 | def build_adaptnet(inputs, num_classes): 98 | """ 99 | Builds the AdaptNet model. 100 | 101 | Arguments: 102 | inputs: The input tensor= 103 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 104 | num_classes: Number of classes 105 | 106 | Returns: 107 | AdaptNet model 108 | """ 109 | net = ConvBlock(inputs, n_filters=64, kernel_size=[3, 3]) 110 | net = ConvBlock(net, n_filters=64, kernel_size=[7, 7], stride=2) 111 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 112 | 113 | net = ResNetBlock_2(net, filters_1=64, filters_2=256, s=1) 114 | net = ResNetBlock_1(net, filters_1=64, filters_2=256) 115 | net = ResNetBlock_1(net, filters_1=64, filters_2=256) 116 | 117 | net = ResNetBlock_2(net, filters_1=128, filters_2=512, s=2) 118 | net = ResNetBlock_1(net, filters_1=128, filters_2=512) 119 | net = ResNetBlock_1(net, filters_1=128, filters_2=512) 120 | 121 | skip_connection = ConvBlock(net, n_filters=12, kernel_size=[1, 1]) 122 | 123 | 124 | net = MultiscaleBlock_1(net, filters_1=128, filters_2=512, filters_3=64, p=1, d=2) 125 | 126 | net = ResNetBlock_2(net, filters_1=256, filters_2=1024, s=2) 127 | net = ResNetBlock_1(net, filters_1=256, filters_2=1024) 128 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=2) 129 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=4) 130 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=8) 131 | net = MultiscaleBlock_1(net, filters_1=256, filters_2=1024, filters_3=64, p=1, d=16) 132 | 133 | net = MultiscaleBlock_2(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=4) 134 | net = MultiscaleBlock_1(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=8) 135 | net = MultiscaleBlock_1(net, filters_1=512, filters_2=2048, filters_3=512, p=2, d=16) 136 | 137 | net = ConvBlock(net, n_filters=12, kernel_size=[1, 1]) 138 | net = Upsampling(net, scale=2) 139 | 140 | net = tf.add(skip_connection, net) 141 | 142 | net = Upsampling(net, scale=8) 143 | 144 | 145 | 146 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 147 | 148 | return net 149 | 150 | 151 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 152 | inputs=tf.to_float(inputs) 153 | num_channels = inputs.get_shape().as_list()[-1] 154 | if len(means) != num_channels: 155 | raise ValueError('len(means) must match the number of channels') 156 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 157 | for i in range(num_channels): 158 | channels[i] -= means[i] 159 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/BiSeNet.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,scale): 10 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3], strides=1): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | 
Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = slim.conv2d(inputs, n_filters, kernel_size, stride=[strides, strides], activation_fn=None, normalizer_fn=None) 27 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 28 | return net 29 | 30 | def AttentionRefinementModule(inputs, n_filters): 31 | 32 | # Global average pooling 33 | net = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 34 | 35 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 36 | net = slim.batch_norm(net, fused=True) 37 | net = tf.sigmoid(net) 38 | 39 | net = tf.multiply(inputs, net) 40 | 41 | return net 42 | 43 | def FeatureFusionModule(input_1, input_2, n_filters): 44 | inputs = tf.concat([input_1, input_2], axis=-1) 45 | inputs = ConvBlock(inputs, n_filters=n_filters, kernel_size=[3, 3]) 46 | 47 | # Global average pooling 48 | net = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 49 | 50 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 51 | net = tf.nn.relu(net) 52 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1]) 53 | net = tf.sigmoid(net) 54 | 55 | net = tf.multiply(inputs, net) 56 | 57 | net = tf.add(inputs, net) 58 | 59 | return net 60 | 61 | 62 | def build_bisenet(inputs, num_classes, preset_model='BiSeNet', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 63 | """ 64 | Builds the BiSeNet model. 65 | 66 | Arguments: 67 | inputs: The input tensor= 68 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 69 | num_classes: Number of classes 70 | 71 | Returns: 72 | BiSeNet model 73 | """ 74 | 75 | ### The spatial path 76 | ### The number of feature maps for each convolution is not specified in the paper 77 | ### It was chosen here to be equal to the number of feature maps of a classification 78 | ### model at each corresponding stage 79 | spatial_net = ConvBlock(inputs, n_filters=64, kernel_size=[3, 3], strides=2) 80 | spatial_net = ConvBlock(spatial_net, n_filters=128, kernel_size=[3, 3], strides=2) 81 | spatial_net = ConvBlock(spatial_net, n_filters=256, kernel_size=[3, 3], strides=2) 82 | 83 | 84 | ### Context path 85 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 86 | 87 | net_4 = AttentionRefinementModule(end_points['pool4'], n_filters=512) 88 | 89 | net_5 = AttentionRefinementModule(end_points['pool5'], n_filters=2048) 90 | 91 | global_channels = tf.reduce_mean(net_5, [1, 2], keep_dims=True) 92 | net_5_scaled = tf.multiply(global_channels, net_5) 93 | 94 | ### Combining the paths 95 | net_4 = Upsampling(net_4, scale=2) 96 | net_5_scaled = Upsampling(net_5_scaled, scale=4) 97 | 98 | context_net = tf.concat([net_4, net_5_scaled], axis=-1) 99 | 100 | net = FeatureFusionModule(input_1=spatial_net, input_2=context_net, n_filters=num_classes) 101 | 102 | 103 | ### Final upscaling and finish 104 | net = Upsampling(net, scale=8) 105 | 106 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 107 | 108 | return net, init_fn 109 | 110 | -------------------------------------------------------------------------------- /models/DDSC.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,scale): 10 | return 
tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def GroupedConvolutionBlock(inputs, grouped_channels, cardinality=32): 31 | group_list = [] 32 | 33 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 34 | 35 | for c in range(cardinality): 36 | x = net[:, :, :, c * grouped_channels:(c + 1) * grouped_channels] 37 | 38 | x = slim.conv2d(x, grouped_channels, kernel_size=[3, 3]) 39 | 40 | group_list.append(x) 41 | 42 | group_merge = tf.concat(group_list, axis=-1) 43 | 44 | return group_merge 45 | 46 | def ResNeXtBlock(inputs, n_filters_out, bottleneck_factor=2, cardinality=32): 47 | 48 | assert not (n_filters_out // 2) % cardinality 49 | grouped_channels = (n_filters_out // 2) // cardinality 50 | 51 | net = ConvBlock(inputs, n_filters=n_filters_out / bottleneck_factor, kernel_size=[1, 1]) 52 | net = GroupedConvolutionBlock(net, grouped_channels, cardinality=32) 53 | net = ConvBlock(net, n_filters=n_filters_out, kernel_size=[1, 1]) 54 | 55 | 56 | net = tf.add(inputs, net) 57 | 58 | return net 59 | 60 | def EncoderAdaptionBlock(inputs, n_filters, bottleneck_factor=2, cardinality=32): 61 | 62 | net = ConvBlock(inputs, n_filters, kernel_size=[3, 3]) 63 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 64 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 65 | net = ResNeXtBlock(net, n_filters_out=n_filters, bottleneck_factor=bottleneck_factor) 66 | net = ConvBlock(net, n_filters, kernel_size=[3, 3]) 67 | 68 | return net 69 | 70 | 71 | def SemanticFeatureGenerationBlock(inputs, D_features, D_prime_features, O_features, bottleneck_factor=2, cardinality=32): 72 | 73 | d_1 = ConvBlock(inputs, D_features, kernel_size=[3, 3]) 74 | pool_1 = slim.pool(d_1, [5, 5], stride=[1, 1], pooling_type='MAX') 75 | d_prime_1 = ConvBlock(pool_1, D_prime_features, kernel_size=[3, 3]) 76 | 77 | d_2 = ConvBlock(pool_1, D_features, kernel_size=[3, 3]) 78 | pool_2 = slim.pool(d_2, [5, 5], stride=[1, 1], pooling_type='MAX') 79 | d_prime_2 = ConvBlock(pool_2, D_prime_features, kernel_size=[3, 3]) 80 | 81 | d_3 = ConvBlock(pool_2, D_features, kernel_size=[3, 3]) 82 | pool_3 = slim.pool(d_3, [5, 5], stride=[1, 1], pooling_type='MAX') 83 | d_prime_3 = ConvBlock(pool_3, D_prime_features, kernel_size=[3, 3]) 84 | 85 | d_4 = ConvBlock(pool_3, D_features, kernel_size=[3, 3]) 86 | pool_4 = slim.pool(d_4, [5, 5], stride=[1, 1], pooling_type='MAX') 87 | d_prime_4 = ConvBlock(pool_4, D_prime_features, kernel_size=[3, 3]) 88 | 89 | 90 | net = tf.concat([d_prime_1, d_prime_2, d_prime_3, d_prime_4], axis=-1) 91 | 92 | net = ConvBlock(net, n_filters=D_features, kernel_size=[3, 3]) 93 | 94 | net = ResNeXtBlock(net, 
n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 95 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 96 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 97 | net = ResNeXtBlock(net, n_filters_out=D_features, bottleneck_factor=bottleneck_factor) 98 | 99 | net = ConvBlock(net, O_features, kernel_size=[3, 3]) 100 | 101 | return net 102 | 103 | 104 | 105 | def build_ddsc(inputs, num_classes, preset_model='DDSC', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 106 | """ 107 | Builds the Dense Decoder Shortcut Connections model. 108 | 109 | Arguments: 110 | inputs: The input tensor= 111 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 112 | num_classes: Number of classes 113 | 114 | Returns: 115 | Dense Decoder Shortcut Connections model 116 | """ 117 | 118 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 119 | 120 | ### Adapting features for all stages 121 | decoder_4 = EncoderAdaptionBlock(end_points['pool5'], n_filters=1024) 122 | decoder_3 = EncoderAdaptionBlock(end_points['pool4'], n_filters=512) 123 | decoder_2 = EncoderAdaptionBlock(end_points['pool3'], n_filters=256) 124 | decoder_1 = EncoderAdaptionBlock(end_points['pool2'], n_filters=128) 125 | 126 | decoder_4 = SemanticFeatureGenerationBlock(decoder_4, D_features=1024, D_prime_features = 1024 / 4, O_features=1024) 127 | 128 | ### Fusing features from 3 and 4 129 | decoder_4 = ConvBlock(decoder_4, n_filters=512, kernel_size=[3, 3]) 130 | decoder_4 = Upsampling(decoder_4, scale=2) 131 | 132 | decoder_3 = ConvBlock(decoder_3, n_filters=512, kernel_size=[3, 3]) 133 | 134 | decoder_3 = tf.add_n([decoder_4, decoder_3]) 135 | 136 | decoder_3 = SemanticFeatureGenerationBlock(decoder_3, D_features=512, D_prime_features = 512 / 4, O_features=512) 137 | 138 | ### Fusing features from 2, 3, 4 139 | decoder_4 = ConvBlock(decoder_4, n_filters=256, kernel_size=[3, 3]) 140 | decoder_4 = Upsampling(decoder_4, scale=4) 141 | 142 | decoder_3 = ConvBlock(decoder_3, n_filters=256, kernel_size=[3, 3]) 143 | decoder_3 = Upsampling(decoder_3, scale=2) 144 | 145 | decoder_2 = ConvBlock(decoder_2, n_filters=256, kernel_size=[3, 3]) 146 | 147 | decoder_2 = tf.add_n([decoder_4, decoder_3, decoder_2]) 148 | 149 | decoder_2 = SemanticFeatureGenerationBlock(decoder_2, D_features=256, D_prime_features = 256 / 4, O_features=256) 150 | 151 | ### Fusing features from 1, 2, 3, 4 152 | decoder_4 = ConvBlock(decoder_4, n_filters=128, kernel_size=[3, 3]) 153 | decoder_4 = Upsampling(decoder_4, scale=8) 154 | 155 | decoder_3 = ConvBlock(decoder_3, n_filters=128, kernel_size=[3, 3]) 156 | decoder_3 = Upsampling(decoder_3, scale=4) 157 | 158 | decoder_2 = ConvBlock(decoder_2, n_filters=128, kernel_size=[3, 3]) 159 | decoder_2 = Upsampling(decoder_2, scale=2) 160 | 161 | decoder_1 = ConvBlock(decoder_1, n_filters=128, kernel_size=[3, 3]) 162 | 163 | decoder_1 = tf.add_n([decoder_4, decoder_3, decoder_2, decoder_1]) 164 | 165 | decoder_1 = SemanticFeatureGenerationBlock(decoder_1, D_features=128, D_prime_features = 128 / 4, O_features=num_classes) 166 | 167 | 168 | ### Final upscaling and finish 169 | net = Upsampling(decoder_1, scale=4) 170 | 171 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 172 | 173 | return net, init_fn 174 | 175 | 
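# Usage sketch (illustrative only; the placeholder input shape and class count
# below are arbitrary examples, not values prescribed by this repository):
#   inputs = tf.placeholder(tf.float32, shape=[None, 512, 512, 3])
#   net, init_fn = build_ddsc(inputs, num_classes=3, frontend="ResNet101")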
-------------------------------------------------------------------------------- /models/DeepLabV3.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | import numpy as np 6 | from builders import frontend_builder 7 | import os, sys 8 | 9 | def Upsampling(inputs,feature_map_shape): 10 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def AtrousSpatialPyramidPoolingModule(inputs, depth=256): 31 | """ 32 | 33 | ASPP consists of (a) one 1×1 convolution and three 3×3 convolutions with rates = (6, 12, 18) when output stride = 16 34 | (all with 256 filters and batch normalization), and (b) the image-level features as described in the paper 35 | 36 | """ 37 | 38 | feature_map_size = tf.shape(inputs) 39 | 40 | # Global average pooling 41 | image_features = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 42 | 43 | image_features = slim.conv2d(image_features, depth, [1, 1], activation_fn=None) 44 | image_features = tf.image.resize_bilinear(image_features, (feature_map_size[1], feature_map_size[2])) 45 | 46 | atrous_pool_block_1 = slim.conv2d(inputs, depth, [1, 1], activation_fn=None) 47 | 48 | atrous_pool_block_6 = slim.conv2d(inputs, depth, [3, 3], rate=6, activation_fn=None) 49 | 50 | atrous_pool_block_12 = slim.conv2d(inputs, depth, [3, 3], rate=12, activation_fn=None) 51 | 52 | atrous_pool_block_18 = slim.conv2d(inputs, depth, [3, 3], rate=18, activation_fn=None) 53 | 54 | net = tf.concat((image_features, atrous_pool_block_1, atrous_pool_block_6, atrous_pool_block_12, atrous_pool_block_18), axis=3) 55 | net = slim.conv2d(net, depth, [1, 1], scope="conv_1x1_output", activation_fn=None) 56 | 57 | return net 58 | 59 | 60 | 61 | 62 | 63 | def build_deeplabv3(inputs, num_classes, preset_model='DeepLabV3', frontend="Res101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 64 | """ 65 | Builds the DeepLabV3 model. 66 | 67 | Arguments: 68 | inputs: The input tensor= 69 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 70 | num_classes: Number of classes 71 | 72 | Returns: 73 | DeepLabV3 model 74 | """ 75 | 76 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 77 | 78 | label_size = tf.shape(inputs)[1:3] 79 | 80 | net = AtrousSpatialPyramidPoolingModule(end_points['pool4']) 81 | 82 | net = Upsampling(net, label_size) 83 | 84 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 85 | 86 | return net, init_fn 87 | 88 | 89 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 90 | inputs=tf.to_float(inputs) 91 | num_channels = inputs.get_shape().as_list()[-1] 92 | if len(means) != num_channels: 93 | raise ValueError('len(means) must match the number of channels') 94 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 95 | for i in range(num_channels): 96 | channels[i] -= means[i] 97 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/DeepLabV3_plus.py: -------------------------------------------------------------------------------- 1 | # coding=utf-8 2 | 3 | import tensorflow as tf 4 | from tensorflow.contrib import slim 5 | from builders import frontend_builder 6 | import numpy as np 7 | import os, sys 8 | 9 | def Upsampling(inputs,feature_map_shape): 10 | return tf.image.resize_bilinear(inputs, size=tf.cast(feature_map_shape, tf.int32)) 11 | 12 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 13 | """ 14 | Basic conv transpose block for Encoder-Decoder upsampling 15 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 19 | return net 20 | 21 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 22 | """ 23 | Basic conv block for Encoder-Decoder 24 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 25 | """ 26 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 27 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | return net 29 | 30 | def AtrousSpatialPyramidPoolingModule(inputs, depth=256): 31 | """ 32 | 33 | ASPP consists of (a) one 1×1 convolution and three 3×3 convolutions with rates = (6, 12, 18) when output stride = 16 34 | (all with 256 filters and batch normalization), and (b) the image-level features as described in the paper 35 | 36 | """ 37 | 38 | feature_map_size = tf.shape(inputs) 39 | 40 | # Global average pooling 41 | image_features = tf.reduce_mean(inputs, [1, 2], keep_dims=True) 42 | 43 | image_features = slim.conv2d(image_features, depth, [1, 1], activation_fn=None) 44 | image_features = tf.image.resize_bilinear(image_features, (feature_map_size[1], feature_map_size[2])) 45 | 46 | atrous_pool_block_1 = slim.conv2d(inputs, depth, [1, 1], activation_fn=None) 47 | 48 | atrous_pool_block_6 = slim.conv2d(inputs, depth, [3, 3], rate=6, activation_fn=None) 49 | 50 | atrous_pool_block_12 = slim.conv2d(inputs, depth, [3, 3], rate=12, activation_fn=None) 51 | 52 | atrous_pool_block_18 = slim.conv2d(inputs, depth, [3, 3], rate=18, activation_fn=None) 53 | 54 | net = tf.concat((image_features, atrous_pool_block_1, atrous_pool_block_6, atrous_pool_block_12, atrous_pool_block_18), axis=3) 55 | 56 | 
return net 57 | 58 | 59 | 60 | 61 | 62 | def build_deeplabv3_plus(inputs, num_classes, preset_model='DeepLabV3+', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 63 | """ 64 | Builds the DeepLabV3 model. 65 | 66 | Arguments: 67 | inputs: The input tensor= 68 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 69 | num_classes: Number of classes 70 | 71 | Returns: 72 | DeepLabV3 model 73 | """ 74 | 75 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 76 | 77 | 78 | label_size = tf.shape(inputs)[1:3] 79 | 80 | encoder_features = end_points['pool2'] 81 | 82 | net = AtrousSpatialPyramidPoolingModule(end_points['pool4']) 83 | net = slim.conv2d(net, 256, [1, 1], scope="conv_1x1_output", activation_fn=None) 84 | decoder_features = Upsampling(net, label_size / 4) 85 | 86 | encoder_features = slim.conv2d(encoder_features, 48, [1, 1], activation_fn=tf.nn.relu, normalizer_fn=None) 87 | 88 | net = tf.concat((encoder_features, decoder_features), axis=3) 89 | 90 | net = slim.conv2d(net, 256, [3, 3], activation_fn=tf.nn.relu, normalizer_fn=None) 91 | net = slim.conv2d(net, 256, [3, 3], activation_fn=tf.nn.relu, normalizer_fn=None) 92 | 93 | net = Upsampling(net, label_size) 94 | 95 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 96 | 97 | return net, init_fn 98 | 99 | 100 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 101 | inputs=tf.to_float(inputs) 102 | num_channels = inputs.get_shape().as_list()[-1] 103 | if len(means) != num_channels: 104 | raise ValueError('len(means) must match the number of channels') 105 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 106 | for i in range(num_channels): 107 | channels[i] -= means[i] 108 | return tf.concat(axis=3, values=channels) 109 | -------------------------------------------------------------------------------- /models/DenseASPP.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | 7 | def Upsampling(inputs,scale): 8 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 9 | 10 | 11 | 12 | def DilatedConvBlock(inputs, n_filters, rate=1, kernel_size=[3, 3]): 13 | """ 14 | Basic dilated conv block 15 | Apply successivly BatchNormalization, ReLU nonlinearity, dilated convolution 16 | """ 17 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 18 | net = slim.conv2d(net, n_filters, kernel_size, rate=rate, activation_fn=None, normalizer_fn=None) 19 | return net 20 | 21 | 22 | 23 | def build_dense_aspp(inputs, num_classes, preset_model='DenseASPP', frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 24 | 25 | 26 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 27 | 28 | init_features = end_points['pool3'] 29 | 30 | ### First block, rate = 3 31 | d_3_features = DilatedConvBlock(init_features, n_filters=256, kernel_size=[1, 1]) 32 | d_3 = DilatedConvBlock(d_3_features, n_filters=64, rate=3, kernel_size=[3, 3]) 33 | 34 | ### Second block, rate = 6 35 | d_4 = tf.concat([init_features, d_3], axis=-1) 36 | d_4 = DilatedConvBlock(d_4, n_filters=256, 
kernel_size=[1, 1]) 37 | d_4 = DilatedConvBlock(d_4, n_filters=64, rate=6, kernel_size=[3, 3]) 38 | 39 | ### Third block, rate = 12 40 | d_5 = tf.concat([init_features, d_3, d_4], axis=-1) 41 | d_5 = DilatedConvBlock(d_5, n_filters=256, kernel_size=[1, 1]) 42 | d_5 = DilatedConvBlock(d_5, n_filters=64, rate=12, kernel_size=[3, 3]) 43 | 44 | ### Fourth block, rate = 18 45 | d_6 = tf.concat([init_features, d_3, d_4, d_5], axis=-1) 46 | d_6 = DilatedConvBlock(d_6, n_filters=256, kernel_size=[1, 1]) 47 | d_6 = DilatedConvBlock(d_6, n_filters=64, rate=18, kernel_size=[3, 3]) 48 | 49 | ### Fifth block, rate = 24 50 | d_7 = tf.concat([init_features, d_3, d_4, d_5, d_6], axis=-1) 51 | d_7 = DilatedConvBlock(d_7, n_filters=256, kernel_size=[1, 1]) 52 | d_7 = DilatedConvBlock(d_7, n_filters=64, rate=24, kernel_size=[3, 3]) 53 | 54 | full_block = tf.concat([init_features, d_3, d_4, d_5, d_6, d_7], axis=-1) 55 | 56 | net = slim.conv2d(full_block, num_classes, [1, 1], activation_fn=None, scope='logits') 57 | 58 | net = Upsampling(net, scale=8) 59 | 60 | return net, init_fn -------------------------------------------------------------------------------- /models/Encoder_Decoder.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | 7 | def conv_block(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.0): 8 | """ 9 | Basic conv block for Encoder-Decoder 10 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 11 | Dropout (if dropout_p > 0) on the inputs 12 | """ 13 | conv = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 14 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 15 | if dropout_p != 0.0: 16 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 17 | return out 18 | 19 | def conv_transpose_block(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.0): 20 | """ 21 | Basic conv transpose block for Encoder-Decoder upsampling 22 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 23 | Dropout (if dropout_p > 0) on the inputs 24 | """ 25 | conv = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 26 | out = tf.nn.relu(slim.batch_norm(conv)) 27 | if dropout_p != 0.0: 28 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 29 | return out 30 | 31 | def build_encoder_decoder(inputs, num_classes, preset_model = "Encoder-Decoder", dropout_p=0.5, scope=None): 32 | """ 33 | Builds the Encoder-Decoder model. Inspired by SegNet with some modifications 34 | Optionally includes skip connections 35 | 36 | Arguments: 37 | inputs: the input tensor 38 | n_classes: number of classes 39 | dropout_p: dropout rate applied after each convolution (0. for not using) 40 | 41 | Returns: 42 | Encoder-Decoder model 43 | """ 44 | 45 | 46 | if preset_model == "Encoder-Decoder": 47 | has_skip = False 48 | elif preset_model == "Encoder-Decoder-Skip": 49 | has_skip = True 50 | else: 51 | raise ValueError("Unsupported Encoder-Decoder model '%s'. 
This function only supports Encoder-Decoder and Encoder-Decoder-Skip" % (preset_model)) 52 | 53 | ##################### 54 | # Downsampling path # 55 | ##################### 56 | net = conv_block(inputs, 64) 57 | net = conv_block(net, 64) 58 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 59 | skip_1 = net 60 | 61 | net = conv_block(net, 128) 62 | net = conv_block(net, 128) 63 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 64 | skip_2 = net 65 | 66 | net = conv_block(net, 256) 67 | net = conv_block(net, 256) 68 | net = conv_block(net, 256) 69 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 70 | skip_3 = net 71 | 72 | net = conv_block(net, 512) 73 | net = conv_block(net, 512) 74 | net = conv_block(net, 512) 75 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 76 | skip_4 = net 77 | 78 | net = conv_block(net, 512) 79 | net = conv_block(net, 512) 80 | net = conv_block(net, 512) 81 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 82 | 83 | 84 | ##################### 85 | # Upsampling path # 86 | ##################### 87 | net = conv_transpose_block(net, 512) 88 | net = conv_block(net, 512) 89 | net = conv_block(net, 512) 90 | net = conv_block(net, 512) 91 | if has_skip: 92 | net = tf.add(net, skip_4) 93 | 94 | net = conv_transpose_block(net, 512) 95 | net = conv_block(net, 512) 96 | net = conv_block(net, 512) 97 | net = conv_block(net, 256) 98 | if has_skip: 99 | net = tf.add(net, skip_3) 100 | 101 | net = conv_transpose_block(net, 256) 102 | net = conv_block(net, 256) 103 | net = conv_block(net, 256) 104 | net = conv_block(net, 128) 105 | if has_skip: 106 | net = tf.add(net, skip_2) 107 | 108 | net = conv_transpose_block(net, 128) 109 | net = conv_block(net, 128) 110 | net = conv_block(net, 64) 111 | if has_skip: 112 | net = tf.add(net, skip_1) 113 | 114 | net = conv_transpose_block(net, 64) 115 | net = conv_block(net, 64) 116 | net = conv_block(net, 64) 117 | 118 | ##################### 119 | # Softmax # 120 | ##################### 121 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 122 | return net -------------------------------------------------------------------------------- /models/FC_DenseNet_Tiramisu.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | 7 | def preact_conv(inputs, n_filters, kernel_size=[3, 3], dropout_p=0.2): 8 | """ 9 | Basic pre-activation layer for DenseNets 10 | Apply successivly BatchNormalization, ReLU nonlinearity, Convolution and 11 | Dropout (if dropout_p > 0) on the inputs 12 | """ 13 | preact = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 14 | conv = slim.conv2d(preact, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 15 | if dropout_p != 0.0: 16 | conv = slim.dropout(conv, keep_prob=(1.0-dropout_p)) 17 | return conv 18 | 19 | def DenseBlock(stack, n_layers, growth_rate, dropout_p, scope=None): 20 | """ 21 | DenseBlock for DenseNet and FC-DenseNet 22 | Arguments: 23 | stack: input 4D tensor 24 | n_layers: number of internal layers 25 | growth_rate: number of feature maps per internal layer 26 | Returns: 27 | stack: current stack of feature maps (4D tensor) 28 | new_features: 4D tensor containing only the new feature maps generated 29 | in this block 30 | """ 31 | with tf.name_scope(scope) as sc: 32 | new_features = [] 33 | for j in 
range(n_layers): 34 | # Compute new feature maps 35 | layer = preact_conv(stack, growth_rate, dropout_p=dropout_p) 36 | new_features.append(layer) 37 | # Stack new layer 38 | stack = tf.concat([stack, layer], axis=-1) 39 | new_features = tf.concat(new_features, axis=-1) 40 | return stack, new_features 41 | 42 | 43 | def TransitionDown(inputs, n_filters, dropout_p=0.2, scope=None): 44 | """ 45 | Transition Down (TD) for FC-DenseNet 46 | Apply 1x1 BN + ReLU + conv then 2x2 max pooling 47 | """ 48 | with tf.name_scope(scope) as sc: 49 | l = preact_conv(inputs, n_filters, kernel_size=[1, 1], dropout_p=dropout_p) 50 | l = slim.pool(l, [2, 2], stride=[2, 2], pooling_type='MAX') 51 | return l 52 | 53 | 54 | def TransitionUp(block_to_upsample, skip_connection, n_filters_keep, scope=None): 55 | """ 56 | Transition Up for FC-DenseNet 57 | Performs upsampling on block_to_upsample by a factor 2 and concatenates it with the skip_connection 58 | """ 59 | with tf.name_scope(scope) as sc: 60 | # Upsample 61 | l = slim.conv2d_transpose(block_to_upsample, n_filters_keep, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 62 | # Concatenate with skip connection 63 | l = tf.concat([l, skip_connection], axis=-1) 64 | return l 65 | 66 | def build_fc_densenet(inputs, num_classes, preset_model='FC-DenseNet56', n_filters_first_conv=48, n_pool=5, growth_rate=12, n_layers_per_block=4, dropout_p=0.2, scope=None): 67 | """ 68 | Builds the FC-DenseNet model 69 | 70 | Arguments: 71 | inputs: the input tensor 72 | preset_model: The model you want to use 73 | n_classes: number of classes 74 | n_filters_first_conv: number of filters for the first convolution applied 75 | n_pool: number of pooling layers = number of transition down = number of transition up 76 | growth_rate: number of new feature maps created by each layer in a dense block 77 | n_layers_per_block: number of layers per block. Can be an int or a list of size 2 * n_pool + 1 78 | dropout_p: dropout rate applied after each convolution (0. for not using) 79 | 80 | Returns: 81 | Fc-DenseNet model 82 | """ 83 | 84 | if preset_model == 'FC-DenseNet56': 85 | n_pool=5 86 | growth_rate=12 87 | n_layers_per_block=4 88 | elif preset_model == 'FC-DenseNet67': 89 | n_pool=5 90 | growth_rate=16 91 | n_layers_per_block=5 92 | elif preset_model == 'FC-DenseNet103': 93 | n_pool=5 94 | growth_rate=16 95 | n_layers_per_block=[4, 5, 7, 10, 12, 15, 12, 10, 7, 5, 4] 96 | else: 97 | raise ValueError("Unsupported FC-DenseNet model '%s'. This function only supports FC-DenseNet56, FC-DenseNet67, and FC-DenseNet103" % (preset_model)) 98 | 99 | if type(n_layers_per_block) == list: 100 | assert (len(n_layers_per_block) == 2 * n_pool + 1) 101 | elif type(n_layers_per_block) == int: 102 | n_layers_per_block = [n_layers_per_block] * (2 * n_pool + 1) 103 | else: 104 | raise ValueError 105 | 106 | with tf.variable_scope(scope, preset_model, [inputs]) as sc: 107 | 108 | ##################### 109 | # First Convolution # 110 | ##################### 111 | # We perform a first convolution. 
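# (Worked example of the channel bookkeeping below; a sketch assuming the default
#  FC-DenseNet56 preset selected above: n_filters_first_conv=48, growth_rate=12,
#  n_layers_per_block=4, n_pool=5. The first convolution yields 48 feature maps,
#  and each dense block on the downsampling path adds
#  growth_rate * n_layers_per_block[i] = 12 * 4 = 48 maps, so the stack width
#  grows 48 -> 96 -> 144 -> 192 -> 240 -> 288 before the bottleneck block.)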
112 | stack = slim.conv2d(inputs, n_filters_first_conv, [3, 3], scope='first_conv', activation_fn=None) 113 | 114 | n_filters = n_filters_first_conv 115 | 116 | ##################### 117 | # Downsampling path # 118 | ##################### 119 | 120 | skip_connection_list = [] 121 | 122 | for i in range(n_pool): 123 | # Dense Block 124 | stack, _ = DenseBlock(stack, n_layers_per_block[i], growth_rate, dropout_p, scope='denseblock%d' % (i+1)) 125 | n_filters += growth_rate * n_layers_per_block[i] 126 | # At the end of the dense block, the current stack is stored in the skip_connections list 127 | skip_connection_list.append(stack) 128 | 129 | # Transition Down 130 | stack = TransitionDown(stack, n_filters, dropout_p, scope='transitiondown%d'%(i+1)) 131 | 132 | skip_connection_list = skip_connection_list[::-1] 133 | 134 | ##################### 135 | # Bottleneck # 136 | ##################### 137 | 138 | # Dense Block 139 | # We will only upsample the new feature maps 140 | stack, block_to_upsample = DenseBlock(stack, n_layers_per_block[n_pool], growth_rate, dropout_p, scope='denseblock%d' % (n_pool + 1)) 141 | 142 | 143 | ####################### 144 | # Upsampling path # 145 | ####################### 146 | 147 | for i in range(n_pool): 148 | # Transition Up ( Upsampling + concatenation with the skip connection) 149 | n_filters_keep = growth_rate * n_layers_per_block[n_pool + i] 150 | stack = TransitionUp(block_to_upsample, skip_connection_list[i], n_filters_keep, scope='transitionup%d' % (n_pool + i + 1)) 151 | 152 | # Dense Block 153 | # We will only upsample the new feature maps 154 | stack, block_to_upsample = DenseBlock(stack, n_layers_per_block[n_pool + i + 1], growth_rate, dropout_p, scope='denseblock%d' % (n_pool + i + 2)) 155 | 156 | 157 | ##################### 158 | # Softmax # 159 | ##################### 160 | net = slim.conv2d(stack, num_classes, [1, 1], activation_fn=None, scope='logits') 161 | return net -------------------------------------------------------------------------------- /models/FRRN.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | 4 | def Upsampling(inputs,scale): 5 | return tf.image.resize_nearest_neighbor(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 6 | 7 | def Unpooling(inputs,scale): 8 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 9 | 10 | def ResidualUnit(inputs, n_filters=48, filter_size=3): 11 | """ 12 | A local residual unit 13 | 14 | Arguments: 15 | inputs: The input tensor 16 | n_filters: Number of output feature maps for each conv 17 | filter_size: Size of convolution kernel 18 | 19 | Returns: 20 | Output of local residual block 21 | """ 22 | 23 | net = slim.conv2d(inputs, n_filters, filter_size, activation_fn=None) 24 | net = slim.batch_norm(net, fused=True) 25 | net = tf.nn.relu(net) 26 | net = slim.conv2d(net, n_filters, filter_size, activation_fn=None) 27 | net = slim.batch_norm(net, fused=True) 28 | 29 | return net 30 | 31 | def FullResolutionResidualUnit(pool_stream, res_stream, n_filters_3, n_filters_1, pool_scale): 32 | """ 33 | A full resolution residual unit 34 | 35 | Arguments: 36 | pool_stream: The inputs from the pooling stream 37 | res_stream: The inputs from the residual stream 38 | n_filters_3: Number of output feature maps for each 3x3 conv 39 | n_filters_1: Number of output feature maps for each 1x1 conv 40 | pool_scale: scale of the pooling layer i.e 
window size and stride 41 | 42 | Returns: 43 | Output of full resolution residual block 44 | """ 45 | 46 | G = tf.concat([pool_stream, slim.pool(res_stream, [pool_scale, pool_scale], stride=[pool_scale, pool_scale], pooling_type='MAX')], axis=-1) 47 | 48 | 49 | 50 | net = slim.conv2d(G, n_filters_3, kernel_size=3, activation_fn=None) 51 | net = slim.batch_norm(net, fused=True) 52 | net = tf.nn.relu(net) 53 | net = slim.conv2d(net, n_filters_3, kernel_size=3, activation_fn=None) 54 | net = slim.batch_norm(net, fused=True) 55 | pool_stream_out = tf.nn.relu(net) 56 | 57 | net = slim.conv2d(pool_stream_out, n_filters_1, kernel_size=1, activation_fn=None) 58 | net = Upsampling(net, scale=pool_scale) 59 | res_stream_out = tf.add(res_stream, net) 60 | 61 | return pool_stream_out, res_stream_out 62 | 63 | 64 | 65 | def build_frrn(inputs, num_classes, preset_model='FRRN-A'): 66 | """ 67 | Builds the Full Resolution Residual Network model. 68 | 69 | Arguments: 70 | inputs: The input tensor 71 | preset_model: Which model you want to use. Select FRRN-A or FRRN-B 72 | num_classes: Number of classes 73 | 74 | Returns: 75 | FRRN model 76 | """ 77 | 78 | if preset_model == 'FRRN-A': 79 | 80 | ##################### 81 | # Initial Stage 82 | ##################### 83 | net = slim.conv2d(inputs, 48, kernel_size=5, activation_fn=None) 84 | net = slim.batch_norm(net, fused=True) 85 | net = tf.nn.relu(net) 86 | 87 | net = ResidualUnit(net, n_filters=48, filter_size=3) 88 | net = ResidualUnit(net, n_filters=48, filter_size=3) 89 | net = ResidualUnit(net, n_filters=48, filter_size=3) 90 | 91 | 92 | ##################### 93 | # Downsampling Path 94 | ##################### 95 | pool_stream = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 96 | res_stream = slim.conv2d(net, 32, kernel_size=1, activation_fn=None) 97 | 98 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 99 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 100 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 101 | 102 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 103 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 104 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 105 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 106 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 107 | 108 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 109 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 110 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 111 | 112 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 113 | pool_stream, res_stream = 
FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 114 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 115 | 116 | ##################### 117 | # Upsampling Path 118 | ##################### 119 | pool_stream = Unpooling(pool_stream, 2) 120 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 121 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 122 | 123 | pool_stream = Unpooling(pool_stream, 2) 124 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 125 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 126 | 127 | pool_stream = Unpooling(pool_stream, 2) 128 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 129 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 130 | 131 | pool_stream = Unpooling(pool_stream, 2) 132 | 133 | ##################### 134 | # Final Stage 135 | ##################### 136 | net = tf.concat([pool_stream, res_stream], axis=-1) 137 | net = ResidualUnit(net, n_filters=48, filter_size=3) 138 | net = ResidualUnit(net, n_filters=48, filter_size=3) 139 | net = ResidualUnit(net, n_filters=48, filter_size=3) 140 | 141 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 142 | return net 143 | 144 | 145 | elif preset_model == 'FRRN-B': 146 | ##################### 147 | # Initial Stage 148 | ##################### 149 | net = slim.conv2d(inputs, 48, kernel_size=5, activation_fn=None) 150 | net = slim.batch_norm(net, fused=True) 151 | net = tf.nn.relu(net) 152 | 153 | net = ResidualUnit(net, n_filters=48, filter_size=3) 154 | net = ResidualUnit(net, n_filters=48, filter_size=3) 155 | net = ResidualUnit(net, n_filters=48, filter_size=3) 156 | 157 | 158 | ##################### 159 | # Downsampling Path 160 | ##################### 161 | pool_stream = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 162 | res_stream = slim.conv2d(net, 32, kernel_size=1, activation_fn=None) 163 | 164 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 165 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 166 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 167 | 168 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 169 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 170 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 171 | pool_stream, res_stream = 
FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 172 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 173 | 174 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 175 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 176 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=8) 177 | 178 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 179 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 180 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=16) 181 | 182 | pool_stream = slim.pool(pool_stream, [2, 2], stride=[2, 2], pooling_type='MAX') 183 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=32) 184 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=384, n_filters_1=32, pool_scale=32) 185 | 186 | ##################### 187 | # Upsampling Path 188 | ##################### 189 | pool_stream = Unpooling(pool_stream, 2) 190 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=16) 191 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=16) 192 | 193 | pool_stream = Unpooling(pool_stream, 2) 194 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 195 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=8) 196 | 197 | pool_stream = Unpooling(pool_stream, 2) 198 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 199 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=192, n_filters_1=32, pool_scale=4) 200 | 201 | pool_stream = Unpooling(pool_stream, 2) 202 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 203 | pool_stream, res_stream = FullResolutionResidualUnit(pool_stream=pool_stream, res_stream=res_stream, n_filters_3=96, n_filters_1=32, pool_scale=2) 204 | 205 | pool_stream = Unpooling(pool_stream, 2) 206 | 207 | ##################### 208 | # Final Stage 209 | ##################### 210 | net = tf.concat([pool_stream, res_stream], axis=-1) 211 | net = ResidualUnit(net, n_filters=48, filter_size=3) 212 | net = ResidualUnit(net, n_filters=48, filter_size=3) 213 | net = ResidualUnit(net, n_filters=48, filter_size=3) 214 | 215 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 216 | return net 217 | 218 | else: 219 | raise ValueError("Unsupported FRRN model '%s'. 
This function only supports FRRN-A and FRRN-B" % (preset_model)) 220 | -------------------------------------------------------------------------------- /models/GCN.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | def Upsampling(inputs,scale): 7 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 8 | 9 | 10 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 11 | """ 12 | Basic deconv block for GCN 13 | Apply Transposed Convolution for feature map upscaling 14 | """ 15 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 16 | return net 17 | 18 | def BoundaryRefinementBlock(inputs, n_filters, kernel_size=[3, 3]): 19 | """ 20 | Boundary Refinement Block for GCN 21 | """ 22 | net = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 23 | net = tf.nn.relu(net) 24 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 25 | net = tf.add(inputs, net) 26 | return net 27 | 28 | def GlobalConvBlock(inputs, n_filters=21, size=3): 29 | """ 30 | Global Conv Block for GCN 31 | """ 32 | 33 | net_1 = slim.conv2d(inputs, n_filters, [size, 1], activation_fn=None, normalizer_fn=None) 34 | net_1 = slim.conv2d(net_1, n_filters, [1, size], activation_fn=None, normalizer_fn=None) 35 | 36 | net_2 = slim.conv2d(inputs, n_filters, [1, size], activation_fn=None, normalizer_fn=None) 37 | net_2 = slim.conv2d(net_2, n_filters, [size, 1], activation_fn=None, normalizer_fn=None) 38 | 39 | net = tf.add(net_1, net_2) 40 | 41 | return net 42 | 43 | 44 | def build_gcn(inputs, num_classes, preset_model='GCN', frontend="ResNet101", weight_decay=1e-5, is_training=True, upscaling_method="bilinear", pretrained_dir="models"): 45 | """ 46 | Builds the GCN model. 47 | 48 | Arguments: 49 | inputs: The input tensor 50 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 51 | num_classes: Number of classes 52 | 53 | Returns: 54 | GCN model 55 | """ 56 | 57 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 58 | 59 | 60 | 61 | 62 | res = [end_points['pool5'], end_points['pool4'], 63 | end_points['pool3'], end_points['pool2']] 64 | 65 | down_5 = GlobalConvBlock(res[0], n_filters=21, size=3) 66 | down_5 = BoundaryRefinementBlock(down_5, n_filters=21, kernel_size=[3, 3]) 67 | down_5 = ConvUpscaleBlock(down_5, n_filters=21, kernel_size=[3, 3], scale=2) 68 | 69 | down_4 = GlobalConvBlock(res[1], n_filters=21, size=3) 70 | down_4 = BoundaryRefinementBlock(down_4, n_filters=21, kernel_size=[3, 3]) 71 | down_4 = tf.add(down_4, down_5) 72 | down_4 = BoundaryRefinementBlock(down_4, n_filters=21, kernel_size=[3, 3]) 73 | down_4 = ConvUpscaleBlock(down_4, n_filters=21, kernel_size=[3, 3], scale=2) 74 | 75 | down_3 = GlobalConvBlock(res[2], n_filters=21, size=3) 76 | down_3 = BoundaryRefinementBlock(down_3, n_filters=21, kernel_size=[3, 3]) 77 | down_3 = tf.add(down_3, down_4) 78 | down_3 = BoundaryRefinementBlock(down_3, n_filters=21, kernel_size=[3, 3]) 79 | down_3 = ConvUpscaleBlock(down_3, n_filters=21, kernel_size=[3, 3], scale=2) 80 | 81 | down_2 = GlobalConvBlock(res[3], n_filters=21, size=3) 82 | down_2 = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 83 | down_2 = tf.add(down_2, down_3) 84 | down_2 = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 85 | down_2 = ConvUpscaleBlock(down_2, n_filters=21, kernel_size=[3, 3], scale=2) 86 | 87 | net = BoundaryRefinementBlock(down_2, n_filters=21, kernel_size=[3, 3]) 88 | net = ConvUpscaleBlock(net, n_filters=21, kernel_size=[3, 3], scale=2) 89 | net = BoundaryRefinementBlock(net, n_filters=21, kernel_size=[3, 3]) 90 | 91 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 92 | 93 | return net, init_fn 94 | 95 | 96 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 97 | inputs=tf.to_float(inputs) 98 | num_channels = inputs.get_shape().as_list()[-1] 99 | if len(means) != num_channels: 100 | raise ValueError('len(means) must match the number of channels') 101 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 102 | for i in range(num_channels): 103 | channels[i] -= means[i] 104 | return tf.concat(axis=3, values=channels) 105 | -------------------------------------------------------------------------------- /models/ICNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | import numpy as np 4 | from frontends import frontend_builder 5 | import os, sys 6 | 7 | def Upsampling_by_shape(inputs, feature_map_shape): 8 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 9 | 10 | def Upsampling_by_scale(inputs, scale): 11 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 12 | 13 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 14 | """ 15 | Basic conv transpose block for Encoder-Decoder upsampling 16 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 17 | """ 18 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 19 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 20 | return net 21 | 22 
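# Shape sketch for the resize helpers defined above (illustrative only), for a
# 4D tensor x of shape [N, H, W, C]:
#   Upsampling_by_scale(x, scale=2)                  -> [N, 2*H, 2*W, C]
#   Upsampling_by_shape(x, feature_map_shape=[h, w]) -> [N, h, w, C]
# ConvUpscaleBlock instead learns the 2x upsampling with a stride-2 transposed
# convolution followed by batch normalization and ReLU.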
| def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 23 | """ 24 | Basic conv block for Encoder-Decoder 25 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 26 | """ 27 | net = slim.conv2d(inputs, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 28 | net = tf.nn.relu(slim.batch_norm(net, fused=True)) 29 | return net 30 | 31 | def InterpBlock(net, level, feature_map_shape, pooling_type): 32 | 33 | # Compute the kernel and stride sizes according to how large the final feature map will be 34 | # When the kernel size and strides are equal, then we can compute the final feature map size 35 | # by simply dividing the current size by the kernel or stride size 36 | # The final feature map sizes are 1x1, 2x2, 3x3, and 6x6. We round to the closest integer 37 | kernel_size = [int(np.round(float(feature_map_shape[0]) / float(level))), int(np.round(float(feature_map_shape[1]) / float(level)))] 38 | stride_size = kernel_size 39 | 40 | net = slim.pool(net, kernel_size, stride=stride_size, pooling_type='MAX') 41 | net = slim.conv2d(net, 512, [1, 1], activation_fn=None) 42 | net = slim.batch_norm(net, fused=True) 43 | net = tf.nn.relu(net) 44 | net = Upsampling_by_shape(net, feature_map_shape) 45 | return net 46 | 47 | def PyramidPoolingModule_ICNet(inputs, feature_map_shape, pooling_type): 48 | """ 49 | Build the Pyramid Pooling Module. 50 | """ 51 | 52 | interp_block1 = InterpBlock(inputs, 1, feature_map_shape, pooling_type) 53 | interp_block2 = InterpBlock(inputs, 2, feature_map_shape, pooling_type) 54 | interp_block3 = InterpBlock(inputs, 3, feature_map_shape, pooling_type) 55 | interp_block6 = InterpBlock(inputs, 6, feature_map_shape, pooling_type) 56 | 57 | res = tf.add([inputs, interp_block6, interp_block3, interp_block2, interp_block1]) 58 | return res 59 | 60 | def CFFBlock(F1, F2, num_classes): 61 | F1_big = Upsampling_by_scale(F1, scale=2) 62 | F1_out = slim.conv2d(F1_big, num_classes, [1, 1], activation_fn=None) 63 | 64 | F1_big = slim.conv2d(F1_big, 2048, [3, 3], rate=2, activation_fn=None) 65 | F1_big = slim.batch_norm(F1_big, fused=True) 66 | 67 | F2_proj = slim.conv2d(F2, 512, [1, 1], rate=1, activation_fn=None) 68 | F2_proj = slim.batch_norm(F2_proj, fused=True) 69 | 70 | F2_out = tf.add([F1_big, F2_proj]) 71 | F2_out = tf.nn.relu(F2_out) 72 | 73 | return F1_out, F2_out 74 | 75 | 76 | def build_icnet(inputs, label_size, num_classes, preset_model='ICNet', pooling_type = "MAX", 77 | frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 78 | """ 79 | Builds the ICNet model. 80 | 81 | Arguments: 82 | inputs: The input tensor 83 | label_size: Size of the final label tensor. We need to know this for proper upscaling 84 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 85 | num_classes: Number of classes 86 | pooling_type: Max or Average pooling 87 | 88 | Returns: 89 | ICNet model 90 | """ 91 | 92 | inputs_4 = tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*4, tf.shape(inputs)[2]*4]) 93 | inputs_2 = tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*2, tf.shape(inputs)[2]*2]) 94 | inputs_1 = inputs 95 | 96 | if frontend == 'Res50': 97 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 98 | logits_32, end_points_32 = resnet_v2.resnet_v2_50(inputs_4, is_training=is_training, scope='resnet_v2_50') 99 | logits_16, end_points_16 = resnet_v2.resnet_v2_50(inputs_2, is_training=is_training, scope='resnet_v2_50') 100 | logits_8, end_points_8 = resnet_v2.resnet_v2_50(inputs_1, is_training=is_training, scope='resnet_v2_50') 101 | resnet_scope='resnet_v2_50' 102 | # ICNet requires pre-trained ResNet weights 103 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_50.ckpt'), slim.get_model_variables('resnet_v2_50')) 104 | elif frontend == 'Res101': 105 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 106 | logits_32, end_points_32 = resnet_v2.resnet_v2_101(inputs_4, is_training=is_training, scope='resnet_v2_101') 107 | logits_16, end_points_16 = resnet_v2.resnet_v2_101(inputs_2, is_training=is_training, scope='resnet_v2_101') 108 | logits_8, end_points_8 = resnet_v2.resnet_v2_101(inputs_1, is_training=is_training, scope='resnet_v2_101') 109 | resnet_scope='resnet_v2_101' 110 | # ICNet requires pre-trained ResNet weights 111 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_101.ckpt'), slim.get_model_variables('resnet_v2_101')) 112 | elif frontend == 'Res152': 113 | with slim.arg_scope(resnet_v2.resnet_arg_scope(weight_decay=weight_decay)): 114 | logits_32, end_points_32 = resnet_v2.resnet_v2_152(inputs_4, is_training=is_training, scope='resnet_v2_152') 115 | logits_16, end_points_16 = resnet_v2.resnet_v2_152(inputs_2, is_training=is_training, scope='resnet_v2_152') 116 | logits_8, end_points_8 = resnet_v2.resnet_v2_152(inputs_1, is_training=is_training, scope='resnet_v2_152') 117 | resnet_scope='resnet_v2_152' 118 | # ICNet requires pre-trained ResNet weights 119 | init_fn = slim.assign_from_checkpoint_fn(os.path.join(pretrained_dir, 'resnet_v2_152.ckpt'), slim.get_model_variables('resnet_v2_152')) 120 | else: 121 | raise ValueError("Unsupported ResNet model '%s'. 
This function only supports ResNet 50, ResNet 101, and ResNet 152" % (frontend)) 122 | 123 | 124 | 125 | feature_map_shape = [int(x / 32.0) for x in label_size] 126 | block_32 = PyramidPoolingModule(end_points_32['pool3'], feature_map_shape=feature_map_shape, pooling_type=pooling_type) 127 | 128 | out_16, block_16 = CFFBlock(psp_32, end_points_16['pool3']) 129 | out_8, block_8 = CFFBlock(block_16, end_points_8['pool3']) 130 | out_4 = Upsampling_by_scale(out_8, scale=2) 131 | out_4 = slim.conv2d(out_4, num_classes, [1, 1], activation_fn=None) 132 | 133 | out_full = Upsampling_by_scale(out_4, scale=2) 134 | 135 | out_full = slim.conv2d(out_full, num_classes, [1, 1], activation_fn=None, scope='logits') 136 | 137 | net = tf.concat([out_16, out_8, out_4, out_final]) 138 | 139 | return net, init_fn 140 | 141 | 142 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 143 | inputs=tf.to_float(inputs) 144 | num_channels = inputs.get_shape().as_list()[-1] 145 | if len(means) != num_channels: 146 | raise ValueError('len(means) must match the number of channels') 147 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 148 | for i in range(num_channels): 149 | channels[i] -= means[i] 150 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/MobileUNet.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2 2 | import tensorflow as tf 3 | import tensorflow.contrib.slim as slim 4 | import numpy as np 5 | 6 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 7 | """ 8 | Builds the conv block for MobileNets 9 | Apply successivly a 2D convolution, BatchNormalization relu 10 | """ 11 | # Skip pointwise by setting num_outputs=Non 12 | net = slim.conv2d(inputs, n_filters, kernel_size=[1, 1], activation_fn=None) 13 | net = slim.batch_norm(net, fused=True) 14 | net = tf.nn.relu(net) 15 | return net 16 | 17 | def DepthwiseSeparableConvBlock(inputs, n_filters, kernel_size=[3, 3]): 18 | """ 19 | Builds the Depthwise Separable conv block for MobileNets 20 | Apply successivly a 2D separable convolution, BatchNormalization relu, conv, BatchNormalization, relu 21 | """ 22 | # Skip pointwise by setting num_outputs=None 23 | net = slim.separable_convolution2d(inputs, num_outputs=None, depth_multiplier=1, kernel_size=[3, 3], activation_fn=None) 24 | 25 | net = slim.batch_norm(net, fused=True) 26 | net = tf.nn.relu(net) 27 | net = slim.conv2d(net, n_filters, kernel_size=[1, 1], activation_fn=None) 28 | net = slim.batch_norm(net, fused=True) 29 | net = tf.nn.relu(net) 30 | return net 31 | 32 | def conv_transpose_block(inputs, n_filters, kernel_size=[3, 3]): 33 | """ 34 | Basic conv transpose block for Encoder-Decoder upsampling 35 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 36 | """ 37 | net = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[2, 2], activation_fn=None) 38 | net = tf.nn.relu(slim.batch_norm(net)) 39 | return net 40 | 41 | def build_mobile_unet(inputs, preset_model, num_classes): 42 | 43 | has_skip = False 44 | if preset_model == "MobileUNet": 45 | has_skip = False 46 | elif preset_model == "MobileUNet-Skip": 47 | has_skip = True 48 | else: 49 | raise ValueError("Unsupported MobileUNet model '%s'. 
This function only supports MobileUNet and MobileUNet-Skip" % (preset_model)) 50 | 51 | ##################### 52 | # Downsampling path # 53 | ##################### 54 | net = ConvBlock(inputs, 64) 55 | net = DepthwiseSeparableConvBlock(net, 64) 56 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 57 | skip_1 = net 58 | 59 | net = DepthwiseSeparableConvBlock(net, 128) 60 | net = DepthwiseSeparableConvBlock(net, 128) 61 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 62 | skip_2 = net 63 | 64 | net = DepthwiseSeparableConvBlock(net, 256) 65 | net = DepthwiseSeparableConvBlock(net, 256) 66 | net = DepthwiseSeparableConvBlock(net, 256) 67 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 68 | skip_3 = net 69 | 70 | net = DepthwiseSeparableConvBlock(net, 512) 71 | net = DepthwiseSeparableConvBlock(net, 512) 72 | net = DepthwiseSeparableConvBlock(net, 512) 73 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 74 | skip_4 = net 75 | 76 | net = DepthwiseSeparableConvBlock(net, 512) 77 | net = DepthwiseSeparableConvBlock(net, 512) 78 | net = DepthwiseSeparableConvBlock(net, 512) 79 | net = slim.pool(net, [2, 2], stride=[2, 2], pooling_type='MAX') 80 | 81 | 82 | ##################### 83 | # Upsampling path # 84 | ##################### 85 | net = conv_transpose_block(net, 512) 86 | net = DepthwiseSeparableConvBlock(net, 512) 87 | net = DepthwiseSeparableConvBlock(net, 512) 88 | net = DepthwiseSeparableConvBlock(net, 512) 89 | if has_skip: 90 | net = tf.add(net, skip_4) 91 | 92 | net = conv_transpose_block(net, 512) 93 | net = DepthwiseSeparableConvBlock(net, 512) 94 | net = DepthwiseSeparableConvBlock(net, 512) 95 | net = DepthwiseSeparableConvBlock(net, 256) 96 | if has_skip: 97 | net = tf.add(net, skip_3) 98 | 99 | net = conv_transpose_block(net, 256) 100 | net = DepthwiseSeparableConvBlock(net, 256) 101 | net = DepthwiseSeparableConvBlock(net, 256) 102 | net = DepthwiseSeparableConvBlock(net, 128) 103 | if has_skip: 104 | net = tf.add(net, skip_2) 105 | 106 | net = conv_transpose_block(net, 128) 107 | net = DepthwiseSeparableConvBlock(net, 128) 108 | net = DepthwiseSeparableConvBlock(net, 64) 109 | if has_skip: 110 | net = tf.add(net, skip_1) 111 | 112 | net = conv_transpose_block(net, 64) 113 | net = DepthwiseSeparableConvBlock(net, 64) 114 | net = DepthwiseSeparableConvBlock(net, 64) 115 | 116 | ##################### 117 | # Softmax # 118 | ##################### 119 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 120 | return net -------------------------------------------------------------------------------- /models/PSPNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | import numpy as np 4 | from builders import frontend_builder 5 | import os, sys 6 | 7 | def Upsampling(inputs,feature_map_shape): 8 | return tf.image.resize_bilinear(inputs, size=feature_map_shape) 9 | 10 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 11 | """ 12 | Basic conv transpose block for Encoder-Decoder upsampling 13 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 14 | """ 15 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 16 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 17 | return net 18 | 19 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 20 | """ 21 | Basic conv block for 
Encoder-Decoder 22 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 23 | """ 24 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 25 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 26 | return net 27 | 28 | def InterpBlock(net, level, feature_map_shape, pooling_type): 29 | 30 | # Compute the kernel and stride sizes according to how large the final feature map will be 31 | # When the kernel size and strides are equal, then we can compute the final feature map size 32 | # by simply dividing the current size by the kernel or stride size 33 | # The final feature map sizes are 1x1, 2x2, 3x3, and 6x6. We round to the closest integer 34 | kernel_size = [int(np.round(float(feature_map_shape[0]) / float(level))), int(np.round(float(feature_map_shape[1]) / float(level)))] 35 | stride_size = kernel_size 36 | 37 | net = slim.pool(net, kernel_size, stride=stride_size, pooling_type='MAX') 38 | net = slim.conv2d(net, 512, [1, 1], activation_fn=None) 39 | net = slim.batch_norm(net, fused=True) 40 | net = tf.nn.relu(net) 41 | net = Upsampling(net, feature_map_shape) 42 | return net 43 | 44 | def PyramidPoolingModule(inputs, feature_map_shape, pooling_type): 45 | """ 46 | Build the Pyramid Pooling Module. 47 | """ 48 | 49 | interp_block1 = InterpBlock(inputs, 1, feature_map_shape, pooling_type) 50 | interp_block2 = InterpBlock(inputs, 2, feature_map_shape, pooling_type) 51 | interp_block3 = InterpBlock(inputs, 3, feature_map_shape, pooling_type) 52 | interp_block6 = InterpBlock(inputs, 6, feature_map_shape, pooling_type) 53 | 54 | res = tf.concat([inputs, interp_block6, interp_block3, interp_block2, interp_block1], axis=-1) 55 | return res 56 | 57 | 58 | 59 | def build_pspnet(inputs, label_size, num_classes, preset_model='PSPNet', frontend="ResNet101", pooling_type = "MAX", 60 | weight_decay=1e-5, upscaling_method="conv", is_training=True, pretrained_dir="models"): 61 | """ 62 | Builds the PSPNet model. 63 | 64 | Arguments: 65 | inputs: The input tensor 66 | label_size: Size of the final label tensor. We need to know this for proper upscaling 67 | preset_model: Which model you want to use. 
Select which ResNet model to use for feature extraction 68 | num_classes: Number of classes 69 | pooling_type: Max or Average pooling 70 | 71 | Returns: 72 | PSPNet model 73 | """ 74 | 75 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 76 | 77 | feature_map_shape = [int(x / 8.0) for x in label_size] 78 | print(feature_map_shape) 79 | psp = PyramidPoolingModule(end_points['pool3'], feature_map_shape=feature_map_shape, pooling_type=pooling_type) 80 | 81 | net = slim.conv2d(psp, 512, [3, 3], activation_fn=None) 82 | net = slim.batch_norm(net, fused=True) 83 | net = tf.nn.relu(net) 84 | 85 | if upscaling_method.lower() == "conv": 86 | net = ConvUpscaleBlock(net, 256, kernel_size=[3, 3], scale=2) 87 | net = ConvBlock(net, 256) 88 | net = ConvUpscaleBlock(net, 128, kernel_size=[3, 3], scale=2) 89 | net = ConvBlock(net, 128) 90 | net = ConvUpscaleBlock(net, 64, kernel_size=[3, 3], scale=2) 91 | net = ConvBlock(net, 64) 92 | elif upscaling_method.lower() == "bilinear": 93 | net = Upsampling(net, label_size) 94 | 95 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 96 | 97 | return net, init_fn 98 | 99 | 100 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 101 | inputs=tf.to_float(inputs) 102 | num_channels = inputs.get_shape().as_list()[-1] 103 | if len(means) != num_channels: 104 | raise ValueError('len(means) must match the number of channels') 105 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 106 | for i in range(num_channels): 107 | channels[i] -= means[i] 108 | return tf.concat(axis=3, values=channels) -------------------------------------------------------------------------------- /models/RefineNet.py: -------------------------------------------------------------------------------- 1 | import tensorflow as tf 2 | from tensorflow.contrib import slim 3 | from builders import frontend_builder 4 | import os, sys 5 | 6 | def Upsampling(inputs,scale): 7 | return tf.image.resize_bilinear(inputs, size=[tf.shape(inputs)[1]*scale, tf.shape(inputs)[2]*scale]) 8 | 9 | def ConvBlock(inputs, n_filters, kernel_size=[3, 3]): 10 | """ 11 | Basic conv block for Encoder-Decoder 12 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 13 | """ 14 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 15 | net = slim.conv2d(net, n_filters, kernel_size, activation_fn=None, normalizer_fn=None) 16 | return net 17 | 18 | def ConvUpscaleBlock(inputs, n_filters, kernel_size=[3, 3], scale=2): 19 | """ 20 | Basic conv transpose block for Encoder-Decoder upsampling 21 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 22 | """ 23 | net = tf.nn.relu(slim.batch_norm(inputs, fused=True)) 24 | net = slim.conv2d_transpose(net, n_filters, kernel_size=[3, 3], stride=[scale, scale], activation_fn=None) 25 | return net 26 | 27 | 28 | def ResidualConvUnit(inputs,n_filters=256,kernel_size=3): 29 | """ 30 | A local residual unit designed to fine-tune the pretrained ResNet weights 31 | 32 | Arguments: 33 | inputs: The input tensor 34 | n_filters: Number of output feature maps for each conv 35 | kernel_size: Size of convolution kernel 36 | 37 | Returns: 38 | Output of local residual block 39 | """ 40 | net=tf.nn.relu(inputs) 41 | net=slim.conv2d(net, n_filters, kernel_size, activation_fn=None) 42 | net=tf.nn.relu(net) 43 | net=slim.conv2d(net,n_filters,kernel_size, activation_fn=None) 44 | 
net=tf.add(net,inputs) 45 | return net 46 | 47 | def ChainedResidualPooling(inputs,n_filters=256): 48 | """ 49 | Chained residual pooling aims to capture background 50 | context from a large image region. This component is 51 | built as a chain of 2 pooling blocks, each consisting 52 | of one max-pooling layer and one convolution layer. One pooling 53 | block takes the output of the previous pooling block as 54 | input. The output feature maps of all pooling blocks are 55 | fused together with the input feature map through summation 56 | of residual connections. 57 | 58 | Arguments: 59 | inputs: The input tensor 60 | n_filters: Number of output feature maps for each conv 61 | 62 | Returns: 63 | Double-pooled feature maps 64 | """ 65 | 66 | net_relu=tf.nn.relu(inputs) 67 | net=slim.max_pool2d(net_relu, [5, 5],stride=1,padding='SAME') 68 | net=slim.conv2d(net,n_filters,3, activation_fn=None) 69 | net_sum_1=tf.add(net,net_relu) 70 | 71 | net = slim.max_pool2d(net, [5, 5], stride=1, padding='SAME') 72 | net = slim.conv2d(net, n_filters, 3, activation_fn=None) 73 | net_sum_2=tf.add(net,net_sum_1) 74 | 75 | return net_sum_2 76 | 77 | 78 | def MultiResolutionFusion(high_inputs=None,low_inputs=None,n_filters=256): 79 | """ 80 | Fuse together all path inputs. This block first applies convolutions 81 | for input adaptation, which generate feature maps of the same feature dimension 82 | (the smallest one among the inputs), and then up-samples all (smaller) feature maps to 83 | the largest resolution of the inputs. Finally, all features maps are fused by summation. 84 | 85 | Arguments: 86 | high_inputs: The input tensors that have the higher resolution 87 | low_inputs: The input tensors that have the lower resolution 88 | n_filters: Number of output feature maps for each conv 89 | 90 | Returns: 91 | Fused feature maps at higher resolution 92 | 93 | """ 94 | 95 | if high_inputs is None: # RefineNet block 4 96 | 97 | fuse = slim.conv2d(low_inputs, n_filters, 3, activation_fn=None) 98 | 99 | return fuse 100 | 101 | else: 102 | 103 | conv_low = slim.conv2d(low_inputs, n_filters, 3, activation_fn=None) 104 | conv_high = slim.conv2d(high_inputs, n_filters, 3, activation_fn=None) 105 | 106 | conv_low_up = Upsampling(conv_low,2) 107 | 108 | return tf.add(conv_low_up, conv_high) 109 | 110 | 111 | def RefineBlock(high_inputs=None,low_inputs=None): 112 | """ 113 | A RefineNet Block which combines together the ResidualConvUnits, 114 | fuses the feature maps using MultiResolutionFusion, and then gets 115 | large-scale context with the ResidualConvUnit. 
116 | 117 | Arguments: 118 | high_inputs: The input tensors that have the higher resolution 119 | low_inputs: The input tensors that have the lower resolution 120 | 121 | Returns: 122 | RefineNet block for a single path i.e one resolution 123 | 124 | """ 125 | 126 | if low_inputs is None: # block 4 127 | rcu_new_low= ResidualConvUnit(high_inputs, n_filters=512) 128 | rcu_new_low = ResidualConvUnit(rcu_new_low, n_filters=512) 129 | 130 | fuse = MultiResolutionFusion(high_inputs=None, low_inputs=rcu_new_low, n_filters=512) 131 | fuse_pooling = ChainedResidualPooling(fuse, n_filters=512) 132 | output = ResidualConvUnit(fuse_pooling, n_filters=512) 133 | return output 134 | else: 135 | rcu_high= ResidualConvUnit(high_inputs, n_filters=256) 136 | rcu_high = ResidualConvUnit(rcu_high, n_filters=256) 137 | 138 | fuse = MultiResolutionFusion(rcu_high, low_inputs,n_filters=256) 139 | fuse_pooling = ChainedResidualPooling(fuse, n_filters=256) 140 | output = ResidualConvUnit(fuse_pooling, n_filters=256) 141 | return output 142 | 143 | 144 | 145 | def build_refinenet(inputs, num_classes, preset_model='RefineNet', frontend="ResNet101", weight_decay=1e-5, upscaling_method="bilinear", pretrained_dir="models", is_training=True): 146 | """ 147 | Builds the RefineNet model. 148 | 149 | Arguments: 150 | inputs: The input tensor 151 | preset_model: Which model you want to use. Select which ResNet model to use for feature extraction 152 | num_classes: Number of classes 153 | 154 | Returns: 155 | RefineNet model 156 | """ 157 | 158 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, pretrained_dir=pretrained_dir, is_training=is_training) 159 | 160 | 161 | 162 | 163 | high = [end_points['pool5'], end_points['pool4'], 164 | end_points['pool3'], end_points['pool2']] 165 | 166 | low = [None, None, None, None] 167 | 168 | # Get the feature maps to the proper size with bottleneck 169 | high[0]=slim.conv2d(high[0], 512, 1) 170 | high[1]=slim.conv2d(high[1], 256, 1) 171 | high[2]=slim.conv2d(high[2], 256, 1) 172 | high[3]=slim.conv2d(high[3], 256, 1) 173 | 174 | # RefineNet 175 | low[0]=RefineBlock(high_inputs=high[0],low_inputs=None) # Only input ResNet 1/32 176 | low[1]=RefineBlock(high[1],low[0]) # High input = ResNet 1/16, Low input = Previous 1/16 177 | low[2]=RefineBlock(high[2],low[1]) # High input = ResNet 1/8, Low input = Previous 1/8 178 | low[3]=RefineBlock(high[3],low[2]) # High input = ResNet 1/4, Low input = Previous 1/4 179 | 180 | # g[3]=Upsampling(g[3],scale=4) 181 | 182 | net = low[3] 183 | 184 | net = ResidualConvUnit(net) 185 | net = ResidualConvUnit(net) 186 | 187 | if upscaling_method.lower() == "conv": 188 | net = ConvUpscaleBlock(net, 128, kernel_size=[3, 3], scale=2) 189 | net = ConvBlock(net, 128) 190 | net = ConvUpscaleBlock(net, 64, kernel_size=[3, 3], scale=2) 191 | net = ConvBlock(net, 64) 192 | elif upscaling_method.lower() == "bilinear": 193 | net = Upsampling(net, scale=4) 194 | 195 | net = slim.conv2d(net, num_classes, [1, 1], activation_fn=None, scope='logits') 196 | 197 | return net, init_fn 198 | 199 | 200 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 201 | inputs=tf.to_float(inputs) 202 | num_channels = inputs.get_shape().as_list()[-1] 203 | if len(means) != num_channels: 204 | raise ValueError('len(means) must match the number of channels') 205 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 206 | for i in range(num_channels): 207 | channels[i] -= means[i] 208 | return 
tf.concat(axis=3, values=channels) 209 | -------------------------------------------------------------------------------- /models/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/models/__init__.py -------------------------------------------------------------------------------- /models/custom_model.py: -------------------------------------------------------------------------------- 1 | from __future__ import division 2 | import os,time,cv2 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | from builders import frontend_builder 7 | 8 | def conv_block(inputs, n_filters, filter_size=[3, 3], dropout_p=0.0): 9 | """ 10 | Basic conv block for Encoder-Decoder 11 | Apply successivly Convolution, BatchNormalization, ReLU nonlinearity 12 | Dropout (if dropout_p > 0) on the inputs 13 | """ 14 | conv = slim.conv2d(inputs, n_filters, filter_size, activation_fn=None, normalizer_fn=None) 15 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 16 | if dropout_p != 0.0: 17 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 18 | return out 19 | 20 | def conv_transpose_block(inputs, n_filters, strides=2, filter_size=[3, 3], dropout_p=0.0): 21 | """ 22 | Basic conv transpose block for Encoder-Decoder upsampling 23 | Apply successivly Transposed Convolution, BatchNormalization, ReLU nonlinearity 24 | Dropout (if dropout_p > 0) on the inputs 25 | """ 26 | conv = slim.conv2d_transpose(inputs, n_filters, kernel_size=[3, 3], stride=[strides, strides]) 27 | out = tf.nn.relu(slim.batch_norm(conv, fused=True)) 28 | if dropout_p != 0.0: 29 | out = slim.dropout(out, keep_prob=(1.0-dropout_p)) 30 | return out 31 | 32 | def build_custom(inputs, num_classes, frontend="ResNet101", weight_decay=1e-5, is_training=True, pretrained_dir="models"): 33 | 34 | 35 | logits, end_points, frontend_scope, init_fn = frontend_builder.build_frontend(inputs, frontend, is_training=is_training) 36 | 37 | up_1 = conv_transpose_block(end_points["pool2"], strides=4, n_filters=64) 38 | up_2 = conv_transpose_block(end_points["pool3"], strides=8, n_filters=64) 39 | up_3 = conv_transpose_block(end_points["pool4"], strides=16, n_filters=64) 40 | up_4 = conv_transpose_block(end_points["pool5"], strides=32, n_filters=64) 41 | 42 | features = tf.concat([up_1, up_2, up_3, up_4], axis=-1) 43 | 44 | features = conv_block(inputs=features, n_filters=256, filter_size=[1, 1]) 45 | 46 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 47 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 48 | features = conv_block(inputs=features, n_filters=64, filter_size=[3, 3]) 49 | 50 | 51 | net = slim.conv2d(features, num_classes, [1, 1], scope='logits') 52 | return net -------------------------------------------------------------------------------- /predict.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2, sys, math 2 | import tensorflow as tf 3 | import argparse 4 | import numpy as np 5 | 6 | from utils import utils, helpers 7 | from builders import model_builder 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('--image', type=str, default=None, required=True, help='The image you want to predict on. 
') 11 | parser.add_argument('--checkpoint_path', type=str, default=None, required=True, help='The path to the latest checkpoint weights for your model.') 12 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 13 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 14 | parser.add_argument('--model', type=str, default=None, required=True, help='The model you are using') 15 | parser.add_argument('--dataset', type=str, default="CamVid", required=False, help='The dataset you are using') 16 | args = parser.parse_args() 17 | 18 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 19 | 20 | num_classes = len(label_values) 21 | 22 | print("\n***** Begin prediction *****") 23 | print("Dataset -->", args.dataset) 24 | print("Model -->", args.model) 25 | print("Crop Height -->", args.crop_height) 26 | print("Crop Width -->", args.crop_width) 27 | print("Num Classes -->", num_classes) 28 | print("Image -->", args.image) 29 | 30 | # Initializing network 31 | config = tf.ConfigProto() 32 | config.gpu_options.allow_growth = True 33 | sess=tf.Session(config=config) 34 | 35 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 36 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 37 | 38 | network, _ = model_builder.build_model(args.model, net_input=net_input, 39 | num_classes=num_classes, 40 | crop_width=args.crop_width, 41 | crop_height=args.crop_height, 42 | is_training=False) 43 | 44 | sess.run(tf.global_variables_initializer()) 45 | 46 | print('Loading model checkpoint weights') 47 | saver=tf.train.Saver(max_to_keep=1000) 48 | saver.restore(sess, args.checkpoint_path) 49 | 50 | 51 | print("Testing image " + args.image) 52 | 53 | loaded_image = utils.load_image(args.image) 54 | resized_image =cv2.resize(loaded_image, (args.crop_width, args.crop_height)) 55 | input_image = np.expand_dims(np.float32(resized_image[:args.crop_height, :args.crop_width]),axis=0)/255.0 56 | 57 | st = time.time() 58 | output_image = sess.run(network,feed_dict={net_input:input_image}) 59 | 60 | run_time = time.time()-st 61 | 62 | output_image = np.array(output_image[0,:,:,:]) 63 | output_image = helpers.reverse_one_hot(output_image) 64 | 65 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 66 | file_name = utils.filepath_to_name(args.image) 67 | cv2.imwrite("%s_pred.png"%(file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 68 | 69 | print("") 70 | print("Finished!") 71 | print("Wrote image " + "%s_pred.png"%(file_name)) 72 | -------------------------------------------------------------------------------- /test.py: -------------------------------------------------------------------------------- 1 | import os,time,cv2, sys, math 2 | import tensorflow as tf 3 | import argparse 4 | import numpy as np 5 | 6 | from utils import utils, helpers 7 | from builders import model_builder 8 | 9 | parser = argparse.ArgumentParser() 10 | parser.add_argument('--checkpoint_path', type=str, default=None, required=True, help='The path to the latest checkpoint weights for your model.') 11 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 12 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 13 | parser.add_argument('--model', type=str, default=None, required=True, help='The model you 
are using') 14 | parser.add_argument('--dataset', type=str, default="CamVid", required=False, help='The dataset you are using') 15 | args = parser.parse_args() 16 | 17 | # Get the names of the classes so we can record the evaluation results 18 | print("Retrieving dataset information ...") 19 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 20 | class_names_string = "" 21 | for class_name in class_names_list: 22 | if not class_name == class_names_list[-1]: 23 | class_names_string = class_names_string + class_name + ", " 24 | else: 25 | class_names_string = class_names_string + class_name 26 | 27 | num_classes = len(label_values) 28 | 29 | # Initializing network 30 | config = tf.ConfigProto() 31 | config.gpu_options.allow_growth = True 32 | sess=tf.Session(config=config) 33 | 34 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 35 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 36 | 37 | network, _ = model_builder.build_model(args.model, net_input=net_input, num_classes=num_classes, crop_width=args.crop_width, crop_height=args.crop_height, is_training=False) 38 | 39 | sess.run(tf.global_variables_initializer()) 40 | 41 | print('Loading model checkpoint weights ...') 42 | saver=tf.train.Saver(max_to_keep=1000) 43 | saver.restore(sess, args.checkpoint_path) 44 | 45 | # Load the data 46 | print("Loading the data ...") 47 | train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names = utils.prepare_data(dataset_dir=args.dataset) 48 | 49 | # Create directories if needed 50 | if not os.path.isdir("%s"%("Test")): 51 | os.makedirs("%s"%("Test")) 52 | 53 | target=open("%s/test_scores.csv"%("Test"),'w') 54 | target.write("test_name, test_accuracy, precision, recall, f1 score, mean iou, %s\n" % (class_names_string)) 55 | scores_list = [] 56 | class_scores_list = [] 57 | precision_list = [] 58 | recall_list = [] 59 | f1_list = [] 60 | iou_list = [] 61 | run_times_list = [] 62 | 63 | # Run testing on ALL test images 64 | for ind in range(len(test_input_names)): 65 | sys.stdout.write("\rRunning test image %d / %d"%(ind+1, len(test_input_names))) 66 | sys.stdout.flush() 67 | 68 | input_image = np.expand_dims(np.float32(utils.load_image(test_input_names[ind])[:args.crop_height, :args.crop_width]),axis=0)/255.0 69 | gt = utils.load_image(test_output_names[ind])[:args.crop_height, :args.crop_width] 70 | gt = helpers.reverse_one_hot(helpers.one_hot_it(gt, label_values)) 71 | 72 | st = time.time() 73 | output_image = sess.run(network,feed_dict={net_input:input_image}) 74 | 75 | run_times_list.append(time.time()-st) 76 | 77 | output_image = np.array(output_image[0,:,:,:]) 78 | output_image = helpers.reverse_one_hot(output_image) 79 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 80 | 81 | accuracy, class_accuracies, prec, rec, f1, iou = utils.evaluate_segmentation(pred=output_image, label=gt, num_classes=num_classes) 82 | 83 | file_name = utils.filepath_to_name(test_input_names[ind]) 84 | target.write("%s, %f, %f, %f, %f, %f"%(file_name, accuracy, prec, rec, f1, iou)) 85 | for item in class_accuracies: 86 | target.write(", %f"%(item)) 87 | target.write("\n") 88 | 89 | scores_list.append(accuracy) 90 | class_scores_list.append(class_accuracies) 91 | precision_list.append(prec) 92 | recall_list.append(rec) 93 | f1_list.append(f1) 94 | iou_list.append(iou) 95 | 96 | gt = helpers.colour_code_segmentation(gt, label_values) 97 | 98 | 
cv2.imwrite("%s/%s_pred.png"%("Test", file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 99 | cv2.imwrite("%s/%s_gt.png"%("Test", file_name),cv2.cvtColor(np.uint8(gt), cv2.COLOR_RGB2BGR)) 100 | 101 | 102 | target.close() 103 | 104 | avg_score = np.mean(scores_list) 105 | class_avg_scores = np.mean(class_scores_list, axis=0) 106 | avg_precision = np.mean(precision_list) 107 | avg_recall = np.mean(recall_list) 108 | avg_f1 = np.mean(f1_list) 109 | avg_iou = np.mean(iou_list) 110 | avg_time = np.mean(run_times_list) 111 | print("Average test accuracy = ", avg_score) 112 | print("Average per class test accuracies = \n") 113 | for index, item in enumerate(class_avg_scores): 114 | print("%s = %f" % (class_names_list[index], item)) 115 | print("Average precision = ", avg_precision) 116 | print("Average recall = ", avg_recall) 117 | print("Average F1 score = ", avg_f1) 118 | print("Average mean IoU score = ", avg_iou) 119 | print("Average run time = ", avg_time) 120 | -------------------------------------------------------------------------------- /train.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function 2 | import os,time,cv2, sys, math 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | import time, datetime 7 | import argparse 8 | import random 9 | import os, sys 10 | import subprocess 11 | 12 | # use 'Agg' on matplotlib so that plots could be generated even without Xserver 13 | # running 14 | import matplotlib 15 | matplotlib.use('Agg') 16 | 17 | from utils import utils, helpers 18 | from builders import model_builder 19 | 20 | import matplotlib.pyplot as plt 21 | 22 | def str2bool(v): 23 | if v.lower() in ('yes', 'true', 't', 'y', '1'): 24 | return True 25 | elif v.lower() in ('no', 'false', 'f', 'n', '0'): 26 | return False 27 | else: 28 | raise argparse.ArgumentTypeError('Boolean value expected.') 29 | 30 | parser = argparse.ArgumentParser() 31 | parser.add_argument('--num_epochs', type=int, default=300, help='Number of epochs to train for') 32 | parser.add_argument('--epoch_start_i', type=int, default=0, help='Start counting epochs from this number') 33 | parser.add_argument('--checkpoint_step', type=int, default=5, help='How often to save checkpoints (epochs)') 34 | parser.add_argument('--validation_step', type=int, default=1, help='How often to perform validation (epochs)') 35 | parser.add_argument('--image', type=str, default=None, help='The image you want to predict on. 
Only valid in "predict" mode.') 36 | parser.add_argument('--continue_training', type=str2bool, default=False, help='Whether to continue training from a checkpoint') 37 | parser.add_argument('--dataset', type=str, default="CamVid", help='Dataset you are using.') 38 | parser.add_argument('--crop_height', type=int, default=512, help='Height of cropped input image to network') 39 | parser.add_argument('--crop_width', type=int, default=512, help='Width of cropped input image to network') 40 | parser.add_argument('--batch_size', type=int, default=1, help='Number of images in each batch') 41 | parser.add_argument('--num_val_images', type=int, default=20, help='The number of images to used for validations') 42 | parser.add_argument('--h_flip', type=str2bool, default=False, help='Whether to randomly flip the image horizontally for data augmentation') 43 | parser.add_argument('--v_flip', type=str2bool, default=False, help='Whether to randomly flip the image vertically for data augmentation') 44 | parser.add_argument('--brightness', type=float, default=None, help='Whether to randomly change the image brightness for data augmentation. Specifies the max bightness change as a factor between 0.0 and 1.0. For example, 0.1 represents a max brightness change of 10%% (+-).') 45 | parser.add_argument('--rotation', type=float, default=None, help='Whether to randomly rotate the image for data augmentation. Specifies the max rotation angle in degrees.') 46 | parser.add_argument('--model', type=str, default="FC-DenseNet56", help='The model you are using. See model_builder.py for supported models') 47 | parser.add_argument('--frontend', type=str, default="ResNet101", help='The frontend you are using. See frontend_builder.py for supported models') 48 | args = parser.parse_args() 49 | 50 | 51 | def data_augmentation(input_image, output_image): 52 | # Data augmentation 53 | input_image, output_image = utils.random_crop(input_image, output_image, args.crop_height, args.crop_width) 54 | 55 | if args.h_flip and random.randint(0,1): 56 | input_image = cv2.flip(input_image, 1) 57 | output_image = cv2.flip(output_image, 1) 58 | if args.v_flip and random.randint(0,1): 59 | input_image = cv2.flip(input_image, 0) 60 | output_image = cv2.flip(output_image, 0) 61 | if args.brightness: 62 | factor = 1.0 + random.uniform(-1.0*args.brightness, args.brightness) 63 | table = np.array([((i / 255.0) * factor) * 255 for i in np.arange(0, 256)]).astype(np.uint8) 64 | input_image = cv2.LUT(input_image, table) 65 | if args.rotation: 66 | angle = random.uniform(-1*args.rotation, args.rotation) 67 | if args.rotation: 68 | M = cv2.getRotationMatrix2D((input_image.shape[1]//2, input_image.shape[0]//2), angle, 1.0) 69 | input_image = cv2.warpAffine(input_image, M, (input_image.shape[1], input_image.shape[0]), flags=cv2.INTER_NEAREST) 70 | output_image = cv2.warpAffine(output_image, M, (output_image.shape[1], output_image.shape[0]), flags=cv2.INTER_NEAREST) 71 | 72 | return input_image, output_image 73 | 74 | 75 | # Get the names of the classes so we can record the evaluation results 76 | class_names_list, label_values = helpers.get_label_info(os.path.join(args.dataset, "class_dict.csv")) 77 | class_names_string = "" 78 | for class_name in class_names_list: 79 | if not class_name == class_names_list[-1]: 80 | class_names_string = class_names_string + class_name + ", " 81 | else: 82 | class_names_string = class_names_string + class_name 83 | 84 | num_classes = len(label_values) 85 | 86 | config = tf.ConfigProto() 87 | 
config.gpu_options.allow_growth = True 88 | sess=tf.Session(config=config) 89 | 90 | 91 | # Compute your softmax cross entropy loss 92 | net_input = tf.placeholder(tf.float32,shape=[None,None,None,3]) 93 | net_output = tf.placeholder(tf.float32,shape=[None,None,None,num_classes]) 94 | 95 | network, init_fn = model_builder.build_model(model_name=args.model, frontend=args.frontend, net_input=net_input, num_classes=num_classes, crop_width=args.crop_width, crop_height=args.crop_height, is_training=True) 96 | 97 | loss = tf.reduce_mean(tf.nn.softmax_cross_entropy_with_logits(logits=network, labels=net_output)) 98 | 99 | opt = tf.train.RMSPropOptimizer(learning_rate=0.0001, decay=0.995).minimize(loss, var_list=[var for var in tf.trainable_variables()]) 100 | 101 | saver=tf.train.Saver(max_to_keep=1000) 102 | sess.run(tf.global_variables_initializer()) 103 | 104 | utils.count_params() 105 | 106 | # If a pre-trained ResNet is required, load the weights. 107 | # This must be done AFTER the variables are initialized with sess.run(tf.global_variables_initializer()) 108 | if init_fn is not None: 109 | init_fn(sess) 110 | 111 | # Load a previous checkpoint if desired 112 | model_checkpoint_name = "checkpoints/latest_model_" + args.model + "_" + args.dataset + ".ckpt" 113 | if args.continue_training: 114 | print('Loaded latest model checkpoint') 115 | saver.restore(sess, model_checkpoint_name) 116 | 117 | # Load the data 118 | print("Loading the data ...") 119 | train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names = utils.prepare_data(dataset_dir=args.dataset) 120 | 121 | 122 | 123 | print("\n***** Begin training *****") 124 | print("Dataset -->", args.dataset) 125 | print("Model -->", args.model) 126 | print("Crop Height -->", args.crop_height) 127 | print("Crop Width -->", args.crop_width) 128 | print("Num Epochs -->", args.num_epochs) 129 | print("Batch Size -->", args.batch_size) 130 | print("Num Classes -->", num_classes) 131 | 132 | print("Data Augmentation:") 133 | print("\tVertical Flip -->", args.v_flip) 134 | print("\tHorizontal Flip -->", args.h_flip) 135 | print("\tBrightness Alteration -->", args.brightness) 136 | print("\tRotation -->", args.rotation) 137 | print("") 138 | 139 | avg_loss_per_epoch = [] 140 | avg_scores_per_epoch = [] 141 | avg_iou_per_epoch = [] 142 | 143 | # Which validation images do we want 144 | val_indices = [] 145 | num_vals = min(args.num_val_images, len(val_input_names)) 146 | 147 | # Set random seed to make sure models are validated on the same validation images. 148 | # So you can compare the results of different models more intuitively. 
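# NOTE (illustrative sketch, not part of the original train.py): seeding
# Python's RNG immediately before sampling makes the drawn validation subset
# identical on every run, so per-epoch validation scores stay comparable
# between experiments. For example:
#
#     import random
#     random.seed(16)
#     first_draw = random.sample(range(100), 5)
#     random.seed(16)
#     second_draw = random.sample(range(100), 5)
#     assert first_draw == second_draw   # same validation indices every run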
149 | random.seed(16) 150 | val_indices=random.sample(range(0,len(val_input_names)),num_vals) 151 | 152 | # Do the training here 153 | for epoch in range(args.epoch_start_i, args.num_epochs): 154 | 155 | current_losses = [] 156 | 157 | cnt=0 158 | 159 | # Equivalent to shuffling 160 | id_list = np.random.permutation(len(train_input_names)) 161 | 162 | num_iters = int(np.floor(len(id_list) / args.batch_size)) 163 | st = time.time() 164 | epoch_st=time.time() 165 | for i in range(num_iters): 166 | # st=time.time() 167 | 168 | input_image_batch = [] 169 | output_image_batch = [] 170 | 171 | # Collect a batch of images 172 | for j in range(args.batch_size): 173 | index = i*args.batch_size + j 174 | id = id_list[index] 175 | input_image = utils.load_image(train_input_names[id]) 176 | output_image = utils.load_image(train_output_names[id]) 177 | 178 | with tf.device('/cpu:0'): 179 | input_image, output_image = data_augmentation(input_image, output_image) 180 | 181 | 182 | # Prep the data. Make sure the labels are in one-hot format 183 | input_image = np.float32(input_image) / 255.0 184 | output_image = np.float32(helpers.one_hot_it(label=output_image, label_values=label_values)) 185 | 186 | input_image_batch.append(np.expand_dims(input_image, axis=0)) 187 | output_image_batch.append(np.expand_dims(output_image, axis=0)) 188 | 189 | if args.batch_size == 1: 190 | input_image_batch = input_image_batch[0] 191 | output_image_batch = output_image_batch[0] 192 | else: 193 | input_image_batch = np.squeeze(np.stack(input_image_batch, axis=1)) 194 | output_image_batch = np.squeeze(np.stack(output_image_batch, axis=1)) 195 | 196 | # Do the training 197 | _,current=sess.run([opt,loss],feed_dict={net_input:input_image_batch,net_output:output_image_batch}) 198 | current_losses.append(current) 199 | cnt = cnt + args.batch_size 200 | if cnt % 20 == 0: 201 | string_print = "Epoch = %d Count = %d Current_Loss = %.4f Time = %.2f"%(epoch,cnt,current,time.time()-st) 202 | utils.LOG(string_print) 203 | st = time.time() 204 | 205 | mean_loss = np.mean(current_losses) 206 | avg_loss_per_epoch.append(mean_loss) 207 | 208 | # Create directories if needed 209 | if not os.path.isdir("%s/%04d"%("checkpoints",epoch)): 210 | os.makedirs("%s/%04d"%("checkpoints",epoch)) 211 | 212 | # Save latest checkpoint to same file name 213 | print("Saving latest checkpoint") 214 | saver.save(sess,model_checkpoint_name) 215 | 216 | if val_indices != 0 and epoch % args.checkpoint_step == 0: 217 | print("Saving checkpoint for this epoch") 218 | saver.save(sess,"%s/%04d/model.ckpt"%("checkpoints",epoch)) 219 | 220 | 221 | if epoch % args.validation_step == 0: 222 | print("Performing validation") 223 | target=open("%s/%04d/val_scores.csv"%("checkpoints",epoch),'w') 224 | target.write("val_name, avg_accuracy, precision, recall, f1 score, mean iou, %s\n" % (class_names_string)) 225 | 226 | 227 | scores_list = [] 228 | class_scores_list = [] 229 | precision_list = [] 230 | recall_list = [] 231 | f1_list = [] 232 | iou_list = [] 233 | 234 | 235 | # Do the validation on a small set of validation images 236 | for ind in val_indices: 237 | 238 | input_image = np.expand_dims(np.float32(utils.load_image(val_input_names[ind])[:args.crop_height, :args.crop_width]),axis=0)/255.0 239 | gt = utils.load_image(val_output_names[ind])[:args.crop_height, :args.crop_width] 240 | gt = helpers.reverse_one_hot(helpers.one_hot_it(gt, label_values)) 241 | 242 | # st = time.time() 243 | 244 | output_image = sess.run(network,feed_dict={net_input:input_image}) 245 | 246 
| 247 | output_image = np.array(output_image[0,:,:,:]) 248 | output_image = helpers.reverse_one_hot(output_image) 249 | out_vis_image = helpers.colour_code_segmentation(output_image, label_values) 250 | 251 | accuracy, class_accuracies, prec, rec, f1, iou = utils.evaluate_segmentation(pred=output_image, label=gt, num_classes=num_classes) 252 | 253 | file_name = utils.filepath_to_name(val_input_names[ind]) 254 | target.write("%s, %f, %f, %f, %f, %f"%(file_name, accuracy, prec, rec, f1, iou)) 255 | for item in class_accuracies: 256 | target.write(", %f"%(item)) 257 | target.write("\n") 258 | 259 | scores_list.append(accuracy) 260 | class_scores_list.append(class_accuracies) 261 | precision_list.append(prec) 262 | recall_list.append(rec) 263 | f1_list.append(f1) 264 | iou_list.append(iou) 265 | 266 | gt = helpers.colour_code_segmentation(gt, label_values) 267 | 268 | file_name = os.path.basename(val_input_names[ind]) 269 | file_name = os.path.splitext(file_name)[0] 270 | cv2.imwrite("%s/%04d/%s_pred.png"%("checkpoints",epoch, file_name),cv2.cvtColor(np.uint8(out_vis_image), cv2.COLOR_RGB2BGR)) 271 | cv2.imwrite("%s/%04d/%s_gt.png"%("checkpoints",epoch, file_name),cv2.cvtColor(np.uint8(gt), cv2.COLOR_RGB2BGR)) 272 | 273 | 274 | target.close() 275 | 276 | avg_score = np.mean(scores_list) 277 | class_avg_scores = np.mean(class_scores_list, axis=0) 278 | avg_scores_per_epoch.append(avg_score) 279 | avg_precision = np.mean(precision_list) 280 | avg_recall = np.mean(recall_list) 281 | avg_f1 = np.mean(f1_list) 282 | avg_iou = np.mean(iou_list) 283 | avg_iou_per_epoch.append(avg_iou) 284 | 285 | print("\nAverage validation accuracy for epoch # %04d = %f"% (epoch, avg_score)) 286 | print("Average per class validation accuracies for epoch # %04d:"% (epoch)) 287 | for index, item in enumerate(class_avg_scores): 288 | print("%s = %f" % (class_names_list[index], item)) 289 | print("Validation precision = ", avg_precision) 290 | print("Validation recall = ", avg_recall) 291 | print("Validation F1 score = ", avg_f1) 292 | print("Validation IoU score = ", avg_iou) 293 | 294 | epoch_time=time.time()-epoch_st 295 | remain_time=epoch_time*(args.num_epochs-1-epoch) 296 | m, s = divmod(remain_time, 60) 297 | h, m = divmod(m, 60) 298 | if s!=0: 299 | train_time="Remaining training time = %d hours %d minutes %d seconds\n"%(h,m,s) 300 | else: 301 | train_time="Remaining training time : Training completed.\n" 302 | utils.LOG(train_time) 303 | scores_list = [] 304 | 305 | 306 | fig1, ax1 = plt.subplots(figsize=(11, 8)) 307 | 308 | ax1.plot(range(epoch+1), avg_scores_per_epoch) 309 | ax1.set_title("Average validation accuracy vs epochs") 310 | ax1.set_xlabel("Epoch") 311 | ax1.set_ylabel("Avg. val. 
accuracy") 312 | 313 | 314 | plt.savefig('accuracy_vs_epochs.png') 315 | 316 | plt.clf() 317 | 318 | fig2, ax2 = plt.subplots(figsize=(11, 8)) 319 | 320 | ax2.plot(range(epoch+1), avg_loss_per_epoch) 321 | ax2.set_title("Average loss vs epochs") 322 | ax2.set_xlabel("Epoch") 323 | ax2.set_ylabel("Current loss") 324 | 325 | plt.savefig('loss_vs_epochs.png') 326 | 327 | plt.clf() 328 | 329 | fig3, ax3 = plt.subplots(figsize=(11, 8)) 330 | 331 | ax3.plot(range(epoch+1), avg_iou_per_epoch) 332 | ax3.set_title("Average IoU vs epochs") 333 | ax3.set_xlabel("Epoch") 334 | ax3.set_ylabel("Current IoU") 335 | 336 | plt.savefig('iou_vs_epochs.png') 337 | 338 | 339 | 340 | -------------------------------------------------------------------------------- /utils/__init__.py: -------------------------------------------------------------------------------- https://raw.githubusercontent.com/awangenh/Weed-Mapping/72526ebbc2abe3b9d35672689de25a321e36b039/utils/__init__.py -------------------------------------------------------------------------------- /utils/get_pretrained_checkpoints.py: -------------------------------------------------------------------------------- 1 | import subprocess 2 | import argparse 3 | 4 | parser = argparse.ArgumentParser() 5 | parser.add_argument('--model', type=str, default="ALL", help='Which model weights to download') 6 | args = parser.parse_args() 7 | 8 | 9 | if args.model == "ResNet50" or args.model == "ALL": 10 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_50_2017_04_14.tar.gz', "-P", "models"]) 11 | try: 12 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_50_2017_04_14.tar.gz', "-C", "models"]) 13 | subprocess.check_output(['rm', 'models/resnet_v2_50_2017_04_14.tar.gz']) 14 | except Exception as e: 15 | print(e) 16 | pass 17 | 18 | if args.model == "ResNet101" or args.model == "ALL": 19 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_101_2017_04_14.tar.gz', "-P", "models"]) 20 | try: 21 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_101_2017_04_14.tar.gz', "-C", "models"]) 22 | subprocess.check_output(['rm', 'models/resnet_v2_101_2017_04_14.tar.gz']) 23 | except Exception as e: 24 | print(e) 25 | pass 26 | 27 | if args.model == "ResNet152" or args.model == "ALL": 28 | subprocess.check_output(['wget','http://download.tensorflow.org/models/resnet_v2_152_2017_04_14.tar.gz', "-P", "models"]) 29 | try: 30 | subprocess.check_output(['tar', '-xvf', 'models/resnet_v2_152_2017_04_14.tar.gz', "-C", "models"]) 31 | subprocess.check_output(['rm', 'models/resnet_v2_152_2017_04_14.tar.gz']) 32 | except Exception as e: 33 | print(e) 34 | pass 35 | 36 | if args.model == "MobileNetV2" or args.model == "ALL": 37 | subprocess.check_output(['wget','https://storage.googleapis.com/mobilenet_v2/checkpoints/mobilenet_v2_1.4_224.tgz', "-P", "models"]) 38 | try: 39 | subprocess.check_output(['tar', '-xvf', 'models/mobilenet_v2_1.4_224.tgz', "-C", "models"]) 40 | subprocess.check_output(['rm', 'models/mobilenet_v2_1.4_224.tgz']) 41 | except Exception as e: 42 | print(e) 43 | pass 44 | 45 | if args.model == "InceptionV4" or args.model == "ALL": 46 | subprocess.check_output( 47 | ['wget', 'http://download.tensorflow.org/models/inception_v4_2016_09_09.tar.gz', "-P", "models"]) 48 | try: 49 | subprocess.check_output(['tar', '-xvf', 'models/inception_v4_2016_09_09.tar.gz', "-C", "models"]) 50 | subprocess.check_output(['rm', 'models/inception_v4_2016_09_09.tar.gz']) 51 | except Exception as e: 52 | print(e) 53 
| pass 54 | -------------------------------------------------------------------------------- /utils/helpers.py: -------------------------------------------------------------------------------- 1 | import cv2 2 | import numpy as np 3 | import itertools 4 | import operator 5 | import os, csv 6 | import tensorflow as tf 7 | 8 | import time, datetime 9 | 10 | def get_label_info(csv_path): 11 | """ 12 | Retrieve the class names and label values for the selected dataset. 13 | Must be in CSV format! 14 | 15 | # Arguments 16 | csv_path: The file path of the class dictionairy 17 | 18 | # Returns 19 | Two lists: one for the class names and the other for the label values 20 | """ 21 | filename, file_extension = os.path.splitext(csv_path) 22 | if not file_extension == ".csv": 23 | return ValueError("File is not a CSV!") 24 | 25 | class_names = [] 26 | label_values = [] 27 | with open(csv_path, 'r') as csvfile: 28 | file_reader = csv.reader(csvfile, delimiter=',') 29 | header = next(file_reader) 30 | for row in file_reader: 31 | class_names.append(row[0]) 32 | label_values.append([int(row[1]), int(row[2]), int(row[3])]) 33 | # print(class_dict) 34 | return class_names, label_values 35 | 36 | 37 | def one_hot_it(label, label_values): 38 | """ 39 | Convert a segmentation image label array to one-hot format 40 | by replacing each pixel value with a vector of length num_classes 41 | 42 | # Arguments 43 | label: The 2D array segmentation image label 44 | label_values 45 | 46 | # Returns 47 | A 2D array with the same width and hieght as the input, but 48 | with a depth size of num_classes 49 | """ 50 | # st = time.time() 51 | # w = label.shape[0] 52 | # h = label.shape[1] 53 | # num_classes = len(class_dict) 54 | # x = np.zeros([w,h,num_classes]) 55 | # unique_labels = sortedlist((class_dict.values())) 56 | # for i in range(0, w): 57 | # for j in range(0, h): 58 | # index = unique_labels.index(list(label[i][j][:])) 59 | # x[i,j,index]=1 60 | # print("Time 1 = ", time.time() - st) 61 | 62 | # st = time.time() 63 | # https://stackoverflow.com/questions/46903885/map-rgb-semantic-maps-to-one-hot-encodings-and-vice-versa-in-tensorflow 64 | # https://stackoverflow.com/questions/14859458/how-to-check-if-all-values-in-the-columns-of-a-numpy-matrix-are-the-same 65 | semantic_map = [] 66 | for colour in label_values: 67 | # colour_map = np.full((label.shape[0], label.shape[1], label.shape[2]), colour, dtype=int) 68 | equality = np.equal(label, colour) 69 | class_map = np.all(equality, axis = -1) 70 | semantic_map.append(class_map) 71 | semantic_map = np.stack(semantic_map, axis=-1) 72 | # print("Time 2 = ", time.time() - st) 73 | 74 | return semantic_map 75 | 76 | def reverse_one_hot(image): 77 | """ 78 | Transform a 2D array in one-hot format (depth is num_classes), 79 | to a 2D array with only 1 channel, where each pixel value is 80 | the classified class key. 81 | 82 | # Arguments 83 | image: The one-hot format image 84 | 85 | # Returns 86 | A 2D array with the same width and hieght as the input, but 87 | with a depth size of 1, where each pixel value is the classified 88 | class key. 
89 | """ 90 | # w = image.shape[0] 91 | # h = image.shape[1] 92 | # x = np.zeros([w,h,1]) 93 | 94 | # for i in range(0, w): 95 | # for j in range(0, h): 96 | # index, value = max(enumerate(image[i, j, :]), key=operator.itemgetter(1)) 97 | # x[i, j] = index 98 | 99 | x = np.argmax(image, axis = -1) 100 | return x 101 | 102 | 103 | def colour_code_segmentation(image, label_values): 104 | """ 105 | Given a 1-channel array of class keys, colour code the segmentation results. 106 | 107 | # Arguments 108 | image: single channel array where each value represents the class key. 109 | label_values 110 | 111 | # Returns 112 | Colour coded image for segmentation visualization 113 | """ 114 | 115 | # w = image.shape[0] 116 | # h = image.shape[1] 117 | # x = np.zeros([w,h,3]) 118 | # colour_codes = label_values 119 | # for i in range(0, w): 120 | # for j in range(0, h): 121 | # x[i, j, :] = colour_codes[int(image[i, j])] 122 | 123 | colour_codes = np.array(label_values) 124 | x = colour_codes[image.astype(int)] 125 | 126 | return x 127 | 128 | # class_dict = get_class_dict("CamVid/class_dict.csv") 129 | # gt = cv2.imread("CamVid/test_labels/0001TP_007170_L.png",-1) 130 | # gt = reverse_one_hot(one_hot_it(gt, class_dict)) 131 | # gt = colour_code_segmentation(gt, class_dict) 132 | 133 | # file_name = "gt_test.png" 134 | # cv2.imwrite(file_name,np.uint8(gt)) -------------------------------------------------------------------------------- /utils/utils.py: -------------------------------------------------------------------------------- 1 | from __future__ import print_function, division 2 | import os,time,cv2, sys, math 3 | import tensorflow as tf 4 | import tensorflow.contrib.slim as slim 5 | import numpy as np 6 | import time, datetime 7 | import os, random 8 | from scipy.misc import imread 9 | import ast 10 | from sklearn.metrics import precision_score, \ 11 | recall_score, confusion_matrix, classification_report, \ 12 | accuracy_score, f1_score 13 | 14 | from utils import helpers 15 | 16 | def prepare_data(dataset_dir): 17 | train_input_names=[] 18 | train_output_names=[] 19 | val_input_names=[] 20 | val_output_names=[] 21 | test_input_names=[] 22 | test_output_names=[] 23 | for file in os.listdir(dataset_dir + "/train"): 24 | cwd = os.getcwd() 25 | train_input_names.append(cwd + "/" + dataset_dir + "/train/" + file) 26 | for file in os.listdir(dataset_dir + "/train_labels"): 27 | cwd = os.getcwd() 28 | train_output_names.append(cwd + "/" + dataset_dir + "/train_labels/" + file) 29 | for file in os.listdir(dataset_dir + "/val"): 30 | cwd = os.getcwd() 31 | val_input_names.append(cwd + "/" + dataset_dir + "/val/" + file) 32 | for file in os.listdir(dataset_dir + "/val_labels"): 33 | cwd = os.getcwd() 34 | val_output_names.append(cwd + "/" + dataset_dir + "/val_labels/" + file) 35 | for file in os.listdir(dataset_dir + "/test"): 36 | cwd = os.getcwd() 37 | test_input_names.append(cwd + "/" + dataset_dir + "/test/" + file) 38 | for file in os.listdir(dataset_dir + "/test_labels"): 39 | cwd = os.getcwd() 40 | test_output_names.append(cwd + "/" + dataset_dir + "/test_labels/" + file) 41 | train_input_names.sort(),train_output_names.sort(), val_input_names.sort(), val_output_names.sort(), test_input_names.sort(), test_output_names.sort() 42 | return train_input_names,train_output_names, val_input_names, val_output_names, test_input_names, test_output_names 43 | 44 | def load_image(path): 45 | image = cv2.cvtColor(cv2.imread(path,-1), cv2.COLOR_BGR2RGB) 46 | return image 47 | 48 | # Takes an absolute file 
path and returns the name of the file without th extension 49 | def filepath_to_name(full_name): 50 | file_name = os.path.basename(full_name) 51 | file_name = os.path.splitext(file_name)[0] 52 | return file_name 53 | 54 | # Print with time. To console or file 55 | def LOG(X, f=None): 56 | time_stamp = datetime.datetime.now().strftime("[%Y-%m-%d %H:%M:%S]") 57 | if not f: 58 | print(time_stamp + " " + X) 59 | else: 60 | f.write(time_stamp + " " + X) 61 | 62 | 63 | # Count total number of parameters in the model 64 | def count_params(): 65 | total_parameters = 0 66 | for variable in tf.trainable_variables(): 67 | shape = variable.get_shape() 68 | variable_parameters = 1 69 | for dim in shape: 70 | variable_parameters *= dim.value 71 | total_parameters += variable_parameters 72 | print("This model has %d trainable parameters"% (total_parameters)) 73 | 74 | # Subtracts the mean images from ImageNet 75 | def mean_image_subtraction(inputs, means=[123.68, 116.78, 103.94]): 76 | inputs=tf.to_float(inputs) 77 | num_channels = inputs.get_shape().as_list()[-1] 78 | if len(means) != num_channels: 79 | raise ValueError('len(means) must match the number of channels') 80 | channels = tf.split(axis=3, num_or_size_splits=num_channels, value=inputs) 81 | for i in range(num_channels): 82 | channels[i] -= means[i] 83 | return tf.concat(axis=3, values=channels) 84 | 85 | def _lovasz_grad(gt_sorted): 86 | """ 87 | Computes gradient of the Lovasz extension w.r.t sorted errors 88 | See Alg. 1 in paper 89 | """ 90 | gts = tf.reduce_sum(gt_sorted) 91 | intersection = gts - tf.cumsum(gt_sorted) 92 | union = gts + tf.cumsum(1. - gt_sorted) 93 | jaccard = 1. - intersection / union 94 | jaccard = tf.concat((jaccard[0:1], jaccard[1:] - jaccard[:-1]), 0) 95 | return jaccard 96 | 97 | def _flatten_probas(probas, labels, ignore=None, order='BHWC'): 98 | """ 99 | Flattens predictions in the batch 100 | """ 101 | if order == 'BCHW': 102 | probas = tf.transpose(probas, (0, 2, 3, 1), name="BCHW_to_BHWC") 103 | order = 'BHWC' 104 | if order != 'BHWC': 105 | raise NotImplementedError('Order {} unknown'.format(order)) 106 | C = probas.shape[3] 107 | probas = tf.reshape(probas, (-1, C)) 108 | labels = tf.reshape(labels, (-1,)) 109 | if ignore is None: 110 | return probas, labels 111 | valid = tf.not_equal(labels, ignore) 112 | vprobas = tf.boolean_mask(probas, valid, name='valid_probas') 113 | vlabels = tf.boolean_mask(labels, valid, name='valid_labels') 114 | return vprobas, vlabels 115 | 116 | def _lovasz_softmax_flat(probas, labels, only_present=True): 117 | """ 118 | Multi-class Lovasz-Softmax loss 119 | probas: [P, C] Variable, class probabilities at each prediction (between 0 and 1) 120 | labels: [P] Tensor, ground truth labels (between 0 and C - 1) 121 | only_present: average only on classes present in ground truth 122 | """ 123 | C = probas.shape[1] 124 | losses = [] 125 | present = [] 126 | for c in range(C): 127 | fg = tf.cast(tf.equal(labels, c), probas.dtype) # foreground for class c 128 | if only_present: 129 | present.append(tf.reduce_sum(fg) > 0) 130 | errors = tf.abs(fg - probas[:, c]) 131 | errors_sorted, perm = tf.nn.top_k(errors, k=tf.shape(errors)[0], name="descending_sort_{}".format(c)) 132 | fg_sorted = tf.gather(fg, perm) 133 | grad = _lovasz_grad(fg_sorted) 134 | losses.append( 135 | tf.tensordot(errors_sorted, tf.stop_gradient(grad), 1, name="loss_class_{}".format(c)) 136 | ) 137 | losses_tensor = tf.stack(losses) 138 | if only_present: 139 | present = tf.stack(present) 140 | losses_tensor = 
tf.boolean_mask(losses_tensor, present) 141 | return losses_tensor 142 | 143 | def lovasz_softmax(probas, labels, only_present=True, per_image=False, ignore=None, order='BHWC'): 144 | """ 145 | Multi-class Lovasz-Softmax loss 146 | probas: [B, H, W, C] or [B, C, H, W] Variable, class probabilities at each prediction (between 0 and 1) 147 | labels: [B, H, W] Tensor, ground truth labels (between 0 and C - 1) 148 | only_present: average only on classes present in ground truth 149 | per_image: compute the loss per image instead of per batch 150 | ignore: void class labels 151 | order: use BHWC or BCHW 152 | """ 153 | probas = tf.nn.softmax(probas, 3) 154 | labels = helpers.reverse_one_hot(labels) 155 | 156 | if per_image: 157 | def treat_image(prob, lab): 158 | prob, lab = tf.expand_dims(prob, 0), tf.expand_dims(lab, 0) 159 | prob, lab = _flatten_probas(prob, lab, ignore, order) 160 | return _lovasz_softmax_flat(prob, lab, only_present=only_present) 161 | losses = tf.map_fn(treat_image, (probas, labels), dtype=tf.float32) 162 | else: 163 | losses = _lovasz_softmax_flat(*_flatten_probas(probas, labels, ignore, order), only_present=only_present) 164 | return losses 165 | 166 | 167 | # Randomly crop the image to a specific size. For data augmentation 168 | def random_crop(image, label, crop_height, crop_width): 169 | if (image.shape[0] != label.shape[0]) or (image.shape[1] != label.shape[1]): 170 | raise Exception('Image and label must have the same dimensions!') 171 | 172 | if (crop_width <= image.shape[1]) and (crop_height <= image.shape[0]): 173 | x = random.randint(0, image.shape[1]-crop_width) 174 | y = random.randint(0, image.shape[0]-crop_height) 175 | 176 | if len(label.shape) == 3: 177 | return image[y:y+crop_height, x:x+crop_width, :], label[y:y+crop_height, x:x+crop_width, :] 178 | else: 179 | return image[y:y+crop_height, x:x+crop_width, :], label[y:y+crop_height, x:x+crop_width] 180 | else: 181 | raise Exception('Crop shape (%d, %d) exceeds image dimensions (%d, %d)!' % (crop_height, crop_width, image.shape[0], image.shape[1])) 182 | 183 | # Compute the average segmentation accuracy across all classes 184 | def compute_global_accuracy(pred, label): 185 | total = len(label) 186 | count = 0.0 187 | for i in range(total): 188 | if pred[i] == label[i]: 189 | count = count + 1.0 190 | return float(count) / float(total) 191 | 192 | # Compute the class-specific segmentation accuracy 193 | def compute_class_accuracies(pred, label, num_classes): 194 | total = [] 195 | for val in range(num_classes): 196 | total.append((label == val).sum()) 197 | 198 | count = [0.0] * num_classes 199 | for i in range(len(label)): 200 | if pred[i] == label[i]: 201 | count[int(pred[i])] = count[int(pred[i])] + 1.0 202 | 203 | # If there are no pixels from a certain class in the GT, 204 | # it returns NAN because of divide by zero 205 | # Replace the nans with a 1.0. 
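# NOTE (illustrative sketch, not part of the original utils.py): with this
# convention a class that has no ground-truth pixels is reported as perfectly
# accurate (1.0) rather than as 0/0 = NaN. A tiny worked example with
# num_classes = 3:
#
#     pred  = [0, 0, 1, 1]
#     label = [0, 1, 1, 1]
#     # class 0: 1 of 1 ground-truth pixels correct -> 1.0
#     # class 1: 2 of 3 ground-truth pixels correct -> 0.666...
#     # class 2: absent from the ground truth       -> reported as 1.0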
206 | accuracies = [] 207 | for i in range(len(total)): 208 | if total[i] == 0: 209 | accuracies.append(1.0) 210 | else: 211 | accuracies.append(count[i] / total[i]) 212 | 213 | return accuracies 214 | 215 | 216 | def compute_mean_iou(pred, label): 217 | 218 | unique_labels = np.unique(label) 219 | num_unique_labels = len(unique_labels); 220 | 221 | I = np.zeros(num_unique_labels) 222 | U = np.zeros(num_unique_labels) 223 | 224 | for index, val in enumerate(unique_labels): 225 | pred_i = pred == val 226 | label_i = label == val 227 | 228 | I[index] = float(np.sum(np.logical_and(label_i, pred_i))) 229 | U[index] = float(np.sum(np.logical_or(label_i, pred_i))) 230 | 231 | 232 | mean_iou = np.mean(I / U) 233 | return mean_iou 234 | 235 | 236 | def evaluate_segmentation(pred, label, num_classes, score_averaging="weighted"): 237 | flat_pred = pred.flatten() 238 | flat_label = label.flatten() 239 | 240 | global_accuracy = compute_global_accuracy(flat_pred, flat_label) 241 | class_accuracies = compute_class_accuracies(flat_pred, flat_label, num_classes) 242 | 243 | prec = precision_score(flat_pred, flat_label, average=score_averaging) 244 | rec = recall_score(flat_pred, flat_label, average=score_averaging) 245 | f1 = f1_score(flat_pred, flat_label, average=score_averaging) 246 | 247 | iou = compute_mean_iou(flat_pred, flat_label) 248 | 249 | return global_accuracy, class_accuracies, prec, rec, f1, iou 250 | 251 | 252 | def compute_class_weights(labels_dir, label_values): 253 | ''' 254 | Arguments: 255 | labels_dir(list): Directory where the image segmentation labels are 256 | num_classes(int): the number of classes of pixels in all images 257 | 258 | Returns: 259 | class_weights(list): a list of class weights where each index represents each class label and the element is the class weight for that label. 260 | 261 | ''' 262 | image_files = [os.path.join(labels_dir, file) for file in os.listdir(labels_dir) if file.endswith('.png')] 263 | 264 | num_classes = len(label_values) 265 | 266 | class_pixels = np.zeros(num_classes) 267 | 268 | total_pixels = 0.0 269 | 270 | for n in range(len(image_files)): 271 | image = imread(image_files[n]) 272 | 273 | for index, colour in enumerate(label_values): 274 | class_map = np.all(np.equal(image, colour), axis = -1) 275 | class_map = class_map.astype(np.float32) 276 | class_pixels[index] += np.sum(class_map) 277 | 278 | 279 | print("\rProcessing image: " + str(n) + " / " + str(len(image_files)), end="") 280 | sys.stdout.flush() 281 | 282 | total_pixels = float(np.sum(class_pixels)) 283 | index_to_delete = np.argwhere(class_pixels==0.0) 284 | class_pixels = np.delete(class_pixels, index_to_delete) 285 | 286 | class_weights = total_pixels / class_pixels 287 | class_weights = class_weights / np.sum(class_weights) 288 | 289 | return class_weights 290 | 291 | # Compute the memory usage, for debugging 292 | def memory(): 293 | import os 294 | import psutil 295 | pid = os.getpid() 296 | py = psutil.Process(pid) 297 | memoryUse = py.memory_info()[0]/2.**30 # Memory use in GB 298 | print('Memory usage in GBs:', memoryUse) 299 | 300 | --------------------------------------------------------------------------------
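A quick way to sanity-check the evaluation helpers in utils/utils.py is to run evaluate_segmentation on a pair of tiny class-index maps. The sketch below is an illustration only: the toy arrays are made up, and it assumes the repository root is on the Python path with the dependencies imported by utils/utils.py (TensorFlow 1.x, OpenCV, scikit-learn, and an older SciPy that still provides scipy.misc.imread) installed.

import numpy as np
from utils.utils import evaluate_segmentation

# Two 2x3 class-index maps with three classes; one class-1 pixel is
# mispredicted as class 0.
label = np.array([[0, 0, 1],
                  [1, 2, 2]])
pred  = np.array([[0, 0, 0],
                  [1, 2, 2]])

acc, class_accs, prec, rec, f1, iou = evaluate_segmentation(pred=pred, label=label, num_classes=3)

print(acc)         # global pixel accuracy: 5/6 ~ 0.833
print(class_accs)  # per-class accuracy: [1.0, 0.5, 1.0]
print(iou)         # mean IoU over classes present in the ground truth: ~0.72

Note that compute_class_accuracies reports classes that never occur in the ground truth with an accuracy of 1.0, and compute_mean_iou averages IoU only over the labels that actually appear in the ground-truth map, so the returned metrics depend on which classes are present in each evaluated image.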